Commit Graph

14403 Commits

Author SHA1 Message Date
Nelson Elhage
6b8c92ba07 netlink: Make nlmsg_find_attr take a const nlmsghdr*.
This will let us use it on a nlmsghdr stored inside a netlink_callback.

Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-04 12:26:34 -07:00
Sjur Brændeland
2c24a5d1b4 caif: SPI-driver bugfix - incorrect padding.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:03 -07:00
André Carvalho de Matos
f2527ec436 caif: Bugfix for socket priority, bindtodev and dbg channel.
Changes:
o Bugfix: SO_PRIORITY for SOL_SOCKET could not be handled
  in caif's setsockopt,  using the struct sock attribute priority instead.

o Bugfix: SO_BINDTODEVICE for SOL_SOCKET could not be handled
  in caif's setsockopt,  using the struct sock attribute ifindex instead.

o Wrong assert statement for RFM layer segmentation.

o CAIF Debug channels was not working over SPI, caif_payload_info
  containing padding info must be initialized.

o Check on pointer before dereferencing when unregister dev in caif_dev.c

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:03 -07:00
Uwe Kleine-König
b595076a18 tree-wide: fix comment/printk typos
"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-11-01 15:38:34 -04:00
Linus Torvalds
1840897ab5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
  b43: Fix warning at drivers/mmc/core/core.c:237 in mmc_wait_for_cmd
  mac80211: fix failure to check kmalloc return value in key_key_read
  libertas: Fix sd8686 firmware reload
  ath9k: Fix incorrect access of rate flags in RC
  netfilter: xt_socket: Make tproto signed in socket_mt6_v1().
  stmmac: enable/disable rx/tx in the core with a single write.
  net: atarilance - flags should be unsigned long
  netxen: fix kdump
  pktgen: Limit how much data we copy onto the stack.
  net: Limit socket I/O iovec total length to INT_MAX.
  USB: gadget: fix ethernet gadget crash in gether_setup
  fib: Fix fib zone and its hash leak on namespace stop
  cxgb3: Fix panic in free_tx_desc()
  cxgb3: fix crash due to manipulating queues before registration
  8390: Don't oops on starting dev queue
  dccp ccid-2: Stop polling
  dccp: Refine the wait-for-ccid mechanism
  dccp: Extend CCID packet dequeueing interface
  dccp: Return-value convention of hc_tx_send_packet()
  igbvf: fix panic on load
  ...
2010-10-29 14:17:12 -07:00
Pavel Emelyanov
4aa2c466a7 fib: Fix fib zone and its hash leak on namespace stop
When we stop a namespace we flush the table and free one, but the
added fn_zone-s (and their hashes if grown) are leaked. Need to free.
Tries releases all its stuff in the flushing code.

Shame on us - this bug exists since the very first make-fib-per-net
patches in 2.6.27 :(

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 10:27:03 -07:00
Venkateswararao Jujjuri (JV)
b165d60145 9p: Add datasync to client side TFSYNC/RFSYNC for dotl
SYNOPSIS
    size[4] Tfsync tag[2] fid[4] datasync[4]

    size[4] Rfsync tag[2]

DESCRIPTION

    The Tfsync transaction transfers ("flushes") all modified in-core data of
    file identified by fid to the disk device (or other  permanent  storage
    device)  where that  file  resides.

    If datasync flag is specified data will be fleshed but does not flush
    modified metadata unless  that  metadata  is  needed  in order to allow a
    subsequent data retrieval to be correctly handled.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:49 -05:00
M. Mohan Kumar
329176cc2c 9p: Implement TREADLINK operation for 9p2000.L
Synopsis

	size[4] TReadlink tag[2] fid[4]
	size[4] RReadlink tag[2] target[s]

Description
	Readlink is used to return the contents of the symoblic link
        referred by fid. Contents of symboic link is returned as a
        response.

	target[s] - Contents of the symbolic link referred by fid.

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:48 -05:00
M. Mohan Kumar
1d769cd192 9p: Implement TGETLOCK
Synopsis

    size[4] TGetlock tag[2] fid[4] getlock[n]
    size[4] RGetlock tag[2] getlock[n]

Description

TGetlock is used to test for the existence of byte range posix locks on a file
identified by given fid. The reply contains getlock structure. If the lock could
be placed it returns F_UNLCK in type field of getlock structure.  Otherwise it
returns the details of the conflicting locks in the getlock structure

    getlock structure:
      type[1] - Type of lock: F_RDLCK, F_WRLCK
      start[8] - Starting offset for lock
      length[8] - Number of bytes to check for the lock
             If length is 0, check for lock in all bytes starting at the location
            'start' through to the end of file
      pid[4] - PID of the process that wants to take lock/owns the task
               in case of reply
      client[4] - Client id of the system that owns the process which
                  has the conflicting lock

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:47 -05:00
M. Mohan Kumar
a099027c77 9p: Implement TLOCK
Synopsis

    size[4] TLock tag[2] fid[4] flock[n]
    size[4] RLock tag[2] status[1]

Description

Tlock is used to acquire/release byte range posix locks on a file
identified by given fid. The reply contains status of the lock request

    flock structure:
        type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK
        flags[4] - Flags could be either of
          P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a
            conflicting lock exists, wait for that lock to be released.
          P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is
            trying to reclaim a lock after a server restrart (due to crash)
        start[8] - Starting offset for lock
        length[8] - Number of bytes to lock
          If length is 0, lock all bytes starting at the location 'start'
          through to the end of file
        pid[4] - PID of the process that wants to take lock
        client_id[4] - Unique client id

        status[1] - Status of the lock request, can be
          P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or
          P9_LOCK_GRACE(3)
          P9_LOCK_SUCCESS - Request was successful
          P9_LOCK_BLOCKED - A conflicting lock is held by another process
          P9_LOCK_ERROR - Error while processing the lock request
          P9_LOCK_GRACE - Server is in grace period, it can't accept new lock
            requests in this period (except locks with
            P9_LOCK_FLAGS_RECLAIM flag set)

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:47 -05:00
Venkateswararao Jujjuri (JV)
920e65dc69 [9p] Introduce client side TFSYNC/RFSYNC for dotl.
SYNOPSIS
    size[4] Tfsync tag[2] fid[4]

    size[4] Rfsync tag[2]

DESCRIPTION

The Tfsync transaction transfers ("flushes") all modified in-core data of
file identified by fid to the disk device (or other  permanent  storage
device)  where that  file  resides.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:47 -05:00
Arun R Bharadwaj
4f7ebe8072 net/9p: This patch implements TLERROR/RLERROR on the 9P client.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:45 -05:00
Amarnath Revanna
a10c02036f caif-u5500: Adding shared memory include
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 12:29:51 -07:00
Eric Dumazet
b914c4ea92 inetpeer: __rcu annotations
Adds __rcu annotations to inetpeer
	(struct inet_peer)->avl_left
	(struct inet_peer)->avl_right

This is a tedious cleanup, but removes one smp_wmb() from link_to_pool()
since we now use more self documenting rcu_assign_pointer().

Note the use of RCU_INIT_POINTER() instead of rcu_assign_pointer() in
all cases we dont need a memory barrier.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:33 -07:00
Eric Dumazet
7a2b03c517 fib_rules: __rcu annotates ctarget
Adds __rcu annotation to (struct fib_rule)->ctarget

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:32 -07:00
Eric Dumazet
b33eab0844 tunnels: add __rcu annotations
Add __rcu annotations to :
        (struct ip_tunnel)->prl
        (struct ip_tunnel_prl_entry)->next
        (struct xfrm_tunnel)->next
	struct xfrm_tunnel *tunnel4_handlers
	struct xfrm_tunnel *tunnel64_handlers

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:32 -07:00
Eric Dumazet
e0ad61ec86 net: add __rcu annotations to protocol
Add __rcu annotations to :
        struct net_protocol *inet_protos
        struct net_protocol *inet6_protos

And use appropriate casts to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:31 -07:00
Eric Dumazet
1c31720a74 ipv4: add __rcu annotations to routes.c
Add __rcu annotations to :
        (struct dst_entry)->rt_next
        (struct rt_hash_bucket)->chain

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:31 -07:00
Eric Dumazet
43a951e999 ipv4: add __rcu annotations to ip_ra_chain
Add __rcu annotations to :
        (struct ip_ra_chain)->next
	struct ip_ra_chain *ip_ra_chain;

And use appropriate rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 14:18:28 -07:00
Eric Dumazet
0d7da9ddd9 net: add __rcu annotation to sk_filter
Add __rcu annotation to :
        (struct sock)->sk_filter

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 14:18:28 -07:00
Eric Dumazet
1c87733d06 net_ns: add __rcu annotations
add __rcu annotation to (struct net)->gen, and use
rcu_dereference_protected() in net_assign_generic()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 14:18:27 -07:00
Eric Dumazet
6f0bcf1525 tunnels: add _rcu annotations
(struct ip6_tnl)->next is rcu protected :
(struct ip_tunnel)->next is rcu protected :
(struct xfrm6_tunnel)->next is rcu protected :

add __rcu annotation and proper rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 13:09:45 -07:00
Eric Dumazet
3cc77ec74e net/802: add __rcu annotations
(struct net_device)->garp_port is rcu protected :
(struct garp_port)->applicants is rcu protected :

add __rcu annotation and proper rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 13:09:44 -07:00
Linus Torvalds
5f05647dd8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
  bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
  vlan: Calling vlan_hwaccel_do_receive() is always valid.
  tproxy: use the interface primary IP address as a default value for --on-ip
  tproxy: added IPv6 support to the socket match
  cxgb3: function namespace cleanup
  tproxy: added IPv6 support to the TPROXY target
  tproxy: added IPv6 socket lookup function to nf_tproxy_core
  be2net: Changes to use only priority codes allowed by f/w
  tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
  tproxy: added tproxy sockopt interface in the IPV6 layer
  tproxy: added udp6_lib_lookup function
  tproxy: added const specifiers to udp lookup functions
  tproxy: split off ipv6 defragmentation to a separate module
  l2tp: small cleanup
  nf_nat: restrict ICMP translation for embedded header
  can: mcp251x: fix generation of error frames
  can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
  can-raw: add msg_flags to distinguish local traffic
  9p: client code cleanup
  rds: make local functions/variables static
  ...

Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
drivers/net/wireless/ath/ath9k/debug.c as per David
2010-10-23 11:47:02 -07:00
Linus Torvalds
888a6f77e0 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (52 commits)
  sched: fix RCU lockdep splat from task_group()
  rcu: using ACCESS_ONCE() to observe the jiffies_stall/rnp->qsmask value
  sched: suppress RCU lockdep splat in task_fork_fair
  net: suppress RCU lockdep false positive in sock_update_classid
  rcu: move check from rcu_dereference_bh to rcu_read_lock_bh_held
  rcu: Add advice to PROVE_RCU_REPEATEDLY kernel config parameter
  rcu: Add tracing data to support queueing models
  rcu: fix sparse errors in rcutorture.c
  rcu: only one evaluation of arg in rcu_dereference_check() unless sparse
  kernel: Remove undead ifdef CONFIG_DEBUG_LOCK_ALLOC
  rcu: fix _oddness handling of verbose stall warnings
  rcu: performance fixes to TINY_PREEMPT_RCU callback checking
  rcu: upgrade stallwarn.txt documentation for CPU-bound RT processes
  vhost: add __rcu annotations
  rcu: add comment stating that list_empty() applies to RCU-protected lists
  rcu: apply TINY_PREEMPT_RCU read-side speedup to TREE_PREEMPT_RCU
  rcu: combine duplicate code, courtesy of CONFIG_PREEMPT_RCU
  rcu: Upgrade srcu_read_lock() docbook about SRCU grace periods
  rcu: document ways of stalling updates in low-memory situations
  rcu: repair code-duplication FIXMEs
  ...
2010-10-21 12:54:12 -07:00
David S. Miller
9941fb6276 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-10-21 08:21:34 -07:00
Patrick McHardy
3b1a1ce6f4 Merge branch 'for-patrick' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6 2010-10-21 16:25:51 +02:00
Balazs Scheidler
3b9afb2991 tproxy: added IPv6 socket lookup function to nf_tproxy_core
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21 16:12:14 +02:00
Balazs Scheidler
aa976fc011 tproxy: added udp6_lib_lookup function
Just like with IPv4, we need access to the UDP hash table to look up local
sockets, but instead of exporting the global udp_table, export a lookup
function.

Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21 16:05:41 +02:00
Balazs Scheidler
e97c3e278e tproxy: split off ipv6 defragmentation to a separate module
Like with IPv4, TProxy needs IPv6 defragmentation but does not
require connection tracking. Since defragmentation was coupled
with conntrack, I split off the two, creating an nf_defrag_ipv6 module,
similar to the already existing nf_defrag_ipv4.

Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21 16:03:43 +02:00
stephen hemminger
32a875adcd 9p: client code cleanup
Make p9_client_version static since only used in one file.
Remove p9_client_auth because it is defined but never used.
Compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 04:26:39 -07:00
Balazs Scheidler
093d282321 tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()
When __inet_inherit_port() is called on a tproxy connection the wrong locks are
held for the inet_bind_bucket it is added to. __inet_inherit_port() made an
implicit assumption that the listener's port number (and thus its bind bucket).
Unfortunately, if you're using the TPROXY target to redirect skbs to a
transparent proxy that assumption is not true anymore and things break.

This patch adds code to __inet_inherit_port() so that it can handle this case
by looking up or creating a new bind bucket for the child socket and updates
callers of __inet_inherit_port() to gracefully handle __inet_inherit_port()
failing.

Reported by and original patch from Stephen Buck <stephen.buck@exinda.com>.
See http://marc.info/?t=128169268200001&r=1&w=2 for the original discussion.

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21 13:06:43 +02:00
Balazs Scheidler
6006db84a9 tproxy: add lookup type checks for UDP in nf_tproxy_get_sock_v4()
Also, inline this function as the lookup_type is always a literal
and inlining removes branches performed at runtime.

Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21 12:47:34 +02:00
Balazs Scheidler
106e4c26b1 tproxy: kick out TIME_WAIT sockets in case a new connection comes in with the same tuple
Without tproxy redirections an incoming SYN kicks out conflicting
TIME_WAIT sockets, in order to handle clients that reuse ports
within the TIME_WAIT period.

The same mechanism didn't work in case TProxy is involved in finding
the proper socket, as the time_wait processing code looked up the
listening socket assuming that the listener addr/port matches those
of the established connection.

This is not the case with TProxy as the listener addr/port is possibly
changed with the tproxy rule.

Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21 12:45:14 +02:00
Changli Gao
3511c9132f net_sched: remove the unused parameter of qdisc_create_dflt()
The first parameter dev isn't in use in qdisc_create_dflt().

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 03:09:47 -07:00
stephen hemminger
1c4c40c42d xfrm: make xfrm_bundle_ok local
Only used in one place.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 03:09:46 -07:00
stephen hemminger
8d8a0b1cc2 rtnetlink: remove rtnl_kill_links
The function rtnl_kill_links is defined but never used.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 03:09:45 -07:00
stephen hemminger
6f747aca5e xfrm6: make xfrm6_tunnel_free_spi local
Function only defined and used in one file.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 03:09:45 -07:00
Julian Anastasov
0d79641a96 ipvs: provide address family for debugging
As skb->protocol is not valid in LOCAL_OUT add
parameter for address family in packet debugging functions.
Even if ports are not present in AH and ESP change them to
use ip_vs_tcpudp_debug_packet to show at least valid addresses
as before. This patch removes the last user of skb->protocol
in IPVS.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-10-21 11:04:43 +02:00
Julian Anastasov
fc60476761 ipvs: changes for local real server
This patch deals with local real servers:

- Add support for DNAT to local address (different real server port).
It needs ip_vs_out hook in LOCAL_OUT for both families because
skb->protocol is not set for locally generated packets and can not
be used to set 'af'.

- Skip packets in ip_vs_in marked with skb->ipvs_property because
ip_vs_out processing can be executed in LOCAL_OUT but we still
have the conn_out_get check in ip_vs_in.

- Ignore packets with inet->nodefrag from local stack

- Require skb_dst(skb) != NULL because we use it to get struct net

- Add support for changing the route to local IPv4 stack after DNAT
depending on the source address type. Local client sets output
route and the remote client sets input route. It looks like
IPv6 does not need such rerouting because the replies use
addresses from initial incoming header, not from skb route.

- All transmitters now have strict checks for the destination
address type: redirect from non-local address to local real
server requires NAT method, local address can not be used as
source address when talking to remote real server.

- Now LOCALNODE is not set explicitly as forwarding
method in real server to allow the connections to provide
correct forwarding method to the backup server. Not sure if
this breaks tools that expect to see 'Local' real server type.
If needed, this can be supported with new flag IP_VS_DEST_F_LOCAL.
Now it should be possible connections in backup that lost
their fwmark information during sync to be forwarded properly
to their daddr, even if it is local address in the backup server.
By this way backup could be used as real server for DR or TUN,
for NAT there are some restrictions because tuple collisions
in conntracks can create problems for the traffic.

- Call ip_vs_dst_reset when destination is updated in case
some real server IP type is changed between local and remote.

[ horms@verge.net.au: removed trailing whitespace ]
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-10-21 11:03:46 +02:00
Julian Anastasov
190ecd27cd ipvs: do not schedule conns from real servers
This patch is needed to avoid scheduling of
packets from local real server when we add ip_vs_in
in LOCAL_OUT hook to support local client.

 	Currently, when ip_vs_in can not find existing
connection it tries to create new one by calling ip_vs_schedule.

 	The default indication from ip_vs_schedule was if
connection was scheduled to real server. If real server is
not available we try to use the bypass forwarding method
or to send ICMP error. But in some cases we do not want to use
the bypass feature. So, add flag 'ignored' to indicate if
the scheduler ignores this packet.

 	Make sure we do not create new connections from replies.
We can hit this problem for persistent services and local real
server when ip_vs_in is added to LOCAL_OUT hook to handle
local clients.

 	Also, make sure ip_vs_schedule ignores SYN packets
for Active FTP DATA from local real server. The FTP DATA
connection should be created on SYN+ACK from client to assign
correct connection daddr.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-10-21 10:50:41 +02:00
Julian Anastasov
cf356d69db ipvs: switch to notrack mode
Change skb->ipvs_property semantic. This is preparation
to support ip_vs_out processing in LOCAL_OUT. ipvs_property=1
will be used to avoid expensive lookups for traffic sent by
transmitters. Now when conntrack support is not used we call
ip_vs_notrack method to avoid problems in OUTPUT and
POST_ROUTING hooks instead of exiting POST_ROUTING as before.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-10-21 10:50:20 +02:00
Julian Anastasov
8b27b10f58 ipvs: optimize checksums for apps
Avoid full checksum calculation for apps that can provide
info whether csum was broken after payload mangling. For now only
ip_vs_ftp mangles payload and it updates the csum, so the full
recalculation is avoided for all packets.

 	Add CHECKSUM_UNNECESSARY for snat_handler (TCP and UDP).
It is needed to support SNAT from local address for the case
when csum is fully recalculated.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-10-21 10:50:02 +02:00
David S. Miller
5eeaa2db16 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-10-20 01:59:48 -07:00
Hans Schillstrom
714f095f74 ipvs: IPv6 tunnel mode
IPv6 encapsulation uses a bad source address for the tunnel.
i.e. VIP will be used as local-addr and encap. dst addr.
Decapsulation will not accept this.

Example
LVS (eth1 2003::2:0:1/96, VIP 2003::2:0:100)
   (eth0 2003::1:0:1/96)
RS  (ethX 2003::1:0:5/96)

tcpdump
2003::2:0:100 > 2003::1:0:5: IP6 (hlim 63, next-header TCP (6) payload length: 40)  2003::3:0:10.50991 > 2003::2:0:100.http: Flags [S], cksum 0x7312 (correct), seq 3006460279, win 5760, options [mss 1440,sackOK,TS val 1904932 ecr 0,nop,wscale 3], length 0

In Linux IPv6 impl. you can't have a tunnel with an any cast address
receiving packets (I have not tried to interpret RFC 2473)
To have receive capabilities the tunnel must have:
 - Local address set as multicast addr or an unicast addr
 - Remote address set as an unicast addr.
 - Loop back addres or Link local address are not allowed.

This causes us to setup a tunnel in the Real Server with the
LVS as the remote address, here you can't use the VIP address since it's
used inside the tunnel.

Solution
Use outgoing interface IPv6 address (match against the destination).
i.e. use ip6_route_output() to look up the route cache and
then use ipv6_dev_get_saddr(...) to set the source address of the
encapsulated packet.

Additionally, cache the results in new destination
fields: dst_cookie and dst_saddr and properly check the
returned dst from ip6_route_output. We now add xfrm_lookup
call only for the tunneling method where the source address
is a local one.

Signed-off-by:Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-19 10:38:48 +02:00
Pablo Neira Ayuso
ebbf41df4a netfilter: ctnetlink: add expectation deletion events
This patch allows to listen to events that inform about
expectations destroyed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-19 10:19:06 +02:00
Eric Dumazet
8e602ce298 netns: reorder fields in struct net
In a network bench, I noticed an unfortunate false sharing between
'loopback_dev' and 'count' fields in "struct net".

'count' is written each time a socket is created or destroyed, while
loopback_dev might be often read in routing code.

Move loopback_dev in a read mostly section of "struct net"

Note: struct netns_xfrm is cache line aligned on SMP.
(It contains a "struct dst_ops")
Move it at the end to avoid holes, and reduce sizeof(struct net) by 128
bytes on ia32.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-17 13:49:14 -07:00
stephen hemminger
31e3c3f6f1 tipc: cleanup function namespace
Do some cleanups of TIPC based on make namespacecheck
  1. Don't export unused symbols
  2. Eliminate dead code
  3. Make functions and variables local
  4. Rename buf_acquire to tipc_buf_acquire since it is used in several files

Compile tested only.
This make break out of tree kernel modules that depend on TIPC routines.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-16 11:13:24 -07:00
John W. Linville
c64557d666 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-10-15 16:11:56 -04:00
John W. Linville
1a63c353c8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-next-2.6 into for-davem 2010-10-15 16:00:02 -04:00
Kumar Sanghvi
b3d6255388 Phonet: 'connect' socket implementation for Pipe controller
Based on suggestion by Rémi Denis-Courmont to implement 'connect'
for Pipe controller logic,  this patch implements 'connect' socket
call for the Pipe controller logic.
The patch does following:-
- Removes setsockopts for PNPIPE_CREATE and PNPIPE_DESTROY
- Adds setsockopt for setting the Pipe handle value
- Implements connect socket call
- Updates the Pipe controller logic

User-space should now follow below sequence with Pipe controller:-
-socket
-bind
-setsockopt for PNPIPE_PIPE_HANDLE
-connect
-setsockopt for PNPIPE_ENCAP_IP
-setsockopt for PNPIPE_ENABLE

GPRS/3G data has been tested working fine with this.

Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-13 14:40:34 -07:00
Johannes Berg
7be5086d4c mac80211: add probe request filter flag
Using the frame registration notification, we
can see when probe requests are requested and
notify the low-level driver via filtering. The
flag is also set in AP and IBSS modes.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-13 15:45:22 -04:00
Johannes Berg
271733cf84 cfg80211: notify drivers about frame registrations
Drivers may need to adjust their filters according
to frame registrations, so notify them about them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-13 15:45:22 -04:00
Andrei Emeltchenko
534c92fde7 Bluetooth: clean up rfcomm code
Remove dead code and unused rfcomm thread events

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-10-12 12:44:53 -03:00
Mat Martineau
796c86eec8 Bluetooth: Add common code for stream-oriented recvmsg()
This commit adds a bt_sock_stream_recvmsg() function for use by any
Bluetooth code that uses SOCK_STREAM sockets.  This code is copied
from rfcomm_sock_recvmsg() with minimal modifications to remove
RFCOMM-specific functionality and improve readability.

L2CAP (with the SOCK_STREAM socket type) and RFCOMM have common needs
when it comes to reading data.  Proper stream read semantics require
that applications can read from a stream one byte at a time and not
lose any data.  The RFCOMM code already operated on and pulled data
from the underlying L2CAP socket, so very few changes were required to
make the code more generic for use with non-RFCOMM data over L2CAP.

Applications that need more awareness of L2CAP frame boundaries are
still free to use SOCK_SEQPACKET sockets, and may verify that they
connection did not fall back to basic mode by calling getsockopt().

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-10-12 12:44:51 -03:00
David Vrabel
8f1e174223 Bluetooth: HCI devices are either BR/EDR or AMP radios
HCI transport drivers may not know what type of radio an AMP device has
so only say whether they're BR/EDR or AMP devices.

Signed-off-by: David Vrabel <david.vrabel@csr.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-10-12 12:44:51 -03:00
Eric Dumazet
e37ef961e5 neigh: reorder struct neighbour fields
Le mardi 12 octobre 2010 à 00:02 +0200, Eric Dumazet a écrit :
> Here is the followup patch.
>
> Thanks !
>

Oops, this was an old version, the up2date ones also took care of "used"
field.

I guess its time for a sleep, sorry again.

[PATCH net-next V2] neigh: reorder struct neighbour fields

(refcnt) and (ha_lock, ha, used, dev, output, ops, primary_key) should
be placed on a separate cache lines.

refcnt can be often written, while other fields are mostly read.

This gave me good result on stress test :

before:

real    0m45.570s
user    0m15.525s
sys     9m56.669s

After:

real    0m41.841s
user    0m15.261s
sys     8m45.949s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11 16:09:14 -07:00
Eric Dumazet
fc66f95c68 net dst: use a percpu_counter to track entries
struct dst_ops tracks number of allocated dst in an atomic_t field,
subject to high cache line contention in stress workload.

Switch to a percpu_counter, to reduce number of time we need to dirty a
central location. Place it on a separate cache line to avoid dirtying
read only fields.

Stress test :

(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE, SLUB/NUMA)

Before:

real    0m51.179s
user    0m15.329s
sys     10m15.942s

After:

real	0m45.570s
user	0m15.525s
sys	9m56.669s

With a small reordering of struct neighbour fields, subject of a
following patch, (to separate refcnt from other read mostly fields)

real	0m41.841s
user	0m15.261s
sys	8m45.949s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11 13:06:53 -07:00
Eric Dumazet
0ed8ddf404 neigh: Protect neigh->ha[] with a seqlock
Add a seqlock in struct neighbour to protect neigh->ha[], and avoid
dirtying neighbour in stress situation (many different flows / dsts)

Dirtying takes place because of read_lock(&n->lock) and n->used writes.

Switching to a seqlock, and writing n->used only on jiffies changes
permits less dirtying.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11 12:54:04 -07:00
David S. Miller
d122179a3c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/core/ethtool.c
2010-10-11 12:30:34 -07:00
Felix Fietkau
8610c29a2c cfg80211: add channel utilization stats to the survey command
Using these, user space can calculate a relative channel utilization
with arbitrary intervals by regularly taking snapshots of the survey
results.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-11 15:04:20 -04:00
Ben Greear
5a5c731aa5 wireless: Set some stats used by /proc/net/wireless (wext)
Some stats for /proc/net/wireless (and wext in general) are not
being set.  This patch addresses a few of those with values easily
obtained from mac80211 core.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-11 15:04:19 -04:00
John W. Linville
e9a68707d7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/wireless/ipw2x00/ipw2200.c
2010-10-08 15:39:28 -04:00
Johannes Berg
388ac775be cfg80211: constify WDS address
There's no need for the WDS peer address
to not be const, so make it const.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-07 14:41:28 -04:00
Ingo Molnar
556ef63255 Merge commit 'v2.6.36-rc7' into core/rcu
Merge reason: Update from -rc3 to -rc7.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-07 09:43:45 +02:00
Ingo Molnar
d4f8f217b8 Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/rcu 2010-10-07 09:43:11 +02:00
Eric Dumazet
767e97e1e0 neigh: RCU conversion of struct neighbour
This is the second step for neighbour RCU conversion.

(first was commit d6bf7817 : RCU conversion of neigh hash table)

neigh_lookup() becomes lockless, but still take a reference on found
neighbour. (no more read_lock()/read_unlock() on tbl->lock)

struct neighbour gets an additional rcu_head field and is freed after an
RCU grace period.

Future work would need to eventually not take a reference on neighbour
for temporary dst (DST_NOCACHE), but this would need dst->_neighbour to
use a noref bit like we did for skb->_dst.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-06 18:01:33 -07:00
Bruno Randolf
b206b4ef06 nl80211/mac80211: Add retry and failed transmission count to station info
This information is already available in mac80211, we just need to export it
via cfg80211 and nl80211.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-06 16:30:43 -04:00
Johannes Berg
e31b82136d cfg80211/mac80211: allow per-station GTKs
This adds API to allow adding per-station GTKs,
updates mac80211 to support it, and also allows
drivers to remove a key from hwaccel again when
this may be necessary due to multiple GTKs.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-06 16:30:40 -04:00
Eric Dumazet
ebc0ffae5d fib: RCU conversion of fib_lookup()
fib_lookup() converted to be called in RCU protected context, no
reference taken and released on a contended cache line (fib_clntref)

fib_table_lookup() and fib_semantic_match() get an additional parameter.

struct fib_info gets an rcu_head field, and is freed after an rcu grace
period.

Stress test :
(Sending 160.000.000 UDP frames on same neighbour,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_HASH) (about same results for FIB_TRIE)

Before patch :

real	1m31.199s
user	0m13.761s
sys	23m24.780s

After patch:

real	1m5.375s
user	0m14.997s
sys	15m50.115s

Before patch Profile :

13044.00 15.4% __ip_route_output_key vmlinux
 8438.00 10.0% dst_destroy           vmlinux
 5983.00  7.1% fib_semantic_match    vmlinux
 5410.00  6.4% fib_rules_lookup      vmlinux
 4803.00  5.7% neigh_lookup          vmlinux
 4420.00  5.2% _raw_spin_lock        vmlinux
 3883.00  4.6% rt_set_nexthop        vmlinux
 3261.00  3.9% _raw_read_lock        vmlinux
 2794.00  3.3% fib_table_lookup      vmlinux
 2374.00  2.8% neigh_resolve_output  vmlinux
 2153.00  2.5% dst_alloc             vmlinux
 1502.00  1.8% _raw_read_lock_bh     vmlinux
 1484.00  1.8% kmem_cache_alloc      vmlinux
 1407.00  1.7% eth_header            vmlinux
 1406.00  1.7% ipv4_dst_destroy      vmlinux
 1298.00  1.5% __copy_from_user_ll   vmlinux
 1174.00  1.4% dev_queue_xmit        vmlinux
 1000.00  1.2% ip_output             vmlinux

After patch Profile :

13712.00 15.8% dst_destroy             vmlinux
 8548.00  9.9% __ip_route_output_key   vmlinux
 7017.00  8.1% neigh_lookup            vmlinux
 4554.00  5.3% fib_semantic_match      vmlinux
 4067.00  4.7% _raw_read_lock          vmlinux
 3491.00  4.0% dst_alloc               vmlinux
 3186.00  3.7% neigh_resolve_output    vmlinux
 3103.00  3.6% fib_table_lookup        vmlinux
 2098.00  2.4% _raw_read_lock_bh       vmlinux
 2081.00  2.4% kmem_cache_alloc        vmlinux
 2013.00  2.3% _raw_spin_lock          vmlinux
 1763.00  2.0% __copy_from_user_ll     vmlinux
 1763.00  2.0% ip_output               vmlinux
 1761.00  2.0% ipv4_dst_destroy        vmlinux
 1631.00  1.9% eth_header              vmlinux
 1440.00  1.7% _raw_read_unlock_bh     vmlinux

Reference results, if IP route cache is enabled :

real	0m29.718s
user	0m10.845s
sys	7m37.341s

25213.00 29.5% __ip_route_output_key   vmlinux
 9011.00 10.5% dst_release             vmlinux
 4817.00  5.6% ip_push_pending_frames  vmlinux
 4232.00  5.0% ip_finish_output        vmlinux
 3940.00  4.6% udp_sendmsg             vmlinux
 3730.00  4.4% __copy_from_user_ll     vmlinux
 3716.00  4.4% ip_route_output_flow    vmlinux
 2451.00  2.9% __xfrm_lookup           vmlinux
 2221.00  2.6% ip_append_data          vmlinux
 1718.00  2.0% _raw_spin_lock_bh       vmlinux
 1655.00  1.9% __alloc_skb             vmlinux
 1572.00  1.8% sock_wfree              vmlinux
 1345.00  1.6% kfree                   vmlinux

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05 20:39:38 -07:00
Eric Dumazet
d6bf781712 net neigh: RCU conversion of neigh hash table
David

This is the first step for RCU conversion of neigh code.

Next patches will convert hash_buckets[] and "struct neighbour" to RCU
protected objects.

Thanks

[PATCH net-next] net neigh: RCU conversion of neigh hash table

Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
neigh_table", a new structure is defined :

struct neigh_hash_table {
       struct neighbour        **hash_buckets;
       unsigned int            hash_mask;
       __u32                   hash_rnd;
       struct rcu_head         rcu;
};

And "struct neigh_table" has an RCU protected pointer to such a
neigh_hash_table.

This means the signature of (*hash)() function changed: We need to add a
third parameter with the actual hash_rnd value, since this is not
anymore a neigh_table field.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05 14:54:36 -07:00
Johannes Berg
ff4c92d85c genetlink: introduce pre_doit/post_doit hooks
Each family may have some amount of boilerplate
locking code that applies to most, or even all,
commands.

This allows a family to handle such things in
a more generic way, by allowing it to
 a) include private flags in each operation
 b) specify a pre_doit hook that is called,
    before an operation's doit() callback and
    may return an error directly,
 c) specify a post_doit hook that can undo
    locking or similar things done by pre_doit,
    and finally
 d) include two private pointers in each info
    struct passed between all these operations
    including doit(). (It's two because I'll
    need two in nl80211 -- can be extended.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-05 13:35:30 -04:00
Helmut Schaa
78be49ec2a mac80211: distinct between max rates and the number of rates the hw can report
Some drivers cannot handle multiple retry rates specified by the rc
algorithm but instead use their own retry table (for example rt2800).
However, if such a device registers itself with a max_rates value of 1
the rc algorithm cannot make use of the extended information the device
can provide about retried rates. On the other hand, if a device
registers itself with a max_rates value > 1 the rc algorithm assumes
that the device can handle multi rate retries.

Fix this issue by introducing another hw parameter max_report_rates that
can be set to a different value then max_rates to indicate if a device
is capable of reporting more rates then specified in max_rates.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-05 13:35:28 -04:00
Johannes Berg
ea229e6826 cfg80211: remove spurious __KERNEL__ ifdef
The net/cfg80211.h header file isn't exported to
userspace, so there's no need for any kind of
__KERNEL__ protection in it. If it was exported,
everything else in it would need protection as
well, not just the logging stuff ...

Cc:Joe Perches <joe@perches.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-05 13:35:23 -04:00
Felix Fietkau
17e5a80828 nl80211: allow drivers to indicate whether the survey data channel is in use
Some user space applications only want to display survey data for
the operating channel, however there is no API to get that yet.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-05 13:35:22 -04:00
stephen hemminger
c61393ea83 ipv6: make __ipv6_isatap_ifid static
Another exported symbol only used in one file

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05 00:47:39 -07:00
stephen hemminger
1df9916e46 fib: fib_rules_cleanup can be static
fib_rules_cleanup_ups is only defined and used in one place.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05 00:47:39 -07:00
Patrick McHardy
eecc545856 netfilter: add missing xt_log.h file
Forgot to add xt_log.h in commit a8defca0 (netfilter: ipt_LOG:
add bufferisation to call printk() once)

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-04 23:24:21 +02:00
David S. Miller
21a180cda0 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/ipv4/Kconfig
	net/ipv4/tcp_timer.c
2010-10-04 11:56:38 -07:00
Stephen Hemminger
0c200d9353 netfilter: nf_nat: make find/put static
The functions nf_nat_proto_find_get and nf_nat_proto_put are
only used internally in nf_nat_core. This might break some out
of tree NAT module.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-04 20:53:18 +02:00
Simon Horman
0d1e71b04a IPVS: Allow configuration of persistence engines
Allow the persistence engine of a virtual service to be set, edited
and unset.

This feature only works with the netlink user-space interface.

Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04 22:45:24 +09:00
Simon Horman
8be67a6617 IPVS: management of persistence engine modules
This is based heavily on the scheduler management code

Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04 22:45:24 +09:00
Simon Horman
a3c918acd2 IPVS: Add persistence engine data to /proc/net/ip_vs_conn
This shouldn't break compatibility with userspace as the new data
is at the end of the line.

I have confirmed that this doesn't break ipvsadm, the main (only?)
user-space user of this data.

Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04 22:45:24 +09:00
Simon Horman
85999283a2 IPVS: Add struct ip_vs_pe
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04 22:45:24 +09:00
Simon Horman
f11017ec2d IPVS: Add struct ip_vs_conn_param
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>
2010-10-04 22:45:24 +09:00
Eric Dumazet
c7d4426a98 net: introduce DST_NOCACHE flag
While doing stress tests with IP route cache disabled, and multi queue
devices, I noticed a very high contention on one rwlock used in
neighbour code.

When many cpus are trying to send frames (possibly using a high
performance multiqueue device) to the same neighbour, they fight for the
neigh->lock rwlock in order to call neigh_hh_init(), and fight on
hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test())

But we dont need to call neigh_hh_init() for dst that are used only
once. It costs four atomic operations at least, on two contended cache
lines, plus the high contention on neigh->lock rwlock.

Introduce a new dst flag, DST_NOCACHE, that is set when dst was not
inserted in route cache.

With the stress test bench, sending 160000000 frames on one neighbour,
results are :

Before patch:

real	2m28.406s
user	0m11.781s
sys	36m17.964s


After patch:

real	1m26.532s
user	0m12.185s
sys	20m3.903s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-03 22:17:54 -07:00
John W. Linville
41f4a6f71f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-10-01 11:12:36 -04:00
Eric Dumazet
367e5e3769 neigh: reorder fields in struct neighbour
On 64bit arches, there are two 32bit holes that we can remove.

sizeof(struct neighbour) shrinks from 0xf8 to 0xf0 bytes

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-01 00:36:51 -07:00
Gustavo F. Padovan
e454c84464 Bluetooth: Fix deadlock in the ERTM logic
The Enhanced Retransmission Mode(ERTM) is a realiable mode of operation
of the Bluetooth L2CAP layer. Think on it like a simplified version of
TCP.
The problem we were facing here was a deadlock. ERTM uses a backlog
queue to queue incomimg packets while the user is helding the lock. At
some moment the sk_sndbuf can be exceeded and we can't alloc new skbs
then the code sleep with the lock to wait for memory, that stalls the
ERTM connection once we can't read the acknowledgements packets in the
backlog queue to free memory and make the allocation of outcoming skb
successful.

This patch actually affect all users of bt_skb_send_alloc(), i.e., all
L2CAP modes and SCO.

We are safe against socket states changes or channels deletion while the
we are sleeping wait memory. Checking for the sk->sk_err and
sk->sk_shutdown make the code safe, since any action that can leave the
socket or the channel in a not usable state set one of the struct
members at least. Then we can check both of them when getting the lock
again and return with the proper error if something unexpected happens.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Ulisses Furquim <ulisses@profusion.mobi>
2010-09-30 12:19:35 -03:00
stephen hemminger
1b9f409293 tcp: tcp_enter_quickack_mode can be static
Function only used in tcp_input.c

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-29 19:45:36 -07:00
stephen hemminger
a64de47c09 arp: remove unnecessary export of arp_broken_ops
arp_broken_ops is only used in arp.c

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-29 19:45:35 -07:00
Tom Herbert
4465b46900 ipv4: Allow configuring subnets as local addresses
This patch allows a host to be configured to respond to any address in
a specified range as if it were local, without actually needing to
configure the address on an interface.  This is done through routing
table configuration.  For instance, to configure a host to respond
to any address in 10.1/16 received on eth0 as a local address we can do:

ip rule add from all iif eth0 lookup 200
ip route add local 10.1/16 dev lo proto kernel scope host src 127.0.0.1 table 200

This host is now reachable by any 10.1/16 address (route lookup on
input for packets received on eth0 can find the route).  On output, the
rule will not be matched so that this host can still send packets to
10.1/16 (not sent on loopback).  Presumably, external routing can be
configured to make sense out of this.

To make this work, we needed to modify the logic in finding the
interface which is assigned a given source address for output
(dev_ip_find).  We perform a normal fib_lookup instead of just a
lookup on the local table, and in the lookup we ignore the input
interface for matching.

This patch is useful to implement IP-anycast for subnets of virtual
addresses.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-28 23:38:15 -07:00
Pablo Neira Ayuso
bc01befdcf netfilter: ctnetlink: add support for user-space expectation helpers
This patch adds the basic infrastructure to support user-space
expectation helpers via ctnetlink and the netfilter queuing
infrastructure NFQUEUE. Basically, this patch:

* adds NF_CT_EXPECT_USERSPACE flag to identify user-space
  created expectations. I have also added a sanity check in
  __nf_ct_expect_check() to avoid that kernel-space helpers
  may create an expectation if the master conntrack has no
  helper assigned.
* adds some branches to check if the master conntrack helper
  exists, otherwise we skip the code that refers to kernel-space
  helper such as the local expectation list and the expectation
  policy.
* allows to set the timeout for user-space expectations with
  no helper assigned.
* a list of expectations created from user-space that depends
  on ctnetlink (if this module is removed, they are deleted).
* includes USERSPACE in the /proc output for expectations
  that have been created by a user-space helper.

This patch also modifies ctnetlink to skip including the helper
name in the Netlink messages if no kernel-space helper is set
(since no user-space expectation has not kernel-space kernel
assigned).

You can access an example user-space FTP conntrack helper at:
http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-userspace-POC.tar.bz

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-09-28 21:06:34 +02:00
Eric Dumazet
290b895e0b tunnels: prepare percpu accounting
Tunnels are going to use percpu for their accounting.

They are going to use a new tstats field in net_device.

skb_tunnel_rx() is changed to be a wrapper around __skb_tunnel_rx()

IPTUNNEL_XMIT() is changed to be a wrapper around __IPTUNNEL_XMIT()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-27 21:30:42 -07:00
Kumar Sanghvi
8d98efa84b Phonet: Implement Pipe Controller to support Nokia Slim Modems
Phonet stack assumes the presence of Pipe Controller, either in Modem or
on Application Processing Engine user-space for the Pipe data.
Nokia Slim Modems like WG2.5 used in ST-Ericsson U8500 platform do not
implement Pipe controller in them.
This patch adds Pipe Controller implemenation to Phonet stack to support
Pipe data over Phonet stack for Nokia Slim Modems.

Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-27 21:30:41 -07:00
Ulrich Weber
fb0c5f0bc8 tproxy: check for transparent flag in ip_route_newports
as done in ip_route_connect()

Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-27 15:03:33 -07:00
Johannes Berg
554891e63a mac80211: move packet flags into packet
commit 8c0c709eea
Author: Johannes Berg <johannes@sipsolutions.net>
Date:   Wed Nov 25 17:46:15 2009 +0100

    mac80211: move cmntr flag out of rx flags

moved the CMNTR flag into the skb RX flags for
some aggregation cleanups, but this was wrong
since the optimisation this flag tried to make
requires that it is kept across the processing
of multiple interfaces -- which isn't true for
flags in the skb. The patch not only broke the
optimisation, it also introduced a bug: under
some (common!) circumstances the flag will be
set on an already freed skb!

However, investigating this in more detail, I
found that most of the flags that we set should
be per packet, _except_ for this one, due to
a-MPDU processing. Additionally, the flags used
for processing (currently just this one) need
to be reset before processing a new packet.

Since we haven't actually seen bugs reported as
a result of the wrong flags handling (which is
not too surprising -- the only real bug case I
can come up with is an a-MSDU contained in an
a-MPDU), I'll make a different fix for rc.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-09-27 15:57:54 -04:00
Ben Greear
686b9cb994 mac80211/ath9k: Support AMPDU with multiple VIFs.
The old ieee80211_find_sta_by_hw method didn't properly
find VIFS when there was more than one per AP.  This caused
AMPDU logic in ath9k to get the wrong VIF when trying to
account for transmitted SKBs.

This patch changes ieee80211_find_sta_by_hw to take a
localaddr argument to distinguish between VIFs with the
same AP but different local addresses.  The method name
is changed to ieee80211_find_sta_by_ifaddr.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-09-27 15:57:45 -04:00
David S. Miller
e40051d134 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/qlcnic/qlcnic_init.c
	net/ipv4/ip_output.c
2010-09-27 01:03:03 -07:00
Neil Horman
2cc6d2bf3d ipv6: add a missing unregister_pernet_subsys call
Clean up a missing exit path in the ipv6 module init routines.  In
addrconf_init we call ipv6_addr_label_init which calls register_pernet_subsys
for the ipv6_addr_label_ops structure.  But if module loading fails, or if the
ipv6 module is removed, there is no corresponding unregister_pernet_subsys call,
which leaves a now-bogus address on the pernet_list, leading to oopses in
subsequent registrations.  This patch cleans up both the failed load path and
the unload path.  Tested by myself with good results.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/net/addrconf.h |    1 +
 net/ipv6/addrconf.c    |   11 ++++++++---
 net/ipv6/addrlabel.c   |    5 +++++
 3 files changed, 14 insertions(+), 3 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-26 19:09:25 -07:00
Eric Dumazet
7a91b434e2 net: update SOCK_MIN_RCVBUF
SOCK_MIN_RCVBUF current value is 256 bytes

It doesnt permit to receive the smallest possible frame, considering
socket sk_rmem_alloc/sk_rcvbuf account skb truesizes. On 64bit arches,
sizeof(struct sk_buff) is 240 bytes. Add the typical 64 bytes of
headroom, and we go over the limit.

With old kernels and 32bit arches, we were under the limit, if netdriver
was doing copybreak.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-26 18:53:07 -07:00
Tom Herbert
693019e90c net: reset skb queue mapping when rx'ing over tunnel
Reset queue mapping when an skb is reentering the stack via a tunnel.
On second pass, the queue mapping from the original device is no
longer valid.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-26 18:48:40 -07:00
Christian Lamparter
eb7d3066cf mac80211: clear txflags for ps-filtered frames
This patch fixes stale mac80211_tx_control_flags for
filtered / retried frames.

Because ieee80211_handle_filtered_frame feeds skbs back
into the tx path, they have to be stripped of some tx
flags so they won't confuse the stack, driver or device.

Cc: <stable@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-09-24 15:54:30 -04:00
Eric Dumazet
a02cec2155 net: return operator cleanup
Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-23 14:33:39 -07:00
Pablo Neira Ayuso
8b008faf92 netfilter: ctnetlink: allow to specify the expectation flags
With this patch, you can specify the expectation flags for user-space
created expectations.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-09-22 08:36:59 +02:00
David S. Miller
a0741ca949 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-09-21 18:17:19 -07:00
Eric Dumazet
48daa3bb84 ipv6: addrconf.h cleanups
- Use rcu_dereference_rtnl() in __in6_dev_get
- kerneldoc for __in6_dev_get() and in6_dev_get()
- Use inline functions instead of macros

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-21 18:04:47 -07:00
John W. Linville
b618f6f885 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
	arch/arm/mach-omap2/board-omap3pandora.c
	drivers/net/wireless/ath/ath5k/base.c
2010-09-21 15:49:14 -04:00
Julian Anastasov
8a8030407f ipvs: make rerouting optional with snat_reroute
Add new sysctl flag "snat_reroute". Recent kernels use
ip_route_me_harder() to route LVS-NAT responses properly by
VIP when there are multiple paths to client. But setups
that do not have alternative default routes can skip this
routing lookup by using snat_reroute=0.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-09-21 17:38:57 +02:00
Julian Anastasov
f4bc17cdd2 ipvs: netfilter connection tracking changes
Add more code to IPVS to work with Netfilter connection
tracking and fix some problems.

- Allow IPVS to be compiled without connection tracking as in
2.6.35 and before. This can avoid keeping conntracks for all
IPVS connections because this costs memory. ip_vs_ftp still
depends on connection tracking and NAT as implemented for 2.6.36.

- Add sysctl var "conntrack" to enable connection tracking for
all IPVS connections. For loaded IPVS directors it needs
tuning of nf_conntrack_max limit.

- Add IP_VS_CONN_F_NFCT connection flag to request the connection
to use connection tracking. This allows user space to provide this
flag, for example, in dest->conn_flags. This can be useful to
request connection tracking per real server instead of forcing it
for all connections with the "conntrack" sysctl. This flag is
set currently only by ip_vs_ftp and of course by "conntrack" sysctl.

- Add ip_vs_nfct.c file to hold all connection tracking code,
by this way main code should not depend of netfilter conntrack
support.

- Return back the ip_vs_post_routing handler as in 2.6.35 and use
skb->ipvs_property=1 to allow IPVS to work without connection
tracking

Connection tracking:

- most of the code is already in 2.6.36-rc

- alter conntrack reply tuple for LVS-NAT connections when first packet
from client is forwarded and conntrack state is NEW or RELATED.
Additionally, alter reply for RELATED connections from real server,
again for packet in original direction.

- add IP_VS_XMIT_TUNNEL to confirm conntrack (without altering
reply) for LVS-TUN early because we want to call nf_reset. It is
needed because we add IPIP header and the original conntrack
should be preserved, not destroyed. The transmitted IPIP packets
can reuse same conntrack, so we do not set skb->ipvs_property.

- try to destroy conntrack when the IPVS connection is destroyed.
It is not fatal if conntrack disappears before that, it depends
on the used timers.

Fix problems from long time:

- add skb->ip_summed = CHECKSUM_NONE for the LVS-TUN transmitters

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-09-21 17:35:41 +02:00
Thomas Egerer
8444cf712c xfrm: Allow different selector family in temporary state
The family parameter xfrm_state_find is used to find a state matching a
certain policy. This value is set to the template's family
(encap_family) right before xfrm_state_find is called.
The family parameter is however also used to construct a temporary state
in xfrm_state_find itself which is wrong for inter-family scenarios
because it produces a selector for the wrong family. Since this selector
is included in the xfrm_user_acquire structure, user space programs
misinterpret IPv6 addresses as IPv4 and vice versa.
This patch splits up the original init_tempsel function into a part that
initializes the selector respectively the props and id of the temporary
state, to allow for differing ip address families whithin the state.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-20 11:11:38 -07:00
Julian Anastasov
3575792e00 ipvs: extend connection flags to 32 bits
- the sync protocol supports 16 bits only, so bits 0..15 should be
used only for flags that should go to backup server, bits 16 and
above should be allocated for flags not sent to backup.

- use IP_VS_CONN_F_DEST_MASK as mask of connection flags in
destination that can be changed by user space

- allow IP_VS_CONN_F_ONE_PACKET to be set in destination

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-09-17 14:18:16 +02:00
Johannes Berg
2ca27bcff7 mac80211: add p2p device type support
When a driver advertises p2p device support,
mac80211 will handle it, but internally it will
rewrite the interface type to STA/AP rather than
P2P-STA/GO since otherwise a lot of paths need
to be touched that are otherwise identical. A
p2p boolean tells drivers whether or not a given
interface will be used for p2p or not.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-09-16 15:46:07 -04:00
Joe Perches
9c37663929 include/net/cfg80211.h: wiphy_<level> messages use dev_printk
The output becomes:

[   41.261941] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-09-16 15:19:44 -04:00
Rémi Denis-Courmont
507215f8d0 Phonet: list subscribed resources via proc_fs
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-15 21:31:33 -07:00
Rémi Denis-Courmont
4e3d16ce5e Phonet: resource routing backend
When both destination device and object are nul, Phonet routes the
packet according to the resource field. In fact, this is the most
common pattern when sending Phonet "request" packets. In this case,
the packet is delivered to whichever endpoint (socket) has
registered the resource.

This adds a new table so that Linux processes can register their
Phonet sockets to Phonet resources, if they have adequate privileges.

(Namespace support is not implemented at the moment.)

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-15 21:31:32 -07:00
Rémi Denis-Courmont
6482f554e2 Phonet: remove dangling pipe if an endpoint is closed early
Closing a pipe endpoint is not normally allowed by the Phonet pipe,
other than as a side after-effect of removing the pipe between two
endpoints. But there is no way to prevent Linux userspace processes
from being killed or suffering from bugs, so this can still happen.
We might as well forcefully close Phonet pipe endpoints then.

The cellular modem supports only a few existing pipes at a time. So we
really should not leak them. This change instructs the modem to destroy
the pipe if either of the pipe's endpoint (Linux socket) is closed too
early.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-15 21:31:31 -07:00
Alexey Kuznetsov
01f83d6984 tcp: Prevent overzealous packetization by SWS logic.
If peer uses tiny MSS (say, 75 bytes) and similarly tiny advertised
window, the SWS logic will packetize to half the MSS unnecessarily.

This causes problems with some embedded devices.

However for large MSS devices we do want to half-MSS packetize
otherwise we never get enough packets into the pipe for things
like fast retransmit and recovery to work.

Be careful also to handle the case where MSS > window, otherwise
we'll never send until the probe timer.

Reported-by: ツ Leandro Melo de Sales <leandroal@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-15 12:01:44 -07:00
Joe Perches
55b1804c67 net/irda: Use static const char * const where possible
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-14 20:22:05 -07:00
Felix Fietkau
2944f45d9d mac80211: add a note about iterating interfaces during add_interface()
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-09-14 16:14:26 -04:00
David S. Miller
e548833df8 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/mac80211/main.c
2010-09-09 22:27:33 -07:00
Eric Dumazet
719f835853 udp: add rehash on connect()
commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation)
added a secondary hash on UDP, hashed on (local addr, local port).

Problem is that following sequence :

fd = socket(...)
connect(fd, &remote, ...)

not only selects remote end point (address and port), but also sets
local address, while UDP stack stored in secondary hash table the socket
while its local address was INADDR_ANY (or ipv6 equivalent)

Sequence is :
 - autobind() : choose a random local port, insert socket in hash tables
              [while local address is INADDR_ANY]
 - connect() : set remote address and port, change local address to IP
              given by a route lookup.

When an incoming UDP frame comes, if more than 10 sockets are found in
primary hash table, we switch to secondary table, and fail to find
socket because its local address changed.

One solution to this problem is to rehash datagram socket if needed.

We add a new rehash(struct socket *) method in "struct proto", and
implement this method for UDP v4 & v6, using a common helper.

This rehashing only takes care of secondary hash table, since primary
hash (based on local port only) is not changed.

Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 21:45:01 -07:00
Joe Perches
e3634169bc include/net/raw.h: Convert raw_seq_private macro to inline
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 13:42:22 -07:00
Julian Anastasov
6523ce1525 ipvs: fix active FTP
- Do not create expectation when forwarding the PORT
  command to avoid blocking the connection. The problem is that
  nf_conntrack_ftp.c:help() tries to create the same expectation later in
  POST_ROUTING and drops the packet with "dropping packet" message after
  failure in nf_ct_expect_related.

- Change ip_vs_update_conntrack to alter the conntrack
  for related connections from real server. If we do not alter the reply in
  this direction the next packet from client sent to vport 20 comes as NEW
  connection. We alter it but may be some collision happens for both
  conntracks and the second conntrack gets destroyed immediately. The
  connection stucks too.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 10:39:57 -07:00
Li Zefan
3fb5a99191 cls_cgroup: Fix rcu lockdep warning
Dave reported an rcu lockdep warning on 2.6.35.4 kernel

task->cgroups and task->cgroups->subsys[i] are protected by RCU.
So we avoid accessing invalid pointers here. This might happen,
for example, when you are deref-ing those pointers while someone
move @task from one cgroup to another.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-03 09:55:24 -07:00
John W. Linville
78ab952717 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-09-02 13:30:07 -04:00
Gerrit Renker
3d5b99ae82 TCP: update initial windows according to RFC 5681
This updates the use of larger initial windows, as originally specified in
RFC 3390, to use the newer IW values specified in RFC 5681, section 3.1.

The changes made in RFC 5681 are:
 a) the setting now is more clearly specified in units of segments (as the
    comments  by John Heffner emphasized, this was not very clear in RFC 3390);
 b) for connections with 1095 < SMSS <= 2190 there is now a change:
    - RFC 3390 says that IW <= 4380,
    - RFC 5681 says that IW = 3 * SMSS <= 6570.

Since RFC 3390 is older and "only" proposed standard, whereas the newer RFC 5681
is already draft standard, it seems preferable to use the newer IW variant.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30 13:50:44 -07:00
Gerrit Renker
22b71c8f4f tcp/dccp: Consolidate common code for RFC 3390 conversion
This patch consolidates initial-window code common to TCP and CCID-2:
 * TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
 * CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30 13:45:26 -07:00
Jerry Chu
dca43c75e7 tcp: Add TCP_USER_TIMEOUT socket option.
This patch provides a "user timeout" support as described in RFC793. The
socket option is also needed for the the local half of RFC5482 "TCP User
Timeout Option".

TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int,
when > 0, to specify the maximum amount of time in ms that transmitted
data may remain unacknowledged before TCP will forcefully close the
corresponding connection and return ETIMEDOUT to the application. If
0 is given, TCP will continue to use the system default.

Increasing the user timeouts allows a TCP connection to survive extended
periods without end-to-end connectivity. Decreasing the user timeouts
allows applications to "fail fast" if so desired. Otherwise it may take
upto 20 minutes with the current system defaults in a normal WAN
environment.

The socket option can be made during any state of a TCP connection, but
is only effective during the synchronized states of a connection
(ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, or LAST-ACK).
Moreover, when used with the TCP keepalive (SO_KEEPALIVE) option,
TCP_USER_TIMEOUT will overtake keepalive to determine when to close a
connection due to keepalive failure.

The option does not change in anyway when TCP retransmits a packet, nor
when a keepalive probe will be sent.

This option, like many others, will be inherited by an acceptor from its
listener.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30 13:23:33 -07:00
Johannes Berg
34d4bc4d41 mac80211: support runtime interface type changes
Add support to mac80211 for changing the interface
type even when the interface is UP, if the driver
supports it.

To achieve this
 * add a new driver callback for switching,
 * split some of the interface up/down code out
   into new functions (do_open/do_stop), and
 * maintain an own __SDATA_RUNNING bit that will
   not be set during interface type, so that any
   other code doesn't use the interface.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:53:31 -04:00
Johannes Berg
c0692b8fe2 cfg80211: allow changing port control protocol
Some vendor specified mechanisms for 802.1X-style
functionality use a different protocol than EAP
(even if EAP is vendor-extensible). Allow setting
the ethertype for the protocol when a driver has
support for this. The default if unspecified is
EAP, of course.

Note: This is suitable only for station mode, not
      for AP implementation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:27:07 -04:00
Johannes Berg
8789d459bc mac80211: allow scan to complete from any context
The ieee80211_scan_completed() function was a frequent
source of potential deadlocks, since it is called by
drivers but may call back into drivers, so drivers had
to make sure to call it without any locks held, which
frequently lead to more complex code in drivers. Avoid
that problem by allowing the function to be called in
any context, and queueing the actual work it does.
Also update the documentation for it to indicate this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:27:06 -04:00
Joe Perches
145ce502e4 net/sctp: Use pr_fmt and pr_<level>
Change SCTP_DEBUG_PRINTK and SCTP_DEBUG_PRINTK_IPADDR to
use do { print } while (0) guards.
Add SCTP_DEBUG_PRINTK_CONT to fix errors in log when
lines were continued.
Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Add a missing newline in "Failed bind hash alloc"

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-26 14:11:48 -07:00
John W. Linville
e569aa78ba Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
	drivers/net/wireless/libertas/if_sdio.c
2010-08-25 14:51:42 -04:00
Bob Copeland
2738bd682d mac80211: trivial spelling fixes
Fix spelling and readability of a few lines of kernel doc:

    s/issueing/issuing/g
    s/approriate/appropriate/g
    s/supported by simply/supported simply by/
    s/IEEE80211_HW_BEACON_FILTERING/IEEE80211_HW_BEACON_FILTER/g

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-25 14:33:17 -04:00
David S. Miller
ad1af0fedb tcp: Combat per-cpu skew in orphan tests.
As reported by Anton Blanchard when we use
percpu_counter_read_positive() to make our orphan socket limit checks,
the check can be off by up to num_cpus_online() * batch (which is 32
by default) which on a 128 cpu machine can be as large as the default
orphan limit itself.

Fix this by doing the full expensive sum check if the optimized check
triggers.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-08-25 02:27:49 -07:00
Johannes Berg
d70e96932d cfg80211: add some documentation
Add some documentation for cfg80211. I'm hoping some of
the regulatory documentation will be filled by somebody
more familiar with it, hint hint! :)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-24 16:32:06 -04:00
Johannes Berg
633dd1ea68 mac80211: fix docbook
Fix a small problem in the documentation for
ieee80211_request_smps, and a now erroneous
inclusion of enum ieee80211_key_alg, which no
longer exists after the change to ciphers.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-24 16:32:04 -04:00
Johannes Berg
2e161f78e5 cfg80211/mac80211: extensible frame processing
Allow userspace to register for more than just
action frames by giving the frame subtype, and
make it possible to use this in various modes
as well.

With some tweaks and some added functionality
this will, in the future, also be usable in AP
mode and be able to replace the cooked monitor
interface currently used in that case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-24 16:27:56 -04:00
Johannes Berg
633adf1ad1 cfg80211: mark ieee80211_hdrlen const
This function analyses only its single, value-passed
argument, and has no side effects. Thus it can be
const, which makes mac80211 smaller, for example:

   text	   data	    bss	    dec	    hex	filename
 362518	  16720	    884	 380122	  5ccda	mac80211.ko (before)
 362358	  16720	    884	 379962	  5cc3a	mac80211.ko (after)

a 160 byte saving in text size, and an optimisation
because the function won't be called as often.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-24 16:27:54 -04:00
Eric Dumazet
81ce790bd7 irda: use net_device_stats from struct net_device
struct net_device has its own struct net_device_stats member, so use
this one instead of a private copy in the irlan_cb struct.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-21 23:32:31 -07:00
Dmitry Kozlov
00959ade36 PPTP: PPP over IPv4 (Point-to-Point Tunneling Protocol)
PPP: introduce "pptp" module which implements point-to-point tunneling protocol using pppox framework
NET: introduce the "gre" module for demultiplexing GRE packets on version criteria
     (required to pptp and ip_gre may coexists)
NET: ip_gre: update to use the "gre" module

This patch introduces then pptp support to the linux kernel which
dramatically speeds up pptp vpn connections and decreases cpu usage in
comparison of existing user-space implementation
(poptop/pptpclient). There is accel-pptp project
(https://sourceforge.net/projects/accel-pptp/) to utilize this module,
it contains plugin for pppd to use pptp in client-mode and modified
pptpd (poptop) to build high-performance pptp NAS.

There was many changes from initial submitted patch, most important are:
1. using rcu instead of read-write locks
2. using static bitmap instead of dynamically allocated
3. using vmalloc for memory allocation instead of BITS_PER_LONG + __get_free_pages
4. fixed many coding style issues
Thanks to Eric Dumazet.

Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-21 23:05:39 -07:00
Grégoire Baron
eb4d406545 net/sched: add ACT_CSUM action to update packets checksums
net/sched: add ACT_CSUM action to update packets checksums

ACT_CSUM can be called just after ACT_PEDIT in order to re-compute some
altered checksums in IPv4 and IPv6 packets. The following checksums are
supported by this patch:
 - IPv4: IPv4 header, ICMP, IGMP, TCP, UDP & UDPLite
 - IPv6: ICMPv6, TCP, UDP & UDPLite
It's possible to request in the same action to update different kind of
checksums, if the packets flow mix TCP, UDP and UDPLite, ...

An example of usage is done in the associated iproute2 patch.

Version 3 changes:
 - remove useless goto instructions
 - improve IPv6 hop options decoding

Version 2 changes:
 - coding style correction
 - remove useless arguments of some functions
 - use stack in tcf_csum_dump()
 - add tcf_csum_skb_nextlayer() to factor code

Signed-off-by: Gregoire Baron <baronchon@n7mm.org>
Acked-by: jamal <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-20 01:42:59 -07:00
Arnd Bergmann
0906a372f2 net/netfilter: __rcu annotations
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2010-08-19 17:18:01 -07:00
Paul E. McKenney
d34a16661e net: convert to rcu_dereference_index_check()
The task_cls_classid() function applies rcu_dereference() to integers,
which does not work with the shiny new sparse-based checking in
rcu_dereference().  This commit therefore moves to the new RCU API
rcu_dereference_index_check().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-08-19 17:17:59 -07:00
Oliver Hartkopp
2244d07bfa net: simplify flags for tx timestamping
This patch removes the abstraction introduced by the union skb_shared_tx in
the shared skb data.

The access of the different union elements at several places led to some
confusion about accessing the shared tx_flags e.g. in skb_orphan_try().

    http://marc.info/?l=linux-netdev&m=128084897415886&w=2

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-19 00:08:30 -07:00
Johannes Berg
97359d1235 mac80211: use cipher suite selectors
Currently, mac80211 translates the cfg80211
cipher suite selectors into ALG_* values.
That isn't all too useful, and some drivers
benefit from the distinction between WEP40
and WEP104 as well. Therefore, convert it
all to use the cipher suite selectors.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-16 16:45:11 -04:00
Johannes Berg
d1f5b7a34a mac80211: allow drivers to request SM PS mode change
Sometimes drivers have more information than the
stack about how their antennas/chains are used,
and may require that the SM PS mode be changed.
This could happen, for example, when detecting
that the user disconnected an antenna. Thus this
patch introduces API to allow drivers to request
SM PS mode changes.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-16 15:26:40 -04:00
Johannes Berg
7da7cc1d42 mac80211: per interface idle notification
Sometimes we don't just need to know whether or
not the device is idle, but also per interface.
This adds that reporting capability to mac80211.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-16 15:26:40 -04:00
John W. Linville
9714d315d2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-08-16 14:40:44 -04:00
John W. Linville
4e6cbfd09c mac80211: support use of NAPI for bottom-half processing
This patch implement basic infrastructure to support use of NAPI by
mac80211-based hardware drivers.

Because mac80211 devices can support multiple netdevs, a dummy netdev
is used for interfacing with the NAPI code in the core of the network
stack.  That structure is hidden from the hardware drivers, but the
actual napi_struct is exposed in the ieee80211_hw structure so that the
poll routines in drivers can retrieve that structure.  Hardware drivers
can also specify their own weight value for NAPI polling.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-16 14:39:46 -04:00
David S. Miller
1c114f42a5 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-08-10 15:59:38 -07:00
Mat Martineau
db12d647cc Bluetooth: Use 3-DH5 payload size for default ERTM max PDU size
The previous value of 672 for L2CAP_DEFAULT_MAX_PDU_SIZE is based on
the default L2CAP MTU.  That default MTU is calculated from the size
of two DH5 packets, minus ACL and L2CAP b-frame header overhead.

ERTM is used with newer basebands that typically support larger 3-DH5
packets, and i-frames and s-frames have more header overhead.  With
clean RF conditions, basebands will typically attempt to use 1021-byte
3-DH5 packets for maximum throughput.  Adjusting for 2 bytes of ACL
headers plus 10 bytes of worst-case L2CAP headers yields 1009 bytes
of payload.

This PDU size imposes less overhead for header bytes and gives the
baseband the option to choose 3-DH5 packets, but is small enough for
ERTM traffic to interleave well with other L2CAP or SCO data.
672-byte payloads do not allow the most efficient over-the-air
packet choice, and cannot achieve maximum throughput over BR/EDR.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-08-10 07:59:11 -04:00
Mat Martineau
fa235562fb Bluetooth: Change default L2CAP ERTM retransmit timeout
The L2CAP specification requires that the ERTM retransmit timeout be at
least 2 seconds for BR/EDR connections.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-08-10 07:59:11 -04:00
Randy Dunlap
53c3fa2064 net/sock.h: add missing kernel-doc notation
Add missing kernel-doc notation to struct sock:

Warning(include/net/sock.h:324): No description found for parameter 'sk_peer_pid'
Warning(include/net/sock.h:324): No description found for parameter 'sk_peer_cred'
Warning(include/net/sock.h:324): No description found for parameter 'sk_classid'
Warning(include/net/sock.h:324): Excess struct/union/enum/typedef member 'sk_peercred' description in 'sock'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-10 00:09:20 -07:00
David S. Miller
e225567960 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-08-06 13:30:43 -07:00
John W. Linville
c0068c8589 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6 2010-08-05 15:54:28 -04:00
Linus Torvalds
6ba74014c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits)
  phy/marvell: add 88ec048 support
  igb: Program MDICNFG register prior to PHY init
  e1000e: correct MAC-PHY interconnect register offset for 82579
  hso: Add new product ID
  can: Add driver for esd CAN-USB/2 device
  l2tp: fix export of header file for userspace
  can-raw: Fix skb_orphan_try handling
  Revert "net: remove zap_completion_queue"
  net: cleanup inclusion
  phy/marvell: add 88e1121 interface mode support
  u32: negative offset fix
  net: Fix a typo from "dev" to "ndev"
  igb: Use irq_synchronize per vector when using MSI-X
  ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
  e1000e: Fix irq_synchronize in MSI-X case
  e1000e: register pm_qos request on hardware activation
  ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
  net: Add getsockopt support for TCP thin-streams
  cxgb4: update driver version
  cxgb4: add new PCI IDs
  ...

Manually fix up conflicts in:
 - drivers/net/e1000e/netdev.c: due to pm_qos registration
   infrastructure changes
 - drivers/net/phy/marvell.c: conflict between adding 88ec048 support
   and cleaning up the IDs
 - drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req
   conflict (registration change vs marking it static)
2010-08-04 11:47:58 -07:00
David S. Miller
83bf2e4089 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-08-02 15:07:58 -07:00
Aneesh Kumar K.V
eda25e4616 net/9p: Implement TXATTRCREATE 9p call
TXATTRCREATE:  Prepare a fid for setting xattr value on a file system object.

 size[4] TXATTRCREATE tag[2] fid[4] name[s] attr_size[8] flags[4]
 size[4] RXATTRCREATE tag[2]

txattrcreate gets a fid pointing to xattr. This fid can later be
used to set the xattr value.

flag value is derived from set Linux setxattr. The manpage says
"The flags parameter can be used to refine the semantics of the operation.
XATTR_CREATE specifies a pure create, which fails if the named attribute
exists already. XATTR_REPLACE specifies a pure replace operation, which
fails if the named attribute does not already exist. By default (no flags),
the extended attribute will be created if need be, or will simply replace
the value if the attribute exists."

The actual setxattr operation happens when the fid is clunked. At that point
the written byte count and the attr_size specified in TXATTRCREATE should be
same otherwise an error will be returned.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:34 -05:00
Aneesh Kumar K.V
0ef63f345c net/9p: Implement attrwalk 9p call
TXATTRWALK: Descend a ATTR namespace

 size[4] TXATTRWALK tag[2] fid[4] newfid[4] name[s]
 size[4] RXATTRWALK tag[2] size[8]

txattrwalk gets a fid pointing to xattr. This fid can later be
used to read the xattr value. If name is NULL the fid returned
can be used to get the list of extended attribute associated to
the file system object.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:33 -05:00
M. Mohan Kumar
ef56547efa 9p: Implement LOPEN
Implement 9p2000.L version of open(LOPEN) interface in 9p client.

For LOPEN, no need to convert the flags to and from 9p mode to VFS mode.

Synopsis:

    size[4] Tlopen tag[2] fid[4] mode[4]

    size[4] Rlopen tag[2] qid[13] iounit[4]

[Fix mode bit format - jvrao@linux.vnet.ibm.com]

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbegren <ericvh@gmail.com>
2010-08-02 14:28:32 -05:00
Venkateswararao Jujjuri (JV)
5643135a28 fs/9p: This patch implements TLCREATE for 9p2000.L protocol.
SYNOPSIS

    size[4] Tlcreate tag[2] fid[4] name[s] flags[4] mode[4] gid[4]

    size[4] Rlcreate tag[2] qid[13] iounit[4]

DESCRIPTION

The Tlreate request asks the file server to create a new regular file with the
name supplied, in the directory (dir) represented by fid.
The mode argument specifies the permissions to use. New file is created with
the uid if the fid and with supplied gid.

The flags argument represent Linux access mode flags with which the caller
is requesting to open the file with. Protocol allows all the Linux access
modes but it is upto the server to allow/disallow any of these acess modes.
If the server doesn't support any of the access mode, it is expected to
return error.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:32 -05:00
M. Mohan Kumar
01a622bd74 9p: Implement TMKDIR
Implement TMKDIR as part of 2000.L Work

Synopsis

    size[4] Tmkdir tag[2] fid[4] name[s] mode[4] gid[4]

    size[4] Rmkdir tag[2] qid[13]

Description

    mkdir asks the file server to create a directory with given name,
    mode and gid. The qid for the new directory is returned with
    the mkdir reply message.

Note: 72 is selected as the opcode for TMKDIR from the reserved list.

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:31 -05:00
M. Mohan Kumar
4b43516ab1 9p: Implement TMKNOD
Synopsis

    size[4] Tmknod tag[2] fid[4] name[s] mode[4] major[4] minor[4] gid[4]

    size[4] Rmknod tag[2] qid[13]

Description

    mknod asks the file server to create a device node with given major and
    minor number, mode and gid. The qid for the new device node is returned
    with the mknod reply message.

[sripathik@in.ibm.com: Fix error handling code]

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:30 -05:00
Venkateswararao Jujjuri (JV)
50cc42ff3d 9p: Define and implement TSYMLINK for 9P2000.L
Create a symbolic link

SYNOPSIS

size[4] Tsymlink tag[2] fid[4] name[s] symtgt[s] gid[4]

size[4] Rsymlink tag[2] qid[13]

DESCRIPTION

Create a symbolic link named 'name' pointing to 'symtgt'.
gid represents the effective group id of the caller.
The  permissions of a symbolic link are irrelevant hence it is omitted
from the protocol.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Reviewed-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:29 -05:00
Venkateswararao Jujjuri (JV)
652df9a7fd 9p: Define and implement TLINK for 9P2000.L
This patch adds a helper function to get the dentry from inode and
uses it in creating a Hardlink

SYNOPSIS

size[4] Tlink tag[2] dfid[4] oldfid[4] newpath[s]

size[4] Rlink tag[2]

DESCRIPTION

Create a link 'newpath' in directory pointed by dfid linking to oldfid path.

[sripathik@in.ibm.com : p9_client_link should not free req structure
if p9_client_rpc has returned an error.]

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:25 -05:00
Sripathi Kodi
87d7845aa0 9p: Implement client side of setattr for 9P2000.L protocol.
SYNOPSIS

      size[4] Tsetattr tag[2] attr[n]

      size[4] Rsetattr tag[2]

    DESCRIPTION

      The setattr command changes some of the file status information.
      attr resembles the iattr structure used in Linux kernel. It
      specifies which status parameter is to be changed and to what
      value. It is laid out as follows:

         valid[4]
            specifies which status information is to be changed. Possible
            values are:
            ATTR_MODE       (1 << 0)
            ATTR_UID        (1 << 1)
            ATTR_GID        (1 << 2)
            ATTR_SIZE       (1 << 3)
            ATTR_ATIME      (1 << 4)
            ATTR_MTIME      (1 << 5)
            ATTR_ATIME_SET  (1 << 7)
            ATTR_MTIME_SET  (1 << 8)

            The last two bits represent whether the time information
            is being sent by the client's user space. In the absense
            of these bits the server always uses server's time.

         mode[4]
            File permission bits

         uid[4]
            Owner id of file

         gid[4]
            Group id of the file

         size[8]
            File size

         atime_sec[8]
            Time of last file access, seconds

         atime_nsec[8]
            Time of last file access, nanoseconds

         mtime_sec[8]
            Time of last file modification, seconds

         mtime_nsec[8]
            Time of last file modification, nanoseconds

Explanation of the patches:
--------------------------

*) The kernel just copies relevent contents of iattr structure to
   p9_iattr_dotl structure and passes it down to the client. The
   only check it has is calling inode_change_ok()
*) The p9_iattr_dotl structure does not have ctime and ia_file
   parameters because I don't think these are needed in our case.
   The client user space can request updating just ctime by calling
   chown(fd, -1, -1). This is handled on server side without a need
   for putting ctime on the wire.
*) The server currently supports changing mode, time, ownership and
   size of the file.
*) 9P RFC says "Either all the changes in wstat request happen, or
   none of them does: if the request succeeds, all changes were made;
   if it fails, none were."
   I have not done anything to implement this specifically because I
   don't see a reason.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:25:10 -05:00
Sripathi Kodi
f085312204 9p: getattr client implementation for 9P2000.L protocol.
SYNOPSIS

              size[4] Tgetattr tag[2] fid[4] request_mask[8]

              size[4] Rgetattr tag[2] lstat[n]

           DESCRIPTION

              The getattr transaction inquires about the file identified by fid.
              request_mask is a bit mask that specifies which fields of the
              stat structure is the client interested in.

              The reply will contain a machine-independent directory entry,
              laid out as follows:

                 st_result_mask[8]
                    Bit mask that indicates which fields in the stat structure
                    have been populated by the server

                 qid.type[1]
                    the type of the file (directory, etc.), represented as a bit
                    vector corresponding to the high 8 bits of the file's mode
                    word.

                 qid.vers[4]
                    version number for given path

                 qid.path[8]
                    the file server's unique identification for the file

                 st_mode[4]
                    Permission and flags

                 st_uid[4]
                    User id of owner

                 st_gid[4]
                    Group ID of owner

                 st_nlink[8]
                    Number of hard links

                 st_rdev[8]
                    Device ID (if special file)

                 st_size[8]
                    Size, in bytes

                 st_blksize[8]
                    Block size for file system IO

                 st_blocks[8]
                    Number of file system blocks allocated

                 st_atime_sec[8]
                    Time of last access, seconds

                 st_atime_nsec[8]
                    Time of last access, nanoseconds

                 st_mtime_sec[8]
                    Time of last modification, seconds

                 st_mtime_nsec[8]
                    Time of last modification, nanoseconds

                 st_ctime_sec[8]
                    Time of last status change, seconds

                 st_ctime_nsec[8]
                    Time of last status change, nanoseconds

                 st_btime_sec[8]
                    Time of creation (birth) of file, seconds

                 st_btime_nsec[8]
                    Time of creation (birth) of file, nanoseconds

                 st_gen[8]
                    Inode generation

                 st_data_version[8]
                    Data version number

              request_mask and result_mask bit masks contain the following bits
                 #define P9_STATS_MODE          0x00000001ULL
                 #define P9_STATS_NLINK         0x00000002ULL
                 #define P9_STATS_UID           0x00000004ULL
                 #define P9_STATS_GID           0x00000008ULL
                 #define P9_STATS_RDEV          0x00000010ULL
                 #define P9_STATS_ATIME         0x00000020ULL
                 #define P9_STATS_MTIME         0x00000040ULL
                 #define P9_STATS_CTIME         0x00000080ULL
                 #define P9_STATS_INO           0x00000100ULL
                 #define P9_STATS_SIZE          0x00000200ULL
                 #define P9_STATS_BLOCKS        0x00000400ULL

                 #define P9_STATS_BTIME         0x00000800ULL
                 #define P9_STATS_GEN           0x00001000ULL
                 #define P9_STATS_DATA_VERSION  0x00002000ULL

                 #define P9_STATS_BASIC         0x000007ffULL
                 #define P9_STATS_ALL           0x00003fffULL

        This patch implements the client side of getattr implementation for
        9P2000.L. It introduces a new structure p9_stat_dotl for getting
        Linux stat information along with QID. The data layout is similar to
        stat structure in Linux user space with the following major
        differences:

        inode (st_ino) is not part of data. Instead qid is.

        device (st_dev) is not part of data because this doesn't make sense
        on the client.

        All time variables are 64 bit wide on the wire. The kernel seems to use
        32 bit variables for these variables. However, some of the architectures
        have used 64 bit variables and glibc exposes 64 bit variables to user
        space on some architectures. Hence to be on the safer side we have made
        these 64 bit in the protocol. Refer to the comments in
        include/asm-generic/stat.h

        There are some additional fields: st_btime_sec, st_btime_nsec, st_gen,
        st_data_version apart from the bitmask, st_result_mask. The bit mask
        is filled by the server to indicate which stat fields have been
        populated by the server. Currently there is no clean way for the
        server to obtain these additional fields, so it sends back just the
        basic fields.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbegren <ericvh@gmail.com>
2010-08-02 14:25:09 -05:00
Sripathi Kodi
7751bdb3a0 9p: readdir implementation for 9p2000.L
This patch implements the kernel part of readdir() implementation for 9p2000.L

    Change from V3: Instead of inode, server now sends qids for each dirent

    SYNOPSIS

    size[4] Treaddir tag[2] fid[4] offset[8] count[4]
    size[4] Rreaddir tag[2] count[4] data[count]

    DESCRIPTION

    The readdir request asks the server to read the directory specified by 'fid'
    at an offset specified by 'offset' and return as many dirent structures as
    possible that fit into count bytes. Each dirent structure is laid out as
    follows.

            qid.type[1]
              the type of the file (directory, etc.), represented as a bit
              vector corresponding to the high 8 bits of the file's mode
              word.

            qid.vers[4]
              version number for given path

            qid.path[8]
              the file server's unique identification for the file

            offset[8]
              offset into the next dirent.

            type[1]
              type of this directory entry.

            name[256]
              name of this directory entry.

    This patch adds v9fs_dir_readdir_dotl() as the readdir() call for 9p2000.L.
    This function sends P9_TREADDIR command to the server. In response the server
    sends a buffer filled with dirent structures. This is different from the
    existing v9fs_dir_readdir() call which receives stat structures from the server.
    This results in significant speedup of readdir() on large directories.
    For example, doing 'ls >/dev/null' on a directory with 10000 files on my
    laptop takes 1.088 seconds with the existing code, but only takes 0.339 seconds
    with the new readdir.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:25:07 -05:00
Changli Gao
f43dc98b3b netfilter: nf_nat: make unique_tuple return void
The only user of unique_tuple() get_unique_tuple() doesn't care about the
return value of unique_tuple(), so make unique_tuple() return void (nothing).

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-08-02 17:20:54 +02:00
Simon Horman
5c0d2374a1 ipvs: provide default ip_vs_conn_{in,out}_get_proto
This removes duplicate code by providing a default implementation
which is used by 3 of the 4 modules that provide these call.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-08-02 17:12:44 +02:00
Changli Gao
ee92d37861 netfilter: nf_conntrack_extend: introduce __nf_ct_ext_exist()
some users of nf_ct_ext_exist() know ct->ext isn't NULL. For these users, the
check for ct->ext isn't necessary, the function __nf_ct_ext_exist() can be
used instead.

the type of the return value of nf_ct_ext_exist() is changed to bool.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-08-02 17:06:19 +02:00
David Miller
ea4bd8ba80 Bluetooth: Use list_head for HCI blacklist head
The bdaddr in the list root is completely unused and just
taking up space.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-07-31 16:06:58 -07:00
John W. Linville
ae3568adf4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-07-29 14:47:07 -04:00
Christian Lamparter
b7753c8cd5 cfg80211: fix dev <-> wiphy typo
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-29 12:55:00 -04:00
Johannes Berg
e5b900d228 mac80211: allow drivers to request DTIM period
Some features require knowing the DTIM period
before associating. This implements the ability
to wait for a beacon in mac80211 before assoc
to provide this value. It is optional since
most likely not all drivers will need this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-29 12:55:00 -04:00
Felix Fietkau
4552124543 mac80211: inform drivers about the off-channel status on channel changes
For some drivers it can be useful to know whether the channel they're
supposed to switch to is going to be used for short off-channel work or
scanning, or whether the hardware is expected to stay on it for a while
longer. This is important for various kinds of calibration work, which
takes longer to complete and should keep some persistent state, even if
the channel temporarily changes.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-28 16:24:02 -04:00
John W. Linville
099284bdec Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6 2010-07-28 16:17:49 -04:00
David S. Miller
bb7e95c8fd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x_main.c

Merge bnx2x bug fixes in by hand... :-/

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-27 21:01:35 -07:00
Marcel Holtmann
e73439d8c0 Bluetooth: Defer SCO setup if mode change is pending
Certain headsets such as the Motorola H350 will reject SCO and eSCO
connection requests while the ACL is transitioning from sniff mode
to active mode. Add synchronization so that SCO and eSCO connection
requests will wait until the ACL has fully transitioned to active mode.

< HCI Command: Exit Sniff Mode (0x02|0x0004) plen 2
    handle 12
> HCI Event: Command Status (0x0f) plen 4
    Exit Sniff Mode (0x02|0x0004) status 0x00 ncmd 1
< HCI Command:  Setup Synchronous Connection (0x01|0x0028) plen 17
    handle 12 voice setting 0x0040
> HCI Event: Command Status (0x0f) plen 4
    Setup Synchronous Connection (0x01|0x0028) status 0x00 ncmd 1
> HCI Event: Number of Completed Packets (0x13) plen 5
    handle 12 packets 1
> HCI Event: Mode Change (0x14) plen 6
    status 0x00 handle 12 mode 0x00 interval 0
    Mode: Active
> HCI Event: Synchronous Connect Complete (0x2c) plen 17
    status 0x10 handle 14 bdaddr 00:1A:0E:50:28:A4 type SCO
    Error: Connection Accept Timeout Exceeded

Signed-off-by: Ron Shaffer <rshaffer@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-07-27 12:29:04 -07:00
Joe Perches
073730d771 wireless: Convert wiphy_debug macro to function
Save a few bytes of text

(allyesconfig)
$ size drivers/net/wireless/built-in.o*
   text	   data	    bss	    dec	    hex	filename
3924568	 100548	 871056	4896172	 4ab5ac	drivers/net/wireless/built-in.o.new
3926520	 100548	 871464	4898532	 4abee4	drivers/net/wireless/built-in.o.old

$ size net/wireless/core.o*
   text	   data	    bss	    dec	    hex	filename
  12843	    216	   3768	  16827	   41bb	net/wireless/core.o.new
  12328	    216	   3656	  16200	   3f48	net/wireless/core.o

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-27 15:14:13 -04:00
Joe Perches
e1db74fcc3 include/net/cfg80211.h: Add wiphy_<level> printk equivalents
Simplify logging messages for wiphy devices

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-27 15:14:13 -04:00
John W. Linville
800f65bba8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-commands.h
2010-07-27 11:59:19 -04:00
John W. Linville
3289a8368c lib80211: remove unused host_build_iv option
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-26 15:09:04 -04:00
stephen hemminger
3b87956ea6 net sched: fix race in mirred device removal
This fixes hang when target device of mirred packet classifier
action is removed.

If a mirror or redirection action is configured to cause packets
to go to another device, the classifier holds a ref count, but was assuming
the adminstrator cleaned up all redirections before removing. The fix
is to add a notifier and cleanup during unregister.

The new list is implicitly protected by RTNL mutex because
it is held during filter add/delete as well as notifier.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-24 21:04:20 -07:00
David S. Miller
2a88e7e559 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-commands.h
2010-07-23 14:03:38 -07:00
Hannes Eder
7f1c407579 IPVS: make FTP work with full NAT support
Use nf_conntrack/nf_nat code to do the packet mangling and the TCP
sequence adjusting.  The function 'ip_vs_skb_replace' is now dead
code, so it is removed.

To SNAT FTP, use something like:

% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
    --vport 21 -j SNAT --to-source 192.168.10.10
and for the data connections in passive mode:

% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
    --vportctl 21 -j SNAT --to-source 192.168.10.10
using '-m state --state RELATED' would also works.

Make sure the kernel modules ip_vs_ftp, nf_conntrack_ftp, and
nf_nat_ftp are loaded.

[ up-port and minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-23 12:48:52 +02:00
Gustavo F. Padovan
3f30fc1570 net: remove last uses of __attribute__((packed))
Network code uses the __packed macro instead of __attribute__((packed)).

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-21 14:44:29 -07:00
Gustavo F. Padovan
942875ffc1 irda: Use __packed annotation instead IRDA_PACKED macro
Remove IRDA_PACKED macro, which maps to __attribute__((packed)). IRDA is
one of the last users of __attribute__((packet)). Networking code uses
__packed now.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-21 14:44:29 -07:00
Gustavo F. Padovan
66c853cc21 Bluetooth: Use __packed annotation
To make net/ and include/net/ code consistent use __packed instead of
__attribute__ ((packed)). Bluetooth subsystem was one of the last net
subsys still using __attribute__ ((packed)).

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-07-21 10:39:13 -07:00
Gustavo F. Padovan
5d8868ff3d Bluetooth: Add Google's copyright to L2CAP
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-07-21 10:39:12 -07:00
Suraj Sumangala
9981151086 Bluetooth: Implemented HCI frame reassembly for RX from stream
Implemented frame reassembly implementation for reassembling fragments
received from stream.

Signed-off-by: Suraj Sumangala <suraj@atheros.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-07-21 10:39:12 -07:00
Suraj Sumangala
33e882a5f2 Bluetooth: Implement hci_reassembly helper to reassemble RX packets
Implements feature to reassemble received HCI frames from any input stream

Signed-off-by: Suraj Sumangala <suraj@atheros.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-07-21 10:39:12 -07:00
Suraj Sumangala
cd4c53919e Bluetooth: Add one more buffer for HCI stream reassembly
Additional reassembly buffer to keep track of stream reasembly

Signed-off-by: Suraj Sumangala <suraj@atheros.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-07-21 10:39:12 -07:00
Gustavo F. Padovan
ce5706bd69 Bluetooth: Add Copyright notice to L2CAP
Copyright for the time I worked on L2CAP during the Google Summer of Code
program.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-07-21 10:39:11 -07:00
Gustavo F. Padovan
e0f66218b3 Bluetooth: Remove the send_lock spinlock from ERTM
Using a lock to deal with the ERTM race condition - interruption with
new data from the hci layer - is wrong. We should use the native skb
backlog queue.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-07-21 10:39:09 -07:00
Gustavo F. Padovan
cf6c2c0b9f Bluetooth: Disconnect early if mode is not supported
When mode is mandatory we shall not send connect request and report this
to the userspace as well.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-07-21 10:39:09 -07:00
Ron Shaffer
2d0a03460a Bluetooth: Reassigned copyright to Code Aurora Forum
Qualcomm, Inc. has reassigned rights to Code Aurora Forum. Accordingly,
as files are modified by Code Aurora Forum members, the copyright
statement will be updated.

Signed-off-by: Ron Shaffer <rshaffer@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-07-21 10:39:06 -07:00
Ron Shaffer
04fafe4ed7 Bluetooth: Remove extraneous white space
Deleted extraneous white space from the end of several lines

Signed-off-by: Ron Shaffer <rshaffer@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-07-21 10:39:05 -07:00
Johan Hedberg
f03585689f Bluetooth: Add blacklist support for incoming connections
In some circumstances it could be desirable to reject incoming
connections on the baseband level. This patch adds this feature through
two new ioctl's: HCIBLOCKADDR and HCIUNBLOCKADDR. Both take a simple
Bluetooth address as a parameter. BDADDR_ANY can be used with
HCIUNBLOCKADDR to remove all devices from the blacklist.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-07-21 10:39:05 -07:00
David S. Miller
11fe883936 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/vhost/net.c
	net/bridge/br_device.c

Fix merge conflict in drivers/vhost/net.c with guidance from
Stephen Rothwell.

Revert the effects of net-2.6 commit 573201f36f
since net-next-2.6 has fixes that make bridge netpoll work properly thus
we don't need it disabled.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-20 18:25:24 -07:00
John W. Linville
c28991a02c wireless: correct sparse warning in wext-compat.c
CHECK   net/wireless/wext-compat.c
net/wireless/wext-compat.c:1434:5: warning: symbol 'cfg80211_wext_siwpmksa' was not declared. Should it be static?

Add declaration in cfg80211.h.  Also add an EXPORT_SYMBOL_GPL, since all
the peer functions have it.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-20 16:49:37 -04:00
John W. Linville
4f366c5dab wireless: only use alpha2 regulatory information from country IE
The meaning and/or usage of the country IE is somewhat poorly defined.
In practice, this means that regulatory rulesets in a country IE are
often incomplete and might be untrustworthy.  This removes the code
associated with interpreting those rulesets while preserving respect
for country "alpha2" codes also contained in the country IE.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-20 16:44:35 -04:00
Johannes Berg
4ced3f74da mac80211: move QoS-enable to BSS info
Ever since

commit e1b3ec1a2a
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date:   Mon Mar 29 12:18:34 2010 +0200

    mac80211: explicitly disable/enable QoS

mac80211 is telling drivers, in particular
iwlwifi, whether QoS is enabled or not.

However, this is only relevant for station mode,
since only then will any device send nullfunc
frames and need to know whether they should be
QoS frames or not. In other modes, there are
(currently) no frames the device is supposed to
send.

When you now consider virtual interfaces, it
becomes apparent that the current mechanism is
inadequate since it enables/disables QoS on a
global scale, where for nullfunc frames it has
to be on a per-interface scale.

Due to the above considerations, we can change
the way mac80211 advertises the QoS state to
drivers to only ever advertise it as "off" in
station mode, and make it a per-BSS setting.

Tested-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-20 16:02:58 -04:00
Eric Dumazet
f86586fa48 tcp: sizeof struct tcp_skb_cb is 44
Correct comment stating sizeof(struct tcp_skb_cb) is 36 or 40, since its
44 bytes, since commit 951dbc8ac7 ([IPV6]: Move nextheader offset
to the IP6CB).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-15 21:41:00 -07:00
Pablo Neira Ayuso
cca5cf91c7 nfnetlink_log: do not expose NFULNL_COPY_DISABLED to user-space
This patch moves NFULNL_COPY_PACKET definition from
linux/netfilter/nfnetlink_log.h to net/netfilter/nfnetlink_log.h
since this copy mode is only for internal use.

I have also changed the value from 0x03 to 0xff. Thus, we avoid
a gap from user-space that may confuse users if we add new
copy modes in the future.

This change was introduced in:
http://www.spinics.net/lists/netfilter-devel/msg13535.html

Since this change is not included in any stable Linux kernel,
I think it's safe to make this change now. Anyway, this copy
mode does not make any sense from user-space, so this patch
should not break any existing setup.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-15 11:27:41 +02:00
Tom Herbert
b0f77d0eae net: fix problem in reading sock TX queue
Fix problem in reading the tx_queue recorded in a socket.  In
dev_pick_tx, the TX queue is read by doing a check with
sk_tx_queue_recorded on the socket, followed by a sk_tx_queue_get.
The problem is that there is not mutual exclusion across these
calls in the socket so it it is possible that the queue in the
sock can be invalidated after sk_tx_queue_recorded is called so
that sk_tx_queue get returns -1, which sets 65535 in queue_index
and thus dev_pick_tx returns 65536 which is a bogus queue and
can cause crash in dev_queue_xmit.

We fix this by only calling sk_tx_queue_get which does the proper
checks.  The interface is that sk_tx_queue_get returns the TX queue
if the sock argument is non-NULL and TX queue is recorded, else it
returns -1.  sk_tx_queue_recorded is no longer used so it can be
completely removed.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-14 20:50:29 -07:00
Changli Gao
7ba4291007 inet, inet6: make tcp_sendmsg() and tcp_sendpage() through inet_sendmsg() and inet_sendpage()
a new boolean flag no_autobind is added to structure proto to avoid the autobind
calls when the protocol is TCP. Then sock_rps_record_flow() is called int the
TCP's sendmsg() and sendpage() pathes.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/net/inet_common.h |    4 ++++
 include/net/sock.h        |    1 +
 include/net/tcp.h         |    8 ++++----
 net/ipv4/af_inet.c        |   15 +++++++++------
 net/ipv4/tcp.c            |   11 +++++------
 net/ipv4/tcp_ipv4.c       |    3 +++
 net/ipv6/af_inet6.c       |    8 ++++----
 net/ipv6/tcp_ipv6.c       |    3 +++
 8 files changed, 33 insertions(+), 20 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-12 20:21:46 -07:00
Changli Gao
53d3176b28 net: cleanups
remove useless blanks.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/net/inet_common.h |   55 ++++-------
 include/net/tcp.h         |  222 +++++++++++++++++-----------------------------
 include/net/udp.h         |   38 +++----
 3 files changed, 123 insertions(+), 192 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-12 20:21:45 -07:00
David S. Miller
597e608a84 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-07-07 15:59:38 -07:00
David S. Miller
e490c1defe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-07-02 22:42:06 -07:00
John Fastabend
f0796d5c73 net: decreasing real_num_tx_queues needs to flush qdisc
Reducing real_num_queues needs to flush the qdisc otherwise
skbs with queue_mappings greater then real_num_tx_queues can
be sent to the underlying driver.

The flow for this is,

dev_queue_xmit()
	dev_pick_tx()
		skb_tx_hash()  => hash using real_num_tx_queues
		skb_set_queue_mapping()
	...
	qdisc_enqueue_root() => enqueue skb on txq from hash
...
dev->real_num_tx_queues -= n
...
sch_direct_xmit()
	dev_hard_start_xmit()
		ndo_start_xmit(skb,dev) => skb queue set with old hash

skbs are enqueued on the qdisc with skb->queue_mapping set
0 < queue_mappings < real_num_tx_queues.  When the driver
decreases real_num_tx_queues skb's may be dequeued from the
qdisc with a queue_mapping greater then real_num_tx_queues.

This fixes a case in ixgbe where this was occurring with DCB
and FCoE. Because the driver is using queue_mapping to map
skbs to tx descriptor rings we can potentially map skbs to
rings that no longer exist.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-02 21:59:07 -07:00
John Fastabend
4ef6acff83 sched: qdisc_reset_all_tx is calling qdisc_reset without qdisc_lock
When calling qdisc_reset() the qdisc lock needs to be held.  In
this case there is at least one driver i4l which is using this
without holding the lock.  Add the locking here.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-02 21:59:07 -07:00
David S. Miller
05318bc905 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
Conflicts:
	drivers/net/wireless/libertas/host.h
2010-07-01 17:34:14 -07:00
Changli Gao
d6bebca92c fragment: add fast path for in-order fragments
add fast path for in-order fragments

As the fragments are sent in order in most of OSes, such as Windows, Darwin and
FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue.
In the fast path, we check if the skb at the end of the inet_frag_queue is the
prev we expect.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/net/inet_frag.h |    1 +
 net/ipv4/ip_fragment.c  |   12 ++++++++++++
 net/ipv6/reassembly.c   |   11 +++++++++++
 3 files changed, 24 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30 13:44:29 -07:00
Eric Dumazet
4ce3c183fc snmp: 64bit ipstats_mib for all arches
/proc/net/snmp and /proc/net/netstat expose SNMP counters.

Width of these counters is either 32 or 64 bits, depending on the size
of "unsigned long" in kernel.

This means user program parsing these files must already be prepared to
deal with 64bit values, regardless of user program being 32 or 64 bit.

This patch introduces 64bit snmp values for IPSTAT mib, where some
counters can wrap pretty fast if they are 32bit wide.

# netstat -s|egrep "InOctets|OutOctets"
    InOctets: 244068329096
    OutOctets: 244069348848

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30 13:31:19 -07:00
Kulikov Vasiliy
787a34456d net/neighbour.h: fix typo
'Shoul' must be 'should'.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30 12:06:52 -07:00
Andreas Steffen
4efd7e8335 xfrm: fix XFRMA_MARK extraction in xfrm_mark_get
Determine the size of the xfrm_mark struct, not of its pointer.

Signed-off-by: Andreas Steffen <andreas.steffen@strongswan.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30 10:43:31 -07:00
Sjur Braendeland
529d6dad5b caif-driver: Add CAIF-SPI Protocol driver.
This patch introduces the CAIF SPI Protocol Driver for
CAIF Link Layer.

This driver implements a platform driver to accommodate for a
platform specific SPI device. A general platform driver is not
possible as there are no SPI Slave side Kernel API defined.
A sample CAIF SPI Platform device can be found in
.../Documentation/networking/caif/spi_porting.txt

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-29 00:08:21 -07:00
Changli Gao
210d6de78c act_mirred: don't clone skb when skb isn't shared
don't clone skb when skb isn't shared

When the tcf_action is TC_ACT_STOLEN, and the skb isn't shared, we don't need
to clone a new skb. As the skb will be freed after this function returns, we
can use it freely once we get a reference to it.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/net/sch_generic.h |   11 +++++++++--
 net/sched/act_mirred.c    |    6 +++---
 2 files changed, 12 insertions(+), 5 deletions(-)
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-28 23:24:32 -07:00
Florian Westphal
172d69e63c syncookies: add support for ECN
Allows use of ECN when syncookies are in effect by encoding ecn_ok
into the syn-ack tcp timestamp.

While at it, remove a uneeded #ifdef CONFIG_SYN_COOKIES.
With CONFIG_SYN_COOKIES=nm want_cookie is ifdef'd to 0 and gcc
removes the "if (0)".

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-26 22:00:03 -07:00
Eric Dumazet
1823e4c80e snmp: add align parameter to snmp_mib_init()
In preparation for 64bit snmp counters for some mibs,
add an 'align' parameter to snmp_mib_init(), instead
of assuming mibs only contain 'unsigned long' fields.

Callers can use __alignof__(type) to provide correct
alignment.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-25 21:33:17 -07:00
Tim Gardner
a8756201ba netfilter: xt_connbytes: Force CT accounting to be enabled
Check at rule install time that CT accounting is enabled. Force it
to be enabled if not while also emitting a warning since this is not
the default state.

This is in preparation for deprecating CONFIG_NF_CT_ACCT upon which
CONFIG_NETFILTER_XT_MATCH_CONNBYTES depended being set.

Added 2 CT accounting support functions:

nf_ct_acct_enabled() - Get CT accounting state.
nf_ct_set_acct() - Enable/disable CT accountuing.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-25 14:44:07 +02:00
Juuso Oikarinen
fa61cf70a6 cfg80211/mac80211: Update set_tx_power to use mBm instead of dBm units
In preparation for a TX power setting interface in the nl80211, change the
.set_tx_power function to use mBm units instead of dBm for greater accuracy and
smaller power levels.

Also, already in advance move the tx_power_setting enumeration to nl80211.

This change affects the .tx_set_power function prototype. As a result, the
corresponding changes are needed to modules using it. These are mac80211,
iwmc3200wifi and rndis_wlan.

Cc: Samuel Ortiz <samuel.ortiz@intel.com>
Cc: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Acked-by: Samuel Ortiz <samuel.ortiz@intel.com>
Acked-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-24 15:42:33 -04:00
David S. Miller
8244132ea8 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/ipv4/ip_output.c
2010-06-23 18:26:27 -07:00
Jiri Olsa
7b2ff18ee7 net - IP_NODEFRAG option for IPv4 socket
this patch is implementing IP_NODEFRAG option for IPv4 socket.
The reason is, there's no other way to send out the packet with user
customized header of the reassembly part.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-23 13:16:38 -07:00
Justin P. Mattock
1dc8d8c06d net: Fix a typo in netlink.h
Fix a typo in include/net/netlink.h
should be finalize instead of finanlize

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-23 12:58:40 -07:00
Eric Dumazet
8f1c14b2e3 snmp: fix SNMP_ADD_STATS()
commit aa2ea0586d (tcp: fix outsegs stat for TSO segments) incorrectly
assumed SNMP_ADD_STATS() was used from BH context.

Fix this using mib[!in_softirq()] instead of mib[0]

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-23 11:48:19 -07:00
Juuso Oikarinen
f90754c15f mac80211: Add interface for driver to temporarily disable dynamic ps
This mechanism introduced in this patch applies (at least) for hardware
designs using a single shared antenna for both WLAN and BT. In these designs,
the antenna must be toggled between WLAN and BT.

In those hardware, managing WLAN co-existence with Bluetooth requires WLAN
full power save whenever there is Bluetooth activity in order for WLAN to be
able to periodically relinquish the antenna to be used for BT. This is because
BT can only access the shared antenna when WLAN is idle or asleep.

Some hardware, for instance the wl1271, are able to indicate to the host
whenever there is BT traffic. In essence, the hardware will send an indication
to the host whenever there is, for example, SCO traffic or A2DP traffic, and
will send another indication when the traffic is over.

The hardware gets information of Bluetooth traffic via hardware co-existence
control lines - these lines are used to negotiate the shared antenna
ownership. The hardware will give the antenna to BT whenever WLAN is sleeping.

This patch adds the interface to mac80211 to facilitate temporarily disabling
of dynamic power save as per request of the WLAN driver. This interface will
immediately force WLAN to full powersave, hence allowing BT coexistence as
described above.

In these kind of shared antenna desings, when WLAN powersave is fully disabled,
Bluetooth will not work simultaneously with WLAN at all. This patch does not
address that problem. This interface will not change PSM state, so if PSM is
disabled it will remain so. Solving this problem requires knowledge about BT
state, and is best done in user-space.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-21 15:39:59 -04:00
Sjur Braendeland
2aa40aef9d caif: Use link layer MTU instead of fixed MTU
Previously CAIF supported maximum transfer size of ~4050.
The transfer size is now calculated dynamically based on the
link layers mtu size.

Signed-off-by: Sjur Braendeland@stericsson.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-20 19:46:06 -07:00
Sjur Braendeland
a7da1f55a8 caif: Bugfix - RFM must support segmentation.
CAIF Remote File Manager may send or receive more than 4050 bytes.
Due to this The CAIF RFM service have to support segmentation.

Signed-off-by: Sjur Braendeland@stericsson.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-20 19:46:05 -07:00
Sjur Braendeland
b1c74247b9 caif: Bugfix not all services uses flow-ctrl.
Flow control is not used by all CAIF services.
The usage of flow control is now part of the gerneal
initialization function for CAIF Services.

Signed-off-by: Sjur Braendeland@stericsson.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-20 19:46:05 -07:00
David S. Miller
bb9c03d8a6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-06-17 14:19:06 -07:00
Patrick McHardy
c68cd6cc21 netfilter: nf_nat: support user-specified SNAT rules in LOCAL_IN
2.6.34 introduced 'conntrack zones' to deal with cases where packets
from multiple identical networks are handled by conntrack/NAT. Packets
are looped through veth devices, during which they are NATed to private
addresses, after which they can continue normally through the stack
and possibly have NAT rules applied a second time.

This works well, but is needlessly complicated for cases where only
a single SNAT/DNAT mapping needs to be applied to these packets. In that
case, all that needs to be done is to assign each network to a seperate
zone and perform NAT as usual. However this doesn't work for packets
destined for the machine performing NAT itself since its corrently not
possible to configure SNAT mappings for the LOCAL_IN chain.

This patch adds a new INPUT chain to the NAT table and changes the
targets performing SNAT to be usable in that chain.

Example usage with two identical networks (192.168.0.0/24) on eth0/eth1:

iptables -t raw -A PREROUTING -i eth0 -j CT --zone 1
iptables -t raw -A PREROUTING -i eth0 -j MARK --set-mark 1
iptables -t raw -A PREROUTING -i eth1 -j CT --zone 2
iptabels -t raw -A PREROUTING -i eth1 -j MARK --set-mark 2

iptables -t nat -A INPUT       -m mark --mark 1 -j NETMAP --to 10.0.0.0/24
iptables -t nat -A POSTROUTING -m mark --mark 1 -j NETMAP --to 10.0.0.0/24
iptables -t nat -A INPUT       -m mark --mark 2 -j NETMAP --to 10.0.1.0/24
iptables -t nat -A POSTROUTING -m mark --mark 2 -j NETMAP --to 10.0.1.0/24

iptables -t raw -A PREROUTING -d 10.0.0.0/24 -j CT --zone 1
iptables -t raw -A OUTPUT     -d 10.0.0.0/24 -j CT --zone 1
iptables -t raw -A PREROUTING -d 10.0.1.0/24 -j CT --zone 2
iptables -t raw -A OUTPUT     -d 10.0.1.0/24 -j CT --zone 2

iptables -t nat -A PREROUTING -d 10.0.0.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A OUTPUT     -d 10.0.0.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A PREROUTING -d 10.0.1.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A OUTPUT     -d 10.0.1.0/24 -j NETMAP --to 192.168.0.0/24

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-17 06:12:26 +02:00
Eric W. Biederman
7361c36c52 af_unix: Allow credentials to work across user and pid namespaces.
In unix_skb_parms store pointers to struct pid and struct cred instead
of raw uid, gid, and pid values, then translate the credentials on
reception into values that are meaningful in the receiving processes
namespaces.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:58:16 -07:00
Eric W. Biederman
257b5358b3 scm: Capture the full credentials of the scm sender.
Start capturing not only the userspace pid, uid and gid values of the
sending process but also the struct pid and struct cred of the sending
process as well.

This is in preparation for properly supporting SCM_CREDENTIALS for
sockets that have different uid and/or pid namespaces at the different
ends.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:55:56 -07:00
Eric W. Biederman
109f6e39fa af_unix: Allow SO_PEERCRED to work across namespaces.
Use struct pid and struct cred to store the peer credentials on struct
sock.  This gives enough information to convert the peer credential
information to a value relative to whatever namespace the socket is in
at the time.

This removes nasty surprises when using SO_PEERCRED on socket
connetions where the processes on either side are in different pid and
user namespaces.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:55:55 -07:00
Eric W. Biederman
812e876e84 scm: Reorder scm_cookie.
Reorder the fields in scm_cookie so they pack better on 64bit.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:55:34 -07:00
Florian Westphal
8c76368174 syncookies: check decoded options against sysctl settings
Discard the ACK if we find options that do not match current sysctl
settings.

Previously it was possible to create a connection with sack, wscale,
etc. enabled even if the feature was disabled via sysctl.

Also remove an unneeded call to tcp_sack_reset() in
cookie_check_timestamp: Both call sites (cookie_v4_check,
cookie_v6_check) zero "struct tcp_options_received", hand it to
tcp_parse_options() (which does not change tcp_opt->num_sacks/dsack)
and then call cookie_check_timestamp().

Even if num_sacks/dsacks were changed, the structure is allocated on
the stack and after cookie_check_timestamp returns only a few selected
members are copied to the inet_request_sock.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:42:15 -07:00
Eric Dumazet
317fe0e6c5 inetpeer: restore small inet_peer structures
Addition of rcu_head to struct inet_peer added 16bytes on 64bit arches.

Thats a bit unfortunate, since old size was exactly 64 bytes.

This can be solved, using an union between this rcu_head an four fields,
that are normally used only when a refcount is taken on inet_peer.
rcu_head is used only when refcnt=-1, right before structure freeing.

Add a inet_peer_refcheck() function to check this assertion for a while.

We can bring back SLAB_HWCACHE_ALIGN qualifier in kmem cache creation.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 11:55:39 -07:00
Eric Dumazet
aa1039e73c inetpeer: RCU conversion
inetpeer currently uses an AVL tree protected by an rwlock.

It's possible to make most lookups use RCU

1) Add a struct rcu_head to struct inet_peer

2) add a lookup_rcu_bh() helper to perform lockless and opportunistic
lookup. This is a normal function, not a macro like lookup().

3) Add a limit to number of links followed by lookup_rcu_bh(). This is
needed in case we fall in a loop.

4) add an smp_wmb() in link_to_pool() right before node insert.

5) make unlink_from_pool() use atomic_cmpxchg() to make sure it can take
last reference to an inet_peer, since lockless readers could increase
refcount, even while we hold peers.lock.

6) Delay struct inet_peer freeing after rcu grace period so that
lookup_rcu_bh() cannot crash.

7) inet_getpeer() first attempts lockless lookup.
   Note this lookup can fail even if target is in AVL tree, but a
concurrent writer can let tree in a non correct form.
   If this attemps fails, lock is taken a regular lookup is performed
again.

8) convert peers.lock from rwlock to a spinlock

9) Remove SLAB_HWCACHE_ALIGN when peer_cachep is created, because
rcu_head adds 16 bytes on 64bit arches, doubling effective size (64 ->
128 bytes)
In a future patch, this is probably possible to revert this part, if rcu
field is put in an union to share space with rid, ip_id_count, tcp_ts &
tcp_ts_stamp. These fields being manipulated only with refcnt > 0.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 14:23:38 -07:00
David S. Miller
16fb62b6b4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-06-15 13:49:24 -07:00
Juuso Oikarinen
ff61638105 mac80211: Fix ps-qos network latency handling
The ps-qos latency handling is broken. It uses predetermined latency values
to select specific dynamic PS timeouts. With common AP configurations, these
values overlap with beacon interval and are therefore essentially useless
(for network latencies less than the beacon interval, PSM is disabled.)

This patch remedies the problem by replacing the predetermined network latency
values with one high value (1900ms) which is used to go trigger full psm. For
backwards compatibility, the value 2000ms is still mapped to a dynamic ps
timeout of 100ms.

Currently also the mac80211 internal value for storing user space configured
dynamic PSM values is incorrectly in the driver visible ieee80211_conf struct.
Move it to the ieee80211_local struct.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-15 16:00:48 -04:00
Changli Gao
a3433f35a5 tcp: unify tcp flag macros
unify tcp flag macros: TCPHDR_FIN, TCPHDR_SYN, TCPHDR_RST, TCPHDR_PSH,
TCPHDR_ACK, TCPHDR_URG, TCPHDR_ECE and TCPHDR_CWR. TCBCB_FLAG_* are replaced
with the corresponding TCPHDR_*.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/net/tcp.h                      |   24 ++++++-------
 net/ipv4/tcp.c                         |    8 ++--
 net/ipv4/tcp_input.c                   |    2 -
 net/ipv4/tcp_output.c                  |   59 ++++++++++++++++-----------------
 net/netfilter/nf_conntrack_proto_tcp.c |   32 ++++++-----------
 net/netfilter/xt_TCPMSS.c              |    4 --
 6 files changed, 58 insertions(+), 71 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 11:56:19 -07:00
Patrick McHardy
f9181f4ffc Merge branch 'master' of /repos/git/net-next-2.6
Conflicts:
	include/net/netfilter/xt_rateest.h
	net/bridge/br_netfilter.c
	net/netfilter/nf_conntrack_core.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-15 17:31:06 +02:00
Juuso Oikarinen
685429623f mac80211: Fix circular locking dependency in ARP filter handling
There is a circular locking dependency when configuring the
hardware ARP filters on association, occurring when flushing the mac80211
workqueue. This is what happens:

[   92.026800] =======================================================
[   92.030507] [ INFO: possible circular locking dependency detected ]
[   92.030507] 2.6.34-04781-g2b2c009 #85
[   92.030507] -------------------------------------------------------
[   92.030507] modprobe/5225 is trying to acquire lock:
[   92.030507]  ((wiphy_name(local->hw.wiphy))){+.+.+.}, at: [<ffffffff8105b5c0>] flush_workq
ueue+0x0/0xb0
[   92.030507]
[   92.030507] but task is already holding lock:
[   92.030507]  (rtnl_mutex){+.+.+.}, at: [<ffffffff812b9ce2>] rtnl_lock+0x12/0x20
[   92.030507]
[   92.030507] which lock already depends on the new lock.
[   92.030507]
[   92.030507]
[   92.030507] the existing dependency chain (in reverse order) is:
[   92.030507]
[   92.030507] -> #2 (rtnl_mutex){+.+.+.}:
[   92.030507]        [<ffffffff810761fb>] lock_acquire+0xdb/0x110
[   92.030507]        [<ffffffff81341754>] mutex_lock_nested+0x44/0x300
[   92.030507]        [<ffffffff812b9ce2>] rtnl_lock+0x12/0x20
[   92.030507]        [<ffffffffa022d47c>] ieee80211_assoc_done+0x6c/0xe0 [mac80211]
[   92.030507]        [<ffffffffa022f2ad>] ieee80211_work_work+0x31d/0x1280 [mac80211]

[   92.030507] -> #1 ((&local->work_work)){+.+.+.}:
[   92.030507]        [<ffffffff810761fb>] lock_acquire+0xdb/0x110
[   92.030507]        [<ffffffff8105a51a>] worker_thread+0x22a/0x370
[   92.030507]        [<ffffffff8105ecc6>] kthread+0x96/0xb0
[   92.030507]        [<ffffffff81003a94>] kernel_thread_helper+0x4/0x10
[   92.030507]
[   92.030507] -> #0 ((wiphy_name(local->hw.wiphy))){+.+.+.}:
[   92.030507]        [<ffffffff81075fdc>] __lock_acquire+0x1c0c/0x1d50
[   92.030507]        [<ffffffff810761fb>] lock_acquire+0xdb/0x110
[   92.030507]        [<ffffffff8105b60e>] flush_workqueue+0x4e/0xb0
[   92.030507]        [<ffffffffa023ff7b>] ieee80211_stop_device+0x2b/0xb0 [mac80211]
[   92.030507]        [<ffffffffa0231635>] ieee80211_stop+0x3e5/0x680 [mac80211]

The locking in this case is quite complex. Fix the problem by rewriting the
way the hardware ARP filter list is handled - i.e. make a copy of the address
list to the bss_conf struct, and provide that list to the hardware driver
when needed.

The current patch will enable filtering also in promiscuous mode. This may need
to be changed in the future.

Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-14 15:42:31 -04:00
Teemu Paasikivi
fbd2c8dcbc mac80211: Set basic rates while joining ibss network
This patch adds support to nl80211 and mac80211 to set basic rates when
joining/creating ibss network.

Original patch was posted by Johannes Berg on the linux-wireless posting list.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Teemu Paasikivi <ext-teemu.3.paasikivi@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-14 15:39:34 -04:00
Johannes Berg
85ad181ea7 mac80211: allow drivers to sleep in ampdu_action
Allow drivers to sleep, and indicate this in
the documentation. ath9k has some locking I
don't understand, so keep it safe and disable
BHs in it, all other drivers look fine with
the context change.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-14 15:39:28 -04:00
Johannes Berg
5d22c89b9b mac80211: remove non-irqsafe aggregation callbacks
The non-irqsafe aggregation start/stop done
callbacks are currently only used by ath9k_htc,
and can cause callbacks into the driver again.
This might lead to locking issues, which will
only get worse as we modify locking. To avoid
trouble, remove the non-irqsafe versions and
change ath9k_htc to use those instead.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-14 15:39:27 -04:00
Eric Dumazet
c7de2cf053 pkt_sched: gen_kill_estimator() rcu fixes
gen_kill_estimator() API is incomplete or not well documented, since
caller should make sure an RCU grace period is respected before
freeing stats_lock.

This was partially addressed in commit 5d944c640b
(gen_estimator: deadlock fix), but same problem exist for all
gen_kill_estimator() users, if lock they use is not already RCU
protected.

A code review shows xt_RATEEST.c, act_api.c, act_police.c have this
problem. Other are ok because they use qdisc lock, already RCU
protected.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-11 18:37:08 -07:00
David S. Miller
14599f1e34 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
Conflicts:
	drivers/net/wireless/wl12xx/wl1271.h
	drivers/net/wireless/wl12xx/wl1271_cmd.h
2010-06-11 11:34:06 -07:00
Changli Gao
d8d1f30b95 net-next: remove useless union keyword
remove useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in a union, the union keyword isn't useful.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10 23:31:35 -07:00
Eric Dumazet
592fcb9dfa ip: ip_ra_control() rcu fix
commit 66018506e1 (ip: Router Alert RCU conversion) introduced RCU
lookups to ip_call_ra_chain(). It missed proper deinit phase :
When ip_ra_control() deletes an ip_ra_chain, it should make sure
ip_call_ra_chain() users can not start to use socket during the rcu
grace period. It should also delay the sock_put() after the grace
period, or we risk a premature socket freeing and corruptions, as
raw sockets are not rcu protected yet.

This delay avoids using expensive atomic_inc_not_zero() in
ip_call_ra_chain().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10 22:47:08 -07:00
Jiri Pirko
88e7594a97 phonet: use call_rcu for phonet device free
Use call_rcu rather than synchronize_rcu.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-09 16:14:25 -07:00
Eric Dumazet
b3c5163fe0 netfilter: nf_conntrack: per_cpu untracking
NOTRACK makes all cpus share a cache line on nf_conntrack_untracked
twice per packet, slowing down performance.

This patch converts it to a per_cpu variable.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-09 14:43:38 +02:00
Eric Dumazet
5bfddbd46a netfilter: nf_conntrack: IPS_UNTRACKED bit
NOTRACK makes all cpus share a cache line on nf_conntrack_untracked
twice per packet. This is bad for performance.
__read_mostly annotation is also a bad choice.

This patch introduces IPS_UNTRACKED bit so that we can use later a
per_cpu untrack structure more easily.

A new helper, nf_ct_untracked_get() returns a pointer to
nf_conntrack_untracked.

Another one, nf_ct_untracked_status_or() is used by nf_nat_init() to add
IPS_NAT_DONE_MASK bits to untracked status.

nf_ct_is_untracked() prototype is changed to work on a nf_conn pointer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-08 16:09:52 +02:00
Johannes Berg
abe37c4b84 wireless: fix kernel-doc
Fix a whole bunch of kernel-doc warnings
and errors that crop up when running it on
mac80211 and cfg80211; the latter isn't
normally done so lots of bit-rot happened.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-08 09:31:21 -04:00
Eric Dumazet
339bb99e4a netfilter: xt_rateest: Better struct xt_rateest layout
We currently dirty two cache lines in struct xt_rateest, this hurts SMP
performance.

This patch moves lock/bstats/rstats at beginning of structure so that
they share a single cache line.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-08 14:11:19 +02:00
Eric Dumazet
66018506e1 ip: Router Alert RCU conversion
Straightforward conversion to RCU.

One rwlock becomes a spinlock, and is static.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07 21:25:21 -07:00
Tom Herbert
a8b690f98b tcp: Fix slowness in read /proc/net/tcp
This patch address a serious performance issue in reading the
TCP sockets table (/proc/net/tcp).

Reading the full table is done by a number of sequential read
operations.  At each read operation, a seek is done to find the
last socket that was previously read.  This seek operation requires
that the sockets in the table need to be counted up to the current
file position, and to count each of these requires taking a lock for
each non-empty bucket.  The whole algorithm is O(n^2).

The fix is to cache the last bucket value, offset within the bucket,
and the file position returned by the last read operation.   On the
next sequential read, the bucket and offset are used to find the
last read socket immediately without needing ot scan the previous
buckets  the table.  This algorithm t read the whole table is O(n).

The improvement offered by this patch is easily show by performing
cat'ing /proc/net/tcp on a machine with a lot of connections.  With
about 182K connections in the table, I see the following:

- Without patch
time cat /proc/net/tcp > /dev/null

real	1m56.729s
user	0m0.214s
sys	1m56.344s

- With patch
time cat /proc/net/tcp > /dev/null

real	0m0.894s
user	0m0.290s
sys	0m0.594s

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07 00:43:42 -07:00
David S. Miller
eedc765ca4 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/sfc/net_driver.h
	drivers/net/sfc/siena.c
2010-06-06 17:42:02 -07:00
Steffen Klassert
8764ab2ca7 net: check for refcount if pop a stacked dst_entry
xfrm triggers a warning if dst_pop() drops a refcount
on a noref dst. This patch changes dst_pop() to
skb_dst_pop(). skb_dst_pop() drops the refcnt only
on a refcounted dst. Also we don't clone the child
dst_entry, so it is not refcounted and we can use
skb_dst_set_noref() in xfrm_output_one().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-04 15:56:00 -07:00
Sujith
38a6cc7538 mac80211: Remove deprecated sta_notify commands
STA_NOTIFY_ADD and STA_NOTIFY_REMOVE have no users anymore,
and station addition/removal are indicated to drivers
using sta_add() and sta_remove(), which can sleep.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-04 15:32:25 -04:00
Johannes Berg
6a8579d0e6 mac80211: clean up ieee80211_stop_tx_ba_session
There's no sense in letting anything but internal
mac80211 functions set the initiator to anything
but WLAN_BACK_INITIATOR, since WLAN_BACK_RECIPIENT
is only valid when we have received a frame from
the peer, which we react to directly in mac80211.

The debugfs code I recently added got this wrong
as well.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-03 14:10:45 -04:00
Juuso Oikarinen
2b2c009ecf mac80211: Add support for hardware ARP query filtering
Some hardware allow extended filtering of ARP frames not intended for
the host. To perform such filtering, the hardware needs to know the current
IP address(es) of the host, bound to its interface.

Add support for ARP filtering to mac80211 by adding a new op to the driver
interface, allowing to configure the current IP addresses. This op is called
upon association with the currently configured address(es), and when
associated whenever the IP address(es) change.

This patch adds configuration of IPv4 addresses only, as IPv6 addresses don't
need ARP filtering.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-03 14:10:45 -04:00
Johannes Berg
095dfdb0c4 mac80211: remove tx status ampdu_ack_map
There's a single use of this struct member, but
as it is write-only it clearly not necessary.
Thus we can free up some space here, even if we
don't need it right now it seems pointless to
carry around the variable.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-03 14:10:44 -04:00
Eric Dumazet
bc10502dba net: use __packed annotation
cleanup patch.

Use new __packed annotation in net/ and include/
(except netfilter)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-03 03:21:52 -07:00
Johannes Berg
252aa631f8 cfg80211: make action channel type optional
When sending action frames, we want to verify
that we do that on the correct channel. However,
checking the channel type in addition can get in
the way, since the channel type could change on
the fly during an association, and it's not
useful to have the channel type anyway since it
has no effect on the transmission. Therefore,
make it optional to specify so that if wanted,
it can still be checked, but is not required.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-02 16:13:27 -04:00
Walter Goldens
77c2061d10 wireless: fix several minor description typos
Signed-off-by: Walter Goldens <goldenstranger@yahoo.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-02 16:13:18 -04:00
Arnaud Ebalard
20c59de2e6 ipv6: Refactor update of IPv6 flowi destination address for srcrt (RH) option
There are more than a dozen occurrences of following code in the
IPv6 stack:

    if (opt && opt->srcrt) {
            struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
            ipv6_addr_copy(&final, &fl.fl6_dst);
            ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
            final_p = &final;
    }

Replace those with a helper. Note that the helper overrides final_p
in all cases. This is ok as final_p was previously initialized to
NULL when declared.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-02 07:08:31 -07:00
Eric Dumazet
c2d9ba9bce net: CONFIG_NET_NS reduction
Use read_pnet() and write_pnet() to reduce number of ifdef CONFIG_NET_NS

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-02 05:16:23 -07:00
Eric Dumazet
79640a4ca6 net: add additional lock to qdisc to increase throughput
When many cpus compete for sending frames on a given qdisc, the qdisc
spinlock suffers from very high contention.

The cpu owning __QDISC_STATE_RUNNING bit has same priority to acquire
the lock, and cannot dequeue packets fast enough, since it must wait for
this lock for each dequeued packet.

One solution to this problem is to force all cpus spinning on a second
lock before trying to get the main lock, when/if they see
__QDISC_STATE_RUNNING already set.

The owning cpu then compete with at most one other cpu for the main
lock, allowing for higher dequeueing rate.

Based on a previous patch from Alexander Duyck. I added the heuristic to
avoid the atomic in fast path, and put the new lock far away from the
cache line used by the dequeue worker. Also try to release the busylock
lock as late as possible.

Tests with following script gave a boost from ~50.000 pps to ~600.000
pps on a dual quad core machine (E5450 @3.00GHz), tg3 driver.
(A single netperf flow can reach ~800.000 pps on this platform)

for j in `seq 0 3`; do
  for i in `seq 0 7`; do
    netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 &
  done
done

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-02 05:09:29 -07:00
Eric Dumazet
3711210576 net: QDISC_STATE_RUNNING dont need atomic bit ops
__QDISC_STATE_RUNNING is always changed while qdisc lock is held.

We can avoid two atomic operations in xmit path, if we move this bit in
a new __state container.

Location of this __state container is carefully chosen so that fast path
only dirties one qdisc cache line.

THROTTLED bit could later be moved into this __state location too, to
avoid dirtying first qdisc cache line.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-02 03:24:13 -07:00
Eric Dumazet
bc135b23d0 net: Define accessors to manipulate QDISC_STATE_RUNNING
Define three helpers to manipulate QDISC_STATE_RUNNIG flag, that a
second patch will move on another location.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-02 03:23:51 -07:00
Eric Dumazet
b1faf56664 net: sock_queue_err_skb() dont mess with sk_forward_alloc
Correct sk_forward_alloc handling for error_queue would need to use a
backlog of frames that softirq handler could not deliver because socket
is owned by user thread. Or extend backlog processing to be able to
process normal and error packets.

Another possibility is to not use mem charge for error queue, this is
what I implemented in this patch.

Note: this reverts commit 29030374
(net: fix sk_forward_alloc corruptions), since we dont need to lock
socket anymore.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31 23:44:05 -07:00
Linus Torvalds
72da3bc0cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits)
  netlink: bug fix: wrong size was calculated for vfinfo list blob
  netlink: bug fix: don't overrun skbs on vf_port dump
  xt_tee: use skb_dst_drop()
  netdev/fec: fix ifconfig eth0 down hang issue
  cnic: Fix context memory init. on 5709.
  drivers/net: Eliminate a NULL pointer dereference
  drivers/net/hamradio: Eliminate a NULL pointer dereference
  be2net: Patch removes redundant while statement in loop.
  ipv6: Add GSO support on forwarding path
  net: fix __neigh_event_send()
  vhost: fix the memory leak which will happen when memory_access_ok fails
  vhost-net: fix to check the return value of copy_to/from_user() correctly
  vhost: fix to check the return value of copy_to/from_user() correctly
  vhost: Fix host panic if ioctl called with wrong index
  net: fix lock_sock_bh/unlock_sock_bh
  net/iucv: Add missing spin_unlock
  net: ll_temac: fix checksum offload logic
  net: ll_temac: fix interrupt bug when interrupt 0 is used
  sctp: dubious bitfields in sctp_transport
  ipmr: off by one in __ipmr_fill_mroute()
  ...
2010-05-28 10:18:40 -07:00
Eric Dumazet
8a74ad60a5 net: fix lock_sock_bh/unlock_sock_bh
This new sock lock primitive was introduced to speedup some user context
socket manipulation. But it is unsafe to protect two threads, one using
regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh

This patch changes lock_sock_bh to be careful against 'owned' state.
If owned is found to be set, we must take the slow path.
lock_sock_bh() now returns a boolean to say if the slow path was taken,
and this boolean is used at unlock_sock_bh time to call the appropriate
unlock function.

After this change, BH are either disabled or enabled during the
lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
so we rename these functions to lock_sock_fast()/unlock_sock_fast().

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-27 00:30:53 -07:00
Dan Carpenter
ff937938e7 sctp: dubious bitfields in sctp_transport
Sparse complains because these one-bit bitfields are signed.
  include/net/sctp/structs.h:879:24: error: dubious one-bit signed bitfield
  include/net/sctp/structs.h:889:31: error: dubious one-bit signed bitfield
  include/net/sctp/structs.h:895:26: error: dubious one-bit signed bitfield
  include/net/sctp/structs.h:898:31: error: dubious one-bit signed bitfield
  include/net/sctp/structs.h:901:27: error: dubious one-bit signed bitfield

It doesn't cause a problem in the current code, but it would be better
to clean it up.  This was introduced by c0058a35aa: "sctp: Save some
room in the sctp_transport by using bitfields".

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-26 00:40:11 -07:00
Herbert Xu
ea16f912a6 cls_cgroup: Initialise classid when module is absent
When the cls_cgroup module is not loaded, task_cls_classid will
return an uninitialised classid instead of zero.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-25 18:53:57 -07:00
Linus Torvalds
b1cdc4670b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (63 commits)
  drivers/net/usb/asix.c: Fix pointer cast.
  be2net: Bug fix to avoid disabling bottom half during firmware upgrade.
  proc_dointvec: write a single value
  hso: add support for new products
  Phonet: fix potential use-after-free in pep_sock_close()
  ath9k: remove VEOL support for ad-hoc
  ath9k: change beacon allocation to prefer the first beacon slot
  sock.h: fix kernel-doc warning
  cls_cgroup: Fix build error when built-in
  macvlan: do proper cleanup in macvlan_common_newlink() V2
  be2net: Bug fix in init code in probe
  net/dccp: expansion of error code size
  ath9k: Fix rx of mcast/bcast frames in PS mode with auto sleep
  wireless: fix sta_info.h kernel-doc warnings
  wireless: fix mac80211.h kernel-doc warnings
  iwlwifi: testing the wrong variable in iwl_add_bssid_station()
  ath9k_htc: rare leak in ath9k_hif_usb_alloc_tx_urbs()
  ath9k_htc: dereferencing before check in hif_usb_tx_cb()
  rt2x00: Fix rt2800usb TX descriptor writing.
  rt2x00: Fix failed SLEEP->AWAKE and AWAKE->SLEEP transitions.
  ...
2010-05-25 16:59:51 -07:00
David S. Miller
a261af927d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-05-25 13:15:11 -07:00
Alexey Dobriyan
4be929be34 kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
  USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:02 -07:00
Randy Dunlap
acfbe96a30 sock.h: fix kernel-doc warning
Fix sock.h kernel-doc warning:
Warning(include/net/sock.h:1438): No description found for parameter 'wq'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 23:54:18 -07:00
Herbert Xu
937eada45f cls_cgroup: Fix build error when built-in
There is a typo in cgroup_cls_state when cls_cgroup is built-in.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 23:53:37 -07:00
Randy Dunlap
4e8998f09b wireless: fix mac80211.h kernel-doc warnings
Fix kernel-doc warnings in mac80211.h:

Warning(include/net/mac80211.h:838): No description found for parameter 'ap_addr'
Warning(include/net/mac80211.h:1726): No description found for parameter 'get_survey'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 15:07:42 -04:00
John W. Linville
3dc3fc52ea Revert "ath9k: Group Key fix for VAPs"
This reverts commit 03ceedea97.

This patch was reported to cause a regression in which connectivity is
lost and cannot be reestablished after a suspend/resume cycle.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 14:59:27 -04:00
Linus Torvalds
a69eee4988 Revert "ath9k: Group Key fix for VAPs"
This reverts commit 03ceedea97, since it
breaks resume from suspend-to-ram on Rafael's Acer Ferrari One.
NetworkManager thinks everything is ok, but it can't connect to the AP
to get an IP address after the resume.

In fact, it even breaks resume for non-ath9k chipsets: reverting it also
fixes Rafael's Toshiba Protege R500 with the iwlagn driver.  As Johannes
says:

  "Indeed, this patch needs to be reverted. That mac80211 change is wrong
   and completely unnecessary."

Reported-and-requested-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Daniel Yingqiang Ma <yma.cool@gmail.com>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-24 07:45:43 -07:00
Herbert Xu
f845172531 cls_cgroup: Store classid in struct sock
Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread.  This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.

Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.

In fact, the current semantics is inherently broken, as a single
skb may be constructed out of the writes of two different tasks.
A different manifestation of this problem is when the TCP stack
transmits in response of an incoming ACK.  This is currently
unclassified.

As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.

This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)

The value is set at socket creation time.  So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Whenever another process touches the socket by either reading or
writing to it, we will change the socket classid to that of the
process if it has a valid (non-zero) classid.

For sockets created on inbound connections through accept(2), we
inherit the classid of the original listening socket through
sk_clone, possibly preceding the actual accept(2) call.

In order to minimise risks, I have not made this the authoritative
classid.  For now it is only used as a backup when we execute
with soft IRQs disabled.  Once we're completely happy with its
semantics we can use it as the sole classid.

Footnote: I have rearranged the error path on cls_group module
creation.  If we didn't do this, then there is a window where
someone could create a tc rule using cls_group before the cgroup
subsystem has been registered.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 00:12:34 -07:00
Sjur Braendeland
7aecf4944f caif: Bugfix - use standard Linux lists
Discovered bug when running high number of parallel connect requests.
Replace buggy home brewed list with linux/list.h.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-23 23:57:41 -07:00
Sripathi Kodi
4681dbdacb 9p: add 9P2000.L rename operation
I made a V2 of this patch on top of my patches for VFS switches.
All the changes were due to change in some offsets.

rename - change name of file or directory

size[4] Trename tag[2] fid[4] newdirfid[4] name[s]
size[4] Rrename tag[2]

The rename message is used to change the name of a file, possibly moving it
to a new directory.  The 9P wstat message can only rename a file within the
same directory.

Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-05-21 16:44:34 -05:00
Sripathi Kodi
bda8e77520 9p: add 9P2000.L statfs operation
I made a V2 of this patch on top of my patches for VFS switches. The
change was adding v9fs_statfs pointer to v9fs_super_ops_dotl
instead of v9fs_super_ops.

statfs - get file system statistics

size[4] Tstatfs tag[2] fid[4]
size[4] Rstatfs tag[2] type[4] bsize[4] blocks[8] bfree[8] bavail[8]
                files[8] ffree[8] fsid[8] namelen[4]

The statfs message is used to request file system information returned
by the statfs(2) system call, which is used by df(1) to report file
system and disk space usage.

Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-05-21 16:44:33 -05:00
David S. Miller
41499bd676 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-05-20 23:12:18 -07:00
Joerg Marx
fc350777c7 netfilter: nf_conntrack: fix a race in __nf_conntrack_confirm against nf_ct_get_next_corpse()
This race was triggered by a 'conntrack -F' command running in parallel
to the insertion of a hash for a new connection. Losing this race led to
a dead conntrack entry effectively blocking traffic for a particular
connection until timeout or flushing the conntrack hashes again.
Now the check for an already dying connection is done inside the lock.

Signed-off-by: Joerg Marx <joerg.marx@secunet.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-20 15:55:30 +02:00
Herbert Xu
e9d3e08497 ipv6: Replace inet6_ifaddr->dead with state
This patch replaces the boolean dead flag on inet6_ifaddr with
a state enum.  This allows us to roll back changes when deleting
an address according to whether DAD has completed or not.

This patch only adds the state field and does not change the logic.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18 15:36:06 -07:00
Eric Dumazet
d19d56ddc8 net: Introduce skb_tunnel_rx() helper
skb rxhash should be cleared when a skb is handled by a tunnel before
being delivered again, so that correct packet steering can take place.

There are other cleanups and accounting that we can factorize in a new
helper, skb_tunnel_rx()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:36:55 -07:00
David S. Miller
820ae8a80e Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-05-17 21:09:11 -07:00
andrew hendry
37cda78741 X25: Move accept approve flag to bitfield
Moves the x25 accept approve flag from char into bitfield.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:27 -07:00
andrew hendry
b7792e34cb X25: Move interrupt flag to bitfield
Moves the x25 interrupt flag from char into bitfield.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:27 -07:00
andrew hendry
cb863ffd4a X25: Move qbit flag to bitfield
Moves the X25 q bit flag from char into a bitfield to allow BKL cleanup.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:26 -07:00
Eric Dumazet
407eadd996 net: implements ip_route_input_noref()
ip_route_input() is the version returning a refcounted dst, while
ip_route_input_noref() returns a non refcounted one.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:51 -07:00
Eric Dumazet
7fee226ad2 net: add a noref bit on skb dst
Use low order bit of skb->_skb_dst to tell dst is not refcounted.

Change _skb_dst to _skb_refdst to make sure all uses are catched.

skb_dst() returns the dst, regardless of noref bit set or not, but
with a lockdep check to make sure a noref dst is not given if current
user is not rcu protected.

New skb_dst_set_noref() helper to set an notrefcounted dst on a skb.
(with lockdep check)

skb_dst_drop() drops a reference only if skb dst was refcounted.

skb_dst_force() helper is used to force a refcount on dst, when skb
is queued and not anymore RCU protected.

Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if
!IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in
sock_queue_rcv_skb(), in __nf_queue().

Use skb_dst_force() in dev_requeue_skb().

Note: dst_use_noref() still dirties dst, we might transform it
later to do one dirtying per jiffies.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:50 -07:00
John W. Linville
6fe70aae0d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-05-17 13:57:43 -04:00
David S. Miller
6811d58fc1 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	include/linux/if_link.h
2010-05-16 22:26:58 -07:00
Eric Dumazet
a465419b1f net: Introduce sk_route_nocaps
TCP-MD5 sessions have intermittent failures, when route cache is
invalidated. ip_queue_xmit() has to find a new route, calls
sk_setup_caps(sk, &rt->u.dst), destroying the 

sk->sk_route_caps &= ~NETIF_F_GSO_MASK

that MD5 desperately try to make all over its way (from
tcp_transmit_skb() for example)

So we send few bad packets, and everything is fine when
tcp_transmit_skb() is called again for this socket.

Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a
socket field, sk_route_nocaps, containing bits to mask on sk_route_caps.

Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16 00:36:33 -07:00
Eric Dumazet
35790c0421 tcp: fix MD5 (RFC2385) support
TCP MD5 support uses percpu data for temporary storage. It currently
disables preemption so that same storage cannot be reclaimed by another
thread on same cpu.

We also have to make sure a softirq handler wont try to use also same
context. Various bug reports demonstrated corruptions.

Fix is to disable preemption and BH.

Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16 00:34:04 -07:00
Amerigo Wang
e3826f1e94 net: reserve ports for applications using fixed port numbers
(Dropped the infiniband part, because Tetsuo modified the related code,
I will send a separate patch for it once this is accepted.)

This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
allows users to reserve ports for third-party applications.

The reserved ports will not be used by automatic port assignments
(e.g. when calling connect() or bind() with port number 0). Explicit
port allocation behavior is unchanged.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:28:40 -07:00
David S. Miller
bf47f4b0ba Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6 2010-05-12 23:30:45 -07:00
Allan Stephens
8e1c298c01 tipc: Update commenting in TIPC API
Eliminate comments in TIPC's main API files that are either obsolete,
incorrect, misleading, or unhelpful.  It also adds in one new comment.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-12 23:02:23 -07:00
Johannes Berg
5ce6e438d5 mac80211: add offload channel switch support
This adds support for offloading the channel switch
operation to devices that support such, typically
by having specific firmware API for it. The reasons
for this could be that the firmware provides better
timing or that regulatory enforcement done by the
device requires special handling of CSAs.

In order to allow drivers to specify the timing to
the device, the new channel_switch callback will
pass through the received frame's mactime, where
available.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-12 16:39:05 -04:00
David S. Miller
278554bd65 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/wireless/ath/ar9170/usb.c
	drivers/scsi/iscsi_tcp.c
	net/ipv4/ipmr.c
2010-05-12 00:05:35 -07:00
John W. Linville
cc755896a4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
	drivers/net/wireless/ath/ar9170/main.c
2010-05-11 14:24:55 -04:00
Patrick McHardy
d1db275dd3 ipv6: ip6mr: support multiple tables
This patch adds support for multiple independant multicast routing instances,
named "tables".

Userspace multicast routing daemons can bind to a specific table instance by
issuing a setsockopt call using a new option MRT6_TABLE. The table number is
stored in the raw socket data and affects all following ip6mr setsockopt(),
getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
is created with a default routing rule pointing to it. Newly created pim6reg
devices have the table number appended ("pim6regX"), with the exception of
devices created in the default table, which are named just "pim6reg" for
compatibility reasons.

Packets are directed to a specific table instance using routing rules,
similar to how regular routing rules work. Currently iif, oif and mark
are supported as keys, source and destination addresses could be supported
additionally.

Example usage:

- bind pimd/xorp/... to a specific table:

uint32_t table = 123;
setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));

- create routing rules directing packets to the new table:

# ip -6 mrule add iif eth0 lookup 123
# ip -6 mrule add oif eth0 lookup 123

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-11 14:40:55 +02:00
Patrick McHardy
6bd5214339 ipv6: ip6mr: move mroute data into seperate structure
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-11 14:40:53 +02:00
Patrick McHardy
f30a778421 ipv6: ip6mr: convert struct mfc_cache to struct list_head
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-11 14:40:51 +02:00
Patrick McHardy
c476efbcde ipv6: ip6mr: move unres_queue and timer to per-namespace data
The unres_queue is currently shared between all namespaces. Following patches
will additionally allow to create multiple multicast routing tables in each
namespace. Having a single shared queue for all these users seems to excessive,
move the queue and the cleanup timer to the per-namespace data to unshare it.

As a side-effect, this fixes a bug in the seq file iteration functions: the
first entry returned is always from the current namespace, entries returned
after that may belong to any namespace.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-11 14:40:48 +02:00
David S. Miller
d250fe91ae Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-05-10 23:03:26 -07:00
Patrick McHardy
1e4b105712 Merge branch 'master' of /repos/git/net-next-2.6
Conflicts:
	net/bridge/br_device.c
	net/bridge/br_forward.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-10 18:39:28 +02:00
Marcel Holtmann
f48fd9c8cd Bluetooth: Create per controller workqueue
Instead of having a global workqueue for all controllers, it makes
more sense to have a workqueue per controller.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:34:03 +02:00
Gustavo F. Padovan
dfc909befb Bluetooth: Fix race condition on l2cap_ertm_send()
l2cap_ertm_send() can be called both from user context and bottom half
context. The socket locks for that contexts are different, the user
context uses a mutex(which can sleep) and the second one uses a
spinlock_bh. That creates a race condition when we have interruptions on
both contexts at the same time.

The better way to solve this is to add a new spinlock to lock
l2cap_ertm_send() and the vars it access. The other solution was to defer
l2cap_ertm_send() with a workqueue, but we the sending process already
has one defer on the hci layer. It's not a good idea add another one.

The patch refactor the code to create l2cap_retransmit_frames(), then we
encapulate the lock of l2cap_ertm_send() for some call. It also changes
l2cap_retransmit_frame() to l2cap_retransmit_one_frame() to avoid
confusion

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:53 +02:00
Gustavo F. Padovan
1890d36bb5 Bluetooth: Implement Local Busy Condition handling
Supports Local Busy condition handling through a waitqueue that wake ups
each 200ms and try to push the packets to the upper layer. If it can
push all the queue then it leaves the Local Busy state.

The patch modifies the behaviour of l2cap_ertm_reassembly_sdu() to
support retry of the push operation.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:53 +02:00
Gustavo F. Padovan
9a9c6a3441 Bluetooth: Make hci_send_acl() void
hci_send_acl can't fail, so we can make it void. This patch changes
that and all the funcions that use hci_send_acl().
That change exposed a bug on sending connectionless data. We were not
reporting the lenght send back to the user space.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:52 +02:00
Gustavo F. Padovan
68d7f0ce91 Bluetooth: Enable option to configure Max Transmission value via sockopt
With the sockopt extension we can set a per-channel MaxTx value.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:50 +02:00
Gustavo F. Padovan
803020c6fa Bluetooth: Change acknowledgement to use the value of txWindow
Now that we can set the txWindow we need to change the acknowledgement
procedure to ack after each (pi->txWindow/6 + 1). The plus 1 is to avoid
the zero value.
It also renames pi->num_to_ack to a better name: pi->num_acked.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:50 +02:00
Gustavo F. Padovan
14b5aa71ec Bluetooth: Add sockopt configuration for txWindow on L2CAP
Now we can set/get Transmission Window size via sockopt.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:49 +02:00
Gustavo F. Padovan
1c7621596d Bluetooth: Fix configuration of the MPS value
We were accepting values bigger than we can accept. This was leading
ERTM to drop packets because of wrong FCS checks.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:48 +02:00
Gustavo F. Padovan
c1b4f43be0 Bluetooth: Add timer to Acknowledge I-frames
We ack I-frames on each txWindow/5 I-frames received, but if the sender
stop to send I-frames and it's not a txWindow multiple we can leave some
frames unacked.
So I added a timer to ack I-frames on this case. The timer expires in
200ms.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:48 +02:00
Gustavo F. Padovan
d5392c8f1e Bluetooth: Implement 'Send IorRRorRNR' event
After receive a RR with P bit set ERTM shall use this funcion to choose
what type of frame to reply with F bit = 1.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:46 +02:00
Gustavo F. Padovan
0d861d8b8e Bluetooth: Make hci_send_sco() void
It also removes an unneeded check for the MTU. The check is done before
on sco_send_frame()

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-05-10 09:28:45 +02:00
Neil Horman
3ee943728f ipv4: remove ip_rt_secret timer (v4)
A while back there was a discussion regarding the rt_secret_interval timer.
Given that we've had the ability to do emergency route cache rebuilds for awhile
now, based on a statistical analysis of the various hash chain lengths in the
cache, the use of the flush timer is somewhat redundant.  This patch removes the
rt_secret_interval sysctl, allowing us to rely solely on the statistical
analysis mechanism to determine the need for route cache flushes.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-08 01:57:52 -07:00
Johannes Berg
0aaffa9b96 mac80211: improve HT channel handling
Currently, when one interface switches HT mode,
all others will follow along. This is clearly
undesirable, since the new one might switch to
no-HT while another one is operating in HT.

Address this issue by keeping track of the HT
mode per interface, and allowing only changes
that are compatible, i.e. switching into HT40+
is not possible when another interface is in
HT40-, in that case the second one needs to
fall back to HT20.

Also, to allow drivers to know what's going on,
store the per-interface HT mode (channel type)
in the virtual interface's bss_conf.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-07 14:55:51 -04:00
Johannes Berg
f444de05d2 cfg80211/mac80211: better channel handling
Currently (all tested with hwsim) you can do stupid
things like setting up an AP on a certain channel,
then adding another virtual interface and making
that associate on another channel -- this will make
the beaconing to move channel but obviously without
the necessary IEs data update.

In order to improve this situation, first make the
configuration APIs (cfg80211 and nl80211) aware of
multi-channel operation -- we'll eventually need
that in the future anyway. There's one userland API
change and one API addition. The API change is that
now SET_WIPHY must be called with virtual interface
index rather than only wiphy index in order to take
effect for that interface -- luckily all current
users (hostapd) do that. For monitor interfaces, the
old setting is preserved, but monitors are always
slaved to other devices anyway so no guarantees.

The second userland API change is the introduction
of a per virtual interface SET_CHANNEL command, that
hostapd should use going forward to make it easier
to understand what's going on (it can automatically
detect a kernel with this command).

Other than mac80211, no existing cfg80211 drivers
are affected by this change because they only allow
a single virtual interface.

mac80211, however, now needs to be aware that the
channel settings are per interface now, and needs
to disallow (for now) real multi-channel operation,
which is another important part of this patch.

One of the immediate benefits is that you can now
start hostapd to operate on a hardware that already
has a connection on another virtual interface, as
long as you specify the same channel.

Note that two things are left unhandled (this is an
improvement -- not a complete fix):

 * different HT/no-HT modes

   currently you could start an HT AP and then
   connect to a non-HT network on the same channel
   which would configure the hardware for no HT;
   that can be fixed fairly easily

 * CSA

   An AP we're connected to on a virtual interface
   might indicate switching channels, and in that
   case we would follow it, regardless of how many
   other interfaces are operating; this requires
   more effort to fix but is pretty rare after all

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-07 14:55:50 -04:00
Johannes Berg
ac8dd506e4 mac80211: fix BSS info reconfiguration
When reconfiguring an interface due to a previous
hardware restart, mac80211 will currently include
the new IBSS flag on non-IBSS interfaces which may
confuse drivers.

Instead of doing the ~0 trick, simply spell out
which things are going to be reconfigured.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-07 14:55:49 -04:00
Vlad Yasevich
50b5d6ad63 sctp: Fix a race between ICMP protocol unreachable and connect()
ICMP protocol unreachable handling completely disregarded
the fact that the user may have locked the socket.  It proceeded
to destroy the association, even though the user may have
held the lock and had a ref on the association.  This resulted
in the following:

Attempt to release alive inet socket f6afcc00

=========================
[ BUG: held lock freed! ]
-------------------------
somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held
there!
 (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
1 lock held by somenu/2672:
 #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c

stack backtrace:
Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55
Call Trace:
 [<c1232266>] ? printk+0xf/0x11
 [<c1038553>] debug_check_no_locks_freed+0xce/0xff
 [<c10620b4>] kmem_cache_free+0x21/0x66
 [<c1185f25>] __sk_free+0x9d/0xab
 [<c1185f9c>] sk_free+0x1c/0x1e
 [<c1216e38>] sctp_association_put+0x32/0x89
 [<c1220865>] __sctp_connect+0x36d/0x3f4
 [<c122098a>] ? sctp_connect+0x13/0x4c
 [<c102d073>] ? autoremove_wake_function+0x0/0x33
 [<c12209a8>] sctp_connect+0x31/0x4c
 [<c11d1e80>] inet_dgram_connect+0x4b/0x55
 [<c11834fa>] sys_connect+0x54/0x71
 [<c103a3a2>] ? lock_release_non_nested+0x88/0x239
 [<c1054026>] ? might_fault+0x42/0x7c
 [<c1054026>] ? might_fault+0x42/0x7c
 [<c11847ab>] sys_socketcall+0x6d/0x178
 [<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c1002959>] syscall_call+0x7/0xb

This was because the sctp_wait_for_connect() would aqcure the socket
lock and then proceed to release the last reference count on the
association, thus cause the fully destruction path to finish freeing
the socket.

The simplest solution is to start a very short timer in case the socket
is owned by user.  When the timer expires, we can do some verification
and be able to do the release properly.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-06 00:56:07 -07:00
John W. Linville
83163244f8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
	drivers/net/wireless/libertas_tf/cmd.c
	drivers/net/wireless/libertas_tf/main.c
2010-05-05 16:14:16 -04:00
David S. Miller
f546061840 Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vxy/lksctp-dev
Add missing linux/vmalloc.h include to net/sctp/probe.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-03 16:24:31 -07:00
David S. Miller
7ef527377b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-05-02 22:02:06 -07:00
Jan Engelhardt
1183f3838c net: fix compile error due to double return type in SOCK_DEBUG
Fix this one:
include/net/sock.h: error: two or more data types in declaration specifiers

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-02 13:42:39 -07:00
Eric Dumazet
4381548237 net: sock_def_readable() and friends RCU conversion
sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
need two atomic operations (and associated dirtying) per incoming
packet.

RCU conversion is pretty much needed :

1) Add a new structure, called "struct socket_wq" to hold all fields
that will need rcu_read_lock() protection (currently: a
wait_queue_head_t and a struct fasync_struct pointer).

[Future patch will add a list anchor for wakeup coalescing]

2) Attach one of such structure to each "struct socket" created in
sock_alloc_inode().

3) Respect RCU grace period when freeing a "struct socket_wq"

4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
socket_wq"

5) Change sk_sleep() function to use new sk->sk_wq instead of
sk->sk_sleep

6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
a rcu_read_lock() section.

7) Change all sk_has_sleeper() callers to :
  - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
  - Use wq_has_sleeper() to eventually wakeup tasks.
  - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

8) sock_wake_async() is modified to use rcu protection as well.

9) Exceptions :
  macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
instead of dynamically allocated ones. They dont need rcu freeing.

Some cleanups or followups are probably needed, (possible
sk_callback_lock conversion to a spinlock for example...).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-01 15:00:15 -07:00
Vlad Yasevich
0e3aef8d09 sctp: Tag messages that can be Nagle delayed at creation.
When we create the sctp_datamsg and fragment the user data,
we know exactly if we are sending full segments or not and
how they might be bundled.  During this time, we can mark
messages a Nagle capable or not.  This makes the check at
transmit time much simpler.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:10 -04:00
Vlad Yasevich
cf9b4812e1 sctp: fast recovery algorithm is per association.
SCTP fast recovery algorithm really applies per association
and impacts all transports.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:10 -04:00
Vlad Yasevich
b2cf9b6bd9 sctp: update transport initializations
Right now, sctp transports are not fully initialized and when
adding any new fields, they have to be explicitely initialized.
This is prone to mistakes.  So we switch to calling kzalloc()
which makes things much simpler.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:10 -04:00
Vlad Yasevich
c0058a35aa sctp: Save some room in the sctp_transport by using bitfields
Saves some room in the sctp_transport structure.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:09 -04:00
Vlad Yasevich
ae19c54866 sctp: remove 'resent' bit from the chunk
The 'resent' bit is used to make sure that we don't update
rto estimate based on retransmitted chunks.  However, we already
have the 'rto_pending' bit that we test when need to update rto,
so 'resent' bit is just extra.  Additionally, we currently have
a bug in that we always set a 'resent' bit and thus rto estimate
is only updated by Heartbeats.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:09 -04:00
Wei Yongjun
52688d6ec9 sctp: discard ABORT chunk with zero verification tag in COOKIE-WAIT state
In current implementation if ABORT chunk is received with T flag is set
and zero verification tag in COOKIE-WAIT state, the ABORT chunk will be
always accepted. This is because in COOKIE-WAIT state, the endpoint does
not know the peer's verification tag, and it's zero in the endpoint.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 21:42:44 -04:00
Eric Dumazet
767dd03369 net: speedup sock_recv_ts_and_drops()
sock_recv_ts_and_drops() is fat and slow (~ 4% of cpu time on some
profiles)

We can test all socket flags at once to make fast path fast again.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-30 16:29:42 -07:00
John W. Linville
f5c044e53a mac80211: remove deprecated noise field from ieee80211_rx_status
Also remove associated IEEE80211_HW_NOISE_DBM from ieee80211_hw_flags.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-30 15:38:13 -04:00
Eric Dumazet
f84af32cbc net: ip_queue_rcv_skb() helper
When queueing a skb to socket, we can immediately release its dst if
target socket do not use IP_CMSG_PKTINFO.

tcp_data_queue() can drop dst too.

This to benefit from a hot cache line and avoid the receiver, possibly
on another cpu, to dirty this cache line himself.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 15:31:51 -07:00
Eric Dumazet
4b0b72f7dd net: speedup udp receive path
Since commit 95766fff ([UDP]: Add memory accounting.), 
each received packet needs one extra sock_lock()/sock_release() pair.

This added latency because of possible backlog handling. Then later,
ticket spinlocks added yet another latency source in case of DDOS.

This patch introduces lock_sock_bh() and unlock_sock_bh()
synchronization primitives, avoiding one atomic operation and backlog
processing.

skb_free_datagram_locked() uses them instead of full blown
lock_sock()/release_sock(). skb is orphaned inside locked section for
proper socket memory reclaim, and finally freed outside of it.

UDP receive path now take the socket spinlock only once.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 14:35:48 -07:00
Neil Horman
5fa782c2f5 sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4)
Ok, version 4

Change Notes:
1) Minor cleanups, from Vlads notes

Summary:

Hey-
	Recently, it was reported to me that the kernel could oops in the
following way:

<5> kernel BUG at net/core/skbuff.c:91!
<5> invalid operand: 0000 [#1]
<5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
mptbase sd_mod scsi_mod
<5> CPU:    0
<5> EIP:    0060:[<c02bff27>]    Not tainted VLI
<5> EFLAGS: 00010216   (2.6.9-89.0.25.EL)
<5> EIP is at skb_over_panic+0x1f/0x2d
<5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
<5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
<5> ds: 007b   es: 007b   ss: 0068
<5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
<5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
e0c2947d
<5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
df653490
<5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
00000004
<5> Call Trace:
<5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
<5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
<5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
<5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
<5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
<5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
<5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
<5>  [<c01555a4>] cache_grow+0x140/0x233
<5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
<5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
<5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
<5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
<5>  [<c02d005e>] nf_iterate+0x40/0x81
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
<5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
<5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e103e>] ip_rcv+0x334/0x3b4
<5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
<5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
<5>  [<c02c67a4>] process_backlog+0x6c/0xd9
<5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
<5>  [<c012a7b1>] __do_softirq+0x35/0x79
<5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
<5>  [<c01094de>] do_softirq+0x46/0x4d

Its an skb_over_panic BUG halt that results from processing an init chunk in
which too many of its variable length parameters are in some way malformed.

The problem is in sctp_process_unk_param:
if (NULL == *errp)
	*errp = sctp_make_op_error_space(asoc, chunk,
					 ntohs(chunk->chunk_hdr->length));

	if (*errp) {
		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
				 WORD_ROUND(ntohs(param.p->length)));
		sctp_addto_chunk(*errp,
			WORD_ROUND(ntohs(param.p->length)),
				  param.v);

When we allocate an error chunk, we assume that the worst case scenario requires
that we have chunk_hdr->length data allocated, which would be correct nominally,
given that we call sctp_addto_chunk for the violating parameter.  Unfortunately,
we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
chunk, so the worst case situation in which all parameters are in violation
requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.

The result of this error is that a deliberately malformed packet sent to a
listening host can cause a remote DOS, described in CVE-2010-1173:
http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173

I've tested the below fix and confirmed that it fixes the issue.  We move to a
strategy whereby we allocate a fixed size error chunk and ignore errors we don't
have space to report.  Tested by me successfully

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 14:22:01 -07:00
Johannes Berg
8fc214ba95 mac80211: notify driver about IBSS status
Some drivers (e.g. iwlwifi) need to know and try
to figure it out based on other things, but making
it explicit is definitely better.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-28 16:50:29 -04:00
Sjur Braendeland
8d545c8f95 caif: Disconnect without waiting for response
Changes:
o Function cfcnfg_disconn_adapt_layer is changed to do asynchronous
  disconnect, not waiting for any response from the modem. Due to this
  the function cfcnfg_linkdestroy_rsp does nothing anymore.
o Because disconnect may take down a connection before a connect response
  is received the function cfcnfg_linkup_rsp is checking if the client is
  still waiting for the response, if not a disconnect request is sent to
  the modem.
o cfctrl is no longer keeping track of pending disconnect requests.
o Added function cfctrl_cancel_req, which is used for deleting a pending
  connect request if disconnect is done before connect response is received.
o Removed unused function cfctrl_insert_req2
o Added better handling of connect reject from modem.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:55:13 -07:00
Sjur Braendeland
5b20865675 caif: Add reference counting to service layer
Changes:
o Added functions cfsrvl_get and cfsrvl_put.
o Added support release_client to use by socket and net device.
o Increase reference counting for in-flight packets from cfmuxl

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:55:12 -07:00
Sjur Braendeland
e539d83cc8 caif: Rename functions in cfcnfg and caif_dev
Changes:
 o Renamed cfcnfg_del_adapt_layer to cfcnfg_disconn_adapt_layer
 o Fixed typo cfcfg to cfcnfg
 o Renamed linkid to channel_id
 o Updated documentation in caif_dev.h
 o Minor formatting changes

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:55:11 -07:00
Vlad Yasevich
c078669340 sctp: Fix oops when sending queued ASCONF chunks
When we finish processing ASCONF_ACK chunk, we try to send
the next queued ASCONF.  This action runs the sctp state
machine recursively and it's not prepared to do so.

kernel BUG at kernel/timer.c:790!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/module/ipv6/initstate
Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath
uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev
floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs
EIP: 0060:[<c044a2ef>] EFLAGS: 00010286 CPU: 0
EIP is at add_timer+0xd/0x1b
EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4
ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000)
Stack:
 c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004
<0> 00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14
00000004
<0> c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14
000000d0
Call Trace:
 [<d1851214>] ? sctp_side_effects+0x607/0xdfc [sctp]
 [<d1851b11>] ? sctp_do_sm+0x108/0x159 [sctp]
 [<d1863386>] ? sctp_pname+0x0/0x1d [sctp]
 [<d1861a56>] ? sctp_primitive_ASCONF+0x36/0x3b [sctp]
 [<d185657c>] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp]
 [<d184e35c>] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp]
 [<d1851ac1>] ? sctp_do_sm+0xb8/0x159 [sctp]
 [<d1863334>] ? sctp_cname+0x0/0x52 [sctp]
 [<d1854377>] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp]
 [<d1858f0f>] ? sctp_inq_push+0x2d/0x30 [sctp]
 [<d186329d>] ? sctp_rcv+0x797/0x82e [sctp]

Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Yuansong Qiao <ysqiao@research.ait.ie>
Signed-off-by: Shuaijun Zhang <szhang@research.ait.ie>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:16:34 -07:00
Wei Yongjun
561b1733a4 sctp: avoid irq lock inversion while call sk->sk_data_ready()
sk->sk_data_ready() of sctp socket can be called from both BH and non-BH
contexts, but the default sk->sk_data_ready(), sock_def_readable(), can
not be used in this case. Therefore, we have to make a new function
sctp_data_ready() to grab sk->sk_data_ready() with BH disabling.

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.33-rc6 #129
---------------------------------------------------------
sctp_darn/1517 just changed the state of lock:
 (clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80
but this lock took another, SOFTIRQ-unsafe lock in the past:
 (slock-AF_INET){+.-...}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
1 lock held by sctp_darn/1517:
 #0:  (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp]

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:16:31 -07:00
Jiri Pirko
05fceb4ad7 net: disallow to use net_assign_generic externally
Now there's no need to use this fuction directly because it's handled by
register_pernet_device. So to make this simple and easy to understand,
make this static to do not tempt potentional users.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 15:49:02 -07:00
Eric Dumazet
c377411f24 net: sk_add_backlog() take rmem_alloc into account
Current socket backlog limit is not enough to really stop DDOS attacks,
because user thread spend many time to process a full backlog each
round, and user might crazy spin on socket lock.

We should add backlog size and receive_queue size (aka rmem_alloc) to
pace writers, and let user run without being slow down too much.

Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in
stress situations.

Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
receiver can now process ~200.000 pps (instead of ~100 pps before the
patch) on a 8 core machine.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 15:13:20 -07:00
David S. Miller
c58dc01bab net: Make RFS socket operations not be inet specific.
Idea from Eric Dumazet.

As for placement inside of struct sock, I tried to choose a place
that otherwise has a 32-bit hole on 64-bit systems.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-04-27 15:11:48 -07:00
Johannes Berg
a060bbfe4e mac80211: give virtual interface to hw_scan
When scanning, it is somewhat important to scan
on the correct virtual interface. All drivers
that currently implement hw_scan only support a
single virtual interface, but that may change
and then we'd want to be ready.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-27 16:09:23 -04:00
Juuso Oikarinen
9043f3b89a cfg80211: Remove default dynamic PS timeout value
Now that the mac80211 is choosing dynamic ps timeouts based on the ps-qos
network latency configuration, configure a default value of -1 as the dynamic
ps timeout in cfg80211. This value allows the mac80211 to determine the value
to be used.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-27 16:09:22 -04:00
Juuso Oikarinen
195e294d21 mac80211: Determine dynamic PS timeout based on ps-qos network latency
Determine the dynamic PS timeout based on the configured ps-qos network
latency. For backwards wext compatibility, allow the dynamic PS timeout
configured by the cfg80211 to overrule the automatically determined value.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-27 16:09:22 -04:00
Felix Fietkau
fd8aaaf351 cfg80211: add ap isolation support
This is used to configure APs to not bridge traffic between connected stations.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-27 16:09:21 -04:00
David S. Miller
bb61187465 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6 2010-04-27 12:57:39 -07:00
Eric Dumazet
0b53ff2ead net: fix a lockdep rcu warning in __sk_dst_set()
__sk_dst_set() might be called while no state can be integrated in a
rcu_dereference_check() condition.

So use rcu_dereference_raw() to shutup lockdep warnings (if
CONFIG_PROVE_RCU is set)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 12:53:26 -07:00
Eric Dumazet
18f9f1365d rps: inet_rps_save_rxhash() argument is not const
const qualifier on sock argument is misleading, since we can modify rxhash.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 12:53:25 -07:00
Flavio Leitner
6c37e5de45 TCP: avoid to send keepalive probes if receiving data
RFC 1122 says the following:
...
  Keep-alive packets MUST only be sent when no data or
  acknowledgement packets have been received for the
  connection within an interval.
...

The acknowledgement packet is reseting the keepalive
timer but the data packet isn't. This patch fixes it by
checking the timestamp of the last received data packet
too when the keepalive timer expires.

Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 12:53:25 -07:00
Paul E. McKenney
7ec75c582e net: suppress RCU lockdep false positive in twsk_net()
Calls to twsk_net() are in some cases protected by reference counting
as an alternative to RCU protection.  Cases covered by reference counts
include __inet_twsk_kill(), inet_twsk_free(), inet_twdr_do_twkill_work(),
inet_twdr_twcal_tick(), and tcp_timewait_state_process().  RCU is used
by inet_twsk_purge().  Locking is used by established_get_first()
and established_get_next().  Finally, __inet_twsk_hashdance() is an
initialization case.

It appears to be non-trivial to locate the appropriate locks and
reference counts from within twsk_net(), so used rcu_dereference_raw().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 12:39:01 -07:00
Patrick McHardy
3d0c9c4eb2 net: fib_rules: mark arguments to fib_rules_register const and __net_initdata
fib_rules_register() duplicates the template passed to it without modification,
mark the argument as const. Additionally the templates are only needed when
instantiating a new namespace, so mark them as __net_initdata, which means
they can be discarded when CONFIG_NET_NS=n.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-26 16:02:04 +02:00
David S. Miller
b7d6a43211 Merge branch 'net-next-2.6_20100423a/br/br_multicast_v3' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-next 2010-04-23 23:37:24 -07:00
Brian Haley
4b340ae20d IPv6: Complete IPV6_DONTFRAG support
Finally add support to detect a local IPV6_DONTFRAG event
and return the relevant data to the user if they've enabled
IPV6_RECVPATHMTU on the socket.  The next recvmsg() will
return no data, but have an IPV6_PATHMTU as ancillary data.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23 23:35:29 -07:00
Brian Haley
13b52cd446 IPv6: Add dontfrag argument to relevant functions
Add dontfrag argument to relevant functions for
IPV6_DONTFRAG support, as well as allowing the value
to be passed-in via ancillary cmsg data.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23 23:35:28 -07:00
John W. Linville
3b51cc996e Merge branch 'master' into for-davem
Conflicts:
	drivers/net/wireless/ath/ath9k/phy.c
	drivers/net/wireless/iwlwifi/iwl-6000.c
	drivers/net/wireless/iwlwifi/iwl-debugfs.c
2010-04-23 14:43:45 -04:00
YOSHIFUJI Hideaki
6e7cb83707 ipv6 mcast: Introduce include/net/mld.h for MLD definitions.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2010-04-23 13:35:55 +09:00
Andrew Hendry
5ebfbc06aa X25: Add if_x25.h and x25 to device identifiers
V2 Feedback from John Hughes.
- Add header for userspace implementations such as xot/xoe to use
- Use explicit values for interface stability
- No changes to driver patches

V1
- Use identifiers instead of magic numbers for X25 layer 3 to device interface.
- Also fixed checkpatch notes on updated code.

[ Add new user header to include/linux/Kbuild  -DaveM ]

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22 16:12:36 -07:00
Eric Dumazet
f68c224fed dst: rcu check refinement
__sk_dst_get() might be called from softirq, with socket lock held.

[  159.026180] include/net/sock.h:1200 invoked rcu_dereference_check()
without protection!
[  159.026261] 
[  159.026261] other info that might help us debug this:
[  159.026263] 
[  159.026425] 
[  159.026426] rcu_scheduler_active = 1, debug_locks = 0
[  159.026552] 2 locks held by swapper/0:
[  159.026609]  #0:  (&icsk->icsk_retransmit_timer){+.-...}, at:
[<ffffffff8104fc15>] run_timer_softirq+0x105/0x350
[  159.026839]  #1:  (slock-AF_INET){+.-...}, at: [<ffffffff81392b8f>]
tcp_write_timer+0x2f/0x1e0
[  159.027063] 
[  159.027064] stack backtrace:
[  159.027172] Pid: 0, comm: swapper Not tainted
2.6.34-rc5-03707-gde498c8-dirty #36
[  159.027252] Call Trace:
[  159.027306]  <IRQ>  [<ffffffff810718ef>] lockdep_rcu_dereference
+0xaf/0xc0
[  159.027411]  [<ffffffff8138e4f7>] tcp_current_mss+0xa7/0xb0
[  159.027537]  [<ffffffff8138fa49>] tcp_write_wakeup+0x89/0x190
[  159.027600]  [<ffffffff81391936>] tcp_send_probe0+0x16/0x100
[  159.027726]  [<ffffffff81392cd9>] tcp_write_timer+0x179/0x1e0
[  159.027790]  [<ffffffff8104fca1>] run_timer_softirq+0x191/0x350
[  159.027980]  [<ffffffff810477ed>] __do_softirq+0xcd/0x200


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22 16:06:59 -07:00
Tom Herbert
aa2ea0586d tcp: fix outsegs stat for TSO segments
Account for TSO segments of an skb in TCP_MIB_OUTSEGS counter.  Without
doing this, the counter can be off by orders of magnitude from the
actual number of segments sent.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22 16:00:00 -07:00
Johannes Berg
672724403b radiotap parser: fix endian annotation
When I updated this from the corresponding
userspace library, an annotation error crept
in -- this variable needs to be annotated as
little endian. No effect on code generation.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-21 14:15:19 -04:00
Eric Dumazet
aa39514516 net: sk_sleep() helper
Define a new function to return the waitqueue of a "struct sock".

static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
	return sk->sk_sleep;
}

Change all read occurrences of sk_sleep by a call to this function.

Needed for a future RCU conversion. sk_sleep wont be a field directly
available.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20 16:37:13 -07:00
Felix Fietkau
f79d9bad37 mac80211: add flags for STBC (Space-Time Block Coding)
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-20 11:52:21 -04:00
Stanislaw Gruszka
80725f454e mac80211: document IEEE80211_CONF_CHANGE_QOS
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-20 11:50:52 -04:00
Holger Schurig
1289723ef2 mac80211: sample survey implementation for mac80211 & hwsim
This adds the survey function to both mac80211 itself and to mac80211_hwsim.
For the latter driver, we simply invent some noise level.A real driver which
cannot determine the real channel noise MUST NOT report any noise, especially
not a magically conjured one :-)

Signed-off-by: Holger Schurig <holgerschurig@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-20 11:50:52 -04:00
Daniel Yingqiang Ma
03ceedea97 ath9k: Group Key fix for VAPs
When I set up multiple VAPs with ath9k, I encountered an issue that
the traffic may be lost after a while.

The detailed phenomenon is
1. After a while the clients connected to one of these VAPs will get
into a state that no broadcast/multicast packets can be transfered
successfully while the unicast packets can be transfered normally.
2. Minutes latter the unitcast packets transfer will fail as well,
because the ARP entry is expired and it can't be freshed due to the
broadcast trouble.

It's caused by the group key overwritten and someone discussed this
issue in ath9k-devel maillist before, but haven't work out a fix yet.

I referred the method in madwifi, and made a patch for ath9k.
The method is to set the high bit of the sender(AP)'s address, and
associated that mac and the group key. It requires the hardware
supports multicast frame key search. It seems true for AR9160.

Not sure whether it's the correct way to fix this issue. But it seems
to work in my test. The patch is attached, feel free to revise it.

Signed-off-by: Daniel Yingqiang ma <yma.cool@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-20 11:50:51 -04:00
Patrick McHardy
6291055465 Merge branch 'master' of /repos/git/net-next-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	net/ipv6/netfilter/ip6t_REJECT.c
	net/netfilter/xt_limit.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-20 16:02:01 +02:00
Daniel Halperin
93d95b12b3 mac80211: fix typo in comments
The flag is called IEEE80211_TX_STAT_AMPDU rather than using the whole word
STATUS.

Signed-off-by: Daniel Halperin <dhalperi@cs.washington.edu>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-19 16:41:42 -04:00
Tom Herbert
fec5e652e5 rfs: Receive Flow Steering
This patch implements receive flow steering (RFS).  RFS steers
received packets for layer 3 and 4 processing to the CPU where
the application for the corresponding flow is running.  RFS is an
extension of Receive Packet Steering (RPS).

The basic idea of RFS is that when an application calls recvmsg
(or sendmsg) the application's running CPU is stored in a hash
table that is indexed by the connection's rxhash which is stored in
the socket structure.  The rxhash is passed in skb's received on
the connection from netif_receive_skb.  For each received packet,
the associated rxhash is used to look up the CPU in the hash table,
if a valid CPU is set then the packet is steered to that CPU using
the RPS mechanisms.

The convolution of the simple approach is that it would potentially
allow OOO packets.  If threads are thrashing around CPUs or multiple
threads are trying to read from the same sockets, a quickly changing
CPU value in the hash table could cause rampant OOO packets--
we consider this a non-starter.

To avoid OOO packets, this solution implements two types of hash
tables: rps_sock_flow_table and rps_dev_flow_table.

rps_sock_table is a global hash table.  Each entry is just a CPU
number and it is populated in recvmsg and sendmsg as described above.
This table contains the "desired" CPUs for flows.

rps_dev_flow_table is specific to each device queue.  Each entry
contains a CPU and a tail queue counter.  The CPU is the "current"
CPU for a matching flow.  The tail queue counter holds the value
of a tail queue counter for the associated CPU's backlog queue at
the time of last enqueue for a flow matching the entry.

Each backlog queue has a queue head counter which is incremented
on dequeue, and so a queue tail counter is computed as queue head
count + queue length.  When a packet is enqueued on a backlog queue,
the current value of the queue tail counter is saved in the hash
entry of the rps_dev_flow_table.

And now the trick: when selecting the CPU for RPS (get_rps_cpu)
the rps_sock_flow table and the rps_dev_flow table for the RX queue
are consulted.  When the desired CPU for the flow (found in the
rps_sock_flow table) does not match the current CPU (found in the
rps_dev_flow table), the current CPU is changed to the desired CPU
if one of the following is true:

- The current CPU is unset (equal to RPS_NO_CPU)
- Current CPU is offline
- The current CPU's queue head counter >= queue tail counter in the
rps_dev_flow table.  This checks if the queue tail has advanced
beyond the last packet that was enqueued using this table entry.
This guarantees that all packets queued using this entry have been
dequeued, thus preserving in order delivery.

Making each queue have its own rps_dev_flow table has two advantages:
1) the tail queue counters will be written on each receive, so
keeping the table local to interrupting CPU s good for locality.  2)
this allows lockless access to the table-- the CPU number and queue
tail counter need to be accessed together under mutual exclusion
from netif_receive_skb, we assume that this is only called from
device napi_poll which is non-reentrant.

This patch implements RFS for TCP and connected UDP sockets.
It should be usable for other flow oriented protocols.

There are two configuration parameters for RFS.  The
"rps_flow_entries" kernel init parameter sets the number of
entries in the rps_sock_flow_table, the per rxqueue sysfs entry
"rps_flow_cnt" contains the number of entries in the rps_dev_flow
table for the rxqueue.  Both are rounded to power of two.

The obvious benefit of RFS (over just RPS) is that it achieves
CPU locality between the receive processing for a flow and the
applications processing; this can result in increased performance
(higher pps, lower latency).

The benefits of RFS are dependent on cache hierarchy, application
load, and other factors.  On simple benchmarks, we don't necessarily
see improvement and sometimes see degradation.  However, for more
complex benchmarks and for applications where cache pressure is
much higher this technique seems to perform very well.

Below are some benchmark results which show the potential benfit of
this patch.  The netperf test has 500 instances of netperf TCP_RR
test with 1 byte req. and resp.  The RPC test is an request/response
test similar in structure to netperf RR test ith 100 threads on
each host, but does more work in userspace that netperf.

e1000e on 8 core Intel
   No RFS or RPS		104K tps at 30% CPU
   No RFS (best RPS config):    290K tps at 63% CPU
   RFS				303K tps at 61% CPU

RPC test	tps	CPU%	50/90/99% usec latency	Latency StdDev
  No RFS/RPS	103K	48%	757/900/3185		4472.35
  RPS only:	174K	73%	415/993/2468		491.66
  RFS		223K	73%	379/651/1382		315.61

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-16 16:01:27 -07:00
Luis R. Rodriguez
0a56bd0ae3 mac80211: add LDPC control flag
LDPC will be enabled through the rate control algorithm
for each buffer the the tx_info flags.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-16 15:43:48 -04:00
Shan Wei
4e15ed4d93 net: replace ipfragok with skb->local_df
As Herbert Xu said: we should be able to simply replace ipfragok
with skb->local_df. commit f88037(sctp: Drop ipfargok in sctp_xmit function)
has droped ipfragok and set local_df value properly.

The patch kills the ipfragok parameter of .queue_xmit().

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-15 23:36:37 -07:00
John W. Linville
5c01d56693 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/wireless/ath/ath5k/phy.c
	drivers/net/wireless/wl12xx/wl1271_main.c
2010-04-15 16:21:34 -04:00
Bart De Schuymer
e179e6322a netfilter: bridge-netfilter: Fix MAC header handling with IP DNAT
- fix IP DNAT on vlan- or pppoe-encapsulated traffic: The functions
neigh_hh_output() or dst->neighbour->output() overwrite the complete
Ethernet header, although we only need the destination MAC address.
For encapsulated packets, they ended up overwriting the encapsulating
header. The new code copies the Ethernet source MAC address and
protocol number before calling dst->neighbour->output(). The Ethernet
source MAC and protocol number are copied back in place in
br_nf_pre_routing_finish_bridge_slow(). This also makes the IP DNAT
more transparent because in the old scheme the source MAC of the
bridge was copied into the source address in the Ethernet header. We
also let skb->protocol equal ETH_P_IP resp. ETH_P_IPV6 during the
execution of the PF_INET resp. PF_INET6 hooks.

- Speed up IP DNAT by calling neigh_hh_bridge() instead of
neigh_hh_output(): if dst->hh is available, we already know the MAC
address so we can just copy it.

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-15 12:26:39 +02:00
Patrick McHardy
f0ad0860d0 ipv4: ipmr: support multiple tables
This patch adds support for multiple independant multicast routing instances,
named "tables".

Userspace multicast routing daemons can bind to a specific table instance by
issuing a setsockopt call using a new option MRT_TABLE. The table number is
stored in the raw socket data and affects all following ipmr setsockopt(),
getsockopt() and ioctl() calls. By default, a single table (RT_TABLE_DEFAULT)
is created with a default routing rule pointing to it. Newly created pimreg
devices have the table number appended ("pimregX"), with the exception of
devices created in the default table, which are named just "pimreg" for
compatibility reasons.

Packets are directed to a specific table instance using routing rules,
similar to how regular routing rules work. Currently iif, oif and mark
are supported as keys, source and destination addresses could be supported
additionally.

Example usage:

- bind pimd/xorp/... to a specific table:

uint32_t table = 123;
setsockopt(fd, IPPROTO_IP, MRT_TABLE, &table, sizeof(table));

- create routing rules directing packets to the new table:

# ip mrule add iif eth0 lookup 123
# ip mrule add oif eth0 lookup 123

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:34 -07:00
Patrick McHardy
0c12295a74 ipv4: ipmr: move mroute data into seperate structure
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:34 -07:00
Patrick McHardy
862465f2e7 ipv4: ipmr: convert struct mfc_cache to struct list_head
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:33 -07:00
Patrick McHardy
e258beb22f ipv4: ipmr: move unres_queue and timer to per-namespace data
The unres_queue is currently shared between all namespaces. Following patches
will additionally allow to create multiple multicast routing tables in each
namespace. Having a single shared queue for all these users seems to excessive,
move the queue and the cleanup timer to the per-namespace data to unshare it.

As a side-effect, this fixes a bug in the seq file iteration functions: the
first entry returned is always from the current namespace, entries returned
after that may belong to any namespace.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:32 -07:00
Patrick McHardy
f74e49b561 ipv4: raw: move struct raw_sock and raw_sk() to include/net/raw.h
A following patch will use struct raw_sock to store state for ipmr,
so having the definitions in icmp.h doesn't fit very well anymore.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:31 -07:00
Patrick McHardy
d8a566beaa net: fib_rules: consolidate IPv4 and DECnet ->default_pref() functions.
Both functions are equivalent, consolidate them since a following patch
needs a third implementation for multicast routing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:30 -07:00
Eric Dumazet
b6c6712a42 net: sk_dst_cache RCUification
With latest CONFIG_PROVE_RCU stuff, I felt more comfortable to make this
work.

sk->sk_dst_cache is currently protected by a rwlock (sk_dst_lock)

This rwlock is readlocked for a very small amount of time, and dst
entries are already freed after RCU grace period. This calls for RCU
again :)

This patch converts sk_dst_lock to a spinlock, and use RCU for readers.

__sk_dst_get() is supposed to be called with rcu_read_lock() or if
socket locked by user, so use appropriate rcu_dereference_check()
condition (rcu_read_lock_held() || sock_owned_by_user(sk))

This patch avoids two atomic ops per tx packet on UDP connected sockets,
for example, and permits sk_dst_lock to be much less dirtied.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 01:41:33 -07:00
Herbert Xu
bb29624614 inet: Remove unused send_check length argument
inet: Remove unused send_check length argument

This patch removes the unused length argument from the send_check
function in struct inet_connection_sock_af_ops.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Yinghai <yinghai.lu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11 15:29:09 -07:00
David S. Miller
871039f02f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/stmmac/stmmac_main.c
	drivers/net/wireless/wl12xx/wl1271_cmd.c
	drivers/net/wireless/wl12xx/wl1271_main.c
	drivers/net/wireless/wl12xx/wl1271_spi.c
	net/core/ethtool.c
	net/mac80211/scan.c
2010-04-11 14:53:53 -07:00
David S. Miller
4a1032faac Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2010-04-11 02:44:30 -07:00
Timo Teräs
e4077e018b xfrm: Fix crashes in xfrm_lookup()
From: Timo Teräs <timo.teras@iki.fi>

Happens because CONFIG_XFRM_SUB_POLICY is not enabled, and one of
the helper functions I used did unexpected things in that case.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-08 11:27:42 -07:00
John W. Linville
0f2df9eac7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into merge
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/wireless/ath/ath5k/phy.c
	drivers/net/wireless/iwlwifi/iwl-4965.c
	drivers/net/wireless/iwlwifi/iwl-agn.c
	drivers/net/wireless/iwlwifi/iwl-core.c
	drivers/net/wireless/iwlwifi/iwl-core.h
	drivers/net/wireless/iwlwifi/iwl-tx.c
2010-04-08 13:34:54 -04:00
John Hughes
f5eb917b86 x25: Patch to fix bug 15678 - x25 accesses fields beyond end of packet.
Here is a patch to stop X.25 examining fields beyond the end of the packet.

For example, when a simple CALL ACCEPTED was received:

	10 10 0f

x25_parse_facilities was attempting to decode the FACILITIES field, but this
packet contains no facilities field.

Signed-off-by: John Hughes <john@calva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:29:25 -07:00
Jouni Malinen
d5cdfacb35 cfg80211: Add local-state-change-only auth/deauth/disassoc
cfg80211 is quite strict on allowing authentication and association
commands only in certain states. In order to meet these requirements,
user space applications may need to clear authentication or
association state in some cases. Currently, this can be done with
deauth/disassoc command, but that ends up sending out Deauthentication
or Disassociation frame unnecessarily. Add a new nl80211 attribute to
allow this sending of the frame be skipped, but with all other
deauth/disassoc operations being completed.

Similar state change is also needed for IEEE 802.11r FT protocol in
the FT-over-DS case which does not use Authentication frame exchange
in a transition to another BSS. For this to work with cfg80211, an
authentication entry needs to be created for the target BSS without
sending out an Authentication frame. The nl80211 authentication
command can be used for this purpose, too, with the new attribute to
indicate that the command is only for changing local state. This
enables wpa_supplicant to complete FT-over-DS transition successfully.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-07 14:37:56 -04:00
Timo Teräs
80c802f307 xfrm: cache bundles instead of policies for outgoing flows
__xfrm_lookup() is called for each packet transmitted out of
system. The xfrm_find_bundle() does a linear search which can
kill system performance depending on how many bundles are
required per policy.

This modifies __xfrm_lookup() to store bundles directly in
the flow cache. If we did not get a hit, we just create a new
bundle instead of doing slow search. This means that we can now
get multiple xfrm_dst's for same flow (on per-cpu basis).

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 03:43:19 -07:00
Timo Teräs
fe1a5f031e flow: virtualize flow cache entry methods
This allows to validate the cached object before returning it.
It also allows to destruct object properly, if the last reference
was held in flow cache. This is also a prepartion for caching
bundles in the flow cache.

In return for virtualizing the methods, we save on:
- not having to regenerate the whole flow cache on policy removal:
  each flow matching a killed policy gets refreshed as the getter
  function notices it smartly.
- we do not have to call flow_cache_flush from policy gc, since the
  flow cache now properly deletes the object if it had any references

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 03:43:18 -07:00
David S. Miller
4a35ecf8bf Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bonding/bond_main.c
	drivers/net/via-velocity.c
	drivers/net/wireless/iwlwifi/iwl-agn.c
2010-04-06 23:53:30 -07:00
Linus Torvalds
749d229761 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: saving negative to unsigned char
  9p: return on mutex_lock_interruptible()
  9p: Creating files with names too long should fail with ENAMETOOLONG.
  9p: Make sure we are able to clunk the cached fid on umount
  9p: drop nlink remove
  fs/9p: Clunk the fid resulting from partial walk of the name
  9p: documentation update
  9p: Fix setting of protocol flags in v9fs_session_info structure.
2010-04-05 13:42:54 -07:00
Aneesh Kumar K.V
6d96d3ab7a 9p: Make sure we are able to clunk the cached fid on umount
dcache prune happen on umount. So we cannot mark the client
satus disconnect. That will prevent a 9p call to the server

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-04-05 10:37:36 -05:00
Jiri Pirko
22bedad3ce net: convert multicast list to list_head
Converts the list and the core manipulating with it to be the same as uc_list.

+uses two functions for adding/removing mc address (normal and "global"
 variant) instead of a function parameter.
+removes dev_mcast.c completely.
+exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for
 manipulation with lists on a sandbox (used in bonding and 80211 drivers)

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03 14:22:15 -07:00
YOSHIFUJI Hideaki / 吉藤英明
bd2c77a0a7 ipv6 fib: Make rt6_info{} more cache-line aware.
The head element of rt6_info{} is dst_entry{}, and
IPv6 specific elements follow.

Because elements at the end of dst_entry{} are frequently
updated, it is not good to put frequently-used static
elements, such as rt6i_idev, rt6i_dst or rt6i_flags in the
same cache line.

On the other hand, fib6_table, rt6i_node or rt6i_gateway are
rarely used, so it is okay to stay in the same cache line.

Let's rearrange rt6_info{}.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-01 18:41:41 -07:00
Eric Dumazet
5d944c640b gen_estimator: deadlock fix
One of my test machine got a deadlock during "tc" sessions,
adding/deleting classes & filters, using traffic estimators.

After some analysis, I believe we have a potential use after free case
in est_timer() :

spin_lock(e->stats_lock); << HERE >>
read_lock(&est_lock);
if (e->bstats == NULL)   << TEST >>
	goto skip;

Test is done a bit late, because after estimator is killed, and before
rcu grace period elapsed, we might already have freed/reuse memory where
e->stats_locks points to (some qdisc->q.lock)

A possible fix is to respect a rcu grace period at Qdisc dismantle time.

On 64bit, sizeof(struct Qdisc) is exactly 192 bytes. Adding 16 bytes to
it (for struct rcu_head) is a problem because it might change
performance, given QDISC_ALIGNTO is 32 bytes.

This is why I also change QDISC_ALIGNTO to 64 bytes, to satisfy most
current alignment requirements.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-01 18:38:48 -07:00
Joe Perches
f9ea3eb442 include/net/iw_handler.h: Use SIOCIWFIRST not SIOCSIWCOMMIT in comment
to match use in IW_IOCTL_IDX macro

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-31 14:49:12 -04:00
Stanislaw Gruszka
e1b3ec1a2a mac80211: explicitly disable/enable QoS
Add interface to disable/enable QoS (aka WMM or WME). Currently drivers
enable it explicitly when ->conf_tx method is called, and newer disable.
Disabling is needed for some APs, which do not support QoS, such
we should send QoS frames to them.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-31 14:43:59 -04:00
Zhu Yi
e3cf8b3f7b mac80211: support paged rx SKBs
Mac80211 drivers can now pass paged SKBs to mac80211 via
ieee80211_rx{_irqsafe}. The implementation currently use
skb_linearize() in a few places i.e. management frame
handling, software decryption, defragmentation and A-MSDU
process. We will optimize them one by one later.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Cc: Kalle Valo <kalle.valo@iki.fi>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-31 14:39:34 -04:00
YOSHIFUJI Hideaki / 吉藤英明
d57b8fb8a8 ipv6: Use __fls() instead of fls() in __ipv6_addr_diff().
Because we have ensured that the argument is non-zero,
it is better to use __fls() and generate better code.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 23:28:46 -07:00
Sjur Braendeland
2721c5b9dd net-caif: add CAIF Link layer device header files
Header files for CAIF Link layer net-device,
and link-layer registration.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 19:08:45 -07:00
Sjur Braendeland
09009f30de net-caif: add CAIF core protocol stack header files
Add include files for the CAIF Core protocol stack.

caif_layer.h - Defines the structure of the CAIF protocol layers
cfcnfg.h     - CAIF Configuration Module for services and link layers
cfctrl.h     - CAIF Control Protocol Layer
cffrml.h     - CAIF Framing Layer
cfmuxl.h     - CAIF Muxing Layer
cfpkt.h	     - CAIF Packet layer (skb helper functions)
cfserl.h     - CAIF Serial Layer
cfsrvl.h     - CAIF Service Layer

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 19:08:45 -07:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
David S. Miller
7905e357eb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-03-29 13:50:10 -07:00
David S. Miller
0c0dbfecbf decnet: Remove unused FIB metric macros.
Unlike the ipv4 side, these are completely unused.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-27 19:23:46 -07:00
Juuso Oikarinen
a97c13c345 mac80211: Add support for connection quality monitoring
Add support for the set_cqm_config op. This op function configures the
requested connection quality monitor rssi threshold and rssi hysteresis
values to the hardware  if the hardware supports
IEEE80211_HW_SUPPORTS_CQM.

For unsupported hardware, currently -EOPNOTSUPP is returned, so the mac80211
is currently not doing connection quality monitoring on the host. This could be
added later, if needed.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-24 16:04:33 -04:00
Juuso Oikarinen
d6dc1a3863 cfg80211: Add connection quality monitoring support to nl80211
Add support for basic configuration of a connection quality monitoring to the
nl80211 interface, and basic support for notifying about triggered monitoring
events.

Via this interface a user-space connection manager may configure and receive
pre-warning events of deteriorating WLAN connection quality, and start
preparing for roaming in advance, before the connection is already lost.

An example usage of such a trigger is starting scanning for nearby AP's in
an attempt to find one with better connection quality, and associate to it
before the connection characteristics of the existing connection become too bad
or the association is even lost, leading in a prolonged delay in connectivity.

The interface currently supports only RSSI, but it could be later extended
to include other parameters, such as signal-to-noise ratio, if need for that
arises.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-24 16:02:37 -04:00
Juuso Oikarinen
1e4dcd0124 mac80211: Add support for connection monitor in hardware
This patch is based on a RFC patch by Kalle Valo.

The wl1271 has a feature which handles the connection monitor logic
in hardware, basically sending periodically nullfunc frames and reporting
to the host if AP is lost, after attempting to recover by sending
probe-requests to the AP.

Add support to mac80211 by adding a new flag IEEE80211_HW_CONNECTION_MONITOR
which prevents conn_mon_timer from triggering during idle periods, and
prevents sending probe-requests to the AP if beacon-loss is indicated by the
hardware.

Cc: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-23 16:51:42 -04:00
David S. Miller
33e2bf6aa1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/wireless/ath/ath5k/phy.c
2010-03-22 18:15:15 -07:00
Eric Dumazet
ec733b15a3 net: snmp mib cleanup
There is no point to align or pad mibs to cache lines, they are per cpu
allocated with a 8 bytes alignment anyway.
This wastes space for no gain. This patch removes __SNMP_MIB_ALIGN__

Since SNMP mibs contain "unsigned long" fields only, we can relax the
allocation alignment from "unsigned long long" to "unsigned long"

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 18:34:16 -07:00
Marcel Holtmann
aef7d97cc6 Bluetooth: Convert debug files to actually use debugfs instead of sysfs
Some of the debug files ended up wrongly in sysfs, because at that point
of time, debugfs didn't exist. Convert these files to use debugfs and
also seq_file. This patch converts all of these files at once and then
removes the exported symbol for the Bluetooth sysfs class.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-03-21 05:49:35 +01:00
stephen hemminger
502a2ffd73 ipv6: convert idev_list to list macros
Convert to list macro's for the list of addresses per interface
in IPv6.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 15:45:09 -07:00
stephen hemminger
5c578aedcb IPv6: convert addrconf hash list to RCU
Convert from reader/writer lock to RCU and spinlock for addrconf
hash list.

Adds an additional helper macro for hlist_for_each_entry_continue_rcu
to handle the continue case.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 15:44:35 -07:00
stephen hemminger
c2e21293c0 ipv6: convert addrconf list to hlist
Using hash list macros, simplifies code and helps later RCU.

This patch includes some initialization that is not strictly necessary,
since an empty hlist node/list is all zero; and list is in BSS
and node is allocated with kzalloc.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 15:44:34 -07:00
stephen hemminger
372e6c8f1f ipv6: convert temporary address list to list macros
Use list macros instead of open coded linked list.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 15:44:34 -07:00
Pablo Neira Ayuso
f5d410f2ea netlink: fix unaligned access in nla_get_be64()
This patch fixes a unaligned access in nla_get_be64() that was
introduced by myself in a17c859849.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-19 22:47:23 -07:00
Linus Torvalds
ceb804cd0f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: Skip check for mandatory locks when unlocking
  9p: Fixes a simple bug enabling writes beyond 2GB.
  9p: Change the name of new protocol from 9p2010.L to 9p2000.L
  fs/9p: re-init the wstat in readdir loop
  net/9p: Add sysfs mount_tag file for virtio 9P device
  net/9p: Use the tag name in the config space for identifying mount point
2010-03-14 11:11:08 -07:00
Linus Torvalds
d89b218b80 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits)
  bridge: ensure to unlock in error path in br_multicast_query().
  drivers/net/tulip/eeprom.c: fix bogus "(null)" in tulip init messages
  sky2: Avoid rtnl_unlock without rtnl_lock
  ipv6: Send netlink notification when DAD fails
  drivers/net/tg3.c: change the field used with the TG3_FLAG_10_100_ONLY constant
  ipconfig: Handle devices which take some time to come up.
  mac80211: Fix memory leak in ieee80211_if_write()
  mac80211: Fix (dynamic) power save entry
  ipw2200: use kmalloc for large local variables
  ath5k: read eeprom IQ calibration values correctly for G mode
  ath5k: fix I/Q calibration (for real)
  ath5k: fix TSF reset
  ath5k: use fixed antenna for tx descriptors
  libipw: split ieee->networks into small pieces
  mac80211: Fix sta_mtx unlocking on insert STA failure path
  rt2x00: remove KSEG1ADDR define from rt2x00soc.h
  net: add ColdFire support to the smc91x driver
  asix: fix setting mac address for AX88772
  ipv6 ip6_tunnel: eliminate unused recursion field from ip6_tnl{}.
  net: Fix dev_mc_add()
  ...
2010-03-13 14:50:18 -08:00
Sripathi Kodi
45bc21edb5 9p: Change the name of new protocol from 9p2010.L to 9p2000.L
This patch changes the name of the new 9P protocol from 9p2010.L to
9p2000.u. This is because we learnt that the name 9p2010 is already
being used by others.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-13 08:57:29 -06:00
Linus Torvalds
c32da02342 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits)
  doc: fix typo in comment explaining rb_tree usage
  Remove fs/ntfs/ChangeLog
  doc: fix console doc typo
  doc: cpuset: Update the cpuset flag file
  Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed
  Remove drivers/parport/ChangeLog
  Remove drivers/char/ChangeLog
  doc: typo - Table 1-2 should refer to "status", not "statm"
  tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments
  No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h
  devres/irq: Fix devm_irq_match comment
  Remove reference to kthread_create_on_cpu
  tree-wide: Assorted spelling fixes
  tree-wide: fix 'lenght' typo in comments and code
  drm/kms: fix spelling in error message
  doc: capitalization and other minor fixes in pnp doc
  devres: typo fix s/dev/devm/
  Remove redundant trailing semicolons from macros
  fix typo "definetly" -> "definitely" in comment
  tree-wide: s/widht/width/g typo in comments
  ...

Fix trivial conflict in Documentation/laptops/00-INDEX
2010-03-12 16:04:50 -08:00
Alexey Dobriyan
8467005da3 nsproxy: remove INIT_NSPROXY()
Remove INIT_NSPROXY(), use C99 initializer.
Remove INIT_IPC_NS(), INIT_NET_NS() while I'm at it.

Note: headers trim will be done later, now it's quite pointless because
results will be invalidated by merge window.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-12 15:52:40 -08:00
YOSHIFUJI Hideaki / 吉藤英明
2b4c32972b ipv6 ip6_tunnel: eliminate unused recursion field from ip6_tnl{}.
Commit a43912ab19... ("tunnel: eliminate recursion field") eliminated
use of recursion field from tunnel structures, but its definition
still exists in ip6_tnl{}.

Let's remove that unused field.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-10 07:32:29 -08:00
Johannes Berg
62bb2ac5cb mac80211: deprecate RX status noise
The noise value as is won't be used, isn't
filled by most drivers and doesn't really
make a whole lot of sense on a per packet
basis -- proper cfg80211 survey support in
mac80211 will need to be different.

Mark the struct member as deprecated so it
will be removed from drivers.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-09 15:02:53 -05:00
Zhu Yi
4045635318 net: add __must_check to sk_add_backlog
Add the "__must_check" tag to sk_add_backlog() so that any failure to
check and drop packets will be warned about.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 10:45:26 -08:00
Jiri Kosina
318ae2edc3 Merge branch 'for-next' into for-linus
Conflicts:
	Documentation/filesystems/proc.txt
	arch/arm/mach-u300/include/mach/debug-macro.S
	drivers/net/qlge/qlge_ethtool.c
	drivers/net/qlge/qlge_main.c
	drivers/net/typhoon.c
2010-03-08 16:55:37 +01:00
YOSHIFUJI Hideaki / 吉藤英明
0c9a2ac1f8 ipv6: Optmize translation between IPV6_PREFER_SRC_xxx and RT6_LOOKUP_F_xxx.
IPV6_PREFER_SRC_xxx definitions:
| #define IPV6_PREFER_SRC_TMP             0x0001
| #define IPV6_PREFER_SRC_PUBLIC          0x0002
| #define IPV6_PREFER_SRC_COA             0x0004

RT6_LOOKUP_F_xxx definitions:
| #define RT6_LOOKUP_F_SRCPREF_TMP        0x00000008
| #define RT6_LOOKUP_F_SRCPREF_PUBLIC     0x00000010
| #define RT6_LOOKUP_F_SRCPREF_COA        0x00000020

So, we can translate between these two groups by shift operation
instead of multiple 'if's.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-07 15:25:53 -08:00
Zhu Yi
a3a858ff18 net: backlog functions rename
sk_add_backlog -> __sk_add_backlog
sk_add_backlog_limited -> sk_add_backlog

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:03 -08:00
Zhu Yi
8eae939f14 net: add limit for socket backlog
We got system OOM while running some UDP netperf testing on the loopback
device. The case is multiple senders sent stream UDP packets to a single
receiver via loopback on local host. Of course, the receiver is not able
to handle all the packets in time. But we surprisingly found that these
packets were not discarded due to the receiver's sk->sk_rcvbuf limit.
Instead, they are kept queuing to sk->sk_backlog and finally ate up all
the memory. We believe this is a secure hole that a none privileged user
can crash the system.

The root cause for this problem is, when the receiver is doing
__release_sock() (i.e. after userspace recv, kernel udp_recvmsg ->
skb_free_datagram_locked -> release_sock), it moves skbs from backlog to
sk_receive_queue with the softirq enabled. In the above case, multiple
busy senders will almost make it an endless loop. The skbs in the
backlog end up eat all the system memory.

The issue is not only for UDP. Any protocols using socket backlog is
potentially affected. The patch adds limit for socket backlog so that
the backlog size cannot be expanded endlessly.

Reported-by: Alex Shi <alex.shi@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:33:59 -08:00
Sripathi Kodi
342fee1d5c 9P2010.L handshake: Remove "dotu" variable
Removes 'dotu' variable and make everything dependent
on 'proto_version' field.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-05 15:04:42 -06:00
Sripathi Kodi
0fb80abd91 9P2010.L handshake: Add mount option
Add new mount V9FS mount option to specify protocol version

This patch adds a new mount option to specify protocol version.
With this option it is possible to use "-o version=" switch to
specify 9P protocol version to use. Valid options for version
are:
9p2000
9p2000.u
9p2010.L

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-05 15:04:42 -06:00
Mike Galbraith
c839d30a41 net: add scheduler sync hint to tcp_prequeue().
Decreases the odds wakee will suffer from frequent cache misses.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-04 00:53:51 -08:00
David S. Miller
e5c1a0aa00 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-03-03 22:42:54 -08:00
Sujith
4fa0043731 mac80211: Fix HT rate control configuration
Handling HT configuration changes involved setting the channel
with the new HT parameters and then issuing a rate_update()
notification to the driver.

This behavior changed after the off-channel changes. Now, the channel
is not updated with the new HT params in enable_ht() - instead, it
is now done when the scan work terminates. This results in the driver
depending on stale information, defaulting to non-HT mode always.

Fix this by passing the new channel type to the driver.

Cc: stable@kernel.org
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-03 15:39:21 -05:00
Herbert Xu
87c1e12b5e ipsec: Fix bogus bundle flowi
When I merged the bundle creation code, I introduced a bogus
flowi value in the bundle.  Instead of getting from the caller,
it was instead set to the flow in the route object, which is
totally different.

The end result is that the bundles we created never match, and
we instead end up with an ever growing bundle list.

Thanks to Jamal for find this problem.

Reported-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-03 01:04:37 -08:00
David S. Miller
47871889c6 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts:
	drivers/firmware/iscsi_ibft.c
2010-02-28 19:23:06 -08:00
David S. Miller
46976c042b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6 2010-02-28 00:57:28 -08:00
Marcel Holtmann
943da25d95 Bluetooth: Add controller types for BR/EDR and 802.11 AMP
With the Bluetooth 3.0 specification and the introduction of alternate
MAC/PHY (AMP) support, it is required to differentiate between primary
BR/EDR controllers and 802.11 AMP controllers. So introduce a special
type inside HCI device for differentiation.

For now all AMP controllers will be treated as raw devices until an
AMP manager has been implemented.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-02-27 14:05:38 +01:00
Marcel Holtmann
ca325f6989 Bluetooth: Convert inquiry cache to use debugfs instead of sysfs
The output of the inquiry cache is only useful for debugging purposes
and so move it into debugfs.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-02-27 14:05:38 +01:00
Marcel Holtmann
c13854cef4 Bluetooth: Convert controller hdev->type to hdev->bus
The hdev->type is misnamed and should be actually hdev->bus instead. So
convert it now.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-02-27 14:05:38 +01:00
Patrick McHardy
3729d50212 rtnetlink: support specifying device flags on device creation
commit e8469ed959c373c2ff9e6f488aa5a14971aebe1f
Author: Patrick McHardy <kaber@trash.net>
Date:   Tue Feb 23 20:41:30 2010 +0100

Support specifying the initial device flags when creating a device though
rtnl_link. Devices allocated by rtnl_create_link() are marked as INITIALIZING
in order to surpress netlink registration notifications. To complete setup,
rtnl_configure_link() must be called, which performs the device flag changes
and invokes the deferred notifiers if everything went well.

Two examples:

# add macvlan to eth0
#
$ ip link add link eth0 up allmulticast on type macvlan

[LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 26:f8:84:02:f9:2a brd ff:ff:ff:ff:ff:ff
[ROUTE]ff00::/8 dev macvlan0  table local  metric 256  mtu 1500 advmss 1440 hoplimit 0
[ROUTE]fe80::/64 dev macvlan0  proto kernel  metric 256  mtu 1500 advmss 1440 hoplimit 0
[LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500
    link/ether 26:f8:84:02:f9:2a
[ADDR]11: macvlan0    inet6 fe80::24f8:84ff:fe02:f92a/64 scope link
       valid_lft forever preferred_lft forever
[ROUTE]local fe80::24f8:84ff:fe02:f92a via :: dev lo  table local  proto none  metric 0  mtu 16436 advmss 16376 hoplimit 0
[ROUTE]default via fe80::215:e9ff:fef0:10f8 dev macvlan0  proto kernel  metric 1024  mtu 1500 advmss 1440 hoplimit 0
[NEIGH]fe80::215:e9ff:fef0:10f8 dev macvlan0 lladdr 00:15:e9:f0:10:f8 router STALE
[ROUTE]2001:6f8:974::/64 dev macvlan0  proto kernel  metric 256  expires 0sec mtu 1500 advmss 1440 hoplimit 0
[PREFIX]prefix 2001:6f8:974::/64 dev macvlan0 onlink autoconf valid 14400 preferred 131084
[ADDR]11: macvlan0    inet6 2001:6f8:974:0:24f8:84ff:fe02:f92a/64 scope global dynamic
       valid_lft 86399sec preferred_lft 14399sec

# add VLAN to eth1, eth1 is down
#
$ ip link add link eth1 up type vlan id 1000
RTNETLINK answers: Network is down

<no events>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-27 02:43:40 -08:00
Ulrich Weber
45bb006090 ipv6: Remove IPV6_ADDR_RESERVED
RFC 4291 section 2.4 states that all uncategorized addresses
should be considered as Global Unicast.

This will remove IPV6_ADDR_RESERVED completely
and return IPV6_ADDR_UNICAST in ipv6_addr_type() instead.

Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-26 03:59:07 -08:00
David S. Miller
19bc291c99 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-core.h
	drivers/net/wireless/rt2x00/rt2800pci.c
2010-02-25 23:26:21 -08:00
Paul E. McKenney
a898def29e net: Add checking to rcu_dereference() primitives
Update rcu_dereference() primitives to use new lockdep-based
checking. The rcu_dereference() in __in6_dev_get() may be
protected either by rcu_read_lock() or RTNL, per Eric Dumazet.
The rcu_dereference() in __sk_free() is protected by the fact
that it is never reached if an update could change it.  Check
for this by using rcu_dereference_check() to verify that the
struct sock's ->sk_wmem_alloc counter is zero.

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1266887105-1528-5-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-25 09:41:03 +01:00
Jamal Hadi Salim
8ca2e93b55 xfrm: SP lookups signature with mark
pass mark to all SP lookups to prepare them for when we add code
to have them search.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-22 16:21:12 -08:00
Jamal Hadi Salim
bd55775c8d xfrm: SA lookups signature with mark
pass mark to all SA lookups to prepare them for when we add code
to have them search.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-22 16:20:22 -08:00
Jamal Hadi Salim
bf825f81b4 xfrm: introduce basic mark infrastructure
Add basic structuring and accessors for xfrm mark

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-22 16:19:45 -08:00
stephen hemminger
808f5114a9 packet: convert socket list to RCU (v3)
Convert AF_PACKET to use RCU, eliminating one more reader/writer lock.

There is no need for a real sk_del_node_init_rcu(), because sk_del_node_init
is doing the equivalent thing to hlst_del_init_rcu already; but added
some comments to try and make that obvious.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-22 15:45:56 -08:00
Kalle Valo
ffb9eb3d8b nl80211: add power save commands
The most needed command from nl80211, which Wireless Extensions had,
is support for power save mode. Add a simple command to make it possible
to enable and disable power save via nl80211.

I was also planning about extending the interface, for example adding the
timeout value, but after thinking more about this I decided not to do it.
Basically there were three reasons:

Firstly, the parameters for power save are very much hardware dependent.
Trying to find a unified interface which would work with all hardware, and
still make sense to users, will be very difficult.

Secondly, IEEE 802.11 power save implementation in Linux is still in state
of flux. We have a long way to still to go and there is no way to predict
what kind of implementation we will have after few years. And because we
need to support nl80211 interface a long time, practically forever, adding
now parameters to nl80211 might create maintenance problems later on.

Third issue are the users. Power save parameters are mostly used for
debugging, so debugfs is better, more flexible, interface for this.
For example, wpa_supplicant currently doesn't configure anything related
to power save mode. It's better to strive that kernel can automatically
optimise the power save parameters, like with help of pm qos network
and other traffic parameters.

Later on, when we have better understanding of power save, we can extend
this command with more features, if there's a need for that.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-19 15:52:40 -05:00
Andreas Petlund
7e38017557 net: TCP thin dupack
This patch enables fast retransmissions after one dupACK for
TCP if the stream is identified as thin. This will reduce
latencies for thin streams that are not able to trigger fast
retransmissions due to high packet interarrival time. This
mechanism is only active if enabled by iocontrol or syscontrol
and the stream is identified as thin.

Signed-off-by: Andreas Petlund <apetlund@simula.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18 15:43:09 -08:00
Andreas Petlund
36e31b0af5 net: TCP thin linear timeouts
This patch will make TCP use only linear timeouts if the
stream is thin. This will help to avoid the very high latencies
that thin stream suffer because of exponential backoff. This
mechanism is only active if enabled by iocontrol or syscontrol
and the stream is identified as thin. A maximum of 6 linear
timeouts is tried before exponential backoff is resumed.

Signed-off-by: Andreas Petlund <apetlund@simula.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18 15:43:08 -08:00
Andreas Petlund
5aa4b32fc8 net: TCP thin-stream detection
Inline function to dynamically detect thin streams based on
the number of packets in flight. Used to dynamically trigger
thin-stream mechanisms if enabled by ioctl or sysctl.

Signed-off-by: Andreas Petlund <apetlund@simula.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18 15:43:07 -08:00
Alexey Dobriyan
b54452b07a const: struct nla_policy
Make remaining netlink policies as const.
Fixup coding style where needed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18 14:30:18 -08:00
Alexey Dobriyan
bbef49daca ipv6: use standard lists for FIB walks
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18 14:30:17 -08:00
Patrick McHardy
37ee3d5b3e netfilter: nf_defrag_ipv4: fix compilation error with NF_CONNTRACK=n
As reported by Randy Dunlap <randy.dunlap@oracle.com>, compilation
of nf_defrag_ipv4 fails with:

include/net/netfilter/nf_conntrack.h:94: error: field 'ct_general' has incomplete type
include/net/netfilter/nf_conntrack.h:178: error: 'const struct sk_buff' has no member named 'nfct'
include/net/netfilter/nf_conntrack.h:185: error: implicit declaration of function 'nf_conntrack_put'
include/net/netfilter/nf_conntrack.h:294: error: 'const struct sk_buff' has no member named 'nfct'
net/ipv4/netfilter/nf_defrag_ipv4.c:45: error: 'struct sk_buff' has no member named 'nfct'
net/ipv4/netfilter/nf_defrag_ipv4.c:46: error: 'struct sk_buff' has no member named 'nfct'

net/nf_conntrack.h must not be included with NF_CONNTRACK=n, add a
few #ifdefs. Long term the header file should be fixed to be usable
even with NF_CONNTRACK=n.

Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-18 19:04:44 +01:00
Venkata Mohan Reddy
2906f66a56 ipvs: SCTP Trasport Loadbalancing Support
Enhance IPVS to load balance SCTP transport protocol packets. This is done
based on the SCTP rfc 4960. All possible control chunks have been taken
care. The state machine used in this code looks some what lengthy. I tried
to make the state machine easy to understand.

Signed-off-by: Venkata Mohan Reddy Koppula <mohanreddykv@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-18 12:31:05 +01:00
Stephen Hemminger
6457d26bd4 IPv6: convert mc_lock to spinlock
Only used for writing, so convert to spinlock

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-17 18:48:44 -08:00
Joe Perches
9874c41cd5 ipv6.h: reassembly: replace calculated magic number with multiplication
On Tue, 2010-02-16 at 16:47 +0100, Patrick McHardy wrote:
> Joe Perches wrote:
> >> @@ -246,6 +246,8 @@ extern int ipv6_opt_accepted(struct sock *sk, struct sk_buff *skb);
> >>  int ip6_frag_nqueues(struct net *net);
> >>  int ip6_frag_mem(struct net *net);
> >>
> >> +#define IPV6_FRAG_HIGH_THRESH	262144		/* == 256*1024 */
> >> +#define IPV6_FRAG_LOW_THRESH	196608		/* == 192*1024 */
> >>  #define IPV6_FRAG_TIMEOUT	(60*HZ)		/* 60 seconds */
> >
> > 196608 isn't a number I want to remember.
> > Is this better as:
> >
> > #define IPV6_FRAG_HIGH_THRESH	(256 * 1024)	/* 262144 */
> > #define IPV6_FRAG_LOW_THRESH	(192 * 1024)	/* 196608 */
>
> Please send a patch, I'll apply it once these patches are in Dave's
> tree.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-17 00:03:28 -08:00
Tejun Heo
7d720c3e4f percpu: add __percpu sparse annotations to net
Add __percpu sparse annotations to net.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors.  This patch doesn't affect normal builds.

The macro and type tricks around snmp stats make things a bit
interesting.  DEFINE/DECLARE_SNMP_STAT() macros mark the target field
as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly.  All
snmp_mib_*() users which used to cast the argument to (void **) are
updated to cast it to (void __percpu **).

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16 23:05:38 -08:00
Eric W. Biederman
54716e3beb net neigh: Decouple per interface neighbour table controls from binary sysctls
Stop computing the number of neighbour table settings we have by
counting the number of binary sysctls.  This behaviour was silly
and meant that we could not add another neighbour table setting
without also adding another binary sysctl.

Don't pass the binary sysctl path for neighour table entries
into neigh_sysctl_register.  These parameters are no longer
used and so are just dead code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16 15:55:18 -08:00
David S. Miller
749f621e20 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-02-16 11:15:13 -08:00
Jouni Malinen
026331c4d9 cfg80211/mac80211: allow registering for and sending action frames
This implements a new command to register for action frames
that userspace wants to handle instead of the in-kernel
rejection. It is then responsible for rejecting ones that
it decided not to handle. There is no unregistration, but
the socket can be closed for that.

Frames that are not registered for will not be forwarded
to userspace and will be rejected by the kernel, the
cfg80211 API helps implementing that.

Additionally, this patch adds a new command that allows
doing action frame transmission from userspace. It can be
used either to exchange action frames on the current
operational channel (e.g., with the AP with which we are
currently associated) or to exchange off-channel Public
Action frames with the remain-on-channel command.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-15 16:14:15 -05:00
Patrick McHardy
5d0aa2ccd4 netfilter: nf_conntrack: add support for "conntrack zones"
Normally, each connection needs a unique identity. Conntrack zones allow
to specify a numerical zone using the CT target, connections in different
zones can use the same identity.

Example:

iptables -t raw -A PREROUTING -i veth0 -j CT --zone 1
iptables -t raw -A OUTPUT -o veth1 -j CT --zone 1

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-15 18:13:33 +01:00
Patrick McHardy
8fea97ec17 netfilter: nf_conntrack: pass template to l4proto ->error() handler
The error handlers might need the template to get the conntrack zone
introduced in the next patches to perform a conntrack lookup.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-15 17:45:08 +01:00
Uwe Kleine-König
dfff0615d2 tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-15 15:38:10 +01:00
Ben Hutchings
1a5778aa00 net: Fix first line of kernel-doc for a few functions
The function name must be followed by a space, hypen, space, and a
short description.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-14 22:35:47 -08:00
David S. Miller
f6f223039c Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-02-14 17:45:59 -08:00
jamal
a63374631e xfrm: use proper kernel types
kernel side should use uxx instead of __uxx types

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-12 13:27:48 -08:00
Patrick McHardy
2bec5a369e ipv6: fib: fix crash when changing large fib while dumping it
When the fib size exceeds what can be dumped in a single skb, the
dump is suspended and resumed once the last skb has been received
by userspace. When the fib is changed while the dump is suspended,
the walker might contain stale pointers, causing a crash when the
dump is resumed.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
PGD 5347a067 PUD 65c7067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
...
RIP: 0010:[<ffffffffa01bce04>]
[<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
...
Call Trace:
 [<ffffffff8104aca3>] ? mutex_spin_on_owner+0x59/0x71
 [<ffffffffa01bd105>] inet6_dump_fib+0x11b/0x1b9 [ipv6]
 [<ffffffff81371af4>] netlink_dump+0x5b/0x19e
 [<ffffffff8134f288>] ? consume_skb+0x28/0x2a
 [<ffffffff81373b69>] netlink_recvmsg+0x1ab/0x2c6
 [<ffffffff81372781>] ? netlink_unicast+0xfa/0x151
 [<ffffffff813483e0>] __sock_recvmsg+0x6d/0x79
 [<ffffffff81348a53>] sock_recvmsg+0xca/0xe3
 [<ffffffff81066d4b>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff811ed1f8>] ? radix_tree_lookup_slot+0xe/0x10
 [<ffffffff810b3ed7>] ? find_get_page+0x90/0xa5
 [<ffffffff810b5dc5>] ? filemap_fault+0x201/0x34f
 [<ffffffff810ef152>] ? fget_light+0x2f/0xac
 [<ffffffff813519e7>] ? verify_iovec+0x4f/0x94
 [<ffffffff81349a65>] sys_recvmsg+0x14d/0x223

Store the serial number when beginning to walk the fib and reload
pointers when continuing to walk after a change occured. Similar
to other dumping functions, this might cause unrelated entries to
be missed when entries are deleted.

Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-12 12:06:35 -08:00
Alexey Dobriyan
857b409a48 netfilter: nf_conntrack: elegantly simplify nf_ct_exp_net()
Remove #ifdef at nf_ct_exp_net() by using nf_ct_net().

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-12 06:24:46 +01:00
Patrick McHardy
9d288dffe3 netfilter: nf_conntrack_sip: add T.38 FAX support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-11 12:30:21 +01:00
Patrick McHardy
010c0b9f34 netfilter: nf_nat: support mangling a single TCP packet multiple times
nf_nat_mangle_tcp_packet() can currently only handle a single mangling
per window because it only maintains two sequence adjustment positions:
the one before the last adjustment and the one after.

This patch makes sequence number adjustment tracking in
nf_nat_mangle_tcp_packet() optional and allows a helper to manually
update the offsets after the packet has been fully handled.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-11 12:27:09 +01:00
Patrick McHardy
b87921bdf2 netfilter: nf_conntrack: show helper and class in /proc/net/nf_conntrack_expect
Make the output a bit more informative by showing the helper an expectation
belongs to and the expectation class.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-11 12:22:48 +01:00
Li Zefan
c4146644a5 net: add a wrapper sk_entry()
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-10 11:12:07 -08:00
Patrick McHardy
9ab99d5a43 Merge branch 'master' of /repos/git/net-next-2.6
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-10 14:17:10 +01:00
David S. Miller
b1109bf085 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-02-09 11:44:44 -08:00
Vivek Natarajan
375177bf35 mac80211: Retry null data frame for power save.
Even if the null data frame is not acked by the AP, mac80211
goes into power save. This might lead to loss of frames
from the AP.
Prevent this by restarting dynamic_ps_timer when ack is not
received for null data frames.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-09 14:10:05 -05:00
Kalle Valo
349e6b7289 mac80211: remove get_tx_stats() driver op
get_tx_stats() driver operation is not currently used anywhere in mac80211
and there are no plans to use it in the not-so-near future. So it can go
without anyone missing it.

Signed-off-by: Kalle Valo <kalle.valo@iki.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08 16:51:01 -05:00
Johannes Berg
34e895075e mac80211: allow station add/remove to sleep
Many drivers would like to sleep during station
addition and removal, and currently have a high
complexity there from not being able to.

This introduces two new callbacks sta_add() and
sta_remove() that drivers can implement instead
of using sta_notify() and that can sleep, and
the new sta_add() callback is also allowed to
fail.

The reason we didn't do this previously is that
the IBSS code wants to insert stations from the
RX path, which is a tasklet, so cannot sleep.
This patch will keep the station allocation in
that path, but moves adding the station to the
driver out of line. Since the addition can now
fail, we can have IBSS peer structs the driver
rejected -- in that case we still talk to the
station but never tell the driver about it in
the control.sta pointer. If there will ever be
a driver that has a low limit on the number of
stations and that cannot talk to any stations
that are not known to it, we need to do come up
with a new strategy of handling larger IBSSs,
maybe quicker expiry or rejecting peers.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08 16:50:53 -05:00
Johannes Berg
33e5a2f776 wireless: update radiotap parser
Upstream radiotap has adopted the namespace
proposal David Young made and I then took care
of, for which I had adapted the radiotap parser
as a library outside the kernel. This brings
the in-kernel parser up to speed.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08 16:50:53 -05:00
Patrick McHardy
d696c7bdaa netfilter: nf_conntrack: fix hash resizing with namespaces
As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash
size is global and not per namespace, but modifiable at runtime through
/sys/module/nf_conntrack/hashsize. Changing the hash size will only
resize the hash in the current namespace however, so other namespaces
will use an invalid hash size. This can cause crashes when enlarging
the hashsize, or false negative lookups when shrinking it.

Move the hash size into the per-namespace data and only use the global
hash size to initialize the per-namespace value when instanciating a
new namespace. Additionally restrict hash resizing to init_net for
now as other namespaces are not handled currently.

Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08 11:18:07 -08:00
Eric Dumazet
5b3501faa8 netfilter: nf_conntrack: per netns nf_conntrack_cachep
nf_conntrack_cachep is currently shared by all netns instances, but
because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.

If we use a shared slab cache, one object can instantly flight between
one hash table (netns ONE) to another one (netns TWO), and concurrent
reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
can be fooled without notice, because no RCU grace period has to be
observed between object freeing and its reuse.

We dont have this problem with UDP/TCP slab caches because TCP/UDP
hashtables are global to the machine (and each object has a pointer to
its netns).

If we use per netns conntrack hash tables, we also *must* use per netns
conntrack slab caches, to guarantee an object can not escape from one
namespace to another one.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
[Patrick: added unique slab name allocation]
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-08 11:16:56 -08:00
David S. Miller
10be7eb36b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-02-04 08:58:14 -08:00
Patrick McHardy
84f3bb9ae9 netfilter: xtables: add CT target
Add a new target for the raw table, which can be used to specify conntrack
parameters for specific connections, f.i. the conntrack helper.

The target attaches a "template" connection tracking entry to the skb, which
is used by the conntrack core when initializing a new conntrack.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-03 17:17:06 +01:00
Patrick McHardy
b2a15a604d netfilter: nf_conntrack: support conntrack templates
Support initializing selected parameters of new conntrack entries from a
"conntrack template", which is a specially marked conntrack entry attached
to the skb.

Currently the helper and the event delivery masks can be initialized this
way.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-03 14:40:17 +01:00
Patrick McHardy
0cebe4b416 netfilter: ctnetlink: support selective event delivery
Add two masks for conntrack end expectation events to struct nf_conntrack_ecache
and use them to filter events. Their default value is "all events" when the
event sysctl is on and "no events" when it is off. A following patch will add
specific initializations. Expectation events depend on the ecache struct of
their master conntrack.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-03 13:51:51 +01:00
Patrick McHardy
858b313300 netfilter: nf_conntrack: split up IPCT_STATUS event
Split up the IPCT_STATUS event into an IPCT_REPLY event, which is generated
when the IPS_SEEN_REPLY bit is set, and an IPCT_ASSURED event, which is
generated when the IPS_ASSURED bit is set.

In combination with a following patch to support selective event delivery,
this can be used for "sparse" conntrack replication: start replicating the
conntrack entry after it reached the ASSURED state and that way it's SYN-flood
resistant.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-03 13:48:53 +01:00
Patrick McHardy
794e68716b netfilter: ctnetlink: only assign helpers for matching protocols
Make sure not to assign a helper for a different network or transport
layer protocol to a connection.

Additionally change expectation deletion by helper to compare the name
directly - there might be multiple helper registrations using the same
name, currently one of them is chosen in an unpredictable manner and
only those expectations are removed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-03 13:41:29 +01:00
Felix Fietkau
17ad353b8d mac80211: fix monitor mode tx radiotap header handling
When an injected frame gets buffered for a powersave STA or filtered
and retransmitted, mac80211 attempts to parse the radiotap header
again, which doesn't work because it's gone at that point.
This patch adds a new flag for checking the availability of a radiotap
header, so that it only attempts to parse it once, reusing the tx info
on the next call to ieee80211_tx().
This fixes severe issues with rekeying in AP mode.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-01 15:40:08 -05:00
Luis R. Rodriguez
09d989d179 cfg80211: add regulatory hint disconnect support
This adds a new regulatory hint to be used when we know all
devices have been disconnected and idle. This can happen
when we suspend, for instance. When we disconnect we can
no longer assume the same regulatory rules learned from
a country IE or beacon hints are applicable so restore
regulatory settings to an initial state.

Since driver hints are cached on the wiphy that called
the hint, those hints are not reproduced onto cfg80211
as the wiphy will respect its own wiphy->regd regardless.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-01 15:40:06 -05:00
Hagen Paul Pfeifer
57dbb2d83d sched: add head drop fifo queue
This adds an additional queuing strategy, called pfifo_head_drop,
to remove the oldest skb in the case of an overflow within the queue -
the head element - instead of the last skb (tail). To remove the oldest
skb in congested situations is useful for sensor network environments
where newer packets reflect the superior information.

Reviewed-by: Florian Westphal <fw@strlen.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-28 21:27:00 -08:00
Alexey Dobriyan
a166477390 netns xfrm: xfrm6_tunnel in netns
I'm not sure about rcu stuff near kmem cache destruction:
* checks for non-empty hashes look bogus, they're done _before_
  rcu_berrier()
* unregistering netns ops is done before kmem_cache destoy
  (as it should), and unregistering involves rcu barriers by itself

So it looks nothing should be done.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-28 06:31:05 -08:00
David S. Miller
05ba712d7e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-01-28 06:12:38 -08:00
Johannes Berg
56007a028c mac80211: wait for beacon before enabling powersave
Because DTIM information is required for powersave
but is only conveyed in beacons, wait for a beacon
before enabling powersave, and change the way the
information is conveyed to the driver accordingly.

mwl8k doesn't currently seem to implement PS but
requires the DTIM period in a different way; after
talking to Lennert we agreed to just have mwl8k do
the parsing itself in the finalize_join work.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-26 11:53:21 -05:00
Johannes Berg
c21dbf9214 cfg80211: export cfg80211_find_ie
This new function (previously a static function
called just "find_ie" can be used to find a
specific IE in a buffer of IEs.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-26 11:53:20 -05:00
Kalle Valo
eb807fb238 mac80211: fix update_tkip_key() documentation about the context
Johannes noticed that I had incorrectly documented the context of
update_tkip_key() driver operation. It must be atomic because all
RX code is run inside rcu critical section.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-25 16:36:28 -05:00
Alexey Dobriyan
d7c7544c3d netns xfrm: deal with dst entries in netns
GC is non-existent in netns, so after you hit GC threshold, no new
dst entries will be created until someone triggers cleanup in init_net.

Make xfrm4_dst_ops and xfrm6_dst_ops per-netns.
This is not done in a generic way, because it woule waste
(AF_MAX - 2) * sizeof(struct dst_ops) bytes per-netns.

Reorder GC threshold initialization so it'd be done before registering
XFRM policies.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-24 22:47:53 -08:00
Alexey Dobriyan
e071041be0 netns xfrm: fix "ip xfrm state|policy count" misreport
"ip xfrm state|policy count" report SA/SP count from init_net,
not from netns of caller process.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-23 23:10:42 -08:00
Alexey Dobriyan
e754834e65 icmp: move icmp_err_convert[] to .rodata
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-23 01:21:28 -08:00
Alexey Dobriyan
5833929cc2 net: constify MIB name tables
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-23 01:21:27 -08:00
David S. Miller
51c24aaaca Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-01-23 00:31:06 -08:00
David S. Miller
6be325719b Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2010-01-22 22:45:46 -08:00
Johannes Berg
ef15aac607 cfg80211: export multiple MAC addresses in sysfs
If a device has multiple MAC addresses, userspace will
need to know about that. Similarly, if it allows the
MAC addresses to vary by a bitmask.

If a driver exports multiple addresses, it is assumed
that it will be able to deal with that many different
addresses, which need not necessarily match the ones
programmed into the device; if a mask is set then the
device should deal addresses within that mask based
on an arbitrary "base address".

To test it all and show how it is used, add support
to hwsim even though it can't actually deal with
addresses different from the default.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-22 16:11:16 -05:00
Johannes Berg
b3fbdcf49f mac80211: pass vif and station to update_tkip_key
When a TKIP key is updated, we should pass the station
pointer instead of just the address, since drivers can
use that to store their own data. We also need to pass
the virtual interface pointer.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-22 16:08:55 -05:00
Shan Wei
7c070aa947 IPv6: reassembly: replace magic number with macro definitions
Use macro to define high/low thresh value, refer to IPV6_FRAG_TIMEOUT.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-20 10:42:41 +01:00
Johannes Berg
c6fcf6bcfc mac80211: re-enable re-transmission of filtered frames
In an earlier commit,

    mac80211: disable software retry for now

    Pavel Roskin reported a problem that seems to be due to
    software retry of already transmitted frames. It turns
    out that we've never done that correctly, but due to
    some recent changes it now crashes in the TX code. I've
    added a comment in the patch that explains the problem
    better and also points to possible solutions -- which
    I can't implement right now.

I disabled software retry of failed/filtered frames
because it was broken. With the work of the previous
patches, it now becomes fairly easy to re-enable it
by adding a flag indicating that the frame shouldn't
be modified, but still running it through the transmit
handlers to populate the control information.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-19 16:25:21 -05:00
David S. Miller
6373464288 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-core.h
2010-01-19 11:43:42 -08:00
Alexey Dobriyan
e9d3897cc2 netfilter: netns: #ifdef ->iptable_security, ->ip6table_security
'security' tables depend on SECURITY, so ifdef them.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-18 08:08:37 +01:00
Octavian Purdila
72659ecce6 tcp: account SYN-ACK timeouts & retransmissions
Currently we don't increment SYN-ACK timeouts & retransmissions
although we do increment the same stats for SYN. We seem to have lost
the SYN-ACK accounting with the introduction of tcp_syn_recv_timer
(commit 2248761e in the netdev-vger-cvs tree).

This patch fixes this issue. In the process we also rename the v4/v6
syn/ack retransmit functions for clarity. We also add a new
request_socket operations (syn_ack_timeout) so we can keep code in
inet_connection_sock.c protocol agnostic.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-17 19:09:39 -08:00
Jarek Poplawski
d00c362f1b ax25: netrom: rose: Fix timer oopses
Wrong ax25_cb refcounting in ax25_send_frame() and by its callers can
cause timer oopses (first reported with 2.6.29.6 kernel).

Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14905

Reported-by: Bernard Pidoux <bpidoux@free.fr>
Tested-by: Bernard Pidoux <bpidoux@free.fr>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-16 01:04:04 -08:00
Kalle Valo
c99445b140 mac80211: improve powersave documentation
There has been some confusion how drivers should implement powersave
support. Improve the documentation a bit to make it more clear what
drivers need to do. Also mention about U-APSD.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-14 18:16:56 -05:00
Kalle Valo
9d173fc5df mac80211: fix mac80211.h documentation warnings
There were some warnings about missing documentation and a missing reference.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-14 18:16:55 -05:00
Linus Torvalds
4a24eef671 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
  net: fix build erros with CONFIG_BUG=n, CONFIG_GENERIC_BUG=n
  ipv6: skb_dst() can be NULL in ipv6_hop_jumbo().
  tg3: Update copyright and driver version
  tg3: Disable 5717 serdes and B0 support
  tg3: Add reliable serdes detection for 5717 A0
  tg3: Fix std rx prod ring handling
  tg3: Fix std prod ring nicaddr for 5787 and 57765
  sfc: Fix conditions for MDIO self-test
  sfc: Fix polling for slow MCDI operations
  e1000e: workaround link issues on busy hub in half duplex on 82577/82578
  e1000e: MDIO slow mode should always be done for 82577
  ixgbe: update copyright dates
  ixgbe: Do not attempt to perform interrupts in netpoll when down
  cfg80211: fix refcount imbalance when wext is disabled
  mac80211: fix queue selection for data frames on monitor interfaces
  iwlwifi: silence buffer overflow warning
  iwlwifi: disable tx on beacon update notification
  iwlwifi: fix iwl_queue_used bug when read_ptr == write_ptr
  mac80211: fix endian error
  mac80211: add missing sanity checks for action frames
  ...
2010-01-14 08:36:15 -08:00
Octavian Purdila
cd65c3c7d1 net: fix build erros with CONFIG_BUG=n, CONFIG_GENERIC_BUG=n
Fixed build errors introduced by commit 7ad6848c (ip: fix mc_loop
checks for tunnels with multicast outer addresses)

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-13 18:10:36 -08:00
Alexey Dobriyan
cd8c20b650 netfilter: nfnetlink: netns support
Make nfnl socket per-petns.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-13 16:02:14 +01:00
Linus Torvalds
597d8c7178 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
  sky2: Fix oops in sky2_xmit_frame() after TX timeout
  Documentation/3c509: document ethtool support
  af_packet: Don't use skb after dev_queue_xmit()
  vxge: use pci_dma_mapping_error to test return value
  netfilter: ebtables: enforce CAP_NET_ADMIN
  e1000e: fix and commonize code for setting the receive address registers
  e1000e: e1000e_enable_tx_pkt_filtering() returns wrong value
  e1000e: perform 10/100 adaptive IFS only on parts that support it
  e1000e: don't accumulate PHY statistics on PHY read failure
  e1000e: call pci_save_state() after pci_restore_state()
  netxen: update version to 4.0.72
  netxen: fix set mac addr
  netxen: fix smatch warning
  netxen: fix tx ring memory leak
  tcp: update the netstamp_needed counter when cloning sockets
  TI DaVinci EMAC: Handle emac module clock correctly.
  dmfe/tulip: Let dmfe handle DM910x except for SPARC on-board chips
  ixgbe: Fix compiler warning about variable being used uninitialized
  netfilter: nf_ct_ftp: fix out of bounds read in update_nl_seq()
  mv643xx_eth: don't include cache padding in rx desc buffer size
  ...

Fix trivial conflict in drivers/scsi/cxgb3i/cxgb3i_offload.c
2010-01-12 20:53:29 -08:00
Kalle Valo
ab13315af9 mac80211: add U-APSD client support
Add Unscheduled Automatic Power-Save Delivery (U-APSD) client support. The
idea is that the data frames from the client trigger AP to send the buffered
frames with ACs which have U-APSD enabled. This decreases latency and makes it
possible to save even more power.

Driver needs to use IEEE80211_HW_UAPSD to enable the feature. The current
implementation assumes that firmware takes care of the wakeup and
hardware needing IEEE80211_HW_PS_NULLFUNC_STACK is not yet supported.

Tested with wl1251 on a Nokia N900 and Cisco Aironet 1231G AP and running
various test traffic with ping.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-12 14:20:58 -05:00
Jouni Malinen
34a6eddbab cfg80211: Store IEs from both Beacon and Probe Response frames
Store information elements from Beacon and Probe Response frames in
separate buffers to allow both sets to be made available through
nl80211. This allows user space applications to get access to IEs from
Beacon frames even if we have received Probe Response frames from the
BSS. Previously, the IEs from Probe Response frames would have
overridden the IEs from Beacon frames.

This feature is of somewhat limited use since most protocols include
the same (or extended) information in Probe Response frames. However,
there are couple of exceptions where the IEs from Beacon frames could
be of some use: TIM IE is only included in Beacon frames (and it would
be needed to figure out the DTIM period used in the BSS) and at least
some implementations of Wireless Provisioning Services seem to include
the full IE only in Beacon frames).

The new BSS attribute for scan results is added to allow both the IE
sets to be delivered. This is done in a way that maintains the
previously used behavior for applications that are not aware of the
new NL80211_BSS_BEACON_IES attribute.

Signed-off-by: Jouni Malinen <j@w1.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-12 13:51:28 -05:00
Kalle Valo
05e54ea6cc mac80211: create Probe Request template
Certain type of hardware, for example wl1251 and wl1271, need a template
for the Probe Request. Create a function ieee80211_probereq_get() which
creates the template and drivers send it to hardware.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-12 13:51:25 -05:00
Kalle Valo
7044cc565b mac80211: add functions to create PS Poll and Nullfunc templates
Some hardware, for example wl1251 and wl1271, handle the transmission
of power save related frames in hardware, but the driver is responsible
for creating the templates. It's better to create the templates in mac80211,
that way all drivers can benefit from this.

Add two new functions, ieee80211_pspoll_get() and ieee80211_nullfunc_get()
which drivers need to call to get the frame. Drivers are also responsible
for updating the templates after each association.

Also new struct ieee80211_hdr_3addr is added to ieee80211.h to make it
easy to calculate length of the Nullfunc frame.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-12 13:51:24 -05:00
Jouni Malinen
13ae75b103 nl80211: New command for setting TX rate mask for rate control
Add a new NL80211_CMD_SET_TX_BITRATE_MASK command and related
attributes to provide support for setting TX rate mask for rate
control. This uses the existing cfg80211 set_bitrate_mask operation
that was previously used only with WEXT compat code (SIOCSIWRATE). The
nl80211 command allows more generic configuration of allowed rates as
a mask instead of fixed/max rate.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-12 13:51:23 -05:00
Jouni Malinen
37eb0b164c cfg80211/mac80211: Use more generic bitrate mask for rate control
Extend struct cfg80211_bitrate_mask to actually use a bitfield mask
instead of just a single fixed or maximum rate index. This change
itself does not modify the behavior (except for debugfs files), but it
prepares cfg80211 and mac80211 for a new nl80211 command for setting
which rates can be used in TX rate control.

Since frames are now going through the rate control algorithm
unconditionally, the internal IEEE80211_TX_INTFL_RCALGO flag can now
be removed. The RC implementations can use the rate_idx_mask value to
optimize their behavior if only a single rate is enabled.

The old max_rate_idx in struct ieee80211_tx_rate_control is maintained
(but commented as deprecated) for backwards compatibility with existing
RC implementations. Once these implementations have been updated to
use the more generic rate_idx_mask, the max_rate_idx value can be
removed.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-12 13:50:11 -05:00
Jouni Malinen
e00cfce0cb mac80211: Select lowest rate based on basic rate set in AP mode
If the basic rate set is configured to not include the lowest rate
(e.g., basic rate set = 6, 12, 24 Mbps in IEEE 802.11g mode), the AP
should not send out broadcast frames at 1 Mbps. This type of
configuration can be used to optimize channel usage in cases where
there is no need for backwards compatibility with IEEE 802.11b-only
devices.

In AP mode, mac80211 was unconditionally using the lowest rate for
Beacon frames and similarly, with all rate control algorithms that use
rate_control_send_low(), the lowest rate ended up being used for all
broadcast frames (and all unicast frames that are sent before
association). Change this to take into account the basic rate
configuration in AP mode, i.e., use the lowest rate in the basic rate
set instead of the lowest supported rate when selecting the rate.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-12 13:50:09 -05:00
Lukáš Turek
310bc676e3 mac80211: Add new callback set_coverage_class
Mac80211 callback to driver set_coverage_class() sets slot time and ACK
timeout for given IEEE 802.11 coverage class. The callback is optional,
but it's essential for long distance links.

Signed-off-by: Lukas Turek <8an@praha12.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-12 13:50:07 -05:00
Lukáš Turek
81077e82c3 nl80211: Add new WIPHY attribute COVERAGE_CLASS
The new attribute NL80211_ATTR_WIPHY_COVERAGE_CLASS sets IEEE 802.11
Coverage Class, which depends on maximum distance of nodes in a
wireless network. It's required for long distance links (more than a few
hundred meters).

The attribute is now ignored by two non-mac80211 drivers, rndis and
iwmc3200wifi, together with WIPHY_PARAM_RETRY_SHORT and
WIPHY_PARAM_RETRY_LONG. If it turns out to be a problem, we could split
set_wiphy_params callback or add new capability bits.

Signed-off-by: Lukas Turek <8an@praha12.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-12 13:49:17 -05:00
Stephen Hemminger
d218d11133 tcp: Generalized TTL Security Mechanism
This patch adds the kernel portions needed to implement
RFC 5082 Generalized TTL Security Mechanism (GTSM).
It is a lightweight security measure against forged
packets causing DoS attacks (for BGP). 

This is already implemented the same way in BSD kernels.
For the necessary Quagga patch 
  http://www.gossamer-threads.com/lists/quagga/dev/17389

Description from Cisco
  http://www.cisco.com/en/US/docs/ios/12_3t/12_3t7/feature/guide/gt_btsh.html

It does add one byte to each socket structure, but I did
a little rearrangement to reuse a hole (on 64 bit), but it
does grow the structure on 32 bit

This should be documented on ip(4) man page and the Glibc in.h
file also needs update.  IPV6_MINHOPLIMIT should also be added
(although BSD doesn't support that).  

Only TCP is supported, but could also be added to UDP, DCCP, SCTP
if desired.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-11 16:28:01 -08:00
David S. Miller
d4a66e752d Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/benet/be_cmds.h
	include/linux/sysctl.h
2010-01-10 22:55:03 -08:00
Rémi Denis-Courmont
fea93ecef6 Phonet: zero-copy GPRS TX
Send aligned pipe payload if requested to do so. Then, the socket buffer
needs not be fragmented anymore.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-07 00:24:55 -08:00
Rémi Denis-Courmont
fc6a110754 Phonet: zero-copy aligned GPRS RX
Newer Nokia cellular modems can use aligned payload for their GPRS pipe.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-07 00:24:54 -08:00
Octavian Purdila
7ad6848c7e ip: fix mc_loop checks for tunnels with multicast outer addresses
When we have L3 tunnels with different inner/outer families
(i.e. IPV4/IPV6) which use a multicast address as the outer tunnel
destination address, multicast packets will be loopbacked back to the
sending socket even if IP*_MULTICAST_LOOP is set to disabled.

The mc_loop flag is present in the family specific part of the socket
(e.g. the IPv4 or IPv4 specific part).  setsockopt sets the inner
family mc_loop flag. When the packet is pushed through the L3 tunnel
it will eventually be processed by the outer family which if different
will check the flag in a different part of the socket then it was set.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-06 20:37:01 -08:00
Catalin(ux) M. BOIE
6f7edb4881 IPVS: Allow boot time change of hash size
I was very frustrated about the fact that I have to recompile the kernel
to change the hash size. So, I created this patch.

If IPVS is built-in you can append ip_vs.conn_tab_bits=?? to kernel
command line, or, if you built IPVS as modules, you can add
options ip_vs conn_tab_bits=??.

To keep everything backward compatible, you still can select the size at
compile time, and that will be used as default.

It has been about a year since this patch was originally posted
and subsequently dropped on the basis of insufficient test data.

Mark Bergsma has provided the following test results which seem
to strongly support the need for larger hash table sizes:

We do however run into the same problem with the default setting (212 =
4096 entries), as most of our LVS balancers handle around a million
connections/SLAB entries at any point in time (around 100-150 kpps
load). With only 4096 hash table entries this implies that each entry
consists of a linked list of 256 connections *on average*.

To provide some statistics, I did an oprofile run on an 2.6.31 kernel,
with both the default 4096 table size, and the same kernel recompiled
with IP_VS_CONN_TAB_BITS set to 18 (218 = 262144 entries). I built a
quick test setup with a part of Wikimedia/Wikipedia's live traffic
mirrored by the switch to the test host.

With the default setting, at ~ 120 kpps packet load we saw a typical %si
CPU usage of around 30-35%, and oprofile reported a hot spot in
ip_vs_conn_in_get:

samples  %        image name               app name
symbol name
1719761  42.3741  ip_vs.ko                 ip_vs.ko      ip_vs_conn_in_get
302577    7.4554  bnx2                     bnx2          /bnx2
181984    4.4840  vmlinux                  vmlinux       __ticket_spin_lock
128636    3.1695  vmlinux                  vmlinux       ip_route_input
74345     1.8318  ip_vs.ko                 ip_vs.ko      ip_vs_conn_out_get
68482     1.6874  vmlinux                  vmlinux       mwait_idle

After loading the recompiled kernel with 218 entries, %si CPU usage
dropped in half to around 12-18%, and oprofile looks much healthier,
with only 7% spent in ip_vs_conn_in_get:

samples  %        image name               app name
symbol name
265641   14.4616  bnx2                     bnx2         /bnx2
143251    7.7986  vmlinux                  vmlinux      __ticket_spin_lock
140661    7.6576  ip_vs.ko                 ip_vs.ko     ip_vs_conn_in_get
94364     5.1372  vmlinux                  vmlinux      mwait_idle
86267     4.6964  vmlinux                  vmlinux      ip_route_input

[ horms@verge.net.au: trivial up-port and minor style fixes ]
Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
Cc: Mark Bergsma <mark@wikimedia.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-05 05:50:24 +01:00
David S. Miller
3a999e6eb5 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-12-30 13:51:29 -08:00
Linus Torvalds
c3bf4906fb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (74 commits)
  Revert "b43: Enforce DMA descriptor memory constraints"
  iwmc3200wifi: fix array out-of-boundary access
  wl1251: timeout one too soon in wl1251_boot_run_firmware()
  mac80211: fix propagation of failed hardware reconfigurations
  mac80211: fix race with suspend and dynamic_ps_disable_work
  ath9k: fix missed error codes in the tx status check
  ath9k: wake hardware during AMPDU TX actions
  ath9k: wake hardware for interface IBSS/AP/Mesh removal
  ath9k: fix suspend by waking device prior to stop
  cfg80211: fix error path in cfg80211_wext_siwscan
  wl1271_cmd.c: cleanup char => u8
  iwlwifi: Storage class should be before const qualifier
  ath9k: Storage class should be before const qualifier
  cfg80211: fix race between deauth and assoc response
  wireless: remove remaining qual code
  rt2x00: Add USB ID for Linksys WUSB 600N rev 2.
  ath5k: fix SWI calibration interrupt storm
  mac80211: fix ibss join with fixed-bssid
  libertas: Remove carrier signaling from the scan code
  orinoco: fix GFP_KERNEL in orinoco_set_key with interrupts disabled
  ...
2009-12-30 12:37:35 -08:00
John W. Linville
891dc5e737 Merge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
Conflicts:
	drivers/net/wireless/libertas/scan.c
2009-12-30 15:25:08 -05:00
David S. Miller
7f9d3577e2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-12-29 19:44:25 -08:00
Kalle Valo
e1781ed33a mac80211: annotate sleeping driver ops
To make it easier to notice cases of calling sleeping ops in atomic context,
annotate driver-ops.h with appropiate might_sleep() calls. At the same time,
also document in mac80211.h the op functions with missing contexts.

mac80211 doesn't seem to use get_tx_stats anywhere currently. Just to be on
the safe side, I documented it to be atomic, but hopefully the op can be
removed in the future.

Compile-tested only.

Signed-off-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-28 16:55:10 -05:00
Johannes Berg
1ed32e4fc8 mac80211: remove struct ieee80211_if_init_conf
All its members (vif, mac_addr, type) are now available
in the vif struct directly, so we can pass that instead
of the conf struct. I generated this patch (except the
mac80211 and header file changes) with this semantic
patch:

@@
identifier conf, fn, hw;
type tp;
@@
tp fn(struct ieee80211_hw *hw,
-struct ieee80211_if_init_conf *conf)
+struct ieee80211_vif *vif)
{
<...
(
-conf->type
+vif->type
|
-conf->mac_addr
+vif->addr
|
-conf->vif
+vif
)
...>
}

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-28 16:55:07 -05:00
Johannes Berg
98b6218388 mac80211/cfg80211: add station events
When, for instance, a new IBSS peer is found, userspace
wants to be notified. Add events for all new stations
that mac80211 learns about.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-28 16:55:06 -05:00
Jouni Malinen
9588bbd552 cfg80211: add remain-on-channel command
Add new commands for requesting the driver to remain awake
on a specified channel for the specified amount of time
(and another command to cancel such an operation). This
can be used to implement userspace-controlled off-channel
operations, like Public Action frame exchange on another
channel than the operation channel.

The off-channel operation should behave similarly to scan,
i.e. the local station (if associated) moves into power
save mode to request the AP to buffer frames for it and
then moves to the other channel to allow the off-channel
operation to be completed. The duration parameter can be
used to request enough time to receive a response from
the target station.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-28 16:55:02 -05:00
Johannes Berg
a80f7c0b08 mac80211: introduce flush operation
We've long lacked a good confirmation that frames
have really gone out, e.g. before going off-channel
for a scan. Add a flush() operation that drivers
can implement to provide that confirmation, and use
it in a few places:
 * before scanning sends the nullfunc frames
 * after scanning sends the nullfunc frames, if any
 * when going idle, to send any pending frames

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-28 16:54:51 -05:00
Johannes Berg
671adc93b6 wireless: remove remaining qual code
This removes the remaining users of the rx status
'qual' field and the field itself.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-28 16:19:45 -05:00
John W. Linville
ea1e4b8420 Merge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-12-28 15:09:11 -05:00
Octavian Purdila
8beb9ab6c2 llc: convert llc_sap_list to RCU
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-26 20:46:28 -08:00
Octavian Purdila
52d58aef5e llc: replace the socket list with a local address based hash
For the cases where a lot of interfaces are used in conjunction with a
lot of LLC sockets bound to the same SAP, the iteration of the socket
list becomes prohibitively expensive.

Replacing the list with a a local address based hash significantly
improves the bind and listener lookup operations as well as the
datagram delivery.

Connected sockets delivery is also improved, but this patch does not
address the case where we have lots of sockets with the same local
address connected to different remote addresses.

In order to keep the socket sanity checks alive and fast a socket
counter was added to the SAP structure.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-26 20:45:32 -08:00
Octavian Purdila
6d2e3ea284 llc: use a device based hash table to speed up multicast delivery
This patch adds a per SAP device based hash table to solve the
multicast delivery scalability issue when we have large number of
interfaces and a large number of sockets bound to the same SAP.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-26 20:43:57 -08:00
Octavian Purdila
b76f5a8427 llc: convert the socket list to RCU locking
For the reclamation phase we use the SLAB_DESTROY_BY_RCU mechanism,
which require some extra checks in the lookup code:

a) If the current socket was released, reallocated & inserted in
another list it will short circuit the iteration for the current list,
thus we need to restart the lookup.

b) If the current socket was released, reallocated & inserted in the
same list we just need to recheck it matches the look-up criteria and
if not we can skip to the next element.

In this case there is no need to restart the lookup, since sockets are
inserted at the start of the list and the worst that will happen is
that we will iterate throught some of the list elements more then
once.

Note that the /proc and multicast delivery was not yet converted to
RCU, it still uses spinlocks for protection.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-26 20:41:43 -08:00
Octavian Purdila
e5cd6fe391 llc: add support for LLC_OPT_PKTINFO
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-26 20:40:34 -08:00
David S. Miller
d346f49d0b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-12-25 16:34:56 -08:00
laurent chavey
31d12926e3 net: Add rtnetlink init_rcvwnd to set the TCP initial receive window
Add rtnetlink init_rcvwnd to set the TCP initial receive window size
advertised by passive and active TCP connections.
The current Linux TCP implementation limits the advertised TCP initial
receive window to the one prescribed by slow start. For short lived
TCP connections used for transaction type of traffic (i.e. http
requests), bounding the advertised TCP initial receive window results
in increased latency to complete the transaction.
Support for setting initial congestion window is already supported
using rtnetlink init_cwnd, but the feature is useless without the
ability to set a larger TCP initial receive window.
The rtnetlink init_rcvwnd allows increasing the TCP initial receive
window, allowing TCP connection to advertise larger TCP receive window
than the ones bounded by slow start.

Signed-off-by: Laurent Chavey <chavey@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-23 14:13:30 -08:00
Krishna Kumar
12d50c46dc tcp: Remove check in __tcp_push_pending_frames
tcp_push checks tcp_send_head and calls __tcp_push_pending_frames,
which again checks tcp_send_head, and this unnecessary check is
done for every other caller of __tcp_push_pending_frames.

Remove tcp_send_head check in __tcp_push_pending_frames and add
the check to tcp_push_pending_frames. Other functions call
__tcp_push_pending_frames only when tcp_send_head would evaluate
to true.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-23 14:13:28 -08:00
David S. Miller
b4de921ae6 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-12-23 14:09:17 -08:00
Johannes Berg
0f78231bff mac80211: enable spatial multiplexing powersave
Enable spatial multiplexing in mac80211 by telling the
driver what to do and, where necessary, sending action
frames to the AP to update the requested SMPS mode.

Also includes a trivial implementation for hwsim that
just logs the requested mode.

For now, the userspace interface is in debugfs only,
and let you toggle the requested mode at any time.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-22 13:31:16 -05:00
Zhu Yi
eaf85ca7fe wireless: add ieee80211_amsdu_to_8023s
Move the A-MSDU handling code from mac80211 to cfg80211 so that more
drivers can use it. The new created function ieee80211_amsdu_to_8023s
converts an A-MSDU frame to a list of 802.3 frames.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-22 13:31:15 -05:00
Johannes Berg
47846c9b0c mac80211: reduce reliance on netdev
For bluetooth 3, we will most likely not have
a netdev for a virtual interface (sdata), so
prepare for that by reducing the reliance on
having a netdev. This patch moves the name
and address fields into the sdata struct and
uses them from there all over. Some work is
needed to keep them sync'ed, but that's not
a lot of work and in slow paths anyway.

In doing so, this also reduces the number of
pointer dereferences in many places, because
of things like sdata->dev->dev_addr becoming
sdata->vif.addr.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-21 18:38:52 -05:00
David S. Miller
ed4b2019a6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-12-21 11:54:49 -08:00
Linus Torvalds
59be2e04e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
  net: sh_eth alignment fix for sh7724 using NET_IP_ALIGN V2
  ixgbe: allow tx of pre-formatted vlan tagged packets
  ixgbe: Fix 82598 premature copper PHY link indicatation
  ixgbe: Fix tx_restart_queue/non_eop_desc statistics counters
  bcm63xx_enet: fix compilation failure after get_stats_count removal
  packet: dont call sleeping functions while holding rcu_read_lock()
  tcp: Revert per-route SACK/DSACK/TIMESTAMP changes.
  ipvs: zero usvc and udest
  netfilter: fix crashes in bridge netfilter caused by fragment jumps
  ipv6: reassembly: use seperate reassembly queues for conntrack and local delivery
  sky2: leave PCI config space writeable
  sky2: print Optima chip name
  x25: Update maintainer.
  ipvs: fix synchronization on connection close
  netfilter: xtables: document minimal required version
  drivers/net/bonding/: : use pr_fmt
  can: CAN_MCP251X should depend on HAS_DMA
  drivers/net/usb: Correct code taking the size of a pointer
  drivers/net/cpmac.c: Correct code taking the size of a pointer
  drivers/net/sfc: Correct code taking the size of a pointer
  ...
2009-12-16 10:33:18 -08:00
David S. Miller
81e839efc2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-12-15 21:08:53 -08:00
David S. Miller
bb5b7c1126 tcp: Revert per-route SACK/DSACK/TIMESTAMP changes.
It creates a regression, triggering badness for SYN_RECV
sockets, for example:

[19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
[19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
[19148.023035] REGS: eeecbd30 TRAP: 0700   Not tainted  (2.6.32)
[19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002442  XER: 00000000
[19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000

This is likely caused by the change in the 'estab' parameter
passed to tcp_parse_options() when invoked by the functions
in net/ipv4/tcp_minisocks.c

But even if that is fixed, the ->conn_request() changes made in
this patch series is fundamentally wrong.  They try to use the
listening socket's 'dst' to probe the route settings.  The
listening socket doesn't even have a route, and you can't
get the right route (the child request one) until much later
after we setup all of the state, and it must be done by hand.

This stuff really isn't ready, so the best thing to do is a
full revert.  This reverts the following commits:

f55017a93f
022c3f7d82
1aba721eba
cda42ebd67
345cda2fd6
dc343475ed
05eaade278
6a2a2d6bf8

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-15 20:56:42 -08:00
Patrick McHardy
8fa9ff6849 netfilter: fix crashes in bridge netfilter caused by fragment jumps
When fragments from bridge netfilter are passed to IPv4 or IPv6 conntrack
and a reassembly queue with the same fragment key already exists from
reassembling a similar packet received on a different device (f.i. with
multicasted fragments), the reassembled packet might continue on a different
codepath than where the head fragment originated. This can cause crashes
in bridge netfilter when a fragment received on a non-bridge device (and
thus with skb->nf_bridge == NULL) continues through the bridge netfilter
code.

Add a new reassembly identifier for packets originating from bridge
netfilter and use it to put those packets in insolated queues.

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=14805

Reported-and-Tested-by: Chong Qiao <qiaochong@loongson.cn>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-12-15 16:59:59 +01:00
Patrick McHardy
0b5ccb2ee2 ipv6: reassembly: use seperate reassembly queues for conntrack and local delivery
Currently the same reassembly queue might be used for packets reassembled
by conntrack in different positions in the stack (PREROUTING/LOCAL_OUT),
as well as local delivery. This can cause "packet jumps" when the fragment
completing a reassembled packet is queued from a different position in the
stack than the previous ones.

Add a "user" identifier to the reassembly queue key to seperate the queues
of each caller, similar to what we do for IPv4.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-12-15 16:59:18 +01:00
Gertjan van Wingerde
d24deb2580 mac80211: Add define for TX headroom reserved by mac80211 itself.
Add a definition of the amount of TX headroom reserved by mac80211 itself
for its own purposes. Also add BUILD_BUG_ON to validate the value.
This define can then be used by drivers to request additional TX headroom
in the most efficient manner.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-14 14:22:31 -05:00
Linus Torvalds
d0316554d3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
  m68k: rename global variable vmalloc_end to m68k_vmalloc_end
  percpu: add missing per_cpu_ptr_to_phys() definition for UP
  percpu: Fix kdump failure if booted with percpu_alloc=page
  percpu: make misc percpu symbols unique
  percpu: make percpu symbols in ia64 unique
  percpu: make percpu symbols in powerpc unique
  percpu: make percpu symbols in x86 unique
  percpu: make percpu symbols in xen unique
  percpu: make percpu symbols in cpufreq unique
  percpu: make percpu symbols in oprofile unique
  percpu: make percpu symbols in tracer unique
  percpu: make percpu symbols under kernel/ and mm/ unique
  percpu: remove some sparse warnings
  percpu: make alloc_percpu() handle array types
  vmalloc: fix use of non-existent percpu variable in put_cpu_var()
  this_cpu: Use this_cpu_xx in trace_functions_graph.c
  this_cpu: Use this_cpu_xx for ftrace
  this_cpu: Use this_cpu_xx in nmi handling
  this_cpu: Use this_cpu operations in RCU
  this_cpu: Use this_cpu ops for VM statistics
  ...

Fix up trivial (famous last words) global per-cpu naming conflicts in
	arch/x86/kvm/svm.c
	mm/slab.c
2009-12-14 09:58:24 -08:00
David S. Miller
501706565b Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts:
	include/net/tcp.h
2009-12-11 17:12:17 -08:00
Heiko Carstens
60c2ffd3d2 net: fix compat_sys_recvmmsg parameter type
compat_sys_recvmmsg has a compat_timespec parameter and not a
timespec parameter. This way we also get rid of an odd cast.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-11 15:07:56 -08:00
Linus Torvalds
4ef58d4e2a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
  tree-wide: fix misspelling of "definition" in comments
  reiserfs: fix misspelling of "journaled"
  doc: Fix a typo in slub.txt.
  inotify: remove superfluous return code check
  hdlc: spelling fix in find_pvc() comment
  doc: fix regulator docs cut-and-pasteism
  mtd: Fix comment in Kconfig
  doc: Fix IRQ chip docs
  tree-wide: fix assorted typos all over the place
  drivers/ata/libata-sff.c: comment spelling fixes
  fix typos/grammos in Documentation/edac.txt
  sysctl: add missing comments
  fs/debugfs/inode.c: fix comment typos
  sgivwfb: Make use of ARRAY_SIZE.
  sky2: fix sky2_link_down copy/paste comment error
  tree-wide: fix typos "couter" -> "counter"
  tree-wide: fix typos "offest" -> "offset"
  fix kerneldoc for set_irq_msi()
  spidev: fix double "of of" in comment
  comment typo fix: sybsystem -> subsystem
  ...
2009-12-09 19:43:33 -08:00
Damian Lukowski
2f7de5710a tcp: Stalling connections: Move timeout calculation routine
This patch moves retransmits_timed_out() from include/net/tcp.h
to tcp_timer.c, where it is used.

Reported-by: Frederic Leroy <fredo@starox.org>
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08 20:56:11 -08:00
Damian Lukowski
07f29bc5bb tcp: Stalling connections: Fix timeout calculation routine
This patch fixes a problem in the TCP connection timeout calculation.
Currently, timeout decisions are made on the basis of the current
tcp_time_stamp and retrans_stamp, which is usually set at the first
retransmission.
However, if the retransmission fails in tcp_retransmit_skb(),
retrans_stamp is not updated and remains zero. This leads to wrong
decisions in retransmits_timed_out() if tcp_time_stamp is larger than
the specified timeout, which is very likely.
In this case, the TCP connection dies after the first attempted
(and unsuccessful) retransmission.

With this patch, tcp_skb_cb->when is used instead, when retrans_stamp
is not available.

This bug has been introduced together with retransmits_timed_out() in
2.6.32, as the number of retransmissions has been used for timeout
decisions before. The corresponding commit was
6fa12c8503 (Revert Backoff [v3]:
Calculate TCP's connection close threshold as a time value.).

Thanks to Ilpo Järvinen for code suggestions and Frederic Leroy for
testing.

Reported-by: Frederic Leroy <fredo@starox.org>
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08 20:56:11 -08:00
Eric Dumazet
3cdaedae63 tcp: Fix a connect() race with timewait sockets
When we find a timewait connection in __inet_hash_connect() and reuse
it for a new connection request, we have a race window, releasing bind
list lock and reacquiring it in __inet_twsk_kill() to remove timewait
socket from list.

Another thread might find the timewait socket we already chose, leading to
list corruption and crashes.

Fix is to remove timewait socket from bind list before releasing the bind lock.

Note: This problem happens if sysctl_tcp_tw_reuse is set.

Reported-by: kapil dakhane <kdakhane@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08 20:17:51 -08:00
Eric Dumazet
9327f7053e tcp: Fix a connect() race with timewait sockets
First patch changes __inet_hash_nolisten() and __inet6_hash()
to get a timewait parameter to be able to unhash it from ehash
at same time the new socket is inserted in hash.

This makes sure timewait socket wont be found by a concurrent
writer in __inet_check_established()

Reported-by: kapil dakhane <kdakhane@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08 20:17:51 -08:00
Linus Torvalds
d7fc02c7ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
  mac80211: fix reorder buffer release
  iwmc3200wifi: Enable wimax core through module parameter
  iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
  iwmc3200wifi: Coex table command does not expect a response
  iwmc3200wifi: Update wiwi priority table
  iwlwifi: driver version track kernel version
  iwlwifi: indicate uCode type when fail dump error/event log
  iwl3945: remove duplicated event logging code
  b43: fix two warnings
  ipw2100: fix rebooting hang with driver loaded
  cfg80211: indent regulatory messages with spaces
  iwmc3200wifi: fix NULL pointer dereference in pmkid update
  mac80211: Fix TX status reporting for injected data frames
  ath9k: enable 2GHz band only if the device supports it
  airo: Fix integer overflow warning
  rt2x00: Fix padding bug on L2PAD devices.
  WE: Fix set events not propagated
  b43legacy: avoid PPC fault during resume
  b43: avoid PPC fault during resume
  tcp: fix a timewait refcnt race
  ...

Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
	kernel/sysctl_check.c
	net/ipv4/sysctl_net_ipv4.c
	net/ipv6/addrconf.c
	net/sctp/sysctl.c
2009-12-08 07:55:01 -08:00
Linus Torvalds
1557d33007 Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
  security/tomoyo: Remove now unnecessary handling of security_sysctl.
  security/tomoyo: Add a special case to handle accesses through the internal proc mount.
  sysctl: Drop & in front of every proc_handler.
  sysctl: Remove CTL_NONE and CTL_UNNUMBERED
  sysctl: kill dead ctl_handler definitions.
  sysctl: Remove the last of the generic binary sysctl support
  sysctl net: Remove unused binary sysctl code
  sysctl security/tomoyo: Don't look at ctl_name
  sysctl arm: Remove binary sysctl support
  sysctl x86: Remove dead binary sysctl support
  sysctl sh: Remove dead binary sysctl support
  sysctl powerpc: Remove dead binary sysctl support
  sysctl ia64: Remove dead binary sysctl support
  sysctl s390: Remove dead sysctl binary support
  sysctl frv: Remove dead binary sysctl support
  sysctl mips/lasat: Remove dead binary sysctl support
  sysctl drivers: Remove dead binary sysctl support
  sysctl crypto: Remove dead binary sysctl support
  sysctl security/keys: Remove dead binary sysctl support
  sysctl kernel: Remove binary sysctl logic
  ...
2009-12-08 07:38:50 -08:00
Jiri Kosina
d014d04386 Merge branch 'for-next' into for-linus
Conflicts:

	kernel/irq/chip.c
2009-12-07 18:36:35 +01:00
David S. Miller
8f56874bd7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-12-04 13:25:15 -08:00
André Goddard Rosa
af901ca181 tree-wide: fix assorted typos all over the place
That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-04 15:39:55 +01:00
Eric Dumazet
13475a30b6 tcp: connect() race with timewait reuse
Its currently possible that several threads issuing a connect() find
the same timewait socket and try to reuse it, leading to list
corruptions.

Condition for bug is that these threads bound their socket on same
address/port of to-be-find timewait socket, and connected to same
target. (SO_REUSEADDR needed)

To fix this problem, we could unhash timewait socket while holding
ehash lock, to make sure lookups/changes will be serialized. Only
first thread finds the timewait socket, other ones find the
established socket and return an EADDRNOTAVAIL error.

This second version takes into account Evgeniy's review and makes sure
inet_twsk_put() is called outside of locked sections.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 16:17:43 -08:00
David S. Miller
a7fca0ccec Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6 2009-12-03 13:51:02 -08:00
Eric W. Biederman
b099ce2602 net: Batch inet_twsk_purge
This function walks the whole hashtable so there is no point in
passing it a network namespace.  Instead I purge all timewait
sockets from dead network namespaces that I find.  If the namespace
is one of the once I am trying to purge I am guaranteed no new timewait
sockets can be formed so this will get them all.  If the namespace
is one I am not acting for it might form a few more but I will
call inet_twsk_purge again and  shortly to get rid of them.  In
any even if the network namespace is dead timewait sockets are
useless.

Move the calls of inet_twsk_purge into batch_exit routines so
that if I am killing a bunch of namespaces at once I will just
call inet_twsk_purge once and save a lot of redundant unnecessary
work.

My simple 4k network namespace exit test the cleanup time dropped from
roughly 8.2s to 1.6s.  While the time spent running inet_twsk_purge fell
to about 2ms.  1ms for ipv4 and 1ms for ipv6.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:23:47 -08:00
Eric W. Biederman
e9c5158ac2 net: Allow fib_rule_unregister to batch
Refactor the code so fib_rules_register always takes a template instead
of the actual fib_rules_ops structure that will be used.  This is
required for network namespace support so 2 out of the 3 callers already
do this, it allows the error handling to be made common, and it allows
fib_rules_unregister to free the template for hte caller.

Modify fib_rules_unregister to use call_rcu instead of syncrhonize_rcu
to allw multiple namespaces to be cleaned up in the same rcu grace
period.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:22:55 -08:00
Eric W. Biederman
d79d792ef9 net: Allow xfrm_user_net_exit to batch efficiently.
xfrm.nlsk is provided by the xfrm_user module and is access via rcu from
other parts of the xfrm code.  Add xfrm.nlsk_stash a copy of xfrm.nlsk that
will never be set to NULL.  This allows the synchronize_net and
netlink_kernel_release to be deferred until a whole batch of xfrm.nlsk sockets
have been set to NULL.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:22:03 -08:00
Eric W. Biederman
72ad937abd net: Add support for batching network namespace cleanups
- Add exit_list to struct net to support building lists of network
  namespaces to cleanup.

- Add exit_batch to pernet_operations to allow running operations only
  once during a network namespace exit.  Instead of once per network
  namespace.

- Factor opt ops_exit_list and ops_exit_free so the logic with cleanup
  up a network namespace does not need to be duplicated.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:22:01 -08:00
Patrick McHardy
1b038a5e60 net 03/05: fib_rules: add oif classification
commit 68144d350f4f6c348659c825cde6a82b34c27a91
Author: Patrick McHardy <kaber@trash.net>
Date:   Thu Dec 3 12:05:25 2009 +0100

    net: fib_rules: add oif classification

    Support routing table lookup based on the flow's oif. This is useful to
    classify packets originating from sockets bound to interfaces differently.

    The route cache already includes the oif and needs no changes.

    Signed-off-by: Patrick McHardy <kaber@trash.net>

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:14:36 -08:00
Patrick McHardy
491deb24bf net 02/05: fib_rules: rename ifindex/ifname/FRA_IFNAME to iifindex/iifname/FRA_IIFNAME
commit 229e77eec406ad68662f18e49fda8b5d366768c5
Author: Patrick McHardy <kaber@trash.net>
Date:   Thu Dec 3 12:05:23 2009 +0100

    net: fib_rules: rename ifindex/ifname/FRA_IFNAME to iifindex/iifname/FRA_IIFNAME

    The next patch will add oif classification, rename interface related members
    and attributes to reflect that they're used for iif classification.

    Signed-off-by: Patrick McHardy <kaber@trash.net>

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:14:36 -08:00
Patrick McHardy
d285834001 net 01/05: fib_rules: rearrange struct fib_rule
commit b8952893d5d86f69c4e499d191b98c6658f64b0f
Author: Patrick McHardy <kaber@trash.net>
Date:   Thu Dec 3 12:05:22 2009 +0100

    net: fib_rules: rearrange struct fib_rule

    The ifname member is only used to resolve interface names and is not needed
    during rule lookups. The target and ctarget members however are used during
    rule lookups and are currently located in a second cacheline.

    Move ifname further to the end to make sure both target and ctarget are
    located in the same cacheline as other members used during rule lookups.

    The layout on 64 bit changes from:

    struct fib_rule {
    	...
            u32                        table;                /*    56     4 */
            u8                         action;               /*    60     1 */

            /* XXX 3 bytes hole, try to pack */

            /* --- cacheline 1 boundary (64 bytes) --- */
            u32                        target;               /*    64     4 */

            /* XXX 4 bytes hole, try to pack */

            struct fib_rule *          ctarget;              /*    72     8 */
            struct rcu_head            rcu;                  /*    80    16 */
            struct net *               fr_net;               /*    96     8 */
    };

    to:

    struct fib_rule {
    	...
            u32                        table;                /*    40     4 */
            u8                         action;               /*    44     1 */

            /* XXX 3 bytes hole, try to pack */

            u32                        target;               /*    48     4 */

            /* XXX 4 bytes hole, try to pack */

            struct fib_rule *          ctarget;              /*    56     8 */
            /* --- cacheline 1 boundary (64 bytes) --- */
            char                       ifname[16];           /*    64    16 */
            struct rcu_head            rcu;                  /*    80    16 */
            struct net *               fr_net;               /*    96     8 */

    };

    Signed-off-by: Patrick McHardy <kaber@trash.net>

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:14:34 -08:00
Gustavo F. Padovan
4ec10d9720 Bluetooth: Implement RejActioned flag
RejActioned is used to prevent retransmission when a entity is on the
WAIT_F state, i.e., waiting for a frame with F-bit set due local busy
condition or a expired retransmission timer. (When these two events raise
they send a frame with the Poll bit set and enters in the WAIT_F state to
wait for a frame with the Final bit set.)
The local entity doesn't send I-frames(the data frames) until the receipt
of a frame with F-bit set. When that happens it also set RejActioned to false.
RejActioned is a mandatory feature of ERTM spec.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-12-03 19:34:24 +01:00
Gustavo F. Padovan
9f121a5a80 Bluetooth: Fix sending ReqSeq on I-frames
As specified by ERTM spec an ERTM channel can acknowledge received
I-frames(the data frames) by sending an I-frame with the proper ReqSeq
value (i.e. ReqSeq is set to BufferSeq).  Until now we aren't setting the
ReqSeq value on I-frame control bits. That way we can save sending
S-frames(Supervise frames) only to acknowledge receipt of I-frames. It
is very helpful to the full-duplex channel.
ReqSeq is the packet sequence number sent in an acknowledgement frame to
acknowledge receipt of frames up to (ReqSeq - 1).
BufferSeq controls the receiver buffer, it is used to delay
acknowledgement of new frames to not cause buffer overflow. BufferSeq
value is not increased until frames are pulled by reassembly function.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-12-03 19:34:23 +01:00
Marcel Holtmann
c78ae28314 Bluetooth: Unobfuscate tasklet_schedule usage
The tasklet schedule function helpers are just an obfuscation. So remove
them and call the schedule functions directly.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-12-03 19:34:21 +01:00
Marcel Holtmann
76bca88012 Bluetooth: Turn hci_recv_frame into an exported function
For future simplification it is important that the hci_recv_frame
function is no longer an inline function. So move it into the module
itself and export it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-12-03 19:34:20 +01:00
Ilpo Järvinen
8818a9d884 tcp: clear hints to avoid a stale one (nfs only affected?)
Eric Dumazet mentioned in a context of another problem:

"Well, it seems NFS reuses its socket, so maybe we miss some
cleaning as spotted in this old patch"

I've not check under which conditions that actually happens but
if true, we need to make sure we don't accidently leave stale
hints behind when the write queue had to be purged (whether reusing
with NFS can actually happen if purging took place is something I'm
not sure of).

...At least it compiles.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 22:24:02 -08:00
William Allen Simpson
4957faade1 TCPCT part 1g: Responder Cookie => Initiator
Parse incoming TCP_COOKIE option(s).

Calculate <SYN,ACK> TCP_COOKIE option.

Send optional <SYN,ACK> data.

This is a significantly revised implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):

    http://thread.gmane.org/gmane.linux.network/102586

Requires:
   TCPCT part 1a: add request_values parameter for sending SYNACK
   TCPCT part 1b: generate Responder Cookie secret
   TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
   TCPCT part 1d: define TCP cookie option, extend existing struct's
   TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS
   TCPCT part 1f: Initiator Cookie => Responder

Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 22:07:26 -08:00
William Allen Simpson
435cf559f0 TCPCT part 1d: define TCP cookie option, extend existing struct's
Data structures are carefully composed to require minimal additions.
For example, the struct tcp_options_received cookie_plus variable fits
between existing 16-bit and 8-bit variables, requiring no additional
space (taking alignment into consideration).  There are no additions to
tcp_request_sock, and only 1 pointer in tcp_sock.

This is a significantly revised implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):

    http://thread.gmane.org/gmane.linux.network/102586

The principle difference is using a TCP option to carry the cookie nonce,
instead of a user configured offset in the data.  This is more flexible and
less subject to user configuration error.  Such a cookie option has been
suggested for many years, and is also useful without SYN data, allowing
several related concepts to use the same extension option.

    "Re: SYN floods (was: does history repeat itself?)", September 9, 1996.
    http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html

    "Re: what a new TCP header might look like", May 12, 1998.
    ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail

These functions will also be used in subsequent patches that implement
additional features.

Requires:
   TCPCT part 1a: add request_values parameter for sending SYNACK
   TCPCT part 1b: generate Responder Cookie secret
   TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS

Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 22:07:25 -08:00
William Allen Simpson
519855c508 TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
Define sysctl (tcp_cookie_size) to turn on and off the cookie option
default globally, instead of a compiled configuration option.

Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant
data values, retrieving variable cookie values, and other facilities.

Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h,
near its corresponding struct tcp_options_received (prior to changes).

This is a straightforward re-implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):

    http://thread.gmane.org/gmane.linux.network/102586

These functions will also be used in subsequent patches that implement
additional features.

Requires:
   net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED

Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 22:07:24 -08:00
William Allen Simpson
da5c78c826 TCPCT part 1b: generate Responder Cookie secret
Define (missing) hash message size for SHA1.

Define hashing size constants specific to TCP cookies.

Add new function: tcp_cookie_generator().

Maintain global secret values for tcp_cookie_generator().

This is a significantly revised implementation of earlier (15-year-old)
Photuris [RFC-2522] code for the KA9Q cooperative multitasking platform.

Linux RCU technique appears to be well-suited to this application, though
neither of the circular queue items are freed.

These functions will also be used in subsequent patches that implement
additional features.

Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 22:07:23 -08:00
William Allen Simpson
e6b4d11367 TCPCT part 1a: add request_values parameter for sending SYNACK
Add optional function parameters associated with sending SYNACK.
These parameters are not needed after sending SYNACK, and are not
used for retransmission.  Avoids extending struct tcp_request_sock,
and avoids allocating kernel memory.

Also affects DCCP as it uses common struct request_sock_ops,
but this parameter is currently reserved for future use.

Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 22:07:23 -08:00
David S. Miller
ff9c38bba3 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/mac80211/ht.c
2009-12-01 22:13:38 -08:00
Eric W. Biederman
65c0cfafce net: remove [un]register_pernet_gen_... and update the docs.
No that all of the callers have been updated to set fields in
struct pernet_operations, and simplified to let the network
namespace core handle the allocation and freeing of the storage
for them, remove the surpurpflous methods and update the docs
to the new style.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:16:00 -08:00
Eric W. Biederman
f875bae065 net: Automatically allocate per namespace data.
To get the full benefit of batched network namespace cleanup netowrk
device deletion needs to be performed by the generic code.  When
using register_pernet_gen_device and freeing the data in exit_net
it is impossible to delay allocation until after exit_net has called
as the device uninit methods are no longer safe.

To correct this, and to simplify working with per network namespace data
I have moved allocation and deletion of per network namespace data into
the network namespace core.  The core now frees the data only after
all of the network namespace exit routines have run.

Now it is only required to set the new fields .id and .size
in the pernet_operations structure if you want network namespace
data to be managed for you automatically.

This makes the current register_pernet_gen_device and
register_pernet_gen_subsys routines unnecessary.  For the moment
I have left them as compatibility wrappers in net_namespace.h
They will be removed once all of the users have been updated.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:15:51 -08:00
Eric W. Biederman
2b035b3997 net: Batch network namespace destruction.
It is fairly common to kill several network namespaces at once.  Either
because they are nested one inside the other or because they are cooperating
in multiple machine networking experiments.  As the network stack control logic
does not parallelize easily batch up multiple network namespaces existing
together.

To get the full benefit of batching the virtual network devices to be
removed must be all removed in one batch.  For that purpose I have added
a loop after the last network device operations have run that batches
up all remaining network devices and deletes them.

An extra benefit is that the reorganization slightly shrinks the size
of the per network namespace data structures replaceing a work_struct
with a list_head.

In a trivial test with 4K namespaces this change reduced the cost of
a destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds
(at 60% cpu).  The bulk of that 44s was spent in inet_twsk_purge.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:15:51 -08:00
Eric W. Biederman
a5ee155136 net: NETDEV_UNREGISTER_PERNET -> NETDEV_UNREGISTER_BATCH
The motivation for an additional notifier in batched netdevice
notification (rt_do_flush) only needs to be called once per batch not
once per namespace.

For further batching improvements I need a guarantee that the
netdevices are unregistered in order allowing me to unregister an all
of the network devices in a network namespace at the same time with
the guarantee that the loopback device is really and truly
unregistered last.

Additionally it appears that we moved the route cache flush after
the final synchronize_net, which seems wrong and there was no
explanation.  So I have restored the original location of the final
synchronize_net.

Cc: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:15:50 -08:00
Linus Torvalds
29e553631b Merge branch 'security' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
* 'security' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6:
  mac80211: fix spurious delBA handling
  mac80211: fix two remote exploits
2009-11-30 16:47:16 -08:00
Johannes Berg
827d42c9ac mac80211: fix spurious delBA handling
Lennert Buytenhek noticed that delBA handling in mac80211
was broken and has remotely triggerable problems, some of
which are due to some code shuffling I did that ended up
changing the order in which things were done -- this was

  commit d75636ef9c
  Author: Johannes Berg <johannes@sipsolutions.net>
  Date:   Tue Feb 10 21:25:53 2009 +0100

    mac80211: RX aggregation: clean up stop session

and other parts were already present in the original

  commit d92684e660
  Author: Ron Rindjunsky <ron.rindjunsky@intel.com>
  Date:   Mon Jan 28 14:07:22 2008 +0200

      mac80211: A-MPDU Tx add delBA from recipient support

The first problem is that I moved a BUG_ON before various
checks -- thereby making it possible to hit. As the comment
indicates, the BUG_ON can be removed since the ampdu_action
callback must already exist when the state is != IDLE.

The second problem isn't easily exploitable but there's a
race condition due to unconditionally setting the state to
OPERATIONAL when a delBA frame is received, even when no
aggregation session was ever initiated. All the drivers
accept stopping the session even then, but that opens a
race window where crashes could happen before the driver
accepts it. Right now, a WARN_ON may happen with non-HT
drivers, while the race opens only for HT drivers.

For this case, there are two things necessary to fix it:
 1) don't process spurious delBA frames, and be more careful
    about the session state; don't drop the lock

 2) HT drivers need to be prepared to handle a session stop
    even before the session was really started -- this is
    true for all drivers (that support aggregation) but
    iwlwifi which can be fixed easily. The other HT drivers
    (ath9k and ar9170) are behaving properly already.

Reported-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-30 13:55:51 -05:00
David S. Miller
9b963e5d0e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/ieee802154/fakehard.c
	drivers/net/e1000e/ich8lan.c
	drivers/net/e1000e/phy.c
	drivers/net/netxen/netxen_nic_init.c
	drivers/net/wireless/ath/ath9k/main.c
2009-11-29 00:57:15 -08:00
andrew hendry
2f5517aefc X25: Move SYSCTL ifdefs into header
Moves the CONFIG_SYSCTL ifdefs in x25_init into header.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-29 00:24:59 -08:00
David S. Miller
5656b6ca19 Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vxy/lksctp-dev 2009-11-29 00:16:22 -08:00
Andrei Pelinescu-Onciul
5fdd4baef6 sctp: on T3_RTX retransmit all the in-flight chunks
When retransmitting due to T3 timeout, retransmit all the
in-flight chunks for the corresponding  transport/path, including
chunks sent less then 1 rto ago.
This is the correct behaviour according to rfc4960 section 6.3.3
E3 and
"Note: Any DATA chunks that were sent to the address for which the
 T3-rtx timer expired but did not fit in one MTU (rule E3 above)
 should be marked for retransmission and sent as soon as cwnd
 allows (normally, when a SACK arrives). ".

This fixes problems when more then one path is present and the T3
retransmission of the first chunk that timeouts stops the T3 timer
for the initial active path, leaving all the other in-flight
chunks waiting forever or until a new chunk is transmitted on the
same path and timeouts (and this will happen only if the cwnd
allows sending new chunks, but since cwnd was dropped to MTU by
the timeout => it will wait until the first heartbeat).

Example: 10 packets in flight, sent at 0.1 s intervals on the
primary path. The primary path is down and the first packet
timeouts. The first packet is retransmitted on another path, the
T3 timer for the primary path is stopped and cwnd is set to MTU.
All the other 9 in-flight packets will not be retransmitted
(unless more new packets are sent on the primary path which depend
on cwnd allowing it, and even in this case the 9 packets will be
retransmitted only after a new packet timeouts which even in the
best case would be more then RTO).

This commit reverts d0ce92910b and
also removes the now unused transport->last_rto, introduced in
 b6157d8e03.

p.s  The problem is not only when multiple paths are there.  It
can happen in a single homed environment.  If the application
stops sending data, it possible to have a hung association.

Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-29 00:14:02 -08:00
Samuel Ortiz
67fbb16be6 nl80211: PMKSA caching support
This is an interface to set, delete and flush PMKIDs through nl80211.
Main users would be fullmac devices which firmwares are capable of
generating the RSN IEs for the re-association requests, e.g. iwmc3200wifi.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-28 15:05:05 -05:00
Johannes Berg
8c0c709eea mac80211: move cmntr flag out of rx flags
The RX flags should soon be used only for flags
that cannot change within an a-MPDU, so move the
cooked monitor flag into the RX status flags.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-28 15:05:01 -05:00
Martin Willi
4447bb33f0 xfrm: Store aalg in xfrm_state with a user specified truncation length
Adding a xfrm_state requires an authentication algorithm specified
either as xfrm_algo or as xfrm_algo_auth with a specific truncation
length. For compatibility, both attributes are dumped to userspace,
and we also accept both attributes, but prefer the new syntax.

If no truncation length is specified, or the authentication algorithm
is specified using xfrm_algo, the truncation length from the algorithm
description in the kernel is used.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-25 15:48:38 -08:00
David S. Miller
4ba3eb034f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-11-24 15:01:29 -08:00
Vlad Yasevich
46d5a80855 sctp: Update max.burst implementation
Current implementation of max.burst ends up limiting new
data during cwnd decay period.  The decay is happening becuase
the connection is idle and we are allowed to fill the congestion
window.  The point of max.burst is to limit micro-bursts in response
to large acks.  This still happens, as max.burst is still applied
to each transmit opportunity.  It will also apply if a very large
send is made (greater then allowed by burst).

Tested-by: Florian Niederbacher <florian.niederbacher@student.uibk.ac.at>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:54:00 -05:00
Vlad Yasevich
a5b03ad214 sctp: Turn the enum socket options into defines
Recent attempt to remove deprecated socket options demonstrated
that removing options from the enum space will have severe
binary compatibility issues.  The reason is that it changes
the subsequent enum space and causes option values to be redefined.
To solve this, and to get rid of the ugly double statements for
every option, we simply convert to the #define scheme.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:59 -05:00
Vlad Yasevich
245cba7e55 sctp: Remove useless last_time_used variable
The transport last_time_used variable is rather useless.
It was only used when determining if CWND needs to be updated
due to idle transport.  However, idle transport detection was
based on a Heartbeat timer and last_time_used was not incremented
when sending Heartbeats.  As a result the check for cwnd reduction
was always true.  We can get rid of the variable and just base
our cwnd manipulation on the HB timer (like the code comment sais).
We also have to call into the cwnd manipulation function regardless
of whether HBs are enabled or not.  That way we will detect idle
transports if the user has disabled Heartbeats.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:58 -05:00
Amerigo Wang
a242b41ded sctp: remove deprecated SCTP_GET_*_OLD stuffs
SCTP_GET_*_OLD stuffs are schedlued to be removed.

Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:58 -05:00
Vlad Yasevich
90f2f5318b sctp: Update SWS avaoidance receiver side algorithm
We currently send window update SACKs every time we free up 1 PMTU
worth of data.  That a lot more SACKs then necessary.  Instead, we'll
now send back the actuall window every time we send a sack, and do
window-update SACKs when a fraction of the receive buffer has been
opened.  The fraction is controlled with a sysctl.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:57 -05:00
Vlad Yasevich
6383cfb3ed sctp: Fix malformed "Invalid Stream Identifier" error
The "Invalid Stream Identifier" error has a 16 bit reserved
field at the end, thus making the parameter length be 8 bytes.
We've never supplied that reserved field making wireshark
tag the packet as malformed.

Reported-by: Chris Dischino <cdischino@sonusnet.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:56 -05:00
Wei Yongjun
b93d647174 sctp: implement the sender side for SACK-IMMEDIATELY extension
This patch implement the sender side for SACK-IMMEDIATELY
extension.

  Section 4.1.  Sender Side Considerations

  Whenever the sender of a DATA chunk can benefit from the
  corresponding SACK chunk being sent back without delay, the sender
  MAY set the I-bit in the DATA chunk header.

  Reasons for setting the I-bit include

  o  The sender is in the SHUTDOWN-PENDING state.

  o  The application requests to set the I-bit of the last DATA chunk
     of a user message when providing the user message to the SCTP
     implementation.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:56 -05:00
Eric Dumazet
8964be4a9a net: rename skb->iif to skb->skb_iif
To help grep games, rename iif to skb_iif

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-20 15:35:04 -08:00
Johannes Berg
a58ce43f2f mac80211: avoid spurious deauth frames/messages
With WEXT, it happens frequently that the SME
requests an authentication but then deauthenticates
right away because some new parameters came along.
Every time this happens we print a deauth message
and send a deauth frame, but both of that is rather
confusing. Avoid it by aborting the authentication
process silently, and telling cfg80211 about that.

The patch looks larger than it really is:
__cfg80211_auth_remove() is split out from
cfg80211_send_auth_timeout(), there's no new code
except __cfg80211_auth_canceled() (a one-liner) and
the mac80211 bits (7 new lines of code).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-19 11:09:02 -05:00
Johannes Berg
7351c6bd48 mac80211: request TX status where needed
Right now all frames mac80211 hands to the driver
have the IEEE80211_TX_CTL_REQ_TX_STATUS flag set to
request TX status. This isn't really necessary, only
the injected frames need TX status (the latter for
hostapd) so move setting this flag.

The rate control algorithms also need TX status, but
they don't require it.

Also, rt2x00 uses that bit for its own purposes and
seems to require it being set for all frames, but
that can be fixed in rt2x00.

This doesn't really change anything for any drivers
but in the future drivers using hw-rate control may
opt to not report TX status for frames that don't
have the IEEE80211_TX_CTL_REQ_TX_STATUS flag set.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com> [rt2x00 bits]
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-19 11:08:56 -05:00
Johannes Berg
9bc383de37 cfg80211: introduce capability for 4addr mode
It's very likely that not many devices will support
four-address mode in station or AP mode so introduce
capability bits for both modes, set them in mac80211
and check them when userspace tries to use the mode.
Also, keep track of 4addr in cfg80211 (wireless_dev)
and not in mac80211 any more. mac80211 can also be
improved for the VLAN case by not looking at the
4addr flag but maintaining the station pointer for
it correctly. However, keep track of use_4addr for
station mode in mac80211 to avoid all the derefs.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-19 11:08:53 -05:00
Johannes Berg
5be83de54c cfg80211: convert bools into flags
We've accumulated a number of options for wiphys
which make more sense as flags as we keep adding
more. Convert the existing ones.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-19 11:08:50 -05:00
David S. Miller
3505d1a9fd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/sfc/sfe4001.c
	drivers/net/wireless/libertas/cmd.c
	drivers/staging/Kconfig
	drivers/staging/Makefile
	drivers/staging/rtl8187se/Kconfig
	drivers/staging/rtl8192e/Kconfig
2009-11-18 22:19:03 -08:00
Linus Torvalds
486bfe5c7c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
  cxgb3: fix premature page unmap
  ibm_newemac: Fix EMACx_TRTR[TRT] bit shifts
  vlan: Fix register_vlan_dev() error path
  gro: Fix illegal merging of trailer trash
  sungem: Fix Serdes detection.
  net: fix mdio section mismatch warning
  ppp: fix BUG on non-linear SKB (multilink receive)
  ixgbe: Fixing EEH handler to handle more than one error
  net: Fix the rollback test in dev_change_name()
  Revert "isdn: isdn_ppp: Use SKB list facilities instead of home-grown implementation."
  TI Davinci EMAC : Fix Console Hang when bringing the interface down
  smsc911x: Fix Console Hang when bringing the interface down.
  mISDN: fix error return in HFCmulti_init()
  forcedeth: mac address fix
  r6040: fix version printing
  Bluetooth: Fix regression with L2CAP configuration in Basic Mode
  Bluetooth: Select Basic Mode as default for SOCK_SEQPACKET
  Bluetooth: Set general bonding security for ACL by default
  r8169: Fix receive buffer length when MTU is between 1515 and 1536
  can: add the missing netlink get_xstats_size callback
  ...
2009-11-18 14:54:45 -08:00
Johannes Berg
af65cd96dd mac80211: make software rate control optional
Some devices implement the entire rate control in
firmware in some way, like wl1271 or like iwlwifi
which does some things in software but not a lot.
Therefore generic software rate control is rather
useless for them and just adds avoidable overhead
to the transmit path.

It's fairly simple to let drivers indicate that
they do not need rate control, but they need to
fulfil a number of conditions that we encode in
WARN_ONs.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:24 -05:00
Johannes Berg
c951ad3550 mac80211: convert aggregation to operate on vifs/stas
The entire aggregation code currently operates on the
hw pointer and station addresses, but that needs to
change to make stations purely per-vif; As one step
preparing for that make the aggregation code callable
with the station, or by the combination of virtual
interface and station address.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:15 -05:00
Felix Fietkau
599bf6a4d6 mac80211: add the total ampdu length to tx info
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:10 -05:00
David S. Miller
dfef948ed2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-11-18 10:55:32 -08:00
Rémi Denis-Courmont
eeb74a9d45 Phonet: convert devices list to RCU
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-18 10:08:26 -08:00
Eric W. Biederman
bb9074ff58 Merge commit 'v2.6.32-rc7'
Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler
gets a small bug fix and the sysctl tree where I am removing all
sysctl strategy routines.
2009-11-17 01:01:34 -08:00
David S. Miller
a2bfbc072e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/can/Kconfig
2009-11-17 00:05:02 -08:00
Eric Dumazet
2c1409a0a2 inetpeer: Optimize inet_getid()
While investigating for network latencies, I found inet_getid() was a
contention point for some workloads, as inet_peer_idlock is shared
by all inet_getid() users regardless of peers.

One way to fix this is to make ip_id_count an atomic_t instead
of __u16, and use atomic_add_return().

In order to keep sizeof(struct inet_peer) = 64 on 64bit arches
tcp_ts_stamp is also converted to __u32 instead of "unsigned long".

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 20:46:58 -08:00
Lucian Adrian Grijincu
5256f2ef3a inet: fix inet_bind_bucket_for_each
The first "node" is supposed to be the cursor used in the for_each.

The second "node" is ment literally and should not be macro expanded:
it's the name of the hlist_node field from the inet_bind_bucket.

This currently works because when inet_bind_bucket_for_each is called
it's argument is still "node".

Signed-off-by: Lucian Adrian Grijincu <lgrijincu@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 20:46:56 -08:00
William Allen Simpson
bee7ca9ec0 net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED
Define two symbols needed in both kernel and user space.

Remove old (somewhat incorrect) kernel variant that wasn't used in
most cases.  Default should apply to both RMSS and SMSS (RFC2581).

Replace numeric constants with defined symbols.

Stand-alone patch, originally developed for TCPCT.

Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 20:38:48 -08:00
Vlad Yasevich
409b95aff3 sctp: Set source addresses on the association before adding transports
Recent commit 8da645e101
	sctp: Get rid of an extra routing lookup when adding a transport
introduced a regression in the connection setup.  The behavior was

different between IPv4 and IPv6.  IPv4 case ended up working because the
route lookup routing returned a NULL route, which triggered another
route lookup later in the output patch that succeeded.  In the IPv6 case,
a valid route was returned for first call, but we could not find a valid
source address at the time since the source addresses were not set on the
association yet.  Thus resulted in a hung connection.

The solution is to set the source addresses on the association prior to
adding peers.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 19:56:50 -08:00
Holger Schurig
61fa713c75 cfg80211: return channel noise via survey API
This patch implements the NL80211_CMD_GET_SURVEY command and an get_survey()
ops that a driver can implement. The goal of this command is to allow a
drivers to report channel survey data (e.g. channel noise, channel
occupation).

For now, only the mechanism to report back channel noise has been
implemented.

In future, there will either be a survey-trigger command --- or the existing
scan-trigger command will be enhanced. This will allow user-space to
request survey for arbitrary channels.

Note: any driver that cannot report channel noise should not report
any value at all, e.g. made-up -92 dBm.

Signed-off-by: Holger Schurig <holgerschurig@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:58 -05:00
Rui Paulo
63c5723bc3 mac80211: add nl80211/cfg80211 handling of the new mesh root mode option.
Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:57 -05:00
Rui Paulo
d19b3bf638 mac80211: replace "destination" with "target" to follow the spec
Resulting object files have the same MD5 as before.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:56 -05:00
Eric W. Biederman
f8572d8f2a sysctl net: Remove unused binary sysctl code
Now that sys_sysctl is a compatiblity wrapper around /proc/sys
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
revmoed.

In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and it's callers have been modified not
to pass one.

Cc: "David Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-12 02:05:06 -08:00
Felix Fietkau
8b787643ca nl80211: add a parameter for using 4-address frames on virtual interfaces
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 17:02:07 -05:00
Eric Dumazet
30fff9231f udp: bind() optimisation
UDP bind() can be O(N^2) in some pathological cases.

Thanks to secondary hash tables, we can make it O(N)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 20:54:38 -08:00
Rémi Denis-Courmont
6b0d07ba15 Phonet: put sockets in a hash table
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 20:54:33 -08:00
David S. Miller
f6d773cd4f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-11-09 11:17:24 -08:00
Linus Torvalds
1ce55238e2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
  net/fsl_pq_mdio: add module license GPL
  can: fix WARN_ON dump in net/core/rtnetlink.c:rtmsg_ifinfo()
  can: should not use __dev_get_by_index() without locks
  hisax: remove bad udelay call to fix build error on ARM
  ipip: Fix handling of DF packets when pmtudisc is OFF
  qlge: Set PCIe reset type for EEH to fundamental.
  qlge: Fix early exit from mbox cmd complete wait.
  ixgbe: fix traffic hangs on Tx with ioatdma loaded
  ixgbe: Fix checking TFCS register for TXOFF status when DCB is enabled
  ixgbe: Fix gso_max_size for 82599 when DCB is enabled
  macsonic: fix crash on PowerBook 520
  NET: cassini, fix lock imbalance
  ems_usb: Fix byte order issues on big endian machines
  be2net: Bug fix to send config commands to hardware after netdev_register
  be2net: fix to set proper flow control on resume
  netfilter: xt_connlimit: fix regression caused by zero family value
  rt2x00: Don't queue ieee80211 work after USB removal
  Revert "ipw2200: fix oops on missing firmware"
  decnet: netdevice refcount leak
  netfilter: nf_nat: fix NAT issue in 2.6.30.4+
  ...
2009-11-09 09:51:42 -08:00
Yury Polyanskiy
9e0d57fd6d xfrm: SAD entries do not expire correctly after suspend-resume
This fixes the following bug in the current implementation of
net/xfrm: SAD entries timeouts do not count the time spent by the machine 
in the suspended state. This leads to the connectivity problems because 
after resuming local machine thinks that the SAD entry is still valid, while 
it has already been expired on the remote server.

  The cause of this is very simple: the timeouts in the net/xfrm are bound to 
the old mod_timer() timers. This patch reassigns them to the
CLOCK_REALTIME hrtimer.

  I have been using this version of the patch for a few months on my
machines without any problems. Also run a few stress tests w/o any
issues.

  This version of the patch uses tasklet_hrtimer by Peter Zijlstra
(commit 9ba5f0).

  This patch is against 2.6.31.4. Please CC me.

Signed-off-by: Yury Polyanskiy <polyanskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:58:41 -08:00
Eric Dumazet
512615b6b8 udp: secondary hash on (local port, local address)
Extends udp_table to contain a secondary hash table.

socket anchor for this second hash is free, because UDP
doesnt use skc_bind_node : We define an union to hold
both skc_bind_node & a new hlist_nulls_node udp_portaddr_node

udp_lib_get_port() inserts sockets into second hash chain
(additional cost of one atomic op)

udp_lib_unhash() deletes socket from second hash chain
(additional cost of one atomic op)

Note : No spinlock lockdep annotation is needed, because
lock for the secondary hash chain is always get after
lock for primary hash chain.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:53:06 -08:00
Eric Dumazet
d4cada4ae1 udp: split sk_hash into two u16 hashes
Union sk_hash with two u16 hashes for udp (no extra memory taken)

One 16 bits hash on (local port) value (the previous udp 'hash')

One 16 bits hash on (local address, local port) values, initialized
but not yet used. This second hash is using jenkin hash for better
distribution.

Because the 'port' is xored later, a partial hash is performed
on local address + net_hash_mix(net)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:53:05 -08:00
Eric Dumazet
fdcc8aa953 udp: add a counter into udp_hslot
Adds a counter in udp_hslot to keep an accurate count
of sockets present in chain.

This will permit to upcoming UDP lookup algo to chose
the shortest chain when secondary hash is added.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:53:04 -08:00
Eric W. Biederman
81adee47df net: Support specifying the network namespace upon device creation.
There is no good reason to not support userspace specifying the
network namespace during device creation, and it makes it easier
to create a network device and pass it to a child network namespace
with a well known name.

We have to be careful to ensure that the target network namespace
for the new device exists through the life of the call.  To keep
that logic clear I have factored out the network namespace grabbing
logic into rtnl_link_get_net.

In addtion we need to continue to pass the source network namespace
to the rtnl_link_ops.newlink method so that we can find the base
device source network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2009-11-08 00:53:51 -08:00
David S. Miller
10d626f4f4 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/lowpan/lowpan 2009-11-06 17:57:51 -08:00
Johannes Berg
af81858172 mac80211: async station powersave handling
Some devices require that all frames to a station
are flushed when that station goes into powersave
mode before being able to send frames to that
station again when it wakes up or polls -- all in
order to avoid reordering and too many or too few
frames being sent to the station when it polls.

Normally, this is the case unless the station
goes to sleep and wakes up very quickly again.
But in that case, frames for it may be pending
on the hardware queues, and thus races could
happen in the case of multiple hardware queues
used for QoS/WMM. Normally this isn't a problem,
but with the iwlwifi mechanism we need to make
sure the race doesn't happen.

This makes mac80211 able to cope with the race
with driver help by a new WLAN_STA_PS_DRIVER
per-station flag that can be controlled by the
driver and tells mac80211 whether it can transmit
frames or not. This flag must be set according to
very specific rules outlined in the documentation
for the function that controls it.

When we buffer new frames for the station, we
normally set the TIM bit right away, but while
the driver has blocked transmission to that sta
we need to avoid that as well since we cannot
respond to the station if it wakes up due to the
TIM bit. Once the driver unblocks, we can set
the TIM bit.

Similarly, when the station just wakes up, we
need to wait until all other frames are flushed
before we can transmit frames to that station,
so the same applies here, we need to wait for
the driver to give the OK.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-06 16:49:10 -05:00
David S. Miller
62d83681e5 Merge branch 'linux-2.6.33.y' of git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax 2009-11-06 05:01:54 -08:00
Dmitry Eremin-Solenikov
bb1cafb8fc ieee802154: add support for creation/removal of logic interfaces
Add support for two more NL802154 commands: ADD_IFACE and DEL_IFACE,
thus allowing creation and removal of logic WPAN interfaces on the top
of wpan-phy.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:32:24 +03:00
Dmitry Eremin-Solenikov
42723448b8 ieee802154: add an mlme_ops call to retrieve PHY object
ops->get_phy should increment reference to wpan-phy. As we return
the external structure, we should do refcounting correctly.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:32:18 +03:00
Dmitry Eremin-Solenikov
a9966b580a ieee802154: constify struct net_device in some operations
Some of ieee802154 operations really shouldn't change passed net_device.
Constify passed argument.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:31:16 +03:00
Dmitry Eremin-Solenikov
e9cf356c0c wpan-phy: follow usual patter of devices registration
Follow the usual pattern of devices registration by adding new function
(wpan_phy_set_dev) that sets child->parent relationship and removing
parent argument from wpan_phy_register call.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:29:50 +03:00
Dmitry Eremin-Solenikov
a0b4a738e0 wpan-phy: allow specifying a per-page channel mask
IEEE 802.15.4-2006 defines channel pages that hold channels (max 32 pages,
27 channels per page). Allow the driver to specify supported channels
on pages, other than the first one.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:23:41 +03:00
Dmitry Eremin-Solenikov
1c889f4db6 wpan-phy: add wpan-phy iteration functions
Add API to iterate over the wpan-phy instances.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:12:24 +03:00
Dmitry Eremin-Solenikov
69d9ab96f9 wpan-phy: add a helper to put the wpan_phy device
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:10:32 +03:00
David S. Miller
230f9bb701 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/usb/cdc_ether.c

All CDC ethernet devices of type USB_CLASS_COMM need to use
'&mbm_info'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 00:55:55 -08:00
Jozsef Kadlecsik
f9dd09c7f7 netfilter: nf_nat: fix NAT issue in 2.6.30.4+
Vitezslav Samel discovered that since 2.6.30.4+ active FTP can not work
over NAT. The "cause" of the problem was a fix of unacknowledged data
detection with NAT (commit a3a9f79e36).
However, actually, that fix uncovered a long standing bug in TCP conntrack:
when NAT was enabled, we simply updated the max of the right edge of
the segments we have seen (td_end), by the offset NAT produced with
changing IP/port in the data. However, we did not update the other parameter
(td_maxend) which is affected by the NAT offset. Thus that could drift
away from the correct value and thus resulted breaking active FTP.

The patch below fixes the issue by *not* updating the conntrack parameters
from NAT, but instead taking into account the NAT offsets in conntrack in a
consistent way. (Updating from NAT would be more harder and expensive because
it'd need to re-calculate parameters we already calculated in conntrack.)

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 00:43:42 -08:00
Eric Paris
13f18aa05f net: drop capability from protocol definitions
struct can_proto had a capability field which wasn't ever used.  It is
dropped entirely.

struct inet_protosw had a capability field which can be more clearly
expressed in the code by just checking if sock->type = SOCK_RAW.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-05 21:40:17 -08:00
Gilad Ben-Yossef
6a2a2d6bf8 tcp: Use defaults when no route options are available
Trying to parse the option of a SYN packet that we have
no route entry for should just use global wide defaults
for route entry options.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Tested-by: Valdis.Kletnieks@vt.edu
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-04 23:24:15 -08:00
Johannes Berg
5ed176e1c4 mac80211: make ieee80211_find_sta per virtual interface
Since we have a TODO item to make all station
management dependent on virtual interfaces, I
figured I'd start with pushing such a change
to drivers before more drivers start using the
ieee80211_find_sta() API with a hw pointer and
cause us grief later on.

For now continue exporting the old API in form
of ieee80211_find_sta_by_hw(), but discourage
its use strongly.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-04 18:44:48 -05:00
Eric Dumazet
fd2c3ef761 net: cleanup include/net
This cleanup patch puts struct/union/enum opening braces,
in first line to ease grep games.

struct something
{

becomes :

struct something {

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-04 05:06:25 -08:00
Linus Torvalds
a84216e671 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  mac80211: check interface is down before type change
  cfg80211: fix NULL ptr deref
  libertas if_usb: Fix crash on 64-bit machines
  mac80211: fix reason code output endianness
  mac80211: fix addba timer
  ath9k: fix misplaced semicolon on rate control
  b43: Fix DMA TX bounce buffer copying
  mac80211: fix BSS leak
  rt73usb.c : more ids
  ipw2200: fix oops on missing firmware
  gre: Fix dev_addr clobbering for gretap
  sky2: set carrier off in probe
  net: fix sk_forward_alloc corruption
  pcnet_cs: add cis of PreMax PE-200 ethernet pcmcia card
  r8169: Fix card drop incoming VLAN tagged MTU byte large jumbo frames
  ibmtr: possible Read buffer overflow?
  net: Fix RPF to work with policy routing
  net: fix kmemcheck annotations
  e1000e: rework disable K1 at 1000Mbps for 82577/82578
  e1000e: config PHY via software after resets
  ...
2009-11-03 07:44:01 -08:00
Eric Van Hensbergen
3e2796a90c 9p: fix readdir corner cases
The patch below also addresses a couple of other corner cases in readdir
seen with a large (e.g. 64k) msize.  I'm not sure what people think of
my co-opting of fid->aux here.  I'd be happy to rework if there's a better
way.

When the size of the user supplied buffer passed to readdir is smaller
than the data returned in one go by the 9P read request, v9fs_dir_readdir()
currently discards extra data so that, on the next call, a 9P read
request will be issued with offset < previous offset + bytes returned,
which voilates the constraint described in paragraph 3 of read(5) description.
This patch preseves the leftover data in fid->aux for use in the next call.

Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-11-02 08:43:45 -06:00
Johannes Berg
c27f2fded5 mac80211: deprecate qual value
This value is unused by mac80211, because it was only
be used by wireless extensions, and turned out to not
be useful there because the quality value needs to be
comparable between scan results and the current value
which is impossible when the qual value is calculated
taking into account noise, for example.

Since it is unused anyway, this patch deprecates it
in the hope that drivers will remove their sometimes
quite expensive calculations of the value.

I'm open to actual uses of the value, but the best
way of using it seems to be what the Intel drivers do
which should probably be generalised if we have noise
values from the hardware.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-30 16:50:39 -04:00
Johannes Berg
eddcbb94f7 mac80211: introduce ieee80211_beacon_get_tim()
Compared to ieee80211_beacon_get(), the new function
ieee80211_beacon_get_tim() returns information on the
location and length of the TIM IE, which some drivers
need in order to generate the TIM on the device. The
old function, ieee80211_beacon_get(), becomes a small
static inline wrapper around the new one to not break
all drivers.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-30 16:50:39 -04:00
Johannes Berg
0869aea0eb mac80211: remove RX_FLAG_RADIOTAP
While there may be a case for a driver adding its
own bits of radiotap information, none currently
does. Also, drivers would have to copy the code
to generate the radiotap bits that now mac80211
generates. If some driver in the future needs to
add some driver-specific information I'd expect
that to be in a radiotap vendor namespace and we
can add a different way of passing such data up
and having mac80211 include it.

Additionally, rename IEEE80211_CONF_RADIOTAP to
IEEE80211_CONF_MONITOR since it's still used by
b43(legacy) to obtain per-frame timestamps.

The purpose of this patch is to simplify the RX
code in mac80211 to make it easier to add paged
skb support.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-30 16:49:20 -04:00
Johannes Berg
6a86b9c78e mac80211: fix radiotap header generation
In

  commit 601ae7f25a
  Author: Bruno Randolf <br1@einfach.org>
  Date:   Thu May 8 19:22:43 2008 +0200

      mac80211: make rx radiotap header more flexible

code was added that tried to align the radiotap header
position in memory based on the radiotap header length.
Quite obviously, that is completely useless.

Instead of trying to do that, use unaligned accesses
to generate the radiotap header. To properly do that,
we also need to mark struct ieee80211_radiotap_header
packed, but that is fine since it's already packed
(and it should be marked packed anyway since its a
wire format).

Cc: Bruno Randolf <br1@einfach.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-30 16:49:20 -04:00
jamal
b0c110ca8e net: Fix RPF to work with policy routing
Policy routing is not looked up by mark on reverse path filtering.
This fixes it.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29 22:49:12 -07:00
David S. Miller
ed3f2e40f3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-10-29 02:47:13 -07:00
Gilad Ben-Yossef
0c3adfb8ec Add dst_feature to query route entry features
Adding an accessor to existing  dst_entry feautres field and
refactor the only supported feature (allfrag) to use it.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29 01:28:42 -07:00
Gilad Ben-Yossef
022c3f7d82 Allow tcp_parse_options to consult dst entry
We need tcp_parse_options to be aware of dst_entry to
take into account per dst_entry TCP options settings

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29 01:28:41 -07:00
Eric Dumazet
23289a37e2 net: add a list_head parameter to dellink() method
Adding a list_head parameter to rtnl_link_ops->dellink() methods
allow us to queue devices on a list, in order to dismantle
them all at once.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28 02:22:07 -07:00
Kalle Valo
e36e49f733 mac80211: add ieee80211_rx_ni()
ieee80211_rx() must be called with bottom halves disabled. To simplify
driver development implement ieee80211_rx_ni() which disables BH. This
function must be used when in process context.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-27 16:48:21 -04:00
Holger Schurig
ce470613cd cfg80211: no cookies in cfg80211_send_XXX()
Get rid of cookies in cfg80211_send_XXX() functions.

Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-27 16:48:16 -04:00
David S. Miller
cfadf853f6 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/sh_eth.c
2009-10-27 01:03:26 -07:00
Eric Dumazet
7c28bd0b8e rtnetlink: speedup rtnl_dump_ifinfo()
When handling large number of netdevice, rtnl_dump_ifinfo()
is very slow because it has O(N^2) complexity.

Instead of scanning one single list, we can use the 256 sub lists
of the dev_index hash table.

This considerably speedups "ip link" operations

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-24 06:13:17 -07:00
Eric Dumazet
ef9a9d1183 ipv6 sit: RCU conversion phase I
SIT tunnels use one rwlock to protect their prl entries.

This first patch adds RCU locking for prl management,
with standard call_rcu() calls.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-24 06:07:55 -07:00
jamal
1c55d62e77 pkt_sched: skbedit add support for setting mark
This adds support for setting the skb mark.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-22 21:56:42 -07:00
Krishna Kumar
ea94ff3b55 net: Fix for dst_negative_advice
dst_negative_advice() should check for changed dst and reset
sk_tx_queue_mapping accordingly. Pass sock to the callers of
dst_negative_advice.

(sk_reset_txq is defined just for use by dst_negative_advice. The
only way I could find to get around this is to move dst_negative_()
from dst.h to dst.c, include sock.h in dst.c, etc)

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 18:55:46 -07:00
Krishna Kumar
e022f0b4a0 net: Introduce sk_tx_queue_mapping
Introduce sk_tx_queue_mapping; and functions that set, test and
get this value. Reset sk_tx_queue_mapping to -1 whenever the dst
cache is set/reset, and in socket alloc. Setting txq to -1 and
using valid txq=<0 to n-1> allows the tx path to use the value
of sk_tx_queue_mapping directly instead of subtracting 1 on every
tx.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 18:55:45 -07:00
Eric Dumazet
abf90cca97 net: Fix struct inet_timewait_sock bitfield annotation
commit 9e337b0f (net: annotate inet_timewait_sock bitfields)
added 4/8 bytes in struct inet_timewait_sock.

Fix this by declaring tw_ipv6_offset in the 'flags' bitfield
The 14 bits hole is named tw_pad to make it cleary apparent.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 01:13:26 -07:00
Arnaldo Carvalho de Melo
748879776e net: Avoid compiler warning for mmsghdr when CONFIG_COMPAT is not selected
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 01:09:17 -07:00
Inaky Perez-Gonzalez
c2315b4ea9 wimax/i2400m: clarify and fix i2400m->{ready,updown}
The i2400m driver uses two different bits to distinguish how much the
driver is up. i2400m->ready is used to denote that the infrastructure
to communicate with the device is up and running. i2400m->updown is
used to indicate if 'ready' and the device is up and running, ready to
take control and data traffic.

However, all this was pretty dirty and not clear, with many open spots
where race conditions were present.

This commit cleans up the situation by:

 - documenting the usage of both bits

 - setting them only in specific, well controlled places
   (i2400m_dev_start, i2400m_dev_stop)

 - ensuring the i2400m workqueue can't get in the middle of the
   setting by flushing it when i2400m->ready is set to zero. This
   allows the report hook not having to check again for the bit to be
   set [rx.c:i2400m_report_hook_work()].

 - using i2400m->updown to determine if the device is up and running
   instead of the wimax state in i2400m_dev_reset_handle().

 - not loosing missed messages sent by the hardware before
   i2400m->ready is set. In rx.c, whatever the device sends can be
   sent to user space over the message pipes as soon as the wimax
   device is registered, so don't wait for i2400m->ready to be set.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
2009-10-19 15:56:07 +09:00
Steffen Klassert
eb2ff967a5 xfrm: remove skb_icv_walk
The last users of skb_icv_walk are converted to ahash now,
so skb_icv_walk is unused and can be removed.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-18 21:32:01 -07:00
Steffen Klassert
2ad9afbf5c ah: Remove obsolete code
ah4 and ah6 are converted to ahash now, so we can remove the
code for the obsolete hash algorithm.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-18 21:32:00 -07:00
Steffen Klassert
49cbf95248 ah: Add struct crypto_ahash to ah_data
To support for ahash algorithms, we add a pointer to a
crypto_ahash to ah_data.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-18 21:31:58 -07:00
Eric Dumazet
c720c7e838 inet: rename some inet_sock fields
In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

Goal is to transfert fields used at lookup time in the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path)

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-18 18:52:53 -07:00
Rémi Denis-Courmont
f062f41d06 Phonet: routing table Netlink interface
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-14 15:04:17 -07:00
Rémi Denis-Courmont
55748ac046 Phonet: routing table backend
The Phonet "universe" only has 64 addresses, so we keep a trivial flat
routing table.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-14 15:04:16 -07:00
Rémi Denis-Courmont
f14001fcd7 Phonet: deliver broadcast packets to broadcast sockets
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-14 15:04:15 -07:00
David S. Miller
421355de87 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-10-13 12:55:20 -07:00
David S. Miller
417c5233db Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-10-13 11:41:34 -07:00
Eric Dumazet
f373b53b5f tcp: replace ehash_size by ehash_mask
Storing the mask (size - 1) instead of the size allows fast path to be
a bit faster.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 03:44:02 -07:00
Arnaldo Carvalho de Melo
a2e2725541 net: Introduce recvmmsg socket syscall
Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.

Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.

This takes into account comments made by:

. Paul Moore: sock_recvmsg is called only for the first datagram,
  sock_recvmsg_nosec is used for the rest.

. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
  works in the same fashion as the ppoll one.

  If the underlying protocol returns a datagram with MSG_OOB set, this
  will make recvmmsg return right away with as many datagrams (+ the OOB
  one) it has received so far.

. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
  datagrams and then recvmsg returns an error, recvmmsg will return
  the successfully received datagrams, store the error and return it
  in the next call.

This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 23:40:10 -07:00
Neil Horman
3b885787ea net: Generalize socket rx gap / receive queue overflow cmsg
Create a new socket level option to report number of queue overflows

Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames.  This value was
exported via a SOL_PACKET level cmsg.  AFter I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option.  As such I've created this patch, It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
overflowed between any two given frames.  It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count).  Tested
successfully by me.

Notes:

1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.

2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if thats
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me.  This also saves us having
to code in a per-protocol opt in mechanism.

3) This applies cleanly to net-next assuming that commit
977750076d (my af packet cmsg patch) is reverted

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 13:26:31 -07:00
Johannes Berg
d20ef63d32 mac80211: document ieee80211_rx() context requirement
ieee80211_rx() must be called with softirqs disabled
since the networking stack requires this for netif_rx()
and some code in mac80211 can assume that it can not
be processing its own tasklet and this call at the same
time.

It may be possible to remove this requirement after a
careful audit of mac80211 and doing any needed locking
improvements in it along with disabling softirqs around
netif_rx(). An alternative might be to push all packet
processing to process context in mac80211, instead of
to the tasklet, and add other synchronisation.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-12 15:55:53 -04:00
Eric Dumazet
5fdb9973c1 net: Fix struct sock bitfield annotation
Since commit a98b65a3 (net: annotate struct sock bitfield), we lost
8 bytes in struct sock on 64bit arches because of
kmemcheck_bitfield_end(flags) misplacement.

Fix this by putting together sk_shutdown, sk_no_check, sk_userlocks,
sk_protocol and sk_type in the 'flags' 32bits bitfield

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-11 23:03:52 -07:00
David S. Miller
8aa0f64ac3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-10-09 14:40:09 -07:00
Eric Dumazet
f86dcc5aa8 udp: dynamically size hash tables at boot time
UDP_HTABLE_SIZE was initialy defined to 128, which is a bit small for
several setups.

4000 active UDP sockets -> 32 sockets per chain in average. An
incoming frame has to lookup all sockets to find best match, so long
chains hurt latency.

Instead of a fixed size hash table that cant be perfect for every
needs, let UDP stack choose its table size at boot time like tcp/ip
route, using alloc_large_system_hash() helper

Add an optional boot parameter, uhash_entries=x so that an admin can
force a size between 256 and 65536 if needed, like thash_entries and
rhash_entries.

dmesg logs two new lines :
[    0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
[    0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)

Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non
debugging spinlocks.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 22:00:22 -07:00
Kalle Valo
dfce95f51f cfg80211: add firmware and hardware version to wiphy
It's useful to provide firmware and hardware version to user space and have a
generic interface to retrieve them. Users can provide the version information
in bug reports etc.

Add fields for firmware and hardware version to struct wiphy.

(Dropped nl80211 bits for now and modified remaining bits in favor of
ethtool. -- JWL)

Cc: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:39:46 -04:00
Johannes Berg
3d23e349d8 wext: refactor
Refactor wext to
 * split out iwpriv handling
 * split out iwspy handling
 * split out procfs support
 * allow cfg80211 to have wireless extensions compat code
   w/o CONFIG_WIRELESS_EXT

After this, drivers need to
 - select WIRELESS_EXT	- for wext support
 - select WEXT_PRIV	- for iwpriv support
 - select WEXT_SPY	- for iwspy support

except cfg80211 -- which gets new hooks in wext-core.c
and can then get wext handlers without CONFIG_WIRELESS_EXT.

Wireless extensions procfs support is auto-selected
based on PROC_FS and anything that requires the wext core
(i.e. WIRELESS_EXT or CFG80211_WEXT).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:39:43 -04:00
Stephen Hemminger
ec1b4cf74c net: mark net_proto_ops as const
All usages of structure net_proto_ops should be declared const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:10:46 -07:00
Eric Dumazet
d250a5f90e pkt_sched: gen_estimator: Dont report fake rate estimators
Jarek Poplawski a écrit :
>
>
> Hmm... So you made me to do some "real" work here, and guess what?:
> there is one serious checkpatch warning! ;-) Plus, this new parameter
> should be added to the function description. Otherwise:
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
>
> Thanks,
> Jarek P.
>
> PS: I guess full "Don't" would show we really mean it...

Okay :) Here is the last round, before the night !

Thanks again

[RFC] pkt_sched: gen_estimator: Don't report fake rate estimators

We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
is running.

# tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0

User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
one (because no estimator is active)

After this patch, tc command output is :
$ tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

We add a parameter to gnet_stats_copy_rate_est() function so that
it can use gen_estimator_active(bstats, r), as suggested by Jarek.

This parameter can be NULL if check is not necessary, (htb for
example has a mandatory rate estimator)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:07:42 -07:00
YOSHIFUJI Hideaki / 吉藤英明
fa857afcf7 ipv6 sit: 6rd (IPv6 Rapid Deployment) Support.
IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon
mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly
deploy IPv6 unicast service to IPv4 sites to which it provides
customer premise equipment.  Like 6to4, it utilizes stateless IPv6 in
IPv4 encapsulation in order to transit IPv4-only network
infrastructure.  Unlike 6to4, a 6rd service provider uses an IPv6
prefix of its own in place of the fixed 6to4 prefix.

With this option enabled, the SIT driver offers 6rd functionality by
providing additional ioctl API to configure the IPv6 Prefix for in
stead of static 2002::/16 for 6to4.

Original patch was done by Alexandre Cassen <acassen@freebox.fr>
based on old Internet-Draft.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:07:37 -07:00
Eric Dumazet
bcdce7195e net: speedup sk_wake_async()
An incoming datagram must bring into cpu cache *lot* of cache lines,
in particular : (other parts omitted (hash chains, ip route cache...))

On 32bit arches :

offsetof(struct sock, sk_rcvbuf)       =0x30    (read)
offsetof(struct sock, sk_lock)         =0x34   (rw)

offsetof(struct sock, sk_sleep)        =0x50 (read)
offsetof(struct sock, sk_rmem_alloc)   =0x64   (rw)
offsetof(struct sock, sk_receive_queue)=0x74   (rw)

offsetof(struct sock, sk_forward_alloc)=0x98   (rw)

offsetof(struct sock, sk_callback_lock)=0xcc    (rw)
offsetof(struct sock, sk_drops)        =0xd8 (read if we add dropcount support, rw if frame dropped)
offsetof(struct sock, sk_filter)       =0xf8    (read)

offsetof(struct sock, sk_socket)       =0x138 (read)

offsetof(struct sock, sk_data_ready)   =0x15c   (read)


We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
with no fasync() structures. (socket->fasync_list ptr is probably already in cache
because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)

This avoids one cache line load per incoming packet for common cases (no fasync())

We can leave (or even move in a future patch) sk->sk_socket in a cold location

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-06 17:28:29 -07:00
Eric Dumazet
0bfbedb14a tunnels: Optimize tx path
We currently dirty a cache line to update tunnel device stats
(tx_packets/tx_bytes). We better use the txq->tx_bytes/tx_packets
counters that already are present in cpu cache, in the cache
line shared with txq->_xmit_lock

This patch extends IPTUNNEL_XMIT() macro to use txq pointer
provided by the caller.

Also &tunnel->dev->stats can be replaced by &dev->stats

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:21:57 -07:00
Stephen Hemminger
16c6cf8bb4 ipv4: fib table algorithm performance improvement
The FIB algorithim for IPV4 is set at compile time, but kernel goes through
the overhead of function call indirection at runtime. Save some
cycles by turning the indirect calls to direct calls to either
hash or trie code.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:21:56 -07:00
Christoph Lameter
4ea7334b6d this_cpu: Use this_cpu ops for network statistics
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-10-03 19:48:22 +09:00
Christoph Lameter
4eb41d10c7 this_cpu: Use this_cpu operations for SNMP statistics
SNMP statistic macros can be signficantly simplified.
This will also reduce code size if the arch supports these operations
in hardware.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-10-03 19:48:22 +09:00
David S. Miller
b7058842c9 net: Make setsockopt() optlen be unsigned.
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30 16:12:20 -07:00
Johannes Berg
8f1546cadf wext: add back wireless/ dir in sysfs for cfg80211 interfaces
The move away from having drivers assign wireless handlers,
in favour of making cfg80211 assign them, broke the sysfs
registration (the wireless/ dir went missing) because the
handlers are now assigned only after registration, which is
too late.

Fix this by special-casing cfg80211-based devices, all
of which are required to have an ieee80211_ptr, in the
sysfs code, and also using get_wireless_stats() to have
the same values reported as in procfs.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Tested-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-28 16:55:07 -04:00
Sascha Hlusiak
d1f8297a96 Revert "sit: stateless autoconf for isatap"
This reverts commit 645069299a.

While the code does not actually break anything, it does not completely follow
RFC5214 yet. After talking back with Fred L. Templin, I agree that completing the
ISATAP specific RS/RA code, would pollute the kernel a lot with code that is better
implemented in userspace.

The kernel should not send RS packages for ISATAP at all.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Acked-by: Fred L. Templin <Fred.L.Templin@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-26 20:28:07 -07:00
Eric Dumazet
a43912ab19 tunnel: eliminate recursion field
It seems recursion field from "struct ip_tunnel" is not anymore needed.
recursion prevention is done at the upper level (in dev_queue_xmit()),
since we use HARD_TX_LOCK protection for tunnels.

This avoids a cache line ping pong on "struct ip_tunnel" : This structure
should be now mostly read on xmit and receive paths.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-24 15:39:22 -07:00
Alexey Dobriyan
8d65af789f sysctl: remove "struct file *" argument of ->proc_handler
It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:04 -07:00
Abhishek Kulkarni
60e78d2c99 9p: Add fscache support to 9p
This patch adds a persistent, read-only caching facility for
9p clients using the FS-Cache caching backend.

When the fscache facility is enabled, each inode is associated
with a corresponding vcookie which is an index into the FS-Cache
indexing tree. The FS-Cache indexing tree is indexed at 3 levels:
- session object associated with each mount.
- inode/vcookie
- actual data (pages)

A cache tag is chosen randomly for each session. These tags can
be read off /sys/fs/9p/caches and can be passed as a mount-time
parameter to re-attach to the specified caching session.

Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-09-23 13:03:46 -05:00
Jarek Poplawski
926e61b7c4 pkt_sched: Fix tx queue selection in tc_modify_qdisc
After the recent mq change there is the new select_queue qdisc class
method used in tc_modify_qdisc, but it works OK only for direct child
qdiscs of mq qdisc. Grandchildren always get the first tx queue, which
would give wrong qdisc_root etc. results (e.g. for sch_htb as child of
sch_prio). This patch fixes it by using parent's dev_queue for such
grandchildren qdiscs. The select_queue method's return type is changed
BTW.

With feedback from: Patrick McHardy <kaber@trash.net>

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-15 02:53:07 -07:00
Moni Shoua
75c78500dd bonding: remap muticast addresses without using dev_close() and dev_open()
This patch fixes commit e36b9d16c6. The approach
there is to call dev_close()/dev_open() whenever the device type is changed in
order to remap the device IP multicast addresses to HW multicast addresses.
This approach suffers from 2 drawbacks:

*. It assumes tha the device is UP when calling dev_close(), or otherwise
   dev_close() has no affect. It is worth to mention that initscripts (Redhat)
   and sysconfig (Suse) doesn't act the same in this matter. 
*. dev_close() has other side affects, like deleting entries from the routing
   table, which might be unnecessary.

The fix here is to directly remap the IP multicast addresses to HW multicast
addresses for a bonding device that changes its type, and nothing else.
   
Reported-by:   Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-15 02:37:40 -07:00
Ilpo Järvinen
0b6a05c1db tcp: fix ssthresh u16 leftover
It was once upon time so that snd_sthresh was a 16-bit quantity.
...That has not been true for long period of time. I run across
some ancient compares which still seem to trust such legacy.
Put all that magic into a single place, I hopefully found all
of them.

Compile tested, though linking of allyesconfig is ridiculous
nowadays it seems.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-15 01:30:10 -07:00
Alexey Dobriyan
41135cc836 net: constify struct inet6_protocol
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-14 17:03:05 -07:00
Alexey Dobriyan
32613090a9 net: constify struct net_protocol
Remove long removed "inet_protocol_base" declaration.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-14 17:03:01 -07:00
David S. Miller
9a0da0d19c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2009-09-10 18:17:09 -07:00
Patrick McHardy
23bcf634c8 net_sched: fix estimator lock selection for mq child qdiscs
When new child qdiscs are attached to the mq qdisc, they are actually
attached as root qdiscs to the device queues. The lock selection for
new estimators incorrectly picks the root lock of the existing and
to be replaced qdisc, which results in a use-after-free once the old
qdisc has been destroyed.

Mark mq qdisc instances with a new flag and treat qdiscs attached to
mq as children similar to regular root qdiscs.

Additionally prevent estimators from being attached to the mq qdisc
itself since it only updates its byte and packet counters during dumps.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-09 18:11:23 -07:00
David S. Miller
6ec1c69a8f net_sched: add classful multiqueue dummy scheduler
This patch adds a classful dummy scheduler which can be used as root qdisc
for multiqueue devices and exposes each device queue as a child class.

This allows to address queues individually and graft them similar to regular
classes. Additionally it presents an accumulated view of the statistics of
all real root qdiscs in the dummy root.

Two new callbacks are added to the qdisc_ops and qdisc_class_ops:

- cl_ops->select_queue selects the tx queue number for new child classes.

- qdisc_ops->attach() overrides root qdisc device grafting to attach
  non-shared qdiscs to the queues.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-06 02:07:05 -07:00
Patrick McHardy
589983cd21 net_sched: move dev_graft_qdisc() to sch_generic.c
It will be used in a following patch by the multiqueue qdisc.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-06 02:07:05 -07:00
Wei Yongjun
9237ccbc0b sctp: turn flags in 'struct sctp_association' into bit fields
This shrinks the size of struct sctp_association a little.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:02 -04:00
Bhaskar Dutta
723884339f sctp: Sysctl configuration for IPv4 Address Scoping
This patch introduces a new sysctl option to make IPv4 Address Scoping
configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>.

In networking environments where DNAT rules in iptables prerouting
chains convert destination IP's to link-local/private IP addresses,
SCTP connections fail to establish as the INIT chunk is dropped by the
kernel due to address scope match failure.
For example to support overlapping IP addresses (same IP address with
different vlan id) a Layer-5 application listens on link local IP's,
and there is a DNAT rule that maps the destination IP to a link local
IP. Such applications never get the SCTP INIT if the address-scoping
draft is strictly followed.

This sysctl configuration allows SCTP to function in such
unconventional networking environments.

Sysctl options:
0 - Disable IPv4 address scoping draft altogether
1 - Enable IPv4 address scoping (default, current behavior)
2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
3 - Enable address scoping but allow IPv4 link local address in init/init-ack

Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:01 -04:00
Vlad Yasevich
a803c94230 sctp: Turn flags in 'sctp_packet' into bit fields
This shrinks the size of sctp_packet a little.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:01 -04:00
Vlad Yasevich
f68b2e05f3 sctp: Fix SCTP_MAXSEG socket option to comply to spec.
We had a bug that we never stored the user-defined value for
MAXSEG when setting the value on an association.  Thus future
PMTU events ended up re-writing the frag point and increasing
it past user limit.  Additionally, when setting the option on
the socket/endpoint, we effect all current associations, which
is against spec.

Now, we store the user 'maxseg' value along with the computed
'frag_point'.  We inherit 'maxseg' from the socket at association
creation and use it as an upper limit for 'frag_point' when its
set.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:00 -04:00
Vlad Yasevich
cb95ea32a4 sctp: Don't do NAGLE delay on large writes that were fragmented small
SCTP will delay the last part of a large write due to NAGLE, if that
part is smaller then MTU.  Since we are doing large writes, we might
as well send the last portion now instead of waiting untill the next
large write happens.  The small portion will be sent as is regardless,
so it's better to not delay it.

This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com>
and Doug Graham <dgraham@nortel.com>.  Many thanks go out to them.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:59 -04:00
Vlad Yasevich
4d3c46e683 sctp: drop a_rwnd to 0 when receive buffer overflows.
SCTP has a problem that when small chunks are used, it is possible
to exhaust the receiver buffer without fully closing receive window.
This happens due to all overhead that we have account for with small
messages.  To fix this, when receive buffer is exceeded, we'll drop
the window to 0 and save the 'drop' portion.  When application starts
reading data and freeing up recevie buffer space, we'll wait until
we've reached the 'drop' window and then add back this 'drop' one
mtu at a time.  This worked well in testing and under stress produced
rather even recovery.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:59 -04:00
Vlad Yasevich
9c5c62be2f sctp: Send user messages to the lower layer as one
Currenlty, sctp breaks up user messages into fragments and
sends each fragment to the lower layer by itself.  This means
that for each fragment we go all the way down the stack
and back up.  This also discourages bundling of multiple
fragments when they can fit into a sigle packet (ex: due
to user setting a low fragmentation threashold).

We introduce a new command SCTP_CMD_SND_MSG and hand the
whole message down state machine.  The state machine and
the side-effect parser will cork the queue, add all chunks
from the message to the queue, and then un-cork the queue
thus causing the chunks to get transmitted.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:57 -04:00
Vlad Yasevich
bec9640bb0 sctp: Disallow new connection on a closing socket
If a socket has a lot of association that are in the process of
of being closed/aborted, it is possible for a remote to establish
new associations during the time period that the old ones are shutting
down.  If this was a result of a close() call, there will be no socket
and will cause a memory leak.  We'll prevent this by setting the
socket state to CLOSING and disallow new associations when in this state.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:56 -04:00
Rami Rosen
b4e8c6a7e6 sctp: remove unused union (sctp_cmsg_data_t) definition
This patch removes an unused union definition (sctp_cmsg_data_t)
from include/net/sctp/user.h.

Signed-off-by: Rami Rosen <rosenrami@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:55 -04:00
Wu Fengguang
aa1330766c tcp: replace hard coded GFP_KERNEL with sk_allocation
This fixed a lockdep warning which appeared when doing stress
memory tests over NFS:

	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.

	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock

	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
			tcp_send_fin => alloc_skb_fclone => page reclaim

David raised a concern that if the allocation fails in tcp_send_fin(), and it's
GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
for the allocation to succeed.

But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could
loop endlessly under memory pressure.

CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: David S. Miller <davem@davemloft.net>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 23:45:45 -07:00
Eric Dumazet
2e59af3dcb vlan: multiqueue vlan device
vlan devices are currently not multi-queue capable.

We can do that with a new rtnl_link_ops method,
get_tx_queues(), called from rtnl_create_link()

This new method gets num_tx_queues/real_num_tx_queues
from real device.

register_vlan_device() is also handled.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 18:03:00 -07:00
Stephen Hemminger
3b401a81c0 inet: inet_connection_sock_af_ops const
The function block inet_connect_sock_af_ops contains no data
make it constant.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 01:03:49 -07:00
David S. Miller
6cdee2f96a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/yellowfin.c
2009-09-02 00:32:56 -07:00
David S. Miller
2fbd3da387 pkt_sched: Revert tasklet_hrtimer changes.
These are full of unresolved problems, mainly that conversions don't
work 1-1 from hrtimers to tasklet_hrtimers because unlike hrtimers
tasklets can't be killed from softirq context.

And when a qdisc gets reset, that's exactly what we need to do here.

We'll work this out in the net-next-2.6 tree and if warranted we'll
backport that work to -stable.

This reverts the following 3 changesets:

a2cb6a4dd4
("pkt_sched: Fix bogon in tasklet_hrtimer changes.")

38acce2d79
("pkt_sched: Convert CBQ to tasklet_hrtimer.")

ee5f9757ea
("pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer")

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:59:25 -07:00
Stephen Hemminger
89d69d2b75 net: make neigh_ops constant
These tables are never modified at runtime. Move to read-only
section.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:57 -07:00
Damian Lukowski
5152fc7de3 RTO connection timeout: coding style fixes and comments
This patch affects the retransmits_timed_out() function.

Changes:
1) Variables have more meaningful names
2) retransmits_timed_out() has an introductionary comment.
3) Small coding style changes.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:47 -07:00
Alexey Dobriyan
86393e52c3 netns: embed ip6_dst_ops directly
struct net::ipv6.ip6_dst_ops is separatedly dynamically allocated,
but there is no fundamental reason for it. Embed it directly into
struct netns_ipv6.

For that:
* move struct dst_ops into separate header to fix circular dependencies
	I honestly tried not to, it's pretty impossible to do other way
* drop dynamical allocation, allocate together with netns

For a change, remove struct dst_ops::dst_net, it's deducible
by using container_of() given dst_ops pointer.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:31 -07:00
Damian Lukowski
6fa12c8503 Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value.
RFC 1122 specifies two threshold values R1 and R2 for connection timeouts,
which may represent a number of allowed retransmissions or a timeout value.
Currently linux uses sysctl_tcp_retries{1,2} to specify the thresholds
in number of allowed retransmissions.

For any desired threshold R2 (by means of time) one can specify tcp_retries2
(by means of number of retransmissions) such that TCP will not time out
earlier than R2. This is the case, because the RTO schedule follows a fixed
pattern, namely exponential backoff.

However, the RTO behaviour is not predictable any more if RTO backoffs can be
reverted, as it is the case in the draft
"Make TCP more Robust to Long Connectivity Disruptions"
(http://tools.ietf.org/html/draft-zimmermann-tcp-lcd).

In the worst case TCP would time out a connection after 3.2 seconds, if the
initial RTO equaled MIN_RTO and each backoff has been reverted.

This patch introduces a function retransmits_timed_out(N),
which calculates the timeout of a TCP connection, assuming an initial
RTO of MIN_RTO and N unsuccessful, exponentially backed-off retransmissions.

Whenever timeout decisions are made by comparing the retransmission counter
to some value N, this function can be used, instead.

The meaning of tcp_retries2 will be changed, as many more RTO retransmissions
can occur than the value indicates. However, it yields a timeout which is
similar to the one of an unpatched, exponentially backing off TCP in the same
scenario. As no application could rely on an RTO greater than MIN_RTO, there
should be no risk of a regression.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 02:45:47 -07:00
Damian Lukowski
f1ecd5d9e7 Revert Backoff [v3]: Revert RTO on ICMP destination unreachable
Here, an ICMP host/network unreachable message, whose payload fits to
TCP's SND.UNA, is taken as an indication that the RTO retransmission has
not been lost due to congestion, but because of a route failure
somewhere along the path.
With true congestion, a router won't trigger such a message and the
patched TCP will operate as standard TCP.

This patch reverts one RTO backoff, if an ICMP host/network unreachable
message, whose payload fits to TCP's SND.UNA, arrives.
Based on the new RTO, the retransmission timer is reset to reflect the
remaining time, or - if the revert clocked out the timer - a retransmission
is sent out immediately.
Backoffs are only reverted, if TCP is in RTO loss recovery, i.e. if
there have been retransmissions and reversible backoffs, already.

Changes from v2:
1) Renaming of skb in tcp_v4_err() moved to another patch.
2) Reintroduced tcp_bound_rto() and __tcp_set_rto().
3) Fixed code comments.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 02:45:42 -07:00
Yi Zou
7114323b17 dcbnl: Add support for setapp/getapp to netdev dcbnl_rtnl_ops
Adds support of dcbnl setapp/getapp to dcbnl_rtnl_ops in netdev to allow
LLDs to implement their corresponding dcbnl setapp/getapp ops to support
the IEEE 802.1Q DCBX setapp/getapp commands.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:24:30 -07:00
David S. Miller
b9caaabb99 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6 2009-08-30 21:30:39 -07:00
Eric Dumazet
df19a62677 tcp: keepalive cleanups
Introduce keepalive_probes(tp) helper, and use it, like 
keepalive_time_when(tp) and keepalive_intvl_when(tp)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:48:54 -07:00
John W. Linville
103bf9f7d3 mac80211: remove ieee80211_rx namespace hack
With the libipw naming scheme change, it is no longer necessary for
mac80211 to avoid the ieee80211_rx name clash.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:29 -04:00
John W. Linville
b0a4e7d8a2 libipw: switch from ieee80211_* to libipw_* naming policy
This eliminates the dual definition of ieee80211_channel (and possibly
others), further clarifying who defines what and paving the way for
inclusion of cfg80211.h.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:28 -04:00
Gustavo F. Padovan
2246b2f1b4 Bluetooth: Handle L2CAP case when the remote receiver is busy
Implement all issues related to RemoteBusy in the RECV state table.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-26 00:12:20 -07:00
Patrick McHardy
3993832464 netfilter: nfnetlink: constify message attributes and headers
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-25 16:07:58 +02:00
Patrick McHardy
3a6c2b419b netlink: constify nlmsghdr arguments
Consitfy nlmsghdr arguments to a couple of functions as preparation
for the next patch, which will constify the netlink message data in
all nfnetlink users.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-25 16:07:40 +02:00
David S. Miller
9409172262 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/lowpan/lowpan 2009-08-23 19:19:30 -07:00
David S. Miller
ee5f9757ea pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer
None of this stuff should execute in hw IRQ context, therefore
use a tasklet_hrtimer so that it runs in softirq context.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-22 18:09:17 -07:00
Luiz Augusto von Dentz
9e726b1742 Bluetooth: Fix rejected connection not disconnecting ACL link
When using DEFER_SETUP on a RFCOMM socket, a SABM frame triggers
authorization which when rejected send a DM response. This is fine
according to the RFCOMM spec:

    the responding implementation may replace the "proper" response
    on the Multiplexer Control channel with a DM frame, sent on the
    referenced DLCI to indicate that the DLCI is not open, and that
    the responder would not grant a request to open it later either.

But some stacks doesn't seems to cope with this leaving DLCI 0 open after
receiving DM frame.

To fix it properly a timer was introduced to rfcomm_session which is used
to set a timeout when the last active DLC of a session is unlinked, this
will give the remote stack some time to reply with a proper DISC frame on
DLCI 0 avoiding both sides sending DISC to each other on stacks that
follow the specification and taking care of those who don't by taking
down DLCI 0.

Signed-off-by: Luiz Augusto von Dentz <luiz.dentz@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 15:05:58 -07:00
Gustavo F. Padovan
ef54fd937f Bluetooth: Full support for receiving L2CAP SREJ frames
Support for receiving of SREJ frames as specified by the state table.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 15:03:43 -07:00
Gustavo F. Padovan
8f17154f1f Bluetooth: Add support for L2CAP SREJ exception
When L2CAP loses an I-frame we send a SREJ frame to the transmitter side
requesting the lost packet. This patch implement all Recv I-frame events
on SREJ_SENT state table except the ones that deal with SendRej (the REJ
exception at receiver side is yet not implemented).

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 15:01:25 -07:00
Gustavo F. Padovan
fcc203c30d Bluetooth: Add support for FCS option to L2CAP
Implement CRC16 check for L2CAP packets. FCS is used by Streaming Mode and
Enhanced Retransmission Mode and is a extra check for the packet content.

Using CRC16 is the default, L2CAP won't use FCS only when both side send
a "No FCS" request.

Initially based on a patch from Nathan Holstein <nathan@lampreynetworks.com>

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:59:49 -07:00
Gustavo F. Padovan
e90bac061b Bluetooth: Add support for Retransmission and Monitor Timers
L2CAP uses retransmission and monitor timers to inquiry the other side
about unacked I-frames. After sending each I-frame we (re)start the
retransmission timer. If it expires, we start a monitor timer that send a
S-frame with P bit set and wait for S-frame with F bit set. If monitor
timer expires, try again, at a maximum of L2CAP_DEFAULT_MAX_TX.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:56:15 -07:00
Gustavo F. Padovan
30afb5b2aa Bluetooth: Initial support for retransmission of packets with REJ frames
When receiving an I-frame with unexpected txSeq, receiver side start the
recovery procedure by sending a REJ S-frame to the transmitter side. So
the transmitter can re-send the lost I-frame.

This patch just adds a basic support for retransmission, it doesn't
mean that ERTM now has full support for packet retransmission.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:55:20 -07:00
Gustavo F. Padovan
c74e560cd0 Bluetooth: Add support for Segmentation and Reassembly of SDUs
ERTM should use Segmentation and Reassembly to break down a SDU in many
PDUs on sending data to the other side.

On sending packets we queue all 'segments' until end of segmentation and
just the add them to the queue for sending. On receiving we create a new
SKB with the SDU reassembled.

Initially based on a patch from Nathan Holstein <nathan@lampreynetworks.com>

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:53:58 -07:00
Gustavo F. Padovan
1c2acffb76 Bluetooth: Add initial support for ERTM packets transfers
This patch adds support for ERTM transfers, without retransmission, with
txWindow up to 63 and with acknowledgement of packets received. Now the
packets are queued before call l2cap_do_send(), so packets couldn't be
sent at the time we call l2cap_sock_sendmsg(). They will be sent in
an asynchronous way on later calls of l2cap_ertm_send(). Besides if an
error occurs on calling l2cap_do_send() we disconnect the channel.

Initially based on a patch from Nathan Holstein <nathan@lampreynetworks.com>

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:53:01 -07:00
Gustavo F. Padovan
f2fcfcd670 Bluetooth: Add configuration support for ERTM and Streaming mode
Add support to config_req and config_rsp to configure ERTM and Streaming
mode. If the remote device specifies ERTM or Streaming mode, then the
same mode is proposed. Otherwise ERTM or Basic mode is used. And in case
of a state 2 device, the remote device should propose the same mode. If
not, then the channel gets disconnected.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:50:07 -07:00
Marcel Holtmann
c6b03cf986 Bluetooth: Allow setting of L2CAP ERTM via socket option
To enable Enhanced Retransmission mode it needs to be set via a socket
option. A different mode can be set on a socket, but on listen() and
connect() the mode is checked and ERTM is only allowed if it is enabled
via the module parameter.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:50:07 -07:00
Thomas Gleixner
a6a67efd70 Bluetooth: Convert hdev->req_lock to a mutex
hdev->req_lock is used as mutex so make it a mutex.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:35:02 -07:00
Marcel Holtmann
9eba32b86d Bluetooth: Add extra device reference counting for connections
The device model itself has no real usable reference counting at the
moment and this causes problems if parents are deleted before their
children. The device model itself handles the memory details of this
correctly, but the uevent order is not consistent. This causes various
problems for systems like HAL or even X.

So until device_put() does a proper cleanup, the device for Bluetooth
connection will be protected with an extra reference counting to ensure
the correct order of uevents when connections are terminated.

This is not an automatic feature. Higher Bluetooth layers like HIDP or
BNEP should grab this new reference to ensure that their uevents are
send before the ones from the parent device.

Based on a report by Brian Rogers <brian@xyzw.org>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:19:26 -07:00
Johannes Berg
ad002395fd cfg80211: fix dangling scan request checking
My patch "cfg80211: fix deadlock" broke the code it
was supposed to fix, the scan request checking. But
it's not trivial to put it back the way it was, since
the original patch had a deadlock.

Now do it in a completely new way: queue the check
off to a work struct, where we can freely lock. But
that has some more complications, like needing to
wait for it to be done before the wiphy/rdev can be
destroyed, so some code is required to handle that.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-20 11:36:05 -04:00
Johannes Berg
f424afa178 mac80211: remove deprecated API
All but two drivers have now stopped using the two
deprecated members radio_enabled and beacon_int,
so it's about time to remove them for good.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-20 11:35:58 -04:00
Johannes Berg
3ac64beecd mac80211: allow configure_filter callback to sleep
Over time, a whole bunch of drivers have come up
with their own scheme to delay the configure_filter
operation to a workqueue. To be able to simplify
things, allow configure_filter to sleep, and add
a new prepare_multicast callback that drivers that
need the multicast address list implement. This new
callback must be atomic, but most drivers either
don't care or just calculate a hash which can be
done atomically and then uploaded to the hardware
non-atomically.

A cursory look suggests that at76c50x-usb, ar9170,
mwl8k (which is actually very broken now), rt2x00,
wl1251, wl1271 and zd1211 should make use of this
new capability.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-20 11:35:58 -04:00
Dmitry Eremin-Solenikov
16eea493da ieee802154: add support for channel pages from IEEE 802.15.4-2006
IEEE 802.15.4-2006 adds new concept: channel pages, which can contain several
channels. Add support for channel pages in the API and in the fakehard driver.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-08-19 23:08:22 +04:00
Dmitry Eremin-Solenikov
2bfb1070ba ieee802154: add a sysfs representation of WPAN master devices
Add a sysfs/in-kernel representation of LR-WPAN master devices.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-08-19 23:08:20 +04:00
Eric Dumazet
c1a8f1f1c8 net: restore gnet_stats_basic to previous definition
In 5e140dfc1f "net: reorder struct Qdisc
for better SMP performance" the definition of struct gnet_stats_basic
changed incompatibly, as copies of this struct are shipped to
userland via netlink.

Restoring old behavior is not welcome, for performance reason.

Fix is to use a private structure for kernel, and
teach gnet_stats_copy_basic() to convert from kernel to user land,
using legacy structure (struct gnet_stats_basic)

Based on a report and initial patch from Michael Spang.

Reported-by: Michael Spang <mspang@csclub.uwaterloo.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-17 21:33:49 -07:00
Johannes Berg
16cb9d42b6 cfg80211: allow driver to override PS default
Sometimes drivers might have a good reason to override
the PS default, like iwlwifi right now where it affects
RX performance significantly at this point. This will
allow them to override the default, if desired, in a
way that users can still change it according to their
trade-off choices, not the driver's, like would happen
if the driver just disabled PS completely then.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-14 09:14:08 -04:00
Pat Erley
d5b96a6f39 mac80211: remove max_bandwidth
This removes the max_bandwidth attribute.  It is only ever
written to, and is duplicated by max_bandwidth_khz in the
regulatory code.

Signed-off-by: Pat Erley <pat-lkml@erley.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-14 09:13:53 -04:00
Johannes Berg
5ba63533bb cfg80211: fix alignment problem in scan request
The memory layout for scan requests was rather wrong,
we put the scan SSIDs before the channels which could
lead to the channel pointers being unaligned in memory.
It turns out that using a pointer to the channel array
isn't necessary anyway since we can embed a zero-length
array into the struct.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-14 09:13:44 -04:00
Johannes Berg
ad5351db89 mac80211: allow DMA optimisation
If we have a lot of frames to transmit at once, for
instance with fragmentation, it can be an optimisation
to only tell the DMA engine about them on the last
fragment/frame to avoid banging the IO too much. This
patch allows implementation such an optimisation by
telling the driver when more frames can be expected.

Currently, this is used by mac80211 only on fragmented
frames, but could also be used in the future on other
frames when the queue was full and there are multiple
frames pending.

Note that drivers need to be careful when using this
flag, they need to kick their DMA engines not just
when this flag is clear, but also when the queue gets
full so that progress can be made.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-14 09:13:44 -04:00
Johannes Berg
ab5b5342fd mac80211: document TX powersave filter requirements
This documents what's required to implement that TX powersave
filter properly wrt. handling hardware queues.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-14 09:13:43 -04:00
Johannes Berg
c555b9b371 mac80211: explain TX retry and status
Add some more documentation including an example so that
it's clearer what should be done for TX retries.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-14 09:13:43 -04:00
Johannes Berg
f5ea9120be nl80211: add generation number to all dumps
In order for userspace to be able to figure out whether
it obtained a consistent snapshot of data or not when
using netlink dumps, we need to have a generation number
in each dump message that indicates whether the list has
changed or not -- its value is arbitrary.

This patch adds such a number to all dumps, this needs
some mac80211 involvement to keep track of a generation
number to start with when adding/removing mesh paths or
stations.

The wiphy and netdev lists can be fully handled within
cfg80211, of course, but generation numbers need to be
stored there as well.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-14 09:13:43 -04:00
Johannes Berg
f401a6f7ed cfg80211: use reassociation when possible
With the move of everything related to the SME from
mac80211 to cfg80211, we lost the ability to send
reassociation frames. This adds them back, but only
for wireless extensions. With the userspace SME, it
shall control assoc vs. reassoc (it already can do
so with the nl80211 interface).

I haven't touched the connect() implementation, so
it is not possible to reassociate with the nl80211
connect primitive. I think that should be done with
the NL80211_CMD_ROAM command, but we'll have to see
how that can be handled in the future, especially
with fullmac chips.

This patch addresses only the immediate regression
we had in mac80211, which previously sent reassoc.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-14 09:13:43 -04:00
Dmitry Baryshkov
acb8aacda3 nl802154: support START-CONFIRM primitive
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-12 21:54:51 -07:00
Dmitry Baryshkov
99eb855864 af_ieee802154: add support for WANT_ACK socket option
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-12 21:54:50 -07:00
Dmitry Baryshkov
8505091d2a af_ieee802154: drop IEEE802154_SIOC_ADD_SLAVE declaration
IEEE802154_SIOC_ADD_SLAVE was used to allocate 802.15.4 interfaces
on the top of radio. It's not used anymore, drop it.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-12 20:49:48 -07:00
David S. Miller
aa11d958d1 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	arch/microblaze/include/asm/socket.h
2009-08-12 17:44:53 -07:00
Krishna Kumar
bbd8a0d3a3 net: Avoid enqueuing skb for default qdiscs
dev_queue_xmit enqueue's a skb and calls qdisc_run which
dequeue's the skb and xmits it. In most cases, the skb that
is enqueue'd is the same one that is dequeue'd (unless the
queue gets stopped or multiple cpu's write to the same queue
and ends in a race with qdisc_run). For default qdiscs, we
can remove the redundant enqueue/dequeue and simply xmit the
skb since the default qdisc is work-conserving.

The patch uses a new flag - TCQ_F_CAN_BYPASS to identify the
default fast queue. The controversial part of the patch is
incrementing qlen when a skb is requeued - this is to avoid
checks like the second line below:

+  } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
>>         !q->gso_skb &&
+          !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {

Results of a 2 hour testing for multiple netperf sessions (1,
2, 4, 8, 12 sessions on a 4 cpu system-X). The BW numbers are
aggregate Mb/s across iterations tested with this version on
System-X boxes with Chelsio 10gbps cards:

----------------------------------
Size |  ORG BW          NEW BW   |
----------------------------------
128K |  156964          159381   |
256K |  158650          162042   |
----------------------------------

Changes from ver1:

1. Move sch_direct_xmit declaration from sch_generic.h to
   pkt_sched.h
2. Update qdisc basic statistics for direct xmit path.
3. Set qlen to zero in qdisc_reset.
4. Changed some function names to more meaningful ones.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 20:10:18 -07:00
David S. Miller
bfe34ebbaa Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-08-06 12:57:18 -07:00
Jan Engelhardt
36cbd3dcc1 net: mark read-only arrays as const
String literals are constant, and usually, we can also tag the array
of pointers const too, moving it to the .rodata section.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05 10:42:58 -07:00
Igor Perminov
e3b90ca284 mac80211: FIF_PSPOLL filter flag
When an interface is configured in the AP mode, the mac80211
implementation doesn't inform the driver to receive PS Poll frames.
It leads to inability to communicate with power-saving stations
reliably.
The FIF_CONTROL flag isn't passed by mac80211 to
ieee80211_ops.configure_filter when an interface is in the AP mode.
And it's ok, because we don't want to receive ACK frames and other
control ones, but only PS Poll ones.

This patch introduces the FIF_PSPOLL filter flag in addition to
FIF_CONTROL, which means for the driver "pass PS Poll frames".

This flag is passed to the driver:
A) When an interface is configured in the AP mode.
B) In all cases, when the FIF_CONTROL flag was passed earlier (in
addition to it).

Signed-off-by: Igor Perminov <igor.perminov@inbox.ru>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-04 16:44:35 -04:00
Luis R. Rodriguez
8b19e6ca3b cfg80211: enable country IE support to all cfg80211 drivers
Since the bss is always set now once we are connected, if the
bss has its own information element we refer to it and pass that
instead of relying on mac80211's parsing.

Now all cfg80211 drivers get country IE support, automatically and
we reduce the call overhead that we had on mac80211 which called this
upon every beacon and instead now call this only upon a successfull
connection by a STA on cfg80211.

Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-04 16:44:19 -04:00
Luis R. Rodriguez
42935ecaf4 mac80211: redefine usage of the mac80211 workqueue
The mac80211 workqueue exists to enable mac80211 and drivers
to queue their own work on a single threaded workqueue. mac80211
takes care to flush the workqueue during suspend but we never
really had requirements on drivers for how they should use
the workqueue in consideration for suspend.

We extend mac80211 to document how the mac80211 workqueue should
be used, how it should not be used and finally move raw access to
the workqueue to mac80211 only. Drivers and mac80211 use helpers
to queue work onto the mac80211 workqueue:

  * ieee80211_queue_work()
  * ieee80211_queue_delayed_work()

These helpers will now warn if mac80211 already completed its
suspend cycle and someone is trying to queue work. mac80211
flushes the mac80211 workqueue prior to suspend a few times,
but we haven't taken the care to ensure drivers won't add more
work after suspend. To help with this we add a warning when
someone tries to add work and mac80211 already completed the
suspend cycle.

Drivers should ensure they cancel any work or delayed work
in the mac80211 stop() callback.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-04 16:44:14 -04:00
David S. Miller
eca4c3d2dd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-08-03 19:05:50 -07:00
Luis R. Rodriguez
371842448c cfg80211: fix regression on beacon world roaming feature
A regression was added through patch a4ed90d6:

"cfg80211: respect API on orig_flags on channel for beacon hint"

We did indeed respect _orig flags but the intention was not clearly
stated in the commit log. This patch fixes firmware issues picked
up by iwlwifi when we lift passive scan of beaconing restrictions
on channels its EEPROM has been configured to always enable.

By doing so though we also disallowed beacon hints on devices
registering their wiphy with custom world regulatory domains
enabled, this happens to be currently ath5k, ath9k and ar9170.
The passive scan and beacon restrictions on those devices would
never be lifted even if we did find a beacon and the hardware did
support such enhancements when world roaming.

Since Johannes indicates iwlwifi firmware cannot be changed to
allow beacon hinting we set up a flag now to specifically allow
drivers to disable beacon hints for devices which cannot use them.

We enable the flag on iwlwifi to disable beacon hints and by default
enable it for all other drivers. It should be noted beacon hints lift
passive scan flags and beacon restrictions when we receive a beacon from
an AP on any 5 GHz non-DFS channels, and channels 12-14 on the 2.4 GHz
band. We don't bother with channels 1-11 as those channels are allowed
world wide.

This should fix world roaming for ath5k, ath9k and ar9170, thereby
improving scan time when we receive the first beacon from any AP,
and also enabling beaconing operation (AP/IBSS/Mesh) on cards which
would otherwise not be allowed to do so. Drivers not using custom
regulatory stuff (wiphy_apply_custom_regulatory()) were not affected
by this as the orig_flags for the channels would have been cleared
upon wiphy registration.

I tested this with a world roaming ath5k card.

Cc: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-03 16:31:21 -04:00
Dave Young
af0d3b103b bluetooth: rfcomm_init bug fix
rfcomm tty may be used before rfcomm_tty_driver initilized,
The problem is that now socket layer init before tty layer, if userspace
program do socket callback right here then oops will happen.

reporting in:
http://marc.info/?l=linux-bluetooth&m=124404919324542&w=2

make 3 changes:
1. remove #ifdef in rfcomm/core.c,
make it blank function when rfcomm tty not selected in rfcomm.h

2. tune the rfcomm_init error patch to ensure
tty driver initilized before rfcomm socket usage.

3. remove __exit for rfcomm_cleanup_sockets
because above change need call it in a __init function.

Reported-by: Oliver Hartkopp <oliver@hartkopp.net>
Tested-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-03 13:24:39 -07:00
Eric Dumazet
e4c4e448cf neigh: Convert garbage collection from softirq to workqueue
Current neigh_periodic_timer() function is fired by timer IRQ, and
scans one hash bucket each round (very litle work in fact)

As we are supposed to scan whole hash table in 15 seconds, this means
neigh_periodic_timer() can be fired very often. (depending on the number
of concurrent hash entries we stored in this table)

Converting this to a workqueue permits scanning whole table, minimizing
icache pollution, and firing this work every 15 seconds, independantly
of hash table size.

This 15 seconds delay is not a hard number, as work is a deferrable one.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 18:35:16 -07:00
Hannes Eder
1e3e238e9c IPVS: use pr_err and friends instead of IP_VS_ERR and friends
Since pr_err and friends are used instead of printk there is no point
in keeping IP_VS_ERR and friends.  Furthermore make use of '__func__'
instead of hard coded function names.

Signed-off-by: Hannes Eder <heder@google.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 18:29:30 -07:00
David S. Miller
2f6d7c1b34 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-07-30 19:26:55 -07:00
David S. Miller
df597efb57 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-3945.h
	drivers/net/wireless/iwlwifi/iwl-tx.c
	drivers/net/wireless/iwlwifi/iwl3945-base.c
2009-07-30 19:22:43 -07:00
Neil Horman
a33bc5c151 xfrm: select sane defaults for xfrm[4|6] gc_thresh
Choose saner defaults for xfrm[4|6] gc_thresh values on init

Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values
(set to 1024).  Given that the ipv4 and ipv6 routing caches are sized
dynamically at boot time, the static selections can be non-sensical.
This patch dynamically selects an appropriate gc threshold based on
the corresponding main routing table size, using the assumption that
we should in the worst case be able to handle as many connections as
the routing table can.

For ipv4, the maximum route cache size is 16 * the number of hash
buckets in the route cache.  Given that xfrm4 starts garbage
collection at the gc_thresh and prevents new allocations at 2 *
gc_thresh, we set gc_thresh to half the maximum route cache size.

For ipv6, its a bit trickier.  there is no maximum route cache size,
but the ipv6 dst_ops gc_thresh is statically set to 1024.  It seems
sane to select a simmilar gc_thresh for the xfrm6 code that is half
the number of hash buckets in the v6 route cache times 16 (like the v4
code does).

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-30 18:52:15 -07:00
Hannes Eder
9aada7ac04 IPVS: use pr_fmt
While being at it cleanup whitespace.

Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-30 14:29:44 -07:00
Johannes Berg
1f9298f960 cfg80211: combine IWESSID handlers
Since we now have handlers IWESSID for all modes, we can
combine them into one.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-29 15:46:18 -04:00
Johannes Berg
562e482265 cfg80211: combine IWAP handlers
Since we now have IWAP handlers for all modes, we can
combine them into one.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-29 15:46:16 -04:00
Johannes Berg
0e82ffe3b9 cfg80211: combine iwfreq implementations
Until now we implemented iwfreq for managed mode, we
needed to keep the implementations separate, but now
that we have all versions implemented we can combine
them and export just one handler.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-29 15:46:14 -04:00
Johannes Berg
3fa52056f3 mac80211: fix PS-poll response, race
When a station queries us for a PS-poll response, we wrongly
queue the frame on the virtual interface's queue rather than
the pending queue.

Additionally, fix a race condition where we could potentially
send multiple frames to the sleeping station due to using a
station flag rather than a packet flag. When converting to a
packet flag, we can also convert p54 and remove the filter
clearing we added for it.

(Also remove a now dead function)

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Bob Copeland <me@bobcopeland.com>
Tested-by: Bob Copeland <me@bobcopeland.com>
Cc: Christian Lamparter <chunkeey@web.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-27 15:24:19 -04:00
Johannes Berg
463d018323 cfg80211: make aware of net namespaces
In order to make cfg80211/nl80211 aware of network namespaces,
we have to do the following things:

 * del_virtual_intf method takes an interface index rather
   than a netdev pointer - simply change this

 * nl80211 uses init_net a lot, it changes to use the sender's
   network namespace

 * scan requests use the interface index, hold a netdev pointer
   and reference instead

 * we want a wiphy and its associated virtual interfaces to be
   in one netns together, so
    - we need to be able to change ns for a given interface, so
      export dev_change_net_namespace()
    - for each virtual interface set the NETIF_F_NETNS_LOCAL
      flag, and clear that flag only when the wiphy changes ns,
      to disallow breaking this invariant

 * when a network namespace goes away, we need to reparent the
   wiphy to init_net

 * cfg80211 users that support creating virtual interfaces must
   create them in the wiphy's namespace, currently this affects
   only mac80211

The end result is that you can now switch an entire wiphy into
a different network namespace with the new command
	iw phy#<idx> set netns <pid>
and all virtual interfaces will follow (or the operation fails).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-27 15:24:07 -04:00
David S. Miller
f004ec728b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lowpan/lowpan 2009-07-27 11:29:31 -07:00
Ralf Baechle
dcf777f6ed NET: ROSE: Don't use static buffer.
The use of a static buffer in rose2asc() to return its result is not
threadproof and can result in corruption if multiple threads are trying
to use one of the procfs files based on rose2asc().

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-26 19:11:14 -07:00
Johannes Berg
3b8d81e020 mac80211: remove master netdev
With the internal 'pending' queue system in place, we can simply
put packets there instead of pushing them off to the master dev,
getting rid of the master interface completely.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-24 15:05:30 -04:00
Luis R. Rodriguez
4c6d4f5c33 mac80211: add helper for management / no-ack frame rate decision
All current rate control algorithms agree to send management and no-ack
frames at the lowest rate. They also agree to do this when sta
and the private rate control data is NULL. We add a hlper to mac80211
for this and simplify the rate control algorithm code.

Developers wishing to make enhancements to rate control algorithms
are for broadcast/multicast can opt to not use this in their
gate_rate() mac80211 callback.

Cc: Zhu Yi <yi.zhu@intel.com>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: ipw3945-devel@lists.sourceforge.net
Cc: Gabor Juhos <juhosg@openwrt.org>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Cc: Derek Smithies <derek@indranet.co.nz>
Cc: Chittajit Mitra <Chittajit.Mitra@Atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-24 15:05:16 -04:00
Luis R. Rodriguez
b770b43e95 mac80211: drop frames for sta with no valid rate
When we're associated we should be able to send data to
target sta. If we cannot we may be trying to use the incorrect
band to talk to the sta. Lets catch any such cases, warn, and
drop the frames to not invalidate assumptions being made on
rate control algorithms when they have a valid sta to
communicate with. Any such cases should be handled and fixed.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-24 15:05:14 -04:00
Helmut Schaa
ca3dbc20d4 cfg80211: update misleading comment
In cfg80211_scan_request n_channels refers to the total number
of channels to scan. Update the misleading comment accordingly.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-24 15:05:11 -04:00
Johannes Berg
fffd0934b9 cfg80211: rework key operation
This reworks the key operation in cfg80211, and now only
allows, from userspace, configuring keys (via nl80211)
after the connection has been established (in managed
mode), the IBSS been joined (in IBSS mode), at any time
(in AP[_VLAN] modes) or never for all the other modes.

In order to do shared key authentication correctly, it
is now possible to give a WEP key to the AUTH command.
To configure static WEP keys, these are given to the
CONNECT or IBSS_JOIN command directly, for a userspace
SME it is assumed it will configure it properly after
the connection has been established.

Since mac80211 used to check the default key in IBSS
mode to see whether or not the network is protected,
it needs an update in that area, as well as an update
to make use of the WEP key passed to auth() for shared
key authentication.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-24 15:05:09 -04:00
David S. Miller
74d154189d Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/iwmc3200wifi/netdev.c
	net/wireless/scan.c
2009-07-23 19:03:51 -07:00
Rémi Denis-Courmont
c1dc13e9d0 Phonet: sockets list through proc_fs
This provides a list of sockets with their Phonet bind addresses and
some socket debug informations through /proc/net/phonet.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-23 17:58:19 -07:00
Dmitry Eremin-Solenikov
f0166e5e3c ieee802154: move headers out of extra directory
include/net/ieee802154/af_ieee802154.h (and others) naming seems to be too long
and redundant. Drop one level of subdirectories.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-07-23 17:08:51 +04:00
Daniel Silverstone
878fa89f97 IEEE80154: Add documentation to the IEEE80154 netlink and fakehard driver
This adds some perfunctory documentation comments to the IEEE 802.15.4
fakehard.c driver (Fake hard MAC) and the nl802154.h (outgoing netlink messages)
header.

These comments are not necessarily complete, but they do reference the
IEEE 802.15.4-2006 document where possible.

Signed-off-by: Daniel Silverstone <dsilvers@simtec.co.uk>
2009-07-23 17:06:49 +04:00
Johannes Berg
4edf547b4d net: explain netns notifiers a little better
Eric explained this to me -- and afterwards the comment
made sense, but not before. Add the the critical point
about interfaces having to be gone from the netns before
subsys notifiers are called.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-20 08:03:00 -07:00
John Dykstra
e3afe7b75e tcp: Fix MD5 signature checking on IPv4 mapped sockets
Fix MD5 signature checking so that an IPv4 active open
to an IPv6 socket can succeed.  In particular, use the
correct address family's signature generation function
for the SYN/ACK.

Reported-by:   Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-20 07:49:07 -07:00
David S. Miller
da8120355e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/orinoco/main.c
2009-07-16 20:21:24 -07:00
Eric Dumazet
4dc6dc7162 net: sock_copy() fixes
Commit e912b1142b
(net: sk_prot_alloc() should not blindly overwrite memory)
took care of not zeroing whole new socket at allocation time.

sock_copy() is another spot where we should be very careful.
We should not set refcnt to a non null value, until
we are sure other fields are correctly setup, or
a lockless reader could catch this socket by mistake,
while not fully (re)initialized.

This patch puts sk_node & sk_refcnt to the very beginning
of struct sock to ease sock_copy() & sk_prot_alloc() job.

We add appropriate smp_wmb() before sk_refcnt initializations
to match our RCU requirements (changes to sock keys should
be committed to memory before sk_refcnt setting)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-16 18:05:26 -07:00
Johannes Berg
b333b3d228 wireless extensions: make netns aware
This makes wireless extensions netns aware. The
tasklet sending the events is converted to a work
struct so that we can rtnl_lock() in it.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-15 08:53:32 -07:00
Sridhar Samudrala
7ea2f2c5a6 udpv6: Remove unused skb argument of ipv6_select_ident()
- move ipv6_select_ident() inline function to ipv6.h and remove the unused
  skb argument

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 14:29:28 -07:00
Sridhar Samudrala
d7ca4cc01f udpv4: Handle large incoming UDP/IPv4 packets and support software UFO.
- validate and forward GSO UDP/IPv4 packets from untrusted sources.
- do software UFO if the outgoing device doesn't support UFO.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 14:29:21 -07:00
Johannes Berg
30ffee8480 net: move and export get_net_ns_by_pid
The function get_net_ns_by_pid(), to get a network
namespace from a pid_t, will be required in cfg80211
as well. Therefore, let's move it to net_namespace.c
and export it. We can't make it a static inline in
the !NETNS case because it needs to verify that the
given pid even exists (and return -ESRCH).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 14:03:28 -07:00
Johannes Berg
134e63756d genetlink: make netns aware
This makes generic netlink network namespace aware. No
generic netlink families except for the controller family
are made namespace aware, they need to be checked one by
one and then set the family->netnsok member to true.

A new function genlmsg_multicast_netns() is introduced to
allow sending a multicast message in a given namespace,
for example when it applies to an object that lives in
that namespace, a new function genlmsg_multicast_allns()
to send a message to all network namespaces (for objects
that do not have an associated netns).

The function genlmsg_multicast() is changed to multicast
the message in just init_net, which is currently correct
for all generic netlink families since they only work in
init_net right now. Some will later want to work in all
net namespaces because they do not care about the netns
at all -- those will have to be converted to use one of
the new functions genlmsg_multicast_allns() or
genlmsg_multicast_netns() whenever they are made netns
aware in some way.

After this patch families can easily decide whether or
not they should be available in all net namespaces. Many
genl families us it for objects not related to networking
and should therefore be available in all namespaces, but
that will have to be done on a per family basis.

Note that this doesn't touch on the checkpoint/restart
problem where network namespaces could be used, genl
families and multicast groups are numbered globally and
I see no easy way of changing that, especially since it
must be possible to multicast to all network namespaces
for those families that do not care about netns.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 14:03:27 -07:00
Johannes Berg
11a28d373e net: make namespace iteration possible under RCU
All we need to take care of is using proper RCU list
add/del primitives and inserting a synchronize_rcu()
at one place to make sure the exit notifiers are run
after everybody has stopped iterating the list.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 14:03:25 -07:00
Johannes Berg
667503ddcb cfg80211: fix locking
Over time, a lot of locking issues have crept into
the smarts of cfg80211, so e.g. scan completion can
race against a new scan, IBSS join can race against
leaving an IBSS, etc.

Introduce a new per-interface lock that protects
most of the per-interface data that we need to keep
track of, and sprinkle assertions about that lock
everywhere. Some things now need to be offloaded to
work structs so that we don't require being able to
sleep in functions the drivers call. The exception
to that are the MLME callbacks (rx_auth etc.) that
currently only mac80211 calls because it was easier
to do that there instead of in cfg80211, and future
drivers implementing those calls will, if they ever
exist, probably need to use a similar scheme like
mac80211 anyway...

In order to be able to handle _deauth and _disassoc
properly, introduce a cookie passed to it that will
determine locking requirements.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:02:32 -04:00
Johannes Berg
cb0b4beb93 cfg80211: mlme API must be able to sleep
After the mac80211 mlme cleanup, we can require that
the MLME functions in cfg80211 can sleep. This will
simplify future work in cfg80211 a lot.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:02:31 -04:00
Johannes Berg
c238c8ac63 cfg80211: dont use union for wext
Otherwise it becomes very hard to reset the structs
correctly since wext can be configured while the
interface is down.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:02:31 -04:00
Johannes Berg
3e5d7649a6 cfg80211: let SME control reassociation vs. association
Since we don't really know that well in the kernel,
let's let the SME control whether it wants to use
reassociation or not, by allowing it to give the
previous BSSID in the associate() parameters.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:02:30 -04:00
Johannes Berg
19957bb399 cfg80211: keep track of BSSes
In order to avoid problems with BSS structs going away
while they're in use, I've long wanted to make cfg80211
keep track of them. Without the SME, that wasn't doable
but now that we have the SME we can do this too. It can
keep track of up to four separate authentications and
one association, regardless of whether it's controlled
by the cfg80211 SME or the userspace SME.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:53 -04:00
Johannes Berg
517357c685 cfg80211: assimilate and export ieee80211_bss_get_ie
This function from mac80211 seems generally useful, and
I will need it in cfg80211 soon.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:53 -04:00
Johannes Berg
8990646d2f cfg80211: implement get_wireless_stats
By dropping the noise reporting, we can implement
wireless stats in cfg80211. We also make the
handler return NULL if we have no information,
which is possible thanks to the recent wext change.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:52 -04:00
Johannes Berg
9930380f0b cfg80211: implement IWRATE
For now, let's implement that using a very hackish way:
simply mirror the wext API in the cfg80211 API. This
will have to be changed later when we implement proper
bitrate API.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:52 -04:00
Johannes Berg
ab737a4f7d cfg80211: implement IWAP for WDS
This implements siocsiwap/giwap for WDS mode.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:52 -04:00
Johannes Berg
bc92afd920 cfg80211: implement iwpower
Just on/off and timeout, and with a hacky cfg80211 method
until we figure out what we want, though this is probably
sufficient as we want to use pm_qos for wifi everywhere.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:51 -04:00
Johannes Berg
f21293549f cfg80211: managed mode wext compatibility
This adds code to make it possible to use the cfg80211
connect() API with wireless extensions, and because the
previous patch added emulation of that API with auth()
and assoc(), by extension also supports wext on that.
At the same time, removes code from mac80211 for wext,
but doesn't yet clean up mac80211's mlme code more.

Signed-off-by: Samuel Ortiz <samuel.ortiz@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:51 -04:00
Johannes Berg
6829c878ec cfg80211: emulate connect with auth/assoc
This adds code to cfg80211 so that drivers (mac80211 right
now) that don't implement connect but rather auth/assoc can
still be used with the nl80211 connect command. This will
also be necessary for the wext compat code.

Signed-off-by: Samuel Ortiz <samuel.ortiz@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:51 -04:00
Samuel Ortiz
b23aa676ab cfg80211: connect/disconnect API
This patch introduces the cfg80211 connect/disconnect API.
The goal here is to run the AUTH and ASSOC steps in one call.
This is needed for some fullmac cards that run both steps
directly from the target, after the host driver sends a
connect command.

Additionally, all the new crypto parameters for connect()
are now also valid for associate() -- although associate
requires the IEs to be used, the information can be useful
for drivers and should be given.

Signed-off-by: Samuel Ortiz <samuel.ortiz@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:51 -04:00
Johannes Berg
aff89a9b90 cfg80211: introduce nl80211 testmode command
This introduces a new NL80211_CMD_TESTMODE for testing
and calibration use with nl80211. There's no multiplexing
like like iwpriv had, and the command is not available by
default, it needs to be explicitly enabled in Kconfig and
shouldn't be enabled in most kernels.

The command requires a wiphy index or interface index to
identify the device to operate on, and the new TESTDATA
attribute. There also is API for sending replies to the
command, and testmode multicast messages (on a testmode
multicast group).

I've also updated mac80211 to be able to pass through the
command to the driver, since it itself doesn't implement
the testmode command.

Additionally, to give people an idea of how to use the
command, I've added a little code to hwsim that makes use
of the new command to set the powersave mode, this is
currently done via debugfs and should remain there, and
the testmode command only serves as an example of how to
use this best -- with nested netlink attributes in the
TESTDATA attribute. A hwsim testmode tool can be found at
http://git.sipsolutions.net/hwsim.git/. This tool is BSD
licensed so people can easily use it as a basis for their
own internal fabrication and validation tools.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:50 -04:00
Johannes Berg
5121ea0481 wext: constify extra argument to wireless_send_event
This is never changed by the function, so can be marked const.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:49 -04:00
Johannes Berg
7ebbe6bd51 cfg80211: remove wireless_dev->bssid
This variable isn't necessary -- the wext code keeps
track of the BSSID itself, and otherwise we have
current_bss.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:49 -04:00
Johannes Berg
e6d6e3420d cfg80211: use proper allocation flags
Instead of hardcoding GFP_ATOMIC everywhere, add a
new function parameter that gets the flags from the
caller. Obviously then I need to update all callers
(all of them in mac80211), and it turns out that now
it's ok to use GFP_KERNEL in almost all places.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:49 -04:00
David Kilroy
f1f74825fe cfg80211: add wrapper function to get wiphy from priv pointer
Signed-off-by: David Kilroy <kilroyd@googlemail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 15:01:42 -04:00
Johannes Berg
f1d58c2521 mac80211: push rx status into skb->cb
Within mac80211, we often need to copy the rx status into
skb->cb. This is wasteful, as drivers could be building it
in there to start with. This patch changes the API so that
drivers are expected to pass the RX status in skb->cb, now
accessible as IEEE80211_SKB_RXCB(skb). It also updates all
drivers to pass the rx status in there, but only by making
them memcpy() it into place before the call to the receive
function (ieee80211_rx(_irqsafe)). Each driver can now be
optimised on its own schedule.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 14:57:54 -04:00
Johannes Berg
e36d56b648 cfg80211: pass netdev to change_virtual_intf
If there was a reason I'm passing the ifidx I cannot
remember it any more and don't see one now, so let's
just pass the pointer itself.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-10 14:57:38 -04:00
David S. Miller
e5a8a896f5 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-07-09 20:18:24 -07:00
Jiri Olsa
ad46276952 memory barrier: adding smp_mb__after_lock
Adding smp_mb__after_lock define to be used as a smp_mb call after
a lock.

Making it nop for x86, since {read|write|spin}_lock() on x86 are
full memory barriers.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-09 17:06:58 -07:00
Jiri Olsa
a57de0b433 net: adding memory barrier to the poll and receive callbacks
Adding memory barrier after the poll_wait function, paired with
receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
to wrap the memory barrier.

Without the memory barrier, following race can happen.
The race fires, when following code paths meet, and the tp->rcv_nxt
and __add_wait_queue updates stay in CPU caches.

CPU1                         CPU2

sys_select                   receive packet
  ...                        ...
  __add_wait_queue           update tp->rcv_nxt
  ...                        ...
  tp->rcv_nxt check          sock_def_readable
  ...                        {
  schedule                      ...
                                if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                        wake_up_interruptible(sk->sk_sleep)
                                ...
                             }

If there was no cache the code would work ok, since the wait_queue and
rcv_nxt are opposit to each other.

Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
passed the tp->rcv_nxt check and sleeps, or will get the new value for
tp->rcv_nxt and will return with new data mask.
In both cases the process (CPU1) is being added to the wait queue, so the
waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

The bad case is when the __add_wait_queue changes done by CPU1 stay in its
cache, and so does the tp->rcv_nxt update on CPU2 side.  The CPU1 will then
endup calling schedule and sleep forever if there are no more data on the
socket.

Calls to poll_wait in following modules were ommited:
	net/bluetooth/af_bluetooth.c
	net/irda/af_irda.c
	net/irda/irnet/irnet_ppp.c
	net/mac80211/rc80211_pid_debugfs.c
	net/phonet/socket.c
	net/rds/af_rds.c
	net/rfkill/core.c
	net/sunrpc/cache.c
	net/sunrpc/rpc_pipe.c
	net/tipc/socket.c

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-09 17:06:57 -07:00
Cyrill Gorcunov
e04af024b2 net, netns_xt: shrink netns_xt members
In case if kernel was compiled without ebtables support
there is no need to keep ebt_table pointers in netns_xt
structure.

Make it config dependent.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-05 19:16:18 -07:00
Rami Rosen
af794c7424 cleanup: remove unused member in scm_cookie.
This patch removes an unused member (seq) scm_cookie; besides initialized
to 0 in the header file, it is not used.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-05 19:16:10 -07:00
David S. Miller
53bd9728bf Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-06-29 19:22:31 -07:00
Patrick McHardy
a3a9f79e36 netfilter: tcp conntrack: fix unacknowledged data detection with NAT
When NAT helpers change the TCP packet size, the highest seen sequence
number needs to be corrected. This is currently only done upwards, when
the packet size is reduced the sequence number is unchanged. This causes
TCP conntrack to falsely detect unacknowledged data and decrease the
timeout.

Fix by updating the highest seen sequence number in both directions after
packet mangling.

Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-29 14:07:56 +02:00
Rémi Denis-Courmont
c7a1a4c80f Phonet: publicize the Netlink notification function
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-25 02:58:15 -07:00
Linus Torvalds
09ce42d316 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6:
  bnx2: Fix the behavior of ethtool when ONBOOT=no
  qla3xxx: Don't sleep while holding lock.
  qla3xxx: Give the PHY time to come out of reset.
  ipv4 routing: Ensure that route cache entries are usable and reclaimable with caching is off
  net: Move rx skb_orphan call to where needed
  ipv6: Use correct data types for ICMPv6 type and code
  net: let KS8842 driver depend on HAS_IOMEM
  can: let SJA1000 driver depend on HAS_IOMEM
  netxen: fix firmware init handshake
  netxen: fix build with without CONFIG_PM
  netfilter: xt_rateest: fix comparison with self
  netfilter: xt_quota: fix incomplete initialization
  netfilter: nf_log: fix direct userspace memory access in proc handler
  netfilter: fix some sparse endianess warnings
  netfilter: nf_conntrack: fix conntrack lookup race
  netfilter: nf_conntrack: fix confirmation race condition
  netfilter: nf_conntrack: death_by_timeout() fix
2009-06-24 10:01:12 -07:00
Herbert Xu
d55d87fdff net: Move rx skb_orphan call to where needed
In order to get the tun driver to account packets, we need to be
able to receive packets with destructors set.  To be on the safe
side, I added an skb_orphan call for all protocols by default since
some of them (IP in particular) cannot handle receiving packets
destructors properly.

Now it seems that at least one protocol (CAN) expects to be able
to pass skb->sk through the rx path without getting clobbered.

So this patch attempts to fix this properly by moving the skb_orphan
call to where it's actually needed.  In particular, I've added it
to skb_set_owner_[rw] which is what most users of skb->destructor
call.

This is actually an improvement for tun too since it means that
we only give back the amount charged to the socket when the skb
is passed to another socket that will also be charged accordingly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Oliver Hartkopp <olver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-23 16:36:25 -07:00
Brian Haley
d5fdd6babc ipv6: Use correct data types for ICMPv6 type and code
Change all the code that deals directly with ICMPv6 type and code
values to use u8 instead of a signed int as that's the actual data
type.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-23 04:31:07 -07:00
Linus Torvalds
5165aece0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (43 commits)
  via-velocity: Fix velocity driver unmapping incorrect size.
  mlx4_en: Remove redundant refill code on RX
  mlx4_en: Removed redundant check on lso header size
  mlx4_en: Cancel port_up check in transmit function
  mlx4_en: using stop/start_all_queues
  mlx4_en: Removed redundant skb->len check
  mlx4_en: Counting all the dropped packets on the TX side
  usbnet cdc_subset: fix issues talking to PXA gadgets
  Net: qla3xxx, remove sleeping in atomic
  ipv4: fix NULL pointer + success return in route lookup path
  isdn: clean up documentation index
  cfg80211: validate station settings
  cfg80211: allow setting station parameters in mesh
  cfg80211: allow adding/deleting stations on mesh
  ath5k: fix beacon_int handling
  MAINTAINERS: Fix Atheros pattern paths
  ath9k: restore PS mode, before we put the chip into FULL SLEEP state.
  ath9k: wait for beacon frame along with CAB
  acer-wmi: fix rfkill conversion
  ath5k: avoid PCI FATAL interrupts by restoring RETRY_TIMEOUT disabling
  ...
2009-06-22 11:57:09 -07:00
Hendrik Brueckner
0ea920d211 af_iucv: Return -EAGAIN if iucv msg limit is exceeded
If the iucv message limit for a communication path is exceeded,
sendmsg() returns -EAGAIN instead of -EPIPE.
The calling application can then handle this error situtation,
e.g. to try again after waiting some time.

For blocking sockets, sendmsg() waits up to the socket timeout
before returning -EAGAIN. For the new wait condition, a macro
has been introduced and the iucv_sock_wait_state() has been
refactored to this macro.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-19 00:10:40 -07:00
Linus Torvalds
d2aa455037 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (55 commits)
  netxen: fix tx ring accounting
  netxen: fix detection of cut-thru firmware mode
  forcedeth: fix dma api mismatches
  atm: sk_wmem_alloc initial value is one
  net: correct off-by-one write allocations reports
  via-velocity : fix no link detection on boot
  Net / e100: Fix suspend of devices that cannot be power managed
  TI DaVinci EMAC : Fix rmmod error
  net: group address list and its count
  ipv4: Fix fib_trie rebalancing, part 2
  pkt_sched: Update drops stats in act_police
  sky2: version 1.23
  sky2: add GRO support
  sky2: skb recycling
  sky2: reduce default transmit ring
  sky2: receive counter update
  sky2: fix shutdown synchronization
  sky2: PCI irq issues
  sky2: more receive shutdown
  sky2: turn off pause during shutdown
  ...

Manually fix trivial conflict in net/core/skbuff.c due to kmemcheck
2009-06-18 14:07:15 -07:00
Eric Dumazet
c564039fd8 net: sk_wmem_alloc has initial value of one, not zero
commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.

Some protocols check sk_wmem_alloc value to determine if a timer
must delay socket deallocation. We must take care of the sk_wmem_alloc
value being one instead of zero when no write allocations are pending.

Reported by Ingo Molnar, and full diagnostic from David Miller.

This patch introduces three helpers to get read/write allocations
and a followup patch will use these helpers to report correct
write allocations to user.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17 04:31:25 -07:00
Linus Torvalds
b3fec0fe35 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits)
  signal: fix __send_signal() false positive kmemcheck warning
  fs: fix do_mount_root() false positive kmemcheck warning
  fs: introduce __getname_gfp()
  trace: annotate bitfields in struct ring_buffer_event
  net: annotate struct sock bitfield
  c2port: annotate bitfield for kmemcheck
  net: annotate inet_timewait_sock bitfields
  ieee1394/csr1212: fix false positive kmemcheck report
  ieee1394: annotate bitfield
  net: annotate bitfields in struct inet_sock
  net: use kmemcheck bitfields API for skbuff
  kmemcheck: introduce bitfield API
  kmemcheck: add opcode self-testing at boot
  x86: unify pte_hidden
  x86: make _PAGE_HIDDEN conditional
  kmemcheck: make kconfig accessible for other architectures
  kmemcheck: enable in the x86 Kconfig
  kmemcheck: add hooks for the page allocator
  kmemcheck: add hooks for page- and sg-dma-mappings
  kmemcheck: don't track page tables
  ...
2009-06-16 13:09:51 -07:00
David S. Miller
14ebaf81e1 x25: Fix sleep from timer on socket destroy.
If socket destuction gets delayed to a timer, we try to
lock_sock() from that timer which won't work.

Use bh_lock_sock() in that case.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Ingo Molnar <mingo@elte.hu>
2009-06-16 05:40:30 -07:00
Vegard Nossum
a98b65a3ad net: annotate struct sock bitfield
2009/2/24 Ingo Molnar <mingo@elte.hu>:
> ok, this is the last warning i have from today's overnight -tip
> testruns - a 32-bit system warning in sock_init_data():
>
> [    2.610389] NET: Registered protocol family 16
> [    2.616138] initcall netlink_proto_init+0x0/0x170 returned 0 after 7812 usecs
> [    2.620010] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f642c184)
> [    2.624002] 010000000200000000000000604990c000000000000000000000000000000000
> [    2.634076]  i i i i i i u u i i i i i i i i i i i i i i i i i i i i i i i i
> [    2.641038]          ^
> [    2.643376]
> [    2.644004] Pid: 1, comm: swapper Not tainted (2.6.29-rc6-tip-01751-g4d1c22c-dirty #885)
> [    2.648003] EIP: 0060:[<c07141a1>] EFLAGS: 00010282 CPU: 0
> [    2.652008] EIP is at sock_init_data+0xa1/0x190
> [    2.656003] EAX: 0001a800 EBX: f6836c00 ECX: 00463000 EDX: c0e46fe0
> [    2.660003] ESI: f642c180 EDI: c0b83088 EBP: f6863ed8 ESP: c0c412ec
> [    2.664003]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [    2.668003] CR0: 8005003b CR2: f682c400 CR3: 00b91000 CR4: 000006f0
> [    2.672003] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [    2.676003] DR6: ffff4ff0 DR7: 00000400
> [    2.680002]  [<c07423e5>] __netlink_create+0x35/0xa0
> [    2.684002]  [<c07443cc>] netlink_kernel_create+0x4c/0x140
> [    2.688002]  [<c072755e>] rtnetlink_net_init+0x1e/0x40
> [    2.696002]  [<c071b601>] register_pernet_operations+0x11/0x30
> [    2.700002]  [<c071b72c>] register_pernet_subsys+0x1c/0x30
> [    2.704002]  [<c0bf3c8c>] rtnetlink_init+0x4c/0x100
> [    2.708002]  [<c0bf4669>] netlink_proto_init+0x159/0x170
> [    2.712002]  [<c0101124>] do_one_initcall+0x24/0x150
> [    2.716002]  [<c0bbf3c7>] do_initcalls+0x27/0x40
> [    2.723201]  [<c0bbf3fc>] do_basic_setup+0x1c/0x20
> [    2.728002]  [<c0bbfb8a>] kernel_init+0x5a/0xa0
> [    2.732002]  [<c0103e47>] kernel_thread_helper+0x7/0x10
> [    2.736002]  [<ffffffff>] 0xffffffff

We fix this false positive by annotating the bitfield in struct
sock.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 15:49:36 +02:00
Vegard Nossum
9e337b0fb3 net: annotate inet_timewait_sock bitfields
The use of bitfields here would lead to false positive warnings with
kmemcheck. Silence them.

(Additionally, one erroneous comment related to the bitfield was also
fixed.)

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 15:49:32 +02:00
Vegard Nossum
45e3ff8270 net: annotate bitfields in struct inet_sock
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 15:49:27 +02:00
Jarek Poplawski
ca44d6e60f pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US
Let's use TICKS instead of US, so PSCHED_TICKS2NS and PSCHED_NS2TICKS
(like in PSCHED_TICKS_PER_SEC already) to avoid misleading.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-15 02:31:47 -07:00
Pablo Neira Ayuso
dd7669a92c netfilter: conntrack: optional reliable conntrack event delivery
This patch improves ctnetlink event reliability if one broadcast
listener has set the NETLINK_BROADCAST_ERROR socket option.

The logic is the following: if an event delivery fails, we keep
the undelivered events in the missed event cache. Once the next
packet arrives, we add the new events (if any) to the missed
events in the cache and we try a new delivery, and so on. Thus,
if ctnetlink fails to deliver an event, we try to deliver them
once we see a new packet. Therefore, we may lose state
transitions but the userspace process gets in sync at some point.

At worst case, if no events were delivered to userspace, we make
sure that destroy events are successfully delivered. Basically,
if ctnetlink fails to deliver the destroy event, we remove the
conntrack entry from the hashes and we insert them in the dying
list, which contains inactive entries. Then, the conntrack timer
is added with an extra grace timeout of random32() % 15 seconds
to trigger the event again (this grace timeout is tunable via
/proc). The use of a limited random timeout value allows
distributing the "destroy" resends, thus, avoiding accumulating
lots "destroy" events at the same time. Event delivery may
re-order but we can identify them by means of the tuple plus
the conntrack ID.

The maximum number of conntrack entries (active or inactive) is
still handled by nf_conntrack_max. Thus, we may start dropping
packets at some point if we accumulate a lot of inactive conntrack
entries that did not successfully report the destroy event to
userspace.

During my stress tests consisting of setting a very small buffer
of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
flag, and generating lots of very small connections, I noticed
very few destroy entries on the fly waiting to be resend.

A simple way to test this patch consist of creating a lot of
entries, set a very small Netlink buffer in conntrackd (+ a patch
which is not in the git tree to set the BROADCAST_ERROR flag)
and invoke `conntrack -F'.

For expectations, no changes are introduced in this patch.
Currently, event delivery is only done for new expectations (no
events from expectation expiration, removal and confirmation).
In that case, they need a per-expectation event cache to implement
the same idea that is exposed in this patch.

This patch can be useful to provide reliable flow-accouting. We
still have to add a new conntrack extension to store the creation
and destroy time.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13 12:30:52 +02:00
Pablo Neira Ayuso
9858a3ae1d netfilter: conntrack: move helper destruction to nf_ct_helper_destroy()
This patch moves the helper destruction to a function that lives
in nf_conntrack_helper.c. This new function is used in the patch
to add ctnetlink reliable event delivery.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13 12:28:22 +02:00
Pablo Neira Ayuso
a0891aa6a6 netfilter: conntrack: move event caching to conntrack extension infrastructure
This patch reworks the per-cpu event caching to use the conntrack
extension infrastructure.

The main drawback is that we consume more memory per conntrack
if event delivery is enabled. This patch is required by the
reliable event delivery that follows to this patch.

BTW, this patch allows you to enable/disable event delivery via
/proc/sys/net/netfilter/nf_conntrack_events in runtime, although
you can still disable event caching as compilation option.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13 12:26:29 +02:00
Patrick McHardy
36432dae73 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-06-11 16:00:49 +02:00
David S. Miller
bb400801c2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6 2009-06-11 05:47:43 -07:00
Eric Dumazet
2b85a34e91 net: No more expensive sock_hold()/sock_put() on each tx
One of the problem with sock memory accounting is it uses
a pair of sock_hold()/sock_put() for each transmitted packet.

This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.

We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement initial offset and atomicaly check if any packets
are in flight.

skb_set_owner_w() doesnt call sock_hold() anymore

sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
reached 0 to perform the final freeing.

Drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.

Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11 02:55:43 -07:00
Johannes Berg
8f77f3849c mac80211: do not pass PS frames out of mac80211 again
In order to handle powersave frames properly we had needed
to pass these out to the device queues again, and introduce
the skb->requeue bit. This, however, also has unnecessary
overhead by needing to 'clean up' already tried frames, and
this clean-up code is also buggy when software encryption
is used.

Instead of sending the frames via the master netdev queue
again, simply put them into the pending queue. This also
fixes a problem where frames for that particular station
could be reordered when some were still on the software
queues and older ones are re-injected into the software
queue after them.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-10 13:28:37 -04:00
Patrick McHardy
440f0d5885 netfilter: nf_conntrack: use per-conntrack locks for protocol data
Introduce per-conntrack locks and use them instead of the global protocol
locks to avoid contention. Especially tcp_lock shows up very high in
profiles on larger machines.

This will also allow to simplify the upcoming reliable event delivery patches.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-10 14:32:47 +02:00
Sergey Lapin
2c21d11518 net: add NL802154 interface for configuration of 802.15.4 devices
Add a netlink interface for configuration of IEEE 802.15.4 device. Also this
interface specifies events notification sent by devices towards higher layers.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Sergey Lapin <slapin@ossfans.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 05:25:33 -07:00
Sergey Lapin
9ec7671603 net: add IEEE 802.15.4 socket family implementation
Add support for communication over IEEE 802.15.4 networks. This implementation
is neither certified nor complete, but aims to that goal. This commit contains
only the socket interface for communication over IEEE 802.15.4 networks.
One can either send RAW datagrams or use SOCK_DGRAM to encapsulate data
inside normal IEEE 802.15.4 packets.

Configuration interface, drivers and software MAC 802.15.4 implementation will
follow.

Initial implementation was done by Maxim Gorbachyov, Maxim Osipov and Pavel
Smolensky as a research project at Siemens AG. Later the stack was heavily
reworked to better suit the linux networking model, and is now maitained
as an open project partially sponsored by Siemens.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Sergey Lapin <slapin@ossfans.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 05:25:32 -07:00
Jarek Poplawski
a4a710c4a7 pkt_sched: Change PSCHED_SHIFT from 10 to 6
Change PSCHED_SHIFT from 10 to 6 to increase schedulers time
resolution. This will increase 16x a number of (internal) ticks per
nanosecond, and is needed to improve accuracy of schedulers based on
rate tables, like HTB, TBF or CBQ, with rates above 100Mbit. It is
assumed this change is safe for 32bit accounting of time diffs up
to 2 minutes, which should be enough for common use (extremely low
rate values may overflow, so get inaccurate instead). To make full
use of this change an updated iproute2 will be needed. (But using
older iproute2 should be safe too.)

This change breaks ticks - microseconds similarity, so some minor code
fixes might be needed. It is also planned to change naming adequately
eg. to PSCHED_TICKS2NS() etc. in the near future.

Reported-by: Antonio Almeida <vexwek@gmail.com>
Tested-by: Antonio Almeida <vexwek@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 05:25:30 -07:00
Jarek Poplawski
728bf09827 pkt_sched: Use PSCHED_SHIFT in PSCHED time conversion
Use PSCHED_SHIFT constant instead of '10' in PSCHED_US2NS() and
PSCHED_NS2US() macros to enable changing this value later.

Additionally use PSCHED_SHIFT in sch_hfsc SM_SHIFT and ISM_SHIFT
definitions. This part of the patch is based on feedback from
Patrick McHardy <kaber@trash.net>.

Reported-by: Antonio Almeida <vexwek@gmail.com>
Tested-by: Antonio Almeida <vexwek@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 05:25:29 -07:00
David S. Miller
05f77f85f4 bluetooth: Kill skb_frags_no(), unused.
Furthermore, it twiddles with the details of SKB list handling
directly, which we're trying to eliminate.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-08 16:16:56 -07:00
Jan Kasprzak
f87fb666bb netfilter: nf_ct_icmp: keep the ICMP ct entries longer
Current conntrack code kills the ICMP conntrack entry as soon as
the first reply is received. This is incorrect, as we then see only
the first ICMP echo reply out of several possible duplicates as
ESTABLISHED, while the rest will be INVALID. Also this unnecessarily
increases the conntrackd traffic on H-A firewalls.

Make all the ICMP conntrack entries (including the replied ones)
last for the default of nf_conntrack_icmp{,v6}_timeout seconds.

Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-08 15:53:43 +02:00
Marcel Holtmann
611b30f74b Bluetooth: Add native RFKILL soft-switch support for all devices
With the re-write of the RFKILL subsystem it is now possible to easily
integrate RFKILL soft-switch support into the Bluetooth subsystem. All
Bluetooth devices will now get automatically RFKILL support.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-06-08 14:50:01 +02:00
Marcel Holtmann
b4324b5dc5 Bluetooth: Remove pointless endian conversion helpers
The Bluetooth source uses some endian conversion helpers, that in the end
translate to kernel standard routines. So remove this obfuscation since it
is fully pointless.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-06-08 14:50:01 +02:00
Marcel Holtmann
47ec1dcd69 Bluetooth: Add basic constants for L2CAP ERTM support and use them
This adds the basic constants required to add support for L2CAP Enhanced
Retransmission feature.

Based on a patch from Nathan Holstein <nathan@lampreynetworks.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-06-08 14:50:00 +02:00
Gustavo F. Padovan
589d274648 Bluetooth: Use macro for L2CAP hint mask on receiving config request
Using the L2CAP_CONF_HINT macro is easier to understand than using a
hardcoded 0x80 value.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-06-08 14:50:00 +02:00
Gustavo F. Padovan
8db4dc46dc Bluetooth: Use macros for L2CAP channel identifiers
Use macros instead of hardcoded numbers to make the L2CAP source code
more readable.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-06-08 14:49:59 +02:00
David S. Miller
b1bc81a0ef Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-06-07 04:24:21 -07:00
Johannes Berg
1f87f7d3a3 cfg80211: add rfkill support
To be easier on drivers and users, have cfg80211 register an
rfkill structure that drivers can access. When soft-killed,
simply take down all interfaces; when hard-killed the driver
needs to notify us and we will take down the interfaces
after the fact. While rfkilled, interfaces cannot be set UP.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-03 14:06:14 -04:00
Johannes Berg
7643a2c3fc cfg80211: move txpower wext from mac80211
This patch introduces new cfg80211 API to set the TX power
via cfg80211, puts the wext code into cfg80211 and updates
mac80211 to use all that. The -ENETDOWN bits are a hack but
will go away soon.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-03 14:06:14 -04:00
Johannes Berg
19d337dff9 rfkill: rewrite
This patch completely rewrites the rfkill core to address
the following deficiencies:

 * all rfkill drivers need to implement polling where necessary
   rather than having one central implementation

 * updating the rfkill state cannot be done from arbitrary
   contexts, forcing drivers to use schedule_work and requiring
   lots of code

 * rfkill drivers need to keep track of soft/hard blocked
   internally -- the core should do this

 * the rfkill API has many unexpected quirks, for example being
   asymmetric wrt. alloc/free and register/unregister

 * rfkill can call back into a driver from within a function the
   driver called -- this is prone to deadlocks and generally
   should be avoided

 * rfkill-input pointlessly is a separate module

 * drivers need to #ifdef rfkill functions (unless they want to
   depend on or select RFKILL) -- rfkill should provide inlines
   that do nothing if it isn't compiled in

 * the rfkill structure is not opaque -- drivers need to initialise
   it correctly (lots of sanity checking code required) -- instead
   force drivers to pass the right variables to rfkill_alloc()

 * the documentation is hard to read because it always assumes the
   reader is completely clueless and contains way TOO MANY CAPS

 * the rfkill code needlessly uses a lot of locks and atomic
   operations in locked sections

 * fix LED trigger to actually change the LED when the radio state
   changes -- this wasn't done before

Tested-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> [thinkpad]
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-03 14:06:13 -04:00
Johannes Berg
e535c7566e mac80211: deprecate conf.beacon_int properly
Ivo has updated the driver to no longer use the change flag,
so we can remove that, but rt2x00 and ath5k still use the
actual value so let's mark it as deprecated too.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-03 14:05:09 -04:00
Vlad Yasevich
c6ba68a266 sctp: support non-blocking version of the new sctp_connectx() API
Prior implementation of the new sctp_connectx() call that returns
an association ID did not work correctly on non-blocking socket.
This is because we could not return both a EINPROGRESS error and
an association id.  This is a new implementation that supports this.

Originally from Ivan Skytte Jørgensen <isj-sctp@i1.dk

Signed-off-by: Ivan Skytte Jørgensen <isj-sctp@i1.dk
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-06-03 09:14:47 -04:00
Wei Yongjun
9919b455fc sctp: fix to choose alternate destination when retransmit ASCONF chunk
RFC 5061 Section 5.1 ASCONF Chunk Procedures said:

B4)  Re-transmit the ASCONF Chunk last sent and if possible choose an
     alternate destination address (please refer to [RFC4960],
     Section 6.4.1).  An endpoint MUST NOT add new parameters to this
     chunk; it MUST be the same (including its Sequence Number) as
     the last ASCONF sent.  An endpoint MAY, however, bundle an
     additional ASCONF with new ASCONF parameters with the next
     Sequence Number.  For details, see Section 5.5.

This patch fix to choose an alternate destination address when
re-transmit the ASCONF chunk, with some dup codes cleanup.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-06-03 09:14:46 -04:00
Eric Dumazet
adf30907d6 net: skb->dst accessors
Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03 02:51:04 -07:00
Eric Dumazet
511c3f92ad net: skb->rtable accessor
Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb

Delete skb->rtable field

Setting rtable is not allowed, just set dst instead as rtable is an alias.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03 02:51:02 -07:00
Pablo Neira Ayuso
e34d5c1a4f netfilter: conntrack: replace notify chain by function pointer
This patch removes the notify chain infrastructure and replace it
by a simple function pointer. This issue has been mentioned in the
mailing list several times: the use of the notify chain adds
too much overhead for something that is only used by ctnetlink.

This patch also changes nfnetlink_send(). It seems that gfp_any()
returns GFP_KERNEL for user-context request, like those via
ctnetlink, inside the RCU read-side section which is not valid.
Using GFP_KERNEL is also evil since netlink may schedule(),
this leads to "scheduling while atomic" bug reports.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-03 10:32:06 +02:00
Pablo Neira Ayuso
17e6e4eac0 netfilter: conntrack: simplify event caching system
This patch simplifies the conntrack event caching system by removing
several events:

 * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO has been deleted
   since the have no clients.
 * IPCT_COUNTER_FILLING which is a leftover of the 32-bits counter
   days.
 * IPCT_REFRESH which is not of any use since we always include the
   timeout in the messages.

After this patch, the existing events are:

 * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, that are used to identify
 addition and deletion of entries.
 * IPCT_STATUS, that notes that the status bits have changes,
 eg. IPS_SEEN_REPLY and IPS_ASSURED.
 * IPCT_PROTOINFO, that reports that internal protocol information has
 changed, eg. the TCP, DCCP and SCTP protocol state.
 * IPCT_HELPER, that a helper has been assigned or unassigned to this
 entry.
 * IPCT_MARK and IPCT_SECMARK, that reports that the mark has changed, this
 covers the case when a mark is set to zero.
 * IPCT_NATSEQADJ, to report that there's updates in the NAT sequence
 adjustment.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:08:46 +02:00
Pablo Neira Ayuso
6bfea1984a netfilter: conntrack: remove events flags from userspace exposed file
This patch moves the event flags from linux/netfilter/nf_conntrack_common.h
to net/netfilter/nf_conntrack_ecache.h. This flags are not of any use
from userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:08:44 +02:00
Pablo Neira Ayuso
274d383b9c netfilter: conntrack: don't report events on module removal
During the module removal there are no possible event listeners
since ctnetlink must be removed before to allow removing
nf_conntrack. This patch removes the event reporting for the
module removal case which is not of any use in the existing code.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:08:38 +02:00
Pablo Neira Ayuso
f2f3e38c63 netfilter: ctnetlink: rename tuple() by nf_ct_tuple() macro definition
This patch move the internal tuple() macro definition to the
header file as nf_ct_tuple().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:03:35 +02:00
Nivedita Singhvi
f771bef980 ipv4: New multicast-all socket option
After some discussion offline with Christoph Lameter and David Stevens
regarding multicast behaviour in Linux, I'm submitting a slightly
modified patch from the one Christoph submitted earlier.

This patch provides a new socket option IP_MULTICAST_ALL.

In this case, default behaviour is _unchanged_ from the current
Linux standard. The socket option is set by default to provide
original behaviour. Sockets wishing to receive data only from
multicast groups they join explicitly will need to clear this
socket option.

Signed-off-by: Nivedita Singhvi <niv@us.ibm.com>
Signed-off-by: Christoph Lameter<cl@linux.com>
Acked-by: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-02 00:45:24 -07:00
Pablo Neira Ayuso
a17c859849 netfilter: conntrack: add support for DCCP handshake sequence to ctnetlink
This patch adds CTA_PROTOINFO_DCCP_HANDSHAKE_SEQ that exposes
the u64 handshake sequence number to user-space.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-05-27 17:50:35 +02:00
David S. Miller
45ea4ea2af Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-05-25 00:38:24 -07:00
Zhu Yi
e31a16d6f6 wireless: move some utility functions from mac80211 to cfg80211
The patch moves some utility functions from mac80211 to cfg80211.
Because these functions are doing generic 802.11 operations so they
are not mac80211 specific. The moving allows some fullmac drivers
to be also benefit from these utility functions.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Samuel Ortiz <samuel.ortiz@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-22 14:06:02 -04:00
Michał Mirosław
a7b11d7382 genetlink: Introduce genl_register_family_with_ops()
This introduces genl_register_family_with_ops() that registers a genetlink
family along with operations from a table. This is used to kill copy'n'paste
occurrences in following patches.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 16:50:22 -07:00
Rami Rosen
04af8cf6f3 net: Remove unused parameter from fill method in fib_rules_ops.
The netlink message header (struct nlmsghdr) is an unused parameter in
fill method of fib_rules_ops struct.  This patch removes this
parameter from this method and fixes the places where this method is
called.

(include/net/fib_rules.h)

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20 17:26:23 -07:00
Johannes Berg
cce4c77b87 mac80211: fix kernel-doc
Moving information from config_interface to bss_info_changed
removed struct ieee80211_if_conf which the documentation still
refers to, additionally there's one kernel-doc description too
much and one other missing, fix all this.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:32 -04:00
Johannes Berg
e3da574a0d cfg80211: allow wext to remove keys that don't exist
Some applications using wireless extensions expect to be able to
remove a key that doesn't exist. One example is wpa_supplicant
which doesn't actually change behaviour when running into an
error while trying to do that, but it prints an error message
which users interpret as wpa_supplicant having problems.

The safe thing to do is not change the behaviour of wireless
extensions any more, so when the driver reports -ENOENT let
the wext bridge code return success to userspace. To guarantee
this, also document that drivers should return -ENOENT when the
key doesn't exist.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:30 -04:00
David Kilroy
cf5aa2f1f3 cfg80211: mark wiphy->privid as pointer to const
This allows drivers to use a const pointer as the privid without a cast.

Signed-off-by: David Kilroy <kilroyd@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:27 -04:00
David Kilroy
3dcf670baf cfg80211: mark ops as pointer to const
This allows drivers to mark their cfg80211_ops tables const.

Signed-off-by: David Kilroy <kilroyd@googlemail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:27 -04:00
Luis R. Rodriguez
689da1b3b8 wireless: rename IEEE80211_CHAN_NO_FAT_* to HT40-/+
This is more consistent with our nl80211 naming convention
for HT40-/+.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:22 -04:00
Luis R. Rodriguez
038659e7c6 cfg80211: Process regulatory max bandwidth checks for HT40
We are not correctly listening to the regulatory max bandwidth
settings. To actually make use of it we need to redesign things
a bit. This patch does the work for that. We do this to so we
can obey to regulatory rules accordingly for use of HT40.

We end up dealing with HT40 by having two passes for each channel.

The first check will see if a 20 MHz channel fits into the channel's
center freq on a given frequency range. We check for a 20 MHz
banwidth channel as that is the maximum an individual channel
will use, at least for now. The first pass will go ahead and
check if the regulatory rule for that given center of frequency
allows 40 MHz bandwidths and we use this to determine whether
or not the channel supports HT40 or not. So to support HT40 you'll
need at a regulatory rule that allows you to use 40 MHz channels
but you're channel must also be enabled and support 20 MHz by itself.

The second pass is done after we do the regulatory checks over
an device's supported channel list. On each channel we'll check
if the control channel and the extension both:

 o exist
 o are enabled
 o regulatory allows 40 MHz bandwidth on its frequency range

This work allows allows us to idependently check for HT40- and
HT40+.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:22 -04:00
Sascha Hlusiak
645069299a sit: stateless autoconf for isatap
be sent periodically. The rs_delay can be speficied when adding the
PRL entry and defaults to 15 minutes.

The RS is sent from every link local adress that's assigned to the
tunnel interface. It's directed to the (guessed) linklocal address
of the router and is sent through the tunnel.

Better: send to ff02::2 encapsuled in unicast directed to router-v4.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19 16:02:02 -07:00
David S. Miller
bb803cfbec Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/scsi/fcoe/fcoe.c
2009-05-18 21:08:20 -07:00
David S. Miller
82d048186e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-05-18 14:48:30 -07:00
Rami Rosen
8b3521eeb7 ipv4: remove an unused parameter from configure method of fib_rules_ops.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 11:59:45 -07:00
Jouni Malinen
3f77316c6b nl80211: Add IEEE 802.1X PAE control for station mode
Add a new NL80211_ATTR_CONTROL_PORT flag for NL80211_CMD_ASSOCIATE to
allow user space to indicate that it will control the IEEE 802.1X port
in station mode. Previously, mac80211 was always marking the port
authorized in station mode. This was enough when drop_unencrypted flag
was set. However, drop_unencrypted can currently be controlled only
with WEXT and the current nl80211 design does not allow fully secure
configuration. Fix this by providing a mechanism for user space to
control the IEEE 802.1X port in station mode (i.e., do the same that
we are already doing in AP mode).

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:37 -04:00
Johannes Berg
eccb8e8f0c nl80211: improve station flags handling
It is currently not possible to modify station flags, but that
capability would be very useful. This patch introduces a new
nl80211 attribute that contains a set/mask for station flags,
and updates the internal API (and mac80211) to mirror that.

The new attribute is parsed before falling back to the old so
that userspace can specify both (if it can) to work on all
kernels.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:35 -04:00
Johannes Berg
08645126dd cfg80211: implement wext key handling
Move key handling wireless extension ioctls from mac80211 to cfg80211
so that all drivers that implement the cfg80211 operations get wext
compatibility.

Note that this drops the SIOCGIWENCODE ioctl support for getting
IW_ENCODE_RESTRICTED/IW_ENCODE_OPEN. This means that iwconfig will
no longer report "Security mode:open" or "Security mode:restricted"
for mac80211. However, what we displayed there (the authentication
algo used) was actually wrong -- linux/wireless.h states that this
setting is meant to differentiate between "Refuse non-encoded packets"
and "Accept non-encoded packets".

(Combined with "cfg80211: fix a couple of bugs with key ioctls". -- JWL)

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:32 -04:00
Johannes Berg
cbe8fa9c5e cfg80211: put wext data into substructure
To make it more apparent in the code what is for wext
only (and needs to be #ifdef'ed) put all the info for
wext into a substruct in each wireless_dev.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:24:07 -04:00
Johannes Berg
4e943900fb cfg80211: constify key mac address in ops
The address pointed to by mac_addr can be marked as const.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:24:07 -04:00
Johannes Berg
44033f80ce mac80211: remove ieee80211_ht_bss_info
This struct is no longer used (and hasn't been for a while).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:23:57 -04:00
Johannes Berg
9ed6bcce77 mac80211: move HT operation mode BSS info
There really is no need to have a separate struct for a
single variable. The fact that it exists is due to the
code legacy, but we can remove that now. Very simple.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:23:57 -04:00
Jouni Malinen
dc6382ced0 nl80211 : Add support for configuring MFP
NL80211_CMD_ASSOCIATE request must be able to indicate whether
management frame protection (IEEE 802.11w) is being used. mac80211 was
able to use MFP in client mode only with WEXT, but the new
NL80211_ATTR_USE_MFP attribute will allow this to be done with
nl80211, too.

Since we are currently using nl80211 for MFP only with drivers that
use user space SME, only MFP disabled and required values are
used. However, the NL80211_ATTR_USE_MFP attribute is an enum that can
be extended with MFP optional in the future, if that is needed with
some drivers (e.g., if the RSN IE is generated by the driver).

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:23:54 -04:00
David S. Miller
a8679be207 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-05-08 12:46:17 -07:00
David S. Miller
22f6dacdfc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	include/net/tcp.h
2009-05-08 02:48:30 -07:00
Eric Dumazet
7aedec2ad5 tcp: tcp_prequeue() can use keyed wakeups
We can avoid waking up tasks not interested in receive notifications,
using wake_up_interruptible_poll() instead of wake_up_interruptible()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-07 14:52:28 -07:00
Eric Dumazet
f5f8d86b23 tcp: tcp_prequeue() cleanup
Small cleanup patch to reduce line lengths, before a change in
tcp_prequeue().

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-07 14:52:26 -07:00
Johannes Berg
5cff20e6c5 mac80211: tell driver when idle
When we aren't doing anything in mac80211, we can turn off
much of the hardware, depending on the driver/hw. Not doing
anything, aka being idle, means:

 * no monitor interfaces
 * no AP/mesh/wds interfaces
 * any station interfaces are in DISABLED state
 * any IBSS interfaces aren't trying to be in a network
 * we aren't trying to scan

By creating a new function that verifies these conditions and calling
it at strategic points where the states of those conditions change,
we can easily make mac80211 tell the driver when we are idle to save
power.

Additionally, this fixes a small quirk where a recalculated powersave
state is passed to the driver even if the hardware is about to stopped
completely.

This patch intentionally doesn't touch radio_enabled because that is
currently implemented to be a soft rfkill which is inappropriate here
when we need to be able to wake up with low latency.

One thing I'm not entirely sure about is this:

  phy0: device no longer idle - in use
  wlan0: direct probe to AP 00:11:24:91:07:4d try 1
  wlan0 direct probe responded
  wlan0: authenticate with AP 00:11:24:91:07:4d
  wlan0: authenticated
> phy0: device now idle
> phy0: device no longer idle - in use
  wlan0: associate with AP 00:11:24:91:07:4d
  wlan0: RX AssocResp from 00:11:24:91:07:4d (capab=0x401 status=0 aid=1)
  wlan0: associated

Is it appropriate to go into idle state for a short time when we have
just authenticated, but not associated yet? This happens only with the
userspace SME, because we cannot really know how long it will wait
before asking us to associate. Would going idle after a short timeout
be more appropriate? We may need to revisit this, depending on what
happens.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-06 15:14:51 -04:00
Johannes Berg
2d0ddec5b2 mac80211: unify config_interface and bss_info_changed
The config_interface method is a little strange, it contains the
BSSID and beacon updates, while bss_info_changed contains most
other BSS information for each interface. This patch removes
config_interface and rolls all the information it previously
passed to drivers into bss_info_changed.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-06 15:14:36 -04:00
Johannes Berg
57c4d7b4c4 mac80211: clean up beacon interval settings
We currently have two beacon interval configuration knobs:
hw.conf.beacon_int and vif.bss_info.beacon_int. This is
rather confusing, even though the former is used when we
beacon ourselves and the latter when we are associated to
an AP.

This just deprecates the hw.conf.beacon_int setting in favour
of always using vif.bss_info.beacon_int. Since it touches all
the beaconing IBSS code anyway, we can also add support for
the cfg80211 IBSS beacon interval configuration easily.

NOTE: The hw.conf.beacon_int setting is retained for now due
      to drivers still using it -- I couldn't untangle all
      drivers, some are updated in this patch.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-06 15:14:33 -04:00
Johannes Berg
9ccebe6148 mac80211: rename max_sleep_interval to max_sleep_period
Kalle points out that max_sleep_interval is somewhat confusing
because the value is measured in beacon intervals, and not in
TU. Rename it to max_sleep_period to be consistent with things
like DTIM period that are also measured in beacon intervals.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-06 15:14:29 -04:00
Linus Torvalds
80445de577 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits)
  e1000: fix virtualization bug
  bonding: fix alb mode locking regression
  Bluetooth: Fix issue with sysfs handling for connections
  usbnet: CDC EEM support (v5)
  tcp: Fix tcp_prequeue() to get correct rto_min value
  ehea: fix invalid pointer access
  ne2k-pci: Do not register device until initialized.
  Subject: [PATCH] br2684: restore net_dev initialization
  net: Only store high 16 bits of kernel generated filter priorities
  virtio_net: Fix function name typo
  virtio_net: Cleanup command queue scatterlist usage
  bonding: correct the cleanup in bond_create()
  virtio: add missing include to virtio_net.h
  smsc95xx: add support for LAN9512 and LAN9514
  smsc95xx: configure LED outputs
  netconsole: take care of NETDEV_UNREGISTER event
  xt_socket: checks for the state of nf_conntrack
  bonding: bond_slave_info_query() fix
  cxgb3: fixing gcc 4.4 compiler warning: suggest parentheses around operand of ‘!’
  netfilter: use likely() in xt_info_rdlock_bh()
  ...
2009-05-05 08:26:10 -07:00
Marcel Holtmann
a67e899cf3 Bluetooth: Fix issue with sysfs handling for connections
Due to a semantic changes in flush_workqueue() the current approach of
synchronizing the sysfs handling for connections doesn't work anymore. The
whole approach is actually fully broken and based on assumptions that are
no longer valid.

With the introduction of Simple Pairing support, the creation of low-level
ACL links got changed. This change invalidates the reason why in the past
two independent work queues have been used for adding/removing sysfs
devices. The adding of the actual sysfs device is now postponed until the
host controller successfully assigns an unique handle to that link. So
the real synchronization happens inside the controller and not the host.

The only left-over problem is that some internals of the sysfs device
handling are not initialized ahead of time. This leaves potential access
to invalid data and can cause various NULL pointer dereferences. To fix
this a new function makes sure that all sysfs details are initialized
when an connection attempt is made. The actual sysfs device is only
registered when the connection has been successfully established. To
avoid a race condition with the registration, the check if a device is
registered has been moved into the removal work.

As an extra protection two flush_work() calls are left in place to
make sure a previous add/del work has been completed first.

Based on a report by Marc Pignat <marc.pignat@hevs.ch>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Tested-by: Justin P. Mattock <justinmattock@gmail.com>
Tested-by: Roger Quadros <ext-roger.quadros@nokia.com>
Tested-by: Marc Pignat <marc.pignat@hevs.ch>
2009-05-04 14:29:02 -07:00
Satoru SATOH
0c266898b4 tcp: Fix tcp_prequeue() to get correct rto_min value
tcp_prequeue() refers to the constant value (TCP_RTO_MIN) regardless of
the actual value might be tuned. The following patches fix this and make
tcp_prequeue get the actual value returns from tcp_rto_min().

Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-04 11:11:01 -07:00
Rami Rosen
accc5b4f90 ipv4: remove unused macro (FIB_RES_RESET) from ip_fib.h.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-03 14:19:51 -07:00
David S. Miller
aba7453037 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	Documentation/isdn/00-INDEX
	drivers/net/wireless/iwlwifi/iwl-scan.c
	drivers/net/wireless/rndis_wlan.c
	net/mac80211/main.c
2009-04-29 20:30:35 -07:00
Marcel Holtmann
052b30b0a8 Bluetooth: Add different pairing timeout for Legacy Pairing
The Bluetooth stack uses a reference counting for all established ACL
links and if no user (L2CAP connection) is present, the link will be
terminated to save power. The problem part is the dedicated pairing
when using Legacy Pairing (Bluetooth 2.0 and before). At that point
no user is present and pairing attempts will be disconnected within
10 seconds or less. In previous kernel version this was not a problem
since the disconnect timeout wasn't triggered on incoming connections
for the first time. However this caused issues with broken host stacks
that kept the connections around after dedicated pairing. When the
support for Simple Pairing got added, the link establishment procedure
needed to be changed and now causes issues when using Legacy Pairing

When using Simple Pairing it is possible to do a proper reference
counting of ACL link users. With Legacy Pairing this is not possible
since the specification is unclear in some areas and too many broken
Bluetooth devices have already been deployed. So instead of trying to
deal with all the broken devices, a special pairing timeout will be
introduced that increases the timeout to 60 seconds when pairing is
triggered.

If a broken devices now puts the stack into an unforeseen state, the
worst that happens is the disconnect timeout triggers after 120 seconds
instead of 4 seconds. This allows successful pairings with legacy and
broken devices now.

Based on a report by Johan Hedberg <johan.hedberg@nokia.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-04-28 09:31:38 -07:00
Roger Quadros
f3784d834c Bluetooth: Ensure that HCI sysfs add/del is preempt safe
Use a different work_struct variables for add_conn() and del_conn() and
use single work queue instead of two for adding and deleting connections.

It eliminates the following error on a preemptible kernel:

[  204.358032] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
[  204.370697] pgd = c0004000
[  204.373443] [0000000c] *pgd=00000000
[  204.378601] Internal error: Oops: 17 [#1] PREEMPT
[  204.383361] Modules linked in: vfat fat rfcomm sco l2cap sd_mod scsi_mod iphb pvr2d drm omaplfb ps
[  204.438537] CPU: 0    Not tainted  (2.6.28-maemo2 #1)
[  204.443664] PC is at klist_put+0x2c/0xb4
[  204.447601] LR is at klist_put+0x18/0xb4
[  204.451568] pc : [<c0270f08>]    lr : [<c0270ef4>]    psr: a0000113
[  204.451568] sp : cf1b3f10  ip : cf1b3f10  fp : cf1b3f2c
[  204.463104] r10: 00000000  r9 : 00000000  r8 : bf08029c
[  204.468353] r7 : c7869200  r6 : cfbe2690  r5 : c78692c8  r4 : 00000001
[  204.474945] r3 : 00000001  r2 : cf1b2000  r1 : 00000001  r0 : 00000000
[  204.481506] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM Segment kernel
[  204.488861] Control: 10c5387d  Table: 887fc018  DAC: 00000017
[  204.494628] Process btdelconn (pid: 515, stack limit = 0xcf1b22e0)

Signed-off-by: Roger Quadros <ext-roger.quadros@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-04-28 09:31:38 -07:00
Neil Horman
edf391ff17 snmp: add missing counters for RFC 4293
The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and
OutMcastOctets:
http://tools.ietf.org/html/rfc4293
But it seems we don't track those in any way that easy to separate from other
protocols.  This patch adds those missing counters to the stats file. Tested
successfully by me

With help from Eric Dumazet.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27 02:45:02 -07:00
Rami Rosen
92ae3efa53 ipv4: remove unused member in fib_table.
This patch removes an unused parameter (tb_stamp) from fib_table
structure in include/net/ip_fib.h.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27 02:35:32 -07:00
David S. Miller
495a1b4eff Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
Conflicts:
	net/mac80211/pm.c
2009-04-25 16:36:46 -07:00
Hendrik Brueckner
09488e2e0f af_iucv: New socket option for setting IUCV MSGLIMITs
The SO_MSGLIMIT socket option modifies the message limit for new
IUCV communication paths.

The message limit specifies the maximum number of outstanding messages
that are allowed for connections. This setting can be lowered by z/VM
when an IUCV connection is established.

Expects an integer value in the range of 1 to 65535.
The default value is 65535.

The message limit must be set before calling connect() or listen()
for sockets.

If sockets are already connected or in state listen, changing the message
limit is not supported.
For reading the message limit value, unconnected sockets return the limit
that has been set or the default limit. For connected sockets, the actual
message limit is returned. The actual message limit is assigned by z/VM
for each connection and it depends on IUCV MSGLIMIT authorizations
specified for the z/VM guest virtual machine.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-23 04:04:38 -07:00
Hendrik Brueckner
44b1e6b5f9 af_iucv: Modify iucv msg target class using control msghdr
Allow 'classification' of socket data that is sent or received over
an af_iucv socket. For classification of data, the target class of an
(native) iucv message is used.

This patch provides the cmsg interface for iucv_sock_recvmsg() and
iucv_sock_sendmsg().  Applications can use the msg_control field of
struct msghdr to set or get the target class as a
"socket control message" (SCM/CMSG).

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-23 04:04:35 -07:00
Hendrik Brueckner
9d5c5d8f41 af_iucv: add sockopt() to enable/disable use of IPRM_DATA msgs
Provide the socket operations getsocktopt() and setsockopt() to enable/disable
sending of data in the parameter list of IUCV messages.
The patch sets respective flag only.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-23 04:04:32 -07:00
Jouni Malinen
1965c85331 nl80211: Add event for authentication/association timeout
SME needs to be notified when the authentication or association
attempt times out and MLME has stopped processing in order to allow
the SME to decide what to do next.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:57:21 -04:00
Johannes Berg
04fe20372e mac80211: calculate maximum sleep interval
The maximum sleep interval, for powersave purposes, is
determined by the DTIM period (it may not be larger)
and the required networking latency (it must be small
enough to fulfil those constraints).

This makes mac80211 calculate the maximum sleep interval
based on those constraints, and pass it to the driver.
Then the driver should instruct the device to sleep at
most that long.

Note that the device is responsible for aligning the
maximum sleep interval between DTIMs, we make sure it's
not longer but it needs to make sure it's between them.

Also, group some powersave documentation together and
make it more explicit that we support managed mode only,
and no IBSS powersaving (yet).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:57:20 -04:00
Johannes Berg
8e30bc55de nl80211: allow configuring IBSS beacon interval
Make the JOIN_IBSS command look at the beacon interval
attribute to see if the user requested a specific beacon
interval, if not default to 100 TU (wext too).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:57:20 -04:00
Johannes Berg
e255d5eb2b mac80211: remove IEEE80211_CONF_CHANGE_DYNPS_TIMEOUT
Just setting IEEE80211_CONF_CHANGE_PS should be sufficient
for changes in the power saving things. The driver already
tells us whether it wants notification of dynps via the
"have dynps support" hw flag.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:57:20 -04:00
Jouni Malinen
b9a5f8cab7 nl80211: Add set/get for frag/rts threshold and retry limits
Add new nl80211 attributes that can be used with NL80211_CMD_SET_WIPHY
and NL80211_CMD_GET_WIPHY to manage fragmentation/RTS threshold and
retry limits.

Since these values are stored in struct wiphy, remove the local copy
from mac80211 where feasible (frag & rts threshold). The retry limits
are currently needed in struct ieee80211_conf, but these could be
eventually removed since the driver should have access to the values
in struct wiphy.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:57:17 -04:00
Johannes Berg
d323655372 cfg80211: clean up includes
Trying to separate header files into net/wireless.h and
net/cfg80211.h has been a source of confusion. Remove
net/wireless.h (because there also is the linux/wireless.h)
and subsume everything into net/cfg80211.h -- except the
definitions for regulatory structures which get moved to
a new header net/regulatory.h.

The "new" net/cfg80211.h is now divided into sections.

There are no real changes in this patch but code shuffling
and some very minor documentation fixes.

I have also, to make things reflect reality, put in a
copyright line for Luis to net/regulatory.h since that
is probably exclusively written by him but was formerly
in a file that only had my copyright line.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:57:17 -04:00
Johannes Berg
04a773ade0 cfg80211/nl80211: add IBSS API
This adds IBSS API along with (preliminary) wext handlers.
The wext handlers can only do IBSS so you need to call them
from your own wext handlers if the mode is IBSS.

The nl80211 API requires
 * an SSID
 * a channel (frequency) for the case that a new IBSS
   has to be created

It optionally supports
 * a flag to fix the channel
 * a fixed BSSID

The cfg80211 code also takes care to leave the IBSS before
the netdev is set down. If wireless extensions are used, it
also caches values when the interface is down and instructs
the driver to join when the interface is set up.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:57:17 -04:00
Johannes Berg
691597cb26 cfg80211/mac80211: move wext SIWMLME into cfg80211
Since we have ->deauth and ->disassoc we can support the
wext SIWMLME call directly without driver wext handlers.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:57:17 -04:00
Johannes Berg
955394c98c mac80211: document powersaving/beacon filter future
Document what mac80211 will do in the future to help save power.
We're not quite there yet, but a plan helps. Also, while at it,
fix the docs wrt. multicast traffic.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:57:16 -04:00
Johannes Berg
f2753ddbad mac80211: add hardware restart function
Some hardware defects may require the hardware to be re-initialised
completely from scratch. Drivers would need much information (for
instance the current MAC address, crypto keys, beaconing information,
etc.) stored duplicated from mac80211 to be able to do this, so let
mac80211 help them.

The new ieee80211_restart_hw() function requires the same code as
resuming, so move that code into a new ieee80211_reconfig() function
in util.c and leave only the suspend code in pm.c.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:57:14 -04:00
Johannes Berg
25e47c18ac cfg80211: add cipher capabilities
This adds the necessary code and fields to let drivers specify
their cipher capabilities and exports them to userspace. Also
update mac80211 to export the ciphers it has.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:54:40 -04:00
Johannes Berg
de95a54b1a mac80211: pass all probe request IEs to driver
Instead of just passing the cfg80211-requested IEs, pass
the locally generated ones as well.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:54:39 -04:00
Johannes Berg
18a8365992 cfg80211: introduce scan IE limit attribute
This patch introduces a new attribute for a wiphy that tells
userspace how long the information elements added to a probe
request frame can be at most. It also updates the at76 to
advertise that it cannot support that, and, for now until I
can fix that, iwlwifi too.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:54:39 -04:00
Jussi Kivilinna
06aa7afaaa cfg80211: add cfg80211_inform_bss
Added cfg80211_inform_bss() for full-mac devices to use.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:54:36 -04:00
Jouni Malinen
a3b8b0569f nl80211: Add Michael MIC failure event
Define a new nl80211 event, NL80211_CMD_MICHAEL_MIC_FAILURE, to be
used to notify user space about locally detected Michael MIC failures.
This matches with the MLME-MICHAELMICFAILURE.indication() primitive.

Since we do not actually have TSC in the skb anymore when
mac80211_ev_michael_mic_failure() is called, that function is changed
to take in the TSC as an optional parameter instead of as a
requirement to include the TSC after the hdr field (which we did not
really follow). For now, TSC is not included in the events from
mac80211, but it could be added at some point.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:54:28 -04:00
Jouni Malinen
53b46b8444 nl80211: Generate deauth/disassoc event for locally generated frames
Previously, nl80211 mlme events were generated only for received
deauthentication and disassociation frames. We need to do the same for
locally generated ones in order to let applications know that we
disconnected (e.g., when AP does not reply to a probe). Rename the
nl80211 and cfg80211 functions (s/rx_//) to make it clearer that they
are used for both received and locally generated frames.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:54:28 -04:00
Huang Weiyi
07f62d01c1 cfg80211: remove duplicated #include
Remove duplicated #include in include/net/cfg80211.h.

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-22 16:54:28 -04:00
Florian Westphal
a0f82f64e2 syncookies: remove last_synq_overflow from struct tcp_sock
last_synq_overflow eats 4 or 8 bytes in struct tcp_sock, even
though it is only used when a listening sockets syn queue
is full.

We can (ab)use rx_opt.ts_recent_stamp to store the same information;
it is not used otherwise as long as a socket is in listen state.

Move linger2 around to avoid splitting struct mtu_probe
across cacheline boundary on 32 bit arches.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-20 02:25:26 -07:00
David S. Miller
134ffb4cad Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-04-16 16:32:29 -07:00
Patrick McHardy
98d500d66c netfilter: nf_nat: add support for persistent mappings
The removal of the SAME target accidentally removed one feature that is
not available from the normal NAT targets so far, having multi-range
mappings that use the same mapping for each connection from a single
client. The current behaviour is to choose the address from the range
based on source and destination IP, which breaks when communicating
with sites having multiple addresses that require all connections to
originate from the same IP address.

Introduce a IP_NAT_RANGE_PERSISTENT option that controls whether the
destination address is taken into account for selecting addresses.

http://bugzilla.kernel.org/show_bug.cgi?id=12954

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-04-16 18:33:01 +02:00
Vlad Yasevich
499923c7a3 ipv6: Fix NULL pointer dereference with time-wait sockets
Commit b2f5e7cd3d
(ipv6: Fix conflict resolutions during ipv6 binding)
introduced a regression where time-wait sockets were
not treated correctly.  This resulted in the following:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000062
IP: [<ffffffff805d7d61>] ipv4_rcv_saddr_equal+0x61/0x70
...
Call Trace:
[<ffffffffa033847b>] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6]
[<ffffffffa03505a8>] inet6_csk_bind_conflict+0x88/0xd0 [ipv6]
[<ffffffff805bb18e>] inet_csk_get_port+0x1ee/0x400
[<ffffffffa0319b7f>] inet6_bind+0x1cf/0x3a0 [ipv6]
[<ffffffff8056d17c>] ? sockfd_lookup_light+0x3c/0xd0
[<ffffffff8056ed49>] sys_bind+0x89/0x100
[<ffffffff80613ea2>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[<ffffffff8020bf9b>] system_call_fastpath+0x16/0x1b

Tested-by: Brian Haley <brian.haley@hp.com>
Tested-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-11 01:53:06 -07:00
Pablo Neira Ayuso
83731671d9 netfilter: ctnetlink: fix regression in expectation handling
This patch fixes a regression (introduced by myself in commit 19abb7b:
netfilter: ctnetlink: deliver events for conntracks changed from
userspace) that results in an expectation re-insertion since
__nf_ct_expect_check() may return 0 for expectation timer refreshing.

This patch also removes a unnecessary refcount bump that
pretended to avoid a possible race condition with event delivery
and expectation timers (as said, not needed since we hold a
reference to the object since until we finish the expectation
setup). This also merges nf_ct_expect_related_report() and
nf_ct_expect_related() which look basically the same.

Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-04-06 17:47:20 +02:00
Latchesar Ionkov
1bab88b231 net/9p: handle correctly interrupted 9P requests
Currently the 9p code crashes when a operation is interrupted, i.e. for
example when the user presses ^C while reading from a file.

This patch fixes the code that is responsible for interruption and flushing
of 9P operations.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
2009-04-05 16:54:53 -05:00
Linus Torvalds
ef8a97bbc9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (54 commits)
  glge: remove unused #include <version.h>
  dnet: remove unused #include <version.h>
  tcp: miscounts due to tcp_fragment pcount reset
  tcp: add helper for counter tweaking due mid-wq change
  hso: fix for the 'invalid frame length' messages
  hso: fix for crash when unplugging the device
  fsl_pq_mdio: Fix compile failure
  fsl_pq_mdio: Revive UCC MDIO support
  ucc_geth: Pass proper device to DMA routines, otherwise oops happens
  i.MX31: Fixing cs89x0 network building to i.MX31ADS
  tc35815: Fix build error if NAPI enabled
  hso: add Vendor/Product ID's for new devices
  ucc_geth: Remove unused header
  gianfar: Remove unused header
  kaweth: Fix locking to be SMP-safe
  net: allow multiple dev per napi with GRO
  r8169: reset IntrStatus after chip reset
  ixgbe: Fix potential memory leak/driver panic issue while setting up Tx & Rx ring parameters
  ixgbe: fix ethtool -A|a behavior
  ixgbe: Patch to fix driver panic while freeing up tx & rx resources
  ...
2009-04-02 21:05:30 -07:00
Ilpo Järvinen
797108d134 tcp: add helper for counter tweaking due mid-wq change
We need full-scale adjustment to fix a TCP miscount in the next
patch, so just move it into a helper and call for that from the
other places.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-02 16:31:44 -07:00
Paul Moore
07feee8f81 netlabel: Cleanup the Smack/NetLabel code to fix incoming TCP connections
This patch cleans up a lot of the Smack network access control code.  The
largest changes are to fix the labeling of incoming TCP connections in a
manner similar to the recent SELinux changes which use the
security_inet_conn_request() hook to label the request_sock and let the label
move to the child socket via the normal network stack mechanisms.  In addition
to the incoming TCP connection fixes this patch also removes the smk_labled
field from the socket_smack struct as the minor optimization advantage was
outweighed by the difficulty in maintaining it's proper state.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-03-28 15:01:37 +11:00
Paul Moore
389fb800ac netlabel: Label incoming TCP connections correctly in SELinux
The current NetLabel/SELinux behavior for incoming TCP connections works but
only through a series of happy coincidences that rely on the limited nature of
standard CIPSO (only able to convey MLS attributes) and the write equality
imposed by the SELinux MLS constraints.  The problem is that network sockets
created as the result of an incoming TCP connection were not on-the-wire
labeled based on the security attributes of the parent socket but rather based
on the wire label of the remote peer.  The issue had to do with how IP options
were managed as part of the network stack and where the LSM hooks were in
relation to the code which set the IP options on these newly created child
sockets.  While NetLabel/SELinux did correctly set the socket's on-the-wire
label it was promptly cleared by the network stack and reset based on the IP
options of the remote peer.

This patch, in conjunction with a prior patch that adjusted the LSM hook
locations, works to set the correct on-the-wire label format for new incoming
connections through the security_inet_conn_request() hook.  Besides the
correct behavior there are many advantages to this change, the most significant
is that all of the NetLabel socket labeling code in SELinux now lives in hooks
which can return error codes to the core stack which allows us to finally get
ride of the selinux_netlbl_inode_permission() logic which greatly simplfies
the NetLabel/SELinux glue code.  In the process of developing this patch I
also ran into a small handful of AF_INET6 cleanliness issues that have been
fixed which should make the code safer and easier to extend in the future.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-03-28 15:01:36 +11:00
Johannes Berg
e4e72fb4de mac80211/iwlwifi: move virtual A-MDPU queue bookkeeping to iwlwifi
This patch removes all the virtual A-MPDU-queue bookkeeping from
mac80211. Curiously, iwlwifi already does its own bookkeeping, so
it doesn't require much changes except where it needs to handle
starting and stopping the queues in mac80211.

To handle the queue stop/wake properly, we rewrite the software
queue number for aggregation frames and internally to iwlwifi keep
track of the queues that map into the same AC queue, and only talk
to mac80211 about the AC queue. The implementation requires calling
two new functions, iwl_stop_queue and iwl_wake_queue instead of the
mac80211 counterparts.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Reinette Chattre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:13:23 -04:00
Johannes Berg
cd8ffc800c mac80211: fix aggregation to not require queue stop
Instead of stopping the entire AC queue when enabling aggregation
(which was only done for hardware with aggregation queues) buffer
the packets for each station, and release them to the pending skb
queue once aggregation is turned on successfully.

We get a little more code, but it becomes conceptually simpler and
we can remove the entire virtual queue mechanism from mac80211 in
a follow-up patch.

This changes how mac80211 behaves towards drivers that support
aggregation but have no hardware queues -- those drivers will now
not be handed packets while the aggregation session is being
established, but only after it has been fully established.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:13:22 -04:00
Johannes Berg
b1720231ca mac80211: unify and fix TX aggregation start
When TX aggregation becomes operational, we do a number of steps:
 1) print a debug message
 2) wake the virtual queue
 3) notify the driver

Unfortunately, 1) and 3) are only done if the driver is first to
reply to the aggregation request, it is, however, possible that the
remote station replies before the driver! Thus, unify the code for
this and call the new function ieee80211_agg_tx_operational in both
places where TX aggregation can become operational.

Additionally, rename the driver notification from
IEEE80211_AMPDU_TX_RESUME to IEEE80211_AMPDU_TX_OPERATIONAL.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:13:22 -04:00
Johannes Berg
2b874e83c9 mac80211: rate control status only for controlled packets
This patch changes mac80211 to not notify the rate control algorithm's
tx_status() method when reporting status for a packet that didn't go
through the rate control algorithm's get_rate() method.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:13:15 -04:00
Kalle Valo
04de838159 mac80211: add beacon filtering support
Add IEEE80211_HW_BEACON_FILTERING flag so that driver inform that it supports
beacon filtering. Drivers need to call the new function
ieee80211_beacon_loss() to notify about beacon loss.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:13:13 -04:00
Kalle Valo
a08c1c1ac0 cfg80211: add feature to hold bss
In beacon filtering there needs to be a way to not expire the BSS even
when no beacons are received. Add an interface to cfg80211 to hold
BSS and make sure that it's not expired.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:13:13 -04:00
Kalle Valo
9050bdd858 mac80211: disable power save when scanning
When software scanning we need to disable power save so that all possible
probe responses and beacons are received. For hardware scanning assume that
hardware will take care of that and document that assumption.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:13:12 -04:00
Jouni Malinen
65fc73ac4a nl80211: Remove NL80211_CMD_SET_MGMT_EXTRA_IE
The functionality that NL80211_CMD_SET_MGMT_EXTRA_IE provided can now
be achieved with cleaner design by adding IE(s) into
NL80211_CMD_TRIGGER_SCAN, NL80211_CMD_AUTHENTICATE,
NL80211_CMD_ASSOCIATE, NL80211_CMD_DEAUTHENTICATE, and
NL80211_CMD_DISASSOCIATE.

Since this is a very recently added command and there are no known (or
known planned) applications using NL80211_CMD_SET_MGMT_EXTRA_IE and
taken into account how much extra complexity it adds to the IE
processing we have now (and need to add in the future to fix IE order
in couple of frames), it looks like the best option is to just remove
the implementation of this command for now. The enum values themselves
are left to avoid changing the nl80211 command or attribute numbers.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:13:04 -04:00
Jouni Malinen
636a5d3625 nl80211: Add MLME primitives to support external SME
This patch adds new nl80211 commands to allow user space to request
authentication and association (and also deauthentication and
disassociation). The commands are structured to allow separate
authentication and association steps, i.e., the interface between
kernel and user space is similar to the MLME SAP interface in IEEE
802.11 standard and an user space application takes the role of the
SME.

The patch introduces MLME-AUTHENTICATE.request,
MLME-{,RE}ASSOCIATE.request, MLME-DEAUTHENTICATE.request, and
MLME-DISASSOCIATE.request primitives. The authentication and
association commands request the actual operations in two steps
(assuming the driver supports this; if not, separate authentication
step is skipped; this could end up being a separate "connect"
command).

The initial implementation for mac80211 uses the current
net/mac80211/mlme.c for actual sending and processing of management
frames and the new nl80211 commands will just stop the current state
machine from moving automatically from authentication to association.
Future cleanup may move more of the MLME operations into cfg80211.

The goal of this design is to provide more control of authentication and
association process to user space without having to move the full MLME
implementation. This should be enough to allow IEEE 802.11r FT protocol
and 802.11s SAE authentication to be implemented. Obviously, this will
also bring the extra benefit of not having to use WEXT for association
requests with mac80211. An example implementation of a user space SME
using the new nl80211 commands is available for wpa_supplicant.

This patch is enough to get IEEE 802.11r FT protocol working with
over-the-air mechanism (over-the-DS will need additional MLME
primitives for handling the FT Action frames).

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:13:02 -04:00
Jouni Malinen
6039f6d23f nl80211: Event notifications for MLME events
Add new nl80211 event notifications (and a new multicast group, "mlme")
for informing user space about received and processed Authentication,
(Re)Association Response, Deauthentication, and Disassociation frames in
station and IBSS modes (i.e., MLME SAP interface primitives
MLME-AUTHENTICATE.confirm, MLME-ASSOCIATE.confirm,
MLME-REASSOCIATE.confirm, MLME-DEAUTHENTICATE.indicate, and
MLME-DISASSOCIATE.indication). The event data is encapsulated as the 802.11
management frame since we already have the frame in that format and it
includes all the needed information.

This is the initial step in providing MLME SAP interface for
authentication and association with nl80211. In other words, kernel code
will act as the MLME and a user space application can control it as the
SME.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:13:02 -04:00
Johannes Berg
b3a902850a mac80211: kill IEEE80211_CONF_SHORT_SLOT_TIME
No drivers use it any more, so it can now be removed safely.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:13:00 -04:00
Johannes Berg
aae89831df wireless: radiotap updates
Radiotap was updated to include a "bad PLCP" flag and standardise
the "bad FCS" flag in the "flags" rather than "RX flags" field,
this patch updates Linux to that standard.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:12:52 -04:00
Johannes Berg
51b381479f mac80211: reduce max number of queues
No hw/driver actually supports more than four queues right now,
and we allocate a number of things per queue which means we
waste a bit of memory. Reduce the maximum number to four to
accurately reflect what we do (and need for QoS). Even if we
had hardware supporting more queues we couldn't take advantage
of that right now anyway.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:12:45 -04:00
Johannes Berg
176be728ee mac80211: remove ieee80211_num_regular_queues
This inline is useless and actually makes the code _longer_
rather than shorter.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-27 20:12:42 -04:00
Thierry Reding
a170285772 net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.
This patch adds a platform device driver that supports the OpenCores 10/100
Mbps Ethernet MAC.

The driver expects three resources: one IORESOURCE_MEM resource defines the
memory region for the core's memory-mapped registers while a second
IORESOURCE_MEM resource defines the network packet buffer space. The third
resource, of type IORESOURCE_IRQ, associates an interrupt with the driver.

Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-27 00:16:21 -07:00
David S. Miller
01e6de64d9 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2009-03-26 22:45:23 -07:00
Holger Eitzenberger
5c0de29d06 netfilter: nf_conntrack: add generic function to get len of generic policy
Usefull for all protocols which do not add additional data, such
as GRE or UDPlite.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-25 21:52:17 +01:00
Eric Dumazet
ea781f197d netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu()
Use "hlist_nulls" infrastructure we added in 2.6.29 for RCUification of UDP & TCP.

This permits an easy conversion from call_rcu() based hash lists to a
SLAB_DESTROY_BY_RCU one.

Avoiding call_rcu() delay at nf_conn freeing time has numerous gains.

First, it doesnt fill RCU queues (up to 10000 elements per cpu).
This reduces OOM possibility, if queued elements are not taken into account
This reduces latency problems when RCU queue size hits hilimit and triggers
emergency mode.

- It allows fast reuse of just freed elements, permitting better use of
CPU cache.

- We delete rcu_head from "struct nf_conn", shrinking size of this structure
by 8 or 16 bytes.

This patch only takes care of "struct nf_conn".
call_rcu() is still used for less critical conntrack parts, that may
be converted later if necessary.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-25 21:05:46 +01:00
Holger Eitzenberger
af9d32ad67 netfilter: limit the length of the helper name
This is necessary in order to have an upper bound for Netlink
message calculation, which is not a problem at all, as there
are no helpers with a longer name.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-25 18:44:01 +01:00
Holger Eitzenberger
e487eb99cf netlink: add nla_policy_len()
It calculates the max. length of a Netlink policy, which is usefull
for allocating Netlink buffers roughly the size of the actual
message.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-25 18:26:30 +01:00
Holger Eitzenberger
d0dba7255b netfilter: ctnetlink: add callbacks to the per-proto nlattrs
There is added a single callback for the l3 proto helper.  The two
callbacks for the l4 protos are necessary because of the general
structure of a ctnetlink event, which is in short:

 CTA_TUPLE_ORIG
   <l3/l4-proto-attributes>
 CTA_TUPLE_REPLY
   <l3/l4-proto-attributes>
 CTA_ID
 ...
 CTA_PROTOINFO
   <l4-proto-attributes>
 CTA_TUPLE_MASTER
   <l3/l4-proto-attributes>

Therefore the formular is

 size := sizeof(generic-nlas) + 3 * sizeof(tuple_nlas) + sizeof(protoinfo_nlas)

Some of the NLAs are optional, e. g. CTA_TUPLE_MASTER, which is only
set if it's an expected connection.  But the number of optional NLAs is
small enough to prevent netlink_trim() from reallocating if calculated
properly.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-25 18:24:48 +01:00
Daniel Mack
67fca028f1 ax88796: Add method to take MAC from platform data
Implement a way to provide the MAC address for ax88796 devices from
their platform data. Boards might decide to set the address
programmatically, taken from boot tags or other sources.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-24 23:32:03 -07:00
Vlad Yasevich
b2f5e7cd3d ipv6: Fix conflict resolutions during ipv6 binding
The ipv6 version of bind_conflict code calls ipv6_rcv_saddr_equal()
which at times wrongly identified intersections between addresses.
It particularly broke down under a few instances and caused erroneous
bind conflicts.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-24 19:49:11 -07:00
David S. Miller
b5bb14386e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2009-03-24 13:24:36 -07:00
Lennert Buytenhek
e84665c9cb dsa: add switch chip cascading support
The initial version of the DSA driver only supported a single switch
chip per network interface, while DSA-capable switch chips can be
interconnected to form a tree of switch chips.  This patch adds support
for multiple switch chips on a network interface.

An example topology for a 16-port device with an embedded CPU is as
follows:

	+-----+          +--------+       +--------+
	|     |eth0    10| switch |9    10| switch |
	| CPU +----------+        +-------+        |
	|     |          | chip 0 |       | chip 1 |
	+-----+          +---++---+       +---++---+
	                     ||               ||
	                     ||               ||
	                     ||1000baseT      ||1000baseT
	                     ||ports 1-8      ||ports 9-16

This requires a couple of interdependent changes in the DSA layer:

- The dsa platform driver data needs to be extended: there is still
  only one netdevice per DSA driver instance (eth0 in the example
  above), but each of the switch chips in the tree needs its own
  mii_bus device pointer, MII management bus address, and port name
  array. (include/net/dsa.h)  The existing in-tree dsa users need
  some small changes to deal with this. (arch/arm)

- The DSA and Ethertype DSA tagging modules need to be extended to
  use the DSA device ID field on receive and demultiplex the packet
  accordingly, and fill in the DSA device ID field on transmit
  according to which switch chip the packet is heading to.
  (net/dsa/tag_{dsa,edsa}.c)

- The concept of "CPU port", which is the switch chip port that the
  CPU is connected to (port 10 on switch chip 0 in the example), needs
  to be extended with the concept of "upstream port", which is the
  port on the switch chip that will bring us one hop closer to the CPU
  (port 10 for both switch chips in the example above).

- The dsa platform data needs to specify which ports on which switch
  chips are links to other switch chips, so that we can enable DSA
  tagging mode on them.  (For inter-switch links, we always use
  non-EtherType DSA tagging, since it has lower overhead.  The CPU
  link uses dsa or edsa tagging depending on what the 'root' switch
  chip supports.)  This is done by specifying "dsa" for the given
  port in the port array.

- The dsa platform data needs to be extended with information on via
  which port to reach any given switch chip from any given switch chip.
  This info is specified via the per-switch chip data struct ->rtable[]
  array, which gives the nexthop ports for each of the other switches
  in the tree.

For the example topology above, the dsa platform data would look
something like this:

	static struct dsa_chip_data sw[2] = {
		{
			.mii_bus	= &foo,
			.sw_addr	= 1,
			.port_names[0]	= "p1",
			.port_names[1]	= "p2",
			.port_names[2]	= "p3",
			.port_names[3]	= "p4",
			.port_names[4]	= "p5",
			.port_names[5]	= "p6",
			.port_names[6]	= "p7",
			.port_names[7]	= "p8",
			.port_names[9]	= "dsa",
			.port_names[10]	= "cpu",
			.rtable		= (s8 []){ -1, 9, },
		}, {
			.mii_bus	= &foo,
			.sw_addr	= 2,
			.port_names[0]	= "p9",
			.port_names[1]	= "p10",
			.port_names[2]	= "p11",
			.port_names[3]	= "p12",
			.port_names[4]	= "p13",
			.port_names[5]	= "p14",
			.port_names[6]	= "p15",
			.port_names[7]	= "p16",
			.port_names[10]	= "dsa",
			.rtable		= (s8 []){ 10, -1, },
		},
	},

	static struct dsa_platform_data pd = {
		.netdev		= &foo,
		.nr_switches	= 2,
		.sw		= sw,
	};

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Gary Thomas <gary@mlbassoc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 19:06:54 -07:00
Stephen Hemminger
7ca98fa234 snap: use const for descriptor
Protocols should be able to use constant value for the descriptor.
Minor whitespace cleanup as well

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 19:06:50 -07:00
Vlad Yasevich
8d2f9e8116 sctp: Clean up TEST_FRAME hacks.
Remove 2 TEST_FRAME hacks that are no longer needed.  These allowed
sctp regression tests to compile before, but are no longer needed.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 13:41:09 -07:00
Richard Kennedy
04ec5cfcfd ipv6: reorder struct inet6_ifaddr to remove padding on 64 bit builds
reorder struct inet6_ifaddr to remove padding on 64 bit builds
    
remove 8 bytes of padding so inet6_ifaddr becomes 192 bytes & fits into
a smaller slab.
    
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 13:29:05 -07:00
Eric Dumazet
5e140dfc1f net: reorder struct Qdisc for better SMP performance
dev_queue_xmit() needs to dirty fields "state", "q", "bstats" and "qstats"

On x86_64 arch, they currently span three cache lines, involving more
cache line ping pongs than necessary, making longer holding of queue spinlock.

We can reduce this to one cache line, by grouping all read-mostly fields
at the beginning of structure. (Or should I say, all highly modified fields
at the end :) )

Before patch :

offsetof(struct Qdisc, state)=0x38
offsetof(struct Qdisc, q)=0x48
offsetof(struct Qdisc, bstats)=0x80
offsetof(struct Qdisc, qstats)=0x90
sizeof(struct Qdisc)=0xc8

After patch :

offsetof(struct Qdisc, state)=0x80
offsetof(struct Qdisc, q)=0x88
offsetof(struct Qdisc, bstats)=0xa0
offsetof(struct Qdisc, qstats)=0xac
sizeof(struct Qdisc)=0xc0

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-20 01:33:32 -07:00
Florian Westphal
711d60a9e7 netfilter: remove nf_ct_l4proto_find_get/nf_ct_l4proto_put
users have been moved to __nf_ct_l4proto_find.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-18 17:30:50 +01:00
David S. Miller
af4330631c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-03-17 15:04:31 -07:00
David S. Miller
2d6a5e9500 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/igb/igb_main.c
	drivers/net/qlge/qlge_main.c
	drivers/net/wireless/ath9k/ath9k.h
	drivers/net/wireless/ath9k/core.h
	drivers/net/wireless/ath9k/hw.c
2009-03-17 15:01:30 -07:00
David S. Miller
4ada8107f4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-03-17 13:12:47 -07:00
Luis R. Rodriguez
7db90f4a25 cfg80211: move enum reg_set_by to nl80211.h
We do this so we can later inform userspace who set the
regulatory domain and provide details of the request.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:40 -04:00
Luis R. Rodriguez
0fee54cab7 cfg80211: remove REGDOM_SET_BY_INIT
This is not used as we can always just assume the first
regulatory domain set will _always_ be a static regulatory
domain. REGDOM_SET_BY_CORE will be the first request from
cfg80211 for a regdomain and that then populates the first
regulatory request.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:39 -04:00
Christoph Paasch
9d2493f88f netfilter: remove IPvX specific parts from nf_conntrack_l4proto.h
Moving the structure definitions to the corresponding IPvX specific header files.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:15:35 +01:00
Pablo Neira Ayuso
b1e93a68ca netfilter: conntrack: don't deliver events for racy packets
This patch skips the delivery of conntrack events if the packet
was drop due to a race condition in the conntrack insertion.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:06:42 +01:00
Eric Leblond
ca735b3aaa netfilter: use a linked list of loggers
This patch modifies nf_log to use a linked list of loggers for each
protocol. This list of loggers is read and write protected with a
mutex.

This patch separates registration and binding. To be used as
logging module, a module has to register calling nf_log_register()
and to bind to a protocol it has to call nf_log_bind_pf().
This patch also converts the logging modules to the new API. For nfnetlink_log,
it simply switchs call to register functions to call to bind function and
adds a call to nf_log_register() during init. For other modules, it just
remove a const flag from the logger structure and replace it with a
__read_mostly.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 14:54:21 +01:00
Ilpo Järvinen
0c54b85f28 tcp: simplify tcp_current_mss
There's very little need for most of the callsites to get
tp->xmit_goal_size updated. That will cost us divide as is,
so slice the function in two. Also, the only users of the
tp->xmit_goal_size are directly behind tcp_current_mss(),
so there's no need to store that variable into tcp_sock
at all! The drop of xmit_goal_size currently leaves 16-bit
hole and some reorganization would again be necessary to
change that (but I'm aiming to fill that hole with u16
xmit_goal_size_segs to cache the results of the remaining
divide to get that tso on regression).

Bring xmit_goal_size parts into tcp.c

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:54 -07:00
Ilpo Järvinen
c887e6d2d9 tcp: consolidate paws check
Wow, it was quite tricky to merge that stream of negations
but I think I finally got it right:

check & replace_ts_recent:
(s32)(rcv_tsval - ts_recent) >= 0                  => 0
(s32)(ts_recent - rcv_tsval) <= 0                  => 0

discard:
(s32)(ts_recent - rcv_tsval)  > TCP_PAWS_WINDOW    => 1
(s32)(ts_recent - rcv_tsval) <= TCP_PAWS_WINDOW    => 0

I toggled the return values of tcp_paws_check around since
the old encoding added yet-another negation making tracking
of truth-values really complicated.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:52 -07:00
Eric W. Biederman
17edde5209 netns: Remove net_alive
It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.

Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace then that provided
by net_alive in netif_receive_skb.  So remove net_alive allowing
packet reception run a little faster.

Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:14:27 -08:00
Vlad Yasevich
7e99013a50 sctp: Fix broken RTO-doubling for data retransmits
Commit faee47cdbf
(sctp: Fix the RTO-doubling on idle-link heartbeats)
broke the RTO doubling for data retransmits.  If the
heartbeat was sent before the data T3-rtx time, the
the RTO will not double upon the T3-rtx expiration.
Distingish between the operations by passing an argument
to the function.

Additionally, Wei Youngjun pointed out that our treatment
of requested HEARTBEATS and timer HEARTBEATS is the same
wrt resetting congestion window.  That needs to be separated,
since user requested HEARTBEATS should not treat the link
as idle.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:18 -08:00
Hantzis Fotis
ee7537b63a tcp: tcp_init_wl / tcp_update_wl argument cleanup
The above functions from include/net/tcp.h have been defined with an
argument that they never use. The argument is 'u32 ack' which is never
used inside the function body, and thus it can be removed. The rest of
the patch involves the necessary changes to the function callers of the
above two functions.

Signed-off-by: Hantzis Fotis <xantzis@ceid.upatras.gr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:42:02 -08:00
Ilpo Järvinen
cabeccbd17 tcp: kill eff_sacks "cache", the sole user can calculate itself
Also fixes insignificant bug that would cause sending of stale
SACK block (would occur in some corner cases).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:16 -08:00
Ilpo Järvinen
758ce5c8d1 tcp: add helper for AI algorithm
It seems that implementation in yeah was inconsistent to what
other did as it would increase cwnd one ack earlier than the
others do.

Size benefits:

  bictcp_cong_avoid |  -36
  tcp_cong_avoid_ai |  +52
  bictcp_cong_avoid |  -34
  tcp_scalable_cong_avoid |  -36
  tcp_veno_cong_avoid |  -12
  tcp_yeah_cong_avoid |  -38

= -104 bytes total

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:15 -08:00
David S. Miller
8010dc306b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-02-28 22:32:16 -08:00
Luis R. Rodriguez
e38f8a7a8b cfg80211: Add AP beacon regulatory hints
When devices are world roaming they cannot beacon or do active scan
on 5 GHz or on channels 12, 13 and 14 on the 2 GHz band. Although
we have a good regulatory API some cards may _always_ world roam, this
is also true when a system does not have CRDA present. Devices doing world
roaming can still passive scan, if they find a beacon from an AP on
one of the world roaming frequencies we make the assumption we can do
the same and we also remove the passive scan requirement.

This adds support for providing beacon regulatory hints based on scans.
This works for devices that do either hardware or software scanning.
If a channel has not yet been marked as having had a beacon present
on it we queue the beacon hint processing into the workqueue.

All wireless devices will benefit from beacon regulatory hints from
any wireless device on a system including new devices connected to
the system at a later time.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:59 -05:00
Luis R. Rodriguez
fe33eb3908 cfg80211: move all regulatory hints to workqueue
All regulatory hints (core, driver, userspace and 11d) are now processed in
a workqueue.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:57 -05:00
Luis R. Rodriguez
806a9e3967 cfg80211: make regulatory_request use wiphy_idx instead of wiphy
We do this so later on we can move the pending requests onto a
workqueue. By using the wiphy_idx instead of the wiphy we can
later easily check if the wiphy has disappeared or not.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:56 -05:00
Michael Buesch
80e775bf08 mac80211: Add software scan notifiers
This adds optional notifier functions for software scan.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:51 -05:00
Johannes Berg
4aa188e1a8 mac80211/cfg80211: move iwrange handler to cfg80211
The previous patch made cfg80211 generally aware of the signal
type a given hardware will give, so now it can implement
SIOCGIWRANGE itself, removing more wext stuff from mac80211.
Might need to be a little more parametrized once we have
more hardware using cfg80211 and new hardware capabilities.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:42 -05:00
Johannes Berg
77965c970d cfg80211: clean up signal type
It wasn't a good idea to make the signal type a per-BSS option,
although then it is closer to the actual value. Move it to be
a per-wiphy setting, update mac80211 to match.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:42 -05:00
Jouni Malinen
98c8a60a04 nl80211: Provide access to STA TX/RX packet counters
The TX/RX packet counters are needed to fill in RADIUS Accounting
attributes Acct-Output-Packets and Acct-Input-Packets. We already
collect the needed information, but only the TX/RX bytes were
previously exposed through nl80211. Allow applications to fetch the
packet counters, too, to provide more complete support for accounting.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:39 -05:00
Jouni Malinen
70692ad292 nl80211: Optional IEs into scan request
This extends the NL80211_CMD_TRIGGER_SCAN command to allow applications
to specify a set of information element(s) to be added into Probe
Request frames with NL80211_ATTR_IE. This provides support for the
MLME-SCAN.request primitive parameter VendorSpecificInfo and can be
used, e.g., to implement WPS scanning.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:38 -05:00
Sujith
81cb7623ad mac80211: Extend the rate control API with an update callback
The AP can switch dynamically between 20/40 Mhz channel width,
in which case we switch the local operating channel, but the
rate control algorithm is not notified. This patch adds a new callback
to indicate such changes to the RC algorithm.

Currently, HT channel width change is notified, but this callback
can be used to indicate any new requirements that might come up later on.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:51:45 -05:00
Johannes Berg
96f5e66e8a mac80211: fix aggregation for hardware with ampdu queues
Hardware with AMPDU queues currently has broken aggregation.

This patch fixes it by making all A-MPDUs go over the regular AC queues,
but keeping track of the hardware queues in mac80211. As a first rough
version, it actually stops the AC queue for extended periods of time,
which can be removed by adding buffering internal to mac80211, but is
currently not a huge problem because people rarely use multiple TIDs
that are in the same AC (and iwlwifi currently doesn't operate as AP).

This is a short-term fix, my current medium-term plan, which I hope to
execute soon as well, but am not sure can finish before .30, looks like
this:
 1) rework the internal queuing layer in mac80211 that we use for
    fragments if the driver stopped queue in the middle of a fragmented
    frame to be able to queue more frames at once (rather than just a
    single frame with its fragments)
 2) instead of stopping the entire AC queue, queue up the frames in a
    per-station/per-TID queue during aggregation session initiation,
    when the session has come up take all those frames and put them
    onto the queue from 1)
 3) push the ampdu queue layer abstraction this patch introduces in
    mac80211 into the driver, and remove the virtual queue stuff from
    mac80211 again

This plan will probably also affect ath9k in that mac80211 queues the
frames instead of passing them down, even when there are no ampdu queues.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:51:42 -05:00
Dan Williams
f3734ee6df make net/ieee80211.h private to ipw2x00
Only ipw2x00 now uses it.  Reduce confusion.  Profit!

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:51:42 -05:00
Hannes Eder
56bca31ff1 inet fragments: fix sparse warning: context imbalance
Impact: Attribute function with __releases(...)

Fix this sparse warning:
  net/ipv4/inet_fragment.c:276:35: warning: context imbalance in 'inet_frag_find' - unexpected unlock

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:13:35 -08:00
Marcel Holtmann
2950f21acb Bluetooth: Ask upper layers for HCI disconnect reason
Some of the qualification tests demand that in case of failures in L2CAP
the HCI disconnect should indicate a reason why L2CAP fails. This is a
bluntly layer violation since multiple L2CAP connections could be using
the same ACL and thus forcing a disconnect reason is not a good idea.

To comply with the Bluetooth test specification, the disconnect reason
is now stored in the L2CAP connection structure and every time a new
L2CAP channel is added it will set back to its default. So only in the
case where the L2CAP channel with the disconnect reason is really the
last one, it will propagated to the HCI layer.

The HCI layer has been extended with a disconnect indication that allows
it to ask upper layers for a disconnect reason. The upper layer must not
support this callback and in that case it will nicely default to the
existing behavior. If an upper layer like L2CAP can provide a disconnect
reason that one will be used to disconnect the ACL or SCO link.

No modification to the ACL disconnect timeout have been made. So in case
of Linux to Linux connection the initiator will disconnect the ACL link
before the acceptor side can signal the specific disconnect reason. That
is perfectly fine since Linux doesn't make use of this value anyway. The
L2CAP layer has a perfect valid error code for rejecting connection due
to a security violation. It is unclear why the Bluetooth specification
insists on having specific HCI disconnect reason.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:43 +01:00
Marcel Holtmann
f29972de8e Bluetooth: Add CID field to L2CAP socket address structure
In preparation for L2CAP fixed channel support, the CID value of a
L2CAP connection needs to be accessible via the socket interface. The
CID is the connection identifier and exists as source and destination
value. So extend the L2CAP socket address structure with this field and
change getsockname() and getpeername() to fill it in.

The bind() and connect() functions have been modified to handle L2CAP
socket address structures of variable sizes. This makes them future
proof if additional fields need to be added.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:42 +01:00
Marcel Holtmann
e1027a7c69 Bluetooth: Request L2CAP fixed channel list if available
If the extended features mask indicates support for fixed channels,
request the list of available fixed channels. This also enables the
fixed channel features bit so remote implementations can request
information about it. Currently only the signal channel will be
listed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:42 +01:00
Marcel Holtmann
435fef20ac Bluetooth: Don't enforce authentication for L2CAP PSM 1 and 3
The recommendation for the L2CAP PSM 1 (SDP) is to not use any kind
of authentication or encryption. So don't trigger authentication
for incoming and outgoing SDP connections.

For L2CAP PSM 3 (RFCOMM) there is no clear requirement, but with
Bluetooth 2.1 the initiator is required to enable authentication
and encryption first and this gets enforced. So there is no need
to trigger an additional authentication step. The RFCOMM service
security will make sure that a secure enough link key is present.

When the encryption gets enabled after the SDP connection setup,
then switch the security level from SDP to low security.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:41 +01:00
Marcel Holtmann
6a8d3010b3 Bluetooth: Fix double L2CAP connection request
If the remote L2CAP server uses authentication pending stage and
encryption is enabled it can happen that a L2CAP connection request is
sent twice due to a race condition in the connection state machine.

When the remote side indicates any kind of connection pending, then
track this state and skip sending of L2CAP commands for this period.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:41 +01:00
Marcel Holtmann
984947dc64 Bluetooth: Fix race condition with L2CAP information request
When two L2CAP connections are requested quickly after the ACL link has
been established there exists a window for a race condition where a
connection request is sent before the information response has been
received. Any connection request should only be sent after an exchange
of the extended features mask has been finished.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:41 +01:00
Marcel Holtmann
0684e5f9fb Bluetooth: Use general bonding whenever possible
When receiving incoming connection to specific services, always use
general bonding. This ensures that the link key gets stored and can be
used for further authentications.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:40 +01:00
Marcel Holtmann
efc7688b55 Bluetooth: Add SCO fallback for eSCO connection attempts
When attempting to setup eSCO connections it can happen that some link
manager implementations fail to properly negotiate the eSCO parameters
and thus fail the eSCO setup. Normally the link manager is responsible
for the negotiation of the parameters and actually fallback to SCO if
no agreement can be reached. In cases where the link manager is just too
stupid, then at least try to establish a SCO link if eSCO fails.

For the Bluetooth devices with EDR support this includes handling packet
types of EDR basebands. This is particular tricky since for the EDR the
logic of enabling/disabling one specific packet type is turned around.
This fix contains an extra bitmask to disable eSCO EDR packet when
trying to fallback to a SCO connection.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:37 +01:00
Marcel Holtmann
8c84b83076 Bluetooth: Pause RFCOMM TX when encryption drops
A role switch with devices following the Bluetooth pre-2.1 standards
or without Encryption Pause and Resume support is not possible if
encryption is enabled. Most newer headsets require the role switch,
but also require that the connection is encrypted.

For connections with a high security mode setting, the link will be
immediately dropped. When the connection uses medium security mode
setting, then a grace period is introduced where the TX is halted and
the remote device gets a change to re-enable encryption after the
role switch. If not re-enabled the link will be dropped.

Based on initial work by Ville Tervo <ville.tervo@nokia.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:33 +01:00
Marcel Holtmann
9f2c8a03fb Bluetooth: Replace RFCOMM link mode with security level
Change the RFCOMM internals to use the new security levels and remove
the link mode details.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:26 +01:00
Marcel Holtmann
2af6b9d518 Bluetooth: Replace L2CAP link mode with security level
Change the L2CAP internals to use the new security levels and remove
the link mode details.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:26 +01:00
Marcel Holtmann
8c1b235594 Bluetooth: Add enhanced security model for Simple Pairing
The current security model is based around the flags AUTH, ENCRYPT and
SECURE. Starting with support for the Bluetooth 2.1 specification this is
no longer sufficient. The different security levels are now defined as
SDP, LOW, MEDIUM and SECURE.

Previously it was possible to set each security independently, but this
actually doesn't make a lot of sense. For Bluetooth the encryption depends
on a previous successful authentication. Also you can only update your
existing link key if you successfully created at least one before. And of
course the update of link keys without having proper encryption in place
is a security issue.

The new security levels from the Bluetooth 2.1 specification are now
used internally. All old settings are mapped to the new values and this
way it ensures that old applications still work. The only limitation
is that it is no longer possible to set authentication without also
enabling encryption. No application should have done this anyway since
this is actually a security issue. Without encryption the integrity of
the authentication can't be guaranteed.

As default for a new L2CAP or RFCOMM connection, the LOW security level
is used. The only exception here are the service discovery sessions on
PSM 1 where SDP level is used. To have similar security strength as with
a Bluetooth 2.0 and before combination key, the MEDIUM level should be
used. This is according to the Bluetooth specification. The MEDIUM level
will not require any kind of man-in-the-middle (MITM) protection. Only
the HIGH security level will require this.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:25 +01:00
Marcel Holtmann
bb23c0ab82 Bluetooth: Add support for deferring RFCOMM connection setup
In order to decide if listening RFCOMM sockets should be accept()ed
the BD_ADDR of the remote device needs to be known. This patch adds
a socket option which defines a timeout for deferring the actual
connection setup.

The connection setup is done after reading from the socket for the
first time. Until then writing to the socket returns ENOTCONN.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:23 +01:00
Marcel Holtmann
c4f912e155 Bluetooth: Add global deferred socket parameter
The L2CAP and RFCOMM applications require support for authorization
and the ability of rejecting incoming connection requests. The socket
interface is not really able to support this.

This patch does the ground work for a socket option to defer connection
setup. Setting this option allows calling of accept() and then the
first read() will trigger the final connection setup. Calling close()
would reject the connection.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:23 +01:00
David S. Miller
f11c179eea Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/orinoco/orinoco.c
2009-02-25 00:02:05 -08:00
David S. Miller
e70049b9e7 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2009-02-24 03:50:29 -08:00
Eric W. Biederman
ce16c5337a netns: Remove net_alive
It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.

Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace then that provided
by net_alive in netif_receive_skb.  So remove net_alive allowing
packet reception run a little faster.

Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-22 19:54:50 -08:00
Hagen Paul Pfeifer
e478075c6f netfilter: nf_conntrack: table max size should hold at least table size
Table size is defined as unsigned, wheres the table maximum size is
defined as a signed integer. The calculation of max is 8 or 4,
multiplied the table size. Therefore the max value is aligned to
unsigned.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-20 10:47:09 +01:00
Patrick McHardy
5962fc6d5f netfilter: nf_conntrack: don't try to deliver events for untracked connections
The untracked conntrack actually does usually have events marked for
delivery as its not special-cased in that part of the code. Skip the
actual delivery since it impacts performance noticeably.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18 15:30:34 +01:00
David S. Miller
92a0acce18 net: Kill skb_truesize_check(), it only catches false-positives.
A long time ago we had bugs, primarily in TCP, where we would modify
skb->truesize (for TSO queue collapsing) in ways which would corrupt
the socket memory accounting.

skb_truesize_check() was added in order to try and catch this error
more systematically.

However this debugging check has morphed into a Frankenstein of sorts
and these days it does nothing other than catch false-positives.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-17 21:24:05 -08:00
Vlad Yasevich
914e1c8b69 sctp: Inherit all socket options from parent correctly.
During peeloff/accept() sctp needs to save the parent socket state
into the new socket so that any options set on the parent are
inherited by the child socket.  This was found when the
parent/listener socket issues SO_BINDTODEVICE, but the
data was misrouted after a route cache flush.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-16 00:03:11 -08:00
Vlad Yasevich
faee47cdbf sctp: Fix the RTO-doubling on idle-link heartbeats
SCTP incorrectly doubles rto ever time a Hearbeat chunk
is generated.   However RFC 4960 states:

   On an idle destination address that is allowed to heartbeat, it is
   recommended that a HEARTBEAT chunk is sent once per RTO of that
   destination address plus the protocol parameter 'HB.interval', with
   jittering of +/- 50% of the RTO value, and exponential backoff of the
   RTO if the previous HEARTBEAT is unanswered.

Essentially, of if the heartbean is unacknowledged, do we double the RTO.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-16 00:03:10 -08:00
Vlad Yasevich
4458f04c02 sctp: Clean up sctp checksumming code
The sctp crc32c checksum is always generated in little endian.
So, we clean up the code to treat it as little endian and remove
all the __force casts.

Suggested by Herbert Xu.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-16 00:03:10 -08:00
Lucas Nussbaum
06e868066e sctp: Allow to disable SCTP checksums via module parameter
This is a new version of my patch, now using a module parameter instead
of a sysctl, so that the option is harder to find. Please note that,
once the module is loaded, it is still possible to change the value of
the parameter in /sys/module/sctp/parameters/, which is useful if you
want to do performance comparisons without rebooting.

Computation of SCTP checksums significantly affects the performance of
SCTP. For example, using two dual-Opteron 246 connected using a Gbe
network, it was not possible to achieve more than ~730 Mbps, compared to
941 Mbps after disabling SCTP checksums.
Unfortunately, SCTP checksum offloading in NICs is not commonly
available (yet).

By default, checksums are still enabled, of course.

Signed-off-by: Lucas Nussbaum <lucas.nussbaum@ens-lyon.fr>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-16 00:03:09 -08:00
Patrick Ohly
51f31cabe3 ip: support for TX timestamps on UDP and RAW sockets
Instructions for time stamping outgoing packets are take from the
socket layer and later copied into the new skb.

Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-15 22:43:38 -08:00
Patrick Ohly
20d4947353 net: socket infrastructure for SO_TIMESTAMPING
The overlap with the old SO_TIMESTAMP[NS] options is handled so
that time stamping in software (net_enable_timestamp()) is
enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE
is set.  It's disabled if all of these are off.

Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-15 22:43:35 -08:00
David S. Miller
5e30589521 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-agn.c
	drivers/net/wireless/iwlwifi/iwl3945-base.c
2009-02-14 23:12:00 -08:00
David S. Miller
ac178ef0ae Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-02-14 23:06:44 -08:00
Harvey Harrison
f3a7c66b5c net: replace __constant_{endian} uses in net headers
Base versions handle constant folding now.  For headers exposed to
userspace, we must only expose the __ prefixed versions.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-14 22:58:35 -08:00
Johannes Berg
79420f09e7 cfg80211: add more flexible BSS lookup
Add a more flexible BSS lookup function so that mac80211 or
other drivers can actually use this for getting the BSS to
connect to.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-13 13:45:56 -05:00
Johannes Berg
d491af19db cfg80211: allow users to request removing a BSS
This patch introduces cfg80211_unlink_bss, a function to
allow a driver to remove a BSS from the internal list and
make it not show up in scan results any more -- this is
to be used when the driver detects that the BSS is no
longer available.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-13 13:45:54 -05:00
Johannes Berg
78c1c7e109 cfg80211: free_priv for BSS info
When cfg80211 users have their own allocated data in the per-BSS
private data, they will need to free this when the BSS struct is
destroyed. Add a free_priv method and fix one place where the BSS
was kfree'd rather than released properly.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-13 13:45:53 -05:00
Johannes Berg
2a51931192 cfg80211/nl80211: scanning (and mac80211 update to use it)
This patch adds basic scan capability to cfg80211/nl80211 and
changes mac80211 to use it. The BSS list that cfg80211 maintains
is made driver-accessible with a private area in each BSS struct,
but mac80211 doesn't yet use it. That's another large project.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-13 13:45:49 -05:00
Alina Friedrichsen
7b08b3b4a9 mac80211: Remove TSF atomic requirement from the documentation
The atomic requirement for the TSF callbacks
is outdated. get_tsf() is only called by
ieee80211_rx_bss_info() which is indirectly
called by the work queue ieee80211_sta_work().
In the same context are called several other
non-atomic functions, too.
And the atomic requirement causes problems
for drivers of USB wifi cards.

Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-13 13:44:40 -05:00
Andrew Morton
9970937273 net: don't use in_atomic() in gfp_any()
The problem is that in_atomic() will return false inside spinlocks if
CONFIG_PREEMPT=n.  This will lead to deadlockable GFP_KERNEL allocations
from spinlocked regions.

Secondly, if CONFIG_PREEMPT=y, this bug solves itself because networking
will instead use GFP_ATOMIC from this callsite.  Hence we won't get the
might_sleep() debugging warnings which would have informed us of the buggy
callsites.

Solve both these problems by switching to in_interrupt().  Now, if someone
runs a gfp_any() allocation from inside spinlock we will get the warning
if CONFIG_PREEMPT=y.

I reviewed all callsites and most of them were too complex for my little
brain and none of them documented their interface requirements.  I have no
idea what this patch will do.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-12 16:43:17 -08:00
Johannes Berg
7fee5372d8 mac80211: remove HW_SIGNAL_DB
Giving the signal in dB isn't much more useful to userspace
than giving the signal in unspecified units. This removes
some radiotap information for zd1211 (the only driver using
this flag), but it helps a lot for getting cfg80211-based
scanning which won't support dB, and zd1211 being dB is a
little fishy anyway.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-09 15:03:44 -05:00
Herbert Xu
4cc7f68d65 net: Reexport sock_alloc_send_pskb
The function sock_alloc_send_pskb is completely useless if not
exported since most of the code in it won't be used as is.  In
fact, this code has already been duplicated in the tun driver.

Now that we need accounting in the tun driver, we can in fact
use this function as is.  So this patch marks it for export again.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:55:54 -08:00
David S. Miller
1725d409ca Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-02-03 12:41:58 -08:00
Eric Dumazet
24dd1fa184 net: move bsockets outside of read only beginning of struct inet_hashinfo
And switch bsockets to atomic_t since it might be changed in parallel.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-01 12:31:33 -08:00
Jarek Poplawski
b00355db3f pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler.
Patrick McHardy <kaber@trash.net> suggested:
> How about making this flag and the warning message (in a out-of-line
> function) globally available? Other qdiscs (f.i. HFSC) can't deal with
> inner non-work-conserving qdiscs as well.

This patch uses qdisc->flags field of "suspected" child qdisc.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-01 01:12:42 -08:00
David S. Miller
05bee47377 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/e1000/e1000_main.c
2009-01-30 14:31:07 -08:00
Alina Friedrichsen
3b5d665b51 mac80211: Generic TSF debugging
This patch enables low-level driver independent debugging of the TSF and remove the driver specific things of ath5k and ath9k from the debugfs.

Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:01:46 -05:00
Johannes Berg
c771c9d8da mac80211: add interface list lock
Using only the RTNL has a number of problems, most notably that
ieee80211_iterate_active_interfaces() and other interface list
traversals cannot be done from the internal workqueue because it
needs to be flushed under the RTNL.

This patch introduces a new mutex that protects the interface list
against modifications. A more detailed explanation is part of the
code change.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:01:45 -05:00
Luis R. Rodriguez
9a95371aa2 mac80211: allow mac80211 drivers to get to struct ieee80211_hw from wiphy
If a driver is given a wiphy and it wants to get to its private
mac80211 driver area it can use wiphy_to_ieee80211_hw() to get first
to its ieee80211_hw and then access the private structure via hw->priv. The
wiphy_priv() is already being used internally by mac80211 and drivers
should not use this. This can be helpful in a drivers reg_notifier().

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:01:19 -05:00
Luis R. Rodriguez
f976376de0 cfg80211: Allow for strict regulatory settings
This allows drivers to request strict regulatory settings to
be applied to its devices. This is desirable for devices where
proper calibration and compliance can only be gauranteed for
for the device's programmed regulatory domain. Regulatory
domain settings will be ignored until the device's own
regulatory domain is properly configured. If no regulatory
domain is received only the world regulatory domain will be
applied -- if OLD_REG (default to "US") is not enabled. If
OLD_REG behaviour is not acceptable to drivers they must
update their wiphy with a custom reuglatory prior to wiphy
registration.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:01:18 -05:00
Luis R. Rodriguez
716f9392e2 cfg80211: pass more detailed regulatory request information on reg_notifier()
Drivers may need more information than just who set the last regulatory domain,
as such lets just pass the last regulatory_request receipt. To do this we need
to move out to headers struct regulatory_request, and enum environment_cap. While
at it lets add documentation for enum environment_cap.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:01:17 -05:00
Luis R. Rodriguez
2a44f911d8 cfg80211: rename fw_handles_regulatory to custom_regulatory
Drivers without firmware can also have custom regulatory maps
which do not map to a specific ISO / IEC alpha2 country code.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:01:16 -05:00
Luis R. Rodriguez
34f573473a cfg80211: export freq_reg_info()
This can be used by drivers on the reg_notifier()

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:01:14 -05:00
Luis R. Rodriguez
1fa25e4136 cfg80211: add wiphy_apply_custom_regulatory()
This adds wiphy_apply_custom_regulatory() to be used by drivers
prior to wiphy registration to apply a custom regulatory domain.
This can be used by drivers that do not have a direct 1-1 mapping
between a regulatory domain and a country.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:01:14 -05:00
Johannes Berg
078e1e60dd mac80211: Add capability to enable/disable beaconing
This patch adds a flag to notify drivers to start and stop
beaconing when needed, for example, during a scan run. Based
on Sujith's first patch to do the same, but now disables
beaconing for all virtual interfaces while scanning, has a
separate change flag and tracks user-space requests.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:01:13 -05:00
Sujith
2134e7e724 mac80211: Add documentation bits for mac80211_rate_control_flags
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:01:10 -05:00
Johannes Berg
881d948c23 wireless: restrict to 32 legacy rates
Since the standards only define 12 legacy rates, 32 is certainly
a sane upper limit and we don't need to use u64 everywhere. Add
sanity checking that no more than 32 rates are registered and
change the variables to u32 throughout.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:01:09 -05:00
Johannes Berg
5f936f1161 mac80211: constify ieee80211_if_conf.bssid
Then one place can be a static const.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:01:07 -05:00
Johannes Berg
0378b3f1c4 cfg80211: add PM hooks
This should help implement suspend/resume in mac80211, these
hooks will be run before the device is suspended and after it
resumes. Therefore, they can touch the hardware as much as
they want to.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:00:51 -05:00
Jouni Malinen
9aed3cc124 nl80211: New command for adding extra IE(s) into management frames
A new nl80211 command, NL80211_CMD_SET_MGMT_EXTRA_IE, can be used to
add arbitrary IE data into the end of management frames. The interface
allows extra IEs to be configured for each management frame subtype, but
only some of them (ProbeReq, ProbeResp, Auth, (Re)AssocReq, Deauth,
Disassoc) are currently accepted in mac80211 implementation.

This makes it easier to implement IEEE 802.11 extensions like WPS and
FT that add IE(s) into some management frames. In addition, this can
be useful for testing and experimentation purposes.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:00:35 -05:00
Bob Copeland
6dd1bf3118 mac80211: document return codes from ops callbacks
For any callbacks in ieee80211_ops, specify what values the return
codes represent.  While at it, fix a couple of capitalization and
punctuation differences.

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Reviewed-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:00:17 -05:00
Jouni Malinen
4375d08350 mac80211: 802.11w - Add driver capability flag for MFP
This allows user space to determine whether a driver supports MFP and
behave properly without having to ask user to configure this in
MFP-optional mode.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:00:08 -05:00
Jouni Malinen
1f7d77ab69 mac80211: 802.11w - Optional software CCMP for management frames
If driver/firmware/hardware does not support CCMP for management
frames, it can now request mac80211 to take care of encrypting and
decrypting management frames (when MFP is enabled) in software. The
will need to add this new IEEE80211_KEY_FLAG_SW_MGMT flag when a CCMP
key is being configured for TX side and return the undecrypted frames
on RX side without RX_FLAG_DECRYPTED flag to use software CCMP for
management frames (but hardware for data frames).

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:00:08 -05:00
Jouni Malinen
3cfcf6ac6d mac80211: 802.11w - Use BIP (AES-128-CMAC)
Add mechanism for managing BIP keys (IGTK) and integrate BIP into the
TX/RX paths.

Signed-off-by: Jouni Malinen <j@w1.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:00:03 -05:00
Jouni Malinen
5394af4d86 mac80211: 802.11w - STA flag for MFP
Add flags for setting STA entries and struct ieee80211_if_sta to
indicate whether management frame protection (MFP) is used.

Signed-off-by: Jouni Malinen <j@w1.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 16:00:00 -05:00
Luis R. Rodriguez
3e0c3ff36c cfg80211: allow multiple driver regulatory_hints()
We add support for multiple drivers to provide a regulatory_hint()
on a system by adding a wiphy specific regulatory domain cache.
This allows drivers to keep around cache their own regulatory domain
structure queried from CRDA.

We handle conflicts by intersecting multiple regulatory domains,
each driver will stick to its own regulatory domain though unless
a country IE has been received and processed.

If the user already requested a regulatory domain and a driver
requests the same regulatory domain then simply copy to the
driver's regd the same regulatory domain and do not call
CRDA, do not collect $200.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 15:59:59 -05:00
Johannes Berg
4be8c3873e mac80211: extend/document powersave API
This modifies hardware flags for powersave to support three different
flags:
 * IEEE80211_HW_SUPPORTS_PS - indicates general PS support
 * IEEE80211_HW_PS_NULLFUNC_STACK - indicates nullfunc sending in software
 * IEEE80211_HW_SUPPORTS_DYNAMIC_PS - indicates dynamic PS on the device

It also adds documentation for all this which explains how to set the
various flags.

Additionally, it fixes a few things:
 * a spot where && was used to test flags
 * enable CONF_PS only when associated again

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 15:59:58 -05:00
Johannes Berg
46f2c4bd7e mac80211: move dynamic PS timeout to hardware config
This will be needed for drivers that set the
IEEE80211_HW_NO_STACK_DYNAMIC_PS flag and still
want to handle dynamic PS.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 15:59:56 -05:00
Johannes Berg
4797938c5d mac80211: clean up channel type config
The channel_type really doesn't need to be the only member in
a new structure, so remove the struct. Additionally, remove
the _CONF_CHANGE_HT flag and use _CONF_CHANGE_CHANNEL when the
channel type changes, since that's enough of a change to require
reprogramming the hardware anyway.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 15:59:55 -05:00
Johannes Berg
2bf30fabad mac80211: remove user_power_level from driver API
I missed this during review of "mac80211: Fix tx power setting",
the user_power_level shouldn't be available to the driver but
rather be an internal value used to calculate the value for the
driver.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 15:59:53 -05:00
Johannes Berg
dc822b5db4 mac80211: clean up set_key callback
The set_key callback now seems rather odd, passing a MAC address
instead of a station struct, and a local address instead of a
vif struct. Change that.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Bob Copeland <me@bobcopeland.com> [ath5k]
Acked-by: Ivo van Doorn <ivdoorn@gmail.com> [rt2x00]
Acked-by: Christian Lamparter <chunkeey@web.de> [p54]
Tested-by: Kalle Valo <kalle.valo@nokia.com> [iwl3945]
Tested-by: Samuel Ortiz <samuel@sortiz.org> [iwl3945]
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 15:59:42 -05:00
Vasanthakumar Thiagarajan
e3c92df08c mac80211: Fix tx power setting
power_level in ieee80211_conf is being used for more than one
purpose. It being used as user configured power limit and the
final power limit given to the driver. By doing so, except very
first time, the tx power limit is taken from min(chan->max_power,
local->hw.conf.power_level) which is not what we want. This patch
defines a new memeber in ieee80211_conf which is meant only for
user configured power limit.

Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 15:59:36 -05:00
Luis R. Rodriguez
285256a59d mac80211: no need for ht.enabled
We can simply use conf_is_ht() check where needed.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 15:59:32 -05:00
Luis R. Rodriguez
10c806b32d mac80211: add HT conf helpers
In HT capable drivers you often need to check if you
are currently using HT20 or HT40. This adds a few small
helpers to let drivers figure that out.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-29 15:59:27 -05:00
Eric Dumazet
94cd3e6cbe net: wrong test in inet_ehash_locks_alloc()
In commit 9db66bdcc8 (net: convert
TCP/DCCP ehash rwlocks to spinlocks), I forgot to change one
occurrence of rwlock_t to spinlock_t

I believe sizeof(raw_spinlock_t) might be > 0 on !CONFIG_SMP if
CONFIG_DEBUG_SPINLOCK while sizeof(raw_rwlock_t) should be 0 in this
case.

Fortunatly, CONFIG_DEBUG_SPINLOCK adds fields to both spinlock_t and
rwlock_t, but at this might change in the future (being able to debug
spinlocks but not rwlocks for example), better to be safe.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-27 17:45:10 -08:00
remi.denis-courmont@nokia
9a3b7a42bb Phonet: use per-namespace devices list
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-26 21:03:35 -08:00
remi.denis-courmont@nokia
660f706d93 Phonet: handle rtnetlink registration failure
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-26 21:03:34 -08:00
remi.denis-courmont@nokia
76e02cf694 Phonet: allow phonet_device_init() to fail, put it to __init section
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-26 21:03:34 -08:00
David S. Miller
3eacdf58c2 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-01-26 17:43:16 -08:00
Vlad Yasevich
9c5ff5f75d sctp: Fix crc32c calculations on big-endian arhes.
crc32c algorithm provides a byteswaped result.  On little-endian
arches, the result ends up in big-endian/network byte order.
On big-endinan arches, the result ends up in little-endian
order and needs to be byte swapped again.  Thus calling cpu_to_le32
gives the right output.

Tested-by: Jukka Taimisto <jukka.taimisto@mail.suomi.net>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-22 14:52:23 -08:00
Benjamin Thery
6c5143dbcf netns: ipmr: declare reg_vif_num per-namespace
Preliminary work to make IPv4 multicast routing netns-aware.

Declare variable 'reg_vif_num' per-namespace, move into struct netns_ipv4.

At the moment, this variable is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-22 13:57:40 -08:00
Benjamin Thery
6f9374a934 netns: ipmr: declare mroute_do_assert and mroute_do_pim per-namespace
Preliminary work to make IPv4 multicast routing netns-aware.

Declare IPv multicast routing variables 'mroute_do_assert' and
'mroute_do_pim' per-namespace in struct netns_ipv4.

At the moment, these variables are only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-22 13:57:40 -08:00
Benjamin Thery
1e8fb3b6a4 netns: ipmr: declare counter cache_resolve_queue_len per-namespace
Preliminary work to make IPv4 multicast routing netns-aware.

Declare variable cache_resolve_queue_len per-namespace: move it into
struct netns_ipv4.

This variable counts the number of unresolved cache entries queued in the
list mfc_unres_queue. This list is kept global to all netns as the number
of entries per namespace is limited to 10 (hardcoded in routine
ipmr_cache_unresolved).
Entries belonging to different namespaces in mfc_unres_queue will be
identified by matching the mfc_net member introduced previously in
struct mfc_cache.

Keeping this list global to all netns, also allows us to keep a single
timer (ipmr_expire_timer) to handle their expiration.
In some places cache_resolve_queue_len value was tested for arming
or deleting the timer. These tests were equivalent to testing
mfc_unres_queue value instead and are replaced in this patch.

At the moment, cache_resolve_queue_len is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-22 13:57:39 -08:00
Benjamin Thery
2bb8b26c3e netns: ipmr: dynamically allocate mfc_cache_array
Preliminary work to make IPv4 multicast routing netns-aware.

Dynamically allocate IPv4 multicast forwarding cache, mfc_cache_array,
and move it to struct netns_ipv4.

At the moment, mfc_cache_array is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-22 13:57:38 -08:00
Benjamin Thery
cf958ae377 netns: ipmr: dynamically allocate vif_table
Preliminary work to make IPv6 multicast routing netns-aware.

Dynamically allocate interface table vif_table and move it to
struct netns_ipv4, and update MIF_EXISTS() macro.

At the moment, vif_table is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-22 13:57:34 -08:00
Benjamin Thery
70a269e6c9 netns: ipmr: allocate mroute_socket per-namespace.
Preliminary work to make IPv4 multicast routing netns-aware.

Make IPv4 multicast routing mroute_socket per-namespace,
moves it into struct netns_ipv4.

At the moment, mroute_socket is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-22 13:57:34 -08:00
Evgeniy Polyakov
a9d8f9110d inet: Allowing more than 64k connections and heavily optimize bind(0) time.
With simple extension to the binding mechanism, which allows to bind more
than 64k sockets (or smaller amount, depending on sysctl parameters),
we have to traverse the whole bind hash table to find out empty bucket.
And while it is not a problem for example for 32k connections, bind()
completion time grows exponentially (since after each successful binding
we have to traverse one bucket more to find empty one) even if we start
each time from random offset inside the hash table.

So, when hash table is full, and we want to add another socket, we have
to traverse the whole table no matter what, so effectivelly this will be
the worst case performance and it will be constant.

Attached picture shows bind() time depending on number of already bound
sockets.

Green area corresponds to the usual binding to zero port process, which
turns on kernel port selection as described above. Red area is the bind
process, when number of reuse-bound sockets is not limited by 64k (or
sysctl parameters). The same exponential growth (hidden by the green
area) before number of ports reaches sysctl limit.

At this time bind hash table has exactly one reuse-enbaled socket in a
bucket, but it is possible that they have different addresses. Actually
kernel selects the first port to try randomly, so at the beginning bind
will take roughly constant time, but with time number of port to check
after random start will increase. And that will have exponential growth,
but because of above random selection, not every next port selection
will necessary take longer time than previous. So we have to consider
the area below in the graph (if you could zoom it, you could find, that
there are many different times placed there), so area can hide another.

Blue area corresponds to the port selection optimization.

This is rather simple design approach: hashtable now maintains (unprecise
and racely updated) number of currently bound sockets, and when number
of such sockets becomes greater than predefined value (I use maximum
port range defined by sysctls), we stop traversing the whole bind hash
table and just stop at first matching bucket after random start. Above
limit roughly corresponds to the case, when bind hash table is full and
we turned on mechanism of allowing to bind more reuse-enabled sockets,
so it does not change behaviour of other sockets.

Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Tested-by: Denys Fedoryschenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-21 14:34:31 -08:00
Stephen Hemminger
b51414b691 netrom: convert to internal net_device_stats
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-21 14:02:01 -08:00
Stephen Hemminger
1a6afe8a73 clip: convert to internal network_device_stats
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-21 14:01:59 -08:00
Randy Dunlap
b6b50a2162 mac80211: more kernel-doc fixes
Fix (delete) more mac80211 kernel-doc:

Warning(linux-2.6.28-git13//include/net/mac80211.h:375): Excess struct/union/enum/typedef member 'retry_count' description in 'ieee80211_tx_info'
Warning(linux-2.6.28-git13//net/mac80211/sta_info.h:308): Excess struct/union/enum/typedef member 'last_txrate' description in 'sta_info'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-01-16 17:08:23 -05:00
Inaky Perez-Gonzalez
2a4d71d69f wimax: fix typo in kernel-doc for debugfs_dentry in struct wimax_dev
The kernel-doc was referring to member @debufs_dentry instead of
@debugfs_dentry.

Reported by Randy Dunlap http://marc.info/?l=linux-netdev&m=123147942302885&w=2

As well, escape the colon in the field's text description, as it is
causing the generated text to be erraticly broken up (with paragraphs
moved down). Could not find a reason why it is happening so, even when
other field descriptions use colons and work as expected.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-11 00:06:32 -08:00
Linus Torvalds
d9e8a3a5b8 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (22 commits)
  ioat: fix self test for multi-channel case
  dmaengine: bump initcall level to arch_initcall
  dmaengine: advertise all channels on a device to dma_filter_fn
  dmaengine: use idr for registering dma device numbers
  dmaengine: add a release for dma class devices and dependent infrastructure
  ioat: do not perform removal actions at shutdown
  iop-adma: enable module removal
  iop-adma: kill debug BUG_ON
  iop-adma: let devm do its job, don't duplicate free
  dmaengine: kill enum dma_state_client
  dmaengine: remove 'bigref' infrastructure
  dmaengine: kill struct dma_client and supporting infrastructure
  dmaengine: replace dma_async_client_register with dmaengine_get
  atmel-mci: convert to dma_request_channel and down-level dma_slave
  dmatest: convert to dma_request_channel
  dmaengine: introduce dma_request_channel and private channels
  net_dma: convert to dma_find_channel
  dmaengine: provide a common 'issue_pending_all' implementation
  dmaengine: centralize channel allocation, introduce dma_find_channel
  dmaengine: up-level reference counting to the module level
  ...
2009-01-09 11:52:14 -08:00
Inaky Perez-Gonzalez
56cf391a94 wimax: fix kernel-doc for debufs_dentry member of struct wimax_dev
Reported by Randy Dunlap from a warning in the v2.6.29 merge window
tree as of 2009/1/8.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-08 12:56:57 -08:00
David S. Miller
7f46b1343f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 2009-01-08 11:05:59 -08:00
Herbert Xu
787e920836 ipv6: Add GRO support
This patch adds GRO support for IPv6.  IPv6 GRO supports extension
headers in the same way as GSO (by using the same infrastructure).
It's also simpler compared to IPv4 since we no longer have to worry
about fragmentation attributes or header checksums.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-08 10:40:57 -08:00
Inaky Perez-Gonzalez
ace22f0881 wimax: headers for kernel API and user space interaction
Definitions for the user/kernel API protocol through generic
netlink. User space can copy it verbatim and use it.

Kernel API definition declares the main data types and calls for the
drivers to integrate into the WiMAX stack. Provides usage
documentation.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:16 -08:00
James Morris
ac8cc0fa53 Merge branch 'next' into for-linus 2009-01-07 09:58:22 +11:00
Dan Williams
f67b459992 net_dma: convert to dma_find_channel
Use the general-purpose channel allocation provided by dmaengine.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:15 -07:00
Dan Williams
6f49a57aa5 dmaengine: up-level reference counting to the module level
Simply, if a client wants any dmaengine channel then prevent all dmaengine
modules from being removed.  Once the clients are done re-enable module
removal.

Why?, beyond reducing complication:
1/ Tracking reference counts per-transaction in an efficient manner, as
   is currently done, requires a complicated scheme to avoid cache-line
   bouncing effects.
2/ Per-transaction ref-counting gives the false impression that a
   dma-driver can be gracefully removed ahead of its user (net, md, or
   dma-slave)
3/ None of the in-tree dma-drivers talk to hot pluggable hardware, but
   if such an engine were built one day we still would not need to notify
   clients of remove events.  The driver can simply return NULL to a
   ->prep() request, something that is much easier for a client to handle.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:14 -07:00
David S. Miller
14deae4156 ipv6: Fix sporadic sendmsg -EINVAL when sending to multicast groups.
Thanks to excellent diagnosis by Eduard Guzovsky.

The core problem is that on a network with lots of active
multicast traffic, the neighbour cache can fill up.  If
we try to allocate a new route and thus neighbour cache
entry, the bog-standard GC attempt the neighbour layer does
in ineffective because route entries hold a reference
to the existing neighbour entries and GC can only liberate
entries with no references.

IPV4 already has a way to handle this, by doing a route cache
GC in such situations (when neigh attach returns -ENOBUFS).

So simply mimick this on the ipv6 side.

Tested-by: Eduard Guzovsky <eguzovsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:04:39 -08:00
Paul Moore
6c2e8ac095 netlabel: Update kernel configuration API
Update the NetLabel kernel API to expose the new features added in kernel
releases 2.6.25 and 2.6.28: the static/fallback label functionality and network
address based selectors.

Signed-off-by: Paul Moore <paul.moore@hp.com>
2008-12-31 12:54:11 -05:00
Linus Torvalds
0191b625ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
  net: Allow dependancies of FDDI & Tokenring to be modular.
  igb: Fix build warning when DCA is disabled.
  net: Fix warning fallout from recent NAPI interface changes.
  gro: Fix potential use after free
  sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
  sfc: When disabling the NIC, close the device rather than unregistering it
  sfc: SFT9001: Add cable diagnostics
  sfc: Add support for multiple PHY self-tests
  sfc: Merge top-level functions for self-tests
  sfc: Clean up PHY mode management in loopback self-test
  sfc: Fix unreliable link detection in some loopback modes
  sfc: Generate unique names for per-NIC workqueues
  802.3ad: use standard ethhdr instead of ad_header
  802.3ad: generalize out mac address initializer
  802.3ad: initialize ports LACPDU from const initializer
  802.3ad: remove typedef around ad_system
  802.3ad: turn ports is_individual into a bool
  802.3ad: turn ports is_enabled into a bool
  802.3ad: make ntt bool
  ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
  ...

Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
2008-12-28 12:49:40 -08:00
Linus Torvalds
1db2a5c11e Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (85 commits)
  [S390] provide documentation for hvc_iucv kernel parameter.
  [S390] convert ctcm printks to dev_xxx and pr_xxx macros.
  [S390] convert zfcp printks to pr_xxx macros.
  [S390] convert vmlogrdr printks to pr_xxx macros.
  [S390] convert zfcp dumper printks to pr_xxx macros.
  [S390] convert cpu related printks to pr_xxx macros.
  [S390] convert qeth printks to dev_xxx and pr_xxx macros.
  [S390] convert sclp printks to pr_xxx macros.
  [S390] convert iucv printks to dev_xxx and pr_xxx macros.
  [S390] convert ap_bus printks to pr_xxx macros.
  [S390] convert dcssblk and extmem printks messages to pr_xxx macros.
  [S390] convert monwriter printks to pr_xxx macros.
  [S390] convert s390 debug feature printks to pr_xxx macros.
  [S390] convert monreader printks to pr_xxx macros.
  [S390] convert appldata printks to pr_xxx macros.
  [S390] convert setup printks to pr_xxx macros.
  [S390] convert hypfs printks to pr_xxx macros.
  [S390] convert time printks to pr_xxx macros.
  [S390] convert cpacf printks to pr_xxx macros.
  [S390] convert cio printks to pr_xxx macros.
  ...
2008-12-28 12:33:21 -08:00
Vegard Nossum
619e803d3c netlink: fix (theoretical) overrun in message iteration
See commit 1045b03e07 ("netlink: fix
overrun in attribute iteration") for a detailed explanation of why
this patch is necessary.

In short, nlmsg_next() can make "remaining" go negative, and the
remaining >= sizeof(...) comparison will promote "remaining" to an
unsigned type, which means that the expression will evaluate to
true for negative numbers, even though it was not intended.

I put "theoretical" in the title because I have no evidence that
this can actually happen, but I suspect that a crafted netlink
packet can trigger some badness.

Note that the last test, which seemingly has the exact same
problem (also true for nla_ok()), is perfectly OK, since we
already know that remaining is positive.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25 17:21:17 -08:00
Wei Yongjun
aea3c5c05d sctp: Implement socket option SCTP_GET_ASSOC_NUMBER
Implement socket option SCTP_GET_ASSOC_NUMBER of the latest ietf socket
extensions API draft.

  8.2.5.  Get the Current Number of Associations (SCTP_GET_ASSOC_NUMBER)

   This option gets the current number of associations that are attached
   to a one-to-many style socket.  The option value is an uint32_t.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25 16:57:24 -08:00
Hendrik Brueckner
91d5d45ee0 [S390] iucv: Locking free version of iucv_message_(receive|send)
Provide a locking free version of iucv_message_receive and iucv_message_send
that do not call local_bh_enable in a spin_lock_(bh|irqsave)() context.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
2008-12-25 13:39:04 +01:00
James Morris
cbacc2c7f0 Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
David S. Miller
6332178d91 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ppp_generic.c
2008-12-23 17:56:23 -08:00
Don Skidmore
1486a61ebc net: fix DCB setstate to return success/failure
Data Center Bridging (DCB) had no way to know if setstate had failed in the
driver.  This patch enables dcb netlink code to handle the status for the DCB
setstate interface.  Likewise it allows the driver to return a failed status
if MSI-X isn't enabled.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Eric W Multanen <eric.w.multanen@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-21 20:09:50 -08:00
Kalle Valo
520eb82076 mac80211: implement dynamic power save
This patch implements dynamic power save for mac80211. Basically it
means enabling power save mode after an idle period. Implementing it
dynamically gives a good compromise of low power consumption and low
latency. Some hardware have support for this in firmware, but some
require the host to do it.

The dynamic power save is implemented by adding an timeout to
ieee80211_subif_start_xmit(). The timeout can be enabled from userspace
with Wireless Extensions. For example, the command below enables the
dynamic power save and sets the time timeout to 500 ms:

iwconfig wlan0 power timeout 500m

Power save now only works with devices which handle power save in firmware.
It's also disabled by default and the heuristics when and how to enable is
considered as a policy decision and will be left for the userspace to handle.
In case the firmware has support for this, drivers can disable this feature
with IEEE80211_HW_NO_STACK_DYNAMIC_PS.

Big thanks to Johannes Berg for the help with the design and code.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:24:00 -05:00
Jouni Malinen
0fb8ca45eb mac80211: Add HT rates into RX status reporting
This patch adds option for HT-enabled drivers to report HT rates
(HT20/HT40, short GI, MCS index) to mac80211. These rates are
currently not in the rate table, so the rate_idx is used to indicate
MCS index.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:23:04 -05:00
Sujith
094d05dc32 mac80211: Fix HT channel selection
HT management is done differently for AP and STA modes, unify
to just the ->config() callback since HT is fundamentally a
PHY property and cannot be per-BSS.

Rename enum nl80211_sec_chan_offset as nl80211_channel_type to denote
the channel type ( NO_HT, HT20, HT40+, HT40- ).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:22:54 -05:00
Henning Rogge
420e7fabd9 nl80211: Add signal strength and bandwith to nl80211station info
This patch adds signal strength and transmission bitrate
to the station_info of nl80211.

Signed-off-by: Henning Rogge <rogge@fgan.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:04:54 -05:00
Rémi Denis-Courmont
be677730a0 Phonet: use atomic for packet TX window
GPRS TX flow control won't need to lock the underlying socket anymore.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-17 15:48:31 -08:00
Samuel Ortiz
69c30e1e74 irda: Add irda_skb_cb qdisc related padding
We need to pad irda_skb_cb in order to keep it safe accross dev_queue_xmit()
calls. This is some ugly and temporary hack triggered by recent qisc code
changes.
Even though it fixes bugzilla.kernel.org bug #11795, it will be replaced by a
proper fix before 2.6.29 is released.

Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-17 15:44:58 -08:00
Herbert Xu
bf296b125b tcp: Add GRO support
This patch adds the TCP-specific portion of GRO.  The criterion for
merging is extremely strict (the TCP header must match exactly apart
from the checksum) so as to allow refragmentation.  Otherwise this
is pretty much identical to LRO, except that we support the merging
of ECN packets.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 23:43:36 -08:00
Herbert Xu
73cc19f155 ipv4: Add GRO infrastructure
This patch adds GRO support for IPv4.

The criteria for merging is more stringent than LRO, in particular,
we require all fields in the IP header to be identical except for
the length, ID and checksum.  In addition, the ID must form an
arithmetic sequence with a difference of one.

The ID requirement might seem overly strict, however, most hardware
TSO solutions already obey this rule.  Linux itself also obeys this
whether GSO is in use or not.

In future we could relax this rule by storing the IDs (or rather
making sure that we don't drop them when pulling the aggregate
skb's tail).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 23:41:09 -08:00
Christian Lamparter
89fad578a6 mac80211: integrate sta_notify_ps cmds into sta_notify
This patch replaces the newly introduced sta_notify_ps function,
which can be used to notify the driver about every power state
transition for all associated stations, by integrating its functionality
back into the original sta_notify callback.

Signed-off-by: Christian Lamparter <chunkeey@web.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-12 14:01:42 -05:00
Johannes Berg
f546638c3f mac80211: remove fragmentation offload functionality
There's no driver that actually does fragmentation on the
device, and the callback is buggy (when it returns an error,
mac80211's fragmentation status is changed so reading the
frag threshold from userspace reads the new value despite
the error). Let's just remove it, if we really find some
hardware supporting it we can add it back later.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-12 14:01:33 -05:00
John W. Linville
0f202aa2e1 ieee80211_security: correct warning about width of auth_mode
Also remove auth_algo which is unused.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-12 13:48:30 -05:00
Johannes Berg
7ba1c04ed7 mac80211: improve sta_notify documentation
Mention more possible STA entries and document the atomic requirement.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-12 13:48:25 -05:00
Benjamin Thery
950d5704e5 netns: ip6mr: declare reg_vif_num per-namespace
Preliminary work to make IPv6 multicast forwarding netns-aware.

Declare variable 'reg_vif_num' per-namespace, moves into struct netns_ipv6.

At the moment, this variable is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 16:29:24 -08:00
Benjamin Thery
a21f3f997c netns: ip6mr: declare mroute_do_assert and mroute_do_pim per-namespace
Preliminary work to make IPv6 multicast forwarding netns-aware.

Declare IPv6 multicast forwarding variables 'mroute_do_assert' and
'mroute_do_pim' per-namespace in struct netns_ipv6.

At the moment, these variables are only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 16:28:44 -08:00
Benjamin Thery
4045e57c19 netns: ip6mr: declare counter cache_resolve_queue_len per-namespace
Preliminary work to make IPv6 multicast forwarding netns-aware.

Declare variable cache_resolve_queue_len per-namespace: moves it into
struct netns_ipv6.

This variable counts the number of unresolved cache entries queued in the
list mfc_unres_queue. This list is kept global to all netns as the number
of entries per namespace is limited to 10 (hardcoded in routine 
ip6mr_cache_unresolved).
Entries belonging to different namespaces in mfc_unres_queue will be
identified by matching the mfc_net member introduced previously in 
struct mfc6_cache.

Keeping this list global to all netns, also allows us to keep a single
timer (ipmr_expire_timer) to handle their expiration.
In some places cache_resolve_queue_len value was tested for arming 
or deleting the timer. These tests were equivalent to testing 
mfc_unres_queue value instead and are replaced in this patch.

At the moment, cache_resolve_queue_len is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 16:27:21 -08:00
Benjamin Thery
4a6258a0e3 netns: ip6mr: dynamically allocate mfc6_cache_array
Preliminary work to make IPv6 multicast forwarding netns-aware.

Dynamically allocates IPv6 multicast forwarding cache, mfc6_cache_array,
and moves it to struct netns_ipv6. 

At the moment, mfc6_cache_array is only referenced in init_net.

Replace 'ARRAY_SIZE(mfc6_cache_array)' with mfc6_cache_array size: MFC6_LINES.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 16:24:07 -08:00
Benjamin Thery
4e16880cb4 netns: ip6mr: dynamically allocates vif6_table
Preliminary work to make IPv6 multicast forwarding netns-aware.

Dynamically allocates interface table vif6_table and moves it to 
struct netns_ipv6, and updates MIF_EXISTS() macro. 

At the moment, vif6_table is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 16:15:08 -08:00
Benjamin Thery
bd91b8bf37 netns: ip6mr: allocate mroute6_socket per-namespace.
Preliminary work to make IPv6 multicast forwarding netns-aware.

Make IPv6 multicast forwarding mroute6_socket per-namespace,
moves it into struct netns_ipv6.

At the moment, mroute6_socket is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 16:07:08 -08:00
Kalle Valo
8bef7a1001 mac80211: document ieee80211_tx_info.pad
Fixes htmldocs warning:

Warning(mac80211.h:379): No description found for parameter 'pad[2]'

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-05 09:35:45 -05:00
Christian Lamparter
4571d3bf87 mac80211: add sta_notify_ps callback
This patch is necessary in order to provide a proper Access point support for p54.
Unfortunately for us, there is no documented way to disable the interfering
power save buffering mechanism in firmware completely.

Therefore we give in and notify the driver through our new sta_notify_ps callback,
so that we can update the filter state.

Signed-off-by: Christian Lamparter <chunkeey@web.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-05 09:35:43 -05:00
Johannes Berg
007e5ddddf wireless: clean up radiotap a bit
No need to pad the header so no constant needed for that,
no need to carry any version number from netbsd nor CVS
IDs from them.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-05 09:32:59 -05:00
Johannes Berg
e60c7744f8 cfg80211: handle SIOCGIWMODE/SIOCSIWMODE
further reducing wext code in mac80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-05 09:32:58 -05:00
Johannes Berg
fee52678db cfg80211: handle SIOCGIWNAME
This patch moves the SIOCGIWNAME handling from mac80211 to cfg80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-05 09:32:13 -05:00
Jouni Malinen
72bdcf3438 nl80211: Add frequency configuration (including HT40)
This patch adds new NL80211_CMD_SET_WIPHY attributes
NL80211_ATTR_WIPHY_FREQ and NL80211_ATTR_WIPHY_SEC_CHAN_OFFSET to allow
userspace to set the operating channel (e.g., hostapd for AP mode).

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-05 09:32:11 -05:00
James Morris
ec98ce480a Merge branch 'master' into next
Conflicts:
	fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris <jmorris@namei.org>
2008-12-04 17:16:36 +11:00
Rémi Denis-Courmont
5240488198 Phonet: basic net namespace support
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 15:42:56 -08:00
David S. Miller
3f8c6c9c77 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6 2008-12-02 22:38:02 -08:00
David S. Miller
aa2ba5f108 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ixgbe/ixgbe_main.c
	drivers/net/smc91x.c
2008-12-02 19:50:27 -08:00
Marcel Holtmann
a418b893a6 Bluetooth: Enable per-module dynamic debug messages
With the introduction of CONFIG_DYNAMIC_PRINTK_DEBUG it is possible to
allow debugging without having to recompile the kernel. This patch turns
all BT_DBG() calls into pr_debug() to support dynamic debug messages.

As a side effect all CONFIG_BT_*_DEBUG statements are now removed and
some broken debug entries have been fixed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-11-30 12:17:28 +01:00
Marcel Holtmann
7a9d402053 Bluetooth: Send HCI Reset command by default on device initialization
The Bluetooth subsystem was not using the HCI Reset command when doing
device initialization. The Bluetooth 1.0b specification was ambiguous
on how the device firmware was suppose to handle it. Almost every device
was triggering a transport reset at the same time. In case of USB this
ended up in disconnects from the bus.

All modern Bluetooth dongles handle this perfectly fine and a lot of
them actually require that HCI Reset is sent. If not then they are
either stuck in their HID Proxy mode or their internal structures for
inquiry and paging are not correctly setup.

To handle old and new devices smoothly the Bluetooth subsystem contains
a quirk to force the HCI Reset on initialization. However maintaining
such a quirk becomes more and more complicated. This patch turns the
logic around and lets the old devices disable the HCI Reset command.

The only device where the HCI_QUIRK_NO_RESET is still needed are the
original Digianswer devices and dongles with an early CSR firmware.

CSR reported that they fixed this for version 12 firmware. The last
official release of version 11 firmware is build ID 115. The first
version 12 candidate was build ID 117.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-11-30 12:17:26 +01:00
David S. Miller
ed77a89c30 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
Conflicts:

	net/netfilter/nf_conntrack_netlink.c
2008-11-28 02:19:15 -08:00
Harvey Harrison
475ad8e217 decnet: compile fix for removal of byteorder wrapper
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-27 23:04:13 -08:00
Harvey Harrison
c4106aa88a decnet: remove private wrappers of endian helpers
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-27 00:12:47 -08:00
David S. Miller
5b9ab2ec04 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/hp-plus.c
	drivers/net/wireless/ath5k/base.c
	drivers/net/wireless/ath9k/recv.c
	net/wireless/reg.c
2008-11-26 23:48:40 -08:00
dann frazier
5f23b73496 net: Fix soft lockups/OOM issues w/ unix garbage collector
This is an implementation of David Miller's suggested fix in:
  https://bugzilla.redhat.com/show_bug.cgi?id=470201

It has been updated to use wait_event() instead of
wait_event_interruptible().

Paraphrasing the description from the above report, it makes sendmsg()
block while UNIX garbage collection is in progress. This avoids a
situation where child processes continue to queue new FDs over a
AF_UNIX socket to a parent which is in the exit path and running
garbage collection on these FDs. This contention can result in soft
lockups and oom-killing of unrelated processes.

Signed-off-by: dann frazier <dannf@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-26 15:32:27 -08:00
David S. Miller
b5ddedc9cc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2008-11-26 15:28:40 -08:00
Jarek Poplawski
244e6c2d07 pkt_sched: gen_estimator: Optimize gen_estimator_active()
Since all other gen_estimator functions use bstats and rate_est params
together, and searching for them is optimized now, let's use this also
in gen_estimator_active(). The return type of gen_estimator_active()
is changed to bool, and gen_find_node() parameters to const, btw.

In tcf_act_police_locate() a check for ACT_P_CREATED is added before
calling gen_estimator_active().

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-26 15:24:32 -08:00
Eric Dumazet
dd24c00191 net: Use a percpu_counter for orphan_count
Instead of using one atomic_t per protocol, use a percpu_counter
for "orphan_count", to reduce cache line contention on
heavy duty network servers. 

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 21:17:14 -08:00
Eric Dumazet
1748376b66 net: Use a percpu_counter for sockets_allocated
Instead of using one atomic_t per protocol, use a percpu_counter
for "sockets_allocated", to reduce cache line contention on
heavy duty network servers. 

Note : We revert commit (248969ae31
net: af_unix can make unix_nr_socks visbile in /proc),
since it is not anymore used after sock_prot_inuse_add() addition

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 21:16:35 -08:00
Stephen Hemminger
c1b56878fb tc: policing requires a rate estimator
Found that while trying average rate policing, it was possible to
request average rate policing without a rate estimator. This results
in no policing which is harmless but incorrect.

Since policing could be setup in two steps, need to check
in the kernel.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 21:14:06 -08:00
Alexey Dobriyan
b27aeadb59 netns xfrm: per-netns sysctls
Make
	net.core.xfrm_aevent_etime
	net.core.xfrm_acq_expires
	net.core.xfrm_aevent_rseqth
	net.core.xfrm_larval_drop

sysctls per-netns.

For that make net_core_path[] global, register it to prevent two
/proc/net/core antries and change initcall position -- xfrm_init() is called
from fs_initcall, so this one should be fs_initcall at least.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 18:00:48 -08:00
Alexey Dobriyan
c68cd1a01b netns xfrm: /proc/net/xfrm_stat in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 18:00:14 -08:00
Alexey Dobriyan
59c9940ed0 netns xfrm: per-netns MIBs
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:59:52 -08:00
Alexey Dobriyan
fbda33b2b8 netns xfrm: ->get_saddr in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:56:49 -08:00
Alexey Dobriyan
c5b3cf46ea netns xfrm: ->dst_lookup in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:51:25 -08:00
Alexey Dobriyan
db983c1144 netns xfrm: KM reporting in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:51:01 -08:00
Alexey Dobriyan
7067802e26 netns xfrm: pass netns with KM notifications
SA and SPD flush are executed with NULL SA and SPD respectively, for
these cases pass netns explicitly from userspace socket.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:50:36 -08:00
Alexey Dobriyan
a6483b790f netns xfrm: per-netns NETLINK_XFRM socket
Stub senders to init_net's one temporarily.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:38:20 -08:00
Alexey Dobriyan
ddcfd79680 netns xfrm: dst garbage-collecting in netns
Pass netns pointer to struct xfrm_policy_afinfo::garbage_collect()

	[This needs more thoughts on what to do with dst_ops]
	[Currently stub to init_net]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:37:23 -08:00
Alexey Dobriyan
99a66657b2 netns xfrm: xfrm_route_forward() in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:36:13 -08:00
Alexey Dobriyan
f6e1e25d70 netns xfrm: xfrm_policy_check in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:35:44 -08:00
Alexey Dobriyan
52479b623d netns xfrm: lookup in netns
Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
to flow_cache_lookup() and resolver callback.

Take it from socket or netdevice. Stub DECnet to init_net.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:35:18 -08:00
Alexey Dobriyan
cdcbca7c1f netns xfrm: policy walking in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:34:49 -08:00
Alexey Dobriyan
8d1211a6aa netns xfrm: finding policy in netns
Add netns parameter to xfrm_policy_bysel_ctx(), xfrm_policy_byidx().

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:34:20 -08:00
Alexey Dobriyan
33ffbbd52c netns xfrm: policy flushing in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:33:32 -08:00
Alexey Dobriyan
284fa7da30 netns xfrm: state walking in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:32:14 -08:00
Alexey Dobriyan
5447c5e401 netns xfrm: finding states in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:31:51 -08:00
Alexey Dobriyan
221df1ed33 netns xfrm: state lookup in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:30:50 -08:00
Alexey Dobriyan
0e6024519b netns xfrm: state flush in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:30:18 -08:00
Alexey Dobriyan
66caf628c3 netns xfrm: per-netns policy hash resizing work
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:28:57 -08:00
Alexey Dobriyan
dc2caba7b3 netns xfrm: per-netns policy counts
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:24:15 -08:00
Alexey Dobriyan
a35f6c5de3 netns xfrm: per-netns xfrm_policy_bydst hash
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:23:48 -08:00
Alexey Dobriyan
8b18f8eaf9 netns xfrm: per-netns inexact policies
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:23:26 -08:00
Alexey Dobriyan
8100bea7d6 netns xfrm: per-netns xfrm_policy_byidx hashmask
Per-netns hashes are independently resizeable.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:22:58 -08:00
Alexey Dobriyan
93b851c1c9 netns xfrm: per-netns xfrm_policy_byidx hash
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:22:35 -08:00
Alexey Dobriyan
adfcf0b27e netns xfrm: per-netns policy list
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:22:11 -08:00
Alexey Dobriyan
0331b1f383 netns xfrm: add struct xfrm_policy::xp_net
Again, to avoid complications with passing netns when not necessary.
Again, ->xp_net is set-once field, once set it never changes.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:21:45 -08:00
Alexey Dobriyan
50a30657fd netns xfrm: per-netns km_waitq
Disallow spurious wakeups in __xfrm_lookup().

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:21:01 -08:00
Alexey Dobriyan
c78371441c netns xfrm: per-netns state GC work
State GC is per-netns, and this is part of it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:20:36 -08:00
Alexey Dobriyan
b8a0ae20b0 netns xfrm: per-netns state GC list
km_waitq is going to be made per-netns to disallow spurious wakeups
in __xfrm_lookup().

To not wakeup after every garbage-collected xfrm_state (which potentially
can be from different netns) make state GC list per-netns.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:20:11 -08:00
Alexey Dobriyan
6308273385 netns xfrm: per-netns xfrm_hash_work
All of this is implicit passing which netns's hashes should be resized.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:19:07 -08:00
Alexey Dobriyan
0bf7c5b019 netns xfrm: per-netns xfrm_state counts
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:18:39 -08:00
Alexey Dobriyan
529983ecab netns xfrm: per-netns xfrm_state_hmask
Since hashtables are per-netns, they can be independently resized.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:18:12 -08:00
Alexey Dobriyan
b754a4fd8f netns xfrm: per-netns xfrm_state_byspi hash
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:17:47 -08:00
Alexey Dobriyan
d320bbb306 netns xfrm: per-netns xfrm_state_bysrc hash
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:17:24 -08:00
Alexey Dobriyan
73d189dce4 netns xfrm: per-netns xfrm_state_bydst hash
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:16:58 -08:00
Alexey Dobriyan
9d4139c769 netns xfrm: per-netns xfrm_state_all list
This is done to get
a) simple "something leaked" check
b) cover possible DoSes when other netns puts many, many xfrm_states
   onto a list.
c) not miss "alien xfrm_state" check in some of list iterators in future.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:16:11 -08:00
Alexey Dobriyan
673c09be45 netns xfrm: add struct xfrm_state::xs_net
To avoid unnecessary complications with passing netns around.

* set once, very early after allocating
* once set, never changes

For a while create every xfrm_state in init_net.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:15:16 -08:00
Alexey Dobriyan
d62ddc21b6 netns xfrm: add netns boilerplate
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:14:31 -08:00
Ilpo Järvinen
8eecaba900 tcp: tcp_limit_reno_sacked can become static
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 13:45:29 -08:00
Luis R. Rodriguez
14b9815af3 cfg80211: add support for custom firmware regulatory solutions
This adds API to cfg80211 to allow wireless drivers to inform
us if their firmware can handle regulatory considerations *and*
they cannot map these regulatory domains to an ISO / IEC 3166
alpha2. In these cases we skip the first regulatory hint instead
of expecting the driver to build their own regulatory structure,
providing us with an alpha2, or using the reg_notifier().

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-25 16:41:27 -05:00
Luis R. Rodriguez
3f2355cb91 cfg80211/mac80211: Add 802.11d support
This adds country IE parsing to mac80211 and enables its usage
within the new regulatory infrastructure in cfg80211. We parse
the country IEs only on management beacons for the BSSID you are
associated to and disregard the IEs when the country and environment
(indoor, outdoor, any) matches the already processed country IE.

To avoid following misinformed or outdated APs we build and use
a regulatory domain out of the intersection between what the AP
provides us on the country IE and what CRDA is aware is allowed
on the same country.

A secondary device is allowed to follow only the same country IE
as it make no sense for two devices on a system to be in two
different countries.

In the case the AP is using country IEs for an incorrect country
the user may help compliance further by setting the regulatory
domain before or after the IE is parsed and in that case another
intersection will be performed.

CONFIG_WIRELESS_OLD_REGULATORY is supported but requires CRDA
present.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-25 16:41:26 -05:00
Ingo Molnar
65f233fb16 netfilter: fix warning in net/netfilter/nf_conntrack_proto_tcp.c
fix this warning:

  net/netfilter/nf_conntrack_proto_tcp.c: In function \u2018tcp_in_window\u2019:
  net/netfilter/nf_conntrack_proto_tcp.c:491: warning: unused variable \u2018net\u2019
  net/netfilter/nf_conntrack_proto_tcp.c: In function \u2018tcp_packet\u2019:
  net/netfilter/nf_conntrack_proto_tcp.c:812: warning: unused variable \u2018net\u2019

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-11-25 18:20:13 +01:00
Ilpo Järvinen
832d11c5cd tcp: Try to restore large SKBs while SACK processing
During SACK processing, most of the benefits of TSO are eaten by
the SACK blocks that one-by-one fragment SKBs to MSS sized chunks.
Then we're in problems when cleanup work for them has to be done
when a large cumulative ACK comes. Try to return back to pre-split
state already while more and more SACK info gets discovered by
combining newly discovered SACK areas with the previous skb if
that's SACKed as well.

This approach has a number of benefits:

1) The processing overhead is spread more equally over the RTT
2) Write queue has less skbs to process (affect everything
   which has to walk in the queue past the sacked areas)
3) Write queue is consistent whole the time, so no other parts
   of TCP has to be aware of this (this was not the case with
   some other approach that was, well, quite intrusive all
   around).
4) Clean_rtx_queue can release most of the pages using single
   put_page instead of previous PAGE_SIZE/mss+1 calls

In case a hole is fully filled by the new SACK block, we attempt
to combine the next skb too which allows construction of skbs
that are even larger than what tso split them to and it handles
hole per on every nth patterns that often occur during slow start
overshoot pretty nicely. Though this to be really useful also
a retransmission would have to get lost since cumulative ACKs
advance one hole at a time in the most typical case.

TODO: handle upwards only merging. That should be rather easy
when segment is fully sacked but I'm leaving that as future
work item (it won't make very large difference anyway since
this current approach already covers quite a lot of normal
cases).

I was earlier thinking of some sophisticated way of tracking
timestamps of the first and the last segment but later on
realized that it won't be that necessary at all to store the
timestamp of the last segment. The cases that can occur are
basically either:
  1) ambiguous => no sensible measurement can be taken anyway
  2) non-ambiguous is due to reordering => having the timestamp
     of the last segment there is just skewing things more off
     than does some good since the ack got triggered by one of
     the holes (besides some substle issues that would make
     determining right hole/skb even harder problem). Anyway,
     it has nothing to do with this change then.

I choose to route some abnormal looking cases with goto noop,
some could be handled differently (eg., by stopping the
walking at that skb but again). In general, they either
shouldn't happen at all or are rare enough to make no difference
in practice.

In theory this change (as whole) could cause some macroscale
regression (global) because of cache misses that are taken over
the round-trip time but it gets very likely better because of much
less (local) cache misses per other write queue walkers and the
big recovery clearing cumulative ack.

Worth to note that these benefits would be very easy to get also
without TSO/GSO being on as long as the data is in pages so that
we can merge them. Currently I won't let that happen because
DSACK splitting at fragment that would mess up pcounts due to
sk_can_gso in tcp_set_skb_tso_segs. Once DSACKs fragments gets
avoided, we have some conditions that can be made less strict.

TODO: I will probably have to convert the excessive pointer
passing to struct sacktag_state... :-)

My testing revealed that considerable amount of skbs couldn't
be shifted because they were cloned (most likely still awaiting
tx reclaim)...

[The rest is considering future work instead since I got
repeatably EFAULT to tcpdump's recvfrom when I added
pskb_expand_head to deal with clones, so I separated that
into another, later patch]

...To counter that, I gave up on the fifth advantage:

5) When growing previous SACK block, less allocs for new skbs
   are done, basically a new alloc is needed only when new hole
   is detected and when the previous skb runs out of frags space

...which now only happens of if reclaim is fast enough to dispose
the clone before the SACK block comes in (the window is RTT long),
otherwise we'll have to alloc some.

With clones being handled I got these numbers (will be somewhat
worse without that), taken with fine-grained mibs:

                  TCPSackShifted 398
                   TCPSackMerged 877
            TCPSackShiftFallback 320
      TCPSACKCOLLAPSEFALLBACKGSO 0
  TCPSACKCOLLAPSEFALLBACKSKBBITS 0
  TCPSACKCOLLAPSEFALLBACKSKBDATA 0
    TCPSACKCOLLAPSEFALLBACKBELOW 0
    TCPSACKCOLLAPSEFALLBACKFIRST 1
 TCPSACKCOLLAPSEFALLBACKPREVBITS 318
      TCPSACKCOLLAPSEFALLBACKMSS 1
   TCPSACKCOLLAPSEFALLBACKNOHEAD 0
    TCPSACKCOLLAPSEFALLBACKSHIFT 0
          TCPSACKCOLLAPSENOOPSEQ 0
  TCPSACKCOLLAPSENOOPSMALLPCOUNT 0
     TCPSACKCOLLAPSENOOPSMALLLEN 0
             TCPSACKCOLLAPSEHOLE 12

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 21:20:15 -08:00
Ilpo Järvinen
e1aa680fa4 tcp: move tcp_simple_retransmit to tcp_input
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 21:11:55 -08:00
Eric Dumazet
2e77d89b2f net: avoid a pair of dst_hold()/dst_release() in ip_append_data()
We can reduce pressure on dst entry refcount that slowdown UDP transmit
path on SMP machines. This pressure is visible on RTP servers when
delivering content to mediagateways, especially big ones, handling
thousand of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.

This patch makes ip_append_data() eventually steal the refcount its
callers had to take on the dst entry.

This doesnt avoid all refcounting, but still gives speedups on SMP,
on UDP/RAW transmit path

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 15:52:46 -08:00
Eric Dumazet
c25eb3bfb9 net: Convert TCP/DCCP listening hash tables to use RCU
This is the last step to be able to perform full RCU lookups
in __inet_lookup() : After established/timewait tables, we
add RCU lookups to listening hash table.

The only trick here is that a socket of a given type (TCP ipv4,
TCP ipv6, ...) can now flight between two different tables
(established and listening) during a RCU grace period, so we
must use different 'nulls' end-of-chain values for two tables.

We define a large value :

#define LISTENING_NULLS_BASE (1U << 29)

So that slots in listening table are guaranteed to have different
end-of-chain values than slots in established table. A reader can
still detect it finished its lookup in the right chain.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 17:22:55 -08:00
Krzysztof Hałasa
72364706c3 WAN: syncppp.c is no longer used by any kernel code. Remove it.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:48 +01:00
David S. Miller
6c0bce37ff Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2008-11-21 17:05:11 -08:00
Catalin Marinas
7e56b5d698 net: Fix memory leak in the proto_register function
If the slub allocator is used, kmem_cache_create() may merge two or more
kmem_cache's into one but the cache name pointer is not updated and
kmem_cache_name() is no longer guaranteed to return the pointer passed
to the former function. This patch stores the kmalloc'ed pointers in the
corresponding request_sock_ops and timewait_sock_ops structures.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-21 16:45:22 -08:00
Eric Dumazet
f757fec4b0 net: use net_eq() in INET_MATCH and INET_TW_MATCH
We can avoid some useless instructions if !CONFIG_NET_NS

Because of RCU, we use INET_MATCH or INET_TW_MATCH twice for the found
socket, so thats six instructions less per incoming TCP packet.

Yet another tbench speedup :)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-21 15:49:19 -08:00
Rami Rosen
a1eb5fe319 wireless: missing include in lib80211.h
This patch adds #include <linux/timer.h> in lib80211.h to avoid
these compilation erros.

> In file included from /work/src/wireless-testing/net/wireless/lib80211.c:24:
> /work/src/wireless-testing/include/net/lib80211.h:113: error: field
> 'crypt_deinit_timer' has incomplete type
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_info_init':
> /work/src/wireless-testing/net/wireless/lib80211.c:83: error: implicit
> declaration of function 'setup_timer'
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_info_free':
> /work/src/wireless-testing/net/wireless/lib80211.c:95: error: implicit
> declaration of function 'del_timer_sync'
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_deinit_handler':
> /work/src/wireless-testing/net/wireless/lib80211.c:157: error:
> implicit declaration of function 'add_timer'
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_delayed_deinit':
> /work/src/wireless-testing/net/wireless/lib80211.c:182: error:
> implicit declaration of function 'timer_pending'
> make[3]: *** [net/wireless/lib80211.o] Error 1
> make[2]: *** [net/wireless] Error 2
> make[1]: *** [net] Error 2
> make: *** [sub-make] Error 2

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-21 11:42:55 -05:00
John W. Linville
627271018d mac80211: add explicit padding in struct ieee80211_tx_info
Otherwise, the BUILD_BUG_ON calls in ieee80211_tx_info_clear_status can
fail on some architectures.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-21 11:08:18 -05:00
John W. Linville
2ba4b32ecf lib80211: consolidate crypt init routines
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-21 11:08:17 -05:00
John W. Linville
274bfb8dc5 lib80211: absorb crypto bits from net/ieee80211
These bits are shared already between ipw2x00 and hostap, and could
probably be shared both more cleanly and with other drivers.  This
commit simply relocates the code to lib80211 and adjusts the drivers
appropriately.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-21 11:08:17 -05:00
Randy Dunlap
0ed94eaaed mac80211: remove more excess kernel-doc
Delete kernel-doc struct descriptions for fields that don't exist:

Warning(include/net/mac80211.h:1263): Excess struct/union/enum/typedef member 'conf_ht' description in 'ieee80211_ops'
Warning(net/mac80211/sta_info.h:309): Excess struct/union/enum/typedef member 'addr' description in 'sta_info'
Warning(net/mac80211/sta_info.h:309): Excess struct/union/enum/typedef member 'aid' description in 'sta_info'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Johannes Berg <johannes@sipsolutions.net>
cc: John W. Linville <linville@tuxdriver.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-21 11:08:15 -05:00
Felix Fietkau
4821277f36 mac80211: fix BUILD_BUG_ON() caused by misalignment on arm
On ARM alignment is done slightly different from other architectures.
struct ieee80211_tx_rate is aligned to word size, even though it only has 3
single-byte members, which triggers the BUILD_BUG_ON in
ieee80211_tx_info_clear_status

This patch marks the struct ieee80211_tx_rate as packed, so that ARM
behaves like the other architectures.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-21 11:06:05 -05:00
Alexander Duyck
859ee3c438 DCB: Add support for DCB BCN
Adds an interface to configure the Backward Congestion Notification
(BCN) feature.  In a BCN capabale network, congestion notifications
from congested points out in the network can cause the end station
limit the rate of a given traffic flow.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20 21:10:23 -08:00
Alexander Duyck
0eb3aa9bab DCB: Add interface to query the state of PFC feature.
Adds a netlink interface for Data Center Bridging (DCB) to get and set
the enable state of the Priority Flow Control (PFC) feature.
Primarily, this is a way to turn off PFC in the driver while DCB
remains enabled.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20 21:09:23 -08:00
Alexander Duyck
33dbabc4a7 DCB: Add interface to query # of TCs supported by device
Adds interface for Data Center Bridging (DCB) to query (and set if
supported) the number of traffic classes currently supported by the
device for the two (DCB) features: priority groups (PG) and priority
flow control (PFC).

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20 21:08:19 -08:00
Alexander Duyck
46132188bf DCB: Add interface to query for the DCB capabilities of an device.
Adds to the netlink interface for Data Center Bridging (DCB), allowing
the DCB capabilities supported by a device to be queried.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20 21:05:08 -08:00
Alexander Duyck
2f90b8657e ixgbe: this patch adds support for DCB to the kernel and ixgbe driver
This adds support for Data Center Bridging (DCB) features in the ixgbe
driver and adds an rtnetlink interface for configuring DCB to the
kernel.  The DCB feature support included are Priority Grouping (PG) -
which allows bandwidth guarantees to be allocated to groups to traffic
based on the 802.1q priority, and Priority Based Flow Control (PFC) -
which introduces a new MAC control PAUSE frame which works at
granularity of the 802.1p priority instead of the link (IEEE 802.3x).

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20 20:52:10 -08:00
Eric Dumazet
9db66bdcc8 net: convert TCP/DCCP ehash rwlocks to spinlocks
Now TCP & DCCP use RCU lookups, we can convert ehash rwlocks to spinlocks.

/proc/net/tcp and other seq_file 'readers' can safely be converted to 'writers'.

This should speedup writers, since spin_lock()/spin_unlock()
only use one atomic operation instead of two for write_lock()/write_unlock()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20 20:39:09 -08:00
David S. Miller
6ab33d5171 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ixgbe/ixgbe_main.c
	include/net/mac80211.h
	net/phonet/af_phonet.c
2008-11-20 16:44:00 -08:00
Eric Dumazet
5caea4ea70 net: listening_hash get a spinlock per bucket
This patch prepares RCU migration of listening_hash table for
TCP/DCCP protocols.

listening_hash table being small (32 slots per protocol), we add
a spinlock for each slot, instead of a single rwlock for whole table.

This should reduce hold time of readers, and writers concurrency.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20 00:40:07 -08:00
Joe Perches
07f0757a68 include/net net/ - csum_partial - remove unnecessary casts
The first argument to csum_partial is const void *
casts to char/u8 * are not necessary

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19 15:44:53 -08:00
David S. Miller
198d6ba4d7 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/isdn/i4l/isdn_net.c
	fs/cifs/connect.c
2008-11-18 23:38:23 -08:00
Johannes Berg
8e3bad65a5 mac80211: remove ieee80211_notify_mac
Before ieee80211_notify_mac() was added, it was presented with the
use case of using it to tell mac80211 that the association may
have been lost because the firmware crashed/reset.

Since then, it has also been used by iwlwifi to (slightly) speed
up re-association after resume, a workaround around the fact that
mac80211 has no suspend/resume handling yet. It is also not used
by any other drivers, so clearly it cannot be necessary for "good
enough" suspend/resume.

Unfortunately, the callback suffers from a severe problem: It only
works for station mode. If suspend/resume happens while in IBSS or
any other mode (but station), then the callback is pointless.

Recently, it has created a number of locking issues, first because
it required rtnl locking rather than RCU due to calling sleeping
functions within the critical section, and now because it's called
by iwlwifi from the mac80211 workqueue that may not use the rtnl
because it is flushed under rtnl.
(cf. http://bugzilla.kernel.org/show_bug.cgi?id=12046)

I think, therefore, that we should take a step back, remove it
entirely for now and add the small feature it provided properly.
For suspend and resume we will need to introduce new hooks, and for
the case where the firmware was reset the driver will probably
simply just pretend it has done a suspend/resume cycle to get
mac80211 to reprogram the hardware completely, not just try to
connect to the current AP again in station mode. When doing so, we
will need to take into account locking issues and possibly defer
to schedule_work from within mac80211 for the resume operation,
while the suspend operation must be done directly.

Proper suspend/resume should also not necessarily try to reconnect
to the current AP, the time spent in suspend may have been short
enough to not be disconnected from the AP, mac80211 will detect
that the AP went out of range quickly if it did, and if the
association is lost then the AP will disassoc as soon as a data
frame is sent. We might also take into account WWOL then, and
have mac80211 program the hardware into such a mode where it is
available and requested.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-18 17:26:26 -05:00
Patrick McHardy
d9e150071d netfilter: nfnetlink_log: fix warning and prototype mismatch
net/netfilter/nfnetlink_log.c:537:1: warning: symbol 'nfulnl_log_packet' was not declared. Should it be static?

Including the proper header also revealed an incorrect prototype.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-11-18 12:16:52 +01:00
Pablo Neira Ayuso
19abb7b090 netfilter: ctnetlink: deliver events for conntracks changed from userspace
As for now, the creation and update of conntracks via ctnetlink do not
propagate an event to userspace. This can result in inconsistent situations
if several userspace processes modify the connection tracking table by means
of ctnetlink at the same time. Specifically, using the conntrack command
line tool and conntrackd at the same time can trigger unconsistencies.

This patch also modifies the event cache infrastructure to pass the
process PID and the ECHO flag to nfnetlink_send() to report back
to userspace if the process that triggered the change needs so.
Based on a suggestion from Patrick McHardy.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-11-18 11:56:20 +01:00
Pablo Neira Ayuso
226c0c0ef2 netfilter: ctnetlink: helper modules load-on-demand support
This patch adds module loading for helpers via ctnetlink.

* Creation path: We support explicit and implicit helper assignation. For
  the explicit case, we try to load the module. If the module is correctly
  loaded and the helper is present, we return EAGAIN to re-start the
  creation. Otherwise, we return EOPNOTSUPP.
* Update path: release the spin lock, load the module and check. If it is
  present, then return EAGAIN to re-start the update.

This patch provides a refactorized function to lookup-and-set the
connection tracking helper. The function removes the exported symbol
__nf_ct_helper_find as it has not clients anymore.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-11-18 11:54:05 +01:00
James Morris
f3a5c54701 Merge branch 'master' into next
Conflicts:
	fs/cifs/misc.c

Merge to resolve above, per the patch below.

Signed-off-by: James Morris <jmorris@namei.org>

diff --cc fs/cifs/misc.c
index ec36410,addd1dc..0000000
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@@ -347,13 -338,13 +338,13 @@@ header_assemble(struct smb_hdr *buffer
  		/*  BB Add support for establishing new tCon and SMB Session  */
  		/*      with userid/password pairs found on the smb session   */
  		/*	for other target tcp/ip addresses 		BB    */
 -				if (current->fsuid != treeCon->ses->linux_uid) {
 +				if (current_fsuid() != treeCon->ses->linux_uid) {
  					cFYI(1, ("Multiuser mode and UID "
  						 "did not match tcon uid"));
- 					read_lock(&GlobalSMBSeslock);
- 					list_for_each(temp_item, &GlobalSMBSessionList) {
- 						ses = list_entry(temp_item, struct cifsSesInfo, cifsSessionList);
+ 					read_lock(&cifs_tcp_ses_lock);
+ 					list_for_each(temp_item, &treeCon->ses->server->smb_ses_list) {
+ 						ses = list_entry(temp_item, struct cifsSesInfo, smb_ses_list);
 -						if (ses->linux_uid == current->fsuid) {
 +						if (ses->linux_uid == current_fsuid()) {
  							if (ses->server == treeCon->ses->server) {
  								cFYI(1, ("found matching uid substitute right smb_uid"));
  								buffer->Uid = ses->Suid;
2008-11-18 18:52:37 +11:00
Pablo Neira Ayuso
4dc06f9633 netfilter: nf_conntrack: connection tracking helper name persistent aliases
This patch adds the macro MODULE_ALIAS_NFCT_HELPER that defines a
way to provide generic and persistent aliases for the connection
tracking helpers.

This next patch requires this patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-11-17 16:01:42 +01:00
Alexey Dobriyan
4d24b52ac5 ematch: simpler tcf_em_unregister()
Simply delete ops from list and let list debugging do the job.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16 23:01:49 -08:00
Eric Dumazet
5635c10d97 net: make sure struct dst_entry refcount is aligned on 64 bytes
As found in the past (commit f1dd9c379c
[NET]: Fix tbench regression in 2.6.25-rc1), it is really
important that struct dst_entry refcount is aligned on a cache line.

We cannot use __atribute((aligned)), so manually pad the structure
for 32 and 64 bit arches.

for 32bit : offsetof(truct dst_entry, __refcnt) is 0x80
for 64bit : offsetof(truct dst_entry, __refcnt) is 0xc0

As it is not possible to guess at compile time cache line size,
we use a generic value of 64 bytes, that satisfies many current arches.
(Using 128 bytes alignment on 64bit arches would waste 64 bytes)

Add a BUILD_BUG_ON to catch future updates to "struct dst_entry" dont
break this alignment.

"tbench 8" is 4.4 % faster on a dual quad core (HP BL460c G1), Intel E5450 @3.00GHz
(2350 MB/s instead of 2250 MB/s)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16 19:46:36 -08:00
Eric Dumazet
3ab5aee7fe net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls
RCU was added to UDP lookups, using a fast infrastructure :
- sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
  price of call_rcu() at freeing time.
- hlist_nulls permits to use few memory barriers.

This patch uses same infrastructure for TCP/DCCP established
and timewait sockets.

Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
using short lived TCP connections. A followup patch, converting
rwlocks to spinlocks will even speedup this case.

__inet_lookup_established() is pretty fast now we dont have to
dirty a contended cache line (read_lock/read_unlock)

Only established and timewait hashtable are converted to RCU
(bind table and listen table are still using traditional locking)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16 19:40:17 -08:00
Eric Dumazet
88ab1932ea udp: Use hlist_nulls in UDP RCU code
This is a straightforward patch, using hlist_nulls infrastructure.

RCUification already done on UDP two weeks ago.

Using hlist_nulls permits us to avoid some memory barriers, both
at lookup time and delete time.

Patch is large because it adds new macros to include/net/sock.h.
These macros will be used by TCP & DCCP in next patch.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16 19:39:21 -08:00
Ingo Molnar
e8f6fbf62d lockdep: include/linux/lockdep.h - fix warning in net/bluetooth/af_bluetooth.c
fix this warning:

  net/bluetooth/af_bluetooth.c:60: warning: ‘bt_key_strings’ defined but not used
  net/bluetooth/af_bluetooth.c:71: warning: ‘bt_slock_key_strings’ defined but not used

this is a lockdep macro problem in the !LOCKDEP case.

We cannot convert it to an inline because the macro works on multiple types,
but we can mark the parameter used.

[ also clean up a misaligned tab in sock_lock_init_class_and_name() ]

[ also remove #ifdefs from around af_family_clock_key strings - which
  were certainly added to get rid of the ugly build warnings. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-13 23:19:10 -08:00
Jarek Poplawski
f30ab418a1 pkt_sched: Remove qdisc->ops->requeue() etc.
After implementing qdisc->ops->peek() and changing sch_netem into
classless qdisc there are no more qdisc->ops->requeue() users. This
patch removes this method with its wrappers (qdisc_requeue()), and
also unused qdisc->requeue structure. There are a few minor fixes of
warnings (htb_enqueue()) and comments btw.

The idea to kill ->requeue() and a similar patch were first developed
by David S. Miller.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-13 22:56:30 -08:00
Petr Tesarik
38a7ddffa4 tcp: remove an unnecessary field in struct tcp_skb_cb
The urg_ptr field is not used anywhere and is merely confusing.

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-13 22:44:11 -08:00
James Morris
2b82892565 Merge branch 'master' into next
Conflicts:
	security/keys/internal.h
	security/keys/process_keys.c
	security/keys/request_key.c

Fixed conflicts above by using the non 'tsk' versions.

Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 11:29:12 +11:00
David Howells
8192b0c482 CRED: Wrap task credential accesses in the networking subsystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: netdev@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:10 +11:00
Alexey Dobriyan
2378982487 net: ifdef struct sock::sk_async_wait_queue
Every user is under CONFIG_NET_DMA already, so ifdef field as well.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-12 23:25:32 -08:00
Eric Dumazet
e42ea986e4 net: Cleanup of neighbour code
Using read_pnet() and write_pnet() in neighbour code ease the reading
of code.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-12 00:54:54 -08:00
Eric Dumazet
7a9546ee35 net: ib_net pointer should depends on CONFIG_NET_NS
We can shrink size of "struct inet_bind_bucket" by 50%, using
read_pnet() and write_pnet()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-12 00:54:20 -08:00
Eric Dumazet
8f424b5f32 net: Introduce read_pnet() and write_pnet() helpers
This patch introduces two helpers that deal with reading and writing
struct net pointers in various network structures.

Their implementation depends on CONFIG_NET_NS

For symmetry, both functions work with "struct net **pnet".

Their usage should reduce the number of #ifdef CONFIG_NET_NS,
without adding many helpers for each network structure
that hold a "struct net *pointer"

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-12 00:53:30 -08:00
Alexey Dobriyan
6bb3ce25d0 net: remove struct dst_entry::entry_size
Unused after kmem_cache_zalloc() conversion.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-11 17:25:22 -08:00
Alexey Dobriyan
9b739ba5e6 net: remove struct neigh_table::pde
->pde isn't actually needed, since name is stashed in ->id.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-11 16:47:44 -08:00
David S. Miller
7e452baf6b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/message/fusion/mptlan.c
	drivers/net/sfc/ethtool.c
	net/mac80211/debugfs_sta.c
2008-11-11 15:43:02 -08:00
Kay Sievers
fb28ad3590 net: struct device - replace bus_id with dev_name(), dev_set_name()
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-10 13:55:14 -08:00
Luis R. Rodriguez
b219cee191 cfg80211: make use of reg macros on REG_RULE
Ensure regulatory converstion macros safely accept
multiple arguments and make REG_RULE() use them.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-10 15:17:41 -05:00
Jouni Malinen
318884875b nl80211: Add TX queue parameter configuration
Add a new attribute, NL80211_ATTR_WIPHY_TXQ_PARAMS, that can be used with
NL80211_CMD_SET_WIPHY for userspace (e.g., hostapd) to set TX queue
parameters (txop, cwmin, cwmax, aifs).

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-10 15:17:40 -05:00
Jouni Malinen
90c97a040d nl80211: Add basic rate configuration for AP mode
Add a new attribute, NL80211_ATTR_BSS_BASIC_RATES, that can be used with
NL80211_CMD_SET_BSS for userspace (e.g., hostapd) to set which rates are
in the basic rate set.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-10 15:17:39 -05:00
Johannes Berg
bd81525272 wireless: implement basic rate helper function
This adds a helper function that, given a bitmap of basic
rates and a bitrate returns the response rate for this rate.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-10 15:17:35 -05:00
Sujith
8469cdef1f mac80211: Add a new event in ieee80211_ampdu_mlme_action
Send a notification to the driver on succesful
reception of an ADDBA response, add IEEE80211_AMPDU_TX_RESUME
for this purpose.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-10 15:17:32 -05:00
Johannes Berg
41bb73eeac mac80211: remove SSID driver code
Remove the SSID from the driver API since now there is no
driver that requires knowing the SSID and I think it's
unlikely that any hardware design that does require the
SSID will play well with mac80211.

This also removes support for setting the SSID in master
mode which will require a patch to hostapd to not try.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-11-10 15:11:56 -05:00
Miklos Szeredi
6209344f5a net: unix: fix inflight counting bug in garbage collector
Previously I assumed that the receive queues of candidates don't
change during the GC.  This is only half true, nothing can be received
from the queues (see comment in unix_gc()), but buffers could be added
through the other half of the socket pair, which may still have file
descriptors referring to it.

This can result in inc_inflight_move_tail() erronously increasing the
"inflight" counter for a unix socket for which dec_inflight() wasn't
previously called.  This in turn can trigger the "BUG_ON(total_refs <
inflight_refs)" in a later garbage collection run.

Fix this by only manipulating the "inflight" counter for sockets which
are candidates themselves.  Duplicating the file references in
unix_attach_fds() is also needed to prevent a socket becoming a
candidate for GC while the skb that contains it is not yet queued.

Reported-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-09 11:17:33 -08:00
David S. Miller
167c6274c3 Merge branch 'davem-next' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2008-11-07 01:37:16 -08:00
David S. Miller
9eeda9abd1 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/ath5k/base.c
	net/8021q/vlan_core.c
2008-11-06 22:43:03 -08:00
David Miller
f8d570a474 net: Fix recursive descent in __scm_destroy().
__scm_destroy() walks the list of file descriptors in the scm_fp_list
pointed to by the scm_cookie argument.

Those, in turn, can close sockets and invoke __scm_destroy() again.

There is nothing which limits how deeply this can occur.

The idea for how to fix this is from Linus.  Basically, we do all of
the fput()s at the top level by collecting all of the scm_fp_list
objects hit by an fput().  Inside of the initial __scm_destroy() we
keep running the list until it is empty.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 13:51:50 -08:00
Brian Haley
305d552acc bonding: send IPv6 neighbor advertisement on failover
This patch adds better IPv6 failover support for bonding devices,
especially when in active-backup mode and there are only IPv6 addresses
configured, as reported by Alex Sidorenko.

- Creates a new file, net/drivers/bonding/bond_ipv6.c, for the
   IPv6-specific routines.  Both regular bonds and VLANs over bonds
   are supported.

- Adds a new tunable, num_unsol_na, to limit the number of unsolicited
   IPv6 Neighbor Advertisements that are sent on a failover event.
   Default is 1.

- Creates two new IPv6 neighbor discovery functions:

   ndisc_build_skb()
   ndisc_send_skb()

   These were required to support VLANs since we have to be able to
   add the VLAN id to the skb since ndisc_send_na() and friends
   shouldn't be asked to do this.  These two routines are basically
   __ndisc_send() split into two pieces, in a slightly different order.

- Updates Documentation/networking/bonding.txt and bumps the rev of bond
   support to 3.4.0.

On failover, this new code will generate one packet:

- An unsolicited IPv6 Neighbor Advertisement, which helps the switch
   learn that the address has moved to the new slave.

Testing has shown that sending just the NA results in pretty good
behavior when in active-back mode, I saw no lost ping packets for example.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-11-06 00:49:37 -05:00
Jarek Poplawski
61c9eaf900 pkt_sched: Fix qdisc len in qdisc_peek_dequeued()
A packet dequeued and stored as gso_skb in qdisc_peek_dequeued() should
be seen as part of the queue for sch->q.qlen queries until it's really
dequeued with qdisc_dequeue_peeked(), so qlen needs additional updating
in these functions. (Updating qstats.backlog shouldn't matter here.)

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05 16:02:34 -08:00
Alexey Dobriyan
d5f642384e net: #ifdef ->sk_security
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-04 14:45:58 -08:00
Alexey Dobriyan
b71b30a626 netfilter: netns ebtables: ebtable_nat in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-11-04 14:30:46 +01:00
Alexey Dobriyan
4aad10938d netfilter: netns ebtables: ebtable_filter in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-11-04 14:29:58 +01:00
Alexey Dobriyan
8157e6d16a netfilter: netns ebtables: ebtable_broute in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-11-04 14:29:03 +01:00
Eric Leblond
5f7340eff8 netfilter: xt_NFLOG: don't call nf_log_packet in NFLOG module.
This patch modifies xt_NFLOG to suppress the call to nf_log_packet()
function. The call of this wrapper in xt_NFLOG was causing NFLOG to
use the first initialized module. Thus, if ipt_ULOG is loaded before
nfnetlink_log all NFLOG rules are treated as plain LOG rules.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-11-04 14:21:08 +01:00
Julius Volz
48148938b4 IPVS: Remove supports_ipv6 scheduler flag
Remove the 'supports_ipv6' scheduler flag since all schedulers now
support IPv6.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 17:08:56 -08:00
Johannes Berg
e25cf4a694 mac80211: fix two kernel-doc warnings
One parameter wasn't described and one I forgot to update when
renaming it; also update TBDs in sta_info.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:02:36 -04:00
Johannes Berg
be3d48106c wireless: remove struct regdom hinting
The code needs to be split out and cleaned up, so as a
first step remove the capability, to add it back in a
subsequent patch as a separate function. Also remove the
publically facing return value of the function and the
wiphy argument. A number of internal functions go from
being generic helpers to just being used for alpha2
setting.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:02:30 -04:00
Johannes Berg
d2372b3152 wireless: make regdom passing semantics simpler
The regdom struct is given to the core, so it might as well
free it in error conditions.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:02:30 -04:00
Sujith
8b30b1fe36 mac80211: Re-enable aggregation
Wireless HW without any dedicated queues for aggregation
do not need the ampdu_queues mechanism present right now
in mac80211. Since mac80211 is still incomplete wrt TX MQ
changes, do not allow aggregation sessions for drivers that
set ampdu_queues.

This is only an interim hack until Intel fixes the requeue issue.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: Luis Rodriguez <Luis.Rodriguez@Atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:02:14 -04:00
John W. Linville
7211801527 wireless: avoid some net/ieee80211.h vs. linux/ieee80211.h conflicts
There is quite a lot of overlap in definitions between these headers...

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:50 -04:00
John W. Linville
9387b7caf3 wireless: use individual buffers for printing ssid values
Also change escape_ssid to print_ssid to match print_mac semantics.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:50 -04:00
John W. Linville
c5d3dce875 wireless: remove NETWORK_EMPTY_ESSID flag
It is unnecessary and of questionable value.  Also remove
is_empty_ssid, as it is also unnecessary.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:48 -04:00
John W. Linville
7e272fcff6 wireless: consolidate on a single escape_essid implementation
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:46 -04:00
Johannes Berg
cf03268e6e wireless: don't publish __regulatory_hint
This function requires an internal lock to be held, so it cannot
be published to other modules in the kernel.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:41 -04:00
Bob Copeland
e37d4dffdf mac80211: fix a few typos in mac80211 kernel doc
Correct a handful of errors found while reading the mac80211 book.

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:41 -04:00
colin@cozybit.com
93da9cc17c Add nl80211 commands to get and set o11s mesh networking parameters
The two new commands are NL80211_CMD_GET_MESH_PARAMS and
NL80211_CMD_SET_MESH_PARAMS. There is a new attribute enum,
NL80211_ATTR_MESH_PARAMS, which enumerates the various mesh configuration
parameters.

Moved struct mesh_config from mac80211/ieee80211_i.h to net/cfg80211.h.
nl80211_get_mesh_params and nl80211_set_mesh_params unpack the netlink messages
and ask the driver to get or set the configuration.  This is done via two new
function stubs, get_mesh_params and set_mesh_params, in struct cfg80211_ops.

Signed-off-by: Colin McCabe <colin@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:39 -04:00
Johannes Berg
50fb2e4572 mac80211: remove rate_control_clear
"Clearing" the rate control algorithm is pointless, none of
the algorithms actually uses this operation and it's not even
invoked properly for all channel switching. Also, there's no
need to since rate control algorithms work per station.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:37 -04:00
Johannes Berg
e6a9854b05 mac80211/drivers: rewrite the rate control API
So after the previous changes we were still unhappy with how
convoluted the API is and decided to make things simpler for
everybody. This completely changes the rate control API, now
taking into account 802.11n with MCS rates and more control,
most drivers don't support that though.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:23 -04:00
Johannes Berg
ae5eb02641 mac80211: rewrite HT handling
The HT handling has the following deficiencies, which I've
(partially) fixed:
 * it always uses the AP info even if there is no AP,
   hence has no chance of working as an AP
 * it pretends to be HW config, but really is per-BSS
 * channel sanity checking is left to the drivers
 * it generally lets the driver control too much

HT enabling is still wrong with this patch if you have more than
one virtual STA mode interface, but that never happens currently.
Once WDS, IBSS or AP/VLAN gets HT capabilities, it will also be
wrong, see the comment in ieee80211_enable_ht().

Additionally, this fixes a number of bugs:
 * mac80211: ieee80211_set_disassoc doesn't notify the driver any
             more since the refactoring
 * iwl-agn-rs: always uses the HT capabilities from the wrong stuff
               mac80211 gives it rather than the actual peer STA
 * ath9k: a number of bugs resulting from the broken HT API

I'm not entirely happy with putting the HT capabilities into
struct ieee80211_sta as restricted to our own HT TX capabilities,
but I see no cleaner solution for now.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:16 -04:00
Johannes Berg
bda3933a8a mac80211: move bss_conf into vif
Move bss_conf into the vif struct so that drivers can
access it during ->tx without having to store it in
the private data or similar. No driver updates because
this is only for when they want to start using it.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:15 -04:00
Johannes Berg
9124b07740 mac80211: make retry limits part of hw config
Instead of having a separate callback, use the HW config callback
with a new flag to change retry limits.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:14 -04:00
Johannes Berg
e8975581f6 mac80211: introduce hw config change flags
This makes mac80211 notify the driver which configuration
actually changed, e.g. channel etc.

No driver changes, this is just plumbing, driver authors are
expected to act on this if they want to.

Also remove the HW CONFIG debug printk, it's incorrect, often
we configure something else.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:07 -04:00
Johannes Berg
0f4ac38b59 mac80211: kill hw.conf.antenna_sel_{rx,tx}
Never actually used.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:06 -04:00
Johannes Berg
d9fe60dea7 802.11: clean up/fix HT support
This patch cleans up a number of things:
 * the unusable definition of the HT capabilities/HT information
   information elements
 * variable names that are hard to understand
 * mac80211: move ieee80211_handle_ht to ht.c and remove the unused
             enable_ht parameter
 * mac80211: fix bug with MCS rate 32 in ieee80211_handle_ht
 * mac80211: fix bug with casting the result of ieee80211_bss_get_ie
             to an information element _contents_ rather than the
             whole element, add size checking (another out-of-bounds
             access bug fixed!)
 * mac80211: remove some unused return values in favour of BUG_ON
             checking
 * a few minor other things

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 19:00:06 -04:00
Johannes Berg
7a5158ef8d mac80211: fix short slot handling
This patch makes mac80211 handle short slot requests from the AP
properly. Also warn about uses of IEEE80211_CONF_SHORT_SLOT_TIME
and optimise out the code since it cannot ever be hit anyway.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 18:58:53 -04:00
Johannes Berg
e87a2feea7 mac80211: remove max_antenna_gain config
The antenna gain isn't exactly configurable, despite the belief of
some unnamed individual who thinks that the EEPROM might influence
it.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 18:06:01 -04:00
Johannes Berg
3db594380b mac80211: remove wiphy_to_hw
This isn't used by anyone, if we ever need it we can add
it back, until then it's useless.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31 18:06:00 -04:00
Harvey Harrison
3685f25de1 misc: replace NIPQUAD()
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31 00:56:49 -07:00
Jarek Poplawski
77be155cba pkt_sched: Add peek emulation for non-work-conserving qdiscs.
This patch adds qdisc_peek_dequeued() wrapper to emulate peek method
with qdisc->dequeue() and storing "peeked" skb in qdisc->gso_skb until
dequeuing. This is mainly for compatibility reasons not to break some
strange configs because peeking is expected for non-work-conserving
parent qdiscs to query work-conserving child qdiscs.

This implementation requires using qdisc_dequeue_peeked() wrapper
instead of directly calling qdisc->dequeue() for all qdiscs ever
querried with qdisc->ops->peek() or qdisc_peek_dequeued().

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31 00:47:01 -07:00
Patrick McHardy
48a8f519e0 pkt_sched: Add ->peek() methods for fifo, prio and SFQ qdiscs.
From: Patrick McHardy <kaber@trash.net>

Just as a demonstration how easy adding a peek operation to the
work-conserving qdiscs actually is. It doesn't need to keep or change
any internal state in many cases thanks to the guarantee that the
packet will either be dequeued or, if another packet arrives, the
upper qdisc will immediately ->peek again to reevaluate the state.

(This is only slightly modified Patrick's patch.)

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31 00:44:18 -07:00
Jarek Poplawski
90d841fd0a pkt_sched: sch_generic: Add Qdisc_ops peek() method.
Add Qdisc_ops peek() method in order to replace requeuing.

Based on ideas and patches of Herbert Xu, Patrick McHardy and
David S. Miller.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31 00:43:45 -07:00
Alexey Dobriyan
cc0fe83525 xfrm: remove unused struct xfrm_policy::next
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31 00:42:25 -07:00
David S. Miller
a1744d3bee Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/p54/p54common.c
2008-10-31 00:17:34 -07:00
Alexey Dobriyan
485ac57bc1 netns: add register_pernet_gen_subsys/unregister_pernet_gen_subsys
netns ops which are registered with register_pernet_gen_device() are
shutdown strictly before those which are registered with
register_pernet_subsys(). Sometimes this leads to opposite (read: buggy)
shutdown ordering between two modules.

Add register_pernet_gen_subsys()/unregister_pernet_gen_subsys() for modules
which aren't elite enough for entry in struct net, and which can't use
register_pernet_gen_device(). PPTP conntracking module is such one.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-30 23:55:16 -07:00
Randy Dunlap
ad1d967c88 net: delete excess kernel-doc notation
Remove excess kernel-doc function parameters from networking header
& driver files:

Warning(include/net/sock.h:946): Excess function parameter or struct member 'sk' description in 'sk_filter_release'
Warning(include/linux/netdevice.h:1545): Excess function parameter or struct member 'cpu' description in 'netif_tx_lock'
Warning(drivers/net/wan/z85230.c:712): Excess function parameter or struct member 'regs' description in 'z8530_interrupt'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-30 23:54:35 -07:00
Harvey Harrison
5b095d9892 net: replace %p6 with %pI6
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 12:52:50 -07:00
Eric Dumazet
96631ed16c udp: introduce sk_for_each_rcu_safenext()
Corey Minyard found a race added in commit 271b72c7fa
(udp: RCU handling for Unicast packets.)

 "If the socket is moved from one list to another list in-between the
 time the hash is calculated and the next field is accessed, and the
 socket has moved to the end of the new list, the traversal will not
 complete properly on the list it should have, since the socket will
 be on the end of the new list and there's not a way to tell it's on a
 new list and restart the list traversal.  I think that this can be
 solved by pre-fetching the "next" field (with proper barriers) before
 checking the hash."

This patch corrects this problem, introducing a new
sk_for_each_rcu_safenext() macro.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 11:19:58 -07:00
Eric Dumazet
271b72c7fa udp: RCU handling for Unicast packets.
Goals are :

1) Optimizing handling of incoming Unicast UDP frames, so that no memory
 writes should happen in the fast path.

 Note: Multicasts and broadcasts still will need to take a lock,
 because doing a full lockless lookup in this case is difficult.

2) No expensive operations in the socket bind/unhash phases :
  - No expensive synchronize_rcu() calls.

  - No added rcu_head in socket structure, increasing memory needs,
  but more important, forcing us to use call_rcu() calls,
  that have the bad property of making sockets structure cold.
  (rcu grace period between socket freeing and its potential reuse
   make this socket being cold in CPU cache).
  David did a previous patch using call_rcu() and noticed a 20%
  impact on TCP connection rates.
  Quoting Cristopher Lameter :
   "Right. That results in cacheline cooldown. You'd want to recycle
    the object as they are cache hot on a per cpu basis. That is screwed
    up by the delayed regular rcu processing. We have seen multiple
    regressions due to cacheline cooldown.
    The only choice in cacheline hot sensitive areas is to deal with the
    complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."

  - Because udp sockets are allocated from dedicated kmem_cache,
  use of SLAB_DESTROY_BY_RCU can help here.

Theory of operation :
---------------------

As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
special attention must be taken by readers and writers.

Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
reused, inserted in a different chain or in worst case in the same chain
while readers could do lookups in the same time.

In order to avoid loops, a reader must check each socket found in a chain
really belongs to the chain the reader was traversing. If it finds a
mismatch, lookup must start again at the begining. This *restart* loop
is the reason we had to use rdlock for the multicast case, because
we dont want to send same message several times to the same socket.

We use RCU only for fast path.
Thus, /proc/net/udp still takes spinlocks.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 02:11:14 -07:00
Eric Dumazet
645ca708f9 udp: introduce struct udp_table and multiple spinlocks
UDP sockets are hashed in a 128 slots hash table.

This hash table is protected by *one* rwlock.

This rwlock is readlocked each time an incoming UDP message is handled.

This rwlock is writelocked each time a socket must be inserted in
hash table (bind time), or deleted from this table (close time)

This is not scalable on SMP machines :

1) Even in read mode, lock() and unlock() are atomic operations and
 must dirty a contended cache line, shared by all cpus.

2) A writer might be starved if many readers are 'in flight'. This can
 happen on a machine with some NIC receiving many UDP messages. User
 process can be delayed a long time at socket creation/dismantle time.

This patch prepares RCU migration, by introducing 'struct udp_table
and struct udp_hslot', and using one spinlock per chain, to reduce
contention on central rwlock.

Introducing one spinlock per chain reduces latencies, for port
randomization on heavily loaded UDP servers. This also speedup
bindings to specific ports.

udp_lib_unhash() was uninlined, becoming to big.

Some cleanups were done to ease review of following patch
(RCUification of UDP Unicast lookups)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 01:41:45 -07:00
Harvey Harrison
0c6ce78abf net: replace uses of NIP6_FMT with %p6
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 23:02:31 -07:00
Alexey Dobriyan
def8b4faff net: reduce structures when XFRM=n
ifdef out
* struct sk_buff::sp		(pointer)
* struct dst_entry::xfrm	(pointer)
* struct sock::sk_policy	(2 pointers)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 13:24:06 -07:00
Patrick McHardy
b057efd4d2 netlink: constify struct nlattr * arg to parsing functions
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 11:59:11 -07:00
Neil Horman
1080d709fb net: implement emergency route cache rebulds when gc_elasticity is exceeded
This is a patch to provide on demand route cache rebuilding.  Currently, our
route cache is rebulid periodically regardless of need.  This introduced
unneeded periodic latency.  This patch offers a better approach.  Using code
provided by Eric Dumazet, we compute the standard deviation of the average hash
bucket chain length while running rt_check_expire.  Should any given chain
length grow to larger that average plus 4 standard deviations, we trigger an
emergency hash table rebuild for that net namespace.  This allows for the common
case in which chains are well behaved and do not grow unevenly to not incur any
latency at all, while those systems (which may be being maliciously attacked),
only rebuild when the attack is detected.  This patch take 2 other factors into
account:
1) chains with multiple entries that differ by attributes that do not affect the
hash value are only counted once, so as not to unduly bias system to rebuilding
if features like QOS are heavily used
2) if rebuilding crosses a certain threshold (which is adjustable via the added
sysctl in this patch), route caching is disabled entirely for that net
namespace, since constant rebuilding is less efficient that no caching at all

Tested successfully by me.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-27 17:06:14 -07:00
Randy Dunlap
ea2d8b59bc mac80211.h: fix kernel-doc excesses
Fix mac80211.h kernel-doc: it had some extra parameters that were
no longer valid and incorrect format for a return value in 2 places.

Warning(lin2628-rc2//include/net/mac80211.h:1487): Excess function parameter or struct member 'control' description in 'ieee80211_beacon_get'
Warning(lin2628-rc2//include/net/mac80211.h:1596): Excess function parameter or struct member 'control' description in 'ieee80211_get_buffered_bc'
Warning(lin2628-rc2//include/net/mac80211.h:1632): Excess function parameter or struct member 'rc4key' description in 'ieee80211_get_tkip_key'
Warning(lin2628-rc2//include/net/mac80211.h:1735): Excess function parameter or struct member 'return' description in 'ieee80211_start_tx_ba_session'
Warning(lin2628-rc2//include/net/mac80211.h:1775): Excess function parameter or struct member 'return' description in 'ieee80211_stop_tx_ba_session'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-27 17:46:11 -04:00
Remi Denis-Courmont
e214a8cc7a Phonet: include generic link-layer header size in MAX_PHONET_HEADER
This fixes an OOPS in hard_header if a Phonet address is assigned to a
non-Phonet network interface.

Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-26 23:06:31 -07:00
Linus Torvalds
2242d5eff1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits)
  tcp: Restore ordering of TCP options for the sake of inter-operability
  net: Fix disjunct computation of netdev features
  sctp: Fix to handle SHUTDOWN in SHUTDOWN_RECEIVED state
  sctp: Fix to handle SHUTDOWN in SHUTDOWN-PENDING state
  sctp: Add check for the TSN field of the SHUTDOWN chunk
  sctp: Drop ICMP packet too big message with MTU larger than current PMTU
  p54: enable 2.4/5GHz spectrum by eeprom bits.
  orinoco: reduce stack usage in firmware download path
  ath5k: fix suspend-related oops on rmmod
  [netdrvr] fec_mpc52xx: Implement polling, to make netconsole work.
  qlge: Fix MSI/legacy single interrupt bug.
  smc911x: Make the driver safer on SMP
  smc911x: Add IRQ polarity configuration
  smc911x: Allow Kconfig dependency on ARM
  sis190: add identifier for Atheros AR8021 PHY
  8139x: reduce message severity on driver overlap
  igb: add IGB_DCA instead of selecting INTEL_IOATDMA
  igb: fix tx data corruption with transition to L0s on 82575
  ehea: Fix memory hotplug support
  netdev: DM9000: remove BLACKFIN hacking in DM9000 netdev driver
  ...
2008-10-23 19:19:54 -07:00
Wei Yongjun
2e3f92dad6 sctp: Fix to handle SHUTDOWN in SHUTDOWN_RECEIVED state
Once an endpoint has reached the SHUTDOWN-RECEIVED state,
it MUST NOT send a SHUTDOWN in response to a ULP request.
The Cumulative TSN Ack of the received SHUTDOWN chunk
MUST be processed.

This patch fix to process Cumulative TSN Ack of the received
SHUTDOWN chunk in SHUTDOWN_RECEIVED state.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-23 01:01:18 -07:00
Eric Van Hensbergen
e45c5405e1 9p: fix sparse warnings
Several sparse warnings were introduced by patches accepted during the merge
window which weren't caught.  This patch fixes those warnings.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-22 18:54:47 -05:00
Tom Tucker
fc79d4b104 9p: rdma: RDMA Transport Support for 9P
This patch implements the RDMA transport provider for 9P. It allows
mounts to be performed over iWARP and IB capable network interfaces.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Latchesar Ionkov <lionkov@lanl.gov>
2008-10-22 18:47:39 -05:00
Eric Van Hensbergen
0b15a3a528 9p: fix debug build error
Fixes build problem with 9p when building with debug disabled.
Also contains some fixes for warnings which pop up when 
CONFIG_NET_9P_DEBUG is disabled.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-22 18:47:40 -05:00
Linus Torvalds
45e4a24f7b Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: (26 commits)
  9p: add more conservative locking
  9p: fix oops in protocol stat parsing error path.
  9p: fix device file handling
  9p: Improve debug support
  9p: eliminate depricated conv functions
  9p: rework client code to use new protocol support functions
  9p: remove unnecessary tag field from p9_req_t structure
  9p: remove 9p fcall debug prints
  9p: add new protocol support code
  9p: encapsulate version function
  9p: move dirread to fs layer
  9p: adjust 9p vfs write operation
  9p: move readn meta-function from client to fs layer
  9p: consolidate read/write functions
  9p: drop broken unused error path from p9_conn_create()
  9p: make rpc code common and rework flush code
  9p: use the rcall structure passed in the request in trans_fd read_work
  9p: apply common request code to trans_fd
  9p: apply common tagpool handling to trans_fd
  9p: move request management to client code
  ...
2008-10-20 09:39:47 -07:00
Linus Torvalds
5fdf11283e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  netfilter: replace old NF_ARP calls with NFPROTO_ARP
  netfilter: fix compilation error with NAT=n
  netfilter: xt_recent: use proc_create_data()
  netfilter: snmp nat leaks memory in case of failure
  netfilter: xt_iprange: fix range inversion match
  netfilter: netns: use NFPROTO_NUMPROTO instead of NUMPROTO for tables array
  netfilter: ctnetlink: remove obsolete NAT dependency from Kconfig
  pkt_sched: sch_generic: Fix oops in sch_teql
  dccp: Port redirection support for DCCP
  tcp: Fix IPv6 fallout from 'Port redirection support for TCP'
  netdev: change name dropping error codes
  ipvs: Update CONFIG_IP_VS_IPV6 description and help text
2008-10-20 09:06:35 -07:00
Patrick McHardy
10a03a42d1 netfilter: netns: use NFPROTO_NUMPROTO instead of NUMPROTO for tables array
The netfilter families have been decoupled from regular protocol families.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-20 03:31:54 -07:00
Eric Van Hensbergen
e7f4b8f1a5 9p: Improve debug support
The new debug support lacks some of the information that the previous fcprint
code provided -- this patch focuses on better presentation of debug data along
with more helpful debug along error paths.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-17 16:20:07 -05:00
Eric Van Hensbergen
02da398b95 9p: eliminate depricated conv functions
Remove depricated conv functions which have been replaced with new 
protocol routines.

This patch also reworks the one instance of the file-system code which
directly calls conversion routines (to accomplish unpacking dirreads).

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-17 11:06:57 -05:00
Eric Van Hensbergen
51a87c552d 9p: rework client code to use new protocol support functions
Now that the new protocol functions are in place, this patch switches
the client code to using the new support code.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-17 11:04:45 -05:00
Eric Van Hensbergen
cb198131b0 9p: remove unnecessary tag field from p9_req_t structure
This removes the vestigial tag field from the p9_req_t structure.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-17 11:04:45 -05:00
Eric Van Hensbergen
51d71f9f7a 9p: remove 9p fcall debug prints
One of the current debug options allows users to get a verbose dump of fcalls.
This isn't really necessary as correctly parsed protocol frames can be printed
as part of the code in the client functions.  The consolidated printfcalls
structure would require new entries to be added for every extension.  This
patch removes the debug print methods and their use.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-17 11:04:44 -05:00
Eric Van Hensbergen
ace51c4dd2 9p: add new protocol support code
This adds a new protocol processing support code based on Anthony Liguori's
9p library code.  This code performs protocol marshalling/unmarshalling using
printf like strings to represent protocol elements.  It is my intent to use
them to replace the current functions in conv.c as well as the 
p9_create_* functions.

This should make the client implementation much more clear, and also make it
much easier to add new protocol extensions by limiting the number of places
in which changes need to be made.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-17 11:04:44 -05:00
Eric Van Hensbergen
06b55b464e 9p: move dirread to fs layer
Currently reading a directory is implemented in the client code.
This function is not actually a wire operation, but a meta operation 
which calls read operations and processes the results.

This patch moves this functionality to the fs layer and calls component
wire operations instead of constructing their packets.  This provides a 
cleaner separation and will help when we reorganize the client functions
and protocol processing methods.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-17 11:04:43 -05:00
Eric Van Hensbergen
fbedadc16e 9p: move readn meta-function from client to fs layer
There are a couple of methods in the client code which aren't actually
wire operations.  To keep things organized cleaner, these operations are
being moved to the fs layer.

This patch moves the readn meta-function (which executes multiple wire
reads until a buffer is full) to the fs layer.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-17 11:04:43 -05:00
Eric Van Hensbergen
0fc9655ec6 9p: consolidate read/write functions
Currently there are two separate versions of read and write.  One for
dealing with user buffers and the other for dealing with kernel buffers.
There is a tremendous amount of code duplication in the otherwise
identical versions of these functions.  This patch adds an additional
user buffer parameter to read and write and conditionalizes handling of
the buffer on whether the kernel buffer or the user buffer is populated.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-17 11:04:42 -05:00
Eric Van Hensbergen
91b8534fa8 9p: make rpc code common and rework flush code
This code moves the rpc function to the common client base,
reorganizes the flush code to be more simple and stable, and
makes the necessary adjustments to the underlying transports
to adapt to the new structure.

This reduces the overall amount of code duplication between the
transports and should make adding new transports more straightforward.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-17 11:04:42 -05:00
Eric Van Hensbergen
673d62cdaa 9p: apply common request code to trans_fd
Apply the now common p9_req_t structure to the fd transport.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-17 11:04:42 -05:00
Eric Van Hensbergen
fea511a644 9p: move request management to client code
The virtio transport uses a simplified request management system
that I want to use for all transports.  This patch adapts and moves the
exisiting code for managing requests to the client common code.
Later patches will apply these mechanisms to the other transports.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-17 11:04:42 -05:00
Eric Van Hensbergen
8b81ef589a 9p: consolidate transport structure
Right now there is a transport module structure which provides per-transport
type functions and data and a transport structure which contains per-instance
public data as well as function pointers to instance specific functions.

This patch moves public transport visible instance data to the client
structure (which in some cases had duplicate data) and consolidates the
functions into the transport module structure.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-10-17 11:04:41 -05:00
Linus Torvalds
cb23832e39 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
  decnet: Fix compiler warning in dn_dev.c
  IPV6: Fix default gateway criteria wrt. HIGH/LOW preference radv option
  net/802/fc.c: Fix compilation warnings
  netns: correct mib stats in ip6_route_me_harder()
  netns: fix net_generic array leak
  rt2x00: fix regression introduced by "mac80211: free up 2 bytes in skb->cb"
  rtl8187: Add USB ID for Belkin F5D7050 with RTL8187B chip
  p54usb: Device ID updates
  mac80211: fixme for kernel-doc
  ath9k/mac80211: disallow fragmentation in ath9k, report to userspace
  libertas : Remove unused variable warning for "old_channel" from cmd.c
  mac80211: Fix scan RX processing oops
  orinoco: fix unsafe locking in spectrum_cs_suspend
  orinoco: fix unsafe locking in orinoco_cs_resume
  cfg80211: fix debugfs error handling
  mac80211: fix debugfs netdev rename
  iwlwifi: fix ct kill configuration for 5350
  mac80211: fix HT information element parsing
  p54: Fix compilation problem on PPC
  mac80211: fix debugfs lockup
  ...
2008-10-16 11:26:26 -07:00
Alexey Dobriyan
f221e726bf sysctl: simplify ->strategy
name and nlen parameters passed to ->strategy hook are unused, remove
them.  In general ->strategy hook should know what it's doing, and don't
do something tricky for which, say, pointer to original userspace array
may be needed (name).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net> [ networking bits ]
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:47 -07:00
Harvey Harrison
d5c003b4d1 include: replace __FUNCTION__ with __func__
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:30 -07:00
David S. Miller
ab55570d64 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2008-10-14 23:19:16 -07:00
Randy Dunlap
e1a65b5828 mac80211: fixme for kernel-doc
Fix kernel-doc warnings in mac80211.h.
Fields need real explanations added to them.

Warning(lin2627-g3-kdocfixes//include/net/mac80211.h:659): No description found for parameter 'icv_len'
Warning(lin2627-g3-kdocfixes//include/net/mac80211.h:659): No description found for parameter 'iv_len'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-14 21:12:37 -04:00
Pablo Neira Ayuso
e6a7d3c04f netfilter: ctnetlink: remove bogus module dependency between ctnetlink and nf_nat
This patch removes the module dependency between ctnetlink and
nf_nat by means of an indirect call that is initialized when
nf_nat is loaded. Now, nf_conntrack_netlink only requires
nf_conntrack and nfnetlink.

This patch puts nfnetlink_parse_nat_setup_hook into the
nf_conntrack_core to avoid dependencies between ctnetlink,
nf_conntrack_ipv4 and nf_conntrack_ipv6.

This patch also introduces the function ctnetlink_change_nat
that is only invoked from the creation path. Actually, the
nat handling cannot be invoked from the update path since
this is not allowed. By introducing this function, we remove
the useless nat handling in the update path and we avoid
deadlock-prone code.

This patch also adds the required EAGAIN logic for nfnetlink.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-14 11:58:31 -07:00
James Morris
93db628658 Merge branch 'next' into for-linus 2008-10-13 09:35:14 +11:00
Linus Torvalds
64f1b65382 net: fix dummy 'nf_conntrack_event_cache()'
The dummy version of 'nf_conntrack_event_cache()' (used when the
NF_CONNTRACK_EVENTS config option is not enabled) had not been updated
when the calling convention changed.

This was introduced by commit a71996fccc
("netfilter: netns nf_conntrack: pass conntrack to
nf_conntrack_event_cache() not skb")

Tssk.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-11 09:46:24 -07:00
Paul Moore
d91d407991 netlabel: Add configuration support for local labeling
Add the necessary NetLabel support for the new CIPSO mapping,
CIPSO_V4_MAP_LOCAL, which allows full LSM label/context support.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Reviewed-by: James Morris <jmorris@namei.org>
2008-10-10 10:16:34 -04:00
Paul Moore
15c45f7b2e cipso: Add support for native local labeling and fixup mapping names
This patch accomplishes three minor tasks: add a new tag type for local
labeling, rename the CIPSO_V4_MAP_STD define to CIPSO_V4_MAP_TRANS and
replace some of the CIPSO "magic numbers" with constants from the header
file.  The first change allows CIPSO to support full LSM labels/contexts,
not just MLS attributes.  The second change brings the mapping names inline
with what userspace is using, compatibility is preserved since we don't
actually change the value.  The last change is to aid readability and help
prevent mistakes.

Signed-off-by: Paul Moore <paul.moore@hp.com>
2008-10-10 10:16:34 -04:00
Paul Moore
8d75899d03 netlabel: Changes to the NetLabel security attributes to allow LSMs to pass full contexts
This patch provides support for including the LSM's secid in addition to
the LSM's MLS information in the NetLabel security attributes structure.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
2008-10-10 10:16:33 -04:00
Paul Moore
014ab19a69 selinux: Set socket NetLabel based on connection endpoint
Previous work enabled the use of address based NetLabel selectors, which while
highly useful, brought the potential for additional per-packet overhead when
used.  This patch attempts to solve that by applying NetLabel socket labels
when sockets are connect()'d.  This should alleviate the per-packet NetLabel
labeling for all connected sockets (yes, it even works for connected DGRAM
sockets).

Signed-off-by: Paul Moore <paul.moore@hp.com>
Reviewed-by: James Morris <jmorris@namei.org>
2008-10-10 10:16:33 -04:00
Paul Moore
948bf85c1b netlabel: Add functionality to set the security attributes of a packet
This patch builds upon the new NetLabel address selector functionality by
providing the NetLabel KAPI and CIPSO engine support needed to enable the
new packet-based labeling.  The only new addition to the NetLabel KAPI at
this point is shown below:

 * int netlbl_skbuff_setattr(skb, family, secattr)

... and is designed to be called from a Netfilter hook after the packet's
IP header has been populated such as in the FORWARD or LOCAL_OUT hooks.

This patch also provides the necessary SELinux hooks to support this new
functionality.  Smack support is not currently included due to uncertainty
regarding the permissions needed to expand the Smack network access controls.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Reviewed-by: James Morris <jmorris@namei.org>
2008-10-10 10:16:32 -04:00
Paul Moore
63c4168874 netlabel: Add network address selectors to the NetLabel/LSM domain mapping
This patch extends the NetLabel traffic labeling capabilities to individual
packets based not only on the LSM domain but the by the destination address
as well.  The changes here only affect the core NetLabel infrastructre,
changes to the NetLabel KAPI and individial protocol engines are also
required but are split out into a different patch to ease review.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Reviewed-by: James Morris <jmorris@namei.org>
2008-10-10 10:16:32 -04:00
Paul Moore
b1edeb1023 netlabel: Replace protocol/NetLabel linking with refrerence counts
NetLabel has always had a list of backpointers in the CIPSO DOI definition
structure which pointed to the NetLabel LSM domain mapping structures which
referenced the CIPSO DOI struct.  The rationale for this was that when an
administrator removed a CIPSO DOI from the system all of the associated
NetLabel LSM domain mappings should be removed as well; a list of
backpointers made this a simple operation.

Unfortunately, while the backpointers did make the removal easier they were
a bit of a mess from an implementation point of view which was making
further development difficult.  Since the removal of a CIPSO DOI is a
realtively rare event it seems to make sense to remove this backpointer
list as the optimization was hurting us more then it was helping.  However,
we still need to be able to track when a CIPSO DOI definition is being used
so replace the backpointer list with a reference count.  In order to
preserve the current functionality of removing the associated LSM domain
mappings when a CIPSO DOI is removed we walk the LSM domain mapping table,
removing the relevant entries.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Reviewed-by: James Morris <jmorris@namei.org>
2008-10-10 10:16:31 -04:00
Paul Moore
dfaebe9825 selinux: Fix missing calls to netlbl_skbuff_err()
At some point I think I messed up and dropped the calls to netlbl_skbuff_err()
which are necessary for CIPSO to send error notifications to remote systems.
This patch re-introduces the error handling calls into the SELinux code.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
2008-10-10 10:16:31 -04:00
Paul Moore
948a72438d netlabel: Remove unneeded in-kernel API functions
After some discussions with the Smack folks, well just Casey, I now have a
better idea of what Smack wants out of NetLabel in the future so I think it
is now safe to do some API "pruning".  If another LSM comes along that
needs this functionality we can always add it back in, but I don't see any
LSMs on the horizon which might make use of these functions.

Thanks to Rami Rosen who suggested removing netlbl_cfg_cipsov4_del() back
in February 2008.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Reviewed-by: James Morris <jmorris@namei.org>
2008-10-10 10:16:30 -04:00
Guo-Fu Tseng
bb21c95e2d nf_conntrack_ecache.h: Fix missing braces
This patch add missing braces of today's net-next-2.6:
include/net/netfilter/nf_conntrack_ecache.h

Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-09 21:10:36 -07:00
Herbert Xu
64194c31a0 inet: Make tunnel RX/TX byte counters more consistent
This patch makes the RX/TX byte counters for IPIP, GRE and SIT more
consistent.  Previously we included the external IP headers on the
way out but not when the packet is inbound.

The new scheme is to count payload only in both directions.  For
IPIP and SIT this simply means the exclusion of the external IP
header.  For GRE this means that we exclude the GRE header as
well.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-09 12:03:17 -07:00
Lennert Buytenhek
396138f03f dsa: add support for Trailer tagging format
This adds support for the Trailer switch tagging format.  This is
another tagging that doesn't explicitly mark tagged packets with a
distinct ethertype, so that we need to add a similar hack in the
receive path as for the Original DSA tagging format.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Byron Bradley <byron.bbradley@gmail.com>
Tested-by: Tim Ellis <tim.ellis@mac.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 17:24:16 -07:00
Lennert Buytenhek
cf85d08fdf dsa: add support for original DSA tagging format
Most of the DSA switches currently in the field do not support the
Ethertype DSA tagging format that one of the previous patches added
support for, but only the original DSA tagging format.

The original DSA tagging format carries the same information as the
Ethertype DSA tagging format, but with the difference that it does not
have an ethertype field.  In other words, when receiving a packet that
is tagged with an original DSA tag, there is no way of telling in
eth_type_trans() that this packet is in fact a DSA-tagged packet.

This patch adds a hook into eth_type_trans() which is only compiled in
if support for a switch chip that doesn't support Ethertype DSA is
selected, and which checks whether there is a DSA switch driver
instance attached to this network device which uses the old tag format.
If so, it sets the protocol field to ETH_P_DSA without looking at the
packet, so that the packet ends up in the right place.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Nicolas Pitre <nico@marvell.com>
Tested-by: Peter van Valderen <linux@ddcrew.com>
Tested-by: Dirk Teurlings <dirk@upexia.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 17:19:56 -07:00
Lennert Buytenhek
91da11f870 net: Distributed Switch Architecture protocol support
Distributed Switch Architecture is a protocol for managing hardware
switch chips.  It consists of a set of MII management registers and
commands to configure the switch, and an ethernet header format to
signal which of the ports of the switch a packet was received from
or is intended to be sent to.

The switches that this driver supports are typically embedded in
access points and routers, and a typical setup with a DSA switch
looks something like this:

	+-----------+       +-----------+
	|           | RGMII |           |
	|           +-------+           +------ 1000baseT MDI ("WAN")
	|           |       |  6-port   +------ 1000baseT MDI ("LAN1")
	|    CPU    |       |  ethernet +------ 1000baseT MDI ("LAN2")
	|           |MIImgmt|  switch   +------ 1000baseT MDI ("LAN3")
	|           +-------+  w/5 PHYs +------ 1000baseT MDI ("LAN4")
	|           |       |           |
	+-----------+       +-----------+

The switch driver presents each port on the switch as a separate
network interface to Linux, polls the switch to maintain software
link state of those ports, forwards MII management interface
accesses to those network interfaces (e.g. as done by ethtool) to
the switch, and exposes the switch's hardware statistics counters
via the appropriate Linux kernel interfaces.

This initial patch supports the MII management interface register
layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and
supports the "Ethertype DSA" packet tagging format.

(There is no officially registered ethertype for the Ethertype DSA
packet format, so we just grab a random one.  The ethertype to use
is programmed into the switch, and the switch driver uses the value
of ETH_P_EDSA for this, so this define can be changed at any time in
the future if the one we chose is allocated to another protocol or
if Ethertype DSA gets its own officially registered ethertype, and
everything will continue to work.)

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Nicolas Pitre <nico@marvell.com>
Tested-by: Byron Bradley <byron.bbradley@gmail.com>
Tested-by: Tim Ellis <tim.ellis@mac.com>
Tested-by: Peter van Valderen <linux@ddcrew.com>
Tested-by: Dirk Teurlings <dirk@upexia.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 17:15:19 -07:00
Vlad Yasevich
02015180e2 sctp: shrink sctp_tsnmap some more by removing gabs array
The gabs array in the sctp_tsnmap structure is only used
in one place, sctp_make_sack().  As such, carrying the
array around in the sctp_tsnmap and thus directly in
the sctp_association is rather pointless since most
of the time it's just taking up space.  Now, let
sctp_make_sack create and populate it and then throw
it away when it's done.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 14:19:01 -07:00
Vlad Yasevich
8e1ee18c33 sctp: Rework the tsn map to use generic bitmap.
The tsn map currently use is 4K large and is stuck inside
the sctp_association structure making memory references REALLY
expensive.  What we really need is at most 4K worth of bits
so the biggest map we would have is 512 bytes.   Also, the
map is only really usefull when we have gaps to store and
report.  As such, starting with minimal map of say 32 TSNs (bits)
should be enough for normal low-loss operations.  We can grow
the map by some multiple of 32 along with some extra room any
time we receive the TSN which would put us outside of the map
boundry.  As we close gaps, we can shift the map to rebase
it on the latest TSN we've seen.  This saves 4088 bytes per
association just in the map alone along savings from the now
unnecessary structure members.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 14:18:39 -07:00
Eric Dumazet
3c689b7320 inet: cleanup of local_port_range
I noticed sysctl_local_port_range[] and its associated seqlock
sysctl_local_port_range_lock were on separate cache lines.
Moreover, sysctl_local_port_range[] was close to unrelated
variables, highly modified, leading to cache misses.

Moving these two variables in a structure can help data
locality and moving this structure to read_mostly section
helps sharing of this data among cpus.

Cleanup of extern declarations (moved in include file where
they belong), and use of inet_get_local_port_range()
accessor instead of direct access to ports values.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 14:18:04 -07:00
Denis V. Lunev
9261e53701 ipv6: making ip and icmp statistics per/namespace
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 11:16:45 -07:00
Denis V. Lunev
087fe24033 ipv6: added net argument to _DEVINC/_DEVADD
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 11:16:19 -07:00
Denis V. Lunev
55d43808eb ipv6: added net argument to ICMP6MSGIN_INC_STATS_BH
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 11:15:46 -07:00
Denis V. Lunev
a712d3e859 ipv6: ICMP6MSGIN_INC_STATS is not used
Removed.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 11:15:26 -07:00
Denis V. Lunev
5a57d4c7fd ipv6: added net argument to ICMP6MSGOUT_INC_STATS_BH
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 11:15:05 -07:00
Denis V. Lunev
5c5d244bd3 ipv6: added net argument to ICMP6MSGOUT_INC_STATS
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 11:14:44 -07:00
Denis V. Lunev
e41b5368e0 ipv6: added net argument to ICMP6_INC_STATS_BH
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 11:14:13 -07:00
Denis V. Lunev
a862f6a6dc ipv6: added net argument to ICMP6_INC_STATS
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 11:13:58 -07:00
Denis V. Lunev
821d57776d ipv6: added net argument to IP6_ADD_STATS_BH
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 11:13:31 -07:00
Denis V. Lunev
483a47d2fe ipv6: added net argument to IP6_INC_STATS_BH
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 11:09:27 -07:00
Denis V. Lunev
3bd653c845 netns: add net parameter to IP6_INC_STATS
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 10:54:51 -07:00
David S. Miller
364ae953a4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2008-10-08 09:50:38 -07:00
KOVACS Krisztian
9ad2d745a2 netfilter: iptables tproxy core
The iptables tproxy core is a module that contains the common routines used by
various tproxy related modules (TPROXY target and socket match)

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:12 +02:00
KOVACS Krisztian
73e4022f78 netfilter: split netfilter IPv4 defragmentation into a separate module
Netfilter connection tracking requires all IPv4 packets to be defragmented.
Both the socket match and the TPROXY target depend on this functionality, so
this patch separates the Netfilter IPv4 defrag hooks into a separate module.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:12 +02:00
Alexey Dobriyan
0c4c9288ad netfilter: netns nat: per-netns bysource hash
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:11 +02:00
Alexey Dobriyan
e099a17357 netfilter: netns nat: per-netns NAT table
Same story as with iptable_filter, iptables_raw tables.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:10 +02:00
Alexey Dobriyan
d716a4dfbb netfilter: netns nf_conntrack: per-netns conntrack accounting
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:09 +02:00
Alexey Dobriyan
c2a2c7e0cc netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_log_invalid sysctl
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:08 +02:00
Alexey Dobriyan
c04d05529a netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_checksum sysctl
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:08 +02:00
Alexey Dobriyan
802507071b netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_count sysctl
Note, sysctl table is always duplicated, this is simpler and less
special-cased.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:08 +02:00
Alexey Dobriyan
0d55af8791 netfilter: netns nf_conntrack: per-netns statistics
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:07 +02:00
Alexey Dobriyan
6058fa6bb9 netfilter: netns nf_conntrack: per-netns event cache
Heh, last minute proof-reading of this patch made me think,
that this is actually unneeded, simply because "ct" pointers will be
different for different conntracks in different netns, just like they
are different in one netns.

Not so sure anymore.

[Patrick: pointers will be different, flushing can only be done while
 inactive though and thus it needs to be per netns]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:07 +02:00
Alexey Dobriyan
a71996fccc netfilter: netns nf_conntrack: pass conntrack to nf_conntrack_event_cache() not skb
This is cleaner, we already know conntrack to which event is relevant.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:07 +02:00
Alexey Dobriyan
74c51a1497 netfilter: netns nf_conntrack: pass netns pointer to L4 protocol's ->error hook
Again, it's deducible from skb, but we're going to use it for
nf_conntrack_checksum and statistics, so just pass it from upper layer.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:05 +02:00
Alexey Dobriyan
a702a65fc1 netfilter: netns nf_conntrack: pass netns pointer to nf_conntrack_in()
It's deducible from skb->dev or skb->dst->dev, but we know netns at
the moment of call, so pass it down and use for finding and creating
conntracks.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:04 +02:00
Alexey Dobriyan
63c9a26264 netfilter: netns nf_conntrack: per-netns unconfirmed list
What is confirmed connection in one netns can very well be unconfirmed
in another one.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:04 +02:00
Alexey Dobriyan
9b03f38d04 netfilter: netns nf_conntrack: per-netns expectations
Make per-netns a) expectation hash and b) expectations count.

Expectations always belongs to netns to which it's master conntrack belong.
This is natural and doesn't bloat expectation.

Proc files and leaf users are stubbed to init_net, this is temporary.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:03 +02:00
Alexey Dobriyan
400dad39d1 netfilter: netns nf_conntrack: per-netns conntrack hash
* make per-netns conntrack hash

  Other solution is to add ->ct_net pointer to tuplehashes and still has one
  hash, I tried that it's ugly and requires more code deep down in protocol
  modules et al.

* propagate netns pointer to where needed, e. g. to conntrack iterators.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:03 +02:00
Alexey Dobriyan
49ac8713b6 netfilter: netns nf_conntrack: per-netns conntrack count
Sysctls and proc files are stubbed to init_net's one. This is temporary.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:03 +02:00
Alexey Dobriyan
5a1fb391d8 netfilter: netns nf_conntrack: add ->ct_net -- pointer from conntrack to netns
Conntrack (struct nf_conn) gets pointer to netns: ->ct_net -- netns in which
it was created. It comes from netdevice.

->ct_net is write-once field.

Every conntrack in system has ->ct_net initialized, no exceptions.

->ct_net doesn't pin netns: conntracks are recycled after timeouts and
pinning background traffic will prevent netns from even starting shutdown
sequence.

Right now every conntrack is created in init_net.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:02 +02:00
Alexey Dobriyan
dfdb8d7918 netfilter: netns nf_conntrack: add netns boilerplate
One comment: #ifdefs around #include is necessary to overcome amazing compile
breakages in NOTRACK-in-netns patch (see below).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:02 +02:00
Jan Engelhardt
76108cea06 netfilter: Use unsigned types for hooknum and pf vars
and (try to) consistently use u_int8_t for the L3 family.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:00 +02:00
David S. Miller
075f664689 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2008-10-07 16:26:38 -07:00
Denis V. Lunev
be713a443e netns: make uplitev6 mib per/namespace
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:50:06 -07:00
Denis V. Lunev
0c7ed677fb netns: make udpv6 mib per/namespace
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:49:36 -07:00
Denis V. Lunev
835bcc0497 netns: move /proc/net/dev_snmp6 to struct net
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:45:55 -07:00
Peter Zijlstra
c57943a1c9 net: wrap sk->sk_backlog_rcv()
Wrap calling sk->sk_backlog_rcv() in a function. This will allow extending the
generic sk_backlog_rcv behaviour.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:18:42 -07:00
KOVACS Krisztian
23542618de inet: Don't lookup the socket if there's a socket attached to the skb
Use the socket cached in the skb if it's present.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 12:41:01 -07:00
Arnaldo Carvalho de Melo
9a1f27c480 inet_hashtables: Add inet_lookup_skb helpers
To be able to use the cached socket reference in the skb during input
processing we add a new set of lookup functions that receive the skb on
their argument list.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 11:41:57 -07:00
Felix Fietkau
870abdf671 mac80211: add multi-rate retry support
This patch adjusts the rate control API to allow multi-rate retry
if supported by the driver. The ieee80211_hw struct specifies how
many alternate rate selections the driver supports.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-06 18:14:57 -04:00
Felix Fietkau
76708dee38 mac80211: free up 2 bytes in skb->cb
Free up 2 bytes in skb->cb to be used for multi-rate retry later.
Move iv_len and icv_len initialization into key alloc.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-06 18:14:57 -04:00
Jarek Poplawski
554794de79 pkt_sched: Fix handling of gso skbs on requeuing
Jay Cliburn noticed and diagnosed a bug triggered in
dev_gso_skb_destructor() after last change from qdisc->gso_skb
to qdisc->requeue list. Since gso_segmented skbs can't be queued
to another list this patch brings back qdisc->gso_skb for them.

Reported-by: Jay Cliburn <jcliburn@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-06 09:54:39 -07:00
Arnaud Ebalard
13c1d18931 xfrm: MIGRATE enhancements (draft-ebalard-mext-pfkey-enhanced-migrate)
Provides implementation of the enhancements of XFRM/PF_KEY MIGRATE mechanism
specified in draft-ebalard-mext-pfkey-enhanced-migrate-00. Defines associated
PF_KEY SADB_X_EXT_KMADDRESS extension and XFRM/netlink XFRMA_KMADDRESS
attribute.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-05 13:33:42 -07:00
Rémi Denis-Courmont
02a47617cd Phonet: implement GPRS virtual interface over PEP socket
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-05 11:16:16 -07:00
Rémi Denis-Courmont
c41bd97f81 Phonet: receive pipe control requests as out-of-band data
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-05 11:15:43 -07:00
Rémi Denis-Courmont
9641458d3e Phonet: Pipe End Point for Phonet Pipes protocol
This protocol provides some connection handling and negotiated
congestion control. Nokia cellular modems use it for bulk transfers.
It provides packet boundaries (hence SOCK_SEQPACKET). Congestion
control is per packet rather per byte, so we do not re-use the
generic socket memory accounting.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-05 11:15:13 -07:00
Rémi Denis-Courmont
9995a32b4d Phonet: connected sockets glue
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-05 11:14:48 -07:00
Vlad Yasevich
52cae8f06b sctp: try harder to figure out address family when checking wildcards
sctp_is_any() function that is used to check for wildcard addresses
only looks at the address itself to determine the address family.
This function is used in the API to check the address passed in from
the user.  If the user simply zerroes out the sockaddr_storage and
pass that in, we'll end up failing.  So, let's try harder to determine
the address family by also checking the socket if it's possible.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2008-10-01 11:33:06 -04:00
Neil Horman
c226ef9b83 sctp: reduce memory footprint of sctp_chunk structure
sctp_chunks should be put on a diet.  This is some of the low hanging
fruit that we can strip out.  Changes all the __s8/__u8 flags to
bitfields.  Saves 12 bytes per chunk.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2008-10-01 11:33:06 -04:00
KOVACS Krisztian
bcd41303f4 udp: Export UDP socket lookup function
The iptables tproxy code has to be able to do UDP socket hash lookups,
so we have to provide an exported lookup function for this purpose.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 07:48:10 -07:00
KOVACS Krisztian
a3116ac5c2 tcp: Port redirection support for TCP
Current TCP code relies on the local port of the listening socket
being the same as the destination address of the incoming
connection. Port redirection used by many transparent proxying
techniques obviously breaks this, so we have to store the original
destination port address.

This patch extends struct inet_request_sock and stores the incoming
destination port value there. It also modifies the handshake code to
use that value as the source port when sending reply packets.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 07:46:49 -07:00
KOVACS Krisztian
86b08d867d ipv4: Make Netfilter's ip_route_me_harder() non-local address compatible
Netfilter's ip_route_me_harder() tries to re-route packets either
generated or re-routed by Netfilter. This patch changes
ip_route_me_harder() to handle packets from non-locally-bound sockets
with IP_TRANSPARENT set as local and to set the appropriate flowi
flags when re-doing the routing lookup.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 07:44:42 -07:00
KOVACS Krisztian
88ef4a5a78 tcp: Handle TCP SYN+ACK/ACK/RST transparency
The TCP stack sends out SYN+ACK/ACK/RST reply packets in response to
incoming packets. The non-local source address check on output bites
us again, as replies for transparently redirected traffic won't have a
chance to leave the node.

This patch selectively sets the FLOWI_FLAG_ANYSRC flag when doing the
route lookup for those replies. Transparent replies are enabled if the
listening socket has the transparent socket flag set.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 07:41:00 -07:00
KOVACS Krisztian
79876874ce ipv4: Conditionally enable transparent flow flag when connecting
Set FLOWI_FLAG_ANYSRC in flowi->flags if the socket has the
transparent socket option set. This way we selectively enable certain
connections with non-local source addresses to be routed.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 07:35:39 -07:00
KOVACS Krisztian
1668e010cb ipv4: Make inet_sock.h independent of route.h
inet_iif() in inet_sock.h requires route.h. Since users of inet_iif()
usually require other route.h functionality anyway this patch moves
inet_iif() to route.h.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 07:33:10 -07:00
KOVACS Krisztian
f5715aea45 ipv4: Implement IP_TRANSPARENT socket option
This patch introduces the IP_TRANSPARENT socket option: enabling that
will make the IPv4 routing omit the non-local source address check on
output. Setting IP_TRANSPARENT requires NET_ADMIN capability.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 07:30:02 -07:00
Julian Anastasov
a210d01ae3 ipv4: Loosen source address check on IPv4 output
ip_route_output() contains a check to make sure that no flows with
non-local source IP addresses are routed. This obviously makes using
such addresses impossible.

This patch introduces a flowi flag which makes omitting this check
possible. The new flag provides a way of handling transparent and
non-transparent connections differently.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 07:28:28 -07:00
Herbert Xu
12a169e7d8 ipsec: Put dumpers on the dump list
Herbert Xu came up with the idea and the original patch to make
xfrm_state dump list contain also dumpers:

As it is we go to extraordinary lengths to ensure that states
don't go away while dumpers go to sleep.  It's much easier if
we just put the dumpers themselves on the list since they can't
go away while they're going.

I've also changed the order of addition on new states to prevent
a never-ending dump.

Timo Teräs improved the patch to apply cleanly to latest tree,
modified iteration code to be more readable by using a common
struct for entries in the list, implemented the same idea for
xfrm_policy dumping and moved the af_key specific "last" entry
caching to af_key.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 07:03:24 -07:00
David S. Miller
b262e60309 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/ath9k/core.c
	drivers/net/wireless/ath9k/main.c
	net/core/dev.c
2008-10-01 06:12:56 -07:00
Ilpo Järvinen
93c8b90f01 ipv6: almost identical frag hashing funcs combined
$ diff-funcs ip6qhashfn reassembly.c netfilter/nf_conntrack_reasm.c
 --- reassembly.c:ip6qhashfn()
 +++ netfilter/nf_conntrack_reasm.c:ip6qhashfn()
@@ -1,5 +1,5 @@
-static unsigned int ip6qhashfn(__be32 id, struct in6_addr *saddr,
-			       struct in6_addr *daddr)
+static unsigned int ip6qhashfn(__be32 id, const struct in6_addr *saddr,
+			       const struct in6_addr *daddr)
 {
 	u32 a, b, c;

@@ -9,7 +9,7 @@

 	a += JHASH_GOLDEN_RATIO;
 	b += JHASH_GOLDEN_RATIO;
-	c += ip6_frags.rnd;
+	c += nf_frags.rnd;
 	__jhash_mix(a, b, c);

 	a += (__force u32)saddr->s6_addr32[3];

And codiff xx.o.old xx.o.new:

net/ipv6/netfilter/nf_conntrack_reasm.c:
  ip6qhashfn         | -512
  nf_hashfn          |   +6
  nf_ct_frag6_gather |  +36
 3 functions changed, 42 bytes added, 512 bytes removed, diff: -470
net/ipv6/reassembly.c:
  ip6qhashfn    | -512
  ip6_hashfn    |   +7
  ipv6_frag_rcv |  +89
 3 functions changed, 96 bytes added, 512 bytes removed, diff: -416

net/ipv6/reassembly.c:
  inet6_hash_frag | +510
 1 function changed, 510 bytes added, diff: +510

Total: -376

Compile tested.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 02:48:31 -07:00
John W. Linville
55ad175fb6 ieee80211.h: remove superfluous ETH_P_PAE definition
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-30 14:07:23 -04:00
Wei Yongjun
ba0166708e sctp: Fix kernel panic while process protocol violation parameter
Since call to function sctp_sf_abort_violation() need paramter 'arg' with
'struct sctp_chunk' type, it will read the chunk type and chunk length from
the chunk_hdr member of chunk. But call to sctp_sf_violation_paramlen()
always with 'struct sctp_paramhdr' type's parameter, it will be passed to
sctp_sf_abort_violation(). This may cause kernel panic.

   sctp_sf_violation_paramlen()
     |-- sctp_sf_abort_violation()
        |-- sctp_make_abort_violation()

This patch fixed this problem. This patch also fix two place which called
sctp_sf_violation_paramlen() with wrong paramter type.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-30 05:32:24 -07:00
Tejun Heo
72029fe85d 9p: implement proper trans module refcounting and unregistration
9p trans modules aren't refcounted nor were they unregistered
properly.  Fix it.

* Add 9p_trans_module->owner and reference the module on each trans
  instance creation and put it on destruction.

* Protect v9fs_trans_list with a spinlock.  This isn't strictly
  necessary as the list is manipulated only during module loading /
  unloading but it's a good idea to make the API safe.

* Unregister trans modules when the corresponding module is being
  unloaded.

* While at it, kill unnecessary EXPORT_SYMBOL on p9_trans_fd_init().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-09-24 16:22:23 -05:00
Johannes Berg
4b7679a561 mac80211: clean up rate control API
Long awaited, hard work. This patch totally cleans up the rate control
API to remove the requirement to include internal headers outside of
net/mac80211/.

There's one internal use in the PID algorithm left for mesh networking,
we'll have to figure out a way to clean that one up and decide how to
do the peer link evaluation, possibly independent of the rate control
algorithm or via new API.

Additionally, ath9k is left using the cross-inclusion hack for now, we
will add new API where necessary to make this work properly, but right
now I'm not expert enough to do it. It's still off better than before.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-24 16:18:03 -04:00
Johannes Berg
60719ffd72 cfg80211: show interface type
This patch makes cfg80211 show the interface in the nl80211
information about a specific interface. API users are required
to keep the type updated (everything else is fairly complicated)
but you will get a warning if you fail to keep it updated.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-24 16:18:00 -04:00
Johannes Berg
e07aa3783e cfg80211: fix code ordering in header file
Luis added the regulatory hint stuff to this file without
observing that __ieee80211_get_channel and ieee80211_get_channel
really belong together.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-24 16:17:59 -04:00
Jarek Poplawski
f4ab543201 pkt_sched: Remove the tx queue state check in qdisc_run()
The current check wrongly uses the state of one (currently the first)
tx queue for all tx queues in case of non-default qdiscs. This check
mainly prevented requeuing loop with __netif_schedule(), but now it's
controlled inside __qdisc_run(), while dequeuing. The wrongness of
this check was first noticed by Herbert Xu.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-23 01:05:56 -07:00
David S. Miller
cd07a8ea0d tcp: Use SKB queue handling interfaces instead of by-hand versions.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-23 00:50:13 -07:00
David S. Miller
d258b4914b tcp: Use skb_queue_is_last() instead of by-hand version.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-23 00:34:37 -07:00
David S. Miller
242f8bfefe pkt_sched: Make qdisc->gso_skb a list.
The idea is that we can use this to get rid of
->requeue().

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-22 22:15:30 -07:00
David S. Miller
3d09274cc9 sctp: Use skb_queue_walk_safe() and skb_queue_split_tail_init().
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-22 22:14:36 -07:00
Remi Denis-Courmont
be0c52bfed Phonet: emit errors when a packet cannot be delivered locally
When there is no listener socket for a received packet, send an error
back to the sender.

Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-22 20:09:13 -07:00
Remi Denis-Courmont
87ab4e20b4 Phonet: proc interface for port range
Phonet endpoints are bound to individual ports.
This provides a /proc/sys/net/phonet (or sysctl) interface for
selecting the range of automatically allocated ports (much like the
ip_local_port_range with IPv4).

Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-22 20:08:39 -07:00
Remi Denis-Courmont
107d0d9b8d Phonet: Phonet datagram transport protocol
This provides the basic SOCK_DGRAM transport protocol for Phonet.

Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-22 20:05:57 -07:00
Remi Denis-Courmont
ba113a94b7 Phonet: common socket glue
This provides the socket API for the Phonet protocols family.

Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-22 20:05:19 -07:00
Remi Denis-Courmont
8fb397406f Phonet: Netlink interface
This provides support for configuring Phonet addresses, notifying
Phonet configuration changes, and dumping the configuration.

Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-22 20:04:30 -07:00
Remi Denis-Courmont
f8ff60283d Phonet: network device and address handling
This provides support for adding Phonet addresses to and removing
Phonet addresses from network devices.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-22 20:03:44 -07:00
Remi Denis-Courmont
4b07b3f69a Phonet: PF_PHONET protocol family support
This is the basis for the Phonet protocol families, and introduces
the ETH_P_PHONET packet type and the PF_PHONET socket family.

Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-22 20:02:10 -07:00
Herbert Xu
5c1824587f ipsec: Fix xfrm_state_walk race
As discovered by Timo Teräs, the currently xfrm_state_walk scheme
is racy because if a second dump finishes before the first, we
may free xfrm states that the first dump would walk over later.

This patch fixes this by storing the dumps in a list in order
to calculate the correct completion counter which cures this
problem.

I've expanded netlink_cb in order to accomodate the extra state
related to this.  It shouldn't be a big deal since netlink_cb
is kmalloced for each dump and we're just increasing it by 4 or
8 bytes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-22 19:48:19 -07:00
David S. Miller
43f59c8939 net: Remove __skb_insert() calls outside of skbuff internals.
This minor cleanup simplifies later changes which will convert
struct sk_buff and friends over to using struct list_head.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-21 21:28:51 -07:00
Ilpo Järvinen
ef9da47c7c tcp: don't clear retransmit_skb_hint when not necessary
Most importantly avoid doing it with cumulative ACK. Not clearing
means that we no longer need n^2 processing in resolution of each
fast recovery.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-20 21:25:15 -07:00
Ilpo Järvinen
0e1c54c2a4 tcp: reorganize retransmit code loops
Both loops are quite similar, so they can be combined
with little effort. As a result, forward_skb_hint becomes
obsolete as well.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-20 21:24:21 -07:00
Ilpo Järvinen
006f582c73 tcp: convert retransmit_cnt_hint to seqno
Main benefit in this is that we can then freely point
the retransmit_skb_hint to anywhere we want to because
there's no longer need to know what would be the count
changes involve, and since this is really used only as a
terminator, unnecessary work is one time walk at most,
and if some retransmissions are necessary after that
point later on, the walk is not full waste of time
anyway.

Since retransmit_high must be kept valid, all lost
markers must ensure that.

Now I also have learned how those "holes" in the
rexmittable skbs can appear, mtu probe does them. So
I removed the misleading comment as well.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-20 21:20:20 -07:00
Ilpo Järvinen
64edc2736e tcp: Partial hint clearing has again become meaningless
Ie., the difference between partial and all clearing doesn't
exists anymore since the SACK optimizations got dropped by
an sacktag rewrite.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-20 21:18:32 -07:00
Johannes Berg
25d834e162 mac80211: fix virtual interfaces vs. injection
Currently, virtual interface pointers passed to drivers might be
from monitor interfaces and as such completely uninitialised
because we do not tell the driver about monitor interfaces when
those are created. Instead of passing them, we should therefore
indicate to the driver that there is no information; do that by
passing a NULL value and adjust drivers to cope with it.

As a result, some mac80211 API functions also need to cope with
a NULL vif pointer so drivers can still call them unconditionally.

Also, when injecting frames we really don't want to pass NULL all
the time, if we know we are the source address of a frame and have
a local interface for that address, we can to use that interface.
This also helps with processing the frame correctly for that
interface which will help the 802.11w implementation. It's not
entirely correct for VLANs or WDS interfaces because there the MAC
address isn't unique, but it's already a lot better than what we
do now.

Finally, when injecting without a matching local interface, don't
assign sequence numbers at all.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-15 16:48:25 -04:00
Johannes Berg
687c7c0807 mac80211: share sta_info->ht_info
Rate control algorithms may need access to a station's
HT capabilities, so share the ht_info struct in the
public station API.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-15 16:48:24 -04:00
Johannes Berg
323ce79a9c mac80211: share sta->supp_rates
As more preparation for a saner rate control algorithm API,
share the supported rates bitmap in the public API.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-15 16:48:24 -04:00
Johannes Berg
17741cdc26 mac80211: share STA information with driver
This patch changes mac80211 to share some more data about
stations with drivers. Should help iwlwifi and ath9k when
 they get around to updating, and might also help with
implementing rate control algorithms without internals.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Sujith Manoharan <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-15 16:48:23 -04:00
Johannes Berg
05c914fe33 mac80211: use nl80211 interface types
There's really no reason for mac80211 to be using its
own interface type defines. Use the nl80211 types and
simplify the configuration code a bit: there's no need
to translate them any more now.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-15 16:48:23 -04:00
Johannes Berg
96dd22ac06 mac80211: inform driver of basic rateset
Drivers need to know the basic rateset to be able to configure
the ACK/CTS programming in hardware correctly.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-15 16:48:22 -04:00
Johannes Berg
5bc75728fd mac80211: fix scan vs. interface removal race
When we remove an interface, we can currently end up having
a pointer to it left in local->scan_sdata after it has been
set down, and then with a hardware scan the scan completion
can try to access it which is a bug. Alternatively, a scan
that started as a hardware scan may terminate as though it
was a software scan, if the timing is just right.

On SMP systems, software scan also has a similar problem,
just canceling the delayed work and setting a flag isn't
enough since it may be running concurrently; in this case
we would also never restore state of other interfaces.

This patch hopefully fixes the problems by always invoking
ieee80211_scan_completed or requiring it to be invoked by
the driver, I suspect the drivers that have ->hw_scan() are
buggy. The bug will not manifest itself unless you remove
the interface while hw-scanning which will also turn off
the hw, and then add a new interface which will be unusable
until you scan once.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-15 16:48:20 -04:00
Luis R. Rodriguez
b2e1b30290 cfg80211: Add new wireless regulatory infrastructure
This adds the new wireless regulatory infrastructure. The
main motiviation behind this was to centralize regulatory
code as each driver was implementing their own regulatory solution,
and to replace the initial centralized code we have where:

* only 3 regulatory domains are supported: US, JP and EU
* regulatory domains can only be changed through module parameter
* all rules were built statically in the kernel

We now have support for regulatory domains for many countries
and regulatory domains are now queried through a userspace agent
through udev allowing distributions to update regulatory rules
without updating the kernel.

Each driver can regulatory_hint() a regulatory domain
based on either their EEPROM mapped regulatory domain value to a
respective ISO/IEC 3166-1 country code or pass an internally built
regulatory domain. We also add support to let the user set the
regulatory domain through userspace in case of faulty EEPROMs to
further help compliance.

Support for world roaming will be added soon for cards capable of
this.

For more information see:

http://wireless.kernel.org/en/developers/Regulatory/CRDA

For now we leave an option to enable the old module parameter,
ieee80211_regdom, and to build the 3 old regdomains statically
(US, JP and EU). This option is CONFIG_WIRELESS_OLD_REGULATORY.
These old static definitions and the module parameter is being
scheduled for removal for 2.6.29. Note that if you use this
you won't make use of a world regulatory domain as its pointless.
If you leave this option enabled and if CRDA is present and you
use US or JP we will try to ask CRDA to update us a regulatory
domain for us.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-15 16:48:19 -04:00
Alexander Duyck
ca9b0e27e0 pkt_action: add new action skbedit
This new action will have the ability to change the priority and/or
queue_mapping fields on an sk_buff.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-12 16:30:20 -07:00
Vegard Nossum
1045b03e07 netlink: fix overrun in attribute iteration
kmemcheck reported this:

  kmemcheck: Caught 16-bit read from uninitialized memory (f6c1ba30)
  0500110001508abf050010000500000002017300140000006f72672e66726565
   i i i i i i i i i i i i i u u u u u u u u u u u u u u u u u u u
                                   ^

  Pid: 3462, comm: wpa_supplicant Not tainted (2.6.27-rc3-00054-g6397ab9-dirty #13)
  EIP: 0060:[<c05de64a>] EFLAGS: 00010296 CPU: 0
  EIP is at nla_parse+0x5a/0xf0
  EAX: 00000008 EBX: fffffffd ECX: c06f16c0 EDX: 00000005
  ESI: 00000010 EDI: f6c1ba30 EBP: f6367c6c ESP: c0a11e88
   DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
  CR0: 8005003b CR2: f781cc84 CR3: 3632f000 CR4: 000006d0
  DR0: c0ead9bc DR1: 00000000 DR2: 00000000 DR3: 00000000
  DR6: ffff4ff0 DR7: 00000400
   [<c05d4b23>] rtnl_setlink+0x63/0x130
   [<c05d5f75>] rtnetlink_rcv_msg+0x165/0x200
   [<c05ddf66>] netlink_rcv_skb+0x76/0xa0
   [<c05d5dfe>] rtnetlink_rcv+0x1e/0x30
   [<c05dda21>] netlink_unicast+0x281/0x290
   [<c05ddbe9>] netlink_sendmsg+0x1b9/0x2b0
   [<c05beef2>] sock_sendmsg+0xd2/0x100
   [<c05bf945>] sys_sendto+0xa5/0xd0
   [<c05bf9a6>] sys_send+0x36/0x40
   [<c05c03d6>] sys_socketcall+0x1e6/0x2c0
   [<c020353b>] sysenter_do_call+0x12/0x3f
   [<ffffffff>] 0xffffffff

This is the line in nla_ok():

  /**
   * nla_ok - check if the netlink attribute fits into the remaining bytes
   * @nla: netlink attribute
   * @remaining: number of bytes remaining in attribute stream
   */
  static inline int nla_ok(const struct nlattr *nla, int remaining)
  {
          return remaining >= sizeof(*nla) &&
                 nla->nla_len >= sizeof(*nla) &&
                 nla->nla_len <= remaining;
  }

It turns out that remaining can become negative due to alignment in
nla_next(). But GCC promotes "remaining" to unsigned in the test
against sizeof(*nla) above. Therefore the test succeeds, and the
nla_for_each_attr() may access memory outside the received buffer.

A short example illustrating this point is here:

  #include <stdio.h>

  main(void)
  {
          printf("%d\n", -1 >= sizeof(int));
  }

...which prints "1".

This patch adds a cast in front of the sizeof so that GCC will make
a signed comparison and fix the illegal memory dereference. With the
patch applied, there is no kmemcheck report.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-11 19:05:29 -07:00
Johannes Berg
fe3fa82731 mac80211: make conf_tx non-atomic
The conf_tx callback currently needs to be atomic, this requirement
is just because it can be called from scanning. This rearranges it
slightly to only update while not scanning (which is fine, we'll be
getting beacons when associated) and thus removes the atomic
requirement.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-11 15:53:34 -04:00
Herbert Xu
abb81c4f3c ipsec: Use RCU-like construct for saved state within a walk
Now that we save states within a walk we need synchronisation
so that the list the saved state is on doesn't disappear from
under us.

As it stands this is done by keeping the state on the list which
is bad because it gets in the way of the management of the state
life-cycle.

An alternative is to make our own pseudo-RCU system where we use
counters to indicate which state can't be freed immediately as
it may be referenced by an ongoing walk when that resumes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-09 19:58:29 -07:00
David S. Miller
dacc62dbf5 Merge branch 'lvs-next-2.6' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-2.6 2008-09-09 19:51:04 -07:00
David S. Miller
47abf28d5b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2008-09-09 19:28:03 -07:00
Simon Horman
c051a0a2c9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 into lvs-next-2.6 2008-09-10 09:14:52 +10:00
Gerrit Renker
410e27a49b This reverts "Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp"
as it accentally contained the wrong set of patches. These will be
submitted separately.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-09 13:27:22 +02:00
David S. Miller
fd9ec7d31f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2008-09-09 02:11:11 -07:00
Marcel Holtmann
e7c29cb16c [Bluetooth] Reject L2CAP connections on an insecure ACL link
The Security Mode 4 of the Bluetooth 2.1 specification has strict
authentication and encryption requirements. It is the initiators job
to create a secure ACL link. However in case of malicious devices, the
acceptor has to make sure that the ACL is encrypted before allowing
any kind of L2CAP connection. The only exception here is the PSM 1 for
the service discovery protocol, because that is allowed to run on an
insecure ACL link.

Previously it was enough to reject a L2CAP connection during the
connection setup phase, but with Bluetooth 2.1 it is forbidden to
do any L2CAP protocol exchange on an insecure link (except SDP).

The new hci_conn_check_link_mode() function can be used to check the
integrity of an ACL link. This functions also takes care of the cases
where Security Mode 4 is disabled or one of the devices is based on
an older specification.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-09-09 07:19:20 +02:00
Marcel Holtmann
09ab6f4c23 [Bluetooth] Enforce correct authentication requirements
With the introduction of Security Mode 4 and Simple Pairing from the
Bluetooth 2.1 specification it became mandatory that the initiator
requires authentication and encryption before any L2CAP channel can
be established. The only exception here is PSM 1 for the service
discovery protocol (SDP). It is meant to be used without any encryption
since it contains only public information. This is how Bluetooth 2.0
and before handle connections on PSM 1.

For Bluetooth 2.1 devices the pairing procedure differentiates between
no bonding, general bonding and dedicated bonding. The L2CAP layer
wrongly uses always general bonding when creating new connections, but it
should not do this for SDP connections. In this case the authentication
requirement should be no bonding and the just-works model should be used,
but in case of non-SDP connection it is required to use general bonding.

If the new connection requires man-in-the-middle (MITM) protection, it
also first wrongly creates an unauthenticated link key and then later on
requests an upgrade to an authenticated link key to provide full MITM
protection. With Simple Pairing the link key generation is an expensive
operation (compared to Bluetooth 2.0 and before) and doing this twice
during a connection setup causes a noticeable delay when establishing
a new connection. This should be avoided to not regress from the expected
Bluetooth 2.0 connection times. The authentication requirements are known
up-front and so enforce them.

To fulfill these requirements the hci_connect() function has been extended
with an authentication requirement parameter that will be stored inside
the connection information and can be retrieved by userspace at any
time. This allows the correct IO capabilities exchange and results in
the expected behavior.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-09-09 07:19:20 +02:00
David S. Miller
0a68a20cc3 Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp
Conflicts:

	net/dccp/input.c
	net/dccp/options.c
2008-09-08 17:28:59 -07:00
David S. Miller
17dce5dfe3 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	net/mac80211/mlme.c
2008-09-08 16:59:05 -07:00
Sven Wegener
e9c0ce232e ipvs: Embed user stats structure into kernel stats structure
Instead of duplicating the fields, integrate a user stats structure into
the kernel stats structure. This is more robust when the members are
changed, because they are now automatically kept in sync.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Reviewed-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-09 09:53:08 +10:00
Sven Wegener
2206a3f5b7 ipvs: Restrict connection table size via Kconfig
Instead of checking the value in include/net/ip_vs.h, we can just
restrict the range in our Kconfig file. This will prevent values outside
of the range early.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Reviewed-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-09 09:50:55 +10:00
Daniel Lezcano
d315492b1a netns : fix kernel panic in timewait socket destruction
How to reproduce ?
 - create a network namespace
 - use tcp protocol and get timewait socket
 - exit the network namespace
 - after a moment (when the timewait socket is destroyed), the kernel
   panics.

# BUG: unable to handle kernel NULL pointer dereference at
0000000000000007
IP: [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
PGD 119985067 PUD 11c5c0067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd
edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks
sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3
RIP: 0010:[<ffffffff821e394d>] [<ffffffff821e394d>]
inet_twdr_do_twkill_work+0x6e/0xb8
RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246
RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30
RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00
RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200
R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000)
knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8800bff9e000, task
ffff88011ff76690)
Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a
0000000000000008
0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7
ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108
Call Trace:
<IRQ> [<ffffffff821e3a3a>] ? inet_twdr_hangman+0x0/0x9e
[<ffffffff821e3a61>] ? inet_twdr_hangman+0x27/0x9e
[<ffffffff8203c7e7>] ? run_timer_softirq+0x12c/0x193
[<ffffffff820390d1>] ? __do_softirq+0x5e/0xcd
[<ffffffff8200d08c>] ? call_softirq+0x1c/0x28
[<ffffffff8200e611>] ? do_softirq+0x2c/0x68
[<ffffffff8201a055>] ? smp_apic_timer_interrupt+0x8e/0xa9
[<ffffffff8200cad6>] ? apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff82011f4c>] ? default_idle+0x27/0x3b
[<ffffffff8200abbd>] ? cpu_idle+0x5f/0x7d


Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7
65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 <48> 8b 04 d0
48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00
RIP [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
RSP <ffff88011ff7fed0>
CR2: 0000000000000007

This patch provides a function to purge all timewait sockets related
to a network namespace. The timewait sockets life cycle is not tied with
the network namespace, that means the timewait sockets stay alive while
the network namespace dies. The timewait sockets are for avoiding to
receive a duplicate packet from the network, if the network namespace is
freed, the network stack is removed, so no chance to receive any packets
from the outside world. Furthermore, having a pending destruction timer
on these sockets with a network namespace freed is not safe and will lead
to an oops if the timer callback which try to access data belonging to 
the namespace like for example in:
	inet_twdr_do_twkill_work
		-> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED);

Purging the timewait sockets at the network namespace destruction will:
 1) speed up memory freeing for the namespace
 2) fix kernel panic on asynchronous timewait destruction

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-08 13:17:27 -07:00
Luis R. Rodriguez
f59ac04816 cfg80211: keep track of supported interface modes
It is obviously good for userspace to know up front which
interface modes a given piece of hardware might support (even
if adding such an interface might fail later because of
concurrency issues), so let's make cfg80211 aware of that.
For good measure, disallow adding interfaces in all other
modes so drivers don't forget to announce support for one mode
when they add it.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Stephen Blackheath <tramp.enshrine.stephen@blacksapphire.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-09-05 16:17:42 -04:00
Julius Volz
cfc78c5a09 IPVS: Adjust various debug outputs to use new macros
Adjust various debug outputs to use the new *_BUF macro variants for
correct output of v4/v6 addresses.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:12 +10:00
Julius Volz
7937df1564 IPVS: Convert real server lookup functions
Convert functions for looking up destinations (real servers) to support
IPv6 services/dests.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:10 +10:00
Julius Volz
b3cdd2a738 IPVS: Add and bind IPv6 xmit functions
Add xmit functions for IPv6. Also add the already needed __ip_vs_get_out_rt_v6()
to ip_vs_core.c. Bind the new xmit functions to v6 connections.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:08 +10:00
Julius Volz
28364a59f3 IPVS: Extend functions for getting/creating connections
Extend functions for getting/creating connections and connection
templates for IPv6 support and fix the callers.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:08 +10:00
Julius Volz
0bbdd42b7e IPVS: Extend protocol DNAT/SNAT and state handlers
Extend protocol DNAT/SNAT and state handlers to work with IPv6. Also
change/introduce new checksumming helper functions for this.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:07 +10:00
Julius Volz
51ef348b14 IPVS: Add 'af' args to protocol handler functions
Add 'af' arguments to conn_schedule(), conn_in_get(), conn_out_get() and
csum_check() function pointers in struct ip_vs_protocol. Extend the
respective functions for TCP, UDP, AH and ESP and adjust the callers.

The changes in the callers need to be somewhat extensive, since they now
need to pass a filled out struct ip_vs_iphdr * to the modified functions
instead of a struct iphdr *.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:06 +10:00
Julius Volz
b14198f6c1 IPVS: Add IPv6 support flag to schedulers
Add 'supports_ipv6' flag to struct ip_vs_scheduler to indicate whether a
scheduler supports IPv6. Set the flag to 1 in schedulers that work with
IPv6, 0 otherwise. This flag is checked in a later patch while trying to
add a service with a specific scheduler. Adjust debug in v6-supporting
schedulers to work with both address families.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:06 +10:00
Julius Volz
3c2e0505d2 IPVS: Add v6 support to ip_vs_service_get()
Add support for selecting services based on their address family to
ip_vs_service_get() and adjust the callers.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:05 +10:00
Julius Volz
c860c6b147 IPVS: Add internal versions of sockopt interface structs
Add extended internal versions of struct ip_vs_service_user and struct
ip_vs_dest_user (the originals can't be modified as they are part
of the old sockopt interface). Adjust ip_vs_ctl.c to work with the new
data structures and add some minor AF-awareness.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:04 +10:00
Julius Volz
c842a3ada9 IPVS: Add debug macros for v4 and v6 address output
Add some debugging macros that allow conditional output of either v4 or v6
addresses, depending on an 'af' parameter. This is done by creating a
temporary string buffer in an outer debug macro and writing addresses'
string representations into it from another macro which can only be used
when inside the outer one.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:04 +10:00
Julius Volz
64aae3cb9f IPVS: Add general v4/v6 helper functions / data structures
Add a struct ip_vs_iphdr for easier handling of common v4 and v6 header
fields in the same code path. ip_vs_fill_iphdr() helps to fill this struct
from an IPv4 or IPv6 header. Add further helper functions for copying and
comparing addresses.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:03 +10:00
Julius Volz
e7ade46a53 IPVS: Change IPVS data structures to support IPv6 addresses
Introduce new 'af' fields into IPVS data structures for specifying an
entry's address family. Convert IP addresses to be of type union
nf_inet_addr.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:03 +10:00
Gerrit Renker
6224877b2c tcp/dccp: Consolidate common code for RFC 3390 conversion
This patch consolidates the code common to TCP and CCID-2:
 * TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
 * CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04 07:45:39 +02:00
Thomas Graf
2c10b32bf5 netlink: Remove compat API for nested attributes
Removes all _nested_compat() functions from the API. The prio qdisc
no longer requires them and netem has its own format anyway. Their
existance is only confusing.

Resend: Also remove the wrapper macro.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-02 17:30:27 -07:00
David S. Miller
b171e19ed0 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	net/mac80211/mlme.c
2008-08-29 23:06:00 -07:00
David S. Miller
143b11c03c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2008-08-29 14:02:13 -07:00
Jouni Malinen
36aedc903e mac80211/cfg80211: HT capabilities for NEW_STA
Allow userspace (e.g., hostapd) to set HT capabilities for associated
STAs. This is based on a patch from Zhu Yi <yi.zhu@intel.com> (only
the NL80211_ATTR_HT_CAPABILITY for NEW_STA part is included here).

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-08-29 16:24:09 -04:00
Jouni Malinen
9f1ba9062e mac80211/cfg80211: Add BSS configuration options for AP mode
This change adds a new cfg80211 command, NL80211_CMD_SET_BSS, to allow
AP mode BSS parameters to be changed from user space (e.g., hostapd).
The drivers using mac80211 are expected to be modified with separate
changes to use the new BSS info parameter for short slot time in the
bss_info_changed() handler.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-08-29 16:23:55 -04:00
Alexey Dobriyan
af01d53746 net: more #ifdef CONFIG_COMPAT
All users of struct proto::compat_[gs]etsockopt and
struct inet_connection_sock_af_ops::compat_[gs]etsockopt are under
#ifdef already, so use it in structure definition too.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-28 02:53:51 -07:00
Jarek Poplawski
fe439dd09d pkt_sched: Fix sch_tree_lock()
Use new qdisc_root_sleeping_lock() instead of qdisc_root_lock() as
sch_tree_lock() because this lock could be used while dev is
deactivated, but we never need to use this with noop_qdisc as a root.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-27 02:27:10 -07:00
Jarek Poplawski
f6f9b93f16 pkt_sched: Fix gen_estimator locks
While passing a qdisc root lock to gen_new_estimator() and
gen_replace_estimator() dev could be deactivated or even before
grafting proper root qdisc as qdisc_sleeping (e.g. qdisc_create), so
using qdisc_root_lock() is not enough. This patch adds
qdisc_root_sleeping_lock() for this, plus additional checks, where
necessary.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-27 02:25:17 -07:00
Simon Horman
7fd1067851 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-2.6 into lvs-next-2.6 2008-08-27 15:11:37 +10:00
Harvey Harrison
6b644e524b mac80211: remove ieee80211_get_hdrlen
All users have been moved over to the version taking a le16 frame control
rather than a cpu-endian value.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-08-22 16:29:54 -04:00
Bruno Randolf
b4f28bbb9b mac80211: add rx status flag for short preamble
and use it for the radiotap header

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-08-22 16:29:50 -04:00
Tomas Winkler
92ab853549 mac80211: add ieee80211_queue_stopped)
This patch adds ieee80211_queue_stopped that let drivers to query
queue status

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-08-22 16:29:50 -04:00
Jarek Poplawski
f6e0b239a2 pkt_sched: Fix qdisc list locking
Since some qdiscs call qdisc_tree_decrease_qlen() (so qdisc_lookup())
without rtnl_lock(), adding and deleting from a qdisc list needs
additional locking. This patch adds global spinlock qdisc_list_lock
and wrapper functions for modifying the list. It is considered as a
temporary solution until hfsc_dequeue(), netem_dequeue() and
tbf_dequeue() (or qdisc_tree_decrease_qlen()) are redone.

With feedback from Herbert Xu and David S. Miller.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-22 03:31:39 -07:00
Jarek Poplawski
2540e0511e pkt_sched: Fix qdisc_watchdog() vs. dev_deactivate() race
dev_deactivate() can skip rescheduling of a qdisc by qdisc_watchdog()
or other timer calling netif_schedule() after dev_queue_deactivate().
We prevent this checking aliveness before scheduling the timer. Since
during deactivation the root qdisc is available only as qdisc_sleeping
additional accessor qdisc_root_sleeping() is created.

With feedback from Herbert Xu <herbert@gondor.apana.org.au>

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-21 05:11:14 -07:00
Simon Horman
3f087668c4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-08-19 17:36:22 +10:00
David S. Miller
8e0f36ec37 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2008-08-18 21:15:44 -07:00
Luis R. Rodriguez
546c80c91f mac80211: remove kdoc references to IEEE80211_HW_HOST_GEN_BEACON_TEMPLATE
IEEE80211_HW_HOST_GEN_BEACON_TEMPLATE was made unnecessary in
the recent revamp on beacon configuration.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-08-18 11:05:14 -04:00
David S. Miller
1e0d5a5747 pkt_sched: No longer destroy qdiscs from RCU.
We can now kill them synchronously with all of the
previous dev_deactivate() cures.

This makes netdev destruction and shutdown saner as
the qdiscs hold references to the device.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-17 22:31:26 -07:00
David S. Miller
a9312ae893 pkt_sched: Add 'deactivated' state.
This new state lets dev_deactivate() mark a qdisc as having been
deactivated.

dev_queue_xmit() and ing_filter() check for this bit and do not
try to process the qdisc if the bit is set.

dev_deactivate() polls the qdisc after setting the bit, waiting
for both __QDISC_STATE_RUNNING and __QDISC_STATE_SCHED to clear.

This isn't perfect yet, but subsequent changesets will make it so.
This part is just one piece of the puzzle.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-17 21:51:03 -07:00
Simon Horman
51df190139 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-08-16 14:44:17 +10:00
Sven Wegener
a919cf4b6b ipvs: Create init functions for estimator code
Commit 8ab19ea36c ("ipvs: Fix possible deadlock
in estimator code") fixed a deadlock condition, but that condition can only
happen during unload of IPVS, because during normal operation there is at least
our global stats structure in the estimator list. The mod_timer() and
del_timer_sync() calls are actually initialization and cleanup code in
disguise. Let's make it explicit and move them to their own init and cleanup
function.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-08-15 09:26:15 +10:00
Brian Haley
191cd58250 netns: Add network namespace argument to rt6_fill_node() and ipv6_dev_get_saddr()
ipv6_dev_get_saddr() blindly de-references dst_dev to get the network
namespace, but some callers might pass NULL.  Change callers to pass a
namespace pointer instead.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-14 15:33:21 -07:00
Rami Rosen
6bf90b2bf4 ipv6: Kill unused ip6_prohibit_entry and ip6_blk_hole_entry declarations.
This patch removes ip6_prohibit_entry and ip6_blk_hole_entry
declarations from include/net/ip6_route.h as they are unused.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-13 02:35:39 -07:00
Rami Rosen
83ac794f15 ipv6: ip6_route.h cleanup.
This patch removes rt6_lock declaration from include/net/ip6_route.h
as it is unused.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-13 02:34:39 -07:00
David S. Miller
83f36f3f35 pkt_sched: Add queue stopped test back to qdisc_run().
Based upon a bug report by Andrew Gallatin on netdev
with subject "CPU utilization increased in 2.6.27rc"

In commit 37437bb2e1
("pkt_sched: Schedule qdiscs instead of netdev_queue.")
the test of the queue being stopped was erroneously
removed from qdisc_run().

When the TX queue of the device fills up, this omission
causes lots of extraneous useless work to be queued up
to softirq context, where we'll just return immediately
because the device is still stuffed up.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-13 02:13:34 -07:00
Sven Wegener
3a14a313f9 ipvs: Embed estimator object into stats object
There's no reason for dynamically allocating an estimator object for every
stats object. Directly embed an estimator object into every stats object and
switch to using the kernel-provided list implementation. This makes the code
much simpler and faster, as we do not need to traverse the list of all
estimators to find the one belonging to a stats object. There's no need to use
an rwlock, as we only have one reader. Also reorder the members of the
estimator structure slightly to avoid padding overhead. This can't be done
with the stats object as the members are currently copied to our user space
object via memcpy() and changing it would break ABI.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
2008-08-11 14:00:43 +02:00
Sven Wegener
5587da55fb ipvs: Mark net_vs_ctl_path const
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
2008-08-11 11:46:27 +02:00
Sven Wegener
afdd614071 ipvs: Use ARRAY_SIZE()
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
2008-08-11 11:45:48 +02:00
David S. Miller
32bb93b02d Merge branch 'upstream-davem' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2008-08-07 02:10:27 -07:00
Jeff Garzik
3859069bc3 Merge branch 'for-jeff' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6 into tmp 2008-08-07 04:05:46 -04:00
Gui Jianfeng
6edafaaf6f tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookup
If the following packet flow happen, kernel will panic.
MathineA			MathineB
		SYN
	---------------------->    
        	SYN+ACK
	<----------------------
		ACK(bad seq)
	---------------------->
When a bad seq ACK is received, tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr))
is finally called by tcp_v4_reqsk_send_ack(), but the first parameter(skb->sk) is 
NULL at that moment, so kernel panic happens.
This patch fixes this bug.

OOPS output is as following:
[  302.812793] IP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42
[  302.817075] Oops: 0000 [#1] SMP 
[  302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
[  302.849946] 
[  302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5)
[  302.855184] EIP: 0060:[<c05cfaa6>] EFLAGS: 00010296 CPU: 0
[  302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42
[  302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046
[  302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54
[  302.868333]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000)
[  302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400 
[  302.883275]        c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0 
[  302.890971]        ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634 
[  302.900140] Call Trace:
[  302.902392]  [<c05d0d86>] tcp_v4_reqsk_send_ack+0x17/0x35
[  302.907060]  [<c05d28f8>] tcp_check_req+0x156/0x372
[  302.910082]  [<c04250a3>] printk+0x14/0x18
[  302.912868]  [<c05d0aa1>] tcp_v4_do_rcv+0x1d3/0x2bf
[  302.917423]  [<c05d26be>] tcp_v4_rcv+0x563/0x5b9
[  302.920453]  [<c05bb20f>] ip_local_deliver_finish+0xe8/0x183
[  302.923865]  [<c05bb10a>] ip_rcv_finish+0x286/0x2a3
[  302.928569]  [<c059e438>] dev_alloc_skb+0x11/0x25
[  302.931563]  [<c05a211f>] netif_receive_skb+0x2d6/0x33a
[  302.934914]  [<d0917941>] pcnet32_poll+0x333/0x680 [pcnet32]
[  302.938735]  [<c05a3b48>] net_rx_action+0x5c/0xfe
[  302.941792]  [<c042856b>] __do_softirq+0x5d/0xc1
[  302.944788]  [<c042850e>] __do_softirq+0x0/0xc1
[  302.948999]  [<c040564b>] do_softirq+0x55/0x88
[  302.951870]  [<c04501b1>] handle_fasteoi_irq+0x0/0xa4
[  302.954986]  [<c04284da>] irq_exit+0x35/0x69
[  302.959081]  [<c0405717>] do_IRQ+0x99/0xae
[  302.961896]  [<c040422b>] common_interrupt+0x23/0x28
[  302.966279]  [<c040819d>] default_idle+0x2a/0x3d
[  302.969212]  [<c0402552>] cpu_idle+0xb2/0xd2
[  302.972169]  =======================
[  302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff <8b> 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6 
[  303.011610] EIP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54
[  303.018360] Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-06 23:50:04 -07:00
David S. Miller
33e334950a Merge branch 'no-ath9k' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2008-08-05 01:28:35 -07:00
Rami Rosen
95c3e8bfcd ipv4: remove unused field in struct flowi (include/net/flow.h).
This patch removes an unused field (flags) from struct flowi; it seems
that this "flags" field was used once in the past for multipath
routing with FLOWI_FLAG_MULTIPATHOLDROUTE flag (which does no longer
exist); however, the "flags" field of struct flowi is not used
anymore.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-05 01:19:50 -07:00
David S. Miller
cc6533e98a net: Kill plain NET_XMIT_BYPASS.
dst_input() was doing something completely absurd, looping
on skb->dst->input() if NET_XMIT_BYPASS was seen, but these
functions never return such an error.

And as a result plain ole' NET_XMIT_BYPASS has no more
references and can be completely killed off.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 23:04:08 -07:00
Jarek Poplawski
c27f339af9 net_sched: Add qdisc __NET_XMIT_BYPASS flag
Patrick McHardy <kaber@trash.net> noticed that it would be nice to
handle NET_XMIT_BYPASS by NET_XMIT_SUCCESS with an internal qdisc flag
__NET_XMIT_BYPASS and to remove the mapping from dev_queue_xmit().

David Miller <davem@davemloft.net> spotted a serious bug in the first
version of this patch.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 22:39:11 -07:00
Jarek Poplawski
378a2f090f net_sched: Add qdisc __NET_XMIT_STOLEN flag
Patrick McHardy <kaber@trash.net> noticed:
"The other problem that affects all qdiscs supporting actions is
TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS
even though the packet is not queued, corrupting upper qdiscs'
qlen counters."

and later explained:
"The reason why it translates it at all seems to be to not increase
the drops counter. Within a single qdisc this could be avoided by
other means easily, upper qdiscs would still increase the counter
when we return anything besides NET_XMIT_SUCCESS though.

This means we need a new NET_XMIT return value to indicate this to
the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN,
return that to upper qdiscs and translate it to NET_XMIT_SUCCESS
in dev_queue_xmit, similar to NET_XMIT_BYPASS."

David Miller <davem@davemloft.net> noticed:
"Maybe these NET_XMIT_* values being passed around should be a set of
bits. They could be composed of base meanings, combined with specific
attributes.

So you could say "NET_XMIT_DROP | __NET_XMIT_NO_DROP_COUNT"

The attributes get masked out by the top-level ->enqueue() caller,
such that the base meanings are the only thing that make their
way up into the stack. If it's only about communication within the
qdisc tree, let's simply code it that way."

This patch is trying to realize these ideas.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 22:31:03 -07:00
Tomas Winkler
ea95bba41e mac80211: make listen_interval be limited by low level driver
This patch makes possible for a driver to specify maximal listen interval
The possibility for user to configure listen interval is not implemented
yet, currently the maximum provided by the driver or 1 is used.
Mac80211 uses config handler to set listen interval for to the driver.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-08-04 15:09:07 -04:00
Emmanuel Grumbach
98f7dfd86c mac80211: pass dtim_period to low level driver
This patch adds the dtim_period in ieee80211_bss_conf, this allows the low
level driver to know the dtim_period, and to plan power save accordingly.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-08-04 15:09:07 -04:00
Herbert Xu
f880374c2f sctp: Drop ipfargok in sctp_xmit function
The ipfragok flag controls whether the packet may be fragmented
either on the local host on beyond.  The latter is only valid on
IPv4.

In fact, we never want to do the latter even on IPv4 when PMTU is
enabled.  This is because even though we can't fragment packets
within SCTP due to the prtocol's inherent faults, we can still
fragment it at IP layer.  By setting the DF bit we will improve
the PMTU process.

RFC 2960 only says that we SHOULD clear the DF bit in this case,
so we're compliant even if we set the DF bit.  In fact RFC 4960
no longer has this statement.

Once we make this change, we only need to control the local
fragmentation.  There is already a bit in the skb which controls
that, local_df.  So this patch sets that instead of using the
ipfragok argument.

The only complication is that there isn't a struct sock object
per transport, so for IPv4 we have to resort to changing the
pmtudisc field for every packet.  This should be safe though
as the protocol is single-threaded.

Note that after this patch we can remove ipfragok from the rest
of the stack too.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-03 21:15:08 -07:00
David S. Miller
7e43f1128d pkt_sched: Make sure RTNL is held in qdisc_root_lock().
It is the only legal environment in which this can be
used.

Add some commentary explaining the situation.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-02 23:27:37 -07:00
Julius Volz
bc4768eb08 ipvs: Move userspace definitions to include/linux/ip_vs.h
Current versions of ipvsadm include "/usr/src/linux/include/net/ip_vs.h"
directly. This file also contains kernel-only definitions. Normally, public
definitions should live in include/linux, so this patch moves the
definitions shared with userspace to a new file, "include/linux/ip_vs.h".

This also removes the unused NFC_IPVS_PROPERTY bitmask, which was once
used to point into skb->nfcache.

To make old ipvsadms still compile with this, the old header file includes
the new one.

Thanks to Dave Miller and Horms for noting/adding the missing Kbuild entry
for the new header file.

Signed-off-by: Julius Volz <juliusv@google.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-31 20:45:24 -07:00
Johannes Berg
d0f0980414 mac80211: partially fix skb->cb use
This patch fixes mac80211 to not use the skb->cb over the queue step
from virtual interfaces to the master. The patch also, for now,
disables aggregation because that would still require requeuing,
will fix that in a separate patch. There are two other places (software
requeue and powersaving stations) where requeue can happen, but that is
not currently used by any drivers/not possible to use respectively.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-29 16:55:08 -04:00
Johannes Berg
605a0bd66d mac80211: remove IEEE80211_HW_HOST_GEN_BEACON_TEMPLATE flag
I forgot this in the previous patch that made it unused.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-29 16:36:24 -04:00
Al Viro
6f9f489a4e net: missing bits of net-namespace / sysctl
Piss-poor sysctl registration API strikes again, film at 11...
What we really need is _pathname_ required to be present in
already registered table, so that kernel could warn about bad
order.  That's the next target for sysctl stuff (and generally
saner and more explicit order of initialization of ipv[46]
internals wouldn't hurt either).

For the time being, here are full fixups required by ..._rotable()
stuff; we make per-net sysctl sets descendents of "ro" one and
make sure that sufficient skeleton is there before we start registering
per-net sysctls.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-27 04:40:51 -07:00
Linus Torvalds
4836e30078 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (39 commits)
  [PATCH] fix RLIM_NOFILE handling
  [PATCH] get rid of corner case in dup3() entirely
  [PATCH] remove remaining namei_{32,64}.h crap
  [PATCH] get rid of indirect users of namei.h
  [PATCH] get rid of __user_path_lookup_open
  [PATCH] f_count may wrap around
  [PATCH] dup3 fix
  [PATCH] don't pass nameidata to __ncp_lookup_validate()
  [PATCH] don't pass nameidata to gfs2_lookupi()
  [PATCH] new (local) helper: user_path_parent()
  [PATCH] sanitize __user_walk_fd() et.al.
  [PATCH] preparation to __user_walk_fd cleanup
  [PATCH] kill nameidata passing to permission(), rename to inode_permission()
  [PATCH] take noexec checks to very few callers that care
  Re: [PATCH 3/6] vfs: open_exec cleanup
  [patch 4/4] vfs: immutable inode checking cleanup
  [patch 3/4] fat: dont call notify_change
  [patch 2/4] vfs: utimes cleanup
  [patch 1/4] vfs: utimes: move owner check into inode_change_ok()
  [PATCH] vfs: use kstrdup() and check failing allocation
  ...
2008-07-26 20:23:44 -07:00
Linus Torvalds
2284284281 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  netns: fix ip_rt_frag_needed rt_is_expired
  netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences
  netfilter: fix double-free and use-after free
  netfilter: arptables in netns for real
  netfilter: ip{,6}tables_security: fix future section mismatch
  selinux: use nf_register_hooks()
  netfilter: ebtables: use nf_register_hooks()
  Revert "pkt_sched: sch_sfq: dump a real number of flows"
  qeth: use dev->ml_priv instead of dev->priv
  syncookies: Make sure ECN is disabled
  net: drop unused BUG_TRAP()
  net: convert BUG_TRAP to generic WARN_ON
  drivers/net: convert BUG_TRAP to generic WARN_ON
2008-07-26 20:17:56 -07:00
Al Viro
516e0cc564 [PATCH] f_count may wrap around
make it atomic_long_t; while we are at it, get rid of useless checks in affs,
hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:40 -04:00
Al Viro
bd7b1533cd [PATCH] sysctl: make sure that /proc/sys/net/ipv4 appears before per-ns ones
Massage ipv4 initialization - make sure that net.ipv4 appears as
non-per-net-namespace before it shows up in per-net-namespace sysctls.
That's the only change outside of sysctl.c needed to get sane ordering
rules and data structures for sysctls (esp. for procfs side of that
mess).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:10 -04:00
Al Viro
734550921e [PATCH] beginning of sysctl cleanup - ctl_table_set
New object: set of sysctls [currently - root and per-net-ns].
Contains: pointer to parent set, list of tables and "should I see this set?"
method (->is_seen(set)).
Current lists of tables are subsumed by that; net-ns contains such a beast.
->lookup() for ctl_table_root returns pointer to ctl_table_set instead of
that to ->list of that ctl_table_set.

[folded compile fixes by rdd for configs without sysctl]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:08 -04:00
Ilpo Järvinen
547b792cac net: convert BUG_TRAP to generic WARN_ON
Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.

I could make at least one BUILD_BUG_ON conversion.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-25 21:43:18 -07:00
Linus Torvalds
1ff8419871 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  ipsec: ipcomp - Decompress into frags if necessary
  ipsec: ipcomp - Merge IPComp implementations
  pkt_sched: Fix locking in shutdown_scheduler_queue()
2008-07-25 17:40:16 -07:00
Harvey Harrison
8b5ac31e27 include: use get/put_unaligned_* helpers
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:26 -07:00
Herbert Xu
6fccab671f ipsec: ipcomp - Merge IPComp implementations
This patch merges the IPv4/IPv6 IPComp implementations since most
of the code is identical.  As a result future enhancements will no
longer need to be duplicated.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-25 02:54:40 -07:00
Krzysztof Hałasa
efa415840d Remove bogus variables from syncppp.[ch]
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-07-23 23:00:31 +02:00
Stephen Hemminger
3d0f24a74e ipv6: icmp6_dst_gc return change
Change icmp6_dst_gc to return the one value the caller cares about rather
than using call by reference.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 14:35:50 -07:00
Stephen Hemminger
417f28bb34 netns: dont alloc ipv6 fib timer list
FIB timer list is a trivial size structure, avoid indirection and just
put it in existing ns.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 14:33:45 -07:00
Adrian Bunk
888c848ed3 ipv6: make struct ipv6_devconf static
struct ipv6_devconf can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 14:21:58 -07:00
Adrian Bunk
abd0b198ea sctp: make sctp_outq_flush() static
sctp_outq_flush() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 14:20:45 -07:00
Krzysztof Piotr Oledzki
584015727a netfilter: accounting rework: ct_extend + 64bit counters (v4)
Initially netfilter has had 64bit counters for conntrack-based accounting, but
it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit counters are
still required, for example for "connbytes" extension. However, 64bit counters
waste a lot of memory and it was not possible to enable/disable it runtime.

This patch:
 - reimplements accounting with respect to the extension infrastructure,
 - makes one global version of seq_print_acct() instead of two seq_print_counters(),
 - makes it possible to enable it at boot time (for CONFIG_SYSCTL/CONFIG_SYSFS=n),
 - makes it possible to enable/disable it at runtime by sysctl or sysfs,
 - extends counters from 32bit to 64bit,
 - renames ip_conntrack_counter -> nf_conn_counter,
 - enables accounting code unconditionally (no longer depends on CONFIG_NF_CT_ACCT),
 - set initial accounting enable state based on CONFIG_NF_CT_ACCT
 - removes buggy IPCT_COUNTER_FILLING event handling.

If accounting is enabled newly created connections get additional acct extend.
Old connections are not changed as it is not possible to add a ct_extend area
to confirmed conntrack. Accounting is performed for all connections with
acct extend regardless of a current state of "net.netfilter.nf_conntrack_acct".

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-21 10:10:58 -07:00
Krzysztof Piotr Oledzki
07a7c1070e netlink: add NLA_PUT_BE64 macro
Add NLA_PUT_BE64 macro required for 64bit counters in netfilter

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-21 10:10:58 -07:00
David S. Miller
3a682fbd73 pkt_sched: Fix build with NET_SCHED disabled.
The stab bits can't be referenced uniless the full
packet scheduler layer is enabled.

Reported by Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-20 18:13:01 -07:00
Jussi Kivilinna
175f9c1bba net_sched: Add size table for qdiscs
Add size table functions for qdiscs and calculate packet size in
qdisc_enqueue().

Based on patch by Patrick McHardy
 http://marc.info/?l=linux-netdev&m=115201979221729&w=2

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-20 00:08:47 -07:00
Jussi Kivilinna
0abf77e55a net_sched: Add accessor function for packet length for qdiscs
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-20 00:08:27 -07:00
Jussi Kivilinna
5f86173bdf net_sched: Add qdisc_enqueue wrapper
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-20 00:08:04 -07:00
YOSHIFUJI Hideaki
230b183921 net: Use standard structures for generic socket address structures.
Use sockaddr_storage{} for generic socket address storage
and ensures proper alignment.
Use sockaddr{} for pointers to omit several casts.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19 22:35:47 -07:00
David S. Miller
407d819cf0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2008-07-19 00:30:39 -07:00
Denis V. Lunev
7abbcd6a4c ipv6: remove unused macros from net/ipv6.h
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19 00:29:42 -07:00
Denis V. Lunev
725a8ff04a ipv6: remove unused parameter from ip6_ra_control
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19 00:28:58 -07:00
Adam Langley
33ad798c92 tcp: options clean up
This should fix the following bugs:
  * Connections with MD5 signatures produce invalid packets whenever SACK
    options are included
  * MD5 signatures are counted twice in the MSS calculations

Behaviour changes:
  * A SYN with MD5 + SACK + TS elicits a SYNACK with MD5 + SACK

    This is because we can't fit any SACK blocks in a packet with MD5 + TS
    options. There was discussion about disabling SACK rather than TS in
    order to fit in better with old, buggy kernels, but that was deemed to
    be unnecessary.

  * SYNs with MD5 don't include a TS option

    See above.

Additionally, it removes a bunch of duplicated logic for calculating options,
which should help avoid these sort of issues in the future.

Signed-off-by: Adam Langley <agl@imperialviolet.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19 00:04:31 -07:00
Adam Langley
49a72dfb88 tcp: Fix MD5 signatures for non-linear skbs
Currently, the MD5 code assumes that the SKBs are linear and, in the case
that they aren't, happily goes off and hashes off the end of the SKB and
into random memory.

Reported by Stephen Hemminger in [1]. Advice thanks to Stephen and Evgeniy
Polyakov. Also includes a couple of missed route_caps from Stephen's patch
in [2].

[1] http://marc.info/?l=linux-netdev&m=121445989106145&w=2
[2] http://marc.info/?l=linux-netdev&m=121459157816964&w=2

Signed-off-by: Adam Langley <agl@imperialviolet.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19 00:01:42 -07:00
Harvey Harrison
336d3262df sctp: remove unnecessary byteshifting, calculate directly in big-endian
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 23:07:09 -07:00
Vlad Yasevich
7dab83de50 sctp: Support ipv6only AF_INET6 sockets.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 23:05:40 -07:00
Stephen Hemminger
c1e20f7c8b tcp: RTT metrics scaling
Some of the metrics (RTT, RTTVAR and RTAX_RTO_MIN) are stored in
kernel units (jiffies) and this leaks out through the netlink API to
user space where the units for jiffies are unknown.

This patches changes the kernel to convert to/from milliseconds. This
changes the ABI, but milliseconds seemed like the most natural unit
for these parameters.  Values available via syscall in
/proc/net/rt_cache and netlink will be in milliseconds.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 23:02:15 -07:00
David S. Miller
72b25a913e pkt_sched: Get rid of u32_list.
The u32_list is just an indirect way of maintaining a reference
to a U32 node on a per-qdisc basis.

Just add an explicit node pointer for u32 to struct Qdisc an do
away with this global list.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 20:54:17 -07:00
Pavel Emelyanov
923c6586b0 mib: put icmpmsg statistics on struct net
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:04:22 -07:00
Pavel Emelyanov
b60538a0d7 mib: put icmp statistics on struct net
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:04:02 -07:00
Pavel Emelyanov
386019d351 mib: put udplite statistics on struct net
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:03:45 -07:00
Pavel Emelyanov
2f275f91a4 mib: put udp statistics on struct net
Similar to... ouch, I repeat myself.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:03:27 -07:00
Pavel Emelyanov
61a7e26028 mib: put net statistics on struct net
Similar to ip and tcp ones :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:03:08 -07:00
Pavel Emelyanov
a20f5799ca mib: put ip statistics on struct net
Similar to tcp one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:02:42 -07:00
Pavel Emelyanov
57ef42d59d mib: put tcp statistics on struct net
Proc temporary uses stats from init_net.

BTW, TCP_XXX_STATS are beautiful (w/o do { } while (0) facing) again :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:02:08 -07:00
Pavel Emelyanov
852566f53c mib: add netns/mib.h file
The only structure declared within is the netns_mib, which will
carry all our mibs within. I didn't put the mibs in the existing
netns_xxx structures to make it possible to mark this one as
properly aligned and get in a separate "read-mostly" cache-line.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:01:24 -07:00
David S. Miller
93245dd6d3 pkt_sched: Don't used locked skb_queue_purge() in __qdisc_reset_queue()
We have to have exclusive access to the given qdisc anyways, so
doing even more locking is superfluous.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:32 -07:00
David S. Miller
8387400092 pkt_sched: Kill netdev_queue lock.
We can simply use the qdisc->q.lock for all of the
qdisc tree synchronization.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:30 -07:00
David S. Miller
c7e4f3bbb4 pkt_sched: Kill qdisc_lock_tree and qdisc_unlock_tree.
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:29 -07:00
David S. Miller
78a5b30b73 pkt_sched: Rework {sch,tbf}_tree_lock().
Make sch_tree_lock() lock the qdisc's root.  All of the
users hold the RTNL semaphore and the root qdisc is not
changing.

Implement tbf_tree_{lock,unlock}() simply in terms of
sch_tree_{lock,unlock}().

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:28 -07:00
David S. Miller
37437bb2e1 pkt_sched: Schedule qdiscs instead of netdev_queue.
When we have shared qdiscs, packets come out of the qdiscs
for multiple transmit queues.

Therefore it doesn't make any sense to schedule the transmit
queue when logically we cannot know ahead of time the TX
queue of the SKB that the qdisc->dequeue() will give us.

Just for sanity I added a BUG check to make sure we never
get into a state where the noop_qdisc is scheduled.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:20 -07:00
David S. Miller
7698b4fcab pkt_sched: Add and use qdisc_root() and qdisc_root_lock().
When code wants to lock the qdisc tree state, the logic
operation it's doing is locking the top-level qdisc that
sits of the root of the netdev_queue.

Add qdisc_root_lock() to represent this and convert the
easiest cases.

In order for this to work out in all cases, we have to
hook up the noop_qdisc to a dummy netdev_queue.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:19 -07:00
David S. Miller
e2627c8c22 pkt_sched: Make QDISC_RUNNING a qdisc state.
Currently it is associated with a netdev_queue, but when we have
qdisc sharing that no longer makes any sense.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:18 -07:00
David S. Miller
d3b753db7c pkt_sched: Move gso_skb into Qdisc.
We liberate any dangling gso_skb during qdisc destruction.

It really only matters for the root qdisc.  But when qdiscs
can be shared by multiple netdev_queue objects, we can't
have the gso_skb in the netdev_queue any more.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:18 -07:00
David S. Miller
51cb6db0f5 mac80211: Reimplement WME using ->select_queue().
The only behavior change is that we do not drop packets under any
circumstances.  If that is absolutely needed, we could easily add it
back.

With cleanups and help from Johannes Berg.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:12 -07:00
David S. Miller
fd2ea0a79f net: Use queue aware tests throughout.
This effectively "flips the switch" by making the core networking
and multiqueue-aware drivers use the new TX multiqueue structures.

Non-multiqueue drivers need no changes.  The interfaces they use such
as netif_stop_queue() degenerate into an operation on TX queue zero.
So everything "just works" for them.

Code that really wants to do "X" to all TX queues now invokes a
routine that does so, such as netif_tx_wake_all_queues(),
netif_tx_stop_all_queues(), etc.

pktgen and netpoll required a little bit more surgery than the others.

In particular the pktgen changes, whilst functional, could be largely
improved.  The initial check in pktgen_xmit() will sometimes check the
wrong queue, which is mostly harmless.  The thing to do is probably to
invoke fill_packet() earlier.

The bulk of the netpoll changes is to make the code operate solely on
the TX queue indicated by by the SKB queue mapping.

Setting of the SKB queue mapping is entirely confined inside of
net/core/dev.c:dev_pick_tx().  If we end up needing any kind of
special semantics (drops, for example) it will be implemented here.

Finally, we now have a "real_num_tx_queues" which is where the driver
indicates how many TX queues are actually active.

With IGB changes from Jeff Kirsher.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:07 -07:00
David S. Miller
e8a0464cc9 netdev: Allocate multiple queues for TX.
alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.

Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces.  This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.

Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:00 -07:00
Neil Horman
9a6d276e85 core: add stat to track unresolved discards in neighbor cache
in __neigh_event_send, if we have a neighbour entry which is in
NUD_INCOMPLETE state, we enqueue any outbound frames to that neighbour
to the neighbours arp_queue, which is default capped to a length of 3
skbs.  If that queue exceeds its set length, it will drop an skb on
the queue to enqueue the newly arrived skb.  This results in a drop
for which we have no statistics incremented.  This patch adds an
unresolved_discards stat to /proc/net/stat/ndisc_cache to track these
lost frames.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:50:49 -07:00
Pavel Emelyanov
ed88098e25 mib: add net to NET_ADD_STATS_USER
Done with NET_XXX_STATS macros :)

To be continued...

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:32:45 -07:00
Pavel Emelyanov
f2bf415cfe mib: add net to NET_ADD_STATS_BH
This one is tricky. 

The thing is that this macro is only used when killing tw buckets, 
but since this killer is promiscuous wrt to which net each particular
tw belongs to, I have to use it only when NET_NS is off. When the net
namespaces are on, I use the INET_INC_STATS_BH for each bucket.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:32:25 -07:00
Pavel Emelyanov
6f67c817fc mib: add net to NET_INC_STATS_USER
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:31:39 -07:00
Pavel Emelyanov
de0744af1f mib: add net to NET_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:31:16 -07:00
Pavel Emelyanov
4e6734447d mib: add net to NET_INC_STATS
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:30:14 -07:00
Pavel Emelyanov
5c52ba170f sock: add net to prot->enter_memory_pressure callback
The tcp_enter_memory_pressure calls NET_INC_STATS, but doesn't
have where to get the net from.

I decided to add a sk argument, not the net itself, only to factor
all the required sock_net(sk) calls inside the enter_memory_pressure 
callback itself.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:28:10 -07:00
Pavel Emelyanov
cf1100a7a4 mib: add net to TCP_ADD_STATS_USER
Now we're done with the TCP_XXX_STATS macros.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:27:38 -07:00
Pavel Emelyanov
74688e487a mib: add net to TCP_DEC_STATS
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:22:46 -07:00
Pavel Emelyanov
63231bddf6 mib: add net to TCP_INC_STATS_BH
Same as before - the sock is always there to get the net from,
but there are also some places with the net already saved on 
the stack.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:22:25 -07:00
Pavel Emelyanov
81cc8a75d9 mib: add net to TCP_INC_STATS
Fortunately (almost) all the TCP code has a sock to get the net from :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:22:04 -07:00
Pavel Emelyanov
a9c19329ec tcp: add net to tcp_mib_init
This one sets TCP MIBs after zeroing them, and thus requires
the net.

The existing single caller can use init_net (temporarily).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:21:42 -07:00
Pavel Emelyanov
f10f84314d mib: drop unused TCP_XXX_STATS macros
TCP_INC_STATS_USER and TCP_ADD_STATS_BH are currently unused.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:21:20 -07:00
Pavel Emelyanov
c5346fe396 mib: add net to IP_ADD_STATS_BH
Very simple - only ip_evictor (fragments) requires such.
This patch ends up the IP_XXX_STATS patching.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:20:33 -07:00
Pavel Emelyanov
7c73a6faff mib: add net to IP_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:20:11 -07:00
Pavel Emelyanov
5e38e27044 mib: add net to IP_INC_STATS
All the callers already have either the net itself, or the place
where to get it from.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:19:49 -07:00
Pavel Emelyanov
c6f8f7e3bb mib: drop unused IP_INC_STATS_USER
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:19:26 -07:00
Pavel Emelyanov
f66ac03d49 mib: add struct net to ICMPMSGIN_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:31 -07:00
Pavel Emelyanov
903fc1964e mib: add struct net to ICMPMSGOUT_INC_STATS
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:30 -07:00
Pavel Emelyanov
dcfc23cac1 mib: add struct net to ICMP_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:29 -07:00
Pavel Emelyanov
75c939bb4d mib: add struct net to ICMP_INC_STATS
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:28 -07:00
Pavel Emelyanov
43589aa93c icmp: drop unused MIB accounting wrappers
There are ICMP_XXX_STATS that are not used in the kernel, so I remove
them, not to "just patch" them later. But if there's some sense in
keeping them, kick me - I will remake this set keeping them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:26 -07:00
Pavel Emelyanov
0388b00426 icmp: add struct net argument to icmp_out_count
This routine deals with ICMP statistics, but doesn't have a
struct net at hands, so add one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:13 -07:00
Allan Stephens
0ea522416b tipc: Remove unneeded parameter to tipc_createport_raw()
This patch eliminates an unneeded parameter when creating a low-level
TIPC port object.  Instead of returning both the pointer to the port
structure and the port's reference ID, it now returns only the pointer
since the port structure contains the reference ID as one of its fields.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:42:19 -07:00
David S. Miller
fc943b12e4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2008-07-14 20:40:34 -07:00
David S. Miller
4c88949800 netfilter: Let nf_ct_kill() callers know if del_timer() returned true.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 20:22:38 -07:00
Johannes Berg
f434b2d111 mac80211: fix struct ieee80211_tx_queue_params
Multiple issues:
 - there are no "default" values needed
 - cw_min/cw_max can be larger than documented
 - restructure to decrease size
 - use get_unaligned_le16

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-14 14:52:57 -04:00
Johannes Berg
f591fa5dbb mac80211: fix TX sequence numbers
This patch makes mac80211 assign proper sequence numbers to
QoS-data frames. It also removes the old sequence number code
because we noticed that only the driver or hardware can assign
sequence numbers to non-QoS-data and especially management
frames in a race-free manner because beacons aren't passed
through mac80211's TX path.

This patch also adds temporary code to the rt2x00 drivers to
not break them completely, that code will have to be reworked
for proper sequence numbers on beacons.

It also moves sequence number assignment down in the TX path
so no sequence numbers are assigned to frames that are dropped.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-14 14:52:57 -04:00
Johannes Berg
9d139c810a mac80211: revamp beacon configuration
This patch changes mac80211's beacon configuration handling
to never pass skbs to the driver directly but rather always
require the driver to use ieee80211_beacon_get(). Additionally,
it introduces "change flags" on the config_interface() call
to enable drivers to figure out what is changing. Finally, it
removes the beacon_update() driver callback in favour of
having IBSS beacon delivered by ieee80211_beacon_get() as well.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-14 14:30:07 -04:00
Samuel Ortiz
49292d5635 mac80211: power management wext hooks
This patch implements the power management routines wireless extensions
for mac80211.
For now we only support switching PS mode between on and off.

Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-14 14:30:06 -04:00
Marcel Holtmann
8b6b3da765 [Bluetooth] Store remote modem status for RFCOMM TTY
When switching a RFCOMM socket to a TTY, the remote modem status might
be needed later. Currently it is lost since the original configuration
is done via the socket interface. So store the modem status and reply
it when the socket has been converted to a TTY.

Signed-off-by: Denis Kenzior <denis.kenzior@trolltech.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-07-14 20:13:52 +02:00
Marcel Holtmann
3241ad820d [Bluetooth] Add timestamp support to L2CAP, RFCOMM and SCO
Enable the common timestamp functionality that the network subsystem
provides for L2CAP, RFCOMM and SCO sockets. It is possible to either
use SO_TIMESTAMP or the IOCTLs to retrieve the timestamp of the
current packet.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-07-14 20:13:50 +02:00
Marcel Holtmann
40be492fe4 [Bluetooth] Export details about authentication requirements
With the Simple Pairing support, the authentication requirements are
an explicit setting during the bonding process. Track and enforce the
requirements and allow higher layers like L2CAP and RFCOMM to increase
them if needed.

This patch introduces a new IOCTL that allows to query the current
authentication requirements. It is also possible to detect Simple
Pairing support in the kernel this way.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-07-14 20:13:50 +02:00
Marcel Holtmann
769be974d0 [Bluetooth] Use ACL config stage to retrieve remote features
The Bluetooth technology introduces new features on a regular basis
and for some of them it is important that the hardware on both sides
support them. For features like Simple Pairing it is important that
the host stacks on both sides have switched this feature on. To make
valid decisions, a config stage during ACL link establishment has been
introduced that retrieves remote features and if needed also the remote
extended features (known as remote host features) before signalling
this link as connected.

This change introduces full reference counting of incoming and outgoing
ACL links and the Bluetooth core will disconnect both if no owner of it
is present. To better handle interoperability during the pairing phase
the disconnect timeout for incoming connections has been increased to
10 seconds. This is five times more than for outgoing connections.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-07-14 20:13:49 +02:00
Marcel Holtmann
41a96212b3 [Bluetooth] Track status of remote Simple Pairing mode
The Simple Pairing process can only be used if both sides have the
support enabled in the host stack. The current Bluetooth specification
has three ways to detect this support.

If an Extended Inquiry Result has been sent during inquiry then it
is safe to assume that Simple Pairing is enabled. It is not allowed
to enable Extended Inquiry without Simple Pairing. During the remote
name request phase a notification with the remote host supported
features will be sent to indicate Simple Pairing support. Also the
second page of the remote extended features can indicate support for
Simple Pairing.

For all three cases the value of remote Simple Pairing mode is stored
in the inquiry cache for later use.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-07-14 20:13:48 +02:00
Marcel Holtmann
333140b57f [Bluetooth] Track status of Simple Pairing mode
The Simple Pairing feature is optional and needs to be enabled by the
host stack first. The Linux kernel relies on the Bluetooth daemon to
either enable or disable it, but at any time it needs to know the
current state of the Simple Pairing mode. So track any changes made
by external entities and store the current mode in the HCI device
structure.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-07-14 20:13:48 +02:00
Marcel Holtmann
0493684ed2 [Bluetooth] Disable disconnect timer during Simple Pairing
During the Simple Pairing process the HCI disconnect timer must be
disabled. The way to do this is by holding a reference count of the
HCI connection. The Simple Pairing process on both sides starts with
an IO Capabilities Request and ends with Simple Pairing Complete.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-07-14 20:13:48 +02:00
Marcel Holtmann
e4e8e37c42 [Bluetooth] Make use of the default link policy settings
The Bluetooth specification supports the default link policy settings
on a per host controller basis. For every new connection the link
manager would then use these settings. It is better to use this instead
of bothering the controller on every connection setup to overwrite the
default settings.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-07-14 20:13:47 +02:00
Marcel Holtmann
a8746417e8 [Bluetooth] Track connection packet type changes
The connection packet type can be changed after the connection has been
established and thus needs to be properly tracked to ensure that the
host stack has always correct and valid information about it.

On incoming connections the Bluetooth core switches the supported packet
types to the configured list for this controller. However the usefulness
of this feature has been questioned a lot. The general consent is that
every Bluetooth host stack should enable as many packet types as the
hardware actually supports and leave the decision to the link manager
software running on the Bluetooth chip.

When running on Bluetooth 2.0 or later hardware, don't change the packet
type for incoming connections anymore. This hardware likely supports
Enhanced Data Rate and thus leave it completely up to the link manager
to pick the best packet type.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-07-14 20:13:46 +02:00
Marcel Holtmann
9719f8afce [Bluetooth] Disconnect when encryption gets disabled
The Bluetooth specification allows to enable or disable the encryption
of an ACL link at any time by either the peer or the remote device. If
a L2CAP or RFCOMM connection requested an encrypted link, they will now
disconnect that link if the encryption gets disabled. Higher protocols
that don't care about encryption (like SDP) are not affected.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-07-14 20:13:45 +02:00
Marcel Holtmann
77db198056 [Bluetooth] Enforce security for outgoing RFCOMM connections
Recent tests with various Bluetooth headsets have shown that some of
them don't enforce authentication and encryption when connecting. All
of them leave it up to the host stack to enforce it. Non of them should
allow unencrypted connections, but that is how it is. So in case the
link mode settings require authentication and/or encryption it will now
also be enforced on outgoing RFCOMM connections. Previously this was
only done for incoming connections.

This support has a small drawback from a protocol level point of view
since the host stack can't really tell with 100% certainty if a remote
side is already authenticated or not. So if both sides are configured
to enforce authentication it will be requested twice. Most Bluetooth
chips are caching this information and thus no extra authentication
procedure has to be triggered over-the-air, but it can happen.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-07-14 20:13:45 +02:00
David S. Miller
79d16385c7 netdev: Move atomic queue state bits into netdev_queue.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:14:46 -07:00
David S. Miller
eb6aafe3f8 pkt_sched: Make qdisc_run take a netdev_queue.
This allows us to use this calling convention all the way down into
qdisc_restart().

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:12:38 -07:00
David S. Miller
052979499c pkt_sched: Add qdisc_tx_is_noop() helper and use in IPV6.
This indicates if the NOOP scheduler is what is active for TX on a
given device.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:01:27 -07:00
David S. Miller
6fa9864b53 net: Clean up explicit ->tx_queue references in link watch.
First, we add a qdisc_tx_changing() helper which returns true if the
qdisc attachment is in transition.

Second, we remove an assertion warning which is of limited value and
is hard to express precisely in a multiqueue environment.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:01:06 -07:00
David S. Miller
3e745dd695 pkt_sched: Add qdisc_all_tx_empty()
This is a helper function, currently used by IRDA.

This is being added so that we can contain and isolate as many
explicit ->tx_queue references in the tree as possible.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:00:25 -07:00
David S. Miller
5aa709954a pkt_sched: Add qdisc_reset_all_tx().
Isolate callers that want to simply reset all the TX qdiscs from the
details of TX queues.

Use this in the ISDN code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 22:59:10 -07:00
David S. Miller
68dfb42798 pkt_sched: Kill stats_lock member of struct Qdisc.
It is always equal to qdisc->dev_queue->lock

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 22:57:31 -07:00
David S. Miller
b0e1e6462d netdev: Move rest of qdisc state into struct netdev_queue
Now qdisc, qdisc_sleeping, and qdisc_list also live there.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 17:42:10 -07:00
David S. Miller
5ce2d488fe pkt_sched: Remove 'dev' member of struct Qdisc.
It can be obtained via the netdev_queue.  So create a helper routine,
qdisc_dev(), to make the transformations nicer looking.

Now, qdisc_alloc() now no longer needs a net_device pointer argument.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 17:06:30 -07:00
David S. Miller
bb949fbd18 netdev: Create netdev_queue abstraction.
A netdev_queue is an entity managed by a qdisc.

Currently there is one RX and one TX queue, and a netdev_queue merely
contains a backpointer to the net_device.

The Qdisc struct is augmented with a netdev_queue pointer as well.

Eventually the 'dev' Qdisc member will go away and we will have the
resulting hierarchy:

	net_device --> netdev_queue --> Qdisc

Also, qdisc_alloc() and qdisc_create_dflt() now take a netdev_queue
pointer argument.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 16:55:56 -07:00
Randy Dunlap
6ef307bc56 mac80211: fix lots of kernel-doc
Fix more than 50 kernel-doc warnings in ieee80211/mac80211 kernel-doc notation.
Fix a few typos also.

Note: Some fields are marked as TBD and need to have their description
corrected.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-08 14:16:03 -04:00
Ron Rindjunsky
429a380571 mac80211: add block ack request capability
This patch adds block ack request capability

Signed-off-by: Ester Kummer <ester.kummer@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-08 10:21:34 -04:00
Pablo Neira Ayuso
b891c5a831 netfilter: nf_conntrack: add allocation flag to nf_conntrack_alloc
ctnetlink does not need to allocate the conntrack entries with GFP_ATOMIC
as its code is executed in user context.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 02:35:55 -07:00
Patrick McHardy
fb0305ce1b net-sched: consolidate default fifo qdisc setup
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 23:40:21 -07:00
Patrick McHardy
6fe1c7a555 net-sched: add dynamically sized qdisc class hash helpers
Currently all qdiscs which allow to create classes uses a fixed sized hash
table with size 16 to hash the classes. This causes a large bottleneck
when using thousands of classes and unbound filters.

Add helpers for dynamically sized class hashes to fix this. The following
patches will convert the qdiscs to use them.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 23:21:31 -07:00
David S. Miller
ea2aca084b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	Documentation/feature-removal-schedule.txt
	drivers/net/wan/hdlc_fr.c
	drivers/net/wireless/iwlwifi/iwl-4965.c
	drivers/net/wireless/iwlwifi/iwl3945-base.c
2008-07-05 23:08:07 -07:00
David S. Miller
f3032be921 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2008-07-05 21:41:53 -07:00
Patrick McHardy
70c03b49b8 vlan: Add GVRP support
Add GVRP support for dynamically registering VLANs with switches.

By default GVRP is disabled because we only support the applicant-only
participant model, which means it should not be enabled on vlans that
are members of a bridge. Since there is currently no way to cleanly
determine that, the user is responsible for enabling it.

The code is pretty small and low impact, its wrapped in a config
option though because it depends on the GARP implementation and
the STP core.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 21:26:57 -07:00
Patrick McHardy
eca9ebac65 net: Add GARP applicant-only participant
Add an implementation of the GARP (Generic Attribute Registration Protocol)
applicant-only participant. This will be used by the following patch to
add GVRP support to the VLAN code.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 21:26:13 -07:00
Patrick McHardy
a19800d704 net: Add STP demux layer
Add small STP demux layer for demuxing STP PDUs based on MAC address.
This is needed to run both GARP and STP in parallel (or even load the
modules) since both use LLC_SAP_BSPAN.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 21:25:39 -07:00
Pavel Emelyanov
ef28d1a20f MIB: add struct net to UDP6_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 21:19:40 -07:00
Pavel Emelyanov
235b9f7ac5 MIB: add struct net to UDP6_INC_STATS_USER
As simple as the patch #1 in this set.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 21:19:20 -07:00
Pavel Emelyanov
0283328e23 MIB: add struct net to UDP_INC_STATS_BH
Two special cases here - one is rxrpc - I put init_net there
explicitly, since we haven't touched this part yet. The second
place is in __udp4_lib_rcv - we already have a struct net there,
but I have to move its initialization above to make it ready
at the "drop" label.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 21:18:48 -07:00
Pavel Emelyanov
629ca23c33 MIB: add struct net to UDP_INC_STATS_USER
Nothing special - all the places already have a struct sock
at hands, so use the sock_net() net.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 21:18:07 -07:00
Denis V. Lunev
e84f84f276 netns: place rt_genid into struct net
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 19:04:32 -07:00
Denis V. Lunev
9f5e97e536 netns: make rt_secret_rebuild timer per namespace
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 19:02:59 -07:00
Denis V. Lunev
39a23e7508 netns: register net.ipv4.route.flush in each namespace
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 19:02:33 -07:00
Denis V. Lunev
ae299fc051 net: add fib_rules_ops to flush_cache method
This is required to pass namespace context into rt_cache_flush called from
->flush_cache.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 19:01:28 -07:00
Denis V. Lunev
76e6ebfb40 netns: add namespace parameter to rt_cache_flush
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 19:00:44 -07:00
Patrick McHardy
ff31ab56c0 net-sched: change tcf_destroy_chain() to clear start of filter list
Pass double tcf_proto pointers to tcf_destroy_chain() to make it
clear the start of the filter list for more consistency.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-01 19:52:38 -07:00
Tomas Winkler
06ff47bc95 mac80211: add spectrum capabilities
This patch add spectrum capability and required information
elements to association request providing AP has requested it and
it is supported by the driver

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-30 17:37:34 -04:00
Emmanuel Grumbach
23976efedd mac80211: don't accept WEP keys other than WEP40 and WEP104
This patch makes mac80211 refuse a WEP key whose length is not WEP40 nor
WEP104.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-30 15:43:53 -04:00
David S. Miller
28f49d8fec Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2008-06-28 22:57:58 -07:00
David S. Miller
1b63ba8a86 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/iwlwifi/iwl4965-base.c
2008-06-28 01:19:40 -07:00
Ivo van Doorn
428da76523 mac80211: Add RTNL warning for workqueue
The workqueue provided by mac80211 should not be used for
scheduled tasks that acquire the RTNL lock. This could be done
when the driver uses the function ieee80211_iterate_active_interfaces()
within the scheduled work. Such behavior will end in locking
dependencies problems when an interface is being removed.

This patch will add a notification about the RTNL locking and
the mac80211 workqueue to prevent driver developers from
blindly using it.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27 09:09:20 -04:00
Luis R. Rodriguez
ffd7891dc9 mac80211: Let drivers have access to TKIP key offets for TX and RX MIC
Some drivers may want to to use the TKIP key offsets for TX and RX
MIC so lets move this out. Lets also clear up a bit how this is used
internally in mac80211.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27 09:09:17 -04:00
John W. Linville
1839cea91e Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/wireless-2.6 2008-06-25 15:17:58 -04:00
Eric W. Biederman
b9f75f45a6 netns: Don't receive new packets in a dead network namespace.
Alexey Dobriyan <adobriyan@gmail.com> writes:
> Subject: ICMP sockets destruction vs ICMP packets oops

> After icmp_sk_exit() nuked ICMP sockets, we get an interrupt.
> icmp_reply() wants ICMP socket.
>
> Steps to reproduce:
>
> 	launch shell in new netns
> 	move real NIC to netns
> 	setup routing
> 	ping -i 0
> 	exit from shell
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> IP: [<ffffffff803fce17>] icmp_sk+0x17/0x30
> PGD 17f3cd067 PUD 17f3ce067 PMD 0 
> Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU 0 
> Modules linked in: usblp usbcore
> Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4
> RIP: 0010:[<ffffffff803fce17>]  [<ffffffff803fce17>] icmp_sk+0x17/0x30
> RSP: 0018:ffffffff8057fc30  EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900
> RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800
> RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815
> R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28
> R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800
> FS:  0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0)
> Stack:  0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4
>  ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246
>  000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436
> Call Trace:
>  <IRQ>  [<ffffffff803fcfe4>] icmp_reply+0x44/0x1e0
>  [<ffffffff803d3a0a>] ? ip_route_input+0x23a/0x1360
>  [<ffffffff803fd645>] icmp_echo+0x65/0x70
>  [<ffffffff803fd300>] icmp_rcv+0x180/0x1b0
>  [<ffffffff803d6d84>] ip_local_deliver+0xf4/0x1f0
>  [<ffffffff803d71bb>] ip_rcv+0x33b/0x650
>  [<ffffffff803bb16a>] netif_receive_skb+0x27a/0x340
>  [<ffffffff803be57d>] process_backlog+0x9d/0x100
>  [<ffffffff803bdd4d>] net_rx_action+0x18d/0x250
>  [<ffffffff80237be5>] __do_softirq+0x75/0x100
>  [<ffffffff8020c97c>] call_softirq+0x1c/0x30
>  [<ffffffff8020f085>] do_softirq+0x65/0xa0
>  [<ffffffff80237af7>] irq_exit+0x97/0xa0
>  [<ffffffff8020f198>] do_IRQ+0xa8/0x130
>  [<ffffffff80212ee0>] ? mwait_idle+0x0/0x60
>  [<ffffffff8020bc46>] ret_from_intr+0x0/0xf
>  <EOI>  [<ffffffff80212f2c>] ? mwait_idle+0x4c/0x60
>  [<ffffffff80212f23>] ? mwait_idle+0x43/0x60
>  [<ffffffff8020a217>] ? cpu_idle+0x57/0xa0
>  [<ffffffff8040f380>] ? rest_init+0x70/0x80
> Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
> 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 <48> 8b 04 c3 48 83 c4 08
> 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00
> RIP  [<ffffffff803fce17>] icmp_sk+0x17/0x30
>  RSP <ffffffff8057fc30>
> CR2: 0000000000000000
> ---[ end trace ea161157b76b33e8 ]---
> Kernel panic - not syncing: Aiee, killing interrupt handler!

Receiving packets while we are cleaning up a network namespace is a
racy proposition. It is possible when the packet arrives that we have
removed some but not all of the state we need to fully process it.  We
have the choice of either playing wack-a-mole with the cleanup routines
or simply dropping packets when we don't have a network namespace to
handle them.

Since the check looks inexpensive in netif_receive_skb let's just
drop the incoming packets.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-20 22:16:51 -07:00
YOSHIFUJI Hideaki
f630e43a21 ipv6: Drop packets for loopback address from outside of the box.
[ Based upon original report and patch by Karsten Keil.  Karsten
  has verified that this fixes the TAHI test case "ICMPv6 test
  v6LC.5.1.2 Part F". -DaveM ]

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:33:57 -07:00
Vlad Yasevich
2e3216cd54 sctp: Follow security requirement of responding with 1 packet
RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts

When an SCTP stack receives a packet containing multiple control or
DATA chunks and the processing of the packet requires the sending of
multiple chunks in response, the sender of the response chunk(s) MUST
NOT send more than one packet.  If bundling is supported, multiple
response chunks that fit into a single packet MAY be bundled together
into one single response packet.  If bundling is not supported, then
the sender MUST NOT send more than one response chunk and MUST
discard all other responses.  Note that this rule does NOT apply to a
SACK chunk, since a SACK chunk is, in itself, a response to DATA and
a SACK does not require a response of more DATA.

We implement this by not servicing our outqueue until we reach the end
of the packet.  This enables maximum bundling.  We also identify
'response' chunks and make sure that we only send 1 packet when sending
such chunks.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:08:18 -07:00
David S. Miller
0344f1c66b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	net/mac80211/tx.c
2008-06-19 16:00:04 -07:00
David S. Miller
972692e0db net: Add sk_set_socket() helper.
In order to more easily grep for all things that set
sk->sk_socket, add sk_set_socket() helper inline function.

Suggested (although only half-seriously) by Evgeniy Polyakov.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-17 22:41:38 -07:00
Eric Dumazet
cb61cb9b8b udp: sk_drops handling
In commits 33c732c361 ([IPV4]: Add raw
drops counter) and a92aa318b4 ([IPV6]:
Add raw drops counter), Wang Chen added raw drops counter for
/proc/net/raw & /proc/net/raw6

This patch adds this capability to UDP sockets too (/proc/net/udp &
/proc/net/udp6).

This means that 'RcvbufErrors' errors found in /proc/net/snmp can be also
be examined for each udp socket.

# grep Udp: /proc/net/snmp
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 23971006 75 899420 16390693 146348 0

# cat /proc/net/udp
 sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt  ---
uid  timeout inode ref pointer drops
 75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 00000000  ---
  0        0 2358 2 ffff81082a538c80 0
111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000  ---
  0        0 2286 2 ffff81042dd35c80 146348

In this example, only port 111 (0x006F) was flooded by messages that
user program could not read fast enough. 146348 messages were lost.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-17 21:04:56 -07:00
Bernard Pidoux
fe2c802ab6 rose: improving AX25 routing frames via ROSE network
ROSE network is organized through nodes connected via hamradio or Internet.
AX25 packet radio frames sent to a remote ROSE address destination are routed
through these nodes.

Without the present patch, automatic routing mechanism did not work optimally
due to an improper parameter checking.

rose_get_neigh() function is called either by rose_connect() or by
rose_route_frame().

In the case of a call from rose_connect(), f0 timer is checked to find if a connection
is already pending. In that case it returns the address of the neighbour, or returns a NULL otherwise.

When called by rose_route_frame() the purpose was to route a packet AX25 frame
through an adjacent node given a destination rose address.
However, in that case, t0 timer checked does not indicate if the adjacent node
is actually connected even if the timer is not null. Thus, for each frame sent, the
function often tried to start a new connexion even if the adjacent node was already connected.

The patch adds a "new" parameter that is true when the function is called by
rose route_frame().
This instructs rose_get_neigh() to check node parameter "restarted". 
If restarted is true it means that the route to the destination address is opened via a neighbour
node already connected.
If "restarted" is false the function returns a NULL.
In that case the calling function will initiate a new connection as before.

This results in a fast routing of frames, from nodes to nodes, until
destination is reached, as originaly specified by ROSE protocole.

Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-17 17:08:32 -07:00
Patrick McHardy
68b80f1138 netfilter: nf_nat: fix RCU races
Fix three ct_extend/NAT extension related races:

- When cleaning up the extension area and removing it from the bysource hash,
  the nat->ct pointer must not be set to NULL since it may still be used in
  a RCU read side

- When replacing a NAT extension area in the bysource hash, the nat->ct
  pointer must be assigned before performing the replacement

- When reallocating extension storage in ct_extend, the old memory must
  not be freed immediately since it may still be used by a RCU read side

Possibly fixes https://bugzilla.redhat.com/show_bug.cgi?id=449315
and/or http://bugzilla.kernel.org/show_bug.cgi?id=10875

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-17 15:51:47 -07:00
David S. Miller
338db08551 net: Kill SOCK_SLEEP_PRE and SOCK_SLEEP_POST, no users.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-17 01:09:00 -07:00
David S. Miller
8ce9c6ede1 sctp: Kill SCTP_SOCK_SLEEP_{PRE,POST}, unused.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-17 00:40:36 -07:00
David S. Miller
ccc580571c wext: Emit event stream entries correctly when compat.
Three major portions to this change:

1) Add IW_EV_COMPAT_LCP_LEN, IW_EV_COMPAT_POINT_OFF,
   and IW_EV_COMPAT_POINT_LEN helper defines.

2) Delete iw_stream_check_add_*(), they are unused.

3) Add iw_request_info argument to iwe_stream_add_*(), and use it to
   size the event and pointer lengths correctly depending upon whether
   IW_REQUEST_FLAG_COMPAT is set or not.

4) The mechanical transformations to the drivers and wireless stack
   bits to get the iw_request_info passed down into the routines
   modified in #3.  Also, explicit references to IW_EV_LCP_LEN are
   replaced with iwe_stream_lcp_len(info).

With a lot of help and bug fixes from Masakazu Mokuno.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 18:50:49 -07:00
David S. Miller
0f5cabba49 wext: Create IW_REQUEST_FLAG_COMPAT and set it as needed.
Now low-level WEXT ioctl handlers can do compat handling
when necessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 18:34:49 -07:00
David S. Miller
87de87d5e4 wext: Dispatch and handle compat ioctls entirely in net/wireless/wext.c
Next we can kill the hacks in fs/compat_ioctl.c and also
dispatch compat ioctls down into the driver and 80211 protocol
helper layers in order to handle iw_point objects embedded in
stream replies which need to be translated.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 18:32:46 -07:00
Pavel Emelyanov
0b4419162a netns: introduce the net_hash_mix "salt" for hashes
There are many possible ways to add this "salt", thus I made this
patch to be the last in the series to change it if required.

Currently I propose to use the struct net pointer itself as this 
salt, but since this pointer is most often cache-line aligned, shift 
this right to eliminate the bits, that are most often zeroed.

After this, simply add this mix to prepared hashfn-s.

For CONFIG_NET_NS=n case this salt is 0 and no changes in hashfn
appear.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 17:14:11 -07:00
Pavel Emelyanov
33de014c63 inet6: add struct net argument to inet6_ehashfn
Same as for inet_hashfn, prepare its ipv6 incarnation.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 17:13:48 -07:00
Pavel Emelyanov
9f26b3add3 inet: add struct net argument to inet_ehashfn
Although this hash takes addresses into account, the ehash chains
can also be too long when, for instance, communications via lo occur.
So, prepare the inet_hashfn to take struct net into account.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 17:13:27 -07:00
Pavel Emelyanov
2086a65078 inet: add struct net argument to inet_lhashfn
Listening-on-one-port sockets in many namespaces produce long 
chains in the listening_hash-es, so prepare the inet_lhashfn to 
take struct net into account.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 17:13:08 -07:00
Pavel Emelyanov
7f635ab71e inet: add struct net argument to inet_bhashfn
Binding to some port in many namespaces may create too long
chains in bhash-es, so prepare the hashfn to take struct net
into account.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 17:12:49 -07:00
David S. Miller
942e7b102a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2008-06-14 17:15:39 -07:00
Brian Haley
7d06b2e053 net: change proto destroy method to return void
Change struct proto destroy function pointer to return void.  Noticed
by Al Viro.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-14 17:04:49 -07:00
Harvey Harrison
6693be7124 mac80211: add utility function to get header length
Take a __le16 directly rather than a host-endian value.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-14 12:18:13 -04:00
Harvey Harrison
c9c6950c14 mac80211: make ieee80211_get_hdrlen_from_skb return unsigned
Many callers already expect it to.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-14 12:18:12 -04:00
David S. Miller
4ae127d1b6 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/smc911x.c
2008-06-13 20:52:39 -07:00
Richard Kennedy
875ec4333b udp: reorder udp_iter_state to remove padding on 64bit builds
reorder udp_iter_state to remove padding on 64bit builds

shrinks from 24 to 16 bytes, moving to a smaller slab when
CONFIG_NET_NS is undefined & seq_net_private = {}

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-13 03:03:17 -07:00
David S. Miller
ec0a196626 tcp: Revert 'process defer accept as established' changes.
This reverts two changesets, ec3c0982a2
("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
the follow-on bug fix 9ae27e0adb
("tcp: Fix slab corruption with ipv6 and tcp6fuzz").

This change causes several problems, first reported by Ingo Molnar
as a distcc-over-loopback regression where connections were getting
stuck.

Ilpo Järvinen first spotted the locking problems.  The new function
added by this code, tcp_defer_accept_check(), only has the
child socket locked, yet it is modifying state of the parent
listening socket.

Fixing that is non-trivial at best, because we can't simply just grab
the parent listening socket lock at this point, because it would
create an ABBA deadlock.  The normal ordering is parent listening
socket --> child socket, but this code path would require the
reverse lock ordering.

Next is a problem noticed by Vitaliy Gusev, he noted:

----------------------------------------
>--- a/net/ipv4/tcp_timer.c
>+++ b/net/ipv4/tcp_timer.c
>@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
> 		goto death;
> 	}
>
>+	if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
>+		tcp_send_active_reset(sk, GFP_ATOMIC);
>+		goto death;

Here socket sk is not attached to listening socket's request queue. tcp_done()
will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
release this sk) as socket is not DEAD. Therefore socket sk will be lost for
freeing.
----------------------------------------

Finally, Alexey Kuznetsov argues that there might not even be any
real value or advantage to these new semantics even if we fix all
of the bugs:

----------------------------------------
Hiding from accept() sockets with only out-of-order data only
is the only thing which is impossible with old approach. Is this really
so valuable? My opinion: no, this is nothing but a new loophole
to consume memory without control.
----------------------------------------

So revert this thing for now.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-12 16:34:35 -07:00
David S. Miller
e6e30add6b Merge branch 'net-next-2.6-misc-20080612a' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-next 2008-06-11 22:33:59 -07:00
Adrian Bunk
0b04082995 net: remove CVS keywords
This patch removes CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-11 21:00:38 -07:00
YOSHIFUJI Hideaki
9501f97229 tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().
As we do for other socket/timewait-socket specific parameters,
let the callers pass appropriate arguments to
tcp_v{4,6}_do_calc_md5_hash().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 03:46:30 +09:00
YOSHIFUJI Hideaki
8d26d76dd4 tcp md5sig: Share most of hash calcucaltion bits between IPv4 and IPv6.
We can share most part of the hash calculation code because
the only difference between IPv4 and IPv6 is their pseudo headers.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:20 +09:00
YOSHIFUJI Hideaki
076fb72233 tcp md5sig: Remove redundant protocol argument.
Protocol is always TCP, so remove useless protocol argument.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:19 +09:00
YOSHIFUJI Hideaki
7d5d5525bd tcp md5sig: Share MD5 Signature option parser between IPv4 and IPv6.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:18 +09:00
Benjamin Thery
3de232554a ipv6 netns: Address labels per namespace
This pacth makes IPv6 address labels per network namespace.
It keeps the global label tables, ip6addrlbl_table, but
adds a 'net' member to each ip6addrlbl_entry.
This new member is taken into account when matching labels.

Changelog
=========
* v1: Initial version
* v2:
  * Minize the penalty when network namespaces are not configured:
      *  the 'net' member is added only if CONFIG_NET_NS is
         defined. This saves space when network namespaces are not
         configured.
      * 'net' value is retrieved with the inlined function
         ip6addrlbl_net() that always return &init_net when
         CONFIG_NET_NS is not defined.
  * 'net' member in ip6addrlbl_entry renamed to the less generic
    'lbl_net' name (helps code search).

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:15 +09:00
Rami Rosen
0399e5f07a ipv6 addrconf: Remove IFA_GLOBAL definition from include/net/if_inet6.h.
This patches removes IFA_GLOBAL definition from linux/include/net/if_inet6.h
as it is unused.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:13 +09:00
Arnaldo Carvalho de Melo
ce4a7d0d48 inet{6}_request_sock: Init ->opt and ->pktopts in the constructor
Wei Yongjun noticed that we may call reqsk_free on request sock objects where
the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc
where we initialize ->opt to NULL and set ->pktopts to NULL in
inet6_reqsk_alloc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-10 12:39:35 -07:00
Rami Rosen
45d465bc23 ipv4: Remove unused declaration from include/net/tcp.h.
- The tcp_unhash() method in /include/net/tcp.h is no more needed, as the
unhash method in tcp_prot structure is now inet_unhash (instead of
tcp_unhash in the
past); see tcp_prot structure in net/ipv4/tcp_ipv4.c.

- So, this patch removes tcp_unhash() declaration from include/net/tcp.h

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-10 12:37:42 -07:00
David S. Miller
65b53e4cc9 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/tg3.c
	drivers/net/wireless/rt2x00/rt2x00dev.c
	net/mac80211/ieee80211_i.h
2008-06-10 02:22:26 -07:00
David S. Miller
788c0a5316 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-next-2.6
Conflicts:

	drivers/net/ps3_gelic_wireless.c
	drivers/net/wireless/libertas/main.c
2008-06-10 01:54:31 -07:00
Rami Rosen
7bcd978e8c netfilter: nf_conntrack: remove unnecessary function declaration
This patch removes nf_ct_ipv4_ct_gather_frags() method declaration from
include/net/netfilter/ipv4/nf_conntrack_ipv4.h, since it is unused in
the Linux kernel.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 16:00:22 -07:00
Fabian Hugelshofer
718d4ad98e netfilter: nf_conntrack: properly account terminating packets
Currently the last packet of a connection isn't accounted when its causing
abnormal termination.

Introduces nf_ct_kill_acct() which increments the accounting counters on
conntrack kill. The new function was necessary, because there are calls
to nf_ct_kill() which don't need accounting:

nf_conntrack_proto_tcp.c line ~847:
Kills ct and returns NF_REPEAT. We don't want to count twice.

nf_conntrack_proto_tcp.c line ~880:
Kills ct and returns NF_DROP. I think we don't want to count dropped
packets.

nf_conntrack_netlink.c line ~824:
As far as I can see ctnetlink_del_conntrack() is used to destroy a
conntrack on behalf of the user. There is an sk_buff, but I don't think
this is an actual packet. Incrementing counters here is therefore not
desired.

Signed-off-by: Fabian Hugelshofer <hugelshofer2006@gmx.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:59:40 -07:00
Patrick McHardy
51091764f2 netfilter: nf_conntrack: add nf_ct_kill()
Encapsulate the common

	if (del_timer(&ct->timeout))
		ct->timeout.function((unsigned long)ct)

sequence in a new function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:59:06 -07:00
James Morris
17e6e59f0a netfilter: ip6_tables: add ip6tables security table
This is a port of the IPv4 security table for IPv6.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:58:05 -07:00
James Morris
560ee653b6 netfilter: ip_tables: add iptables security table for mandatory access control rules
The following patch implements a new "security" table for iptables, so
that MAC (SELinux etc.) networking rules can be managed separately to
standard DAC rules.

This is to help with distro integration of the new secmark-based
network controls, per various previous discussions.

The need for a separate table arises from the fact that existing tools
and usage of iptables will likely clash with centralized MAC policy
management.

The SECMARK and CONNSECMARK targets will still be valid in the mangle
table to prevent breakage of existing users.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:57:24 -07:00
Vlad Yasevich
b9031d9d87 sctp: Fix ECN markings for IPv6
Commit e9df2e8fd8 ("[IPV6]: Use
appropriate sock tclass setting for routing lookup.") also changed the
way that ECN capable transports mark this capability in IPv6.  As a
result, SCTP was not marking ECN capablity because the traffic class
was never set.  This patch brings back the markings for IPv6 traffic.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 12:40:15 -07:00
Vlad Yasevich
62aeaff5cc sctp: Start T3-RTX timer when fast retransmitting lowest TSN
When we are trying to fast retransmit the lowest outstanding TSN, we
need to restart the T3-RTX timer, so that subsequent timeouts will
correctly tag all the packets necessary for retransmissions.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 12:39:11 -07:00
Vlad Yasevich
a646523481 sctp: Correctly implement Fast Recovery cwnd manipulations.
Correctly keep track of Fast Recovery state and do not reduce
congestion window multiple times during sucht state.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 12:38:43 -07:00
David S. Miller
aed5a833fb Merge branch 'net-2.6-misc-20080605a' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-fix 2008-06-04 12:10:21 -07:00
Denis V. Lunev
36d926b94a [IPV6]: inet_sk(sk)->cork.opt leak
IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data
actually. In this case ip_flush_pending_frames should be called instead
of ip6_flush_pending_frames.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:38 +09:00
YOSHIFUJI Hideaki
91e1908f56 [IPV6] NETNS: Handle ancillary data in appropriate namespace.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:36 +09:00
YOSHIFUJI Hideaki
4bed72e4f5 [IPV6] ADDRCONF: Allow longer lifetime on 64bit archs.
- Allow longer lifetimes (>= 0x7fffffff/HZ) on 64bit archs
  by using unsigned long.
- Shadow this arithmetic overflow workaround by introducing
  helper functions: addrconf_timeout_fixup() and
  addrconf_finite_timeout().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:34 +09:00
YOSHIFUJI Hideaki
e51171019b [SCTP]: Fix NULL dereference of asoc.
Commit 7cbca67c07 ("[IPV6]: Support
Source Address Selection API (RFC5014)") introduced NULL dereference
of asoc to sctp_v6_get_saddr in net/sctp/ipv6.c.
Pointed out by Johann Felix Soden <johfel@users.sourceforge.net>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:30 +09:00
Thomas Graf
bc3ed28caa netlink: Improve returned error codes
Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and
nla_nest_cancel() void functions.

Return -EMSGSIZE instead of -1 if the provided message buffer is not
big enough.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-03 16:36:54 -07:00
Emmanuel Grumbach
9306102ea5 mac80211: allow disable FAT in specific configurations
This patch allows to disable FAT channel in specific configurations.

For example the configuration (8, +1), (primary channel 8, extension
channel 12) isn't permitted in U.S., but (8, -1), (primary channel 8,
extension channel 4) is. When FAT channel configuration is not
permitted, FAT channel should be reported as not supported in the
capabilities of the HT IE in association request. And sssociation is
performed on 20Mhz channel.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-03 15:00:26 -04:00
David S. Miller
43154d08d6 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/cpmac.c
	net/mac80211/mlme.c
2008-05-25 23:26:10 -07:00
Thomas Graf
b9a2f2e450 netlink: Fix nla_parse_nested_compat() to call nla_parse() directly
The purpose of nla_parse_nested_compat() is to parse attributes which
contain a struct followed by a stream of nested attributes.  So far,
it called nla_parse_nested() to parse the stream of nested attributes
which was wrong, as nla_parse_nested() expects a container attribute
as data which holds the attribute stream.  It needs to call
nla_parse() directly while pointing at the next possible alignment
point after the struct in the beginning of the attribute.

With this patch, I can no longer reproduce the reported leftover
warnings.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-22 10:48:59 -07:00
Johannes Berg
e253008360 mac80211: use multi-queue master netdevice
This patch updates mac80211 and drivers to be multi-queue aware and
use that instead of the internal queue mapping. Also does a number
of cleanups in various pieces of the code that fall out and reduces
internal mac80211 state size.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-21 21:48:14 -04:00
Johannes Berg
eefce91a38 mac80211: dont allow fragmentation and requeuing on A-MPDU queues
There really is no reason for a driver to reject a frame on
an A-MPDU queue when it can stop that queue for any period
of time and is given frames one by one. Hence, disallow it
with a big warning and reduce mac80211-internal state.

Also add a warning when we try to fragment a frame destined
for an A-MPDU queue and drop it, the actual bug needs to be
fixed elsewhere but I'm not exactly sure how to yet.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-21 21:48:13 -04:00
Johannes Berg
e039fa4a41 mac80211: move TX info into skb->cb
This patch converts mac80211 and all drivers to have transmit
information and status in skb->cb rather than allocating extra
memory for it and copying all the data around. To make it fit,
a union is used where only data that is necessary for all steps
is kept outside of the union.

A number of fixes were done by Ivo, as well as the rt2x00 part
of this patch.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-21 21:48:11 -04:00
Johannes Berg
2e92e6f2c5 mac80211: use rate index in TX control
This patch modifies struct ieee80211_tx_control to give band
info and the rate index (instead of rate pointers) to drivers.
This mostly serves to reduce the TX control structure size to
make it fit into skb->cb so that the fragmentation code can
put it there and we can think about passing it to drivers that
way in the future.

The rt2x00 driver update was done by Ivo, thanks.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-21 21:48:09 -04:00
Johannes Berg
36d6825b91 mac80211: let drivers wake but not start queues
Having drivers start queues is just confusing, their ->start()
callback can block and do whatever is necessary, so let mac80211
start queues and have drivers wake queues when necessary (to get
packets flowing again right away.)

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-21 21:48:08 -04:00
Pavel Emelyanov
3dca02af38 ip6tnl: Use on-device stats instead of private ones.
This tunnel uses its own private structure and requires separate
patch to switch from private stats to on-device ones.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-21 14:17:05 -07:00
Pavel Emelyanov
f56dd017c3 tunnels: Remove stat member from ip_tunnel struct.
All users already use on-device statistics, so this field can be
safely removed.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-21 14:16:36 -07:00
David S. Miller
44dc19c829 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6 2008-05-19 16:29:40 -07:00
YOSHIFUJI Hideaki
0686caa35e ndisc: Add missing strategies for per-device retrans timer/reachable time settings.
Noticed from Al Viro <viro@ftp.linux.org.uk> via David Miller
<davem@davemloft.net>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-19 16:25:42 -07:00
Pavel Emelyanov
d62c612ef8 netns: Introduce sysctl root for read-only net sysctls.
This one stores all ctl-heads in one list and restricts the
permissions not give write access to non-init net namespaces.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-19 13:45:33 -07:00
Ivo van Doorn
2f561feb38 mac80211: Add RTNL version of ieee80211_iterate_active_interfaces
Since commit e38bad4766
	mac80211: make ieee80211_iterate_active_interfaces not need rtnl
rt2500usb and rt73usb broke down due to attempting register access
in atomic context (which is not possible for USB hardware).

This patch restores ieee80211_iterate_active_interfaces() to use RTNL lock,
and provides the non-RTNL version under a new name:
	ieee80211_iterate_active_interfaces_atomic()

So far only rt2x00 uses ieee80211_iterate_active_interfaces(), and those
drivers require the RTNL version of ieee80211_iterate_active_interfaces().
Since they already call that function directly, this patch will automatically
fix the USB rt2x00 drivers.

v2: Rename ieee80211_iterate_active_interfaces_rtnl

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-16 17:15:09 -04:00
David S. Miller
f42a44494b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2008-05-15 00:52:37 -07:00
David S. Miller
63fe46da9c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/iwlwifi/iwl-4965-rs.c
	drivers/net/wireless/rt2x00/rt61pci.c
2008-05-15 00:34:44 -07:00
Eric Van Hensbergen
887b3ece65 9p: fix error path during early mount
There was some cleanup issues during early mount which would trigger
a kernel bug for certain types of failure.  This patch reorganizes the
cleanup to get rid of the bad behavior.

This also merges the 9pnet and 9pnet_fd modules for the purpose of
configuration and initialization.  Keeping the fd transport separate
from the core 9pnet code seemed like a good idea at the time, but in
practice has caused more harm and confusion than good.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-05-14 19:23:27 -05:00
Eric Van Hensbergen
ee443996a3 9p: Documentation updates
The kernel-doc comments of much of the 9p system have been in disarray since
reorganization.  This patch fixes those problems, adds additional documentation
and a template book which collects the 9p information.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-05-14 19:23:25 -05:00
Bruno Randolf
566bfe5a8b mac80211: use hardware flags for signal/noise units
trying to clean up the signal/noise code. the previous code in mac80211 had
confusing names for the related variables, did not have much definition of
what units of signal and noise were provided and used implicit mechanisms from
the wireless extensions.

this patch introduces hardware capability flags to let the hardware specify
clearly if it can provide signal and noise level values and which units it can
provide. this also anticipates possible new units like RCPI in the future.

for signal:

  IEEE80211_HW_SIGNAL_UNSPEC - unspecified, unknown, hw specific
  IEEE80211_HW_SIGNAL_DB     - dB difference to unspecified reference point
  IEEE80211_HW_SIGNAL_DBM    - dBm, difference to 1mW

for noise we currently only have dBm:

  IEEE80211_HW_NOISE_DBM     - dBm, difference to 1mW

if IEEE80211_HW_SIGNAL_UNSPEC or IEEE80211_HW_SIGNAL_DB is used the driver has
to provide the maximum value (max_signal) it reports in order for applications
to make sense of the signal values.

i tried my best to find out for each driver what it can provide and update it
but i'm not sure (?) for some of them and used the more conservative guess in
doubt. this can be fixed easily after this patch has been merged by changing
the hardware flags of the driver.

DRIVER          SIGNAL    MAX	NOISE   QUAL
-----------------------------------------------------------------
adm8211         unspec(?) 100   n/a     missing
at76_usb        unspec(?) (?)   unused  missing
ath5k           dBm             dBm     percent rssi
b43legacy       dBm             dBm     percent jssi(?)
b43             dBm             dBm     percent jssi(?)
iwl-3945        dBm             dBm     percent snr+more
iwl-4965        dBm             dBm     percent snr+more
p54             unspec    127   n/a     missing
rt2x00          dBm	        n/a     percent rssi+tx/rx frame success
  rt2400        dBm             n/a
  rt2500pci     dBm             n/a
  rt2500usb     dBm             n/a
  rt61pci       dBm             n/a
  rt73usb       dBm             n/a
rtl8180         unspec(?) 65    n/a     (?)
rtl8187         unspec(?) 65    (?)     noise(?)
zd1211          dB(?)     100   n/a     percent

drivers/net/wireless/ath5k/base.c:      Changes-licensed-under: 3-Clause-BSD

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-14 16:29:49 -04:00
Graf Yang
332223831e irda: Fix a misalign access issue. (v2)
Replace u16ho with put/get_unaligned functions

Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-13 23:25:57 -07:00
Allan Stephens
7ef43ebaa5 tipc: Fix race condition when creating socket or native port
This patch eliminates the (very remote) chance of a crash resulting
from a partially initialized socket or native port unexpectedly
receiving a message.  Now, during the creation of a socket or native
port, the underlying generic port's lock is not released until all
initialization required to handle incoming messages has been done.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-12 15:42:28 -07:00
David S. Miller
4951704b4e syncppp: Fix crashes.
The syncppp layer wants a mid-level netdev private pointer.

It was using netdev->priv but that only worked by accident,
and thus this scheme was broken when the device private
allocation strategy changed.

Add a proper mid-layer private pointer for uses like this,
update syncppp and all users, and remove the HDLC_PPP broken
tag from drivers/net/wan/Kconfig

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-12 03:29:11 -07:00
Neil Horman
20c2c1fd6c sctp: add sctp/remaddr table to complete RFC remote address table OID
Add support for RFC3873 remote address table OID.

      +--(5) sctpAssocRemAddrTable
      |   |
      |   |--(-) sctpAssocId (shared index)
      |   |
      |   +--(1) sctpAssocRemAddrType (index)
      .   |
      .   +--(2) sctpAssocRemAddr (index)
      .   |
          +--(3) sctpAssocRemAddrActive
          |
          +--(4) sctpAssocRemAddrHBActive
          |
          +--(5) sctpAssocRemAddrRTO
          |
          +--(6) sctpAssocRemAddrMaxPathRtx
          |
          +--(7) sctpAssocRemAddrRtx
          |
          +--(8) sctpAssocRemAddrStartTime

This patch places all the requsite data in /proc/net/sctp/remaddr.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-09 15:14:50 -07:00
Vlad Yasevich
88a0a948e7 sctp: Support the new specification of sctp_connectx()
The specification of sctp_connectx() has been changed to return
an association id.  We've added a new socket option that will
return the association id as the return value from the setsockopt()
call.  The library that implements sctp_connectx() interface will
implement both socket options.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-09 15:14:11 -07:00
Wei Yongjun
d364d9276b sctp: Bring SCTP_DELAYED_ACK socket option into API compliance
Brings delayed_ack socket option set/get into line with the latest ietf
socket extensions API draft, while maintaining backwards compatibility.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-09 15:13:26 -07:00
Johannes Berg
e100bb64bf mac80211: QoS related cleanups
This
 * makes the queue number passed to drivers a u16
   (as it will be with skb_get_queue_mapping)
 * removes the useless queue number defines
 * splits hw->queues into hw->queues/ampdu_queues
 * removes the debugfs files for per-queue counters
 * removes some dead QoS code
 * removes the beacon queue configuration for IBSS
   so that the drivers now never get a queue number
   bigger than (hw->queues + hw->ampdu_queues - 1)
   for tx and only in the range 0..hw->queues-1 for
   conf_tx.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-07 15:02:26 -04:00
Johannes Berg
36fc6757fe mac80211: remove queue info from ieee80211_tx_status
The queue info in struct ieee80211_tx_status is never used.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-07 15:02:26 -04:00
Johannes Berg
57ffc589a9 mac80211: clean up get_tx_stats callback
The callback takes a ieee80211_tx_queue_stats with a contained
array of ieee80211_tx_queue_stats_data, remove the former, rename
the latter to ieee80211_tx_queue_stats and make tx_stats() take
the array directly.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-07 15:02:26 -04:00
Adrian Bunk
7eafd25d95 remove ieee80211_wx_{get,set}_auth()
After the bcm43xx removal ieee80211_wx_{get,set}_auth() were no longer
used.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-07 15:02:14 -04:00
Adrian Bunk
c12cf21097 remove ieee80211_tx_frame()
After the softmac removal ieee80211_tx_frame() was no longer used.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-07 15:02:14 -04:00
Ivo van Doorn
c6adbd2158 mac80211: Add IEEE80211_KEY_FLAG_PAIRWISE
This adds a new flag to the ieee80211_key_conf structure.
This flag will inform the driver the key is pairwise rather then
a shared key.

This is important for drivers who support both types of keys,
and need to be informed which type of key this is. Alternative
would be drivers checking the address argument of set_key(),
but it will be safer when mac80211 is more explicit.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-07 15:02:11 -04:00
Ivo van Doorn
1c01442058 mac80211: Replace ieee80211_tx_control->key_idx with ieee80211_key_conf
The hw_key_idx inside the ieee80211_key_conf structure does
not provide all the information drivers might need to perform
hardware encryption.

This is in particular true for rt2x00 who needs to know the
key algorithm and whether it is a shared or pairwise key.

By passing the ieee80211_key_conf pointer it assures us that
drivers can make full use of all information that it should know
about a particular key.

Additionally this patch updates all drivers to grab the hw_key_idx from
the ieee80211_key_conf structure.

v2: Removed bogus u16 cast
v3: Add warning about ieee80211_tx_control pointers
v4: Update warning about ieee80211_tx_control pointers

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-07 15:02:11 -04:00
Satoru SATOH
0bbeafd011 ip: Make use of the inline function dst_metric_locked()
Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-04 22:12:43 -07:00
Marcin Slusarz
41fef0ee7b xfrm: convert empty xfrm_audit_* macros to functions
it removes these warnings when CONFIG_AUDITSYSCALL is unset:

net/xfrm/xfrm_user.c: In function 'xfrm_add_sa':
net/xfrm/xfrm_user.c:412: warning: unused variable 'sid'
net/xfrm/xfrm_user.c:411: warning: unused variable 'sessionid'
net/xfrm/xfrm_user.c:410: warning: unused variable 'loginuid'
net/xfrm/xfrm_user.c: In function 'xfrm_del_sa':
net/xfrm/xfrm_user.c:485: warning: unused variable 'sid'
net/xfrm/xfrm_user.c:484: warning: unused variable 'sessionid'
net/xfrm/xfrm_user.c:483: warning: unused variable 'loginuid'
net/xfrm/xfrm_user.c: In function 'xfrm_add_policy':
net/xfrm/xfrm_user.c:1132: warning: unused variable 'sid'
net/xfrm/xfrm_user.c:1131: warning: unused variable 'sessionid'
net/xfrm/xfrm_user.c:1130: warning: unused variable 'loginuid'
net/xfrm/xfrm_user.c: In function 'xfrm_get_policy':
net/xfrm/xfrm_user.c:1382: warning: unused variable 'sid'
net/xfrm/xfrm_user.c:1381: warning: unused variable 'sessionid'
net/xfrm/xfrm_user.c:1380: warning: unused variable 'loginuid'
net/xfrm/xfrm_user.c: In function 'xfrm_add_pol_expire':
net/xfrm/xfrm_user.c:1620: warning: unused variable 'sid'
net/xfrm/xfrm_user.c:1619: warning: unused variable 'sessionid'
net/xfrm/xfrm_user.c:1618: warning: unused variable 'loginuid'
net/xfrm/xfrm_user.c: In function 'xfrm_add_sa_expire':
net/xfrm/xfrm_user.c:1658: warning: unused variable 'sid'
net/xfrm/xfrm_user.c:1657: warning: unused variable 'sessionid'
net/xfrm/xfrm_user.c:1656: warning: unused variable 'loginuid'

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-03 21:03:01 -07:00
Linus Torvalds
95dfec6ae1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
  tcp: Overflow bug in Vegas
  [IPv4] UFO: prevent generation of chained skb destined to UFO device
  iwlwifi: move the selects to the tristate drivers
  ipv4: annotate a few functions __init in ipconfig.c
  atm: ambassador: vcc_sf semaphore to mutex
  MAINTAINERS: The socketcan-core list is subscribers-only.
  netfilter: nf_conntrack: padding breaks conntrack hash on ARM
  ipv4: Update MTU to all related cache entries in ip_rt_frag_needed()
  sch_sfq: use del_timer_sync() in sfq_destroy()
  net: Add compat support for getsockopt (MCAST_MSFILTER)
  net: Several cleanups for the setsockopt compat support.
  ipvs: fix oops in backup for fwmark conn templates
  bridge: kernel panic when unloading bridge module
  bridge: fix error handling in br_add_if()
  netfilter: {nfnetlink,ip,ip6}_queue: fix skb_over_panic when enlarging packets
  netfilter: x_tables: fix net namespace leak when reading /proc/net/xxx_tables_names
  netfilter: xt_TCPOPTSTRIP: signed tcphoff for ipv6_skip_exthdr() retval
  tcp: Limit cwnd growth when deferring for GSO
  tcp: Allow send-limited cwnd to grow up to max_burst when gso disabled
  [netdrvr] gianfar: Determine TBIPA value dynamically
  ...
2008-04-30 08:45:48 -07:00
Linus Torvalds
9781db7b34 Merge branch 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  [PATCH] new predicate - AUDIT_FILETYPE
  [patch 2/2] Use find_task_by_vpid in audit code
  [patch 1/2] audit: let userspace fully control TTY input auditing
  [PATCH 2/2] audit: fix sparse shadowed variable warnings
  [PATCH 1/2] audit: move extern declarations to audit.h
  Audit: MAINTAINERS update
  Audit: increase the maximum length of the key field
  Audit: standardize string audit interfaces
  Audit: stop deadlock from signals under load
  Audit: save audit_backlog_limit audit messages in case auditd comes back
  Audit: collect sessionid in netlink messages
  Audit: end printk with newline
2008-04-29 11:41:22 -07:00
Philip Craig
443a70d50b netfilter: nf_conntrack: padding breaks conntrack hash on ARM
commit 0794935e "[NETFILTER]: nf_conntrack: optimize hash_conntrack()"
results in ARM platforms hashing uninitialised padding.  This padding
doesn't exist on other architectures.

Fix this by replacing NF_CT_TUPLE_U_BLANK() with memset() to ensure
everything is initialised.  There were only 4 bytes that
NF_CT_TUPLE_U_BLANK() wasn't clearing anyway (or 12 bytes on ARM).

Signed-off-by: Philip Craig <philipc@snapgear.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29 03:35:10 -07:00
Timo Teras
0010e46577 ipv4: Update MTU to all related cache entries in ip_rt_frag_needed()
Add struct net_device parameter to ip_rt_frag_needed() and update MTU to
cache entries where ifindex is specified. This is similar to what is
already done in ip_rt_redirect().

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29 03:32:25 -07:00
David L Stevens
42908c69f6 net: Add compat support for getsockopt (MCAST_MSFILTER)
This patch adds support for getsockopt for MCAST_MSFILTER for
both IPv4 and IPv6. It depends on the previous setsockopt patch,
and uses the same method.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29 03:23:22 -07:00
Julian Anastasov
2ad17defd5 ipvs: fix oops in backup for fwmark conn templates
Fixes bug http://bugzilla.kernel.org/show_bug.cgi?id=10556
where conn templates with protocol=IPPROTO_IP can oops backup box.

        Result from ip_vs_proto_get() should be checked because
protocol value can be invalid or unsupported in backup. But
for valid message we should not fail for templates which use
IPPROTO_IP. Also, add checks to validate message limits and
connection state. Show state NONE for templates using IPPROTO_IP.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29 03:21:23 -07:00
Linus Torvalds
77a50df2b1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  iwlwifi: Allow building iwl3945 without iwl4965.
  wireless: Fix compile error with wifi & leds
  tcp: Fix slab corruption with ipv6 and tcp6fuzz
  ipv4/ipv6 compat: Fix SSM applications on 64bit kernels.
  [IPSEC]: Use digest_null directly for auth
  sunrpc: fix missing kernel-doc
  can: Fix copy_from_user() results interpretation
  Revert "ipv6: Fix typo in net/ipv6/Kconfig"
  tipc: endianness annotations
  ipv6: result of csum_fold() is already 16bit, no need to cast
  [XFRM] AUDIT: Fix flowlabel text format ambibuity.
2008-04-28 09:44:11 -07:00
Eric Paris
2532386f48 Audit: collect sessionid in netlink messages
Previously I added sessionid output to all audit messages where it was
available but we still didn't know the sessionid of the sender of
netlink messages.  This patch adds that information to netlink messages
so we can audit who sent netlink messages.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-28 06:18:03 -04:00
David L Stevens
dae5029548 ipv4/ipv6 compat: Fix SSM applications on 64bit kernels.
Add support on 64-bit kernels for seting 32-bit compatible MCAST*
socket options.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-27 14:26:53 -07:00
Aurélien Charbon
f15364bd4c IPv6 support for NFS server export caches
This adds IPv6 support to the interfaces that are used to express nfsd
exports.  All addressed are stored internally as IPv6; backwards
compatibility is maintained using mapped addresses.

Thanks to Bruce Fields, Brian Haley, Neil Brown and Hideaki Joshifuji
for comments

Signed-off-by: Aurelien Charbon <aurelien.charbon@bull.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Brian Haley <brian.haley@hp.com>
Cc:  YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:36 -04:00
Herbert Xu
c5d18e984a [IPSEC]: Fix catch-22 with algorithm IDs above 31
As it stands it's impossible to use any authentication algorithms
with an ID above 31 portably.  It just happens to work on x86 but
fails miserably on ppc64.

The reason is that we're using a bit mask to check the algorithm
ID but the mask is only 32 bits wide.

After looking at how this is used in the field, I have concluded
that in the long term we should phase out state matching by IDs
because this is made superfluous by the reqid feature.  For current
applications, the best solution IMHO is to allow all algorithms when
the bit masks are all ~0.

The following patch does exactly that.

This bug was identified by IBM when testing on the ppc64 platform
using the NULL authentication algorithm which has an ID of 251.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-22 00:46:42 -07:00
Pavel Emelyanov
53083773dc [INET]: Uninline the __inet_inherit_port call.
This deblats ~200 bytes when ipv6 and dccp are 'y'.

Besides, this will ease compilation issues for patches
I'm working on to make inet hash tables more scalable 
wrt net namespaces.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-17 23:18:15 -07:00
Pavel Emelyanov
e56d8b8a2e [INET]: Drop the inet_inherit_port() call.
As I can see from the code, two places (tcp_v6_syn_recv_sock and
dccp_v6_request_recv_sock) that call this one already run with
BHs disabled, so it's safe to call __inet_inherit_port there.

Besides (in case I missed smth with code review) the calltrace
tcp_v6_syn_recv_sock
 `- tcp_v4_syn_recv_sock
     `- __inet_inherit_port
and the similar for DCCP are valid, but assumes BHs to be disabled.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-17 23:17:34 -07:00
Reinette Chatre
d18ef29f34 mac80211: no BSS changes to driver from beacons processed during scanning
There is no need to send BSS changes to driver from beacons processed
during scanning. We are more interested in beacons from an AP with which
we are associated - these will still be used to send updates to driver as
the beacons are received without scanning.

This change·removes the requirement that bss_info_changed needs to be atomic.
The beacons received during scanning are processed from a tasklet, but if we
do not call bss_info_changed for these beacons there is no need for it to be
atomic. This function (bss_info_changed) is called either from workqueue or
ioctl in all other instances.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Tomas Winkler <tomas.winkler@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-04-16 15:59:56 -04:00
John Heffner
dd9e0dda66 [TCP]: Increase the max_burst threshold from 3 to tp->reordering.
This change is necessary to allow cwnd to grow during persistent
reordering.  Cwnd moderation is applied when in the disorder state
and an ack that fills the hole comes in.  If the hole was greater
than 3 packets, but less than tp->reordering, cwnd will shrink when
it should not have.

Signed-off-by: John Heffner <jheffner@napa.(none)>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 02:29:56 -07:00
Denis V. Lunev
3661a91083 [NETNS]: Add netns refcnt debug to fib rules.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 02:01:56 -07:00
Denis V. Lunev
65a18ec58e [NETNS]: Add netns refcnt debug for kernel sockets.
Protocol control sockets and netlink kernel sockets should not prevent the
namespace stop request. They are initialized and disposed in a special way by
sk_change_net/sk_release_kernel.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:59:46 -07:00
Denis V. Lunev
5d1e4468a7 [NETNS]: Make netns refconting debug like a socket one.
Make release_net/hold_net noop for performance-hungry people. This is a debug
staff and should be used in the debug mode only.

Add check for net != NULL in hold/release calls. This will be required
later on.

[ Added minor simplifications suggested by Brian Haley. -DaveM ]

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:58:04 -07:00
Pavel Emelyanov
669f87baab [RTNL]: Introduce the rtnl_kill_links helper.
This one is responsible for calling ->dellink on each net
device found in net to help with vlan net_exit hook in the
nearest future.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 00:46:52 -07:00
Pavel Emelyanov
dec827d174 [NETNS]: The generic per-net pointers.
Add the elastic array of void * pointer to the struct net.
The access rules are simple:

 1. register the ops with register_pernet_gen_device to get
    the id of your private pointer
 2. call net_assign_generic() to put the private data on the
    struct net (most preferably this should be done in the
    ->init callback of the ops registered)
 3. do not store any private reference on the net_generic array;
 4. do not change this pointer while the net is alive;
 5. use the net_generic() to get the pointer.

When adding a new pointer, I copy the old array, replace it
with a new one and schedule the old for kfree after an RCU
grace period.

Since the net_generic explores the net->gen array inside rcu
read section and once set the net->gen->ptr[x] pointer never 
changes, this grants us a safe access to generic pointers.

Quoting Paul: "... RCU is protecting -only- the net_generic 
structure that net_generic() is traversing, and the [pointer]
returned by net_generic() is protected by a reference counter 
in the upper-level struct net."

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:36:08 -07:00
Pavel Emelyanov
c93cf61fd1 [NETNS]: The net-subsys IDs generator.
To make some per-net generic pointers, we need some way to address
them, i.e. - IDs. This is simple IDA-based IDs generator for pernet
subsystems.

Addressing questions about potential checkpoint/restart problems: 
these IDs are "lite-offsets" within the net structure and are by no 
means supposed to be exported to the userspace.

Since it will be used in the nearest future by devices only (tun,
vlan, tunnels, bridge, etc), I make it resemble the functionality
of register_pernet_device().

The new ids is stored in the *id pointer _before_ calling the init
callback to make this id available in this callback.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:35:23 -07:00
Adrian Bunk
7ef3abd210 [IRDA]: Remove irlan_eth_send_gratuitous_arp()
Even kernel 2.2.26 (sic) already contains the
  #undef CONFIG_IRLAN_SEND_GRATUITOUS_ARP
with the comment "but for some reason the machine crashes if you use DHCP".

Either someone finally looks into this or it's simply time to remove 
this dead code.

Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:29:24 -07:00
Allan Stephens
0c3141e910 [TIPC]: Overhaul of socket locking logic
This patch modifies TIPC's socket code to follow the same approach
used by other protocols.  This change eliminates the need for a
mutex in the TIPC-specific portion of the socket protocol data
structure -- in its place, the standard Linux socket backlog queue
and associated locking routines are utilized.  These changes fix
a long-standing receive queue bug on SMP systems, and also enable
individual read and write threads to utilize a socket without
unnecessarily interfering with each other.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:22:02 -07:00
David S. Miller
334f8b2afd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6.26 2008-04-14 03:50:43 -07:00
David S. Miller
df39e8ba56 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ehea/ehea_main.c
	drivers/net/wireless/iwlwifi/Kconfig
	drivers/net/wireless/rt2x00/rt61pci.c
	net/ipv4/inet_timewait_sock.c
	net/ipv6/raw.c
	net/mac80211/ieee80211_sta.c
2008-04-14 02:30:23 -07:00
Jan Engelhardt
3c9fba656a [NETFILTER]: nf_conntrack: replace NF_CT_DUMP_TUPLE macro indrection by function call
Directly call IPv4 and IPv6 variants where the address family is
easily known.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:54 +02:00
Jan Engelhardt
f2ea825f48 [NETFILTER]: nf_nat: use bool type in nf_nat_proto
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:53 +02:00
Jan Engelhardt
5f2b4c9006 [NETFILTER]: nf_conntrack: use bool type in struct nf_conntrack_tuple.h
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:53 +02:00
Jan Engelhardt
09f263cd39 [NETFILTER]: nf_conntrack: use bool type in struct nf_conntrack_l4proto
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:53 +02:00
Jan Engelhardt
8ce8439a31 [NETFILTER]: nf_conntrack: use bool type in struct nf_conntrack_l3proto
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:52 +02:00
Jan Engelhardt
9dbae79178 [NETFILTER]: Remove unused callbacks in nf_conntrack_l3proto
These functions are never called.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:52 +02:00
Patrick McHardy
5e8fbe2ac8 [NETFILTER]: nf_conntrack: add tuplehash l3num/protonum accessors
Add accessors for l3num and protonum and get rid of some overly long
expressions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:52 +02:00
Patrick McHardy
dd13b01036 [NETFILTER]: nf_nat: kill helper and seq_adjust hooks
Connection tracking helpers (specifically FTP) need to be called
before NAT sequence numbers adjustments are performed to be able
to compare them against previously seen ones. We've introduced
two new hooks around 2.6.11 to maintain this ordering when NAT
modules were changed to get called from conntrack helpers directly.

The cost of netfilter hooks is quite high and sequence number
adjustments are only rarely needed however. Add a RCU-protected
sequence number adjustment function pointer and call it from
IPv4 conntrack after calling the helper.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:52 +02:00
Patrick McHardy
55871d0479 [NETFILTER]: nf_conntrack_extend: warn on confirmed conntracks
New extensions may only be added to unconfirmed conntracks to avoid races
when reallocating the storage.

Also change NF_CT_ASSERT to use WARN_ON to get backtraces.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:51 +02:00
Patrick McHardy
8c87238b72 [NETFILTER]: nf_nat: don't add NAT extension for confirmed conntracks
Adding extensions to confirmed conntracks is not allowed to avoid races
on reallocation. Don't setup NAT for confirmed conntracks in case NAT
module is loaded late.

The has one side-effect, the connections existing before the NAT module
was loaded won't enter the bysource hash. The only case where this actually
makes a difference is in case of SNAT to a multirange where the IP before
NAT is also part of the range. Since old connections don't enter the
bysource hash the first new connection from the IP will have a new address
selected. This shouldn't matter at all.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:51 +02:00
Patrick McHardy
2bc780499a [NETFILTER]: nf_conntrack: add DCCP protocol support
Add DCCP conntrack helper. Thanks to Gerrit Renker <gerrit@erg.abdn.ac.uk>
for review and testing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:49 +02:00
Patrick McHardy
2d2d84c40e [NETFILTER]: nf_nat: remove unused name from struct nf_nat_protocol
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:48 +02:00
Patrick McHardy
535b57c7c1 [NETFILTER]: nf_nat: move NAT ctnetlink helpers to nf_nat_proto_common
Move to nf_nat_proto_common and rename to nf_nat_proto_... since they're
also used by protocols that don't have port numbers.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:47 +02:00
Patrick McHardy
937e0dfd87 [NETFILTER]: nf_nat: add helpers for common NAT protocol operations
Add generic ->in_range and ->unique_tuple ops to avoid duplicating them
again and again for future NAT modules and save a few bytes of text:

net/ipv4/netfilter/nf_nat_proto_tcp.c:
  tcp_in_range     |  -62 (removed)
  tcp_unique_tuple | -259 # 271 -> 12, # inlines: 1 -> 0, size inlines: 7 -> 0
 2 functions changed, 321 bytes removed

net/ipv4/netfilter/nf_nat_proto_udp.c:
  udp_in_range     |  -62 (removed)
  udp_unique_tuple | -259 # 271 -> 12, # inlines: 1 -> 0, size inlines: 7 -> 0
 2 functions changed, 321 bytes removed

net/ipv4/netfilter/nf_nat_proto_gre.c:
  gre_in_range |  -62 (removed)
 1 function changed, 62 bytes removed

vmlinux:
 5 functions changed, 704 bytes removed

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:46 +02:00
Gerrit Renker
7de6c03336 [SKB]: __skb_append = __skb_queue_after
This expresses __skb_append in terms of __skb_queue_after, exploiting that

  __skb_append(old, new, list) = __skb_queue_after(list, old, new).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 00:05:09 -07:00
YOSHIFUJI Hideaki
e9df2e8fd8 [IPV6]: Use appropriate sock tclass setting for routing lookup.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:40:51 -07:00
Pavel Emelyanov
0204774191 [NETNS][DCCPV6]: Move the dccp_v6_ctl_sk on the struct net.
And replace all its usage with init_net's socket.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:32:25 -07:00
Pavel Emelyanov
7b1cffa8c9 [NETNS][DCCPV4]: Move the dccp_v4_ctl_sk on the struct net.
And replace all its usage with init_net's socket.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:29:37 -07:00
Pavel Emelyanov
67019cc9ee [NETNS]: Add an empty netns_dccp structure on struct net.
According to the overall struct net design, it will be
filled with DCCP-related members.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:28:42 -07:00
Denis V. Lunev
5f4472c5a6 [TCP]: Remove owner from tcp_seq_afinfo.
Move it to tcp_seq_afinfo->seq_fops as should be.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:13:53 -07:00
Denis V. Lunev
68fcadd16c [TCP]: Place file operations directly into tcp_seq_afinfo.
No need to have separate never-used variable.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:13:30 -07:00
Denis V. Lunev
9427c4b36b [TCP]: Move seq_ops from tcp_iter_state to tcp_seq_afinfo.
No need to create seq_operations for each instance of 'netstat'.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:12:13 -07:00
Denis V. Lunev
a4146b1b2c [TCP]: Replace struct net on tcp_iter_state with seq_net_private.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:11:14 -07:00
David S. Miller
6fb9114e4b Merge branch 'net-2.6.26-misc-20080412b' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-dev 2008-04-12 19:19:46 -07:00
Paul Moore
00447872a6 NetLabel: Allow passing the LSM domain as a shared pointer
Smack doesn't have the need to create a private copy of the LSM "domain" when
setting NetLabel security attributes like SELinux, however, the current
NetLabel code requires a private copy of the LSM "domain".  This patches fixes
that by letting the LSM determine how it wants to pass the domain value.

 * NETLBL_SECATTR_DOMAIN_CPY
   The current behavior, NetLabel assumes that the domain value is a copy and
   frees it when done

 * NETLBL_SECATTR_DOMAIN
   New, Smack-friendly behavior, NetLabel assumes that the domain value is a
   reference to a string managed by the LSM and does not free it when done

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-12 19:06:42 -07:00
Vlad Yasevich
ab38fb04c9 [SCTP]: Fix compiler warning about const qualifiers
Fix 3 warnings about discarding const qualifiers:

net/sctp/ulpevent.c:862: warning: passing argument 1 of 'sctp_event2skb' discards qualifiers from pointer target type
net/sctp/sm_statefuns.c:4393: warning: passing argument 1 of 'SCTP_ASOC' discards qualifiers from pointer target type
net/sctp/socket.c:5874: warning: passing argument 1 of 'cmsg_nxthdr' discards qualifiers from pointer target type

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-12 18:40:06 -07:00
Gui Jianfeng
f4ad85ca3e [SCTP]: Fix protocol violation when receiving an error lenght INIT-ACK
When receiving an error length INIT-ACK during COOKIE-WAIT,
a 0-vtag ABORT will be responsed. This action violates the
protocol apparently. This patch achieves the following things.
1 If the INIT-ACK contains all the fixed parameters, use init-tag
  recorded from INIT-ACK as vtag.
2 If the INIT-ACK doesn't contain all the fixed parameters,
  just reflect its vtag.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-12 18:39:34 -07:00
YOSHIFUJI Hideaki
7f1eced8b0 [IPV6] MIP6: Use our standard definitions for paddings.
MIP6_OPT_PAD_X are actually for paddings in destination
option header.  Replace them with our standard IPV6_TLV_PADX.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:22 +09:00
YOSHIFUJI Hideaki
f3ee4010e8 [IPV6]: Define constants for link-local multicast addresses.
- Define link-local all-node / all-router multicast addresses.
- Remove ipv6_addr_all_nodes() and ipv6_addr_all_routers().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:19 +09:00
YOSHIFUJI Hideaki
9acd9f3ae9 [IPV6]: Make address arguments const.
- net/ipv6/addrconf.c:
	ipv6_get_ifaddr(), ipv6_dev_get_saddr()
- net/ipv6/mcast.c:
	ipv6_sock_mc_join(), ipv6_sock_mc_drop(),
	inet6_mc_check(),
	ipv6_dev_mc_inc(), __ipv6_dev_mc_dec(), ipv6_dev_mc_dec(),
	ipv6_chk_mcast_addr()
- net/ipv6/route.c:
	rt6_lookup(), icmp6_dst_alloc()
- net/ipv6/ip6_output.c:
	ip6_nd_hdr()
- net/ipv6/ndisc.c:
	ndisc_send_ns(), ndisc_send_rs(), ndisc_send_redirect(),
	ndisc_get_neigh(), __ndisc_send()

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:18 +09:00
YOSHIFUJI Hideaki
dfd982baff [IPV6] ADDRCONF: Uninline ipv6_isatap_eui64().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:17 +09:00
YOSHIFUJI Hideaki
3eb84f4929 [IPV6] ADDRCONF: Uninline ipv6_addr_hash().
The function is only used in net/ipv6/addrconf.c.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:15 +09:00
YOSHIFUJI Hideaki
fed85383ac [IPV6]: Use XOR and OR rather than mutiple ands for ipv6 address comparisons.
ipv6_addr_equal(), ipv6_addr_v4mapped(),
ipv6_addr_is_ll_all_{nodes,routers}(),
ipv6_masked_addr_cmp()

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:14 +09:00
Florian Westphal
4dfc281702 [Syncookies]: Add support for TCP options via timestamps.
Allow the use of SACK and window scaling when syncookies are used
and the client supports tcp timestamps. Options are encoded into
the timestamp sent in the syn-ack and restored from the timestamp
echo when the ack is received.

Based on earlier work by Glenn Griffin.
This patch avoids increasing the size of structs by encoding TCP
options into the least significant bits of the timestamp and
by not using any 'timestamp offset'.

The downside is that the timestamp sent in the packet after the synack
will increase by several seconds.

changes since v1:
 don't duplicate timestamp echo decoding function, put it into ipv4/syncookie.c
 and have ipv6/syncookies.c use it.
 Feedback from Glenn Griffin: fix line indented with spaces, kill redundant if ()

Reviewed-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-10 03:12:40 -07:00
Rami Rosen
5c06f510a2 [IPV6]: Remove unused declarations in include/net/ip6_route.h.
1) Standlaone ip6_null_entry is no longer needed as it is replaced by
   the ip6_null_entry member of ipv6 (instance of struct netns_ipv6) in
   struct net (as a result of Network Namespaces patches).


2) These 3 methods from this same header are not defined anywhere:
   ip6_rt_addr_add(), ip6_rt_addr_del(), rt6_sndmsg()

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-10 02:31:20 -07:00
Rami Rosen
3cccd60784 [IPV6] Remove three method declarations in include/net/ndisc.h.
This patch removes two unused method declarations in
include/net/ndisc.h: ndisc_forwarding_on(void) and
ndisc_forwarding_off(void);

Also igmp6_cleanup(void) appears twice in this header, so one
igmp6_cleanup(void) declaration is removed.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-10 02:01:21 -07:00
Stephen Hemminger
43db6d65e0 socket: sk_filter deinline
The sk_filter function is too big to be inlined. This saves 2296 bytes
of text on allyesconfig.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-10 01:43:09 -07:00
Mohamed Abbas
84363e6e07 mac80211: notify mac from low level driver (iwlwifi)
Add new API to MAC80211 to allow low level driver to
notify MAC with driver status.

Signed-off-by: Mohamed Abbas <mabbas@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-04-08 16:44:43 -04:00
Chr
fff7710937 mac80211: add station aid into ieee80211_tx_control
This patch is necessary for the upcoming Accesspoint patch for p54.

Signed-off-by: Christian Lamparter <chunkeey@web.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-04-08 15:05:57 -04:00
Tomas Winkler
21c0cbe760 mac80211: add association capabilty and timing info into bss_conf
This patch adds assocation capability, timestamp (tsf) and beacon interval
to bss_conf. This is required for successful assocation of iwlwifi drivers

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-04-08 15:05:56 -04:00
Tomas Winkler
38668c059f mac80211: eliminate conf_ht
This patch eliminates the use of conf_ht, replacing it with
bss_info_changed.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-04-08 15:05:56 -04:00
David S. Miller
8eefca4888 Merge branch 'net-2.6.26-isatap-20080403' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-dev 2008-04-08 02:33:36 -07:00
Ilpo Järvinen
882bebaaca [TCP]: tcp_simple_retransmit can cause S+L
This fixes Bugzilla #10384

tcp_simple_retransmit does L increment without any checking
whatsoever for overflowing S+L when Reno is in use.

The simplest scenario I can currently think of is rather
complex in practice (there might be some more straightforward
cases though). Ie., if mss is reduced during mtu probing, it
may end up marking everything lost and if some duplicate ACKs
arrived prior to that sacked_out will be non-zero as well,
leading to S+L > packets_out, tcp_clean_rtx_queue on the next
cumulative ACK or tcp_fastretrans_alert on the next duplicate
ACK will fix the S counter.

More straightforward (but questionable) solution would be to
just call tcp_reset_reno_sack() in tcp_simple_retransmit but
it would negatively impact the probe's retransmission, ie.,
the retransmissions would not occur if some duplicate ACKs
had arrived.

So I had to add reno sacked_out reseting to CA_Loss state
when the first cumulative ACK arrives (this stale sacked_out
might actually be the explanation for the reports of left_out
overflows in kernel prior to 2.6.23 and S+L overflow reports
of 2.6.24). However, this alone won't be enough to fix kernel
before 2.6.24 because it is building on top of the commit
1b6d427bb7 ([TCP]: Reduce sacked_out with reno when purging
write_queue) to keep the sacked_out from overflowing.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-07 22:33:07 -07:00
Denis V. Lunev
046ee90235 [NETNS]: Create tcp control socket in the each namespace.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:31:33 -07:00
Denis V. Lunev
5677242f43 [NETNS]: Inet control socket should not hold a namespace.
This is a generic requirement, so make inet_ctl_sock_create namespace
aware and create a inet_ctl_sock_destroy wrapper around
sk_release_kernel.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:28:30 -07:00
Denis V. Lunev
eee4fe4ded [INET]: Let inet_ctl_sock_create return sock rather than socket.
All upper protocol layers are already use sock internally.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:27:58 -07:00
Denis V. Lunev
3d58b5fa8e [INET]: Rename inet_csk_ctl_sock_create to inet_ctl_sock_create.
This call is nothing common with INET connection sockets code. It
simply creates an unhashes kernel sockets for protocol messages.

Move the new call into af_inet.c after the rename.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:22:32 -07:00
Denis V. Lunev
a4aa834a91 [NETNS]: Declare init_net even without CONFIG_NET defined.
This does not look good, but there is no other choice. The compilation
without CONFIG_NET is broken and can not be fixed with ease.

After that there is no need for the following commits:
1567ca7eec
3edf8fa5cc
2d38f9a4f8

Revert them.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 13:04:33 -07:00
David S. Miller
e1ec1b8ccd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/s2io.c
2008-04-02 22:35:23 -07:00
YOSHIFUJI Hideaki
52eeeb8481 [IPV6]: Unify ip6_onlink() and ipip6_onlink().
Both are identical, let's create ipv6_chk_prefix() and use it
in both places.
2008-04-03 10:06:00 +09:00
YOSHIFUJI Hideaki
300aaeeaab [IPV6] SIT: Add SIOCGETPRL ioctl to get/dump PRL.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-03 10:06:00 +09:00
Templin, Fred L
fadf6bf060 [IPV6] SIT: Add PRL management for ISATAP.
This patch updates the Linux the Intra-Site Automatic Tunnel Addressing
Protocol (ISATAP) implementation. It places the ISATAP potential router
list (PRL) in the kernel and adds three new private ioctls for PRL
management.

[Add several changes of structure name, constant names etc. - yoshfuji]

Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-03 10:05:58 +09:00
Denis V. Lunev
c0f39322c3 [NETNS]: Do not include net/net_namespace.h from seq_file.h
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-02 00:10:28 -07:00
Denis V. Lunev
225c0a0107 [NETNS]: Merge ifdef CONFIG_NET in include/net/net_namespace.h.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-02 00:09:29 -07:00
Joonwoo Park
f83f1768f8 [LLC]: skb allocation size for responses
Allocate the skb for llc responses with the received packet size by
using the size adjustable llc_frame_alloc.
Don't allocate useless extra payload.
Cleanup magic numbers.

So, this fixes oops.
Reported by Jim Westfall:
kernel: skb_over_panic: text:c0541fc7 len:1000 put:997 head:c166ac00 data:c166ac2f tail:0xc166b017 end:0xc166ac80 dev:eth0
kernel: ------------[ cut here ]------------
kernel: kernel BUG at net/core/skbuff.c:95!

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-31 21:02:47 -07:00
Pavel Emelyanov
70ee115942 [SOCK][NETNS]: Add the percpu prot_inuse counter in the struct net.
Such an accounting would cost us two more dereferences to get the
percpu variable from the struct net, so I make sock_prot_inuse_get
and _add calls work differently depending on CONFIG_NET_NS - without
it old optimized routines are used.

The per-cpu counter for init_net is prepared in core_initcall, so
that even af_inet, that starts as fs_initcall, will already have the
init_net prepared.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-31 19:42:16 -07:00
Pavel Emelyanov
c29a0bc4df [SOCK][NETNS]: Add a struct net argument to sock_prot_inuse_add and _get.
This counter is about to become per-proto-and-per-net, so we'll need 
two arguments to determine which cell in this "table" to work with.

All the places, but proc already pass proper net to it - proc will be
tuned a bit later.

Some indentation with spaces in proc files is done to keep the file
coding style consistent.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-31 19:41:46 -07:00
Pavel Emelyanov
8efa6e93cb [NETNS]: Introduce a netns_core structure.
There's already some stuff on the struct net, that should better
be folded into netns_core structure. I'm making the per-proto inuse 
counter be per-net also, which is also a candidate for this, so 
introduce this structure and populate it a bit.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-31 19:41:14 -07:00
Denis V. Lunev
4ad96d39a2 [UDP]: Remove owner from udp_seq_afinfo.
Move it to udp_seq_afinfo->seq_fops as should be.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 18:25:53 -07:00
Denis V. Lunev
3ba9441bdf [UDP]: Place file operations directly into udp_seq_afinfo.
No need to have separate never-used variable.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 18:25:32 -07:00
Denis V. Lunev
dda61925f8 [UDP]: Move seq_ops from udp_iter_state to udp_seq_afinfo.
No need to create seq_operations for each instance of 'netstat'.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 18:24:26 -07:00
Denis V. Lunev
6f191efe48 [UDP]: Replace struct net on udp_iter_state with seq_net_private.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 18:23:33 -07:00
Pavel Emelyanov
bdcde3d71a [SOCK]: Drop inuse pcounter from struct proto (v2).
An uppercut - do not use the pcounter on struct proto.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 16:39:33 -07:00
Pavel Emelyanov
60e7663d46 [SOCK]: Drop per-proto inuse init and fre functions (v2).
Constructive part of the set is finished here. We have to remove the
pcounter, so start with its init and free functions.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 16:39:10 -07:00
Pavel Emelyanov
1338d466d9 [SOCK]: Introduce a percpu inuse counters array (v2).
And redirect sock_prot_inuse_add and _get to use one.

As far as the dereferences are concerned. Before the patch we made
1 dereference to proto->inuse.add call, the call itself and then
called the __get_cpu_var() on a static variable. After the patch we 
make a direct call, then one dereference to proto->inuse_idx and 
then the same __get_cpu_var() on a still static variable. So this 
patch doesn't seem to produce performance penalty on SMP.

This is not per-net yet, but I will deliberately make NET_NS=y case
separated from NET_NS=n one, since it'll cost us one-or-two more 
dereferences to get the struct net and the inuse counter.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 16:38:43 -07:00
Pavel Emelyanov
13ff3d6fa4 [SOCK]: Enumerate struct proto-s to facilitate percpu inuse accounting (v2).
The inuse counters are going to become a per-cpu array.  Introduce an
index for this array on the struct proto.

To handle the case of proto register-unregister-register loop the
bitmap is used. All its bits manipulations are protected with
proto_list_lock and a sanity check for the bitmap being exhausted is
also added.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 16:38:17 -07:00
Joe Perches
bc578a54f0 [NET]: Rename inet_frag.h identifiers COMPLETE, FIRST_IN, LAST_IN to INET_FRAG_*
On Fri, 2008-03-28 at 03:24 -0700, Andrew Morton wrote:
> they should all be renamed.

Done for include/net and net

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 16:35:27 -07:00
Joonwoo Park
a5a04819c5 [LLC]: station source mac address
kill unnecessary llc_station_mac_sa.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 16:28:36 -07:00
Rami Rosen
be2ce06b49 [IPV6]: Remove unused method declaration in include/net/addrconf.h.
This patches removes unused declaration of addrconf_forwarding_on() method
in include/net/addrconf.h.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 16:26:45 -07:00
David S. Miller
8e8e43843b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/usb/rndis_host.c
	drivers/net/wireless/b43/dma.c
	net/ipv6/ndisc.c
2008-03-27 18:48:56 -07:00
David S. Miller
ed85f2c3b2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.26 2008-03-27 18:01:13 -07:00
Ilpo Järvinen
bc09dff198 [SCTP]: Remove sctp_add_cmd_sf wrapper bloat
With a was number of callsites sctp_add_cmd_sf wrapper bloats
kernel by some amount. Due to unlikely tracking allyesconfig,
with the initial result were around ~7kB (thus caught my
attention) while a non-debug config produced only ~2.3kB effect.

I (ij) proposed first a patch to uninline it but Vlad responded
with a patch that removed the only sctp_add_cmd call which is
wrapped by sctp_add_cmd_sf (I wasn't sure if I could do that).
I did minor cleanup to Vlad's patch.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 17:54:29 -07:00
Ilpo Järvinen
8d3308687f [NET]: uninline dst_release
Codiff stats (allyesconfig, v2.6.24-mm1):
-16420  187 funcs, 103 +, 16523 -, diff: -16420 --- dst_release

Without number of debug related CONFIGs (v2.6.25-rc2-mm1):
-7257  186 funcs, 70 +, 7327 -, diff: -7257 --- dst_release
dst_release                   |  +40

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 17:53:31 -07:00
Rami Rosen
4f95165d4b [IPV6]: Remove three unused method declarations in include/net/ipv6.h
This patch removes three unused method declarations in include/net/ipv6.h:
inet_getfrag_t(), ipv6_build_nfrag_opts() and ipv6_build_frag_opts().

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 17:39:19 -07:00
Denis V. Lunev
09382bac66 [PKT_SCHED]: Pass real namespace in net scheduler classifiers.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 16:53:37 -07:00
Johannes Berg
6c507cd040 cfg80211: don't export ieee80211_get_channel
This patch makes ieee80211_get_channel a static inline defined in
cfg80211's header file which simply calls __ieee80211_get_channel
to avoid symbol clashes with the ieee80211 code.

The problem was pointed out by David Miller, thanks!

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-03-27 16:03:20 -04:00
Benjamin Thery
60e8fbc4c5 [NETNS][IPV6] flowlabels - make flowlabels per namespace
This patch introduces a new member, fl_net, in struct ip6_flowlabel.
This allows to create labels with the same value in different namespaces.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26 16:53:08 -07:00
Daniel Lezcano
6ab57e7e7f [NETNS][IPV6] anycast - handle several network namespace
Make use of the network namespace information to have this protocol to
handle several network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26 16:52:32 -07:00
Herbert Xu
732c8bd590 [IPSEC]: Fix BEET output
The IPv6 BEET output function is incorrectly including the inner
header in the payload to be protected.  This causes a crash as
the packet doesn't actually have that many bytes for a second
header.

The IPv4 BEET output on the other hand is broken when it comes
to handling an inner IPv6 header since it always assumes an
inner IPv4 header.

This patch fixes both by making sure that neither BEET output
function touches the inner header at all.  All access is now
done through the protocol-independent cb structure.  Two new
attributes are added to make this work, the IP header length
and the IPv4 option length.  They're filled in by the inner
mode's output function.

Thanks to Joakim Koskela for finding this problem.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26 16:51:09 -07:00
Pavel Emelyanov
68528f0998 [NETNS][ICMP]: Make ctl tables for ICMP sysctls per-net.
Add some flesh to ipv4_sysctl_init_net and ipv4_sysctl_exit_net,
i.e. copy the table, alter .data pointers and register it per-net.

Other ipv4_table's sysctls are now global, but this is going to
change once sysctl permissions patches migrate from -mm tree to 
mainline in 2.6.26 merge window :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26 01:56:24 -07:00
Pavel Emelyanov
a24022e188 [NETNS][ICMP]: Move ICMP sysctls on struct net.
Initialization is moved to icmp_sk_init, all the places, that
refer to them use init_net for now.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26 01:55:37 -07:00
Denis V. Lunev
f5aa23fd49 [NETNS]: Compilation warnings under CONFIG_NET_NS.
Recent commits from YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
have been introduced a several compilation warnings
'assignment discards qualifiers from pointer target type'
due to extra const modifier in the inline call parameters of
{dev|sock|twsk}_net_set.

Drop it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26 00:48:17 -07:00
Patrick McHardy
0d0ab0378d [NETFILTER]: nf_conntrack_sip: support multiple media channels
Add support for multiple media channels and use it to create
expectations for video streams when present.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-25 20:26:24 -07:00
Patrick McHardy
0f32a40fc9 [NETFILTER]: nf_conntrack_sip: create signalling expectations
Create expectations for incoming signalling connections when seeing
a REGISTER request. This is needed when the registrar uses a
different source port number for signalling messages and for receiving
incoming calls from other endpoints than the registrar.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-25 20:25:13 -07:00
Patrick McHardy
b8beedd25d [NETFILTER]: Add nf_inet_addr_cmp()
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-25 20:09:33 -07:00
Patrick McHardy
6002f266b3 [NETFILTER]: nf_conntrack: introduce expectation classes and policies
Introduce expectation classes and policies. An expectation class
is used to distinguish different types of expectations by the
same helper (for example audio/video/t.120). The expectation
policy is used to hold the maximum number of expectations and
the initial timeout for each class.

The individual classes are isolated from each other, which means
that for example an audio expectation will only evict other audio
expectations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-25 20:09:15 -07:00
Patrick McHardy
359b9ab614 [NETFILTER]: nf_conntrack_expect: support inactive expectations
This is useful for the SIP helper and signalling expectations.
We don't want to create a full-blown expectation with a wildcard
as source based on a single UDP packet, but need to know the
final port anyways. With inactive expectations we can register
the expectation and reserve the tuple, but wait for confirmation
from the registrar before activating it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-25 20:08:37 -07:00
Patrick McHardy
1d9d752259 [NETFILTER]: nf_conntrack_expect: constify nf_ct_expect_init arguments
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-25 20:07:58 -07:00
Patrick McHardy
ef27559b70 [NETFILTER]: nf_conntrack: fix NF_CT_TUPLE_DUMP for IPv4
NF_CT_TUPLE_DUMP prints IPv4 addresses as IPv6, fix this and use printk
(guarded by #ifdef DEBUG) directly instead of pr_debug since the tuple
is usually printed at the end of line and we don't want to include a
log-level.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-25 20:07:38 -07:00
David S. Miller
dfe98e9214 Merge branch 'net-2.6.26-netns-20080326' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-dev 2008-03-25 19:43:59 -07:00
David S. Miller
f89e6e3834 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.26 2008-03-25 17:20:03 -07:00
Johannes Berg
906c730a2d wireless: add wiphy channel freq to channel struct lookup helper
Add ieee80211_get_channel() which gets you a channel struct for a
specific wiphy if that channel is present in that wiphy.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-03-25 16:41:55 -04:00
Emmanuel Grumbach
9ae4fda332 mac80211: allows driver to request a Phase 1 RX key
This patch makes mac80211 able to send a phase1 key for TKIP
decryption.
This is needed for drivers that don't do the rekeying by themselves
(i.e. iwlwifi). Upon IV16 wrap around, the packet is decrypted in SW,
if decryption is ok, mac80211 calls to update_tkip_key  with a new
phase 1 RX key.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-03-25 16:41:53 -04:00
Emmanuel Grumbach
5d2cdcd4e8 mac80211: get a TKIP phase key from skb
This patch makes mac80211 able to compute a TKIP key from an skb.
The requested key can be a phase 1 or a phase 2 key.
This is useful for drivers who need to provide tkip key to their
HW to enable HW encryption.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-03-25 16:41:52 -04:00
YOSHIFUJI Hideaki
878628fbf2 [NET] NETNS: Omit namespace comparision without CONFIG_NET_NS.
Introduce an inline net_eq() to compare two namespaces.
Without CONFIG_NET_NS, since no namespace other than &init_net
exists, it is always 1.

We do not need to convert 1) inline vs inline and
2) inline vs &init_net comparisons.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:40:00 +09:00
YOSHIFUJI Hideaki
57da52c1e6 [NET] NETNS: Omit neigh_parms->net and pneigh_entry->net without CONFIG_NET_NS.
Introduce neigh_parms/pneigh_entry inlines: neigh_parms_net(), pneigh_net().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:58 +09:00
YOSHIFUJI Hideaki
3b1e0a655f [NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS.
Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:55 +09:00
YOSHIFUJI Hideaki
7cbca67c07 [IPV6]: Support Source Address Selection API (RFC5014).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25 10:24:01 +09:00
YOSHIFUJI Hideaki
6b75d09081 [IPV6]: Optimize hop-limit determination.
Last part of hop-limit determination is always:
    hoplimit = dst_metric(dst, RTAX_HOPLIMIT);
    if (hoplimit < 0)
        hoplimit = ipv6_get_hoplimit(dst->dev).

Let's consolidate it as ip6_dst_hoplimit(dst).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25 10:24:00 +09:00
YOSHIFUJI Hideaki
c8cdaf998d [IPV4,IPV6]: Share cork.rt between IPv4 and IPv6.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25 10:23:59 +09:00
YOSHIFUJI Hideaki
9bb182a700 [XFRM] MIP6: Fix address keys for routing search.
Each MIPv6 XFRM state (DSTOPT/RH2) holds either destination or source
address to be mangled in the IPv6 header (that is "CoA").
On Inter-MN communication after both nodes binds each other,
they use route optimized traffic two MIPv6 states applied, and
both source and destination address in the IPv6 header
are replaced by the states respectively.
The packet format is correct, however, next-hop routing search
are not.
This patch fixes it by remembering address pairs for later states.

Based on patch from Masahide NAKAMURA <nakam@linux-ipv6.org>.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25 10:23:57 +09:00
Denis V. Lunev
f145049a06 [NETNS]: Drop packets in the non-initial namespace on the per/protocol basis.
IP layer now can handle multiple namespaces normally. So, process such
packets normally and drop them only if the transport layer is not
aware about namespaces.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-24 15:33:00 -07:00
Denis V. Lunev
7a6adb92fe [NETNS]: Add namespace parameter to ip_cmsg_send.
Pass the init_net there for now.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-24 15:30:27 -07:00
Denis V. Lunev
f2c4802b3f [NETNS]: Add namespace parameter to ip_options_get(...).
Pass the init_net there for now.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-24 15:29:55 -07:00
Denis V. Lunev
0e6bd4a1c6 [NETNS]: Add namespace parameter to ip_options_compile.
ip_options_compile uses inet_addr_type which requires a namespace. The
packet argument is optional, so parameter is the only way to obtain
it. Pass the init_net there for now.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-24 15:29:23 -07:00
Kazunori MIYAZAWA
df9dcb4588 [IPSEC]: Fix inter address family IPsec tunnel handling.
Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-24 14:51:51 -07:00
Pavel Emelyanov
fa86d322d8 [NEIGH]: Fix race between pneigh deletion and ipv6's ndisc_recv_ns (v3).
Proxy neighbors do not have any reference counting, so any caller
of pneigh_lookup (unless it's a netlink triggered add/del routine)
should _not_ perform any actions on the found proxy entry. 

There's one exception from this rule - the ipv6's ndisc_recv_ns() 
uses found entry to check the flags for NTF_ROUTER.

This creates a race between the ndisc and pneigh_delete - after 
the pneigh is returned to the caller, the nd_tbl.lock is dropped 
and the deleting procedure may proceed.

One of the fixes would be to add a reference counting, but this
problem exists for ndisc only. Besides such a patch would be too 
big for -rc4.

So I propose to introduce a __pneigh_lookup() which is supposed
to be called with the lock held and use it in ndisc code to check
the flags on alive pneigh entry.


Changes from v2:
As David noticed, Exported the __pneigh_lookup() to ipv6 module. 
The checkpatch generates a warning on it, since the EXPORT_SYMBOL 
does not follow the symbol itself, but in this file all the 
exports come at the end, so I decided no to break this harmony.

Changes from v1:
Fixed comments from YOSHIFUJI - indentation of prototype in header
and the pndisc_check_router() name - and a compilation fix, pointed
by Daniel - the is_routed was (falsely) considered as uninitialized
by gcc.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-24 14:48:59 -07:00
David S. Miller
06802a819a Merge branch 'master' of ../net-2.6/
Conflicts:

	net/ipv6/ndisc.c
2008-03-23 22:54:03 -07:00
Florian Westphal
80445cfb28 [SCTP]: Remove redundant wrapper functions.
sctp_datamsg_free and sctp_datamsg_track are just aliases for
sctp_datamsg_put and sctp_chunk_hold, respectively.

Saves 32 Bytes on x86.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-23 22:47:08 -07:00
Florian Westphal
2051f11fb8 [TCP]: Shrink syncookie_secret by 8 byte.
the first u32 copied from syncookie_secret is overwritten by the
minute-counter four lines below.  After adjusting the destination
address, the size of syncookie_secret can be reduced accordingly.

AFAICS, the only other user of syncookie_secret[] is the ipv6
syncookie support.  Because ipv6 syncookies only grab 44 bytes from
syncookie_secret[], this shouldn't affect them in any way.

With fixes from Glenn Griffin.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-23 22:21:28 -07:00
Joe Perches
7d164be8aa [NET]: include/net/route.h - remove duplicate include
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-23 22:03:56 -07:00
Pavel Emelyanov
fc8717baa8 [RAW]: Add raw_hashinfo member on struct proto.
Sorry for the patch sequence confusion :| but I found that the similar
thing can be done for raw sockets easily too late.

Expand the proto.h union with the raw_hashinfo member and use it in
raw_prot and rawv6_prot. This allows to drop the protocol specific
versions of hash and unhash callbacks.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22 16:56:51 -07:00
Pavel Emelyanov
6ba5a3c52d [UDP]: Make full use of proto.h.udp_hash innovation.
After this we have only udp_lib_get_port to get the port and two 
stubs for ipv4 and ipv6. No difference in udp and udplite except
for initialized h.udp_hash member.

I tried to find a graceful way to drop the only difference between
udp_v4_get_port and udp_v6_get_port (i.e. the rcv_saddr comparison 
routine), but adding one more callback on the struct proto didn't 
appear such :( Maybe later.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22 16:51:21 -07:00
Pavel Emelyanov
39d8cda76c [SOCK]: Add udp_hash member to struct proto.
Inspired by the commit ab1e0a13 ([SOCK] proto: Add hashinfo member to 
struct proto) from Arnaldo, I made similar thing for UDP/-Lite IPv4 
and -v6 protocols.

The result is not that exciting, but it removes some levels of
indirection in udpxxx_get_port and saves some space in code and text.

The first step is to union existing hashinfo and new udp_hash on the
struct proto and give a name to this union, since future initialization 
of tcpxxx_prot, dccp_vx_protinfo and udpxxx_protinfo will cause gcc 
warning about inability to initialize anonymous member this way.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22 16:50:58 -07:00
Denis V. Lunev
ef722495c8 [IPV4]: Remove unused ip_options->is_data.
ip_options->is_data is assigned only and never checked. The structure is
not a part of kernel interface to the userspace. So, it is safe to remove
this field.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22 16:35:29 -07:00
Patrick McManus
ec3c0982a2 [TCP]: TCP_DEFER_ACCEPT updates - process as established
Change TCP_DEFER_ACCEPT implementation so that it transitions a
connection to ESTABLISHED after handshake is complete instead of
leaving it in SYN-RECV until some data arrvies. Place connection in
accept queue when first data packet arrives from slow path.

Benefits:
  - established connection is now reset if it never makes it
   to the accept queue

 - diagnostic state of established matches with the packet traces
   showing completed handshake

 - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be
   enforced with reasonable accuracy instead of rounding up to next
   exponential back-off of syn-ack retry.

Signed-off-by: Patrick McManus <mcmanus@ducksong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 16:33:01 -07:00
Stephen Hemminger
4cd9029d25 socket: SOCK_DEBUG type checking
Use the inline trick (same as pr_debug) to get checking of debug
statements even if no code is generated.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 15:54:53 -07:00
David S. Miller
1233823b08 [SCTP]: Fix build warnings with IPV6 disabled.
Introduced by 270637abff
("[SCTP]: Fix a race between module load and protosw access")

Reported by Gabriel C:

In file included from net/sctp/sm_statetable.c:50:
include/net/sctp/sctp.h: In function 'sctp_v6_pf_init':
include/net/sctp/sctp.h:392: warning: 'return' with a value, in function returning void
In file included from net/sctp/sm_statefuns.c:62:
include/net/sctp/sctp.h: In function 'sctp_v6_pf_init':
include/net/sctp/sctp.h:392: warning: 'return' with a value, in function returning void
 ...

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 15:40:47 -07:00
Daniel Lezcano
6f8b13bcb3 [NETNS][IPV6] tcp6 - make proc per namespace
Make the proc for tcp6 to be per namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 04:14:45 -07:00
Daniel Lezcano
0c96d8c50b [NETNS][IPV6] udp6 - make proc per namespace
The proc init/exit functions take a new network namespace parameter in
order to register/unregister /proc/net/udp6 for a namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 04:14:17 -07:00
Daniel Lezcano
f40c8174d3 [NETNS][IPV4] tcp - make proc handle the network namespaces
This patch, like udp proc, makes the proc functions to take care of
which namespace the socket belongs.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 04:13:54 -07:00
Daniel Lezcano
a91275eff4 [NETNS][IPV6] udp - make proc handle the network namespace
This patch makes the common udp proc functions to take care of which
socket they should show taking into account the namespace it belongs.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 04:11:58 -07:00
Peter P Waskiewicz Jr
82cc1a7a56 [NET]: Add per-connection option to set max TSO frame size
Update: My mailer ate one of Jarek's feedback mails...  Fixed the
parameter in netif_set_gso_max_size() to be u32, not u16.  Fixed the
whitespace issue due to a patch import botch.  Changed the types from
u32 to unsigned int to be more consistent with other variables in the
area.  Also brought the patch up to the latest net-2.6.26 tree.

Update: Made gso_max_size container 32 bits, not 16.  Moved the
location of gso_max_size within netdev to be less hotpath.  Made more
consistent names between the sock and netdev layers, and added a
define for the max GSO size.

Update: Respun for net-2.6.26 tree.

Update: changed max_gso_frame_size and sk_gso_max_size from signed to
unsigned - thanks Stephen!

This patch adds the ability for device drivers to control the size of
the TSO frames being sent to them, per TCP connection.  By setting the
netdevice's gso_max_size value, the socket layer will set the GSO
frame size based on that value.  This will propogate into the TCP
layer, and send TSO's of that size to the hardware.

This can be desirable to help tune the bursty nature of TSO on a
per-adapter basis, where one may have 1 GbE and 10 GbE devices
coexisting in a system, one running multiqueue and the other not, etc.

This can also be desirable for devices that cannot support full 64 KB
TSO's, but still want to benefit from some level of segmentation
offloading.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 03:43:19 -07:00
David S. Miller
a25606c845 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-03-21 03:42:24 -07:00
Vlad Yasevich
270637abff [SCTP]: Fix a race between module load and protosw access
There is a race is SCTP between the loading of the module
and the access by the socket layer to the protocol functions.
In particular, a list of addresss that SCTP maintains is
not initialized prior to the registration with the protosw.
Thus it is possible for a user application to gain access
to SCTP functions before everything has been initialized.
The problem shows up as odd crashes during connection
initializtion when we try to access the SCTP address list.

The solution is to refactor how we do registration and
initialize the lists prior to registering with the protosw.
Care must be taken since the address list initialization
depends on some other pieces of SCTP initialization.  Also
the clean-up in case of failure now also needs to be refactored.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-20 15:17:14 -07:00
David S. Miller
577f99c1d0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/rt2x00/rt2x00dev.c
	net/8021q/vlan_dev.c
2008-03-18 00:37:55 -07:00
Al Viro
8e3d716cce xfrm: ->eth_proto is __be16
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-17 22:49:16 -07:00
Joe Perches
068edceb7e include/net/ieee80211.h - remove duplicate include
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-03-13 19:32:31 -04:00
Adrian Bunk
7524d7d6de the scheduled ieee80211 softmac removal
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-03-13 16:02:31 -04:00
Zhang Yanmin
f1dd9c379c [NET]: Fix tbench regression in 2.6.25-rc1
Comparing with kernel 2.6.24, tbench result has regression with
2.6.25-rc1.

1) On 2 quad-core processor stoakley: 4%.
2) On 4 quad-core processor tigerton: more than 30%.

bisect located below patch.

b4ce92775c is first bad commit
commit b4ce92775c
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue Nov 13 21:33:32 2007 -0800

    [IPV6]: Move nfheader_len into rt6_info

    The dst member nfheader_len is only used by IPv6.  It's also currently
    creating a rather ugly alignment hole in struct dst.  Therefore this patch
    moves it from there into struct rt6_info.

Above patch changes the cache line alignment, especially member
__refcnt. I did a testing by adding 2 unsigned long pading before
lastuse, so the 3 members, lastuse/__refcnt/__use, are moved to next
cache line. The performance is recovered.

I created a patch to rearrange the members in struct dst_entry.

With Eric and Valdis Kletnieks's suggestion, I made finer arrangement.

1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So
   sizeof(dst_entry)=200 no matter if CONFIG_NET_CLS_ROUTE=y/n. I
   tested many patches on my 16-core tigerton by moving tclassid to
   different place. It looks like tclassid could also have impact on
   performance.  If moving tclassid before metrics, or just don't move
   tclassid, the performance isn't good. So I move it behind metrics.

2) Add comments before __refcnt.

On 16-core tigerton:

If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18%
better than the one without the patch;

If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30%
better than the one without the patch.

With 32bit 2.6.25-rc1 on 8-core stoakley, the new patch doesn't
introduce regression.

Thank Eric, Valdis, and David!

Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-12 22:52:37 -07:00
David S. Miller
ba73d4c84a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6.26 2008-03-11 19:17:18 -07:00
Pekka Enberg
019f692ea7 [NETFILTER]: nf_conntrack: replace horrible hack with ksize()
There's a horrible slab abuse in net/netfilter/nf_conntrack_extend.c
that can be replaced with a call to ksize().

Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-10 16:43:41 -07:00
Ron Rindjunsky
6c5ef8a705 mac80211: document IEEE80211_TXCTL_OFDM_HT
This patch clarifies the use of IEEE80211_TXCTL_OFDM_HT flag.

Can by united with patch "mac80211: adding mac80211_tx_control
flags and HT flags"

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-03-07 16:03:01 -05:00
Ron Rindjunsky
11f4b1cec9 mac80211: adding mac80211_tx_control_flags and HT flags
This patch makes enum from the defines previously dwelled inside
ieee80211_tx_control for better readability.
The patch also addes HT flags, for 802.11n drivers:
- IEEE80211_TXCTL_OFDM_HT: request low-level driver to use HT OFDM rates
- IEEE80211_TXCTL_GREEN_FIELD: use green field protection
- IEEE80211_TXCTL_DUP_DATA: duplicate data on both 20 Mhz channels
- IEEE80211_TXCTL_40_MHZ_WIDTH: send this frame in 40Mhz width
- IEEE80211_TXCTL_SHORT_GI: send this frame with short guard interval

Tx command can be a combination of any of these flags, along with
bitrate represented by ieee80211_rate. this will allow legacy drivers to
switch easily to any 11n rate representation.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
CC: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-03-07 16:02:59 -05:00
Daniel Lezcano
b8ad0cbc58 [NETNS][IPV6] mcast - handle several network namespace
This patch make use of the network namespace information at the right
places to handle the multicast for several network namespaces.  It
makes the socket control to be per namespace too.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-07 11:16:55 -08:00
Daniel Lezcano
93ec926b07 [NETNS][IPV6] tcp6 - make socket control per namespace
Instead of having a tcp6_socket global to all the namespace, there is
tcp6 socket control per namespace. That is consistent with which
namespace sent a RST and allows to pass the socket to the underlying
function to retrieve the network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-07 11:16:02 -08:00
Daniel Lezcano
1762f7e88e [NETNS][IPV6] ndisc - make socket control per namespace
Make ndisc socket control per namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-07 11:15:34 -08:00
Pavel Emelyanov
e9720acd72 [NET]: Make /proc/net a symlink on /proc/self/net (v3)
Current /proc/net is done with so called "shadows", but current
implementation is broken and has little chances to get fixed.

The problem is that dentries subtree of /proc/net directory has
fancy revalidation rules to make processes living in different
net namespaces see different entries in /proc/net subtree, but
currently, tasks see in the /proc/net subdir the contents of any
other namespace, depending on who opened the file first.

The proposed fix is to turn /proc/net into a symlink, which points
to /proc/self/net, which in turn shows what previously was in
/proc/net - the network-related info, from the net namespace the
appropriate task lives in.

# ls -l /proc/net
lrwxrwxrwx  1 root root 8 Mar  5 15:17 /proc/net -> self/net

In other words - this behaves like /proc/mounts, but unlike
"mounts", "net" is not a file, but a directory.

Changes from v2:
* Fixed discrepancy of /proc/net nlink count and selinux labeling
  screwup pointed out by Stephen.

  To get the correct nlink count the ->getattr callback for /proc/net
  is overridden to read one from the net->proc_net entry.

  To make selinux still work the net->proc_net entry is initialized
  properly, i.e. with the "net" name and the proc_net parent.

Selinux fixes are
Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>

Changes from v1:
* Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-07 11:08:40 -08:00
David S. Miller
db8dac20d5 [UDP]: Revert udplite and code split.
This reverts commit db1ed684f6 ("[IPV6]
UDP: Rename IPv6 UDP files."), commit
8be8af8fa4 ("[IPV4] UDP: Move
IPv4-specific bits to other file.") and commit
e898d4db27 ("[UDP]: Allow users to
configure UDP-Lite.").

First, udplite is of such small cost, and it is a core protocol just
like TCP and normal UDP are.

We spent enormous amounts of effort to make udplite share as much code
with core UDP as possible.  All of that work is less valuable if we're
just going to slap a config option on udplite support.

It is also causing build failures, as reported on linux-next, showing
that the changeset was not tested very well.  In fact, this is the
second build failure resulting from the udplite change.

Finally, the config options provided was a bool, instead of a modular
option.  Meaning the udplite code does not even get build tested
by allmodconfig builds, and furthermore the user is not presented
with a reasonable modular build option which is particularly needed
by distribution vendors.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-06 16:22:02 -08:00
Allan Stephens
0e0609bbd2 [TIPC]: Eliminate "sparse" symbol warnings
This patch eliminates warnings about undeclared symbols.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-06 15:06:06 -08:00
Allan Stephens
8c8696553a [TIPC]: Removal of message header option code
This patch removes code associated with optional, user-specified
fields of the TIPC message header.  Such fields were never
utilized by TIPC, and have now been removed from the protocol
specification.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-06 15:05:07 -08:00
Johannes Berg
dbbea6713d mac80211: add documentation book
Quite a while ago I started this book. The required kernel-doc
patches have since gone into the tree so it is now possible to
build the book in mainline.

The actual documentation is still rather incomplete and not all
things are linked into the book, but this enables us to edit
the documentation collaboratively, hopefully driver authors can
add documentation based on their experience with mac80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-03-06 15:30:47 -05:00
Johannes Berg
902acc7896 mac80211: clean up mesh code
Various cleanups, reducing the #ifdef mess and other things.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-03-06 15:30:42 -05:00
Johannes Berg
6032f934c8 mac80211: add mesh interface type
This adds the mesh interface type.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-03-06 15:30:41 -05:00
Luis Carlos Cobo
2ec600d672 nl80211/cfg80211: support for mesh, sta dumping
Added support for mesh id and mesh path operation as well as
station structure dumping.

Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-03-06 15:30:41 -05:00
Tobias Klauser
04005dd9ae bluetooth: Make hci_sock_cleanup() return void
hci_sock_cleanup() always returns 0 and its return value isn't used
anywhere in the code.

Compile-tested with 'make allyesconfig && make net/bluetooth/bluetooth.ko'

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
2008-03-05 18:47:03 -08:00
Harvey Harrison
4eb329a5aa irda: replace __inline with inline
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 18:37:16 -08:00
Eric Dumazet
ee6b967301 [IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts
(Anonymous) unions can help us to avoid ugly casts.

A common cast it the (struct rtable *)skb->dst one.

Defining an union like  :
union {
     struct dst_entry *dst;
     struct rtable *rtable;
};
permits to use skb->rtable in place.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 18:30:47 -08:00
David S. Miller
255333c1db Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	net/mac80211/rc80211_pid_algo.c
2008-03-05 12:26:41 -08:00
Daniel Lezcano
4591db4f37 [NETNS][IPV6] route6 - add netns parameter to ip6_route_output
Add an netns parameter to ip6_route_output. That will allow to access
to the right routing table for outgoing traffic.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 10:48:10 -08:00
Daniel Lezcano
af2849377e [NETNS][IPV6] addrconf - Pass the proper network namespace parameters to addrconf
This patch propagates the network namespace pointer to the address
configuration routines which need it, which means adding a new
parameter to these functions, and make them use it instead of using
the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 10:46:57 -08:00
David S. Miller
7adc3830f9 [TCP]: Improve ipv4 established hash function.
If all of the entropy is in the local and foreign addresses,
but xor'ing together would cancel out that entropy, the
current hash performs poorly.

Suggested by Cosmin Ratiu:

	Basically, the situation is as follows: There is a client
	machine and a server machine. Both create 15000 virtual
	interfaces, open up a socket for each pair of interfaces and
	do SIP traffic. By profiling I noticed that there is a lot of
	time spent walking the established hash chains with this
	particular setup.

	The addresses were distributed like this: client interfaces
	were 198.18.0.1/16 with increments of 1 and server interfaces
	were 198.18.128.1/16 with increments of 1. As I said, there
	were 15000 interfaces. Source and destination ports were 5060
	for each connection.  So in this case, ports don't matter for
	hashing purposes, and the bits from the address pairs used
	cancel each other, meaning there are no differences in the
	whole lot of pairs, so they all end up in the same hash chain.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 14:28:41 -08:00
Benjamin Thery
6891a346c3 [NETNS][IPV6] route6 - make garbage collection work with multiple network namespaces
This patch makes the necessary changes to make IPv6 dst_entry garbage
collection work with multiple network namespaces.

In ip6_dst_gc(), static local variables are now declared
per-namespace.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:49:47 -08:00
Benjamin Thery
f2fc6a5458 [NETNS][IPV6] route6 - move ip6_dst_ops inside the network namespace
The ip6_dst_ops is moved inside the network namespace structure.  All
references to this structure are now relative to the initial network
namespace.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:49:23 -08:00
Daniel Lezcano
8ed6778967 [NETNS][IPV6] rt6_info - move rt6_info structure inside the namespace
The rt6_info structures are moved inside the network namespace
structure. All references to these structures are now relative to the
initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:48:30 -08:00
Daniel Lezcano
bdb3289f73 [NETNS][IPV6] rt6_info - make rt6_info accessed as a pointer
This patch make mindless changes and prepares the code to use dynamic
allocation for rt6_info structure. The code accesses the rt6_info
structure as a pointer instead of a global static variable.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:48:10 -08:00
Daniel Lezcano
5578689a4e [NETNS][IPV6] route6 - make route6 per namespace
This patch makes the routing engine use the network namespaces to
access routing informations: Add a network namespace parameter to
ipv6_route_ioctl and propagate the network namespace value to all the
routing code that have not yet been changed.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:47:47 -08:00
Daniel Lezcano
7b4da53229 [NETNS][IPV6] route6 - Pass the network namespace parameter to rt6_purge_dflt_routers
Add a network namespace parameter to rt6_purge_dflt_routers.  This is
needed to call fib6_get_table with the appropriate network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:47:14 -08:00
Daniel Lezcano
606a2b4862 [NETNS][IPV6] route6 - Pass the network namespace parameter to rt6_lookup
Add a network namespace parameter to rt6_lookup().

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:45:59 -08:00
Benjamin Thery
c572872f89 [NETNS][IPV6] rt6_stats - make the stats per network namespace
The rt6_stats is now per namespace with this patch. It is allocated
when a network namespace is created and freed when the network
namespace exits and references are relative to the network namespace.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:34:17 -08:00
Daniel Lezcano
6cc118bd50 [NETNS][IPV6] rt6_stats - dynamically allocate the routes statistics
This patch allocates the rt6_stats struct dynamically when the fib6 is
initialized. That provides the ability to create several instances of
this structure for the network namespaces.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:33:43 -08:00
Daniel Lezcano
dcabb819a6 [NETNS][IPV6] fib6_rules - handle several network namespaces
The fib6_rules_ops is moved to the network namespace structure.  All
references are changed to have it relatively to it.

Each time a network namespace is created a new fib6_rules_ops is
allocated, initialized and stored into the network namespace
structure.

The common part of the fib rules is namespace aware, so it is quite
easy to retrieve the network namespace from the rules and use it in
the different callbacks.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:33:08 -08:00
Daniel Lezcano
63152fc0de [NETNS][IPV6] ip6_fib - gc timer per namespace
Move the timer initialization at the network namespace creation and
store the network namespace in the timer argument.

That enables multiple timers (one per network namespace) to do garbage
collecting.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:31:11 -08:00
Daniel Lezcano
5b7c931dff [NETNS][IPV6] ip6_fib - add net to gc timer parameter
The fib tables are now relative to the network namespace. When the
garbage collector timer expires, we must have a network namespace
parameter in order to retrieve the tables. For now this is the
init_net, but we should be able to have a timer per namespace and use
the timer callback parameter to pass the network namespace from the
expired timer.

The timer callback, fib6_run_gc, is actually used to be called
synchronously by some functions and asynchronously when the timer
expires.

When the timer expires, the delay specified for fib6_run_gc parameter
is always zero. So, I changed fib6_run_gc to not be a timer callback
but a function called by the timer callback and I added a timer
callback where its work is just to retrieve from the data arg of the
timer the network namespace and call fib6_run_gc with zero expiring
time and the network namespace parameters. That makes the code cleaner
for the fib6_run_gc callers.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:28:58 -08:00
Daniel Lezcano
f3db48517f [NETNS][IPV6] ip6_fib - fib6_clean_all handle several network namespaces
The function fib6_clean_all takes the network namespace as
parameter. That allows to flush the routes related to a specific
network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:27:06 -08:00
Daniel Lezcano
58f09b78b7 [NETNS][IPV6] ip6_fib - make it per network namespace
The fib table for ipv6 are moved to the network namespace structure.
All references to them are made relatively to the network namespace.

All external calls to the ip6_fib functions taking the network
namespace parameter are made using the init_net variable, so the
ip6_fib engine is ready for the namespaces but the callers not yet.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:25:27 -08:00
YOSHIFUJI Hideaki
3b00944c5c [IPV6]: Make ndisc_dst_alloc() common for later use.
For later use, this patch is renaming ndisc_dst_alloc()
(and related function/structures) to icmp6_dst_alloc()
(and so on).  This patch also removing unused function-
pointer argument for it.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:24 +09:00
YOSHIFUJI Hideaki
5e5f3f0f80 [IPV6] ADDRCONF: Convert ipv6_get_saddr() to ipv6_dev_get_saddr().
Since most users of ipv6_get_saddr() pass non-NULL as
dst argument, use ipv6_dev_get_saddr() directly.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:23 +09:00
YOSHIFUJI Hideaki
8082c37cdc [NET] NEIGHBOUR: Remove unpopular neigh_is_connected().
neigh_is_connected() is not popular at all, and the only user
drivers/net/cxgb3/l2t.c:t3_l2t_update() also have raw (expanded) expression.
Let's expand it and remove the inline function.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:23 +09:00
YOSHIFUJI Hideaki
0e7b8dcd16 [IPV6]: Use htonl() instead of __constant_htonl() where appricable.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:23 +09:00
YOSHIFUJI Hideaki
662397fd7a [IPV6]: Move packet_type{} related bits to af_inet6.c.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:23 +09:00
YOSHIFUJI Hideaki
e898d4db27 [UDP]: Allow users to configure UDP-Lite.
Let's give users an option for disabling UDP-Lite (~4K).

old:
|    text	   data	    bss	    dec	    hex	filename
|  286498	  12432	   6072	 305002	  4a76a	net/ipv4/built-in.o
|  193830	   8192	   3204	 205226	  321aa	net/ipv6/ipv6.o

new (without UDP-Lite):
|    text	   data	    bss	    dec	    hex	filename
|  284086	  12136	   5432	 301654	  49a56	net/ipv4/built-in.o
|  191835	   7832	   3076	 202743	  317f7	net/ipv6/ipv6.o

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:22 +09:00
Glenn Griffin
c6aefafb7e [TCP]: Add IPv6 support to TCP SYN cookies
Updated to incorporate Eric's suggestion of using a per cpu buffer
rather than allocating on the stack.  Just a two line change, but will
resend in it's entirety.

Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:21 +09:00
David S. Miller
4a80f27889 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6.26 2008-02-29 13:41:25 -08:00
Johannes Berg
2485f7105f mac80211: clarify use of TX status/RX callbacks
This patch clarifies the use of the irqsafe vs. non-irq-safe
functions and their respective locking requirements.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:41:58 -05:00
Johannes Berg
d46e144b65 mac80211: rework TX filtered frame code
This reworks the code for TX filtered frames, splitting it out to
a new function to handle those cases, making the clear instruction
a flag and renaming a few things to be easier to understand and
less Atheros hardware specific. Finally, it also makes the comments
explain more.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:41:32 -05:00
Bruno Randolf
9d9bf77d16 mac80211: enable IBSS merging
enable IBSS cell merging. if an IBSS beacon with the same channel, same ESSID
and a TSF higher than the local TSF (mactime) is received, we have to join its
BSSID. while this might not be immediately apparent from reading the 802.11
standard it is compliant and necessary to make IBSS mode functional in many
cases. most drivers have a similar behaviour.

* move the relevant code section (previously only containing debug code) down
to the end of the function, so we can reuse the bss structure.

* we have to compare the mactime (TSF at the time of packet receive) rather
than the current TSF. since mactime is defined as the time the first data
symbol arrived we add the time until byte 24 where the timestamp resides, since
this is how the beacon timestamp is defined. as some some drivers are not able
to give a reliable mactime we fall back to use the current TSF, which will be
enough to catch most (but not all) cases where an IBSS merge is necessary.

* in IBSS mode we want to allow beacons to override probe response info so we
can correctly do merges.

* we don't only configure beacons based on scan results, so change that
message.

* to enable this we have to let all beacons thru in IBSS mode, even if they
have a different BSSID.

Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:37:12 -05:00
Bruno Randolf
c132bec33c mac80211: better definition of mactime
define mactime as the time when the first data symbol arrived at the HW. the
old definition was questionable because 802.11 defines timestamp only for
beacon and probe response frames, and there it means the timestamp field.

a stricter definition of mactime is necessary for correct merging of IBSS.

note that it is up to the driver to convert whatever its hardware returns to
this definition. unfortunately we don't know for example when atheros hardware
takes its rx timestamp exactly :(

Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:37:11 -05:00
Michael Buesch
d0f5afbe6d mac80211: Extend filter flag documentation about unsupported flags
This extends the filter flags documentation to make it clear
what clearing a flag really means.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:37:08 -05:00
Johannes Berg
3330d7be70 mac80211: give burst time in txop rather than 0.1msec units
This changes mac80211 to pass the burst time to conf_tx in txop
units rather than 0.1msec units. 0.1msec units are only required
by atheros hardware (according to current driver support), all
other drivers do other calculations or require the txop value.
Therefore, it results in fewer calculations and more precision
if we just pass the txop value through to the driver.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:37:07 -05:00
Michael Wu
66f7ac50ed nl80211: Add monitor interface configuration flags
This allows precise control over what a monitor interface shows.

Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:37:02 -05:00
Johannes Berg
8318d78a44 cfg80211 API for channels/bitrates, mac80211 and driver conversion
This patch creates new cfg80211 wiphy API for channel and bitrate
registration and converts mac80211 and drivers to the new API. The
old mac80211 API is completely ripped out. All drivers (except ath5k)
are updated to the new API, in many cases I expect that optimisations
can be done.

Along with the regulatory code I've also ripped out the
IEEE80211_HW_DEFAULT_REG_DOMAIN_CONFIGURED flag, I believe it to be
unnecessary if the hardware simply gives us whatever channels it wants
to support and we then enable/disable them as required, which is pretty
much required for travelling.

Additionally, the patch adds proper "basic" rate handling for STA
mode interface, AP mode interface will have to have new API added
to allow userspace to set the basic rate set, currently it'll be
empty... However, the basic rate handling will need to be moved to
the BSS conf stuff.

I do expect there to be bugs in this, especially wrt. transmit
power handling where I'm basically clueless about how it should work.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:32 -05:00
Ron Rindjunsky
483fdcecc5 mac80211: A-MPDU Tx change tx_status to support Block Ack data
This patch adds fields to ieee80211_tx_status in order to allow block ack
information exchange between low-level driver,mac80211 and rate scaling
module.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:18 -05:00
Ron Rindjunsky
9e72349237 mac80211: A-MPDU Tx adding qdisc support
This patch allows qdisc support in A-MPDU Tx. a method to
handle QoS <-> TID switches is present in this patch.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:17 -05:00
Ron Rindjunsky
0df3ef45a3 mac80211: A-MPDU Tx add session's and low level driver's API
This patch adds the API for 3 stages in A-MPDU Tx session flow:
- request mac80211 to start/stop A-MPDU Tx session for specific TID. such a
  request should be issued by a load aware element, either mac80211 itself
  or external element.
- requests by mac80211 to low-level driver to start/stop Tx aggregation.
  notice that low level driver responds now with Starting Sequence Number.
- async feedback by low-level to mac80211 to inform that HW is ready for
  next A-MPDU Tx state.
Changes in API to Rx A-MPDU were also made, reflected in iwlwifi changes as
well.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:13 -05:00
Ilpo Järvinen
03a64c93b6 [LLC]: Kill static inline llc_addrany
After the patch:
$ git-grep llc_addrany | wc -l
0

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:46:17 -08:00
Ilpo Järvinen
a90bcbd651 [SCTP]: Kill unused static inline sctp_sysctl_jiffies_ms
After the patch:
$ git-grep sctp_sysctl_jiffies_ms | wc -l
0

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:45:34 -08:00
Denis V. Lunev
fd80eb942a [INET]: Remove struct dst_entry *dst from request_sock_ops.rtx_syn_ack.
It looks like dst parameter is used in this API due to historical
reasons.  Actually, it is really used in the direct call to
tcp_v4_send_synack only.  So, create a wrapper for tcp_v4_send_synack
and remove dst from rtx_syn_ack.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:43:03 -08:00
Neil Horman
58fbbed4fb [SCTP]: extend exported data in /proc/net/sctp/assoc
RFC 3873 specifies several MIB objects that can't be obtained by the
current data set exported by /proc/sys/net/sctp/assoc.  This patch
adds the missing pieces of data that allow us to compute all the
objects in the sctpAssocTable object.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:40:56 -08:00
Denis V. Lunev
98c6d1b261 [NETNS]: Make icmpv6_sk per namespace.
All preparations are done. Now just add a hook to perform an
initialization on namespace startup and replace icmpv6_sk macro with
proper inline call.  Actual namespace the packet belongs too will be
passed later along with the one for the routing.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:21:22 -08:00
Denis V. Lunev
4a6ad7a141 [NETNS]: Make icmp_sk per namespace.
All preparations are done. Now just add a hook to perform an
initialization on namespace startup and replace icmp_sk macro with
proper inline call.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:19:58 -08:00
Denis V. Lunev
edf0208702 [NET]: Make netlink_kernel_release publically available as sk_release_kernel.
This staff will be needed for non-netlink kernel sockets, which should
also not pin a namespace like tcp_socket and icmp_socket.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:18:32 -08:00
Denis V. Lunev
a5710d6582 [ICMP]: Add return code to icmp_init.
icmp_init could fail and this is normal for namespace other than initial.
So, the panic should be triggered only on init_net initialization path.

Additionally create rollback path for icmp_init as a separate function.
It will also be used later during namespace destruction.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:14:50 -08:00
Denis V. Lunev
9b0f976f27 [INET]: Remove struct net_proto_family* from _init calls.
struct net_proto_family* is not used in icmp[v6]_init, ndisc_init,
igmp_init and tcp_v4_init. Remove it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:13:15 -08:00
Timo Teras
4c563f7669 [XFRM]: Speed up xfrm_policy and xfrm_state walking
Change xfrm_policy and xfrm_state walking algorithm from O(n^2) to O(n).
This is achieved adding the entries to one more list which is used
solely for walking the entries.

This also fixes some races where the dump can have duplicate or missing
entries when the SPD/SADB is modified during an ongoing dump.

Dumping SADB with 20000 entries using "time ip xfrm state" the sys
time dropped from 1.012s to 0.080s.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 21:31:08 -08:00
Juha-Matti Tapio
99cd07a537 [IPV6]: Fix source address selection for ORCHID addresses
Skip the prefix length matching in source address selection for
orchid -> non-orchid addresses.

Overlay Routable Cryptographic Hash IDentifiers (RFC 4843,
2001:10::/28) are currenty not globally reachable. Without this
check a host with an ORCHID address can end up preferring those over
regular addresses when talking to other regular hosts in the 2001::/16
range thus breaking non-orchid connections.

Signed-off-by: Juha-Matti Tapio <jmtapio@verkkotelakka.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 20:55:46 -08:00
Vlad Yasevich
7e8616d8e7 [SCTP]: Update AUTH structures to match declarations in draft-16.
The new SCTP socket api (draft 16) updates the AUTH API structures.
We never exported these since we knew they would change.
Update the rest to match the draft.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2008-02-28 16:45:04 -05:00
Pavel Emelyanov
34cc7ba639 [IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly.
Use the added dev_alloc_name() call to create tunnel device name,
rather than iterate in a hand-made loop with an artificial limit.

Thanks Patrick for noticing this.

[ The way this works is, when the device is actually registered,
  the generic code noticed the '%' in the name and invokes
  dev_alloc_name() to fully resolve the name.  -DaveM ]

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-23 20:19:20 -08:00
Randy Dunlap
3172936341 net: fix kernel-doc warnings in header files
Add missing structure kernel-doc descriptions to sock.h & skbuff.h
to fix kernel-doc warnings.

(I think that Stephen H. sent a similar patch, but I can't find it.
I just want to kill the warnings, with either patch.)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-18 20:52:13 -08:00
Herbert Xu
b318e0e4ef [IPSEC]: Fix bogus usage of u64 on input sequence number
Al Viro spotted a bogus use of u64 on the input sequence number which
is big-endian.  This patch fixes it by giving the input sequence number
its own member in the xfrm_skb_cb structure.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 22:50:35 -08:00
Rami Rosen
0f8f27c395 [IPV6]: remove unused method declaration (net/ndisc.h).
This patch removes unused declaration of dflt_rt_lookup() method in
include/net/ndisc.h

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 22:06:53 -08:00
Jarek Poplawski
e848b583e0 [AX25] ax25_ds_timer: use mod_timer instead of add_timer
This patch changes current use of: init_timer(), add_timer()
and del_timer() to setup_timer() with mod_timer(), which
should be safer anyway.

Reported-by: Jann Traschewski <jann@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 17:53:34 -08:00
Jarek Poplawski
21fab4a86a [AX25] ax25_timer: use mod_timer instead of add_timer
According to one of Jann's OOPS reports it looks like
BUG_ON(timer_pending(timer)) triggers during add_timer()
in ax25_start_t1timer(). This patch changes current use
of: init_timer(), add_timer() and del_timer() to
setup_timer() with mod_timer(), which should be safer
anyway.

Reported-by: Jann Traschewski <jann@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 17:53:33 -08:00
David S. Miller
ab1ecbabb1 Merge branch 'pending' of master.kernel.org:/pub/scm/linux/kernel/git/vxy/lksctp-dev 2008-02-09 03:44:25 -08:00
Ilpo Järvinen
86121fe5b4 [TIPC]: Kill unused static inline (x5)
All these static inlines are unused:

in_own_zone     1 (net/tipc/addr.h)
msg_dataoctet   1 (net/tipc/msg.h)
msg_direct      1 (include/net/tipc/tipc_msg.h)
msg_options     1 (include/net/tipc/tipc_msg.h)
tipc_nmap_get   1 (net/tipc/bcast.h)

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 18:17:13 -08:00
Rami Rosen
4e881a217b [IPV6] Minor cleanup: remove unused definitions in net/ip6_fib.h
This patch removes some unused definitions and one method typedef
declaration (f_pnode)
in include/net/ip6_fib.h, as they are not used in the kernel.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 18:11:49 -08:00
Rami Rosen
bba536a3d5 [IPV6] Minor clenup: remove two unused definitions in net/ip6_route.h
Remove IP6_RT_PRIO_FW and IP6_RT_FLOW_MASK definitions in
include/net/ip6_route.h, as they are not used in the kernel.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 18:10:19 -08:00
Patrick McHardy
86577c661b [NETFILTER]: nf_conntrack: fix ct_extend ->move operation
The ->move operation has two bugs:

- It is called with the same extension as source and destination,
  so it doesn't update the new extension.

- The address of the old extension is calculated incorrectly,
  instead of (void *)ct->ext + ct->ext->offset[i] it uses
  ct->ext + ct->ext->offset[i].

Fixes a crash on x86_64 reported by Chuck Ebbert <cebbert@redhat.com>
and Thomas Woerner <twoerner@redhat.com>.

Tested-by: Thomas Woerner <twoerner@redhat.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 17:56:34 -08:00
Eric Van Hensbergen
8a0dc95fd9 9p: transport API reorganization
This merges the mux.c (including the connection interface) with trans_fd
in preparation for transport API changes.  Ultimately, trans_fd will need
to be rewritten to clean it up and simplify the implementation, but this
reorganization is viewed as the first step.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-02-06 19:25:03 -06:00
Anthony Liguori
d199d652c5 9p: add support for sticky bit
GDM gets unhappy if /var/gdm doesn't have the sticky bit set.  This patch adds
support for the sticky bit in much the same way setuid/setgid is supported.

With this patch, I can launch X from a v9fs rootfs (although I quickly run out
of fds in the server once gnome starts up).

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-02-06 19:25:06 -06:00
Eric Van Hensbergen
e2735b7720 9p: block-based virtio client
This replaces the console-based virto client with a block-based
client using a single request queue.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-02-06 19:25:58 -06:00
Eric Van Hensbergen
043aba403e 9p: create transport rpc cut-thru
Add a new transport function which allows a cut-thru directly to
the transport instead of processing request through the mux if the
cut-thru exists.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-02-06 19:25:09 -06:00
Linus Torvalds
3d412f60b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  [PKT_SCHED]: vlan tag match
  [NET]: Add if_addrlabel.h to sanitized headers.
  [NET] rtnetlink.c: remove no longer used functions
  [ICMP]: Restore pskb_pull calls in receive function
  [INET]: Fix accidentally broken inet(6)_hash_connect's port offset calculations.
  [NET]: Remove further references to net-modules.txt
  bluetooth rfcomm tty: destroy before tty_close()
  bluetooth: blacklist another Broadcom BCM2035 device
  drivers/bluetooth/btsdio.c: fix double-free
  drivers/bluetooth/bpa10x.c: fix memleak
  bluetooth: uninlining
  bluetooth: hidp_process_hid_control remove unnecessary parameter dealing
  tun: impossible to deassert IFF_ONE_QUEUE or IFF_NO_PI
  hamradio: fix dmascc section mismatch
  [SCTP]: Fix kernel panic while received AUTH chunk with BAD shared key identifier
  [SCTP]: Fix kernel panic while received AUTH chunk while enabled auth
  [IPV4]: Formatting fix for /proc/net/fib_trie.
  [IPV6]: Fix sysctl compilation error.
  [NET_SCHED]: Add #ifdef CONFIG_NET_EMATCH in net/sched/cls_flow.c (latest git broken build)
  [IPV4]: Fix compile error building without CONFIG_FS_PROC
  ...
2008-02-05 10:09:07 -08:00
Paul Moore
eda61d32e8 NetLabel: introduce a new kernel configuration API for NetLabel
Add a new set of configuration functions to the NetLabel/LSM API so that
LSMs can perform their own configuration of the NetLabel subsystem without
relying on assistance from userspace.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:20 -08:00
Vlad Yasevich
60c778b259 [SCTP]: Stop claiming that this is a "reference implementation"
I was notified by Randy Stewart that lksctp claims to be
"the reference implementation".  First of all, "the
refrence implementation" was the original implementation
of SCTP in usersapce written ty Randy and a few others.
Second, after looking at the definiton of 'reference implementation',
we don't really meet the requirements.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2008-02-05 10:59:07 -05:00
Pavel Emelyanov
5d8c0aa943 [INET]: Fix accidentally broken inet(6)_hash_connect's port offset calculations.
The port offset calculations depend on the protocol family, but, as
Adrian noticed, I broke this logic with the commit

	5ee31fc1ec
	[INET]: Consolidate inet(6)_hash_connect.

Return this logic back, by passing the port offset directly into the
consolidated function.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Noticed-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 03:14:44 -08:00
Daniel Lezcano
6de1a91040 [IPV6]: Fix sysctl compilation error.
Move ipv6_icmp_sysctl_init and ipv6_route_sysctl_init into the right
ifdef section otherwise that does not compile when CONFIG_SYSCTL=yes
and CONFIG_PROC_FS=no

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 02:57:59 -08:00
Li Zefan
cc8274f50f [IPV4]: Fix compile error building without CONFIG_FS_PROC
compile error building without CONFIG_FS_PROC:

net/ipv4/fib_frontend.c: In function 'fib_net_init':
net/ipv4/fib_frontend.c:1032: error: implicit declaration of function 'fib_proc_
init'
net/ipv4/fib_frontend.c: In function 'fib_net_exit':
net/ipv4/fib_frontend.c:1047: error: implicit declaration of function 'fib_proc_
exit'

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 02:54:16 -08:00
Arnaldo Carvalho de Melo
246f19d194 [IPV6]: Reorg struct ifmcaddr6 to save some bytes
/home/acme/git/net-2.6/net/ipv6/mcast.c:
  struct ifmcaddr6 |   -8
 1 struct changed
  igmp6_group_dropped  |   -6
  add_grec             |   -3
  mld_ifc_timer_expire |  -18
  ip6_mc_add_src       |   -3
  ip6_mc_del_src       |   -3
  igmp6_group_added    |   -3
 6 functions changed, 36 bytes removed, diff: -36

ipv6.ko:
 6 functions changed, 36 bytes removed, diff: -36

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-03 04:28:54 -08:00
Arnaldo Carvalho de Melo
ad8bb78083 [INET_TIMEWAIT_SOCK]: Reorganize struct inet_timewait_sock to save some bytes
/home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
  struct inet_timewait_sock |   -8
  struct tcp_timewait_sock  |   -8
 2 structs changed
  tcp_v6_rcv                |   -6
 1 function changed, 6 bytes removed, diff: -6

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-03 04:28:54 -08:00
Arnaldo Carvalho de Melo
4e7e5cfe38 [INET6]: Reorganize struct inet6_dev to save 8 bytes
And make it a multiple of a 64 bytes, reducing cacheline trashing:

Before:

[acme@doppio net-2.6]$ pahole -C inet6_dev net/dccp/ipv6.o
struct inet6_dev {
	<SNIP>
	long unsigned int          mc_maxdelay;          /*    48     8 */
	unsigned char              mc_qrv;               /*    56     1 */
	unsigned char              mc_gq_running;        /*    57     1 */
	unsigned char              mc_ifc_count;         /*    58     1 */

	/* XXX 5 bytes hole, try to pack */

	/* --- cacheline 1 boundary (64 bytes) --- */
	struct timer_list          mc_gq_timer;          /*    64    48 */
	<SNIP>
	__u32                      if_flags;             /*   180     4 */
	int                        dead;                 /*   184     4 */
	u8                         rndid[8];             /*   188     8 */

	/* XXX 4 bytes hole, try to pack */

	/* --- cacheline 3 boundary (192 bytes) was 8 bytes ago --- */
	struct timer_list          regen_timer;          /*   200    48 */

	<SNIP>

	/* size: 456, cachelines: 8 */
	/* sum members: 447, holes: 2, sum holes: 9 */
	/* last cacheline: 8 bytes */
};

After:

net-2.6/net/ipv6/af_inet6.c:
  struct inet6_dev |   -8
 1 struct changed

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-03 04:28:52 -08:00
Arnaldo Carvalho de Melo
ab1e0a13d7 [SOCK] proto: Add hashinfo member to struct proto
This way we can remove TCP and DCCP specific versions of

sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
sk->sk_prot->hash:     inet_hash is directly used, only v6 need
                       a specific version to deal with mapped sockets
sk->sk_prot->unhash:   both v4 and v6 use inet_hash directly

struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
that inet_csk_get_port can find the per family routine.

Now only the lookup routines receive as a parameter a struct inet_hashtable.

With this we further reuse code, reducing the difference among INET transport
protocols.

Eventually work has to be done on UDP and SCTP to make them share this
infrastructure and get as a bonus inet_diag interfaces so that iproute can be
used with these protocols.

net-2.6/net/ipv4/inet_hashtables.c:
  struct proto			     |   +8
  struct inet_connection_sock_af_ops |   +8
 2 structs changed
  __inet_hash_nolisten               |  +18
  __inet_hash                        | -210
  inet_put_port                      |   +8
  inet_bind_bucket_create            |   +1
  __inet_hash_connect                |   -8
 5 functions changed, 27 bytes added, 218 bytes removed, diff: -191

net-2.6/net/core/sock.c:
  proto_seq_show                     |   +3
 1 function changed, 3 bytes added, diff: +3

net-2.6/net/ipv4/inet_connection_sock.c:
  inet_csk_get_port                  |  +15
 1 function changed, 15 bytes added, diff: +15

net-2.6/net/ipv4/tcp.c:
  tcp_set_state                      |   -7
 1 function changed, 7 bytes removed, diff: -7

net-2.6/net/ipv4/tcp_ipv4.c:
  tcp_v4_get_port                    |  -31
  tcp_v4_hash                        |  -48
  tcp_v4_destroy_sock                |   -7
  tcp_v4_syn_recv_sock               |   -2
  tcp_unhash                         | -179
 5 functions changed, 267 bytes removed, diff: -267

net-2.6/net/ipv6/inet6_hashtables.c:
  __inet6_hash |   +8
 1 function changed, 8 bytes added, diff: +8

net-2.6/net/ipv4/inet_hashtables.c:
  inet_unhash                        | +190
  inet_hash                          | +242
 2 functions changed, 432 bytes added, diff: +432

vmlinux:
 16 functions changed, 485 bytes added, 492 bytes removed, diff: -7

/home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
  tcp_v6_get_port                    |  -31
  tcp_v6_hash                        |   -7
  tcp_v6_syn_recv_sock               |   -9
 3 functions changed, 47 bytes removed, diff: -47

/home/acme/git/net-2.6/net/dccp/proto.c:
  dccp_destroy_sock                  |   -7
  dccp_unhash                        | -179
  dccp_hash                          |  -49
  dccp_set_state                     |   -7
  dccp_done                          |   +1
 5 functions changed, 1 bytes added, 242 bytes removed, diff: -241

/home/acme/git/net-2.6/net/dccp/ipv4.c:
  dccp_v4_get_port                   |  -31
  dccp_v4_request_recv_sock          |   -2
 2 functions changed, 33 bytes removed, diff: -33

/home/acme/git/net-2.6/net/dccp/ipv6.c:
  dccp_v6_get_port                   |  -31
  dccp_v6_hash                       |   -7
  dccp_v6_request_recv_sock          |   +5
 3 functions changed, 5 bytes added, 38 bytes removed, diff: -33

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-03 04:28:52 -08:00
Denis V. Lunev
4814bdbd59 [NETNS]: Lookup in FIB semantic hashes taking into account the namespace.
The namespace is not available in the fib_sync_down_addr, add it as a
parameter.

Looking up a device by the pointer to it is OK. Looking up using a
result from fib_trie/fib_hash table lookup is also safe. No need to
fix that at all.  So, just fix lookup by address and insertion to the
hash table path.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:41 -08:00
Denis V. Lunev
7462bd744e [NETNS]: Add a namespace mark to fib_info.
This is required to make fib_info lookups namespace aware. In the
other case initial namespace devices are marked as dead in the local
routing table during other namespace stop.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:40 -08:00
Denis V. Lunev
85326fa54b [IPV4]: fib_sync_down rework.
fib_sync_down can be called with an address and with a device. In
reality it is called either with address OR with a device. The
codepath inside is completely different, so lets separate it into two
calls for these two cases.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:39 -08:00
Patrick McHardy
5239008b0d [NET_SCHED]: Constify struct tcf_ext_map
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:34 -08:00
Eric Dumazet
29e75252da [IPV4] route cache: Introduce rt_genid for smooth cache invalidation
Current ip route cache implementation is not suited to large caches.

We can consume a lot of CPU when cache must be invalidated, since we
currently need to evict all cache entries, and this eviction is
sometimes asynchronous. min_delay & max_delay can somewhat control this
asynchronism behavior, but whole thing is a kludge, regularly triggering
infamous soft lockup messages. When entries are still in use, this also
consumes a lot of ram, filling dst_garbage.list.

A better scheme is to use a generation identifier on each entry,
so that cache invalidation can be performed by changing the table
identifier, without having to scan all entries.
No more delayed flushing, no more stalling when secret_interval expires.

Invalidated entries will then be freed at GC time (controled by
ip_rt_gc_timeout or stress), or when an invalidated entry is found
in a chain when an insert is done.
Thus we keep a normal equilibrium.

This patch :
- renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
- Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
- Checks entry->rt_genid at appropriate places :
2008-01-31 19:28:27 -08:00
Pavel Emelyanov
d86e0dac2c [NETNS]: Tcp-v6 sockets per-net lookup.
Add a net argument to inet6_lookup and propagate it further.
Actually, this is tcp-v6 implementation of what was done for
tcp-v4 sockets in a previous patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:20 -08:00
Pavel Emelyanov
c67499c0e7 [NETNS]: Tcp-v4 sockets per-net lookup.
Add a net argument to inet_lookup and propagate it further
into lookup calls. Plus tune the __inet_check_established.

The dccp and inet_diag, which use that lookup functions
pass the init_net into them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:19 -08:00
Pavel Emelyanov
941b1d22cc [NETNS]: Make bind buckets live in net namespaces.
This tags the inet_bind_bucket struct with net pointer,
initializes it during creation and makes a filtering
during lookup.

A better hashfn, that takes the net into account is to
be done in the future, but currently all bind buckets
with similar port will be in one hash chain.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:18 -08:00
Pavel Emelyanov
5ee31fc1ec [INET]: Consolidate inet(6)_hash_connect.
These two functions are the same except for what they call
to "check_established" and "hash" for a socket.

This saves half-a-kilo for ipv4 and ipv6.

 add/remove: 1/0 grow/shrink: 1/4 up/down: 582/-1128 (-546)
 function                                     old     new   delta
 __inet_hash_connect                            -     577    +577
 arp_ignore                                   108     113      +5
 static.hint                                    8       4      -4
 rt_worker_func                               376     372      -4
 inet6_hash_connect                           584      25    -559
 inet_hash_connect                            586      25    -561

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:17 -08:00
Jan Engelhardt
32948588ac [NETFILTER]: nf_conntrack: annotate l3protos with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:13 -08:00
Jan Engelhardt
82f568fc2f [NETFILTER]: nf_{conntrack,nat}_proto_tcp: constify and annotate TCP modules
Constify a few data tables use const qualifiers on variables where
possible in the nf_*_proto_tcp sources.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:10 -08:00
Patrick McHardy
c88130bcd5 [NETFILTER]: nf_conntrack: naming unification
Rename all "conntrack" variables to "ct" for more consistency and
avoiding some overly long lines.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:59 -08:00
Patrick McHardy
ffaa9c100b [NETFILTER]: nf_conntrack: reorder struct nf_conntrack_l4proto
Reorder struct nf_conntrack_l4proto so all members used during packet
processing are in the same cacheline.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:57 -08:00
Patrick McHardy
380517dead [NETFILTER]: nf_conntrack: avoid duplicate protocol comparison in nf_ct_tuple_equal()
nf_ct_tuple_src_equal() and nf_ct_tuple_dst_equal() both compare the protocol
numbers. Unfortunately gcc doesn't optimize out the second comparison, so
remove it and prefix both functions with __ to indicate that they should not
be used directly.

Saves another 16 byte of text in __nf_conntrack_find() on x86_64:

  nf_conntrack_tuple_taken |  -20 # 320 -> 300, size inlines: 181 -> 161
  __nf_conntrack_find      |  -16 # 267 -> 251, size inlines: 127 -> 115
  __nf_conntrack_confirm   |  -40 # 875 -> 835, size inlines: 570 -> 537
 3 functions changed, 76 bytes removed

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:56 -08:00
Patrick McHardy
ba419aff2c [NETFILTER]: nf_conntrack: optimize __nf_conntrack_find()
Ignoring specific entries in __nf_conntrack_find() is only needed by NAT
for nf_conntrack_tuple_taken(). Remove it from __nf_conntrack_find()
and make nf_conntrack_tuple_taken() search the hash itself.

Saves 54 bytes of text in the hotpath on x86_64:

  __nf_conntrack_find      |  -54 # 321 -> 267, # inlines: 3 -> 2, size inlines: 181 -> 127
  nf_conntrack_tuple_taken | +305 # 15 -> 320, lexblocks: 0 -> 3, # inlines: 0 -> 3, size inlines: 0 -> 181
  nf_conntrack_find_get    |   -2 # 90 -> 88
 3 functions changed, 305 bytes added, 56 bytes removed, diff: +249

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:55 -08:00
Patrick McHardy
f8ba1affa1 [NETFILTER]: nf_conntrack: switch rwlock to spinlock
With the RCU conversion only write_lock usages of nf_conntrack_lock are
left (except one read_lock that should actually use write_lock in the
H.323 helper). Switch to a spinlock.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:54 -08:00
Patrick McHardy
76507f69c4 [NETFILTER]: nf_conntrack: use RCU for conntrack hash
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:54 -08:00
Patrick McHardy
7d0742da1c [NETFILTER]: nf_conntrack_expect: use RCU for expectation hash
Use RCU for expectation hash. This doesn't buy much for conntrack
runtime performance, but allows to reduce the use of nf_conntrack_lock
for /proc and nf_netlink_conntrack.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:53 -08:00
Patrick McHardy
58a3c9bb0c [NETFILTER]: nf_conntrack: use RCU for conntrack helpers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:51 -08:00
Stephen Hemminger
96eb24d770 [NETFILTER]: nf_conntrack: sparse warnings
The hashtable size is really unsigned so sparse complains when you pass
a signed integer.  Change all uses to make it consistent.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:44 -08:00
Alexey Dobriyan
9ea0cb2601 [NETFILTER]: arp_tables: per-netns arp_tables FILTER
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:41 -08:00
Alexey Dobriyan
8280aa6182 [NETFILTER]: ip6_tables: per-netns IPv6 FILTER, MANGLE, RAW
Now it's possible to list and manipulate per-netns ip6tables rules.
Filtering decisions are based on init_net's table so far.

P.S.: remove init_net check in inet6_create() to see the effect

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:39 -08:00
Alexey Dobriyan
9335f047fe [NETFILTER]: ip_tables: per-netns FILTER, MANGLE, RAW
Now, iptables show and configure different set of rules in different
netnss'. Filtering decisions are still made by consulting only
init_net's set.

Changes are identical except naming so no splitting.

P.S.: one need to remove init_net checks in nf_sockopt.c and inet_create()
      to see the effect.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:38 -08:00
Alexey Dobriyan
8d87005207 [NETFILTER]: x_tables: per-netns xt_tables
In fact all we want is per-netns set of rules, however doing that will
unnecessary complicate routines such as ipt_hook()/ipt_do_table, so
make full xt_table array per-netns.

Every user stubbed with init_net for a while.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:35 -08:00
Jan Engelhardt
abfdf1c489 [NETFILTER]: ebtables: remove casts, use consts
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:33 -08:00
Helge Deller
000e8a5354 [NETFILTER]: nf_log: add netfilter gcc printf format checking
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:32 -08:00
Denis V. Lunev
3046d76746 [RAW]: Wrong content of the /proc/net/raw6.
The address of IPv6 raw sockets was shown in the wrong format, from
IPv4 ones.  The problem has been introduced by the commit
42a73808ed ("[RAW]: Consolidate proc
interface.")

Thanks to Adrian Bunk who originally noticed the problem.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:26 -08:00
Denis V. Lunev
377cf82d66 [RAW]: Family check in the /proc/net/raw[6] is extra.
Different hashtables are used for IPv6 and IPv4 raw sockets, so no
need to check the socket family in the iterator over hashtables. Clean
this out.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:24 -08:00
Eric Dumazet
533cb5b0a6 [XFRM]: constify 'struct xfrm_type'
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:20 -08:00
Laszlo Attila Toth
4a19ec5800 [NET]: Introducing socket mark socket option.
A userspace program may wish to set the mark for each packets its send
without using the netfilter MARK target. Changing the mark can be used
for mark based routing without netfilter or for packet filtering.

It requires CAP_NET_ADMIN capability.

Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:19 -08:00
Herbert Xu
1a6509d991 [IPSEC]: Add support for combined mode algorithms
This patch adds support for combined mode algorithms with GCM being
the first algorithm supported.

Combined mode algorithms can be added through the xfrm_user interface
using the new algorithm payload type XFRMA_ALG_AEAD.  Each algorithms
is identified by its name and the ICV length.

For the purposes of matching algorithms in xfrm_tmpl structures,
combined mode algorithms occupy the same name space as encryption
algorithms.  This is in line with how they are negotiated using IKE.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:03 -08:00
Herbert Xu
38320c70d2 [IPSEC]: Use crypto_aead and authenc in ESP
This patch converts ESP to use the crypto_aead interface and in particular
the authenc algorithm.  This lays the foundations for future support of
combined mode algorithms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:02 -08:00
Paul Moore
8cc44579d1 NetLabel: Introduce static network labels for unlabeled connections
Most trusted OSs, with the exception of Linux, have the ability to specify
static security labels for unlabeled networks.  This patch adds this ability to
the NetLabel packet labeling framework.

If the NetLabel subsystem is called to determine the security attributes of an
incoming packet it first checks to see if any recognized NetLabel packet
labeling protocols are in-use on the packet.  If none can be found then the
unlabled connection table is queried and based on the packets incoming
interface and address it is matched with a security label as configured by the
administrator using the netlabel_tools package.  The matching security label is
returned to the caller just as if the packet was explicitly labeled using a
labeling protocol.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-30 08:17:28 +11:00
Paul Moore
75e22910cf NetLabel: Add IP address family information to the netlbl_skbuff_getattr() function
In order to do any sort of IP header inspection of incoming packets we need to
know which address family, AF_INET/AF_INET6/etc., it belongs to and since the
sk_buff structure does not store this information we need to pass along the
address family separate from the packet itself.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-30 08:17:20 +11:00
Paul Moore
16efd45435 NetLabel: Add secid token support to the NetLabel secattr struct
This patch adds support to the NetLabel LSM secattr struct for a secid token
and a type field, paving the way for full LSM/SELinux context support and
"static" or "fallback" labels.  In addition, this patch adds a fair amount
of documentation to the core NetLabel structures used as part of the
NetLabel kernel API.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-30 08:17:19 +11:00
Patrick McHardy
ab27cfb85c [NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:17 -08:00
Denis V. Lunev
b5921910a1 [NETNS]: Routing cache virtualization.
Basically, this piece looks relatively easy. Namespace is already
available on the dst entry via device and the device is safe to
dereferrence. Compare it with one of a searcher and skip entry if
appropriate.

The only exception is ip_rt_frag_needed. So, add namespace parameter to it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:13 -08:00
Denis V. Lunev
eee80592c3 [NETNS]: Correct namespace for connect-time routing.
ip_route_connect and ip_route_newports are a part of routing API
presented to the socket layer. The namespace is available inside them
through a socket.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:12 -08:00
Patrick McHardy
7ba699c604 [NET_SCHED]: Convert actions from rtnetlink to new netlink API
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:11 -08:00
Patrick McHardy
add93b610a [NET_SCHED]: Convert classifiers from rtnetlink to new netlink API
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:11 -08:00
Patrick McHardy
1e90474c37 [NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink API
Convert packet schedulers to use the netlink API. Unfortunately a gradual
conversion is not possible without breaking compilation in the middle or
adding lots of casts, so this patch converts them all in one step. The
patch has been mostly generated automatically with some minor edits to
at least allow seperate conversion of classifiers and actions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:10 -08:00
Patrick McHardy
01480e1cf5 [NETLINK]: Add nla_append()
Used to append data to a message without a header or padding.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:09 -08:00
Denis V. Lunev
f206351a50 [NETNS]: Add namespace parameter to ip_route_output_key.
Needed to propagate it down to the ip_route_output_flow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:07 -08:00
Denis V. Lunev
f1b050bf7a [NETNS]: Add namespace parameter to ip_route_output_flow.
Needed to propagate it down to the __ip_route_output_key.

Signed_off_by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:06 -08:00
Denis V. Lunev
611c183ebc [NETNS]: Add namespace parameter to __ip_route_output_key.
This is only required to propagate it down to the
ip_route_output_slow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:05 -08:00
Denis V. Lunev
010278ec4c [NETNS]: Add netns parameter to fib_select_default.
Currently fib_select_default calls fib_get_table() with the
init_net. Prepare it to provide a correct namespace to lookup default
route.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:03 -08:00
Denis V. Lunev
64c2d53829 [IPV4]: Consolidate fib_select_default.
The difference in the implementation of the fib_select_default when
CONFIG_IP_MULTIPLE_TABLES is (not) defined looks
negligible. Consolidate it and place into fib_frontend.c.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:02 -08:00
Denis V. Lunev
e30d3a0ccd [IPV4]: Declarations cleanup in ip_fib.h.
Two small issues fixed:
- fib_select_multipath is exported from fib_semantics.c rather than from
  fib_frontend.c. So, move the declaration below appropriate comment.
- struct rt_entry declaration is not used. Drop it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:02 -08:00
Eric Dumazet
69a73829db [DST]: shrinks sizeof(struct rtable) by 64 bytes on x86_64
On x86_64, sizeof(struct rtable) is 0x148, which is rounded up to
0x180 bytes by SLAB allocator.

We can reduce this to exactly 0x140 bytes, without alignment overhead,
and store 12 struct rtable per PAGE instead of 10.

rate_tokens is currently defined as an "unsigned long", while its
content should not exceed 6*HZ. It can safely be converted to an
unsigned int.

Moving tclassid right after rate_tokens to fill the 4 bytes hole
permits to save 8 bytes on 'struct dst_entry', which finally permits
to save 8 bytes on 'struct rtable'

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:41 -08:00
Pavel Emelyanov
81566e8322 [NETNS][FRAGS]: Make the pernet subsystem for fragments.
On namespace start we mainly prepare the ctl variables.

When the namespace is stopped we have to kill all the fragments that
point to this namespace.  The inet_frags_exit_net() handles it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:40 -08:00
Pavel Emelyanov
3140c25c82 [NETNS][FRAGS]: Make the LRU list per namespace.
The inet_frags.lru_list is used for evicting only, so we have
to make it per-namespace, to evict only those fragments, who's
namespace exceeded its high threshold, but not the whole hash.
Besides, this helps to avoid long loops  in evictor.

The spinlock is not per-namespace because it protects the
hash table as well, which is global.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:39 -08:00
Pavel Emelyanov
3b4bc4a2bf [NETNS][FRAGS]: Isolate the secret interval from namespaces.
Since we have one hashtable to lookup the fragment, having
different secret_interval-s for hash rebuild doesn't make
sense, so move this one to inet_frags.

The inet_frags_ctl becomes empty after this, so remove it.
The appropriate ctl table is kept read-only in namespaces.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:39 -08:00
Pavel Emelyanov
e31e0bdc7e [NETNS][FRAGS]: Make thresholds work in namespaces.
This is the same as with the timeout variable.

Currently, after exceeding the high threshold _all_
the fragments are evicted, but it will be fixed in
later patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:38 -08:00
Pavel Emelyanov
b2fd5321dd [NETNS][FRAGS]: Make the net.ipv4.ipfrag_timeout work in namespaces.
Move it to the netns_frags, adjust the usage and
make the appropriate ctl table writable.

Now fragment, that live in different namespaces can
live for different times.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:37 -08:00
Pavel Emelyanov
e4a2d5c2bc [NETNS][FRAGS]: Duplicate sysctl tables for new namespaces.
Each namespace has to have own tables to tune their
different parameters, so duplicate the tables and
register them.

All the tables in sub-namespaces are temporarily made
read-only.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:37 -08:00
Pavel Emelyanov
6ddc082223 [NETNS][FRAGS]: Make the mem counter per-namespace.
This is also simple, but introduces more changes, since
then mem counter is altered in more places.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:36 -08:00
Pavel Emelyanov
e5a2bb842c [NETNS][FRAGS]: Make the nqueues counter per-namespace.
This is simple - just move the variable from struct inet_frags
to struct netns_frags and adjust the usage appropriately.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:35 -08:00
Pavel Emelyanov
ac18e7509e [NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces.
Since fragment management code is consolidated, we cannot have the
pointer from inet_frag_queue to struct net, since we must know what
king of fragment this is.

So, I introduce the netns_frags structure. This one is currently
empty, but will be eventually filled with per-namespace
attributes. Each inet_frag_queue is tagged with this one.

The conntrack_reasm is not "netns-izated", so it has one static
netns_frags instance to keep working in init namespace.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:34 -08:00
Pavel Emelyanov
8d8354d2fb [NETNS][FRAGS]: Move ctl tables around.
This is a preparation for sysctl netns-ization.
Move the ctl tables to the files, where the tuning
variables reside. Plus make the helpers to register
the tables.

This will simplify the later patches and will keep
similar things closer to each other.

ipv4, ipv6 and conntrack_reasm are patched differently,
but the result is all the tables are in appropriate files.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:34 -08:00
YOSHIFUJI Hideaki
2334ecbdb2 [IPV6]: Sparse: Declare non-static ipv6_{route,icmp,frag}_sysctl_init() in header.
Fix the following sparse warnings:
| net/ipv6/route.c:2491:18: warning: symbol 'ipv6_route_sysctl_init' was not declared. Should it be static?
| net/ipv6/icmp.c:922:18: warning: symbol 'ipv6_icmp_sysctl_init' was not declared. Should it be static?
| net/ipv6/reassembly.c:628:6: warning: symbol 'ipv6_frag_sysctl_init' was not declared. Should it be static?

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:27 -08:00
Denis V. Lunev
da0e28cb68 [NETNS]: Add netns parameter to fib_lookup.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:19 -08:00
Johannes Berg
471b3efdfc mac80211: add unified BSS configuration
This patch (based on Ron Rindjunsky's) creates a framework for
a unified way to pass BSS configuration to drivers that require
the information, e.g. for implementing power save mode.

This patch introduces new ieee80211_bss_conf structure that is
passed to the driver via the new bss_info_changed() callback
when the BSS configuration changes.

This new BSS configuration infrastructure adds the following
new features:
 * drivers are notified of their association AID
 * drivers are notified of association status

and replaces the erp_ie_changed() callback. The patch also does
the relevant driver updates for the latter change.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:09:43 -08:00
Johannes Berg
51fb61e76d mac80211: move interface type to vif structure
Drivers that support mixed AP/STA operation may well need to
know the type of a virtual interface when iterating over them.
The easiest way to support that is to move the interface type
variable into the vif structure.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:09:37 -08:00
Johannes Berg
32bfd35d4b mac80211: dont use interface indices in drivers
This patch gets rid of the if_id stuff where possible in favour of
a new per-virtual-interface structure "struct ieee80211_vif". This
structure is located at the end of the per-interface structure and
contains a variable length driver-use data area.

This has two advantages:
 * removes the need to look up interfaces by if_id, this is better
   for working with network namespaces and performance
 * allows drivers to store and retrieve per-interface data without
   having to allocate own lists/hash tables

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:09:36 -08:00
Al Viro
d9e94d5647 ieee80211: fix misannotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:08:48 -08:00
Jan Engelhardt
1e637c74b0 [IPV4]: Enable use of 240/4 address space.
This short patch modifies the IPv4 networking to enable use of the
240.0.0.0/4 (aka "class-E") address space as propsed in the internet
draft draft-fuller-240space-00.txt.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:44 -08:00
Denis V. Lunev
51314a17ba [NETNS]: Process FIB rule action in the context of the namespace.
Save namespace context on the fib rule at the rule creation time and
call routing lookup in the correct namespace.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:14 -08:00
Denis V. Lunev
9e3a548781 [NETNS]: FIB rules API cleanup.
Remove struct net from fib_rules_register(unregister)/notify_change
paths and diet code size a bit.

add/remove: 0/0 grow/shrink: 10/12 up/down: 35/-100 (-65)
function                                     old     new   delta
notify_rule_change                           273     280      +7
trie_show_stats                              471     475      +4
fn_trie_delete                               473     477      +4
fib_rules_unregister                         144     148      +4
fib4_rule_compare                            119     123      +4
resize                                      2842    2845      +3
fn_trie_select_default                       515     518      +3
inet_sk_rebuild_header                       836     838      +2
fib_trie_seq_show                            764     766      +2
__devinet_sysctl_register                    276     278      +2
fn_trie_lookup                              1124    1123      -1
ip_fib_check_default                         133     131      -2
devinet_conf_sysctl                          223     221      -2
snmp_fold_field                              126     123      -3
fn_trie_insert                              2091    2086      -5
inet_create                                  876     870      -6
fib4_rules_init                              197     191      -6
fib_sync_down                                452     444      -8
inet_gso_send_check                          334     325      -9
fib_create_info                             3003    2991     -12
fib_nl_delrule                               568     553     -15
fib_nl_newrule                               883     852     -31

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:13 -08:00
Denis V. Lunev
0359238333 [FIB]: Add netns to fib_rules_ops.
The backward link from FIB rules operations to the network namespace
will allow to simplify the API a bit.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:13 -08:00
Adrian Bunk
e9888f5498 [IrDA]: Irport removal - part 1
This patch removes IrPORT and the old dongle drivers (all off them
have replacement drivers).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:10 -08:00
Daniel Lezcano
d4fa26ff44 [NETNS][DST]: Add the network namespace pointer in dst_ops
The network namespace pointer can be stored into the dst_ops structure.
This is usefull when there are multiple instances of the dst_ops for a
protocol. When there are no several instances, this field will be never
used in the protocol. So there is no impact for the protocols which do
implement the network namespaces.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:47 -08:00
Daniel Lezcano
569d36452e [NETNS][DST] dst: pass the dst_ops as parameter to the gc functions
The garbage collection function receive the dst_ops structure as
parameter. This is useful for the next incoming patchset because it
will need the dst_ops (there will be several instances) and the
network namespace pointer (contained in the dst_ops).

The protocols which do not take care of the namespaces will not be
impacted by this change (expect for the function signature), they do
just ignore the parameter.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:46 -08:00
Patrick McHardy
c56cc9c07b [NETFILTER]: nf_conntrack: remove print_conntrack function from l3protos
Its unused and unlikely to ever be used.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:41 -08:00
Patrick McHardy
b334aadc3c [NETFILTER]: nf_conntrack: clean up a few header files
- Remove declarations of non-existing variables and functions
- Move helper init/cleanup function declarations to nf_conntrack_helper.h
- Remove unneeded __nf_conntrack_attach declaration and make it static

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:41 -08:00
Stephen Hemminger
7f9b80529b [IPV4]: fib hash|trie initialization
Initialization of the slab cache's should be done when IP is
initialized to make sure of available memory, and that code can be
marked __init.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:15 -08:00
Denis V. Lunev
06f0511df1 [ARP]: neigh_parms_put(destroy) are essentially local to core/neighbour.c.
Make them static.

[ Moved the inline before, instead of after, call sites. -DaveM ]

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:11 -08:00
Denis V. Lunev
72132c1b6c [IPV4]: fib_rules_unregister is essentially void.
fib_rules_unregister is called only after successful register and the
return code is never checked.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:09 -08:00
Pavel Emelyanov
f51d599fbe [NETNS][RAW]: Make /proc/net/raw(6) show per-namespace socket list.
Pull the struct net pointer up to the showing functions
to filter the sockets depending on their namespaces.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:06 -08:00
Ilpo Järvinen
cea14e0ed6 [TCP]: Uninline tcp_is_cwnd_limited
net/ipv4/tcp_cong.c:
  tcp_reno_cong_avoid |  -65
 1 function changed, 65 bytes removed, diff: -65

net/ipv4/arp.c:
  arp_ignore |   -5
 1 function changed, 5 bytes removed, diff: -5

net/ipv4/tcp_bic.c:
  bictcp_cong_avoid |  -57
 1 function changed, 57 bytes removed, diff: -57

net/ipv4/tcp_cubic.c:
  bictcp_cong_avoid |  -61
 1 function changed, 61 bytes removed, diff: -61

net/ipv4/tcp_highspeed.c:
  hstcp_cong_avoid |  -63
 1 function changed, 63 bytes removed, diff: -63

net/ipv4/tcp_hybla.c:
  hybla_cong_avoid |  -85
 1 function changed, 85 bytes removed, diff: -85

net/ipv4/tcp_htcp.c:
  htcp_cong_avoid |  -57
 1 function changed, 57 bytes removed, diff: -57

net/ipv4/tcp_veno.c:
  tcp_veno_cong_avoid |  -52
 1 function changed, 52 bytes removed, diff: -52

net/ipv4/tcp_scalable.c:
  tcp_scalable_cong_avoid |  -61
 1 function changed, 61 bytes removed, diff: -61

net/ipv4/tcp_yeah.c:
  tcp_yeah_cong_avoid |  -75
 1 function changed, 75 bytes removed, diff: -75

net/ipv4/tcp_illinois.c:
  tcp_illinois_cong_avoid |  -54
 1 function changed, 54 bytes removed, diff: -54

net/dccp/ccids/ccid3.c:
  ccid3_update_send_interval |   -7
  ccid3_hc_tx_packet_recv    |   +7
 2 functions changed, 7 bytes added, 7 bytes removed, diff: +0

net/ipv4/tcp_cong.c:
  tcp_is_cwnd_limited |  +88
 1 function changed, 88 bytes added, diff: +88

built-in.o:
 14 functions changed, 95 bytes added, 642 bytes removed, diff: -547

...Again some gcc artifacts visible as well.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:48 -08:00
Ilpo Järvinen
490d504693 [TCP]: Uninline tcp_set_state
net/ipv4/tcp.c:
  tcp_close_state | -226
  tcp_done        | -145
  tcp_close       | -564
  tcp_disconnect  | -141
 4 functions changed, 1076 bytes removed, diff: -1076

net/ipv4/tcp_input.c:
  tcp_fin               |  -86
  tcp_rcv_state_process | -164
 2 functions changed, 250 bytes removed, diff: -250

net/ipv4/tcp_ipv4.c:
  tcp_v4_connect | -209
 1 function changed, 209 bytes removed, diff: -209

net/ipv4/arp.c:
  arp_ignore |   +5
 1 function changed, 5 bytes added, diff: +5

net/ipv6/tcp_ipv6.c:
  tcp_v6_connect | -158
 1 function changed, 158 bytes removed, diff: -158

net/sunrpc/xprtsock.c:
  xs_sendpages |   -2
 1 function changed, 2 bytes removed, diff: -2

net/dccp/ccids/ccid3.c:
  ccid3_update_send_interval |   +7
 1 function changed, 7 bytes added, diff: +7

net/ipv4/tcp.c:
  tcp_set_state | +238
 1 function changed, 238 bytes added, diff: +238

built-in.o:
 12 functions changed, 250 bytes added, 1695 bytes removed, diff: -1445

I've no explanation why some unrelated changes seem to occur
consistently as well (arp_ignore, ccid3_update_send_interval;
I checked the arp_ignore asm and it seems to be due to some
reordered of operation order causing some extra opcodes to be
generated). Still, the benefits are pretty obvious from the
codiff's results.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:47 -08:00
Daniel Lezcano
389f661224 [NETNS][IPV6]: inet6_addr - make ipv6_chk_home_addr namespace aware
Looks if the address is belonging to the network namespace, otherwise
discard the address for the check.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:46 -08:00
Daniel Lezcano
1cab3da6be [NETNS][IPV6]: inet6_addr - ipv6_get_ifaddr namespace aware
The inet6_addr_lst is browsed taking into account the network
namespace specified as parameter. If an address does not belong
to the specified namespace, it is ignored.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:45 -08:00
Daniel Lezcano
bfeade0870 [NETNS][IPV6]: inet6_addr - check ipv6 address per namespace
When a new address is added, we must check if the new address does not
already exists.  This patch makes this check to be aware of a network
namespace, so the check will look if the address already exists for
the specified network namespace. While the addresses are browsed, the
addresses which do not belong to the namespace are discarded.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:44 -08:00
Pavel Emelyanov
39971554d3 [NEIGH]: Add a comment describing what a NUD stands for.
When I studied the neighbor code I puzzled over what the NUD can mean
for quite a long time.

Finally I asked Alexey and he said that this was smth like "neighbor
unreachability detection".

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:43 -08:00
David S. Miller
9993e7d313 [TCP]: Do not purge sk_forward_alloc entirely in tcp_delack_timer().
Otherwise we beat heavily on the global tcp_memory atomics
when all of the sockets in the system are slowly sending
perioding packet clumps.

Noticed and suggested by Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:42 -08:00
Pavel Emelyanov
e0da5a480c [NETNS]: Create ipv6 devconf-s for namespaces
This is the core. Declare and register the pernet subsys for
addrconf. The init callback the will create the devconf-s.

The init_net will reuse the existing statically declared confs,
so that accessing them from inside the ipv6 code will still
work.

The register_pernet_subsys() is moved above the ipv6_add_dev()
call for loopback, because this function will need the
net->devconf_dflt pointer to be already set.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:40 -08:00
Denis V. Lunev
4250846146 [NEIGH]: Make /proc/net/arp opening consistent with seq_net_open semantics
seq_open_net requires that first field of the seq->private data to be
struct seq_net_private. In reality this is a single pointer to a
struct net for now. The patch makes code consistent.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:37 -08:00
Denis V. Lunev
1bad118a33 [NETNS]: Pass namespace through ip_rt_ioctl.
... up to rtentry_to_fib_config

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:34 -08:00
Denis V. Lunev
6bd48fcf73 [NETNS]: Provide correct namespace for fibnl netlink socket.
This patch makes the netlink socket to be per namespace. That allows
to have each namespace its own socket for routing queries.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:32 -08:00
Denis V. Lunev
e4aef8aea3 [NETNS]: Place fib tables into netns.
The preparatory work has been done. All we need is to substitute
fib_table_hash with net->ipv4.fib_table_hash. Netns context is
available when required.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:31 -08:00
Denis V. Lunev
e4e4971c5f [NETNS]: Namespacing IPv4 fib rules.
The final trick for rules: place fib4_rules_ops into struct net and
modify initialization path for this.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:31 -08:00
Denis V. Lunev
4d1169c1e7 [NETNS]: Add netns to nl_info structure.
nl_info is used to track the end-user destination of routing change
notification. This is a natural object to hold a namespace on. Place
it there and utilize the context in the appropriate places.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:29 -08:00
Eric W. Biederman
6b175b26c1 [NETNS]: Add netns parameter to inet_(dev_)add_type.
The patch extends the inet_addr_type and inet_dev_addr_type with the
network namespace pointer. That allows to access the different tables
relatively to the network namespace.

The modification of the signature function is reported in all the
callers of the inet_addr_type using the pointer to the well known
init_net.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:27 -08:00
Denis V. Lunev
8ad4942cd5 [NETNS]: Add netns parameter to fib_get_table/fib_new_table.
This patch extends the fib_get_table and the fib_new_table functions
with the network namespace pointer. That will allow to access the
table relatively from the network namespace.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:27 -08:00
Denis V. Lunev
93456b6d77 [IPV4]: Unify access to the routing tables.
Replace the direct pointers to local and main tables with
calls to fib_get_table() with appropriate argument.

This doesn't introduce additional dereferences, but makes the access to fib
tables uniform in any (CONFIG_IP_MULTIPLE_TABLES) case.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:26 -08:00
Denis V. Lunev
7b1a74fdbb [NETNS]: Refactor fib initialization so it can handle multiple namespaces.
This patch makes the fib to be initialized as a subsystem for the
network namespaces. The code does not handle several namespaces yet,
so in case of a creation of a network namespace, the
creation/initialization will not occur.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:25 -08:00
Denis V. Lunev
dbb50165b5 [IPV4]: Check fib4_rules_init failure.
This adds error paths into both versions of fib4_rules_init
(with/without CONFIG_IP_MULTIPLE_TABLES) and returns error code to the
caller.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:25 -08:00
Denis V. Lunev
61a0265344 [NETNS]: Add namespace to API for routing /proc entries creation.
This adds netns parameter to fib_proc_init/exit and replaces __init
specifier with __net_init. After this, we will not yet have these proc
files show info from the specific namespace - this will be done when
these tables become namespaced.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:24 -08:00
Denis V. Lunev
5fd30ee7c4 [NETNS]: Namespacing in the generic fib rules code.
Move static rules_ops & rules_mod_lock to the struct net, register the
pernet subsys to init them and enjoy the fact that the core rules
infrastructure works in the namespace.

Real IPv4 fib rules virtualization requires fib tables support in the
namespace and will be done seriously later in the patchset.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:23 -08:00
Denis V. Lunev
868d13ac81 [NETNS]: Pass fib_rules_ops into default_pref method.
fib_rules_ops contains operations and the list of configured rules. ops will
become per/namespace soon, so we need them to be known in the default_pref
callback.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:22 -08:00
Denis V. Lunev
f8c26b8d58 [NETNS]: Add netns parameter to fib_rules_(un)register.
The patch extends the different fib rules API in order to pass the
network namespace pointer. That will allow to access the different
tables from a namespace relative object. As usual, the pointer to the
init_net variable is passed as parameter so we don't break the
network.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:21 -08:00
Daniel Lezcano
41a76906b3 [NETNS][IPV6]: Make icmpv6_time sysctl per namespace.
This patch moves the icmpv6_time sysctl to the network namespace
structure.

Because the ipv6 protocol is not yet per namespace, the variable is
accessed relatively to the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:20 -08:00
Daniel Lezcano
4990509f19 [NETNS][IPV6]: Make sysctls route per namespace.
All the sysctl concerning the routes are moved to the network
namespace structure. A helper function is called to initialize the
variables.

Because the ipv6 protocol is not yet per namespace, the variables are
accessed relatively from the network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:20 -08:00
Daniel Lezcano
e71e0349eb [NETNS][IPV6]: Make ip6_frags per namespace.
The ip6_frags is moved to the network namespace structure.  Because
there can be multiple instances of the network namespaces, and the
ip6_frags is no longer a global static variable, a helper function has
been added to facilitate the initialization of the variables.

Until the ipv6 protocol is not per namespace, the variables are
accessed relatively from the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:18 -08:00
Daniel Lezcano
99bc9c4e45 [NETNS][IPV6]: Make bindv6only sysctl per namespace.
This patch moves the bindv6only sysctl to the network namespace
structure. Until the ipv6 protocol is not per namespace, the sysctl
variable is always from the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:18 -08:00
Daniel Lezcano
760f2d0186 [NETNS][IPV6]: Make multiple instance of sysctl tables.
Each network namespace wants its own set of sysctl value, eg. we
should not be able from a namespace to set a sysctl value for another
namespace , especially for the initial network namespace.

This patch duplicates the sysctl table when we register a new network
namespace for ipv6. The duplicated table are postfixed with the
"template" word to notify the developper the table is cloned.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:17 -08:00
Daniel Lezcano
b0f159db7c [NETNS][IPV6]: Add ipv6 structure for netns.
Like the ipv4 part, this patch adds an ipv6 structure in the net
structure to aggregate the different resources to make ipv6 per
namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:15 -08:00
Daniel Lezcano
291480c09a [NETNS][IPV6]: Make ipv6_sysctl_register to return a value.
This patch makes the function ipv6_sysctl_register to return a
value. The af_inet6 init function is now able to handle an error and
catch it from the initialization of the sysctl.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:14 -08:00
Pavel Emelyanov
b3fd3ffe39 [NETFILTER]: Use the ctl paths instead of hand-made analogue
The conntracks subsystem has a similar infrastructure
to maintain ctl_paths, but since we already have it
on the generic level, I think it's OK to switch to
using it.

So, basically, this patch just replaces the ctl_table-s
with ctl_path-s, nf_register_sysctl_table with
register_sysctl_paths() and removes no longer needed code.

After this the net/netfilter/nf_sysctl.c file contains
the paths only.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:11 -08:00
Pavel Emelyanov
3d7cc2ba62 [NETFILTER]: Switch to using ctl_paths in nf_queue and conntrack modules
This includes the most simple cases for netfilter.

The first part is tne queue modules for ipv4 and ipv6,
on which the net/ipv4/ and net/ipv6/ paths are reused
from the appropriate ipv4 and ipv6 code.

The conntrack module is also patched, but this hunk is
very small and simple.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:10 -08:00
Pavel Emelyanov
90754f8ec0 [IPVS]: Switch to using ctl_paths.
The feature of ipvs ctls is that the net/ipv4/vs path
is common for core ipvs ctls and for two schedulers,
so I make it exported and re-use it in modules.

Two other .c files required linux/sysctl.h to make the
extern declaration of this path compile well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:08 -08:00
Ivo van Doorn
cdcb006fbe mac80211: Add radio led trigger
Some devices have a seperate LED which indicates if the radio is
enabled or not. This adds a LED trigger to mac80211 where drivers
can hook into when they are interested in radio status changes.

v2: Check hw.conf.radio_enabled when calling start().

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:04 -08:00
Ron Rindjunsky
1b7d03acbf mac80211: A-MPDU Rx add low level driver API
This patch adds the API to perform A-MPDU actions between mac80211 and low
level driver.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:57 -08:00
Rami Rosen
3becd578c5 [NET]: Remove unused member of dst_entry
The info placeholder member of dst_entry seems to be unused in the
network stack.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:47 -08:00
WANG Cong
64c31b3f76 [XFRM] xfrm_policy_destroy: Rename and relative fixes.
Since __xfrm_policy_destroy is used to destory the resources
allocated by xfrm_policy_alloc. So using the name
__xfrm_policy_destroy is not correspond with xfrm_policy_alloc.
Rename it to xfrm_policy_destroy.

And along with some instances that call xfrm_policy_alloc
but not using xfrm_policy_destroy to destroy the resource,
fix them.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:46 -08:00
Eric Dumazet
2a75de0c1d [NETNS]: Should build with CONFIG_SYSCTL=n
Previous NETNS patches broke CONFIG_SYSCTL=n case

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:40 -08:00
Eric Dumazet
65f7651788 [NET]: prot_inuse cleanups and optimizations
1) Cleanups (all functions are prefixed by sock_prot_inuse)

sock_prot_inc_use(prot) -> sock_prot_inuse_add(prot,-1)
sock_prot_dec_use(prot) -> sock_prot_inuse_add(prot,-1)
sock_prot_inuse()       -> sock_prot_inuse_get()

New functions :

sock_prot_inuse_init() and sock_prot_inuse_free() to abstract pcounter use.

2) if CONFIG_PROC_FS=n, we can zap 'inuse' member from "struct proto",
since nobody wants to read the inuse value.

This saves 1372 bytes on i386/SMP and some cpu cycles.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:36 -08:00
Rami Rosen
b950dfcf50 [IPVS]: Remove declaration of unimplemented method and remove unused definition from include/net/ip_vs.h
In include/net/ip_vs.h:
- The ip_vs_secure_tcp_set() method is not implemented anywhere.
- IP_VS_APP_TYPE_FTP is an unused definition.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:30 -08:00
Rami Rosen
b798232fcc [IPV4]: Remove three declarations of unimplemented methods and correct a typo in include/net/ip.h
These three declarations in include/net/ip.h are not implemented
anywhere:

ip_mc_dropsocket(), ip_mc_dropdevice() and ip_net_unreachable().

Also, correct a comment to be "Functions provided by ip_fragment.c"
(instead of by ip_fragment.o) in consistency with the other comments
in this header.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:29 -08:00
Herbert Xu
9ef32d0d1f [IPSEC]: Kill duplicate xfrm_policy_flush prototype
For five years we had two xfrm_policy_flush prototypes and every time that
function's signature changed people have been diligently updating both of
them without noticing :)

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:28 -08:00
Ilpo Järvinen
4828e7f49a [TCP]: Remove TCPCB_URG & TCPCB_AT_TAIL as unnecessary
The snd_up check should be enough. I suspect this has been
there to provide a minor optimization in clean_rtx_queue which
used to have a small if (!->sacked) block which could skip
snd_up check among the other work.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:23 -08:00
Ilpo Järvinen
90840defab [TCP]: Introduce tcp_wnd_end() to reduce line lengths
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:22 -08:00
Rami Rosen
61f1ab41b8 [IPV4]: Remove unused multipath cached routing defintion in net/flow.h
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:20 -08:00
Hideo Aoki
95766fff6b [UDP]: Add memory accounting.
Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:19 -08:00
Hideo Aoki
3ab224be6d [NET] CORE: Introducing new memory accounting interface.
This patch introduces new memory accounting functions for each network
protocol. Most of them are renamed from memory accounting functions
for stream protocols. At the same time, some stream memory accounting
functions are removed since other functions do same thing.

Renaming:
	sk_stream_free_skb()		->	sk_wmem_free_skb()
	__sk_stream_mem_reclaim()	->	__sk_mem_reclaim()
	sk_stream_mem_reclaim()		->	sk_mem_reclaim()
	sk_stream_mem_schedule 		->    	__sk_mem_schedule()
	sk_stream_pages()      		->	sk_mem_pages()
	sk_stream_rmem_schedule()	->	sk_rmem_schedule()
	sk_stream_wmem_schedule()	->	sk_wmem_schedule()
	sk_charge_skb()			->	sk_mem_charge()

Removeing
	sk_stream_rfree():	consolidates into sock_rfree()
	sk_stream_set_owner_r(): consolidates into skb_set_owner_r()
	sk_stream_mem_schedule()

The following functions are added.
    	sk_has_account(): check if the protocol supports accounting
	sk_mem_uncharge(): do the opposite of sk_mem_charge()

In addition, to achieve consolidation, updating sk_wmem_queued is
removed from sk_mem_charge().

Next, to consolidate memory accounting functions, this patch adds
memory accounting calls to network core functions. Moreover, present
memory accounting call is renamed to new accounting call.

Finally we replace present memory accounting calls with new interface
in TCP and SCTP.

Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:18 -08:00
Rami Rosen
f624357959 [NEIGH]: Remove unused method from include/net/neighbour.h
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:17 -08:00
Rami Rosen
04ce99c483 [IPV4]: Remove unused define in include/net/arp.h (HAVE_ARP_CREATE)
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:16 -08:00
Eric Dumazet
21371f768b [SOCK] Avoid divides in sk_stream_pages() and __sk_stream_mem_reclaim()
sk_forward_alloc being signed, we should take care of divides by
SK_STREAM_MEM_QUANTUM we do in sk_stream_pages() and
__sk_stream_mem_reclaim()

This patchs introduces SK_STREAM_MEM_QUANTUM_SHIFT, defined
as ilog2(SK_STREAM_MEM_QUANTUM), to be able to use right
shifts instead of plain divides.

This should help compiler to choose right shifts instead of
expensive divides (as seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y on x86)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:05 -08:00
Eric W. Biederman
426b5303eb [NETNS]: Modify the neighbour table code so it handles multiple network namespaces
I'm actually surprised at how much was involved.  At first glance it
appears that the neighbour table data structures are already split by
network device so all that should be needed is to modify the user
interface commands to filter the set of neighbours by the network
namespace of their devices.

However a couple things turned up while I was reading through the
code.  The proxy neighbour table allows entries with no network
device, and the neighbour parms are per network device (except for the
defaults) so they now need a per network namespace default.

So I updated the two structures (which surprised me) with their very
own network namespace parameter.  Updated the relevant lookup and
destroy routines with a network namespace parameter and modified the
code that interacts with users to filter out neighbour table entries
for devices of other namespaces.

I'm a little concerned that we can modify and display the global table
configuration and from all network namespaces.  But this appears good
enough for now.

I keep thinking modifying the neighbour table to have per network
namespace instances of each table type would should be cleaner.  The
hash table is already dynamically sized so there are it is not a
limiter.  The default parameter would be straight forward to take care
of.  However when I look at the how the network table is built and
used I still find some assumptions that there is only a single
neighbour table for each type of table in the kernel.  The netlink
operations, neigh_seq_start, the non-core network users that call
neigh_lookup.  So while it might be doable it would require more
refactoring than my current approach of just doing a little extra
filtering in the code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:03 -08:00
Paul Moore
afeb14b490 [XFRM]: RFC4303 compliant auditing
This patch adds a number of new IPsec audit events to meet the auditing
requirements of RFC4303.  This includes audit hooks for the following events:

 * Could not find a valid SA [sections 2.1, 3.4.2]
   . xfrm_audit_state_notfound()
   . xfrm_audit_state_notfound_simple()

 * Sequence number overflow [section 3.3.3]
   . xfrm_audit_state_replay_overflow()

 * Replayed packet [section 3.4.3]
   . xfrm_audit_state_replay()

 * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
   . xfrm_audit_state_icvfail()

While RFC4304 deals only with ESP most of the changes in this patch apply to
IPsec in general, i.e. both AH and ESP.  The one case, integrity check
failure, where ESP specific code had to be modified the same was done to the
AH code for the sake of consistency.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:01 -08:00
Eric Dumazet
8df09ea3b8 [SOCK] Avoid integer divides where not necessary in include/net/sock.h
Because sk_wmem_queued, sk_sndbuf are signed, a divide per two
may force compiler to use an integer divide.

We can instead use a right shift.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:59 -08:00
YOSHIFUJI Hideaki
9cb5734e5b [TCP]: Convert several length variable to unsigned.
Several length variables cannot be negative, so convert int to
unsigned int.  This also allows us to do sane shift operations
on those variables.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:56 -08:00
Johannes Berg
fd5b74dcb8 cfg80211/nl80211: implement station attribute retrieval
After a station is added to the kernel's structures, userspace
has to be able to retrieve statistics about that station, especially
whether the station was idle and how much bytes were transferred
to and from it. This adds the necessary code to nl80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:52 -08:00
Johannes Berg
5727ef1b2e cfg80211/nl80211: station handling
This patch adds station handling to cfg80211/nl80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:51 -08:00
Johannes Berg
ed1b6cc7f8 cfg80211/nl80211: add beacon settings
This adds the necessary API to cfg80211/nl80211 to allow
changing beaconing settings.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:50 -08:00
Johannes Berg
62da92fb75 mac80211: support getting key sequence counters via cfg80211
This implements cfg80211's get_key() to allow retrieving the sequence
counter for a TKIP or CCMP key from userspace. It also cleans up and
documents the associated low-level driver interface.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:50 -08:00
Johannes Berg
41ade00f21 cfg80211/nl80211: introduce key handling
This introduces key handling to cfg80211/nl80211. Default
and group keys can be added, changed and removed; sequence
counters for each key can be retrieved.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:48 -08:00
Johannes Berg
7d54d0ddd6 mac80211: allow easier multicast/broadcast buffering in hardware
There are various decisions influencing the decision whether to buffer
a frame for after the next DTIM beacon. The "do we have stations in PS
mode" condition cannot be tested by the driver so mac80211 has to do
that. To ease driver writing for hardware that can buffer frames until
after the next DTIM beacon, introduce a new txctl flag telling the
driver to buffer a specific frame.

While at it, restructure and comment the code for multicast buffering
and remove spurious "inline" directives.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:47 -08:00
Johannes Berg
678f5f7117 mac80211: clean up eapol handling in TX path
The previous patch left only one user of the ieee80211_is_eapol()
function and that user can be eliminated easily by introducing
a new "frame is EAPOL" flag to handle the frame specially (we
already have this information) instead of doing the (expensive)
ieee80211_is_eapol() all the time.

Also, allow unencrypted frames to be sent when they are injected.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:46 -08:00
Paul Moore
68277accb3 [XFRM]: Assorted IPsec fixups
This patch fixes a number of small but potentially troublesome things in the
XFRM/IPsec code:

 * Use the 'audit_enabled' variable already in include/linux/audit.h
   Removed the need for extern declarations local to each XFRM audit fuction

 * Convert 'sid' to 'secid' everywhere we can
   The 'sid' name is specific to SELinux, 'secid' is the common naming
   convention used by the kernel when refering to tokenized LSM labels,
   unfortunately we have to leave 'ctx_sid' in 'struct xfrm_sec_ctx' otherwise
   we risk breaking userspace

 * Convert address display to use standard NIP* macros
   Similar to what was recently done with the SPD audit code, this also also
   includes the removal of some unnecessary memcpy() calls

 * Move common code to xfrm_audit_common_stateinfo()
   Code consolidation from the "less is more" book on software development

 * Proper spacing around commas in function arguments
   Minor style tweak since I was already touching the code

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:40 -08:00
Masahide NAKAMURA
558f82ef6e [XFRM]: Define packet dropping statistics.
This statistics is shown factor dropped by transformation
at /proc/net/xfrm_stat for developer.
It is a counter designed from current transformation source code
and defined as linux private MIB.

See Documentation/networking/xfrm_proc.txt for the detail.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:38 -08:00
Masahide NAKAMURA
a1b051405b [XFRM] IPv6: Fix dst/routing check at transformation.
IPv6 specific thing is wrongly removed from transformation at net-2.6.25.
This patch recovers it with current design.

o Update "path" of xfrm_dst since IPv6 transformation should
  care about routing changes. It is required by MIPv6 and
  off-link destined IPsec.
o Rename nfheader_len which is for non-fragment transformation used by
  MIPv6 to rt6i_nfheader_len as IPv6 name space.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:36 -08:00
Pavel Emelyanov
7054fb9376 [INET]: Uninline the inet_twsk_put function.
This one is not that big, but is widely used: saves 1200 bytes
from net/ipv4/built-in.o

add/remove: 1/0 grow/shrink: 1/12 up/down: 97/-1300 (-1203)
function                                     old     new   delta
inet_twsk_put                                  -      87     +87
__inet_lookup_listener                       274     284     +10
tcp_sacktag_write_queue                     2255    2254      -1
tcp_time_wait                                482     411     -71
__inet_check_established                     796     722     -74
tcp_v4_err                                   973     898     -75
__inet_twsk_kill                             230     154     -76
inet_twsk_deschedule                         180     103     -77
tcp_v4_do_rcv                                462     384     -78
inet_hash_connect                            686     607     -79
inet_twdr_do_twkill_work                     236     150     -86
inet_twdr_twcal_tick                         395     307     -88
tcp_v4_rcv                                  1744    1480    -264
tcp_timewait_state_process                   975     644    -331

Export it for ipv6 module.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:28 -08:00
Pavel Emelyanov
77a5ba55da [INET]: Uninline the __inet_lookup_established function.
This is -700 bytes from the net/ipv4/built-in.o

add/remove: 1/0 grow/shrink: 1/3 up/down: 340/-1040 (-700)
function                                     old     new   delta
__inet_lookup_established                      -     339    +339
tcp_sacktag_write_queue                     2254    2255      +1
tcp_v4_err                                  1304     973    -331
tcp_v4_rcv                                  2089    1744    -345
tcp_v4_do_rcv                                826     462    -364

Exporting is for dccp module (used via e.g. inet_lookup).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:27 -08:00
Pavel Emelyanov
152da81deb [INET]: Uninline the __inet_hash function.
This one is used in quite many places in the networking code and
seems to big to be inline.

After the patch net/ipv4/build-in.o loses ~650 bytes:
add/remove: 2/0 grow/shrink: 0/5 up/down: 461/-1114 (-653)
function                                     old     new   delta
__inet_hash_nolisten                           -     282    +282
__inet_hash                                    -     179    +179
tcp_sacktag_write_queue                     2255    2254      -1
__inet_lookup_listener                       284     274     -10
tcp_v4_syn_recv_sock                         755     493    -262
tcp_v4_hash                                  389      35    -354
inet_hash_connect                           1086     599    -487

This version addresses the issue pointed by Eric, that
while being inline this function was optimized by gcc
in respect to the 'listen_possible' argument.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:26 -08:00
Vlad Yasevich
75205f4783 [SCTP]: Implement ADD-IP special case processing for ABORT chunk
ADD-IP spec has a special case for processing ABORTs:
    F4) ... One special consideration is that ABORT
        Chunks arriving destined to the IP address being deleted MUST be
        ignored (see Section 5.3.1 for further details).

Check if the address we received on is in the DEL state, and if
so, ignore the ABORT.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:24 -08:00
Vlad Yasevich
f57d96b2e9 [SCTP]: Change use_as_src into a full address state
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:24 -08:00
Vlad Yasevich
a08de64d07 [SCTP]: Update ASCONF processing to conform to spec.
The processing of the ASCONF chunks has changed a lot in the
spec.  New items are:
    1. A list of ASCONF-ACK chunks is now cached
    2. The source of the packet is used in response.
    3. New handling for unexpect ASCONF chunks.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:23 -08:00
Vlad Yasevich
d6de309759 [SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT
The ADD-IP "Set Primary IP Address" parameter is allowed in the
INIT/INIT-ACK exchange.  Allow processing of this parameter during
the INIT/INIT-ACK.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:21 -08:00
Vlad Yasevich
42e30bf346 [SCTP]: Handle the wildcard ADD-IP Address parameter
The Address Parameter in the parameter list of the ASCONF chunk
may be a wildcard address.  In this case special processing
is required.  For the 'add' case, the source IP of the packet is
added.  In the 'del' case, all addresses except the source IP
of packet are removed. In the "mark primary" case, the source
address is marked as primary.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:20 -08:00
Herbert Xu
d647b36a69 [SNMP]: Fix SNMP counters with PREEMPT
The SNMP macros use raw_smp_processor_id() in process context
which is illegal because the process may be preempted and then
migrated to another CPU.

This patch makes it use get_cpu/put_cpu to disable preemption.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:16 -08:00
Jan Engelhardt
643a2c15a4 [NETFILTER]: Introduce nf_inet_address
A few netfilter modules provide their own union of IPv4 and IPv6
address storage. Will unify that in this patch series.

(1/4): Rename union nf_conntrack_address to union nf_inet_addr and
move it to x_tables.h.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:07 -08:00
Patrick McHardy
7b2f9631e7 [NETFILTER]: nf_log: constify struct nf_logger and nf_log_packet loginfo arg
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:59 -08:00
Patrick McHardy
f01ffbd6e7 [NETFILTER]: nf_log: move logging stuff to seperate header
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:58 -08:00
Patrick McHardy
cc01dcbd26 [NETFILTER]: nf_nat: pass manip type instead of hook to nf_nat_setup_info
nf_nat_setup_info gets the hook number and translates that to the
manip type to perform. This is a relict from the time when one
manip per hook could exist, the exact hook number doesn't matter
anymore, its converted to the manip type. Most callers already
know what kind of NAT they want to perform, so pass the maniptype
in directly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:57 -08:00
Patrick McHardy
2b628a0866 [NETFILTER]: nf_nat: mark NAT protocols const
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:56 -08:00
Patrick McHardy
838965ba22 [NETLINK]: Add NLA_PUT_BE16/nla_get_be16()
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:53 -08:00
Johannes Berg
c49e5ea322 mac80211: conditionally include timestamp in radiotap information
This makes mac80211 include the low-level MAC timestamp
in the radiotap header if the driver indicated (by a new
RX flag) that the timestamp is valid.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:25 -08:00
Vlad Yasevich
9ad0977fe1 [SCTP]: Use crc32c library for checksum calculations.
The crc32c library used an identical table and algorithm
as SCTP.  Switch to using the library instead of carrying
our own table.  Using crypto layer proved to have too
much overhead compared to using the library directly.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:20 -08:00
Joe Perches
b5cb2bbc4c [IPV4] sctp: Use ipv4_is_<type>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:17 -08:00
Joe Perches
3db8cda362 [IPV4] include/net: Use ipv4_is_<type>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:14 -08:00
Pavel Emelyanov
752d14dc6a [IPV4]: Move the devinet pointers on the struct net
This is the core.

Add all and default pointers on the netns_ipv4 and register
a new pernet subsys to initialize them.

Also add the ctl_table_header to register the
net.ipv4.ip_forward ctl.

I don't allocate additional memory for init_net, but use
global devinets.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:11 -08:00
Pavel Emelyanov
32e569b727 [IPV4]: Pass the net pointer to the arp_req_set_proxy()
This one will need to set the IPV4_DEVCONF_ALL(PROXY_ARP), but
there's no ways to get the net right in place, so we have to
pull one from the inet_ioctl's struct sock.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:09 -08:00
Pavel Emelyanov
8afd351c77 [NETNS]: Add the netns_ipv4 struct
The ipv4 will store its parameters inside this structure.
This one is empty now, but it will be eventually filled.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:08 -08:00
Harvey Harrison
41380930d2 [NET]: Remove FASTCALL macro
X86_32 was the last user of the FASTCALL macro, now that it
uses regparm(3) by default, this macro expands to nothing.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:23 -08:00
Herbert Xu
8b7817f3a9 [IPSEC]: Add ICMP host relookup support
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch implements this
for ICMP traffic that originates from or terminates on localhost.

This is activated on outbound with the new policy flag XFRM_POLICY_ICMP,
and on inbound by the new state flag XFRM_STATE_ICMP.

On inbound the policy check is now performed by the ICMP protocol so
that it can repeat the policy check where necessary.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:23 -08:00
Herbert Xu
d5422efe68 [IPSEC]: Added xfrm_decode_session_reverse and xfrmX_policy_check_reverse
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch adds the functions
xfrm_decode_session_reverse and xfrmX_policy_check_reverse so we can get
the reverse flow to perform such a lookup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:22 -08:00
Herbert Xu
815f4e57e9 [IPSEC]: Make xfrm_lookup flags argument a bit-field
This patch introduces an enum for bits in the flags argument of xfrm_lookup.
This is so that we can cram more information into it later.

Since all current users use just the values 0 and 1, XFRM_LOOKUP_WAIT has
been added with the value 1 << 0 to represent the current meaning of flags.

The test in __xfrm_lookup has been changed accordingly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:21 -08:00
Denis V. Lunev
2aaef4e47f [NETNS]: separate af_packet netns data
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:15 -08:00
Denis V. Lunev
a0a53c8ba9 [NETNS]: struct net content re-work (v3)
Recently David Miller and Herbert Xu pointed out that struct net becomes
overbloated and un-maintainable. There are two solutions:
- provide a pointer to a network subsystem definition from struct net.
  This costs an additional dereferrence
- place sub-system definition into the structure itself. This will speedup
  run-time access at the cost of recompilation time

The second approach looks better for us. Other sub-systems will follow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:14 -08:00
Denis V. Lunev
27147c9e6e [AF_UNIX]: Remove unused declaration of sysctl_unix_max_dgram_qlen.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:13 -08:00
Daniel Lezcano
7f4e4868f3 [IPV6]: make the protocol initialization to return an error code
This patchset makes the different protocols to return an error code, so
the af_inet6 module can check the initialization was correct or not.

The raw6 was taken into account to be consistent with the rest of the
protocols, but the registration is at the same place.
Because the raw6 has its own init function, the proto and the ops structure
can be moved inside the raw6.c file.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:13 -08:00
Daniel Lezcano
87c3efbfdd [IPV6]: make inet6_register_protosw to return an error code
This patch makes the inet6_register_protosw to return an error code.
The different protocols can be aware the registration was successful or
not and can pass the error to the initial caller, af_inet6.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:12 -08:00
Daniel Lezcano
853cbbaaa4 [IPV6]: make frag to return an error at initialization
This patch makes the frag_init to return an error code, so the af_inet6
module can handle the error.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:11 -08:00
Daniel Lezcano
248b238dc9 [IPV6]: make extended headers to return an error at initialization
This patch factorize the code for the differents init functions for rthdr,
nodata, destopt in a single function exthdrs_init.
This function returns an error so the af_inet6 module can check correctly
the initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:10 -08:00
Daniel Lezcano
0a3e78ac2c [IPV6]: make flowlabel to return an error
This patch makes the flowlab subsystem to return an error code and makes
some cleanup with procfs ifdefs.
The af_inet6 will use the flowlabel init return code to check the initialization
was correct.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:10 -08:00
Herbert Xu
005011211f [IPSEC]: Add xfrm_input_state helper
This patch adds the xfrm_input_state helper function which returns the
current xfrm state being processed on the input path given an sk_buff.
This is currently only used by xfrm_input but will be used by ESP upon
asynchronous resumption.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:05 -08:00
YOSHIFUJI Hideaki
c69bce20dd [NET]: Remove unused "mibalign" argument for snmp_mib_init().
With fixes from Arnaldo Carvalho de Melo.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:02 -08:00
Denis V. Lunev
971b893e79 [IPV4]: last default route is a fib table property
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:01 -08:00
Daniel Lezcano
7e5449c215 [IPV6]: route6 remove ifdef for fib_rules
The patch defines the usual static inline functions when the code is
disabled for fib6_rules. That's allow to remove some ifdef in route.c
file and make the code a little more clear.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:59 -08:00
Daniel Lezcano
c35b7e72cd [IPV6]: remove ifdef in route6 for xfrm6
The following patch create the usual static inline functions to disable
the xfrm6_init and xfrm6_fini function when XFRM is off.
That's allow to remove some ifdef and make the code a little more clear.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:59 -08:00
Pavel Emelyanov
b8e1f9b5c3 [NET] sysctl: make sysctl_somaxconn per-namespace
Just move the variable on the struct net and adjust
its usage.

Others sysctls from sys.net.core table are more
difficult to virtualize (i.e. make them per-namespace),
but I'll look at them as well a bit later.

Signed-off-by: Pavel Emelyanov <xemul@oenvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:57 -08:00
Pavel Emelyanov
024626e36d [NET] sysctl: make the sys.net.core sysctls per-namespace
Making them per-namespace is required for the following
two reasons:

 First, some ctl values have a per-namespace meaning.
 Second, making them writable from the sub-namespace
 is an isolation hole.

So I introduce the pernet operations to create these
tables. For init_net I use the existing statically
declared tables, for sub-namespace they are duplicated
and the write bits are removed from the mode.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:56 -08:00
Pavel Emelyanov
cbbb90e68c [SNMP]: Remove unused devconf macros.
The SNMP_INC_STATS_OFFSET_BH is used only by ICMP6_INC_STATS_OFFSET_BH.
The ICMP6_INC_STATS_OFFSET_BH is unused.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:55 -08:00
Eric W. Biederman
bb80317586 [IPV4]: Remove ip_fib_local_table and ip_fib_main_table defines.
There are only 2 users and it doesn't hurt to call fib_get_table
instead, and it makes it easier to make the fib network namespace
aware.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:49 -08:00
Daniel Lezcano
433d49c3bb [IPV6]: Make ip6_route_init to return an error code.
The route initialization function does not return any value to notify
if the initialization is successful or not. This patch checks all
calls made for the initilization in order to return a value for the
caller.

Unfortunately, proc_net_fops_create will return a NULL pointer if
CONFIG_PROC_FS is off, so we can not check the return code without an
ifdef CONFIG_PROC_FS block in the ip6_route_init function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:47 -08:00
Daniel Lezcano
9eb87f3f7e [IPV6]: Make fib6_rules_init to return an error code.
When the fib_rules initialization finished, no return code is provided
so there is no way to know, for the caller, if the initialization has
been successful or has failed. This patch fix that.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:46 -08:00
Daniel Lezcano
0013cabab3 [IPV6]: Make xfrm6_init to return an error code.
The xfrm initialization function does not return any error code, so if
there is an error, the caller can not be advise of that.  This patch
checks the return code of the different called functions in order to
return a successful or failed initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:45 -08:00
Daniel Lezcano
d63bddbe90 [IPV6]: Make fib6_init to return an error code.
If there is an error in the initialization function, nothing is
followed up to the caller. So I add a return value to be set for the
init function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:45 -08:00
Patrick McHardy
f4d900a2ca [NETLINK]: Mark attribute construction exception unlikely
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:34 -08:00
Herbert Xu
1781f7f580 [UDP]: Restore missing inDatagrams increments
The previous move of the the UDP inDatagrams counter caused the
counting of encapsulated packets, SUNRPC data (as opposed to call)
packets and RXRPC packets to go missing.

This patch restores all of these.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:33 -08:00
Pavel Emelyanov
3e37c3f997 [IPV4]: Use ctl paths to register net/ipv4/ table
This is the same as I did for the net/core/ table in the
second patch in his series: use the paths and isolate the
whole table in the .c file.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:27 -08:00
Pavel Emelyanov
33eb9cfc70 [NET]: Isolate the net/core/ sysctl table
Using ctl paths we can put all the stuff, related to net/core/
sysctl table, into one file and remove all the references on it.

As a good side effect this hides the "core_table" name from
the global scope :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:26 -08:00
Patrick McHardy
02f014d888 [NETFILTER]: nf_queue: move list_head/skb/id to struct nf_info
Move common fields for queue management to struct nf_info and rename it
to struct nf_queue_entry. The avoids one allocation/free per packet and
simplifies the code a bit.

Alternatively we could add some private room at the tail, but since
all current users use identical structs this seems easier.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:14 -08:00
Patrick McHardy
c01cd429fc [NETFILTER]: nf_queue: move queueing related functions/struct to seperate header
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:10 -08:00
Patrick McHardy
5859034d7e [NETFILTER]: x_tables: add RATEEST target
Add new rate estimator target (using gen_estimator). In combination with
the rateest match (next patch) this can be used for load-based multipath
routing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:02 -08:00
Jan Engelhardt
3c3f486603 [NET]: Constify include/net/dsfield.h
Constify include/net/dsfield.h

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:58 -08:00
Laszlo Attila Toth
0553811612 [IPV4]: Add inet_dev_addr_type()
Address type search can be limited to an interface by
inet_dev_addr_type function.

Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:56 -08:00
Denis V. Lunev
0eeb8ffcfe [NET]: netns compilation speedup
This patch speedups compilation when net_namespace.h is changed.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:51 -08:00
Herbert Xu
2fcb45b6b8 [IPSEC]: Use the correct family for input state lookup
When merging the input paths of IPsec I accidentally left a hard-coded
AF_INET for the state lookup call.  This broke IPv6 obviously.  This
patch fixes by getting the input callers to specify the family through
skb->cb.

Credit goes to Kazunori Miyazawa for diagnosing this and providing an
initial patch.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:49 -08:00
Ilpo Järvinen
6859d49475 [TCP]: Abstract tp->highest_sack accessing & point to next skb
Pointing to the next skb is necessary to avoid referencing
already SACKed skbs which will soon be on a separate list.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:46 -08:00
Ilpo Järvinen
234b686070 [TCP]: Add tcp_for_write_queue_from_safe and use it in mtu_probe
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:43 -08:00
Ilpo Järvinen
c3a05c6050 [TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:41 -08:00
Ron Rindjunsky
d3c990fb26 mac80211: adding 802.11n configuration flows
This patch configures the 802.11n mode of operation
internally in ieee80211_conf structure and in the low-level
driver as well (through op conf_ht).
It does not include AP configuration flows.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:33 -08:00
Ron Rindjunsky
10816d40f2 mac80211: adding 802.11n HT framework definitions
New structures:
 - ieee80211_ht_info: describing STA's HT capabilities
 - ieee80211_ht_bss_info: describing BSS's HT characteristics
Changed structures:
 - ieee80211_hw_mode: now also holds PHY HT capabilities for each HW mode
 - ieee80211_conf: ht_conf holds current self HT configuration
                   ht_bss_conf holds current BSS HT configuration
 - flag IEEE80211_CONF_SUPPORT_HT_MODE added to indicate if HT use is
   desired
 - sta_info: now also holds Peer's HT capabilities

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:30 -08:00
Johannes Berg
e38bad4766 mac80211: make ieee80211_iterate_active_interfaces not need rtnl
Interface iteration in mac80211 can be done without holding any
locks because I converted it to RCU. Initially, I thought this
wouldn't be needed for ieee80211_iterate_active_interfaces but
it's turning out that multi-BSS AP support can be much simpler
in a driver if ieee80211_iterate_active_interfaces can be called
without holding locks. This converts it to use RCU, it adds a
requirement that the callback it invokes cannot sleep.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:28 -08:00
Pavel Emelyanov
1597fbc0fa [UNIX]: Make the unix sysctl tables per-namespace
This is the core.

 * add the ctl_table_header on the struct net;
 * make the unix_sysctl_register and _unregister clone the table;
 * moves calls to them into per-net init and exit callbacks;
 * move the .data pointer in the proper place.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:23 -08:00
Pavel Emelyanov
d392e49756 [UNIX]: Move the sysctl_unix_max_dgram_qlen
This will make all the sub-namespaces always use the
default value (10) and leave the tuning via sysctl
to the init namespace only.

Per-namespace tuning is coming.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:22 -08:00
Pavel Emelyanov
97577e3828 [UNIX]: Extend unix_sysctl_(un)register prototypes
Add the struct net * argument to both of them to use in
the future. Also make the register one return an error code.

It is useless right now, but will make the future patches
much simpler.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:21 -08:00
Eric W. Biederman
95bdfccb2b [NET]: Implement the per network namespace sysctl infrastructure
The user interface is: register_net_sysctl_table and
unregister_net_sysctl_table.  Very much like the current
interface except there is a network namespace parameter.

With this any sysctl registered with register_net_sysctl_table
will only show up to tasks in the same network namespace.

All other sysctls continue to be globally visible.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:18 -08:00
Patrick McHardy
a99a00cf1a [NET]: Move netfilter checksum helpers to net/core/utils.c
This allows to get rid of the CONFIG_NETFILTER dependency of NET_ACT_NAT.
This patch redefines the old names to keep the noise low, the next patch
converts all users.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:14 -08:00
Fred L. Templin
c7dc89c0ac [IPV6]: Add RFC4214 support
This patch includes support for the Intra-Site Automatic Tunnel
Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
module, and is configured using extensions to the "iproute2"
utility. The diffs are specific to the Linux 2.6.24-rc2 kernel
distribution.

This version includes the diff for ./include/linux/if.h which was
missing in the v2.4 submission and is needed to make the
patch compile. The patch has been installed, compiled and
tested in a clean 2.6.24-rc2 kernel build area.

Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:09 -08:00
Pavel Emelyanov
df97c708d5 [NET]: Eliminate unused argument from sk_stream_alloc_pskb
The 3rd argument is always zero (according to grep :) Eliminate
it and merge the function with sk_stream_alloc_skb.

This saves 44 more bytes, and together with the previous patch
we have:

add/remove: 1/0 grow/shrink: 0/8 up/down: 183/-751 (-568)
function                                     old     new   delta
sk_stream_alloc_skb                            -     183    +183
ip_rt_init                                   529     525      -4
arp_ignore                                   112     107      -5
__inet_lookup_listener                       284     274     -10
tcp_sendmsg                                 2583    2481    -102
tcp_sendpage                                1449    1300    -149
tso_fragment                                 417     258    -159
tcp_fragment                                1149     988    -161
__tcp_push_pending_frames                   1998    1837    -161

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:08 -08:00
Pavel Emelyanov
f561d0f27d [NET]: Uninline the sk_stream_alloc_pskb
This function seems too big for inlining. Indeed, it saves
half-a-kilo when uninlined:

add/remove: 1/0 grow/shrink: 0/7 up/down: 195/-719 (-524)
function                                     old     new   delta
sk_stream_alloc_pskb                           -     195    +195
ip_rt_init                                   529     525      -4
__inet_lookup_listener                       284     274     -10
tcp_sendmsg                                 2583    2486     -97
tcp_sendpage                                1449    1305    -144
tso_fragment                                 417     267    -150
tcp_fragment                                1149     992    -157
__tcp_push_pending_frames                   1998    1841    -157

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:07 -08:00
Ilpo Järvinen
8512430e55 [TCP]: Move FRTO checks out from write queue abstraction funcs
Better place exists in update_send_head (other non-queue related
adjustments are done there as well) which is the only caller of
tcp_advance_send_head (now that the bogus call from mtu_probe is
gone).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:05 -08:00
Arnaldo Carvalho de Melo
ebb53d7565 [NET] proto: Use pcounters for the inuse field
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:40 -08:00
Johannes Berg
dabeb344f5 mac80211: provide interface iterator for drivers
Sometimes drivers need to know which interfaces are associated with
their hardware. Rather than forcing those drivers to keep track of
the interfaces that were added, this adds an iteration function to
mac80211.

As it is intended to be used from the interface add/remove callbacks,
the iteration function may currently only be called under RTNL.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:37 -08:00
Pavel Emelyanov
42a73808ed [RAW]: Consolidate proc interface.
Both ipv6/raw.c and ipv4/raw.c use the seq files to walk
through the raw sockets hash and show them.

The "walking" code is rather huge, but is identical in both
cases. The difference is the hash table to walk over and
the protocol family to check (this was not in the first
virsion of the patch, which was noticed by YOSHIFUJI)

Make the ->open store the needed hash table and the family
on the allocated raw_iter_state and make the start/next/stop
callbacks work with it.

This removes most of the code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:32 -08:00
Pavel Emelyanov
ab70768ec7 [RAW]: Consolidate proto->unhash callback
Same as the ->hash one, this is easily consolidated.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:31 -08:00
Pavel Emelyanov
65b4c50b47 [RAW]: Consolidate proto->hash callback
Having the raw_hashinfo it's easy to consolidate the
raw[46]_hash functions.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:31 -08:00
Pavel Emelyanov
b673e4dfc8 [RAW]: Introduce raw_hashinfo structure
The ipv4/raw.c and ipv6/raw.c contain many common code (most
of which is proc interface) which can be consolidated.

Most of the places to consolidate deal with the raw sockets
hashtable, so introduce a struct raw_hashinfo which describes
the raw sockets hash.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:30 -08:00
Pavel Emelyanov
69d6da0b0f [IPv6] RAW: Compact the API for the kernel
Same as in the previous patch for ipv4, compact the
API and hide hash table and rwlock inside the raw.c
file.

Plus fix some "bad" places from checkpatch.pl point
of view (assignments inside if()).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:29 -08:00
Pavel Emelyanov
7bc54c9030 [IPv4] RAW: Compact the API for the kernel
The raw sockets functions are explicitly used from
inside the kernel in two places:

1. in ip_local_deliver_finish to intercept skb-s
2. in icmp_error

For this purposes many functions and even data structures,
that are naturally internal for raw protocol, are exported.

Compact the API to two functions and hide all the other
(including hash table and rwlock) inside the net/ipv4/raw.c

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:28 -08:00
Denis V. Lunev
d12d01d6b4 [NET]: Make AF_PACKET handle multiple network namespaces
This is done by making packet_sklist_lock and packet_sklist per
network namespace and adding an additional filter condition on
received packets to ensure they came from the proper network
namespace.

Changes from v1:
- prohibit to call inet_dgram_ops.ioctl in other than init_net

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:26 -08:00
Denis V. Lunev
97c53cacf0 [NET]: Make rtnetlink infrastructure network namespace aware (v3)
After this patch none of the netlink callback support anything
except the initial network namespace but the rtnetlink infrastructure
now handles multiple network namespaces.

Changes from v2:
- IPv6 addrlabel processing

Changes from v1:
- no need for special rtnl_unlock handling
- fixed IPv6 ndisc

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:25 -08:00
Ilpo Järvinen
68f8353b48 [TCP]: Rewrite SACK block processing & sack_recv_cache use
Key points of this patch are:

  - In case new SACK information is advance only type, no skb
    processing below previously discovered highest point is done
  - Optimize cases below highest point too since there's no need
    to always go up to highest point (which is very likely still
    present in that SACK), this is not entirely true though
    because I'm dropping the fastpath_skb_hint which could
    previously optimize those cases even better. Whether that's
    significant, I'm not too sure.

Currently it will provide skipping by walking. Combined with
RB-tree, all skipping would become fast too regardless of window
size (can be done incrementally later).

Previously a number of cases in TCP SACK processing fails to
take advantage of costly stored information in sack_recv_cache,
most importantly, expected events such as cumulative ACK and new
hole ACKs. Processing on such ACKs result in rather long walks
building up latencies (which easily gets nasty when window is
huge). Those latencies are often completely unnecessary
compared with the amount of _new_ information received, usually
for cumulative ACK there's no new information at all, yet TCP
walks whole queue unnecessary potentially taking a number of
costly cache misses on the way, etc.!

Since the inclusion of highest_sack, there's a lot information
that is very likely redundant (SACK fastpath hint stuff,
fackets_out, highest_sack), though there's no ultimate guarantee
that they'll remain the same whole the time (in all unearthly
scenarios). Take advantage of this knowledge here and drop
fastpath hint and use direct access to highest SACKed skb as
a replacement.

Effectively "special cased" fastpath is dropped. This change
adds some complexity to introduce better coveraged "fastpath",
though the added complexity should make TCP behave more cache
friendly.

The current ACK's SACK blocks are compared against each cached
block individially and only ranges that are new are then scanned
by the high constant walk. For other parts of write queue, even
when in previously known part of the SACK blocks, a faster skip
function is used (if necessary at all). In addition, whenever
possible, TCP fast-forwards to highest_sack skb that was made
available by an earlier patch. In typical case, no other things
but this fast-forward and mandatory markings after that occur
making the access pattern quite similar to the former fastpath
"special case".

DSACKs are special case that must always be walked.

The local to recv_sack_cache copying could be more intelligent
w.r.t DSACKs which are likely to be there only once but that
is left to a separate patch.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:07 -08:00
Ilpo Järvinen
a47e5a988a [TCP]: Convert highest_sack to sk_buff to allow direct access
It is going to replace the sack fastpath hint quite soon... :-)

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:03 -08:00
Pavel Emelyanov
c0ef877b2c [NET]: Move sock_valbool_flag to socket.c
The sock_valbool_flag() helper is used in setsockopt to
set or reset some flag on the sock. This helper is required
in the net/socket.c only, so move it there.

Besides, patch two places in sys_setsockopt() that repeat
this helper functionality manually.

Since this is not a bugfix, but a trivial cleanup, I
prepared this patch against net-2.6.25, but it also
applies (with a single offset) to the latest net-2.6.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:00 -08:00
Eric Dumazet
20fea08b5f [NET]: Move Qdisc_class_ops and Qdisc_ops in appropriate sections.
Qdisc_class_ops are const, and Qdisc_ops are mostly read.

Using "const" and "__read_mostly" qualifiers helps to reduce false
sharing.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:58 -08:00
YOSHIFUJI Hideaki
2a8cc6c890 [IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.
Policy table is implemented as an RCU linear list since we do not expect
large list nor frequent updates.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:58 -08:00
David S. Miller
294b4baf29 [IPSEC]: Kill afinfo->nf_post_routing
After changeset:

	[NETFILTER]: Introduce NF_INET_ hook values

It always evaluates to NF_INET_POST_ROUTING.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:55 -08:00
Patrick McHardy
6e23ae2a48 [NETFILTER]: Introduce NF_INET_ hook values
The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:55 -08:00
Herbert Xu
1bf06cd2e3 [IPSEC]: Add async resume support on input
This patch adds support for async resumptions on input.  To do so, the
transform would return -EINPROGRESS and subsequently invoke the
function xfrm_input_resume to resume processing.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:54 -08:00
Herbert Xu
60d5fcfb19 [IPSEC]: Remove nhoff from xfrm_input
The nhoff field isn't actually necessary in xfrm_input.  For tunnel
mode transforms we now throw away the output IP header so it makes no
sense to fill in the nexthdr field.  For transport mode we can now let
the function transport_finish do the setting and it knows where the
nexthdr field is.

The only other thing that needs the nexthdr field to be set is the
header extraction code.  However, we can simply move the protocol
extraction out of the generic header extraction.

We want to minimise the amount of info we have to carry around between
transforms as this simplifies the resumption process for async crypto.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:53 -08:00
Herbert Xu
d26f398400 [IPSEC]: Make x->lastused an unsigned long
Currently x->lastused is u64 which means that it cannot be
read/written atomically on all architectures.  David Miller observed
that the value stored in it is only an unsigned long which is always
atomic.

So based on his suggestion this patch changes the internal
representation from u64 to unsigned long while the user-interface
still refers to it as u64.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:52 -08:00
Herbert Xu
716062fd4c [IPSEC]: Merge most of the input path
As part of the work on asynchronous cryptographic operations, we need
to be able to resume from the spot where they occur.  As such, it
helps if we isolate them to one spot.

This patch moves most of the remaining family-specific processing into
the common input code.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:50 -08:00
Herbert Xu
c6581a457e [IPSEC]: Add async resume support on output
This patch adds support for async resumptions on output.  To do so,
the transform would return -EINPROGRESS and subsequently invoke the
function xfrm_output_resume to resume processing.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:49 -08:00
Herbert Xu
862b82c6f9 [IPSEC]: Merge most of the output path
As part of the work on asynchrnous cryptographic operations, we need
to be able to resume from the spot where they occur.  As such, it
helps if we isolate them to one spot.

This patch moves most of the remaining family-specific processing into
the common output code.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:48 -08:00
Herbert Xu
ef76bc23ef [IPV6]: Add ip6_local_out
Most callers of the LOCAL_OUT chain will set the IP packet length
before doing so.  They also share the same output function dst_output.

This patch creates a new function called ip6_local_out which does all
of that and converts the appropriate users over to it.

Apart from removing duplicate code, it will also help in merging the
IPsec output path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:47 -08:00
Herbert Xu
c439cb2e4b [IPV4]: Add ip_local_out
Most callers of the LOCAL_OUT chain will set the IP packet length and
header checksum before doing so.  They also share the same output
function dst_output.

This patch creates a new function called ip_local_out which does all
of that and converts the appropriate users over to it.

Apart from removing duplicate code, it will also help in merging the
IPsec output path once the same thing is done for IPv6.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:47 -08:00
Herbert Xu
227620e295 [IPSEC]: Separate inner/outer mode processing on input
With inter-family transforms the inner mode differs from the outer
mode.  Attempting to handle both sides from the same function means
that it needs to handle both IPv4 and IPv6 which creates duplication
and confusion.

This patch separates the two parts on the input path so that each
function deals with one family only.

In particular, the functions xfrm4_extract_inut/xfrm6_extract_inut
moves the pertinent fields from the IPv4/IPv6 IP headers into a
neutral format stored in skb->cb.  This is then used by the inner mode
input functions to modify the inner IP header.  In this way the input
function no longer has to know about the outer address family.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:46 -08:00
Herbert Xu
36cf9acf93 [IPSEC]: Separate inner/outer mode processing on output
With inter-family transforms the inner mode differs from the outer
mode.  Attempting to handle both sides from the same function means
that it needs to handle both IPv4 and IPv6 which creates duplication
and confusion.

This patch separates the two parts on the output path so that each
function deals with one family only.

In particular, the functions xfrm4_extract_output/xfrm6_extract_output
moves the pertinent fields from the IPv4/IPv6 IP headers into a
neutral format stored in skb->cb.  This is then used by the outer mode
output functions to write the outer IP header.  In this way the output
function no longer has to know about the inner address family.

Since the extract functions are only called by tunnel modes (the only
modes that can support inter-family transforms), I've also moved the
xfrm*_tunnel_check_size calls into them.  This allows the correct ICMP
message to be sent as opposed to now where you might call icmp_send
with an IPv6 packet and vice versa.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:45 -08:00
Herbert Xu
29bb43b4ec [INET]: Give outer DSCP directly to ip*_copy_dscp
This patch changes the prototype of ipv4_copy_dscp and ipv6_copy_dscp so
that they directly take the outer DSCP rather than the outer IP header.
This will help us to unify the code for inter-family tunnels.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:45 -08:00
Herbert Xu
25ee3286dc [IPSEC]: Merge common code into xfrm_bundle_create
Half of the code in xfrm4_bundle_create and xfrm6_bundle_create are
common.  This patch extracts that logic and puts it into
xfrm_bundle_create.  The rest of it are then accessed through afinfo.

As a result this fixes the problem with inter-family transforms where
we treat every xfrm dst in the bundle as if it belongs to the top
family.

This patch also fixes a long-standing error-path bug where we may free
the xfrm states twice.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:43 -08:00
Herbert Xu
66cdb3ca27 [IPSEC]: Move flow construction into xfrm_dst_lookup
This patch moves the flow construction from the callers of
xfrm_dst_lookup into that function.  It also changes xfrm_dst_lookup
so that it takes an xfrm state as its argument instead of explicit
addresses.

This removes any address-specific logic from the callers of
xfrm_dst_lookup which is needed to correctly support inter-family
transforms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:42 -08:00
Herbert Xu
f04e7e8d7f [IPSEC]: Replace x->type->{local,remote}_addr with flags
The functions local_addr and remote_addr are more than what they're
needed for.  The same thing can be done easily with flags on the type
object.  This patch does that and simplifies the wrapper functions in
xfrm6_policy accordingly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:41 -08:00
Herbert Xu
274b3426db [NET]: Remove unnecessary inclusion of dst.h
The file net/netevent.h only refers to struct dst_entry * so it
doesn't need to include dst.h.  I've replaced it with a forward
declaration.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:38 -08:00
Herbert Xu
352e512c32 [NET]: Eliminate duplicate copies of dst_discard
We have a number of copies of dst_discard scattered around the place
which all do the same thing, namely free a packet on the input or
output paths.

This patch deletes all of them except dst_discard and points all the
users to it.

The only non-trivial bit is decnet where it returns an error.
However, conceptually this is identical to the blackhole functions
used in IPv4 and IPv6 which do not return errors.  So they should
either all return errors or all return zero.  For now I've stuck with
the majority and picked zero as the return value.

It doesn't really matter in practice since few if any driver would
react differently depending on a zero return value or NET_RX_DROP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:37 -08:00
Herbert Xu
b4ce92775c [IPV6]: Move nfheader_len into rt6_info
The dst member nfheader_len is only used by IPv6.  It's also currently
creating a rather ugly alignment hole in struct dst.  Therefore this patch
moves it from there into struct rt6_info.

It also reorders the fields in rt6_info to minimize holes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:37 -08:00
Wang Chen
33c732c361 [IPV4]: Add raw drops counter.
Add raw drops counter for IPv4 in /proc/net/raw .

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:33 -08:00
Jens Axboe
9c55e01c0c [TCP]: Splice receive support.
Support for network splice receive.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:31 -08:00
Rolf Manderscheid
a9e527e3f9 IPoIB: improve IPv4/IPv6 to IB mcast mapping functions
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs.  The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised.  This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
Eric Dumazet
9d3e44425e [SOCK]: Adds a rcu_dereference() in sk_filter
It seems commit fda9ef5d67 introduced a RCU 
protection for sk_filter(), without a rcu_dereference()

Either we need a rcu_dereference(), either a comment should explain why we 
dont need it. I vote for the former.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:41:28 -08:00
Eric Dumazet
0f99be0d11 [XFRM]: xfrm_algo_clone() allocates too much memory
alg_key_len is the length in bits of the key, not in bytes.

Best way to fix this is to move alg_len() function from net/xfrm/xfrm_user.c 
to include/net/xfrm.h, and to use it in xfrm_algo_clone()

alg_len() is renamed to xfrm_alg_len() because of its global exposition.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:39:06 -08:00
Paul Moore
02f1c89d6e [NET]: Clone the sk_buff 'iif' field in __skb_clone()
Both NetLabel and SELinux (other LSMs may grow to use it as well) rely
on the 'iif' field to determine the receiving network interface of
inbound packets.  Unfortunately, at present this field is not
preserved across a skb clone operation which can lead to garbage
values if the cloned skb is sent back through the network stack.  This
patch corrects this problem by properly copying the 'iif' field in
__skb_clone() and removing the 'iif' field assignment from
skb_act_clone() since it is no longer needed.

Also, while we are here, put the assignments in the same order as the
offsets to reduce cacheline bounces.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:30:17 -08:00
Vlad Yasevich
f691724c4d [SCTP]: Fix the name of the authentication event.
The even should be called SCTP_AUTHENTICATION_INDICATION.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:30:02 -08:00
Stephen Hemminger
ecef969e5b [VETH]: move veth.h to include/linux
Move veth.h from net/ to linux/ since it is a user api, and add it to
user header processing Kbuild.

[ Use header-y as suggested by Sam Ravnborg.  -DaveM ]

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-26 19:36:35 -08:00
Patrick McHardy
fae718ddaf [NETFILTER]: nf_conntrack_ipv4: fix module parameter compatibility
Some users do "modprobe ip_conntrack hashsize=...". Since we have the
module aliases this loads nf_conntrack_ipv4 and nf_conntrack, the
hashsize parameter is unknown for nf_conntrack_ipv4 however and makes
it fail.

Allow to specify hashsize= for both nf_conntrack and nf_conntrack_ipv4.

Note: the nf_conntrack message in the ringbuffer will display an
incorrect hashsize since nf_conntrack is first pulled in as a
dependency and calculates the size itself, then it gets changed
through a call to nf_conntrack_set_hashsize().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-26 19:36:33 -08:00
Joe Perches
f4ab2f72e9 [NET] include/net/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-20 13:56:32 -08:00
Vlad Yasevich
8e71a11c9f [SCTP]: Fix the bind_addr info during migration.
During accept/migrate the code attempts to copy the addresses from
the parent endpoint to the new endpoint.   However, if the parent
was bound to a wildcard address, then we end up pointlessly copying
all of the current addresses on the system.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-07 01:07:49 -08:00
Denis V. Lunev
56c99d0415 [IPV4]: Remove prototype of ip_rt_advice
ip_rt_advice has been gone, so no need to keep prototype and debug message.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-07 01:07:38 -08:00
Vlad Yasevich
b7e0fe9f81 SCTP: Fix build issues with SCTP AUTH.
SCTP-AUTH requires selection of CRYPTO, HMAC and SHA1 since
SHA1 is a MUST requirement for AUTH.  We also support SHA256,
but that's optional, so fix the code to treat it as such.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-29 10:17:42 -05:00
Pavel Emelyanov
218ad12f42 [IPV4]: Fix memory leak in inet_hashtables.h when NUMA is on
The inet_ehash_locks_alloc() looks like this:

#ifdef CONFIG_NUMA
	if (size > PAGE_SIZE)
		x = vmalloc(...);
	else
#endif
		x = kmalloc(...);

Unlike it, the inet_ehash_locks_alloc() looks like this:

#ifdef CONFIG_NUMA
	if (size > PAGE_SIZE)
		vfree(x);
	else
#else
		kfree(x);
#endif

The error is obvious - if the NUMA is on and the size
is less than the PAGE_SIZE we leak the pointer (kfree is
inside the #else branch).

Compiler doesn't warn us because after the kfree(x) there's
a "x = NULL" assignment, so here's another (minor?) bug: we 
don't set x to NULL under certain circumstances.

Boring explanation, I know... Patch explains it better.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-11-26 20:23:31 +08:00
David S. Miller
53438e5d04 Merge branch 'fixes-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2007-11-20 17:24:29 -08:00
Guillaume Chazarain
92468c53cf ieee80211: Stop net_ratelimit/IEEE80211_DEBUG_DROP log pollution
if (net_ratelimit())
	IEEE80211_DEBUG_DROP(...)

can pollute the logs with messages like:

printk: 1 messages suppressed.
printk: 2 messages suppressed.
printk: 7 messages suppressed.

if debugging information is disabled. These messages are printed by
net_ratelimit(). Add a wrapper to net_ratelimit() that takes into account
the log level, so that net_ratelimit() is called only when we really want
to print something.

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-11-20 16:43:17 -05:00
Ilpo Jrvinen
6e42141009 [TCP] MTUprobe: fix potential sk_send_head corruption
When the abstraction functions got added, conversion here was
made incorrectly. As a result, the skb may end up pointing
to skb which got included to the probe skb and then was freed.
For it to trigger, however, skb_transmit must fail sending as
well.

Signed-off-by: Ilpo Jrvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-19 23:24:09 -08:00
Simon Horman
9055fa1f3d [IPVS]: Move remaining sysctl handlers over to CTL_UNNUMBERED
Switch the remaining IPVS sysctl entries over to to use CTL_UNNUMBERED,
I stronly doubt that anyone is using the sys_sysctl interface to
these variables.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-19 21:51:13 -08:00
Simon Horman
9e103fa6bd [IPVS]: Fix sysctl warnings about missing strategy in schedulers
sysctl table check failed: /net/ipv4/vs/lblc_expiration .3.5.21.19 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/lblcr_expiration .3.5.21.20 Missing strategy

Switch these entried over to use CTL_UNNUMBERED as clearly
the sys_syscal portion wasn't working.

This is along the same lines as Christian Borntraeger's patch that fixes
up entries with no stratergy in net/ipv4/ipvs/ip_vs_ctl.c

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-19 21:50:21 -08:00
Christian Borntraeger
611cd55b15 [IPVS]: Fix sysctl warnings about missing strategy
Running the latest git code I get the following messages during boot:
sysctl table check failed: /net/ipv4/vs/drop_entry .3.5.21.4 Missing strategy
[...]		  
sysctl table check failed: /net/ipv4/vs/drop_packet .3.5.21.5 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/secure_tcp .3.5.21.6 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/sync_threshold .3.5.21.24 Missing strategy

I removed the binary sysctl handler for those messages and also removed
the definitions in ip_vs.h. The alternative would be to implement a 
proper strategy handler, but syscall sysctl is deprecated.

There are other sysctl definitions that are commented out or work with 
the default sysctl_data strategy. I did not touch these. 

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-19 21:49:25 -08:00
Herbert Xu
21df56c6e2 [TCP]: Fix TCP header misalignment
Indeed my previous change to alloc_pskb has made it possible
for the TCP header to be misaligned iff the MTU is not a multiple
of 4 (and less than a page).  So I suspect the optimised IPsec
MTU calculation is giving you just such an MTU :)

This patch fixes it by changing alloc_pskb to make sure that
the size is at least 32-bit aligned.  This does not cause the
problem fixed by the previous patch because max_header is always
32-bit aligned which means that in the SG/NOTSO case this will
be a no-op.

I thought about putting this in the callers but all the current
callers are from TCP.  If and when we get a non-TCP caller we
can always create a TCP wrapper for this function and move the
alignment over there.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-18 18:48:08 -08:00
Pavel Emelyanov
dab6ba3688 [INET]: Fix potential kfree on vmalloc-ed area of request_sock_queue
The request_sock_queue's listen_opt is either vmalloc-ed or
kmalloc-ed depending on the number of table entries. Thus it 
is expected to be handled properly on free, which is done in 
the reqsk_queue_destroy().

However the error path in inet_csk_listen_start() calls 
the lite version of reqsk_queue_destroy, called 
__reqsk_queue_destroy, which calls the kfree unconditionally. 

Fix this and move the __reqsk_queue_destroy into a .c file as 
it looks too big to be inline.

As David also noticed, this is an error recovery path only,
so no locking is required and the lopt is known to be not NULL.

reqsk_queue_yank_listen_sk is also now only used in
net/core/request_sock.c so we should move it there too.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-15 02:57:06 -08:00
Herbert Xu
fb93134dfc [TCP]: Fix size calculation in sk_stream_alloc_pskb
We round up the header size in sk_stream_alloc_pskb so that
TSO packets get zero tail room.  Unfortunately this rounding
up is not coordinated with the select_size() function used by
TCP to calculate the second parameter of sk_stream_alloc_pskb.

As a result, we may allocate more than a page of data in the
non-TSO case when exactly one page is desired.

In fact, rounding up the head room is detrimental in the non-TSO
case because it makes memory that would otherwise be available to
the payload head room.  TSO doesn't need this either, all it wants
is the guarantee that there is no tail room.

So this patch fixes this by adjusting the skb_reserve call so that
exactly the requested amount (which all callers have calculated in
a precise way) is made available as tail room.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-14 15:45:21 -08:00
Denis V. Lunev
022cbae611 [NET]: Move unneeded data to initdata section.
This patch reverts Eric's commit 2b008b0a8e

It diets .text & .data section of the kernel if CONFIG_NET_NS is not set.
This is safe after list operations cleanup.

Signed-of-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-13 03:23:50 -08:00
Pavel Emelyanov
d71209ded2 [INET]: Use list_head-s in inetpeer.c
The inetpeer.c tracks the LRU list of inet_perr-s, but makes
it by hands. Use the list_head-s for this.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-12 21:27:28 -08:00
Arnaldo Carvalho de Melo
c0d8248710 [INET]: Remove leftover prototypes from include/net/inet_common.h
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-12 21:02:51 -08:00
David S. Miller
bce943278d Merge branch 'pending' of master.kernel.org:/pub/scm/linux/kernel/git/vxy/lksctp-dev 2007-11-12 18:16:13 -08:00
Denis V. Lunev
2994c63863 [INET]: Small possible memory leak in FIB rules
This patch fixes a small memory leak. Default fib rules can be deleted by
the user if the rule does not carry FIB_RULE_PERMANENT flag, f.e. by
	ip rule flush

Such a rule will not be freed as the ref-counter has 2 on start and becomes
clearly unreachable after removal.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-10 22:12:03 -08:00
Pavel Emelyanov
9305cfa444 [AF_UNIX]: Make unix_tot_inflight counter non-atomic
This counter is _always_ modified under the unix_gc_lock spinlock, 
so its atomicity can be provided w/o additional efforts.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-10 22:06:01 -08:00
Johannes Berg
56db6c52bb mac80211: remove unused driver ops
The driver operations set_ieee8021x(), set_port_auth() and
set_privacy_invoked() are not used by any drivers, except
set_privacy_invoked() they aren't even used by mac80211.
Remove them at least until we need to support drivers with
mac80211 that require getting this information.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-11-10 22:01:15 -08:00
Johannes Berg
830f903866 mac80211: allow driver to ask for a rate control algorithm
This allows a driver to ask for a specific rate control algorithm.
The rate control algorithm asked for must be registered and be
available as a module or built-in.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-11-10 21:59:54 -08:00
Pavel Emelyanov
03f49f3457 [NET]: Make helper to get dst entry and "use" it
There are many places that get the dst entry, increase the
__use counter and set the "lastuse" time stamp.

Make a helper for this.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-10 21:28:34 -08:00
Eric Dumazet
9e4505c459 [INET]: Add a missing include <linux/vmalloc.h> to inet_hashtables.h
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-10 21:18:39 -08:00
Vlad Yasevich
fa7ff654e1 SCTP: Clean-up some defines for regressions tests.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-09 11:43:41 -05:00
Vlad Yasevich
7ab9080467 SCTP: Make sctp_verify_param return multiple indications.
SCTP-AUTH and future ADD-IP updates have a requirement to
do additional verification of parameters and an ability to
ABORT the association if verification fails.  So, introduce
additional return code so that we can clear signal a required
action.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-09 11:43:41 -05:00
Vlad Yasevich
d970dbf845 SCTP: Convert custom hash lists to use hlist.
Convert the custom hash list traversals to use hlist functions.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-09 11:43:40 -05:00
Vlad Yasevich
73d9c4fd1a SCTP: Allow ADD_IP to work with AUTH for backward compatibility.
This patch adds a tunable that will allow ADD_IP to work without
AUTH for backward compatibility.  The default value is off since
the default value for ADD_IP is off as well.  People who need
to use ADD-IP with older implementations take risks of connection
hijacking and should consider upgrading or turning this tunable on.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07 11:39:27 -05:00
Vlad Yasevich
88799fe5ec SCTP: Correctly disable ADD-IP when AUTH is not supported.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07 11:39:27 -05:00
Vlad Yasevich
0ed90fb0f6 SCTP: Update RCU handling during the ADD-IP case
After learning more about rcu, it looks like the ADD-IP hadling
doesn't need to call call_rcu_bh.  All the rcu critical sections
use rcu_read_lock, so using call_rcu_bh is wrong here.
Now, restore the local_bh_disable() code blocks and use normal
call_rcu() calls.  Also restore the missing return statement.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07 11:39:27 -05:00
Vlad Yasevich
b6157d8e03 SCTP: Fix difference cases of retransmit.
Commit d0ce92910b broke several retransmit
cases including fast retransmit.  The reason is that we should
only delay by rto while doing retranmists as a result of a timeout.
Retransmit as a result of path mtu discover, fast retransmit, or
other evernts that should trigger immidiate retransmissions got broken.

Also, since rto is doubled prior to marking of packets elegable for
retransmission, we never marked correct chunks anyway.

The fix is provide a reason for a given retransmission so that we
can mark chunks appropriately and to save the old rto value to do
comparisons against.

All regressions tests passed with this code.

Spotted by Wei Yongjun <yjwei@cn.fujitsu.com>

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07 11:39:27 -05:00
Eric Dumazet
230140cffa [INET]: Remove per bucket rwlock in tcp/dccp ehash table.
As done two years ago on IP route cache table (commit
22c047ccbc) , we can avoid using one
lock per hash bucket for the huge TCP/DCCP hash tables.

On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for
litle performance differences. (we hit a different cache line for the
rwlock, but then the bucket cache line have a better sharing factor
among cpus, since we dirty it less often). For netstat or ss commands
that want a full scan of hash table, we perform fewer memory accesses.

Using a 'small' table of hashed rwlocks should be more than enough to
provide correct SMP concurrency between different buckets, without
using too much memory. Sizing of this table depends on
num_possible_cpus() and various CONFIG settings.

This patch provides some locking abstraction that may ease a future
work using a different model for TCP/DCCP table.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:15:11 -08:00
Rumen G. Bogdanovski
efac52762b [IPVS]: Synchronize closing of Connections
This patch makes the master daemon to sync the connection when it is about
to close.  This makes the connections on the backup to close or timeout
according their state.  Before the sync was performed only if the
connection is in ESTABLISHED state which always made the connections to
timeout in the hard coded 3 minutes. However the Andy Gospodarek's patch
([IPVS]: use proper timeout instead of fixed value) effectively did nothing
more than increasing this to 15 minutes (Established state timeout).  So
this patch makes use of proper timeout since it syncs the connections on
status changes to FIN_WAIT (2min timeout) and CLOSE (10sec timeout).
However if the backup misses CLOSE hopefully it did not miss FIN_WAIT.
Otherwise we will just have to wait for the ESTABLISHED state timeout. As
it is without this patch.  This way the number of the hanging connections
on the backup is kept to minimum. And very few of them will be left to
timeout with a long timeout.

This is important if we want to make use of the fix for the real server
overcommit on master/backup fail-over.

Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:15:10 -08:00
Rumen G. Bogdanovski
1e356f9cdf [IPVS]: Bind connections on stanby if the destination exists
This patch fixes the problem with node overload on director fail-over.
Given the scenario: 2 nodes each accepting 3 connections at a time and 2
directors, director failover occurs when the nodes are fully loaded (6
connections to the cluster) in this case the new director will assign
another 6 connections to the cluster, If the same real servers exist
there.

The problem turned to be in not binding the inherited connections to
the real servers (destinations) on the backup director. Therefore:
"ipvsadm -l" reports 0 connections:
root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          0
  -> node484.local:5999           Route   1000   0          0

while "ipvs -lnc" is right
root@test2:~# ipvsadm -lnc
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999
192.168.0.51:5999
TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999
192.168.0.52:5999

So the patch I am sending fixes the problem by binding the received
connections to the appropriate service on the backup director, if it
exists, else the connection will be handled the old way. So if the
master and the backup directors are synchronized in terms of real
services there will be no problem with server over-committing since
new connections will not be created on the nonexistent real services
on the backup. However if the service is created later on the backup,
the binding will be performed when the next connection update is
received. With this patch the inherited connections will show as
inactive on the backup:

root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          1
  -> node484.local:5999           Route   1000   0          1

rumen@test2:~$ cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP  C0A800DE:176F wlc
  -> C0A80033:176F      Route   1000   0          1
  -> C0A80032:176F      Route   1000   0          1

Regards,
Rumen Bogdanovski

Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2007-11-07 04:15:09 -08:00
Pavel Emelyanov
c3e9a353d8 [IPV4]: Compact some ifdefs in the fib code.
There are places that check for CONFIG_IP_MULTIPLE_TABLES
twice in the same file, but the internals of these #ifdefs
can be merged.

As a side effect - remove one ifdef from inside a function.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:11:41 -08:00
Eric Dumazet
286ab3d460 [NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA way.
"struct proto" currently uses an array stats[NR_CPUS] to track change on
'inuse' sockets per protocol.

If NR_CPUS is big, this means we use a big memory area for this.
Moreover, all this memory area is located on a single node on NUMA
machines, increasing memory pressure on the boot node.

In this patch, I tried to :

- Keep a fast !CONFIG_SMP implementation
- Keep a fast CONFIG_SMP implementation for often used protocols
(tcp,udp,raw,...)
- Introduce a NUMA efficient implementation

Some helper macros are defined in include/net/sock.h
These macros take into account CONFIG_SMP

If a "struct proto" is declared without using DEFINE_PROTO_INUSE /
REF_PROTO_INUSE
macros, it will automatically use a default implementation, using a
dynamically allocated percpu zone.
This default implementation will be NUMA efficient, but might use 32/64
bytes per possible cpu
because of current alloc_percpu() implementation.
However it still should be better than previous implementation based on
stats[NR_CPUS] field.

When a "struct proto" is changed to use the new macros, we use a single
static "int" percpu variable,
lowering the memory and cpu costs, still preserving NUMA efficiency.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:57 -08:00
Adrian Bunk
87ae9afdca cleanup asm/scatterlist.h includes
Not architecture specific code should not #include <asm/scatterlist.h>.

This patch therefore either replaces them with
#include <linux/scatterlist.h> or simply removes them if they were
unused.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-02 08:47:06 +01:00
Pavel Emelyanov
d46557955f [NET]: Relax the reference counting of init_net_ns
When the CONFIG_NET_NS is n there's no need in refcounting
the initial net namespace. So relax this code by making a
stupid stubs for the "n" case.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:43:49 -07:00
Pavel Emelyanov
6257ff2177 [NET]: Forget the zero_it argument of sk_alloc()
Finally, the zero_it argument can be completely removed from
the callers and from the function prototype.

Besides, fix the checkpatch.pl warnings about using the
assignments inside if-s.

This patch is rather big, and it is a part of the previous one.
I splitted it wishing to make the patches more readable. Hope 
this particular split helped.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:39:31 -07:00
Pavel Emelyanov
f1a6c4da14 [NET]: Move the sock_copy() from the header
The sock_copy() call is not used outside the sock.c file,
so just move it into a sock.c

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:29:45 -07:00
Al Viro
d06f608265 SCTP endianness annotations regression
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-29 07:41:32 -07:00
Eric W. Biederman
2b008b0a8e [NET]: Marking struct pernet_operations __net_initdata was inappropriate
It is not safe to to place struct pernet_operations in a special section.
We need struct pernet_operations to last until we call unregister_pernet_subsys.
Which doesn't happen until module unload.

So marking struct pernet_operations is a disaster for modules in two ways.
- We discard it before we call the exit method it points to.
- Because I keep struct pernet_operations on a linked list discarding
  it for compiled in code removes elements in the middle of a linked
  list and does horrible things for linked insert.

So this looks safe assuming __exit_refok is not discarded
for modules.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26 22:54:53 -07:00
Adrian Bunk
8ad7c62b75 [SCTP] net/sctp/auth.c: make 3 functions static
This patch makes three needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26 04:21:23 -07:00
Adrian Bunk
d84d64dcb3 [SCTP]: #if 0 sctp_update_copy_cksum()
sctp_update_copy_cksum() is no longer used.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26 04:07:20 -07:00
Adrian Bunk
d76081f875 [IRDA]: Make ircomm_tty static.
ircomm_tty can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26 03:56:43 -07:00
Jamal Hadi Salim
12da81d11a [NET_CLS_ACT]: Introduce skb_act_clone
Reworked skb_clone looks uglier with the single ifdef
CONFIG_NET_CLS_ACT This patch introduces skb_act_clone which will
replace skb_clone in tc actions

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26 02:47:23 -07:00
Vlad Yasevich
fee9dee730 [UDP]: Make use of inet_iif() when doing socket lookups.
UDP currently uses skb->dev->ifindex which may provide the wrong
information when the socket bound to a specific interface.
This patch makes inet_iif() accessible to UDP and makes UDP use it.

The scenario we are trying to fix is when a client is running on
the same system and the server and both client and server bind to
a non-loopback device.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-25 18:54:46 -07:00
Pavel Emelyanov
a37ae4086e [NET]: Don't declare extern variables in net/core/sysctl_net_core.c
Some are already declared in include/linux/netdevice.h, while
some others (xfrm ones) need to be declared.

The driver/net/rrunner.c just uses same extern as well, so
cleanup it also.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-23 21:27:56 -07:00
Chuck Lever
c1bd24b768 [TCP]: Remove unneeded implicit type cast when calling tcp_minshall_update()
The tcp_minshall_update() function is called in exactly one place, and is
passed an unsigned integer for the mss_len argument.  Make the sign of the
argument match the sign of the passed variable in order to eliminate an
unneeded implicit type cast and a mixed sign comparison in
tcp_minshall_update().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-23 21:27:55 -07:00
Marcel Holtmann
b6a0dc8224 [Bluetooth] Add support for handling simple eSCO links
With the Bluetooth 1.2 specification the Extended SCO feature for
better audio connections was introduced. So far the Bluetooth core
wasn't able to handle any eSCO connections correctly. This patch
adds simple eSCO support while keeping backward compatibility with
older devices.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-10-22 02:59:47 -07:00
Marcel Holtmann
6464f35f37 [Bluetooth] Fall back to L2CAP in basic mode
In case the remote entity tries to negogiate retransmission or flow
control mode, reject it and fall back to basic mode.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-10-22 02:59:43 -07:00
Marcel Holtmann
4e8402a3f8 [Bluetooth] Retrieve L2CAP features mask on connection setup
The Bluetooth 1.2 specification introduced a specific features mask
value to interoperate with newer versions of the specification. So far
this piece of information was never needed, but future extensions will
rely on it. This patch adds a generic way to retrieve this information
only once per connection setup.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-10-22 02:59:41 -07:00
Marcel Holtmann
861d6882b3 [Bluetooth] Remove global conf_mtu variable from L2CAP
After the change to the L2CAP configuration parameter handling the
global conf_mtu variable is no longer needed and so remove it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-10-22 02:59:41 -07:00
Marcel Holtmann
a9de924806 [Bluetooth] Switch from OGF+OCF to using only opcodes
The Bluetooth HCI commands are divided into logical OGF groups for
easier identification of their purposes. While this still makes sense
for the written specification, its makes the code only more complex
and harder to read. So instead of using separate OGF and OCF values
to identify the commands, use a common 16-bit opcode that combines
both values. As a side effect this also reduces the complexity of
OGF and OCF calculations during command header parsing.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-10-22 02:59:40 -07:00
Jean Delvare
c03983ac9b Spelling fix: explicitly
From: Jean Delvare <khali@linux-fr.org>

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-19 23:22:55 +02:00
Pavel Emelyanov
ba25f9dcc4 Use helpers to obtain task pid in printks
The task_struct->pid member is going to be deprecated, so start
using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
the kernel.

The first thing to start with is the pid, printed to dmesg - in
this case we may safely use task_pid_nr(). Besides, printks produce
more (much more) than a half of all the explicit pid usage.

[akpm@linux-foundation.org: git-drm went and changed lots of stuff]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:43 -07:00
Pavel Emelyanov
b488893a39 pid namespaces: changes to show virtual ids to user
This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.

The idea is:
 - all in-kernel data structures must store either struct pid itself
   or the pid's global nr, obtained with pid_nr() call;
 - when seeking the task from kernel code with the stored id one
   should use find_task_by_pid() call that works with global pids;
 - when showing pid's numerical value to the user the virtual one
   should be used, but however when one shows task's pid outside this
   task's namespace the global one is to be used;
 - when getting the pid from userspace one need to consider this as
   the virtual one and use appropriate task/pid-searching functions.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:40 -07:00
Herbert Xu
13996378e6 [IPSEC]: Rename mode to outer_mode and add inner_mode
This patch adds a new field to xfrm states called inner_mode.  The existing
mode object is renamed to outer_mode.

This is the first part of an attempt to fix inter-family transforms.  As it
is we always use the outer family when determining which mode to use.  As a
result we may end up shoving IPv4 packets into netfilter6 and vice versa.

What we really want is to use the inner family for the first part of outbound
processing and the outer family for the second part.  For inbound processing
we'd use the opposite pairing.

I've also added a check to prevent silly combinations such as transport mode
with inter-family transforms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 21:35:51 -07:00
Herbert Xu
17c2a42a24 [IPSEC]: Store afinfo pointer in xfrm_mode
It is convenient to have a pointer from xfrm_state to address-specific
functions such as the output function for a family.  Currently the
address-specific policy code calls out to the xfrm state code to get
those pointers when we could get it in an easier way via the state
itself.

This patch adds an xfrm_state_afinfo to xfrm_mode (since they're
address-specific) and changes the policy code to use it.  I've also
added an owner field to do reference counting on the module providing
the afinfo even though it isn't strictly necessary today since IPv6
can't be unloaded yet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 21:33:12 -07:00
Herbert Xu
1bfcb10f67 [IPSEC]: Add missing BEET checks
Currently BEET mode does not reinject the packet back into the stack
like tunnel mode does.  Since BEET should behave just like tunnel mode
this is incorrect.

This patch fixes this by introducing a flags field to xfrm_mode that
tells the IPsec code whether it should terminate and reinject the packet
back into the stack.

It then sets the flag for BEET and tunnel mode.

I've also added a number of missing BEET checks elsewhere where we check
whether a given mode is a tunnel or not.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 21:31:50 -07:00
Herbert Xu
aa5d62cc87 [IPSEC]: Move type and mode map into xfrm_state.c
The type and mode maps are only used by SAs, not policies.  So it makes
sense to move them from xfrm_policy.c into xfrm_state.c.  This also allows
us to mark xfrm_get_type/xfrm_put_type/xfrm_get_mode/xfrm_put_mode as
static.

The only other change I've made in the move is to get rid of the casts
on the request_module call for types.  They're unnecessary because C
will promote them to ints anyway.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 21:31:12 -07:00
Herbert Xu
33b5ecb8f6 [IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi
Currently xfrm6_rcv_spi gets the nexthdr value itself from the packet.
This means that we need to fix up the value in case we have a 4-on-6
tunnel.  Moving this logic into the caller simplifies things and allows
us to merge the code with IPv4.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 21:29:25 -07:00
Herbert Xu
c4541b41c0 [IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input
This patch moves the tunnel parsing for IPv4 out of xfrm4_input and into
xfrm4_tunnel.  This change is in line with what IPv6 does and will allow
us to merge the two input functions.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 21:28:53 -07:00
Pavel Emelyanov
47e958eac2 [NET]: Fix the race between sk_filter_(de|at)tach and sk_clone()
The proposed fix is to delay the reference counter decrement
until the quiescent state pass. This will give sk_clone() a
chance to get the reference on the cloned filter.

Regular sk_filter_uncharge can happen from the sk_free() only
and there's no need in delaying the put - the socket is dead
anyway and is to be release itself.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 21:22:42 -07:00
Pavel Emelyanov
309dd5fc87 [NET]: Move the filter releasing into a separate call
This is done merely as a preparation for the fix.

The sk_filter_uncharge() unaccounts the filter memory and calls
the sk_filter_release(), which in turn decrements the refcount
anf frees the filter.

The latter function will be required separately.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 21:21:51 -07:00
Pavel Emelyanov
48d6005638 [INET]: Remove no longer needed ->equal callback
Since this callback is used to check for conflicts in
hashtable when inserting a newly created frag queue, we can
do the same by checking for matching the queue with the 
argument, used to create one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 19:47:56 -07:00
Pavel Emelyanov
abd6523d15 [INET]: Consolidate xxx_find() in fragment management
Here we need another callback ->match to check whether the
entry found in hash matches the key passed. The key used 
is the same as the creation argument for inet_frag_create.

Yet again, this ->match is the same for netfilter and ipv6.
Running a frew steps forward - this callback will later
replace the ->equal one.

Since the inet_frag_find() uses the already consolidated
inet_frag_create() remove the xxx_frag_create from protocol
codes.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 19:47:21 -07:00
Pavel Emelyanov
c6fda28229 [INET]: Consolidate xxx_frag_create()
This one uses the xxx_frag_intern() and xxx_frag_alloc()
routines, which are already consolidated, so remove them
from protocol code (as promised).

The ->constructor callback is used to init the rest of
the frag queue and it is the same for netfilter and ipv6.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 19:46:47 -07:00
Pavel Emelyanov
e521db9d79 [INET]: Consolidate xxx_frag_alloc()
Just perform the kzalloc() allocation and setup common
fields in the inet_frag_queue(). Then return the result
to the caller to initialize the rest.

The inet_frag_alloc() may return NULL, so check the 
return value before doing the container_of(). This looks 
ugly, but the xxx_frag_alloc() will be removed soon.

The xxx_expire() timer callbacks are patches, 
because the argument is now the inet_frag_queue, not 
the protocol specific queue.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 19:45:23 -07:00
Pavel Emelyanov
2588fe1d78 [INET]: Consolidate xxx_frag_intern
This routine checks for the existence of a given entry
in the hash table and inserts the new one if needed.

The ->equal callback is used to compare two frag_queue-s
together, but this one is temporary and will be removed
later. The netfilter code and the ipv6 one use the same
routine to compare frags.

The inet_frag_intern() always returns non-NULL pointer,
so convert the inet_frag_queue into protocol specific
one (with the container_of) without any checks.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 19:44:34 -07:00
Eric Van Hensbergen
982c37cfb6 9p: remove sysctl
A sysctl method was added to enable and disable debugging levels.  After
further review, it was decided that there are better approaches to doing this
and the sysctl methodology isn't really desirable.  This patch removes the
sysctl code from 9p.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-10-17 14:35:15 -05:00
Eric Van Hensbergen
fb0466c3ae 9p: fix bad kconfig cross-dependency
This patch moves transport dynamic registration and matching to the net
module to prevent a bad Kconfig dependency between the net and fs 9p modules.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-10-17 14:31:07 -05:00
Latchesar Ionkov
ba17674fe0 9p: attach-per-user
The 9P2000 protocol requires the authentication and permission checks to be
done in the file server. For that reason every user that accesses the file
server tree has to authenticate and attach to the server separately.
Multiple users can share the same connection to the server.

Currently v9fs does a single attach and executes all I/O operations as a
single user. This makes using v9fs in multiuser environment unsafe as it
depends on the client doing the permission checking.

This patch improves the 9P2000 support by allowing every user to attach
separately. The patch defines three modes of access (new mount option
'access'):

- attach-per-user (access=user) (default mode for 9P2000.u)
 If a user tries to access a file served by v9fs for the first time, v9fs
 sends an attach command to the server (Tattach) specifying the user. If
 the attach succeeds, the user can access the v9fs tree.
 As there is no uname->uid (string->integer) mapping yet, this mode works
 only with the 9P2000.u dialect.

- allow only one user to access the tree (access=<uid>)
 Only the user with uid can access the v9fs tree. Other users that attempt
 to access it will get EPERM error.

- do all operations as a single user (access=any) (default for 9P2000)
 V9fs does a single attach and all operations are done as a single user.
 If this mode is selected, the v9fs behavior is identical with the current
 one.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-10-17 14:31:07 -05:00
Eric Van Hensbergen
a80d923e13 9p: Make transports dynamic
This patch abstracts out the interfaces to underlying transports so that
new transports can be added as modules.  This should also allow kernel
configuration of transports without ifdef-hell.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-10-17 14:31:07 -05:00
Herbert Xu
e5bbef20e0 [IPV6]: Replace sk_buff ** with sk_buff * in input handlers
With all the users of the double pointers removed from the IPv6 input path,
this patch converts all occurances of sk_buff ** to sk_buff * in IPv6 input
handlers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:50:28 -07:00
Pavel Emelyanov
762cc40801 [INET]: Consolidate the xxx_put
These ones use the generic data types too, so move
them in one place.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:43 -07:00
Pavel Emelyanov
8e7999c44e [INET]: Consolidate the xxx_evictor
The evictors collect some statistics for ipv4 and ipv6,
so make it return the number of evicted queues and account
them all at once in the caller.

The XXX_ADD_STATS_BH() macros are just for this case,
but maybe there are places in code, that can make use of
them as well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:42 -07:00
Pavel Emelyanov
1e4b82873a [INET]: Consolidate the xxx_frag_destroy
To make in possible we need to know the exact frag queue
size for inet_frags->mem management and two callbacks:

 * to destoy the skb (optional, used in conntracks only)
 * to free the queue itself (mandatory, but later I plan to
   move the allocation and the destruction of frag_queues
   into the common place, so this callback will most likely
   be optional too).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:42 -07:00
Pavel Emelyanov
321a3a99e4 [INET]: Consolidate xxx_the secret_rebuild
This code works with the generic data types as well, so
move this into inet_fragment.c

This move makes it possible to hide the secret_timer
management and the secret_rebuild routine completely in
the inet_fragment.c

Introduce the ->hashfn() callback in inet_frags() to get
the hashfun for a given inet_frag_queue() object.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:41 -07:00
Pavel Emelyanov
277e650ddf [INET]: Consolidate the xxx_frag_kill
Since now all the xxx_frag_kill functions now work
with the generic inet_frag_queue data type, this can
be moved into a common place.

The xxx_unlink() code is moved as well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:41 -07:00
Pavel Emelyanov
04128f233f [INET]: Collect common frag sysctl variables together
Some sysctl variables are used to tune the frag queues
management and it will be useful to work with them in
a common way in the future, so move them into one
structure, moreover they are the same for all the frag
management codes.

I don't place them in the existing inet_frags object,
introduced in the previous patch for two reasons:

 1. to keep them in the __read_mostly section;
 2. not to export the whole inet_frags objects outside.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:40 -07:00
Pavel Emelyanov
7eb95156d9 [INET]: Collect frag queues management objects together
There are some objects that are common in all the places
which are used to keep track of frag queues, they are:

 * hash table
 * LRU list
 * rw lock
 * rnd number for hash function
 * the number of queues
 * the amount of memory occupied by queues
 * secret timer

Move all this stuff into one structure (struct inet_frags)
to make it possible use them uniformly in the future. Like
with the previous patch this mostly consists of hunks like

-    write_lock(&ipfrag_lock);
+    write_lock(&ip4_frags.lock);

To address the issue with exporting the number of queues and
the amount of memory occupied by queues outside the .c file
they are declared in, I introduce a couple of helpers.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:39 -07:00
Pavel Emelyanov
5ab11c98d3 [INET]: Move common fields from frag_queues in one place.
Introduce the struct inet_frag_queue in include/net/inet_frag.h
file and place there all the common fields from three structs:

 * struct ipq in ipv4/ip_fragment.c
 * struct nf_ct_frag6_queue in nf_conntrack_reasm.c
 * struct frag_queue in ipv6/reassembly.c

After this, replace these fields on appropriate structures with
this structure instance and fix the users to use correct names
i.e. hunks like

-    atomic_dec(&fq->refcnt);
+    atomic_dec(&fq->q.refcnt);

(these occupy most of the patch)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:38 -07:00
Herbert Xu
3db05fea51 [NETFILTER]: Replace sk_buff ** with sk_buff *
With all the users of the double pointers removed, this patch mops up by
finally replacing all occurances of sk_buff ** in the netfilter API by
sk_buff *.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:29 -07:00
Herbert Xu
af1e1cf073 [IPVS]: Replace local version of skb_make_writable
This patch removes the IPVS-specific version of skb_make_writable and
replaces it with the netfilter one.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:28 -07:00
Herbert Xu
776c729e8d [IPV4]: Change ip_defrag to return an integer
Now that ip_frag always returns the packet given to it on input, we can
change it to return an integer indicating error instead.  This patch does
that and updates all its callers accordingly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:25 -07:00
Pierre Ynard
31910575a9 [IPv6]: Export userland ND options through netlink (RDNSS support)
As discussed before, this patch provides userland with a way to access
relevant options in Router Advertisements, after they are processed
and validated by the kernel. Extra options are processed in a generic
way; this patch only exports RDNSS options described in RFC5006, but
support to control which options are exported could be easily added.

A new rtnetlink message type is defined, to transport Neighbor
Discovery options, along with optional context information. At the
moment only the address of the router sending an RDNSS option is
included, but additional attributes may be later defined, if needed by
new use cases.

Signed-off-by: Pierre Ynard <linkfanel@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:22:05 -07:00
Ingo Molnar
092e9d93b3 [9P]: build fix with !CONFIG_SYSCTL
found via make randconfig build testing: 

 net/built-in.o: In function `init_p9':
 mod.c:(.init.text+0x3b39): undefined reference to `p9_sysctl_register'
 net/built-in.o: In function `exit_p9':
 mod.c:(.exit.text+0x36b): undefined reference to `p9_sysctl_unregister'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:19:28 -07:00
Denis V. Lunev
cd40b7d398 [NET]: make netlink user -> kernel interface synchronious
This patch make processing netlink user -> kernel messages synchronious.
This change was inspired by the talk with Alexey Kuznetsov about current
netlink messages processing. He says that he was badly wrong when introduced 
asynchronious user -> kernel communication.

The call netlink_unicast is the only path to send message to the kernel
netlink socket. But, unfortunately, it is also used to send data to the
user.

Before this change the user message has been attached to the socket queue
and sk->sk_data_ready was called. The process has been blocked until all
pending messages were processed. The bad thing is that this processing
may occur in the arbitrary process context.

This patch changes nlk->data_ready callback to get 1 skb and force packet
processing right in the netlink_unicast.

Kernel -> user path in netlink_unicast remains untouched.

EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock
drop, but the process remains in the cycle until the message will be fully
processed. So, there is no need to use this kludges now.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:15:29 -07:00
Pierre Ynard
d1ec3b7722 [NETLINK]: Fix typos in comments in netlink.h
This patch fixes a few typos in comments in include/net/netlink.h

Signed-off-by: Pierre Ynard <linkfanel@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:09:48 -07:00
Stephen Hemminger
227b60f510 [INET]: local port range robustness
Expansion of original idea from Denis V. Lunev <den@openvz.org>

Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 17:30:46 -07:00
Stephen Hemminger
0639300900 [SCTP]: port randomization
Add port randomization rather than a simple fixed rover
for use with SCTP.  This makes it act similar to TCP, UDP, DCCP
when allocating ports.

No longer need port_alloc_lock as well (suggestion by Brian Haley).

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 17:30:18 -07:00
Herbert Xu
87bdc48d30 [IPSEC]: Get rid of ipv6_{auth,esp,comp}_hdr
This patch removes the duplicate ipv6_{auth,esp,comp}_hdr structures since
they're identical to the IPv4 versions.  Duplicating them would only create
problems for ourselves later when we need to add things like extended
sequence numbers.

I've also added transport header type conversion headers for these types
which are now used by the transforms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:55 -07:00
Herbert Xu
37fedd3aab [IPSEC]: Use IPv6 calling convention as the convention for x->mode->output
The IPv6 calling convention for x->mode->output is more general and could
help an eventual protocol-generic x->type->output implementation.  This
patch adopts it for IPv4 as well and modifies the IPv4 type output functions
accordingly.

It also rewrites the IPv6 mac/transport header calculation to be based off
the network header where practical.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:54 -07:00
Herbert Xu
658b219e93 [IPSEC]: Move common code into xfrm_alloc_spi
This patch moves some common code that conceptually belongs to the xfrm core
from af_key/xfrm_user into xfrm_alloc_spi.

In particular, the spin lock on the state is now taken inside xfrm_alloc_spi.
Previously it also protected the construction of the response PF_KEY/XFRM
messages to user-space.  This is inconsistent as other identical constructions
are not protected by the state lock.  This is bad because they in fact should
be protected but only in certain spots (so as not to hold the lock for too
long which may cause packet drops).

The SPI byte order conversion has also been moved.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:01 -07:00
Pavel Emelyanov
4665079cbb [NETNS]: Move some code into __init section when CONFIG_NET_NS=n
With the net namespaces many code leaved the __init section,
thus making the kernel occupy more memory than it did before.
Since we have a config option that prohibits the namespace
creation, the functions that initialize/finalize some netns
stuff are simply not needed and can be freed after the boot.

Currently, this is almost not noticeable, since few calls
are no longer in __init, but when the namespaces will be
merged it will be possible to free more code. I propose to
use the __net_init, __net_exit and __net_initdata "attributes"
for functions/variables that are not used if the CONFIG_NET_NS
is not set to save more space in memory.

The exiting functions cannot just reside in the __exit section,
as noticed by David, since the init section will have
references on it and the compilation will fail due to modpost
checks. These references can exist, since the init namespace
never dies and the exit callbacks are never called. So I
introduce the __exit_refok attribute just like it is already
done with the __init_refok.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:58 -07:00
Herbert Xu
cdf7e668d4 [IPSEC]: Unexport xfrm_replay_notify
Now that the only callers of xfrm_replay_notify are in xfrm, we can remove
the export.

This patch also removes xfrm_aevent_doreplay since it's now called in just
one spot.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:55 -07:00
Herbert Xu
436a0a4022 [IPSEC]: Move output replay code into xfrm_output
The replay counter is one of only two remaining things in the output code
that requires a lock on the xfrm state (the other being the crypto).  This
patch moves it into the generic xfrm_output so we can remove the lock from
the transforms themselves.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:54 -07:00
Herbert Xu
83815dea47 [IPSEC]: Move xfrm_state_check into xfrm_output.c
The functions xfrm_state_check and xfrm_state_check_space are only used by
the output code in xfrm_output.c so we can move them over.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:54 -07:00
Herbert Xu
406ef77c89 [IPSEC]: Move common output code to xfrm_output
Most of the code in xfrm4_output_one and xfrm6_output_one are identical so
this patch moves them into a common xfrm_output function which will live
in net/xfrm.

In fact this would seem to fix a bug as on IPv4 we never reset the network
header after a transform which may upset netfilter later on.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:53 -07:00
Herbert Xu
bc31d3b2c7 [IPSEC] ah: Remove keys from ah_data structure
The keys are only used during initialisation so we don't need to carry them
in esp_data.  Since we don't have to allocate them again, there is no need
to place a limit on the authentication key length anymore.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:53 -07:00
Herbert Xu
4b7137ff8f [IPSEC] esp: Remove keys from esp_data structure
The keys are only used during initialisation so we don't need to carry them
in esp_data.  Since we don't have to allocate them again, there is no need
to place a limit on the authentication key length anymore.

This patch also kills the unused auth.icv member.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:52 -07:00
Ursula Braun
f0703c80e5 [AF_IUCV]: postpone receival of iucv-packets
AF_IUCV socket programs may waste Linux storage, because af_iucv
allocates an skb whenever posted by the receive callback routine and
receives the message immediately.
Message receival is now postponed if data from previous callbacks has
not yet been transferred to the receiving socket program. Instead a
message handle is saved in a message queue as a reminder. Once
messages could be given to the receiving socket program, there is
an additional checking for entries in the message queue, followed
by skb allocation and message receival if applicable.

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:51 -07:00
Heiko Carstens
57f2044803 [AF_IUCV]: remove static declarations from header file.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:51 -07:00
Stephen Hemminger
cfcabdcc2d [NET]: sparse warning fixes
Fix a bunch of sparse warnings. Mostly about 0 used as
NULL pointer, and shadowed variable declarations.
One notable case was that hash size should have been unsigned.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:48 -07:00
Michael Buesch
5ecc2a5d3e [MAC80211]: Update beacon_update callback documentation
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:21 -07:00
Tomas Winkler
478f8d2ba5 [MAC80211]: add sta_notify callback
This patch adds sta_notify callback and removes sta_table_notification
which was not used by any driver.
sta_notify() is essential for drivers that keeps notion of station
internally and need to be notified about removal or addition of a station
to the (I)BSS or assocation to an AP.

This version adds interface id to the parameter list
as suggested by Johannes Berg

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:21 -07:00
Michael Buesch
47f0c50220 [MAC80211]: Add association LED trigger
Many devices have LEDs to indicate the link status.
Export this functionality to drivers.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:20 -07:00
Johannes Berg
628a140ba0 [MAC80211]: remove ALG_NONE
This "algorithm" is used only internally and is not useful.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Michael Buesch <mb@bu3sch.de>
Acked-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:18 -07:00
Johannes Berg
f9d540ee5f [MAC80211]: remove management interface
Removes the management interface since it is only required
for hostapd/userspace MLME, will not be in the final tree
at least in this form and hostapd/userspace MLME currently
do not work against this tree anyway.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:15 -07:00
Johannes Berg
a289755250 [MAC80211]: add "invalid" interface type
Since I cannot convince the lazy driver authors (hello Michael)
to stop (ab)using the MGMT interface type internally in their
drivers, this patch introduces a new _INVALID type especially
for their use and changes all affected drivers to use it.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:15 -07:00
Magnus Damm
89e536a190 ax88796: add 93cx6 eeprom support
Hook up the 93cx6 eeprom code to the ax88796 driver and modify the ax88796
driver to read out the mac address from the eeprom.  We need this for the
ax88796 on certain SuperH boards.  The pin configuration used to connect
the eeprom to the ax88796 on these boards is the same as pointed out by the
ax88796 datasheet, so we can probably reuse this code for multiple
platforms in the future.

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-10 16:53:56 -07:00
Patrick McHardy
3583240249 [NETFILTER]: nf_conntrack_expect: kill unique ID
Similar to the conntrack ID, the per-expectation ID is not needed
anymore, kill it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:36 -07:00
Patrick McHardy
7f85f91472 [NETFILTER]: nf_conntrack: kill unique ID
Remove the per-conntrack ID, its not necessary anymore for dumping.
For compatiblity reasons we send the address of the conntrack to
userspace as ID.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:36 -07:00
Patrick McHardy
f73e924cdd [NETFILTER]: ctnetlink: use netlink policy
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:35 -07:00
Patrick McHardy
fdf708322d [NETFILTER]: nfnetlink: rename functions containing 'nfattr'
There is no struct nfattr anymore, rename functions to 'nlattr'.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:32 -07:00
Patrick McHardy
df6fb868d6 [NETFILTER]: nfnetlink: convert to generic netlink attribute functions
Get rid of the duplicated rtnetlink macros and use the generic netlink
attribute functions. The old duplicated stuff is moved to a new header
file that exists just for userspace.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:31 -07:00
Johannes Berg
b4010e0890 [PATCH] mac80211: remove generic IE for AP interfaces
This is not useful since we do not support probe response
offload to hardware at this time and beacons are set in
another way.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:17 -07:00
Herbert Xu
b421995235 [PKT_SCHED]: Add stateless NAT
Stateless NAT is useful in controlled environments where restrictions are
placed on through traffic such that we don't need connection tracking to
correctly NAT protocol-specific data.

In particular, this is of interest when the number of flows or the number
of addresses being NATed is large, or if connection tracking information
has to be replicated and where it is not practical to do so.

Previously we had stateless NAT functionality which was integrated into
the IPv4 routing subsystem.  This was a great solution as long as the NAT
worked on a subnet to subnet basis such that the number of NAT rules was
relatively small.  The reason is that for SNAT the routing based system
had to perform a linear scan through the rules.

If the number of rules is large then major renovations would have take
place in the routing subsystem to make this practical.

For the time being, the least intrusive way of achieving this is to use
the u32 classifier written by Alexey Kuznetsov along with the actions
infrastructure implemented by Jamal Hadi Salim.

The following patch is an attempt at this problem by creating a new nat
action that can be invoked from u32 hash tables which would allow large
number of stateless NAT rules that can be used/updated in constant time.

The actual NAT code is mostly based on the previous stateless NAT code
written by Alexey.  In future we might be able to utilise the protocol
NAT code from netfilter to improve support for other protocols.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:11 -07:00
Johannes Berg
ea49c359f3 [PATCH] mac80211: remove crypto algorithm typedef
The typedef is not required, we can just use "enum ieee80211_key_alg"
instead of "ieee80211_key_alg"

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:00 -07:00
Johannes Berg
f97df02e23 [PATCH] wireless networking: move frame inline functions to generic header
These inlines are generally useful, not just with mac80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:52:59 -07:00
Johannes Berg
75a5f0ccfd [PATCH] mac80211: document a lot more
This patch adds a lot more documentation (in kernel-doc format)
to include/net/mac80211.h

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:52:59 -07:00
Johannes Berg
1bc0826c8f [PATCH] mac80211: renumber and document the hardware flags
Currently, hardware flags that drivers must set are not
documented well enough. Fix this.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:52:58 -07:00
Johannes Berg
0ec3ca4459 [PATCH] mac80211: validate VLAN interfaces better
This patch changes mac80211 to verify that VLAN interfaces
are valid and not bother drivers about them any more.
VLAN interfaces are now only valid when an AP interface
is up with the same MAC address, and are automatically
turned off when the AP interface is set down.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Jouni Malinen <j@w1.fi>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:52:57 -07:00
Johannes Berg
4150c57212 [PATCH] mac80211: revamp interface and filter configuration
Drivers are currently supposed to keep track of monitor
interfaces if they allow so-called "hard" monitor, and
they are also supposed to keep track of multicast etc.

This patch changes that, replaces the set_multicast_list()
callback with a new configure_filter() callback that takes
filter flags (FIF_*) instead of interface flags (IFF_*).
For a driver, this means it should open the filter as much
as necessary to get all frames requested by the filter flags.
Accordingly, the filter flags are named "positively", e.g.
FIF_ALLMULTI.

Multicast filtering is a bit special in that drivers that
have no multicast address filters need to allow multicast
frames through when either the FIF_ALLMULTI flag is set or
when the mc_count value is positive.

At the same time, drivers are no longer notified about
monitor interfaces at all, this means they now need to
implement the start() and stop() callbacks and the new
change_filter_flags() callback. Also, the start()/stop()
ordering changed, start() is now called *before* any
add_interface() as it really should be, and stop() after
any remove_interface().

The patch also changes the behaviour of setting the bssid
to multicast for scanning when IEEE80211_HW_NO_PROBE_FILTERING
is set; the IEEE80211_HW_NO_PROBE_FILTERING flag is removed
and the filter flag FIF_BCN_PRBRESP_PROMISC introduced.
This is a lot more efficient for hardware like b43 that
supports it and other hardware can still set the BSSID
to all-ones.

Driver modifications by Johannes Berg (b43 & iwlwifi), Michael Wu
(rtl8187, adm8211, and p54), Larry Finger (b43legacy), and
Ivo van Doorn (rt2x00).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:52:57 -07:00
Eric W. Biederman
f4618d39a3 [NETNS]: Simplify the network namespace list locking rules.
Denis V. Lunev <den@sw.ru> noticed that the locking rules
for the network namespace list are over complicated and broken.

In particular the current register_netdev_notifier currently
does not take any lock making the for_each_net iteration racy
with network namespace creation and destruction. Oops.

The fact that we need to use for_each_net in rtnl_unlock() when
the rtnetlink support becomes per network namespace makes designing
the proper locking tricky.  In addition we need to be able to call
rtnl_lock() and rtnl_unlock() when we have the net_mutex held.

After thinking about it and looking at the alternatives carefully
it looks like the simplest and most maintainable solution is
to remove net_list_mutex altogether, and to use the rtnl_mutex instead.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:55 -07:00
Stephen Hemminger
3b04ddde02 [NET]: Move hardware header operations out of netdevice.
Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:52 -07:00
Stephen Hemminger
0c4e85813d [NET]: Wrap netdevice hardware header creation.
Add inline for common usage of hardware header creation, and
fix bug in IPV6 mcast where the assumption about negative return is
an errno. Negative return from hard_header means not enough space
was available,(ie -N bytes).

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:50 -07:00
Eric W. Biederman
2774c7aba6 [NET]: Make the loopback device per network namespace.
This patch makes loopback_dev per network namespace.  Adding
code to create a different loopback device for each network
namespace and adding the code to free a loopback device
when a network namespace exits.

This patch modifies all users the loopback_dev so they
access it as init_net.loopback_dev, keeping all of the
code compiling and working.  A later pass will be needed to
update the users to use something other than the initial network
namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:49 -07:00
Eric W. Biederman
9dd776b6d7 [NET]: Add network namespace clone & unshare support.
This patch allows you to create a new network namespace
using sys_clone, or sys_unshare.

As the network namespace is still experimental and under development
clone and unshare support is only made available when CONFIG_NET_NS is
selected at compile time.

As this patch introduces network namespace support into code paths
that exist when the CONFIG_NET is not selected there are a few
additions made to net_namespace.h to allow a few more functions
to be used when the networking stack is not compiled in.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:46 -07:00
Johannes Berg
9c7d7728ba [MAC80211]: remove tx info sw_retry_attempt member
This is unused.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:31 -07:00
Johannes Berg
6b301cdfad [MAC80211]: yet more documentation
Add more mac80211 documentation.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:30 -07:00
Johannes Berg
c33e3f3bcd [MAC80211]: remove IEEE80211_CONF_SSID_HIDDEN
The IEEE80211_CONF_SSID_HIDDEN setting is not useful for any driver
we have and should be a per-interface setting anyway. Remove it.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:29 -07:00
Johannes Berg
72abd81b98 [MAC80211]: allow drivers to indicate failed FCS/PLCP checksum
This patch allows drivers to indicate bad FCS/PLCP CRC to the stack and
have the stack drop packets like that except for monitor interfaces.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:28 -07:00
Johannes Berg
501d857ec9 [IEEE80211]: Fix softmac lockdep reports.
It seems I was actually able to hit this deadlock, on my quad G5 softmac
locks up more often than not. This fixes it by using an own workqueue
that can safely be flushed under RTNL.

Not sure if the patch is correct with the workqueue naming. And don't
think with the patch it doesn't continually lock up. It still does, just
doesn't invoke lockdep warnings all the time.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:22 -07:00
Johannes Berg
5568296573 [NL80211]: add netlink interface to cfg80211
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:14 -07:00
David S. Miller
0800f17026 [TCP]: Minor coding style fixup.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:13 -07:00
Ilpo Järvinen
b76892051c [TCP]: Avoid clearing sacktag hint in trivial situations
There's no reason to clear the sacktag skb hint when small part
of the rexmit queue changes. Account changes (if any) instead when
fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so
no need to discard SACK tag hint at all.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:12 -07:00
Ilpo Järvinen
5af4ec236f [TCP]: clear_all_retrans_hints prefixed by tcp_
In addition, fix its function comment spacing.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
2007-10-10 16:52:09 -07:00
Joe Perches
0795af5729 [NET]: Introduce and use print_mac() and DECLARE_MAC_BUF()
This is nicer than the MAC_FMT stuff.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:42 -07:00
Vlad Yasevich
6b2f9cb64d [SCTP]: Tie ADD-IP and AUTH functionality as required by spec.
ADD-IP spec requires AUTH. It is, in fact, dangerous without AUTH.
So, disable ADD-IP functionality if the peer claims to support
ADD-IP, but not AUTH.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:33 -07:00
Vlad Yasevich
65b07e5d0d [SCTP]: API updates to suport SCTP-AUTH extensions.
Add SCTP-AUTH API.  The API implemented here was
agreed to between implementors at the 9th SCTP Interop.
It will be documented in the next revision of the
SCTP socket API spec.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:32 -07:00
Vlad Yasevich
bbd0d59809 [SCTP]: Implement the receive and verification of AUTH chunk
This patch implements the receive path needed to process authenticated
chunks.  Add ability to process the AUTH chunk and handle edge cases
for authenticated COOKIE-ECHO as well.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:31 -07:00
Vlad Yasevich
4cd57c8078 [SCTP]: Enable the sending of the AUTH chunk.
SCTP-AUTH, Section 6.2:

   Endpoints MUST send all requested chunks authenticated where this has
   been requested by the peer.  The other chunks MAY be sent
   authenticated or not.  If endpoint pair shared keys are used, one of
   them MUST be selected for authentication.

   To send chunks in an authenticated way, the sender MUST include these
   chunks after an AUTH chunk.  This means that a sender MUST bundle
   chunks in order to authenticate them.

   If the endpoint has no endpoint pair shared key for the peer, it MUST
   use Shared Key Identifier 0 with an empty endpoint pair shared key.
   If there are multiple endpoint shared keys the sender selects one and
   uses the corresponding Shared Key Identifier

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:31 -07:00
Vlad Yasevich
730fc3d05c [SCTP]: Implete SCTP-AUTH parameter processing
Implement processing for the CHUNKS, RANDOM, and HMAC parameters and
deal with how this parameters are effected by association restarts.
In particular, during unexpeted INIT processing, we need to reply with
parameters from the original INIT chunk.  Also, after restart, we need
to update the old association with new peer parameters and change the
association shared keys.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:30 -07:00
Vlad Yasevich
1f485649f5 [SCTP]: Implement SCTP-AUTH internals
This patch implements the internals operations of the AUTH, such as
key computation and storage.  It also adds necessary variables to
the SCTP data structures.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:29 -07:00
David L Stevens
96793b4825 [IPV4]: Add ICMPMsgStats MIB (RFC 4293)
Background: RFC 4293 deprecates existing individual, named ICMP
type counters to be replaced with the ICMPMsgStatsTable. This table
includes entries for both IPv4 and IPv6, and requires counting of all
ICMP types, whether or not the machine implements the type.

These patches "remove" (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6.
It includes the named counters in the /proc places they were, but gets the
values for them from the new tables. It also counts packets generated
from raw socket output (e.g., OutEchoes, MLD queries, RA's from
radvd, etc).

Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types
        listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for "Icmp" to get the named data
        from new counters.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:28 -07:00
David L Stevens
14878f75ab [IPV6]: Add ICMPMsgStats MIB (RFC 4293) [rev 2]
Background: RFC 4293 deprecates existing individual, named ICMP
type counters to be replaced with the ICMPMsgStatsTable. This table
includes entries for both IPv4 and IPv6, and requires counting of all
ICMP types, whether or not the machine implements the type.

These patches "remove" (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6.
It includes the named counters in the /proc places they were, but gets the
values for them from the new tables. It also counts packets generated
from raw socket output (e.g., OutEchoes, MLD queries, RA's from
radvd, etc).

Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types
        listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for "Icmp" to get the named data
        from new counters.
[new to 2nd revision]
6) support per-interface ICMP stats
7) use common macro for per-device stat macros

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:27 -07:00
Herbert Xu
0cfad07555 [NETLINK]: Avoid pointer in netlink_run_queue
I was looking at Patrick's fix to inet_diag and it occured
to me that we're using a pointer argument to return values
unnecessarily in netlink_run_queue.  Changing it to return
the value will allow the compiler to generate better code
since the value won't have to be memory-backed.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:24 -07:00
Vlad Yasevich
131a47e31a [SCTP]: Implement the Supported Extensions Parameter
SCTP Supported Extenions parameter is specified in Section 4.2.7
of the ADD-IP draft (soon to be RFC).  The parameter is
encoded as:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |     Parameter Type = 0x8008   |      Parameter Length         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | CHUNK TYPE 1  |  CHUNK TYPE 2 |  CHUNK TYPE 3 |  CHUNK TYPE 4 |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                             ....                              |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | CHUNK TYPE N  |      PAD      |      PAD      |      PAD      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

It contains a list of chunks that a particular SCTP extension
uses.  Current extensions supported are Partial Reliability
(FWD-TSN) and ADD-IP (ASCONF and ASCONF-ACK).

When implementing new extensions (AUTH, PKT-DROP, etc..), new
chunks need to be added to this parameter.  Parameter processing
would be modified to negotiate support for these new features.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:23 -07:00
Denis V. Lunev
76c72d4f44 [IPV4/IPV6/DECNET]: Small cleanup for fib rules.
This patch slightly cleanups FIB rules framework. rules_list as a pointer
on struct fib_rules_ops is useless. It is always assigned with a static
per/subsystem list in IPv4, IPv6 and DecNet.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:22 -07:00
Johannes Berg
c39e3a0d03 [MAC80211]: remove TKIP mixing for hw accel again
The TKIP mixing code was added for the benefit of Intel's ipw3945
chipset but that code ended up not using it. We have previously
identified many problems with this code and it crystallized that
library functions for mixing are likely to handle this in much
more generality and might allow b43 to take advantage of hardware
acceleration for TKIP.

Due to these reasons, remove the TKIP mixing for hardware
accelerated crypto operations.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Buesch <mb@bu3sch.de>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:30 -07:00
Johannes Berg
6a7664d451 [MAC80211]: remove HW_KEY_IDX_INVALID
This patch makes the mac80211/driver interface rely only on the
IEEE80211_TXCTL_DO_NOT_ENCRYPT flag to signal to the driver whether
a frame should be encrypted or not, since mac80211 internally no
longer relies on HW_KEY_IDX_INVALID either this removes it, changes
the key index to be a u8 in all places and makes the full range of
the value available to drivers.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:29 -07:00
Johannes Berg
7ac1bd6aec [MAC80211]: some more documentation
This patch formats some documentation in mac80211.h into kerneldoc
and also adds some more explanations for hardware crypto.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:29 -07:00
Johannes Berg
c15a205070 [MAC80211]: remove set_key_idx callback
No existing drivers use this callback, hence there's no telling
how it might be used. In fact, it is unlikely to be of much use
as-is because the default key index isn't something that the
driver can do much with without knowing which interface it was
for etc. And if it needs the key index for the transmitted frame,
it can get it by keeping a reference to the key_conf structure
and looking it up by hw_key_idx.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:28 -07:00
Johannes Berg
7848ba7d7a [MAC80211]: rework hardware crypto flags
This patch reworks the various hardware crypto related
flags to make them more local, i.e. put them with each
key or each packet instead of into the hw struct.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:27 -07:00
Johannes Berg
b708e61062 [MAC80211]: remove turbo modes
This patch removes all mention of the atheros turbo modes that
can't possibly work properly anyway since in some places we don't
check for them when we should.

I have no idea what the iwlwifi drivers were doing with these but
it can't possibly have been correct.

Cc: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:27 -07:00
Eric W. Biederman
077130c0cf [NET]: Fix race when opening a proc file while a network namespace is exiting.
The problem:  proc_net files remember which network namespace the are
against but do not remember hold a reference count (as that would pin
the network namespace).   So we currently have a small window where
the reference count on a network namespace may be incremented when opening
a /proc file when it has already gone to zero.

To fix this introduce maybe_get_net and get_proc_net.

maybe_get_net increments the network namespace reference count only if it is
greater then zero, ensuring we don't increment a reference count after it
has gone to zero.

get_proc_net handles all of the magic to go from a proc inode to the network
namespace instance and call maybe_get_net on it.

PROC_NET the old accessor is removed so that we don't get confused and use
the wrong helper function.

Then I fix up the callers to use get_proc_net and handle the case case
where get_proc_net returns NULL.  In that case I return -ENXIO because
effectively the network namespace has already gone away so the files
we are trying to access don't exist anymore.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:22 -07:00
Daniel Lezcano
4fabcd7118 [NETNS]: Fix allnoconfig compilation error.
When CONFIG_NET=no, init_net is unresolved because net_namespace.c
is not compiled and the include pull init_net definition.

This problem was very similar with the ipc namespace where the kernel
can be compiled with SYSV ipc out.

This patch fix that defining a macro which simply remove init_net
initialization from nsproxy namespace aggregator.

Compiled and booted on qemu-i386 with CONFIG_NET=no and CONFIG_NET=yes.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:21 -07:00
Jesper Dangaard Brouer
e08b09983f [NET_SCHED]: Making rate table lookups more flexible.
This is done in order to, add support to changing the rate table to
use the upper-boundry L2T (length to time) value. Currently we use the
lower-boundry, which result in under-estimating the actual bandwidth
usage.

Extend the tc_ratespec struct, with two parameters: 1) "cell_align"
that allow adjusting the alignment of the rate table. 2) "overhead"
that allow adding a packet overhead before the lookup.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:20 -07:00
Jesper Dangaard Brouer
e9bef55d3d [NET_SCHED]: Cleanup L2T macros and handle oversized packets
Change L2T (length to time) macros, in all rate based schedulers, to
call a common function qdisc_l2t() that does the rate table lookup.
This function handles if the packet size lookup is larger than the
rate table, which often occurs with TSO enabled.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:20 -07:00
Adrian Bunk
5c94bf86c8 [SCTP]: Make sctp_addto_param() static.
sctp_addto_param() can become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:19 -07:00
Thomas Graf
8f4c1f9b04 [NETLINK]: Introduce nested and byteorder flag to netlink attribute
This change allows the generic attribute interface to be used within
the netfilter subsystem where this flag was initially introduced.

The byte-order flag is yet unused, it's intended use is to
allow automatic byte order convertions for all atomic types.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:16 -07:00
Eric W. Biederman
881d966b48 [NET]: Make the device list and device lookups per namespace.
This patch makes most of the generic device layer network
namespace safe.  This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables.  The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces.  The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace.  This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.

For now the ifindex generator is left global.

Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change.  Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:10 -07:00
Eric W. Biederman
1b8d7ae42d [NET]: Make socket creation namespace safe.
This patch passes in the namespace a new socket should be created in
and has the socket code do the appropriate reference counting.  By
virtue of this all socket create methods are touched.  In addition
the socket create methods are modified so that they will fail if
you attempt to create a socket in a non-default network namespace.

Failing if we attempt to create a socket outside of the default
network namespace ensures that as we incrementally make the network stack
network namespace aware we will not export functionality that someone
has not audited and made certain is network namespace safe.
Allowing us to partially enable network namespaces before all of the
exotic protocols are supported.

Any protocol layers I have missed will fail to compile because I now
pass an extra parameter into the socket creation code.

[ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ]

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:07 -07:00
Eric W. Biederman
457c4cbc5a [NET]: Make /proc/net per network namespace
This patch makes /proc/net per network namespace.  It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:06 -07:00
Eric W. Biederman
07feaebfcc [NET]: Add a network namespace parameter to struct sock
Sockets need to get a reference to their network namespace,
or possibly a simple hold if someone registers on the network
namespace notifier and will free the sockets when the namespace
is going to be destroyed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:05 -07:00
Eric W. Biederman
5f256becd8 [NET]: Basic network namespace infrastructure.
This is the basic infrastructure needed to support network
namespaces.  This infrastructure is:
- Registration functions to support initializing per network
  namespace data when a network namespaces is created or destroyed.

- struct net.  The network namespace data structure.
  This structure will grow as variables are made per network
  namespace but this is the minimal starting point.

- Functions to grab a reference to the network namespace.
  I provide both get/put functions that keep a network namespace
  from being freed.  And hold/release functions serve as weak references
  and will warn if their count is not zero when the data structure
  is freed.  Useful for dealing with more complicated data structures
  like the ipv4 route cache.

- A list of all of the network namespaces so we can iterate over them.

- A slab for the network namespace data structure allowing leaks
  to be spotted.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:03 -07:00
Joy Latten
ab5f5e8b14 [XFRM]: xfrm audit calls
This patch modifies the current ipsec audit layer
by breaking it up into purpose driven audit calls.

So far, the only audit calls made are when add/delete
an SA/policy. It had been discussed to give each
key manager it's own calls to do this, but I found
there to be much redundnacy since they did the exact
same things, except for how they got auid and sid, so I
combined them. The below audit calls can be made by any
key manager. Hopefully, this is ok.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:02 -07:00
John Heffner
d2e9117c7a [NET]: Change type of owner in sock_lock_t to int, rename
The type of owner in sock_lock_t is currently (struct sock_iocb *),
presumably for historical reasons.  It is never used as this type, only
tested as NULL or set to (void *)1.  For clarity, this changes it to type
int, and renames to owned, to avoid any possible type casting errors.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:01 -07:00
Johannes Berg
11a843b7e1 [MAC80211]: rework key handling
This moves all the key handling code out from ieee80211_ioctl.c
into key.c and also does the following changes including documentation
updates in mac80211.h:

 1) Turn off hardware acceleration for keys when the interface
    is down. This is necessary because otherwise monitor
    interfaces could be decrypting frames for other interfaces
    that are down at the moment. Also, it should go some way
    towards better suspend/resume support, in any case the
    routines used here could be used for that as well.
    Additionally, this makes the driver interface nicer, keys
    for a specific local MAC address are only ever present
    while an interface with that MAC address is enabled.

 2) Change driver set_key() callback interface to allow only
    return values of -ENOSPC, -EOPNOTSUPP and 0, warn on all
    other return values. This allows debugging the stack when
    a driver notices it's handed a key while it is down.

 3) Invert the flag meaning to KEY_FLAG_UPLOADED_TO_HARDWARE.

 4) Remove REMOVE_ALL_KEYS command as it isn't used nor do we
    want to use it, we'll use DISABLE_KEY for each key. It is
    hard to use REMOVE_ALL_KEYS because we can handle multiple
    virtual interfaces with different key configuration, so we'd
    have to keep track of a lot of state for this and that isn't
    worth it.

 5) Warn when disabling a key fails, it musn't.

 6) Remove IEEE80211_HW_NO_TKIP_WMM_HWACCEL in favour of per-key
    IEEE80211_KEY_FLAG_WMM_STA to let driver sort it out itself.

 7) Tell driver that a (non-WEP) key is used only for transmission
    by using an all-zeroes station MAC address when configuring.

 8) Change the set_key() callback to have access to the local MAC
    address the key is being added for.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:53 -07:00
Johannes Berg
f658eb90d0 [MAC80211] key handling: remove default_wep_only
Remove the default_wep_only stuff, this wasn't really done well
and no current driver actually cares.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:52 -07:00
Johannes Berg
8f20fc2498 [MAC80211]: embed key conf in key, fix driver interface
This patch embeds the struct ieee80211_key_conf into struct ieee80211_key
and thus avoids allocations and having data present twice.

This required some more changes:
 1) The removal of the IEEE80211_KEY_DEFAULT_TX_KEY key flag.
    This flag isn't used by drivers nor should it be since
    we have a set_key_idx() callback. Maybe that callback needs
    to be extended to include the key conf, but only a driver that
    requires it will tell.
 2) The removal of the IEEE80211_KEY_DEFAULT_WEP_ONLY key flag.
    This flag is global, so it shouldn't be passed in the key
    conf structure. Pass it to the function instead.

Also, this patch removes the AID parameter to the set_key() callback
because it is currently unused and the hardware currently cannot know
about the AID anyway. I suspect this was used with some hardware that
actually selected the AID itself, but that functionality was removed.

Additionally, I've removed the ALG_NULL key algorithm since we have
ALG_NONE.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:51 -07:00
Johannes Berg
7b33a57f0f [MAC80211]: remove unused ioctls (3)
The ioctls
 * PRISM2_PARAM_RADAR_DETECT
 * PRISM2_PARAM_SPECTRUM_MGMT

are not used by hostapd or wpa_supplicant,

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:46 -07:00
Johannes Berg
53cb670042 [MAC80211]: remove unused ioctls (2)
The ioctls

 * PRISM2_PARAM_STA_ANTENNA_SEL
 * PRISM2_PARAM_TX_POWER_REDUCTION
 * PRISM2_PARAM_DEFAULT_WEP_ONLY

are not used by hostapd or wpa_supplicant.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:45 -07:00
Johannes Berg
b2446b3680 [MAC80211]: remove unused ioctls (1)
The ioctls

 * PRISM2_PARAM_ANTENNA_MODE
 * PRISM2_PARAM_STAT_TIME

are not used by hostapd or wpa_supplicant.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:45 -07:00
Johannes Berg
3017b80bf0 [MAC80211]: fix software decryption
When doing key selection for software decryption, mac80211 gets
a few things wrong: it always uses pairwise keys if configured,
even if the frame is addressed to a multicast address. Also, it
doesn't allow using a key index of zero if a pairwise key has
also been found.

This patch changes the key selection code to be (more) in line
with the 802.11 specification. I have confirmed that with this,
multicast frames are correctly decrypted and I've tested with
WEP as well.

While at it, I've cleaned up the semantics of the hardware flags
IEEE80211_HW_WEP_INCLUDE_IV and IEEE80211_HW_DEVICE_HIDES_WEP
and clarified them in the mac80211.h header; it is also now
allowed to set the IEEE80211_HW_DEVICE_HIDES_WEP option even if
it only applies to frames that have been decrypted by the hw,
unencrypted frames must be dropped but encrypted frames that
the hardware couldn't handle can be passed up unmodified.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:48:44 -07:00
Johannes Berg
82f716056f [MAC80211]: remove radar stuff
Unused in drivers, userspace and mac80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:43 -07:00
Johannes Berg
aaa92e9a74 [MAC80211]: remove IEEE80211_HW_DATA_NULLFUNC_ACK
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:42 -07:00
Johannes Berg
0ef6e49b75 [MAC80211]: remove IEEE80211_HW_HOST_GEN_BEACON flag
The flag is never checked because drivers can simply call
ieee80211_beacon_get() regardless of setting this flag.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:41 -07:00
Johannes Berg
4dfd1d2f6a [MAC80211]: remove reset callback
The callback isn't used so remove it.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:40 -07:00
Ilpo Järvinen
172589ccdd [NET]: DIV_ROUND_UP cleanup (part two)
Hopefully captured all single statement cases under net/. I'm
not too sure if there is some policy about #includes that are
"guaranteed" (ie., in the current tree) to be available through
some other #included header, so I just added linux/kernel.h to
each changed file that didn't #include it previously.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:37 -07:00
Noriaki TAKAMIYA
a47ed4cd8c [IPV6] XFRM: Fix connected socket to use transformation.
When XFRM policy and state are ready after TCP connection is started,
the traffic should be transformed immediately, however it does not
on IPv6 TCP.

It depends on a dst cache replacement policy with connected socket.
It seems that the replacement is always done for IPv4, however, on
IPv6 case it is done only when routing cookie is changed.

This patch fix that non-transformation dst can be changed to
transformation one.
This behavior is required by MIPv6 and improves IPv6 IPsec.

Fixes by Masahide NAKAMURA.

Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:32 -07:00
Brian Haley
e773e4faa1 [IPV6]: Add v4mapped address inline
Add v4mapped address inline to avoid calls to ipv6_addr_type().

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:32 -07:00
Ilpo Järvinen
6ff03ac355 [TCP]: tcp_packets_out_inc to tcp_output.c (no callers elsewhere)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:28 -07:00
Ilpo Järvinen
e9144bd8da [TCP]: Remove unnecessary wrapper tcp_packets_out_dec
Makes caller side more obvious, there's no need to have
a wrapper for this oneliner!

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:27 -07:00
Neil Horman
4d93df0abd [SCTP]: Rewrite of sctp buffer management code
This patch introduces autotuning to the sctp buffer management code
similar to the TCP.  The buffer space can be grown if the advertised
receive window still has room.  This might happen if small message
sizes are used, which is common in telecom environmens.
New tunables are introduced that provide limits to buffer growth
and memory pressure is entered if to much buffer spaces is used.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:09 -07:00
Ilpo Järvinen
e60402d0a9 [TCP]: Move sack_ok access to obviously named funcs & cleanup
Previously code had IsReno/IsFack defined as macros that were
local to tcp_input.c though sack_ok field has user elsewhere too
for the same purpose. This changes them to static inlines as
preferred according the current coding style and unifies the
access to sack_ok across multiple files. Magic bitops of sack_ok
for FACK and DSACK are also abstracted to functions with
appropriate names.

Note:
- One sack_ok = 1 remains but that's self explanary, i.e., it
  enables sack
- Couple of !IsReno cases are changed to tcp_is_sack
- There were no users for IsDSack => I dropped it

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:00 -07:00
Ilpo Järvinen
b9c4595bc4 [TCP]: Don't panic if S+L skb is detected
BUG_ON is an overkill. In fact, I was mislead by BUG_TRAP
severity (equals to WARN_ON) which is much lower than BUG_ON's
(that panics).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:59 -07:00
Ilpo Järvinen
1b6d427bb7 [TCP]: Reduce sacked_out with reno when purging write_queue
Previously TCP had a transitional state during which reno
counted segments that are already below the current window into
sacked_out, which is now prevented. In addition, re-try now
the unconditional S+L skb catching.

This approach conservatively calls just remove_sack and leaves
reset_sack() calls alone. The best solution to the whole problem
would be to first calculate the new sacked_out fully (this patch
does not move reno_sack_reset calls from original sites and thus
does not implement this). However, that would require very
invasive change to fastretrans_alert (perhaps even slicing it to
two halves). Alternatively, all callers of tcp_packets_in_flight
(i.e., users that depend on sacked_out) should be postponed
until the new sacked_out has been calculated but it isn't any
simpler alternative.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:58 -07:00
Ilpo Järvinen
005903bc3a [TCP]: Left out sync->verify (the new meaning of it) & definify
Left_out was dropped a while ago, thus leaving verifying
consistency of the "left out" as only task for the function in
question. Thus make it's name more appropriate.

In addition, it is intentionally converted to #define instead
of static inline because the location of the invariant failure
is the most important thing to have if this ever triggers. I
think it would have been helpful e.g. in this case where the
location of the failure point had to be based on some quesswork:
    http://lkml.org/lkml/2007/5/2/464
...Luckily the guesswork seems to have proved to be correct.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:57 -07:00
Ilpo Järvinen
83ae40885f [TCP]: Add tcp_left_out(tp) "back" to get cleaner looking lines
tp->left_out got removed but nothing came to replace it back
then (users just did addition by themselves), so add function
for users now.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:56 -07:00
Ilpo Järvinen
b5860bbac7 [TCP]: Tighten tcp_sock's belt, drop left_out
It is easily calculable when needed and user are not that many
after all.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:55 -07:00
Ilpo Järvinen
af610b4ca1 [TCP]: Add tcp_dec_pcount_approx int variant
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:54 -07:00
Ilpo Järvinen
bdf1ee5d3b [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it
No other users exist for tcp_ecn.h. Very few things remain in
tcp.h, for most TCP ECN functions callers reside within a
single .c file and can be placed there.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:54 -07:00
Pavel Emelyanov
e314dbdc1c [NET]: Virtual ethernet device driver.
Veth stands for Virtual ETHernet. It is a simple tunnel driver
that works at the link layer and looks like a pair of ethernet
devices interconnected with each other.

Mainly it allows to communicate between network namespaces but
it can be used as is as well.

The newlink callback is organized that way to make it easy to
create the peer device in the separate namespace when we have
them in kernel.

This implementation uses another interface - the RTM_NRELINK
message introduced by Patric.

Bug fixes from Daniel Lezcano.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:46 -07:00
Pavel Emelianov
e71992889e [RTNETLINK]: Introduce generic rtnl_create_link().
This routine gets the parsed rtnl attributes and creates a new
link with generic info (IFLA_LINKINFO policy). Its intention
is to help the drivers, that need to create several links at
once (like VETH).

This is nothing but a copy-paste-ed part of rtnl_newlink() function
that is responsible for creation of new device.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:45 -07:00
Andy Green
dfe6e81dea [MAC80211]: Add get_unaligned to ieee80211_get_radiotap_len
ieee80211_get_radiotap_len() tries to dereference radiotap length without
taking care that it is completely unaligned and get_unaligned()
is required.

Signed-off-by: Andy Green <andy@warmcat.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:47:40 -07:00
Daniel Drake
d9430a3288 [MAC80211]: implement ERP info change notifications
zd1211rw and bcm43xx are interested in being notified when ERP IE conditions
change, so that they can reprogram a register which affects how control frames
are transmitted.

This patch adds an interface similar to the one that can be found in softmac.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:47:39 -07:00
Daniel Drake
7e9ed18874 [MAC80211]: improved short preamble handling
Similarly to CTS protection, whether short preambles are used for 802.11b
transmissions should be a per-subif setting, not device global.

For STAs, this patch makes short preamble handling automatic based on the ERP
IE. For APs, hostapd still uses the prism ioctls, but the write ioctl has been
restricted to AP-only subifs.

ieee80211_txrx_data.short_preamble (an unused field) was removed.

Unfortunately, some API changes were required for the following functions:
 - ieee80211_generic_frame_duration
 - ieee80211_rts_duration
 - ieee80211_ctstoself_duration
 - ieee80211_rts_get
 - ieee80211_ctstoself_get
Affected drivers were updated accordingly.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:47:38 -07:00
Ivo van Doorn
d5d08def92 [MAC80211]: Add LONG_RETRY flag to ieee80211_tx_control
mac80211 informs the driver what the short and long retry values are through
set_retry_limit(), but when packets are being transmitted it did not inform the
driver which of the 2 retry limits should actually be used.
Instead it sends the actual value, but for drivers that can only set the retry limit
and the register and in the descriptor need to indicate which of the limits should
be used this is not really useful.

This patch will add a IEEE80211_TXCTL_LONG_RETRY_LIMIT flag to the
ieee80211_tx_control structure. By default the short retry limit should be
used but if the flag is set the long retry should be used.

This does not prevent the driver to ignore the request for "no retry" packets,
but at least those will be send out with the short retry limit. But there is no
perfect cure for this problem.. :(

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:47:38 -07:00
Michael Wu
be8755e180 [MAC80211]: improve locking of sta_info related structures
The sta_info code has some awkward locking which prevents some driver
callbacks from being allowed to sleep. This patch makes the locking more
focused so code that calls driver callbacks are allowed to sleep. It also
converts sta_lock to a rwlock.

Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:47:37 -07:00
Johannes Berg
571ecf676d [MAC80211]: split RX handlers into own file
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:47:29 -07:00
Alexey Dobriyan
891e6a9312 [ROSE]: Fix rose.ko oops on unload
Commit a3d384029a aka
"[AX.25]: Fix unchecked rose_add_loopback_neigh uses"
transformed rose_loopback_neigh var into statically allocated one.
However, on unload it will be kfree's which can't work.

Steps to reproduce:

	modprobe rose
	rmmod rose

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
 printing eip:
c014c664
*pde = 00000000
Oops: 0000 [#1]
PREEMPT DEBUG_PAGEALLOC
Modules linked in: rose ax25 fan ufs loop usbhid rtc snd_intel8x0 snd_ac97_codec ehci_hcd ac97_bus uhci_hcd thermal usbcore button processor evdev sr_mod cdrom
CPU:    0
EIP:    0060:[<c014c664>]    Not tainted VLI
EFLAGS: 00210086   (2.6.23-rc9 #3)
EIP is at kfree+0x48/0xa1
eax: 00000556   ebx: c1734aa0   ecx: f6a5e000   edx: f7082000
esi: 00000000   edi: f9a55d20   ebp: 00200287   esp: f6a5ef28
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process rmmod (pid: 1823, ti=f6a5e000 task=f7082000 task.ti=f6a5e000)
Stack: f9a55d20 f9a5200c 00000000 00000000 00000000 f6a5e000 f9a5200c f9a55a00 
       00000000 bf818cf0 f9a51f3f f9a55a00 00000000 c0132c60 65736f72 00000000 
       f69f9630 f69f9528 c014244a f6a4e900 00200246 f7082000 c01025e6 00000000 
Call Trace:
 [<f9a5200c>] rose_rt_free+0x1d/0x49 [rose]
 [<f9a5200c>] rose_rt_free+0x1d/0x49 [rose]
 [<f9a51f3f>] rose_exit+0x4c/0xd5 [rose]
 [<c0132c60>] sys_delete_module+0x15e/0x186
 [<c014244a>] remove_vma+0x40/0x45
 [<c01025e6>] sysenter_past_esp+0x8f/0x99
 [<c012bacf>] trace_hardirqs_on+0x118/0x13b
 [<c01025b6>] sysenter_past_esp+0x5f/0x99
 =======================
Code: 05 03 1d 80 db 5b c0 8b 03 25 00 40 02 00 3d 00 40 02 00 75 03 8b 5b 0c 8b 73 10 8b 44 24 18 89 44 24 04 9c 5d fa e8 77 df fd ff <8b> 56 08 89 f8 e8 84 f4 fd ff e8 bd 32 06 00 3b 5c 86 60 75 0f 
EIP: [<c014c664>] kfree+0x48/0xa1 SS:ESP 0068:f6a5ef28

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-07 23:44:17 -07:00
David S. Miller
f8ab18d2d9 [TCP]: Fix MD5 signature handling on big-endian.
Based upon a report and initial patch by Peter Lieven.

tcp4_md5sig_key and tcp6_md5sig_key need to start with
the exact same members as tcp_md5sig_key.  Because they
are both cast to that type by tcp_v{4,6}_md5_do_lookup().

Unfortunately tcp{4,6}_md5sig_key use a u16 for the key
length instead of a u8, which is what tcp_md5sig_key
uses.  This just so happens to work by accident on
little-endian, but on big-endian it doesn't.

Instead of casting, just place tcp_md5sig_key as the first member of
the address-family specific structures, adjust the access sites, and
kill off the ugly casts.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-28 15:18:35 -07:00
Al Viro
78bd8fbbcd fix sctp_del_bind_addr() last argument type
It gets pointer to fastcall function, expects a pointer to normal
one and calls the sucker.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-26 09:22:04 -07:00
Wei Yongjun
6f4c618ddb SCTP : Add paramters validity check for ASCONF chunk
If ADDIP is enabled, when an ASCONF chunk is received with ASCONF
paramter length set to zero, this will cause infinite loop.
By the way, if an malformed ASCONF chunk is received, will cause
processing to access memory without verifying.

This is because of not check the validity of parameters in ASCONF chunk.
This patch fixed this.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-09-25 22:55:49 -07:00
Vlad Yasevich
ece25dfa09 SCTP: Clean up OOTB handling and fix infinite loop processing
While processing OOTB chunks as well as chunks with an invalid
length of 0, it was possible to SCTP to get wedged inside an
infinite loop because we didn't catch the condition correctly,
or didn't mark the packet for discard correctly.
This work is based on original findings and work by
Wei Yongjun <yjwei@cn.fujitsu.com>

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-09-25 22:55:47 -07:00
Vlad Yasevich
559cf710b0 [SCTP]: Convert bind_addr_list locking to RCU
Since the sctp_sockaddr_entry is now RCU enabled as part of
the patch to synchronize sctp_localaddr_list, it makes sense to
change all handling of these entries to RCU.  This includes the
sctp_bind_addrs structure and it's list of bound addresses.

This list is currently protected by an external rw_lock and that
looks like an overkill.  There are only 2 writers to the list:
bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
These are already seriealized via the socket lock, so they will
not step on each other.  These are also relatively rare, so we
should be good with RCU.

The readers are varied and they are easily converted to RCU.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Sridhar Samdurala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-16 16:03:28 -07:00
Vlad Yasevich
2930354799 [SCTP]: Add RCU synchronization around sctp_localaddr_list
sctp_localaddr_list is modified dynamically via NETDEV_UP
and NETDEV_DOWN events, but there is not synchronization
between writer (even handler) and readers.  As a result,
the readers can access an entry that has been freed and
crash the sytem.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Sridhar Samdurala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-16 16:02:12 -07:00
Wei Yongjun
00f1c2df2a SCTP: Fix to encode PROTOCOL VIOLATION error cause correctly
PROTOCOL VIOLATION error cause in ABORT is bad encode when make abort
chunk. When SCTP encode ABORT chunk with PROTOCOL VIOLATION error cause,
it just add the error messages to PROTOCOL VIOLATION error cause, the
rest four bytes(struct sctp_paramhdr) is just add to the chunk, not
change the length of error cause. This cause the ABORT chunk to be a bad
format. The chunk is like this:

ABORT chunk
  Chunk type: ABORT (6)
  Chunk flags: 0x00
  Chunk length: 72 (*1)
  Protocol violation cause
    Cause code: Protocol violation (0x000d)
    Cause length: 62 (*2)
    Cause information: 5468652063756D756C61746976652074736E2061636B2062...
    Cause padding: 0000
[Needless] 00030010
Chunk Length(*1) = 72 but Cause length(*2) only 62, not include the
extend 4 bytes.
((72 - sizeof(chunk_hdr)) = 68) != (62 +3) / 4 * 4

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-08-30 13:50:48 -04:00
Vlad Yasevich
ea2dfb3733 SCTP: properly clean up fragment and ordering queues during FWD-TSN.
When we recieve a FWD-TSN (meaning the peer has abandoned the data),
we need to clean up any partially received messages that may be
hanging out on the re-assembly or re-ordering queues.  This is
a MUST requirement that was not properly done before.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com.>
2007-08-29 13:34:33 -04:00
Patrick McHardy
591e620693 [NETFILTER]: nf_nat: add symbolic dependency on IPv4 conntrack
Loading nf_nat causes the conntrack core to be loaded, but we need IPv4 as
well.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-07 18:12:01 -07:00
David S. Miller
3a97aeb5c1 Merge davem@master.kernel.org:/pub/scm/linux/kernel/git/vxy/lksctp-dev 2007-08-02 19:44:43 -07:00
David S. Miller
3516ffb0fe [TCP]: Invoke tcp_sendmsg() directly, do not use inet_sendmsg().
As discovered by Evegniy Polyakov, if we try to sendmsg after
a connection reset, we can do incredibly stupid things.

The core issue is that inet_sendmsg() tries to autobind the
socket, but we should never do that for TCP.  Instead we should
just go straight into TCP's sendmsg() code which will do all
of the necessary state and pending socket error checks.

TCP's sendpage already directly vectors to tcp_sendpage(), so this
merely brings sendmsg() in line with that.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-02 19:42:28 -07:00
Paul Moore
9534f71ca3 SELinux: restore proper NetLabel caching behavior
A small fix to the SELinux/NetLabel glue code to ensure that the NetLabel
cache is utilized when possible.  This was broken when the SELinux/NetLabel
glue code was reorganized in the last kernel release.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
2007-08-02 11:52:21 -04:00
sebastian@breakpoint.cc
0a5fcb9cf8 sctp: move global declaration to header file.
sctp_chunk_cachep & sctp_bucket_cachep is used module global, so move it
to a header file.

Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-08-01 11:19:06 -04:00
Adrian Bunk
131116989b [AF_UNIX]: Make code static.
The following code can now become static:
- struct unix_socket_table
- unix_table_lock

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31 02:28:27 -07:00
Adrian Bunk
1a3a206f7f [NETFILTER]: Make nf_ct_ipv6_skip_exthdr() static.
nf_ct_ipv6_skip_exthdr() can now become static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31 02:28:26 -07:00
Herbert Xu
20283d84c7 [IPV6]: Remove circular dependency on if_inet6.h
net/if_inet6.h includes linux/ipv6.h which also tries to include
net/if_inet6.h.  Since the latter only needs it for forward
declarations, we can fix this by adding the declarations.

A number of files are implicitly including net/if_inet6.h through
linux/ipv6.h.  They also use net/ipv6.h so this patch includes
net/if_inet6.h there.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31 02:28:17 -07:00
Al Viro
8e036fc314 [BLUETOOTH] l2cap: endianness annotations
no code changes, just documenting existing types

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31 02:28:07 -07:00
Stephen Hemminger
30cfd0baf0 [TCP]: congestion control API pass RTT in microseconds
This patch changes the API for the callback that is done after an ACK is
received. It solves a couple of issues:

  * Some congestion controls want higher resolution value of RTT
    (controlled by TCP_CONG_RTT_SAMPLE flag). These don't really want a ktime, but
    all compute a RTT in microseconds.

  * Other congestion control could use RTT at jiffies resolution.

To keep API consistent the units should be the same for both cases, just the
resolution should change.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31 02:27:57 -07:00
Al Viro
a34c45896a netfilter endian regressions
no real bugs, just misannotations cropping up

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-26 11:11:56 -07:00
Linus Torvalds
721e2629fa Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for NetLabel
  SELinux: enable dynamic activation/deactivation of NetLabel/SELinux enforcement
2007-07-19 14:42:40 -07:00
Paul Moore
23bcdc1ade SELinux: enable dynamic activation/deactivation of NetLabel/SELinux enforcement
Create a new NetLabel KAPI interface, netlbl_enabled(), which reports on the
current runtime status of NetLabel based on the existing configuration.  LSMs
that make use of NetLabel, i.e. SELinux, can use this new function to determine
if they should perform NetLabel access checks.  This patch changes the
NetLabel/SELinux glue code such that SELinux only enforces NetLabel related
access checks when netlbl_enabled() returns true.

At present NetLabel is considered to be enabled when there is at least one
labeled protocol configuration present.  The result is that by default NetLabel
is considered to be disabled, however, as soon as an administrator configured
a CIPSO DOI definition NetLabel is enabled and SELinux starts enforcing
NetLabel related access controls - including unlabeled packet controls.

This patch also tries to consolidate the multiple "#ifdef CONFIG_NETLABEL"
blocks into a single block to ease future review as recommended by Linus.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2007-07-19 10:21:11 -04:00
Johannes Berg
2dbba6f773 [GENETLINK]: Dynamic multicast groups.
Introduce API to dynamically register and unregister multicast groups.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-18 15:47:52 -07:00
Patrick McHardy
bd0bf0765e [XFRM]: Fix crash introduced by struct dst_entry reordering
XFRM expects xfrm_dst->u.next to be same pointer as dst->next, which
was broken by the dst_entry reordering in commit 1e19e02c~, causing
an oops in xfrm_bundle_ok when walking the bundle upwards.

Kill xfrm_dst->u.next and change the only user to use dst->next instead.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-18 01:55:52 -07:00
Stephen Hemminger
16751347a0 [TCP]: remove unused argument to cong_avoid op
None of the existing TCP congestion controls use the rtt value pased
in the ca_ops->cong_avoid interface.  Which is lucky because seq_rtt
could have been -1 when handling a duplicate ack.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-18 01:46:58 -07:00
Roland McGrath
c09edd6eba avoid OPEN_MAX in SCM_MAX_FD
The OPEN_MAX constant is an arbitrary number with no useful relation to
anything.  Nothing should be using it.  SCM_MAX_FD is just an arbitrary
constant and it should be clear that its value is chosen in net/scm.h
and not actually derived from anything else meaningful in the system.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:03 -07:00
Linus Torvalds
d3502d7f25 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
  [TCP]: Verify the presence of RETRANS bit when leaving FRTO
  [IPV6]: Call inet6addr_chain notifiers on link down
  [NET_SCHED]: Kill CONFIG_NET_CLS_POLICE
  [NET_SCHED]: act_api: qdisc internal reclassify support
  [NET_SCHED]: sch_dsmark: act_api support
  [NET_SCHED]: sch_atm: act_api support
  [NET_SCHED]: sch_atm: Lindent
  [IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets
  [IPV4]: Cleanup call to __neigh_lookup()
  [NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization
  [NETFILTER]: nf_conntrack: UDPLITE support
  [NETFILTER]: nf_conntrack: mark protocols __read_mostly
  [NETFILTER]: x_tables: add connlimit match
  [NETFILTER]: Lower *tables printk severity
  [NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error
  [NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it
  [NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it
  [NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header
  [NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices.
  [AF_IUCV]: Add lock when updating accept_q
  ...
2007-07-15 16:50:46 -07:00
Patrick McHardy
c3bc7cff8f [NET_SCHED]: Kill CONFIG_NET_CLS_POLICE
The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE,
remove the old code. The config option will be kept around to select
the equivalent NET_CLS_ACT options for a short time to allow easier
upgrades.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-15 00:03:05 -07:00
Patrick McHardy
73ca4918fb [NET_SCHED]: act_api: qdisc internal reclassify support
The behaviour of NET_CLS_POLICE for TC_POLICE_RECLASSIFY was to return
it to the qdisc, which could handle it internally or ignore it. With
NET_CLS_ACT however, tc_classify starts over at the first classifier
and never returns it to the qdisc. This makes it impossible to support
qdisc-internal reclassification, which in turn makes it impossible to
remove the old NET_CLS_POLICE code without breaking compatibility since
we have two qdiscs (CBQ and ATM) that support this.

This patch adds a tc_classify_compat function that handles
reclassification the old way and changes CBQ and ATM to use it.

This again is of course not fully backwards compatible with the previous
NET_CLS_ACT behaviour. Unfortunately there is no way to fully maintain
compatibility *and* support qdisc internal reclassification with
NET_CLS_ACT, but this seems like the better choice over keeping the two
incompatible options around forever.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-15 00:02:31 -07:00
Patrick McHardy
61075af51f [NETFILTER]: nf_conntrack: mark protocols __read_mostly
Also remove two unnecessary EXPORT_SYMBOLs and move the
nf_conntrack_l3proto_ipv4 declaration to the correct file.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-14 20:48:19 -07:00
Yasuyuki Kozakai
e2a3123fbe [NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it
nf_ct_get_tuple() requires the offset to transport header and that bothers
callers such as icmp[v6] l4proto modules. This introduces new function
to simplify them.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-14 20:45:14 -07:00
Yasuyuki Kozakai
ffc3069048 [NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it
The icmp[v6] l4proto modules parse headers in ICMP[v6] error to get tuple.
But they have to find the offset to transport protocol header before that.
Their processings are almost same as prepare() of l3proto modules.
This makes prepare() more generic to simplify icmp[v6] l4proto module
later.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-14 20:44:50 -07:00
Ursula Braun
febca281f6 [AF_IUCV]: Add lock when updating accept_q
The accept_queue of an af_iucv socket will be corrupted, if
adding and deleting of entries in this queue occurs at the
same time (connect request from one client, while accept call
is processed for another client).
Solution: add locking when updating accept_q

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Acked-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-14 19:04:25 -07:00
Adrian Bunk
acd159b6b5 [INET_SOCK]: make net/ipv4/inet_timewait_sock.c:__inet_twsk_kill() static
This patch makes the needlessly global __inet_twsk_kill() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-14 19:00:59 -07:00
Latchesar Ionkov
bd238fb431 9p: Reorganization of 9p file system code
This patchset moves non-filesystem interfaces of v9fs from fs/9p to net/9p.
It moves the transport, packet marshalling and connection layers to net/9p
leaving only the VFS related files in fs/9p.  This work is being done in
preparation for in-kernel 9p servers as well as alternate 9p clients (other
than VFS).

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-07-14 15:13:40 -05:00
Johannes Berg
4480f15ca6 [PATCH] mac80211: clarify some mac80211 things
The semantics of not having an add_interface callback are not well
defined, this callback is required because otherwise you cannot obtain
the requested MAC address of the device. Change the documentation to
reflect this, add a note about having no MAC address at all, add a
warning that mac_addr in struct ieee80211_if_init_conf can be NULL and
finally verify that a few callbacks are assigned by way of BUG_ON()

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-07-12 16:07:26 -04:00
Johannes Berg
c59304b5e0 [PATCH] mac80211: remove ieee80211_set_aid_for_sta
Remove ieee80211_set_aid_for_sta and associated code.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-07-12 16:07:25 -04:00
Andy Green
179f831bc3 [PATCH] cfg80211: Radiotap parser
Generic code to walk through the fields in a radiotap header, accounting
for nasties like extended "field present" bitfields and alignment rules

Signed-off-by: Andy Green <andy@warmcat.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-07-12 16:07:24 -04:00
Patrick McHardy
db3d99c090 [NET_SCHED]: ematch: module autoloading
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-11 19:46:26 -07:00
David S. Miller
50b65cc6fa Merge master.kernel.org:/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2007-07-11 19:37:40 -07:00
Miklos Szeredi
1fd05ba5a2 [AF_UNIX]: Rewrite garbage collector, fixes race.
Throw out the old mark & sweep garbage collector and put in a
refcounting cycle detecting one.

The old one had a race with recvmsg, that resulted in false positives
and hence data loss.  The old algorithm operated on all unix sockets
in the system, so any additional locking would have meant performance
problems for all users of these.

The new algorithm instead only operates on "in flight" sockets, which
are very rare, and the additional locking for these doesn't negatively
impact the vast majority of users.

In fact it's probable, that there weren't *any* heavy senders of
sockets over sockets, otherwise the above race would have been
discovered long ago.

The patch works OK with the app that exposed the race with the old
code.  The garbage collection has also been verified to work in a few
simple cases.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-11 14:22:39 -07:00
Ilpo Järvinen
ed8b548ce3 [DECNET]: Another unnecessary net/tcp.h inclusion in net/dn.h
No longer needed.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 23:02:12 -07:00
YOSHIFUJI Hideaki
bb4dbf9e61 [IPV6]: Do not send RH0 anymore.
Based on <draft-ietf-ipv6-deprecate-rh0-00.txt>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:55:49 -07:00
Marcel Holtmann
5b7f990927 [Bluetooth] Add basics to better support and handle eSCO links
To better support and handle eSCO links in the future a bunch of
constants needs to be added and some basic routines need to be
updated. This is the initial step.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-07-11 07:35:32 +02:00
Philippe De Muyter
4839c52b01 [IPV4]: Make ip_tos2prio const.
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:19:04 -07:00
Patrick McHardy
0d53778e81 [NETFILTER]: Convert DEBUGP to pr_debug
Convert DEBUGP to pr_debug and fix lots of non-compiling debug statements.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:18:20 -07:00
Patrick McHardy
b8a7fe6c10 [NETFILTER]: nf_conntrack_helper: use hashtable for conntrack helpers
Eliminate the last global list searched for every new connection.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:18:13 -07:00
Patrick McHardy
f264a7df08 [NETFILTER]: nf_conntrack_expect: introduce nf_conntrack_expect_max sysct
As a last step of preventing DoS by creating lots of expectations, this
patch introduces a global maximum and a sysctl to control it. The default
is initialized to 4 * the expectation hash table size, which results in
1/64 of the default maxmimum of conntracks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:18:12 -07:00
Patrick McHardy
b560580a13 [NETFILTER]: nf_conntrack_expect: maintain per conntrack expectation list
This patch brings back the per-conntrack expectation list that was
removed around 2.6.10 to avoid walking all expectations on expectation
eviction and conntrack destruction.

As these were the last users of the global expectation list, this patch
also kills that.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:18:02 -07:00
Patrick McHardy
a71c085562 [NETFILTER]: nf_conntrack: use hashtable for expectations
Currently all expectations are kept on a global list that

- needs to be searched for every new conncetion
- needs to be walked for evicting expectations when a master connection
  has reached its limit
- needs to be walked on connection destruction for connections that
  have open expectations

This is obviously not good, especially when considering helpers like
H.323 that register *lots* of expectations and can set up permanent
expectations, but it also allows for an easy DoS against firewalls
using connection tracking helpers.

Use a hashtable for expectations to avoid incurring the search overhead
for every new connection. The default hash size is 1/256 of the conntrack
hash table size, this can be overriden using a module parameter.

This patch only introduces the hash table for expectation lookups and
keeps other users to reduce the noise, the following patches will get
rid of it completely.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:59 -07:00
Patrick McHardy
e9c1b084e1 [NETFILTER]: nf_conntrack: move expectaton related init code to nf_conntrack_expect.c
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:58 -07:00
Patrick McHardy
d4156e8cd9 [NETFILTER]: nf_conntrack: reduce masks to a subset of tuples
Since conntrack currently allows to use masks for every bit of both
helper and expectation tuples, we can't hash them and have to keep
them on two global lists that are searched for every new connection.

This patch removes the never used ability to use masks for the
destination part of the expectation tuple and completely removes
masks from helpers since the only reasonable choice is a full
match on l3num, protonum and src.u.all.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:55 -07:00
Patrick McHardy
6823645d60 [NETFILTER]: nf_conntrack_expect: function naming unification
Currently there is a wild mix of nf_conntrack_expect_, nf_ct_exp_,
expect_, exp_, ...

Consistently use nf_ct_ as prefix for exported functions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:53 -07:00
Patrick McHardy
53aba5979e [NETFILTER]: nf_nat: use hlists for bysource hash
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:43 -07:00
Patrick McHardy
ac565e5fc1 [NETFILTER]: nf_conntrack: export hash allocation/destruction functions
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:42 -07:00
Patrick McHardy
330f7db5e5 [NETFILTER]: nf_conntrack: remove 'ignore_conntrack' argument from nf_conntrack_find_get
All callers pass NULL, this also doesn't seem very useful for modules.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:41 -07:00
Patrick McHardy
f205c5e0c2 [NETFILTER]: nf_conntrack: use hlists for conntrack hash
Convert conntrack hash to hlists to reduce its size and cache
footprint. Since the default hashsize to max. entries ratio
sucks (1:16), this patch doesn't reduce the amount of memory
used for the hash by default, but instead uses a better ratio
of 1:8, which results in the same max. entries value.

One thing worth noting is early_drop. It really should use LRU,
so it now has to iterate over the entire chain to find the last
unconfirmed entry. Since chains shouldn't be very long and the
entire operation is very rare this shouldn't be a problem.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:40 -07:00
Yasuyuki Kozakai
b6b84d4a94 [NETFILTER]: nf_nat: merge nf_conn and nf_nat_info
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:37 -07:00
Yasuyuki Kozakai
d8a0509a69 [NETFILTER]: nf_nat: kill global 'destroy' operation
This kills the global 'destroy' operation which was used by NAT.
Instead it uses the extension infrastructure so that multiple
extensions can register own operations.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:36 -07:00
Yasuyuki Kozakai
dacd2a1a5c [NETFILTER]: nf_conntrack: remove old memory allocator of conntrack
Now memory space for help and NAT are allocated by extension
infrastructure.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:35 -07:00
Yasuyuki Kozakai
ff09b7493c [NETFILTER]: nf_nat: remove unused nf_nat_module_is_loaded
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:34 -07:00
Yasuyuki Kozakai
2d59e5ca8c [NETFILTER]: nf_nat: use extension infrastructure
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:20 -07:00
Yasuyuki Kozakai
e54cbc1f91 [NETFILTER]: nf_nat: add reference to conntrack from entry of bysource list
I will split 'struct nf_nat_info' out from conntrack. So I cannot use
'offsetof' to get the pointer to conntrack from it.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:19 -07:00
Yasuyuki Kozakai
ceceae1b15 [NETFILTER]: nf_conntrack: use extension infrastructure for helper
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:18 -07:00
Yasuyuki Kozakai
ecfab2c9fe [NETFILTER]: nf_conntrack: introduce extension infrastructure
Old space allocator of conntrack had problems about extensibility.
- It required slab cache per combination of extensions.
- It expected what extensions would be assigned, but it was impossible
  to expect that completely, then we allocated bigger memory object than
  really required.
- It needed to search helper twice due to lock issue.

Now basic informations of a connection are stored in 'struct nf_conn'.
And a storage for extension (helper, NAT) is allocated by kmalloc.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:17 -07:00
Yasuyuki Kozakai
4ba887790c [NETFILTER]: nf_nat: move NAT declarations from nf_conntrack_ipv4.h to nf_nat.h
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:16 -07:00
James Chapman
067b207b28 [UDP]: Cleanup UDP encapsulation code
This cleanup fell out after adding L2TP support where a new encap_rcv
funcptr was added to struct udp_sock. Have XFRM use the new encap_rcv
funcptr, which allows us to move the XFRM encap code from udp.c into
xfrm4_input.c.

Make xfrm4_rcv_encap() static since it is no longer called externally.

Signed-off-by: James Chapman <jchapman@katalix.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:16:53 -07:00
Samuel Ortiz
89da1ecf54 [IrDA]: Netlink layer.
First IrDA configuration netlink layer implementation.
Currently, we only support the set/get mode commands.

Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:16:43 -07:00
Patrick McHardy
4bdf39911e [NET_SCHED]: Remove unnecessary stats_lock pointers
Remove stats_lock pointers from qdisc-internal structures, in all cases
it points to dev->queue_lock. The only case where it is necessary is for
top-level qdiscs, where it might also point to dev->ingress_lock in case
of the ingress qdisc. Also remove it from actions completely, it always
points to the actions internal lock.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:16:38 -07:00
Jamal Hadi Salim
628529b6ee [XFRM] Introduce standalone SAD lookup
This allows other in-kernel functions to do SAD lookups.
The only known user at the moment is pktgen.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:16:35 -07:00
Masahide NAKAMURA
d3d6dd3ada [XFRM]: Add module alias for transformation type.
It is clean-up for XFRM type modules and adds aliases with its
protocol:
 ESP, AH, IPCOMP, IPIP and IPv6 for IPsec
 ROUTING and DSTOPTS for MIPv6

It is almost the same thing as XFRM mode alias, but it is added
new defines XFRM_PROTO_XXX for preprocessing since some protocols
are defined as enum.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Ingo Oeser <netdev@axxeo.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:43 -07:00
Masahide NAKAMURA
59fbb3a61e [IPV6] MIP6: Loadable module support for MIPv6.
This patch makes MIPv6 loadable module named "mip6".

Here is a modprobe.conf(5) example to load it automatically
when user application uses XFRM state for MIPv6:

alias xfrm-type-10-43 mip6
alias xfrm-type-10-60 mip6

Some MIPv6 feature is not included by this modular, however,
it should not be affected to other features like either IPsec
or IPv6 with and without the patch.
We may discuss XFRM, MH (RAW socket) and ancillary data/sockopt
separately for future work.

Loadable features:
* MH receiving check (to send ICMP error back)
* RO header parsing and building (i.e. RH2 and HAO in DSTOPTS)
* XFRM policy/state database handling for RO

These are NOT covered as loadable:
* Home Address flags and its rule on source address selection
* XFRM sub policy (depends on its own kernel option)
* XFRM functions to receive RO as IPv6 extension header
* MH sending/receiving through raw socket if user application
  opens it (since raw socket allows to do so)
* RH2 sending as ancillary data
* RH2 operation with setsockopt(2)

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:42 -07:00
Masahide NAKAMURA
136ebf08b4 [IPV6] MIP6: Kill unnecessary ifdefs.
Kill unnecessary CONFIG_IPV6_MIP6.

o It is redundant for RAW socket to keep MH out with the config then
  it can handle any protocol.
o Clean-up at AH.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:41 -07:00
Patrick McHardy
1092cb2197 [NETLINK]: attr: add nested compat attribute type
Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:38 -07:00
Patrick McHardy
38f7b870d4 [RTNETLINK]: Link creation API
Add rtnetlink API for creating, changing and deleting software devices.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:14:20 -07:00
Ville Tervo
8de0a15483 [Bluetooth] Keep rfcomm_dev on the list until it is freed
This patch changes the RFCOMM TTY release process so that the TTY is kept
on the list until it is really freed. A new device flag is used to keep
track of released TTYs.

Signed-off-by: Ville Tervo <ville.tervo@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-07-11 07:06:51 +02:00
Allan Stephens
05646c9110 [TIPC]: Optimize stream send routine to avoid fragmentation
This patch enhances TIPC's stream socket send routine so that
it avoids transmitting data in chunks that require fragmentation
and reassembly, thereby improving performance at both the
sending and receiving ends of the connection.

The "maximum packet size" hint that records MTU info allows
the socket to decide how big a chunk it should send; in the
event that the hint has become stale, fragmentation may still
occur, but the data will be passed correctly and the hint will
be updated in time for the following send.  Note: The 66060 byte
pseudo-MTU used for intra-node connections requires the send
routine to perform an additional check to ensure it does not
exceed TIPC"s limit of 66000 bytes of user data per chunk.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Jon Paul Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:06:12 -07:00
David S. Miller
e06e7c6158 [IPV4]: The scheduled removal of multipath cached routing support.
With help from Chris Wedgwood.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:05:57 -07:00
Marcel Holtmann
ef222013fc [Bluetooth] Add hci_recv_fragment() helper function
Most drivers must handle fragmented HCI data packets and events. This
patch adds a generic function for their reassembly to the Bluetooth
core layer and thus allows to shrink the complexity of the drivers.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-07-11 06:42:04 +02:00
Ben Dooks
825a2ff189 AX88796 network driver
Support for the Asix AX88796 network controller, an
NE2000 compatible 10/100 ethernet device with internal
PHY.

The driver supports PHY settings via either ioctl() or
the ethtool driver ops.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-10 12:41:08 -04:00
Vlad Yasevich
8a4794914f [SCTP] Flag a pmtu change request
Currently, if the socket is owned by the user, we drop the ICMP
message.  As a result SCTP forgets that path MTU changed and
never adjusting it's estimate.  This causes all subsequent
packets to be fragmented.  With this patch, we'll flag the association
that it needs to udpate it's estimate based on the already updated
routing information.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
2007-06-13 20:44:42 +00:00
Vlad Yasevich
c910b47e18 [SCTP] Update pmtu handling to be similar to tcp
Introduce new function sctp_transport_update_pmtu that updates
the transports and destination caches view of the path mtu.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
2007-06-13 20:44:42 +00:00
G. Liakhovetski
c0cfe7faa1 [IrDA]: Fix Rx/Tx path race.
From: G. Liakhovetski <gl@dsa-ac.de>

We need to switch to NRM _before_ sending the final packet otherwise
we might hit a race condition where we get the first packet from the
peer while we're still in LAP_XMIT_P.

Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-08 19:15:17 -07:00
Paul Moore
ba6ff9f2b5 [NetLabel]: consolidate the struct socket/sock handling to just struct sock
The current NetLabel code has some redundant APIs which allow both
"struct socket" and "struct sock" types to be used; this may have made
sense at some point but it is wasteful now.  Remove the functions that
operate on sockets and convert the callers.  Not only does this make
the code smaller and more consistent but it pushes the locking burden
up to the caller which can be more intelligent about the locks.  Also,
perform the same conversion (socket to sock) on the SELinux/NetLabel
glue code where it make sense.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-08 13:33:09 -07:00
Joy Latten
4aa2e62c45 xfrm: Add security check before flushing SAD/SPD
Currently we check for permission before deleting entries from SAD and
SPD, (see security_xfrm_policy_delete() security_xfrm_state_delete())
However we are not checking for authorization when flushing the SPD and
the SAD completely. It was perhaps missed in the original security hooks
patch.

This patch adds a security check when flushing entries from the SAD and
SPD.  It runs the entire database and checks each entry for a denial.
If the process attempting the flush is unable to remove all of the
entries a denial is logged the the flush function returns an error
without removing anything.

This is particularly useful when a process may need to create or delete
its own xfrm entries used for things like labeled networking but that
same process should not be able to delete other entries or flush the
entire database.

Signed-off-by: Joy Latten<latten@austin.ibm.com>
Signed-off-by: Eric Paris <eparis@parisplace.org>
Signed-off-by: James Morris <jmorris@namei.org>
2007-06-07 13:42:46 -07:00
David S. Miller
df2bc459a3 [UDP]: Revert 2-pass hashing changes.
This reverts changesets:

6aaf47fa48
b7b5f487ab
de34ed91c4
fc038410b4

There are still some correctness issues recently
discovered which do not have a known fix that doesn't
involve doing a full hash table scan on port bind.

So revert for now.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:40:50 -07:00
Patrick McHardy
ef7c79ed64 [NETLINK]: Mark netlink policies const
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:40:10 -07:00
Patrick McHardy
f0e48dbfc5 [TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack
A time_wait socket inherits sk_bound_dev_if from the original socket,
but it is not used when sending ACK packets using ip_send_reply.

Fix by passing the oif to ip_send_reply in struct ip_reply_arg and
use it for output routing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:38:51 -07:00
David S. Miller
1c92b4e50e [AF_UNIX]: Make socket locking much less confusing.
The unix_state_*() locking macros imply that there is some
rwlock kind of thing going on, but the implementation is
actually a spinlock which makes the code more confusing than
it needs to be.

So use plain unix_state_lock and unix_state_unlock.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-03 18:08:40 -07:00
Pavel Emelianov
e4fd5da39f [TCP]: Consolidate checking for tcp orphan count being too big.
tcp_out_of_resources() and tcp_close() perform the
same checking of number of orphan sockets. Move this
code into common place.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-31 01:23:34 -07:00
Arnaldo Carvalho de Melo
4e07a91c37 [SOCK]: Shrink struct sock by 8 bytes on 64-bit.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-31 01:23:32 -07:00
David S. Miller
01e67d08fa [XFRM]: Allow XFRM_ACQ_EXPIRES to be tunable via sysctl.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-31 01:23:23 -07:00
David S. Miller
14e50e57ae [XFRM]: Allow packet drops during larval state resolution.
The current IPSEC rule resolution behavior we have does not work for a
lot of people, even though technically it's an improvement from the
-EAGAIN buisness we had before.

Right now we'll block until the key manager resolves the route.  That
works for simple cases, but many folks would rather packets get
silently dropped until the key manager resolves the IPSEC rules.

We can't tell these folks to "set the socket non-blocking" because
they don't have control over the non-block setting of things like the
sockets used to resolve DNS deep inside of the resolver libraries in
libc.

With that in mind I coded up the patch below with some help from
Herbert Xu which provides packet-drop behavior during larval state
resolution, controllable via sysctl and off by default.

This lays the framework to either:

1) Make this default at some point or...

2) Move this logic into xfrm{4,6}_policy.c and implement the
   ARP-like resolution queue we've all been dreaming of.
   The idea would be to queue packets to the policy, then
   once the larval state is resolved by the key manager we
   re-resolve the route and push the packets out.  The
   packets would timeout if the rule didn't get resolved
   in a certain amount of time.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24 18:17:54 -07:00
Marcel Holtmann
5dee9e7c4c [Bluetooth] Fix L2CAP configuration parameter handling
The L2CAP configuration parameter handling was missing the support
for rejecting unknown options. The capability to reject unknown
options is mandatory since the Bluetooth 1.2 specification. This
patch implements its and also simplifies the parameter parsing.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-05-24 14:27:19 +02:00
Yasuyuki Kozakai
fda6143683 [NETFILTER]: nf_conntrack: Removes unused destroy operation of l3proto
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10 23:47:46 -07:00
Yasuyuki Kozakai
c874d5f726 [NETFILTER]: nf_conntrack: Removes duplicated declarations
These are also in include/net/netfilter/nf_conntrack_helper.h

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10 23:47:45 -07:00
Yasuyuki Kozakai
ba4c7cbadd [NETFILTER]: nf_nat: remove unused argument of function allocating binding
nf_nat_rule_find, alloc_null_binding and alloc_null_binding_confirmed
do not use the argument 'info', which is actually ct->nat.info.
If they are necessary to access it again, we can use the argument 'ct'
instead.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10 23:47:44 -07:00
David S. Miller
fc038410b4 [UDP]: Fix AF-specific references in AF-agnostic code.
__udp_lib_port_inuse() cannot make direct references to
inet_sk(sk)->rcv_saddr as that is ipv4 specific state and
this code is used by ipv6 too.

Use an operations vector to solve this, and this also paves
the way for ipv6 support for non-wild saddr hashing in UDP.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10 23:47:22 -07:00
Jeff Garzik
2c4f365ad2 Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream 2007-05-09 18:54:49 -04:00
John Anthony Kazos Jr
121e70b69a include files: convert "include" subdirectory to UTF-8
Convert the "include" subdirectory to UTF-8.

Signed-off-by: John Anthony Kazos Jr. <jakj@j-a-k-j.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 08:58:21 +02:00
Christoph Hellwig
6272e26679 cleanup compat ioctl handling
Merge all compat ioctl handling into compat_ioctl.c instead of splitting it
over compat.c and compat_ioctl.c.  This also allows to get rid of ioctl32.h

Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks-good-to: Andi Kleen <ak@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:09 -07:00
Larry Finger
f5cdf30618 [PATCH] ieee80211: add ieee80211_channel_to_freq
The routines that interrogate the ieee80211_geo struct are missing a
channel to frequency entry. This patch adds it.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-05-08 11:51:59 -04:00
Jiri Benc
f0706e828e [MAC80211]: Add mac80211 wireless stack.
Add mac80211, the IEEE 802.11 software MAC layer.

Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-05-05 11:45:53 -07:00
Vlad Yasevich
07d9396771 [SCTP]: Set assoc_id correctly during INIT collision.
During the INIT/COOKIE-ACK collision cases, it's possible to get
into a situation where the association id is not yet set at the time
of the user event generation.  As a result, user events have an
association id set to 0 which will confuse applications.

This happens if we hit case B of duplicate cookie processing.
In the particular example found and provided by Oscar Isaula
<Oscar.Isaula@motorola.com>, flow looks like this:
A				B
---- INIT------->  (lost)
	    <---------INIT------
---- INIT-ACK--->
	    <------ Cookie ECHO

When the Cookie Echo is received, we end up trying to update the
association that was created on A as a result of the (lost) INIT,
but that association doesn't have the ID set yet.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 13:55:27 -07:00
Sridhar Samudrala
827bf12236 [SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv()
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 13:36:30 -07:00
Jamal Hadi Salim
5a6d34162f [XFRM] SPD info TLV aggregation
Aggregate the SPD info TLVs.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 12:55:39 -07:00
Jamal Hadi Salim
af11e31609 [XFRM] SAD info TLV aggregationx
Aggregate the SAD info TLVs.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 12:55:13 -07:00
Jennifer Hunt
561e036006 [AF_IUCV]: Implementation of a skb backlog queue
With the inital implementation we missed to implement a skb backlog
queue . The result is that socket receive processing tossed packets.
Since AF_IUCV connections are working synchronously it leads to
connection hangs. Problems with read, close and select also occured.

Using a skb backlog queue is fixing all of these problems .

Signed-off-by: Jennifer Hunt <jenhunt@us.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 12:22:07 -07:00
Eric Dumazet
db3459d1a7 [IPV6]: Some cleanups in include/net/ipv6.h
1) struct ip6_flowlabel : moves 'users' field to avoid two 32bits
   holes for 64bit arches. Shrinks by 8 bytes sizeof(struct
   ip6_flowlabel)

2) ipv6_addr_cmp() and ipv6_addr_copy() dont need (void *) casts :
   Compiler might take into account natural alignement of in6_addr
   structs to emit better code for memcpy()/memcmp() Casts to (void *)
   force byte accesses.

3) ipv6_addr_prefix() optimization :

Better to clear whole struct, as compiler can emit better code for
memset(addr, 0, 16) (2 stores on x86_64), and avoid some conditional
branches.

# size vmlinux.after vmlinux.before
   text    data     bss     dec     hex filename
5262262  647612  557432 6467306  62aeea vmlinux.after
5262550  647612  557432 6467594  62b00a vmlinux.before

thats 288 bytes saved.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 17:39:04 -07:00
Ilpo Järvinen
0ec96822d5 [TCP]: Use S+L catcher only with SACK for now
TCP has a transitional state when SACK is not in use during
which this invariant is temporarily broken. Without SACK,
tcp_clean_rtx_queue does not decrement sacked_out. Therefore
calls to tcp_sync_left_out before sacked_out is again
corrected by tcp_fastretrans_alert can trigger this trap as
sacked_out still has couple of segments that are already out
of window.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:30:34 -07:00
Eric Dumazet
709525fad8 [IPV6]: Get rid of __HAVE_ARCH_ADDR_SET.
__HAVE_ARCH_ADDR_SET seems unused these days, just get rid of it.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:08:43 -07:00
Linus Torvalds
152a6a9da1 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  [IPV4] SNMP: Support OutMcastPkts and OutBcastPkts
  [IPV4] SNMP: Support InMcastPkts and InBcastPkts
  [IPV4] SNMP: Support InTruncatedPkts
  [IPV4] SNMP: Support InNoRoutes
  [SNMP]: Add definitions for {In,Out}BcastPkts
  [TCP] FRTO: RFC4138 allows Nagle override when new data must be sent
  [TCP] FRTO: Delay skb available check until it's mandatory
  [XFRM]: Restrict upper layer information by bundle.
  [TCP]: Catch skb with S+L bugs earlier
  [PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo
  [L2TP]: Add the ability to autoload a pppox protocol module.
  [SKB]: Introduce skb_queue_walk_safe()
  [AF_IUCV/IUCV]: smp_call_function deadlock
  [IPV6]: Fix slab corruption running ip6sic
  [TCP]: Update references in two old comments
  [XFRM]: Export SPD info
  [IPV6]: Track device renames in snmp6.
  [SCTP]: Fix sctp_getsockopt_local_addrs_old() to use local storage.
  [NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats.
  [NETPOLL]: Remove CONFIG_NETPOLL_RX
  ...
2007-04-30 08:14:42 -07:00
Ilpo Järvinen
d551e4541d [TCP] FRTO: RFC4138 allows Nagle override when new data must be sent
This is a corner case where less than MSS sized new data thingie
is awaiting in the send queue. For F-RTO to work correctly, a
new data segment must be sent at certain point or F-RTO cannot
be used at all. RFC4138 allows overriding of Nagle at that
point.

Implementation uses frto_counter states 2 and 3 to distinguish
when Nagle override is needed.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:58:16 -07:00
Masahide NAKAMURA
157bfc2502 [XFRM]: Restrict upper layer information by bundle.
On MIPv6 usage, XFRM sub policy is enabled.
When main (IPsec) and sub (MIPv6) policy selectors have the same
address set but different upper layer information (i.e. protocol
number and its ports or type/code), multiple bundle should be created.
However, currently we have issue to use the same bundle created for
the first time with all flows covered by the case.

It is useful for the bundle to have the upper layer information
to be restructured correctly if it does not match with the flow.

1. Bundle was created by two policies
Selector from another policy is added to xfrm_dst.
If the flow does not match the selector, it goes to slow path to
restructure new bundle by single policy.

2. Bundle was created by one policy
Flow cache is added to xfrm_dst as originated one. If the flow does
not match the cache, it goes to slow path to try searching another
policy.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:58:09 -07:00
Ilpo Järvinen
34588b4c04 [TCP]: Catch skb with S+L bugs earlier
SACKED_ACKED and LOST are mutually exclusive with SACK, thus
having their sum larger than packets_out is bug with SACK.
Eventually these bugs trigger traps in the tcp_clean_rtx_queue
with SACK but it's much more informative to do this here.

Non-SACK TCP, however, could get more than packets_out duplicate
ACKs which each increment sacked_out, so it makes sense to do
this kind of limitting for non-SACK TCP but not for SACK enabled
one. Perhaps the author had the opposite in mind but did the
logic accidently wrong way around? Anyway, the sacked_out
incrementer code for non-SACK already deals this issue before
calling sync_left_out so this trapping can be done
unconditionally.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:57:33 -07:00
Martin Schwidefsky
04b090d50c [AF_IUCV/IUCV]: smp_call_function deadlock
Calling smp_call_function can lead to a deadlock if it is called
from tasklet context. 
Fixing this deadlock requires to move the smp_call_function from the
tasklet context to a work queue. To do that queue the path pending
interrupts to a separate list and move the path cleanup out of
iucv_path_sever to iucv_path_connect and iucv_path_pending.
This creates a new requirement for iucv_path_connect: it may not be
called from tasklet context anymore. 
Also fixed compile problem for CONFIG_HOTPLUG_CPU=n and
another one when walking the cpu_online mask. When doing this, 
we must disable cpu hotplug.

Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-28 23:03:59 -07:00
Jamal Hadi Salim
ecfd6b1837 [XFRM]: Export SPD info
With this patch you can use iproute2 in user space to efficiently see
how many policies exist in different directions.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-28 21:20:32 -07:00
Pavel Roskin
6693228da9 [PATCH] Remove comment about IEEE80211_RADIOTAP_FCS
IEEE80211_RADIOTAP_FCS is obsolete and should not be used.  It's no
longer defined.  Remove it from the comment too.

Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-04-28 11:01:03 -04:00
Jouni Malinen
85d32e7b0e [PATCH] Update my email address from jkmaline@cc.hut.fi to j@w1.fi
After 13 years of use, it looks like my email address is finally going
to disappear. While this is likely to drop the amount of incoming spam
greatly ;-), it may also affect more appropriate messages, so let's
update my email address in various places. In addition, Host AP mailing
list is subscribers-only and linux-wireless can also be used for
discussing issues related to this driver which is now shown in
MAINTAINERS.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-04-28 11:01:01 -04:00
Pavel Roskin
a0d69f229f [PATCH] sparse-annotate radiotap header
Document that all fields must be little endian.  Use annotated types
even in the comments.  Consistently use shorter type names (u8, s8).
Realign the comments.

Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-04-28 11:01:00 -04:00
Marcelo Tosatti
876c9d3aeb [PATCH] Marvell Libertas 8388 802.11b/g USB driver
Add the Marvell Libertas 8388 802.11 USB driver.

Signed-off-by: Marcelo Tosatti <marcelo@kvack.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-04-28 11:00:54 -04:00
David Howells
b8b8fd2dc2 [NET]: Fix networking compilation errors
Fix miscellaneous networking compilation errors.

 (*) Export ktime_add_ns() for modules.

 (*) wext_proc_init() should have an ANSI declaration.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-27 15:31:24 -07:00
Johannes Berg
295f4a1fa3 [WEXT]: Clean up how wext is called.
This patch cleans up the call paths from the core code into wext.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 20:43:56 -07:00
David Howells
651350d10f [AF_RXRPC]: Add an interface to the AF_RXRPC module for the AFS filesystem to use
Add an interface to the AF_RXRPC module so that the AFS filesystem module can
more easily make use of the services available.  AFS still opens a socket but
then uses the action functions in lieu of sendmsg() and registers an intercept
functions to grab messages before they're queued on the socket Rx queue.

This permits AFS (or whatever) to:

 (1) Avoid the overhead of using the recvmsg() call.

 (2) Use different keys directly on individual client calls on one socket
     rather than having to open a whole slew of sockets, one for each key it
     might want to use.

 (3) Avoid calling request_key() at the point of issue of a call or opening of
     a socket.  This is done instead by AFS at the point of open(), unlink() or
     other VFS operation and the key handed through.

 (4) Request the use of something other than GFP_KERNEL to allocate memory.

Furthermore:

 (*) The socket buffer markings used by RxRPC are made available for AFS so
     that it can interpret the cooked RxRPC messages itself.

 (*) rxgen (un)marshalling abort codes are made available.


The following documentation for the kernel interface is added to
Documentation/networking/rxrpc.txt:

=========================
AF_RXRPC KERNEL INTERFACE
=========================

The AF_RXRPC module also provides an interface for use by in-kernel utilities
such as the AFS filesystem.  This permits such a utility to:

 (1) Use different keys directly on individual client calls on one socket
     rather than having to open a whole slew of sockets, one for each key it
     might want to use.

 (2) Avoid having RxRPC call request_key() at the point of issue of a call or
     opening of a socket.  Instead the utility is responsible for requesting a
     key at the appropriate point.  AFS, for instance, would do this during VFS
     operations such as open() or unlink().  The key is then handed through
     when the call is initiated.

 (3) Request the use of something other than GFP_KERNEL to allocate memory.

 (4) Avoid the overhead of using the recvmsg() call.  RxRPC messages can be
     intercepted before they get put into the socket Rx queue and the socket
     buffers manipulated directly.

To use the RxRPC facility, a kernel utility must still open an AF_RXRPC socket,
bind an addess as appropriate and listen if it's to be a server socket, but
then it passes this to the kernel interface functions.

The kernel interface functions are as follows:

 (*) Begin a new client call.

	struct rxrpc_call *
	rxrpc_kernel_begin_call(struct socket *sock,
				struct sockaddr_rxrpc *srx,
				struct key *key,
				unsigned long user_call_ID,
				gfp_t gfp);

     This allocates the infrastructure to make a new RxRPC call and assigns
     call and connection numbers.  The call will be made on the UDP port that
     the socket is bound to.  The call will go to the destination address of a
     connected client socket unless an alternative is supplied (srx is
     non-NULL).

     If a key is supplied then this will be used to secure the call instead of
     the key bound to the socket with the RXRPC_SECURITY_KEY sockopt.  Calls
     secured in this way will still share connections if at all possible.

     The user_call_ID is equivalent to that supplied to sendmsg() in the
     control data buffer.  It is entirely feasible to use this to point to a
     kernel data structure.

     If this function is successful, an opaque reference to the RxRPC call is
     returned.  The caller now holds a reference on this and it must be
     properly ended.

 (*) End a client call.

	void rxrpc_kernel_end_call(struct rxrpc_call *call);

     This is used to end a previously begun call.  The user_call_ID is expunged
     from AF_RXRPC's knowledge and will not be seen again in association with
     the specified call.

 (*) Send data through a call.

	int rxrpc_kernel_send_data(struct rxrpc_call *call, struct msghdr *msg,
				   size_t len);

     This is used to supply either the request part of a client call or the
     reply part of a server call.  msg.msg_iovlen and msg.msg_iov specify the
     data buffers to be used.  msg_iov may not be NULL and must point
     exclusively to in-kernel virtual addresses.  msg.msg_flags may be given
     MSG_MORE if there will be subsequent data sends for this call.

     The msg must not specify a destination address, control data or any flags
     other than MSG_MORE.  len is the total amount of data to transmit.

 (*) Abort a call.

	void rxrpc_kernel_abort_call(struct rxrpc_call *call, u32 abort_code);

     This is used to abort a call if it's still in an abortable state.  The
     abort code specified will be placed in the ABORT message sent.

 (*) Intercept received RxRPC messages.

	typedef void (*rxrpc_interceptor_t)(struct sock *sk,
					    unsigned long user_call_ID,
					    struct sk_buff *skb);

	void
	rxrpc_kernel_intercept_rx_messages(struct socket *sock,
					   rxrpc_interceptor_t interceptor);

     This installs an interceptor function on the specified AF_RXRPC socket.
     All messages that would otherwise wind up in the socket's Rx queue are
     then diverted to this function.  Note that care must be taken to process
     the messages in the right order to maintain DATA message sequentiality.

     The interceptor function itself is provided with the address of the socket
     and handling the incoming message, the ID assigned by the kernel utility
     to the call and the socket buffer containing the message.

     The skb->mark field indicates the type of message:

	MARK				MEANING
	===============================	=======================================
	RXRPC_SKB_MARK_DATA		Data message
	RXRPC_SKB_MARK_FINAL_ACK	Final ACK received for an incoming call
	RXRPC_SKB_MARK_BUSY		Client call rejected as server busy
	RXRPC_SKB_MARK_REMOTE_ABORT	Call aborted by peer
	RXRPC_SKB_MARK_NET_ERROR	Network error detected
	RXRPC_SKB_MARK_LOCAL_ERROR	Local error encountered
	RXRPC_SKB_MARK_NEW_CALL		New incoming call awaiting acceptance

     The remote abort message can be probed with rxrpc_kernel_get_abort_code().
     The two error messages can be probed with rxrpc_kernel_get_error_number().
     A new call can be accepted with rxrpc_kernel_accept_call().

     Data messages can have their contents extracted with the usual bunch of
     socket buffer manipulation functions.  A data message can be determined to
     be the last one in a sequence with rxrpc_kernel_is_data_last().  When a
     data message has been used up, rxrpc_kernel_data_delivered() should be
     called on it..

     Non-data messages should be handled to rxrpc_kernel_free_skb() to dispose
     of.  It is possible to get extra refs on all types of message for later
     freeing, but this may pin the state of a call until the message is finally
     freed.

 (*) Accept an incoming call.

	struct rxrpc_call *
	rxrpc_kernel_accept_call(struct socket *sock,
				 unsigned long user_call_ID);

     This is used to accept an incoming call and to assign it a call ID.  This
     function is similar to rxrpc_kernel_begin_call() and calls accepted must
     be ended in the same way.

     If this function is successful, an opaque reference to the RxRPC call is
     returned.  The caller now holds a reference on this and it must be
     properly ended.

 (*) Reject an incoming call.

	int rxrpc_kernel_reject_call(struct socket *sock);

     This is used to reject the first incoming call on the socket's queue with
     a BUSY message.  -ENODATA is returned if there were no incoming calls.
     Other errors may be returned if the call had been aborted (-ECONNABORTED)
     or had timed out (-ETIME).

 (*) Record the delivery of a data message and free it.

	void rxrpc_kernel_data_delivered(struct sk_buff *skb);

     This is used to record a data message as having been delivered and to
     update the ACK state for the call.  The socket buffer will be freed.

 (*) Free a message.

	void rxrpc_kernel_free_skb(struct sk_buff *skb);

     This is used to free a non-DATA socket buffer intercepted from an AF_RXRPC
     socket.

 (*) Determine if a data message is the last one on a call.

	bool rxrpc_kernel_is_data_last(struct sk_buff *skb);

     This is used to determine if a socket buffer holds the last data message
     to be received for a call (true will be returned if it does, false
     if not).

     The data message will be part of the reply on a client call and the
     request on an incoming call.  In the latter case there will be more
     messages, but in the former case there will not.

 (*) Get the abort code from an abort message.

	u32 rxrpc_kernel_get_abort_code(struct sk_buff *skb);

     This is used to extract the abort code from a remote abort message.

 (*) Get the error number from a local or network error message.

	int rxrpc_kernel_get_error_number(struct sk_buff *skb);

     This is used to extract the error number from a message indicating either
     a local error occurred or a network error occurred.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 15:50:17 -07:00
David Howells
17926a7932 [AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both
Provide AF_RXRPC sockets that can be used to talk to AFS servers, or serve
answers to AFS clients.  KerberosIV security is fully supported.  The patches
and some example test programs can be found in:

	http://people.redhat.com/~dhowells/rxrpc/

This will eventually replace the old implementation of kernel-only RxRPC
currently resident in net/rxrpc/.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 15:48:28 -07:00
Adrian Bunk
42bad1da50 [NETLINK]: Possible cleanups.
- make the following needlessly global variables static:
  - core/rtnetlink.c: struct rtnl_msg_handlers[]
  - netfilter/nf_conntrack_proto.c: struct nf_ct_protos[]
- make the following needlessly global functions static:
  - core/rtnetlink.c: rtnl_dump_all()
  - netlink/af_netlink.c: netlink_queue_skip()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 00:57:41 -07:00
Jamal Hadi Salim
28d8909bc7 [XFRM]: Export SAD info.
On a system with a lot of SAs, counting SAD entries chews useful
CPU time since you need to dump the whole SAD to user space;
i.e something like ip xfrm state ls | grep -i src | wc -l
I have seen taking literally minutes on a 40K SAs when the system
is swapping.
With this patch, some of the SAD info (that was already being tracked)
is exposed to user space. i.e you do:
ip xfrm state count
And you get the count; you can also pass -s to the command line and
get the hash info.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 00:10:29 -07:00
Herbert Xu
7f7d9a6b96 [IPV6]: Consolidate common SNMP code
This patch moves the non-proc SNMP code into addrconf.c and reuses
IPv4 SNMP code where applicable.

As a result we can skip proc.o if /proc is disabled.

Note that I've made a number of functions static since they're only
used by addrconf.c for now.  If they ever get used elsewhere we can
always remove the static.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:52 -07:00
Herbert Xu
5e0f04351d [IPV4]: Consolidate common SNMP code
This patch moves the SNMP code shared between IPv4/IPv6 from proc.c
into net/ipv4/af_inet.c.  This makes sense because these functions
aren't specific to /proc.

As a result we can again skip proc.o if /proc is disabled.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:51 -07:00
Johannes Berg
43fb45cb79 [WIRELESS] cfg80211: Update comment for locking.
This patch adds a comment that was part of my rtnl locking patch for
cfg80211 but which I forgot for the merge.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:48 -07:00
Stephen Hemminger
164891aadf [TCP]: Congestion control API update.
Do some simple changes to make congestion control API faster/cleaner.
* use ktime_t rather than timeval
* merge rtt sampling into existing ack callback
  this means one indirect call versus two per ack.
* use flags bits to store options/settings

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:45 -07:00
Johannes Berg
9e101eab15 [WIRELESS]: Remove wext over netlink.
As scheduled, this patch removes the pointless wext over netlink code.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:42 -07:00
Johannes Berg
704232c271 [WIRELESS] cfg80211: New wireless config infrastructure.
This patch creates the core cfg80211 code along with some sysfs bits.
This is a stripped down version to allow mac80211 to function, but
doesn't include any configuration yet except for creating and removing
virtual interfaces.

This patch includes the nl80211 header file but it only contains the
interface types which the cfg80211 interface for creating virtual
interfaces relies on.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:41 -07:00
YOSHIFUJI Hideaki
97fc8d0bc5 [IPV6] SNMP: Use put_unaligned() instead of memcpy().
Hint from David Miller <davem@davemloft.net>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:37 -07:00
YOSHIFUJI Hideaki
2334e97355 [IPV6] SNMP: Avoid unaligned accesses.
Because stats pointer may not be aligned for u64, use memcpy
to fill u64 values.
Issue reported by David Miller <davem@davemloft.net>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-04-25 22:29:35 -07:00
Ilpo Järvinen
9e412ba763 [TCP]: Sed magic converts func(sk, tp, ...) -> func(sk, ...)
This is (mostly) automated change using magic:

sed -e '/struct sock \*sk/ N' -e '/struct sock \*sk/ N'
    -e '/struct sock \*sk/ N' -e '/struct sock \*sk/ N'
    -e 's|struct sock \*sk,[\n\t ]*struct tcp_sock \*tp\([^{]*\n{\n\)|
	  struct sock \*sk\1\tstruct tcp_sock *tp = tcp_sk(sk);\n|g'
    -e 's|struct sock \*sk, struct tcp_sock \*tp|
	  struct sock \*sk|g' -e 's|sk, tp\([^-]\)|sk\1|g'

Fixed four unused variable (tp) warnings that were introduced.

In addition, manually added newlines after local variables and
tweaked function arguments positioning.

$ gcc --version
gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
...
$ codiff -fV built-in.o.old built-in.o.new
net/ipv4/route.c:
  rt_cache_flush |  +14
 1 function changed, 14 bytes added

net/ipv4/tcp.c:
  tcp_setsockopt |   -5
  tcp_sendpage   |  -25
  tcp_sendmsg    |  -16
 3 functions changed, 46 bytes removed

net/ipv4/tcp_input.c:
  tcp_try_undo_recovery |   +3
  tcp_try_undo_dsack    |   +2
  tcp_mark_head_lost    |  -12
  tcp_ack               |  -15
  tcp_event_data_recv   |  -32
  tcp_rcv_state_process |  -10
  tcp_rcv_established   |   +1
 7 functions changed, 6 bytes added, 69 bytes removed, diff: -63

net/ipv4/tcp_output.c:
  update_send_head          |   -9
  tcp_transmit_skb          |  +19
  tcp_cwnd_validate         |   +1
  tcp_write_wakeup          |  -17
  __tcp_push_pending_frames |  -25
  tcp_push_one              |   -8
  tcp_send_fin              |   -4
 7 functions changed, 20 bytes added, 63 bytes removed, diff: -43

built-in.o.new:
 18 functions changed, 40 bytes added, 178 bytes removed, diff: -138

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:34 -07:00
Andi Kleen
9958089a43 [NET]: Move sk_setup_caps() out of line.
It is far too large to be an inline and not in any hot paths.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:26 -07:00
Andi Kleen
4ac02bab77 [TCP]: Uninline tcp_done().
The function is quite big and has several call sites and nothing
to collapse by compiler optimization on inlining.

Besides it's nicer to read in a in .c file.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:25 -07:00
YOSHIFUJI Hideaki
334901700f [IPV4] SNMP: Move some statistic bits to net/ipv4/proc.c.
This also fixes memory leak in error path.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:12 -07:00
YOSHIFUJI Hideaki
bf99f1bde3 [IPV6] SNMP: Netlink interface.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:10 -07:00
Patrick McHardy
0463d4ae25 [NET_SCHED]: Eliminate qdisc_tree_lock
Since we're now holding the rtnl during the entire dump operation, we
can remove qdisc_tree_lock, whose only purpose is to protect dump
callbacks from concurrent changes to the qdisc tree.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:07 -07:00
Herbert Xu
604763722c [NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY
When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
maps to the semantics of CHECKSUM_UNNECESSARY.  Therefore we should
treat it as such in the stack.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:43 -07:00
Patrick McHardy
c5c2523893 [XFRM]: Optimize MTU calculation
Replace the probing based MTU estimation, which usually takes 2-3 iterations
to find a fitting value and may underestimate the MTU, by an exact calculation.

Also fix underestimation of the XFRM trailer_len, which causes unnecessary
reallocations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:38 -07:00
David Howells
716ea3a7aa [NET]: Move generic skbuff stuff from XFRM code to generic code
Move generic skbuff stuff from XFRM code to generic code so that
AF_RXRPC can use it too.

The kdoc comments I've attached to the functions needs to be checked
by whoever wrote them as I had to make some guesses about the workings
of these functions.

Signed-off-By: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:33 -07:00
Arnaldo Carvalho de Melo
2a123b86e2 [BLUETOOTH]: Introduce skb->data accessor methods for hci_{acl,event,sco}_hdr
For consistency with other skb data accessors, reducing the number of direct
accesses to skb->data.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-04-25 22:28:21 -07:00
Thomas Graf
73417f617a [NET] fib_rules: Flush route cache after rule modifications
The results of FIB rules lookups are cached in the routing cache
except for IPv6 as no such cache exists. So far, it was the
responsibility of the user to flush the cache after modifying any
rules. This lead to many false bug reports due to misunderstanding
of this concept.

This patch automatically flushes the route cache after inserting
or deleting a rule.

Thanks to Muli Ben-Yehuda <muli@il.ibm.com> for catching a bug
in the previous patch.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:18 -07:00
Thomas Graf
0947c9fe56 [NET] fib_rules: goto rule action
This patch adds a new rule action FR_ACT_GOTO which allows
to skip a set of rules by jumping to another rule. The rule
to jump to is specified via the FRA_GOTO attribute which
carries a rule preference.

Referring to a rule which doesn't exists is explicitely allowed.
Such goto rules are marked with the flag FIB_RULE_UNRESOLVED
and will act like a rule with a non-matching selector. The rule
will become functional as soon as its target is present.

The goto action enables performance optimizations by reducing
the average number of rules that have to be passed per lookup.

Example:
0:      from all lookup local
40:     not from all to 192.168.23.128 goto 32766
41:     from all fwmark 0xa blackhole
42:     from all fwmark 0xff blackhole
32766:  from all lookup main

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:12 -07:00
David S. Miller
b3da2cf37c [INET]: Use jhash + random secret for ehash.
The days are gone when this was not an issue, there are folks out
there with huge bot networks that can be used to attack the
established hash tables on remote systems.

So just like the routing cache and connection tracking
hash, use Jenkins hash with random secret input.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:06 -07:00
Johannes Berg
d30045a0bc [NETLINK]: introduce NLA_BINARY type
This patch introduces a new NLA_BINARY attribute policy type with the
verification of simply checking the maximum length of the payload.

It also fixes a small typo in the example.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:05 -07:00
Vlad Yasevich
703315712c [SCTP]: Implement SCTP_MAX_BURST socket option.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:04 -07:00
Vlad Yasevich
a5a35e7675 [SCTP]: Implement sac_info field in SCTP_ASSOC_CHANGE notification.
As stated in the sctp socket api draft:

   sac_info: variable

   If the sac_state is SCTP_COMM_LOST and an ABORT chunk was received
   for this association, sac_info[] contains the complete ABORT chunk as
   defined in the SCTP specification RFC2960 [RFC2960] section 3.3.7.

We now save received ABORT chunks into the sac_info field and pass that
to the user.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:03 -07:00
Vlad Yasevich
bdf3092af6 [SCTP]: Honor flags when setting peer address parameters
Parameters only take effect when a corresponding flag bit is set
and a value is specified. This means we need to check the flags
in addition to checking for non-zero value.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:02 -07:00
Vlad Yasevich
1ae4114dce [SCTP]: Implement SCTP_ADDR_CONFIRMED state for ADDR_CHNAGE event
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:01 -07:00
Vlad Yasevich
d49d91d79a [SCTP]: Implement SCTP_PARTIAL_DELIVERY_POINT option.
This option induces partial delivery to run as soon
as the specified amount of data has been accumulated on
the association.  However, we give preference to fully
reassembled messages over PD messages.  In any case,
window and buffer is freed up.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@.hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:00 -07:00
Vlad Yasevich
b6e1331f3c [SCTP]: Implement SCTP_FRAGMENT_INTERLEAVE socket option
This option was introduced in draft-ietf-tsvwg-sctpsocket-13.  It
prevents head-of-line blocking in the case of one-to-many endpoint.
Applications enabling this option really must enable SCTP_SNDRCV event
so that they would know where the data belongs.  Based on an
earlier patch by Ivan Skytte Jørgensen.

Additionally, this functionality now permits multiple associations
on the same endpoint to enter Partial Delivery.  Applications should
be extra careful, when using this functionality, to track EOR indicators.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:59 -07:00
Patrick McHardy
a48b5a6144 [NET_SCHED]: Unline tcf_destroy
Uninline tcf_destroy and add a helper function to destroy an entire filter
chain.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:56 -07:00
Patrick McHardy
3bebcda280 [NET_SCHED]: turn PSCHED_GET_TIME into inline function
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:55 -07:00
Patrick McHardy
03cc45c0a5 [NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function
Also rename to psched_tdiff_bounded.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:54 -07:00
Patrick McHardy
8edc0c31d6 [NET_SCHED]: kill PSCHED_TDIFF
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:53 -07:00
Patrick McHardy
a084980dcb [NET_SCHED]: kill PSCHED_SET_PASTPERFECT/PSCHED_IS_PASTPERFECT
Use direct assignment and comparison instead.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:51 -07:00
Patrick McHardy
104e087898 [NET_SCHED]: kill PSCHED_TLESS
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:50 -07:00
Patrick McHardy
7c59e25f31 [NET_SCHED]: kill PSCHED_TADD/PSCHED_TADD2
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:49 -07:00
Patrick McHardy
26e252df1e [NET_SCHED]: kill PSCHED_AUDIT_TDIFF
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:48 -07:00
Thomas Graf
1d00a4eb42 [NETLINK]: Remove error pointer from netlink message handler
The error pointer argument in netlink message handlers is used
to signal the special case where processing has to be interrupted
because a dump was started but no error happened. Instead it is
simpler and more clear to return -EINTR and have netlink_run_queue()
deal with getting the queue right.

nfnetlink passed on this error pointer to its subsystem handlers
but only uses it to signal the start of a netlink dump. Therefore
it can be removed there as well.

This patch also cleans up the error handling in the affected
message handlers to be consistent since it had to be touched anyway.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:30 -07:00
Thomas Graf
c454673da7 [NET] rules: Unified rules dumping
Implements a unified, protocol independant rules dumping function
which is capable of both, dumping a specific protocol family or
all of them. This speeds up dumping as less lookups are required.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:17 -07:00
Thomas Graf
c127ea2c45 [IPv6]: Use rtnl registration interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:13 -07:00
Thomas Graf
fa34ddd739 [DECNet]: Use rtnl registration interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:12 -07:00
Thomas Graf
be577ddc2b [PKT_SCHED] qdisc: Use rtnl registration interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:09 -07:00
Thomas Graf
63f3444fb9 [IPv4]: Use rtnl registration interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:08 -07:00
Thomas Graf
9d9e6a5819 [NET] rules: Use rtnl registration interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:07 -07:00
Thomas Graf
c8822a4e00 [NEIGH]: Use rtnl registration interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:06 -07:00
Thomas Graf
e284986385 [RTNL]: Message handler registration interface
This patch adds a new interface to register rtnetlink message
handlers replacing the exported rtnl_links[] array which
required many message handlers to be exported unnecessarly.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:04 -07:00
Arnaldo Carvalho de Melo
dc5fc579b9 [NETLINK]: Use nlmsg_trim() where appropriate
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:37 -07:00
Arnaldo Carvalho de Melo
27a884dc3c [SK_BUFF]: Convert skb->tail to sk_buff_data_t
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)

Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:28 -07:00
Patrick McHardy
00c04af9df [NET_SCHED]: kill jiffie conversion macros
Now that all packet schedulers have been converted to hrtimers most users
of PSCHED_JIFFIE2US and PSCHED_US2JIFFIE are gone. The remaining users use
it to convert external time units to packet scheduler clock ticks, so use
PSCHED_TICKS_PER_SEC instead.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:14 -07:00
Patrick McHardy
4179477f63 [NET_SCHED]: Add hrtimer based qdisc watchdog
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:05 -07:00
Patrick McHardy
641b9e0e8b [NET_SCHED]: Use ktime as clocksource
Get rid of the manual clock source selection mess and use ktime. Also
use a scalar representation, which allows to clean up pkt_sched.h a bit
more and results in less ktime_to_ns() calls in most cases.

The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite
inefficient by this patch, following patches will convert all qdiscs
to hrtimers and get rid of them entirely.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:04 -07:00
Patrick McHardy
010c7d6f86 [NETFILTER]: nf_conntrack: uninline notifier registration functions
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:46 -07:00
Patrick McHardy
a3c5029cf7 [NETFILTER]: nfnetlink: use mutex instead of semaphore
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:43 -07:00
Patrick McHardy
ac5357ebac [NETFILTER]: nf_conntrack: remove ugly hack in l4proto registration
Remove ugly special-casing of nf_conntrack_l4proto_generic, all it
wants is its sysctl tables registered, so do that explicitly in an
init function and move the remaining protocol initialization and
cleanup code to nf_conntrack_proto.c as well.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:40 -07:00
Patrick McHardy
587aa64163 [NETFILTER]: Remove IPv4 only connection tracking/NAT
Remove the obsolete IPv4 only connection tracking/NAT as scheduled in
feature-removal-schedule.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:34 -07:00
Arnaldo Carvalho de Melo
9c70220b73 [SK_BUFF]: Introduce skb_transport_header(skb)
For the places where we need a pointer to the transport header, it is
still legal to touch skb->h.raw directly if just adding to,
subtracting from or setting it to another layer header.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:31 -07:00
Arnaldo Carvalho de Melo
aa8223c7bb [SK_BUFF]: Introduce tcp_hdr(), remove skb->h.th
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:26 -07:00
Arnaldo Carvalho de Melo
4bedb45203 [SK_BUFF]: Introduce udp_hdr(), remove skb->h.uh
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:22 -07:00
Arnaldo Carvalho de Melo
ea2ae17d64 [SK_BUFF]: Introduce skb_transport_offset()
For the quite common 'skb->h.raw - skb->data' sequence.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:16 -07:00
Arnaldo Carvalho de Melo
0660e03f6b [SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h
Now the skb->nh union has just one member, .raw, i.e. it is just like the
skb->mac union, strange, no? I'm just leaving it like that till the transport
layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or
->mac_header_offset?), ditto for ->{h,nh}.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:14 -07:00
Arnaldo Carvalho de Melo
eddc9ec53b [SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:10 -07:00
Arnaldo Carvalho de Melo
c9bdd4b525 [IP]: Introduce ip_hdrlen()
For the common sequence "skb->nh.iph->ihl * 4", removing a good number of open
coded skb->nh.iph uses, now to go after the rest...

Just out of curiosity, here are the idioms found to get the same result:

skb->nh.iph->ihl << 2
skb->nh.iph->ihl<<2
skb->nh.iph->ihl * 4
skb->nh.iph->ihl*4
(skb->nh.iph)->ihl * sizeof(u32)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:07 -07:00
Arnaldo Carvalho de Melo
d56f90a7c9 [SK_BUFF]: Introduce skb_network_header()
For the places where we need a pointer to the network header, it is still legal
to touch skb->nh.raw directly if just adding to, subtracting from or setting it
to another layer header.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:59 -07:00
Arnaldo Carvalho de Melo
c1d2bbe1cd [SK_BUFF]: Introduce skb_reset_network_header(skb)
For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can
later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple case, next will handle the slightly more
"complex" cases.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:46 -07:00
Arnaldo Carvalho de Melo
37e6636669 [LLC]: Kill llc_set_pdu_hdr
We'll have skb_reset_network_header soon.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:42 -07:00
Arnaldo Carvalho de Melo
459a98ed88 [SK_BUFF]: Introduce skb_reset_mac_header(skb)
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can
later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple case, next will handle the slightly more
"complex" cases.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:32 -07:00
Eric Dumazet
92f37fd2ee [NET]: Adding SO_TIMESTAMPNS / SCM_TIMESTAMPNS support
Now that network timestamps use ktime_t infrastructure, we can add a new
SOL_SOCKET sockopt  SO_TIMESTAMPNS.

This command is similar to SO_TIMESTAMP, but permits transmission of
a 'timespec struct' instead of a 'timeval struct' control message.
(nanosecond resolution instead of microsecond)

Control message is labelled SCM_TIMESTAMPNS instead of SCM_TIMESTAMP

A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS : the two modes are
mutually exclusive.

sock_recv_timestamp() became too big to be fully inlined so I added a
__sock_recv_timestamp() helper function.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: linux-arch@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:21 -07:00
Stephen Hemminger
a2a316fd06 [NET]: Replace CONFIG_NET_DEBUG with sysctl.
Covert network warning messages from a compile time to runtime choice.
Removes kernel config option and replaces it with new /proc/sys/net/core/warnings.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:05 -07:00
Eric Dumazet
ae40eb1ef3 [NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution
Now network timestamps use ktime_t infrastructure, we can add a new
ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'.
User programs can thus access to nanosecond resolution.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:04 -07:00
David S. Miller
fe067e8ab5 [TCP]: Abstract out all write queue operations.
This allows the write queue implementation to be changed,
for example, to one which allows fast interval searching.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:02 -07:00
Herbert Xu
759e5d0064 [UDP]: Clean up UDP-Lite receive checksum
This patch eliminates some duplicate code for the verification of
receive checksums between UDP-Lite and UDP.  It does this by
introducing __skb_checksum_complete_head which is identical to
__skb_checksum_complete_head apart from the fact that it takes
a length parameter rather than computing the first skb->len bytes.

As a result UDP-Lite will be able to use hardware checksum offload
for packets which do not use partial coverage checksums.  It also
means that UDP-Lite loopback no longer does unnecessary checksum
verification.

If any NICs start support UDP-Lite this would also start working
automatically.

This patch removes the assumption that msg_flags has MSG_TRUNC clear
upon entry in recvmsg.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:51 -07:00
Neil Horman
95c385b4d5 [IPV6] ADDRCONF: Optimistic Duplicate Address Detection (RFC 4429) Support.
Nominally an autoconfigured IPv6 address is added to an interface in the
Tentative state (as per RFC 2462).  Addresses in this state remain in this
state while the Duplicate Address Detection process operates on them to
determine their uniqueness on the network.  During this period, these
tentative addresses may not be used for communication, increasing the time
before a node may be able to communicate on a network.  Using Optimistic
Duplicate Address Detection, autoconfigured addresses may be used
immediately for communication on the network, as long as certain rules are
followed to avoid conflicts with other nodes during the Duplicate Address
Detection process.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:43 -07:00
Eric Dumazet
b7aa0bf70c [NET]: convert network timestamps to ktime_t
We currently use a special structure (struct skb_timeval) and plain
'struct timeval' to store packet timestamps in sk_buffs and struct
sock.

This has some drawbacks :
- Fixed resolution of micro second.
- Waste of space on 64bit platforms where sizeof(struct timeval)=16

I suggest using ktime_t that is a nice abstraction of high resolution
time services, currently capable of nanosecond resolution.

As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
a 8 byte shrink of this structure on 64bit architectures. Some other
structures also benefit from this size reduction (struct ipq in
ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)

Once this ktime infrastructure adopted, we can more easily provide
nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or
SO_TIMESTAMPNS/SCM_TIMESTAMPNS)

Note : this patch includes a bug correction in
compat_sock_get_timestamp() where a "err = 0;" was missing (so this
syscall returned -ENOENT instead of 0)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
CC: John find <linux.kernel@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:34 -07:00
James Morris
9d729f72dc [NET]: Convert xtime.tv_sec to get_seconds()
Where appropriate, convert references to xtime.tv_sec to the
get_seconds() helper function.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:32 -07:00
Eric Dumazet
fa438ccfdf [NET]: Keep sk_backlog near sk_lock
sk_backlog is a critical field of struct sock. (known famous words)

It is (ab)used in hot paths, in particular in release_sock(), tcp_recvmsg(),
tcp_v4_rcv(), sk_receive_skb().

It really makes sense to place it next to sk_lock, because sk_backlog is only
used after sk_lock locked (and thus memory cache line in L1 cache). This
should reduce cache misses and sk_lock acquisition time.

(In theory, we could only move the head pointer near sk_lock, and leaving tail
far away, because 'tail' is normally not so hot, but keep it simple :) )

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:27 -07:00
Ilpo Järvinen
3cfe3baaf0 [TCP]: Add two new spurious RTO responses to FRTO
New sysctl tcp_frto_response is added to select amongst these
responses:
	- Rate halving based; reuses CA_CWR state (default)
	- Very conservative; used to be the only one available (=1)
	- Undo cwr; undoes ssthresh and cwnd reductions (=2)

The response with rate halving requires a new parameter to
tcp_enter_cwr because FRTO has already reduced ssthresh and
doing a second reduction there has to be prevented. In addition,
to keep things nice on 80 cols screen, a local variable was
added.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:23 -07:00
John Heffner
886236c124 [TCP]: Add RFC3742 Limited Slow-Start, controlled by variable sysctl_tcp_max_ssthresh.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:19 -07:00
Ilpo Järvinen
46d0de4ed9 [TCP] FRTO: Entry is allowed only during (New)Reno like recovery
This interpretation comes from RFC4138:
    "If the sender implements some loss recovery algorithm other
     than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD
     NOT be entered when earlier fast recovery is underway."

I think the RFC means to say (especially in the light of
Appendix B) that ...recovery is underway (not just fast recovery)
or was underway when it was interrupted by an earlier (F-)RTO
that hasn't yet been resolved (snd_una has not advanced enough).
Thus, my interpretation is that whenever TCP has ever
retransmitted other than head, basic version cannot be used
because then the order assumptions which are used as FRTO basis
do not hold.

NewReno has only the head segment retransmitted at a time.
Therefore, walk up to the segment that has not been SACKed, if
that segment is not retransmitted nor anything before it, we know
for sure, that nothing after the non-SACKed segment should be
either. This assumption is valid because TCPCB_EVER_RETRANS does
not leave holes but each non-SACKed segment is rexmitted
in-order.

Check for retrans_out > 1 avoids more expensive walk through the
skb list, as we can know the result beforehand: F-RTO will not be
allowed.

SACKed skb can turn into non-SACked only in the extremely rare
case of SACK reneging, in this case we might fail to detect
retransmissions if there were them for any other than head. To
get rid of that feature, whole rexmit queue would have to be
walked (always) or FRTO should be prevented when SACK reneging
happens. Of course RTO should still trigger after reneging which
makes this issue even less likely to show up. And as long as the
response is as conservative as it's now, nothing bad happens even
then.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:12 -07:00
Ilpo Järvinen
bdaae17da8 [TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c
In addition, removed inline.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:02 -07:00
Patrick McHardy
c01003c205 [IFB]: Fix crash on input device removal
The input_device pointer is not refcounted, which means the device may
disappear while packets are queued, causing a crash when ifb passes packets
with a stale skb->dev pointer to netif_rx().

Fix by storing the interface index instead and do a lookup where neccessary.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-29 11:46:52 -07:00
Jeff Garzik
a9c87a10db Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes 2007-03-28 02:21:18 -04:00
Jean Tourrilhes
c2805fbb86 [PATCH] WE-22 : prevent information leak on 64 bit
Johannes Berg discovered that kernel space was leaking to
userspace on 64 bit platform. He made a first patch to fix that. This
is an improved version of his patch.

Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-03-27 14:10:26 -04:00
David S. Miller
f11e6659ce [IPV6]: Fix routing round-robin locking.
As per RFC2461, section 6.3.6, item #2, when no routers on the
matching list are known to be reachable or probably reachable we
do round robin on those available routes so that we make sure
to probe as many of them as possible to detect when one becomes
reachable faster.

Each routing table has a rwlock protecting the tree and the linked
list of routes at each leaf.  The round robin code executes during
lookup and thus with the rwlock taken as a reader.  A small local
spinlock tries to provide protection but this does not work at all
for two reasons:

1) The round-robin list manipulation, as coded, goes like this (with
   read lock held):

	walk routes finding head and tail

	spin_lock();
	rotate list using head and tail
	spin_unlock();

   While one thread is rotating the list, another thread can
   end up with stale values of head and tail and then proceed
   to corrupt the list when it gets the lock.  This ends up causing
   the OOPS in fib6_add() later onthat many people have been hitting.

2) All the other code paths that run with the rwlock held as
   a reader do not expect the list to change on them, they
   expect it to remain completely fixed while they hold the
   lock in that way.

So, simply stated, it is impossible to implement this correctly using
a manipulation of the list without violating the rwlock locking
semantics.

Reimplement using a per-fib6_node round-robin pointer.  This way we
don't need to manipulate the list at all, and since the round-robin
pointer can only ever point to real existing entries we don't need
to perform any locking on the changing of the round-robin pointer
itself.  We only need to reset the round-robin pointer to NULL when
the entry it is pointing to is removed.

The idea is from Thomas Graf and it is very similar to how this
was implemented before the advanced router selection code when in.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25 18:48:05 -07:00
Alexey Kuznetsov
ecbb416939 [NET]: Fix neighbour destructor handling.
->neigh_destructor() is killed (not used), replaced with
->neigh_cleanup(), which is called when neighbor entry goes to dead
state. At this point everything is still valid: neigh->dev,
neigh->parms etc.

The device should guarantee that dead neighbor entries (neigh->dead !=
0) do not get private part initialized, otherwise nobody will cleanup
it.

I think this is enough for ipoib which is the only user of this thing.
Initialization private part of neighbor entries happens in ipib
start_xmit routine, which is not reached when device is down.  But it
would be better to add explicit test for neigh->dead in any case.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25 18:48:01 -07:00
Thomas Graf
e1701c68c1 [NET]: Fix fib_rules compatibility breakage
Based upon a patch from Patrick McHardy.

The fib_rules netlink attribute policy introduced in 2.6.19 broke
userspace compatibilty. When specifying a rule with "from all"
or "to all", iproute adds a zero byte long netlink attribute,
but the policy requires all addresses to have a size equal to
sizeof(struct in_addr)/sizeof(struct in6_addr), resulting in a
validation error.

Check attribute length of FRA_SRC/FRA_DST in the generic framework
by letting the family specific rules implementation provide the
length of an address. Report an error if address length is non
zero but no address attribute is provided. Fix actual bug by
checking address length for non-zero instead of relying on
availability of attribute.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25 18:48:00 -07:00
Vlad Yasevich
749bf9215e [SCTP]: Reset some transport and association variables on restart
If the association has been restarted, we need to reset the
transport congestion variables as well as accumulated error
counts and CACC variables.  If we do not, the association
will use the wrong values and may terminate prematurely.

This was found with a scenario where the peer restarted
the association when lksctp was in the last HB timeout for
its association.  The restart happened, but the error counts
have not been reset and when the timeout occurred, a newly
restarted association was terminated due to excessive
retransmits.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-20 00:09:45 -07:00
Vlad Yasevich
0b58a81146 [SCTP]: Clean up stale data during association restart
During association restart we may have stale data sitting
on the ULP queue waiting for ordering or reassembly.  This
data may cause severe problems if not cleaned up.  In particular
stale data pending ordering may cause problems with receive
window exhaustion if our peer has decided to restart the
association.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-20 00:09:43 -07:00
Eric Paris
ef41aaa0b7 [IPSEC]: xfrm_policy delete security check misplaced
The security hooks to check permissions to remove an xfrm_policy were
actually done after the policy was removed.  Since the unlinking and
deletion are done in xfrm_policy_by* functions this moves the hooks
inside those 2 functions.  There we have all the information needed to
do the security check and it can be done before the deletion.  Since
auditing requires the result of that security check err has to be passed
back and forth from the xfrm_policy_by* functions.

This patch also fixes a bug where a deletion that failed the security
check could cause improper accounting on the xfrm_policy
(xfrm_get_policy didn't have a put on the exit path for the hold taken
by xfrm_policy_by*)

It also fixes the return code when no policy is found in
xfrm_add_pol_expire.  In old code (at least back in the 2.6.18 days) err
wasn't used before the return when no policy is found and so the
initialization would cause err to be ENOENT.  But since err has since
been used above when we don't get a policy back from the xfrm_policy_by*
function we would always return 0 instead of the intended ENOENT.  Also
fixed some white space damage in the same area.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Venkat Yekkirala <vyekkirala@trustedcs.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-07 16:08:09 -08:00
David S. Miller
64a146513f [NET]: Revert incorrect accept queue backlog changes.
This reverts two changes:

8488df894d
248f06726e

A backlog value of N really does mean allow "N + 1" connections
to queue to a listening socket.  This allows one to specify
"0" as the backlog and still get 1 connection.

Noticed by Gerrit Renker and Rick Jones.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-06 11:21:05 -08:00
Eric Dumazet
187f5f84ef [INET]: twcal_jiffie should be unsigned long, not int
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-05 13:32:48 -08:00
Patrick McHardy
ec68e97ded [NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loops
Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling:

- unconfirmed entries can not be killed manually, they are removed on
  confirmation or final destruction of the conntrack entry, which means
  we might iterate forever without making forward progress.

  This can happen in combination with the conntrack event cache, which
  holds a reference to the conntrack entry, which is only released when
  the packet makes it all the way through the stack or a different
  packet is handled.

- taking references to an unconfirmed entry and using it outside the
  locked section doesn't work, the list entries are not refcounted and
  another CPU might already be waiting to destroy the entry

What the code really wants to do is make sure the references of the hash
table to the selected conntrack entries are released, so they will be
destroyed once all references from skbs and the event cache are dropped.

Since unconfirmed entries haven't even entered the hash yet, simply mark
them as dying and skip confirmation based on that.

Reported and tested by Chuck Ebbert <cebbert@redhat.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-05 13:25:18 -08:00
Wei Dong
8488df894d [NET]: Fix bugs in "Whether sock accept queue is full" checking
when I use linux TCP socket, and find there is a bug in function
sk_acceptq_is_full().

	When a new SYN comes, TCP module first checks its validation. If valid,
send SYN,ACK to the client and add the sock to the syn hash table. Next
time if received the valid ACK for SYN,ACK from the client. server will
accept this connection and increase the sk->sk_ack_backlog -- which is
done in function tcp_check_req().We check wether acceptq is full in
function tcp_v4_syn_recv_sock().

Consider an example:

 After listen(sockfd, 1) system call, sk->sk_max_ack_backlog is set to
1. As we know, sk->sk_ack_backlog is initialized to 0. Assuming accept()
system call is not invoked now.

1. 1st connection comes. invoke sk_acceptq_is_full(). sk-
>sk_ack_backlog=0 sk->sk_max_ack_backlog=1, function return 0 accept
this connection. Increase the sk->sk_ack_backlog
2. 2nd connection comes. invoke sk_acceptq_is_full(). sk-
>sk_ack_backlog=1 sk->sk_max_ack_backlog=1, function return 0 accept
this connection. Increase the sk->sk_ack_backlog
3. 3rd connection comes. invoke sk_acceptq_is_full(). sk-
>sk_ack_backlog=2 sk->sk_max_ack_backlog=1, function return 1. Refuse
this connection.

I think it has bugs. after listen system call. sk->sk_max_ack_backlog=1
but now it can accept 2 connections.

Signed-off-by: Wei Dong <weid@np.css.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-02 20:37:33 -08:00
Patrick McHardy
4498121ca3 [NET]: Handle disabled preemption in gfp_any()
ctnetlink uses netlink_unicast from an atomic_notifier_chain
(which is called within a RCU read side critical section)
without holding further locks. netlink_unicast calls netlink_trim
with the result of gfp_any() for the gfp flags, which are passed
down to pskb_expand_header. gfp_any() only checks for softirq
context and returns GFP_KERNEL, resulting in this warning:

BUG: sleeping function called from invalid context at mm/slab.c:3032
in_atomic():1, irqs_disabled():0
no locks held by rmmod/7010.

Call Trace:
 [<ffffffff8109467f>] debug_show_held_locks+0x9/0xb
 [<ffffffff8100b0b4>] __might_sleep+0xd9/0xdb
 [<ffffffff810b5082>] __kmalloc+0x68/0x110
 [<ffffffff811ba8f2>] pskb_expand_head+0x4d/0x13b
 [<ffffffff81053147>] netlink_broadcast+0xa5/0x2e0
 [<ffffffff881cd1d7>] :nfnetlink:nfnetlink_send+0x83/0x8a
 [<ffffffff8834f6a6>] :nf_conntrack_netlink:ctnetlink_conntrack_event+0x94c/0x96a
 [<ffffffff810624d6>] notifier_call_chain+0x29/0x3e
 [<ffffffff8106251d>] atomic_notifier_call_chain+0x32/0x60
 [<ffffffff881d266d>] :nf_conntrack:destroy_conntrack+0xa5/0x1d3
 [<ffffffff881d194e>] :nf_conntrack:nf_ct_cleanup+0x8c/0x12c
 [<ffffffff881d4614>] :nf_conntrack:kill_l3proto+0x0/0x13
 [<ffffffff881d482a>] :nf_conntrack:nf_conntrack_l3proto_unregister+0x90/0x94
 [<ffffffff883551b3>] :nf_conntrack_ipv4:nf_conntrack_l3proto_ipv4_fini+0x2b/0x5d
 [<ffffffff8109d44f>] sys_delete_module+0x1b5/0x1e6
 [<ffffffff8105f245>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff8105911e>] system_call+0x7e/0x83

Since netlink_unicast is supposed to be callable from within RCU
read side critical sections, make gfp_any() check for in_atomic()
instead of in_softirq().

Additionally nfnetlink_send needs to use gfp_any() as well for the
call to netlink_broadcast).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-28 09:42:13 -08:00
Adrian Bunk
a39a21982c [IRDA] net/irda/: proper prototypes
This patch adds proper prototypes for some functions in
include/net/irda/irda.h

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26 11:42:43 -08:00
Kazunori MIYAZAWA
73d605d1ab [IPSEC]: changing API of xfrm6_tunnel_register
This patch changes xfrm6_tunnel register and deregister
interface to prepare for solving the conflict of device
tunnels with inter address family IPsec tunnel.
There is no device which conflicts with IPv4 over IPv6
IPsec tunnel.

Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-13 12:55:55 -08:00
Kazunori MIYAZAWA
c0d56408e3 [IPSEC]: Changing API of xfrm4_tunnel_register.
This patch changes xfrm4_tunnel register and deregister
interface to prepare for solving the conflict of device
tunnels with inter address family IPsec tunnel.

Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-13 12:54:47 -08:00
Patrick McHardy
fe3eb20c1a [NETFILTER]: nf_conntrack: change nf_conntrack_l[34]proto_unregister to void
No caller checks the return value, and since its usually called within the
module unload path there's nothing a module could do about errors anyway,
so BUG on invalid conditions and return void.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:14:28 -08:00
Patrick McHardy
c0e912d7ed [NETFILTER]: nf_conntrack: fix invalid conntrack statistics RCU assumption
NF_CT_STAT_INC assumes rcu_read_lock in nf_hook_slow disables
preemption as well, making it legal to use __get_cpu_var without
disabling preemption manually. The assumption is not correct anymore
with preemptable RCU, additionally we need to protect against softirqs
when not holding nf_conntrack_lock.

Add NF_CT_STAT_INC_ATOMIC macro, which disables local softirqs,
and use where necessary.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:13:43 -08:00
Patrick McHardy
923f4902fe [NETFILTER]: nf_conntrack: properly use RCU API for nf_ct_protos/nf_ct_l3protos arrays
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
all paths not obviously only used within packet process context
(nfnetlink_conntrack).
  
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:12:57 -08:00
Arjan van de Ven
540473208f [PATCH] mark struct file_operations const 1
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:44 -08:00
Eric Dumazet
1e19e02ca0 [NET]: Reorder fields of struct dst_entry
This last patch (but not least :) ) finally moves the next pointer at
the end of struct dst_entry. This permits to perform route cache
lookups with a minimal cost of one cache line per entry, instead of
two.

Both 32bits and 64bits platforms benefit from this new layout.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:45 -08:00
Eric Dumazet
0c195c3fc4 [DECNET]: Convert decnet route to use the new dst_entry 'next' pointer
This patch removes the next pointer from 'struct dn_route.u' union,
and renames u.rt_next to u.dst.dn_next.

It also moves 'struct flowi' right after 'struct dst_entry' to prepare
speedup lookups.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:43 -08:00
Eric Dumazet
7cc482634f [IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer
This patch removes the next pointer from 'struct rt6_info.u' union,
and renames u.next to u.dst.rt6_next.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:40 -08:00
Eric Dumazet
093c2ca416 [IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer
This patch removes the rt_next pointer from 'struct rtable.u' union,
and renames u.rt_next to u.dst_rt_next.

It also moves 'struct flowi' right after 'struct dst_entry' to prepare
the gain on lookups.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:38 -08:00
Eric Dumazet
75ce7ceaa1 [NET]: Introduce union in struct dst_entry to hold 'next' pointer
This patch introduces an anonymous union to nicely express the fact that all
objects inherited from struct dst_entry should access to the generic 'next'
pointer but with appropriate type verification.

This patch is a prereq before following patches.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:36 -08:00
Eric Dumazet
dbca9b2750 [NET]: change layout of ehash table
ehash table layout is currently this one :

First half of this table is used by sockets not in TIME_WAIT state
Second half of it is used by sockets in TIME_WAIT state.

This is non optimal because of for a given hash or socket, the two chain heads 
are located in separate cache lines.
Moreover the locks of the second half are never used.

If instead of this halving, we use two list heads in inet_ehash_bucket instead 
of only one, we probably can avoid one cache miss, and reduce ram usage, 
particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, 
CONFIG_DEBUG_LOCK_ALLOC settings). So we still halves the table but we keep 
together related chains to speedup lookups and socket state change.

In this patch I did not try to align struct inet_ehash_bucket, but a future 
patch could try to make this structure have a convenient size (a power of two 
or a multiple of L1_CACHE_SIZE).
I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so 
maybe we dont need to scratch our heads to align the bucket...

Note : In case struct inet_ehash_bucket is not a power of two, we could 
probably change alloc_large_system_hash() (in case it use __get_free_pages()) 
to free the unused space. It currently allocates a big zone, but the last 
quarter of it could be freed. Again, this should be a temporary 'problem'.

Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 14:16:46 -08:00
Jennifer Hunt
eac3731bd0 [S390]: Add AF_IUCV socket support
From: Jennifer Hunt <jenhunt@us.ibm.com>

This patch adds AF_IUCV socket support.

Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 13:51:54 -08:00
Martin Schwidefsky
2356f4cb19 [S390]: Rewrite of the IUCV base code, part 2
Add rewritten IUCV base code to net/iucv.

Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 13:37:42 -08:00
Andrew Hendry
39e21c0d34 [X.25]: Adds /proc/sys/net/x25/x25_forward to control forwarding.
echo "1" > /proc/sys/net/x25/x25_forward 
To turn on x25_forwarding, defaults to off
Requires the previous patch.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 13:34:36 -08:00
Andrew Hendry
95a9dc4390 [X.25]: Add call forwarding
Adds call forwarding to X.25, allowing it to operate like an X.25 router.
Useful if one needs to manipulate X.25 traffic with tools like tc.
This is an update/cleanup based off a patch submitted by Daniel Ferenci a few years ago.

Thanks Alan for the feedback.
Added the null check to the clones.
Moved the skb_clone's into the forwarding functions.

Worked ok with Cisco XoT, linux X.25 back to back, and some old NTUs/PADs.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 13:34:02 -08:00
Shinta Sugimoto
80c9abaabf [XFRM]: Extension for dynamic update of endpoint address(es)
Extend the XFRM framework so that endpoint address(es) in the XFRM
databases could be dynamically updated according to a request (MIGRATE
message) from user application. Target XFRM policy is first identified
by the selector in the MIGRATE message. Next, the endpoint addresses
of the matching templates and XFRM states are updated according to
the MIGRATE message.

Signed-off-by: Shinta Sugimoto <shinta.sugimoto@ericsson.com>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 13:11:42 -08:00
Eric Leblond
41f4689a7c [NETFILTER]: NAT: optional source port randomization support
This patch adds support to NAT to randomize source ports.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:17 -08:00
Michal Schmidt
6fecd19851 [NETFILTER]: Add SANE connection tracking helper
This is nf_conntrack_sane, a netfilter connection tracking helper module
for the SANE protocol used by the 'saned' daemon to make scanners available
via network. The SANE protocol uses separate control & data connections,
similar to passive FTP. The helper module is needed to recognize the data
connection as RELATED to the control one.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:09 -08:00
Miika Komu
cdca72652a [IPSEC]: exporting xfrm_state_afinfo
This patch exports xfrm_state_afinfo.

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:00 -08:00
David S. Miller
8eb9086f21 [IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts.
Do this even for non-blocking sockets.  This avoids the silly -EAGAIN
that applications can see now, even for non-blocking sockets in some
cases (f.e. connect()).

With help from Venkat Tekkirala.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:45 -08:00
Frederik Deweerdt
ba7808eac1 [TCP]: remove tcp header from tcp_v4_check (take #2)
The tcphdr struct passed to tcp_v4_check is not used, the following
patch removes it from the parameter list.

This adds the netfilter modifications missing in the patch I sent
for rc3-mm1.

Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:44 -08:00
David S. Miller
e89862f4c5 [TCP]: Restore SKB socket owner setting in tcp_transmit_skb().
Revert 931731123a

We can't elide the skb_set_owner_w() here because things like certain
netfilter targets (such as owner MATCH) need a socket to be set on the
SKB for correct operation.

Thanks to Jan Engelhardt and other netfilter list members for
pointing this out.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-26 01:04:55 -08:00
Vlad Yasevich
610ab73ac4 [SCTP]: Correctly handle unexpected INIT-ACK chunk.
Consider the chunk as Out-of-the-Blue if we don't have
an endpoint.  Otherwise discard it as before.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-23 20:25:46 -08:00
Mikael Pettersson
16d807988f [NETFILTER]: fix xt_state compile failure
In file included from net/netfilter/xt_state.c:13:
include/net/netfilter/nf_conntrack_compat.h: In function 'nf_ct_l3proto_try_module_get':
include/net/netfilter/nf_conntrack_compat.h:70: error: 'PF_INET' undeclared (first use in this function)
include/net/netfilter/nf_conntrack_compat.h:70: error: (Each undeclared identifier is reported only once
include/net/netfilter/nf_conntrack_compat.h:70: error: for each function it appears in.)
include/net/netfilter/nf_conntrack_compat.h:71: warning: control reaches end of non-void function
make[2]: *** [net/netfilter/xt_state.o] Error 1
make[1]: *** [net/netfilter] Error 2
make: *** [net] Error 2

A simple fix is to have nf_conntrack_compat.h #include <linux/socket.h>.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-23 20:25:43 -08:00
Jeff Garzik
11897539a9 Merge branch 'upstream-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes 2007-01-07 22:44:56 -05:00
Gerrit Renker
0d630cc0a6 [TCP]: Use old definition of before
This reverts the new (unambiguous) definition of the TCP `before'
relation. As pointed out in an example by Herbert Xu, there is 
existing code which implicitly requires the old definition in order
to work correctly.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-04 12:25:16 -08:00
Adrian Bunk
7f18ba6248 [X25]: proper prototype for x25_init_timers()
This patch adds a proper prototype for x25_init_timers() in 
include/net/x25.h

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-03 18:48:13 -08:00
Zhu Yi
3eb546057d [PATCH] ieee80211: WLAN_GET_SEQ_SEQ fix (select correct region)
The WLAN_GET_SEQ_SEQ(seq) macro in ieee80211 is selecting the wrong region.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-01-02 20:56:26 -05:00
Al Viro
b23e353666 [IPV6]: Dumb typo in generic csum_ipv6_magic()
... duh

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:07 -08:00
Adrian Bunk
24123186fa [SCTP]: make 2 functions static
This patch makes the following needlessly global functions static:
- ipv6.c: sctp_inet6addr_event()
- protocol.c: sctp_inetaddr_event()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:05 -08:00
Ivan Skytte Jorgensen
0f3fffd8ab [SCTP]: Fix typo adaption -> adaptation as per the latest API draft.
Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:04 -08:00
Gerrit Renker
9a036b9c33 [TCP]: Fix ambiguity in the `before' relation.
While looking at DCCP sequence numbers, I stumbled over a problem with
the following definition of before in tcp.h:

static inline int before(__u32 seq1, __u32 seq2)
{
        return (__s32)(seq1-seq2) < 0;
}

Problem: This definition suffers from an an ambiguity, i.e. always

           before(a, (a + 2^31) % 2^32)) = 1
           before((a + 2^31) % 2^32), a) = 1

         In text: when the difference between a and b amounts to 2^31,
         a is always considered `before' b, the function can not decide.
         The reason is that implicitly 0 is `before' 1 ... 2^31-1 ... 2^31

Solution: There is a simple fix, by defining before in such a way that
          0 is no longer `before' 2^31, i.e. 0 `before' 1 ... 2^31-1
          By not using the middle between 0 and 2^32, before can be made
          unambiguous.
          This is achieved by testing whether seq2-seq1 > 0 (using signed
          32-bit arithmetic).

I attach a patch to codify this. Also the `after' relation is basically
a redefinition of `before', it is now defined as a macro after before.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:01 -08:00
Ralf Baechle
a3d384029a [AX.25]: Fix unchecked rose_add_loopback_neigh uses
rose_add_loopback_neigh uses kmalloc and the callers were ignoring the
error value.  Rewrite to let the caller deal with the allocation.  This
allows the use of static allocation of kmalloc use entirely.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-17 21:59:14 -08:00
Ralf Baechle
a4282717c1 [AX.25]: Fix unchecked ax25_linkfail_register uses
ax25_linkfail_register uses kmalloc and the callers were ignoring the
error value.  Rewrite to let the caller deal with the allocation.  This
allows the use of static allocation of kmalloc use entirely.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-17 21:59:11 -08:00
Ralf Baechle
8d5cf596d1 [AX.25]: Fix unchecked ax25_protocol_register uses.
Replace ax25_protocol_register by ax25_register_pid which assumes the
caller has done the memory allocation.  This allows replacing the
kmalloc allocations entirely by static allocations.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-17 21:59:08 -08:00
Ralf Baechle
c9266b99e2 [AX.25]: Mark all kmalloc users __must_check
The recent fix 0506d4068b made obvious that
error values were not being propagated through the AX.25 stack.  To help
with that this patch marks all kmalloc users in the AX.25, NETROM and
ROSE stacks as __must_check.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-17 21:59:07 -08:00
Kim Nordlund
8bce65b95a [IPV6]: Make fib6_node subtree depend on IPV6_SUBTREES
Make fib6_node 'subtree' depend on IPV6_SUBTREES.

Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-13 16:48:31 -08:00
Ivan Skytte Jorgensen
6ab792f577 [SCTP]: Add support for SCTP_CONTEXT socket option.
Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-13 16:48:29 -08:00
Sridhar Samudrala
29c7cf9618 [SCTP]: Handle address add/delete events in a more efficient way.
Currently in SCTP, we maintain a local address list by rebuilding the whole
list from the device list whenever we get a address add/delete event.

This patch fixes it by only adding/deleting the address for which we
receive the event.

Also removed the sctp_local_addr_lock() which is no longer needed as we
now use list_for_each_safe() to traverse this list. This fixes the bugs
in sctp_copy_laddrs_xxx() routines where we do copy_to_user() while
holding this lock.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-13 16:48:27 -08:00
Yasuyuki Kozakai
fe0b9294c9 [NETFILTER]: x_tables: error if ip_conntrack is asked to handle IPv6 packets
To do that, this makes nf_ct_l3proto_try_module_{get,put} compatible
functions. As a result we can remove '#ifdef' surrounds and direct call of
need_conntrack().

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-13 16:48:20 -08:00
Al Viro
905f3ed625 [PATCH] hci endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:52 -08:00
Ralf Baechle
f654c854d1 [HAMRADIO]: Fix baycom_epp.c compile failure.
Fix foobar in 15b1c0e822 and
e8cc49bb0f patch series.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-11 14:35:01 -08:00
Alexey Dobriyan
1f29bcd739 [PATCH] sysctl: remove unused "context" param
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00
Ralf Baechle
15b1c0e822 [AX.25]: Fix default address and broadcast address initialization.
Only the callsign but not the SSID part of an AX.25 address is ASCII
based but Linux by initializes the SSID which should be just a 4-bit
number from ASCII anyway.

Fix that and convert the code to use a shared constant for both default
addresses.  While at it, use the same style for null_ax25_address also.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:26 -08:00
Ralf Baechle
e8cc49bb0f [AX.25]: Constify ax25 utility functions
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:25 -08:00
Stephen Hemminger
3644f0cee7 [NET]: Convert hh_lock to seqlock.
The hard header cache is in the main output path, so using
seqlock instead of reader/writer lock should reduce overhead.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:20 -08:00
Alan Cox
edc6afc549 [PATCH] tty: switch to ktermios and new framework
This is the core of the switch to the new framework.  I've split it from the
driver patches which are mostly search/replace and would encourage people to
give this one a good hard stare.

The references to BOTHER and ISHIFT are the termios values that must be
defined by a platform once it wants to turn on "new style" ioctl support.  The
code patches here ensure that providing

1. The termios overlays the ktermios in memory
2. The only new kernel only fields are c_ispeed/c_ospeed (or none)

the existing behaviour is retained.  This is true for the patches at this
point in time.

Future patches will define BOTHER, ISHIFT and enable newer termios structures
for each architecture, and once they are all done some of the ifdefs also
vanish.

[akpm@osdl.org: warning fix]
[akpm@osdl.org: IRDA fix]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:56 -08:00
Linus Torvalds
2685b267bc Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (48 commits)
  [NETFILTER]: Fix non-ANSI func. decl.
  [TG3]: Identify Serdes devices more clearly.
  [TG3]: Use msleep.
  [TG3]: Use netif_msg_*.
  [TG3]: Allow partial speed advertisement.
  [TG3]: Add TG3_FLG2_IS_NIC flag.
  [TG3]: Add 5787F device ID.
  [TG3]: Fix Phy loopback.
  [WANROUTER]: Kill kmalloc debugging code.
  [TCP] inet_twdr_hangman: Delete unnecessary memory barrier().
  [NET]: Memory barrier cleanups
  [IPSEC]: Fix inetpeer leak in ipv4 xfrm dst entries.
  audit: disable ipsec auditing when CONFIG_AUDITSYSCALL=n
  audit: Add auditing to ipsec
  [IRDA] irlan: Fix compile warning when CONFIG_PROC_FS=n
  [IrDA]: Incorrect TTP header reservation
  [IrDA]: PXA FIR code device model conversion
  [GENETLINK]: Fix misplaced command flags.
  [NETLIK]: Add a pointer to the Generic Netlink wiki page.
  [IPV6] RAW: Don't release unlocked sock.
  ...
2006-12-07 09:05:15 -08:00
Peter Zijlstra
ed07536ed6 [PATCH] lockdep: annotate nfs/nfsd in-kernel sockets
Stick NFS sockets in their own class to avoid some lockdep warnings.  NFS
sockets are never exposed to user-space, and will hence not trigger certain
code paths that would otherwise pose deadlock scenarios.

[akpm@osdl.org: cleanups]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Dickson <SteveD@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Fixed patch corruption by quilt, pointed out by Peter Zijlstra ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:30 -08:00
Christoph Lameter
e18b890bb0 [PATCH] slab: remove kmem_cache_t
Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

	#!/bin/sh
	#
	# Replace one string by another in all the kernel sources.
	#

	set -e

	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
		quilt add $file
		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
		mv /tmp/$$ $file
		quilt refresh
	done

The script was run like this

	sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:25 -08:00
Christoph Lameter
54e6ecb239 [PATCH] slab: remove SLAB_ATOMIC
SLAB_ATOMIC is an alias of GFP_ATOMIC

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:24 -08:00
Joy Latten
c9204d9ca7 audit: disable ipsec auditing when CONFIG_AUDITSYSCALL=n
Disables auditing in ipsec when CONFIG_AUDITSYSCALL is
disabled in the kernel.

Also includes a bug fix for xfrm_state.c as a result of
original ipsec audit patch.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 20:14:23 -08:00
Joy Latten
161a09e737 audit: Add auditing to ipsec
An audit message occurs when an ipsec SA
or ipsec policy is created/deleted.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 20:14:22 -08:00
Randy Dunlap
95b99a670d [IRDA] irlan: Fix compile warning when CONFIG_PROC_FS=n
include/net/irda/irlan_filter.h:31: warning: 'struct seq_file' declared inside parameter list
include/net/irda/irlan_filter.h:31: warning: its scope is only this definition or declaration, which is probably not what you want

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 20:10:07 -08:00
David Howells
9db7372445 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/ata/libata-scsi.c
	include/linux/libata.h

Futher merge of Linus's head and compilation fixups.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 17:01:28 +00:00
David Howells
4c1ac1b491 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/infiniband/core/iwcm.c
	drivers/net/chelsio/cxgb2.c
	drivers/net/wireless/bcm43xx/bcm43xx_main.c
	drivers/net/wireless/prism54/islpci_eth.c
	drivers/usb/core/hub.h
	drivers/usb/input/hid-core.c
	net/core/netpoll.c

Fix up merge failures with Linus's head and fix new compilation failures.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 14:37:56 +00:00
Al Viro
d7fe0f241d [PATCH] severing skbuff.h -> mm.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-12-04 02:00:34 -05:00
Al Viro
bd01f843c3 [PATCH] severing skbuff.h -> poll.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-12-04 02:00:31 -05:00
Patrick McHardy
f09943fefe [NETFILTER]: nf_conntrack/nf_nat: add PPTP helper port
Add nf_conntrack port of the PPtP conntrack/NAT helper. Since there seems
to be no IPv6-capable PPtP implementation the helper only support IPv4.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:09:41 -08:00
Patrick McHardy
f587de0e2f [NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port
Add IPv4 and IPv6 capable nf_conntrack port of the H.323 conntrack/NAT helper.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:08:46 -08:00
Patrick McHardy
d6a9b6500a [NETFILTER]: nf_conntrack: add helper function for expectation initialization
Expectation address masks need to be differently initialized depending
on the address family, create helper function to avoid cluttering up
the code too much.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:08:01 -08:00
Jozsef Kadlecsik
55a733247d [NETFILTER]: nf_nat: add FTP NAT helper port
Add FTP NAT helper.

Split out from Jozsef's big nf_nat patch with a few small fixes by myself.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:07:44 -08:00
Jozsef Kadlecsik
5b1158e909 [NETFILTER]: Add NAT support for nf_conntrack
Add NAT support for nf_conntrack. Joint work of Jozsef Kadlecsik,
Yasuyuki Kozakai, Martin Josefsson and myself.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:07:13 -08:00
Patrick McHardy
9457d851fc [NETFILTER]: nf_conntrack: automatic helper assignment for expectations
Some helpers (namely H.323) manually assign further helpers to expected
connections. This is not possible with nf_conntrack anymore since we
need to know whether a helper is used at allocation time.

Handle the helper assignment centrally, which allows to perform the
correct allocation and as a nice side effect eliminates the need
for the H.323 helper to fiddle with nf_conntrack_lock.

Mid term the allocation scheme really needs to be redesigned since
we do both the helper and expectation lookup _twice_ for every new
connection.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:05:25 -08:00
Patrick McHardy
bff9a89bca [NETFILTER]: nf_conntrack: endian annotations
Resync with Al Viro's ip_conntrack annotations and fix a missed
spot in ip_nat_proto_icmp.c.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:05:08 -08:00
Patrick McHardy
f9aae95828 [NETFILTER]: nf_conntrack: fix helper structure alignment
Adding the alignment to the size doesn't make any sense, what it
should do is align the size of the conntrack structure to the
alignment requirements of the helper structure and return an
aligned pointer in nfct_help().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:04:50 -08:00
Jamal Hadi Salim
a4d1366d50 [GENETLINK]: Add cmd dump completion.
Remove assumption that generic netlink commands cannot have dump
completion callbacks.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:32:09 -08:00
Miika Komu
76b3f055f3 [IPSEC]: Add encapsulation family.
Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:48 -08:00
Patrick McHardy
43effa1e57 [NET_SCHED]: Fix endless loops caused by inaccurate qlen counters (part 1)
There are multiple problems related to qlen adjustment that can lead
to an upper qdisc getting out of sync with the real number of packets
queued, leading to endless dequeueing attempts by the upper layer code.

All qdiscs must maintain an accurate q.qlen counter. There are basically
two groups of operations affecting the qlen: operations that propagate
down the tree (enqueue, dequeue, requeue, drop, reset) beginning at the
root qdisc and operations only affecting a subtree or single qdisc
(change, graft, delete class). Since qlen changes during operations from
the second group don't propagate to ancestor qdiscs, their qlen values
become desynchronized.

This patch adds a function to propagate qlen changes up the qdisc tree,
optionally calling a callback function to perform qdisc-internal
maintenance when the child qdisc becomes empty. The follow-up patches
will convert all qdiscs to use this function where necessary.

Noticed by Timo Steinbach <tsteinbach@astaro.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:42 -08:00
Patrick McHardy
9f9afec482 [NET_SCHED]: Set parent classid in default qdiscs
Set parent classids in default qdiscs to allow walking up the tree
from outside the qdiscs. This is needed by the next patch.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:41 -08:00
Paul Moore
0275276035 NetLabel: convert to an extensibile/sparse category bitmap
The original NetLabel category bitmap was a straight char bitmap which worked
fine for the initial release as it only supported 240 bits due to limitations
in the CIPSO restricted bitmap tag (tag type 0x01).  This patch converts that
straight char bitmap into an extensibile/sparse bitmap in order to lay the
foundation for other CIPSO tag types and protocols.

This patch also has a nice side effect in that all of the security attributes
passed by NetLabel into the LSM are now in a format which is in the host's
native byte/bit ordering which makes the LSM specific code much simpler; look
at the changes in security/selinux/ss/ebitmap.c as an example.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-12-02 21:31:36 -08:00
Yasuyuki Kozakai
468ec44bd5 [NETFILTER]: conntrack: add '_get' to {ip, nf}_conntrack_expect_find
We usually uses 'xxx_find_get' for function which increments
reference count.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:21 -08:00
Patrick McHardy
e4bd8bce3e [NETFILTER]: nf_conntrack: /proc compatibility with old connection tracking
This patch adds /proc/net/ip_conntrack, /proc/net/ip_conntrack_expect and
/proc/net/stat/ip_conntrack files to keep old programs using them working.

The /proc/net/ip_conntrack and /proc/net/ip_conntrack_expect files show only
IPv4 entries, the /proc/net/stat/ip_conntrack shows global statistics.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:20 -08:00
Patrick McHardy
a999e68376 [NETFILTER]: nf_conntrack: sysctl compatibility with old connection tracking
This patch adds an option to keep the connection tracking sysctls visible
under their old names.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:19 -08:00
Patrick McHardy
d62f9ed4a4 [NETFILTER]: nf_conntrack: automatic sysctl registation for conntrack protocols
Add helper functions for sysctl registration with optional instantiating
of common path elements (like net/netfilter) and use it for support for
automatic registation of conntrack protocol sysctls.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:17 -08:00
Patrick McHardy
f8eb24a89a [NETFILTER]: nf_conntrack: move extern declaration to header files
Using extern in a C file is a bad idea because the compiler can't
catch type errors.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:16 -08:00
Martin Josefsson
824621eddd [NETFILTER]: nf_conntrack: remove unused struct list_head from protocols
Remove unused struct list_head from struct nf_conntrack_l3proto and
nf_conntrack_l4proto as all protocols are kept in arrays, not linked
lists.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:13 -08:00
Martin Josefsson
ae5718fb3d [NETFILTER]: nf_conntrack: more sanity checks in protocol registration/unregistration
Add some more sanity checks when registering/unregistering l3/l4 protocols.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:10 -08:00
Martin Josefsson
605dcad6c8 [NETFILTER]: nf_conntrack: rename struct nf_conntrack_protocol
Rename 'struct nf_conntrack_protocol' to 'struct nf_conntrack_l4proto' in
order to help distinguish it from 'struct nf_conntrack_l3proto'. It gets
rather confusing with 'nf_conntrack_protocol'.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:09 -08:00
Martin Josefsson
f61801218a [NETFILTER]: nf_conntrack: split out the event cache
This patch splits out the event cache into its own file
nf_conntrack_ecache.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:06 -08:00
Martin Josefsson
7e5d03bb9d [NETFILTER]: nf_conntrack: split out helper handling
This patch splits out handling of helpers into its own file
nf_conntrack_helper.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:05 -08:00
Martin Josefsson
77ab9cff0f [NETFILTER]: nf_conntrack: split out expectation handling
This patch splits out expectation handling into its own file
nf_conntrack_expect.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:04 -08:00
Arnaldo Carvalho de Melo
ee41e2dff1 [INET]: Change protocol field in struct inet_protosw to u16
[acme@newtoy net-2.6.20]$ pahole /tmp/tcp_ipv6.o inet_protosw
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/net/protocol.h:69 */
struct inet_protosw {
        struct list_head           list;                 /*     0     8 */
        short unsigned int         type;                 /*     8     2 */

        /* XXX 2 bytes hole, try to pack */

        int                        protocol;             /*    12     4 */
        struct proto *             prot;                 /*    16     4 */
        const struct proto_ops  *  ops;                  /*    20     4 */
        int                        capability;           /*    24     4 */
        char                       no_check;             /*    28     1 */
        unsigned char              flags;                /*    29     1 */
}; /* size: 32, sum members: 28, holes: 1, sum holes: 2, padding: 2 */

So that we can kill that hole, protocol can only go all the way to 255 (RAW).

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:55 -08:00
Arnaldo Carvalho de Melo
46ca5f5dc4 [XFRM]: Pack struct xfrm_policy
[acme@newtoy net-2.6.20]$ pahole net/ipv4/tcp.o xfrm_policy
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/security.h:67 */
struct xfrm_policy {
        struct xfrm_policy *       next;                 /*     0     4 */
        struct hlist_node          bydst;                /*     4     8 */
        struct hlist_node          byidx;                /*    12     8 */
        rwlock_t                   lock;                 /*    20    36 */
        atomic_t                   refcnt;               /*    56     4 */
        struct timer_list          timer;                /*    60    24 */
        u8                         type;                 /*    84     1 */

        /* XXX 3 bytes hole, try to pack */

        u32                        priority;             /*    88     4 */
        u32                        index;                /*    92     4 */
        struct xfrm_selector       selector;             /*    96    56 */
        struct xfrm_lifetime_cfg   lft;                  /*   152    64 */
        struct xfrm_lifetime_cur   curlft;               /*   216    32 */
        struct dst_entry *         bundles;              /*   248     4 */
        __u16                      family;               /*   252     2 */
        __u8                       action;               /*   254     1 */
        __u8                       flags;                /*   255     1 */
        __u8                       dead;                 /*   256     1 */
        __u8                       xfrm_nr;              /*   257     1 */

        /* XXX 2 bytes hole, try to pack */

        struct xfrm_sec_ctx *      security;             /*   260     4 */
        struct xfrm_tmpl           xfrm_vec[6];          /*   264   360 */
}; /* size: 624, sum members: 619, holes: 2, sum holes: 5 */

So lets have just one hole instead of two, by moving 'type' to just before 'action',
end result:

[acme@newtoy net-2.6.20]$ codiff -s /tmp/tcp.o.before net/ipv4/tcp.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp.c:
  struct xfrm_policy |   -4
 1 struct changed
[acme@newtoy net-2.6.20]$

[acme@newtoy net-2.6.20]$ pahole -c 64 net/ipv4/tcp.o xfrm_policy
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/security.h:67 */
struct xfrm_policy {
        struct xfrm_policy *       next;                 /*     0     4 */
        struct hlist_node          bydst;                /*     4     8 */
        struct hlist_node          byidx;                /*    12     8 */
        rwlock_t                   lock;                 /*    20    36 */
        atomic_t                   refcnt;               /*    56     4 */
        struct timer_list          timer;                /*    60    24 */
        u32                        priority;             /*    84     4 */
        u32                        index;                /*    88     4 */
        struct xfrm_selector       selector;             /*    92    56 */
        struct xfrm_lifetime_cfg   lft;                  /*   148    64 */
        struct xfrm_lifetime_cur   curlft;               /*   212    32 */
        struct dst_entry *         bundles;              /*   244     4 */
        u16                        family;               /*   248     2 */
        u8                         type;                 /*   250     1 */
        u8                         action;               /*   251     1 */
        u8                         flags;                /*   252     1 */
        u8                         dead;                 /*   253     1 */
        u8                         xfrm_nr;              /*   254     1 */

        /* XXX 1 byte hole, try to pack */

        struct xfrm_sec_ctx *      security;             /*   256     4 */
        struct xfrm_tmpl           xfrm_vec[6];          /*   260   360 */
}; /* size: 620, sum members: 619, holes: 1, sum holes: 1 */

Are there any fugly data dependencies here? None that I know.

In the process changed the removed the __ prefixed types, that are just for
userspace visible headers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:48 -08:00
Arnaldo Carvalho de Melo
850db6b8c5 [INET_CONNECTION_SOCK]: Pack struct inet_connection_sock_af_ops
We have a hole in:

[acme@newtoy net-2.6.20]$ pahole net/ipv6/tcp_ipv6.o inet_connection_sock_af_ops
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/net/inet_connection_sock.h:38 */
struct inet_connection_sock_af_ops {
        int                        (*queue_xmit)();      /*     0     4 */
        void                       (*send_check)();      /*     4     4 */
        int                        (*rebuild_header)();  /*     8     4 */
        int                        (*conn_request)();    /*    12     4 */
        struct sock *              (*syn_recv_sock)();   /*    16     4 */
        int                        (*remember_stamp)();  /*    20     4 */
        __u16                      net_header_len;       /*    24     2 */

        /* XXX 2 bytes hole, try to pack */

        int                        (*setsockopt)();      /*    28     4 */
        int                        (*getsockopt)();      /*    32     4 */
        int                        (*compat_setsockopt)(); /*    36     4 */
        int                        (*compat_getsockopt)(); /*    40     4 */
        void                       (*addr2sockaddr)();   /*    44     4 */
        int                        sockaddr_len;         /*    48     4 */
}; /* size: 52, sum members: 50, holes: 1, sum holes: 2 */

But we don't need sockaddr_len to be an int:

[acme@newtoy net-2.6.20]$ find net -name "*.[ch]" | xargs grep '\.sockaddr_len.\+=' | sort -u
net/dccp/ipv4.c:        .sockaddr_len      = sizeof(struct sockaddr_in),
net/dccp/ipv6.c:        .sockaddr_len      = sizeof(struct sockaddr_in6),
net/ipv4/tcp_ipv4.c:    .sockaddr_len      = sizeof(struct sockaddr_in),
net/ipv6/tcp_ipv6.c:    .sockaddr_len      = sizeof(struct sockaddr_in6),
net/sctp/ipv6.c:        .sockaddr_len      = sizeof(struct sockaddr_in6),
net/sctp/protocol.c:    .sockaddr_len      = sizeof(struct sockaddr_in),

[acme@newtoy net-2.6.20]$ pahole --sizes net/ipv6/tcp_ipv6.o | grep sockaddr_in
struct sockaddr_in: 16 0
struct sockaddr_in6: 28 0
[acme@newtoy net-2.6.20]$

So I turned sockaddr_len a 'u16', and now:

[acme@newtoy net-2.6.20]$ pahole net/ipv6/tcp_ipv6.o inet_connection_sock_af_ops
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/net/inet_connection_sock.h:38 */
struct inet_connection_sock_af_ops {
        int            (*queue_xmit)();        /*     0   4 */
        void           (*send_check)();        /*     4   4 */
        int            (*rebuild_header)();    /*     8   4 */
        int            (*conn_request)();      /*    12   4 */
        struct sock *  (*syn_recv_sock)();     /*    16   4 */
        int            (*remember_stamp)();    /*    20   4 */
        u16            net_header_len;         /*    24   2 */
        u16            sockaddr_len;           /*    26   2 */
        int            (*setsockopt)();        /*    28   4 */
        int            (*getsockopt)();        /*    32   4 */
        int            (*compat_setsockopt)(); /*    36   4 */
        int            (*compat_getsockopt)(); /*    40   4 */
        void           (*addr2sockaddr)();     /*    44   4 */
}; /* size: 48 */

So we've saved 4 bytes:

[acme@newtoy net-2.6.20]$ codiff -sV /tmp/tcp_ipv6.o.before net/ipv6/tcp_ipv6.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv6/tcp_ipv6.c:
  struct inet_connection_sock_af_ops |   -4
    net_header_len;
     from: __u16                 /*    24(0)     2(0) */
     to:   u16                   /*    24(0)     2(0) */
    sockaddr_len;
     from: int                   /*    48(0)     4(0) */
     to:   u16                   /*    26(0)     2(0) */
 1 struct changed
[acme@newtoy net-2.6.20]$

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:46 -08:00
Gerrit Renker
4c0a6cb0db [UDP(-Lite)]: consolidate v4 and v6 get|setsockopt code
This patch consolidates set/getsockopt code between UDP(-Lite) v4 and 6. The
justification is that UDP(-Lite) is a transport-layer protocol and therefore
the socket option code (at least in theory) should be AF-independent.

Furthermore, there is the following code reduplication:
 * do_udp{,v6}_getsockopt is 100% identical between v4 and v6
 * do_udp{,v6}_setsockopt is identical up to the following differerence
	--v4 in contrast to v4 additionally allows the experimental encapsulation
          types  UDP_ENCAP_ESPINUDP and UDP_ENCAP_ESPINUDP_NON_IKE
	--the remainder is identical between v4 and v6
   I believe that this difference is of little relevance.

The advantages in not duplicating twice almost completely identical code.

The patch further simplifies the interface of udp{,v6}_push_pending_frames,
since for the second argument (struct udp_sock *up) it always holds that
up = udp_sk(sk); where sk is the first function argument.

Signed-off-by: Gerrit Renker  <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:45 -08:00
Thomas Graf
4a89c2562c [DECNET] address: Convert to new netlink interface
Extends the netlink interface to support the __le16 type and
converts address addition, deletion and, dumping to use the
new netlink interface.

Fixes multiple occasions of possible illegal memory references
due to not validated netlink attributes.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:30 -08:00
Al Viro
66c6f529c3 [NET]: net/sched annotations.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:27:19 -08:00
Al Viro
8e5200f540 [NET]: Fix assorted misannotations (from md5 and udplite merges).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:27:16 -08:00
Al Viro
2178eda826 [SCTP]: SCTP_CMD_PROCESS_CTSN annotations.
argument passed as __be32

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:27:14 -08:00
Al Viro
3dbe86566e [SCTP]: Annotate ->supported_addrs().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:27:11 -08:00
Al Viro
e1857ea28d [SCTP]: sctp_association ->peer.i is a host-endian analog of sctp_inthdr.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:27:10 -08:00
Al Viro
6fbfa9f951 [SCTP]: Annotate ->inaddr_any().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:27:08 -08:00
Al Viro
c9c938cb05 [SCTP]: flip_to_{h,n}() are not needed anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:27:07 -08:00
Al Viro
516b20ee2d [SCTP]: ->a_h is gone now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:27:05 -08:00
Al Viro
74af924ab6 [SCTP]: ->a_h is gone now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:27:00 -08:00
Al Viro
80f15d6241 [SCTP]: ->source_h is not used anymore.
kill it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:57 -08:00
Al Viro
a926626893 [SCTP]: Switch all remaining users of ->saddr_h to ->saddr.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:56 -08:00
Al Viro
dd86d136f9 [SCTP]: Switch ->from_addr_param() to net-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:48 -08:00
Al Viro
854d43a465 [SCTP]: Annotate ->dst_saddr()
switched to taking a pointer to net-endian sctp_addr
and a net-endian port number.  Instances and callers
adjusted; interestingly enough, the only calls are
direct calls of specific instances - the method is not
used at all.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:35 -08:00
Al Viro
2a6fd78ade [SCTP] embedded sctp_addr: net-endian mirrors
Add sctp_chunk->source, sctp_sockaddr_entry->a, sctp_transport->ipaddr
and sctp_transport->saddr, maintain them as net-endian mirrors of
their host-endian counterparts.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:30 -08:00
Al Viro
09ef7fecea [SCTP]: Beginning of conversion to net-endian for embedded sctp_addr.
Part 1: rename sctp_chunk->source, sctp_sockaddr_entry->a,
sctp_transport->ipaddr and sctp_transport->saddr (to ..._h)

The next patch will reintroduce these fields and keep them as
net-endian mirrors of the original (renamed) ones.  Split in
two patches to make sure that we hadn't forgotten any instanes.

Later in the series we'll eliminate uses of host-endian variants
(basically switching users to net-endian counterparts as we
progress through that mess).  Then host-endian ones will die.

Other embedded host-endian sctp_addr will be easier to switch
directly, so we leave them alone for now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:29 -08:00
Al Viro
04afd8b282 [SCTP]: Beginning of sin_port fixes.
That's going to be a long series.  Introduced temporary helpers
doing copy-and-convert for sctp_addr; they are used to kill
flip-in-place in global data structures and will be used
to gradually push host-endian uses of sctp_addr out of existence.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:24 -08:00
Al Viro
dbc16db1e5 [SCTP]: Trivial sctp endianness annotations.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:23 -08:00
Al Viro
72f17e1c09 [SCTP]: Annotate tsn_dups.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:22 -08:00
Al Viro
dc251b2b1c [SCTP]: SCTP_CMD_INIT_FAILED annotations.
argument stored for SCTP_CMD_INIT_FAILED is always __be16
(protocol error).  Introduced new field and accessor for
it (SCTP_PERR()); switched to their use (from SCTP_U32() and
.u32)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:20 -08:00
Al Viro
63706c5c6f [SCTP]: sctp_make_op_error() annotations.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:18 -08:00
Al Viro
5bf2db0390 [SCTP]: Annotate sctp_init_cause().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:17 -08:00
Adrian Bunk
89c8945815 [IPV6] net/ipv6/sit.c: make 2 functions static
This patch makes two needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:26:15 -08:00
Paul Moore
c6b1677a54 NetLabel: use the correct CIPSOv4 MLS label limits
The CIPSOv4 engine currently has MLS label limits which are slightly larger
than what the draft allows.  This is not a major problem due to the current
implementation but we should fix this so it doesn't bite us later.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-12-02 21:24:12 -08:00
Paul Moore
701a90bad9 NetLabel: make netlbl_lsm_secattr struct easier/quicker to understand
The existing netlbl_lsm_secattr struct required the LSM to check all of the
fields to determine if any security attributes were present resulting in a lot
of work in the common case of no attributes.  This patch adds a 'flags' field
which is used to indicate which attributes are present in the structure; this
should allow the LSM to do a quick comparison to determine if the structure
holds any security attributes.

Example:

 if (netlbl_lsm_secattr->flags)
	/* security attributes present */
 else
	/* NO security attributes present */

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-12-02 21:24:07 -08:00
Paul Moore
c6fa82a9dd NetLabel: change netlbl_secattr_init() to return void
The netlbl_secattr_init() function would always return 0 making it pointless
to have a return value.  This patch changes the function to return void.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-12-02 21:24:06 -08:00
Paul Moore
1f758d9354 NetLabel: use gfp_t instead of int where it makes sense
There were a few places in the NetLabel code where the int type was being used
instead of the gfp_t type, this patch corrects this mistake.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-12-02 21:24:04 -08:00
Arnaldo Carvalho de Melo
58a5a7b955 [NET]: Conditionally use bh_lock_sock_nested in sk_receive_skb
Spotted by Ian McDonald, tentatively fixed by Gerrit Renker:

http://www.mail-archive.com/dccp%40vger.kernel.org/msg00599.html

Rewritten not to unroll sk_receive_skb, in the common case, i.e. no lock
debugging, its optimized away.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:23:51 -08:00
David S. Miller
6bb100b9fc [UDPLite]: udplite.h needs ip6_checksum.h
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:48 -08:00
Al Viro
f9214b2627 [NET]: ipvs checksum annotations.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:41 -08:00
Al Viro
5c78f275e6 [NET]: IP header modifier helpers annotations.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:40 -08:00
Al Viro
f6ab028804 [NET]: Make mangling a checksum (0 -> 0xffff on the wire) explicit.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:39 -08:00
Al Viro
b51655b958 [NET]: Annotate __skb_checksum_complete() and friends.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:38 -08:00
Al Viro
b1550f2212 [NET]: Annotate ip_vs_checksum_complete() and callers.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:37 -08:00
Al Viro
5084205faf [NET]: Annotate callers of csum_partial_copy_...() and csum_and_copy...() in net/*
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:33 -08:00
Al Viro
868c86bcb5 [NET]: annotate csum_ipv6_magic() callers in net/*
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:31 -08:00
Al Viro
6b11687ef0 [NET]: Annotate csum_tcpudp_magic() callers in net/*
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:29 -08:00
Al Viro
d6f5493c1a [NET]: Annotate callers of csum_tcpudp_nofold() in net/*
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:28 -08:00
Al Viro
56649d5d3c [NET]: Generic checksum annotations and cleanups.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:25 -08:00
Al Viro
30d492da73 [ATM]: Annotations.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:55 -08:00
Al Viro
ef296f56f8 [IPV6]: __ipv6_addr_diff() annotations and cleanup.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:53 -08:00
Al Viro
e69a4adc66 [IPV6]: Misc endianness annotations.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:52 -08:00
Al Viro
714e85be35 [IPV6]: Assorted trivial endianness annotations.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:50 -08:00
Al Viro
448c31aa34 [IRDA]: Trivial annotations.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:48 -08:00
Gerrit Renker
ba4e58eca8 [NET]: Supporting UDP-Lite (RFC 3828) in Linux
This is a revision of the previously submitted patch, which alters
the way files are organized and compiled in the following manner:

	* UDP and UDP-Lite now use separate object files
	* source file dependencies resolved via header files
	  net/ipv{4,6}/udp_impl.h
	* order of inclusion files in udp.c/udplite.c adapted
	  accordingly

[NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828)

This patch adds support for UDP-Lite to the IPv4 stack, provided as an
extension to the existing UDPv4 code:
        * generic routines are all located in net/ipv4/udp.c
        * UDP-Lite specific routines are in net/ipv4/udplite.c
        * MIB/statistics support in /proc/net/snmp and /proc/net/udplite
        * shared API with extensions for partial checksum coverage

[NET/IPv6]: Extension for UDP-Lite over IPv6

It extends the existing UDPv6 code base with support for UDP-Lite
in the same manner as per UDPv4. In particular,
        * UDPv6 generic and shared code is in net/ipv6/udp.c
        * UDP-Litev6 specific extensions are in net/ipv6/udplite.c
        * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6
        * support for IPV6_ADDRFORM
        * aligned the coding style of protocol initialisation with af_inet6.c
        * made the error handling in udpv6_queue_rcv_skb consistent;
          to return `-1' on error on all error cases
        * consolidation of shared code

[NET]: UDP-Lite Documentation and basic XFRM/Netfilter support

The UDP-Lite patch further provides
        * API documentation for UDP-Lite
        * basic xfrm support
        * basic netfilter support for IPv4 and IPv6 (LOG target)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:46 -08:00
Thomas Graf
17c157c889 [GENL]: Add genlmsg_put_reply() to simplify building reply headers
By modyfing genlmsg_put() to take a genl_family and by adding
genlmsg_put_reply() the process of constructing the netlink
and generic netlink headers is simplified.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:42 -08:00
Thomas Graf
81878d27fd [GENL]: Add genlmsg_reply() to simply unicast replies to requests
A generic netlink user has no interest in knowing how to
address the source of the original request.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:41 -08:00
Thomas Graf
3dabc71578 [GENL]: Add genlmsg_new() to allocate generic netlink messages
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:40 -08:00
YOSHIFUJI Hideaki
cfb6eeb4c8 [TCP]: MD5 Signature Option (RFC2385) support.
Based on implementation by Rick Payne.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:39 -08:00
Gerrit Renker
b9df3cb8cf [TCP/DCCP]: Introduce net_xmit_eval
Throughout the TCP/DCCP (and tunnelling) code, it often happens that the
return code of a transmit function needs to be tested against NET_XMIT_CN
which is a value that does not indicate a strict error condition.

This patch uses a macro for these recurring situations which is consistent
with the already existing macro net_xmit_errno, saving on duplicated code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:22:27 -08:00
Thomas Graf
339bf98ffc [NETLINK]: Do precise netlink message allocations where possible
Account for the netlink message header size directly in nlmsg_new()
instead of relying on the caller calculate it correctly.

Replaces error handling of message construction functions when
constructing notifications with bug traps since a failure implies
a bug in calculating the size of the skb.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:11 -08:00
YOSHIFUJI Hideaki
a11d206d0f [IPV6]: Per-interface statistics support.
For IP MIB (RFC4293).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-12-02 21:22:08 -08:00
YOSHIFUJI Hideaki
7a3025b1b3 [IPV6]: Introduce ip6_dst_idev() to get inet6_dev{} stored in dst_entry{}.
Otherwise, we will see a lot of casts...

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-12-02 21:22:07 -08:00
David S. Miller
931731123a [TCP]: Don't set SKB owner in tcp_transmit_skb().
The data itself is already charged to the SKB, doing
the skb_set_owner_w() just generates a lot of noise and
extra atomics we don't really need.

Lmbench improvements on lat_tcp are minimal:

before:
TCP latency using localhost: 23.2701 microseconds
TCP latency using localhost: 23.1994 microseconds
TCP latency using localhost: 23.2257 microseconds

after:
TCP latency using localhost: 22.8380 microseconds
TCP latency using localhost: 22.9465 microseconds
TCP latency using localhost: 22.8462 microseconds

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:52 -08:00
Stephen Hemminger
ce7bc3bf15 [TCP]: Restrict congestion control choices.
Allow normal users to only choose among a restricted set of congestion
control choices.  The default is reno and what ever has been configured
as default. But the policy can be changed by administrator at any time.

For example, to allow any choice:
    cp /proc/sys/net/ipv4/tcp_available_congestion_control \
       /proc/sys/net/ipv4/tcp_allowed_congestion_control

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:49 -08:00
Stephen Hemminger
3ff825b28d [TCP]: Add tcp_available_congestion_control sysctl.
Create /proc/sys/net/ipv4/tcp_available_congestion_control
that reflects currently available TCP choices.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:48 -08:00
Vlad Yasevich
b68dbcab1d [SCTP]: Fix warning
An alternate solution would be to make the digest a pointer, allocate
it in sctp_endpoint_init() and free it in sctp_endpoint_destroy().

I guess I should have originally done it this way...

  CC [M]  net/sctp/sm_make_chunk.o
net/sctp/sm_make_chunk.c: In function 'sctp_unpack_cookie':
net/sctp/sm_make_chunk.c:1358: warning: initialization discards qualifiers from pointer target type

The reason is that sctp_unpack_cookie() takes a const struct
sctp_endpoint and modifies the digest in it (digest being embedded in
the struct, not a pointer).  Make digest a pointer to fix this
warning.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:47 -08:00
Eric Dumazet
72a3effaf6 [NET]: Size listen hash tables using backlog hint
We currently allocate a fixed size (TCP_SYNQ_HSIZE=512) slots hash table for
each LISTEN socket, regardless of various parameters (listen backlog for
example)

On x86_64, this means order-1 allocations (might fail), even for 'small'
sockets, expecting few connections. On the contrary, a huge server wanting a
backlog of 50000 is slowed down a bit because of this fixed limit.

This patch makes the sizing of listen hash table a dynamic parameter,
depending of :
- net.core.somaxconn tunable (default is 128)
- net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128)
- backlog value given by user application  (2nd parameter of listen())

For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of
kmalloc().

We still limit memory allocation with the two existing tunables (somaxconn &
tcp_max_syn_backlog). So for standard setups, this patch actually reduce RAM
usage.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:44 -08:00
Thomas Graf
1f6c9557e8 [NET] rules: Share common attribute validation policy
Move the attribute policy for the non-specific attributes into
net/fib_rules.h and include it in the respective protocols.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:41 -08:00
Thomas Graf
b8964ed9fa [NET] rules: Protocol independant mark selector
Move mark selector currently implemented per protocol into
the protocol independant part.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:41 -08:00
Thomas Graf
5f300893fd [IPV4] nl_fib_lookup: Rename fl_fwmark to fl_mark
For the sake of consistency.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:40 -08:00
Thomas Graf
47dcf0cb10 [NET]: Rethink mark field in struct flowi
Now that all protocols have been made aware of the mark
field it can be moved out of the union thus simplyfing
its usage.

The config options in the IPv4/IPv6/DECnet subsystems
to enable respectively disable mark based routing only
obfuscate the code with ifdefs, the cost for the
additional comparison in the flow key is insignificant,
and most distributions have all these options enabled
by default anyway. Therefore it makes sense to remove
the config options and enable mark based routing by
default.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:39 -08:00
Andrew Morton
776810217a [XFRM]: uninline xfrm_selector_match()
Six callsites, huge.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:36 -08:00
Peter Zijlstra
fcc70d5fdc [BLUETOOTH] lockdep: annotate sk_lock nesting in AF_BLUETOOTH
=============================================
[ INFO: possible recursive locking detected ]
2.6.18-1.2726.fc6 #1
2006-12-02 21:21:35 -08:00
Venkat Yekkirala
6b877699c6 SELinux: Return correct context for SO_PEERSEC
Fix SO_PEERSEC for tcp sockets to return the security context of
the peer (as represented by the SA from the peer) as opposed to the
SA used by the local/source socket.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-12-02 21:21:33 -08:00
Al Viro
6ba9c755e5 [BLUETOOTH]: rfcomm endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:29 -08:00
Al Viro
3fbd418acc [LLC]: anotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:23 -08:00
Al Viro
fede70b986 [IPV6]: annotate inet6_csk_search_req()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:22 -08:00
Al Viro
90bcaf7b4a [IPV6]: flowlabels are net-endian
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:21 -08:00
Al Viro
92d9ece7af [INET]: annotate inet_ecn.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:20 -08:00
Al Viro
8a9ae2110b [NET]: annotate dsfield.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:19 -08:00
Al Viro
5d36b1803d [XFRM]: annotate ->new_mapping()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:18 -08:00
Al Viro
44473a6b27 [IPV6]: annotate struct frag_hdr
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:14 -08:00
Al Viro
04ce69093f [IPV6]: 'info' argument of ipv6 ->err_handler() is net-endian
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:12 -08:00
Al Viro
8c689a6eae [XFRM]: misc annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:11 -08:00
Al Viro
d2ecd9ccd0 [IPV6]: annotate inet6_hashtables
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:10 -08:00
Al Viro
5a874db4d9 [NET]: ipconfig and nfsroot annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:09 -08:00
Al Viro
3e6c8cd566 [TIPC]: endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:08 -08:00
Larry Finger
837925df02 [PATCH] ieee80211: Drop and count duplicate data frames to remove 'replay detected' log messages
In the SoftMAC version of the IEEE 802.11 stack, not all duplicate messages are
detected. For the most part, there is no difficulty; however for TKIP and CCMP
encryption, the duplicates result in a "replay detected" log message where the
received and previous values of the TSC are identical. This change adds a new
variable to the ieee80211_device structure that holds the 'seq_ctl' value for
the previous frame. When a new frame repeats the value, the frame is dropped and
the appropriate counter is incremented.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-12-02 00:11:57 -05:00
Daniel Drake
c9308b06c0 [PATCH] ieee80211: Move IV/ICV stripping into ieee80211_rx
This patch adds a host_strip_iv_icv flag to ieee80211 which indicates that
ieee80211_rx should strip the IV/ICV/other security features from the payload.
This saves on some memmove() calls in the driver and seems like something that
belongs in the stack as it can be used by bcm43xx, ipw2200, and zd1211rw

I will submit the ipw2200 patch separately as it needs testing.

This patch also adds some sensible variable reuse (idx vs keyidx) in
ieee80211_rx

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-12-02 00:11:56 -05:00
Paul Bonser
dc9b334622 [NET]: Re-fix of doc-comment in sock.h
Restoring old, correct comment for sk_filter_release, moving it to
where it should actually be, and changing new comment into proper
comment for sk_filter_rcu_free, where it actually makes sense.

The original fix submitted for this on Oct 23 mistakenly documented
the wrong function.

Signed-off-by: Paul Bonser <misterpib@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-25 15:16:51 -08:00
David Howells
c4028958b6 WorkStruct: make allyesconfig
Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:57:56 +00:00
David Howells
65f27f3844 WorkStruct: Pass the work_struct pointer instead of context data
Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct.  This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function.  This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated..  This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems.  But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default.  Special
initiators exist for the non-auto-release case (ending in _NAR).


Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:55:48 +00:00
Simon Horman
da413908d5 [IPVS]: Compile fix for annotations in userland.
This change makes __beXX available to user-space applications, such as
ipvsadm, which include ip_vs.h

Signed-Off-By: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-09 20:00:55 -08:00
Al Viro
95026cd242 [IPV6]: Fix ECN bug on big-endian
__constant_htons(2<<4) is not a replacement for
htonl(2<<20).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-05 14:11:26 -08:00
Al Viro
02e60370d4 [IPX]: Annotate and fix IPX checksum
Calculation of IPX checksum got buggered about 2.4.0.  The old variant
mangled the packet; that got fixed, but calculation itself got buggered.
Restored the correct logics, fixed a subtle breakage we used to have even
back then: if the sum is 0 mod 0xffff, we want to return 0, not 0xffff.
The latter has special meaning for IPX (cheksum disabled).  Observation
(and obvious fix) nicked from history of FreeBSD ipx_cksum.c...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-05 14:11:25 -08:00
Al Viro
4833ed0940 [IPX]: Trivial parts of endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-05 14:11:24 -08:00
Randy Dunlap
6a43487f43 [NET]: kernel-doc fix for sock.h
Fix kernel-doc warning in include/net/sock.h:
Warning(/var/linsrc/linux-2619-rc1-pv//include/net/sock.h:894): No description found for parameter 'rcu'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-22 20:38:00 -07:00
Eric Dumazet
185b1aa122 [NET]: Reduce sizeof(struct flowi) by 20 bytes.
As suggested by David, just kill off some unused fields in dnports to
reduce sizef(struct flowi). If they come back, they should be moved to
nl_u.dn_u in order not to enlarge again struct flowi

[ Modified to really delete this stuff instead of using #if 0. -DaveM ]

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-21 20:24:01 -07:00
Jeff Garzik
cde49b0584 Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes 2006-10-21 14:21:11 -04:00
Eric Dumazet
78d7942317 [IPV4] inet_peer: Group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS
Lot of routers/embedded devices still use CPUS with 16/32 bytes cache
lines.  (486, Pentium, ...  PIII) It makes sense to group together
fields used at lookup time so they fit in one cache line.  This reduce
cache footprint and speedup lookups.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-20 00:28:35 -07:00
Thomas Graf
b52f070c9c [IPv4] fib: Remove unused fib_config members
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-18 20:26:36 -07:00
Ville Nuorvala
e320af1df4 [IPV6]: Remove struct pol_chain.
Struct pol_chain has existed since at least the 2.2 kernel, but isn't used
anymore. As the IPv6 policy routing is implemented in a totally different
way in the current kernel, just get rid of it.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-18 19:55:25 -07:00
Michael Buesch
7c28ad2d83 [PATCH] softmac: Fix WX and association related races
This fixes some race conditions in the WirelessExtension
handling and association handling code.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-10-16 20:09:47 -04:00
Marcel Holtmann
4c67bc74f0 [Bluetooth] Support concurrent connect requests
Most Bluetooth chips don't support concurrent connect requests, because
this would involve a multiple baseband page with only one radio. In the
case an upper layer like L2CAP requests a concurrent connect these chips
return the error "Command Disallowed" for the second request. If this
happens it the responsibility of the Bluetooth core to queue the request
and try again after the previous connect attempt has been completed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2006-10-15 23:14:30 -07:00
Eric Dumazet
4663afe2c8 [NET]: reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()
1) shrink struct inet_peer on 64 bits platforms.
2006-10-15 23:14:17 -07:00
Al Viro
645408d1ff [PATCH] gfp_t in netlabel
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-15 11:00:58 -07:00
YOSHIFUJI Hideaki
42b6785eeb [NET]: Introduce protocol-specific destructor for time-wait sockets.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-12 00:00:00 -07:00
Vlad Yasevich
331c4ee7fa [SCTP]: Fix receive buffer accounting.
When doing receiver buffer accounting, we always used skb->truesize.
This is problematic when processing bundled DATA chunks because for
every DATA chunk that could be small part of one large skb, we would
charge the size of the entire skb.  The new approach is to store the
size of the DATA chunk we are accounting for in the sctp_ulpevent
structure and use that stored value for accounting.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-11 23:59:44 -07:00
Venkat Yekkirala
5b368e61c2 IPsec: correct semantics for SELinux policy matching
Currently when an IPSec policy rule doesn't specify a security
context, it is assumed to be "unlabeled" by SELinux, and so
the IPSec policy rule fails to match to a flow that it would
otherwise match to, unless one has explicitly added an SELinux
policy rule allowing the flow to "polmatch" to the "unlabeled"
IPSec policy rules. In the absence of such an explicitly added
SELinux policy rule, the IPSec policy rule fails to match and
so the packet(s) flow in clear text without the otherwise applicable
xfrm(s) applied.

The above SELinux behavior violates the SELinux security notion of
"deny by default" which should actually translate to "encrypt by
default" in the above case.

This was first reported by Evgeniy Polyakov and the way James Morris
was seeing the problem was when connecting via IPsec to a
confined service on an SELinux box (vsftpd), which did not have the
appropriate SELinux policy permissions to send packets via IPsec.

With this patch applied, SELinux "polmatching" of flows Vs. IPSec
policy rules will only come into play when there's a explicit context
specified for the IPSec policy rule (which also means there's corresponding
SELinux policy allowing appropriate domains/flows to polmatch to this context).

Secondly, when a security module is loaded (in this case, SELinux), the
security_xfrm_policy_lookup() hook can return errors other than access denied,
such as -EINVAL.  We were not handling that correctly, and in fact
inverting the return logic and propagating a false "ok" back up to
xfrm_lookup(), which then allowed packets to pass as if they were not
associated with an xfrm policy.

The solution for this is to first ensure that errno values are
correctly propagated all the way back up through the various call chains
from security_xfrm_policy_lookup(), and handled correctly.

Then, flow_cache_lookup() is modified, so that if the policy resolver
fails (typically a permission denied via the security module), the flow
cache entry is killed rather than having a null policy assigned (which
indicates that the packet can pass freely).  This also forces any future
lookups for the same flow to consult the security module (e.g. SELinux)
for current security policy (rather than, say, caching the error on the
flow cache entry).

This patch: Fix the selinux side of things.

This makes sure SELinux polmatching of flow contexts to IPSec policy
rules comes into play only when an explicit context is associated
with the IPSec policy rule.

Also, this no longer defaults the context of a socket policy to
the context of the socket since the "no explicit context" case
is now handled properly.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-10-11 23:59:37 -07:00
James Morris
134b0fc544 IPsec: propagate security module errors up from flow_cache_lookup
When a security module is loaded (in this case, SELinux), the
security_xfrm_policy_lookup() hook can return an access denied permission
(or other error).  We were not handling that correctly, and in fact
inverting the return logic and propagating a false "ok" back up to
xfrm_lookup(), which then allowed packets to pass as if they were not
associated with an xfrm policy.

The way I was seeing the problem was when connecting via IPsec to a
confined service on an SELinux box (vsftpd), which did not have the
appropriate SELinux policy permissions to send packets via IPsec.

The first SYNACK would be blocked, because of an uncached lookup via
flow_cache_lookup(), which would fail to resolve an xfrm policy because
the SELinux policy is checked at that point via the resolver.

However, retransmitted SYNACKs would then find a cached flow entry when
calling into flow_cache_lookup() with a null xfrm policy, which is
interpreted by xfrm_lookup() as the packet not having any associated
policy and similarly to the first case, allowing it to pass without
transformation.

The solution presented here is to first ensure that errno values are
correctly propagated all the way back up through the various call chains
from security_xfrm_policy_lookup(), and handled correctly.

Then, flow_cache_lookup() is modified, so that if the policy resolver
fails (typically a permission denied via the security module), the flow
cache entry is killed rather than having a null policy assigned (which
indicates that the packet can pass freely).  This also forces any future
lookups for the same flow to consult the security module (e.g. SELinux)
for current security policy (rather than, say, caching the error on the
flow cache entry).

Signed-off-by: James Morris <jmorris@namei.org>
2006-10-11 23:59:34 -07:00
paul.moore@hp.com
ffb733c650 NetLabel: fix a cache race condition
Testing revealed a problem with the NetLabel cache where a cached entry could
be freed while in use by the LSM layer causing an oops and other problems.
This patch fixes that problem by introducing a reference counter to the cache
entry so that it is only freed when it is no longer in use.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-10-11 23:59:29 -07:00
Dave Jones
038b0a6d8d Remove all inclusions of <linux/config.h>
kbuild explicitly includes this at build time.

Signed-off-by: Dave Jones <davej@redhat.com>
2006-10-04 03:38:54 -04:00
Balbir Singh
17db952cd1 [PATCH] Add genetlink utilities for payload length calculation
Add two utility helper functions genlmsg_msg_size() and genlmsg_total_size().
These functions are derived from their netlink counterparts.

Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jamal Hadi <hadi@cyberus.ca>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:29 -07:00
Badari Pulavarty
027445c372 [PATCH] Vectorize aio_read/aio_write fileop methods
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00
Paul Moore
95d4e6be25 [NetLabel]: audit fixups due to delayed feedback
Fix some issues Steve Grubb had with the way NetLabel was using the audit
subsystem.  This should make NetLabel more consistent with other kernel
generated audit messages specifying configuration changes.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-29 17:05:05 -07:00
Paul Moore
32f50cdee6 [NetLabel]: add audit support for configuration changes
This patch adds audit support to NetLabel, including six new audit message
types shown below.

 #define AUDIT_MAC_UNLBL_ACCEPT 1406
 #define AUDIT_MAC_UNLBL_DENY   1407
 #define AUDIT_MAC_CIPSOV4_ADD  1408
 #define AUDIT_MAC_CIPSOV4_DEL  1409
 #define AUDIT_MAC_MAP_ADD      1410
 #define AUDIT_MAC_MAP_DEL      1411

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:03:09 -07:00
Al Viro
014d730d56 [IPVS]: ipvs annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:03:04 -07:00
Al Viro
d77072ecfb [NET]: Annotate dst_ops protocol
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:58 -07:00
Samuel Ortiz
1b0fee7d68 [IrDA]: Memory allocations cleanups
This patch replaces the bunch of arbitrary 64 and 128 bytes alloc_skb() calls
with more accurate allocation sizes.

Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:48 -07:00
Al Viro
4324a17430 [XFRM]: fl_ipsec_spi is net-endian
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:43 -07:00
Al Viro
61f4627b2f [XFRM]: xfrm_replay_advance() annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:41 -07:00
Al Viro
a252cc2371 [XFRM]: xrfm_replay_check() annotations
seq argument is net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:40 -07:00
Al Viro
6067b2baba [XFRM]: xfrm_parse_spi() annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:39 -07:00
Al Viro
a94cfd1974 [XFRM]: xfrm_state_lookup() annotations
spi argument of xfrm_state_lookup() is net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:37 -07:00
Al Viro
26977b4ed7 [XFRM]: xfrm_alloc_spi() annotated
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:36 -07:00
Al Viro
5f19343fb1 [XFRM]: addr_match() annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:34 -07:00
Al Viro
f9d07e41f8 [XFRM]: xfrm_flowi_[sd]port() annotations
both return net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:32 -07:00
Al Viro
48818f822d [IPV6]: struct in6_addr annotations
in6_addr elements are net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:30 -07:00
Al Viro
82103232ed [IPV4]: inet_rcv_saddr() annotations
inet_rcv_saddr() returns net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:28 -07:00
Al Viro
23f33c2d4f [IPV4]: struct inet_timewait_sock annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:27 -07:00
Al Viro
fb99c848e5 [IPV4]: annotate inet_lookup() and friends
inet_lookup() annotated along with helper functions (__inet_lookup(),
__inet_lookup_established(), inet_lookup_established(),
inet_lookup_listener(), __inet_lookup_listener() and inet_ehashfn())

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:26 -07:00
Al Viro
4f765d842f [IPV4]: INET_MATCH() annotations
INET_MATCH() and friends depend on an interesting set of kludges:
	* there's a pair of adjacent fields in struct inet_sock - __be16 dport
followed by __u16 num.  We want to search by pair, so we combine the keys into
a single 32bit value and compare with 32bit value read from &...->dport.
	* on 64bit targets we combine comparisons with pair of adjacent __be32
fields in the same way.

Make sure that we don't mix those values with anything else and that pairs
we form them from have correct types.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:25 -07:00
Al Viro
81f7bf6cba [IPV4]: net/ipv4/fib annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:23 -07:00
Al Viro
6b72977bd6 [IPV4]: inet_csk_search_req() annotations
rport argument is net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:15 -07:00
Al Viro
ed9bad06ee [IPV4] net/ipv4/arp.c: trivial annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:14 -07:00
Al Viro
e11be94bf6 [IPV4]: struct inet_request_sock annotations
->port is net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:12 -07:00
Al Viro
39dccd9d92 [IPV4]: route.h annotations
ip_route_connect(), ip_route_newports() get port numbers net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:10 -07:00
Al Viro
35986b329f [IPV4]: ip_icmp_error() annotations
port is net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:09 -07:00
Al Viro
0579016ec4 [IPV4]: ip_local_error() annotations
port argument is net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:08 -07:00
Al Viro
cc939d3734 [NET]: ip ports in struct flowi are net-endian
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:07 -07:00
Al Viro
2816e1284a [IPV4]: ports in struct inet_sock are net-endian
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:06 -07:00
Al Viro
4b06a7cf2f [IPV4]: ip_local_error() ipv4 address argument annotated
daddr is net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:56 -07:00
Al Viro
c1d18f9fa0 [IPV4]: struct ipcm_cookie annotation
->addr is net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:54 -07:00
Al Viro
3ca3c68e76 [IPV4]: struct ip_options annotations
->faddr is net-endian; annotated as such, variables inferred to be net-endian
annotated.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:53 -07:00
Al Viro
7f25afbbef [IPV4]: inet_csk_search_req() (partial) annotations
raddr is net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:52 -07:00
Al Viro
adaf345b53 [IPV4]: annotate address in inet_request_sock
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:51 -07:00
Marcel Holtmann
6ac59344ef [Bluetooth] Support create connection cancel command
In case of non-blocking connects it is possible that the last user
of an ACL link quits before the connection has been fully established.
This will lead to a race condition where the internal state of a
connection is closed, but the actual link has been established and is
active. In case of Bluetooth 1.2 and later devices it is possible to
call create connection cancel to abort the connect. For older devices
the disconnect timer will be used to trigger the needed disconnect.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2006-09-28 18:01:33 -07:00
Marcel Holtmann
1143e5a6d4 [Bluetooth] Read local version information on device init
The local version information are needed to identify certain feature
sets of devices. They must be read on device init and stored for later
use. It is also possible to access them through the device model.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2006-09-28 18:01:32 -07:00
Marcel Holtmann
defc761bc2 [Bluetooth] Handle command complete event for exit periodic inquiry
The command complete event of the exit periodic inquiry command must
clear the HCI_INQUIRY flag and finish the HCI request.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2006-09-28 18:01:29 -07:00
Marcel Holtmann
0ac53939a0 [Bluetooth] Add HCI device identifier for SDIO cards
This patch assigns the next free HCI device identifier to Bluetooth
devices based on the SDIO interface.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2006-09-28 18:01:28 -07:00
Marcel Holtmann
b219e3ac66 [Bluetooth] Integrate low-level connections into the driver model
This patch integrates the low-level connections (ACL and SCO) into the
driver model. Every connection is presented as device with the parent
set to its host controller device.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2006-09-28 18:01:25 -07:00
Al Viro
13d8eaa06a [IPV4]: ip_build_and_send_pkt() annotations
saddr and daddr are net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:19 -07:00
Al Viro
8712f774dc [IPV4]: ip_options_build() annotations
daddr is net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:18 -07:00
Al Viro
d9cd66e0e5 [IPV4]: multipath_set_nhinfo() annotations
multipath_set_nhinfo() (and underlying callback) take net-endian
network and netmask.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:15 -07:00
Al Viro
80e856e16a [IPV4]: annotate addresses in fib_result and fib_result_nl
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:10 -07:00
Al Viro
53576d9b99 [IPV4]: inetpeer annotations
This one is interesting - we use net-endian value as search key, but
order the tree by *host-endian* comparisons of keys.  OK since we only
care about lookups.  Annotated inet_getpeer() and friends.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:09 -07:00
Al Viro
d878e72e41 [IPV4]: ip_fib_check_default() annotated
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:08 -07:00
Al Viro
fd68322209 [IPV4]: inet_addr_type() annotations
argument and inferred net-endian variables in callers annotated.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:07 -07:00
Al Viro
e4883014f4 [IPV4]: icmp_send() annotation
The last argument is network-endian (it will go straight into the packet).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:06 -07:00
Al Viro
6d85c10abe [IPV4]: struct fib_config IPv4 address fields annotated
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 17:54:17 -07:00
Al Viro
00012e5bb9 [IPV4]: introduce nla_get_be32()/NLA_PUT_BE32()
net-endian counterparts of nla_get_u32()/NLA_PUT_U32()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 17:54:14 -07:00
Al Viro
b83738ae00 [IPV4]: FIB_RES_PREFSRC() annotated
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 17:54:13 -07:00
Al Viro
ed49e3caaa [IPV4]: fib_hn ->nh_gw is net-endian
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 17:54:12 -07:00
Al Viro
d9c9df8c93 [IPV4]: fib_validate_source() annotations
annotated arguments and inferred net-endian variables in callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 17:54:09 -07:00
Al Viro
011a926108 [IPV4]: annotated ipv4 addresses in struct inet_sock
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 17:54:07 -07:00
Al Viro
bada8adc4e [IPV4]: ip_route_connect() ipv4 address arguments annotated
annotated address arguments (port number left alone for now); ditto
for inferred net-endian variables in callers.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 17:54:06 -07:00
Al Viro
f2c3fe2411 [IPV4]: annotate ipv4 addresses in struct rtable and struct flowi
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 17:54:05 -07:00
Al Viro
f7655229c0 [IPV4]: ip_rt_redirect() annotations
The first 4 arguments of ip_rt_redirect() are net-endian.  Annotated.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 17:54:03 -07:00
Al Viro
9e12bb22e3 [IPV4]: ip_route_input() annotations
ip_route_input() takes net-endian source and destination address.
* Annotated as such.
* arguments of its invocations annotated where needed.
* local helpers getting the same values passed to by it (ip_route_input_mc(),
ip_route_input_slow(), ip_handle_martian_source(), ip_mkroute_input(),
ip_mkroute_input_def(), __mkroute_input()) annotated

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 17:54:02 -07:00
Paul Moore
fcd4828064 [NetLabel]: rework the Netlink attribute handling (part 1)
At the suggestion of Thomas Graf, rewrite NetLabel's use of Netlink attributes
to better follow the common Netlink attribute usage.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-25 15:56:09 -07:00
Paul Moore
4fe5d5c07a [Netlink]: add nla_validate_nested()
Add a new function, nla_validate_nested(), to validate nested Netlink
attributes.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-25 15:54:03 -07:00
Paul Moore
22acb19a91 [NETLINK]: add nla_for_each_nested() to the interface list
At the top of include/net/netlink.h is a list of Netlink interfaces, however,
the nla_for_each_nested() macro was not listed.  This patch adds this interface
to the list at the top of the header file.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-25 15:53:37 -07:00
Paul Moore
14a72f53fb [NetLabel]: correct improper handling of non-NetLabel peer contexts
Fix a problem where NetLabel would always set the value of 
sk_security_struct->peer_sid in selinux_netlbl_sock_graft() to the context of
the socket, causing problems when users would query the context of the
connection.  This patch fixes this so that the value in
sk_security_struct->peer_sid is only set when the connection is NetLabel based,
otherwise the value is untouched.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-25 15:52:01 -07:00
Al Viro
1db2ea398f [PATCH] netlabel gfp annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-24 15:55:03 -07:00
Linus Torvalds
a319a2773a Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (217 commits)
  net/ieee80211: fix more crypto-related build breakage
  [PATCH] Spidernet: add ethtool -S (show statistics)
  [NET] GT96100: Delete bitrotting ethernet driver
  [PATCH] mv643xx_eth: restrict to 32-bit PPC_MULTIPLATFORM
  [PATCH] Cirrus Logic ep93xx ethernet driver
  r8169: the MMIO region of the 8167 stands behin BAR#1
  e1000, ixgb: Remove pointless wrappers
  [PATCH] Remove powerpc specific parts of 3c509 driver
  [PATCH] s2io: Switch to pci_get_device
  [PATCH] gt96100: move to pci_get_device API
  [PATCH] ehea: bugfix for register access functions
  [PATCH] e1000 disable device on PCI error
  drivers/net/phy/fixed: #if 0 some incomplete code
  drivers/net: const-ify ethtool_ops declarations
  [PATCH] ethtool: allow const ethtool_ops
  [PATCH] sky2: big endian
  [PATCH] sky2: fiber support
  [PATCH] sky2: tx pause bug fix
  drivers/net: Trim trailing whitespace
  [PATCH] ehea: IBM eHEA Ethernet Device Driver
  ...

Manually resolved conflicts in drivers/net/ixgb/ixgb_main.c and
drivers/net/sky2.c related to CHECKSUM_HW/CHECKSUM_PARTIAL changes by
commit 84fa7933a3 that just happened to be
next to unrelated changes in this update.
2006-09-24 10:15:13 -07:00
Jeff Garzik
28eb177dfa Merge branch 'master' into upstream
Conflicts:

	net/ieee80211/ieee80211_crypt_tkip.c
	net/ieee80211/ieee80211_crypt_wep.c
2006-09-22 20:10:23 -04:00
Noriaki TAKAMIYA
3b9f9a1c39 [IPV6] ADDRCONF: Mobile IPv6 Home Address support.
IFA_F_HOMEADDRESS is introduced for Mobile IPv6 Home Addresses on
Mobile Node.

The IFA_F_HOMEADDRESS flag should be set for Mobile IPv6 Home
Addresses for 2 purposes. 1) We need to check this on receipt of
Type 2 Routing Header (RFC3775 Secion 6.4), 2) We prefer Home
Address(es) in source address selection (RFC3484 Section 5 Rule 4).

Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:20:29 -07:00
YOSHIFUJI Hideaki
8814c4b533 [IPV6] ADDRCONF: Convert addrconf_lock to RCU.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:20:26 -07:00
Ville Nuorvala
62dd93181a [IPV6] NDISC: Set per-entry is_router flag in Proxy NA.
We have sent NA with router flag from the node-wide forwarding
configuration.  This is not appropriate for proxy NA, and it should be
set according to each proxy entry's configuration.

This is used by Mobile IPv6 home agent to support physical home link
in acting as a proxy router for mobile node which is not a router,
for example.

Based on MIPL2 kernel patch.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-09-22 15:20:24 -07:00
Patrick McHardy
9123de2c04 [NETFILTER]: ip6table_mangle: reroute when nfmark changes in NF_IP6_LOCAL_OUT
Now that IPv6 supports policy routing we need to reroute in NF_IP6_LOCAL_OUT
when the mark value changes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:51 -07:00
Patrick McHardy
a1e59abf82 [XFRM]: Fix wildcard as tunnel source
Hashing SAs by source address breaks templates with wildcards as tunnel
source since the source address used for hashing/lookup is still 0/0.
Move source address lookup to xfrm_tmpl_resolve_one() so we can use the
real address in the lookup.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:06 -07:00
Alexey Kuznetsov
1ef9696c90 [TCP]: Send ACKs each 2nd received segment.
It does not affect either mss-sized connections (obviously) or
connections controlled by Nagle (because there is only one small
segment in flight).

The idea is to record the fact that a small segment arrives on a
connection, where one small segment has already been received and
still not-ACKed. In this case ACK is forced after tcp_recvmsg() drains
receive buffer.

In other words, it is a "soft" each-2nd-segment ACK, which is enough
to preserve ACK clock even when ABC is enabled.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:05 -07:00
Adrian Bunk
1616436601 [SCTP]: Cleanups
This patch contains the following cleanups:
- make the following needlessly global function static:
  - socket.c: sctp_apply_peer_addr_params()
- add proper prototypes for the several global functions in
  include/net/sctp/sctp.h

Note that this fixes wrong prototypes for the following functions:
- sctp_snmp_proc_exit()
- sctp_eps_proc_exit()
- sctp_assocs_proc_exit()

The latter was spotted by the GNU C compiler and reported
by David Woodhouse.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:03 -07:00
Thomas Graf
eb328111ef [GENL]: Provide more information to userspace about registered genl families
Additionaly exports the following information when providing
the list of registered generic netlink families:
  - protocol version
  - header size
  - maximum number of attributes
  - list of available operations including
      - id
      - flags
      - avaiability of policy and doit/dumpit function

libnl HEAD provides a utility to read this new information:

	0x0010 nlctrl version 1
	    hdrsize 0 maxattr 6
	      op GETFAMILY (0x03) [POLICY,DOIT,DUMPIT]
	0x0011 NLBL_MGMT version 1
	    hdrsize 0 maxattr 0
	      op unknown (0x02) [DOIT]
	      op unknown (0x03) [DOIT]
	      ....

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:18:51 -07:00
Jamal Hadi Salim
eb878e8457 [IPSEC]: output mode to take an xfrm state as input param
Expose IPSEC modes output path to take an xfrm state as input param.
This makes it consistent with the input mode processing (which already
takes the xfrm state as a param).

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:18:48 -07:00
Dmitry Mishin
fda9ef5d67 [NET]: Fix sk->sk_filter field access
Function sk_filter() is called from tcp_v{4,6}_rcv() functions with arg
needlock = 0, while socket is not locked at that moment. In order to avoid
this and similar issues in the future, use rcu for sk->sk_filter field read
protection.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
2006-09-22 15:18:47 -07:00
Paul Moore
7a0e1d6022 [NetLabel]: add some missing #includes to various header files
Add some missing include files to the NetLabel related header files.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:18:39 -07:00
Paul Moore
1b7f775209 [NetLabel]: remove unused function prototypes
Removed some older function prototypes for functions that no longer exist.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:18:35 -07:00
Thomas Graf
a5531a5d85 [NETLINK]: Improve string attribute validation
Introduces a new attribute type NLA_NUL_STRING to support NUL
terminated strings. Attributes of this kind require to carry
a terminating NUL within the maximum specified in the policy.

The `old' NLA_STRING which is not required to be NUL terminated
is extended to provide means to specify a maximum length of the
string.

Aims at easing the pain with using nla_strlcpy() on temporary
buffers.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:18:24 -07:00
David S. Miller
e3b4eadbea [UDP]: saddr_cmp function should take const socket pointers
This also kills a warning while building ipv6:

net/ipv6/udp.c: In function ‘udp_v6_get_port’:
net/ipv6/udp.c:66: warning: passing argument 3 of ‘udp_get_port’ from incompatible pointer type

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:18:23 -07:00
Gerrit Renker
25030a7f9e [UDP]: Unify UDPv4 and UDPv6 ->get_port()
This patch creates one common function which is called by
udp_v4_get_port() and udp_v6_get_port(). As a result,
  * duplicated code is removed
  * udp_port_rover and local port lookup can now be removed from udp.h
  * further savings follow since the same function will be used by UDP-Litev4
    and UDP-Litev6

In contrast to the patch sent in response to Yoshifujis comments
(fixed by this variant), the code below also removes the
EXPORT_SYMBOL(udp_port_rover), since udp_port_rover can now remain
local to net/ipv4/udp.c.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:18:21 -07:00
Johannes Berg
ff5dfe736d [NETLINK]: remove third bogus argument from NLA_PUT_FLAG
This patch removes the 'value' argument from NLA_PUT_FLAG which is
unused anyway. The documentation comment was already correct so it
doesn't need an update :)

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:18:18 -07:00
YOSHIFUJI Hideaki
75bff8f023 [IPV6] ROUTE: Routing by FWMARK.
Based on patch by Jean Lorchat <lorchat@sfc.wide.ad.jp>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-09-22 15:18:00 -07:00
David S. Miller
e4bec827fe [IPSEC] esp: Defer output IV initialization to first use.
First of all, if the xfrm_state only gets used for input
packets this entropy is a complete waste.

Secondly, it is often the case that a configuration loads
many rules (perhaps even dynamically) and they don't all
necessarily ever get used.

This get_random_bytes() call was showing up in the profiles
for xfrm_state inserts which is how I noticed this.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:17:35 -07:00
David S. Miller
2518c7c2b3 [XFRM]: Hash policies when non-prefixed.
This idea is from Alexey Kuznetsov.

It is common for policies to be non-prefixed.  And for
that case we can optimize lookups, insert, etc. quite
a bit.

For each direction, we have a dynamically sized policy
hash table for non-prefixed policies.  We also have a
hash table on policy->index.

For prefixed policies, we have a list per-direction which
we will consult on lookups when a non-prefix hashtable
lookup fails.

This still isn't as efficient as I would like it.  There
are four immediate problems:

1) Lots of excessive refcounting, which can be fixed just
   like xfrm_state was
2) We do 2 hash probes on insert, one to look for dups and
   one to allocate a unique policy->index.  Althought I wonder
   how much this matters since xfrm_state inserts do up to
   3 hash probes and that seems to perform fine.
3) xfrm_policy_insert() is very complex because of the priority
   ordering and entry replacement logic.
4) Lots of counter bumping, in addition to policy refcounts,
   in the form of xfrm_policy_count[].  This is merely used
   to let code path(s) know that some IPSEC rules exist.  So
   this count is indexed per-direction, maybe that is overkill.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:08:48 -07:00
David S. Miller
1c09539975 [XFRM]: Purge dst references to deleted SAs passively.
Just let GC and other normal mechanisms take care of getting
rid of DST cache references to deleted xfrm_state objects
instead of walking all the policy bundles.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:08:46 -07:00
David S. Miller
c7f5ea3a4d [XFRM]: Do not flush all bundles on SA insert.
Instead, simply set all potentially aliasing existing xfrm_state
objects to have the current generation counter value.

This will make routes get relooked up the next time an existing
route mentioning these aliased xfrm_state objects gets used,
via xfrm_dst_check().

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:08:45 -07:00
David S. Miller
9d4a706d85 [XFRM]: Add generation count to xfrm_state and xfrm_dst.
Each xfrm_state inserted gets a new generation counter
value.  When a bundle is created, the xfrm_dst objects
get the current generation counter of the xfrm_state
they will attach to at dst->xfrm.

xfrm_bundle_ok() will return false if it sees an
xfrm_dst with a generation count different from the
generation count of the xfrm_state that dst points to.

This provides a facility by which to passively and
cheaply invalidate cached IPSEC routes during SA
database changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:08:42 -07:00
David S. Miller
8f126e37c0 [XFRM]: Convert xfrm_state hash linkage to hlists.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:08:40 -07:00
David S. Miller
edcd582152 [XFRM]: Pull xfrm_state_by{spi,src} hash table knowledge out of afinfo.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:08:39 -07:00
David S. Miller
2770834c9f [XFRM]: Pull xfrm_state_bydst hash table knowledge out of afinfo.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:08:38 -07:00
Masahide NAKAMURA
f7b6983f0f [XFRM] POLICY: Support netlink socket interface for sub policy.
Sub policy can be used through netlink socket.
PF_KEY uses main only and it is TODO to support sub.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:08:35 -07:00
Masahide NAKAMURA
41a49cc3c0 [XFRM]: Add sorting interface for state and template.
Under two transformation policies it is required to merge them.
This is a platform to sort state for outbound and templates
for inbound respectively.
It will be used when Mobile IPv6 and IPsec are used at the same time.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:08:34 -07:00
Masahide NAKAMURA
4e81bb8336 [XFRM] POLICY: sub policy support.
Sub policy is introduced. Main and sub policy are applied the same flow.
(Policy that current kernel uses is named as main.)
It is required another transformation policy management to keep IPsec
and Mobile IPv6 lives separate.
Policy which lives shorter time in kernel should be a sub i.e. normally
main is for IPsec and sub is for Mobile IPv6.
(Such usage as two IPsec policies on different database can be used, too.)

Limitation or TODOs:
 - Sub policy is not supported for per socket one (it is always inserted as main).
 - Current kernel makes cached outbound with flowi to skip searching database.
   However this patch makes it disabled only when "two policies are used and
   the first matched one is bypass case" because neither flowi nor bundle
   information knows about transformation template size.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-09-22 15:08:34 -07:00
Masahide NAKAMURA
97a64b4577 [XFRM]: Introduce XFRM_MSG_REPORT.
XFRM_MSG_REPORT is a message as notification of state protocol and
selector from kernel to user-space.

Mobile IPv6 will use it when inbound reject is occurred at route
optimization to make user-space know a binding error requirement.

Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:08:30 -07:00
Masahide NAKAMURA
df0ba92a99 [XFRM]: Trace which secpath state is reject factor.
For Mobile IPv6 usage, it is required to trace which secpath state is
reject factor in order to notify it to user space (to know the address
which cannot be used route optimized communication).

Based on MIPL2 kernel patch.

This patch was also written by: Henrik Petander <petander@tcs.hut.fi>

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:08:29 -07:00
Masahide NAKAMURA
2ce4272a69 [IPV6] MIP6: Transformation support mobility header.
Transformation support mobility header.
Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:07:03 -07:00
Masahide NAKAMURA
7be96f7628 [IPV6] MIP6: Add receiving mobility header functions through raw socket.
Like ICMPv6, mobility header is handled through raw socket.
In inbound case, check only whether ICMPv6 error should be sent as a reply
or not by kernel.
Based on MIPL2 kernel patch.

This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>
This patch was also written by: Antti Tuominen <anttit@tcs.hut.fi>

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:07:01 -07:00
Masahide NAKAMURA
2b741653b6 [IPV6] MIP6: Add Mobility header definition.
Add Mobility header definition for Mobile IPv6.
Based on MIPL2 kernel patch.

This patch was also written by: Antti Tuominen <anttit@tcs.hut.fi>

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:07:00 -07:00
Noriaki TAKAMIYA
3d126890dd [IPV6] MIP6: Add destination options header transformation.
Add destination options header transformation for Mobile IPv6.
Based on MIPL2 kernel patch.

This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>

Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:58 -07:00
Noriaki TAKAMIYA
2c8d7ca0f7 [IPV6] MIP6: Add routing header type 2 transformation.
Add routing header type 2 transformation for Mobile IPv6.
Based on MIPL2 kernel patch.

Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:57 -07:00
Masahide NAKAMURA
a80ff03e05 [IPV6]: Allow to replace skbuff by TLV parser.
In receiving Mobile IPv6 home address option which is a TLV carried by
destination options header, kernel will try to mangle source adderss
of packet. Think of cloned skbuff it is required to replace it by the
parser just like routing header case.

This is a framework to achieve that to allow TLV parser to replace
inbound skbuff pointer.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:51 -07:00
Masahide NAKAMURA
c61a404325 [IPV6]: Find option offset by type.
This is a helper to search option offset from extension header which
can carry TLV option like destination options header.

Mobile IPv6 home address option will use it.

Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:50 -07:00
Masahide NAKAMURA
65d4ed9221 [IPV6] MIP6: Add inbound interface of routing header type 2.
Add inbound interface of routing header type 2 for Mobile IPv6.
Based on MIPL2 kernel patch.

This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:48 -07:00
Masahide NAKAMURA
e53820de0f [XFRM] IPV6: Restrict bundle reusing
For outbound transformation, bundle is checked whether it is
suitable for current flow to be reused or not. In such IPv6 case
as below, transformation may apply incorrect bundle for the flow instead
of creating another bundle:

- The policy selector has destination prefix length < 128
  (Two or more addresses can be matched it)
- Its bundle holds dst entry of default route whose prefix length < 128
  (Previous traffic was used such route as next hop)
- The policy and the bundle were used a transport mode state and
  this time flow address is not matched the bundled state.

This issue is found by Mobile IPv6 usage to protect mobility signaling
by IPsec, but it is not a Mobile IPv6 specific.
This patch adds strict check to xfrm_bundle_ok() for each
state mode and address when prefix length is less than 128.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:44 -07:00
Masahide NAKAMURA
9afaca0579 [XFRM] IPV6: Update outbound state timestamp for each sending.
With this patch transformation state is updated last used time
for each sending. Xtime is used for it like other state lifetime
expiration.
Mobile IPv6 enabled nodes will want to know traffic status of each
binding (e.g. judgement to request binding refresh by correspondent node,
or to keep home/care-of nonce alive by mobile node).
The last used timestamp is an important hint about it.
Based on MIPL2 kernel patch.

This patch was also written by: Henrik Petander <petander@tcs.hut.fi>

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:43 -07:00
Noriaki TAKAMIYA
060f02a3bd [XFRM] STATE: Introduce care-of address.
Care-of address is carried by state as a transformation option like
IPsec encryption/authentication algorithm.

Based on MIPL2 kernel patch.

Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-09-22 15:06:42 -07:00
Masahide NAKAMURA
1b5c229987 [XFRM] STATE: Support non-fragment outbound transformation headers.
For originated outbound IPv6 packets which will fragment, ip6_append_data()
should know length of extension headers before sending them and
the length is carried by dst_entry.
IPv6 IPsec headers fragment then transformation was
designed to place all headers after fragment header.
OTOH Mobile IPv6 extension headers do not fragment then
it is a good idea to make dst_entry have non-fragment length to tell it
to ip6_append_data().

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:41 -07:00
Masahide NAKAMURA
99505a8436 [XFRM] STATE: Add a hook to obtain local/remote outbound address.
Outbound transformation replaces both source and destination address with
state's end-point addresses at the same time when IPsec tunnel mode.
It is also required to change them for Mobile IPv6 route optimization, but we
should care about the following differences:
 - changing result is not end-point but care-of address
 - either source or destination is replaced for each state
This hook is a common platform to change outbound address.
Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:41 -07:00
Masahide NAKAMURA
fbd9a5b47e [XFRM] STATE: Common receive function for route optimization extension headers.
XFRM_STATE_WILDRECV flag is introduced; the last resort state is set
it and receives packet which is not route optimized but uses such
extension headers i.e. Mobile IPv6 signaling (binding update and
acknowledgement).  A node enabled Mobile IPv6 adds the state.

Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:39 -07:00
Masahide NAKAMURA
aee5adb430 [XFRM] STATE: Add a hook to find offset to be inserted header in outbound.
On current kernel, ip6_find_1stfragopt() is used by IPv6 IPsec to find
offset to be inserted header in outbound for transport mode. (BTW, no
usage may be needed for IPv4 case.)  Mobile IPv6 requires another
logic for routing header and destination options header
respectively. This patch is common platform for the offset and adopts
it to IPsec.

Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:36 -07:00
Masahide NAKAMURA
eb2971b68a [XFRM] STATE: Search by address using source address list.
This is a support to search transformation states by its addresses
by using source address list for Mobile IPv6 usage.
To use it from user-space, it is also added a message type for
source address as a xfrm state option.
Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:35 -07:00
Masahide NAKAMURA
6c44e6b7ab [XFRM] STATE: Add source address list.
Support source address based searching.
Mobile IPv6 will use it.
Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:34 -07:00
Masahide NAKAMURA
622dc8281a [XFRM]: Expand XFRM_MAX_DEPTH for route optimization.
XFRM_MAX_DEPTH is a limit of transformation states to be applied to the same
flow. Two more extension headers are used by Mobile IPv6 transformation.
Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:33 -07:00
Masahide NAKAMURA
dc00a52560 [XFRM] STATE: Allow non IPsec protocol.
It will be added two more transformation protocols (routing header
and destination options header) for Mobile IPv6.
xfrm_id_proto_match() can be handle zero as all, IPSEC_PROTO_ANY as
all IPsec and otherwise as exact one.
Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:32 -07:00
Masahide NAKAMURA
5794708f11 [XFRM]: Introduce a helper to compare id protocol.
Put the helper to header for future use.
Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:06:24 -07:00
Masahide NAKAMURA
7e49e6de30 [XFRM]: Add XFRM_MODE_xxx for future use.
Transformation mode is used as either IPsec transport or tunnel.
It is required to add two more items, route optimization and inbound trigger
for Mobile IPv6.
Based on MIPL2 kernel patch.

This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:05:15 -07:00
YOSHIFUJI Hideaki
77d16f450a [IPV6] ROUTE: Unify RT6_F_xxx and RT6_SELECT_F_xxx flags
Unify RT6_F_xxx and RT6_SELECT_F_xxx flags into
RT6_LOOKUP_F_xxx flags, and put them into ip6_route.h

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:56 -07:00
YOSHIFUJI Hideaki
7fc33165a7 [IPV6] ROUTE: Put SUBTREE() as FIB6_SUBTREE() into ip6_fib.h for future use.
Based on MIPL2 kernel patch.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:51 -07:00
YOSHIFUJI Hideaki
8e1ef0a95b [IPV6]: Cache source address as well in ipv6_pinfo{}.
Based on MIPL2 kernel patch.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:45 -07:00
YOSHIFUJI Hideaki
5e032e32ec [IPV6] NDISC: Take source address into account for redirects.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:41 -07:00
Vladislav Yasevich
3fd091e73b [SCTP]: Remove multiple levels of msecs to jiffies conversions.
The SCTP sysctl entries are displayed in milliseconds, but stored
internally in jiffies. This results in multiple levels of msecs to
jiffies conversion and as a result produces a truncation error. This
patch makes things consistent in that we store and display defaults
in milliseconds and only convert once for use by association.
This patch also adds some sane min/max values so that we don't go off
the deep end.

Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:39 -07:00
Sridhar Samudrala
ac0b046272 [SCTP]: Extend /proc/net/sctp/snmp to provide more statistics.
This patch adds more statistics info under /proc/net/sctp/snmp
that should be useful for debugging. The additional events that
are counted now include timer expirations, retransmits, packet
and data chunk discards.

The Data chunk discards include all the cases where a data chunk
is discarded including high tsn, bad stream, dup tsn and the most
useful one(out of receive buffer/rwnd).

Also moved the SCTP MIB data structures from the generic include
directories to include/sctp/sctp.h.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:16 -07:00
Thomas Graf
86872cb579 [IPv6] route: FIB6 configuration using struct fib6_config
Replaces the struct in6_rtmsg based interface orignating from
the ioctl interface with a struct fib6_config based on. Allows
changing the interface without breaking the ioctl interface
and avoids passing on tons of parameters.

The recently introduced struct nl_info is used to pass on
netlink authorship information for notifications.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:12 -07:00
Thomas Graf
40e22e8f3d [IPv6] route: Simplify ip6_ins_rt()
Provide a simple ip6_ins_rt() for the majority of users and
an alternative for the exception via netlink. Avoids code
obfuscation.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:11 -07:00
Thomas Graf
e0a1ad73d3 [IPv6] route: Simplify ip6_del_rt()
Provide a simple ip6_del_rt() for the majority of users and
an alternative for the exception via netlink. Avoids code
obfuscation.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:11 -07:00
David S. Miller
e9ce1cd3cf [PKT_SCHED]: Kill pkt_act.h inlining.
This was simply making templates of functions and mostly causing a lot
of code duplication in the classifier action modules.

We solve this more cleanly by having a common "struct tcf_common" that
hash worker functions contained once in act_api.c can work with.

Callers work with real action objects that have the common struct
plus their module specific struct members.  You go from a common
object to the higher level one using a "to_foo()" macro which makes
use of container_of() to do the dirty work.

This also kills off act_generic.h which was only used by act_simple.c
and keeping it around was more work than the it's value.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:10 -07:00
Thomas Graf
d889ce3b29 [IPv4]: Convert route get to new netlink api
Fixes various unvalidated netlink attributes causing memory
corruptions when left empty by userspace applications.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:06 -07:00
Thomas Graf
4e902c5741 [IPv4]: FIB configuration using struct fib_config
Introduces struct fib_config replacing the ugly struct kern_rta
prone to ordering issues. Avoids creating faked netlink messages
for auto generated routes or requests via ioctl.

A new interface net/nexthop.h is added to help navigate through
nexthop configuration arrays.

A new struct nl_info will be used to carry the necessary netlink
information to be used for notifications later on.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:04 -07:00
Thomas Graf
97676b6b55 [RTNETLINK]: Add rtnetlink notification interface
Adds rtnl_notify() to send rtnetlink notification messages and
rtnl_set_sk_err() to report notification errors as socket
errors in order to indicate the need of a resync due to loss
of events.

nlmsg_report() is added to properly document the meaning of
NLM_F_ECHO.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:50 -07:00
Thomas Graf
d387f6ad10 [NETLINK]: Add notification message sending interface
Adds nlmsg_notify() implementing proper notification logic. The
message is multicasted to all listeners in the group. The
applications the requests orignates from can request a unicast
back report in which case said socket will be excluded from the
multicast to avoid duplicated notifications.

nlmsg_multicast() is extended to take allocation flags to
allow notification in atomic contexts.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:49 -07:00
Adrian Bunk
2aa7f36cdb [DECNET]: cleanups
- make the following needlessly global functions static:
  - dn_fib.c: dn_fib_sync_down()
  - dn_fib.c: dn_fib_sync_up()
  - dn_rules.c: dn_fib_rule_action()
- remove the following unneeded prototype:
  - dn_fib.c: dn_cache_dump()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:40 -07:00
Adrian Bunk
90d41122f7 [IPV6] ip6_fib.c: make code static
Make the following needlessly global code static:
- fib6_walker_lock
- struct fib6_walker_list
- fib6_walk_continue()
- fib6_walk()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:38 -07:00
Patrick McHardy
abcab26830 [DECNET]: Increase number of possible routing tables to 2^32
Increase the number of possible routing tables to 2^32 by replacing the
fixed sized array of pointers by a hash table and replacing iterations
over all possible table IDs by hash table walking.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:28 -07:00
Patrick McHardy
1b43af5480 [IPV6]: Increase number of possible routing tables to 2^32
Increase number of possible routing tables to 2^32 by replacing iterations
over all possible table IDs by hash table walking.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:27 -07:00
Patrick McHardy
1af5a8c4a1 [IPV4]: Increase number of possible routing tables to 2^32
Increase the number of possible routing tables to 2^32 by replacing the
fixed sized array of pointers by a hash table and replacing iterations
over all possible table IDs by hash table walking.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:26 -07:00
Patrick McHardy
9e762a4a89 [NET]: Introduce RTA_TABLE/FRA_TABLE attributes
Introduce RTA_TABLE route attribute and FRA_TABLE routing rule attribute
to hold 32 bit routing table IDs. Usespace compatibility is provided by
continuing to accept and send the rtm_table field, but because of its
limited size it can only carry the low 8 bits of the table ID. This
implies that if larger IDs are used, _all_ userspace programs using them
need to use RTA_TABLE.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:25 -07:00
Patrick McHardy
2dfe55b47e [NET]: Use u32 for routing table IDs
Use u32 for routing table IDs in net/ipv4 and net/decnet in preparation of
support for a larger number of routing tables. net/ipv6 already uses u32
everywhere and needs no further changes. No functional changes are made by
this patch.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:24 -07:00
Stephen Hemminger
d924424aae [NEIGHBOUR]: Use ALIGN() macro.
Rather than opencoding the mask, it looks better to use ALIGN()
macro from kernel.h.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:23 -07:00
Steven Whitehouse
a8731cbf61 [DECNET]: Covert rules to use generic code
This patch converts the DECnet rules code to use the generic
rules system created by Thomas Graf <tgraf@suug.ch>.

Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:15 -07:00
Herbert Xu
8f491069b4 [IPV4]: Use network-order dport for all visible inet_lookup_*
Right now most inet_lookup_* functions take a host-order hnum instead
of a network-order dport because that's how it is represented
internally.

This means that users of these functions have to be careful about
using the right byte-order.  To add more confusion, inet_lookup takes
a network-order dport unlike all other functions.

So this patch changes all visible inet_lookup functions to take a
dport and move all dport->hnum conversion inside them.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:14 -07:00
Herbert Xu
99a92ff504 [IPV4]: Uninline inet_lookup_listener
By modern standards this function is way too big to be inlined.  It's
even bigger than __inet_lookup_listener :)

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:11 -07:00
Louis Nyffenegger
1a01912ae0 [INET]: Remove is_setbyuser patch
The value is_setbyuser from struct ip_options is never used and set
only one time (http://linux-net.osdl.org/index.php/TODO#IPV4).
This little patch removes it from the kernel source.

Signed-off-by: Louis Nyffenegger <louis.nyffenegger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:10 -07:00
David S. Miller
0298f36a57 [IPV4]: Kill fib4_rules_clean().
As noted by Adrian Bunk this function is totally unused.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:09 -07:00
Adrian Bunk
8ce11e6a9f [NET]: Make code static.
This patch makes needlessly global code static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:07 -07:00
Thomas Graf
9067c722cf [NEIGH]: Move netlink neighbour bits to linux/neighbour.h
Moves netlink neighbour bits to linux/neighbour.h. Also
moves bits to be exported to userspace from net/neighbour.h
to linux/neighbour.h and removes __KERNEL__ guards, userspace
is not supposed to be using it.

rtnetlink_rcv_msg() is not longer required to parse attributes
for the neighbour layer, remove dependency on obsolete and
buggy rta_buf.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:01 -07:00
Thomas Graf
fe4944e59c [NETLINK]: Extend netlink messaging interface
Adds:
 nlmsg_get_pos()                 return current position in message
 nlmsg_trim()                    trim part of message
 nla_reserve_nohdr(skb, len)     reserve room for an attribute w/o hdr
 nla_put_nohdr(skb, len, data)   add attribute w/o hdr
 nla_find_nested()               find attribute in nested attributes

Fixes nlmsg_new() to take allocation flags and consider size.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:43 -07:00
Thomas Graf
e1ef4bf23b [IPV4]: Use Protocol Independant Policy Routing Rules Framework
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:42 -07:00
Thomas Graf
101367c2f8 [IPV6]: Policy Routing Rules
Adds support for policy routing rules including a new
local table for routes with a local destination.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:41 -07:00
Thomas Graf
14c0b97ddf [NET]: Protocol Independant Policy Routing Rules Framework
Derived from net/ipv/fib_rules.c

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:40 -07:00
Thomas Graf
c71099acce [IPV6]: Multiple Routing Tables
Adds the framework to support multiple IPv6 routing tables.
Currently all automatically generated routes are put into the
same table. This could be changed at a later point after
considering the produced locking overhead.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:39 -07:00
Paul Moore
11a03f78fb [NetLabel]: core network changes
Changes to the core network stack to support the NetLabel subsystem.  This
includes changes to the IPv4 option handling to support CIPSO labels.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:32 -07:00
Venkat Yekkirala
4237c75c0a [MLSXFRM]: Auto-labeling of child sockets
This automatically labels the TCP, Unix stream, and dccp child sockets
as well as openreqs to be at the same MLS level as the peer. This will
result in the selection of appropriately labeled IPSec Security
Associations.

This also uses the sock's sid (as opposed to the isec sid) in SELinux
enforcement of secmark in rcv_skb and postroute_last hooks.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:29 -07:00
Venkat Yekkirala
cb969f072b [MLSXFRM]: Default labeling of socket specific IPSec policies
This defaults the label of socket-specific IPSec policies to be the
same as the socket they are set on.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:28 -07:00
Venkat Yekkirala
beb8d13bed [MLSXFRM]: Add flow labeling
This labels the flows that could utilize IPSec xfrms at the points the
flows are defined so that IPSec policy and SAs at the right label can
be used.

The following protos are currently not handled, but they should
continue to be able to use single-labeled IPSec like they currently
do.

ipmr
ip_gre
ipip
igmp
sit
sctp
ip6_tunnel (IPv6 over IPv6 tunnel device)
decnet

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:27 -07:00
Venkat Yekkirala
e0d1caa7b0 [MLSXFRM]: Flow based matching of xfrm policy and state
This implements a seemless mechanism for xfrm policy selection and
state matching based on the flow sid. This also includes the necessary
SELinux enforcement pieces.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:24 -07:00
Venkat Yekkirala
b6340fcd76 [MLSXFRM]: Add security sid to flowi
This adds security to flow key for labeling of flows as also to allow
for making flow cache lookups based on the security label seemless.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:23 -07:00
Venkat Yekkirala
892c141e62 [MLSXFRM]: Add security sid to sock
This adds security for IP sockets at the sock level. Security at the
sock level is needed to enforce the SELinux security policy for
security associations even when a sock is orphaned (such as in the TCP
LAST_ACK state).

This will also be used to enforce SELinux controls over data arriving
at or leaving a child socket while it's still waiting to be accepted.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:22 -07:00
Herbert Xu
e4d5b79c66 [CRYPTO] users: Use crypto_comp and crypto_has_*
This patch converts all users to use the new crypto_comp type and the
crypto_has_* functions.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-09-21 11:46:22 +10:00
Herbert Xu
1b489e11d4 [SCTP]: Use HMAC template and hash interface
This patch converts SCTP to use the new HMAC template and hash interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-21 11:46:19 +10:00
Herbert Xu
07d4ee583e [IPSEC]: Use HMAC template and hash interface
This patch converts IPsec to use the new HMAC template.  The names of
existing simple digest algorithms may still be used to refer to their
HMAC composites.

The same structure can be used by other MACs such as AES-XCBC-MAC.

This patch also switches from the digest interface to hash.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-21 11:46:18 +10:00
Herbert Xu
6b7326c849 [IPSEC] ESP: Use block ciphers where applicable
This patch converts IPSec/ESP to use the new block cipher type where
applicable.  Similar to the HMAC conversion, existing algorithm names
have been kept for compatibility.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-09-21 11:46:14 +10:00
Herbert Xu
04ff126094 [IPSEC]: Add compatibility algorithm name support
This patch adds a compatibility name field for each IPsec algorithm.  This
is needed when parameterised algorithms are used.  For example, "md5" will
become "hmac(md5)", and "aes" will become "cbc(aes)".

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-09-21 11:46:14 +10:00
Herbert Xu
9409f38a0c [IPSEC]: Move linux/crypto.h inclusion out of net/xfrm.h
The header file linux/crypto.h is only needed by a few files so including
it in net/xfrm.h (which is included by half of the networking stack) is a
waste.  This patch moves it out of net/xfrm.h and into the specific header
files that actually need it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-09-21 11:16:30 +10:00
Jeff Garzik
a422142cfd Merge branch 'master' into upstream 2006-08-29 17:20:55 -04:00
Sridhar Samudrala
c164a9ba0a Fix sctp privilege elevation (CVE-2006-3745)
sctp_make_abort_user() now takes the msg_len along with the msg
so that we don't have to recalculate the bytes in iovec.
It also uses memcpy_fromiovec() so that we don't go beyond the
length allocated.

It is good to have this fix even if verify_iovec() is fixed to
return error on overflow.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-22 12:52:23 -07:00
Jeff Garzik
242898be7a Merge branch 'master' into upstream 2006-08-07 06:38:41 -04:00
Ilpo Järvinen
c4c0ce5c57 [PKT_SCHED] RED: Fix overflow in calculation of queue average
Overflow can occur very easily with 32 bits, e.g., with 1 second
us_idle is approx. 2^20, which leaves only 11-Wlog bits for queue
length. Since the EWMA exponent is typically around 9, queue
lengths larger than 2^2 cause overflow. Whether the affected
branch is taken when us_idle is as high as 1 second, depends on
Scell_log, but with rather reasonable configuration Scell_log is
large enough to cause p->Stab to have zero index, which always
results zero shift (typically also few other small indices result
in zero shift).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-04 22:59:51 -07:00
Jeff Garzik
23946a8a98 Merge branch 'upstream-fixes' into upstream 2006-08-03 17:20:37 -04:00
Alexey Dobriyan
29bbd72d6e [NET]: Fix more per-cpu typos
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 15:02:31 -07:00
Catherine Zhang
dc49c1f94e [AF_UNIX]: Kernel memory leak fix for af_unix datagram getpeersec patch
From: Catherine Zhang <cxzhang@watson.ibm.com>

This patch implements a cleaner fix for the memory leak problem of the
original unix datagram getpeersec patch.  Instead of creating a
security context each time a unix datagram is sent, we only create the
security context when the receiver requests it.

This new design requires modification of the current
unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely,
secid_to_secctx and release_secctx.  The former retrieves the security
context and the latter releases it.  A hook is required for releasing
the security context because it is up to the security module to decide
how that's done.  In the case of Selinux, it's a simple kfree
operation.

Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 14:12:06 -07:00
Tom Tucker
792d1932e3 [NET]: Network Event Notifier Mechanism.
This patch uses notifier blocks to implement a network event
notifier mechanism.

Clients register their callback function by calling
register_netevent_notifier() like this:

static struct notifier_block nb = {
        .notifier_call = my_callback_func
};

...

register_netevent_notifier(&nb);

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:20 -07:00
Wei Yongjun
3687b1dc6f [TCP]: SNMPv2 tcpAttemptFails counter error
Refer to RFC2012, tcpAttemptFails is defined as following:
  tcpAttemptFails OBJECT-TYPE
      SYNTAX      Counter32
      MAX-ACCESS  read-only
      STATUS      current
      DESCRIPTION
              "The number of times TCP connections have made a direct
              transition to the CLOSED state from either the SYN-SENT
              state or the SYN-RCVD state, plus the number of times TCP
              connections have made a direct transition to the LISTEN
              state from the SYN-RCVD state."
      ::= { tcp 7 }

When I lookup into RFC793, I found that the state change should occured
under following condition:
  1. SYN-SENT -> CLOSED
     a) Received ACK,RST segment when SYN-SENT state.

  2. SYN-RCVD -> CLOSED
     b) Received SYN segment when SYN-RCVD state(came from LISTEN).
     c) Received RST segment when SYN-RCVD state(came from SYN-SENT).
     d) Received SYN segment when SYN-RCVD state(came from SYN-SENT).

  3. SYN-RCVD -> LISTEN
     e) Received RST segment when SYN-RCVD state(came from LISTEN).

In my test, those direct state transition can not be counted to
tcpAttemptFails.

Signed-off-by: Wei Yongjun <yjwei@nanjing-fnst.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:19 -07:00
Herbert Xu
497c615aba [IPV6]: Audit all ip6_dst_lookup/ip6_dst_store calls
The current users of ip6_dst_lookup can be divided into two classes:

1) The caller holds no locks and is in user-context (UDP).
2) The caller does not want to lookup the dst cache at all.

The second class covers everyone except UDP because most people do
the cache lookup directly before calling ip6_dst_lookup.  This patch
adds ip6_sk_dst_lookup for the first class.

Similarly ip6_dst_store users can be divded into those that need to
take the socket dst lock and those that don't.  This patch adds
__ip6_dst_store for those (everyone except UDP/datagram) that don't
need an extra lock.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:14 -07:00
Daniel Drake
f2060f039e [PATCH] ieee80211: Make ieee80211_rx_any usable
ieee80211_rx_any is new to 2.6.18-rc1, even though it appears this function
was never completed:

http://lists.sipsolutions.net/pipermail/softmac-dev/2006-February/000103.html

This patch changes ieee80211_rx_any to always claim the skb, which avoids
further driver complexity and the possibility of leaking management frames.
It also exports the function so that people can actually use it.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-07-27 16:17:28 -04:00
Daniel Drake
d7712ac254 [PATCH] softmac: export highest_supported_rate function
zd1211 needs this functionality, no point duplicating it.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-07-27 16:17:28 -04:00
Daniel Drake
5acd0c4153 [PATCH] softmac: ERP handling and driver-level notifications
This patch implements ERP handling in softmac so that the drivers can support
protection and preambles properly.

I added a new struct, ieee80211softmac_bss_info, which is used for
BSS-dependent variables like these.

A new hook has been added (bssinfo_change), which allows the drivers to be
notified when anything in bssinfo changes.

I modified the txrates_change API to match the bssinfo_change API. The
existing one is a little messy and the usefulness of providing the old rates
is questionable (and can be implemented at driver level if really necessary).
No drivers are using this API (yet), so this should be safe.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-07-27 16:17:28 -04:00
Daniel Drake
d8e2be90d3 [PATCH] ieee80211: small ERP handling additions
This adds a flag to the ieee80211_network structure which indicates whether
the stored erp_value is valid (a check against 0 is not enough, since an ERP
of 0 is valid and very meaningful).

I also added the ERP IE bit-definitions to ieee80211.h.

This is needed by some upcoming softmac patches.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-07-27 16:17:27 -04:00
Guillaume Chazarain
2266d8886f [PKT_SCHED]: Fix regression in PSCHED_TADD{,2}.
In PSCHED_TADD and PSCHED_TADD2, if delta is less than tv.tv_usec (so,
less than USEC_PER_SEC too) then tv_res will be smaller than tv. The
affectation "(tv_res).tv_usec = __delta;" is wrong.  The fix is to
revert to the original code before
4ee303dfea and change the 'if' in
'while'.

[Shuya MAEDA: "while (__delta >= USEC_PER_SEC){ ... }" instead of
"while (__delta > USEC_PER_SEC){ ... }"]

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-24 12:44:23 -07:00
Adrian Bunk
64d2f0855e [I/OAT]: net/core/user_dma.c should #include <net/netdma.h>
Every file should #include the headers containing the prototypes for
its global functions.

Especially in cases like this one where gcc can tell us through a
compile error that the prototype was wrong...

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21 14:49:49 -07:00
Sridhar Samudrala
dc022a9874 [SCTP]: ADDIP: Don't use an address as source until it is ASCONF-ACKed
This implements Rules D1 and D4 of Sec 4.3 in the ADDIP draft.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21 14:49:25 -07:00
Sridhar Samudrala
ad8fec1720 [SCTP]: Verify all the paths to a peer via heartbeat before using them.
This patch implements Path Initialization procedure as described in
Sec 2.36 of RFC4460.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21 14:48:50 -07:00
Balbir Singh
fb0ba6bd02 [PATCH] per-task-delay-accounting: utilities for genetlink usage
Two utilities for simplifying usage of NETLINK_GENERIC interface.

Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:56 -07:00
Herbert Xu
a430a43d08 [NET] gso: Fix up GSO packets with broken checksums
Certain subsystems in the stack (e.g., netfilter) can break the partial
checksum on GSO packets.  Until they're fixed, this patch allows this to
work by recomputing the partial checksums through the GSO mechanism.

Once they've all been converted to update the partial checksum instead of
clearing it, this workaround can be removed.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-08 13:34:56 -07:00
Joseph Jezak
cb74c432e3 [PATCH] SoftMAC: Prevent multiple authentication attempts on the same network
This patch addresses the "No queue exists" messages commonly seen during
authentication and associating.  These appear due to scheduling multiple
authentication attempts on the same network.  To prevent this, I added a
flag to stop multiple authentication attempts by the association layer.
I also added a check to the wx handler to see if we're connecting to a
different network than the one already in progress.  This scenario was
causing multiple requests on the same network because the network BSSID
was not being updated despite the fact that the ESSID changed.

Signed-off-by: Joseph Jezak <josejx@gentoo.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-07-05 13:42:58 -04:00
Marcel Holtmann
a91f2e396f [Bluetooth] Use real devices for host controllers
This patch converts the Bluetooth class devices into real devices. The
Bluetooth class is kept and the driver core provides the appropriate
symlinks for backward compatibility.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2006-07-03 19:54:02 -07:00
Marcel Holtmann
27d3528425 [Bluetooth] Add platform device for virtual and serial devices
This patch adds a generic Bluetooth platform device that can be used
as parent device by virtual and serial devices.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2006-07-03 19:54:00 -07:00
Marcel Holtmann
04837f6447 [Bluetooth] Add automatic sniff mode support
This patch introduces the automatic sniff mode feature. This allows
the host to switch idle connections into sniff mode to safe power.

Signed-off-by: Ulisses Furquim <ulissesf@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2006-07-03 19:53:58 -07:00
Marcel Holtmann
da1f519851 [Bluetooth] Correct SCO buffer size on request
This patch introduces a quirk that allows the drivers to tell the host
to correct the SCO buffer size values.

Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2006-07-03 19:53:56 -07:00
Ralf Baechle DL5RB
006f68b84f [AX.25]: Reference counting for AX.25 routes.
In the past routes could be freed even though the were possibly in use ...

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-03 19:30:18 -07:00
Ingo Molnar
a5b5bb9a05 [PATCH] lockdep: annotate sk_locks
Teach sk_lock semantics to the lock validator.  In the softirq path the
slock has mutex_trylock()+mutex_unlock() semantics, in the process context
sock_lock() case it has mutex_lock()/mutex_unlock() semantics.

Thus we treat sock_owned_by_user() flagged areas as an exclusion area too,
not just those areas covered by a held sk_lock.slock.

Effect on non-lockdep kernels: minimal, sk_lock_sock_init() has been turned
into an inline function.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:10 -07:00
Ingo Molnar
c636618485 [PATCH] lockdep: annotate bh_lock_sock()
Teach special (recursive) locking code to the lock validator.  Has no effect
on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:08 -07:00
Ingo Molnar
a09785a241 [PATCH] lockdep: annotate af_unix locking
Teach special (recursive) locking code to the lock validator.  Also splits
af_unix's sk_receive_queue.lock class from the other networking skb-queue
locks.  Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:07 -07:00
Ingo Molnar
da21f24dd7 [PATCH] lockdep: annotate sock_lock_init()
Teach special (multi-initialized, per-address-family) locking code to the lock
validator.  Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:07 -07:00
Thomas Gleixner
1fb9df5d30 [PATCH] irq-flags: drivers/net: Use the new IRQF_ constants
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-02 13:58:51 -07:00
Herbert Xu
f83ef8c0b5 [IPV6]: Added GSO support for TCPv6
This patch adds GSO support for IPv6 and TCPv6.  This is based on a patch
by Ananda Raju <Ananda.Raju@neterion.com>.  His original description is:

	This patch enables TSO over IPv6. Currently Linux network stacks
	restricts TSO over IPv6 by clearing of the NETIF_F_TSO bit from
	"dev->features". This patch will remove this restriction.

	This patch will introduce a new flag NETIF_F_TSO6 which will be used
	to check whether device supports TSO over IPv6. If device support TSO
	over IPv6 then we don't clear of NETIF_F_TSO and which will make the
	TCP layer to create TSO packets. Any device supporting TSO over IPv6
	will set NETIF_F_TSO6 flag in "dev->features" along with NETIF_F_TSO.

	In case when user disables TSO using ethtool, NETIF_F_TSO will get
	cleared from "dev->features". So even if we have NETIF_F_TSO6 we don't
	get TSO packets created by TCP layer.

	SKB_GSO_TCPV4 renamed to SKB_GSO_TCP to make it generic GSO packet.
	SKB_GSO_UDPV4 renamed to SKB_GSO_UDP as UFO is not a IPv4 feature.
	UFO is supported over IPv6 also

	The following table shows there is significant improvement in
	throughput with normal frames and CPU usage for both normal and jumbo.

	--------------------------------------------------
	|          |     1500        |      9600         |
	|          ------------------|-------------------|
	|          | thru     CPU    |  thru     CPU     |
	--------------------------------------------------
	| TSO OFF  | 2.00   5.5% id  |  5.66   20.0% id  |
	--------------------------------------------------
	| TSO ON   | 2.63   78.0 id  |  5.67   39.0% id  |
	--------------------------------------------------

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-30 14:12:10 -07:00
Herbert Xu
bcd7611117 [NET]: Generalise TSO-specific bits from skb_setup_caps
This patch generalises the TSO-specific bits from sk_setup_caps by adding
the sk_gso_type member to struct sock.  This makes sk_setup_caps generic
so that it can be used by TCPv6 or UFO.

The only catch is that whoever uses this must provide a GSO implementation
for their protocol which I think is a fair deal :) For now UFO continues to
live without a GSO implementation which is OK since it doesn't use the sock
caps field at the moment.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-30 14:12:08 -07:00
Herbert Xu
adcfc7d0b4 [IPV6]: Added GSO support for TCPv6
This patch adds GSO support for IPv6 and TCPv6.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-30 14:12:06 -07:00
Michael Chan
b0da853703 [NET]: Add ECN support for TSO
In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection.  The problem is that most
hardware that supports TSO does not handle CWR correctly if it is set
in the TSO packet.  Correct handling requires CWR to be set in the
first packet only if it is set in the TSO header.

This patch adds the ability to turn on NETIF_F_TSO and ECN using
GSO if necessary to handle TSO packets with CWR set.  Hardware
that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev->
features flag.

All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set.  If
the output device does not have the NETIF_F_TSO_ECN feature set, GSO
will split the packet up correctly with CWR only set in the first
segment.

With help from Herbert Xu <herbert@gondor.apana.org.au>.

Since ECN can always be enabled with TSO, the SOCK_NO_LARGESEND sock
flag is completely removed.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:58:08 -07:00
Catherine Zhang
877ce7c1b3 [AF_UNIX]: Datagram getpeersec
This patch implements an API whereby an application can determine the
label of its peer's Unix datagram sockets via the auxiliary data mechanism of
recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the
security context of the peer of a Unix datagram socket.  The application
can then use this security context to determine the security context for
processing on behalf of the peer who sent the packet.

Patch design and implementation:

The design and implementation is very similar to the UDP case for INET
sockets.  Basically we build upon the existing Unix domain socket API for
retrieving user credentials.  Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).  To retrieve the security
context, the application first indicates to the kernel such desire by
setting the SO_PASSSEC option via getsockopt.  Then the application
retrieves the security context using the auxiliary data mechanism.

An example server application for Unix datagram socket should look like this:

toggle = 1;
toggle_len = sizeof(toggle);

setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
    cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
    if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
        cmsg_hdr->cmsg_level == SOL_SOCKET &&
        cmsg_hdr->cmsg_type == SCM_SECURITY) {
        memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
    }
}

sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow
a server socket to receive security context of the peer.

Testing:

We have tested the patch by setting up Unix datagram client and server
applications.  We verified that the server can retrieve the security context
using the auxiliary data mechanism of recvmsg.

Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:58:06 -07:00
Shuya MAEDA
4ee303dfea [PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result,tv_usec >= 1000000
Signed-off-by: Shuya MAEDA <maeda-sxb@necst.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:58:01 -07:00
Herbert Xu
576a30eb64 [NET]: Added GSO header verification
When GSO packets come from an untrusted source (e.g., a Xen guest domain),
we need to verify the header integrity before passing it to the hardware.

Since the first step in GSO is to verify the header, we can reuse that
code by adding a new bit to gso_type: SKB_GSO_DODGY.  Packets with this
bit set can only be fed directly to devices with the corresponding bit
NETIF_F_GSO_ROBUST.  If the device doesn't have that bit, then the skb
is fed to the GSO engine which will allow the packet to be sent to the
hardware if it passes the header check.

This patch changes the sg flag to a full features flag.  The same method
can be used to implement TSO ECN support.  We simply have to mark packets
with CWR set with SKB_GSO_ECN so that only hardware with a corresponding
NETIF_F_TSO_ECN can accept them.  The GSO engine can either fully segment
the packet, or segment the first MTU and pass the rest to the hardware for
further segmentation.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:53 -07:00
Allan Stephens
4938450789 [TIPC]: Corrected potential misuse of tipc_media_addr structure.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:38:29 -07:00
Randy Dunlap
f4b8ea7849 [NET]: fix net-core kernel-doc
Warning(/var/linsrc/linux-2617-g4//include/linux/skbuff.h:304): No description found for parameter 'dma_cookie'
Warning(/var/linsrc/linux-2617-g4//include/net/sock.h:1274): No description found for parameter 'copied_early'
Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'chan'
Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'event'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:42 -07:00
Herbert Xu
37c3185a02 [NET]: Added GSO toggle
This patch adds a generic segmentation offload toggle that can be turned
on/off for each net device.  For now it only supports in TCPv4.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:36 -07:00
Herbert Xu
f4c50d990d [NET]: Add software TSOv4
This patch adds the GSO implementation for IPv4 TCP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:33 -07:00
Herbert Xu
7967168cef [NET]: Merge TSO/UFO fields in sk_buff
Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP).  So
let's merge them.

They were used to tell the protocol of a packet.  This function has been
subsumed by the new gso_type field.  This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb.  As such it's easy to tell whether a given device can process a GSO
skb: you just have to and the gso_type field and the netdev's features
field.

I've made gso_type a conjunction.  The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, they would
declare NETIF_F_TSO | NETIF_F_TSO_ECN.  All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4.  This means that only the CWR packets need
to be emulated in software.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:29 -07:00
Jeff Garzik
dbe1ab9514 Merge branch 'master' into upstream 2006-06-22 22:51:46 -04:00
Linus Torvalds
a4cfae13ce Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [ATM]: fix broken uses of NIPQUAD in net/atm
  [SCTP]: sctp_unpack_cookie() fix
  [SCTP]: Fix unintentional change to SCTP_ASSERT when !SCTP_DEBUG
  [NET]: Prevent multiple qdisc runs
  [CONNECTOR]: Initialize subsystem earlier.
  [NETFILTER]: xt_sctp: fix endless loop caused by 0 chunk length
2006-06-20 17:39:53 -07:00
Linus Torvalds
cee4cca740 Merge git://git.infradead.org/hdrcleanup-2.6
* git://git.infradead.org/hdrcleanup-2.6: (63 commits)
  [S390] __FD_foo definitions.
  Switch to __s32 types in joystick.h instead of C99 types for consistency.
  Add <sys/types.h> to headers included for userspace in <linux/input.h>
  Move inclusion of <linux/compat.h> out of user scope in asm-x86_64/mtrr.h
  Remove struct fddi_statistics from user view in <linux/if_fddi.h>
  Move user-visible parts of drivers/s390/crypto/z90crypt.h to include/asm-s390
  Revert include/media changes: Mauro says those ioctls are only used in-kernel(!)
  Include <linux/types.h> and use __uXX types in <linux/cramfs_fs.h>
  Use __uXX types in <linux/i2o_dev.h>, include <linux/ioctl.h> too
  Remove private struct dx_hash_info from public view in <linux/ext3_fs.h>
  Include <linux/types.h> and use __uXX types in <linux/affs_hardblocks.h>
  Use __uXX types in <linux/divert.h> for struct divert_blk et al.
  Use __u32 for elf_addr_t in <asm-powerpc/elf.h>, not u32. It's user-visible.
  Remove PPP_FCS from user view in <linux/ppp_defs.h>, remove __P mess entirely
  Use __uXX types in user-visible structures in <linux/nbd.h>
  Don't use 'u32' in user-visible struct ip_conntrack_old_tuple.
  Use __uXX types for S390 DASD volume label definitions which are user-visible
  S390 BIODASDREADCMB ioctl should use __u64 not u64 type.
  Remove unneeded inclusion of <linux/time.h> from <linux/ufs_fs.h>
  Fix private integer types used in V4L2 ioctls.
  ...

Manually resolve conflict in include/linux/mtd/physmap.h
2006-06-20 15:10:08 -07:00
Jeff Garzik
4b2d9cf009 Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream 2006-06-20 04:46:02 -04:00
David S. Miller
65fd28f743 [SCTP]: Fix unintentional change to SCTP_ASSERT when !SCTP_DEBUG
A local debugging change slipped into a previous changeset.

When SCTP_DEBUG is off SCTP_ASSERT should do nothing.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-20 00:07:52 -07:00
Herbert Xu
48d83325b6 [NET]: Prevent multiple qdisc runs
Having two or more qdisc_run's contend against each other is bad because
it can induce packet reordering if the packets have to be requeued.  It
appears that this is an unintended consequence of relinquinshing the queue
lock while transmitting.  That in turn is needed for devices that spend a
lot of time in their transmit routine.

There are no advantages to be had as devices with queues are inherently
single-threaded (the loopback device is not but then it doesn't have a
queue).

Even if you were to add a queue to a parallel virtual device (e.g., bolt
a tbf filter in front of an ipip tunnel device), you would still want to
process the queue in sequence to ensure that the packets are ordered
correctly.

The solution here is to steal a bit from net_device to prevent this.

BTW, as qdisc_restart is no longer used by anyone as a module inside the
kernel (IIRC it used to with netif_wake_queue), I have not exported the
new __qdisc_run function.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-19 23:57:59 -07:00
Linus Torvalds
d0b952a983 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (109 commits)
  [ETHTOOL]: Fix UFO typo
  [SCTP]: Fix persistent slowdown in sctp when a gap ack consumes rx buffer.
  [SCTP]: Send only 1 window update SACK per message.
  [SCTP]: Don't do CRC32C checksum over loopback.
  [SCTP] Reset rtt_in_progress for the chunk when processing its sack.
  [SCTP]: Reject sctp packets with broadcast addresses.
  [SCTP]: Limit association max_retrans setting in setsockopt.
  [PFKEYV2]: Fix inconsistent typing in struct sadb_x_kmprivate.
  [IPV6]: Sum real space for RTAs.
  [IRDA]: Use put_unaligned() in irlmp_do_discovery().
  [BRIDGE]: Add support for NETIF_F_HW_CSUM devices
  [NET]: Add NETIF_F_GEN_CSUM and NETIF_F_ALL_CSUM
  [TG3]: Convert to non-LLTX
  [TG3]: Remove unnecessary tx_lock
  [TCP]: Add tcp_slow_start_after_idle sysctl.
  [BNX2]: Update version and reldate
  [BNX2]: Use CPU native page size
  [BNX2]: Use compressed firmware
  [BNX2]: Add firmware decompression
  [BNX2]: Allow WoL settings on new 5708 chips
  ...

Manual fixup for conflict in drivers/net/tulip/winbond-840.c
2006-06-19 18:55:56 -07:00
Vlad Yasevich
4c9f5d5305 [SCTP] Reset rtt_in_progress for the chunk when processing its sack.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 22:56:08 -07:00
Vlad Yasevich
5636bef732 [SCTP]: Reject sctp packets with broadcast addresses.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 22:55:35 -07:00
David S. Miller
35089bb203 [TCP]: Add tcp_slow_start_after_idle sysctl.
A lot of people have asked for a way to disable tcp_cwnd_restart(),
and it seems reasonable to add a sysctl to do that.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:30:53 -07:00
Stephen Hemminger
bdeb04c6d9 [NET]: net.ipv4.ip_autoconfig sysctl removal
The sysctl net.ipv4.ip_autoconfig is a legacy value that is not used.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:30:30 -07:00
Herbert Xu
b38dfee3d6 [NET]: skb_trim audit
I found a few more spots where pskb_trim_rcsum could be used but were not.
This patch changes them to use it.

Also, sk_filter can get paged skb data.  Therefore we must use pskb_trim
instead of skb_trim.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:30:20 -07:00
James Morris
7c9728c393 [SECMARK]: Add secmark support to conntrack
Add a secmark field to IP and NF conntracks, so that security markings
on packets can be copied to their associated connections, and also
copied back to packets as required.  This is similar to the network
mark field currently used with conntrack, although it is intended for
enforcement of security policy rather than network policy.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:30:01 -07:00
Alexey Dobriyan
6d74165350 [IPV4]: Right prototype of __raw_v4_lookup()
All users pass 32-bit values as addresses and internally they're
compared with 32-bit entities. So, change "laddr" and "raddr" types to
__be32.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:29:39 -07:00
Stephen Hemminger
72dc5b9225 [TCP]: Minimum congestion window consolidation.
Many of the TCP congestion methods all just use ssthresh
as the minimum congestion window on decrease.  Rather than
duplicating the code, just have that be the default if that
handle in the ops structure is not set.

Minor behaviour change to TCP compound.  It probably wants
to use this (ssthresh) as lower bound, rather than ssthresh/2
because the latter causes undershoot on loss.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:29:29 -07:00
Patrick McHardy
39a27a35c5 [NETFILTER]: conntrack: add sysctl to disable checksumming
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:28:57 -07:00
Herbert Xu
73654d61e5 [IPSEC] xfrm: Use IPPROTO_MAX instead of 256
The size of the type_map array (256) comes from the number of IP protocols,
i.e., IPPROTO_MAX.  This patch is based on a suggestion from Ingo Oeser.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:28:43 -07:00
Herbert Xu
b59f45d0b2 [IPSEC] xfrm: Abstract out encapsulation modes
This patch adds the structure xfrm_mode.  It is meant to represent
the operations carried out by transport/tunnel modes.

By doing this we allow additional encapsulation modes to be added
without clogging up the xfrm_input/xfrm_output paths.

Candidate modes include 4-to-6 tunnel mode, 6-to-4 tunnel mode, and
BEET modes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:28:39 -07:00
Herbert Xu
546be2405b [IPSEC] xfrm: Undo afinfo lock proliferation
The number of locks used to manage afinfo structures can easily be reduced
down to one each for policy and state respectively.  This is based on the
observation that the write locks are only held by module insertion/removal
which are very rare events so there is no need to further differentiate
between the insertion of modules like ipv6 versus esp6.

The removal of the read locks in xfrm4_policy.c/xfrm6_policy.c might look
suspicious at first.  However, after you realise that nobody ever takes
the corresponding write lock you'll feel better :)

As far as I can gather it's an attempt to guard against the removal of
the corresponding modules.  Since neither module can be unloaded at all
we can leave it to whoever fixes up IPv6 unloading :)

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:28:37 -07:00
Stephen Hemminger
bc0e646796 [LLC]: add multicast support for datagrams
Allow mulitcast reception of datagrams (similar to UDP).
All sockets bound to the same SAP receive a clone.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:26:08 -07:00
Stephen Hemminger
aecbd4e45c [LLC]: use more efficient ether address routines
Use more cache efficient Ethernet address manipulation functions
in etherdevice.h.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
2006-06-17 21:26:00 -07:00
Chris Leech
9593782585 [I/OAT]: Add a sysctl for tuning the I/OAT offloaded I/O threshold
Any socket recv of less than this ammount will not be offloaded

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:25:54 -07:00
Chris Leech
624d116473 [I/OAT]: Make sk_eat_skb I/OAT aware.
Add an extra argument to sk_eat_skb, and make it move early copied
packets to the async_wait_queue instead of freeing them.

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:25:52 -07:00
Chris Leech
0e4b4992b8 [I/OAT]: Rename cleanup_rbuf to tcp_cleanup_rbuf and make non-static
Needed to be able to call tcp_cleanup_rbuf in tcp_input.c for I/OAT

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:25:50 -07:00
Chris Leech
97fc2f0848 [I/OAT]: Structure changes for TCP recv offload to I/OAT
Adds an async_wait_queue and some additional fields to tcp_sock, and a
dma_cookie_t to sk_buff.

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:25:48 -07:00
Chris Leech
de5506e155 [I/OAT]: Utility functions for offloading sk_buff to iovec copies
Provides for pinning user space pages in memory, copying to iovecs,
and copying from sk_buffs including fragmented and chained sk_buffs.

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:25:46 -07:00
Chris Leech
db21733488 [I/OAT]: Setup the networking subsystem as a DMA client
Attempts to allocate per-CPU DMA channels

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:24:58 -07:00
Larry Finger
74f4903363 [PATCH] wireless: Changes to ieee80211.h for user space regulatory daemon
Attached are two small patches for include/net/ieee80211.h to prepare
for later submission of code to implement a user-space daemon that
supplies 802.11 regulatory information.

The first change adds a bit indicating that 802.11h rules are to be
applied to a channel. As discussed earlier in this list, a single bit
is unlikely to be sufficient; however, at this time I have been unable
to find any regulations implementing differences between 802.11a and
802.11h other than DFS, radar detection and passive scanning. A single
bit is thus sufficient to convey to the driver that these rules should
be obeyed.

The second change adds comments to the freq and max_power fields of
struct ieee80211_channel to indicate the units that are used.

Signed-Off-By: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-06-15 15:48:13 -04:00
Daniel Drake
6ae15df16e [PATCH] softmac: Fix handling of authentication failure
My router blew up earlier, but exhibited some interesting behaviour during
its dying moments. It was broadcasting beacons but wouldn't respond to
any authentication requests.

I noticed that softmac wasn't playing nice with this, as I couldn't make it try
to connect to other networks after it had timed out authenticating to my ill
router.

To resolve this, I modified the softmac event/notify API to pass the event
code to the callback, so that callbacks being notified from
IEEE80211SOFTMAC_EVENT_ANY masks can make some judgement. In this case, the
ieee80211softmac_assoc callback needs to make a decision based upon whether
the association passed or failed.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-06-05 15:51:30 -04:00
Daniel Drake
76ea4c7f4c [PATCH] softmac: complete shared key authentication
This patch finishes of the partially-complete shared key authentication
implementation in softmac.

The complication here is that we need to encrypt a management frame during
the authentication process. I don't think there are any other scenarios where
this would have to happen.

To get around this without causing too many headaches, we decided to just use
software encryption for this frame. The softmac config option now selects
IEEE80211_CRYPT_WEP so that we can ensure this available. This also involved
a modification to some otherwise unused ieee80211 API.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-06-05 15:51:29 -04:00
John W. Linville
dea58b80f2 Merge branch 'from-linus' into upstream 2006-06-05 14:42:27 -04:00
Andrew Morton
29f767a254 [PATCH] net/compat.h build fix
From: Andrew Morton <akpm@osdl.org>

Move the forward decl outside the ifdef, since we use it in both legs.

Should fix the spacr64 build error reported in
	http://bugzilla.kernel.org/show_bug.cgi?id=6625

Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: Cedric Pellerin <cedric@bidouillesoft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-31 16:27:11 -07:00
John W. Linville
f587fb74b2 Merge branch 'from-linus' into upstream 2006-05-26 16:06:58 -04:00
David Woodhouse
66643de455 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	include/asm-powerpc/unistd.h
	include/asm-sparc/unistd.h
	include/asm-sparc64/unistd.h

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-24 09:22:21 +01:00
Alexey Dobriyan
f5565f4a90 [IRDA]: fixup type of ->lsap_state
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-05-22 16:54:30 -07:00
John W. Linville
3b38f317e5 Merge branch 'from-linus' into upstream 2006-05-22 14:26:25 -04:00
Vladislav Yasevich
dd2d1c6f29 [SCTP]: Respect the real chunk length when walking parameters.
When performing bound checks during the parameter processing, we
want to use the real chunk and paramter lengths for bounds instead
of the rounded ones.  This prevents us from potentially walking of
the end if the chunk length was miscalculated.  We still use rounded
lengths when advancing the pointer. This was found during a
conformance test that changed the chunk length without modifying
parameters.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2006-05-19 11:52:20 -07:00
Sridhar Samudrala
8de8c87380 [SCTP]: Set sk_err so that poll wakes up after a non-blocking connect failure.
Also fix some other cases where sk_err is not set for 1-1 style sockets.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2006-05-19 10:58:12 -07:00
John W. Linville
5dd8816aeb Merge branch 'from-linus' into upstream 2006-05-17 14:51:24 -04:00
Simon Kelley
bd89efc532 [NEIGH]: Fix IP-over-ATM and ARP interaction.
The classical IP over ATM code maintains its own IPv4 <-> <ATM stuff>
ARP table, using the standard neighbour-table code. The
neigh_table_init function adds this neighbour table to a linked list
of all neighbor tables which is used by the functions neigh_delete()
neigh_add() and neightbl_set(), all called by the netlink code.

Once the ATM neighbour table is added to the list, there are two
tables with family == AF_INET there, and ARP entries sent via netlink
go into the first table with matching family. This is indeterminate
and often wrong.

To see the bug, on a kernel with CLIP enabled, create a standard IPv4
ARP entry by pinging an unused address on a local subnet. Then attempt
to complete that entry by doing

ip neigh replace <ip address> lladdr <some mac address> nud reachable

Looking at the ARP tables by using 

ip neigh show

will reveal two ARP entries for the same address. One of these can be
found in /proc/net/arp, and the other in /proc/net/atm/arp.

This patch adds a new function, neigh_table_init_no_netlink() which
does everything the neigh_table_init() does, except add the table to
the netlink all-arp-tables chain. In addition neigh_table_init() has a
check that all tables on the chain have a distinct address family.
The init call in clip.c is changed to call
neigh_table_init_no_netlink().

Since ATM ARP tables are rather more complicated than can currently be
handled by the available rtattrs in the netlink protocol, no
functionality is lost by this patch, and non-ATM ARP manipulation via
netlink is rescued. A more complete solution would involve a rtattr
for ATM ARP entries and some way for the netlink code to give
neigh_add and friends more information than just address family with
which to find the correct ARP table.

[ I've changed the assertion checking in neigh_table_init() to not
  use BUG_ON() while holding neigh_tbl_lock.  Instead we remember that
  we found an existing tbl with the same family, and after dropping
  the lock we'll give a diagnostic kernel log message and a stack dump.
  -DaveM ]

Signed-off-by: Simon Kelley <simon@thekelleys.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-05-12 14:56:08 -07:00
Stephen Hemminger
23aee82e75 Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2006-05-08 16:01:20 -07:00
David Woodhouse
5047f09b56 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-06 19:59:18 +01:00
Neil Horman
7c3ceb4fb9 [SCTP]: Allow spillover of receive buffer to avoid deadlock.
This patch fixes a deadlock situation in the receive path by allowing
temporary spillover of the receive buffer.

- If the chunk we receive has a tsn that immediately follows the ctsn,
  accept it even if we run out of receive buffer space and renege data with
  higher TSNs.
- Once we accept one chunk in a packet, accept all the remaining chunks
  even if we run out of receive buffer space.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Mark Butler <butlerm@middle.net>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-05-05 17:02:09 -07:00
Daniel Drake
8462fe3cd9 [PATCH] softmac: suggest per-frame-type TX rate
This patch is the first step towards rate control inside softmac.

The txrates substructure has been extended to provide
different fields for different types of packets (management/data,
unicast/multicast). These fields are updated on association to values
compatible with the access point we are associating to.

Drivers can then use the new ieee80211softmac_suggest_txrate() function
call when deciding which rate to transmit each frame at. This is
immensely useful for ZD1211, and bcm can use it too.

The user can still specify a rate through iwconfig, which is matched
for all transmissions (assuming the rate they have specified is in
the rate set required by the AP).

At a later date, we can incorporate automatic rate management into
the ieee80211softmac_recalc_txrates() function.

This patch also removes the mcast_fallback field. Sam Leffler pointed
out that this field is meaningless, because no driver will ever be
retransmitting mcast frames (they are not acked).

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-05-05 17:10:41 -04:00
John W. Linville
fd5226a726 Merge branch 'upstream-fixes' into upstream 2006-05-05 16:56:24 -04:00
Jean Delvare
f21709d70a [PATCH] ieee80211: Fix A band channel count (resent)
The channel count for 802.11a is still not right. We better
compute it from the min and max channel numbers, rather than
hardcoding it.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-05-05 16:55:23 -04:00
Daniel Drake
d57336e3f2 [PATCH] softmac: make non-operational after being stopped
zd1211 with softmac and wpa_supplicant revealed an issue with softmac
and the use of workqueues. Some of the work functions actually
reschedule themselves, so this meant that there could still be
pending work after flush_scheduled_work() had been called during
ieee80211softmac_stop().

This patch introduces a "running" flag which is used to ensure that
rescheduling does not happen in this situation.

I also used this flag to ensure that softmac's hooks into ieee80211 are
non-operational once the stop operation has been started. This simply
makes softmac a little more robust, because I could crash it easily
by receiving frames in the short timeframe after shutting down softmac
and before turning off the ZD1211 radio. (ZD1211 is now fixed as well!)

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-05-05 16:55:22 -04:00
John W. Linville
aad61439e6 Merge branch 'from-linus' into upstream 2006-05-05 16:50:23 -04:00
Ralf Baechle
82e84249f0 [ROSE]: Eleminate HZ from ROSE kernel interfaces
Convert all ROSE sysctl time values from jiffies to ms as units.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-05-03 23:28:20 -07:00
Ralf Baechle
4d8937d0b1 [NETROM]: Eleminate HZ from NET/ROM kernel interfaces
Convert all NET/ROM sysctl time values from jiffies to ms as units.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-05-03 23:27:47 -07:00
Ralf Baechle
e1fdb5b396 [AX.25]: Eleminate HZ from AX.25 kernel interfaces
Convert all AX.25 sysctl time values from jiffies to ms as units.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-05-03 23:27:16 -07:00
Akinobu Mita
da753beaeb [NET]: use hlist_unhashed()
Use hlist_unhashed() rather than accessing inside data structure.

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-29 18:33:15 -07:00
David Woodhouse
d6754b401a Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2006-04-29 01:42:26 +01:00
David Woodhouse
62c4f0a2d5 Don't include linux/config.h from anywhere else in include/
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-26 12:56:16 +01:00
Johannes Berg
9a1771e867 [PATCH] softmac: add SIOCSIWMLME
This patch adds the SIOCSIWMLME wext to softmac, this functionality
appears to be used by wpa_supplicant and is softmac-specific.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Jouni Malinen <jkm@devicescape.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-04-24 16:15:58 -04:00
Zhu Yi
45a62ab3d6 [PATCH] ieee80211: update version stamp to 1.1.13
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-04-24 16:15:54 -04:00
Zhu Yi
73858062b6 [PATCH] ieee80211: Fix TX code doesn't enable QoS when using WPA + QoS
Fix ieee80211 TX code when using WPA+QOS. TKIP/CCMP will use
the TID field of qos_ctl in 802.11 frame header to do encryption. We
cannot ignore this field when doing host encryption and add the qos_ctl
field later.

Signed-off-by: Hong Liu <hong.liu@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-04-24 16:15:53 -04:00
Zhu Yi
ea2841521a [PATCH] ieee80211: Fix TKIP MIC calculation for QoS frames
Fix TKIP MIC verification failure when receiving QoS frames from AP.

Signed-off-by: Hong Liu <hong.liu@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-04-24 16:15:53 -04:00
Johannes Berg
818667f7c4 [PATCH] softmac: fix SIOCSIWAP
There are some bugs in the current implementation of the SIOCSIWAP wext,
for example that when you do it twice and it fails, it may still try
another access point for some reason. This patch fixes this by introducing
a new flag that tells the association code that the bssid that is in use
was fixed by the user and shouldn't be deviated from.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-04-24 15:20:23 -04:00
Linus Torvalds
f4ffaa452e Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (21 commits)
  [PATCH] wext: Fix RtNetlink ENCODE security permissions
  [PATCH] bcm43xx: iw_priv_args names should be <16 characters
  [PATCH] bcm43xx: sysfs code cleanup
  [PATCH] bcm43xx: fix pctl slowclock limit calculation
  [PATCH] bcm43xx: fix dyn tssi2dbm memleak
  [PATCH] bcm43xx: fix config menu alignment
  [PATCH] bcm43xx wireless: fix printk format warnings
  [PATCH] softmac: report when scanning has finished
  [PATCH] softmac: fix event sending
  [PATCH] softmac: handle iw_mode properly
  [PATCH] softmac: dont send out packets while scanning
  [PATCH] softmac: return -EAGAIN from getscan while scanning
  [PATCH] bcm43xx: set trans_start on TX to prevent bogus timeouts
  [PATCH] orinoco: fix truncating commsquality RID with the latest Symbol firmware
  [PATCH] softmac: fix spinlock recursion on reassoc
  [PATCH] Revert NET_RADIO Kconfig title change
  [PATCH] wext: Fix IWENCODEEXT security permissions
  [PATCH] wireless/atmel: send WEXT scan completion events
  [PATCH] wireless/airo: clean up WEXT association and scan events
  [PATCH] softmac uses Wiress Ext.
  ...
2006-04-20 15:26:25 -07:00
David S. Miller
dc6de33674 [NET]: Add skb->truesize assertion checking.
Add some sanity checking.  truesize should be at least sizeof(struct
sk_buff) plus the current packet length.  If not, then truesize is
seriously mangled and deserves a kernel log message.

Currently we'll do the check for release of stream socket buffers.

But we can add checks to more spots over time.

Incorporating ideas from Herbert Xu.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-20 00:10:50 -07:00
Johannes Berg
feeeaa87e8 [PATCH] softmac: fix event sending
Softmac is sending custom events to userspace already, but it
should _really_ be sending the right WEXT events instead. This
patch fixes that.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-04-19 17:25:39 -04:00
YOSHIFUJI Hideaki
b809739a1b [IPV6]: Clean up hop-by-hop options handler.
- Removed unused argument (nhoff) for ipv6_parse_hopopts().
- Make ipv6_parse_hopopts() to align with other extension header
  handlers.
- Removed pointless assignment (hdr), which is not used afterwards.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-18 15:57:53 -07:00
Jamal Hadi Salim
2717096ab4 [XFRM]: Fix aevent timer.
Send aevent immediately if we have sent nothing since last timer and
this is the first packet.

Fixes a corner case when packet threshold is very high, the timer low
and a very low packet rate input which is bursty.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-14 15:03:05 -07:00
Adrian Bunk
6c97e72a16 [IPV4]: Possible cleanups.
This patch contains the following possible cleanups:
- make the following needlessly global function static:
  - arp.c: arp_rcv()
- remove the following unused EXPORT_SYMBOL's:
  - devinet.c: devinet_ioctl
  - fib_frontend.c: ip_rt_ioctl
  - inet_hashtables.c: inet_bind_bucket_create
  - inet_hashtables.c: inet_bind_hash
  - tcp_input.c: sysctl_tcp_abc
  - tcp_ipv4.c: sysctl_tcp_tw_reuse
  - tcp_output.c: sysctl_tcp_mtu_probing
  - tcp_output.c: sysctl_tcp_base_mss

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-14 15:00:20 -07:00
Denis Vlasenko
b1a7ffcb7a [IPV6]: Deinline few large functions in inet6 code
Deinline a few functions which produce 200+ bytes of code.

Size  Uses Wasted Name and definition
===== ==== ====== ================================================
  429    3    818 __inet6_lookup        include/net/inet6_hashtables.h
  404    2    384 __inet6_lookup_established    include/net/inet6_hashtables.h
  206    3    372 __inet6_hash  include/net/inet6_hashtables.h

Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-09 22:48:59 -07:00
David S. Miller
9b591cbd4e [X25]: Restore skb->dev setting in x25_type_trans().
Noticed by Pascal Schlafer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-09 22:37:18 -07:00
Patrick McHardy
2e2f7aefa8 [NETFILTER]: Fix fragmentation issues with bridge netfilter
The conntrack code doesn't do re-fragmentation of defragmented packets
anymore but relies on fragmentation in the IP layer. Purely bridged
packets don't pass through the IP layer, so the bridge netfilter code
needs to take care of fragmentation itself.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-09 22:25:23 -07:00
Herbert Xu
dbe5b4aaaf [IPSEC]: Kill unused decap state structure
This patch removes the *_decap_state structures which were previously
used to share state between input/post_input.  This is no longer
needed.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-01 00:54:16 -08:00
Herbert Xu
e695633e21 [IPSEC]: Kill unused decap state argument
This patch removes the decap_state argument from the xfrm input hook.
Previously this function allowed the input hook to share state with
the post_input hook.  The latter has since been removed.

The only purpose for it now is to check the encap type.  However, it
is easier and better to move the encap type check to the generic
xfrm_rcv function.  This allows us to get rid of the decap state
argument altogether.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-01 00:52:46 -08:00
David S. Miller
0803dbed7a [TCP]: Kill unused extern decl for tcp_v4_hash_connecting()
Noticed by Alan Menegotto.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-31 02:25:46 -08:00
Herbert Xu
d2acc3479c [INET]: Introduce tunnel4/tunnel6
Basically this patch moves the generic tunnel protocol stuff out of
xfrm4_tunnel/xfrm6_tunnel and moves it into the new files of tunnel4.c
and tunnel6 respectively.

The reason for this is that the problem that Hugo uncovered is only
the tip of the iceberg.  The real problem is that when we removed the
dependency of ipip on xfrm4_tunnel we didn't really consider the module
case at all.

For instance, as it is it's possible to build both ipip and xfrm4_tunnel
as modules and if the latter is loaded then ipip simply won't load.

After considering the alternatives I've decided that the best way out of
this is to restore the dependency of ipip on the non-xfrm-specific part
of xfrm4_tunnel.  This is acceptable IMHO because the intention of the
removal was really to be able to use ipip without the xfrm subsystem.
This is still preserved by this patch.

So now both ipip/xfrm4_tunnel depend on the new tunnel4.c which handles
the arbitration between the two.  The order of processing is determined
by a simple integer which ensures that ipip gets processed before
xfrm4_tunnel.

The situation for ICMP handling is a little bit more complicated since
we may not have enough information to determine who it's for.  It's not
a big deal at the moment since the xfrm ICMP handlers are basically
no-ops.  In future we can deal with this when we look at ICMP caching
in general.

The user-visible change to this is the removal of the TUNNEL Kconfig
prompts.  This makes sense because it can only be used through IPCOMP
as it stands.

The addition of the new modules shouldn't introduce any problems since
module dependency will cause them to be loaded.

Oh and I also turned some unnecessary pskb's in IPv6 related to this
patch to skb's.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-28 17:02:46 -08:00
Denis Vlasenko
f0088a50e7 [NET]: deinline 200+ byte inlines in sock.h
Sizes in bytes (allyesconfig, i386) and files where those inlines
are used:

238 sock_queue_rcv_skb 2.6.16/net/x25/x25_in.o
238 sock_queue_rcv_skb 2.6.16/net/rose/rose_in.o
238 sock_queue_rcv_skb 2.6.16/net/packet/af_packet.o
238 sock_queue_rcv_skb 2.6.16/net/netrom/nr_in.o
238 sock_queue_rcv_skb 2.6.16/net/llc/llc_sap.o
238 sock_queue_rcv_skb 2.6.16/net/llc/llc_conn.o
238 sock_queue_rcv_skb 2.6.16/net/irda/af_irda.o
238 sock_queue_rcv_skb 2.6.16/net/ipx/af_ipx.o
238 sock_queue_rcv_skb 2.6.16/net/ipv6/udp.o
238 sock_queue_rcv_skb 2.6.16/net/ipv6/raw.o
238 sock_queue_rcv_skb 2.6.16/net/ipv4/udp.o
238 sock_queue_rcv_skb 2.6.16/net/ipv4/raw.o
238 sock_queue_rcv_skb 2.6.16/net/ipv4/ipmr.o
238 sock_queue_rcv_skb 2.6.16/net/econet/econet.o
238 sock_queue_rcv_skb 2.6.16/net/econet/af_econet.o
238 sock_queue_rcv_skb 2.6.16/net/bluetooth/sco.o
238 sock_queue_rcv_skb 2.6.16/net/bluetooth/l2cap.o
238 sock_queue_rcv_skb 2.6.16/net/bluetooth/hci_sock.o
238 sock_queue_rcv_skb 2.6.16/net/ax25/ax25_in.o
238 sock_queue_rcv_skb 2.6.16/net/ax25/af_ax25.o
238 sock_queue_rcv_skb 2.6.16/net/appletalk/ddp.o
238 sock_queue_rcv_skb 2.6.16/drivers/net/pppoe.o

276 sk_receive_skb 2.6.16/net/decnet/dn_nsp_in.o
276 sk_receive_skb 2.6.16/net/dccp/ipv6.o
276 sk_receive_skb 2.6.16/net/dccp/ipv4.o
276 sk_receive_skb 2.6.16/net/dccp/dccp_ipv6.o
276 sk_receive_skb 2.6.16/drivers/net/pppoe.o

209 sk_dst_check 2.6.16/net/ipv6/ip6_output.o
209 sk_dst_check 2.6.16/net/ipv4/udp.o
209 sk_dst_check 2.6.16/net/decnet/dn_nsp_out.o

Large inlines with multiple callers:
Size  Uses Wasted Name and definition
===== ==== ====== ================================================
  238   21   4360 sock_queue_rcv_skb    include/net/sock.h
  109   10    801 sock_recv_timestamp   include/net/sock.h
  276    4    768 sk_receive_skb        include/net/sock.h
   94    8    518 __sk_dst_check        include/net/sock.h
  209    3    378 sk_dst_check  include/net/sock.h
  131    4    333 sk_setup_caps include/net/sock.h
  152    2    132 sk_stream_alloc_pskb  include/net/sock.h
  125    2    105 sk_stream_writequeue_purge    include/net/sock.h

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-28 17:02:45 -08:00
Linus Torvalds
fdccffc6b7 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [NET]: drop duplicate assignment in request_sock
  [IPSEC]: Fix tunnel error handling in ipcomp6
2006-03-27 08:47:29 -08:00
Alan Stern
e041c68341 [PATCH] Notifier chain update: API changes
The kernel's implementation of notifier chains is unsafe.  There is no
protection against entries being added to or removed from a chain while the
chain is in use.  The issues were discussed in this thread:

    http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

We noticed that notifier chains in the kernel fall into two basic usage
classes:

	"Blocking" chains are always called from a process context
	and the callout routines are allowed to sleep;

	"Atomic" chains can be called from an atomic context and
	the callout routines are not allowed to sleep.

We decided to codify this distinction and make it part of the API.  Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name).  New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain.  The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.

With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed.  For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections.  (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)

There are some limitations, which should not be too hard to live with.  For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem.  Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain.  (This did happen in a couple of places and the code
had to be changed to avoid it.)

Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization.  Instead we use RCU.  The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.

Here is the list of chains that we adjusted and their classifications.  None
of them use the raw API, so for the moment it is only a placeholder.

  ATOMIC CHAINS
  -------------
arch/i386/kernel/traps.c:		i386die_chain
arch/ia64/kernel/traps.c:		ia64die_chain
arch/powerpc/kernel/traps.c:		powerpc_die_chain
arch/sparc64/kernel/traps.c:		sparc64die_chain
arch/x86_64/kernel/traps.c:		die_chain
drivers/char/ipmi/ipmi_si_intf.c:	xaction_notifier_list
kernel/panic.c:				panic_notifier_list
kernel/profile.c:			task_free_notifier
net/bluetooth/hci_core.c:		hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_expect_chain
net/ipv6/addrconf.c:			inet6addr_chain
net/netfilter/nf_conntrack_core.c:	nf_conntrack_chain
net/netfilter/nf_conntrack_core.c:	nf_conntrack_expect_chain
net/netlink/af_netlink.c:		netlink_chain

  BLOCKING CHAINS
  ---------------
arch/powerpc/platforms/pseries/reconfig.c:	pSeries_reconfig_chain
arch/s390/kernel/process.c:		idle_chain
arch/x86_64/kernel/process.c		idle_notifier
drivers/base/memory.c:			memory_chain
drivers/cpufreq/cpufreq.c		cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c		cpufreq_transition_notifier_list
drivers/macintosh/adb.c:		adb_client_list
drivers/macintosh/via-pmu.c		sleep_notifier_list
drivers/macintosh/via-pmu68k.c		sleep_notifier_list
drivers/macintosh/windfarm_core.c	wf_client_list
drivers/usb/core/notify.c		usb_notifier_list
drivers/video/fbmem.c			fb_notifier_list
kernel/cpu.c				cpu_chain
kernel/module.c				module_notify_list
kernel/profile.c			munmap_notifier
kernel/profile.c			task_exit_notifier
kernel/sys.c				reboot_notifier_list
net/core/dev.c				netdev_chain
net/decnet/dn_dev.c:			dnaddr_chain
net/ipv4/devinet.c:			inetaddr_chain

It's possible that some of these classifications are wrong.  If they are,
please let us know or submit a patch to fix them.  Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)

The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.

[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:50 -08:00
Norbert Kiesel
3eb4801d7b [NET]: drop duplicate assignment in request_sock
Just noticed that request_sock.[ch] contain a useless assignment of
rskq_accept_head to itself.  I assume this is a typo and the 2nd one
was supposed to be _tail.  However, setting _tail to NULL is not
needed, so the patch below just drops the 2nd assignment.

Signed-off-By: Norbert Kiesel <nkiesel@tbdnetworks.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-26 17:39:55 -08:00
Ilia Sotnikov
cef2685e00 [IPV4]: Aggregate route entries with different TOS values
When we get an ICMP need-to-frag message, the original TOS value in the
ICMP payload cannot be used as a key to look up the routes to update.
This is because the TOS field may have been modified by routers on the
way.  Similarly, ip_rt_redirect should also ignore the TOS as the router
that gave us the message may have modified the TOS value.

The patch achieves this objective by aggregating entries with different
TOS values (but are otherwise identical) into the same bucket.  This
makes it easy to update them at the same time when an ICMP message is
received.

In future we should use a twin-hashing scheme where teh aggregation
occurs at the entry level.  That is, the TOS goes back into the hash
for normal lookups while ICMP lookups will end up with a node that
gives us a list that contains all other route entries that differ
only by TOS.

Signed-off-by: Ilia Sotnikov <hostcc@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-25 01:38:55 -08:00
David S. Miller
9932cf9546 [NET]: Fill in a 32-bit hole in struct sock on 64-bit platforms.
This makes struct sock 8 bytes smaller on 64-bit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-24 15:45:00 -08:00
Jeff Garzik
9b7c84899e Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2006-03-23 17:15:41 -05:00
Jean Tourrilhes
711e2c33ac [PATCH] WE-20 for kernel 2.6.16
This is version 20 of the Wireless Extensions. This is the
completion of the RtNetlink work I started early 2004, it enables the
full Wireless Extension API over RtNetlink.

	Few comments on the patch :
	o totally driver transparent, no change in drivers needed.
	o iwevent were already RtNetlink based since they were created
(around 2.5.7). This adds all the regular SET and GET requests over
RtNetlink, using the exact same mechanism and data format as iwevents.
	o This is a Kconfig option, as currently most people have no
need for it. Surprisingly, patch is actually small and well
encapsulated.
	o Tested on SMP, attention as been paid to make it 64 bits clean.
	o Code do probably too many checks and could be further
optimised, but better safe than sorry.
	o RtNetlink based version of the Wireless Tools available on
my web page for people inclined to try out this stuff.

	I would also like to thank Alexey Kuznetsov for his helpful
suggestions to make this patch better.

Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-23 07:12:57 -05:00
Jeff Garzik
fa4fa40a99 Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2006-03-22 22:55:57 -05:00
Johannes Berg
4855d25b1e [PATCH] softmac: add copyright and license headers
add copyright and license headers to all softmac files

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:56 -05:00
Johannes Berg
5c4df6da58 [PATCH] softmac: convert to use global workqueue
Convert softmac to use global workqueue instead of private one...

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:52 -05:00
Johannes Berg
370121e519 [PATCH] wireless: Add softmac layer to the kernel
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:50 -05:00
Dmitry Mishin
1e30a014e3 [NETFILTER]: futher {ip,ip6,arp}_tables unification
This patch moves {ip,ip6,arp}t_entry_{match,target} definitions to
x_tables.h. This move simplifies code and future compatibility fixes.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:56:56 -08:00
Pablo Neira Ayuso
b9f78f9fca [NETFILTER]: nf_conntrack: support for layer 3 protocol load on demand
x_tables matches and targets that require nf_conntrack_ipv[4|6] to work
don't have enough information to load on demand these modules. This
patch introduces the following changes to solve this issue:

o nf_ct_l3proto_try_module_get: try to load the layer 3 connection
tracker module and increases the refcount.
o nf_ct_l3proto_module put: drop the refcount of the module.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:56:08 -08:00
Shaun Pereira
a64b7b936d [X25]: allow ITU-T DTE facilities for x25
Allows use of the optional user facility to insert ITU-T
(http://www.itu.int/ITU-T/) specified DTE facilities in call set-up x25
packets.  This feature is optional; no facilities will be added if the ioctl
is not used, and call setup packet remains the same as before.

If the ioctls provided by the patch are used, then a facility marker will be
added to the x25 packet header so that the called dte address extension
facility can be differentiated from other types of facilities (as described in
the ITU-T X.25 recommendation) that are also allowed in the x25 packet header.

Facility markers are made up of two octets, and may be present in the x25
packet headers of call-request, incoming call, call accepted, clear request,
and clear indication packets.  The first of the two octets represents the
facility code field and is set to zero by this patch.  The second octet of the
marker represents the facility parameter field and is set to 0x0F because the
marker will be inserted before ITU-T type DTE facilities.

Since according to ITU-T X.25 Recommendation X.25(10/96)- 7.1 "All networks
will support the facility markers with a facility parameter field set to all
ones or to 00001111", therefore this patch should work with all x.25 networks.

While there are many ITU-T DTE facilities, this patch implements only the
called and calling address extension, with placeholders in the
x25_dte_facilities structure for the rest of the facilities.

Testing:

This patch was tested using a cisco xot router connected on its serial ports
to an X.25 network, and on its lan ports to a host running an xotd daemon.

It is also possible to test this patch using an xotd daemon and an x25tap
patch, where the xotd daemons work back-to-back without actually using an x.25
network.  See www.fyonne.net for details on how to do this.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Andrew Hendry <ahendry@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 00:01:31 -08:00
Shaun Pereira
f0ac261441 [NET]: socket timestamp 32 bit handler for 64 bit kernel
Get socket timestamp handler function that does not use the
ioctl32_hash_table.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-21 23:59:39 -08:00
Stephen Hemminger
f4ad2b162d [LLC]: llc_mac_hdr_init const arguments
Cleanup of LLC.  llc_mac_hdr_init can take constant arguments,
and it is defined twice once in llc_output.h that is otherwise unused.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:59:36 -08:00
Arnaldo Carvalho de Melo
dec73ff029 [ICSK] compat: Introduce inet_csk_compat_[gs]etsockopt
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:46:16 -08:00
Dmitry Mishin
3fdadf7d27 [NET]: {get|set}sockopt compatibility layer
This patch extends {get|set}sockopt compatibility layer in order to
move protocol specific parts to their place and avoid huge universal
net/compat.c file in the future.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:45:21 -08:00
Steven Whitehouse
c4ea94ab37 [DECnet]: Endian annotation and fixes for DECnet.
The typedef for dn_address has been removed in favour of using __le16
or __u16 directly as appropriate. All the DECnet header files are
updated accordingly.

The byte ordering of dn_eth2dn() and dn_dn2eth() are both changed
since just about all their callers wanted network order rather than
host order, so the conversion is now done in the functions themselves.

Several missed endianess conversions have been picked up during the
conversion process. The nh_gw field in struct dn_fib_info has been
changed from a 32 bit field to 16 bits as it ought to be.

One or two cases of using htons rather than dn_htons in the routing
code have been found and fixed.

There are still a few warnings to fix, but this patch deals with the
important cases.

Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:42:39 -08:00
Patrick McHardy
be33690d8f [XFRM]: Fix aevent related crash
When xfrm_user isn't loaded xfrm_nl is NULL, which makes IPsec crash because
xfrm_aevent_is_on passes the NULL pointer to netlink_has_listeners as socket.
A second problem is that the xfrm_nl pointer is not cleared when the socket
is releases at module unload time.

Protect references of xfrm_nl from outside of xfrm_user by RCU, check
that the socket is present in xfrm_aevent_is_on and set it to NULL
when unloading xfrm_user.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:40:54 -08:00
Rick Jones
15d99e02ba [TCP]: sysctl to allow TCP window > 32767 sans wscale
Back in the dark ages, we had to be conservative and only allow 15-bit
window fields if the window scale option was not negotiated.  Some
ancient stacks used a signed 16-bit quantity for the window field of
the TCP header and would get confused.

Those days are long gone, so we can use the full 16-bits by default
now.

There is a sysctl added so that we can still interact with such old
stacks

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:40:29 -08:00
Ingo Molnar
57b47a53ec [NET]: sem2mutex part 2
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:35:41 -08:00
Arjan van de Ven
4a3e2f711a [NET] sem2mutex: net/
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:33:17 -08:00
Michael S. Tsirkin
c5ecd62c25 [NET]: Move destructor from neigh->ops to neigh_params
struct neigh_ops currently has a destructor field, which no in-kernel
drivers outside of infiniband use.  The infiniband/ulp/ipoib in-tree
driver stashes some info in the neighbour structure (the results of
the second-stage lookup from ARP results to real link-level path), and
it uses neigh->ops->destructor to get a callback so it can clean up
this extra info when a neighbour is freed.  We've run into problems
with this: since the destructor is in an ops field that is shared
between neighbours that may belong to different net devices, there's
no way to set/clear it safely.

The following patch moves this field to neigh_parms where it can be
safely set, together with its twin neigh_setup.  Two additional
patches in the patch series update ipoib to use this new interface.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:25:41 -08:00
Arnaldo Carvalho de Melo
c4d9390941 [ICSK]: Introduce inet_csk_ctl_sock_create
Consolidating open coded sequences in tcp and dccp, v4 and v6.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:01:03 -08:00
John Heffner
0e7b13685f [TCP] mtu probing: move tcp-specific data out of inet_connection_sock
This moves some TCP-specific MTU probing state out of
inet_connection_sock back to tcp_sock.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:32:58 -08:00
Benjamin LaHaise
1d541ddd74 [AF_UNIX]: scm: better initialization
Instead of doing a memset then initialization of the fields of the scm
structure, just initialize all the members explicitly.  Prevent reloading
of current on x86 and x86-64 by storing the value in a local variable for
subsequent dereferences.  This is worth a ~7KB/s increase in af_unix
bandwidth.  Note that we avoid the issues surrounding potentially
uninitialized members of the ucred structure by constructing a struct
ucred instead of assigning the members individually, which forces the
compiler to zero any padding.

[ I modified the patch not to use the aggregate assignment since
  gcc-3.4.x and earlier cannot optimize that properly at all even
  though gcc-4.0.x and later can -DaveM ]

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:31:51 -08:00
Jamal Hadi Salim
6c5c8ca7ff [IPSEC]: Sync series - policy expires
This is similar to the SA expire insertion patch - only it inserts
expires for SP.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:17:25 -08:00
Jamal Hadi Salim
53bc6b4d29 [IPSEC]: Sync series - SA expires
This patch allows a user to insert SA expires. This is useful to
do on an HA backup for the case of byte counts but may not be very
useful for the case of time based expiry.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:17:03 -08:00
Jamal Hadi Salim
980ebd2579 [IPSEC]: Sync series - acquire insert
This introduces a feature similar to the one described in RFC 2367:
"
   ... the application needing an SA sends a PF_KEY
   SADB_ACQUIRE message down to the Key Engine, which then either
   returns an error or sends a similar SADB_ACQUIRE message up to one or
   more key management applications capable of creating such SAs.
   ...
   ...
   The third is where an application-layer consumer of security
   associations (e.g.  an OSPFv2 or RIPv2 daemon) needs a security
   association.

        Send an SADB_ACQUIRE message from a user process to the kernel.

        <base, address(SD), (address(P),) (identity(SD),) (sensitivity,)
          proposal>

        The kernel returns an SADB_ACQUIRE message to registered
          sockets.

        <base, address(SD), (address(P),) (identity(SD),) (sensitivity,)
          proposal>

        The user-level consumer waits for an SADB_UPDATE or SADB_ADD
        message for its particular type, and then can use that
        association by using SADB_GET messages.

 "
An app such as OSPF could then use ipsec KM to get keys

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:16:40 -08:00
Jamal Hadi Salim
f8cd54884e [IPSEC]: Sync series - core changes
This patch provides the core functionality needed for sync events
for ipsec. Derived work of Krisztian KOVACS <hidden@balabit.hu>

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:15:11 -08:00
Patrick McHardy
f2ffd9eeda [NETFILTER]: Move ip6_masked_addrcmp to include/net/ipv6.h
Replace netfilter's ip6_masked_addrcmp by a more efficient version
in include/net/ipv6.h to make it usable without module dependencies.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 18:03:16 -08:00
Harald Welte
dc808fe28d [NETFILTER] nf_conntrack: clean up to reduce size of 'struct nf_conn'
This patch moves all helper related data fields of 'struct nf_conn'
into a separate structure 'struct nf_conn_help'.  This new structure
is only present in conntrack entries for which we actually have a
helper loaded.

Also, this patch cleans up the nf_conntrack 'features' mechanism to
resemble what the original idea was: Just glue the feature-specific
data structures at the end of 'struct nf_conn', and explicitly
re-calculate the pointer to it when needed rather than keeping
pointers around.

Saves 20 bytes per conntrack on my x86_64 box. A non-helped conntrack
is 276 bytes. We still need to save another 20 bytes in order to fit
into to target of 256bytes.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:56:32 -08:00
John Heffner
5d424d5a67 [TCP]: MTU probing
Implementation of packetization layer path mtu discovery for TCP, based on
the internet-draft currently found at
<http://www.ietf.org/internet-drafts/draft-ietf-pmtud-method-05.txt>.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:53:41 -08:00
YOSHIFUJI Hideaki
70ceb4f539 [IPV6]: ROUTE: Add experimental support for Route Information Option in RA (RFC4191).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:06:24 -08:00
YOSHIFUJI Hideaki
ebacaaa0fd [IPV6]: ROUTE: Add support for Router Preference (RFC4191).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:04:53 -08:00
YOSHIFUJI Hideaki
554cfb7ee5 [IPV6]: ROUTE: Eliminate lock for default route pointer.
And prepare for more advanced router selection.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:00:26 -08:00
YOSHIFUJI Hideaki
955189efb4 [IPV6]: ADDRCONF: Use our standard algorithm for randomized ifid.
RFC 3041 describes an algorithm to generate random interface
identifier.  In RFC 3041bis, it is allowed to use different
algorithm than one described in RFC 3041.

So, let's use our standard pseudo random algorithm to simplify
our implementation.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:54:09 -08:00
Jeff Garzik
d378aca6ec Merge branch 'master' 2006-03-20 04:38:03 -05:00
Ralf Baechle DL5RB
c7c694d196 [AX.25]: Fix potencial memory hole.
If the AX.25 dialect chosen by the sysadmin is set to DAMA master / 3
(or DAMA slave / 2, if CONFIG_AX25_DAMA_SLAVE=n) ax25_kick() will fall
through the switch statement without calling ax25_send_iframe() or any
other function that would eventually free skbn thus leaking the packet.

Fix by restricting the sysctl inferface to allow only actually supported
AX.25 dialects.

The system administration mistake needed for this to happen is rather
unlikely, so this is an uncritical hole.

Coverity #651.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-19 13:20:06 -08:00
Alexey Kuznetsov
265a92856b [NET]: Fix race condition in sk_wait_event().
It is broken, the condition is checked out of socket lock. It is
wonderful the bug survived for so long time.

[ This fixes bugzilla #6233:
  race condition in tcp_sendmsg when connection became established ]

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-17 16:05:43 -08:00
Jeff Garzik
68727fed54 Merge branch 'upstream-fixes' 2006-03-01 01:58:38 -05:00
Herbert Xu
752c1f4c78 [IPSEC]: Kill post_input hook and do NAT-T in esp_input directly
The only reason post_input exists at all is that it gives us the
potential to adjust the checksums incrementally in future which
we ought to do.

However, after thinking about it for a bit we can adjust the
checksums without using this post_input stuff at all.  The crucial
point is that only the inner-most NAT-T SA needs to be considered
when adjusting checksums.  What's more, the checksum adjustment
comes down to a single u32 due to the linearity of IP checksums.

We just happen to have a spare u32 lying around in our skb structure :)
When ip_summed is set to CHECKSUM_NONE on input, the value of skb->csum
is currently unused.  All we have to do is to make that the checksum
adjustment and voila, there goes all the post_input and decap structures!

I've left in the decap data structures for now since it's intricately
woven into the sec_path stuff.  We can kill them later too.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-27 13:00:40 -08:00
Jeff Garzik
dbfedbb981 Merge branch 'master' 2006-02-27 11:33:51 -05:00
Herbert Xu
21380b81ef [XFRM]: Eliminate refcounting confusion by creating __xfrm_state_put().
We often just do an atomic_dec(&x->refcnt) on an xfrm_state object
because we know there is more than 1 reference remaining and thus
we can elide the heavier xfrm_state_put() call.

Do this behind an inline function called __xfrm_state_put() so that is
more obvious and also to allow us to more cleanly add refcount
debugging later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-23 16:10:53 -08:00
Jeff Garzik
b04a92e160 Merge branch 'upstream-fixes' 2006-02-17 16:20:30 -05:00
Patrick McHardy
48d5cad87c [XFRM]: Fix SNAT-related crash in xfrm4_output_finish
When a packet matching an IPsec policy is SNATed so it doesn't match any
policy anymore it looses its xfrm bundle, which makes xfrm4_output_finish
crash because of a NULL pointer dereference.

This patch directs these packets to the original output path instead. Since
the packets have already passed the POST_ROUTING hook, but need to start at
the beginning of the original output path which includes another
POST_ROUTING invocation, a flag is added to the IPCB to indicate that the
packet was rerouted and doesn't need to pass the POST_ROUTING hook again.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15 15:10:22 -08:00
David S. Miller
15c38c6ecd Merge master.kernel.org:/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2006-02-13 15:40:55 -08:00
Joe Perches
7a11c4d063 [IRDA]: Ratelimit messages.
From: Joe Perches <joe@perches.com>

Based upon a patch by Dave Jones.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-13 15:34:11 -08:00
Marcel Holtmann
56f3a40a5e [Bluetooth] Reduce L2CAP MTU for RFCOMM connections
This patch reduces the default L2CAP MTU for all RFCOMM connections
from 1024 to 1013 to improve the interoperability with some broken
RFCOMM implementations. To make this more flexible the L2CAP MTU
becomes also a module parameter and so it can changed at runtime.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2006-02-13 11:39:57 +01:00
Samuel Ortiz
d93077fb0e [IRDA]: Set proper IrLAP device address length
This patch set IrDA's addr_len properly, i.e to 4 bytes, the size of the
IrLAP device address.

Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-09 16:58:46 -08:00
Jeff Garzik
3c9b3a8575 Merge branch 'master' 2006-02-07 01:47:12 -05:00
Yasuyuki Kozakai
ddc8d029ac [NETFILTER]: nf_conntrack: check address family when finding protocol module
__nf_conntrack_{l3}proto_find() doesn't check the passed protocol family,
then it's possible to touch out of the array which has only AF_MAX items.

Spotted by Pablo Neira Ayuso.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:17 -08:00
Stephen Hemminger
0dec456d1f [NET]: Add CONFIG_NETDEBUG to suppress bad packet messages.
If you are on a hostile network, or are running protocol tests, you can
easily get the logged swamped by messages about bad UDP and ICMP packets.
This turns those messages off unless a config option is enabled.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-02 20:40:09 -08:00
Vlad Yasevich
27852c26ba [SCTP]: Fix 'fast retransmit' to send a TSN only once.
SCTP used to "fast retransmit" a TSN every time we hit the number
of missing reports for the TSN.  However the Implementers Guide
specifies that we should only "fast retransmit" a given TSN once.
Subsequent retransmits should be timeouts only. Also change the
number of missing reports to 3 as per the latest IG(similar to TCP).

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-02 16:57:31 -08:00
Patrick McHardy
5d39a795bf [IPV4]: Always set fl.proto in ip_route_newports
ip_route_newports uses the struct flowi from the struct rtable returned
by ip_route_connect for the new route lookup and just replaces the port
numbers if they have changed. If an IPsec policy exists which doesn't match
port 0 the struct flowi won't have the proto field set and no xfrm lookup
is done for the changed ports.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-31 17:35:35 -08:00
Larry Finger
dd5eeb461e [PATCH] ieee80211: common wx auth code
This patch creates two functions ieee80211_wx_set_auth and
ieee80211_wx_get_auth that can be used by drivers for the wireless
extension handlers instead of writing their own, if the implementation
should be software only.

These patches enable using bcm43xx devices with WPA and this seems (as
far as I can tell) to be the only difference between the stock ieee80211
and softmac's ieee80211 left.

Signed-Off-By: Johannes Berg <johannes@sipsolutions.net>

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-30 20:35:35 -05:00
Zhu Yi
b79e20b609 [PATCH] ieee80211: Add 802.11h data type and structures
Add 802.11h data types and structure definitions to ieee80211.h.

Signed-off-by: Hong Liu <hong.liu@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 17:08:07 -05:00
Zhu Yi
9184d9348a [PATCH] ieee80211: Add TKIP crypt->build_iv
This patch adds ieee80211 TKIP build_iv() method to support hardwares
that can do TKIP encryption but relies on ieee80211 layer to build
the IV. It also changes the build_iv() interface to return the key
if possible after the IV is built (this is required by TKIP).

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 17:08:07 -05:00
Zhu Yi
24056bec08 [PATCH] ieee80211: Add LEAP authentication type
Signed-off-by: Hong Liu <hong.liu@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 17:08:06 -05:00
Zhu Yi
4a99ac3a9e [PATCH] ieee80211: Fix A band min and max channel definitions
Signed-off-by: Hong Liu <hong.liu@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 16:49:58 -05:00
David S. Miller
cf9e50a920 Merge master.kernel.org:/pub/scm/linux/kernel/git/sridhar/lksctp-2.6 2006-01-19 16:53:02 -08:00
Sridhar Samudrala
c4d2444e99 [SCTP]: Fix couple of races between sctp_peeloff() and sctp_rcv().
Validate and update the sk in sctp_rcv() to avoid the race where an
assoc/ep could move to a different socket after we get the sk, but before
the skb is added to the backlog.

Also migrate the skb's in backlog queue to new sk when doing a peeloff.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2006-01-17 11:56:26 -08:00
Vlad Yasevich
313e7b4d25 [SCTP]: Fix machine check/connection hang on IA64.
sctp_unpack_cookie used an on-stack array called digest as a result/out
parameter in the call to crypto_hmac. However, hmac code
(crypto_hmac_final)
assumes that the 'out' argument is in virtual memory (identity mapped
region)
and can use virt_to_page call on it.  This does not work with the on-stack
declared digest.  The problems observed so far have been:
 a) incorrect hmac digest
 b) machine check and hardware reset.

Solution is to define the digest in an identity mapped region by
kmalloc'ing
it.  We can do this once as part of the endpoint structure and re-use it
when
verifying the SCTP cookie.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2006-01-17 11:55:57 -08:00
Vlad Yasevich
8116ffad41 [SCTP]: Fix bad sysctl formatting of SCTP timeout values on 64-bit m/cs.
Change all the structure members that hold jiffies to be of type
unsigned long.  This also corrects bad sysctl formating on 64 bit
architectures.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2006-01-17 11:55:17 -08:00
Vlad Yasevich
9834a2bb49 [SCTP]: Fix sctp_cookie alignment in the packet.
On 64 bit architectures, sctp_cookie sent as part of INIT-ACK is not
aligned on a 64 bit boundry and thus causes unaligned access exceptions.

The layout of the cookie prameter is this:
|<----- Parameter Header --------------------|<--- Cookie DATA --------
-----------------------------------------------------------------------
| param type (16 bits) | param len (16 bits) | sig [32 bytes] | cookie..
-----------------------------------------------------------------------

The cookie data portion contains 64 bit values on 64 bit architechtures
(timeval) that fall on a 32 bit alignment boundry when used as part of
the on-wire format, but align correctly when used in internal
structures.  This patch explicitely pads the on-wire format so that
it is properly aligned.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2006-01-17 11:52:12 -08:00
Adrian Bunk
5fad5a2e1f [PATCH] hostap: don't #include C files in hostap_main.c
This patch contains an attempt to properly build hostap.o without
#include'ing C files.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-16 16:51:54 -05:00
Pete Zaitcev
0b8d3256a0 [PATCH] iw_handler.h: SIOCSIWNAME -> SIOCSIWCOMMIT in comment
The ioctl was renamed from SIOCSIWNAME to SIOCSIWCOMMIT.

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-16 16:51:53 -05:00
Linus Torvalds
69eebed240 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2006-01-13 15:28:10 -08:00
Joe Perches
46b86a2da0 [NET]: Use NIP6_FMT in kernel.h
There are errors and inconsistency in the display of NIP6 strings.
	ie: net/ipv6/ip6_flowlabel.c

There are errors and inconsistency in the display of NIPQUAD strings too.
	ie: net/netfilter/nf_conntrack_ftp.c

This patch:
	adds NIP6_FMT to kernel.h
	changes all code to use NIP6_FMT
	fixes net/ipv6/ip6_flowlabel.c
	adds NIPQUAD_FMT to kernel.h
	fixes net/netfilter/nf_conntrack_ftp.c
	changes a few uses of "%u.%u.%u.%u" to NIPQUAD_FMT for symmetry to NIP6_FMT

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-13 14:29:07 -08:00
Per Liden
23b0ca5bf5 [PATCH] genetlink: don't touch module ref count
Increasing the module ref count at registration will block the module from
ever being unloaded. In fact, genetlink should not care about the owner at
all. This patch removes the owner field from the struct registered with
genetlink.

Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-13 13:06:40 -08:00
Harald Welte
2e4e6a17af [NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables
This monster-patch tries to do the best job for unifying the data
structures and backend interfaces for the three evil clones ip_tables,
ip6_tables and arp_tables.  In an ideal world we would never have
allowed this kind of copy+paste programming... but well, our world
isn't (yet?) ideal.

o introduce a new x_tables module
o {ip,arp,ip6}_tables depend on this x_tables module
o registration functions for tables, matches and targets are only
  wrappers around x_tables provided functions
o all matches/targets that are used from ip_tables and ip6_tables
  are now implemented as xt_FOOBAR.c files and provide module aliases
  to ipt_FOOBAR and ip6t_FOOBAR
o header files for xt_matches are in include/linux/netfilter/,
  include/linux/netfilter_{ipv4,ipv6} contains compatibility wrappers
  around the xt_FOOBAR.h headers

Based on this patchset we're going to further unify the code,
gradually getting rid of all the layer 3 specific assumptions.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-12 14:06:43 -08:00
Per Liden
593a5f22d8 [TIPC] More updates of file headers
Updated copyright notice to include the year the file was
actually created. Information about file creation dates
was extracted from the files in the old CVS repository
at tipc.sourceforge.net.

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
2006-01-12 14:06:39 -08:00
Per Liden
9da1c8b694 [TIPC] Update of file headers
The copyright statements from different parts of Ericsson
have been merged into one.

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
2006-01-12 14:06:38 -08:00
Per Liden
9ea1fd3c1a [TIPC] License header update
The license header in each file now more clearly state that this
code is licensed under a dual BSD/GPL. Before this was only
evident if you looked at the MODULE_LICENSE line in core.c.

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
2006-01-12 14:06:36 -08:00
Per Liden
ea714ccda5 [TIPC] Moved configuration interface into tipc_config.h
Restored the old tipc_config.h to get a cleaner division between the
interfaces used by normal TIPC users and TIPC administration utilities.

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
2006-01-12 14:06:35 -08:00
Per Liden
b97bf3fd8f [TIPC] Initial merge
TIPC (Transparent Inter Process Communication) is a protocol designed for
intra cluster communication. For more information see
http://tipc.sourceforge.net

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
2006-01-12 14:06:31 -08:00
Johannes Berg
c9fa7d5d6c [PATCH] fix wrong comments in ieee80211.h
The comments in ieee80211.h claim that one doesn't need to set the len
parameter of the stats struct. But if one doesn't, the management frames
are read far over the memory they actually occupy causing badness.

Signed-Off-By: Johannes Berg <johannes@sipsolutions.net>

Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-12 16:39:45 -05:00
Stephen Hemminger
770cfbcffd [INET]: congestion and af_ops can be const
The congestion ops and af_ops in the inet_connection_sock
can be const.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10 12:54:26 -08:00
Patrick McHardy
f43c5a0df3 [PKT_SCHED]: Convert tc action functions to single skb pointers
tcf_action_exec only gets a single skb pointer and doesn't own the skb,
but passes double skb pointers (to a local variable) to the action
functions. Change to use single skb pointers everywhere.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:08 -08:00
Patrick McHardy
538e43a4bd [PKT_SCHED]: Use USEC_PER_SEC
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:05 -08:00
Jan Blunck
6a878184c2 [PATCH] Eliminate __attribute__ ((packed)) warnings for gcc-4.1
Since version 4.1 the gcc is warning about ignored attributes. This patch is
using the equivalent attribute on the struct instead of on each of the
structure or union members.

GCC Manual:
  "Specifying Attributes of Types

   packed
    This attribute, attached to struct or union type definition, specifies
    that
    each member of the structure or union is placed to minimize the memory
    required. When attached to an enum definition, it indicates that the
    smallest integral type should be used.

    Specifying this attribute for struct and union types is equivalent to
    specifying the packed attribute on each of the structure or union
    members."

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:07 -08:00
Adrian Bunk
97dc627fb3 [IPV4]: make ip_fragment() static
Since there's no longer any external user of ip_fragment() we can make 
it static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 13:23:39 -08:00
Patrick McHardy
5c901daaea [NETFILTER]: Redo policy lookups after NAT when neccessary
When NAT changes the key used for the xfrm lookup it needs to be done
again. If a new policy is returned in POST_ROUTING the packet needs
to be passed to xfrm4_output_one manually after all hooks were called
because POST_ROUTING is called with fixed okfn (ip_finish_output).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:35 -08:00
Patrick McHardy
3e3850e989 [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder
ip_route_me_harder doesn't use the port numbers of the xfrm lookup and
uses ip_route_input for non-local addresses which doesn't do a xfrm
lookup, ip6_route_me_harder doesn't do a xfrm lookup at all.

Use xfrm_decode_session and do the lookup manually, make sure both
only do the lookup if the packet hasn't been transformed already.

Makeing sure the lookup only happens once needs a new field in the
IP6CB, which exceeds the size of skb->cb. The size of skb->cb is
increased to 48b. Apparently the IPv6 mobile extensions need some
more room anyway.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:33 -08:00
Patrick McHardy
8cdfab8a43 [IPV4]: reset IPCB flags when neccessary
Reset IPSKB_XFRM_TUNNEL_SIZE flags in ipip and ip_gre hard_start_xmit
function before the packet reenters IP. This is neccessary so the
encapsulated packets are checked not to be oversized in xfrm4_output.c
again. Reset all flags in sit when a packet changes its address family.

Also remove some obsolete IPSKB flags.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:32 -08:00
Patrick McHardy
b05e106698 [IPV4/6]: Netfilter IPsec input hooks
When the innermost transform uses transport mode the decapsulated packet
is not visible to netfilter. Pass the packet through the PRE_ROUTING and
LOCAL_IN hooks again before handing it to upper layer protocols to make
netfilter-visibility symetrical to the output path.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:31 -08:00
Patrick McHardy
951dbc8ac7 [IPV6]: Move nextheader offset to the IP6CB
Move nextheader offset to the IP6CB to make it possible to pass a
packet to ip6_input_finish multiple times and have it skip already
parsed headers. As a nice side effect this gets rid of the manual
hopopts skipping in ip6_input_finish.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:29 -08:00
Patrick McHardy
16a6677fdf [XFRM]: Netfilter IPsec output hooks
Call netfilter hooks before IPsec transforms. Packets visit the
FORWARD/LOCAL_OUT and POST_ROUTING hook before the first encapsulation
and the LOCAL_OUT and POST_ROUTING hook before each following tunnel mode
transform.

Patch from Herbert Xu <herbert@gondor.apana.org.au>:

Move the loop from dst_output into xfrm4_output/xfrm6_output since they're
the only ones who need to it. xfrm{4,6}_output_one() processes the first SA
all subsequent transport mode SAs and is called in a loop that calls the
netfilter hooks between each two calls.

In order to avoid the tail call issue, I've added the inline function
nf_hook which is nf_hook_slow plus the empty list check.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:28 -08:00
Kris Katterjohn
4bad4dc919 [NET]: Change sk_run_filter()'s return type in net/core/filter.c
It should return an unsigned value, and fix sk_filter() as well.

Signed-off-by: Kris Katterjohn <kjak@ispwest.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-06 13:08:20 -08:00
Patrick McHardy
1bd9bef6f9 [NETFILTER]: Call POST_ROUTING hook before fragmentation
Call POST_ROUTING hook before fragmentation to get rid of the okfn use
in ip_refrag and save the useless fragmentation/defragmentation step
when NAT is used.

The patch introduces one user-visible change, the POSTROUTING chain
in the mangle table gets entire packets, not fragments, which should
simplify use of the MARK and CLASSIFY targets for queueing as a nice
side-effect.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:20:59 -08:00
Pablo Neira Ayuso
c1d10adb4a [NETFILTER]: Add ctnetlink port for nf_conntrack
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:19:05 -08:00
Stephen Hemminger
40efc6fa17 [TCP]: less inline's
TCP inline usage cleanup:
 * get rid of inline in several places
 * replace __inline__ with inline where possible
 * move functions used in one file out of tcp.h
 * let compiler decide on used once cases

On x86_64: 
   text	   data	    bss	    dec	    hex	filename
3594701	 648348	 567400	4810449	 4966d1	vmlinux.orig
3593133	 648580	 567400	4809113	 496199	vmlinux

On sparc64:
   text	   data	    bss	    dec	    hex	filename
2538278	 406152	 530392	3474822	 350586	vmlinux.ORIG
2536382	 406384	 530392	3473158	 34ff06	vmlinux

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 16:03:49 -08:00
Per Liden
b461d2f218 [NETLINK] genetlink: fix cmd type in genl_ops to be consistent to u8
Signed-off-by: Per Liden <per.liden@ericsson.com>
ACKed-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:13:29 -08:00
Benjamin LaHaise
fd19f329a3 [AF_UNIX]: Convert to use a spinlock instead of rwlock
From: Benjamin LaHaise <bcrl@kvack.org>

In af_unix, a rwlock is used to protect internal state.  At least on my 
P4 with HT it is faster to use a spinlock due to the simpler memory 
barrier used to unlock.  This patch raises bw_unix to ~690K/s.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:10:46 -08:00
Arnaldo Carvalho de Melo
8639a11e23 [TCP]: Don't use __constant_htonl for a non const arg
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:22 -08:00
Arnaldo Carvalho de Melo
14c850212e [INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.

Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:21 -08:00
Arnaldo Carvalho de Melo
25995ff577 [SOCK]: Introduce sk_receive_skb
Its common enough to to justify that, TCP still can't use it as it has the
prequeueing stuff, still to be made generic in the not so distant future :-)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:19 -08:00
Eric Dumazet
90ddc4f047 [NET]: move struct proto_ops to const
I noticed that some of 'struct proto_ops' used in the kernel may share
a cache line used by locks or other heavily modified data. (default
linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at
least)

This patch makes sure a 'struct proto_ops' can be declared as const,
so that all cpus can share all parts of it without false sharing.

This is not mandatory : a driver can still use a read/write structure
if it needs to (and eventually a __read_mostly)

I made a global stubstitute to change all existing occurences to make
them const.

This should reduce the possibility of false sharing on SMP, and
speedup some socket system calls.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:15 -08:00
Frank Filz
7708610b1b [SCTP]: Add support for SCTP_DELAYED_ACK_TIME socket option.
Signed-off-by: Frank Filz <ffilz@us.ibm.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:13 -08:00
Frank Filz
52ccb8e90c [SCTP]: Update SCTP_PEER_ADDR_PARAMS socket option to the latest api draft.
This patch adds support to set/get heartbeat interval, maximum number of
retransmissions, pathmtu, sackdelay time for a particular transport/
association/socket as per the latest SCTP sockets api draft11.

Signed-off-by: Frank Filz <ffilz@us.ibm.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:11 -08:00
David S. Miller
fbe9cc4a87 [AF_UNIX]: Use spinlock for unix_table_lock
This lock is actually taken mostly as a writer,
so using a rwlock actually just makes performance
worse especially on chips like the Intel P4.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:59 -08:00
Arnaldo Carvalho de Melo
d83d8461f9 [IP_SOCKGLUE]: Remove most of the tcp specific calls
As DCCP needs to be called in the same spots.

Now we have a member in inet_sock (is_icsk), set at sock creation time from
struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and
DCCP) to see if a struct sock instance is a inet_connection_sock for places
like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if
sk_type was SOCK_STREAM, that is insufficient because we now use the same code
for DCCP, that has sk_type SOCK_DCCP.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:58 -08:00
Arnaldo Carvalho de Melo
2271281362 [TCP]: Move the TCPF_ enum to tcp_states.h
Upcoming patches will make, for instance, ip_sockglue.c need just this enum
and not all of tcp.h.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:57 -08:00
Arnaldo Carvalho de Melo
d8313f5ca2 [INET6]: Generalise tcp_v6_hash_connect
Renaming it to inet6_hash_connect, making it possible to ditch
dccp_v6_hash_connect and share the same code with TCP instead.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:56 -08:00
Arnaldo Carvalho de Melo
a7f5e7f164 [INET]: Generalise tcp_v4_hash_connect
Renaming it to inet_hash_connect, making it possible to ditch
dccp_v4_hash_connect and share the same code with TCP instead.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:55 -08:00
Arnaldo Carvalho de Melo
6d6ee43e0b [TWSK]: Introduce struct timewait_sock_ops
So that we can share several timewait sockets related functions and
make the timewait mini sockets infrastructure closer to the request
mini sockets one.

Next changesets will take advantage of this, moving more code out of
TCP and DCCP v4 and v6 to common infrastructure.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:54 -08:00
Arnaldo Carvalho de Melo
399c07def6 [IPV6]: Export ipv6_opt_accepted
It was already non-TCP specific, will be used by DCCPv6.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:51 -08:00
Arnaldo Carvalho de Melo
0fa1a53e1f [IPV6]: Introduce inet6_timewait_sock
Out of tcp6_timewait_sock, that now is just an aggregation of
inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct
inet_timewait_sock, that is common to the IPv6 transport protocols that use
timewait sockets, like DCCP and TCP.

tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic
code to find the IPv6 area in a timewait sock.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:47 -08:00
Arnaldo Carvalho de Melo
b9750ce13c [IPV6]: Generalise some functions
Using sk->sk_protocol instead of IPPROTO_TCP.

Will be used by DCCPv6 in the next changesets.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:46 -08:00
Benjamin LaHaise
c1cbe4b7ad [NET]: Avoid atomic xchg() for non-error case
It also looks like there were 2 places where the test on sk_err was
missing from the event wait logic (in sk_stream_wait_connect and
sk_stream_wait_memory), while the rest of the sock_error() users look
to be doing the right thing.  This version of the patch fixes those,
and cleans up a few places that were testing ->sk_err directly.

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:44 -08:00
Arnaldo Carvalho de Melo
af05dc9394 [ICSK]: Move v4_addr2sockaddr from TCP to icsk
Renaming it to inet_csk_addr2sockaddr.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:39 -08:00
Arnaldo Carvalho de Melo
8292a17a39 [ICSK]: Rename struct tcp_func to struct inet_connection_sock_af_ops
And move it to struct inet_connection_sock. DCCP will use it in the
upcoming changesets.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:38 -08:00
Arnaldo Carvalho de Melo
8129765ac0 [IPV6]: Generalise tcp_v6_search_req & tcp_v6_synq_add
More work is needed tho to introduce inet6_request_sock from
tcp6_request_sock, in the same layout considerations as ipv6_pinfo in
inet_sock, next changeset will do that.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:36 -08:00
Arnaldo Carvalho de Melo
c2977c2213 [ICSK]: make inet_csk_reqsk_queue_hash_add timeout arg unsigned long
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:34 -08:00
Arnaldo Carvalho de Melo
90b19d3169 [IPV6]: Generalise __tcp_v6_hash, renaming it to __inet6_hash
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:33 -08:00
Arnaldo Carvalho de Melo
971af18bbf [IPV6]: Reuse inet_csk_get_port in tcp_v6_get_port
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:33 -08:00
Herbert Xu
89cee8b1cb [IPV4]: Safer reassembly
Another spin of Herbert Xu's "safer ip reassembly" patch
for 2.6.16.

(The original patch is here:
http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2
and my only contribution is to have tested it.)

This patch (optionally) does additional checks before accepting IP
fragments, which can greatly reduce the possibility of reassembling
fragments which originated from different IP datagrams.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:31 -08:00
Trent Jaeger
df71837d50 [LSM-IPSec]: Security association restriction.
This patch series implements per packet access control via the
extension of the Linux Security Modules (LSM) interface by hooks in
the XFRM and pfkey subsystems that leverage IPSec security
associations to label packets.  Extensions to the SELinux LSM are
included that leverage the patch for this purpose.

This patch implements the changes necessary to the XFRM subsystem,
pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a
socket to use only authorized security associations (or no security
association) to send/receive network packets.

Patch purpose:

The patch is designed to enable access control per packets based on
the strongly authenticated IPSec security association.  Such access
controls augment the existing ones based on network interface and IP
address.  The former are very coarse-grained, and the latter can be
spoofed.  By using IPSec, the system can control access to remote
hosts based on cryptographic keys generated using the IPSec mechanism.
This enables access control on a per-machine basis or per-application
if the remote machine is running the same mechanism and trusted to
enforce the access control policy.

Patch design approach:

The overall approach is that policy (xfrm_policy) entries set by
user-level programs (e.g., setkey for ipsec-tools) are extended with a
security context that is used at policy selection time in the XFRM
subsystem to restrict the sockets that can send/receive packets via
security associations (xfrm_states) that are built from those
policies.

A presentation available at
www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
from the SELinux symposium describes the overall approach.

Patch implementation details:

On output, the policy retrieved (via xfrm_policy_lookup or
xfrm_sk_policy_lookup) must be authorized for the security context of
the socket and the same security context is required for resultant
security association (retrieved or negotiated via racoon in
ipsec-tools).  This is enforced in xfrm_state_find.

On input, the policy retrieved must also be authorized for the socket
(at __xfrm_policy_check), and the security context of the policy must
also match the security association being used.

The patch has virtually no impact on packets that do not use IPSec.
The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as
before.

Also, if IPSec is used without security contexts, the impact is
minimal.  The LSM must allow such policies to be selected for the
combination of socket and remote machine, but subsequent IPSec
processing proceeds as in the original case.

Testing:

The pfkey interface is tested using the ipsec-tools.  ipsec-tools have
been modified (a separate ipsec-tools patch is available for version
0.5) that supports assignment of xfrm_policy entries and security
associations with security contexts via setkey and the negotiation
using the security contexts via racoon.

The xfrm_user interface is tested via ad hoc programs that set
security contexts.  These programs are also available from me, and
contain programs for setting, getting, and deleting policy for testing
this interface.  Testing of sa functions was done by tracing kernel
behavior.

Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:24 -08:00
David L Stevens
5ab4a6c81e [IPV6] mcast: Fix multiple issues in MLDv2 reports.
The below "jumbo" patch fixes the following problems in MLDv2.

1) Add necessary "ntohs" to recent "pskb_may_pull" check [breaks
        all nonzero source queries on little-endian (!)]

2) Add locking to source filter list [resend of prior patch]

3) fix "mld_marksources()" to
        a) send nothing when all queried sources are excluded
        b) send full exclude report when source queried sources are
                not excluded
        c) don't schedule a timer when there's nothing to report

NOTE: RFC 3810 specifies the source list should be saved and each
  source reported individually as an IS_IN. This is an obvious DOS
  path, requiring the host to store and then multicast as many sources
  as are queried (e.g., millions...). This alternative sends a full, 
  relevant report that's limited to number of sources present on the
  machine.

4) fix "add_grec()" to send empty-source records when it should
        The original check doesn't account for a non-empty source
        list with all sources inactive; the new code keeps that
        short-circuit case, and also generates the group header
        with an empty list if needed.

5) fix mca_crcount decrement to be after add_grec(), which needs
        its original value

These issues (other than item #1 ;-) ) were all found by Yan Zheng,
much thanks!

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-27 14:03:00 -08:00
YOSHIFUJI Hideaki
3c21edbd11 [IPV6]: Defer IPv6 device initialization until the link becomes ready.
NETDEV_UP might be sent even if the link attached to the interface was
not ready.  DAD does not make sense in such case, so we won't do so.
After interface

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2005-12-21 22:57:24 +09:00
David S. Miller
399c180ac5 [IPSEC]: Perform SA switchover immediately.
When we insert a new xfrm_state which potentially
subsumes an existing one, make sure all cached
bundles are flushed so that the new SA is used
immediately.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-19 14:23:23 -08:00
Steven Whitehouse
1f12bcc9d1 [DECNET]: add memory buffer settings
The patch (originally from Steve) simply adds memory buffer settings to 
DECnet similar to those in TCP.

Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-05 13:42:06 -08:00
Jamal Hadi Salim
0ff60a4567 [IPV4]: Fix secondary IP addresses after promotion
This patch fixes the problem with promoting aliases when:
a) a single primary and > 1 secondary addresses
b) multiple primary addresses each with at least one secondary address

Based on earlier efforts from Brian Pomerantz <bapper@piratehaven.org>,
Patrick McHardy <kaber@trash.net> and Thomas Graf <tgraf@suug.ch>

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-22 14:47:37 -08:00
David S. Miller
1ef43204f4 Merge git://git.skbuff.net/gitroot/yoshfuji/linux-2.6.14+advapi-fix/ 2005-11-20 20:52:16 -08:00
YOSHIFUJI Hideaki
df9890c31a [IPV6]: Fix sending extension headers before and including routing header.
Based on suggestion from Masahide Nakamura <nakam@linux-ipv6.org>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2005-11-20 12:23:18 +09:00
Andrew Morton
cea00da397 [PATCH] git-netdev-all-ieee80211_get_payload-warning-fix
include/net/ieee80211.h: In function `ieee80211_get_payload':
include/net/ieee80211.h:1046: warning: control reaches end of non-void function

Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-11-18 13:33:31 -05:00
Stephen Hemminger
31f3426904 [TCP]: More spelling fixes.
From Joe Perches

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-15 15:17:10 -08:00
Jochen Friedrich
cf22535657 [LLC]: Fix typo
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Acked-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 21:58:18 -08:00
Neil Horman
049b3ff5a8 [SCTP]: Include ulpevents in socket receive buffer accounting.
Also introduces a sysctl option to configure the receive buffer
accounting policy to be either at socket or association level.
Default is all the associations on the same socket share the
receive buffer.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-11 16:08:24 -08:00
Vladislav Yasevich
19c7e9eef5 [SCTP]: Fix ia64 NaT consumption fault with sctp_sideffect commands.
On ia64, it is possible to get NaT Consumption Fault and a kernel panic
when initializing sctp sideeffect commands arguments.  The union
sctp_arg_t contains different sized elements and when loading a smaller
sized element (32 or 16 bits), it is possible for a speculative load to
fail and result in a NaT bit set which causes a kernel crash.  The easy
way to get around it is to load the largerst member of the union.

Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-11 16:07:40 -08:00
Vladislav Yasevich
1e7d3d90c9 [SCTP]: Remove timeouts[] array from sctp_endpoint.
The socket level timeout values are maintained in sctp_sock and
association level timeouts are in sctp_association. So there is
no need for ep->timeouts.

Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-11 16:06:16 -08:00
Stephen Hemminger
6a438bbe68 [TCP]: speed up SACK processing
Use "hints" to speed up the SACK processing. Various forms 
of this have been used by TCP developers (Web100, STCP, BIC)
to avoid the 2x linear search of outstanding segments.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 17:14:59 -08:00
Stephen Hemminger
caa20d9abe [TCP]: spelling fixes
Minor spelling fixes for TCP code.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 17:13:47 -08:00
Stephen Hemminger
9772efb970 [TCP]: Appropriate Byte Count support
This is an updated version of the RFC3465 ABC patch originally
for Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting
bytes ack'd rather than packets when updating congestion control.

The orignal ABC described in the RFC applied to a Reno style
algorithm. For advanced congestion control there is little
change after leaving slow start.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 17:09:53 -08:00
Stephen Hemminger
7faffa1c7f [TCP]: add tcp_slow_start helper
Move all the code that does linear TCP slowstart to one
inline function to ease later patch to add ABC support.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 17:07:24 -08:00
Stephen Hemminger
f4805eded7 [TCP]: fix congestion window update when using TSO deferal
TCP peformance with TSO over networks with delay is awful.
On a 100Mbit link with 150ms delay, we get 4Mbits/sec with TSO and
50Mbits/sec without TSO.

The problem is with TSO, we intentionally do not keep the maximum
number of packets in flight to fill the window, we hold out to until 
we can send a MSS chunk. But, we also don't update the congestion window 
unless we have filled, as per RFC2861.

This patch replaces the check for the congestion window being full
with something smarter that accounts for TSO.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 16:53:30 -08:00
Herbert Xu
fb286bb299 [NET]: Detect hardware rx checksum faults correctly
Here is the patch that introduces the generic skb_checksum_complete
which also checks for hardware RX checksum faults.  If that happens,
it'll call netdev_rx_csum_fault which currently prints out a stack
trace with the device name.  In future it can turn off RX checksum.

I've converted every spot under net/ that does RX checksum checks to
use skb_checksum_complete or __skb_checksum_complete with the
exceptions of:

* Those places where checksums are done bit by bit.  These will call
netdev_rx_csum_fault directly.

* The following have not been completely checked/converted:

ipmr
ip_vs
netfilter
dccp

This patch is based on patches and suggestions from Stephen Hemminger
and David S. Miller.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 13:01:24 -08:00
Thomas Graf
482a8524f8 [NETLINK]: Generic netlink family
The generic netlink family builds on top of netlink and provides
simplifies access for the less demanding netlink users. It solves
the problem of protocol numbers running out by introducing a so
called controller taking care of id management and name resolving.

Generic netlink modules register themself after filling out their
id card (struct genl_family), after successful registration the
modules are able to register callbacks to command numbers by
filling out a struct genl_ops and calling genl_register_op(). The
registered callbacks are invoked with attributes parsed making
life of simple modules a lot easier.

Although generic netlink modules can request static identifiers,
it is recommended to use GENL_ID_GENERATE and to let the controller
assign a unique identifier to the module. Userspace applications
will then ask the controller and lookup the idenfier by the module
name.

Due to the current multicast implementation of netlink, the number
of generic netlink modules is restricted to 1024 to avoid wasting
memory for the per socket multiacst subscription bitmask.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 02:26:41 +01:00
Thomas Graf
82ace47a72 [NETLINK]: Generic netlink receive queue processor
Introduces netlink_run_queue() to handle the receive queue of
a netlink socket in a generic way. Processes as much as there
was in the queue upon entry and invokes a callback function
for each netlink message found. The callback function may
refuse a message by returning a negative error code but setting
the error pointer to 0 in which case netlink_run_queue() will
return with a qlen != 0.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 02:26:40 +01:00
Thomas Graf
bfa83a9e03 [NETLINK]: Type-safe netlink messages/attributes interface
Introduces a new type-safe interface for netlink message and
attributes handling. The interface is fully binary compatible
with the old interface towards userspace. Besides type safety,
this interface features attribute validation capabilities,
simplified message contstruction, and documentation.

The resulting netlink code should be smaller, less error prone
and easier to understand.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 02:26:40 +01:00
Yasuyuki Kozakai
9fb9cbb108 [NETFILTER]: Add nf_conntrack subsystem.
The existing connection tracking subsystem in netfilter can only
handle ipv4.  There were basically two choices present to add
connection tracking support for ipv6.  We could either duplicate all
of the ipv4 connection tracking code into an ipv6 counterpart, or (the
choice taken by these patches) we could design a generic layer that
could handle both ipv4 and ipv6 and thus requiring only one sub-protocol
(TCP, UDP, etc.) connection tracking helper module to be written.

In fact nf_conntrack is capable of working with any layer 3
protocol.

The existing ipv4 specific conntrack code could also not deal
with the pecularities of doing connection tracking on ipv6,
which is also cured here.  For example, these issues include:

1) ICMPv6 handling, which is used for neighbour discovery in
   ipv6 thus some messages such as these should not participate
   in connection tracking since effectively they are like ARP
   messages

2) fragmentation must be handled differently in ipv6, because
   the simplistic "defrag, connection track and NAT, refrag"
   (which the existing ipv4 connection tracking does) approach simply
   isn't feasible in ipv6

3) ipv6 extension header parsing must occur at the correct spots
   before and after connection tracking decisions, and there were
   no provisions for this in the existing connection tracking
   design

4) ipv6 has no need for stateful NAT

The ipv4 specific conntrack layer is kept around, until all of
the ipv4 specific conntrack helpers are ported over to nf_conntrack
and it is feature complete.  Once that occurs, the old conntrack
stuff will get placed into the feature-removal-schedule and we will
fully kill it off 6 months later.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-09 16:38:16 -08:00
Linus Torvalds
a7c243b544 Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2005-11-09 08:34:36 -08:00
Christoph Hellwig
e3305626e0 ieee80211: cleanup crypto list handling, other minor cleanups. 2005-11-09 01:01:04 -05:00
Jeff Garzik
f24e09754b Merge rsync://bughost.org/repos/ieee80211-delta/ 2005-11-09 00:00:29 -05:00
Marcel Holtmann
be9d122730 [Bluetooth]: Remove the usage of /proc completely
This patch removes all relics of the /proc usage from the Bluetooth
subsystem core and its upper layers. All the previous information are
now available via /sys/class/bluetooth through appropriate functions.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:57:38 -08:00
Marcel Holtmann
1ebb92521d [Bluetooth]: Add endian annotations to the core
This patch adds the endian annotations to the Bluetooth core.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:57:21 -08:00
Stephen Hemminger
9ee6b535af [NET]: sk_add_backlog convert from macro to inline
There is no reason for sk_add_backlog to be a macro. It can
just be an inline function and get type checking.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:39:42 -08:00
YOSHIFUJI Hideaki
b1cacb6820 [IPV6]: Make ipv6_addr_type() more generic so that we can use it for source address selection.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:38:12 -08:00
YOSHIFUJI Hideaki
971f359ddc [IPV6]: Put addr_diff() into common header for future use.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:37:56 -08:00
James Ketrenos
d7e02edbc5 Update version ieee80211 stamp to 1.1.7 2005-11-07 16:19:17 -06:00
Arnaldo Carvalho de Melo
2d43f1128a Merge branch 'red' of 84.73.165.173:/home/tgr/repos/net-2.6 2005-11-05 22:30:29 -02:00
Stephen Hemminger
6df716340d [TCP/DCCP]: Randomize port selection
This patch randomizes the port selected on bind() for connections
to help with possible security attacks. It should also be faster
in most cases because there is no need for a global lock.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 21:23:15 -02:00
Thomas Graf
2566a509ca [NET]: Introduce INET_ECN_set_ce() function
Changes IP_ECN_set_ce() and IP6_ECN_set_ce() to return 0 if the CE
bits could not bet set because none of the ECT bits are set or 1
if the CE bits are already set or have been successfully set.

Introduces INET_ECN_set_ce(skb) to enable CE bits for all supported
protocols.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:24 +01:00
Thomas Graf
a783474591 [PKT_SCHED]: Generic RED layer
Extracts the RED algorithm from sch_red.c and puts it into include/net/red.h
for use by other RED based modules. The statistics are extended to be more
fine grained in order to differ between probability/forced marks/drops.
We now reset the average queue length when setting new parameters, leaving
it might result in an unreasonable qavg for a while depending on the value of W.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:24 +01:00
Linus Torvalds
62d3af1b5f Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2005-10-29 11:25:16 -07:00
Arnaldo Carvalho de Melo
974f7bc578 Merge master.kernel.org:/pub/scm/linux/kernel/git/sridhar/lksctp-2.6 2005-10-28 23:35:02 -02:00
Jeff Garzik
596c96ba06 Merge branch 'master' 2005-10-28 18:48:57 -04:00
Ivan Skytte Jorgensen
eaa5c54dbe [SCTP] Rename SCTP specific control message flags.
Rename SCTP specific control message flags to use SCTP_ prefix rather than
MSG_ prefix as per the latest sctp sockets API draft.

Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2005-10-28 15:10:00 -07:00
Jesper Juhl
b4558ea93d drivers/net: Remove pointless checks for NULL prior to calling kfree() 2005-10-28 16:53:13 -04:00
Arnaldo Carvalho de Melo
de5144164f Merge master.kernel.org:/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2005-10-28 15:49:24 -02:00
Marcel Holtmann
6516455d3b [Bluetooth] Make more functions static
This patch makes another bunch of functions static.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-10-28 19:20:48 +02:00
Marcel Holtmann
408c1ce271 [Bluetooth] Move CRC table into RFCOMM core
This patch moves rfcomm_crc_table[] into the RFCOMM core, because there
is no need to keep it in a separate file.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-10-28 19:20:36 +02:00
Linus Torvalds
e5dfa9282f Merge branch 'upstream' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2005-10-28 09:05:25 -07:00
Linus Torvalds
236fa08168 Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.15 2005-10-28 08:50:37 -07:00
Al Viro
7d877f3bda [PATCH] gfp_t: net/*
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 08:16:47 -07:00
Jeff Garzik
1f57389a38 Merge branch 'master' 2005-10-26 01:06:45 -04:00
Herbert Xu
80b30c1023 [IPSEC]: Kill obsolete get_mss function
Now that we've switched over to storing MTUs in the xfrm_dst entries,
we no longer need the dst's get_mss methods.  This patch gets rid of
them.

It also documents the fact that our MTU calculation is not optimal
for ESP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26 00:48:45 -02:00
Jochen Friedrich
5ed688a716 [LLC]: Strip RIF flag from source MAC address
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-25 21:34:39 -02:00
Ralf Baechle
4595f25105 [AX.25]: Fix signed char bug
On architectures where the char type defaults to unsigned some of the
arithmetic in the AX.25 stack to fail, resulting in some packets being dropped
on receive.

Credits for tracking this down and the original patch to
Bob Brose N0QBJ <linuxhams@n0qbj-11.ampr.org>.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-22 17:20:50 -02:00
James Ketrenos
519a62bb8a Update version ieee80211 stamp to 1.1.6 2005-10-21 13:00:40 -05:00
Jeff Garzik
59aee3c2a1 Merge branch 'master' 2005-10-13 21:22:27 -04:00
Arnaldo Carvalho de Melo
eeb2b85606 [TWSK]: Grab the module refcount for timewait sockets
This is required to avoid unloading a module that has active timewait
sockets, such as DCCP.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:25:23 -07:00
Al Viro
dd0fc66fb3 [PATCH] gfp flags annotations - part 1
- added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-08 15:00:57 -07:00
Sridhar Samudrala
20c9c825b1 [SCTP] Fix SCTP socket options to work with 32-bit apps on 64-bit kernels.
Adds alignment attribute to a few structures used with SCTP socket
options so that the sizes and offsets remain the same when built using
either 32 or 64 bit tools.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-06 21:37:01 -07:00
Ivan Skytte Jrgensen
5fe467ee97 [SCTP] Fix sctp_get{pl}addrs() API to work with 32-bit apps on 64-bit kernels.
The old socket options are marked with a _OLD suffix so that the
existing 32-bit apps on 32-bit kernels do not break.

Signed-off-by: Ivan Skytte Jrgensen <isj-sctp@i1.dk>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-06 21:36:17 -07:00
Herbert Xu
77d8d7a684 [IPSEC]: Document that policy direction is derived from the index.
Here is a patch that adds a helper called xfrm_policy_id2dir to
document the fact that the policy direction can be and is derived
from the index.

This is based on a patch by YOSHIFUJI Hideaki and 210313105@suda.edu.cn.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-05 12:15:12 -07:00
Randy Dunlap
83fa3400eb [XFRM]: fix sparse gfp nocast warnings
Fix implicit nocast warnings in xfrm code:
net/xfrm/xfrm_policy.c:232:47: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 22:45:35 -07:00
Randy Dunlap
8eea00a44d [IPVS]: fix sparse gfp nocast warnings
From: Randy Dunlap <rdunlap@xenotime.net>

Fix implicit nocast warnings in ip_vs code:
net/ipv4/ipvs/ip_vs_app.c:631:54: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 22:42:15 -07:00
Randy Dunlap
f4a19a56e3 [DECNET]: fix sparse gfp nocast warnings
Fix implicit nocast warnings in decnet code:
net/decnet/af_decnet.c:458:40: warning: implicit cast to nocast type
net/decnet/dn_nsp_out.c:125:35: warning: implicit cast to nocast type
net/decnet/dn_nsp_out.c:219:29: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 22:41:48 -07:00
Eric Dumazet
6d2553612f [INET]: Shrink struct inet_ehash_bucket on 32 bits UP
No need to align struct inet_ehash_bucket on a 8 bytes boundary.

On 32 bits Uniprocessor, that's a waste of 4 bytes per struct (50 %)

On other platforms, the attribute is useless, natual alignement is already 8.

platform     | Size before | Size after patch
-------------+-------------+------------------
32 bits, UP  |         8   |     4
32 bits, SMP |         8   |     8
64 bits, UP  |         8   |     8
64 bits, SMP |        16   |    16

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 15:55:51 -07:00
Jeff Garzik
13d1ef29bc Merge rsync://bughost.org/repos/ieee80211-delta/ 2005-10-04 08:22:13 -04:00
Jeff Garzik
3c8c7b2f32 Merge branch 'upstream-fixes' 2005-10-03 22:06:19 -04:00
Eric Dumazet
81c3d5470e [INET]: speedup inet (tcp/dccp) lookups
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &head->chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.

As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.

This can be problematic if some chains are very long.

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};

Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.

Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)

File include/net/inet_hashtables.h

64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)->sk_hash == (__hash))
     ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie))   &&  \
     ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports))   &&  \
     (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

32bits platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)->sk_hash == (__hash))                 &&  \
     (inet_sk(__sk)->daddr          == (__saddr))   &&  \
     (inet_sk(__sk)->rcv_saddr      == (__daddr))   &&  \
     (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))


- Adds a prefetch(head->chain.first) in 
__inet_lookup_established()/__tcp_v4_check_established() and 
__inet6_lookup_established()/__tcp_v6_check_established() and 
__dccp_v4_check_established() to bring into cache the first element of the 
list, before the {read|write}_lock(&head->lock);

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 14:13:38 -07:00
James Ketrenos
ff0037b259 Lindent and trailing whitespace script executed ieee80211 subsystem
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-03 10:23:42 -05:00
Ivo van Doorn
7c254d3dba This will move the ieee80211_is_ofdm_rate function to the ieee80211.h
header, and I also added the ieee80211_is_cck_rate counterpart.

Various drivers currently create there own version of these functions,
but I guess the ieee80211 stack is the best place to provide such
routines.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-03 09:50:40 -05:00
Michael Wu
604116a32e This patch fixes a typo in ieee80211.h: ieee82011_deauth -> ieee80211_deauth
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-09-28 15:16:46 -05:00
James Ketrenos
6eb6edf04a [PATCH] ieee80211: in-tree driver updates to sync with latest ieee80211 series
Changed crypto method from requiring a struct ieee80211_device reference
to the init handler.  Instead we now have a get/set flags method for
each crypto component.

Setting of TKIP countermeasures can now be done via
set_flags(IEEE80211_CRYPTO_TKIP_COUNTERMEASURES)

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-22 15:40:59 -04:00
James Ketrenos
e5658d3e8a [PATCH] ieee80211: added IE comments, reason_code to reason, removed info_element from ieee80211_disassoc
tree 0254e7c97cece038cd11b47a16027c6379e464fe
parent a84f7713dc87ca1b51c6d53b391087663425a080
author James Ketrenos <jketreno@linux.intel.com> 1126661324 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127319069 -0500

Updated based on Michael Wu's patch and comments sent to netdev.

Added IE comments to ieee80211_* frame structures.
Changed reason_code to reason (consistency)
Removed info_element from ieee80211_disassoc

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-22 15:39:41 -04:00
James Ketrenos
31b59eaee8 [PATCH] ieee80211: Added handle_deauth() callback, enhanced tkip/ccmp support of varying hw/sw offload
tree de81b55e78e85997642c651ea677078d0554a14f
parent c8030da8c159f8b82712172a6748a42523aea83a
author James Ketrenos <jketreno@linux.intel.com> 1127104380 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127315225 -0500

Added handle_deauth() callback.
Enhanced crypt_{tkip,ccmp} to support varying splits of HW/SW offload.
Changed channel freq to u32 from u16.
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-22 15:39:41 -04:00
James Ketrenos
31696160c7 [PATCH] ieee80211: Added subsystem version string and reporting via MODULE_VERSION
tree c1b50ac5d2d1f9b727c39c6bd86a7872f25a1127
parent 1bb997a3ac7dd1941e02426d2f70bd28993a82b7
author James Ketrenos <jketreno@linux.intel.com> 1126720779 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127314674 -0500

Added subsystem version string and reporting via MODULE_VERSION and
pritnk during load.

NOTE:  This is the version support split out from patch 24/29 of the
prior series.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-22 15:39:41 -04:00
Arnaldo Carvalho de Melo
8420e1b541 [LLC]: fix llc_ui_recvmsg, making it behave like tcp_recvmsg
In fact it is an exact copy of the parts that makes sense to LLC :-)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-22 08:29:08 -03:00
Arnaldo Carvalho de Melo
d389424e00 [LLC]: Fix the accept path
Borrowing the structure of TCP/IP for this. On the receive of new connections I
was bh_lock_socking the _new_ sock, not the listening one, duh, now it survives
the ssh connections storm I've been using to test this specific bug.

Also fixes send side skb sock accounting.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-22 07:57:21 -03:00
Arnaldo Carvalho de Melo
2928c19e10 [LLC]: Fix sparse warnings
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-22 05:14:33 -03:00
Arnaldo Carvalho de Melo
6e2144b768 [LLC]: Use refcounting with struct llc_sap
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-22 04:43:05 -03:00
Arnaldo Carvalho de Melo
04e4223f44 [LLC]: Do better struct sock accounting on skbs
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-22 04:40:59 -03:00
Arnaldo Carvalho de Melo
590232a715 [LLC]: Add sysctl support for the LLC timeouts
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-22 04:30:44 -03:00
Arnaldo Carvalho de Melo
1d67e6501b [LLC]: Make llc_frame_alloc take a net_device as an argument
So as to set the newly created sk_buff ->dev member with it, that way we stop
using dev_base->next, that is the wrong thing to do, as there may well be
several interfaces being used with LLC. This was not such a big problem after
all as most of the users of llc_alloc_frame were setting the correct dev, but
this way code is reduced.

This also fixes another bug in llc_station_ac_send_null_dsap_xid_c, that was
not setting the skb->dev field.

Signed-off-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-22 03:27:56 -03:00
James Ketrenos
9a01c16bd4 [PATCH] ieee82011: Remove WIRELESS_EXT ifdefs
Remove old WIRELESS_EXT version compatibility

In-tree doesn't need to maintain backward compatibility.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:19:09 -04:00
James Ketrenos
ebeaddcc02 [PATCH] ieee80211: Updated copyright dates
tree 0d3e41e574fcb41b9da7f0b7e1d27ec350726654
parent dbe2885fe2f454d538eaaabefc741ded1026f476
author James Ketrenos <jketreno@linux.intel.com> 1126720499 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127314531 -0500

Updated copyright dates.

NOTE:  This is a split out of just the copyright updates from patch
24/29 in the prior series.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:04:58 -04:00
James Ketrenos
ccd0fda3a6 [PATCH] ieee80211: Mixed PTK/GTK CCMP/TKIP support
tree 5c7559a1216ae1121487f6aed94a6017490729b3
parent c1ff4c22e5622c8987bf96c09158c4924cde98c2
author Hong Liu <hong.liu@intel.com> 1125482767 +0800
committer James Ketrenos <jketreno@linux.intel.com> 1127314427 -0500

Mixed PTK/GTK CCMP/TKIP support.

Signed-off-by: Hong Liu <hong.liu@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:04:57 -04:00
James Ketrenos
42c94e43be [PATCH] ieee80211: Type-o, capbility definition for QoS, and ERP parsing
tree 3ac0dd07b9972dfd68fee47ec2152d3d378de000
parent 9ada1d971d9829c34a14d98840080b7e69fdff6b
author Mohamed Abbad <mohamed.abbas@intel.com> 1126054379 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127314340 -0500

Type-o, capbility definition for QoS, and ERP parsing

Added WLAN_CAPABILITY_QOS
Fixed type-o WLAN_CAPABILITY_OSSS_OFDM -> WLAN_CAPABILITY_DSSS_OFDM
Added ERP IE parsing to ieee80211_rx
Added handle_probe_request callback.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:04:57 -04:00
James Ketrenos
9ba7e0d157 [PATCH] ieee80211: "extern inline" to "static inline"
tree bce04549ce0a8239d8083d8da5c3d12f7e1aecd9
parent b15a5153d5f1c75d9435d5ce19b52287059d5d54
author Adrian Bunk <bunk@stusta.de> 1125026386 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127313953 -0500

"extern inline" doesn't make much sense.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:03:55 -04:00
James Ketrenos
cdcfc21082 [PATCH] ieee80211: Additional fixes for endian-aware types
tree 589bbb92ce7cdf7c2ae820b0ebd3f8fbf1baeee9
parent c6ce9081e79e8836a11e86e3d38297521a2420be
author Jiri Benc <jbenc@suse.cz> 1125015310 -0400
committer James Ketrenos <jketreno@linux.intel.com> 1127313914 -0500

Additional fixes for endian-aware types

Based on the application of __le16/__be16 changes already made w/ a
prior patch by Michael Wu <flamingice@sourmilk.net>

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:03:55 -04:00
James Ketrenos
3905ec4561 [PATCH] ieee80211: Added ieee80211_radiotap.h
tree 383c59b2516a61f2683f02dfebbed0caf6ee5dc3
parent a04948f63fd96c4b875a43f78afad1a0874cc441
author Mike Kershaw <dragorn@kismetwireless.net> 1124447833 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127313883 -0500

Added ieee80211_radiotap.h to enhance statistic reporting to user space
from wireless drivers.

Signed-off-by: Mike Kershaw <dragorn@kismetwireless.net>
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:03:55 -04:00
James Ketrenos
02cda6ae01 [PATCH] ieee80211: Added ieee80211_geo to provide helper functions
tree 385b391fc0d7c124cd0547fdb6183e9a0c333391
parent 97d7a47f76e72bedde7f402785559ed4c7a8e8e8
author James Ketrenos <jketreno@linux.intel.com> 1124447590 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127313735 -0500

Added ieee80211_geo to provide helper functions to drivers for
implementing supported channel maps.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:03:55 -04:00
James Ketrenos
9e8571affd [PATCH] ieee80211: Add QoS (WME) support to the ieee80211 subsystem
tree a3ad796273e98036eb0e9fc063225070fa24508a
parent 1b9c0aeb377abf8e4a43a86cff42382f74ca0259
author Mohamed Abbas <mabbas@linux.intel.com> 1124447069 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127313435 -0500

Add QoS (WME) support to the ieee80211 subsystem.

NOTE: This requires drivers that use the ieee80211 hard_start_xmit
(ipw2100 and ipw2200) to add the priority parameter to their callback.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:03:54 -04:00
James Ketrenos
1264fc0498 [PATCH] ieee80211: Fix TKIP, repeated fragmentation problem, and payload_size reporting
tree 8428e9f510e6ad6c77baec89cb57374842abf733
parent d78bfd3ddae9c422dd350159110f9c4d7cfc50de
author Liu Hong <hong.liu@intel.com> 1124446520 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127313183 -0500

Fix TKIP, repeated fragmentation problem, and payload_size reporting

1. TKIP encryption
    Originally, TKIP encryption issues msdu + mpdu encryption on every
    fragment. Change the behavior to msdu encryption on the whole
    packet, then mpdu encryption on every fragment.

2. Avoid repeated fragmentation when !host_encrypt.
    We only need do fragmentation when using host encryption. Otherwise
    we only need pass the whole packet to driver, letting driver do the
    fragmentation.

3. change the txb->payload_size to correct value
    FW will use this value to determine whether to do fragmentation. If
    we pass the wrong value, fw may cut on the wrong bound which will
    make decryption fail when we do host encryption.

NOTE:  This requires changing drivers (hostap) that have
extra_prefix_len used within them (structure member name change).

Signed-off-by: Hong Liu <liu.hong@intel.com>
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:02:31 -04:00
James Ketrenos
3f552bbf86 [PATCH] ieee82011: Added ieee80211_tx_frame to convert generic 802.11 data frames, and callbacks
tree 40adc78b623ae70d56074934ec6334eb4f0ae6a5
parent db43d847bcebaa3df6414e26d0008eb21690e8cf
author James Ketrenos <jketreno@linux.intel.com> 1124445938 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127313102 -0500

Added ieee80211_tx_frame to convert generic 802.11 data frames into
txbs for transmission.

Added several purpose specific callbacks (handle_assoc, handle_auth,
etc.) which the driver can register with for being notified on
reception of variouf frame elements.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:02:31 -04:00
James Ketrenos
3cdd00c582 [PATCH] ieee80211: adds support for the creation of RTS packets
tree b45c9c1017fd23216bfbe71e441aed9aa297fc84
parent 04aacdd71e904656a304d923bdcf57ad3bd2b254
author Ivo van Doorn <IvDoorn@gmail.com> 1124445405 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127313029 -0500

This patch adds support for the creation of RTS packets when the
config flag CFG_IEEE80211_RTS has been set.

Signed-Off-By: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:02:30 -04:00
James Ketrenos
ee34af37c0 [PATCH] ieee80211: Renamed ieee80211_hdr to ieee80211_hdr_3addr
tree e9c18b2c8e5ad446a4d213243c2dcf9fd1652a7b
parent 4e97ad6ae7084a4f741e94e76c41c68bc7c5a76a
author James Ketrenos <jketreno@linux.intel.com> 1124444315 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127312922 -0500

Renamed ieee80211_hdr to ieee80211_hdr_3addr and modified ieee80211_hdr
to just contain the frame_ctrl and duration_id.

Changed uses of ieee80211_hdr to ieee80211_hdr_4addr or
ieee80211_hdr_3addr based on what was expected for that portion of code.

NOTE: This requires changes to ipw2100, ipw2200, hostap, and atmel
drivers.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:02:30 -04:00
James Ketrenos
e0d369d1d9 [PATCH] ieee82011: Added WE-18 support to default wireless extension handler
tree 1536f39c18756698d033da72c49300a561be1289
parent 07172d7c9f10ee3d05d6f6489ba6d6ee2628da06
author Liu Hong <hong.liu@intel.com> 1124436225 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127312664 -0500

Added WE-18 support to default wireless extension handler in ieee80211
subsystem.

Updated patch since last send to account for ieee80211_device parameter
being added to the crypto init method.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:02:30 -04:00
James Ketrenos
259bf1fd8a [PATCH] ieee80211: Allow drivers to fix an issue when using wpa_supplicant with WEP
tree 898fedef6ca1b5b58b8bdf7e6d8894a78bbde4cd
parent 8720fff53090ae428d2159332b6f4b2749dea10f
author Zhu Yi <jketreno@io.(none)> 1124435746 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127312509 -0500

Allow drivers to fix an issue when using wpa_supplicant with WEP.

The problem is introduced by the hwcrypto patch. We changed indicator of
the encryption request from the upper layer (i.e. wpa_supplicant):

In the original host based crypto the driver could use: crypt &&
crypt->ops.

In the new hardware based crypto, the driver should use the flags
specified in ieee->sec.encrypt.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:01:52 -04:00
James Ketrenos
0ad0c3c644 [PATCH] ieee80211: Fix kernel Oops when module unload
tree b69e983266840983183a00f5ac02c66d5270ca47
parent cdd6372949b76694622ed74fe36e1dd17a92eb71
author Zhu Yi <jketreno@io.(none)> 1124435425 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127312421 -0500

Fix kernel Oops when module unload.

Export a new function ieee80211_crypt_quiescing from ieee80211. Device
drivers call it to make the host crypto stack enter the quiescence
state, which means "process existing requests, but don't accept new
ones". This is usually called during a driver's host crypto data
structure free (module unload) path.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:01:52 -04:00
James Ketrenos
f1bf6638af [PATCH] ieee80211: Hardware crypto and fragmentation offload support
tree 5322d496af90d03ffbec27292dc1a6268a746ede
parent 6c9364386ccb786e4a84427ab3ad712f0b7b8904
author James Ketrenos <jketreno@linux.intel.com> 1124432367 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127311810 -0500

Hardware crypto and fragmentation offload support added (Zhu Yi)

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:01:52 -04:00
James Ketrenos
20d64713ae [PATCH] ieee80211: Fixed a kernel oops on module unload
tree 367069f24fc38b4aa910e86ff40094d2078d8aa7
parent a33a198201
author James Ketrenos <jketreno@linux.intel.com> 1124430800 -0500
committer James Ketrenos <jketreno@linux.intel.com> 1127310571 -0500

Fixed a kernel oops on module unload by adding spin lock protection to
ieee80211's crypt handlers (thanks to Zhu Yi)

Modified scan result logic to report WPA and RSN IEs if set (vs.being
based on wpa_enabled)

Added ieee80211_device as the first parameter to the crypt init()
method.  TKIP modified to use that structure for determining whether to
countermeasures are active.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-21 23:01:52 -04:00
Jeff Garzik
a3536c839f Merge /spare/repo/linux-2.6/ 2005-09-21 22:34:08 -04:00
James Ketrenos
3bc5ed6842 [PATCH] ieee80211 Fixed type-o of abg_ture -> abg_true
[PATCH 14/29] Fixed type-o of abg_ture -> abg_true.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>

NOTE: This patch requires drivers using abg_ture to be updated.
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-16 03:15:57 -04:00
James Ketrenos
7b1fa54020 [PATCH] ieee80211 Removed ieee80211_info_element_hdr
Removed ieee80211_info_element_hdr structure as ieee80211_info_element
provides the same use.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-16 03:10:56 -04:00
James Ketrenos
68e4e036b8 [PATCH] Changed 802.11 headers to use ieee80211_info_element[0]
Changed 802.11 headers to use ieee80211_info_element as zero sized
array so that sizeof calculations do not account for IE sizes.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-16 03:10:56 -04:00
James Ketrenos
74079fdce4 [PATCH] ieee80211 Added wireless spy support
Added wireless spy support to Rx code path.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>

NOTE:  Looks like scripts/Lindent generated output different
than the Lindented version already in-kernel, hence all the
whitespace deltas...  *sigh*
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-16 03:06:32 -04:00
James Ketrenos
b1b508e1b1 [PATCH] ieee80211 quality scaling algorithm extension handler
Incorporated Bill Moss' quality scaling algorithm into default wireless
extension handler.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-16 03:06:32 -04:00
Julian Anastasov
87375ab47c [IPVS]: ip_vs_ftp breaks connections using persistence
ip_vs_ftp when loaded can create NAT connections with unknown client
port for passive FTP. For such expectations we lookup with cport=0 on
incoming packet but it matches the format of the persistence templates
causing packets to other persistent virtual servers to be forwarded to
real server without creating connection. Later the reply packets are
treated as foreign and not SNAT-ed.

This patch changes the connection lookup for packets from clients:

* introduce IP_VS_CONN_F_TEMPLATE connection flag to mark the
  connection as template

* create new connection lookup function just for templates -
  ip_vs_ct_in_get

* make sure ip_vs_conn_in_get hits only connections with
  IP_VS_CONN_F_NO_CPORT flag set when s_port is 0. By this way
  we avoid returning template when looking for cport=0 (ftp)

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-14 21:08:51 -07:00
Adrian Bunk
7665a08928 [PATCH] drivers/net/wan/: possible cleanups
This patch contains possible cleanups including the following:
- make needlessly global code static
- #if 0 the following unused global function:
  - sdladrv.c: sdla_intde
- remove the following unused global variable:
  - lmc_media.c: lmc_t1_cables
- remove the following unneeded EXPORT_SYMBOL's:
  - cycx_drv.c: cycx_inten
  - sdladrv.c: sdla_inten
  - sdladrv.c: sdla_intde
  - sdladrv.c: sdla_intack
  - sdladrv.c: sdla_intr
  - syncppp.c: sppp_input
  - syncppp.c: sppp_change_mtu

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-14 08:36:54 -04:00
David S. Miller
ae01d2798d Merge master.kernel.org:/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2005-09-13 14:03:09 -07:00
Marcel Holtmann
21d9e30ed0 [Bluetooth] Add support for extended inquiry responses
This patch adds the handling of the extended inquiry responses and
inserts them into the inquiry cache.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-09-13 01:32:25 +02:00
Ralf Baechle
b88a762b60 [NETROM]: Introduct stuct nr_private
NET/ROM's virtual interfaces don't have a proper private data
structure yet.  Create struct nr_private and put the statistics there.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-12 14:28:03 -07:00
Ralf Baechle
e21ce8c7c0 [NETROM]: Implement G8PZT Circuit reset for NET/ROM
NET/ROM is lacking a connection reset like TCP's RST flag which at times
may result in a connecting having to slowly timing out instead of just being
reset.  An earlier attempt to reset the connection by sending a
NR_CONNACK | NR_CHOKE_FLAG transport was inacceptable as it did result in
crashes of BPQ systems.  An alternative approach of introducing a new
transport type 7 (NR_RESET) has be implemented several years ago in
Paula Jayne Dowie G8PZT's Xrouter.

Implement NR_RESET for Linux's NET/ROM but like any messing with the state
engine consider this experimental for now and thus control it by a sysctl
(net.netrom.reset) which for the time being defaults to off.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-12 14:27:37 -07:00
Ralf Baechle
b01ef8ffaf [AX.25]: Add descriptions to constants
Comment the names used for the AX.25 state machine.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-12 14:24:24 -07:00
Ralf Baechle
3bf0ae7b57 [AX.25]: Add more PIDs
Add a few more PID definitions.  AX.25 PIDs are the equivalent to IP
protocol numbers.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-12 14:22:30 -07:00
Ralf Baechle
6f74998e5c [AX.25]: Rename ax25_encapsulate to ax25_hard_header
Rename ax25_encapsulate to ax25_hard_header which these days more
accurately describes what the function is supposed to do.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-12 14:21:01 -07:00
Ralf Baechle
baed16a7ff [AX.25]: Make asc2ax() thread-proof
Asc2ax was still using a static buffer for all invocations which isn't
exactly SMP-safe.  Change asc2ax to take an additional result buffer as
the argument.  Change all callers to provide such a buffer.

This one only really is a fix for ROSE and as per recent discussions
there's still much more to fix in ROSE ...

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-08 13:40:41 -07:00
David S. Miller
2e66fc4116 Merge git://git.skbuff.net/gitroot/yoshfuji/linux-2.6-git-rfc3542 2005-09-08 12:59:43 -07:00
David S. Miller
e50ef933e6 [NET]: Need struct sock forward decl in net/compat.h
Else we get build failures like:

  CC      arch/sparc64/kernel/sparc64_ksyms.o
In file included from arch/sparc64/kernel/sparc64_ksyms.c:28:
include/net/compat.h:37: warning: "struct sock" declared inside parameter list
include/net/compat.h:37: warning: its scope is only this definition or declaration, which is probably not what you want

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-08 12:32:46 -07:00
Al Viro
8920e8f94c [PATCH] Fix 32bit sendmsg() flaw
When we copy 32bit ->msg_control contents to kernel, we walk the same
userland data twice without sanity checks on the second pass.

Second version of this patch: the original broke with 64-bit arches
running 32-bit-compat-mode executables doing sendmsg() syscalls with
unaligned CMSG data areas

Another thing is that we use kmalloc() to allocate and sock_kfree_s()
to free afterwards; less serious, but also needs fixing.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-08 08:14:11 -07:00
YOSHIFUJI Hideaki
41a1f8ea4f [IPV6]: Support IPV6_{RECV,}TCLASS socket options / ancillary data.
Based on patch from David L Stevens <dlstevens@us.ibm.com>

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2005-09-08 10:19:03 +09:00
YOSHIFUJI Hideaki
333fad5364 [IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542).
Support several new socket options / ancillary data:
  IPV6_RECVPKTINFO, IPV6_PKTINFO,
  IPV6_RECVHOPOPTS, IPV6_HOPOPTS,
  IPV6_RECVDSTOPTS, IPV6_DSTOPTS, IPV6_RTHDRDSTOPTS,
  IPV6_RECVRTHDR, IPV6_RTHDR,
  IPV6_RECVHOPOPTS, IPV6_HOPOPTS

Old semantics are preserved as IPV6_2292xxxx so that
we can maintain backward compatibility.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2005-09-08 09:59:17 +09:00
Linus Torvalds
55faed1e60 Merge branch 'upstream' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2005-09-07 17:22:43 -07:00
Jean Tourrilhes
6582c164f2 [PATCH] WE-19 for kernel 2.6.13
Hi Jeff,

	This is version 19 of the Wireless Extensions. It was supposed
to be the fallback of the WPA API changes, but people seem quite happy
about it (especially Jouni), so the patch is rather small.
	The patch has been fully tested with 2.6.13 and various
wireless drivers, and is in its final version. Would you mind pushing
that into Linus's kernel so that the driver and the apps can take
advantage ot it ?

	It includes :
	o iwstat improvement (explicit dBm). This is the result of
long discussions with Dan Williams, the authors of
NetworkManager. Thanks to him for all the fruitful feedback.
	o remove pointer from event stream. I was not totally sure if
this pointer was 32-64 bits clean, so I'd rather remove it and be at
peace with it.
	o remove linux header from wireless.h. This has long been
requested by people writting user space apps, now it's done, and it
was not even painful.
	o final deprecation of spy_offset. You did not like it, it's
now gone for good.
	o Start deprecating dev->get_wireless_stats -> debloat netdev
	o Add "check" version of event macros for ieee802.11
stack. Jiri Benc doesn't like the current macros, we aim to please ;-)
	All those changes, except the last one, have been bit-roting on
my web pages for a while...

	Patches for most kernel drivers will follow. Patches for the
Orinoco and the HostAP drivers have been sent to their respective
maintainers.

	Have fun...

	Jean
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-06 22:40:24 -04:00
Ralf Baechle
f75268cd6c [AX25]: Make ax2asc thread-proof
Ax2asc was still using a static buffer for all invocations which isn't
exactly SMP-safe.  Change ax2asc to take an additional result buffer as
the argument.  Change all callers to provide such a buffer.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-06 15:49:39 -07:00
David S. Miller
6baf1f417d [NET]: Do not protect sysctl_optmem_max with CONFIG_SYSCTL
The ipv4 and ipv6 protocols need to access it unconditionally.
SYSCTL=n build failure reported by Russell King.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-05 18:14:11 -07:00
Adrian Bunk
506e7beb74 [IRDA]: IrDA prototype fixes
Every file should #include the header files containing the prototypes
of it's global functions.

In this case this showed that the prototype of irlan_print_filter()
was wrong which is also corrected in this patch.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-05 18:08:11 -07:00
Linus Torvalds
48467641bc Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-09-05 00:11:50 -07:00
David S. Miller
6475be16fd [TCP]: Keep TSO enabled even during loss events.
All we need to do is resegment the queue so that
we record SACK information accurately.  The edges
of the SACK blocks guide our resegmenting decisions.

With help from Herbert Xu.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-01 22:47:01 -07:00
Herbert Xu
ef01578615 [TCP]: Fix sk_forward_alloc underflow in tcp_sendmsg
I've finally found a potential cause of the sk_forward_alloc underflows
that people have been reporting sporadically.

When tcp_sendmsg tacks on extra bits to an existing TCP_PAGE we don't
check sk_forward_alloc even though a large amount of time may have
elapsed since we allocated the page.  In the mean time someone could've
come along and liberated packets and reclaimed sk_forward_alloc memory.

This patch makes tcp_sendmsg check sk_forward_alloc every time as we
do in do_tcp_sendpages.
 
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-01 17:48:59 -07:00
Herbert Xu
d80d99d643 [NET]: Add sk_stream_wmem_schedule
This patch introduces sk_stream_wmem_schedule as a short-hand for
the sk_forward_alloc checking on egress.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-01 17:48:23 -07:00
Adrian Bunk
732db659b8 [IPVS]: "extern inline" -> "static inline"
"extern inline" doesn't make much sense.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-01 17:40:26 -07:00
Jeff Garzik
e3ee3b78f8 /spare/repo/netdev-2.6 branch 'master' 2005-09-01 18:02:01 -04:00
Stephen Hemminger
d8971fcb70 [INET]: compile errors when DEBUG is defined
Fix build problem found by compiling driver with DEBUG defined that used tcp.h.
Since pr_debug(arg) expands to printk("<7>" arg) the argument
needs to be string that can be concatenated.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 22:51:28 -07:00
Arnaldo Carvalho de Melo
dc40c7bc76 [ICSK]: Generalise tcp_listen_poll
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:05:32 -07:00
Adrian Bunk
8cd25c1fcf [NET]: fix PROC_FS=n compile
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:02:38 -07:00
David S. Miller
d179cd1292 [NET]: Implement SKB fast cloning.
Protocols that make extensive use of SKB cloning,
for example TCP, eat at least 2 allocations per
packet sent as a result.

To cut the kmalloc() count in half, we implement
a pre-allocation scheme wherein we allocate
2 sk_buff objects in advance, then use a simple
reference count to free up the memory at the
correct time.

Based upon an initial patch by Thomas Graf and
suggestions from Herbert Xu.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:01:54 -07:00
Arnaldo Carvalho de Melo
4c6ea29d82 [IP]: Introduce ip_options_get_from_user
This variant is needed to satisfy sparse __user annotations.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:01:39 -07:00
Arnaldo Carvalho de Melo
20380731bc [NET]: Fix sparse warnings
Of this type, mostly:

CHECK   net/ipv6/netfilter.c
net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:01:32 -07:00
Patrick McHardy
a61bbcf28a [NET]: Store skb->timestamp as offset to a base timestamp
Reduces skb size by 8 bytes on 64-bit.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:58:24 -07:00
Arnaldo Carvalho de Melo
17b085eace [INET_DIAG]: Move the tcp_diag interface to the proper place
With this the previous setup is back, i.e. tcp_diag can be built as a module,
as dccp_diag and both share the infrastructure available in inet_diag.

If one selects CONFIG_INET_DIAG as module CONFIG_INET_TCP_DIAG will also be
built as a module, as will CONFIG_INET_DCCP_DIAG, if CONFIG_IP_DCCP was
selected static or as a module, if CONFIG_INET_DIAG is y, being statically
linked CONFIG_INET_TCP_DIAG will follow suit and CONFIG_INET_DCCP_DIAG will be
built in the same manner as CONFIG_IP_DCCP.

Now to aim at UDP, converting it to use inet_hashinfo, so that we can use
iproute2 for UDP sockets as well.

Ah, just to show an example of this new infrastructure working for DCCP :-)

[root@qemu ~]# ./ss -dane
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      0                  *:5001             *:*     ino:942 sk:cfd503a0
ESTAB      0      0          127.0.0.1:5001     127.0.0.1:32770 ino:943 sk:cfd50a60
ESTAB      0      0          127.0.0.1:32770    127.0.0.1:5001  ino:947 sk:cfd50700
TIME-WAIT  0      0          127.0.0.1:32769    127.0.0.1:5001  timer:(timewait,3.430ms,0) ino:0 sk:cf209620

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:57:54 -07:00
Arnaldo Carvalho de Melo
73c1f4a033 [TCPDIAG]: Just rename everything to inet_diag
Next changeset will rename tcp_diag.[ch] to inet_diag.[ch].

I'm taking this longer route so as to easy review, making clear the changes
made all along the way.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:57:44 -07:00
Arnaldo Carvalho de Melo
5324a040cc [INET6_HASHTABLES]: Move inet6_lookup functions to net/ipv6/inet6_hashtables.c
Doing this we allow tcp_diag to support IPV6 even if tcp_diag is compiled
statically and IPV6 is compiled as a module, removing the previous restriction
while not building any IPV6 code if it is not selected.

Now to work on the tcpdiag_register infrastructure and then to rename the whole
thing to inetdiag, reflecting its by then completely generic nature.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:57:29 -07:00
Arnaldo Carvalho de Melo
505cbfc577 [IPV6]: Generalise the tcp_v6_lookup routines
In the same way as was done with the v4 counterparts, this will be moved
to inet6_hashtables.c.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:57:24 -07:00
Arnaldo Carvalho de Melo
6687e988d9 [ICSK]: Move TCP congestion avoidance members to icsk
This changeset basically moves tcp_sk()->{ca_ops,ca_state,etc} to inet_csk(),
minimal renaming/moving done in this changeset to ease review.

Most of it is just changes of struct tcp_sock * to struct sock * parameters.

With this we move to a state closer to two interesting goals:

1. Generalisation of net/ipv4/tcp_diag.c, becoming inet_diag.c, being used
   for any INET transport protocol that has struct inet_hashinfo and are
   derived from struct inet_connection_sock. Keeps the userspace API, that will
   just not display DCCP sockets, while newer versions of tools can support
   DCCP.

2. INET generic transport pluggable Congestion Avoidance infrastructure, using
   the current TCP CA infrastructure with DCCP.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:56:18 -07:00
Patrick McHardy
64ce207306 [NET]: Make NETDEBUG pure printk wrappers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:56:08 -07:00
Arnaldo Carvalho de Melo
696ab2d3bf [TIMEWAIT]: Move inet_timewait_death_row routines to net/ipv4/inet_timewait_sock.c
Also export the ones that will be used in the next changeset, when
DCCP uses this infrastructure.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:55:58 -07:00
Arnaldo Carvalho de Melo
295ff7edb8 [TIMEWAIT]: Introduce inet_timewait_death_row
That groups all of the tables and variables associated to the TCP timewait
schedulling/recycling/killing code, that now can be isolated from the TCP
specific code and used by other transport protocols, such as DCCP.

Next changeset will move this code to net/ipv4/inet_timewait_sock.c

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:55:48 -07:00
Marcel Holtmann
0d48d93947 [Bluetooth]: Move packet type into the SKB control buffer
This patch moves the usage of packet type into the SKB control
buffer. After this patch it is now possible to shrink the sk_buff
structure and redefine its pkt_type.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:55:13 -07:00
Victor Fusco
2eb25a6c34 [Bluetooth]: Fix sparse warnings (__nocast type)
This patch fixes the sparse warnings "implicit cast to nocast type"
for the priority or gfp_mask parameters of the memory allocations.

Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:55:07 -07:00
J. Suter
3a5e903c09 [Bluetooth]: Implement RFCOMM remote port negotiation
This patch implements the remote port negotiation (RPN) of the RFCOMM
protocol for Bluetooth.

Signed-off-by: J. Suter <jsuter@hardwave.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:55:03 -07:00
Marcel Holtmann
85a1e930bf [Bluetooth]: Track page scan repetition mode changes
The HCI page scan repetition mode change event contains the actual
page scan repetition mode for the remote device. It is the same
value that is received from an inquiry response and it can be used
to make further reconnections faster.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:54:53 -07:00
Marcel Holtmann
45bb4bf08b [Bluetooth]: Workaround for inquiry results with RSSI and page scan mode
This patch implements a workaround for buggy Bluetooth 1.2 devices from
Silicon Wave. Their inquiry results with RSSI contain the page scan mode
field. This field was removed in the final Bluetooth 1.2 specification.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:54:47 -07:00
Arnaldo Carvalho de Melo
a019d6fe2b [ICSK]: Move generalised functions from tcp to inet_connection_sock
This also improves reqsk_queue_prune and renames it to
inet_csk_reqsk_queue_prune, as it deals with both inet_connection_sock
and inet_request_sock objects, not just with request_sock ones thus
belonging to inet_request_sock.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:49:50 -07:00
Arnaldo Carvalho de Melo
295f7324ff [ICSK]: Introduce reqsk_queue_prune from code in tcp_synack_timer
With this we're very close to getting all of the current TCP
refactorings in my dccp-2.6 tree merged, next changeset will export
some functions needed by the current DCCP code and then dccp-2.6.git
will be born!

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:49:29 -07:00
Arnaldo Carvalho de Melo
0a5578cf8e [ICSK]: Generalise tcp_listen_{start,stop}
This also moved inet_iif from tcp to inet_hashtables.h, as it is
needed by the inet_lookup callers, perhaps this needs a bit of
polishing, but for now seems fine.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:49:24 -07:00
Arnaldo Carvalho de Melo
9f1d2604c7 [ICSK]: Introduce inet_csk_clone
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:49:20 -07:00
Arnaldo Carvalho de Melo
3f421baa47 [NET]: Just move the inet_connection_sock function from tcp sources
Completing the previous changeset, this also generalises tcp_v4_synq_add,
renaming it to inet_csk_reqsk_queue_hash_add, already geing used in the
DCCP tree, which I plan to merge RSN.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:49:14 -07:00
Arnaldo Carvalho de Melo
463c84b97f [NET]: Introduce inet_connection_sock
This creates struct inet_connection_sock, moving members out of struct
tcp_sock that are shareable with other INET connection oriented
protocols, such as DCCP, that in my private tree already uses most of
these members.

The functions that operate on these members were renamed, using a
inet_csk_ prefix while not being moved yet to a new file, so as to
ease the review of these changes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:43:19 -07:00
Arnaldo Carvalho de Melo
87d11ceb9d [SOCK]: Introduce sk_clone
Out of tcp_create_openreq_child, will be used in
dccp_create_openreq_child, and is a nice sock function anyway.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:42:36 -07:00
Arnaldo Carvalho de Melo
c676270bcd [INET_TWSK]: Introduce inet_twsk_alloc
With the parts of tcp_time_wait that are not TCP specific, tcp_time_wait uses
it and so will dccp_time_wait.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:42:26 -07:00
Arnaldo Carvalho de Melo
e48c414ee6 [INET]: Generalise the TCP sock ID lookup routines
And also some TIME_WAIT functions.

[acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
/tmp/before.size: 282955   13122    9312  305389   4a8ed net/ipv4/built-in.o
/tmp/after.size:  281566   13122    9312  304000   4a380 net/ipv4/built-in.o
[acme@toy net-2.6.14]$

I kept them still inlined, will uninline at some point to see what
would be the performance difference.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:42:18 -07:00
Arnaldo Carvalho de Melo
8feaf0c0a5 [INET]: Generalise tcp_tw_bucket, aka TIME_WAIT sockets
This paves the way to generalise the rest of the sock ID lookup
routines and saves some bytes in TCPv4 TIME_WAIT sockets on distro
kernels (where IPv6 is always built as a module):

[root@qemu ~]# grep tw_sock /proc/slabinfo
tw_sock_TCPv6  0  0  128  31  1
tw_sock_TCP    0  0   96  41  1
[root@qemu ~]#

Now if a protocol wants to use the TIME_WAIT generic infrastructure it
only has to set the sk_prot->twsk_obj_size field with the size of its
inet_timewait_sock derived sock and proto_register will create
sk_prot->twsk_slab, for now its only for INET sockets, but we can
introduce timewait_sock later if some non INET transport protocolo
wants to use this stuff.

Next changesets will take advantage of this new infrastructure to
generalise even more TCP code.

[acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
/tmp/before.size: 188646   11764    5068  205478   322a6 net/ipv4/built-in.o
/tmp/after.size:  188144   11764    5068  204976   320b0 net/ipv4/built-in.o
[acme@toy net-2.6.14]$

Tested with both IPv4 & IPv6 (::1 (localhost) & ::ffff:172.20.0.1
(qemu host)).

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:42:13 -07:00
Arnaldo Carvalho de Melo
33b6223190 [INET]: Generalise tcp_v4_lookup_listener
[acme@toy net-2.6.14]$ grep built-in /tmp/before /tmp/after
/tmp/before: 282560       13122    9312  304994   4a762 net/ipv4/built-in.o
/tmp/after:  282560       13122    9312  304994   4a762 net/ipv4/built-in.o

Will be used in DCCP, not exporting it right now not to get in Adrian
Bunk's exported-but-not-used-on-modules radar 8)

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:42:08 -07:00
Arnaldo Carvalho de Melo
81849d106b [INET]: Generalise tcp_v4_hash & tcp_unhash
It really just makes the existing code be a helper function that
tcp_v4_hash and tcp_unhash uses, specifying the right inet_hashinfo,
tcp_hashinfo.

One thing I'll investigate at some point is to have the inet_hashinfo
pointer in sk_prot, so that we get all the hashtable information from
the sk pointer, this can lead to some extra indirections that may well
hurt performance/code size, we'll see. Ultimate idea would be that
sk_prot would provide _all_ the information about a protocol
implementation.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:42:02 -07:00
Arnaldo Carvalho de Melo
c752f0739f [TCP]: Move the tcp sock states to net/tcp_states.h
Lots of places just needs the states, not even linux/tcp.h, where this
enum was, needs it.

This speeds up development of the refactorings as less sources are
rebuilt when things get moved from net/tcp.h.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:41:54 -07:00
Arnaldo Carvalho de Melo
f3f05f7046 [INET]: Generalise the tcp_listen_ lock routines
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:41:49 -07:00
Arnaldo Carvalho de Melo
6e04e02165 [INET]: Move tcp_port_rover to inet_hashinfo
Also expose all of the tcp_hashinfo members, i.e. killing those
tcp_ehash, etc macros, this will more clearly expose already generic
functions and some that need just a bit of work to become generic, as
we'll see in the upcoming changesets.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:41:44 -07:00
Arnaldo Carvalho de Melo
2d8c4ce519 [INET]: Generalise tcp_bind_hash & tcp_inherit_port
This required moving tcp_bucket_cachep to inet_hashinfo.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:40:29 -07:00
Arnaldo Carvalho de Melo
a55ebcc4c4 [INET]: Move bind_hash from tcp_sk to inet_sk
This should really be in a inet_connection_sock, but I'm leaving it
for a later optimization, when some more fields common to INET
transport protocols now in tcp_sk or inet_sk will be chunked out into
inet_connection_sock, for now its better to concentrate on getting the
changes in the core merged to leave the DCCP tree with only DCCP
specific code.

Next changesets will take advantage of this move to generalise things
like tcp_bind_hash, tcp_put_port, tcp_inherit_port, making the later
receive a inet_hashinfo parameter, and even __tcp_tw_hashdance, etc in
the future, when tcp_tw_bucket gets transformed into the struct
timewait_sock hierarchy.

tcp_destroy_sock also is eligible as soon as tcp_orphan_count gets
moved to sk_prot.

A cascade of incremental changes will ultimately make the tcp_lookup
functions be fully generic.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:38:48 -07:00
Arnaldo Carvalho de Melo
77d8bf9c62 [INET]: Move the TCP hashtable functions/structs to inet_hashtables.[ch]
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:38:39 -07:00
Arnaldo Carvalho de Melo
0f7ff9274e [INET]: Just rename the TCP hashtable functions/structs to inet_
This is to break down the complexity of the series of patches,
making it very clear that this one just does:

1. renames tcp_ prefixed hashtable functions and data structures that
   were already mostly generic to inet_ to share it with DCCP and
   other INET transport protocols.

2. Removes not used functions (__tb_head & tb_head)

3. Removes some leftover prototypes in the headers (tcp_bucket_unlock &
   tcp_v4_build_header)

Next changesets will move tcp_sk(sk)->bind_hash to inet_sock so that we can
make functions such as tcp_inherit_port, __tcp_inherit_port, tcp_v4_get_port,
__tcp_put_port,  generic and get others like tcp_destroy_sock closer to generic
(tcp_orphan_count will go to sk->sk_prot to allow this).

Eventually most of these functions will be used passing the transport protocol
inet_hashinfo structure.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:38:32 -07:00
Arnaldo Carvalho de Melo
304a16180f [INET]: Move the TCP ehash functions to include/net/inet_hashtables.h
To be shared with DCCP (and others), this is the start of a series of patches
that will expose the already generic TCP hash table routines.

The few changes noticed when calling gcc -S before/after on a pentium4 were of
this type:

        movl    40(%esp), %edx
        cmpl    %esi, 472(%edx)
        je      .L168
-       pushl   $291
+       pushl   $272
        pushl   $.LC0
        pushl   $.LC1
        pushl   $.LC2

[acme@toy net-2.6.14]$ size net/ipv4/tcp_ipv4.before.o net/ipv4/tcp_ipv4.after.o
   text    data     bss     dec     hex filename
  17804     516     140   18460    481c net/ipv4/tcp_ipv4.before.o
  17804     516     140   18460    481c net/ipv4/tcp_ipv4.after.o

Holler if some weird architecture has issues with things like this 8)

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:38:22 -07:00
Arnaldo Carvalho de Melo
32519f11d3 [INET]: Introduce inet_sk_rebuild_header
From tcp_v4_rebuild_header, that already was pretty generic, I only
needed to use sk->sk_protocol instead of the hardcoded IPPROTO_TCP and
establish the requirement that INET transport layer protocols that
want to use this function map TCP_SYN_SENT to its equivalent state.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:37:55 -07:00
Arnaldo Carvalho de Melo
6cbb0df788 [SOCK]: Introduce sk_setup_caps
From tcp_v4_setup_caps, that always is preceded by a call to
__sk_dst_set, so coalesce this sequence into sk_setup_caps, removing
one call to a TCP function in the IP layer.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:37:48 -07:00
Arnaldo Carvalho de Melo
614c6cb4f2 [SOCK]: Rename __tcp_v4_rehash to __sk_prot_rehash
This operation was already generic and DCCP will use it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:37:42 -07:00
Arnaldo Carvalho de Melo
e6848976b7 [NET]: Cleanup INET_REFCNT_DEBUG code
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:37:29 -07:00
Patrick McHardy
d13964f449 [IPV4/6]: Check if packet was actually delivered to a raw socket to decide whether to send an ICMP unreachable
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:37:22 -07:00
Andrew McDonald
0bd1b59b15 [IPV6]: Check interface bindings on IPv6 raw socket reception
Take account of whether a socket is bound to a particular device when
selecting an IPv6 raw socket to receive a packet. Also perform this
check when receiving IPv6 packets with router alert options.

Signed-off-by: Andrew McDonald <andrew@mcdonald.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:37:06 -07:00
David S. Miller
86e65da9c1 [NET]: Remove explicit initializations of skb->input_dev
Instead, set it in one place, namely the beginning of
netif_receive_skb().

Based upon suggestions from Jamal Hadi Salim.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:33:26 -07:00
Adrian Bunk
0742fd53a3 [IPV4]: possible cleanups
This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 the following unused global function:
  - xfrm4_state.c: xfrm4_state_fini
- remove the following unneeded EXPORT_SYMBOL's:
  - ip_output.c: ip_finish_output
  - ip_output.c: sysctl_ip_default_ttl
  - fib_frontend.c: ip_dev_find
  - inetpeer.c: inet_peer_idlock
  - ip_options.c: ip_options_compile
  - ip_options.c: ip_options_undo
  - net/core/request_sock.c: sysctl_max_syn_backlog

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:33:20 -07:00
David S. Miller
f2ccd8fa06 [NET]: Kill skb->real_dev
Bonding just wants the device before the skb_bond()
decapsulation occurs, so simply pass that original
device into packet_type->func() as an argument.

It remains to be seen whether we can use this same
exact thing to get rid of skb->input_dev as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:32:25 -07:00
Arnaldo Carvalho de Melo
83e3609eba [REQSK]: Move the syn_table destroy from tcp_listen_stop to reqsk_queue_destroy
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:32:11 -07:00
Patrick McHardy
abc3bc5804 [NET]: Kill skb->tc_classid
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:31:18 -07:00
Jouni Malinen
91cb70c176 [PATCH] ieee80211: Fix debug comments ipw->ieee80211
Debug variables and procfs dir should be "ieee80211", not "ipw".

Signed-off-by: Jouni Malinen <jkmaline@cc.hut.fi>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-08-28 19:23:07 -04:00
Jouni Malinen
51e828b6a1 [PATCH] ieee80211: Remove EAPOL debug
IEEE 802.11 code has no business touching payloads of EAPOL frames.
There are some EAPOL structures defined for debugging and these were
confusingly called EAP types which they are not. Let's just remove these
before someone else starts using them in the kernel.

Signed-off-by: Jouni Malinen <jkmaline@cc.hut.fi>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-08-28 19:23:07 -04:00
Jouni Malinen
5f55d0850e [PATCH] ieee80211: Remove WIRELESS_EXT < 17 support
No need to maintain support for WIRELESS_EXT < 17 since this kernel
tree is already using WIRELESS_EXT 18.

Signed-off-by: Jouni Malinen <jkmaline@cc.hut.fi>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-08-28 19:23:06 -04:00
Jiri Benc
099c5bb169 ieee80211: use endian-aware types
From: Michael Wu <flamingice@sourmilk.net>

This patch:
- fixes misc. whitespace/comments
- replaces u16 with __le16/__be16 where appropriate

Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
2005-08-25 20:15:10 -04:00
Jiri Benc
95d5185d1a ieee80211: convert defines to enums
From: Gertjan van Wingerde <gwingerde@home.nl>

Attached patch cleans up the long lists of #defines for status codes,
reason codes, and information elements.

Signed-off-by: Gertjan van Wingerde <gwingerde@home.nl>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
2005-08-25 20:13:04 -04:00
Jiri Benc
f13baae43e ieee80211: new constants from latest 802.11x specifications
From: Gertjan van Wingerde <gwingerde@home.nl>

Attached patch updates the definitions of the generic ieee80211 stack to
the latest versions of the published 802.11x specification suite.

Signed-off-by: Gertjan van Wingerde <gwingerde@home.nl>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
2005-08-25 20:11:46 -04:00
Jiri Benc
e88187eedc ieee80211: Puts debug macros together and makes escape_essid not inlined.
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: Jirka Bohac <jbohac@suse.cz>
2005-08-25 20:00:53 -04:00
Jeff Garzik
b2382b363d Merge upstream into ieee80211.
Hand-fix merge conflict in drivers/usb/net/zd1201.c.
2005-08-24 01:02:04 -04:00
Ralf Baechle
01d7dd0e9f [AX25]: UID fixes
o Brown paperbag bug - ax25_findbyuid() was always returning a NULL pointer
   as the result.  Breaks ROSE completly and AX.25 if UID policy set to deny.

 o While the list structure of AX.25's UID to callsign mapping table was
   properly protected by a spinlock, it's elements were not refcounted
   resulting in a race between removal and usage of an element.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:11:45 -07:00
Ralf Baechle
53b924b31f [NET]: Fix socket bitop damage
The socket flag cleanups that went into 2.6.12-rc1 are basically oring
the flags of an old socket into the socket just being created.
Unfortunately that one was just initialized by sock_init_data(), so already
has SOCK_ZAPPED set.  As the result zapped sockets are created and all
incoming connection will fail due to this bug which again was carefully
replicated to at least AX.25, NET/ROM or ROSE.

In order to keep the abstraction alive I've introduced sock_copy_flags()
to copy the socket flags from one sockets to another and used that
instead of the bitwise copy thing.  Anyway, the idea here has probably
been to copy all flags, so sock_copy_flags() should be the right thing.
With this the ham radio protocols are usable again, so I hope this will
make it into 2.6.13.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:11:30 -07:00
Jeff Garzik
1b5cca3a88 ieee80211: remove last uses of compat define WLAN_CAPABILITY_BSS 2005-08-15 00:32:15 -04:00
Jouni Malinen
2474385e5b [PATCH] ieee80211: Capability field is called ESS, not BSS
IEEE 802.11 has a capability field flag called ESS, but ieee80211 had
renamed this to BSS for some reason. hostap has been using
WLAN_CAPABILITY_ESS and since that matches with the standard, lets use
it as the name for this define. Add WLAN_CAPABILITY_BSS as a backwards
compatibility name for the same bit since ieee80211 and ipw2200 are
using this and there are versions outside kernel tree that expect to
find this define name.

Signed-off-by: Jouni Malinen <jkmaline@cc.hut.fi>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-08-15 00:28:17 -04:00
Jeff Garzik
4c0e176dd5 Merge /spare/repo/linux-2.6/ 2005-08-14 23:10:00 -04:00
Jouni Malinen
f241be74b8 [PATCH] ieee80211: Fix frame control pver mask
IEEE 802.11 frame control has two bits reserved for protocol
version. IEEE80211_FCTL_VERS was not used anywhere, but I would assume
it was supposed to be a mask for the protocol field and as such, it
should be 0x0003, not 0x0002. This matches with WLAN_FC_PVER
definition in hostap.

Signed-off-by: Jouni Malinen <jkmaline@cc.hut.fi>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-08-14 23:09:03 -04:00
Marcel Holtmann
66e8b6c31b [Bluetooth] Remove unused functions and cleanup symbol exports
This patch removes the unused bt_dump() function and it also removes
its BT_DMP macro. It also unexports the hci_dev_get(), hci_send_cmd()
and hci_si_event() functions.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-08-06 12:36:51 +02:00
Adrian Bunk
b7721ff96f [PATCH] include/net/ieee80211.h must #include <linux/wireless.h>
-Wundef found an (although perhaps harmless) bug:

<--  snip  -->

...
  CC      net/ieee80211/ieee80211_crypt.o
In file included from net/ieee80211/ieee80211_crypt.c:21:
include/net/ieee80211.h:26:5: warning: "WIRELESS_EXT" is not defined
  CC      net/ieee80211/ieee80211_crypt_wep.o
In file included from net/ieee80211/ieee80211_crypt_wep.c:20:
include/net/ieee80211.h:26:5: warning: "WIRELESS_EXT" is not defined
  CC      net/ieee80211/ieee80211_crypt_ccmp.o
  CC      net/ieee80211/ieee80211_crypt_tkip.o
In file included from net/ieee80211/ieee80211_crypt_tkip.c:23:
include/net/ieee80211.h:26:5: warning: "WIRELESS_EXT" is not defined
...

<--  snip  -->

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-07-31 00:44:10 -04:00
Baruch Even
d1b04c081e [NET]: Spelling mistakes threshoulds -> thresholds
Just simple spelling mistake fixes.

Signed-Off-By: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30 17:41:59 -07:00
Jeff Garzik
a670fcb43f /spare/repo/netdev-2.6 branch 'master' 2005-07-30 18:14:15 -04:00
Patrick McHardy
0303770deb [NET]: Make ipip/ip6_tunnel independant of XFRM
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 14:03:34 -07:00
Sridhar Samudrala
d1ad1ff299 [SCTP]: Fix potential null pointer dereference while handling an icmp error
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:44:10 -07:00
Jeff Garzik
327309e899 Merge upstream 2.6.13-rc3 into ieee80211 branch of netdev-2.6. 2005-07-13 16:23:51 -04:00
Alexey Dobriyan
ab611487d8 [NET]: __be'ify *_type_trans()
tr_type_trans(), hippi_type_trans() left as-is.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-12 12:08:43 -07:00
Alexey Dobriyan
3182cd84f0 [SCTP]: __nocast annotations
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 20:57:47 -07:00
David S. Miller
79af02c253 [SCTP]: Use struct list_head for chunk lists, not sk_buff_head.
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 21:47:49 -07:00
Victor Fusco
86a76caf87 [NET]: Fix sparse warnings
From: Victor Fusco <victor@cetuc.puc-rio.br>

Fix the sparse warning "implicit cast to nocast type"

Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 14:57:47 -07:00
David S. Miller
b03efcfb21 [NET]: Transform skb_queue_len() binary tests into skb_queue_empty()
This is part of the grand scheme to eliminate the qlen
member of skb_queue_head, and subsequently remove the
'list' member of sk_buff.

Most users of skb_queue_len() want to know if the queue is
empty or not, and that's trivially done with skb_queue_empty()
which doesn't use the skb_queue_head->qlen member and instead
uses the queue list emptyness as the test.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 14:57:23 -07:00
David S. Miller
c1b4a7e695 [TCP]: Move to new TSO segmenting scheme.
Make TSO segment transmit size decisions at send time not earlier.

The basic scheme is that we try to build as large a TSO frame as
possible when pulling in the user data, but the size of the TSO frame
output to the card is determined at transmit time.

This is guided by tp->xmit_size_goal.  It is always set to a multiple
of MSS and tells sendmsg/sendpage how large an SKB to try and build.

Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
necessary and conditions warrant.  These routines can also decide to
"defer" in order to wait for more ACKs to arrive and thus allow larger
TSO frames to be emitted.

A general observation is that TSO elongates the pipe, thus requiring a
larger congestion window and larger buffering especially at the sender
side.  Therefore, it is important that applications 1) get a large
enough socket send buffer (this is accomplished by our dynamic send
buffer expansion code) 2) do large enough writes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:24:38 -07:00
David S. Miller
55c97f3e99 [TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling.
'nonagle' should be passed to the tcp_snd_test() function
as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the
tail of the write_queue.  This is because Nagle does not
apply to such frames since we cannot possibly tack more
data onto them.

However, while doing this __tcp_push_pending_frames() makes
all of the packets in the write_queue use this modified
'nonagle' value.

Fix the bug and simplify this function by just calling
tcp_write_xmit() directly if sk_send_head is non-NULL.

As a result, we can now make tcp_data_snd_check() just call
tcp_push_pending_frames() instead of the specialized
__tcp_data_snd_check().

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:19:38 -07:00
David S. Miller
a2e2a59c93 [TCP]: Fix redundant calculations of tcp_current_mss()
tcp_write_xmit() uses tcp_current_mss(), but some of it's callers,
namely __tcp_push_pending_frames(), already has this value available
already.

While we're here, fix the "cur_mss" argument to be "unsigned int"
instead of plain "unsigned".

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:19:23 -07:00
David S. Miller
a762a98007 [TCP]: Kill extra cwnd validate in __tcp_push_pending_frames().
The tcp_cwnd_validate() function should only be invoked
if we actually send some frames, yet __tcp_push_pending_frames()
will always invoke it.  tcp_write_xmit() does the call for us,
so the call here can simply be removed.

Also, tcp_write_xmit() can be marked static.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:18:51 -07:00
David S. Miller
84d3e7b957 [TCP]: Move __tcp_data_snd_check into tcp_output.c
It reimplements portions of tcp_snd_check(), so it
we move it to tcp_output.c we can consolidate it's
logic much easier in a later change.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:18:18 -07:00
David S. Miller
f6302d1d78 [TCP]: Move send test logic out of net/tcp.h
This just moves the code into tcp_output.c, no code logic changes are
made by this patch.

Using this as a baseline, we can begin to untangle the mess of
comparisons for the Nagle test et al.  We will also be able to reduce
all of the redundant computation that occurs when outputting data
packets.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:18:03 -07:00
David S. Miller
fc6415bcb0 [TCP]: Fix quick-ack decrementing with TSO.
On each packet output, we call tcp_dec_quickack_mode()
if the ACK flag is set.  It drops tp->ack.quick until
it hits zero, at which time we deflate the ATO value.

When doing TSO, we are emitting multiple packets with
ACK set, so we should decrement tp->ack.quick that many
segments.

Note that, unlike this case, tcp_enter_cwr() should not
take the tcp_skb_pcount(skb) into consideration.  That
function, one time, readjusts tp->snd_cwnd and moves
into TCP_CA_CWR state.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:17:45 -07:00
David S. Miller
c65f7f00c5 [TCP]: Simplify SKB data portion allocation with NETIF_F_SG.
The ideal and most optimal layout for an SKB when doing
scatter-gather is to put all the headers at skb->data, and
all the user data in the page array.

This makes SKB splitting and combining extremely simple,
especially before a packet goes onto the wire the first
time.

So, when sk_stream_alloc_pskb() is given a zero size, make
sure there is no skb_tailroom().  This is achieved by applying
SKB_DATA_ALIGN() to the header length used here.

Next, make select_size() in TCP output segmentation use a
length of zero when NETIF_F_SG is true on the outgoing
interface.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:17:25 -07:00
Alexey Dobriyan
b8259d9ad1 [NET]: Remove __ARGS from include/net/slhc_vj.h
I suspect "#define __ARGS(x) ()" was deprecated before I was born.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:12:04 -07:00
Thomas Graf
3d54b82fdf [PKT_SCHED]: Cleanup qdisc creation and alignment macros
Adds qdisc_alloc() to share code between qdisc_create()
and qdisc_create_dflt(). Hides the qdisc alignment behind
macros and makes use of them.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:15:09 -07:00
Thomas Graf
e41a33e6ec [PKT_SCHED]: Move sch_generic.c prototypes to correct header file
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:14:30 -07:00
Jeff Garzik
0c16877570 Merge upstream 2.6.13-rc1-git1 into 'ieee80211' branch of netdev-2.6. 2005-06-30 00:49:18 -04:00
Jeff Garzik
b9a05d1d51 [PATCH] ieee80211.h build fix
This crept in with the resync-to-mainline.  Nothing uses 802.11-crypt in
mainline, so we can safely comment it out for now.

Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28 22:45:32 -07:00
YOSHIFUJI Hideaki
7fe40f73d7 [IPV6]: remove more unused IPV6_AUTHHDR things.
Remove two more unused IPV6_AUTHHDR option things, 
which I failed to remove them last time,
plus, mark IPV6_AUTHHDR obsolete.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 15:46:24 -07:00
Vlad Yasevich
2f85a42964 [SCTP] Make init & delayed sack timeouts configurable by user.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 13:24:23 -07:00
Jeff Garzik
245ac8738b Merge upstream net/ieee80211.h changes into 'ieee80211' branch. 2005-06-27 22:49:47 -04:00
Jeff Garzik
a5fe736eaf Update is_multicast_ether_addr() definition; net/ieee80211.h cleanups. 2005-06-27 22:47:18 -04:00
Christoph Hellwig
279385949e [PATCH] bring over ieee80211.h from mainline
the prototypes and inlines aren't actually needed, but let's not diverge
from -mm too far.
2005-06-27 00:23:54 -04:00
Jeff Garzik
5696c1944a Merge /spare/repo/linux-2.6/ 2005-06-26 23:38:58 -04:00
Linus Torvalds
59a49e3871 Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2005-06-24 00:31:46 -07:00
Adrian Bunk
52c1da3953 [PATCH] make various thing static
Another rollup of patches which give various symbols static scope

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-24 00:06:43 -07:00
David S. Miller
a8acfbac75 [TCP]: Need to declare 'tcp_reno' in net/tcp.h
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 23:45:02 -07:00
Stephen Hemminger
5f8ef48d24 [TCP]: Allow choosing TCP congestion control via sockopt.
Allow using setsockopt to set TCP congestion control to use on a per
socket basis.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 20:37:36 -07:00
Stephen Hemminger
317a76f9a4 [TCP]: Add pluggable congestion control algorithm infrastructure.
Allow TCP to have multiple pluggable congestion control algorithms.
Algorithms are defined by a set of operations and can be built in
or modules.  The legacy "new RENO" algorithm is used as a starting
point and fallback.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:19:55 -07:00
Shaun Pereira
ebc3f64b86 [X25]: Fast select with no restriction on response
This patch is a follow up to patch 1 regarding "Selective Sub Address
matching with call user data".  It allows use of the Fast-Select-Acceptance
optional user facility for X.25.

This patch just implements fast select with no restriction on response
(NRR).  What this means (according to ITU-T Recomendation 10/96 section
6.16) is that if in an incoming call packet, the relevant facility bits are
set for fast-select-NRR, then the called DTE can issue a direct response to
the incoming packet using a call-accepted packet that contains
call-user-data.  This patch allows such a response.  

The called DTE can also respond with a clear-request packet that contains
call-user-data.  However, this feature is currently not implemented by the
patch.

How is Fast Select Acceptance used?
By default, the system does not allow fast select acceptance (as before).
To enable a response to fast select acceptance,  
After a listen socket in created and bound as follows
	socket(AF_X25, SOCK_SEQPACKET, 0);
	bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr));
but before a listen system call is made, the following ioctl should be used.
	ioctl(call_soc,SIOCX25CALLACCPTAPPRV);
Now the listen system call can be made
	listen(call_soc, 4);
After this, an incoming-call packet will be accepted, but no call-accepted 
packet will be sent back until the following system call is made on the socket
that accepts the call
	ioctl(vc_soc,SIOCX25SENDCALLACCPT);
The network (or cisco xot router used for testing here) will allow the 
application server's call-user-data in the call-accepted packet, 
provided the call-request was made with Fast-select NRR.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22 22:16:17 -07:00
Shaun Pereira
cb65d506c3 [X25]: Selective sub-address matching with call user data.
From: Shaun Pereira <spereira@tusc.com.au>

This is the first (independent of the second) patch of two that I am
working on with x25 on linux (tested with xot on a cisco router).  Details
are as follows.

Current state of module:

A server using the current implementation (2.6.11.7) of the x25 module will
accept a call request/ incoming call packet at the listening x.25 address,
from all callers to that address, as long as NO call user data is present
in the packet header.

If the server needs to choose to accept a particular call request/ incoming
call packet arriving at its listening x25 address, then the kernel has to
allow a match of call user data present in the call request packet with its
own.  This is required when multiple servers listen at the same x25 address
and device interface.  The kernel currently matches ALL call user data, if
present.

Current Changes:

This patch is a follow up to the patch submitted previously by Andrew
Hendry, and allows the user to selectively control the number of octets of
call user data in the call request packet, that the kernel will match.  By
default no call user data is matched, even if call user data is present. 
To allow call user data matching, a cudmatchlength > 0 has to be passed
into the kernel after which the passed number of octets will be matched. 
Otherwise the kernel behavior is exactly as the original implementation.

This patch also ensures that as is normally the case, no call user data
will be present in the Call accepted / call connected packet sent back to
the caller 

Future Changes on next patch:

There are cases however when call user data may be present in the call
accepted packet.  According to the X.25 recommendation (ITU-T 10/96)
section 5.2.3.2 call user data may be present in the call accepted packet
provided the fast select facility is used.  My next patch will include this
fast select utility and the ability to send up to 128 octets call user data
in the call accepted packet provided the fast select facility is used.  I
am currently testing this, again with xot on linux and cisco.  

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>

(With a fix from Alexey Dobriyan <adobriyan@gmail.com>)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22 22:15:01 -07:00
Ingo Molnar
39c715b717 [PATCH] smp_processor_id() cleanup
This patch implements a number of smp_processor_id() cleanup ideas that
Arjan van de Ven and I came up with.

The previous __smp_processor_id/_smp_processor_id/smp_processor_id API
spaghetti was hard to follow both on the implementational and on the
usage side.

Some of the complexity arose from picking wrong names, some of the
complexity comes from the fact that not all architectures defined
__smp_processor_id.

In the new code, there are two externally visible symbols:

 - smp_processor_id(): debug variant.

 - raw_smp_processor_id(): nondebug variant. Replaces all existing
   uses of _smp_processor_id() and __smp_processor_id(). Defined
   by every SMP architecture in include/asm-*/smp.h.

There is one new internal symbol, dependent on DEBUG_PREEMPT:

 - debug_smp_processor_id(): internal debug variant, mapped to
                             smp_processor_id().

Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new
lib/smp_processor_id.c file.  All related comments got updated and/or
clarified.

I have build/boot tested the following 8 .config combinations on x86:

 {SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}

I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT.  (Other
architectures are untested, but should work just fine.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:13 -07:00
Jamal Hadi Salim
0d51aa80a9 [IPV6]: V6 route events reported with wrong netlink PID and seq number
Essentially netlink at the moment always reports a pid and sequence of 0
always for v6 route activities. 
To understand the repurcassions of this look at:
http://lists.quagga.net/pipermail/quagga-dev/2005-June/003507.html

While fixing this, i took the liberty to resolve the outstanding issue
of IPV6 routes inserted via ioctls to have the correct pids as well.

This patch tries to behave as close as possible to the v4 routes i.e
maintains whatever PID the socket issuing the command owns as opposed to
the process. That made the patch a little bulky.

I have tested against both netlink derived utility to add/del routes as
well as ioctl derived one. The Quagga folks have tested against quagga.
This fixes the problem and so far hasnt been detected to introduce any
new issues.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 13:51:04 -07:00
Robert Olsson
246955fe4c [NETLINK]: fib_lookup() via netlink
Below is a more generic patch to do fib_lookup via netlink. For others 
we should say that we discussed this as a way to verify route selection.
It's also possible there are others uses for this.

In short the fist half of struct fib_result_nl is filled in by caller 
and netlink call fills in the other half and returns it.

In case anyone is interested there is a corresponding user app to compare 
the full routing table this was used to test implementation of the LC-trie. 

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:36:39 -07:00
Alexey Dobriyan
f852640e74 [AX25]: endian-annotate ax25_type_trans()
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:31:11 -07:00
Herbert Xu
d094cd83c0 [IPSEC]: Add xfrm_state_afinfo->init_flags
This patch adds the xfrm_state_afinfo->init_flags hook which allows
each address family to perform any common initialisation that does
not require a corresponding destructor call.

It will be used subsequently to set the XFRM_STATE_NOPMTUDISC flag
in IPv4.

It also fixes up the error codes returned by xfrm_init_state.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:19:41 -07:00
Herbert Xu
72cb6962a9 [IPSEC]: Add xfrm_init_state
This patch adds xfrm_init_state which is simply a wrapper that calls
xfrm_get_type and subsequently x->type->init_state.  It also gets rid
of the unused args argument.

Abstracting it out allows us to add common initialisation code, e.g.,
to set family-specific flags.

The add_time setting in xfrm_user.c was deleted because it's already
set by xfrm_state_alloc.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:18:08 -07:00
Frank Filz
3f7a87d2fa [SCTP] sctp_connectx() API support
Implements sctp_connectx() as defined in the SCTP sockets API draft by
tunneling the request through a setsockopt().

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:14:57 -07:00
Thomas Graf
9972b25d0c [PKT_SCHED]: Generic queue management interface for qdiscs using internal skb queues
Implements an interface to be used by leaf qdiscs maintaining an internal
skb queue. The interface maintains a backlog in bytes additionaly
to the skb_queue_len() maintained by the queue itself. Relevant statistics
get incremented automatically. Every function comes in two variants, one
assuming Qdisc->q is used as queue and the second taking a sk_buff_head
as argument. Be aware that, if you use multiple queues, you still have to
maintain the Qdisc->q.qlen counter yourself.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:57:26 -07:00
Thomas Graf
88121aea7b [NEIGHBOUR]: Remove unused fields in struct neigh_parms and neigh_table
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:51:12 -07:00
Thomas Graf
c7fb64db00 [NETLINK]: Neighbour table configuration and statistics via rtnetlink
To retrieve the neighbour tables send RTM_GETNEIGHTBL with the
NLM_F_DUMP flag set. Every neighbour table configuration is
spread over multiple messages to avoid running into message
size limits on systems with many interfaces. The first message
in the sequence transports all not device specific data such as
statistics, configuration, and the default parameter set.
This message is followed by 0..n messages carrying device
specific parameter sets.

Although the ordering should be sufficient, NDTA_NAME can be
used to identify sequences. The initial message can be identified
by checking for NDTA_CONFIG. The device specific messages do
not contain this TLV but have NDTPA_IFINDEX set to the
corresponding interface index.

To change neighbour table attributes, send RTM_SETNEIGHTBL
with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3],
NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked
otherwise. Device specific parameter sets can be changed by
setting NDTPA_IFINDEX to the interface index of the corresponding
device.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:50:55 -07:00
David S. Miller
e52c1f17e4 [NET]: Move sysctl_max_syn_backlog into request_sock.c
This fixes the CONFIG_INET=n build failure noticed
by Andrew Morton.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:49:40 -07:00
Arnaldo Carvalho de Melo
2ad69c55a2 [NET] rename struct tcp_listen_opt to struct listen_sock
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:48:55 -07:00
Arnaldo Carvalho de Melo
0e87506fcc [NET] Generalise tcp_listen_opt
This chunks out the accept_queue and tcp_listen_opt code and moves
them to net/core/request_sock.c and include/net/request_sock.h, to
make it useful for other transport protocols, DCCP being the first one
to use it.

Next patches will rename tcp_listen_opt to accept_sock and remove the
inline tcp functions that just call a reqsk_queue_ function.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:47:59 -07:00
Arnaldo Carvalho de Melo
60236fdd08 [NET] Rename open_request to request_sock
Ok, this one just renames some stuff to have a better namespace and to
dissassociate it from TCP:

struct open_request  -> struct request_sock
tcp_openreq_alloc    -> reqsk_alloc
tcp_openreq_free     -> reqsk_free
tcp_openreq_fastfree -> __reqsk_free

With this most of the infrastructure closely resembles a struct
sock methods subset.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:47:21 -07:00
Arnaldo Carvalho de Melo
2e6599cb89 [NET] Generalise TCP's struct open_request minisock infrastructure
Kept this first changeset minimal, without changing existing names to
ease peer review.

Basicaly tcp_openreq_alloc now receives the or_calltable, that in turn
has two new members:

->slab, that replaces tcp_openreq_cachep
->obj_size, to inform the size of the openreq descendant for
  a specific protocol

The protocol specific fields in struct open_request were moved to a
class hierarchy, with the things that are common to all connection
oriented PF_INET protocols in struct inet_request_sock, the TCP ones
in tcp_request_sock, that is an inet_request_sock, that is an
open_request.

I.e. this uses the same approach used for the struct sock class
hierarchy, with sk_prot indicating if the protocol wants to use the
open_request infrastructure by filling in sk_prot->rsk_prot with an
or_calltable.

Results? Performance is improved and TCP v4 now uses only 64 bytes per
open request minisock, down from 96 without this patch :-)

Next changeset will rename some of the structs, fields and functions
mentioned above, struct or_calltable is way unclear, better name it
struct request_sock_ops, s/struct open_request/struct request_sock/g,
etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:46:52 -07:00
Herbert Xu
f60f6b8f70 [IPSEC] Use XFRM_MSG_* instead of XFRM_SAP_*
This patch removes XFRM_SAP_* and converts them over to XFRM_MSG_*.
The netlink interface is meant to map directly onto the underlying
xfrm subsystem.  Therefore rather than using a new independent
representation for the events we can simply use the existing ones
from xfrm_user.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18 22:44:37 -07:00
Herbert Xu
bf08867f91 [IPSEC] Turn km_event.data into a union
This patch turns km_event.data into a union.  This makes code that
uses it clearer.
  
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18 22:44:00 -07:00
Herbert Xu
4666faab09 [IPSEC] Kill spurious hard expire messages
This patch ensures that the hard state/policy expire notifications are
only sent when the state/policy is successfully removed from their
respective tables.

As it is, it's possible for a state/policy to both expire through
reaching a hard limit, as well as being deleted by the user.

Note that this behaviour isn't actually forbidden by RFC 2367.
However, it is a quality of implementation issue.

As an added bonus, the restructuring in this patch will help
eventually in moving the expire notifications from softirq
context into process context, thus improving their reliability.

One important side-effect from this change is that SAs reaching
their hard byte/packet limits are now deleted immediately, just
like SAs that have reached their hard time limits.

Previously they were announced immediately but only deleted after
30 seconds.

This is bad because it prevents the system from issuing an ACQUIRE
command until the existing state was deleted by the user or expires
after the time is up.

In the scenario where the expire notification was lost this introduces
a 30 second delay into the system for no good reason.
 
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18 22:43:22 -07:00
Jamal Hadi Salim
26b15dad9f [IPSEC] Add complete xfrm event notification
Heres the final patch.
What this patch provides

- netlink xfrm events
- ability to have events generated by netlink propagated to pfkey
  and vice versa.
- fixes the acquire lets-be-happy-with-one-success issue

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18 22:42:13 -07:00
Neil Horman
cdac4e0774 [SCTP] Add support for ip_nonlocal_bind sysctl & IP_FREEBIND socket option
Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 15:12:33 -07:00
Pravin B. Shelar
37e20a66db [IPV4]: Kill MULTIPATHHOLDROUTE flag.
It cannot work properly, so just ignore it in drr
and rr multipath algorithms just like the random
multipath algorithm does.

Suggested by Herbert Xu.

Signed-off by: Pravin B. Shelar <pravins@calsoftinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29 20:26:44 -07:00
Jiri Benc
286d974797 [PATCH] ieee80211: cleanup
Cleanup of unused and duplicated constants and structures in the ieee80211
header.

Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: Jirka Bohac <jbohac@suse.cz>
2005-05-27 22:43:30 -04:00
ff0e0ea2f5 Automatic merge of /spare/repo/netdev-2.6 branch we18 2005-05-27 22:07:40 -04:00
Hideaki YOSHIFUJI
92d63decc0 From: Kazunori Miyazawa <kazunori@miyazawa.org>
[XFRM] Call dst_check() with appropriate cookie

This fixes infinite loop issue with IPv6 tunnel mode.

Signed-off-by: Kazunori Miyazawa <kazunori@miyazawa.org>
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26 12:58:04 -07:00
Jamal Hadi Salim
1eda339e76 [PKT_SCHED]: Fixup simple action define.
Make it consistent with other net/sched files

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19 12:42:39 -07:00
Jeff Garzik
b453872c35 [NET] ieee80211 subsystem
Contributors:
Host AP contributors
James Ketrenos <jketreno@linux.intel.com>
Francois Romieu <romieu@fr.zoreil.com>
Adrian Bunk <bunk@stusta.de>
Matthew Galgoci <mgalgoci@parcelfarce.linux.th
eplanet.co.uk>
2005-05-12 22:48:20 -04:00
Jesper Juhl
02c30a84e6 [PATCH] update Ross Biro bouncing email address
Ross moved.  Remove the bad email address so people will find the correct
one in ./CREDITS.

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:49 -07:00
Arnaldo Carvalho de Melo
476e19cfa1 [IPV6]: Fix OOPS when using IPV6_ADDRFORM
This causes sk->sk_prot to change, which makes the socket
release free the sock into the wrong SLAB cache.  Fix this
by introducing sk_prot_creator so that we always remember
where the sock came from.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-05 13:35:15 -07:00
Patrick McHardy
6a800d456a [IPV6]: net/addrconf.h needs to include linux/in6.h earlier
Else the in6_addr layout is not known for struct
prefix_info.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2005-05-03 22:17:18 -07:00
Herbert Xu
aabc9761b6 [IPSEC]: Store idev entries
I found a bug that stopped IPsec/IPv6 from working.  About
a month ago IPv6 started using rt6i_idev->dev on the cached socket dst
entries.  If the cached socket dst entry is IPsec, then rt6i_idev will
be NULL.

Since we want to look at the rt6i_idev of the original route in this
case, the easiest fix is to store rt6i_idev in the IPsec dst entry just
as we do for a number of other IPv6 route attributes.  Unfortunately
this means that we need some new code to handle the references to
rt6i_idev.  That's why this patch is bigger than it would otherwise be.

I've also done the same thing for IPv4 since it is conceivable that
once these idev attributes start getting used for accounting, we
probably need to dereference them for IPv4 IPsec entries too.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03 16:27:10 -07:00
Patrick McHardy
9dfa277f88 [PKT_SCHED]: Fix range in PSCHED_TDIFF_SAFE to 0..bound
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03 14:41:18 -07:00
Herbert Xu
e4553eddae [IPV6]: Include ipv6.h for ipv6_addr_set
This patch includes net/ipv6.h from addrconf.h since it needs
ipv6_addr_set.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03 14:25:13 -07:00
Martin Waitz
67be2dd1ba [PATCH] DocBook: fix some descriptions
Some KernelDoc descriptions are updated to match the current code.
No code changes.

Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:26 -07:00
Pavel Pisa
4dc3b16ba1 [PATCH] DocBook: changes and extensions to the kernel documentation
I have recompiled Linux kernel 2.6.11.5 documentation for me and our
university students again.  The documentation could be extended for more
sources which are equipped by structured comments for recent 2.6 kernels.  I
have tried to proceed with that task.  I have done that more times from 2.6.0
time and it gets boring to do same changes again and again.  Linux kernel
compiles after changes for i386 and ARM targets.  I have added references to
some more files into kernel-api book, I have added some section names as well.
 So please, check that changes do not break something and that categories are
not too much skewed.

I have changed kernel-doc to accept "fastcall" and "asmlinkage" words reserved
by kernel convention.  Most of the other changes are modifications in the
comments to make kernel-doc happy, accept some parameters description and do
not bail out on errors.  Changed <pid> to @pid in the description, moved some
#ifdef before comments to correct function to comments bindings, etc.

You can see result of the modified documentation build at
  http://cmp.felk.cvut.cz/~pisa/linux/lkdb-2.6.11.tar.gz

Some more sources are ready to be included into kernel-doc generated
documentation.  Sources has been added into kernel-api for now.  Some more
section names added and probably some more chaos introduced as result of quick
cleanup work.

Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:25 -07:00
Nicolas Dichtel
7b3c63ac7c [PKT_SCHED]: Fix range in psched_tod_diff() to 0..bound
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-28 12:14:37 -07:00
Neil Horman
4eb701dfc6 [SCTP] Fix SCTP sendbuffer accouting.
- Include chunk and skb sizes in sendbuffer accounting.
- 2 policies are supported. 0: per socket accouting, 1: per association
  accounting

DaveM: I've made the default per-socket.

Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-28 12:02:04 -07:00
Jerome Forissier
047a2428a1 [SCTP] Implement Sec 2.41 of SCTP Implementers guide.
- Fixed sctp_vtag_verify_either() to comply with impguide 2.41 B) and C).
- Make sure vtag is reflected when T-bit is set in SHUTDOWN-COMPLETE sent
  due to an OOTB SHUTDOWN-ACK and in ABORT sent due to an OOTB packet.
- Do not set T-Bit in ABORT chunk in response to INIT.
- Fixed some comments to reflect the new meaning of the T-Bit.

Signed-off-by: Jerome Forissier <jerome.forissier@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-28 11:58:43 -07:00
Herbert Xu
0d3d077cd4 [SELINUX]: Fix ipv6_skip_exthdr() invocation causing OOPS.
The SELinux hooks invoke ipv6_skip_exthdr() with an incorrect
length final argument.  However, the length argument turns out
to be superfluous.

I was just reading ipv6_skip_exthdr and it occured to me that we can
get rid of len altogether.  The only place where len is used is to
check whether the skb has two bytes for ipv6_opt_hdr.  This check
is done by skb_header_pointer/skb_copy_bits anyway.

Now it might appear that we've made the code slower by deferring
the check to skb_copy_bits.  However, this check should not trigger
in the common case so this is OK.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-24 20:16:19 -07:00
Jamal Hadi Salim
db75307979 [PKT_SCHED]: Introduce simple actions.
And provide an example simply action in order to
demonstrate usage.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-24 20:10:16 -07:00
David S. Miller
d5ac99a648 [TCP]: skb pcount with MTU discovery
The problem is that when doing MTU discovery, the too-large segments in
the write queue will be calculated as having a pcount of >1.  When
tcp_write_xmit() is trying to send, tcp_snd_test() fails the cwnd test
when pcount > cwnd.

The segments are eventually transmitted one at a time by keepalive, but
this can take a long time.

This patch checks if TSO is enabled when setting pcount.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-24 19:12:33 -07:00
Arnaldo Carvalho de Melo
56cb515628 [AX25] Introduce ax25_type_trans
Replacing the open coded equivalents and making ax25 look more like
a linux network protocol, i.e. more similar to inet.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-24 18:53:06 -07:00
Arnaldo Carvalho de Melo
29c4be51e3 [AX25]: make ax25_queue_xmit a net_device parameter
I.e. not using skb->dev as a way to pass the parameter used to fill...
skb->dev :-)

Also to get the _type_trans open coded sequence grouped, next changesets
will introduce ax25_type_trans.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-21 16:46:56 -07:00
Herbert Xu
c4d541106b [NET]: Shave sizeof(ptr) bytes off dst_entry
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-19 20:46:37 -07:00
Andi Kleen
b6d9a5d81c [PATCH] x86_64: Make IRDA devices are not really ISA devices not depend on CONFIG_ISA
This allows to use them on x86-64

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:24:56 -07:00
Linus Torvalds
1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00