This patch adds the basic infrastructure to support user-space
expectation helpers via ctnetlink and the netfilter queuing
infrastructure NFQUEUE. Basically, this patch:
* adds NF_CT_EXPECT_USERSPACE flag to identify user-space
created expectations. I have also added a sanity check in
__nf_ct_expect_check() to avoid that kernel-space helpers
may create an expectation if the master conntrack has no
helper assigned.
* adds some branches to check if the master conntrack helper
exists, otherwise we skip the code that refers to kernel-space
helper such as the local expectation list and the expectation
policy.
* allows to set the timeout for user-space expectations with
no helper assigned.
* a list of expectations created from user-space that depends
on ctnetlink (if this module is removed, they are deleted).
* includes USERSPACE in the /proc output for expectations
that have been created by a user-space helper.
This patch also modifies ctnetlink to skip including the helper
name in the Netlink messages if no kernel-space helper is set
(since no user-space expectation has not kernel-space kernel
assigned).
You can access an example user-space FTP conntrack helper at:
http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-userspace-POC.tar.bz
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Tunnels are going to use percpu for their accounting.
They are going to use a new tstats field in net_device.
skb_tunnel_rx() is changed to be a wrapper around __skb_tunnel_rx()
IPTUNNEL_XMIT() is changed to be a wrapper around __IPTUNNEL_XMIT()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phonet stack assumes the presence of Pipe Controller, either in Modem or
on Application Processing Engine user-space for the Pipe data.
Nokia Slim Modems like WG2.5 used in ST-Ericsson U8500 platform do not
implement Pipe controller in them.
This patch adds Pipe Controller implemenation to Phonet stack to support
Pipe data over Phonet stack for Nokia Slim Modems.
Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 8c0c709eea
Author: Johannes Berg <johannes@sipsolutions.net>
Date: Wed Nov 25 17:46:15 2009 +0100
mac80211: move cmntr flag out of rx flags
moved the CMNTR flag into the skb RX flags for
some aggregation cleanups, but this was wrong
since the optimisation this flag tried to make
requires that it is kept across the processing
of multiple interfaces -- which isn't true for
flags in the skb. The patch not only broke the
optimisation, it also introduced a bug: under
some (common!) circumstances the flag will be
set on an already freed skb!
However, investigating this in more detail, I
found that most of the flags that we set should
be per packet, _except_ for this one, due to
a-MPDU processing. Additionally, the flags used
for processing (currently just this one) need
to be reset before processing a new packet.
Since we haven't actually seen bugs reported as
a result of the wrong flags handling (which is
not too surprising -- the only real bug case I
can come up with is an a-MSDU contained in an
a-MPDU), I'll make a different fix for rc.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The old ieee80211_find_sta_by_hw method didn't properly
find VIFS when there was more than one per AP. This caused
AMPDU logic in ath9k to get the wrong VIF when trying to
account for transmitted SKBs.
This patch changes ieee80211_find_sta_by_hw to take a
localaddr argument to distinguish between VIFs with the
same AP but different local addresses. The method name
is changed to ieee80211_find_sta_by_ifaddr.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Clean up a missing exit path in the ipv6 module init routines. In
addrconf_init we call ipv6_addr_label_init which calls register_pernet_subsys
for the ipv6_addr_label_ops structure. But if module loading fails, or if the
ipv6 module is removed, there is no corresponding unregister_pernet_subsys call,
which leaves a now-bogus address on the pernet_list, leading to oopses in
subsequent registrations. This patch cleans up both the failed load path and
the unload path. Tested by myself with good results.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
include/net/addrconf.h | 1 +
net/ipv6/addrconf.c | 11 ++++++++---
net/ipv6/addrlabel.c | 5 +++++
3 files changed, 14 insertions(+), 3 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
SOCK_MIN_RCVBUF current value is 256 bytes
It doesnt permit to receive the smallest possible frame, considering
socket sk_rmem_alloc/sk_rcvbuf account skb truesizes. On 64bit arches,
sizeof(struct sk_buff) is 240 bytes. Add the typical 64 bytes of
headroom, and we go over the limit.
With old kernels and 32bit arches, we were under the limit, if netdriver
was doing copybreak.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reset queue mapping when an skb is reentering the stack via a tunnel.
On second pass, the queue mapping from the original device is no
longer valid.
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes stale mac80211_tx_control_flags for
filtered / retried frames.
Because ieee80211_handle_filtered_frame feeds skbs back
into the tx path, they have to be stripped of some tx
flags so they won't confuse the stack, driver or device.
Cc: <stable@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Change "return (EXPR);" to "return EXPR;"
return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch, you can specify the expectation flags for user-space
created expectations.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
- Use rcu_dereference_rtnl() in __in6_dev_get
- kerneldoc for __in6_dev_get() and in6_dev_get()
- Use inline functions instead of macros
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new sysctl flag "snat_reroute". Recent kernels use
ip_route_me_harder() to route LVS-NAT responses properly by
VIP when there are multiple paths to client. But setups
that do not have alternative default routes can skip this
routing lookup by using snat_reroute=0.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add more code to IPVS to work with Netfilter connection
tracking and fix some problems.
- Allow IPVS to be compiled without connection tracking as in
2.6.35 and before. This can avoid keeping conntracks for all
IPVS connections because this costs memory. ip_vs_ftp still
depends on connection tracking and NAT as implemented for 2.6.36.
- Add sysctl var "conntrack" to enable connection tracking for
all IPVS connections. For loaded IPVS directors it needs
tuning of nf_conntrack_max limit.
- Add IP_VS_CONN_F_NFCT connection flag to request the connection
to use connection tracking. This allows user space to provide this
flag, for example, in dest->conn_flags. This can be useful to
request connection tracking per real server instead of forcing it
for all connections with the "conntrack" sysctl. This flag is
set currently only by ip_vs_ftp and of course by "conntrack" sysctl.
- Add ip_vs_nfct.c file to hold all connection tracking code,
by this way main code should not depend of netfilter conntrack
support.
- Return back the ip_vs_post_routing handler as in 2.6.35 and use
skb->ipvs_property=1 to allow IPVS to work without connection
tracking
Connection tracking:
- most of the code is already in 2.6.36-rc
- alter conntrack reply tuple for LVS-NAT connections when first packet
from client is forwarded and conntrack state is NEW or RELATED.
Additionally, alter reply for RELATED connections from real server,
again for packet in original direction.
- add IP_VS_XMIT_TUNNEL to confirm conntrack (without altering
reply) for LVS-TUN early because we want to call nf_reset. It is
needed because we add IPIP header and the original conntrack
should be preserved, not destroyed. The transmitted IPIP packets
can reuse same conntrack, so we do not set skb->ipvs_property.
- try to destroy conntrack when the IPVS connection is destroyed.
It is not fatal if conntrack disappears before that, it depends
on the used timers.
Fix problems from long time:
- add skb->ip_summed = CHECKSUM_NONE for the LVS-TUN transmitters
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The family parameter xfrm_state_find is used to find a state matching a
certain policy. This value is set to the template's family
(encap_family) right before xfrm_state_find is called.
The family parameter is however also used to construct a temporary state
in xfrm_state_find itself which is wrong for inter-family scenarios
because it produces a selector for the wrong family. Since this selector
is included in the xfrm_user_acquire structure, user space programs
misinterpret IPv6 addresses as IPv4 and vice versa.
This patch splits up the original init_tempsel function into a part that
initializes the selector respectively the props and id of the temporary
state, to allow for differing ip address families whithin the state.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- the sync protocol supports 16 bits only, so bits 0..15 should be
used only for flags that should go to backup server, bits 16 and
above should be allocated for flags not sent to backup.
- use IP_VS_CONN_F_DEST_MASK as mask of connection flags in
destination that can be changed by user space
- allow IP_VS_CONN_F_ONE_PACKET to be set in destination
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Patrick McHardy <kaber@trash.net>
When a driver advertises p2p device support,
mac80211 will handle it, but internally it will
rewrite the interface type to STA/AP rather than
P2P-STA/GO since otherwise a lot of paths need
to be touched that are otherwise identical. A
p2p boolean tells drivers whether or not a given
interface will be used for p2p or not.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The output becomes:
[ 41.261941] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When both destination device and object are nul, Phonet routes the
packet according to the resource field. In fact, this is the most
common pattern when sending Phonet "request" packets. In this case,
the packet is delivered to whichever endpoint (socket) has
registered the resource.
This adds a new table so that Linux processes can register their
Phonet sockets to Phonet resources, if they have adequate privileges.
(Namespace support is not implemented at the moment.)
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Closing a pipe endpoint is not normally allowed by the Phonet pipe,
other than as a side after-effect of removing the pipe between two
endpoints. But there is no way to prevent Linux userspace processes
from being killed or suffering from bugs, so this can still happen.
We might as well forcefully close Phonet pipe endpoints then.
The cellular modem supports only a few existing pipes at a time. So we
really should not leak them. This change instructs the modem to destroy
the pipe if either of the pipe's endpoint (Linux socket) is closed too
early.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If peer uses tiny MSS (say, 75 bytes) and similarly tiny advertised
window, the SWS logic will packetize to half the MSS unnecessarily.
This causes problems with some embedded devices.
However for large MSS devices we do want to half-MSS packetize
otherwise we never get enough packets into the pipe for things
like fast retransmit and recovery to work.
Be careful also to handle the case where MSS > window, otherwise
we'll never send until the probe timer.
Reported-by: ツ Leandro Melo de Sales <leandroal@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation)
added a secondary hash on UDP, hashed on (local addr, local port).
Problem is that following sequence :
fd = socket(...)
connect(fd, &remote, ...)
not only selects remote end point (address and port), but also sets
local address, while UDP stack stored in secondary hash table the socket
while its local address was INADDR_ANY (or ipv6 equivalent)
Sequence is :
- autobind() : choose a random local port, insert socket in hash tables
[while local address is INADDR_ANY]
- connect() : set remote address and port, change local address to IP
given by a route lookup.
When an incoming UDP frame comes, if more than 10 sockets are found in
primary hash table, we switch to secondary table, and fail to find
socket because its local address changed.
One solution to this problem is to rehash datagram socket if needed.
We add a new rehash(struct socket *) method in "struct proto", and
implement this method for UDP v4 & v6, using a common helper.
This rehashing only takes care of secondary hash table, since primary
hash (based on local port only) is not changed.
Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Do not create expectation when forwarding the PORT
command to avoid blocking the connection. The problem is that
nf_conntrack_ftp.c:help() tries to create the same expectation later in
POST_ROUTING and drops the packet with "dropping packet" message after
failure in nf_ct_expect_related.
- Change ip_vs_update_conntrack to alter the conntrack
for related connections from real server. If we do not alter the reply in
this direction the next packet from client sent to vport 20 comes as NEW
connection. We alter it but may be some collision happens for both
conntracks and the second conntrack gets destroyed immediately. The
connection stucks too.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave reported an rcu lockdep warning on 2.6.35.4 kernel
task->cgroups and task->cgroups->subsys[i] are protected by RCU.
So we avoid accessing invalid pointers here. This might happen,
for example, when you are deref-ing those pointers while someone
move @task from one cgroup to another.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This updates the use of larger initial windows, as originally specified in
RFC 3390, to use the newer IW values specified in RFC 5681, section 3.1.
The changes made in RFC 5681 are:
a) the setting now is more clearly specified in units of segments (as the
comments by John Heffner emphasized, this was not very clear in RFC 3390);
b) for connections with 1095 < SMSS <= 2190 there is now a change:
- RFC 3390 says that IW <= 4380,
- RFC 5681 says that IW = 3 * SMSS <= 6570.
Since RFC 3390 is older and "only" proposed standard, whereas the newer RFC 5681
is already draft standard, it seems preferable to use the newer IW variant.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch consolidates initial-window code common to TCP and CCID-2:
* TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
* CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides a "user timeout" support as described in RFC793. The
socket option is also needed for the the local half of RFC5482 "TCP User
Timeout Option".
TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int,
when > 0, to specify the maximum amount of time in ms that transmitted
data may remain unacknowledged before TCP will forcefully close the
corresponding connection and return ETIMEDOUT to the application. If
0 is given, TCP will continue to use the system default.
Increasing the user timeouts allows a TCP connection to survive extended
periods without end-to-end connectivity. Decreasing the user timeouts
allows applications to "fail fast" if so desired. Otherwise it may take
upto 20 minutes with the current system defaults in a normal WAN
environment.
The socket option can be made during any state of a TCP connection, but
is only effective during the synchronized states of a connection
(ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, or LAST-ACK).
Moreover, when used with the TCP keepalive (SO_KEEPALIVE) option,
TCP_USER_TIMEOUT will overtake keepalive to determine when to close a
connection due to keepalive failure.
The option does not change in anyway when TCP retransmits a packet, nor
when a keepalive probe will be sent.
This option, like many others, will be inherited by an acceptor from its
listener.
Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to mac80211 for changing the interface
type even when the interface is UP, if the driver
supports it.
To achieve this
* add a new driver callback for switching,
* split some of the interface up/down code out
into new functions (do_open/do_stop), and
* maintain an own __SDATA_RUNNING bit that will
not be set during interface type, so that any
other code doesn't use the interface.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some vendor specified mechanisms for 802.1X-style
functionality use a different protocol than EAP
(even if EAP is vendor-extensible). Allow setting
the ethertype for the protocol when a driver has
support for this. The default if unspecified is
EAP, of course.
Note: This is suitable only for station mode, not
for AP implementation.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The ieee80211_scan_completed() function was a frequent
source of potential deadlocks, since it is called by
drivers but may call back into drivers, so drivers had
to make sure to call it without any locks held, which
frequently lead to more complex code in drivers. Avoid
that problem by allowing the function to be called in
any context, and queueing the actual work it does.
Also update the documentation for it to indicate this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Change SCTP_DEBUG_PRINTK and SCTP_DEBUG_PRINTK_IPADDR to
use do { print } while (0) guards.
Add SCTP_DEBUG_PRINTK_CONT to fix errors in log when
lines were continued.
Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Add a missing newline in "Failed bind hash alloc"
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix spelling and readability of a few lines of kernel doc:
s/issueing/issuing/g
s/approriate/appropriate/g
s/supported by simply/supported simply by/
s/IEEE80211_HW_BEACON_FILTERING/IEEE80211_HW_BEACON_FILTER/g
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
As reported by Anton Blanchard when we use
percpu_counter_read_positive() to make our orphan socket limit checks,
the check can be off by up to num_cpus_online() * batch (which is 32
by default) which on a 128 cpu machine can be as large as the default
orphan limit itself.
Fix this by doing the full expensive sum check if the optimized check
triggers.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Add some documentation for cfg80211. I'm hoping some of
the regulatory documentation will be filled by somebody
more familiar with it, hint hint! :)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fix a small problem in the documentation for
ieee80211_request_smps, and a now erroneous
inclusion of enum ieee80211_key_alg, which no
longer exists after the change to ciphers.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Allow userspace to register for more than just
action frames by giving the frame subtype, and
make it possible to use this in various modes
as well.
With some tweaks and some added functionality
this will, in the future, also be usable in AP
mode and be able to replace the cooked monitor
interface currently used in that case.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This function analyses only its single, value-passed
argument, and has no side effects. Thus it can be
const, which makes mac80211 smaller, for example:
text data bss dec hex filename
362518 16720 884 380122 5ccda mac80211.ko (before)
362358 16720 884 379962 5cc3a mac80211.ko (after)
a 160 byte saving in text size, and an optimisation
because the function won't be called as often.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
struct net_device has its own struct net_device_stats member, so use
this one instead of a private copy in the irlan_cb struct.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PPP: introduce "pptp" module which implements point-to-point tunneling protocol using pppox framework
NET: introduce the "gre" module for demultiplexing GRE packets on version criteria
(required to pptp and ip_gre may coexists)
NET: ip_gre: update to use the "gre" module
This patch introduces then pptp support to the linux kernel which
dramatically speeds up pptp vpn connections and decreases cpu usage in
comparison of existing user-space implementation
(poptop/pptpclient). There is accel-pptp project
(https://sourceforge.net/projects/accel-pptp/) to utilize this module,
it contains plugin for pppd to use pptp in client-mode and modified
pptpd (poptop) to build high-performance pptp NAS.
There was many changes from initial submitted patch, most important are:
1. using rcu instead of read-write locks
2. using static bitmap instead of dynamically allocated
3. using vmalloc for memory allocation instead of BITS_PER_LONG + __get_free_pages
4. fixed many coding style issues
Thanks to Eric Dumazet.
Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/sched: add ACT_CSUM action to update packets checksums
ACT_CSUM can be called just after ACT_PEDIT in order to re-compute some
altered checksums in IPv4 and IPv6 packets. The following checksums are
supported by this patch:
- IPv4: IPv4 header, ICMP, IGMP, TCP, UDP & UDPLite
- IPv6: ICMPv6, TCP, UDP & UDPLite
It's possible to request in the same action to update different kind of
checksums, if the packets flow mix TCP, UDP and UDPLite, ...
An example of usage is done in the associated iproute2 patch.
Version 3 changes:
- remove useless goto instructions
- improve IPv6 hop options decoding
Version 2 changes:
- coding style correction
- remove useless arguments of some functions
- use stack in tcf_csum_dump()
- add tcf_csum_skb_nextlayer() to factor code
Signed-off-by: Gregoire Baron <baronchon@n7mm.org>
Acked-by: jamal <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
The task_cls_classid() function applies rcu_dereference() to integers,
which does not work with the shiny new sparse-based checking in
rcu_dereference(). This commit therefore moves to the new RCU API
rcu_dereference_index_check().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch removes the abstraction introduced by the union skb_shared_tx in
the shared skb data.
The access of the different union elements at several places led to some
confusion about accessing the shared tx_flags e.g. in skb_orphan_try().
http://marc.info/?l=linux-netdev&m=128084897415886&w=2
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, mac80211 translates the cfg80211
cipher suite selectors into ALG_* values.
That isn't all too useful, and some drivers
benefit from the distinction between WEP40
and WEP104 as well. Therefore, convert it
all to use the cipher suite selectors.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sometimes drivers have more information than the
stack about how their antennas/chains are used,
and may require that the SM PS mode be changed.
This could happen, for example, when detecting
that the user disconnected an antenna. Thus this
patch introduces API to allow drivers to request
SM PS mode changes.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sometimes we don't just need to know whether or
not the device is idle, but also per interface.
This adds that reporting capability to mac80211.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch implement basic infrastructure to support use of NAPI by
mac80211-based hardware drivers.
Because mac80211 devices can support multiple netdevs, a dummy netdev
is used for interfacing with the NAPI code in the core of the network
stack. That structure is hidden from the hardware drivers, but the
actual napi_struct is exposed in the ieee80211_hw structure so that the
poll routines in drivers can retrieve that structure. Hardware drivers
can also specify their own weight value for NAPI polling.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The previous value of 672 for L2CAP_DEFAULT_MAX_PDU_SIZE is based on
the default L2CAP MTU. That default MTU is calculated from the size
of two DH5 packets, minus ACL and L2CAP b-frame header overhead.
ERTM is used with newer basebands that typically support larger 3-DH5
packets, and i-frames and s-frames have more header overhead. With
clean RF conditions, basebands will typically attempt to use 1021-byte
3-DH5 packets for maximum throughput. Adjusting for 2 bytes of ACL
headers plus 10 bytes of worst-case L2CAP headers yields 1009 bytes
of payload.
This PDU size imposes less overhead for header bytes and gives the
baseband the option to choose 3-DH5 packets, but is small enough for
ERTM traffic to interleave well with other L2CAP or SCO data.
672-byte payloads do not allow the most efficient over-the-air
packet choice, and cannot achieve maximum throughput over BR/EDR.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The L2CAP specification requires that the ERTM retransmit timeout be at
least 2 seconds for BR/EDR connections.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Add missing kernel-doc notation to struct sock:
Warning(include/net/sock.h:324): No description found for parameter 'sk_peer_pid'
Warning(include/net/sock.h:324): No description found for parameter 'sk_peer_cred'
Warning(include/net/sock.h:324): No description found for parameter 'sk_classid'
Warning(include/net/sock.h:324): Excess struct/union/enum/typedef member 'sk_peercred' description in 'sock'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits)
phy/marvell: add 88ec048 support
igb: Program MDICNFG register prior to PHY init
e1000e: correct MAC-PHY interconnect register offset for 82579
hso: Add new product ID
can: Add driver for esd CAN-USB/2 device
l2tp: fix export of header file for userspace
can-raw: Fix skb_orphan_try handling
Revert "net: remove zap_completion_queue"
net: cleanup inclusion
phy/marvell: add 88e1121 interface mode support
u32: negative offset fix
net: Fix a typo from "dev" to "ndev"
igb: Use irq_synchronize per vector when using MSI-X
ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
e1000e: Fix irq_synchronize in MSI-X case
e1000e: register pm_qos request on hardware activation
ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
net: Add getsockopt support for TCP thin-streams
cxgb4: update driver version
cxgb4: add new PCI IDs
...
Manually fix up conflicts in:
- drivers/net/e1000e/netdev.c: due to pm_qos registration
infrastructure changes
- drivers/net/phy/marvell.c: conflict between adding 88ec048 support
and cleaning up the IDs
- drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req
conflict (registration change vs marking it static)
TXATTRCREATE: Prepare a fid for setting xattr value on a file system object.
size[4] TXATTRCREATE tag[2] fid[4] name[s] attr_size[8] flags[4]
size[4] RXATTRCREATE tag[2]
txattrcreate gets a fid pointing to xattr. This fid can later be
used to set the xattr value.
flag value is derived from set Linux setxattr. The manpage says
"The flags parameter can be used to refine the semantics of the operation.
XATTR_CREATE specifies a pure create, which fails if the named attribute
exists already. XATTR_REPLACE specifies a pure replace operation, which
fails if the named attribute does not already exist. By default (no flags),
the extended attribute will be created if need be, or will simply replace
the value if the attribute exists."
The actual setxattr operation happens when the fid is clunked. At that point
the written byte count and the attr_size specified in TXATTRCREATE should be
same otherwise an error will be returned.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
TXATTRWALK: Descend a ATTR namespace
size[4] TXATTRWALK tag[2] fid[4] newfid[4] name[s]
size[4] RXATTRWALK tag[2] size[8]
txattrwalk gets a fid pointing to xattr. This fid can later be
used to read the xattr value. If name is NULL the fid returned
can be used to get the list of extended attribute associated to
the file system object.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Implement 9p2000.L version of open(LOPEN) interface in 9p client.
For LOPEN, no need to convert the flags to and from 9p mode to VFS mode.
Synopsis:
size[4] Tlopen tag[2] fid[4] mode[4]
size[4] Rlopen tag[2] qid[13] iounit[4]
[Fix mode bit format - jvrao@linux.vnet.ibm.com]
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbegren <ericvh@gmail.com>
SYNOPSIS
size[4] Tlcreate tag[2] fid[4] name[s] flags[4] mode[4] gid[4]
size[4] Rlcreate tag[2] qid[13] iounit[4]
DESCRIPTION
The Tlreate request asks the file server to create a new regular file with the
name supplied, in the directory (dir) represented by fid.
The mode argument specifies the permissions to use. New file is created with
the uid if the fid and with supplied gid.
The flags argument represent Linux access mode flags with which the caller
is requesting to open the file with. Protocol allows all the Linux access
modes but it is upto the server to allow/disallow any of these acess modes.
If the server doesn't support any of the access mode, it is expected to
return error.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Implement TMKDIR as part of 2000.L Work
Synopsis
size[4] Tmkdir tag[2] fid[4] name[s] mode[4] gid[4]
size[4] Rmkdir tag[2] qid[13]
Description
mkdir asks the file server to create a directory with given name,
mode and gid. The qid for the new directory is returned with
the mkdir reply message.
Note: 72 is selected as the opcode for TMKDIR from the reserved list.
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Synopsis
size[4] Tmknod tag[2] fid[4] name[s] mode[4] major[4] minor[4] gid[4]
size[4] Rmknod tag[2] qid[13]
Description
mknod asks the file server to create a device node with given major and
minor number, mode and gid. The qid for the new device node is returned
with the mknod reply message.
[sripathik@in.ibm.com: Fix error handling code]
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Create a symbolic link
SYNOPSIS
size[4] Tsymlink tag[2] fid[4] name[s] symtgt[s] gid[4]
size[4] Rsymlink tag[2] qid[13]
DESCRIPTION
Create a symbolic link named 'name' pointing to 'symtgt'.
gid represents the effective group id of the caller.
The permissions of a symbolic link are irrelevant hence it is omitted
from the protocol.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Reviewed-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch adds a helper function to get the dentry from inode and
uses it in creating a Hardlink
SYNOPSIS
size[4] Tlink tag[2] dfid[4] oldfid[4] newpath[s]
size[4] Rlink tag[2]
DESCRIPTION
Create a link 'newpath' in directory pointed by dfid linking to oldfid path.
[sripathik@in.ibm.com : p9_client_link should not free req structure
if p9_client_rpc has returned an error.]
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
SYNOPSIS
size[4] Tsetattr tag[2] attr[n]
size[4] Rsetattr tag[2]
DESCRIPTION
The setattr command changes some of the file status information.
attr resembles the iattr structure used in Linux kernel. It
specifies which status parameter is to be changed and to what
value. It is laid out as follows:
valid[4]
specifies which status information is to be changed. Possible
values are:
ATTR_MODE (1 << 0)
ATTR_UID (1 << 1)
ATTR_GID (1 << 2)
ATTR_SIZE (1 << 3)
ATTR_ATIME (1 << 4)
ATTR_MTIME (1 << 5)
ATTR_ATIME_SET (1 << 7)
ATTR_MTIME_SET (1 << 8)
The last two bits represent whether the time information
is being sent by the client's user space. In the absense
of these bits the server always uses server's time.
mode[4]
File permission bits
uid[4]
Owner id of file
gid[4]
Group id of the file
size[8]
File size
atime_sec[8]
Time of last file access, seconds
atime_nsec[8]
Time of last file access, nanoseconds
mtime_sec[8]
Time of last file modification, seconds
mtime_nsec[8]
Time of last file modification, nanoseconds
Explanation of the patches:
--------------------------
*) The kernel just copies relevent contents of iattr structure to
p9_iattr_dotl structure and passes it down to the client. The
only check it has is calling inode_change_ok()
*) The p9_iattr_dotl structure does not have ctime and ia_file
parameters because I don't think these are needed in our case.
The client user space can request updating just ctime by calling
chown(fd, -1, -1). This is handled on server side without a need
for putting ctime on the wire.
*) The server currently supports changing mode, time, ownership and
size of the file.
*) 9P RFC says "Either all the changes in wstat request happen, or
none of them does: if the request succeeds, all changes were made;
if it fails, none were."
I have not done anything to implement this specifically because I
don't see a reason.
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
SYNOPSIS
size[4] Tgetattr tag[2] fid[4] request_mask[8]
size[4] Rgetattr tag[2] lstat[n]
DESCRIPTION
The getattr transaction inquires about the file identified by fid.
request_mask is a bit mask that specifies which fields of the
stat structure is the client interested in.
The reply will contain a machine-independent directory entry,
laid out as follows:
st_result_mask[8]
Bit mask that indicates which fields in the stat structure
have been populated by the server
qid.type[1]
the type of the file (directory, etc.), represented as a bit
vector corresponding to the high 8 bits of the file's mode
word.
qid.vers[4]
version number for given path
qid.path[8]
the file server's unique identification for the file
st_mode[4]
Permission and flags
st_uid[4]
User id of owner
st_gid[4]
Group ID of owner
st_nlink[8]
Number of hard links
st_rdev[8]
Device ID (if special file)
st_size[8]
Size, in bytes
st_blksize[8]
Block size for file system IO
st_blocks[8]
Number of file system blocks allocated
st_atime_sec[8]
Time of last access, seconds
st_atime_nsec[8]
Time of last access, nanoseconds
st_mtime_sec[8]
Time of last modification, seconds
st_mtime_nsec[8]
Time of last modification, nanoseconds
st_ctime_sec[8]
Time of last status change, seconds
st_ctime_nsec[8]
Time of last status change, nanoseconds
st_btime_sec[8]
Time of creation (birth) of file, seconds
st_btime_nsec[8]
Time of creation (birth) of file, nanoseconds
st_gen[8]
Inode generation
st_data_version[8]
Data version number
request_mask and result_mask bit masks contain the following bits
#define P9_STATS_MODE 0x00000001ULL
#define P9_STATS_NLINK 0x00000002ULL
#define P9_STATS_UID 0x00000004ULL
#define P9_STATS_GID 0x00000008ULL
#define P9_STATS_RDEV 0x00000010ULL
#define P9_STATS_ATIME 0x00000020ULL
#define P9_STATS_MTIME 0x00000040ULL
#define P9_STATS_CTIME 0x00000080ULL
#define P9_STATS_INO 0x00000100ULL
#define P9_STATS_SIZE 0x00000200ULL
#define P9_STATS_BLOCKS 0x00000400ULL
#define P9_STATS_BTIME 0x00000800ULL
#define P9_STATS_GEN 0x00001000ULL
#define P9_STATS_DATA_VERSION 0x00002000ULL
#define P9_STATS_BASIC 0x000007ffULL
#define P9_STATS_ALL 0x00003fffULL
This patch implements the client side of getattr implementation for
9P2000.L. It introduces a new structure p9_stat_dotl for getting
Linux stat information along with QID. The data layout is similar to
stat structure in Linux user space with the following major
differences:
inode (st_ino) is not part of data. Instead qid is.
device (st_dev) is not part of data because this doesn't make sense
on the client.
All time variables are 64 bit wide on the wire. The kernel seems to use
32 bit variables for these variables. However, some of the architectures
have used 64 bit variables and glibc exposes 64 bit variables to user
space on some architectures. Hence to be on the safer side we have made
these 64 bit in the protocol. Refer to the comments in
include/asm-generic/stat.h
There are some additional fields: st_btime_sec, st_btime_nsec, st_gen,
st_data_version apart from the bitmask, st_result_mask. The bit mask
is filled by the server to indicate which stat fields have been
populated by the server. Currently there is no clean way for the
server to obtain these additional fields, so it sends back just the
basic fields.
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbegren <ericvh@gmail.com>
This patch implements the kernel part of readdir() implementation for 9p2000.L
Change from V3: Instead of inode, server now sends qids for each dirent
SYNOPSIS
size[4] Treaddir tag[2] fid[4] offset[8] count[4]
size[4] Rreaddir tag[2] count[4] data[count]
DESCRIPTION
The readdir request asks the server to read the directory specified by 'fid'
at an offset specified by 'offset' and return as many dirent structures as
possible that fit into count bytes. Each dirent structure is laid out as
follows.
qid.type[1]
the type of the file (directory, etc.), represented as a bit
vector corresponding to the high 8 bits of the file's mode
word.
qid.vers[4]
version number for given path
qid.path[8]
the file server's unique identification for the file
offset[8]
offset into the next dirent.
type[1]
type of this directory entry.
name[256]
name of this directory entry.
This patch adds v9fs_dir_readdir_dotl() as the readdir() call for 9p2000.L.
This function sends P9_TREADDIR command to the server. In response the server
sends a buffer filled with dirent structures. This is different from the
existing v9fs_dir_readdir() call which receives stat structures from the server.
This results in significant speedup of readdir() on large directories.
For example, doing 'ls >/dev/null' on a directory with 10000 files on my
laptop takes 1.088 seconds with the existing code, but only takes 0.339 seconds
with the new readdir.
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The only user of unique_tuple() get_unique_tuple() doesn't care about the
return value of unique_tuple(), so make unique_tuple() return void (nothing).
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This removes duplicate code by providing a default implementation
which is used by 3 of the 4 modules that provide these call.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
some users of nf_ct_ext_exist() know ct->ext isn't NULL. For these users, the
check for ct->ext isn't necessary, the function __nf_ct_ext_exist() can be
used instead.
the type of the return value of nf_ct_ext_exist() is changed to bool.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The bdaddr in the list root is completely unused and just
taking up space.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some features require knowing the DTIM period
before associating. This implements the ability
to wait for a beacon in mac80211 before assoc
to provide this value. It is optional since
most likely not all drivers will need this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For some drivers it can be useful to know whether the channel they're
supposed to switch to is going to be used for short off-channel work or
scanning, or whether the hardware is expected to stay on it for a while
longer. This is important for various kinds of calibration work, which
takes longer to complete and should keep some persistent state, even if
the channel temporarily changes.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Save a few bytes of text
(allyesconfig)
$ size drivers/net/wireless/built-in.o*
text data bss dec hex filename
3924568 100548 871056 4896172 4ab5ac drivers/net/wireless/built-in.o.new
3926520 100548 871464 4898532 4abee4 drivers/net/wireless/built-in.o.old
$ size net/wireless/core.o*
text data bss dec hex filename
12843 216 3768 16827 41bb net/wireless/core.o.new
12328 216 3656 16200 3f48 net/wireless/core.o
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes hang when target device of mirred packet classifier
action is removed.
If a mirror or redirection action is configured to cause packets
to go to another device, the classifier holds a ref count, but was assuming
the adminstrator cleaned up all redirections before removing. The fix
is to add a notifier and cleanup during unregister.
The new list is implicitly protected by RTNL mutex because
it is held during filter add/delete as well as notifier.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use nf_conntrack/nf_nat code to do the packet mangling and the TCP
sequence adjusting. The function 'ip_vs_skb_replace' is now dead
code, so it is removed.
To SNAT FTP, use something like:
% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
--vport 21 -j SNAT --to-source 192.168.10.10
and for the data connections in passive mode:
% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
--vportctl 21 -j SNAT --to-source 192.168.10.10
using '-m state --state RELATED' would also works.
Make sure the kernel modules ip_vs_ftp, nf_conntrack_ftp, and
nf_nat_ftp are loaded.
[ up-port and minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Network code uses the __packed macro instead of __attribute__((packed)).
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove IRDA_PACKED macro, which maps to __attribute__((packed)). IRDA is
one of the last users of __attribute__((packet)). Networking code uses
__packed now.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: David S. Miller <davem@davemloft.net>
To make net/ and include/net/ code consistent use __packed instead of
__attribute__ ((packed)). Bluetooth subsystem was one of the last net
subsys still using __attribute__ ((packed)).
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Implements feature to reassemble received HCI frames from any input stream
Signed-off-by: Suraj Sumangala <suraj@atheros.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Copyright for the time I worked on L2CAP during the Google Summer of Code
program.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Using a lock to deal with the ERTM race condition - interruption with
new data from the hci layer - is wrong. We should use the native skb
backlog queue.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When mode is mandatory we shall not send connect request and report this
to the userspace as well.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Qualcomm, Inc. has reassigned rights to Code Aurora Forum. Accordingly,
as files are modified by Code Aurora Forum members, the copyright
statement will be updated.
Signed-off-by: Ron Shaffer <rshaffer@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Deleted extraneous white space from the end of several lines
Signed-off-by: Ron Shaffer <rshaffer@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In some circumstances it could be desirable to reject incoming
connections on the baseband level. This patch adds this feature through
two new ioctl's: HCIBLOCKADDR and HCIUNBLOCKADDR. Both take a simple
Bluetooth address as a parameter. BDADDR_ANY can be used with
HCIUNBLOCKADDR to remove all devices from the blacklist.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Conflicts:
drivers/vhost/net.c
net/bridge/br_device.c
Fix merge conflict in drivers/vhost/net.c with guidance from
Stephen Rothwell.
Revert the effects of net-2.6 commit 573201f36f
since net-next-2.6 has fixes that make bridge netpoll work properly thus
we don't need it disabled.
Signed-off-by: David S. Miller <davem@davemloft.net>
CHECK net/wireless/wext-compat.c
net/wireless/wext-compat.c:1434:5: warning: symbol 'cfg80211_wext_siwpmksa' was not declared. Should it be static?
Add declaration in cfg80211.h. Also add an EXPORT_SYMBOL_GPL, since all
the peer functions have it.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The meaning and/or usage of the country IE is somewhat poorly defined.
In practice, this means that regulatory rulesets in a country IE are
often incomplete and might be untrustworthy. This removes the code
associated with interpreting those rulesets while preserving respect
for country "alpha2" codes also contained in the country IE.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ever since
commit e1b3ec1a2a
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date: Mon Mar 29 12:18:34 2010 +0200
mac80211: explicitly disable/enable QoS
mac80211 is telling drivers, in particular
iwlwifi, whether QoS is enabled or not.
However, this is only relevant for station mode,
since only then will any device send nullfunc
frames and need to know whether they should be
QoS frames or not. In other modes, there are
(currently) no frames the device is supposed to
send.
When you now consider virtual interfaces, it
becomes apparent that the current mechanism is
inadequate since it enables/disables QoS on a
global scale, where for nullfunc frames it has
to be on a per-interface scale.
Due to the above considerations, we can change
the way mac80211 advertises the QoS state to
drivers to only ever advertise it as "off" in
station mode, and make it a per-BSS setting.
Tested-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Correct comment stating sizeof(struct tcp_skb_cb) is 36 or 40, since its
44 bytes, since commit 951dbc8ac7 ([IPV6]: Move nextheader offset
to the IP6CB).
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves NFULNL_COPY_PACKET definition from
linux/netfilter/nfnetlink_log.h to net/netfilter/nfnetlink_log.h
since this copy mode is only for internal use.
I have also changed the value from 0x03 to 0xff. Thus, we avoid
a gap from user-space that may confuse users if we add new
copy modes in the future.
This change was introduced in:
http://www.spinics.net/lists/netfilter-devel/msg13535.html
Since this change is not included in any stable Linux kernel,
I think it's safe to make this change now. Anyway, this copy
mode does not make any sense from user-space, so this patch
should not break any existing setup.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Fix problem in reading the tx_queue recorded in a socket. In
dev_pick_tx, the TX queue is read by doing a check with
sk_tx_queue_recorded on the socket, followed by a sk_tx_queue_get.
The problem is that there is not mutual exclusion across these
calls in the socket so it it is possible that the queue in the
sock can be invalidated after sk_tx_queue_recorded is called so
that sk_tx_queue get returns -1, which sets 65535 in queue_index
and thus dev_pick_tx returns 65536 which is a bogus queue and
can cause crash in dev_queue_xmit.
We fix this by only calling sk_tx_queue_get which does the proper
checks. The interface is that sk_tx_queue_get returns the TX queue
if the sock argument is non-NULL and TX queue is recorded, else it
returns -1. sk_tx_queue_recorded is no longer used so it can be
completely removed.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a new boolean flag no_autobind is added to structure proto to avoid the autobind
calls when the protocol is TCP. Then sock_rps_record_flow() is called int the
TCP's sendmsg() and sendpage() pathes.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
include/net/inet_common.h | 4 ++++
include/net/sock.h | 1 +
include/net/tcp.h | 8 ++++----
net/ipv4/af_inet.c | 15 +++++++++------
net/ipv4/tcp.c | 11 +++++------
net/ipv4/tcp_ipv4.c | 3 +++
net/ipv6/af_inet6.c | 8 ++++----
net/ipv6/tcp_ipv6.c | 3 +++
8 files changed, 33 insertions(+), 20 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Reducing real_num_queues needs to flush the qdisc otherwise
skbs with queue_mappings greater then real_num_tx_queues can
be sent to the underlying driver.
The flow for this is,
dev_queue_xmit()
dev_pick_tx()
skb_tx_hash() => hash using real_num_tx_queues
skb_set_queue_mapping()
...
qdisc_enqueue_root() => enqueue skb on txq from hash
...
dev->real_num_tx_queues -= n
...
sch_direct_xmit()
dev_hard_start_xmit()
ndo_start_xmit(skb,dev) => skb queue set with old hash
skbs are enqueued on the qdisc with skb->queue_mapping set
0 < queue_mappings < real_num_tx_queues. When the driver
decreases real_num_tx_queues skb's may be dequeued from the
qdisc with a queue_mapping greater then real_num_tx_queues.
This fixes a case in ixgbe where this was occurring with DCB
and FCoE. Because the driver is using queue_mapping to map
skbs to tx descriptor rings we can potentially map skbs to
rings that no longer exist.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When calling qdisc_reset() the qdisc lock needs to be held. In
this case there is at least one driver i4l which is using this
without holding the lock. Add the locking here.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add fast path for in-order fragments
As the fragments are sent in order in most of OSes, such as Windows, Darwin and
FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue.
In the fast path, we check if the skb at the end of the inet_frag_queue is the
prev we expect.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
include/net/inet_frag.h | 1 +
net/ipv4/ip_fragment.c | 12 ++++++++++++
net/ipv6/reassembly.c | 11 +++++++++++
3 files changed, 24 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
/proc/net/snmp and /proc/net/netstat expose SNMP counters.
Width of these counters is either 32 or 64 bits, depending on the size
of "unsigned long" in kernel.
This means user program parsing these files must already be prepared to
deal with 64bit values, regardless of user program being 32 or 64 bit.
This patch introduces 64bit snmp values for IPSTAT mib, where some
counters can wrap pretty fast if they are 32bit wide.
# netstat -s|egrep "InOctets|OutOctets"
InOctets: 244068329096
OutOctets: 244069348848
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Determine the size of the xfrm_mark struct, not of its pointer.
Signed-off-by: Andreas Steffen <andreas.steffen@strongswan.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the CAIF SPI Protocol Driver for
CAIF Link Layer.
This driver implements a platform driver to accommodate for a
platform specific SPI device. A general platform driver is not
possible as there are no SPI Slave side Kernel API defined.
A sample CAIF SPI Platform device can be found in
.../Documentation/networking/caif/spi_porting.txt
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
don't clone skb when skb isn't shared
When the tcf_action is TC_ACT_STOLEN, and the skb isn't shared, we don't need
to clone a new skb. As the skb will be freed after this function returns, we
can use it freely once we get a reference to it.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
include/net/sch_generic.h | 11 +++++++++--
net/sched/act_mirred.c | 6 +++---
2 files changed, 12 insertions(+), 5 deletions(-)
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allows use of ECN when syncookies are in effect by encoding ecn_ok
into the syn-ack tcp timestamp.
While at it, remove a uneeded #ifdef CONFIG_SYN_COOKIES.
With CONFIG_SYN_COOKIES=nm want_cookie is ifdef'd to 0 and gcc
removes the "if (0)".
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for 64bit snmp counters for some mibs,
add an 'align' parameter to snmp_mib_init(), instead
of assuming mibs only contain 'unsigned long' fields.
Callers can use __alignof__(type) to provide correct
alignment.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check at rule install time that CT accounting is enabled. Force it
to be enabled if not while also emitting a warning since this is not
the default state.
This is in preparation for deprecating CONFIG_NF_CT_ACCT upon which
CONFIG_NETFILTER_XT_MATCH_CONNBYTES depended being set.
Added 2 CT accounting support functions:
nf_ct_acct_enabled() - Get CT accounting state.
nf_ct_set_acct() - Enable/disable CT accountuing.
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
In preparation for a TX power setting interface in the nl80211, change the
.set_tx_power function to use mBm units instead of dBm for greater accuracy and
smaller power levels.
Also, already in advance move the tx_power_setting enumeration to nl80211.
This change affects the .tx_set_power function prototype. As a result, the
corresponding changes are needed to modules using it. These are mac80211,
iwmc3200wifi and rndis_wlan.
Cc: Samuel Ortiz <samuel.ortiz@intel.com>
Cc: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Acked-by: Samuel Ortiz <samuel.ortiz@intel.com>
Acked-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
this patch is implementing IP_NODEFRAG option for IPv4 socket.
The reason is, there's no other way to send out the packet with user
customized header of the reassembly part.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a typo in include/net/netlink.h
should be finalize instead of finanlize
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit aa2ea0586d (tcp: fix outsegs stat for TSO segments) incorrectly
assumed SNMP_ADD_STATS() was used from BH context.
Fix this using mib[!in_softirq()] instead of mib[0]
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This mechanism introduced in this patch applies (at least) for hardware
designs using a single shared antenna for both WLAN and BT. In these designs,
the antenna must be toggled between WLAN and BT.
In those hardware, managing WLAN co-existence with Bluetooth requires WLAN
full power save whenever there is Bluetooth activity in order for WLAN to be
able to periodically relinquish the antenna to be used for BT. This is because
BT can only access the shared antenna when WLAN is idle or asleep.
Some hardware, for instance the wl1271, are able to indicate to the host
whenever there is BT traffic. In essence, the hardware will send an indication
to the host whenever there is, for example, SCO traffic or A2DP traffic, and
will send another indication when the traffic is over.
The hardware gets information of Bluetooth traffic via hardware co-existence
control lines - these lines are used to negotiate the shared antenna
ownership. The hardware will give the antenna to BT whenever WLAN is sleeping.
This patch adds the interface to mac80211 to facilitate temporarily disabling
of dynamic power save as per request of the WLAN driver. This interface will
immediately force WLAN to full powersave, hence allowing BT coexistence as
described above.
In these kind of shared antenna desings, when WLAN powersave is fully disabled,
Bluetooth will not work simultaneously with WLAN at all. This patch does not
address that problem. This interface will not change PSM state, so if PSM is
disabled it will remain so. Solving this problem requires knowledge about BT
state, and is best done in user-space.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Previously CAIF supported maximum transfer size of ~4050.
The transfer size is now calculated dynamically based on the
link layers mtu size.
Signed-off-by: Sjur Braendeland@stericsson.com
Signed-off-by: David S. Miller <davem@davemloft.net>
CAIF Remote File Manager may send or receive more than 4050 bytes.
Due to this The CAIF RFM service have to support segmentation.
Signed-off-by: Sjur Braendeland@stericsson.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Flow control is not used by all CAIF services.
The usage of flow control is now part of the gerneal
initialization function for CAIF Services.
Signed-off-by: Sjur Braendeland@stericsson.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2.6.34 introduced 'conntrack zones' to deal with cases where packets
from multiple identical networks are handled by conntrack/NAT. Packets
are looped through veth devices, during which they are NATed to private
addresses, after which they can continue normally through the stack
and possibly have NAT rules applied a second time.
This works well, but is needlessly complicated for cases where only
a single SNAT/DNAT mapping needs to be applied to these packets. In that
case, all that needs to be done is to assign each network to a seperate
zone and perform NAT as usual. However this doesn't work for packets
destined for the machine performing NAT itself since its corrently not
possible to configure SNAT mappings for the LOCAL_IN chain.
This patch adds a new INPUT chain to the NAT table and changes the
targets performing SNAT to be usable in that chain.
Example usage with two identical networks (192.168.0.0/24) on eth0/eth1:
iptables -t raw -A PREROUTING -i eth0 -j CT --zone 1
iptables -t raw -A PREROUTING -i eth0 -j MARK --set-mark 1
iptables -t raw -A PREROUTING -i eth1 -j CT --zone 2
iptabels -t raw -A PREROUTING -i eth1 -j MARK --set-mark 2
iptables -t nat -A INPUT -m mark --mark 1 -j NETMAP --to 10.0.0.0/24
iptables -t nat -A POSTROUTING -m mark --mark 1 -j NETMAP --to 10.0.0.0/24
iptables -t nat -A INPUT -m mark --mark 2 -j NETMAP --to 10.0.1.0/24
iptables -t nat -A POSTROUTING -m mark --mark 2 -j NETMAP --to 10.0.1.0/24
iptables -t raw -A PREROUTING -d 10.0.0.0/24 -j CT --zone 1
iptables -t raw -A OUTPUT -d 10.0.0.0/24 -j CT --zone 1
iptables -t raw -A PREROUTING -d 10.0.1.0/24 -j CT --zone 2
iptables -t raw -A OUTPUT -d 10.0.1.0/24 -j CT --zone 2
iptables -t nat -A PREROUTING -d 10.0.0.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A OUTPUT -d 10.0.0.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A PREROUTING -d 10.0.1.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A OUTPUT -d 10.0.1.0/24 -j NETMAP --to 192.168.0.0/24
Signed-off-by: Patrick McHardy <kaber@trash.net>
In unix_skb_parms store pointers to struct pid and struct cred instead
of raw uid, gid, and pid values, then translate the credentials on
reception into values that are meaningful in the receiving processes
namespaces.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Start capturing not only the userspace pid, uid and gid values of the
sending process but also the struct pid and struct cred of the sending
process as well.
This is in preparation for properly supporting SCM_CREDENTIALS for
sockets that have different uid and/or pid namespaces at the different
ends.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use struct pid and struct cred to store the peer credentials on struct
sock. This gives enough information to convert the peer credential
information to a value relative to whatever namespace the socket is in
at the time.
This removes nasty surprises when using SO_PEERCRED on socket
connetions where the processes on either side are in different pid and
user namespaces.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reorder the fields in scm_cookie so they pack better on 64bit.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Discard the ACK if we find options that do not match current sysctl
settings.
Previously it was possible to create a connection with sack, wscale,
etc. enabled even if the feature was disabled via sysctl.
Also remove an unneeded call to tcp_sack_reset() in
cookie_check_timestamp: Both call sites (cookie_v4_check,
cookie_v6_check) zero "struct tcp_options_received", hand it to
tcp_parse_options() (which does not change tcp_opt->num_sacks/dsack)
and then call cookie_check_timestamp().
Even if num_sacks/dsacks were changed, the structure is allocated on
the stack and after cookie_check_timestamp returns only a few selected
members are copied to the inet_request_sock.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Addition of rcu_head to struct inet_peer added 16bytes on 64bit arches.
Thats a bit unfortunate, since old size was exactly 64 bytes.
This can be solved, using an union between this rcu_head an four fields,
that are normally used only when a refcount is taken on inet_peer.
rcu_head is used only when refcnt=-1, right before structure freeing.
Add a inet_peer_refcheck() function to check this assertion for a while.
We can bring back SLAB_HWCACHE_ALIGN qualifier in kmem cache creation.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
inetpeer currently uses an AVL tree protected by an rwlock.
It's possible to make most lookups use RCU
1) Add a struct rcu_head to struct inet_peer
2) add a lookup_rcu_bh() helper to perform lockless and opportunistic
lookup. This is a normal function, not a macro like lookup().
3) Add a limit to number of links followed by lookup_rcu_bh(). This is
needed in case we fall in a loop.
4) add an smp_wmb() in link_to_pool() right before node insert.
5) make unlink_from_pool() use atomic_cmpxchg() to make sure it can take
last reference to an inet_peer, since lockless readers could increase
refcount, even while we hold peers.lock.
6) Delay struct inet_peer freeing after rcu grace period so that
lookup_rcu_bh() cannot crash.
7) inet_getpeer() first attempts lockless lookup.
Note this lookup can fail even if target is in AVL tree, but a
concurrent writer can let tree in a non correct form.
If this attemps fails, lock is taken a regular lookup is performed
again.
8) convert peers.lock from rwlock to a spinlock
9) Remove SLAB_HWCACHE_ALIGN when peer_cachep is created, because
rcu_head adds 16 bytes on 64bit arches, doubling effective size (64 ->
128 bytes)
In a future patch, this is probably possible to revert this part, if rcu
field is put in an union to share space with rid, ip_id_count, tcp_ts &
tcp_ts_stamp. These fields being manipulated only with refcnt > 0.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ps-qos latency handling is broken. It uses predetermined latency values
to select specific dynamic PS timeouts. With common AP configurations, these
values overlap with beacon interval and are therefore essentially useless
(for network latencies less than the beacon interval, PSM is disabled.)
This patch remedies the problem by replacing the predetermined network latency
values with one high value (1900ms) which is used to go trigger full psm. For
backwards compatibility, the value 2000ms is still mapped to a dynamic ps
timeout of 100ms.
Currently also the mac80211 internal value for storing user space configured
dynamic PSM values is incorrectly in the driver visible ieee80211_conf struct.
Move it to the ieee80211_local struct.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There is a circular locking dependency when configuring the
hardware ARP filters on association, occurring when flushing the mac80211
workqueue. This is what happens:
[ 92.026800] =======================================================
[ 92.030507] [ INFO: possible circular locking dependency detected ]
[ 92.030507] 2.6.34-04781-g2b2c009 #85
[ 92.030507] -------------------------------------------------------
[ 92.030507] modprobe/5225 is trying to acquire lock:
[ 92.030507] ((wiphy_name(local->hw.wiphy))){+.+.+.}, at: [<ffffffff8105b5c0>] flush_workq
ueue+0x0/0xb0
[ 92.030507]
[ 92.030507] but task is already holding lock:
[ 92.030507] (rtnl_mutex){+.+.+.}, at: [<ffffffff812b9ce2>] rtnl_lock+0x12/0x20
[ 92.030507]
[ 92.030507] which lock already depends on the new lock.
[ 92.030507]
[ 92.030507]
[ 92.030507] the existing dependency chain (in reverse order) is:
[ 92.030507]
[ 92.030507] -> #2 (rtnl_mutex){+.+.+.}:
[ 92.030507] [<ffffffff810761fb>] lock_acquire+0xdb/0x110
[ 92.030507] [<ffffffff81341754>] mutex_lock_nested+0x44/0x300
[ 92.030507] [<ffffffff812b9ce2>] rtnl_lock+0x12/0x20
[ 92.030507] [<ffffffffa022d47c>] ieee80211_assoc_done+0x6c/0xe0 [mac80211]
[ 92.030507] [<ffffffffa022f2ad>] ieee80211_work_work+0x31d/0x1280 [mac80211]
[ 92.030507] -> #1 ((&local->work_work)){+.+.+.}:
[ 92.030507] [<ffffffff810761fb>] lock_acquire+0xdb/0x110
[ 92.030507] [<ffffffff8105a51a>] worker_thread+0x22a/0x370
[ 92.030507] [<ffffffff8105ecc6>] kthread+0x96/0xb0
[ 92.030507] [<ffffffff81003a94>] kernel_thread_helper+0x4/0x10
[ 92.030507]
[ 92.030507] -> #0 ((wiphy_name(local->hw.wiphy))){+.+.+.}:
[ 92.030507] [<ffffffff81075fdc>] __lock_acquire+0x1c0c/0x1d50
[ 92.030507] [<ffffffff810761fb>] lock_acquire+0xdb/0x110
[ 92.030507] [<ffffffff8105b60e>] flush_workqueue+0x4e/0xb0
[ 92.030507] [<ffffffffa023ff7b>] ieee80211_stop_device+0x2b/0xb0 [mac80211]
[ 92.030507] [<ffffffffa0231635>] ieee80211_stop+0x3e5/0x680 [mac80211]
The locking in this case is quite complex. Fix the problem by rewriting the
way the hardware ARP filter list is handled - i.e. make a copy of the address
list to the bss_conf struct, and provide that list to the hardware driver
when needed.
The current patch will enable filtering also in promiscuous mode. This may need
to be changed in the future.
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds support to nl80211 and mac80211 to set basic rates when
joining/creating ibss network.
Original patch was posted by Johannes Berg on the linux-wireless posting list.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Teemu Paasikivi <ext-teemu.3.paasikivi@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Allow drivers to sleep, and indicate this in
the documentation. ath9k has some locking I
don't understand, so keep it safe and disable
BHs in it, all other drivers look fine with
the context change.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The non-irqsafe aggregation start/stop done
callbacks are currently only used by ath9k_htc,
and can cause callbacks into the driver again.
This might lead to locking issues, which will
only get worse as we modify locking. To avoid
trouble, remove the non-irqsafe versions and
change ath9k_htc to use those instead.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
gen_kill_estimator() API is incomplete or not well documented, since
caller should make sure an RCU grace period is respected before
freeing stats_lock.
This was partially addressed in commit 5d944c640b
(gen_estimator: deadlock fix), but same problem exist for all
gen_kill_estimator() users, if lock they use is not already RCU
protected.
A code review shows xt_RATEEST.c, act_api.c, act_police.c have this
problem. Other are ok because they use qdisc lock, already RCU
protected.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
remove useless union keyword in rtable, rt6_info and dn_route.
Since there is only one member in a union, the union keyword isn't useful.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 66018506e1 (ip: Router Alert RCU conversion) introduced RCU
lookups to ip_call_ra_chain(). It missed proper deinit phase :
When ip_ra_control() deletes an ip_ra_chain, it should make sure
ip_call_ra_chain() users can not start to use socket during the rcu
grace period. It should also delay the sock_put() after the grace
period, or we risk a premature socket freeing and corruptions, as
raw sockets are not rcu protected yet.
This delay avoids using expensive atomic_inc_not_zero() in
ip_call_ra_chain().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use call_rcu rather than synchronize_rcu.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NOTRACK makes all cpus share a cache line on nf_conntrack_untracked
twice per packet, slowing down performance.
This patch converts it to a per_cpu variable.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
NOTRACK makes all cpus share a cache line on nf_conntrack_untracked
twice per packet. This is bad for performance.
__read_mostly annotation is also a bad choice.
This patch introduces IPS_UNTRACKED bit so that we can use later a
per_cpu untrack structure more easily.
A new helper, nf_ct_untracked_get() returns a pointer to
nf_conntrack_untracked.
Another one, nf_ct_untracked_status_or() is used by nf_nat_init() to add
IPS_NAT_DONE_MASK bits to untracked status.
nf_ct_is_untracked() prototype is changed to work on a nf_conn pointer.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Fix a whole bunch of kernel-doc warnings
and errors that crop up when running it on
mac80211 and cfg80211; the latter isn't
normally done so lots of bit-rot happened.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We currently dirty two cache lines in struct xt_rateest, this hurts SMP
performance.
This patch moves lock/bstats/rstats at beginning of structure so that
they share a single cache line.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Straightforward conversion to RCU.
One rwlock becomes a spinlock, and is static.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch address a serious performance issue in reading the
TCP sockets table (/proc/net/tcp).
Reading the full table is done by a number of sequential read
operations. At each read operation, a seek is done to find the
last socket that was previously read. This seek operation requires
that the sockets in the table need to be counted up to the current
file position, and to count each of these requires taking a lock for
each non-empty bucket. The whole algorithm is O(n^2).
The fix is to cache the last bucket value, offset within the bucket,
and the file position returned by the last read operation. On the
next sequential read, the bucket and offset are used to find the
last read socket immediately without needing ot scan the previous
buckets the table. This algorithm t read the whole table is O(n).
The improvement offered by this patch is easily show by performing
cat'ing /proc/net/tcp on a machine with a lot of connections. With
about 182K connections in the table, I see the following:
- Without patch
time cat /proc/net/tcp > /dev/null
real 1m56.729s
user 0m0.214s
sys 1m56.344s
- With patch
time cat /proc/net/tcp > /dev/null
real 0m0.894s
user 0m0.290s
sys 0m0.594s
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xfrm triggers a warning if dst_pop() drops a refcount
on a noref dst. This patch changes dst_pop() to
skb_dst_pop(). skb_dst_pop() drops the refcnt only
on a refcounted dst. Also we don't clone the child
dst_entry, so it is not refcounted and we can use
skb_dst_set_noref() in xfrm_output_one().
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
STA_NOTIFY_ADD and STA_NOTIFY_REMOVE have no users anymore,
and station addition/removal are indicated to drivers
using sta_add() and sta_remove(), which can sleep.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There's no sense in letting anything but internal
mac80211 functions set the initiator to anything
but WLAN_BACK_INITIATOR, since WLAN_BACK_RECIPIENT
is only valid when we have received a frame from
the peer, which we react to directly in mac80211.
The debugfs code I recently added got this wrong
as well.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some hardware allow extended filtering of ARP frames not intended for
the host. To perform such filtering, the hardware needs to know the current
IP address(es) of the host, bound to its interface.
Add support for ARP filtering to mac80211 by adding a new op to the driver
interface, allowing to configure the current IP addresses. This op is called
upon association with the currently configured address(es), and when
associated whenever the IP address(es) change.
This patch adds configuration of IPv4 addresses only, as IPv6 addresses don't
need ARP filtering.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There's a single use of this struct member, but
as it is write-only it clearly not necessary.
Thus we can free up some space here, even if we
don't need it right now it seems pointless to
carry around the variable.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
cleanup patch.
Use new __packed annotation in net/ and include/
(except netfilter)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sending action frames, we want to verify
that we do that on the correct channel. However,
checking the channel type in addition can get in
the way, since the channel type could change on
the fly during an association, and it's not
useful to have the channel type anyway since it
has no effect on the transmission. Therefore,
make it optional to specify so that if wanted,
it can still be checked, but is not required.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There are more than a dozen occurrences of following code in the
IPv6 stack:
if (opt && opt->srcrt) {
struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
ipv6_addr_copy(&final, &fl.fl6_dst);
ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
final_p = &final;
}
Replace those with a helper. Note that the helper overrides final_p
in all cases. This is ok as final_p was previously initialized to
NULL when declared.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use read_pnet() and write_pnet() to reduce number of ifdef CONFIG_NET_NS
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When many cpus compete for sending frames on a given qdisc, the qdisc
spinlock suffers from very high contention.
The cpu owning __QDISC_STATE_RUNNING bit has same priority to acquire
the lock, and cannot dequeue packets fast enough, since it must wait for
this lock for each dequeued packet.
One solution to this problem is to force all cpus spinning on a second
lock before trying to get the main lock, when/if they see
__QDISC_STATE_RUNNING already set.
The owning cpu then compete with at most one other cpu for the main
lock, allowing for higher dequeueing rate.
Based on a previous patch from Alexander Duyck. I added the heuristic to
avoid the atomic in fast path, and put the new lock far away from the
cache line used by the dequeue worker. Also try to release the busylock
lock as late as possible.
Tests with following script gave a boost from ~50.000 pps to ~600.000
pps on a dual quad core machine (E5450 @3.00GHz), tg3 driver.
(A single netperf flow can reach ~800.000 pps on this platform)
for j in `seq 0 3`; do
for i in `seq 0 7`; do
netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 &
done
done
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__QDISC_STATE_RUNNING is always changed while qdisc lock is held.
We can avoid two atomic operations in xmit path, if we move this bit in
a new __state container.
Location of this __state container is carefully chosen so that fast path
only dirties one qdisc cache line.
THROTTLED bit could later be moved into this __state location too, to
avoid dirtying first qdisc cache line.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define three helpers to manipulate QDISC_STATE_RUNNIG flag, that a
second patch will move on another location.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct sk_forward_alloc handling for error_queue would need to use a
backlog of frames that softirq handler could not deliver because socket
is owned by user thread. Or extend backlog processing to be able to
process normal and error packets.
Another possibility is to not use mem charge for error queue, this is
what I implemented in this patch.
Note: this reverts commit 29030374
(net: fix sk_forward_alloc corruptions), since we dont need to lock
socket anymore.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits)
netlink: bug fix: wrong size was calculated for vfinfo list blob
netlink: bug fix: don't overrun skbs on vf_port dump
xt_tee: use skb_dst_drop()
netdev/fec: fix ifconfig eth0 down hang issue
cnic: Fix context memory init. on 5709.
drivers/net: Eliminate a NULL pointer dereference
drivers/net/hamradio: Eliminate a NULL pointer dereference
be2net: Patch removes redundant while statement in loop.
ipv6: Add GSO support on forwarding path
net: fix __neigh_event_send()
vhost: fix the memory leak which will happen when memory_access_ok fails
vhost-net: fix to check the return value of copy_to/from_user() correctly
vhost: fix to check the return value of copy_to/from_user() correctly
vhost: Fix host panic if ioctl called with wrong index
net: fix lock_sock_bh/unlock_sock_bh
net/iucv: Add missing spin_unlock
net: ll_temac: fix checksum offload logic
net: ll_temac: fix interrupt bug when interrupt 0 is used
sctp: dubious bitfields in sctp_transport
ipmr: off by one in __ipmr_fill_mroute()
...
This new sock lock primitive was introduced to speedup some user context
socket manipulation. But it is unsafe to protect two threads, one using
regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh
This patch changes lock_sock_bh to be careful against 'owned' state.
If owned is found to be set, we must take the slow path.
lock_sock_bh() now returns a boolean to say if the slow path was taken,
and this boolean is used at unlock_sock_bh time to call the appropriate
unlock function.
After this change, BH are either disabled or enabled during the
lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
so we rename these functions to lock_sock_fast()/unlock_sock_fast().
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse complains because these one-bit bitfields are signed.
include/net/sctp/structs.h:879:24: error: dubious one-bit signed bitfield
include/net/sctp/structs.h:889:31: error: dubious one-bit signed bitfield
include/net/sctp/structs.h:895:26: error: dubious one-bit signed bitfield
include/net/sctp/structs.h:898:31: error: dubious one-bit signed bitfield
include/net/sctp/structs.h:901:27: error: dubious one-bit signed bitfield
It doesn't cause a problem in the current code, but it would be better
to clean it up. This was introduced by c0058a35aa: "sctp: Save some
room in the sctp_transport by using bitfields".
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the cls_cgroup module is not loaded, task_cls_classid will
return an uninitialised classid instead of zero.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (63 commits)
drivers/net/usb/asix.c: Fix pointer cast.
be2net: Bug fix to avoid disabling bottom half during firmware upgrade.
proc_dointvec: write a single value
hso: add support for new products
Phonet: fix potential use-after-free in pep_sock_close()
ath9k: remove VEOL support for ad-hoc
ath9k: change beacon allocation to prefer the first beacon slot
sock.h: fix kernel-doc warning
cls_cgroup: Fix build error when built-in
macvlan: do proper cleanup in macvlan_common_newlink() V2
be2net: Bug fix in init code in probe
net/dccp: expansion of error code size
ath9k: Fix rx of mcast/bcast frames in PS mode with auto sleep
wireless: fix sta_info.h kernel-doc warnings
wireless: fix mac80211.h kernel-doc warnings
iwlwifi: testing the wrong variable in iwl_add_bssid_station()
ath9k_htc: rare leak in ath9k_hif_usb_alloc_tx_urbs()
ath9k_htc: dereferencing before check in hif_usb_tx_cb()
rt2x00: Fix rt2800usb TX descriptor writing.
rt2x00: Fix failed SLEEP->AWAKE and AWAKE->SLEEP transitions.
...
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
USHORT_MAX/SHORT_MAX/SHORT_MIN.
- Make SHRT_MIN of type s16, not int, for consistency.
[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix sock.h kernel-doc warning:
Warning(include/net/sock.h:1438): No description found for parameter 'wq'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a typo in cgroup_cls_state when cls_cgroup is built-in.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix kernel-doc warnings in mac80211.h:
Warning(include/net/mac80211.h:838): No description found for parameter 'ap_addr'
Warning(include/net/mac80211.h:1726): No description found for parameter 'get_survey'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reverts commit 03ceedea97.
This patch was reported to cause a regression in which connectivity is
lost and cannot be reestablished after a suspend/resume cycle.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reverts commit 03ceedea97, since it
breaks resume from suspend-to-ram on Rafael's Acer Ferrari One.
NetworkManager thinks everything is ok, but it can't connect to the AP
to get an IP address after the resume.
In fact, it even breaks resume for non-ath9k chipsets: reverting it also
fixes Rafael's Toshiba Protege R500 with the iwlagn driver. As Johannes
says:
"Indeed, this patch needs to be reverted. That mac80211 change is wrong
and completely unnecessary."
Reported-and-requested-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Daniel Yingqiang Ma <yma.cool@gmail.com>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread. This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.
Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.
In fact, the current semantics is inherently broken, as a single
skb may be constructed out of the writes of two different tasks.
A different manifestation of this problem is when the TCP stack
transmits in response of an incoming ACK. This is currently
unclassified.
As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.
This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)
The value is set at socket creation time. So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Whenever another process touches the socket by either reading or
writing to it, we will change the socket classid to that of the
process if it has a valid (non-zero) classid.
For sockets created on inbound connections through accept(2), we
inherit the classid of the original listening socket through
sk_clone, possibly preceding the actual accept(2) call.
In order to minimise risks, I have not made this the authoritative
classid. For now it is only used as a backup when we execute
with soft IRQs disabled. Once we're completely happy with its
semantics we can use it as the sole classid.
Footnote: I have rearranged the error path on cls_group module
creation. If we didn't do this, then there is a window where
someone could create a tc rule using cls_group before the cgroup
subsystem has been registered.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Discovered bug when running high number of parallel connect requests.
Replace buggy home brewed list with linux/list.h.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I made a V2 of this patch on top of my patches for VFS switches.
All the changes were due to change in some offsets.
rename - change name of file or directory
size[4] Trename tag[2] fid[4] newdirfid[4] name[s]
size[4] Rrename tag[2]
The rename message is used to change the name of a file, possibly moving it
to a new directory. The 9P wstat message can only rename a file within the
same directory.
Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
I made a V2 of this patch on top of my patches for VFS switches. The
change was adding v9fs_statfs pointer to v9fs_super_ops_dotl
instead of v9fs_super_ops.
statfs - get file system statistics
size[4] Tstatfs tag[2] fid[4]
size[4] Rstatfs tag[2] type[4] bsize[4] blocks[8] bfree[8] bavail[8]
files[8] ffree[8] fsid[8] namelen[4]
The statfs message is used to request file system information returned
by the statfs(2) system call, which is used by df(1) to report file
system and disk space usage.
Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This race was triggered by a 'conntrack -F' command running in parallel
to the insertion of a hash for a new connection. Losing this race led to
a dead conntrack entry effectively blocking traffic for a particular
connection until timeout or flushing the conntrack hashes again.
Now the check for an already dying connection is done inside the lock.
Signed-off-by: Joerg Marx <joerg.marx@secunet.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch replaces the boolean dead flag on inet6_ifaddr with
a state enum. This allows us to roll back changes when deleting
an address according to whether DAD has completed or not.
This patch only adds the state field and does not change the logic.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb rxhash should be cleared when a skb is handled by a tunnel before
being delivered again, so that correct packet steering can take place.
There are other cleanups and accounting that we can factorize in a new
helper, skb_tunnel_rx()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moves the x25 accept approve flag from char into bitfield.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moves the x25 interrupt flag from char into bitfield.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moves the X25 q bit flag from char into a bitfield to allow BKL cleanup.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_route_input() is the version returning a refcounted dst, while
ip_route_input_noref() returns a non refcounted one.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use low order bit of skb->_skb_dst to tell dst is not refcounted.
Change _skb_dst to _skb_refdst to make sure all uses are catched.
skb_dst() returns the dst, regardless of noref bit set or not, but
with a lockdep check to make sure a noref dst is not given if current
user is not rcu protected.
New skb_dst_set_noref() helper to set an notrefcounted dst on a skb.
(with lockdep check)
skb_dst_drop() drops a reference only if skb dst was refcounted.
skb_dst_force() helper is used to force a refcount on dst, when skb
is queued and not anymore RCU protected.
Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if
!IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in
sock_queue_rcv_skb(), in __nf_queue().
Use skb_dst_force() in dev_requeue_skb().
Note: dst_use_noref() still dirties dst, we might transform it
later to do one dirtying per jiffies.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP-MD5 sessions have intermittent failures, when route cache is
invalidated. ip_queue_xmit() has to find a new route, calls
sk_setup_caps(sk, &rt->u.dst), destroying the
sk->sk_route_caps &= ~NETIF_F_GSO_MASK
that MD5 desperately try to make all over its way (from
tcp_transmit_skb() for example)
So we send few bad packets, and everything is fine when
tcp_transmit_skb() is called again for this socket.
Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a
socket field, sk_route_nocaps, containing bits to mask on sk_route_caps.
Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP MD5 support uses percpu data for temporary storage. It currently
disables preemption so that same storage cannot be reclaimed by another
thread on same cpu.
We also have to make sure a softirq handler wont try to use also same
context. Various bug reports demonstrated corruptions.
Fix is to disable preemption and BH.
Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(Dropped the infiniband part, because Tetsuo modified the related code,
I will send a separate patch for it once this is accepted.)
This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
allows users to reserve ports for third-party applications.
The reserved ports will not be used by automatic port assignments
(e.g. when calling connect() or bind() with port number 0). Explicit
port allocation behavior is unchanged.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate comments in TIPC's main API files that are either obsolete,
incorrect, misleading, or unhelpful. It also adds in one new comment.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for offloading the channel switch
operation to devices that support such, typically
by having specific firmware API for it. The reasons
for this could be that the firmware provides better
timing or that regulatory enforcement done by the
device requires special handling of CSAs.
In order to allow drivers to specify the timing to
the device, the new channel_switch callback will
pass through the received frame's mactime, where
available.
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds support for multiple independant multicast routing instances,
named "tables".
Userspace multicast routing daemons can bind to a specific table instance by
issuing a setsockopt call using a new option MRT6_TABLE. The table number is
stored in the raw socket data and affects all following ip6mr setsockopt(),
getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
is created with a default routing rule pointing to it. Newly created pim6reg
devices have the table number appended ("pim6regX"), with the exception of
devices created in the default table, which are named just "pim6reg" for
compatibility reasons.
Packets are directed to a specific table instance using routing rules,
similar to how regular routing rules work. Currently iif, oif and mark
are supported as keys, source and destination addresses could be supported
additionally.
Example usage:
- bind pimd/xorp/... to a specific table:
uint32_t table = 123;
setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));
- create routing rules directing packets to the new table:
# ip -6 mrule add iif eth0 lookup 123
# ip -6 mrule add oif eth0 lookup 123
Signed-off-by: Patrick McHardy <kaber@trash.net>
The unres_queue is currently shared between all namespaces. Following patches
will additionally allow to create multiple multicast routing tables in each
namespace. Having a single shared queue for all these users seems to excessive,
move the queue and the cleanup timer to the per-namespace data to unshare it.
As a side-effect, this fixes a bug in the seq file iteration functions: the
first entry returned is always from the current namespace, entries returned
after that may belong to any namespace.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Instead of having a global workqueue for all controllers, it makes
more sense to have a workqueue per controller.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
l2cap_ertm_send() can be called both from user context and bottom half
context. The socket locks for that contexts are different, the user
context uses a mutex(which can sleep) and the second one uses a
spinlock_bh. That creates a race condition when we have interruptions on
both contexts at the same time.
The better way to solve this is to add a new spinlock to lock
l2cap_ertm_send() and the vars it access. The other solution was to defer
l2cap_ertm_send() with a workqueue, but we the sending process already
has one defer on the hci layer. It's not a good idea add another one.
The patch refactor the code to create l2cap_retransmit_frames(), then we
encapulate the lock of l2cap_ertm_send() for some call. It also changes
l2cap_retransmit_frame() to l2cap_retransmit_one_frame() to avoid
confusion
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Supports Local Busy condition handling through a waitqueue that wake ups
each 200ms and try to push the packets to the upper layer. If it can
push all the queue then it leaves the Local Busy state.
The patch modifies the behaviour of l2cap_ertm_reassembly_sdu() to
support retry of the push operation.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
hci_send_acl can't fail, so we can make it void. This patch changes
that and all the funcions that use hci_send_acl().
That change exposed a bug on sending connectionless data. We were not
reporting the lenght send back to the user space.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
With the sockopt extension we can set a per-channel MaxTx value.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that we can set the txWindow we need to change the acknowledgement
procedure to ack after each (pi->txWindow/6 + 1). The plus 1 is to avoid
the zero value.
It also renames pi->num_to_ack to a better name: pi->num_acked.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now we can set/get Transmission Window size via sockopt.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We were accepting values bigger than we can accept. This was leading
ERTM to drop packets because of wrong FCS checks.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We ack I-frames on each txWindow/5 I-frames received, but if the sender
stop to send I-frames and it's not a txWindow multiple we can leave some
frames unacked.
So I added a timer to ack I-frames on this case. The timer expires in
200ms.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
After receive a RR with P bit set ERTM shall use this funcion to choose
what type of frame to reply with F bit = 1.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
It also removes an unneeded check for the MTU. The check is done before
on sco_send_frame()
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
A while back there was a discussion regarding the rt_secret_interval timer.
Given that we've had the ability to do emergency route cache rebuilds for awhile
now, based on a statistical analysis of the various hash chain lengths in the
cache, the use of the flush timer is somewhat redundant. This patch removes the
rt_secret_interval sysctl, allowing us to rely solely on the statistical
analysis mechanism to determine the need for route cache flushes.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when one interface switches HT mode,
all others will follow along. This is clearly
undesirable, since the new one might switch to
no-HT while another one is operating in HT.
Address this issue by keeping track of the HT
mode per interface, and allowing only changes
that are compatible, i.e. switching into HT40+
is not possible when another interface is in
HT40-, in that case the second one needs to
fall back to HT20.
Also, to allow drivers to know what's going on,
store the per-interface HT mode (channel type)
in the virtual interface's bss_conf.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Currently (all tested with hwsim) you can do stupid
things like setting up an AP on a certain channel,
then adding another virtual interface and making
that associate on another channel -- this will make
the beaconing to move channel but obviously without
the necessary IEs data update.
In order to improve this situation, first make the
configuration APIs (cfg80211 and nl80211) aware of
multi-channel operation -- we'll eventually need
that in the future anyway. There's one userland API
change and one API addition. The API change is that
now SET_WIPHY must be called with virtual interface
index rather than only wiphy index in order to take
effect for that interface -- luckily all current
users (hostapd) do that. For monitor interfaces, the
old setting is preserved, but monitors are always
slaved to other devices anyway so no guarantees.
The second userland API change is the introduction
of a per virtual interface SET_CHANNEL command, that
hostapd should use going forward to make it easier
to understand what's going on (it can automatically
detect a kernel with this command).
Other than mac80211, no existing cfg80211 drivers
are affected by this change because they only allow
a single virtual interface.
mac80211, however, now needs to be aware that the
channel settings are per interface now, and needs
to disallow (for now) real multi-channel operation,
which is another important part of this patch.
One of the immediate benefits is that you can now
start hostapd to operate on a hardware that already
has a connection on another virtual interface, as
long as you specify the same channel.
Note that two things are left unhandled (this is an
improvement -- not a complete fix):
* different HT/no-HT modes
currently you could start an HT AP and then
connect to a non-HT network on the same channel
which would configure the hardware for no HT;
that can be fixed fairly easily
* CSA
An AP we're connected to on a virtual interface
might indicate switching channels, and in that
case we would follow it, regardless of how many
other interfaces are operating; this requires
more effort to fix but is pretty rare after all
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When reconfiguring an interface due to a previous
hardware restart, mac80211 will currently include
the new IBSS flag on non-IBSS interfaces which may
confuse drivers.
Instead of doing the ~0 trick, simply spell out
which things are going to be reconfigured.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
ICMP protocol unreachable handling completely disregarded
the fact that the user may have locked the socket. It proceeded
to destroy the association, even though the user may have
held the lock and had a ref on the association. This resulted
in the following:
Attempt to release alive inet socket f6afcc00
=========================
[ BUG: held lock freed! ]
-------------------------
somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held
there!
(sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
1 lock held by somenu/2672:
#0: (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
stack backtrace:
Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55
Call Trace:
[<c1232266>] ? printk+0xf/0x11
[<c1038553>] debug_check_no_locks_freed+0xce/0xff
[<c10620b4>] kmem_cache_free+0x21/0x66
[<c1185f25>] __sk_free+0x9d/0xab
[<c1185f9c>] sk_free+0x1c/0x1e
[<c1216e38>] sctp_association_put+0x32/0x89
[<c1220865>] __sctp_connect+0x36d/0x3f4
[<c122098a>] ? sctp_connect+0x13/0x4c
[<c102d073>] ? autoremove_wake_function+0x0/0x33
[<c12209a8>] sctp_connect+0x31/0x4c
[<c11d1e80>] inet_dgram_connect+0x4b/0x55
[<c11834fa>] sys_connect+0x54/0x71
[<c103a3a2>] ? lock_release_non_nested+0x88/0x239
[<c1054026>] ? might_fault+0x42/0x7c
[<c1054026>] ? might_fault+0x42/0x7c
[<c11847ab>] sys_socketcall+0x6d/0x178
[<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c1002959>] syscall_call+0x7/0xb
This was because the sctp_wait_for_connect() would aqcure the socket
lock and then proceed to release the last reference count on the
association, thus cause the fully destruction path to finish freeing
the socket.
The simplest solution is to start a very short timer in case the socket
is owned by user. When the timer expires, we can do some verification
and be able to do the release properly.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix this one:
include/net/sock.h: error: two or more data types in declaration specifiers
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
need two atomic operations (and associated dirtying) per incoming
packet.
RCU conversion is pretty much needed :
1) Add a new structure, called "struct socket_wq" to hold all fields
that will need rcu_read_lock() protection (currently: a
wait_queue_head_t and a struct fasync_struct pointer).
[Future patch will add a list anchor for wakeup coalescing]
2) Attach one of such structure to each "struct socket" created in
sock_alloc_inode().
3) Respect RCU grace period when freeing a "struct socket_wq"
4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
socket_wq"
5) Change sk_sleep() function to use new sk->sk_wq instead of
sk->sk_sleep
6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
a rcu_read_lock() section.
7) Change all sk_has_sleeper() callers to :
- Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
- Use wq_has_sleeper() to eventually wakeup tasks.
- Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)
8) sock_wake_async() is modified to use rcu protection as well.
9) Exceptions :
macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
instead of dynamically allocated ones. They dont need rcu freeing.
Some cleanups or followups are probably needed, (possible
sk_callback_lock conversion to a spinlock for example...).
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we create the sctp_datamsg and fragment the user data,
we know exactly if we are sending full segments or not and
how they might be bundled. During this time, we can mark
messages a Nagle capable or not. This makes the check at
transmit time much simpler.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Right now, sctp transports are not fully initialized and when
adding any new fields, they have to be explicitely initialized.
This is prone to mistakes. So we switch to calling kzalloc()
which makes things much simpler.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
The 'resent' bit is used to make sure that we don't update
rto estimate based on retransmitted chunks. However, we already
have the 'rto_pending' bit that we test when need to update rto,
so 'resent' bit is just extra. Additionally, we currently have
a bug in that we always set a 'resent' bit and thus rto estimate
is only updated by Heartbeats.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
In current implementation if ABORT chunk is received with T flag is set
and zero verification tag in COOKIE-WAIT state, the ABORT chunk will be
always accepted. This is because in COOKIE-WAIT state, the endpoint does
not know the peer's verification tag, and it's zero in the endpoint.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
sock_recv_ts_and_drops() is fat and slow (~ 4% of cpu time on some
profiles)
We can test all socket flags at once to make fast path fast again.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When queueing a skb to socket, we can immediately release its dst if
target socket do not use IP_CMSG_PKTINFO.
tcp_data_queue() can drop dst too.
This to benefit from a hot cache line and avoid the receiver, possibly
on another cpu, to dirty this cache line himself.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 95766fff ([UDP]: Add memory accounting.),
each received packet needs one extra sock_lock()/sock_release() pair.
This added latency because of possible backlog handling. Then later,
ticket spinlocks added yet another latency source in case of DDOS.
This patch introduces lock_sock_bh() and unlock_sock_bh()
synchronization primitives, avoiding one atomic operation and backlog
processing.
skb_free_datagram_locked() uses them instead of full blown
lock_sock()/release_sock(). skb is orphaned inside locked section for
proper socket memory reclaim, and finally freed outside of it.
UDP receive path now take the socket spinlock only once.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ok, version 4
Change Notes:
1) Minor cleanups, from Vlads notes
Summary:
Hey-
Recently, it was reported to me that the kernel could oops in the
following way:
<5> kernel BUG at net/core/skbuff.c:91!
<5> invalid operand: 0000 [#1]
<5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
mptbase sd_mod scsi_mod
<5> CPU: 0
<5> EIP: 0060:[<c02bff27>] Not tainted VLI
<5> EFLAGS: 00010216 (2.6.9-89.0.25.EL)
<5> EIP is at skb_over_panic+0x1f/0x2d
<5> eax: 0000002c ebx: c033f461 ecx: c0357d96 edx: c040fd44
<5> esi: c033f461 edi: df653280 ebp: 00000000 esp: c040fd40
<5> ds: 007b es: 007b ss: 0068
<5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
<5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
e0c2947d
<5> 00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
df653490
<5> 00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
00000004
<5> Call Trace:
<5> [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
<5> [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
<5> [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
<5> [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
<5> [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
<5> [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
<5> [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
<5> [<c01555a4>] cache_grow+0x140/0x233
<5> [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
<5> [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
<5> [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
<5> [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
<5> [<c02d005e>] nf_iterate+0x40/0x81
<5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5> [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
<5> [<c02d0362>] nf_hook_slow+0x83/0xb5
<5> [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
<5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5> [<c02e103e>] ip_rcv+0x334/0x3b4
<5> [<c02c66fd>] netif_receive_skb+0x320/0x35b
<5> [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
<5> [<c02c67a4>] process_backlog+0x6c/0xd9
<5> [<c02c690f>] net_rx_action+0xfe/0x1f8
<5> [<c012a7b1>] __do_softirq+0x35/0x79
<5> [<c0107efb>] handle_IRQ_event+0x0/0x4f
<5> [<c01094de>] do_softirq+0x46/0x4d
Its an skb_over_panic BUG halt that results from processing an init chunk in
which too many of its variable length parameters are in some way malformed.
The problem is in sctp_process_unk_param:
if (NULL == *errp)
*errp = sctp_make_op_error_space(asoc, chunk,
ntohs(chunk->chunk_hdr->length));
if (*errp) {
sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
WORD_ROUND(ntohs(param.p->length)));
sctp_addto_chunk(*errp,
WORD_ROUND(ntohs(param.p->length)),
param.v);
When we allocate an error chunk, we assume that the worst case scenario requires
that we have chunk_hdr->length data allocated, which would be correct nominally,
given that we call sctp_addto_chunk for the violating parameter. Unfortunately,
we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
chunk, so the worst case situation in which all parameters are in violation
requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.
The result of this error is that a deliberately malformed packet sent to a
listening host can cause a remote DOS, described in CVE-2010-1173:
http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173
I've tested the below fix and confirmed that it fixes the issue. We move to a
strategy whereby we allocate a fixed size error chunk and ignore errors we don't
have space to report. Tested by me successfully
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some drivers (e.g. iwlwifi) need to know and try
to figure it out based on other things, but making
it explicit is definitely better.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Changes:
o Function cfcnfg_disconn_adapt_layer is changed to do asynchronous
disconnect, not waiting for any response from the modem. Due to this
the function cfcnfg_linkdestroy_rsp does nothing anymore.
o Because disconnect may take down a connection before a connect response
is received the function cfcnfg_linkup_rsp is checking if the client is
still waiting for the response, if not a disconnect request is sent to
the modem.
o cfctrl is no longer keeping track of pending disconnect requests.
o Added function cfctrl_cancel_req, which is used for deleting a pending
connect request if disconnect is done before connect response is received.
o Removed unused function cfctrl_insert_req2
o Added better handling of connect reject from modem.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changes:
o Added functions cfsrvl_get and cfsrvl_put.
o Added support release_client to use by socket and net device.
o Increase reference counting for in-flight packets from cfmuxl
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changes:
o Renamed cfcnfg_del_adapt_layer to cfcnfg_disconn_adapt_layer
o Fixed typo cfcfg to cfcnfg
o Renamed linkid to channel_id
o Updated documentation in caif_dev.h
o Minor formatting changes
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk->sk_data_ready() of sctp socket can be called from both BH and non-BH
contexts, but the default sk->sk_data_ready(), sock_def_readable(), can
not be used in this case. Therefore, we have to make a new function
sctp_data_ready() to grab sk->sk_data_ready() with BH disabling.
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.33-rc6 #129
---------------------------------------------------------
sctp_darn/1517 just changed the state of lock:
(clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80
but this lock took another, SOFTIRQ-unsafe lock in the past:
(slock-AF_INET){+.-...}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
1 lock held by sctp_darn/1517:
#0: (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp]
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now there's no need to use this fuction directly because it's handled by
register_pernet_device. So to make this simple and easy to understand,
make this static to do not tempt potentional users.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current socket backlog limit is not enough to really stop DDOS attacks,
because user thread spend many time to process a full backlog each
round, and user might crazy spin on socket lock.
We should add backlog size and receive_queue size (aka rmem_alloc) to
pace writers, and let user run without being slow down too much.
Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in
stress situations.
Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
receiver can now process ~200.000 pps (instead of ~100 pps before the
patch) on a 8 core machine.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Idea from Eric Dumazet.
As for placement inside of struct sock, I tried to choose a place
that otherwise has a 32-bit hole on 64-bit systems.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
When scanning, it is somewhat important to scan
on the correct virtual interface. All drivers
that currently implement hw_scan only support a
single virtual interface, but that may change
and then we'd want to be ready.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Now that the mac80211 is choosing dynamic ps timeouts based on the ps-qos
network latency configuration, configure a default value of -1 as the dynamic
ps timeout in cfg80211. This value allows the mac80211 to determine the value
to be used.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Determine the dynamic PS timeout based on the configured ps-qos network
latency. For backwards wext compatibility, allow the dynamic PS timeout
configured by the cfg80211 to overrule the automatically determined value.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is used to configure APs to not bridge traffic between connected stations.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
__sk_dst_set() might be called while no state can be integrated in a
rcu_dereference_check() condition.
So use rcu_dereference_raw() to shutup lockdep warnings (if
CONFIG_PROVE_RCU is set)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
const qualifier on sock argument is misleading, since we can modify rxhash.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 1122 says the following:
...
Keep-alive packets MUST only be sent when no data or
acknowledgement packets have been received for the
connection within an interval.
...
The acknowledgement packet is reseting the keepalive
timer but the data packet isn't. This patch fixes it by
checking the timestamp of the last received data packet
too when the keepalive timer expires.
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calls to twsk_net() are in some cases protected by reference counting
as an alternative to RCU protection. Cases covered by reference counts
include __inet_twsk_kill(), inet_twsk_free(), inet_twdr_do_twkill_work(),
inet_twdr_twcal_tick(), and tcp_timewait_state_process(). RCU is used
by inet_twsk_purge(). Locking is used by established_get_first()
and established_get_next(). Finally, __inet_twsk_hashdance() is an
initialization case.
It appears to be non-trivial to locate the appropriate locks and
reference counts from within twsk_net(), so used rcu_dereference_raw().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_rules_register() duplicates the template passed to it without modification,
mark the argument as const. Additionally the templates are only needed when
instantiating a new namespace, so mark them as __net_initdata, which means
they can be discarded when CONFIG_NET_NS=n.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Finally add support to detect a local IPV6_DONTFRAG event
and return the relevant data to the user if they've enabled
IPV6_RECVPATHMTU on the socket. The next recvmsg() will
return no data, but have an IPV6_PATHMTU as ancillary data.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add dontfrag argument to relevant functions for
IPV6_DONTFRAG support, as well as allowing the value
to be passed-in via ancillary cmsg data.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
V2 Feedback from John Hughes.
- Add header for userspace implementations such as xot/xoe to use
- Use explicit values for interface stability
- No changes to driver patches
V1
- Use identifiers instead of magic numbers for X25 layer 3 to device interface.
- Also fixed checkpatch notes on updated code.
[ Add new user header to include/linux/Kbuild -DaveM ]
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Account for TSO segments of an skb in TCP_MIB_OUTSEGS counter. Without
doing this, the counter can be off by orders of magnitude from the
actual number of segments sent.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I updated this from the corresponding
userspace library, an annotation error crept
in -- this variable needs to be annotated as
little endian. No effect on code generation.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Define a new function to return the waitqueue of a "struct sock".
static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
return sk->sk_sleep;
}
Change all read occurrences of sk_sleep by a call to this function.
Needed for a future RCU conversion. sk_sleep wont be a field directly
available.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the survey function to both mac80211 itself and to mac80211_hwsim.
For the latter driver, we simply invent some noise level.A real driver which
cannot determine the real channel noise MUST NOT report any noise, especially
not a magically conjured one :-)
Signed-off-by: Holger Schurig <holgerschurig@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When I set up multiple VAPs with ath9k, I encountered an issue that
the traffic may be lost after a while.
The detailed phenomenon is
1. After a while the clients connected to one of these VAPs will get
into a state that no broadcast/multicast packets can be transfered
successfully while the unicast packets can be transfered normally.
2. Minutes latter the unitcast packets transfer will fail as well,
because the ARP entry is expired and it can't be freshed due to the
broadcast trouble.
It's caused by the group key overwritten and someone discussed this
issue in ath9k-devel maillist before, but haven't work out a fix yet.
I referred the method in madwifi, and made a patch for ath9k.
The method is to set the high bit of the sender(AP)'s address, and
associated that mac and the group key. It requires the hardware
supports multicast frame key search. It seems true for AR9160.
Not sure whether it's the correct way to fix this issue. But it seems
to work in my test. The patch is attached, feel free to revise it.
Signed-off-by: Daniel Yingqiang ma <yma.cool@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The flag is called IEEE80211_TX_STAT_AMPDU rather than using the whole word
STATUS.
Signed-off-by: Daniel Halperin <dhalperi@cs.washington.edu>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch implements receive flow steering (RFS). RFS steers
received packets for layer 3 and 4 processing to the CPU where
the application for the corresponding flow is running. RFS is an
extension of Receive Packet Steering (RPS).
The basic idea of RFS is that when an application calls recvmsg
(or sendmsg) the application's running CPU is stored in a hash
table that is indexed by the connection's rxhash which is stored in
the socket structure. The rxhash is passed in skb's received on
the connection from netif_receive_skb. For each received packet,
the associated rxhash is used to look up the CPU in the hash table,
if a valid CPU is set then the packet is steered to that CPU using
the RPS mechanisms.
The convolution of the simple approach is that it would potentially
allow OOO packets. If threads are thrashing around CPUs or multiple
threads are trying to read from the same sockets, a quickly changing
CPU value in the hash table could cause rampant OOO packets--
we consider this a non-starter.
To avoid OOO packets, this solution implements two types of hash
tables: rps_sock_flow_table and rps_dev_flow_table.
rps_sock_table is a global hash table. Each entry is just a CPU
number and it is populated in recvmsg and sendmsg as described above.
This table contains the "desired" CPUs for flows.
rps_dev_flow_table is specific to each device queue. Each entry
contains a CPU and a tail queue counter. The CPU is the "current"
CPU for a matching flow. The tail queue counter holds the value
of a tail queue counter for the associated CPU's backlog queue at
the time of last enqueue for a flow matching the entry.
Each backlog queue has a queue head counter which is incremented
on dequeue, and so a queue tail counter is computed as queue head
count + queue length. When a packet is enqueued on a backlog queue,
the current value of the queue tail counter is saved in the hash
entry of the rps_dev_flow_table.
And now the trick: when selecting the CPU for RPS (get_rps_cpu)
the rps_sock_flow table and the rps_dev_flow table for the RX queue
are consulted. When the desired CPU for the flow (found in the
rps_sock_flow table) does not match the current CPU (found in the
rps_dev_flow table), the current CPU is changed to the desired CPU
if one of the following is true:
- The current CPU is unset (equal to RPS_NO_CPU)
- Current CPU is offline
- The current CPU's queue head counter >= queue tail counter in the
rps_dev_flow table. This checks if the queue tail has advanced
beyond the last packet that was enqueued using this table entry.
This guarantees that all packets queued using this entry have been
dequeued, thus preserving in order delivery.
Making each queue have its own rps_dev_flow table has two advantages:
1) the tail queue counters will be written on each receive, so
keeping the table local to interrupting CPU s good for locality. 2)
this allows lockless access to the table-- the CPU number and queue
tail counter need to be accessed together under mutual exclusion
from netif_receive_skb, we assume that this is only called from
device napi_poll which is non-reentrant.
This patch implements RFS for TCP and connected UDP sockets.
It should be usable for other flow oriented protocols.
There are two configuration parameters for RFS. The
"rps_flow_entries" kernel init parameter sets the number of
entries in the rps_sock_flow_table, the per rxqueue sysfs entry
"rps_flow_cnt" contains the number of entries in the rps_dev_flow
table for the rxqueue. Both are rounded to power of two.
The obvious benefit of RFS (over just RPS) is that it achieves
CPU locality between the receive processing for a flow and the
applications processing; this can result in increased performance
(higher pps, lower latency).
The benefits of RFS are dependent on cache hierarchy, application
load, and other factors. On simple benchmarks, we don't necessarily
see improvement and sometimes see degradation. However, for more
complex benchmarks and for applications where cache pressure is
much higher this technique seems to perform very well.
Below are some benchmark results which show the potential benfit of
this patch. The netperf test has 500 instances of netperf TCP_RR
test with 1 byte req. and resp. The RPC test is an request/response
test similar in structure to netperf RR test ith 100 threads on
each host, but does more work in userspace that netperf.
e1000e on 8 core Intel
No RFS or RPS 104K tps at 30% CPU
No RFS (best RPS config): 290K tps at 63% CPU
RFS 303K tps at 61% CPU
RPC test tps CPU% 50/90/99% usec latency Latency StdDev
No RFS/RPS 103K 48% 757/900/3185 4472.35
RPS only: 174K 73% 415/993/2468 491.66
RFS 223K 73% 379/651/1382 315.61
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LDPC will be enabled through the rate control algorithm
for each buffer the the tx_info flags.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
As Herbert Xu said: we should be able to simply replace ipfragok
with skb->local_df. commit f88037(sctp: Drop ipfargok in sctp_xmit function)
has droped ipfragok and set local_df value properly.
The patch kills the ipfragok parameter of .queue_xmit().
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- fix IP DNAT on vlan- or pppoe-encapsulated traffic: The functions
neigh_hh_output() or dst->neighbour->output() overwrite the complete
Ethernet header, although we only need the destination MAC address.
For encapsulated packets, they ended up overwriting the encapsulating
header. The new code copies the Ethernet source MAC address and
protocol number before calling dst->neighbour->output(). The Ethernet
source MAC and protocol number are copied back in place in
br_nf_pre_routing_finish_bridge_slow(). This also makes the IP DNAT
more transparent because in the old scheme the source MAC of the
bridge was copied into the source address in the Ethernet header. We
also let skb->protocol equal ETH_P_IP resp. ETH_P_IPV6 during the
execution of the PF_INET resp. PF_INET6 hooks.
- Speed up IP DNAT by calling neigh_hh_bridge() instead of
neigh_hh_output(): if dst->hh is available, we already know the MAC
address so we can just copy it.
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds support for multiple independant multicast routing instances,
named "tables".
Userspace multicast routing daemons can bind to a specific table instance by
issuing a setsockopt call using a new option MRT_TABLE. The table number is
stored in the raw socket data and affects all following ipmr setsockopt(),
getsockopt() and ioctl() calls. By default, a single table (RT_TABLE_DEFAULT)
is created with a default routing rule pointing to it. Newly created pimreg
devices have the table number appended ("pimregX"), with the exception of
devices created in the default table, which are named just "pimreg" for
compatibility reasons.
Packets are directed to a specific table instance using routing rules,
similar to how regular routing rules work. Currently iif, oif and mark
are supported as keys, source and destination addresses could be supported
additionally.
Example usage:
- bind pimd/xorp/... to a specific table:
uint32_t table = 123;
setsockopt(fd, IPPROTO_IP, MRT_TABLE, &table, sizeof(table));
- create routing rules directing packets to the new table:
# ip mrule add iif eth0 lookup 123
# ip mrule add oif eth0 lookup 123
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The unres_queue is currently shared between all namespaces. Following patches
will additionally allow to create multiple multicast routing tables in each
namespace. Having a single shared queue for all these users seems to excessive,
move the queue and the cleanup timer to the per-namespace data to unshare it.
As a side-effect, this fixes a bug in the seq file iteration functions: the
first entry returned is always from the current namespace, entries returned
after that may belong to any namespace.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
A following patch will use struct raw_sock to store state for ipmr,
so having the definitions in icmp.h doesn't fit very well anymore.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both functions are equivalent, consolidate them since a following patch
needs a third implementation for multicast routing.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With latest CONFIG_PROVE_RCU stuff, I felt more comfortable to make this
work.
sk->sk_dst_cache is currently protected by a rwlock (sk_dst_lock)
This rwlock is readlocked for a very small amount of time, and dst
entries are already freed after RCU grace period. This calls for RCU
again :)
This patch converts sk_dst_lock to a spinlock, and use RCU for readers.
__sk_dst_get() is supposed to be called with rcu_read_lock() or if
socket locked by user, so use appropriate rcu_dereference_check()
condition (rcu_read_lock_held() || sock_owned_by_user(sk))
This patch avoids two atomic ops per tx packet on UDP connected sockets,
for example, and permits sk_dst_lock to be much less dirtied.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet: Remove unused send_check length argument
This patch removes the unused length argument from the send_check
function in struct inet_connection_sock_af_ops.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Yinghai <yinghai.lu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Timo Teräs <timo.teras@iki.fi>
Happens because CONFIG_XFRM_SUB_POLICY is not enabled, and one of
the helper functions I used did unexpected things in that case.
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is a patch to stop X.25 examining fields beyond the end of the packet.
For example, when a simple CALL ACCEPTED was received:
10 10 0f
x25_parse_facilities was attempting to decode the FACILITIES field, but this
packet contains no facilities field.
Signed-off-by: John Hughes <john@calva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cfg80211 is quite strict on allowing authentication and association
commands only in certain states. In order to meet these requirements,
user space applications may need to clear authentication or
association state in some cases. Currently, this can be done with
deauth/disassoc command, but that ends up sending out Deauthentication
or Disassociation frame unnecessarily. Add a new nl80211 attribute to
allow this sending of the frame be skipped, but with all other
deauth/disassoc operations being completed.
Similar state change is also needed for IEEE 802.11r FT protocol in
the FT-over-DS case which does not use Authentication frame exchange
in a transition to another BSS. For this to work with cfg80211, an
authentication entry needs to be created for the target BSS without
sending out an Authentication frame. The nl80211 authentication
command can be used for this purpose, too, with the new attribute to
indicate that the command is only for changing local state. This
enables wpa_supplicant to complete FT-over-DS transition successfully.
Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
__xfrm_lookup() is called for each packet transmitted out of
system. The xfrm_find_bundle() does a linear search which can
kill system performance depending on how many bundles are
required per policy.
This modifies __xfrm_lookup() to store bundles directly in
the flow cache. If we did not get a hit, we just create a new
bundle instead of doing slow search. This means that we can now
get multiple xfrm_dst's for same flow (on per-cpu basis).
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows to validate the cached object before returning it.
It also allows to destruct object properly, if the last reference
was held in flow cache. This is also a prepartion for caching
bundles in the flow cache.
In return for virtualizing the methods, we save on:
- not having to regenerate the whole flow cache on policy removal:
each flow matching a killed policy gets refreshed as the getter
function notices it smartly.
- we do not have to call flow_cache_flush from policy gc, since the
flow cache now properly deletes the object if it had any references
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: saving negative to unsigned char
9p: return on mutex_lock_interruptible()
9p: Creating files with names too long should fail with ENAMETOOLONG.
9p: Make sure we are able to clunk the cached fid on umount
9p: drop nlink remove
fs/9p: Clunk the fid resulting from partial walk of the name
9p: documentation update
9p: Fix setting of protocol flags in v9fs_session_info structure.
dcache prune happen on umount. So we cannot mark the client
satus disconnect. That will prevent a 9p call to the server
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Converts the list and the core manipulating with it to be the same as uc_list.
+uses two functions for adding/removing mc address (normal and "global"
variant) instead of a function parameter.
+removes dev_mcast.c completely.
+exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for
manipulation with lists on a sandbox (used in bonding and 80211 drivers)
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The head element of rt6_info{} is dst_entry{}, and
IPv6 specific elements follow.
Because elements at the end of dst_entry{} are frequently
updated, it is not good to put frequently-used static
elements, such as rt6i_idev, rt6i_dst or rt6i_flags in the
same cache line.
On the other hand, fib6_table, rt6i_node or rt6i_gateway are
rarely used, so it is okay to stay in the same cache line.
Let's rearrange rt6_info{}.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of my test machine got a deadlock during "tc" sessions,
adding/deleting classes & filters, using traffic estimators.
After some analysis, I believe we have a potential use after free case
in est_timer() :
spin_lock(e->stats_lock); << HERE >>
read_lock(&est_lock);
if (e->bstats == NULL) << TEST >>
goto skip;
Test is done a bit late, because after estimator is killed, and before
rcu grace period elapsed, we might already have freed/reuse memory where
e->stats_locks points to (some qdisc->q.lock)
A possible fix is to respect a rcu grace period at Qdisc dismantle time.
On 64bit, sizeof(struct Qdisc) is exactly 192 bytes. Adding 16 bytes to
it (for struct rcu_head) is a problem because it might change
performance, given QDISC_ALIGNTO is 32 bytes.
This is why I also change QDISC_ALIGNTO to 64 bytes, to satisfy most
current alignment requirements.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add interface to disable/enable QoS (aka WMM or WME). Currently drivers
enable it explicitly when ->conf_tx method is called, and newer disable.
Disabling is needed for some APs, which do not support QoS, such
we should send QoS frames to them.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Mac80211 drivers can now pass paged SKBs to mac80211 via
ieee80211_rx{_irqsafe}. The implementation currently use
skb_linearize() in a few places i.e. management frame
handling, software decryption, defragmentation and A-MSDU
process. We will optimize them one by one later.
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Cc: Kalle Valo <kalle.valo@iki.fi>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Because we have ensured that the argument is non-zero,
it is better to use __fls() and generate better code.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Header files for CAIF Link layer net-device,
and link-layer registration.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add include files for the CAIF Core protocol stack.
caif_layer.h - Defines the structure of the CAIF protocol layers
cfcnfg.h - CAIF Configuration Module for services and link layers
cfctrl.h - CAIF Control Protocol Layer
cffrml.h - CAIF Framing Layer
cfmuxl.h - CAIF Muxing Layer
cfpkt.h - CAIF Packet layer (skb helper functions)
cfserl.h - CAIF Serial Layer
cfsrvl.h - CAIF Service Layer
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Add support for the set_cqm_config op. This op function configures the
requested connection quality monitor rssi threshold and rssi hysteresis
values to the hardware if the hardware supports
IEEE80211_HW_SUPPORTS_CQM.
For unsupported hardware, currently -EOPNOTSUPP is returned, so the mac80211
is currently not doing connection quality monitoring on the host. This could be
added later, if needed.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add support for basic configuration of a connection quality monitoring to the
nl80211 interface, and basic support for notifying about triggered monitoring
events.
Via this interface a user-space connection manager may configure and receive
pre-warning events of deteriorating WLAN connection quality, and start
preparing for roaming in advance, before the connection is already lost.
An example usage of such a trigger is starting scanning for nearby AP's in
an attempt to find one with better connection quality, and associate to it
before the connection characteristics of the existing connection become too bad
or the association is even lost, leading in a prolonged delay in connectivity.
The interface currently supports only RSSI, but it could be later extended
to include other parameters, such as signal-to-noise ratio, if need for that
arises.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch is based on a RFC patch by Kalle Valo.
The wl1271 has a feature which handles the connection monitor logic
in hardware, basically sending periodically nullfunc frames and reporting
to the host if AP is lost, after attempting to recover by sending
probe-requests to the AP.
Add support to mac80211 by adding a new flag IEEE80211_HW_CONNECTION_MONITOR
which prevents conn_mon_timer from triggering during idle periods, and
prevents sending probe-requests to the AP if beacon-loss is indicated by the
hardware.
Cc: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There is no point to align or pad mibs to cache lines, they are per cpu
allocated with a 8 bytes alignment anyway.
This wastes space for no gain. This patch removes __SNMP_MIB_ALIGN__
Since SNMP mibs contain "unsigned long" fields only, we can relax the
allocation alignment from "unsigned long long" to "unsigned long"
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the debug files ended up wrongly in sysfs, because at that point
of time, debugfs didn't exist. Convert these files to use debugfs and
also seq_file. This patch converts all of these files at once and then
removes the exported symbol for the Bluetooth sysfs class.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Convert to list macro's for the list of addresses per interface
in IPv6.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert from reader/writer lock to RCU and spinlock for addrconf
hash list.
Adds an additional helper macro for hlist_for_each_entry_continue_rcu
to handle the continue case.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using hash list macros, simplifies code and helps later RCU.
This patch includes some initialization that is not strictly necessary,
since an empty hlist node/list is all zero; and list is in BSS
and node is allocated with kzalloc.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use list macros instead of open coded linked list.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a unaligned access in nla_get_be64() that was
introduced by myself in a17c859849.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: Skip check for mandatory locks when unlocking
9p: Fixes a simple bug enabling writes beyond 2GB.
9p: Change the name of new protocol from 9p2010.L to 9p2000.L
fs/9p: re-init the wstat in readdir loop
net/9p: Add sysfs mount_tag file for virtio 9P device
net/9p: Use the tag name in the config space for identifying mount point
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits)
bridge: ensure to unlock in error path in br_multicast_query().
drivers/net/tulip/eeprom.c: fix bogus "(null)" in tulip init messages
sky2: Avoid rtnl_unlock without rtnl_lock
ipv6: Send netlink notification when DAD fails
drivers/net/tg3.c: change the field used with the TG3_FLAG_10_100_ONLY constant
ipconfig: Handle devices which take some time to come up.
mac80211: Fix memory leak in ieee80211_if_write()
mac80211: Fix (dynamic) power save entry
ipw2200: use kmalloc for large local variables
ath5k: read eeprom IQ calibration values correctly for G mode
ath5k: fix I/Q calibration (for real)
ath5k: fix TSF reset
ath5k: use fixed antenna for tx descriptors
libipw: split ieee->networks into small pieces
mac80211: Fix sta_mtx unlocking on insert STA failure path
rt2x00: remove KSEG1ADDR define from rt2x00soc.h
net: add ColdFire support to the smc91x driver
asix: fix setting mac address for AX88772
ipv6 ip6_tunnel: eliminate unused recursion field from ip6_tnl{}.
net: Fix dev_mc_add()
...
This patch changes the name of the new 9P protocol from 9p2010.L to
9p2000.u. This is because we learnt that the name 9p2010 is already
being used by others.
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits)
doc: fix typo in comment explaining rb_tree usage
Remove fs/ntfs/ChangeLog
doc: fix console doc typo
doc: cpuset: Update the cpuset flag file
Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed
Remove drivers/parport/ChangeLog
Remove drivers/char/ChangeLog
doc: typo - Table 1-2 should refer to "status", not "statm"
tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments
No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h
devres/irq: Fix devm_irq_match comment
Remove reference to kthread_create_on_cpu
tree-wide: Assorted spelling fixes
tree-wide: fix 'lenght' typo in comments and code
drm/kms: fix spelling in error message
doc: capitalization and other minor fixes in pnp doc
devres: typo fix s/dev/devm/
Remove redundant trailing semicolons from macros
fix typo "definetly" -> "definitely" in comment
tree-wide: s/widht/width/g typo in comments
...
Fix trivial conflict in Documentation/laptops/00-INDEX
Remove INIT_NSPROXY(), use C99 initializer.
Remove INIT_IPC_NS(), INIT_NET_NS() while I'm at it.
Note: headers trim will be done later, now it's quite pointless because
results will be invalidated by merge window.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit a43912ab19... ("tunnel: eliminate recursion field") eliminated
use of recursion field from tunnel structures, but its definition
still exists in ip6_tnl{}.
Let's remove that unused field.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The noise value as is won't be used, isn't
filled by most drivers and doesn't really
make a whole lot of sense on a per packet
basis -- proper cfg80211 survey support in
mac80211 will need to be different.
Mark the struct member as deprecated so it
will be removed from drivers.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add the "__must_check" tag to sk_add_backlog() so that any failure to
check and drop packets will be warned about.
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPV6_PREFER_SRC_xxx definitions:
| #define IPV6_PREFER_SRC_TMP 0x0001
| #define IPV6_PREFER_SRC_PUBLIC 0x0002
| #define IPV6_PREFER_SRC_COA 0x0004
RT6_LOOKUP_F_xxx definitions:
| #define RT6_LOOKUP_F_SRCPREF_TMP 0x00000008
| #define RT6_LOOKUP_F_SRCPREF_PUBLIC 0x00000010
| #define RT6_LOOKUP_F_SRCPREF_COA 0x00000020
So, we can translate between these two groups by shift operation
instead of multiple 'if's.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_add_backlog -> __sk_add_backlog
sk_add_backlog_limited -> sk_add_backlog
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We got system OOM while running some UDP netperf testing on the loopback
device. The case is multiple senders sent stream UDP packets to a single
receiver via loopback on local host. Of course, the receiver is not able
to handle all the packets in time. But we surprisingly found that these
packets were not discarded due to the receiver's sk->sk_rcvbuf limit.
Instead, they are kept queuing to sk->sk_backlog and finally ate up all
the memory. We believe this is a secure hole that a none privileged user
can crash the system.
The root cause for this problem is, when the receiver is doing
__release_sock() (i.e. after userspace recv, kernel udp_recvmsg ->
skb_free_datagram_locked -> release_sock), it moves skbs from backlog to
sk_receive_queue with the softirq enabled. In the above case, multiple
busy senders will almost make it an endless loop. The skbs in the
backlog end up eat all the system memory.
The issue is not only for UDP. Any protocols using socket backlog is
potentially affected. The patch adds limit for socket backlog so that
the backlog size cannot be expanded endlessly.
Reported-by: Alex Shi <alex.shi@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes 'dotu' variable and make everything dependent
on 'proto_version' field.
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Add new mount V9FS mount option to specify protocol version
This patch adds a new mount option to specify protocol version.
With this option it is possible to use "-o version=" switch to
specify 9P protocol version to use. Valid options for version
are:
9p2000
9p2000.u
9p2010.L
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Decreases the odds wakee will suffer from frequent cache misses.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handling HT configuration changes involved setting the channel
with the new HT parameters and then issuing a rate_update()
notification to the driver.
This behavior changed after the off-channel changes. Now, the channel
is not updated with the new HT params in enable_ht() - instead, it
is now done when the scan work terminates. This results in the driver
depending on stale information, defaulting to non-HT mode always.
Fix this by passing the new channel type to the driver.
Cc: stable@kernel.org
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When I merged the bundle creation code, I introduced a bogus
flowi value in the bundle. Instead of getting from the caller,
it was instead set to the flow in the route object, which is
totally different.
The end result is that the bundles we created never match, and
we instead end up with an ever growing bundle list.
Thanks to Jamal for find this problem.
Reported-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the Bluetooth 3.0 specification and the introduction of alternate
MAC/PHY (AMP) support, it is required to differentiate between primary
BR/EDR controllers and 802.11 AMP controllers. So introduce a special
type inside HCI device for differentiation.
For now all AMP controllers will be treated as raw devices until an
AMP manager has been implemented.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The output of the inquiry cache is only useful for debugging purposes
and so move it into debugfs.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
commit e8469ed959c373c2ff9e6f488aa5a14971aebe1f
Author: Patrick McHardy <kaber@trash.net>
Date: Tue Feb 23 20:41:30 2010 +0100
Support specifying the initial device flags when creating a device though
rtnl_link. Devices allocated by rtnl_create_link() are marked as INITIALIZING
in order to surpress netlink registration notifications. To complete setup,
rtnl_configure_link() must be called, which performs the device flag changes
and invokes the deferred notifiers if everything went well.
Two examples:
# add macvlan to eth0
#
$ ip link add link eth0 up allmulticast on type macvlan
[LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
link/ether 26:f8:84:02:f9:2a brd ff:ff:ff:ff:ff:ff
[ROUTE]ff00::/8 dev macvlan0 table local metric 256 mtu 1500 advmss 1440 hoplimit 0
[ROUTE]fe80::/64 dev macvlan0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
[LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500
link/ether 26:f8:84:02:f9:2a
[ADDR]11: macvlan0 inet6 fe80::24f8:84ff:fe02:f92a/64 scope link
valid_lft forever preferred_lft forever
[ROUTE]local fe80::24f8:84ff:fe02:f92a via :: dev lo table local proto none metric 0 mtu 16436 advmss 16376 hoplimit 0
[ROUTE]default via fe80::215:e9ff:fef0:10f8 dev macvlan0 proto kernel metric 1024 mtu 1500 advmss 1440 hoplimit 0
[NEIGH]fe80::215:e9ff:fef0:10f8 dev macvlan0 lladdr 00:15:e9:f0:10:f8 router STALE
[ROUTE]2001:6f8:974::/64 dev macvlan0 proto kernel metric 256 expires 0sec mtu 1500 advmss 1440 hoplimit 0
[PREFIX]prefix 2001:6f8:974::/64 dev macvlan0 onlink autoconf valid 14400 preferred 131084
[ADDR]11: macvlan0 inet6 2001:6f8:974:0:24f8:84ff:fe02:f92a/64 scope global dynamic
valid_lft 86399sec preferred_lft 14399sec
# add VLAN to eth1, eth1 is down
#
$ ip link add link eth1 up type vlan id 1000
RTNETLINK answers: Network is down
<no events>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 4291 section 2.4 states that all uncategorized addresses
should be considered as Global Unicast.
This will remove IPV6_ADDR_RESERVED completely
and return IPV6_ADDR_UNICAST in ipv6_addr_type() instead.
Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update rcu_dereference() primitives to use new lockdep-based
checking. The rcu_dereference() in __in6_dev_get() may be
protected either by rcu_read_lock() or RTNL, per Eric Dumazet.
The rcu_dereference() in __sk_free() is protected by the fact
that it is never reached if an update could change it. Check
for this by using rcu_dereference_check() to verify that the
struct sock's ->sk_wmem_alloc counter is zero.
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1266887105-1528-5-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pass mark to all SP lookups to prepare them for when we add code
to have them search.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
pass mark to all SA lookups to prepare them for when we add code
to have them search.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add basic structuring and accessors for xfrm mark
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert AF_PACKET to use RCU, eliminating one more reader/writer lock.
There is no need for a real sk_del_node_init_rcu(), because sk_del_node_init
is doing the equivalent thing to hlst_del_init_rcu already; but added
some comments to try and make that obvious.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The most needed command from nl80211, which Wireless Extensions had,
is support for power save mode. Add a simple command to make it possible
to enable and disable power save via nl80211.
I was also planning about extending the interface, for example adding the
timeout value, but after thinking more about this I decided not to do it.
Basically there were three reasons:
Firstly, the parameters for power save are very much hardware dependent.
Trying to find a unified interface which would work with all hardware, and
still make sense to users, will be very difficult.
Secondly, IEEE 802.11 power save implementation in Linux is still in state
of flux. We have a long way to still to go and there is no way to predict
what kind of implementation we will have after few years. And because we
need to support nl80211 interface a long time, practically forever, adding
now parameters to nl80211 might create maintenance problems later on.
Third issue are the users. Power save parameters are mostly used for
debugging, so debugfs is better, more flexible, interface for this.
For example, wpa_supplicant currently doesn't configure anything related
to power save mode. It's better to strive that kernel can automatically
optimise the power save parameters, like with help of pm qos network
and other traffic parameters.
Later on, when we have better understanding of power save, we can extend
this command with more features, if there's a need for that.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch enables fast retransmissions after one dupACK for
TCP if the stream is identified as thin. This will reduce
latencies for thin streams that are not able to trigger fast
retransmissions due to high packet interarrival time. This
mechanism is only active if enabled by iocontrol or syscontrol
and the stream is identified as thin.
Signed-off-by: Andreas Petlund <apetlund@simula.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch will make TCP use only linear timeouts if the
stream is thin. This will help to avoid the very high latencies
that thin stream suffer because of exponential backoff. This
mechanism is only active if enabled by iocontrol or syscontrol
and the stream is identified as thin. A maximum of 6 linear
timeouts is tried before exponential backoff is resumed.
Signed-off-by: Andreas Petlund <apetlund@simula.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inline function to dynamically detect thin streams based on
the number of packets in flight. Used to dynamically trigger
thin-stream mechanisms if enabled by ioctl or sysctl.
Signed-off-by: Andreas Petlund <apetlund@simula.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make remaining netlink policies as const.
Fixup coding style where needed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As reported by Randy Dunlap <randy.dunlap@oracle.com>, compilation
of nf_defrag_ipv4 fails with:
include/net/netfilter/nf_conntrack.h:94: error: field 'ct_general' has incomplete type
include/net/netfilter/nf_conntrack.h:178: error: 'const struct sk_buff' has no member named 'nfct'
include/net/netfilter/nf_conntrack.h:185: error: implicit declaration of function 'nf_conntrack_put'
include/net/netfilter/nf_conntrack.h:294: error: 'const struct sk_buff' has no member named 'nfct'
net/ipv4/netfilter/nf_defrag_ipv4.c:45: error: 'struct sk_buff' has no member named 'nfct'
net/ipv4/netfilter/nf_defrag_ipv4.c:46: error: 'struct sk_buff' has no member named 'nfct'
net/nf_conntrack.h must not be included with NF_CONNTRACK=n, add a
few #ifdefs. Long term the header file should be fixed to be usable
even with NF_CONNTRACK=n.
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Enhance IPVS to load balance SCTP transport protocol packets. This is done
based on the SCTP rfc 4960. All possible control chunks have been taken
care. The state machine used in this code looks some what lengthy. I tried
to make the state machine easy to understand.
Signed-off-by: Venkata Mohan Reddy Koppula <mohanreddykv@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Only used for writing, so convert to spinlock
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add __percpu sparse annotations to net.
These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors. This patch doesn't affect normal builds.
The macro and type tricks around snmp stats make things a bit
interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field
as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All
snmp_mib_*() users which used to cast the argument to (void **) are
updated to cast it to (void __percpu **).
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Stop computing the number of neighbour table settings we have by
counting the number of binary sysctls. This behaviour was silly
and meant that we could not add another neighbour table setting
without also adding another binary sysctl.
Don't pass the binary sysctl path for neighour table entries
into neigh_sysctl_register. These parameters are no longer
used and so are just dead code.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements a new command to register for action frames
that userspace wants to handle instead of the in-kernel
rejection. It is then responsible for rejecting ones that
it decided not to handle. There is no unregistration, but
the socket can be closed for that.
Frames that are not registered for will not be forwarded
to userspace and will be rejected by the kernel, the
cfg80211 API helps implementing that.
Additionally, this patch adds a new command that allows
doing action frame transmission from userspace. It can be
used either to exchange action frames on the current
operational channel (e.g., with the AP with which we are
currently associated) or to exchange off-channel Public
Action frames with the remain-on-channel command.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Normally, each connection needs a unique identity. Conntrack zones allow
to specify a numerical zone using the CT target, connections in different
zones can use the same identity.
Example:
iptables -t raw -A PREROUTING -i veth0 -j CT --zone 1
iptables -t raw -A OUTPUT -o veth1 -j CT --zone 1
Signed-off-by: Patrick McHardy <kaber@trash.net>
The error handlers might need the template to get the conntrack zone
introduced in the next patches to perform a conntrack lookup.
Signed-off-by: Patrick McHardy <kaber@trash.net>
The function name must be followed by a space, hypen, space, and a
short description.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
kernel side should use uxx instead of __uxx types
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the fib size exceeds what can be dumped in a single skb, the
dump is suspended and resumed once the last skb has been received
by userspace. When the fib is changed while the dump is suspended,
the walker might contain stale pointers, causing a crash when the
dump is resumed.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
PGD 5347a067 PUD 65c7067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
...
RIP: 0010:[<ffffffffa01bce04>]
[<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
...
Call Trace:
[<ffffffff8104aca3>] ? mutex_spin_on_owner+0x59/0x71
[<ffffffffa01bd105>] inet6_dump_fib+0x11b/0x1b9 [ipv6]
[<ffffffff81371af4>] netlink_dump+0x5b/0x19e
[<ffffffff8134f288>] ? consume_skb+0x28/0x2a
[<ffffffff81373b69>] netlink_recvmsg+0x1ab/0x2c6
[<ffffffff81372781>] ? netlink_unicast+0xfa/0x151
[<ffffffff813483e0>] __sock_recvmsg+0x6d/0x79
[<ffffffff81348a53>] sock_recvmsg+0xca/0xe3
[<ffffffff81066d4b>] ? autoremove_wake_function+0x0/0x38
[<ffffffff811ed1f8>] ? radix_tree_lookup_slot+0xe/0x10
[<ffffffff810b3ed7>] ? find_get_page+0x90/0xa5
[<ffffffff810b5dc5>] ? filemap_fault+0x201/0x34f
[<ffffffff810ef152>] ? fget_light+0x2f/0xac
[<ffffffff813519e7>] ? verify_iovec+0x4f/0x94
[<ffffffff81349a65>] sys_recvmsg+0x14d/0x223
Store the serial number when beginning to walk the fib and reload
pointers when continuing to walk after a change occured. Similar
to other dumping functions, this might cause unrelated entries to
be missed when entries are deleted.
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove #ifdef at nf_ct_exp_net() by using nf_ct_net().
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
nf_nat_mangle_tcp_packet() can currently only handle a single mangling
per window because it only maintains two sequence adjustment positions:
the one before the last adjustment and the one after.
This patch makes sequence number adjustment tracking in
nf_nat_mangle_tcp_packet() optional and allows a helper to manually
update the offsets after the packet has been fully handled.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Make the output a bit more informative by showing the helper an expectation
belongs to and the expectation class.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Even if the null data frame is not acked by the AP, mac80211
goes into power save. This might lead to loss of frames
from the AP.
Prevent this by restarting dynamic_ps_timer when ack is not
received for null data frames.
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
get_tx_stats() driver operation is not currently used anywhere in mac80211
and there are no plans to use it in the not-so-near future. So it can go
without anyone missing it.
Signed-off-by: Kalle Valo <kalle.valo@iki.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Many drivers would like to sleep during station
addition and removal, and currently have a high
complexity there from not being able to.
This introduces two new callbacks sta_add() and
sta_remove() that drivers can implement instead
of using sta_notify() and that can sleep, and
the new sta_add() callback is also allowed to
fail.
The reason we didn't do this previously is that
the IBSS code wants to insert stations from the
RX path, which is a tasklet, so cannot sleep.
This patch will keep the station allocation in
that path, but moves adding the station to the
driver out of line. Since the addition can now
fail, we can have IBSS peer structs the driver
rejected -- in that case we still talk to the
station but never tell the driver about it in
the control.sta pointer. If there will ever be
a driver that has a low limit on the number of
stations and that cannot talk to any stations
that are not known to it, we need to do come up
with a new strategy of handling larger IBSSs,
maybe quicker expiry or rejecting peers.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Upstream radiotap has adopted the namespace
proposal David Young made and I then took care
of, for which I had adapted the radiotap parser
as a library outside the kernel. This brings
the in-kernel parser up to speed.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash
size is global and not per namespace, but modifiable at runtime through
/sys/module/nf_conntrack/hashsize. Changing the hash size will only
resize the hash in the current namespace however, so other namespaces
will use an invalid hash size. This can cause crashes when enlarging
the hashsize, or false negative lookups when shrinking it.
Move the hash size into the per-namespace data and only use the global
hash size to initialize the per-namespace value when instanciating a
new namespace. Additionally restrict hash resizing to init_net for
now as other namespaces are not handled currently.
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_conntrack_cachep is currently shared by all netns instances, but
because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.
If we use a shared slab cache, one object can instantly flight between
one hash table (netns ONE) to another one (netns TWO), and concurrent
reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
can be fooled without notice, because no RCU grace period has to be
observed between object freeing and its reuse.
We dont have this problem with UDP/TCP slab caches because TCP/UDP
hashtables are global to the machine (and each object has a pointer to
its netns).
If we use per netns conntrack hash tables, we also *must* use per netns
conntrack slab caches, to guarantee an object can not escape from one
namespace to another one.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
[Patrick: added unique slab name allocation]
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add a new target for the raw table, which can be used to specify conntrack
parameters for specific connections, f.i. the conntrack helper.
The target attaches a "template" connection tracking entry to the skb, which
is used by the conntrack core when initializing a new conntrack.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Support initializing selected parameters of new conntrack entries from a
"conntrack template", which is a specially marked conntrack entry attached
to the skb.
Currently the helper and the event delivery masks can be initialized this
way.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add two masks for conntrack end expectation events to struct nf_conntrack_ecache
and use them to filter events. Their default value is "all events" when the
event sysctl is on and "no events" when it is off. A following patch will add
specific initializations. Expectation events depend on the ecache struct of
their master conntrack.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Split up the IPCT_STATUS event into an IPCT_REPLY event, which is generated
when the IPS_SEEN_REPLY bit is set, and an IPCT_ASSURED event, which is
generated when the IPS_ASSURED bit is set.
In combination with a following patch to support selective event delivery,
this can be used for "sparse" conntrack replication: start replicating the
conntrack entry after it reached the ASSURED state and that way it's SYN-flood
resistant.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Make sure not to assign a helper for a different network or transport
layer protocol to a connection.
Additionally change expectation deletion by helper to compare the name
directly - there might be multiple helper registrations using the same
name, currently one of them is chosen in an unpredictable manner and
only those expectations are removed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
When an injected frame gets buffered for a powersave STA or filtered
and retransmitted, mac80211 attempts to parse the radiotap header
again, which doesn't work because it's gone at that point.
This patch adds a new flag for checking the availability of a radiotap
header, so that it only attempts to parse it once, reusing the tx info
on the next call to ieee80211_tx().
This fixes severe issues with rekeying in AP mode.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds a new regulatory hint to be used when we know all
devices have been disconnected and idle. This can happen
when we suspend, for instance. When we disconnect we can
no longer assume the same regulatory rules learned from
a country IE or beacon hints are applicable so restore
regulatory settings to an initial state.
Since driver hints are cached on the wiphy that called
the hint, those hints are not reproduced onto cfg80211
as the wiphy will respect its own wiphy->regd regardless.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds an additional queuing strategy, called pfifo_head_drop,
to remove the oldest skb in the case of an overflow within the queue -
the head element - instead of the last skb (tail). To remove the oldest
skb in congested situations is useful for sensor network environments
where newer packets reflect the superior information.
Reviewed-by: Florian Westphal <fw@strlen.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I'm not sure about rcu stuff near kmem cache destruction:
* checks for non-empty hashes look bogus, they're done _before_
rcu_berrier()
* unregistering netns ops is done before kmem_cache destoy
(as it should), and unregistering involves rcu barriers by itself
So it looks nothing should be done.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because DTIM information is required for powersave
but is only conveyed in beacons, wait for a beacon
before enabling powersave, and change the way the
information is conveyed to the driver accordingly.
mwl8k doesn't currently seem to implement PS but
requires the DTIM period in a different way; after
talking to Lennert we agreed to just have mwl8k do
the parsing itself in the finalize_join work.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This new function (previously a static function
called just "find_ie" can be used to find a
specific IE in a buffer of IEs.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes noticed that I had incorrectly documented the context of
update_tkip_key() driver operation. It must be atomic because all
RX code is run inside rcu critical section.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
GC is non-existent in netns, so after you hit GC threshold, no new
dst entries will be created until someone triggers cleanup in init_net.
Make xfrm4_dst_ops and xfrm6_dst_ops per-netns.
This is not done in a generic way, because it woule waste
(AF_MAX - 2) * sizeof(struct dst_ops) bytes per-netns.
Reorder GC threshold initialization so it'd be done before registering
XFRM policies.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"ip xfrm state|policy count" report SA/SP count from init_net,
not from netns of caller process.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a device has multiple MAC addresses, userspace will
need to know about that. Similarly, if it allows the
MAC addresses to vary by a bitmask.
If a driver exports multiple addresses, it is assumed
that it will be able to deal with that many different
addresses, which need not necessarily match the ones
programmed into the device; if a mask is set then the
device should deal addresses within that mask based
on an arbitrary "base address".
To test it all and show how it is used, add support
to hwsim even though it can't actually deal with
addresses different from the default.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When a TKIP key is updated, we should pass the station
pointer instead of just the address, since drivers can
use that to store their own data. We also need to pass
the virtual interface pointer.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Use macro to define high/low thresh value, refer to IPV6_FRAG_TIMEOUT.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
In an earlier commit,
mac80211: disable software retry for now
Pavel Roskin reported a problem that seems to be due to
software retry of already transmitted frames. It turns
out that we've never done that correctly, but due to
some recent changes it now crashes in the TX code. I've
added a comment in the patch that explains the problem
better and also points to possible solutions -- which
I can't implement right now.
I disabled software retry of failed/filtered frames
because it was broken. With the work of the previous
patches, it now becomes fairly easy to re-enable it
by adding a flag indicating that the frame shouldn't
be modified, but still running it through the transmit
handlers to populate the control information.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Currently we don't increment SYN-ACK timeouts & retransmissions
although we do increment the same stats for SYN. We seem to have lost
the SYN-ACK accounting with the introduction of tcp_syn_recv_timer
(commit 2248761e in the netdev-vger-cvs tree).
This patch fixes this issue. In the process we also rename the v4/v6
syn/ack retransmit functions for clarity. We also add a new
request_socket operations (syn_ack_timeout) so we can keep code in
inet_connection_sock.c protocol agnostic.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wrong ax25_cb refcounting in ax25_send_frame() and by its callers can
cause timer oopses (first reported with 2.6.29.6 kernel).
Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14905
Reported-by: Bernard Pidoux <bpidoux@free.fr>
Tested-by: Bernard Pidoux <bpidoux@free.fr>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There has been some confusion how drivers should implement powersave
support. Improve the documentation a bit to make it more clear what
drivers need to do. Also mention about U-APSD.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There were some warnings about missing documentation and a missing reference.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
net: fix build erros with CONFIG_BUG=n, CONFIG_GENERIC_BUG=n
ipv6: skb_dst() can be NULL in ipv6_hop_jumbo().
tg3: Update copyright and driver version
tg3: Disable 5717 serdes and B0 support
tg3: Add reliable serdes detection for 5717 A0
tg3: Fix std rx prod ring handling
tg3: Fix std prod ring nicaddr for 5787 and 57765
sfc: Fix conditions for MDIO self-test
sfc: Fix polling for slow MCDI operations
e1000e: workaround link issues on busy hub in half duplex on 82577/82578
e1000e: MDIO slow mode should always be done for 82577
ixgbe: update copyright dates
ixgbe: Do not attempt to perform interrupts in netpoll when down
cfg80211: fix refcount imbalance when wext is disabled
mac80211: fix queue selection for data frames on monitor interfaces
iwlwifi: silence buffer overflow warning
iwlwifi: disable tx on beacon update notification
iwlwifi: fix iwl_queue_used bug when read_ptr == write_ptr
mac80211: fix endian error
mac80211: add missing sanity checks for action frames
...
Fixed build errors introduced by commit 7ad6848c (ip: fix mc_loop
checks for tunnels with multicast outer addresses)
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
sky2: Fix oops in sky2_xmit_frame() after TX timeout
Documentation/3c509: document ethtool support
af_packet: Don't use skb after dev_queue_xmit()
vxge: use pci_dma_mapping_error to test return value
netfilter: ebtables: enforce CAP_NET_ADMIN
e1000e: fix and commonize code for setting the receive address registers
e1000e: e1000e_enable_tx_pkt_filtering() returns wrong value
e1000e: perform 10/100 adaptive IFS only on parts that support it
e1000e: don't accumulate PHY statistics on PHY read failure
e1000e: call pci_save_state() after pci_restore_state()
netxen: update version to 4.0.72
netxen: fix set mac addr
netxen: fix smatch warning
netxen: fix tx ring memory leak
tcp: update the netstamp_needed counter when cloning sockets
TI DaVinci EMAC: Handle emac module clock correctly.
dmfe/tulip: Let dmfe handle DM910x except for SPARC on-board chips
ixgbe: Fix compiler warning about variable being used uninitialized
netfilter: nf_ct_ftp: fix out of bounds read in update_nl_seq()
mv643xx_eth: don't include cache padding in rx desc buffer size
...
Fix trivial conflict in drivers/scsi/cxgb3i/cxgb3i_offload.c
Add Unscheduled Automatic Power-Save Delivery (U-APSD) client support. The
idea is that the data frames from the client trigger AP to send the buffered
frames with ACs which have U-APSD enabled. This decreases latency and makes it
possible to save even more power.
Driver needs to use IEEE80211_HW_UAPSD to enable the feature. The current
implementation assumes that firmware takes care of the wakeup and
hardware needing IEEE80211_HW_PS_NULLFUNC_STACK is not yet supported.
Tested with wl1251 on a Nokia N900 and Cisco Aironet 1231G AP and running
various test traffic with ping.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Store information elements from Beacon and Probe Response frames in
separate buffers to allow both sets to be made available through
nl80211. This allows user space applications to get access to IEs from
Beacon frames even if we have received Probe Response frames from the
BSS. Previously, the IEs from Probe Response frames would have
overridden the IEs from Beacon frames.
This feature is of somewhat limited use since most protocols include
the same (or extended) information in Probe Response frames. However,
there are couple of exceptions where the IEs from Beacon frames could
be of some use: TIM IE is only included in Beacon frames (and it would
be needed to figure out the DTIM period used in the BSS) and at least
some implementations of Wireless Provisioning Services seem to include
the full IE only in Beacon frames).
The new BSS attribute for scan results is added to allow both the IE
sets to be delivered. This is done in a way that maintains the
previously used behavior for applications that are not aware of the
new NL80211_BSS_BEACON_IES attribute.
Signed-off-by: Jouni Malinen <j@w1.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Certain type of hardware, for example wl1251 and wl1271, need a template
for the Probe Request. Create a function ieee80211_probereq_get() which
creates the template and drivers send it to hardware.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some hardware, for example wl1251 and wl1271, handle the transmission
of power save related frames in hardware, but the driver is responsible
for creating the templates. It's better to create the templates in mac80211,
that way all drivers can benefit from this.
Add two new functions, ieee80211_pspoll_get() and ieee80211_nullfunc_get()
which drivers need to call to get the frame. Drivers are also responsible
for updating the templates after each association.
Also new struct ieee80211_hdr_3addr is added to ieee80211.h to make it
easy to calculate length of the Nullfunc frame.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add a new NL80211_CMD_SET_TX_BITRATE_MASK command and related
attributes to provide support for setting TX rate mask for rate
control. This uses the existing cfg80211 set_bitrate_mask operation
that was previously used only with WEXT compat code (SIOCSIWRATE). The
nl80211 command allows more generic configuration of allowed rates as
a mask instead of fixed/max rate.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Extend struct cfg80211_bitrate_mask to actually use a bitfield mask
instead of just a single fixed or maximum rate index. This change
itself does not modify the behavior (except for debugfs files), but it
prepares cfg80211 and mac80211 for a new nl80211 command for setting
which rates can be used in TX rate control.
Since frames are now going through the rate control algorithm
unconditionally, the internal IEEE80211_TX_INTFL_RCALGO flag can now
be removed. The RC implementations can use the rate_idx_mask value to
optimize their behavior if only a single rate is enabled.
The old max_rate_idx in struct ieee80211_tx_rate_control is maintained
(but commented as deprecated) for backwards compatibility with existing
RC implementations. Once these implementations have been updated to
use the more generic rate_idx_mask, the max_rate_idx value can be
removed.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If the basic rate set is configured to not include the lowest rate
(e.g., basic rate set = 6, 12, 24 Mbps in IEEE 802.11g mode), the AP
should not send out broadcast frames at 1 Mbps. This type of
configuration can be used to optimize channel usage in cases where
there is no need for backwards compatibility with IEEE 802.11b-only
devices.
In AP mode, mac80211 was unconditionally using the lowest rate for
Beacon frames and similarly, with all rate control algorithms that use
rate_control_send_low(), the lowest rate ended up being used for all
broadcast frames (and all unicast frames that are sent before
association). Change this to take into account the basic rate
configuration in AP mode, i.e., use the lowest rate in the basic rate
set instead of the lowest supported rate when selecting the rate.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Mac80211 callback to driver set_coverage_class() sets slot time and ACK
timeout for given IEEE 802.11 coverage class. The callback is optional,
but it's essential for long distance links.
Signed-off-by: Lukas Turek <8an@praha12.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The new attribute NL80211_ATTR_WIPHY_COVERAGE_CLASS sets IEEE 802.11
Coverage Class, which depends on maximum distance of nodes in a
wireless network. It's required for long distance links (more than a few
hundred meters).
The attribute is now ignored by two non-mac80211 drivers, rndis and
iwmc3200wifi, together with WIPHY_PARAM_RETRY_SHORT and
WIPHY_PARAM_RETRY_LONG. If it turns out to be a problem, we could split
set_wiphy_params callback or add new capability bits.
Signed-off-by: Lukas Turek <8an@praha12.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds the kernel portions needed to implement
RFC 5082 Generalized TTL Security Mechanism (GTSM).
It is a lightweight security measure against forged
packets causing DoS attacks (for BGP).
This is already implemented the same way in BSD kernels.
For the necessary Quagga patch
http://www.gossamer-threads.com/lists/quagga/dev/17389
Description from Cisco
http://www.cisco.com/en/US/docs/ios/12_3t/12_3t7/feature/guide/gt_btsh.html
It does add one byte to each socket structure, but I did
a little rearrangement to reuse a hole (on 64 bit), but it
does grow the structure on 32 bit
This should be documented on ip(4) man page and the Glibc in.h
file also needs update. IPV6_MINHOPLIMIT should also be added
(although BSD doesn't support that).
Only TCP is supported, but could also be added to UDP, DCCP, SCTP
if desired.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send aligned pipe payload if requested to do so. Then, the socket buffer
needs not be fragmented anymore.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer Nokia cellular modems can use aligned payload for their GPRS pipe.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we have L3 tunnels with different inner/outer families
(i.e. IPV4/IPV6) which use a multicast address as the outer tunnel
destination address, multicast packets will be loopbacked back to the
sending socket even if IP*_MULTICAST_LOOP is set to disabled.
The mc_loop flag is present in the family specific part of the socket
(e.g. the IPv4 or IPv4 specific part). setsockopt sets the inner
family mc_loop flag. When the packet is pushed through the L3 tunnel
it will eventually be processed by the outer family which if different
will check the flag in a different part of the socket then it was set.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I was very frustrated about the fact that I have to recompile the kernel
to change the hash size. So, I created this patch.
If IPVS is built-in you can append ip_vs.conn_tab_bits=?? to kernel
command line, or, if you built IPVS as modules, you can add
options ip_vs conn_tab_bits=??.
To keep everything backward compatible, you still can select the size at
compile time, and that will be used as default.
It has been about a year since this patch was originally posted
and subsequently dropped on the basis of insufficient test data.
Mark Bergsma has provided the following test results which seem
to strongly support the need for larger hash table sizes:
We do however run into the same problem with the default setting (212 =
4096 entries), as most of our LVS balancers handle around a million
connections/SLAB entries at any point in time (around 100-150 kpps
load). With only 4096 hash table entries this implies that each entry
consists of a linked list of 256 connections *on average*.
To provide some statistics, I did an oprofile run on an 2.6.31 kernel,
with both the default 4096 table size, and the same kernel recompiled
with IP_VS_CONN_TAB_BITS set to 18 (218 = 262144 entries). I built a
quick test setup with a part of Wikimedia/Wikipedia's live traffic
mirrored by the switch to the test host.
With the default setting, at ~ 120 kpps packet load we saw a typical %si
CPU usage of around 30-35%, and oprofile reported a hot spot in
ip_vs_conn_in_get:
samples % image name app name
symbol name
1719761 42.3741 ip_vs.ko ip_vs.ko ip_vs_conn_in_get
302577 7.4554 bnx2 bnx2 /bnx2
181984 4.4840 vmlinux vmlinux __ticket_spin_lock
128636 3.1695 vmlinux vmlinux ip_route_input
74345 1.8318 ip_vs.ko ip_vs.ko ip_vs_conn_out_get
68482 1.6874 vmlinux vmlinux mwait_idle
After loading the recompiled kernel with 218 entries, %si CPU usage
dropped in half to around 12-18%, and oprofile looks much healthier,
with only 7% spent in ip_vs_conn_in_get:
samples % image name app name
symbol name
265641 14.4616 bnx2 bnx2 /bnx2
143251 7.7986 vmlinux vmlinux __ticket_spin_lock
140661 7.6576 ip_vs.ko ip_vs.ko ip_vs_conn_in_get
94364 5.1372 vmlinux vmlinux mwait_idle
86267 4.6964 vmlinux vmlinux ip_route_input
[ horms@verge.net.au: trivial up-port and minor style fixes ]
Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
Cc: Mark Bergsma <mark@wikimedia.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (74 commits)
Revert "b43: Enforce DMA descriptor memory constraints"
iwmc3200wifi: fix array out-of-boundary access
wl1251: timeout one too soon in wl1251_boot_run_firmware()
mac80211: fix propagation of failed hardware reconfigurations
mac80211: fix race with suspend and dynamic_ps_disable_work
ath9k: fix missed error codes in the tx status check
ath9k: wake hardware during AMPDU TX actions
ath9k: wake hardware for interface IBSS/AP/Mesh removal
ath9k: fix suspend by waking device prior to stop
cfg80211: fix error path in cfg80211_wext_siwscan
wl1271_cmd.c: cleanup char => u8
iwlwifi: Storage class should be before const qualifier
ath9k: Storage class should be before const qualifier
cfg80211: fix race between deauth and assoc response
wireless: remove remaining qual code
rt2x00: Add USB ID for Linksys WUSB 600N rev 2.
ath5k: fix SWI calibration interrupt storm
mac80211: fix ibss join with fixed-bssid
libertas: Remove carrier signaling from the scan code
orinoco: fix GFP_KERNEL in orinoco_set_key with interrupts disabled
...
To make it easier to notice cases of calling sleeping ops in atomic context,
annotate driver-ops.h with appropiate might_sleep() calls. At the same time,
also document in mac80211.h the op functions with missing contexts.
mac80211 doesn't seem to use get_tx_stats anywhere currently. Just to be on
the safe side, I documented it to be atomic, but hopefully the op can be
removed in the future.
Compile-tested only.
Signed-off-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
All its members (vif, mac_addr, type) are now available
in the vif struct directly, so we can pass that instead
of the conf struct. I generated this patch (except the
mac80211 and header file changes) with this semantic
patch:
@@
identifier conf, fn, hw;
type tp;
@@
tp fn(struct ieee80211_hw *hw,
-struct ieee80211_if_init_conf *conf)
+struct ieee80211_vif *vif)
{
<...
(
-conf->type
+vif->type
|
-conf->mac_addr
+vif->addr
|
-conf->vif
+vif
)
...>
}
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When, for instance, a new IBSS peer is found, userspace
wants to be notified. Add events for all new stations
that mac80211 learns about.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add new commands for requesting the driver to remain awake
on a specified channel for the specified amount of time
(and another command to cancel such an operation). This
can be used to implement userspace-controlled off-channel
operations, like Public Action frame exchange on another
channel than the operation channel.
The off-channel operation should behave similarly to scan,
i.e. the local station (if associated) moves into power
save mode to request the AP to buffer frames for it and
then moves to the other channel to allow the off-channel
operation to be completed. The duration parameter can be
used to request enough time to receive a response from
the target station.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We've long lacked a good confirmation that frames
have really gone out, e.g. before going off-channel
for a scan. Add a flush() operation that drivers
can implement to provide that confirmation, and use
it in a few places:
* before scanning sends the nullfunc frames
* after scanning sends the nullfunc frames, if any
* when going idle, to send any pending frames
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This removes the remaining users of the rx status
'qual' field and the field itself.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For the cases where a lot of interfaces are used in conjunction with a
lot of LLC sockets bound to the same SAP, the iteration of the socket
list becomes prohibitively expensive.
Replacing the list with a a local address based hash significantly
improves the bind and listener lookup operations as well as the
datagram delivery.
Connected sockets delivery is also improved, but this patch does not
address the case where we have lots of sockets with the same local
address connected to different remote addresses.
In order to keep the socket sanity checks alive and fast a socket
counter was added to the SAP structure.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a per SAP device based hash table to solve the
multicast delivery scalability issue when we have large number of
interfaces and a large number of sockets bound to the same SAP.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the reclamation phase we use the SLAB_DESTROY_BY_RCU mechanism,
which require some extra checks in the lookup code:
a) If the current socket was released, reallocated & inserted in
another list it will short circuit the iteration for the current list,
thus we need to restart the lookup.
b) If the current socket was released, reallocated & inserted in the
same list we just need to recheck it matches the look-up criteria and
if not we can skip to the next element.
In this case there is no need to restart the lookup, since sockets are
inserted at the start of the list and the worst that will happen is
that we will iterate throught some of the list elements more then
once.
Note that the /proc and multicast delivery was not yet converted to
RCU, it still uses spinlocks for protection.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rtnetlink init_rcvwnd to set the TCP initial receive window size
advertised by passive and active TCP connections.
The current Linux TCP implementation limits the advertised TCP initial
receive window to the one prescribed by slow start. For short lived
TCP connections used for transaction type of traffic (i.e. http
requests), bounding the advertised TCP initial receive window results
in increased latency to complete the transaction.
Support for setting initial congestion window is already supported
using rtnetlink init_cwnd, but the feature is useless without the
ability to set a larger TCP initial receive window.
The rtnetlink init_rcvwnd allows increasing the TCP initial receive
window, allowing TCP connection to advertise larger TCP receive window
than the ones bounded by slow start.
Signed-off-by: Laurent Chavey <chavey@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_push checks tcp_send_head and calls __tcp_push_pending_frames,
which again checks tcp_send_head, and this unnecessary check is
done for every other caller of __tcp_push_pending_frames.
Remove tcp_send_head check in __tcp_push_pending_frames and add
the check to tcp_push_pending_frames. Other functions call
__tcp_push_pending_frames only when tcp_send_head would evaluate
to true.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable spatial multiplexing in mac80211 by telling the
driver what to do and, where necessary, sending action
frames to the AP to update the requested SMPS mode.
Also includes a trivial implementation for hwsim that
just logs the requested mode.
For now, the userspace interface is in debugfs only,
and let you toggle the requested mode at any time.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Move the A-MSDU handling code from mac80211 to cfg80211 so that more
drivers can use it. The new created function ieee80211_amsdu_to_8023s
converts an A-MSDU frame to a list of 802.3 frames.
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For bluetooth 3, we will most likely not have
a netdev for a virtual interface (sdata), so
prepare for that by reducing the reliance on
having a netdev. This patch moves the name
and address fields into the sdata struct and
uses them from there all over. Some work is
needed to keep them sync'ed, but that's not
a lot of work and in slow paths anyway.
In doing so, this also reduces the number of
pointer dereferences in many places, because
of things like sdata->dev->dev_addr becoming
sdata->vif.addr.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
net: sh_eth alignment fix for sh7724 using NET_IP_ALIGN V2
ixgbe: allow tx of pre-formatted vlan tagged packets
ixgbe: Fix 82598 premature copper PHY link indicatation
ixgbe: Fix tx_restart_queue/non_eop_desc statistics counters
bcm63xx_enet: fix compilation failure after get_stats_count removal
packet: dont call sleeping functions while holding rcu_read_lock()
tcp: Revert per-route SACK/DSACK/TIMESTAMP changes.
ipvs: zero usvc and udest
netfilter: fix crashes in bridge netfilter caused by fragment jumps
ipv6: reassembly: use seperate reassembly queues for conntrack and local delivery
sky2: leave PCI config space writeable
sky2: print Optima chip name
x25: Update maintainer.
ipvs: fix synchronization on connection close
netfilter: xtables: document minimal required version
drivers/net/bonding/: : use pr_fmt
can: CAN_MCP251X should depend on HAS_DMA
drivers/net/usb: Correct code taking the size of a pointer
drivers/net/cpmac.c: Correct code taking the size of a pointer
drivers/net/sfc: Correct code taking the size of a pointer
...
It creates a regression, triggering badness for SYN_RECV
sockets, for example:
[19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
[19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
[19148.023035] REGS: eeecbd30 TRAP: 0700 Not tainted (2.6.32)
[19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR> CR: 24002442 XER: 00000000
[19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000
This is likely caused by the change in the 'estab' parameter
passed to tcp_parse_options() when invoked by the functions
in net/ipv4/tcp_minisocks.c
But even if that is fixed, the ->conn_request() changes made in
this patch series is fundamentally wrong. They try to use the
listening socket's 'dst' to probe the route settings. The
listening socket doesn't even have a route, and you can't
get the right route (the child request one) until much later
after we setup all of the state, and it must be done by hand.
This stuff really isn't ready, so the best thing to do is a
full revert. This reverts the following commits:
f55017a93f022c3f7d821aba721ebacda42ebd67345cda2fd6dc343475ed05eaade2786a2a2d6bf8
Signed-off-by: David S. Miller <davem@davemloft.net>
When fragments from bridge netfilter are passed to IPv4 or IPv6 conntrack
and a reassembly queue with the same fragment key already exists from
reassembling a similar packet received on a different device (f.i. with
multicasted fragments), the reassembled packet might continue on a different
codepath than where the head fragment originated. This can cause crashes
in bridge netfilter when a fragment received on a non-bridge device (and
thus with skb->nf_bridge == NULL) continues through the bridge netfilter
code.
Add a new reassembly identifier for packets originating from bridge
netfilter and use it to put those packets in insolated queues.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=14805
Reported-and-Tested-by: Chong Qiao <qiaochong@loongson.cn>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Currently the same reassembly queue might be used for packets reassembled
by conntrack in different positions in the stack (PREROUTING/LOCAL_OUT),
as well as local delivery. This can cause "packet jumps" when the fragment
completing a reassembled packet is queued from a different position in the
stack than the previous ones.
Add a "user" identifier to the reassembly queue key to seperate the queues
of each caller, similar to what we do for IPv4.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add a definition of the amount of TX headroom reserved by mac80211 itself
for its own purposes. Also add BUILD_BUG_ON to validate the value.
This define can then be used by drivers to request additional TX headroom
in the most efficient manner.
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
m68k: rename global variable vmalloc_end to m68k_vmalloc_end
percpu: add missing per_cpu_ptr_to_phys() definition for UP
percpu: Fix kdump failure if booted with percpu_alloc=page
percpu: make misc percpu symbols unique
percpu: make percpu symbols in ia64 unique
percpu: make percpu symbols in powerpc unique
percpu: make percpu symbols in x86 unique
percpu: make percpu symbols in xen unique
percpu: make percpu symbols in cpufreq unique
percpu: make percpu symbols in oprofile unique
percpu: make percpu symbols in tracer unique
percpu: make percpu symbols under kernel/ and mm/ unique
percpu: remove some sparse warnings
percpu: make alloc_percpu() handle array types
vmalloc: fix use of non-existent percpu variable in put_cpu_var()
this_cpu: Use this_cpu_xx in trace_functions_graph.c
this_cpu: Use this_cpu_xx for ftrace
this_cpu: Use this_cpu_xx in nmi handling
this_cpu: Use this_cpu operations in RCU
this_cpu: Use this_cpu ops for VM statistics
...
Fix up trivial (famous last words) global per-cpu naming conflicts in
arch/x86/kvm/svm.c
mm/slab.c
compat_sys_recvmmsg has a compat_timespec parameter and not a
timespec parameter. This way we also get rid of an odd cast.
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves retransmits_timed_out() from include/net/tcp.h
to tcp_timer.c, where it is used.
Reported-by: Frederic Leroy <fredo@starox.org>
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a problem in the TCP connection timeout calculation.
Currently, timeout decisions are made on the basis of the current
tcp_time_stamp and retrans_stamp, which is usually set at the first
retransmission.
However, if the retransmission fails in tcp_retransmit_skb(),
retrans_stamp is not updated and remains zero. This leads to wrong
decisions in retransmits_timed_out() if tcp_time_stamp is larger than
the specified timeout, which is very likely.
In this case, the TCP connection dies after the first attempted
(and unsuccessful) retransmission.
With this patch, tcp_skb_cb->when is used instead, when retrans_stamp
is not available.
This bug has been introduced together with retransmits_timed_out() in
2.6.32, as the number of retransmissions has been used for timeout
decisions before. The corresponding commit was
6fa12c8503 (Revert Backoff [v3]:
Calculate TCP's connection close threshold as a time value.).
Thanks to Ilpo Järvinen for code suggestions and Frederic Leroy for
testing.
Reported-by: Frederic Leroy <fredo@starox.org>
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we find a timewait connection in __inet_hash_connect() and reuse
it for a new connection request, we have a race window, releasing bind
list lock and reacquiring it in __inet_twsk_kill() to remove timewait
socket from list.
Another thread might find the timewait socket we already chose, leading to
list corruption and crashes.
Fix is to remove timewait socket from bind list before releasing the bind lock.
Note: This problem happens if sysctl_tcp_tw_reuse is set.
Reported-by: kapil dakhane <kdakhane@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
First patch changes __inet_hash_nolisten() and __inet6_hash()
to get a timewait parameter to be able to unhash it from ehash
at same time the new socket is inserted in hash.
This makes sure timewait socket wont be found by a concurrent
writer in __inet_check_established()
Reported-by: kapil dakhane <kdakhane@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
mac80211: fix reorder buffer release
iwmc3200wifi: Enable wimax core through module parameter
iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
iwmc3200wifi: Coex table command does not expect a response
iwmc3200wifi: Update wiwi priority table
iwlwifi: driver version track kernel version
iwlwifi: indicate uCode type when fail dump error/event log
iwl3945: remove duplicated event logging code
b43: fix two warnings
ipw2100: fix rebooting hang with driver loaded
cfg80211: indent regulatory messages with spaces
iwmc3200wifi: fix NULL pointer dereference in pmkid update
mac80211: Fix TX status reporting for injected data frames
ath9k: enable 2GHz band only if the device supports it
airo: Fix integer overflow warning
rt2x00: Fix padding bug on L2PAD devices.
WE: Fix set events not propagated
b43legacy: avoid PPC fault during resume
b43: avoid PPC fault during resume
tcp: fix a timewait refcnt race
...
Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
kernel/sysctl_check.c
net/ipv4/sysctl_net_ipv4.c
net/ipv6/addrconf.c
net/sctp/sysctl.c
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
security/tomoyo: Remove now unnecessary handling of security_sysctl.
security/tomoyo: Add a special case to handle accesses through the internal proc mount.
sysctl: Drop & in front of every proc_handler.
sysctl: Remove CTL_NONE and CTL_UNNUMBERED
sysctl: kill dead ctl_handler definitions.
sysctl: Remove the last of the generic binary sysctl support
sysctl net: Remove unused binary sysctl code
sysctl security/tomoyo: Don't look at ctl_name
sysctl arm: Remove binary sysctl support
sysctl x86: Remove dead binary sysctl support
sysctl sh: Remove dead binary sysctl support
sysctl powerpc: Remove dead binary sysctl support
sysctl ia64: Remove dead binary sysctl support
sysctl s390: Remove dead sysctl binary support
sysctl frv: Remove dead binary sysctl support
sysctl mips/lasat: Remove dead binary sysctl support
sysctl drivers: Remove dead binary sysctl support
sysctl crypto: Remove dead binary sysctl support
sysctl security/keys: Remove dead binary sysctl support
sysctl kernel: Remove binary sysctl logic
...
Its currently possible that several threads issuing a connect() find
the same timewait socket and try to reuse it, leading to list
corruptions.
Condition for bug is that these threads bound their socket on same
address/port of to-be-find timewait socket, and connected to same
target. (SO_REUSEADDR needed)
To fix this problem, we could unhash timewait socket while holding
ehash lock, to make sure lookups/changes will be serialized. Only
first thread finds the timewait socket, other ones find the
established socket and return an EADDRNOTAVAIL error.
This second version takes into account Evgeniy's review and makes sure
inet_twsk_put() is called outside of locked sections.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function walks the whole hashtable so there is no point in
passing it a network namespace. Instead I purge all timewait
sockets from dead network namespaces that I find. If the namespace
is one of the once I am trying to purge I am guaranteed no new timewait
sockets can be formed so this will get them all. If the namespace
is one I am not acting for it might form a few more but I will
call inet_twsk_purge again and shortly to get rid of them. In
any even if the network namespace is dead timewait sockets are
useless.
Move the calls of inet_twsk_purge into batch_exit routines so
that if I am killing a bunch of namespaces at once I will just
call inet_twsk_purge once and save a lot of redundant unnecessary
work.
My simple 4k network namespace exit test the cleanup time dropped from
roughly 8.2s to 1.6s. While the time spent running inet_twsk_purge fell
to about 2ms. 1ms for ipv4 and 1ms for ipv6.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor the code so fib_rules_register always takes a template instead
of the actual fib_rules_ops structure that will be used. This is
required for network namespace support so 2 out of the 3 callers already
do this, it allows the error handling to be made common, and it allows
fib_rules_unregister to free the template for hte caller.
Modify fib_rules_unregister to use call_rcu instead of syncrhonize_rcu
to allw multiple namespaces to be cleaned up in the same rcu grace
period.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xfrm.nlsk is provided by the xfrm_user module and is access via rcu from
other parts of the xfrm code. Add xfrm.nlsk_stash a copy of xfrm.nlsk that
will never be set to NULL. This allows the synchronize_net and
netlink_kernel_release to be deferred until a whole batch of xfrm.nlsk sockets
have been set to NULL.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add exit_list to struct net to support building lists of network
namespaces to cleanup.
- Add exit_batch to pernet_operations to allow running operations only
once during a network namespace exit. Instead of once per network
namespace.
- Factor opt ops_exit_list and ops_exit_free so the logic with cleanup
up a network namespace does not need to be duplicated.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 68144d350f4f6c348659c825cde6a82b34c27a91
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:05:25 2009 +0100
net: fib_rules: add oif classification
Support routing table lookup based on the flow's oif. This is useful to
classify packets originating from sockets bound to interfaces differently.
The route cache already includes the oif and needs no changes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 229e77eec406ad68662f18e49fda8b5d366768c5
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:05:23 2009 +0100
net: fib_rules: rename ifindex/ifname/FRA_IFNAME to iifindex/iifname/FRA_IIFNAME
The next patch will add oif classification, rename interface related members
and attributes to reflect that they're used for iif classification.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit b8952893d5d86f69c4e499d191b98c6658f64b0f
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:05:22 2009 +0100
net: fib_rules: rearrange struct fib_rule
The ifname member is only used to resolve interface names and is not needed
during rule lookups. The target and ctarget members however are used during
rule lookups and are currently located in a second cacheline.
Move ifname further to the end to make sure both target and ctarget are
located in the same cacheline as other members used during rule lookups.
The layout on 64 bit changes from:
struct fib_rule {
...
u32 table; /* 56 4 */
u8 action; /* 60 1 */
/* XXX 3 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
u32 target; /* 64 4 */
/* XXX 4 bytes hole, try to pack */
struct fib_rule * ctarget; /* 72 8 */
struct rcu_head rcu; /* 80 16 */
struct net * fr_net; /* 96 8 */
};
to:
struct fib_rule {
...
u32 table; /* 40 4 */
u8 action; /* 44 1 */
/* XXX 3 bytes hole, try to pack */
u32 target; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
struct fib_rule * ctarget; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
char ifname[16]; /* 64 16 */
struct rcu_head rcu; /* 80 16 */
struct net * fr_net; /* 96 8 */
};
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
RejActioned is used to prevent retransmission when a entity is on the
WAIT_F state, i.e., waiting for a frame with F-bit set due local busy
condition or a expired retransmission timer. (When these two events raise
they send a frame with the Poll bit set and enters in the WAIT_F state to
wait for a frame with the Final bit set.)
The local entity doesn't send I-frames(the data frames) until the receipt
of a frame with F-bit set. When that happens it also set RejActioned to false.
RejActioned is a mandatory feature of ERTM spec.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
As specified by ERTM spec an ERTM channel can acknowledge received
I-frames(the data frames) by sending an I-frame with the proper ReqSeq
value (i.e. ReqSeq is set to BufferSeq). Until now we aren't setting the
ReqSeq value on I-frame control bits. That way we can save sending
S-frames(Supervise frames) only to acknowledge receipt of I-frames. It
is very helpful to the full-duplex channel.
ReqSeq is the packet sequence number sent in an acknowledgement frame to
acknowledge receipt of frames up to (ReqSeq - 1).
BufferSeq controls the receiver buffer, it is used to delay
acknowledgement of new frames to not cause buffer overflow. BufferSeq
value is not increased until frames are pulled by reassembly function.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The tasklet schedule function helpers are just an obfuscation. So remove
them and call the schedule functions directly.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For future simplification it is important that the hci_recv_frame
function is no longer an inline function. So move it into the module
itself and export it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Eric Dumazet mentioned in a context of another problem:
"Well, it seems NFS reuses its socket, so maybe we miss some
cleaning as spotted in this old patch"
I've not check under which conditions that actually happens but
if true, we need to make sure we don't accidently leave stale
hints behind when the write queue had to be purged (whether reusing
with NFS can actually happen if purging took place is something I'm
not sure of).
...At least it compiles.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parse incoming TCP_COOKIE option(s).
Calculate <SYN,ACK> TCP_COOKIE option.
Send optional <SYN,ACK> data.
This is a significantly revised implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):
http://thread.gmane.org/gmane.linux.network/102586
Requires:
TCPCT part 1a: add request_values parameter for sending SYNACK
TCPCT part 1b: generate Responder Cookie secret
TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
TCPCT part 1d: define TCP cookie option, extend existing struct's
TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS
TCPCT part 1f: Initiator Cookie => Responder
Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Data structures are carefully composed to require minimal additions.
For example, the struct tcp_options_received cookie_plus variable fits
between existing 16-bit and 8-bit variables, requiring no additional
space (taking alignment into consideration). There are no additions to
tcp_request_sock, and only 1 pointer in tcp_sock.
This is a significantly revised implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):
http://thread.gmane.org/gmane.linux.network/102586
The principle difference is using a TCP option to carry the cookie nonce,
instead of a user configured offset in the data. This is more flexible and
less subject to user configuration error. Such a cookie option has been
suggested for many years, and is also useful without SYN data, allowing
several related concepts to use the same extension option.
"Re: SYN floods (was: does history repeat itself?)", September 9, 1996.
http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html
"Re: what a new TCP header might look like", May 12, 1998.
ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail
These functions will also be used in subsequent patches that implement
additional features.
Requires:
TCPCT part 1a: add request_values parameter for sending SYNACK
TCPCT part 1b: generate Responder Cookie secret
TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Define sysctl (tcp_cookie_size) to turn on and off the cookie option
default globally, instead of a compiled configuration option.
Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant
data values, retrieving variable cookie values, and other facilities.
Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h,
near its corresponding struct tcp_options_received (prior to changes).
This is a straightforward re-implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):
http://thread.gmane.org/gmane.linux.network/102586
These functions will also be used in subsequent patches that implement
additional features.
Requires:
net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED
Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Define (missing) hash message size for SHA1.
Define hashing size constants specific to TCP cookies.
Add new function: tcp_cookie_generator().
Maintain global secret values for tcp_cookie_generator().
This is a significantly revised implementation of earlier (15-year-old)
Photuris [RFC-2522] code for the KA9Q cooperative multitasking platform.
Linux RCU technique appears to be well-suited to this application, though
neither of the circular queue items are freed.
These functions will also be used in subsequent patches that implement
additional features.
Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add optional function parameters associated with sending SYNACK.
These parameters are not needed after sending SYNACK, and are not
used for retransmission. Avoids extending struct tcp_request_sock,
and avoids allocating kernel memory.
Also affects DCCP as it uses common struct request_sock_ops,
but this parameter is currently reserved for future use.
Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No that all of the callers have been updated to set fields in
struct pernet_operations, and simplified to let the network
namespace core handle the allocation and freeing of the storage
for them, remove the surpurpflous methods and update the docs
to the new style.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To get the full benefit of batched network namespace cleanup netowrk
device deletion needs to be performed by the generic code. When
using register_pernet_gen_device and freeing the data in exit_net
it is impossible to delay allocation until after exit_net has called
as the device uninit methods are no longer safe.
To correct this, and to simplify working with per network namespace data
I have moved allocation and deletion of per network namespace data into
the network namespace core. The core now frees the data only after
all of the network namespace exit routines have run.
Now it is only required to set the new fields .id and .size
in the pernet_operations structure if you want network namespace
data to be managed for you automatically.
This makes the current register_pernet_gen_device and
register_pernet_gen_subsys routines unnecessary. For the moment
I have left them as compatibility wrappers in net_namespace.h
They will be removed once all of the users have been updated.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is fairly common to kill several network namespaces at once. Either
because they are nested one inside the other or because they are cooperating
in multiple machine networking experiments. As the network stack control logic
does not parallelize easily batch up multiple network namespaces existing
together.
To get the full benefit of batching the virtual network devices to be
removed must be all removed in one batch. For that purpose I have added
a loop after the last network device operations have run that batches
up all remaining network devices and deletes them.
An extra benefit is that the reorganization slightly shrinks the size
of the per network namespace data structures replaceing a work_struct
with a list_head.
In a trivial test with 4K namespaces this change reduced the cost of
a destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds
(at 60% cpu). The bulk of that 44s was spent in inet_twsk_purge.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The motivation for an additional notifier in batched netdevice
notification (rt_do_flush) only needs to be called once per batch not
once per namespace.
For further batching improvements I need a guarantee that the
netdevices are unregistered in order allowing me to unregister an all
of the network devices in a network namespace at the same time with
the guarantee that the loopback device is really and truly
unregistered last.
Additionally it appears that we moved the route cache flush after
the final synchronize_net, which seems wrong and there was no
explanation. So I have restored the original location of the final
synchronize_net.
Cc: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lennert Buytenhek noticed that delBA handling in mac80211
was broken and has remotely triggerable problems, some of
which are due to some code shuffling I did that ended up
changing the order in which things were done -- this was
commit d75636ef9c
Author: Johannes Berg <johannes@sipsolutions.net>
Date: Tue Feb 10 21:25:53 2009 +0100
mac80211: RX aggregation: clean up stop session
and other parts were already present in the original
commit d92684e660
Author: Ron Rindjunsky <ron.rindjunsky@intel.com>
Date: Mon Jan 28 14:07:22 2008 +0200
mac80211: A-MPDU Tx add delBA from recipient support
The first problem is that I moved a BUG_ON before various
checks -- thereby making it possible to hit. As the comment
indicates, the BUG_ON can be removed since the ampdu_action
callback must already exist when the state is != IDLE.
The second problem isn't easily exploitable but there's a
race condition due to unconditionally setting the state to
OPERATIONAL when a delBA frame is received, even when no
aggregation session was ever initiated. All the drivers
accept stopping the session even then, but that opens a
race window where crashes could happen before the driver
accepts it. Right now, a WARN_ON may happen with non-HT
drivers, while the race opens only for HT drivers.
For this case, there are two things necessary to fix it:
1) don't process spurious delBA frames, and be more careful
about the session state; don't drop the lock
2) HT drivers need to be prepared to handle a session stop
even before the session was really started -- this is
true for all drivers (that support aggregation) but
iwlwifi which can be fixed easily. The other HT drivers
(ath9k and ar9170) are behaving properly already.
Reported-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Moves the CONFIG_SYSCTL ifdefs in x25_init into header.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When retransmitting due to T3 timeout, retransmit all the
in-flight chunks for the corresponding transport/path, including
chunks sent less then 1 rto ago.
This is the correct behaviour according to rfc4960 section 6.3.3
E3 and
"Note: Any DATA chunks that were sent to the address for which the
T3-rtx timer expired but did not fit in one MTU (rule E3 above)
should be marked for retransmission and sent as soon as cwnd
allows (normally, when a SACK arrives). ".
This fixes problems when more then one path is present and the T3
retransmission of the first chunk that timeouts stops the T3 timer
for the initial active path, leaving all the other in-flight
chunks waiting forever or until a new chunk is transmitted on the
same path and timeouts (and this will happen only if the cwnd
allows sending new chunks, but since cwnd was dropped to MTU by
the timeout => it will wait until the first heartbeat).
Example: 10 packets in flight, sent at 0.1 s intervals on the
primary path. The primary path is down and the first packet
timeouts. The first packet is retransmitted on another path, the
T3 timer for the primary path is stopped and cwnd is set to MTU.
All the other 9 in-flight packets will not be retransmitted
(unless more new packets are sent on the primary path which depend
on cwnd allowing it, and even in this case the 9 packets will be
retransmitted only after a new packet timeouts which even in the
best case would be more then RTO).
This commit reverts d0ce92910b and
also removes the now unused transport->last_rto, introduced in
b6157d8e03.
p.s The problem is not only when multiple paths are there. It
can happen in a single homed environment. If the application
stops sending data, it possible to have a hung association.
Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is an interface to set, delete and flush PMKIDs through nl80211.
Main users would be fullmac devices which firmwares are capable of
generating the RSN IEs for the re-association requests, e.g. iwmc3200wifi.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The RX flags should soon be used only for flags
that cannot change within an a-MPDU, so move the
cooked monitor flag into the RX status flags.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Adding a xfrm_state requires an authentication algorithm specified
either as xfrm_algo or as xfrm_algo_auth with a specific truncation
length. For compatibility, both attributes are dumped to userspace,
and we also accept both attributes, but prefer the new syntax.
If no truncation length is specified, or the authentication algorithm
is specified using xfrm_algo, the truncation length from the algorithm
description in the kernel is used.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current implementation of max.burst ends up limiting new
data during cwnd decay period. The decay is happening becuase
the connection is idle and we are allowed to fill the congestion
window. The point of max.burst is to limit micro-bursts in response
to large acks. This still happens, as max.burst is still applied
to each transmit opportunity. It will also apply if a very large
send is made (greater then allowed by burst).
Tested-by: Florian Niederbacher <florian.niederbacher@student.uibk.ac.at>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Recent attempt to remove deprecated socket options demonstrated
that removing options from the enum space will have severe
binary compatibility issues. The reason is that it changes
the subsequent enum space and causes option values to be redefined.
To solve this, and to get rid of the ugly double statements for
every option, we simply convert to the #define scheme.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
The transport last_time_used variable is rather useless.
It was only used when determining if CWND needs to be updated
due to idle transport. However, idle transport detection was
based on a Heartbeat timer and last_time_used was not incremented
when sending Heartbeats. As a result the check for cwnd reduction
was always true. We can get rid of the variable and just base
our cwnd manipulation on the HB timer (like the code comment sais).
We also have to call into the cwnd manipulation function regardless
of whether HBs are enabled or not. That way we will detect idle
transports if the user has disabled Heartbeats.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
SCTP_GET_*_OLD stuffs are schedlued to be removed.
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
We currently send window update SACKs every time we free up 1 PMTU
worth of data. That a lot more SACKs then necessary. Instead, we'll
now send back the actuall window every time we send a sack, and do
window-update SACKs when a fraction of the receive buffer has been
opened. The fraction is controlled with a sysctl.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
The "Invalid Stream Identifier" error has a 16 bit reserved
field at the end, thus making the parameter length be 8 bytes.
We've never supplied that reserved field making wireshark
tag the packet as malformed.
Reported-by: Chris Dischino <cdischino@sonusnet.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
This patch implement the sender side for SACK-IMMEDIATELY
extension.
Section 4.1. Sender Side Considerations
Whenever the sender of a DATA chunk can benefit from the
corresponding SACK chunk being sent back without delay, the sender
MAY set the I-bit in the DATA chunk header.
Reasons for setting the I-bit include
o The sender is in the SHUTDOWN-PENDING state.
o The application requests to set the I-bit of the last DATA chunk
of a user message when providing the user message to the SCTP
implementation.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
With WEXT, it happens frequently that the SME
requests an authentication but then deauthenticates
right away because some new parameters came along.
Every time this happens we print a deauth message
and send a deauth frame, but both of that is rather
confusing. Avoid it by aborting the authentication
process silently, and telling cfg80211 about that.
The patch looks larger than it really is:
__cfg80211_auth_remove() is split out from
cfg80211_send_auth_timeout(), there's no new code
except __cfg80211_auth_canceled() (a one-liner) and
the mac80211 bits (7 new lines of code).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Right now all frames mac80211 hands to the driver
have the IEEE80211_TX_CTL_REQ_TX_STATUS flag set to
request TX status. This isn't really necessary, only
the injected frames need TX status (the latter for
hostapd) so move setting this flag.
The rate control algorithms also need TX status, but
they don't require it.
Also, rt2x00 uses that bit for its own purposes and
seems to require it being set for all frames, but
that can be fixed in rt2x00.
This doesn't really change anything for any drivers
but in the future drivers using hw-rate control may
opt to not report TX status for frames that don't
have the IEEE80211_TX_CTL_REQ_TX_STATUS flag set.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com> [rt2x00 bits]
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It's very likely that not many devices will support
four-address mode in station or AP mode so introduce
capability bits for both modes, set them in mac80211
and check them when userspace tries to use the mode.
Also, keep track of 4addr in cfg80211 (wireless_dev)
and not in mac80211 any more. mac80211 can also be
improved for the VLAN case by not looking at the
4addr flag but maintaining the station pointer for
it correctly. However, keep track of use_4addr for
station mode in mac80211 to avoid all the derefs.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We've accumulated a number of options for wiphys
which make more sense as flags as we keep adding
more. Convert the existing ones.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
cxgb3: fix premature page unmap
ibm_newemac: Fix EMACx_TRTR[TRT] bit shifts
vlan: Fix register_vlan_dev() error path
gro: Fix illegal merging of trailer trash
sungem: Fix Serdes detection.
net: fix mdio section mismatch warning
ppp: fix BUG on non-linear SKB (multilink receive)
ixgbe: Fixing EEH handler to handle more than one error
net: Fix the rollback test in dev_change_name()
Revert "isdn: isdn_ppp: Use SKB list facilities instead of home-grown implementation."
TI Davinci EMAC : Fix Console Hang when bringing the interface down
smsc911x: Fix Console Hang when bringing the interface down.
mISDN: fix error return in HFCmulti_init()
forcedeth: mac address fix
r6040: fix version printing
Bluetooth: Fix regression with L2CAP configuration in Basic Mode
Bluetooth: Select Basic Mode as default for SOCK_SEQPACKET
Bluetooth: Set general bonding security for ACL by default
r8169: Fix receive buffer length when MTU is between 1515 and 1536
can: add the missing netlink get_xstats_size callback
...
Some devices implement the entire rate control in
firmware in some way, like wl1271 or like iwlwifi
which does some things in software but not a lot.
Therefore generic software rate control is rather
useless for them and just adds avoidable overhead
to the transmit path.
It's fairly simple to let drivers indicate that
they do not need rate control, but they need to
fulfil a number of conditions that we encode in
WARN_ONs.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The entire aggregation code currently operates on the
hw pointer and station addresses, but that needs to
change to make stations purely per-vif; As one step
preparing for that make the aggregation code callable
with the station, or by the combination of virtual
interface and station address.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler
gets a small bug fix and the sysctl tree where I am removing all
sysctl strategy routines.
While investigating for network latencies, I found inet_getid() was a
contention point for some workloads, as inet_peer_idlock is shared
by all inet_getid() users regardless of peers.
One way to fix this is to make ip_id_count an atomic_t instead
of __u16, and use atomic_add_return().
In order to keep sizeof(struct inet_peer) = 64 on 64bit arches
tcp_ts_stamp is also converted to __u32 instead of "unsigned long".
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first "node" is supposed to be the cursor used in the for_each.
The second "node" is ment literally and should not be macro expanded:
it's the name of the hlist_node field from the inet_bind_bucket.
This currently works because when inet_bind_bucket_for_each is called
it's argument is still "node".
Signed-off-by: Lucian Adrian Grijincu <lgrijincu@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define two symbols needed in both kernel and user space.
Remove old (somewhat incorrect) kernel variant that wasn't used in
most cases. Default should apply to both RMSS and SMSS (RFC2581).
Replace numeric constants with defined symbols.
Stand-alone patch, originally developed for TCPCT.
Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent commit 8da645e101
sctp: Get rid of an extra routing lookup when adding a transport
introduced a regression in the connection setup. The behavior was
different between IPv4 and IPv6. IPv4 case ended up working because the
route lookup routing returned a NULL route, which triggered another
route lookup later in the output patch that succeeded. In the IPv6 case,
a valid route was returned for first call, but we could not find a valid
source address at the time since the source addresses were not set on the
association yet. Thus resulted in a hung connection.
The solution is to set the source addresses on the association prior to
adding peers.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the NL80211_CMD_GET_SURVEY command and an get_survey()
ops that a driver can implement. The goal of this command is to allow a
drivers to report channel survey data (e.g. channel noise, channel
occupation).
For now, only the mechanism to report back channel noise has been
implemented.
In future, there will either be a survey-trigger command --- or the existing
scan-trigger command will be enhanced. This will allow user-space to
request survey for arbitrary channels.
Note: any driver that cannot report channel noise should not report
any value at all, e.g. made-up -92 dBm.
Signed-off-by: Holger Schurig <holgerschurig@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Resulting object files have the same MD5 as before.
Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Now that sys_sysctl is a compatiblity wrapper around /proc/sys
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
revmoed.
In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and it's callers have been modified not
to pass one.
Cc: "David Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
UDP bind() can be O(N^2) in some pathological cases.
Thanks to secondary hash tables, we can make it O(N)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
net/fsl_pq_mdio: add module license GPL
can: fix WARN_ON dump in net/core/rtnetlink.c:rtmsg_ifinfo()
can: should not use __dev_get_by_index() without locks
hisax: remove bad udelay call to fix build error on ARM
ipip: Fix handling of DF packets when pmtudisc is OFF
qlge: Set PCIe reset type for EEH to fundamental.
qlge: Fix early exit from mbox cmd complete wait.
ixgbe: fix traffic hangs on Tx with ioatdma loaded
ixgbe: Fix checking TFCS register for TXOFF status when DCB is enabled
ixgbe: Fix gso_max_size for 82599 when DCB is enabled
macsonic: fix crash on PowerBook 520
NET: cassini, fix lock imbalance
ems_usb: Fix byte order issues on big endian machines
be2net: Bug fix to send config commands to hardware after netdev_register
be2net: fix to set proper flow control on resume
netfilter: xt_connlimit: fix regression caused by zero family value
rt2x00: Don't queue ieee80211 work after USB removal
Revert "ipw2200: fix oops on missing firmware"
decnet: netdevice refcount leak
netfilter: nf_nat: fix NAT issue in 2.6.30.4+
...
This fixes the following bug in the current implementation of
net/xfrm: SAD entries timeouts do not count the time spent by the machine
in the suspended state. This leads to the connectivity problems because
after resuming local machine thinks that the SAD entry is still valid, while
it has already been expired on the remote server.
The cause of this is very simple: the timeouts in the net/xfrm are bound to
the old mod_timer() timers. This patch reassigns them to the
CLOCK_REALTIME hrtimer.
I have been using this version of the patch for a few months on my
machines without any problems. Also run a few stress tests w/o any
issues.
This version of the patch uses tasklet_hrtimer by Peter Zijlstra
(commit 9ba5f0).
This patch is against 2.6.31.4. Please CC me.
Signed-off-by: Yury Polyanskiy <polyanskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extends udp_table to contain a secondary hash table.
socket anchor for this second hash is free, because UDP
doesnt use skc_bind_node : We define an union to hold
both skc_bind_node & a new hlist_nulls_node udp_portaddr_node
udp_lib_get_port() inserts sockets into second hash chain
(additional cost of one atomic op)
udp_lib_unhash() deletes socket from second hash chain
(additional cost of one atomic op)
Note : No spinlock lockdep annotation is needed, because
lock for the secondary hash chain is always get after
lock for primary hash chain.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Union sk_hash with two u16 hashes for udp (no extra memory taken)
One 16 bits hash on (local port) value (the previous udp 'hash')
One 16 bits hash on (local address, local port) values, initialized
but not yet used. This second hash is using jenkin hash for better
distribution.
Because the 'port' is xored later, a partial hash is performed
on local address + net_hash_mix(net)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a counter in udp_hslot to keep an accurate count
of sockets present in chain.
This will permit to upcoming UDP lookup algo to chose
the shortest chain when secondary hash is added.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no good reason to not support userspace specifying the
network namespace during device creation, and it makes it easier
to create a network device and pass it to a child network namespace
with a well known name.
We have to be careful to ensure that the target network namespace
for the new device exists through the life of the call. To keep
that logic clear I have factored out the network namespace grabbing
logic into rtnl_link_get_net.
In addtion we need to continue to pass the source network namespace
to the rtnl_link_ops.newlink method so that we can find the base
device source network namespace.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Some devices require that all frames to a station
are flushed when that station goes into powersave
mode before being able to send frames to that
station again when it wakes up or polls -- all in
order to avoid reordering and too many or too few
frames being sent to the station when it polls.
Normally, this is the case unless the station
goes to sleep and wakes up very quickly again.
But in that case, frames for it may be pending
on the hardware queues, and thus races could
happen in the case of multiple hardware queues
used for QoS/WMM. Normally this isn't a problem,
but with the iwlwifi mechanism we need to make
sure the race doesn't happen.
This makes mac80211 able to cope with the race
with driver help by a new WLAN_STA_PS_DRIVER
per-station flag that can be controlled by the
driver and tells mac80211 whether it can transmit
frames or not. This flag must be set according to
very specific rules outlined in the documentation
for the function that controls it.
When we buffer new frames for the station, we
normally set the TIM bit right away, but while
the driver has blocked transmission to that sta
we need to avoid that as well since we cannot
respond to the station if it wakes up due to the
TIM bit. Once the driver unblocks, we can set
the TIM bit.
Similarly, when the station just wakes up, we
need to wait until all other frames are flushed
before we can transmit frames to that station,
so the same applies here, we need to wait for
the driver to give the OK.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add support for two more NL802154 commands: ADD_IFACE and DEL_IFACE,
thus allowing creation and removal of logic WPAN interfaces on the top
of wpan-phy.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
ops->get_phy should increment reference to wpan-phy. As we return
the external structure, we should do refcounting correctly.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Follow the usual pattern of devices registration by adding new function
(wpan_phy_set_dev) that sets child->parent relationship and removing
parent argument from wpan_phy_register call.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
IEEE 802.15.4-2006 defines channel pages that hold channels (max 32 pages,
27 channels per page). Allow the driver to specify supported channels
on pages, other than the first one.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Conflicts:
drivers/net/usb/cdc_ether.c
All CDC ethernet devices of type USB_CLASS_COMM need to use
'&mbm_info'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Vitezslav Samel discovered that since 2.6.30.4+ active FTP can not work
over NAT. The "cause" of the problem was a fix of unacknowledged data
detection with NAT (commit a3a9f79e36).
However, actually, that fix uncovered a long standing bug in TCP conntrack:
when NAT was enabled, we simply updated the max of the right edge of
the segments we have seen (td_end), by the offset NAT produced with
changing IP/port in the data. However, we did not update the other parameter
(td_maxend) which is affected by the NAT offset. Thus that could drift
away from the correct value and thus resulted breaking active FTP.
The patch below fixes the issue by *not* updating the conntrack parameters
from NAT, but instead taking into account the NAT offsets in conntrack in a
consistent way. (Updating from NAT would be more harder and expensive because
it'd need to re-calculate parameters we already calculated in conntrack.)
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct can_proto had a capability field which wasn't ever used. It is
dropped entirely.
struct inet_protosw had a capability field which can be more clearly
expressed in the code by just checking if sock->type = SOCK_RAW.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trying to parse the option of a SYN packet that we have
no route entry for should just use global wide defaults
for route entry options.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Tested-by: Valdis.Kletnieks@vt.edu
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we have a TODO item to make all station
management dependent on virtual interfaces, I
figured I'd start with pushing such a change
to drivers before more drivers start using the
ieee80211_find_sta() API with a hw pointer and
cause us grief later on.
For now continue exporting the old API in form
of ieee80211_find_sta_by_hw(), but discourage
its use strongly.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This cleanup patch puts struct/union/enum opening braces,
in first line to ease grep games.
struct something
{
becomes :
struct something {
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch below also addresses a couple of other corner cases in readdir
seen with a large (e.g. 64k) msize. I'm not sure what people think of
my co-opting of fid->aux here. I'd be happy to rework if there's a better
way.
When the size of the user supplied buffer passed to readdir is smaller
than the data returned in one go by the 9P read request, v9fs_dir_readdir()
currently discards extra data so that, on the next call, a 9P read
request will be issued with offset < previous offset + bytes returned,
which voilates the constraint described in paragraph 3 of read(5) description.
This patch preseves the leftover data in fid->aux for use in the next call.
Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This value is unused by mac80211, because it was only
be used by wireless extensions, and turned out to not
be useful there because the quality value needs to be
comparable between scan results and the current value
which is impossible when the qual value is calculated
taking into account noise, for example.
Since it is unused anyway, this patch deprecates it
in the hope that drivers will remove their sometimes
quite expensive calculations of the value.
I'm open to actual uses of the value, but the best
way of using it seems to be what the Intel drivers do
which should probably be generalised if we have noise
values from the hardware.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Compared to ieee80211_beacon_get(), the new function
ieee80211_beacon_get_tim() returns information on the
location and length of the TIM IE, which some drivers
need in order to generate the TIM on the device. The
old function, ieee80211_beacon_get(), becomes a small
static inline wrapper around the new one to not break
all drivers.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
While there may be a case for a driver adding its
own bits of radiotap information, none currently
does. Also, drivers would have to copy the code
to generate the radiotap bits that now mac80211
generates. If some driver in the future needs to
add some driver-specific information I'd expect
that to be in a radiotap vendor namespace and we
can add a different way of passing such data up
and having mac80211 include it.
Additionally, rename IEEE80211_CONF_RADIOTAP to
IEEE80211_CONF_MONITOR since it's still used by
b43(legacy) to obtain per-frame timestamps.
The purpose of this patch is to simplify the RX
code in mac80211 to make it easier to add paged
skb support.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In
commit 601ae7f25a
Author: Bruno Randolf <br1@einfach.org>
Date: Thu May 8 19:22:43 2008 +0200
mac80211: make rx radiotap header more flexible
code was added that tried to align the radiotap header
position in memory based on the radiotap header length.
Quite obviously, that is completely useless.
Instead of trying to do that, use unaligned accesses
to generate the radiotap header. To properly do that,
we also need to mark struct ieee80211_radiotap_header
packed, but that is fine since it's already packed
(and it should be marked packed anyway since its a
wire format).
Cc: Bruno Randolf <br1@einfach.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Policy routing is not looked up by mark on reverse path filtering.
This fixes it.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding an accessor to existing dst_entry feautres field and
refactor the only supported feature (allfrag) to use it.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need tcp_parse_options to be aware of dst_entry to
take into account per dst_entry TCP options settings
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding a list_head parameter to rtnl_link_ops->dellink() methods
allow us to queue devices on a list, in order to dismantle
them all at once.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ieee80211_rx() must be called with bottom halves disabled. To simplify
driver development implement ieee80211_rx_ni() which disables BH. This
function must be used when in process context.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Get rid of cookies in cfg80211_send_XXX() functions.
Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>