This patch turns QFQ into QFQ+, a variant of QFQ that provides the
following two benefits: 1) QFQ+ is faster than QFQ, 2) differently
from QFQ, QFQ+ correctly schedules also non-leaves classes in a
hierarchical setting. A detailed description of QFQ+, plus a
performance comparison with DRR and QFQ, can be found in [1].
[1] P. Valente, "Reducing the Execution Time of Fair-Queueing Schedulers"
http://algo.ing.unimo.it/people/paolo/agg-sched/agg-sched.pdf
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
The calculation of RTTVAR involves the subtraction of two unsigned
numbers which
may causes rollover and results in very high values of RTTVAR when RTT > SRTT.
With this patch it is possible to set RTOmin = 1 to get the minimum of RTO at
4 times the clock granularity.
Change Notes:
v2)
*Replaced abs() by abs64() and long by __s64, changed patch
description.
Signed-off-by: Christian Schoch <e0326715@student.tuwien.ac.at>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: Neil Horman <nhorman@tuxdriver.com>
CC: linux-sctp@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consider the following program, that sets the second argument to the
sendto() syscall incorrectly:
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
int main(void)
{
int fd;
struct sockaddr_in sa;
fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
if (fd < 0)
return 1;
memset(&sa, 0, sizeof(sa));
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(11111);
sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
return 0;
}
We get -ENOMEM:
$ strace -e sendto ./demo
sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENOMEM (Cannot allocate memory)
Propagate the error code from sctp_user_addto_chunk(), so that we will
tell user space what actually went wrong:
$ strace -e sendto ./demo
sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address)
Noticed while running Trinity (the syscall fuzzer).
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
reproducible e.g. with the sendto() syscall by passing invalid
user space pointer in the second argument:
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
int main(void)
{
int fd;
struct sockaddr_in sa;
fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
if (fd < 0)
return 1;
memset(&sa, 0, sizeof(sa));
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(11111);
sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
return 0;
}
As far as I can tell, the leak has been around since ~2003.
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pppoatm_send() does not take any lock that will prevent concurrent
vcc_sendmsg(). This causes two problems:
- there is no locking between checking the send queue size
with atm_may_send() and incrementing sk_wmem_alloc,
and the real queue size can be a little higher than sk_sndbuf
- the vcc->sendmsg() can be called concurrently. I'm not sure
if it's allowed. Some drivers (eni, nicstar, ...) seem
to assume it will never happen.
Now pppoatm_send() takes ATM socket lock, the same that is used
in vcc_sendmsg() and other ATM socket functions. The pppoatm_send()
is called with BH disabled, so bh_lock_sock() is used instead
of lock_sock().
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The pppoatm used module_put() during unassignment from vcc with
hope that we have BKL. This assumption is no longer true.
Now owner field in atmvcc is used to move this module_put()
to vcc_destroy_socket().
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The pppoatm does not check if used vcc is in connected state,
causing an Oops in pppoatm_send() when vcc->send() is called
on not fully connected socket.
Now pppoatm can be assigned only on connected sockets; otherwise
-EINVAL error is returned.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The atm is using atmvcc->push(vcc, NULL) callback to notify protocol
that vcc will be closed and protocol must detach from it. This callback
is usually used by protocol to decrement module usage count by module_put(),
but it leaves small window then module is still used after module_put().
Now the owner of push() callback is kept in atmvcc and
module_put(atmvcc->owner) is called after the protocol is detached from vcc.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
Name of pimreg devices are built from following format :
char name[IFNAMSIZ]; // IFNAMSIZ == 16
sprintf(name, "pimreg%u", mrt->id);
We must therefore limit mrt->id to 9 decimal digits
or risk a buffer overflow and a crash.
Restrict table identifiers in [0 ... 999999999] interval.
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet_getpeer_v4() can return NULL under OOM conditions, and while
inet_peer_xrlim_allow() is OK with a NULL peer, inet_putpeer() will
crash.
This code path now uses the same idiom as the others from:
1d861aa4b3 ("inet: Minimize use of
cached route inetpeer.").
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having the getsockopt() of SO_BINDTODEVICE return an index, which
will then require another call like if_indextoname() to get the actual interface
name, have it return the name directly.
This also matches the existing man page description on socket(7) which mentions
the argument being an interface name.
If the value has not been set, zero is returned and optlen will be set to zero
to indicate there is no interface name present.
Added a seqlock to protect this code path, and dev_ifname(), from someone
changing the device name via dev_change_name().
v2: Added seqlock protection while copying device name.
v3: Fixed word wrap in patch.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's really no excuse for an additional wmem_default of buffering
between the netdev queue and the ATM device. Two packets (one in-flight,
and one ready to send) ought to be fine. It's not as if it should take
long to get another from the netdev queue when we need it.
If necessary we can make the queue space configurable later, but I don't
think it's likely to be necessary.
cf. commit 9d02daf754 (pppoatm: Fix
excessive queue bloat) which did something very similar for PPPoATM.
Note that there is a tremendously unlikely race condition which may
result in qspace temporarily going negative. If a CPU running the
br2684_pop() function goes off into the weeds for a long period of time
after incrementing qspace to 1, but before calling netdev_wake_queue()...
and another CPU ends up calling br2684_start_xmit() and *stopping* the
queue again before the first CPU comes back, the netdev queue could
end up being woken when qspace has already reached zero.
An alternative approach to coping with this race would be to check in
br2684_start_xmit() for qspace==0 and return NETDEV_TX_BUSY, but just
using '> 0' and '< 1' for comparison instead of '== 0' and '!= 0' is
simpler. It just warranted a mention of *why* we do it that way...
Move the call to atmvcc->send() to happen *after* the accounting and
potentially stopping the netdev queue, in br2684_xmit_vcc(). This matters
if the ->send() call suffers an immediate failure, because it'll call
br2684_pop() with the offending skb before returning. We want that to
happen *after* we've done the initial accounting for the packet in
question. Also make it return an appropriate success/failure indication
while we're at it.
Tested by running 'ping -l 1000 bottomless.aaisp.net.uk' from within my
network, with only a single PPPoE-over-BR2684 link running. And after
setting txqueuelen on the nas0 interface to something low (5, in fact).
Before the patch, we'd see about 15 packets being queued and a resulting
latency of ~56ms being reached. After the patch, we see only about 8,
which is fairly much what we expect. And a max latency of ~36ms. On this
OpenWRT box, wmem_default is 163840.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Reviewed-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 82167cb8c6 ('net: dsa/slave: Fix
compilation warnings') fixed one possible invalid configuration
(NET_DSA enabled with no trailer formats) but added others: drivers
can select NET_DSA without its dependencies being met.
It's not very useful to make either the DSA core or the tagging
formats manually selectable without a driver to use them, so:
1. Define a hidden HAVE_NET_DSA option and move the dependencies of
NET_DSA to that. While we're at it, drop the deprecated
EXPERIMENTAL dependency.
2. Make NET_DSA and the drivers dependent on HAVE_NET_DSA.
3. Hide the tagging format options again.
4. Make drivers select both NET_DSA and the appropriate tagging format
option.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set in the rx_ifindex to pass the correct interface index in the case of a
message timeout detection. Usually the rx_ifindex value is set at receive
time. But when no CAN frame has been received the RX_TIMEOUT notification
did not contain a valid value.
Cc: linux-stable <stable@vger.kernel.org>
Reported-by: Andre Naujoks <nautsch2@googlemail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch adds support for skb mark matching and set action.
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
In some cases, e.g. probe_status, there were spaces
missing so the trace output was confusing. Also make
it more like mac80211 when printing netdevs/wiphys
to make reading a combined log easier.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
To achieve this, limit the number of retries to
31 (instead of 255) and use the three bits that
are then free for VHT flags.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add support for reporting and calculating VHT MCSes.
Note that I'm not completely sure that the bitrate
calculations are correct, nor that they can't be
simplified.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Convert mac80211 (and where necessary, some drivers a
little bit) to the new channel definition struct.
This will allow extending mac80211 for VHT, which is
currently restricted to channel contexts since there
are no drivers using that which makes it easier. As
I also don't care about VHT for drivers not using the
channel context API, I won't convert the previous API
to VHT support.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Change nl80211 to support specifying a VHT (or HT)
using the control channel frequency (as before) and
new attributes for the channel width and first and
second center frequency. The old channel type is of
course still supported for HT.
Also change the cfg80211 channel definition struct
to support these by adding the relevant fields to
it (and removing the _type field.)
This also adds new helper functions:
- cfg80211_chandef_create to create a channel def
struct given the control channel and channel type,
- cfg80211_chandef_identical to check if two channel
definitions are identical
- cfg80211_chandef_compatible to check if the given
channel definitions are compatible, and return the
wider of the two
This isn't entirely complete, but that doesn't matter
until we have a driver using it. In particular, it's
missing
- regulatory checks on the usable bandwidth (if that
even makes sense)
- regulatory TX power (database can't deal with it)
- a proper channel compatibility calculation for the
new channel types
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Instead of passing a channel pointer and channel type
to all functions and driver methods, pass a new channel
definition struct. Right now, this struct contains just
the control channel and channel type, but for VHT this
will change.
Also, add a small inline cfg80211_get_chandef_type() so
that drivers don't need to use the _type field of the
new structure all the time, which will change.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As mwifiex (and mac80211 in the software case) are the
only drivers actually implementing remain-on-channel
with channel type, userspace can't be relying on it.
This is the case, as it's used only for P2P operations
right now.
Rather than adding a flag to tell userspace whether or
not it can actually rely on it, simplify all the code
by removing the ability to use different channel types.
Leave only the validation of the attribute, so that if
we extend it again later (with the needed capability
flag), it can't break userspace sending invalid data.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If ieee80211_prep_channel() decides that HT should be
disabled (because the HT IEs from the AP were invalid)
it will set the IEEE80211_STA_DISABLE_HT to not send
HT capabilities to the AP when associating. If this
happens during authentication, the flag will be lost
and we send HT frames, even if the channel config was
set up for non-HT. This can lead to issues.
Fix this by always resetting the ifmgd flags to zero
when the channel context is released so that the flag
resetting in ieee80211_mgd_assoc() isn't necessary.
To make the code a bit easier move the call to release
the channel in ieee80211_set_disassoc() to the end of
the function together with the flag resetting (which
needs to be at the end to avoid timers setting flags.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use shortcut pointer instead where it is appropriate.
Signed-off-by: Marco Porsch <marco.porsch@etit.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Return early if not a QoS Data frame.
Give proper documentation.
Signed-off-by: Marco Porsch <marco.porsch@etit.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The debug message has to be printed also for an Auth message with
auth_sequence != 1. This helps understanding whether the two Auth
messages are exchanged correctly or not.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It does not make sense to keep a station alive if it is not authorised
at all. If IBSS/RSN is used it could also be the case that something
went wrong during the keys exchange and the stations ended up in a not
recoverable state.
By not updating last_rx we are giving the station a chance to be
deleted and to start the key exchange once again from scratch.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The function cfg80211_get_p2p_attr() can fail and returns
a negative error code. However, the return type is unsigned
int. The largest positive number is determined by desired_len
variable in the function, which is u16. So changing the return
type to int to allow easy error checking. Also change the type
for the attribute to enum for improved type checking.
Signed-off-by: Arend van Spriel <arend@broadcom.com>
[fix indentation, don't use u8 attr variable]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Save a few bytes per table by convert mroute_do_assert and
mroute_do_pim from int to bool.
Remove !! as the compiler does that when assigning int to bool.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) ip_mroute_setsockopt() & ip_mroute_getsockopt() should not
access/set raw_sk(sk)->ipmr_table before making sure the socket
is a raw socket, and protocol is IGMP
2) MRT_INIT should return -EINVAL if optlen != sizeof(int), not
-ENOPROTOOPT
3) MRT_ASSERT & MRT_PIM should validate optlen
4) " (v) ? 1 : 0 " can be written as " !!v "
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when none of CONFIG_NET_DSA_TAG_DSA, CONFIG_NET_DSA_TAG_EDSA and
CONFIG_NET_DSA_TAG_TRAILER is defined, we get following compilation warnings:
net/dsa/slave.c:51:12: warning: 'dsa_slave_init' defined but not used [-Wunused-function]
net/dsa/slave.c:60:12: warning: 'dsa_slave_open' defined but not used [-Wunused-function]
net/dsa/slave.c:98:12: warning: 'dsa_slave_close' defined but not used [-Wunused-function]
net/dsa/slave.c:116:13: warning: 'dsa_slave_change_rx_flags' defined but not used [-Wunused-function]
net/dsa/slave.c:127:13: warning: 'dsa_slave_set_rx_mode' defined but not used [-Wunused-function]
net/dsa/slave.c:136:12: warning: 'dsa_slave_set_mac_address' defined but not used [-Wunused-function]
net/dsa/slave.c:164:12: warning: 'dsa_slave_ioctl' defined but not used [-Wunused-function]
Earlier approach to fix this was discussed here:
lkml.org/lkml/2012/10/29/549
This is another approach to fix it. This is done by some changes in config
options, which make more sense than the earlier approach. As, atleast one
tagging option must always be selected for using net/dsa/ infrastructure, this
patch selects NET_DSA from tagging configs instead of having it as an selectable
config.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes it possible to build the CAN Identifier into the kernel, even
if the CAN support is build as a module.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/wireless/iwlwifi/pcie/tx.c
Minor iwlwifi conflict in TX queue disabling between 'net', which
removed a bogus warning, and 'net-next' which added some status
register poking code.
Signed-off-by: David S. Miller <davem@davemloft.net>
attribute is copied to IFNAMSIZ-size stack variable,
but IFNAMSIZ is smaller than IPSET_MAXNAMELEN.
Fortunately nfnetlink needs CAP_NET_ADMIN.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Paul Gortmaker says:
====================
The most interesting thing here, at least from a user perspective,
is the broadcast link fix -- where there was a corner case where
two endpoints could get in a state where they disagree on where
to start Rx and ack of broadcast packets.
There is also the poll/wait changes which could also impact
end users for certain use cases - the fixes there also better
align tipc with the rest of the networking code.
The rest largely falls into routine cleanup category, by getting
rid of some unused routines, some Kconfig clutter, etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, mac80211 checks the DS params IE if present and
uses it for the (primary) BSS channel, instead of the one
that the frame was received on. This is particularly useful
in the 2.4 GHz band since a frame is often received on one
of the adjacent channels due to overlap.
Move this code to cfg80211 so other drivers also do this.
Additionally, on 5 GHz, in particular with some (possibly)
upcoming changes in 802.11ai and duplicate transmissions
when wider channels are used, something similar happens.
So if present, also use the (primary) channel information
contained in the HT operation IE.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the AP doesn't support HT, or more importantly if
it does but we have to disable it because its IEs are
broken, don't advertise HT support in our association
request. Otherwise, we configure our channel to be a
20 MHz non-HT channel but the AP might still think we
support HT, or even 40 MHz.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since the 11n spec amendment was rolled into the
2012 version, "11n" no longer makes sense. Use
"HT" instead.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the driver doesn't support 40 MHz channels, then
mac80211 erroneously sets number of RX chains to one
although the number of chains is independent of the
support for 40 MHz channels.
Fix this by checking the 40 MHz support only for the
code that sets the 40 MHz channel not the complete
HT code block.
This also means the HT20 channel type will always be
set in the changed code block so there's no need to
set it in case we override the AP due to invalid IEs
in the probe response/beacon.
The indentation is a bit quirky, but I'm rewriting
this code for VHT support so this will change again
very soon.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The radiotap vendor area in the skb head must be skipped
and accounted for in a few functions until it is removed.
I missed this in my patch, so a few places use this data
as though it was the 802.11 header, fix these places.
Reported-by: Wojciech Dubowik <Wojciech.Dubowik@neratec.com>
Tested-by: Wojciech Dubowik <Wojciech.Dubowik@neratec.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Starting from 3.6 we cache output routes for
multicasts only when using route to 224/4. For local receivers
we can set RTCF_LOCAL flag depending on the membership but
in such case we use maddr and saddr which are not caching
keys as before. Additionally, we can not use same place to
cache routes that differ in RTCF_LOCAL flag value.
Fix it by caching only RTCF_MULTICAST entries
without RTCF_LOCAL (send-only, no loopback). As a side effect,
we avoid unneeded lookup for fnhe when not caching because
multicasts are not redirected and they do not learn PMTU.
Thanks to Maxime Bizon for showing the caching
problems in __mkroute_output for 3.6 kernels: different
RTCF_LOCAL flag in cache can lead to wrong ip_mc_output or
ip_output call and the visible problem is that traffic can
not reach local receivers via loopback.
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Tested-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is work the same as for ipv4.
All other hacks about tcp repair are in common code for ipv4 and ipv6,
so this patch is enough for repairing ipv6 connections.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
The following patchset contains two Netfilter fixes:
* Fix buffer overflow in the name of the timeout policy object
in the cttimeout infrastructure, from Florian Westphal.
* Fix a bug in the hash set in case that IP ranges are
specified, from Jozsef Kadlecsik.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
This pull request is intended for net-next and contains the following changes:
1) Remove a redundant check when initializing the xfrm replay functions,
from Ulrich Weber.
2) Use a faster per-cpu helper when allocating ipcomt transforms,
from Shan Wei.
3) Use a static gc threshold value for ipv6, simmilar to what we do
for ipv4 now.
4) Remove a commented out function call.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
This pull request is intended for 3.7 and contains a single patch to
fix the IPsec gc threshold value for ipv4.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There used to be a time when TIPC had lots of Kconfig knobs the
end user could alter, but they have all been made automatic or
obsolete, with the exception of CONFIG_TIPC_PORTS. This
previously existing set of options was all hidden under the
TIPC_ADVANCED setting, which does not exist in any code, but
only in Kconfig scope.
Having this now, just to hide the one remaining "advanced"
option no longer makes sense. Remove it. Also get rid of the
ifdeffery in the TIPC code that allowed for TIPC_PORTS to be
possibly undefined.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
As the variable:node is currently defined to u32 type, it is
unnecessary to cast its type to u32 again when using it.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Upon establishing a first link between two nodes, there is
currently a risk that the two endpoints will disagree on exactly
which sequence number reception and acknowleding of broadcast
packets should start.
The following scenarios may happen:
1: Node A sends an ACTIVATE message to B, telling it to start acking
packets from sequence number N.
2: Node A sends out broadcast N, but does not expect an acknowledge
from B, since B is not yet in its broadcast receiver's list.
3: Node A receives ACK for N from all nodes except B, and releases
packet N.
4: Node B receives the ACTIVATE, activates its link endpoint, and
stores the value N as sequence number of first expected packet.
5: Node B sends a NAME_DISTR message to A.
6: Node A receives the NAME_DISTR message, and activates its endpoint.
At this moment B is added to A's broadcast receiver's set.
Node A also sets sequence number 0 as the first broadcast packet
to be received from B.
7: Node A sends broadcast N+1.
8: B receives N+1, determines there is a gap in the sequence, since
it is expecting N, and sends a NACK for N back to A.
9: Node A has already released N, so no retransmission is possible.
The broadcast link in direction A->B is stale.
In addition to, or instead of, 7-9 above, the following may happen:
10: Node B sends broadcast M > 0 to A.
11: Node A receives M, falsely decides there must be a gap, since
it is expecting packet 0, and asks for retransmission of packets
[0,M-1].
12: Node B has already released these packets, so the broadcast
link is stale in direction B->A.
We solve this problem by introducing a new unicast message type,
BCAST_PROTOCOL/STATE, to convey the sequence number of the next
sent broadcast packet to the other endpoint, at exactly the moment
that endpoint is added to the own node's broadcast receivers list,
and before any other unicast messages are permitted to be sent.
Furthermore, we don't allow any node to start receiving and
processing broadcast packets until this new synchronization
message has been received.
To maintain backwards compatibility, we still open up for
broadcast reception if we receive a NAME_DISTR message without
any preceding broadcast sync message. In this case, we must
assume that the other end has an older code version, and will
never send out the new synchronization message. Hence, for mixed
old and new nodes, the issue arising in 7-12 of the above may
happen with the same probability as before.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Rename the "supported" flag in bclink structure to "recv_permitted"
to better reflect what it is used for. When this flag is set for a
given node, we are permitted to receive and acknowledge broadcast
messages from that node. Convert it to a bool at the same time,
since it is not used to store any numerical values.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The "supportable" flag in bclink structure is a compatibility flag
indicating whether a peer node is capable of receiving TIPC broadcast
messages. However, all TIPC versions since tipc-1.5, and after the
inclusion in the upstream Linux kernel in 2006, support this capability.
It is highly unlikely that anybody is still using such an old
version of TIPC, let alone that they want to mix it with TIPC-2.0
nodes. Therefore, we now remove the "supportable" flag.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Currently at the TIPC bearer layer there is the following congestion
mechanism:
Once sending packets has failed via that bearer, the bearer will be
flagged as being in congested state at once. During bearer congestion,
all packets arriving at link will be queued on the link's outgoing
buffer. When we detect that the state of bearer congestion has
relaxed (e.g. some packets are received from the bearer) we will try
our best to push all packets in the link's outgoing buffer until the
buffer is empty, or until the bearer is congested again.
However, in fact the TIPC bearer never receives any feedback from the
device layer whether a send was successful or not, so it must always
assume it was successful. Therefore, the bearer congestion mechanism
as it exists currently is of no value.
But the bearer blocking state is still useful for us. For example,
when the physical media goes down/up, we need to change the state of
the links bound to the bearer. So the code maintaing the state
information is not removed.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When a socket is shut down, we should wake up all thread sleeping on
it, instead of just one of them. Otherwise, when several threads are
polling the same socket, and one of them does shutdown(), the
remaining threads may end up sleeping forever.
Also, to align socket usage with common practice in other stacks, we
use one of the common socket callback handlers, sk_state_change(),
to wake up pending users. This is similar to the usage in e.g.
inet_shutdown(). [net/ipv4/af_inet.c].
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Chen Gang reports:
the length of nla_data(cda[CTA_TIMEOUT_NAME]) is not limited in server side.
And indeed, its used to strcpy to a fixed-sized buffer.
Fortunately, nfnetlink users need CAP_NET_ADMIN.
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Due to the missing ininitalization at adding/deleting entries, when
a plain_ip,port,net element was the object, multiple elements were
added/deleted instead. The bug came from the missing dangling
default initialization.
The error-prone default initialization is corrected in all hash:* types.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If an implied connect is attempted on a nonblocking STREAM/SEQPACKET
socket during link congestion, the connect message will be discarded
and sendmsg will return EAGAIN. This is normal behavior, and the
application is expected to poll the socket until POLLOUT is set,
after which the connection attempt can be retried.
However, the POLLOUT flag is never set for unconnected sockets and
poll() always returns a zero mask. The application is then left without
a trigger for when it can make another attempt at sending the message.
The solution is to check if we're polling on an unconnected socket
and set the POLLOUT flag if the TIPC port owned by this socket
is not congested. The TIPC ports waiting on a specific link will be
marked as 'not congested' when the link congestion have abated.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When an application blocks at poll/select on a TIPC socket
while requesting a specific event mask, both the filter_rcv() and
wakeupdispatch() case will wake it up unconditionally whenever
the state changes (i.e an incoming message arrives, or congestion
has subsided). No mask is used.
To avoid this, we populate sk->sk_data_ready and sk->sk_write_space
with tipc_data_ready and tipc_write_space respectively, which makes
tipc more in alignment with the rest of the networking code. These
pass the exact set of possible events to the waker in fs/select.c
hence avoiding waking up blocked processes unnecessarily.
In doing so, we uncover another issue -- that there needs to be a
memory barrier in these poll/receive callbacks, otherwise we are
subject to the the same race as documented above wq_has_sleeper()
[in commit a57de0b4 "net: adding memory barrier to the poll and
receive callbacks"]. So we need to replace poll_wait() with
sock_poll_wait() and use rcu protection for the sk->sk_wq pointer
in these two new functions.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
John W. Linville says:
====================
This is a batch of fixes intended for 3.7...
Included are two pulls. Regarding the mac80211 tree, Johannes says:
"Please pull my mac80211.git tree (see below) to get two more fixes for
3.7. Both fix regressions introduced *before* this cycle that weren't
noticed until now, one for IBSS not cleaning up properly and the other
to add back the "wireless" sysfs directory for Fedora's startup scripts."
Regarding the iwlwifi tree, Johannes says:
"Please also pull my iwlwifi.git tree, I have two fixes: one to remove a
spurious warning that can actually trigger in legitimate situations, and
the other to fix a regression from when monitor mode was changed to use
the "sniffer" firmware mode."
Also included is an nfc tree pull. Samuel says:
"We mostly have pn533 fixes here, 2 memory leaks and an early unlocking fix.
Moreover, we also have an LLCP adapter linked list insertion fix."
On top of that, a few more bits... Albert Pool adds a USB ID
to rtlwifi. Bing Zhao provides two mwifiex fixes -- one to fix
a system hang during a command timeout, and the other to properly
report a suspend error to the MMC core. Finally, Sujith Manoharan
fixes a thinko that would trigger an ath9k hang during device reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
All packet headers in front of an ethernet header have to be completely
divisible by 2 but not by 4 to make the payload after the ethernet header again
4 bytes boundary aligned.
A packing of 2 is necessary to avoid extra padding at the end of the struct
caused by a structure member which is larger than two bytes. Otherwise the
structure would not fulfill the previously mentioned rule to avoid the
misalignment of the payload after the ethernet header. It may also lead to
leakage of information when the padding it not initialized before sending.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
If the skb is fragmented, the checksum must be computed on the
individual fragments, just using skb->data may fail on fragmented
data. Instead of doing linearizing the packet, use the new
batadv_crc32 to do that more efficiently- it should not hurt
replacing the old crc16 by the new crc32.
Reported-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
By adding batadv_send_skb_to_orig() in send.c, we can remove duplicate
code that looks up the next hop and then calls batadv_send_skb_packet().
Furthermore, this prepares the upcoming new implementation of
fragmentation, which requires the next hop to route packets.
Please note that this doesn't entirely remove the next-hop lookup in
routing.c and unicast.c, since it is used by the current fragmentation
code.
Also note that the next-hop info is removed from debug messages in
translation-table.c, since it is looked up elsewhere.
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
This patch adds support for an array of debugfs general (not soft_iface
specific) attributes. With this change it will be possible to add more general
attributes by simply appending them to the array without touching the rest of
the code.
Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
The address and the VLAN VID may not be packed in the respective
structs. Fix this by comparing the elements individually.
Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Reported-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
The transtable_global debug file can show multiple entries for a single client
when multiple gateways exist. The chosen gateway isn't marked in the list and
therefore the user cannot easily debug the situation when there is a problem
with the currently used gateway.
The best gateway is now marked with "*" and secondary gateways are marked with
"+".
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Provide drivers with hooks to create debugfs files when
a new station is added. This would help drivers to take
advantage of mac80211's station list infrastructure and not maintain
tedious station management code internally.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
[ifdef inline wrapper functions]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In the event that an association exceeds its max_retrans attempts, we should
send an ABORT chunk indicating that we are closing the assocation as a result.
Because of the nature of the error, its unlikely to be received, but its a nice
clean way to close the association if it does make it through, and it will give
anyone watching via tcpdump a clue as to what happened.
Change notes:
v2)
* Removed erroneous changes from sctp_make_violation_parmlen
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error, inet6_csk_update_pmtu() should consistently
return NULL.
Bug added in commit 35ad9b9cf7
(ipv6: Add helper inet6_csk_update_pmtu().)
Reported-by: Lluís Batlle i Rossell <viric@viric.name>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kfree on a null pointer is a no-op.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add the support of 6RD tunnels management via netlink.
Note that netdev_state_change() is now called when 6RD parameters are updated.
6RD parameters are updated only if there is at least one 6RD attribute.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
OOB authentication mechanism should be used only if pairing process
has been activated by previous OOB information exchange (Core Spec
4.0 , vol. 1, Part A, 5.1.4.3). Stored OOB data for specific device
should be removed if that device was discovered in band later on.
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When receiving L2CAP Create Channel Request set the channel as
L2CAP_FCS_NONE. Then in "L2CAP Config req" following field will
be set: "FCS Option 0x00 (No FCS)". So by default High Speed
channels have no FCS.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
local_amp_id is used in l2cap_physical_cfm and shall be set up
before calling it.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When receiving Physical Link Completed event we need to create L2CAP
channel with L2CAP Create Chan Request. Current code was sending
this command only if connection was pending (which is probably
needed in channel move case). If channel is not moved but created
Create Chan should be sent for outgoing channel which is checked
with BT_CONNECT flag.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Sujith reported warnings with suspend/resume due to
channel contexts. When I looked into it, I realised
that the code was completely broken as it unassigned
the channel contexts when suspending, which actually
means they are destroyed.
Eliad Peller then pointed out that we also need to
remove the channel contexts from the driver. When I
looked into this, I also noticed that the code isn't
handling the virtual monitor interface correctly (if
it exists.)
Fix this by calling just the driver methods (if they
are implemented) instead of using the channel context
management code. Also add reconfiguration for the
virtual monitor interface.
Reported-by: Sujith Manoharan <sujith@msujith.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The wireless and wext includes in net-sysfs.c aren't
needed, so remove them.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
flush_tasklet is a struct, not a pointer in percpu var.
so use this_cpu_ptr to get the member pointer.
Signed-off-by: Shan Wei <davidshan@tencent.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
list_add was called with swapped parameters
Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Drivers are allowed to modify the sent skb and thus we need to make a copy
of it before passing it to the driver. Without this fix, LLCP Tx skbs were
not queued properly as the ptype check was failing due to e.g. the pn533
driver skb_pushing the Tx skb.
Reported-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When the tx pending queues and/or the socket tx queue is getting too deep,
we have to let userspace know. We won't be queueing any more frames until
the congestion is fixed.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
A chip with pre-opened gates may send events on a gate that nobody
has opened in the handset host. Discard those events.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
NFC_HCI_ID_MGMT_VERSION_SW and NFC_HCI_ID_MGMT_VERSION_HW are optional
registers for gate NFC_HCI_ID_MGMT_GATE in standard HCI. When chip
doesn't implement, just leave all the information as zeros.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In some cases, pre-opened pipes don't stay open when a clear all pipes
command is sent. They stay created however. Therefore, one can never
assume that such a pipe is already open. As re-opening a pipe seems not
to be a problem, we do that now.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Set chan->fcs to L2CAP_FCS_NONE only for new L2CAP channels
(not moved). Other side can still request to use FCS.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>