Check link device when looking up a tunnel. When a tunnel is
linked to a interface, traffic from a different interface must not
reach the tunnel.
This also allows creating of multiple tunnels with the same
endpoints, if the link device differs.
Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When locating the tunnel, do not continue if it is found. Otherwise
a different tunnel with similar configuration would be returned and
parts could be overwritten.
Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DHCP spec allows the server to specify the MTU. This can be useful
for netbooting with UDP-based NFS-root on a network using jumbo frames.
This patch allows the kernel IP autoconfiguration to handle this option
correctly.
It would be possible to use initramfs and add a script to set the MTU,
but that seems like a complicated solution if no initramfs is otherwise
necessary, and would bloat the kernel image more than this code would.
This patch was originally submitted to LKML in 2003 by Hans-Peter Jansen.
Signed-off-by: Chris Friesen <cfriesen@nortel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nlmsg_new() adds the size of the netlink header to the value
that has been passed as parameter. If NLMSG_GOODSIZE is selected,
we request an allocation of one memory page plus the size of the
header. Instead, NLMSG_DEFAULT_SIZE should be used since it
already substracts the size of the Netlink header.
I have the impression that the similar naming in both constant
is error prone when using it with nlmsg_new(). This is already
documented in include/net/netlink.h
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can slightly reduce size of teqlN structure, not duplicating stats
structure in teql_master but using stats field from net_device.stats
for tx_errors and from netdev_queue for tx_bytes/tx_packets/tx_dropped
values.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One point of contention in high network loads is the dst_release() performed
when a transmited skb is freed. This is because NIC tx completion calls
dev_kree_skb() long after original call to dev_queue_xmit(skb).
CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% handling softirqs for a network device,
since dst_clone() is done by other cpus, involving cache line ping pongs.
It seems right place to release dst is in dev_hard_start_xmit(), for most
devices but ones that are virtual, and some exceptions.
David Miller suggested to define a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefuly unset in devices
which dont want a NULL skb->dst in their ndo_start_xmit().
List of devices that must clear this flag is :
- loopback device, because it calls netif_rx() and quoting Patrick :
"ip_route_input() doesn't accept loopback addresses, so loopback packets
already need to have a dst_entry attached."
- appletalk/ipddp.c : needs skb->dst in its xmit function
- And all devices that call again dev_queue_xmit() from their xmit function
(as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Holding rtnl_lock when we are unregistering the sysfs files can
deadlock if we unconditionally take rtnl_lock in a sysfs file. So fix
it with the now familiar patter of: rtnl_trylock and syscall_restart()
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sysctls are unregistered with the rntl_lock held making
it unsafe to unconditionally grab the the rtnl_lock. Instead
we need to call rtnl_trylock and restart the system call
if we can not grab it. Otherwise we could deadlock at unregistration
time.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just returning -ERESTARTSYS without a signal pending is not
good that will just leak it to userspace. We need return
-ERESTARTNOINTR so we always restart and set signal pending
so that we fall of the fast path of syscall return and setup
the system call restart.
So use restart_syscall() which does all of this for us.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The earlier patch to fix the deadlock between a network device going
away and writing to sysfs attributes was incomplete.
- It did not set signal_pending so we would leak ERSTARTSYS to user space.
- It used ERESTARTSYS which only restarts if sigaction configures it to.
- It did not cover store and show for ifalias.
So fix all of these up and use the new helper restart_syscall so we get
the details correct on what it takes.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New packet socket feature that makes packet socket more efficient for
transmission.
- It reduces number of system call through a PACKET_TX_RING mechanism,
based on PACKET_RX_RING (Circular buffer allocated in kernel space
which is mmapped from user space).
- It minimizes CPU copy using fragmented SKB (almost zero copy).
Signed-off-by: Johann Baudy <johann.baudy@gnu-log.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
gen_estimator can overflow bps (bytes per second) with Gb links, while
it was designed with a u32 API, with a theorical limit of 34360Mbit
(2^32 bytes)
Using 64 bit intermediate avbps/brate counters can allow us to reach
this theorical limit.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need for net/icmp.h header in net/ipv4/fib_frontend.c.
This patch removes the #include net/icmp.h from it.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can update netdev_queue tx_bytes/tx_packets/tx_dropped counters instead
of dev->stats ones, to reduce number of cache lines dirtied in xmit path.
This fixes a performance problem on SMP when many different cpus take
vlan tx path.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
offsetof(struct net_device, features)=0x44
offsetof(struct net_device, stats.tx_packets)=0x54
offsetof(struct net_device, stats.tx_bytes)=0x5c
offsetof(struct net_device, stats.tx_dropped)=0x6c
Network drivers that touch dev->stats.tx_packets/stats.tx_bytes in their
tx path can slow down SMP operations, since they dirty a cache line
that should stay shared (dev->features is needed in rx and tx paths)
We could move away stats field in net_device but it wont help that much.
(Two cache lines dirtied in tx path, we can do one only)
Better solution is to add tx_packets/tx_bytes/tx_dropped in struct
netdev_queue because this structure is already touched in tx path and
counters updates will then be free (no increase in size)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is illegal to dereference a skb after a successful ndo_start_xmit()
call. We must store skb length in a local variable instead.
Bug was introduced in 2.6.27 by commit 0abf77e55a
(net_sched: Add accessor function for packet length for qdiscs)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 518a09ef11 (tcp: Fix recvmsg MSG_PEEK influence of
blocking behavior) lets the loop run longer than the race check
did previously expect, so we need to be more careful with this
check and consider the work we have been doing.
I tried my best to deal with urg hole madness too which happens
here:
if (!sock_flag(sk, SOCK_URGINLINE)) {
++*seq;
...
by using additional offset by one but I certainly have very
little interest in testing that part.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Frans Pop <elendil@planet.nl>
Tested-by: Ian Zimmermann <itz@buug.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If bridge is configured with no STP and forwarding delay of 0 (which
is typical for virtualization) then when link starts it will flood all
packets for the first 20 seconds.
This bug was introduced by a combination of earlier changes:
* forwarding database uses hold time of zero to indicate
user wants to always flood packets
* optimzation of the case of forwarding delay of 0 avoids the initial
timer tick
The fix is to just skip all the topology change detection code if
kernel STP is not being used.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the bridge catches all STP packets; even if STP is turned
off. This prevents other systems (which do have STP turned on)
from being able to detect loops in the network.
With this patch, if STP is off, then any packet sent to the STP
multicast group address is forwarded to all ports.
Based on earlier patch by Joakim Tjernlund with changes
to go through forwarding (not local chain), and optimization
that only last octet needs to be checked.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct net_device trans_start field is a hot spot on SMP and high performance
devices, particularly multi queues ones, because every transmitter dirties
it. Is main use is tx watchdog and bonding alive checks.
But as most devices dont use NETIF_F_LLTX, we have to lock
a netdev_queue before calling their ndo_start_xmit(). So it makes
sense to move trans_start from net_device to netdev_queue. Its update
will occur on a already present (and in exclusive state) cache line, for
free.
We can do this transition smoothly. An old driver continue to
update dev->trans_start, while an updated one updates txq->trans_start.
Further patches could also put tx_bytes/tx_packets counters in
netdev_queue to avoid dirtying dev->stats (vlan device comes to mind)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When TCP frees up write buffer space, avoid waking up tasks that have
done a poll() or select() on the same socket specifying read-side
events.
This is an extension of a read-side patch by Eric Dumazet.
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a DHCP server is delayed, it's possible for the client to receive the
DHCPOFFER after it has already sent out a new DHCPDISCOVER message from
a second interface. The client then sends out a DHCPREQUEST from the
second interface, but the server doesn't recognize the device and
rejects the request.
This patch simply tracks the current device being configured and throws
away the OFFER if it is not intended for the current device. A more
sophisticated approach would be to put the OFFER information into the
struct ic_device rather than storing it globally.
Signed-off-by: Chris Friesen <cfriesen@nortel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like the dev in netpoll_poll can be NULL - at lease it's
checked at the function beginning. Thus the dev->netde_ops dereference
looks dangerous.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can remove this lock here, since we are in cgroup write handler and
thus the cgrp is guaranteed to be valid, and no lock is needed when
writing a u32 variable.
Signed-off-by: Li Zefan <lizf@cn.fujitsuc.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch fixes issues with dev->dev_addr changing from array to pointer.
Hopefully there are no others.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6:
Bluetooth: Don't trigger disconnect timeout for security mode 3 pairing
Bluetooth: Don't use hci_acl_connect_cancel() for incoming connections
Bluetooth: Fix wrong module refcount when connection setup fails
Another case of me handling the fallout from Davem's unfortunate
addiction to shuffleboard.
Won't anybody think of the children? Join the anti-shuffleboard league
today!
* git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6:
iwlwifi: fix device id registration for 6000 series 2x2 devices
ath5k: update channel in sw state after stopping RX and TX
rtl8187: use DMA-aware buffers with usb_control_msg
mac80211: avoid NULL ptr deref when finding max_rates in PID and minstrel
airo: airo_get_encode{,ext} potential buffer overflow
Pulled directly by Linus because Davem is off playing shuffle-board at
some Alaskan cruise, and the NULL ptr deref issue hits people and should
get merged sooner rather than later.
David - make us proud on the shuffle-board tournament!
There's this internal wifi_wme_noack_test variable that
we use to set the QoS control if set. For one, it is
unlikely that it is set. Secondly, if set it needs to
influence the IEEE80211_TX_CTL_NO_ACK TX control flag,
and finally we should also be able to set it at all, so
make it available in debugfs.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Currently mac80211 announces a rate set with no basic rates,
this fixes it to use 1/2 or 6/9 Mbit as basic rates by default.
Additionally, mac80211 will currently adopt the peer's entire
rate set, rather than just the basic rate set; fix that too.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Even when we find an IBSS with the SSID we're looking for, we
may not be able to connect to it because it has a key and we
don't, or vice versa. Avoid such situations by checking the
privacy capability bit.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The time we wait for a probe response after probing an AP due to
beacon loss is currently the same as the monitoring interval, 2s.
This is far too long, APs should respond to probes within a
fraction of that time. To be able to adjust both values, add a
new constant IEEE80211_PROBE_WAIT, use it for checking the probe
response, and adjust it down to 200ms instead of 2 seconds.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The driver might keep reporting beacon loss until we
disassociate -- catch that and don't respond to any
subsequent events until the probe is either successful
or we disassociate.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Even though they are true, they cause sparse to complain
because it doesn't see the __acquires(dev_base_lock) on
dev_seq_start() because it is only added to the function
in net/core/dev.c, not the header file. To keep track of
the nesting correctly we should probably annotate those
functions publically, but for now let's just remove the
annotation I added to wext.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When setting a key with NL80211_CMD_NEW_KEY, we should allow the key
sequence number (RSC) to be set in order to allow replay protection to
work correctly for group keys. This patch documents this use for
nl80211 and adds the couple of missing pieces in nl80211/cfg80211 and
mac80211 to support this. In addition, WEXT SIOCSIWENCODEEXT compat
processing in cfg80211 is extended to handle the RSC (this was already
specified in WEXT, but just not implemented in cfg80211/mac80211).
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add a new NL80211_ATTR_CONTROL_PORT flag for NL80211_CMD_ASSOCIATE to
allow user space to indicate that it will control the IEEE 802.1X port
in station mode. Previously, mac80211 was always marking the port
authorized in station mode. This was enough when drop_unencrypted flag
was set. However, drop_unencrypted can currently be controlled only
with WEXT and the current nl80211 design does not allow fully secure
configuration. Fix this by providing a mechanism for user space to
control the IEEE 802.1X port in station mode (i.e., do the same that
we are already doing in AP mode).
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It is currently not possible to modify station flags, but that
capability would be very useful. This patch introduces a new
nl80211 attribute that contains a set/mask for station flags,
and updates the internal API (and mac80211) to mirror that.
The new attribute is parsed before falling back to the old so
that userspace can specify both (if it can) to work on all
kernels.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
NL80211_STA_FLAG_MFP was forgotten from sta_flags_policy. The previous
version added the flag due to the loop used in parse_station_flags,
but the proper behavior would be to allow nla_parse_nested() to go
through the policy for all flags.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Move key handling wireless extension ioctls from mac80211 to cfg80211
so that all drivers that implement the cfg80211 operations get wext
compatibility.
Note that this drops the SIOCGIWENCODE ioctl support for getting
IW_ENCODE_RESTRICTED/IW_ENCODE_OPEN. This means that iwconfig will
no longer report "Security mode:open" or "Security mode:restricted"
for mac80211. However, what we displayed there (the authentication
algo used) was actually wrong -- linux/wireless.h states that this
setting is meant to differentiate between "Refuse non-encoded packets"
and "Accept non-encoded packets".
(Combined with "cfg80211: fix a couple of bugs with key ioctls". -- JWL)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux:
nfsd: silence lockdep warning
lockd: fix list corruption on lockd restart
nfsd4: check for negative dentry before use in nfsv4 readdir
nfsd41: slots are freed with session
svcrdma: clean up error paths.
svcrdma: Fix dma map direction for rdma read targets
Currently, get_wireless_stats is racy by _design_. This is
because it returns a buffer, which needs to be statically
allocated since it cannot be freed if it was allocated
dynamically. Also, SIOCGIWSTATS and /proc/net/wireless use
no common lock, and /proc/net/wireless accesses are not
synchronised against each other. This is a design flaw in
get_wireless_stats since the beginning.
This patch fixes it by wrapping /proc/net/wireless accesses
with the RTNL so they are protected against each other and
SIOCGIWSTATS. The more correct method of fixing this would
be to pass in the buffer instead of returning it and have
the caller take care of synchronisation of the buffer, but
even then most drivers probably assume that their callback
is protected by the RTNL like all other wext callbacks.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
On non-AP interfaces userspace has no business interfering with
the station management, this can confuse mac80211 (and other
drivers probably wouldn't support it anyway). Allow adding and
removing stations only on AP interfaces.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
To make it more apparent in the code what is for wext
only (and needs to be #ifdef'ed) put all the info for
wext into a substruct in each wireless_dev.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The address pointed to by mac_addr can be marked as const.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When we disassociate, we set the channel to non-HT which
obviously invalidates any ht_operation_mode setting. But
when we then associate with the next AP again, we might
still have the ht_operation_mode from the previous AP
cached and fail to configure the hardware with the new
(but unchanged) operation mode. This patch fixes it by
separately tracking whether our cache is valid.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There really is no need to have a separate struct for a
single variable. The fact that it exists is due to the
code legacy, but we can remove that now. Very simple.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>