All handler->err() routines expect that we've done a pskb_may_pull()
test to make sure that IP header length + 8 bytes can be safely
pulled.
Reported-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Can be used to match packets against netfilter ip sets created via ipset(8).
skb->sk_iif is used as 'incoming interface', skb->dev is 'outgoing interface'.
Since ipset is usually called from netfilter, the ematch
initializes a fake xt_action_param, pulls the ip header into the
linear area and also sets skb->data to the IP header (otherwise
matching Layer 4 set types doesn't work).
Tested-by: Mr Dash Four <mr.dash.four@googlemail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
First update the adapter variables with the current speed and
mode before fire the notification. Otherwise, the get_settings()
may provide old values.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6lowpan module starts collecting incomming frames and fragments
right after lowpan_module_init() therefor it will be better to
clean unfinished fragments in lowpan_cleanup_module() function
instead of doing it when link goes down.
Changed spinlocks type to prevent deadlock with expired timer event
and removed unused one.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function lowpan_alloc_new_frame() takes u8 tag as an argument. However,
its only caller, lowpan_process_data() passes down a u16. Hence,
the tag value can get corrupted. This prevent 6lowpan fragment reassembly of a
message when the fragment tag value is over 256.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make symbols static to avoid the following warning shown up
by sparse:
warning: symbol ... was not declared. Should it be static?
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use netdev_alloc_skb_ip_align() instead of alloc_skb() to get some
extra headroom in case we need to forward this frame in a tunnel or
something else.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add method to get the device short 802.15.4 address. This call
needed by ieee802154 layer to satisfy 'iz list' request from
the user space.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix LOCKDEP bug message for the irq handler spinlock.
Make the irq processing code more explicit and stable.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revert the commit 768f7c7c12 to initialize
spinlock in the more preferable way and make it static to avoid sparse
warning.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"retval" has to be a signed integer for the error handling to work.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A number of older ColdFire CPU based boards use NS8390 based network
controllers. Most use the Davicom 9008F or the UMC 9008F. This driver
provides the support code to get these devices working on these platforms.
Generally the NS8390 based eth device is direct connected via the general
purpose bus of the ColdFire CPU. So its addressing and interrupt setup is
fixed on each of the different platforms (classic platform setup).
This driver is based on the other drivers/net/ethernet/8390 drivers, and
includes the lib8390.c code. It uses the existing definitions of the
board NS8390 device addresses, interrupts and access types from the
arch/m68k/include/asm/mcf8390.h, but moves the IO access functions into
the driver code and out of that header.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mcfne.h include contains definitions to support NS8390 eth based hardware
on ColdFire based CPU boards. So change its name to reflect that better.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As described in my patch series from the other day, we need to
rearrange redirect handling so that the local initiators of packets
(sockets, tunnels, xfrms, etc.) that implement the protocols compute
the route and pass this down into the ipv4/ipv6 routing code.
These changes here do so by implementing a new dst_ops->redirect
method.
No more do we have this funny code that tries several different sets
of routing keys to try and figure out which route the redirect should
actually be applied to.
No more do we have the problem wherein TOS rewriting causes problems
for us.
Signed-off-by: David S. Miller <davem@davemloft.net>
This sets things up so that we can have the protocol error handlers
call down into the ipv6 route code for redirects just as ipv4 already
does.
Signed-off-by: David S. Miller <davem@davemloft.net>
No longer needed, as the protocol handlers now all properly
propagate the redirect back into the routing code.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass in the SKB rather than just the IP addresses, so that policy
and other aspects can reside in ip_rt_redirect() rather then
icmp_redirect().
Signed-off-by: David S. Miller <davem@davemloft.net>
This introduce TSQ (TCP Small Queues)
TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
device queues), to reduce RTT and cwnd bias, part of the bufferbloat
problem.
sk->sk_wmem_alloc not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.
TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.
As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2 : It can help to reduce
latencies of high prio packets, having smaller TSO packets.
This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.
Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.
Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)
I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and both side socket autotuning no longer use 4 Mbytes.
As skb destructor cannot restart xmit itself ( as qdisc lock might be
taken at this point ), we delegate the work to a tasklet. We use one
tasklest per cpu for performance reasons.
If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.
[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
but some drivers call it in their start_xmit() handler.
These drivers should at least use BQL, or else a single TCP
session can still fill the whole NIC TX ring, since TSQ will
have no effect.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent patch "tcp: Maintain dynamic metrics in local cache." introduced
an out of bounds access due to what appears to be a typo. I believe this
change should resolve the issue by replacing the access to RTAX_CWND with
TCP_METRIC_CWND.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are really only 3 modes that can control the number of queues. Those
are RSS, DCB, and VMDq/SR-IOV. Currently we have things much more broken
up than they need to be for how we are configuring the rings. In order to
try and straiten some of this out I am going to start merging similar
functionality into single functions. To start with I am merging the Flow
Director ring configuration into the RSS ring configuration since Flow
Director cannot function with DCB or SR-IOV.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch replaces a switch statement for an 82598 workaround with an if
statement that only applies to 82598. In addition I am pulling out several
dead pieces of code and instead of reading the SRRCTL register and then
modifying it we are just writing a value which we generate from scratch.
Finally I am also removing any drop enable related code since that was
moved to a function of its own.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The mask value for ring features was overloaded for FCoE which can lead to
some confusion. In order to avoid any confusion I am splitting the mask
value and adding an offset value. This can be used for the start of the
FCoE rings, and in the future I hope to use it to store the start of the
registers for SR-IOV.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We are currently using indices to indicate the upper limit on a ring
feature. However since we can switch back and forth on features such as
DCB and that has effects on other features such as RSS it is preferable to
instead store the upper limit separate from the current value for the
number of rings related to the feature.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It makes much more sense for us to count q_vectors instead of MSI-X
vectors. We were using num_msix_vectors to find the number of q_vectors in
multiple places. This was wasteful since we only had one place that
actually needs the number of MSI-X vectors and that is in slow path.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
mld->mld_maxdelay is net endian, so we should use ntohs, not htons
CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 6d29b1ef introduces a bug, ntohs is __be16_to_cpu,
not cpu_to_be16.
We always use htons on IP_OFFSET and IP_MF, then compare
with network package.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ETH_P_IP is host Endian, skb->protocol is big Endian, when
compare them, Using htons on skb->protocol is wrong.
And fix two code style issues: indentation and remove
unnecessary parentheses.
CC: Tristram Ha <Tristram.Ha@micrel.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
CC: Joe Perches <joe@perches.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/batman-adv/bridge_loop_avoidance.c
net/batman-adv/bridge_loop_avoidance.h
net/batman-adv/soft-interface.c
net/mac80211/mlme.c
With merge help from Antonio Quartulli (batman-adv) and
Stephen Rothwell (drivers/net/usb/qmi_wwan.c).
The net/mac80211/mlme.c conflict seemed easy enough, accounting for a
conversion to some new tracing macros.
Signed-off-by: David S. Miller <davem@davemloft.net>
In rare cases, bnx2x_free_tx_skbs() can unmap the wrong DMA address
when it gets to the last entry of the tx ring. We were not using
the proper macro to skip the last entry when advancing the tx index.
Reported-by: Zongyun Lai <zlai@vmware.com>
Reviewed-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz reported triggering of WARN_ON_ONCE(delta < len); in
skb_try_coalesce()
This warning tracks drivers that incorrectly set skb->truesize
IPoIB indeed allocates a full page to store a fragment, but only
accounts in skb->truesize the used part of the page (frame length)
This patch fixes skb truesize underestimation, and
also fixes a performance issue, because RX skbs have not enough tailroom
to allow IP and TCP stacks to pull their header in skb linear part
without an expensive call to pskb_expand_head()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Shlomo Pongartz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In driver reload test there is a memory leak.
The structure vlan_info was not freed when the driver was removed.
It was not released since the nr_vids var is one after last vlan was removed.
The nr_vids is one, since vlan zero is added to the interface when the interface
is being set, but the vlan zero is not deleted at unregister.
Fix - delete vlan zero when we unregister the device.
Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- fix a bug generated by the wrong interaction between the GW feature and the
Bridge Loop Avoidance
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAk/2EMQACgkQpGgxIkP9cweoqgCeNGrHU9HxBnKXSylNcqhQBzqr
9jMAni+gJX+lzmrA2j1w/rCaamuNpbJG
=mXZq
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- fix a bug generated by the wrong interaction between the GW feature and the
Bridge Loop Avoidance