Call into the switchdev driver any time an IPv4 fib entry is
added/modified/deleted from the kernel's FIB. The switchdev driver may or
may not install the route to the offload device. In the case where the
driver tries to install the route and something goes wrong (device's routing
table is full, etc), then all of the offloaded routes will be flushed from the
device, route forwarding falls back to the kernel, and no more routes are
offloading.
We can refine this logic later. For now, use the simplist model of offloading
routes up to the point of failure, and then on failure, undo everything and
mark IPv4 offloading disabled.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If something goes wrong with IPv4 FIB offload, mark entire net offload
disabled. This is brute force policy to basically shut down IPv4 FIB offload
permanently if there is a problem offloading any route to an external device.
We can refine the policy in the future, to handle failures on a per-device or
per-route basis, but for now, this policy is per-net.
What we're trying to avoid is an inconsistent split between the kernel's FIB
and the offload device's FIB. We don't want the device to fwd a pkt
inconsitent with what the kernel would do. An example of a split is if device
has 10.0.0.0/16 and kernel has 10.0.0.0/16 and 10.0.0.0/24, the device wouldn't
see the longest prefix 10.0.0.0/24 and potentially forward pkts incorrectly.
Limited capacity or limited capability are two ways a route may fail to install
to the offload device. We'll not differentiate between failures at this time,
and treat any failure as fatal and mark the net as fib_offload_disabled.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flesh out ndo wrappers to call into device driver. To call into device driver,
the wrapper must interate over route's nexthops to ensure all nexthop devs
belong to the same switch device. Currently, there is no support for route's
nexthops spanning offloaded and non-offloaded devices, or spanning ports of
multiple offload devices.
Since switch device ports may be stacked under virtual interfaces (bonds and/or
bridges), and the route's nexthop may be on the virtual interface, the wrapper
will traverse the nexthop dev down to the base dev. It's the base dev that's
passed to the switchdev driver's ndo ops.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keep switchdev FIB offload model simple for now and don't allow custom ip
rules.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IPv4 fib ndo wrapper funcs and stub them out for now.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two new ndo ops for IPv4 fib offload support, add and del. Add uses
modifiy semantics if fib entry already offloaded. Drivers implementing the new
ndo ops will return err<0 if programming device fails, for example if device's
tables are full.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new RTNH_F_EXTERNAL flag to mark fib entries offloaded externally, for
example to a switchdev switch device.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli says:
====================
net: dsa: code re-organization
This pull request contains the first part of the patches required to implement
the grand plan detailed here:
http://www.spinics.net/lists/netdev/msg295942.html
These are mostly code re-organization and function bodies re-arrangement to
allow different callers of lower-level initialization functions for 'struct
dsa_switch' and 'struct dsa_switch_tree' to be later introduced.
There is no functional code change at this point.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract the core logic that setups a 'struct dsa_switch_tree' and
removes it, update dsa_probe() and dsa_remove() to use the two helper
functions. This will be useful to allow for other callers to setup
this structure differently.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to support the new DSA device driver model, a dsa_switch should
be able to advertise the type of tagging protocol supported by the
underlying switch device. This also removes constraints on how tagging
can be stacked to each other.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split the part of dsa_switch_setup() which is responsible for allocating
and initializing a 'struct dsa_switch' and the part which is doing a
given switch device setup and slave network device creation.
This is a preliminary change to allow a separate caller of
dsa_switch_setup_one() which may have externally initialized the
dsa_switch structure, outside of dsa_switch_setup().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for allowing a different model to register DSA switches,
update dsa_of_probe() and dsa_probe() to return -EPROBE_DEFER where
appropriate.
Failure to find a phandle or Device Tree property is still fatal, but
looking up the internal device structure associated with a Device Tree
node is something that might need to be delayed based on driver probe
ordering.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for allowing a different mechanism to register DSA switch
devices and driver, update dsa_of_probe and dsa_of_remove to take a
struct device pointer since neither of these two functions uses the
struct platform_device pointer.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
timewait sockets now share a common base with established sockets.
inet_twsk_diag_dump() can use inet_diag_bc_sk() instead of duplicating
code, granted that inet_diag_bc_sk() does proper userlocks
initialization.
twsk_build_assert() will catch any future changes that could break
the assumptions.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As pointed out by Ben Hutchings, ioremap uses unsigned long as
its parameter type, so we should be using that instead of u32
or int.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ip/udp bearer can be configured in a point-to-point
mode by specifying both local and remote ip/hostname,
or it can be enabled in multicast mode, where links are
established to all tipc nodes that have joined the same
multicast group. The multicast IP address is generated
based on the TIPC network ID, but can be overridden by
using another multicast address as remote ip.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The payload area following the TIPC discovery message header is an
opaque area defined by the media. INT_H_SIZE was enough for
Ethernet/IB/IPv4 but needs to be expanded to carry IPv6 addressing
information.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
This series contains updates to i40e only.
Greg provides fixes for the NPAR transmit scheduler where the driver
initialization caused the BW configurations to not take effect, so use
a BW configuration read and write back to "kick" the transmit scheduler
into action. Fixes the ethtool offline test, where we were not actually
taking the device offline before doing the testing.
Matt modifies the get and set LED functions so they ignore activity LEDs
since we are required to blink the link LEDs only.
Neerav provides a workaround for whenever a DCBX configuration is changed,
where the firmware doe not set the operational status bit of the
application TLV status as returned from the "Get CEE DCBX Oper Cfg" admin
queue command. So remove the check for the operational and sync bits of
the application TLV status until a firmware fix is provided.
Shannon changes the driver to grab the NVM devstarter version and not
the image version, since it is the more useful version and is what
should be displayed. Moves the IRQ tracking setup and tear down into
the same routines that do the IRQ setup and tear down. This keeps
like activities together and allows us to track exactly the number
of vectors reserved from the OS, which may be fewer than are available
from the hardware.
Jesse provides a fix to use a more portable sign extension by replacing
0xffff.... with ~(u64)0 or ~(u32)0. Also fixes XPS mask when resetting,
where the driver would accidentally clear the XPS mask for all queues
back to 0. This caused higher CPU utilization and had some other
performance impacts for transmit tests. Cleans up some whitespace
formatting.
Catherine provides a fix where some firmware versions are incorrectly
reporting a breakout cable as PHY type 0x3 when it should be 0x16
(I40E_PHY_TYPE_10GBASE_SFPP_CU). Adds the 10G and 40G AOC PHY types
to the case statement in get_media_type and ethtool get_settings so
that the correct information gets reported back to the user.
Anjali provides IOREMAP changes for future device support, where we
do not want to map the whole CSR space since some of it is mapped by
other drivers with different mapping methods.
Mitch changes the i40e driver to not "spam" the system log with
messages about VF VSI when VFs are created and when they are reset to
reduce user annoyance.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes this build error:
net/mpls/af_mpls.c: In function 'resize_platform_label_table':
net/mpls/af_mpls.c:767:4: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
labels = vzalloc(size);
^
Fixes: 7720c01f3f ("mpls: Add a sysctl to control the size of the mpls label table")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai says:
====================
cxgb4: RX Queue related cleanup and fixes
This patch series adds a common function to allocate RX queues and queue
allocation changes to RDMA CIQ
The patches series is created against 'net-next' tree.
And includes patches on cxgb4 driver.
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
To allow for better scalability on systems with large core counts, we
will try and allocate enough RDMA Concentrator IQs and MSI/X vectors as
we have cores. If we cannot get enough MSI/X vectors, fall back to the
minimum required: 1 per adapter rx channel.
Also clean up cxgb_enable_msix() to make it readable and correct a bug
where the vectors are not correctly assigned if the driver doesn't get
the full amount requested.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a common function for all Rx queue allocation.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This extends the design in commit 958501163d ("bridge: Add support for
IEEE 802.11 Proxy ARP") with optional set of rules that are needed to
meet the IEEE 802.11 and Hotspot 2.0 requirements for ProxyARP. The
previously added BR_PROXYARP behavior is left as-is and a new
BR_PROXYARP_WIFI alternative is added so that this behavior can be
configured from user space when required.
In addition, this enables proxyarp functionality for unicast ARP
requests for both BR_PROXYARP and BR_PROXYARP_WIFI since it is possible
to use unicast as well as broadcast for these frames.
The key differences in functionality:
BR_PROXYARP:
- uses the flag on the bridge port on which the request frame was
received to determine whether to reply
- block bridge port flooding completely on ports that enable proxy ARP
BR_PROXYARP_WIFI:
- uses the flag on the bridge port to which the target device of the
request belongs
- block bridge port flooding selectively based on whether the proxyarp
functionality replied
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bump i40e to 1.2.11 and i40evf to 1.2.5
Change-ID: Ie13375941606b0a027e5b5dbc235f5f5f03b75c8
Signed-off-by: Sravanthi Tangeda <sravanthi.tangeda@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The PF driver spams the system log with messages about VF VSI when VFs
are created, as well as each time they are reset. This is annoying, and
the information isn't even useful most of the time.
Remove this message to reduce user annoyance.
Change-ID: I8de90d05380f54b038c9c8c3265150be87c9242c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Move the IRQ tracking setup and teardown into the same routines that
do the IRQ setup and teardown. This keeps like activities together and
allows us to track exactly the number of vectors reserved from the OS,
which may be fewer than are available from the HW.
Change-ID: I6b2b1a955c5f0ac6b94c3084304ed0b2ea6777cf
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
For future device support we do not want to map the whole CSR space since some
of it is mapped by other drivers with different mapping methods.
Note: As a side effect, the flash region (if exposed through the memory map)
gets unmapped too since it follows the future use region.
Change-ID: Ic729a2eacd692984220b1a415ff4fa0f98ea419a
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix some double blank lines and un-split a function declaration that all
fits on one line. Also make i40e_get_priv_flags static.
Change-ID: I11b5d25d1153a06b286d0d2f5d916d7727c58e4a
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add the 10G and 40G AOC PHY types to the case statement in get_media_type
and ethtool get_settings so that the correct information gets reported
back to the user.
Change-ID: I1b4849d22199a9acf7c8807166d0317c1faad375
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If the system administrator is requesting an offline diagnostic test using
'ethtool -t' then we should, you know, actually take the device offline
before doing the testing.
Change-ID: I6afa1cbfcc821c9ab6e6f47ed4d8dc2d8dd20e82
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Some FW versions are incorrectly reporting a breakout cable as PHY type
0x3 when it should be 0x16 (I40E_PHY_TYPE_10GBASE_SFPP_CU).
If we get this value back from FW and the version is < 4.40, reassign it
to I40E_PHY_TYPE_10GBASE_SFPP_CU.
Change-ID: Ibb41a0e3cd2c0753744e8553959240df6ed13ae8
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
During resets (possibly caused by a Tx hang) the driver would
accidentally clear the XPS mask for all queues back to 0.
This caused higher CPU utilization and had some other performance impacts
for transmit tests.
Change-ID: I95f112432c9e643a153eaa31cd28cdcbfdd01831
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use automatic sign extension by replacing 0xffff... constants
with ~(u64)0 or ~(u32)0.
Change-ID: I73cab4cd2611795bb12e00f0f24fafaaee07457c
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
0x2A is the NVM version so it has useful data but it is per image
version every image can have a different one. 0x18 is the dev starter
version which all the images for release will have the same version.
Of the two 0x18 is more useful and is what should be displayed.
Change-ID: Idf493da13a42ab211e2de0bef287f5de51033cca
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In CEE mode the firmware does not set the operational status bit of
the application TLV status as returned from the "Get CEE DCBX Oper Cfg"
AQ command. This occurs whenever a DCBX configuration is changed.
This is a workaround to remove the check for the operational and sync bits
of the application TLV status till a firmware fix is provided.
Change-ID: I1a31ff2fcadcb06feb5b55776a33593afc6ea176
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Modify our get and set LED functions so they ignore activity LEDs,
as we are required to blink the link LEDs only.
Change-ID: I647ea67a6fc95cbbab6e3cd01d81ec9ae096a9ad
Signed-off-by: Matt Jared <matthew.a.jared@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Recent changes to the driver initialization have caused the BW
configurations to not take effect. We use a BW configuration read and
write back to "kick" the Tx scheduler into action.
Change-ID: I94ab377c58d3a3986e3de62b6c199be3fd2ee5e6
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
1. Use c_index and ring->c_index to determine how many TxCBs/TxBDs are
ready for cleanup
- c_index = the current value of TDMA_CONS_INDEX
- TDMA_CONS_INDEX is HW-incremented and auto-wraparound (0x0-0xFFFF)
- ring->c_index = __bcmgenet_tx_reclaim() cleaned up to this point on
the previous invocation
2. Add bcmgenet_tx_ring->clean_ptr
- index of the next TxCB to be cleaned
- incremented as TxCBs/TxBDs are processed
- value always in range [ring->cb_ptr, ring->end_ptr]
3. Fix incrementing of dev->stats.tx_packets
- should be incremented only when tx_cb_ptr->skb != NULL
These changes simplify __bcmgenet_tx_reclaim(). Furthermore, Tx ring size
can now be any value.
With the old code, Tx ring size had to be a power-of-2:
num_tx_bds = ring->size;
c_index &= (num_tx_bds - 1);
last_c_index &= (num_tx_bds - 1);
Signed-off-by: Petri Gynther <pgynther@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck says:
====================
ipv4/fib_trie: Cleanups to prepare for introduction of key vector
This patch series is meant to mostly just clean up the fib_trie to prepare
it for the introduction of the key_vector. As such there are a number of
minor clean-ups such as reformatting the tnode to match the format once the
key vector is introduced, some optimizations to drop the need for a leaf
parent pointer, and some changes to remove duplication of effort such as
the 2 look-ups that were essentially being done per node insertion.
v2: Added code to cleanup idx >> n->bits and explain unsigned long logic
Added code to prevent allocation when tnode size is larger than size_t
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds code to prevent us from attempting to allocate a tnode with
a size larger than what can be represented by size_t.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change updates the fib_table_lookup function so that it is in sync
with the fib_find_node function in terms of the explanation for the index
check based on the bits value.
I have also updated it from doing a mask to just doing a compare as I have
found that seems to provide more options to the compiler as I have seen it
turn this into a shift of the value and test under some circumstances.
In addition I addressed one minor issue in which we kept computing the key
^ n->key when checking the fib aliases. I pulled the xor out of the loop
in order to reduce the number of memory reads in the lookup. As a result
we should save a couple cycles since the xor is only done once much earlier
in the lookup.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fib_table was wrapped in several places with an
rcu_read_lock/rcu_read_unlock however after looking over the code I found
several spots where the tables were being accessed as just standard
pointers without any protections. This change fixes that so that all of
the proper protections are in place when accessing the table to take RCU
replacement or removal of the table into account.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we are going to compact the leaf and tnode we first need to make sure
the fields are all in the same place. In that regard I am moving the leaf
pointer which represents the fib_alias hash list to occupy what is
currently the first key_vector pointer.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that the insert and delete functions make use of
the tnode pointer returned in the fib_find_node call. By doing this we
will not have to rely on the parent pointer in the leaf which will be going
away soon.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that the parent pointer is returned by reference in
fib_find_node. By doing this I can use it to find the parent node when I
am performing an insertion and I don't have to look for it again in
fib_insert_node.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that leaf_walk_rcu takes a tnode and a key instead
of the trie and a leaf.
The main idea behind this is to avoid using the leaf parent pointer as that
can have additional overhead in the future as I am trying to reduce the
size of a leaf down to 16 bytes on 64b systems and 12b on 32b systems.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>