TLS offload device may lose sync with the TCP stream if packets
arrive out of order. Drivers can currently request a resync at
a specific TCP sequence number. When a record is found starting
at that sequence number kernel will inform the device of the
corresponding record number.
This requires the device to constantly scan the stream for a
known pattern (constant bytes of the header) after sync is lost.
This patch adds an alternative approach which is entirely under
the control of the kernel. Kernel tracks records it had to fully
decrypt, even though TLS socket is in TLS_HW mode. If multiple
records did not have any decrypted parts - it's a pretty strong
indication that the device is out of sync.
We choose the min number of fully encrypted records to be 2,
which should hopefully be more than will get retransmitted at
a time.
After kernel decides the device is out of sync it schedules a
resync request. If the TCP socket is empty the resync gets
performed immediately. If socket is not empty we leave the
record parser to resync when next record comes.
Before resync in message parser we peek at the TCP socket and
don't attempt the sync if the socket already has some of the
next record queued.
On resync failure (encrypted data continues to flow in) we
retry with exponential backoff, up to once every 128 records
(with a 16k record thats at most once every 2M of data).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
handle_device_resync() doesn't describe the function very well.
The function checks if resync should be issued upon parsing of
a new record.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS offload code casts record number to a u64. The buffer
should be aligned to 8 bytes, but its actually a __be64, and
the rest of the TLS code treats it as big int. Make the
offload callbacks take a byte array, drivers can make the
choice to do the ugly cast if they want to.
Prepare for copying the record number onto the stack by
defining a constant for max size of the byte array.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We subtract "TLS_HEADER_SIZE - 1" from req_seq, then if they
match we add the same constant to seq. Just add it to seq,
and we don't have to touch req_seq.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable 'status' in __packet_lookup_frame_in_block() is never used since
introduction in commit f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer
implementation."), we can remove it.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ASSERT_OVSL() in ovs_vport_del() is unnecessary because
ovs_vport_del() is only called by ovs_dp_detach_port() and
ovs_dp_detach_port() calls ASSERT_OVSL() too.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netlink_walk_start() needed to return an error code because of
rhashtable_walk_init(). but that was converted to rhashtable_walk_enter()
and it is a void type function. so now netlink_walk_start() doesn't need
any return value.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This test checks that route exceptions can be successfully listed and
flushed using ip -6 route {list,flush} cache.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern says:
====================
net: Enable nexthop objects with IPv4 and IPv6 routes
This is the final set of the initial nexthop object work. When I
started this idea almost 2 years ago, it took 18 seconds to inject
700k+ IPv4 routes with 1 hop and about 28 seconds for 4-paths. Some
of that time was due to inefficiencies in 'ip', but most of it was
kernel side with excessive synchronize_rcu calls in ipv4, and redundant
processing validating a nexthop spec (device, gateway, encap). Worse,
the time increased dramatically as the number of legs in the routes
increased; for example, taking over 72 seconds for 16-path routes.
After this set, with increased dirty memory limits (fib_sync_mem sysctl),
an improved ip and nexthop objects a full internet fib (743,799 routes
based on a pull in January 2019) can be pushed to the kernel in 4.3
seconds. Even better, the time to insert is "almost" constant with
increasing number of paths. The 'almost constant' time is due to
expanding the nexthop definitions when generating notifications. A
follow on patch will be sent adding a sysctl that allows an admin to
avoid the nexthop expansion and truly get constant route insert time
regardless of the number of paths in a route! (Useful once all programs
used for a deployment that care about routes understand nexthop objects).
To be clear, 'ip' is used for benchmarking for no other reason than
'ip -batch' is a trivial to use for the tests. FRR, for example, better
manages nexthops and route changes and the way those are pushed to the
kernel and thus will have less userspace processing times than 'ip -batch'.
Patches 1-10 iterate over fib6_nh with a nexthop invoke a processing
function per fib6_nh. Prior to nexthop objects, a fib6_info referenced
a single fib6_nh. Multipath routes were added as separate fib6_info for
each leg of the route and linked as siblings:
f6i -> sibling -> sibling ... -> sibling
| |
+--------- multipath route ---------+
With nexthop objects a single fib6_info references an external
nexthop which may have a series of fib6_nh:
f6i ---> nexthop ---> fib6_nh
...
fib6_nh
making IPv6 routes similar to IPv4. The side effect is that a single
fib6_info now indirectly references a series of fib6_nh so the code
needs to walk each entry and call the local, per-fib6_nh processing
function.
Patches 11 and 13 wire up use of nexthops with fib entries for IPv4
and IPv6. With these commits you can actually use nexthops with routes.
Patch 12 is an optimization for IPv4 when using nexthops in the most
predominant use case (no metrics).
Patches 14 handles replace of a nexthop config.
Patches 15-18 add update pmtu and redirect tests to use both old and
new routing.
Patches 19 and 20 add new tests for the nexthop infrastructure. The first
is single nexthop is used by multiple prefixes to communicate with remote
hosts. This is on top of the functional tests already committed. The
second verifies multipath selection.
v4
- changed return to 'goto out' in patch 9 since the rcu_read_lock is
held (noticed by Wei)
v3
- removed found arg in patch 7 and changed rt6_nh_remove_exception_rt
to return 1 when a match is found for an exception
v2
- changed ++i to i++ in patches 1 and 14 as noticed by DaveM
- improved commit message for patch 14 (nexthop replace)
- removed the skip_fib argument to remove_nexthop; vestige of an
older design
====================
Reviewed-By: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a version of router_multipath.sh that uses nexthop objects for
routes.
Ido requested a version that does not cause regressions with mlxsw
testing since it does not support nexthop objects yet.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tests where multiple FIB entries use the same nexthop object. Generate
per-cpu cached routes for each by running ping on each cpu, and then
generate exceptions unique to each prefix (remote host) with different
mtus.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a second pass to icmp_redirect.sh to use nexthop objects for
routes.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add routing setup using nexthop objects and repeat tests with
old and new routing.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the route add commands to a new function called setup_routing_old.
The '_old' refers to the classic way of installing routes.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the block of code that runs a test and prints the verdict to a
new function, run_test.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for atomically upating a nexthop config.
When updating a nexthop, walk the lists of associated fib entries and
verify the new config is valid. Replace is done by swapping nh_info
for single nexthops - new config is applied to old nexthop struct, and
old config is moved to new nexthop struct. For nexthop groups the same
applies but for nh_group. In addition for groups the nh_parent reference
needs to be updated. The old config is released by calling __remove_nexthop
on the 'new' nexthop which now has the old config. This is done to avoid
messing around with the list_heads that track which fib entries are
using the nexthop.
After the swap of config data, bump the sequence counters for FIB entries
to invalidate any dst entries and send notifications to userspace. The
notifications include the new nexthop spec as well as any fib entries
using the updated nexthop struct.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for RTA_NH_ID attribute to allow a user to specify a
nexthop id to use with a route. fc_nh_id is added to fib6_config to
hold the value passed in the RTA_NH_ID attribute. If a nexthop id
is given, the gateway, device, encap and multipath attributes can
not be set.
Update ip6_route_del to check metric and protocol before nexthop
specs. If fc_nh_id is set, then it must match the id in the route
entry. Since IPv6 allows delete of a cached entry (an exception),
add ip6_del_cached_rt_nh to cycle through all of the fib6_nh in
a fib entry if it is using a nexthop.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Be optimistic about re-using a fib_info when nexthop id is given and
the route does not use metrics. Avoids a memory allocation which in
most cases is expected to be freed anyways.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for RTA_NH_ID attribute to allow a user to specify a
nexthop id to use with a route. fc_nh_id is added to fib_config to
hold the value passed in the RTA_NH_ID attribute. If a nexthop id
is given, the gateway, device, encap and multipath attributes can
not be set.
Update fib_nh_match to check ids on a route delete.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use nexthop_for_each_fib6_nh to call fib6_nh_mtu_change for each
fib6_nh in a nexthop for rt6_mtu_change_route. For __ip6_rt_update_pmtu,
we need to find the nexthop that correlates to the device and gateway
in the rt6_info.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use nexthop_for_each_fib6_nh and fib6_nh_find_match to find the
fib6_nh in a nexthop that correlates to the device and gateway
in the rt6_info.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a hook in __ip6_route_redirect to handle a nexthop struct in a
fib6_info. Use nexthop_for_each_fib6_nh and fib6_nh_redirect_match
to call ip6_redirect_nh_match for each fib6_nh looking for a match.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a hook in rt6_flush_exceptions, rt6_remove_exception_rt,
rt6_update_exception_stamp_rt, and rt6_age_exceptions to handle
nexthop struct in a fib6_info.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a hook in fib6_info_uses_dev to handle nexthop struct in a fib6_info.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a hook in rt6_nlmsg_size to handle nexthop struct in a fib6_info.
rt6_nh_nlmsg_size is used to sum the space needed for all nexthops in
the fib entry.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a hook in __find_rr_leaf to handle nexthop struct in a fib6_info.
nexthop_for_each_fib6_nh is used to walk each fib6_nh in a nexthop and
call find_match. On a match, use the fib6_nh saved in the callback arg
to setup fib6_result.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a hook in rt6_device_match to handle nexthop struct in a fib6_info.
The new rt6_nh_dev_match uses nexthop_for_each_fib6_nh to walk each
fib6_nh in a nexthop and call __rt6_device_match. On match,
rt6_nh_dev_match returns the fib6_nh and rt6_device_match uses it to
setup fib6_result.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use nexthop_for_each_fib6_nh to walk all fib6_nh in a nexthop when
dropping 'from' reference in pcpu routes.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 has traditionally had a single fib6_nh per fib6_info. With
nexthops we can have multiple fib6_nh associated with a fib6_info.
Add a nexthop helper to invoke a callback for each fib6_nh in a
'struct nexthop'. If the callback returns non-0, the loop is
stopped and the return value passed to the caller.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sparse warning:
net/ipv4/tcp_fastopen.c:75:29: warning:
symbol 'tcp_fastopen_alloc_ctx' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit says:
====================
r8169: improve handling of chip-specific configuration
This series improves and simplifies handling of chip-specific
configuration.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify the code by removing struct rtl_cfg_info. Only info we need
per PCI ID is whether it supports GBit or not.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To prepare removal of struct rtl_cfg_info, set the coalesce
config based on the chip version number.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the latest changes we don't need separate functions
rtl_hw_start_8168 and rtl_hw_start_8101 any longer. This allows us to
simplify the code. For this change we need to move rtl_hw_start() and
rtl_hw_start_8169(). rtl_hw_start_8169() is unchanged.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CPCMD_QUIRK_MASK isn't specific to certain chip versions. The vendor
driver applies this mask to all 8168 versions. Therefore remove QUIRK
from the mask name and apply it on all chip versions.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far several places in the code deal with setting the interrupt mask
for the respective chip versions. Improve this by having one function
for this only. In addition don't set RxFIFOOver for all 8101 chip
versions like in the vendor driver.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Page pods are used for direct data placement, this patch
enables eDRAM page pods if firmware supports this feature.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier says:
====================
net: mvpp2: Add extra ethtool stats
This series adds support for more ethtool counters in PPv2 :
- Per port counters, including one indicating the classifier drops
- Per RXQ and per TXQ counters
The first 2 patches perform some light rework and renaming, and the 3rd
adds the extra counters.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Besides the MIB counters, some other useful counters can be exposed to
the user. This commit adds support for :
- Per-port counters, that indicate FIFO drops and classifier drops,
- Per-rxq counters,
- Per-txq counters
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we'll be adding support for other kind of internal counters, make
clear that the currently supported counters are the MIB counters.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When first configuring a port on PPv2, we want to clear the internal
counters so that we don't get values from previous boot stages.
However, we can't really clear these counters when resetting the MAC,
since there are valid reasons to do so while the port is being used,
such as when reconfiguring the interface mode with the PHY.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/mscc/ocelot_ace.c: In function ‘vcap_cmd’:
drivers/net/ethernet/mscc/ocelot_ace.c:108:6: warning: variable ‘rc’ set
but not used [-Wunused-but-set-variable]
int rc;
^
It's never used since introduction in commit b596229448 ("net: mscc:
ocelot: Add support for tcam")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case autoflowlabel is in action, skb_get_hash_flowi6()
derives a non zero skb->hash to the flowlabel.
If skb->hash is zero, a flow dissection is performed.
Since all TCP skbs sent from ESTABLISH state inherit their
skb->hash from sk->sk_txhash, we better keep a copy
of sk->sk_txhash into the TIME_WAIT socket.
After this patch, ACK or RST packets sent on behalf of
a TIME_WAIT socket have the flowlabel that was previously
used by the flow.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean says:
====================
RGMII delays for SJA1105 DSA driver
This patchset configures the Tunable Delay Lines of the SJA1105 P/Q/R/S
switches. These add a programmable phase offset on the RGMII RX and TX
clock signals and get used by the driver for fixed-link interfaces that
use the rgmii-id, rgmii-txid or rgmii-rxid phy-modes.
Tested on a board where RGMII delays were already set up, by adding
MAC-side delays on the RGMII interface towards a BCM5464R PHY and
noticing that the MAC now reports SFD, preamble, FCS etc. errors.
Conflicts trivially in drivers/net/dsa/sja1105/sja1105_spi.c with
https://patchwork.ozlabs.org/project/netdev/list/?series=112614&state=*
which must be applied first.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As per the DT phy-mode specification, RGMII delays are applied by the
MAC when there is no PHY present on the link.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pad_mii_tx registers point to the same memory region but were
unused. So convert to using these for RGMII I/O cell configuration, as
they bear a shorter name.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This puts the quad PHY ports in power-down mode when the PHY transitions
to the PHY_HALTED state. It is likely that all the other PHYs support
the BMCR_PDOWN bit, but I only have the BCM5464R to test.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean says:
====================
Rethink PHYLINK callbacks for SJA1105 DSA
This patchset implements phylink_mac_link_up and phylink_mac_link_down,
while also removing the code that was modifying the EGRESS and INGRESS
MAC settings for STP and replacing them with the "inhibit TX"
functionality.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The first fact that needs to be stated is that the per-MAC settings in
SJA1105 called EGRESS and INGRESS do *not* disable egress and ingress on
the MAC. They only prevent non-link-local traffic from being
sent/received on this port.
So instead of having .phylink_mac_config essentially mess with the STP
state and force it to DISABLED/BLOCKING (which also brings useless
complications in sja1105_static_config_reload), simply add the
.phylink_mac_link_down and .phylink_mac_link_up callbacks which inhibit
TX at the MAC level, while leaving RX essentially enabled.
Also stop from trying to put the link down in .phylink_mac_config, which
is incorrect.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be used to stop egress traffic in .phylink_mac_link_up.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>