Currently, a nexthop group is destroyed when the last FIB entry is
detached from it.
When nexthop objects are supported, this can no longer be the case, as
the group is a separate object whose lifetime is managed by user space.
Add an indication if a nexthop group can be destroyed and always set it
to true for the existing IPv4 and IPv6 nexthop groups.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the IPv6 FIB info has a nexthop object, the nexthop offload
indication is set on the nexthop object and not on the FIB info itself.
Therefore, do not try to clear the offload indication from the FIB info
when it has a nexthop object.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Attach the FIB entry to the nexthop group after setting the offload flag
on the IPv6 FIB info (i.e., 'struct fib6_info'). The second operation is
not needed when the nexthop group is a nexthop object. This will allow
us to have a common exit path from the function, regardless of the
nexthop group's type.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The previous patch associated a nexthop group with the FIB entry before
the entry's type is determined.
Make use of the nexthop group when determining the entry's type instead
of relying on helpers that assume that the nexthop info is not a nexthop
object (i.e., 'struct nexthop').
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Each FIB entry has a type (e.g., remote, local) that determines how the
entry is programmed to the device. In order to determine if the entry is
local (directly connected) or remote (has a gateway) the relevant FIB
info structures (e.g., 'struct fib_info') are checked.
When entries that use nexthop objects are supported, these checks will
need to be changed to take into account 'struct nexthop'.
Instead, first associate the entry with a nexthop group so that the next
patch could determine the entry's type based on the associated nexthop
group's type.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The sole caller of the function will soon only have the ifindex
available, instead of the pointer itself.
Therefore, change the function to take the ifindex as input and have it
get the pointer.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ifindex of the nexthop device was never set for IPv4 nexthops,
unlike IPv6 nexthops. This went unnoticed since only IPv6 nexthops use
it.
Set the ifindex for IPv4 nexthops in order to be consistent with IPv6
and also because it will be used by a later patch.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function allocates 'nhgi', not 'nh_grp', so it needs to free the
former in its error path.
Fixes: 7f7a417e6a ("mlxsw: spectrum_router: Split nexthop group configuration to a different struct")
Addresses-Coverity: ("Memory - corruptions (USE_AFTER_FREE)")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver sends Ethernet Management Datagram (EMAD) packets to the
device for configuration purposes and waits for up to 200ms for a reply.
A request is retried up to 5 times.
When the system is under heavy load, replies are not always processed in
time and EMAD transactions fail.
Make the process more robust to such delays by using exponential
backoff. First wait for up to 200ms, then retransmit and wait for up to
400ms and so on.
Fixes: caf7297e7a ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Reported-by: Denis Yulevich <denisyu@nvidia.com>
Tested-by: Denis Yulevich <denisyu@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The commit cited below moved firmware flashing functionality from
mlxsw_spectrum to mlxsw_core, but did not adjust the Kconfig
dependencies. This makes it possible to have mlxsw_core as built-in and
mlxfw as a module. The mlxfw code is therefore not reachable from
mlxsw_core and firmware flashing fails:
# devlink dev flash pci/0000:01:00.0 file mellanox/mlxsw_spectrum-13.2008.1310.mfa2
devlink answers: Operation not supported
Fix by having mlxsw_core select mlxfw.
Fixes: b79cb787ac ("mlxsw: Move fw flashing code into core.c")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Vadim Pasternak <vadimp@nvidia.com>
Tested-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: a6a5325239 ("atl1e: Atheros L1E Gigabit Ethernet driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1605581875-36281-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 43250ddd75 ("atl1c: Atheros L1C Gigabit Ethernet driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1605581721-36028-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix the following unreachable code issue:
drivers/net/ethernet/netronome/nfp/crypto/tls.c: In function 'nfp_net_tls_add':
include/linux/compiler_attributes.h:208:41: warning: statement will never be executed [-Wswitch-unreachable]
208 | # define fallthrough __attribute__((__fallthrough__))
| ^~~~~~~~~~~~~
drivers/net/ethernet/netronome/nfp/crypto/tls.c:299:3: note: in expansion of macro 'fallthrough'
299 | fallthrough;
| ^~~~~~~~~~~
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Link: https://lore.kernel.org/r/20201117171347.GA27231@embeddedor
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The code refactoring of ILT configuration was not complete, the old
unused variables were used for the SRC block. That could lead to the memory
corruption by HW when rx filters are configured.
This patch completes that refactoring.
Fixes: 8a52bbab39 (qed: Debug feature: ilt and mdump)
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Link: https://lore.kernel.org/r/20201116132944.2055-1-dbogdanov@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since mailbox message for installing flows is in place,
remove the RXVLAN_ALLOC mbox message which is redundant.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch introduces new mailbox mesages to retrieve a given
MCAM entry or base flow steering rule of a VF installed by its
parent PF. This helps while updating the existing MCAM rules
with out re-framing the whole mailbox request again. The INSTALL
FLOW mailbox consumer can read-modify-write the existing entry.
Similarly while installing new flow rules for a VF, the base
flow steering rule match creteria is copied to the new flow rule
and the deltas are appended to the new rule.
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Co-developed-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch handles the VF mac address changes as given below.
1. mac addr configrued by VF will be retained until VF module unload.
2. mac addr configred by PF for VF will be retained until power cycle.
3. mac addr confgired by PF for its VF can't be overwritten by VF.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds support for ndo_set_vf_mac, ndo_set_vf_vlan
and ndo_get_vf_config handlers. The traffic redirection
based on the VF mac address or vlan id is done by installing
MCAM rules. Reserved RX_VTAG_TYPE7 in each NIXLF for VF VLAN
which strips the VLAN tag from ingress VLAN traffic. The NIX PF
allocates two MCAM entries for VF VLAN feature, one used for
ingress VTAG strip and another entry for egress VTAG insertion.
This patch also updates the MAC address in PF installed VF VLAN
rule upon receiving nix_lf_start_rx mbox request for VF since
Administrative Function driver will assign a valid MAC addr
in nix_lf_start_rx function.
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Co-developed-by: Tomasz Duszynski <tduszynski@marvell.com>
Signed-off-by: Tomasz Duszynski <tduszynski@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch implements egress VLAN offload by appending NIX_SEND_EXT_S
header to NIX_SEND_HDR_S. The VLAN TCI information is specified
in the NIX_SEND_EXT_S. The VLAN offload in the ingress path is
implemented by configuring the NIX_RX_VTAG_ACTION_S to strip and
capture the outer vlan fields. The NIX PF allocates one MCAM entry
for Rx VLAN offload.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch modifies the existing nix_vtag_config mailbox message
to allocate and free TX VTAG entries as requested by a NIX PF.
The TX VTAG entries are global resource that shared by all PFs
and each entry specifies the size of VTAG to insert and the VTAG
header data to insert. The mailbox response contains the entry
index which is used by mailbox requester in configuring the
NPC_TX_VTAG_ACTION for any MCAM entry.
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add debugfs support to dump the MCAM rules installed using
NPC_INSTALL_FLOW mbox message. Debugfs file can display mcam
entry, counter if any, flow type and counter hits.
Ethtool will dump the ntuple flows related to the PF only.
The debugfs file gives systemwide view of the MCAM rules
installed by all the PF's.
Below is the example output when the debugfs file is read:
~ # mount -t debugfs none /sys/kernel/debug
~ # cat /sys/kernel/debug/octeontx2/npc/mcam_rules
Installed by: PF1
direction: RX
mcam entry: 227
udp source port 23 mask 0xffff
Forward to: PF1 VF0
action: Direct to queue 0
enabled: yes
counter: 1
hits: 0
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add unicast MAC address filtering support using install flow
message. Total of 8 MCAM entries are allocated for adding
unicast mac filtering rules. If the MCAM allocation fails,
the unicast filtering support will not be advertised.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds support for adding and deleting ethtool ntuple
filters. The filters for ether, ipv4, ipv6, tcp, udp and sctp
are supported. The mask is also supported. The supported actions
are drop and direct to a queue. Additionally we support FLOW_EXT
field vlan_tci and FLOW_MAC_EXT.
The NIX PF will allocate total 32 MCAM entries for the use of
ethtool ntuple filters. The Administrative Function(AF) will
install/delete the MCAM rules when NIX PF sends mailbox message
to install/delete the ntuple filters.
Ethtool ntuple filters support is restricted to PFs as of now
and PF can install ntuple filters to direct the traffic to its
VFs. Hence added a separate callback for VFs to get/set RSS
configuration.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Added new mailbox messages to install and delete MCAM rules.
These mailbox messages will be used for adding/deleting ethtool
n-tuple filters by NIX PF. The installed MCAM rules are stored
in a list that will be traversed later to delete the MCAM entries
when the interface is brought down or when PCIe FLR is received.
The delete mailbox supports deleting a single MCAM entry or range
of entries or all the MCAM entries owned by the pcifunc. Each MCAM
entry can be associated with a HW match stat entry if the mailbox
requester wants to check the hit count for debugging.
Modified adding default unicast DMAC match rule using install
flow API. The default unicast DMAC match entry installed by
Administrative Function is saved and can be changed later by the
mailbox user to fit additional fields, or the default MCAM entry
rule action can be used for other flow rules installed later.
Modified rvu_mbox_handler_nix_lf_free mailbox to add a flag to
disable or delete the MCAM entries. The MCAM entries are disabled
when the interface is brought down and deleted in FLR handler.
The disabled MCAM entries will be re-enabled when the interface
is brought up again.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Key Extraction(KEX) profile decides how the packet metadata such as
layer information and selected packet data bytes at each layer are
placed in MCAM search key. This patch reads the configured KEX profile
parameters to find out the bit position and bit mask for each field.
The information is used when programming the MCAM match data by SW
to match a packet flow and take appropriate action on the flow. This
patch also verifies the mandatory fields such as channel and DMAC
are not overwritten by the KEX configuration of other fields.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds support to verify the channel number sent by
mailbox requester before writing MCAM entry for Ingress packets.
Similarly for Egress packets, verifying the PF_FUNC sent by the
mailbox user.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current default Key Extraction(KEX) profile can only use RX
packet fields while generating the MCAM search key. The profile
can't be used for matching TX packet fields. This patch modifies
the default KEX profile to add support for extracting TX packet
fields into MCAM search key. Enabled Tx KPU packet parsing by
configuring TX PKIND in tx_parse_cfg.
Modified the KEX profile to extract 2 bytes of VLAN TCI from an
offset of 2 bytes from LB_PTR. The LB_PTR points to the byte offset
where the VLAN header starts. The NPC KPU parser profile has been
modified to point LB_PTR to the starting byte offset of VLAN header
which points to the tpid field.
Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use the new batched xsk interfaces for the Tx path in the i40e driver
to improve performance. On my machine, this yields a throughput
increase of 4% for the l2fwd sample app in xdpsock. If we instead just
look at the Tx part, this patch set increases throughput with above
20% for Tx.
Note that I had to explicitly loop unroll the inner loop to get to
this performance level, by using a pragma. It is honored by both clang
and gcc and should be ignored by versions that do not support
it. Using the -funroll-loops compiler command line switch on the
source file resulted in a loop unrolling on a higher level that
lead to a performance decrease instead of an increase.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-6-git-send-email-magnus.karlsson@gmail.com
Remove the unnecessary access to the software ring for the AF_XDP
zero-copy driver. This was used to record the length of the packet so
that the driver Tx completion code could sum this up to produce the
total bytes sent. This is now performed during the transmission of the
packet, so no need to record this in the software ring.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-3-git-send-email-magnus.karlsson@gmail.com
Due to a hardware issue, an access to MDIO registers
that is concurrent with other ENETC register accesses
may lead to the MDIO access being dropped or corrupted.
The workaround introduces locking for all register accesses
to the ENETC register space. To reduce performance impact,
a readers-writers locking scheme has been implemented.
The writer in this case is the MDIO access code (irrelevant
whether that MDIO access is a register read or write), and
the reader is any access code to non-MDIO ENETC registers.
Also, the datapath functions acquire the read lock fewer times
and use _hot accessors. All the rest of the code uses the _wa
accessors which lock every register access.
The commit introducing MDIO support is -
commit ebfcb23d62 ("enetc: Add ENETC PF level external MDIO support")
but due to subsequent refactoring this patch is applicable on
top of a later commit.
Fixes: 6517798dd3 ("enetc: Make MDIO accessors more generic and export to include/linux/fsl")
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201112182608.26177-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: aedd133d17 ("net/mlx5e: Support CT offload for tc nic flows")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Avoid calling mlx5_esw_modify_vport_rate() if qos is not enabled and
avoid unnecessary syndrome messages from firmware.
Fixes: fcb64c0f56 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently when QoS is enabled for VF and any min_rate is configured,
the driver sets bw_share value to at least 1 and doesn’t allow to set
it to 0 to make minimal rate unlimited. It means there is always a
minimal rate configured for every VF, even if user tries to remove it.
In order to make QoS disable possible, check whether all vports have
configured min_rate = 0. If this is true, set their bw_share to 0 to
disable min_rate limitations.
Fixes: c9497c9890 ("net/mlx5: Add support for setting VF min rate")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, if user disables VFs with some min and max rates configured,
they are cleared. But QoS data is not cleared and restored upon next VF
enable placing limits on minimal rate for given VF, when user expects
none.
To match cleared vport->info struct with QoS-related min and max rates
upon VF disable, clear vport->qos struct too.
Fixes: 556b9d16d3 ("net/mlx5: Clear VF's configuration on disabling SRIOV")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Bond events handler uses bond_slave_get_rtnl to check if net device
is bond slave. bond_slave_get_rtnl return the rcu rx_handler pointer
from the netdev which exists for bond slaves but also exists for
devices that are attached to linux bridge so using it as indication
for bond slave is wrong.
Fix by using netif_is_lag_port instead.
Fixes: 7e51891a23 ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Both TC and IPsec crypto offload use metadata_regB to store
private information. Since TC does not use bit 31 of regB, IPsec
will use bit 31 as the IPsec packet marker. The IPsec's regB usage
is changed to:
Bit31: IPsec marker
Bit30-24: IPsec syndrome
Bit23-0: IPsec obj id
Fixes: b2ac7541e3 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The IP's checksum partial still requires L4 csum flag on Ethernet WQE.
Make the IPsec WAs only for the IP's non checksum partial case
(for example icmd packet)
Fixes: 5be019040c ("net/mlx5e: IPsec: Add Connect-X IPsec Tx data path offload")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
On resync, the driver calls inet_lookup_established
(__inet6_lookup_established) that increases sk_refcnt of the socket. To
decrease it, the driver set skb->destructor to sock_edemux. However, it
didn't work well, because the TCP stack also sets this destructor for
early demux, and the refcount gets decreased only once, while increased
two times (in mlx5e and in the TCP stack). It leads to a socket leak, a
TLS context leak, which in the end leads to calling tls_dev_del twice:
on socket close and on driver unload, which in turn leads to a crash.
This commit fixes the refcount leak by calling sock_gen_put right away
after using the socket, thus fixing all the subsequent issues.
Fixes: 0419d8c9d8 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Besides GL(Gap Limiting), QL(Quantity Limiting) can be modified
dynamically when DIM is supported. So rename gl_adapt_enable as
adapt_enable in struct hns3_enet_coalesce.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For device whose version is above V3(include V3), the GL
configuration can set as 1us unit, so adds support for
configuring this field.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For maintainability and compatibility, add support for querying
the maximum value of GL.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
QL(quantity limiting) means that hardware supports the interrupt
coalesce based on the frame quantity. QL can be configured when
int_ql_max in device's specification is non-zero, so add support
to configure it. Also, rename two coalesce init function to fit
their purpose.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When removing the driver we would hit BUG_ON(!list_empty(&dev->ptype_specific))
in net/core/dev.c due to still having the NC-SI packet handler
registered.
# echo 1e660000.ethernet > /sys/bus/platform/drivers/ftgmac100/unbind
------------[ cut here ]------------
kernel BUG at net/core/dev.c:10254!
Internal error: Oops - BUG: 0 [#1] SMP ARM
CPU: 0 PID: 115 Comm: sh Not tainted 5.10.0-rc3-next-20201111-00007-g02e0365710c4 #46
Hardware name: Generic DT based system
PC is at netdev_run_todo+0x314/0x394
LR is at cpumask_next+0x20/0x24
pc : [<806f5830>] lr : [<80863cb0>] psr: 80000153
sp : 855bbd58 ip : 00000001 fp : 855bbdac
r10: 80c03d00 r9 : 80c06228 r8 : 81158c54
r7 : 00000000 r6 : 80c05dec r5 : 80c05d18 r4 : 813b9280
r3 : 813b9054 r2 : 8122c470 r1 : 00000002 r0 : 00000002
Flags: Nzcv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none
Control: 00c5387d Table: 85514008 DAC: 00000051
Process sh (pid: 115, stack limit = 0x7cb5703d)
...
Backtrace:
[<806f551c>] (netdev_run_todo) from [<80707eec>] (rtnl_unlock+0x18/0x1c)
r10:00000051 r9:854ed710 r8:81158c54 r7:80c76bb0 r6:81158c10 r5:8115b410
r4:813b9000
[<80707ed4>] (rtnl_unlock) from [<806f5db8>] (unregister_netdev+0x2c/0x30)
[<806f5d8c>] (unregister_netdev) from [<805a8180>] (ftgmac100_remove+0x20/0xa8)
r5:8115b410 r4:813b9000
[<805a8160>] (ftgmac100_remove) from [<805355e4>] (platform_drv_remove+0x34/0x4c)
Fixes: bd466c3fb5 ("net/faraday: Support NCSI mode")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/20201117024448.1170761-1-joel@jms.id.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 39a6f4bce6 ("b44: replace the ssb_dma API with the generic DMA API")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1605582131-36735-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The code in this driver which parses the devicetree to determine
the phy/fixed link setup, can be replaced by a single library
function: of_phy_get_and_connect().
Behaviour is identical, except that the library function will
complain when 'phy-connection-type' is omitted, instead of
blindly using PHY_INTERFACE_MODE_NA, which would result in an
invalid phy configuration.
The library function no longer brings out the exact phy_mode,
but the driver doesn't need this, because phy_interface_is_rgmii()
queries the phydev directly. Remove 'phy_mode' from the private
adapter struct.
While we're here, log info about the attached phy on connect,
this is useful because the phy type and connection method is now
fully configurable via the devicetree.
Tested on a lan7430 chip with built-in phy. Verified that adding
fixed-link/phy-connection-type in the devicetree results in a
fixed-link setup. Used ethtool to verify that the devicetree
settings are used.
Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # lan7430
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201116170155.26967-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 469981b17a ("qed: Add unaligned and packed packet processing")
Fixes: fcb39f6c10 ("qed: Add mpa buffer descriptors for storing and processing mpa fpdus")
Fixes: 1e28eaad07 ("qed: Add iWARP support for fpdu spanned over more than two tcp packets")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Link: https://lore.kernel.org/r/1605532033-27373-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The only time when nr_frags isn't SKB_MAX_FRAGS is when entering
rtl8169_start_xmit(). However we can use SKB_MAX_FRAGS also here
because when queue isn't stopped there should always be room for
MAX_SKB_FRAGS + 1 descriptors.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/3d1f2ad7-31d5-2cac-4f4a-394f8a3cab63@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
VFs do not have access permissions to issue NVM_GET_DEV_INFO
firmware command.
Fixes: 4933f6753b ("bnxt_en: Add bnxt_hwrm_nvm_get_dev_info() to query NVM info.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bnxt_add_one_ctr() adds a hardware counter to a software counter and
adjusts for the hardware counter wraparound against the mask. The logic
assumes that the hardware counter is always smaller than or equal to
the mask.
This assumption is mostly correct. But in some cases if the firmware
is older and does not provide the accurate mask, the driver can use
a mask that is smaller than the actual hardware mask. This can cause
some extra carry bits to be added to the software counter, resulting in
counters that far exceed the actual value. Fix it by masking the
hardware counter with the mask passed into bnxt_add_one_ctr().
Fixes: fea6b33355 ("bnxt_en: Accumulate all counters.")
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Firmware is unable to retain the port counters during any kind of
fatal or non-fatal resets, so we must clear the port counters to
avoid false detection of port counter overflow.
Fixes: fea6b33355 ("bnxt_en: Accumulate all counters.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The module eeprom address range returned by bnxt_get_module_eeprom()
should be 256 bytes of A0h address space, the lower half of the A2h
address space, and page 0 for the upper half of the A2h address space.
Fix the firmware call by passing page_number 0 for the A2h slave address
space.
Fixes: 42ee18fe4c ("bnxt_en: Add Support for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPRO")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During rmnet unregistration, the real device rx_handler is first cleared
followed by the removal of rx_handler_data after the rcu synchronization.
Any packets in the receive path may observe that the rx_handler is NULL.
However, there is no check when dereferencing this value to use the
rmnet_port information.
This fixes following splat by adding the NULL check.
Unable to handle kernel NULL pointer dereference at virtual
address 000000000000000d
pc : rmnet_rx_handler+0x124/0x284
lr : rmnet_rx_handler+0x124/0x284
rmnet_rx_handler+0x124/0x284
__netif_receive_skb_core+0x758/0xd74
__netif_receive_skb+0x50/0x17c
process_backlog+0x15c/0x1b8
napi_poll+0x88/0x284
net_rx_action+0xbc/0x23c
__do_softirq+0x20c/0x48c
Fixes: ceed73a2cf ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Link: https://lore.kernel.org/r/1605298325-3705-1-git-send-email-subashab@codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recycle the page running page_pool_put_full_page() in
mvneta_swbm_add_rx_fragment routine when the last descriptor
contains just the FCS or if the received packet contains more than
MAX_SKB_FRAGS fragments
Fixes: ca0e014609 ("net: mvneta: move skb build after descriptors processing")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/df6a2bad70323ee58d3901491ada31c1ca2a40b9.1605291228.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix an issue where dump stack is printed on suspend resume flow due to
netif_set_real_num_rx_queues() is not called with rtnl_lock held().
Fixes: 686cff3d70 ("net: stmmac: Fix incorrect location to set real_num_rx|tx_queues")
Reported-by: Christophe ROULLIER <christophe.roullier@st.com>
Tested-by: Christophe ROULLIER <christophe.roullier@st.com>
Cc: Alexandre TORGUE <alexandre.torgue@st.com>
Reviewed-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Link: https://lore.kernel.org/r/20201115074210.23605-1-vee.khee.wong@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 83a8471ba2 ("net: ethernet: ti: cpsw: refactor probe to group common hw initialization")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1605250173-18438-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 9efc9b2b04 ("net: stmmac: Add dwmac-intel-plat for GBE driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1605249243-17262-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 3ced0a88cd ("qlcnic: Add support to run firmware POST")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1605248186-16013-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pm_runtime_get_sync() will increment pm usage at first and it will
resume the device later. If runtime of the device has error or
device is in inaccessible state(or other error state), resume
operation will fail. If we do not call put operation to decrease
the reference, it will result in reference count leak. Moreover,
this device cannot enter the idle state and always stay busy or other
non-idle state later. So we fixed it by replacing it with
pm_runtime_resume_and_get.
Fixes: 8fff755e9f ("net: fec: Ensure clocks are enabled while using mdio bus")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Improve the following in rtl8169_start_xmit:
- tp->cur_tx can be accessed in parallel by rtl_tx(), therefore
annotate the race by using WRITE_ONCE
- avoid checking stop_queue a second time by moving the doorbell check
- netif_stop_queue() uses atomic operation set_bit() that includes a
full memory barrier on some platforms, therefore use
smp_mb__after_atomic to avoid overhead
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/80085451-3eaf-507a-c7c0-08d607c46fbc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 21151f64a4 ("mlxsw: Add new FIB entry type for reject
routes") this comment is no longer correct. Remove it.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The two functions are identical, so consolidate them to
mlxsw_sp_nexthop_type_fini().
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The two functions are now identical, so consolidate them to
mlxsw_sp_nexthop_type_init().
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove it as it is unused.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of passing the nexthop and resolving the nexthop netdev from it,
pass the nexthop netdev directly.
This will later allow us to consolidate code paths between IPv4 and IPv6
code.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of passing the route and resolving the nexthop netdev from it,
pass the nexthop netdev directly.
This will later allow us to consolidate code paths between IPv4 and IPv6
code.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The overlay protocol (i.e., IPv4/IPv6) that is being encapsulated has
no impact on whether a certain IP tunnel can be offloaded or not. Only
the underlay protocol matters.
Therefore, remove the unused overlay protocol parameter from the
callback.
This will later allow us to consolidate code paths between IPv4 and IPv6
code.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the individual nexthops member in the group and attributes of
the group (e.g., its type) are stored in the same struct (i.e., 'struct
mlxsw_sp_nexthop_group'). This is fine since the individual nexthops
cannot change during the lifetime of the group.
With nexthop objects this is no longer the case. An existing nexthop
group can be replaced to use a new set of nexthops. Creating a new
struct whenever a group is replaced entails replacing the group pointer
of all the routes (i.e., 'struct mlxsw_sp_fib_entry') using the group.
Avoid this inefficient step by splitting the nexthop group configuration
to a different struct (i.e., 'struct mlxsw_sp_nexthop_group_info').
When a nexthop group is replaced a new group info struct is created and
the individual rotues do not need to be touched.
Illustration after the change:
mlxsw_sp_fib_entry mlxsw_sp_nexthop_group mlxsw_sp_nexthop_group_info
+-------------------+ +----------------------+ +---------------------------+
| nh_group; +--> nhgi; +--> |
| | | | | |
+-------------------+ +----------------------+ +---------------------------+
No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of storing the FIB info as 'priv' when the nexthop group
represents an IPv4 nexthop group, simply store it as a FIB info with a
proper comment.
When nexthop objects are supported, this field will become a union with
the nexthop object's identifier.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Not used anywhere.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When needed, IPv4 routes fetch the FIB info (i.e., 'struct fib_info')
from their associated nexthop group. This will not work when the nexthop
group represents a nexthop object (i.e., 'struct nexthop'), as it will
only have access to the nexthop's identifier.
Instead, store the FIB info in the route itself.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As explained in the previous patch, nexthop objects can have both IPv4
and IPv6 nexthops in the same group. Therefore, move the neighbour table
to be a property of the nexthop instead of the nexthop group.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Both IPv4 and IPv6 nexthop groups are hashed in the same table. The
protocol field is used to indicate how the hash should be computed for
each group.
When nexthop group objects are supported, the hash will be computed for
them based on the nexthop identifier.
To differentiate between all the nexthop group types, encode the type of
the group in the key instead of the protocol.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the type (i.e., IPv4/IPv6) of the nexthop group is derived
from the neighbour table associated with the group.
This is problematic when nexthop objects are taken into account, as a
nexthop group object can contain both IPv4 and IPv6 nexthops.
Instead, add a new field that indicates the type of the group and
initialize it during the group's creation. Currently, the types are IPv4
('struct fib_info') and IPv6 ('struct fib6_info'). In the future another
type will be added for nexthop objects ('struct nexthop').
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When comparing a key with a nexthop group in rhastable's obj_cmpfn()
callback, make sure that the key and nexthop group are of the same type
(i.e., IPv4 / IPv6).
The bug is not currently visible because IPv6 nexthop groups do not
populate the FIB info pointer and IPv4 nexthop groups do not set the
ifindex for the individual nexthops.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the phy enables power saving technology, the dwmac's software reset
needs more time to complete, enlarge dma reset timeout to 200000us.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Link: https://lore.kernel.org/r/20201113090902.5c7aab1a@xhacker.debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On arm imx6, when opening the chip's netdev, the whole Linux
kernel intermittently hangs/freezes.
This is caused by a bug in the driver code which tests if pcie
interrupts are working correctly, using the software interrupt:
1. open: enable the software interrupt
2. open: tell the chip to assert the software interrupt
3. open: wait for flag
4. ISR: acknowledge s/w interrupt, set flag
5. open: notice flag, disable the s/w interrupt, continue
Unfortunately the ISR only acknowledges the s/w interrupt, but
does not disable it. This will re-trigger the ISR in a tight
loop.
On some (lucky) platforms, open proceeds to disable the s/w
interrupt even while the ISR is 'spinning'. On arm imx6,
the spinning ISR does not allow open to proceed, resulting
in a hung Linux kernel.
Fix minimally by disabling the s/w interrupt in the ISR, which
will prevent it from spinning. This won't break anything because
the s/w interrupt is used as a one-shot interrupt.
Note that this is a minimal fix, overlooking many possible
cleanups, e.g.:
- lan743x_intr_software_isr() is completely redundant and reads
INT_STS twice for no apparent reason
- disabling the s/w interrupt in lan743x_intr_test_isr() is now
redundant, but harmless
- waiting on software_isr_flag can be converted from a sleeping
poll loop to wait_event_timeout()
Fixes: 23f0703c12 ("lan743x: Add main source files for new lan743x driver")
Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # arm imx6 lan7430
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201112204741.12375-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When running this chip on arm imx6, we intermittently observe
the following kernel warning in the log, especially when the
system is under high load:
[ 50.119484] ------------[ cut here ]------------
[ 50.124377] WARNING: CPU: 0 PID: 303 at kernel/softirq.c:169 __local_bh_enable_ip+0x100/0x184
[ 50.132925] IRQs not enabled as expected
[ 50.159250] CPU: 0 PID: 303 Comm: rngd Not tainted 5.7.8 #1
[ 50.164837] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 50.171395] [<c0111a38>] (unwind_backtrace) from [<c010be28>] (show_stack+0x10/0x14)
[ 50.179162] [<c010be28>] (show_stack) from [<c05b9dec>] (dump_stack+0xac/0xd8)
[ 50.186408] [<c05b9dec>] (dump_stack) from [<c0122e40>] (__warn+0xd0/0x10c)
[ 50.193391] [<c0122e40>] (__warn) from [<c0123238>] (warn_slowpath_fmt+0x98/0xc4)
[ 50.200892] [<c0123238>] (warn_slowpath_fmt) from [<c012b010>] (__local_bh_enable_ip+0x100/0x184)
[ 50.209860] [<c012b010>] (__local_bh_enable_ip) from [<bf09ecbc>] (destroy_conntrack+0x48/0xd8 [nf_conntrack])
[ 50.220038] [<bf09ecbc>] (destroy_conntrack [nf_conntrack]) from [<c0ac9b58>] (nf_conntrack_destroy+0x94/0x168)
[ 50.230160] [<c0ac9b58>] (nf_conntrack_destroy) from [<c0a4aaa0>] (skb_release_head_state+0xa0/0xd0)
[ 50.239314] [<c0a4aaa0>] (skb_release_head_state) from [<c0a4aadc>] (skb_release_all+0xc/0x24)
[ 50.247946] [<c0a4aadc>] (skb_release_all) from [<c0a4b4cc>] (consume_skb+0x74/0x17c)
[ 50.255796] [<c0a4b4cc>] (consume_skb) from [<c081a2dc>] (lan743x_tx_release_desc+0x120/0x124)
[ 50.264428] [<c081a2dc>] (lan743x_tx_release_desc) from [<c081a98c>] (lan743x_tx_napi_poll+0x5c/0x18c)
[ 50.273755] [<c081a98c>] (lan743x_tx_napi_poll) from [<c0a6b050>] (net_rx_action+0x118/0x4a4)
[ 50.282306] [<c0a6b050>] (net_rx_action) from [<c0101364>] (__do_softirq+0x13c/0x53c)
[ 50.290157] [<c0101364>] (__do_softirq) from [<c012b29c>] (irq_exit+0x150/0x17c)
[ 50.297575] [<c012b29c>] (irq_exit) from [<c0196a08>] (__handle_domain_irq+0x60/0xb0)
[ 50.305423] [<c0196a08>] (__handle_domain_irq) from [<c05d44fc>] (gic_handle_irq+0x4c/0x90)
[ 50.313790] [<c05d44fc>] (gic_handle_irq) from [<c0100ed4>] (__irq_usr+0x54/0x80)
[ 50.321287] Exception stack(0xecd99fb0 to 0xecd99ff8)
[ 50.326355] 9fa0: 1cf1aa74 00000001 00000001 00000000
[ 50.334547] 9fc0: 00000001 00000000 00000000 00000000 00000000 00000000 00004097 b6d17d14
[ 50.342738] 9fe0: 00000001 b6d17c60 00000000 b6e71f94 800b0010 ffffffff
[ 50.349364] irq event stamp: 2525027
[ 50.352955] hardirqs last enabled at (2525026): [<c0a6afec>] net_rx_action+0xb4/0x4a4
[ 50.360892] hardirqs last disabled at (2525027): [<c0d6d2fc>] _raw_spin_lock_irqsave+0x1c/0x50
[ 50.369517] softirqs last enabled at (2524660): [<c01015b4>] __do_softirq+0x38c/0x53c
[ 50.377446] softirqs last disabled at (2524693): [<c012b29c>] irq_exit+0x150/0x17c
[ 50.385027] ---[ end trace c0b571db4bc8087d ]---
The driver is calling dev_kfree_skb() from code inside a spinlock,
where h/w interrupts are disabled. This is forbidden, as documented
in include/linux/netdevice.h. The correct function to use
dev_kfree_skb_irq(), or dev_kfree_skb_any().
Fix by using the correct dev_kfree_skb_xxx() functions:
in lan743x_tx_release_desc():
called by lan743x_tx_release_completed_descriptors()
called by in lan743x_tx_napi_poll()
which holds a spinlock
called by lan743x_tx_release_all_descriptors()
called by lan743x_tx_close()
which can-sleep
conclusion: use dev_kfree_skb_any()
in lan743x_tx_xmit_frame():
which holds a spinlock
conclusion: use dev_kfree_skb_irq()
in lan743x_tx_close():
which can-sleep
conclusion: use dev_kfree_skb()
in lan743x_rx_release_ring_element():
called by lan743x_rx_close()
which can-sleep
called by lan743x_rx_open()
which can-sleep
conclusion: use dev_kfree_skb()
Fixes: 23f0703c12 ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201112185949.11315-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With a few more uses of true and false in function calls, we
need to give them some useful names so we can tell from the
calling point what we're doing.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of having two different ways of expressing the same
sleepability concept, using opposite logic, we can rework the
from_ndo to can_sleep for a more consistent usage.
Fixes: 1800eee166 ("net: ionic: Replace in_interrupt() usage.")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The _ionic_lif_rx_mode() is only used once and really doesn't
need to be broken out.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We should be using the multicast sync routines for the multicast
filters. Also, let's just flatten the logic a bit and pull
the small unicast routine back into ionic_set_rx_mode().
Fixes: 1800eee166 ("net: ionic: Replace in_interrupt() usage.")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We don't need to refill the rx descriptors on every napi
if only a few were handled. Waiting until we can batch up
a few together will save us a few Rx cycles.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After the queues are stopped, expressly quiesce the lif.
This assures that even if the queues were in an odd state,
the firmware will close up everything cleanly.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Request a link check as soon as the netdev is registered rather
than waiting for the watchdog to go off in order to get the
interface operational a little more quickly.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change the order of operations in the link_up handling to be
sure that the queues are up and ready before we announce that
the link is up.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-11-14
1) Add BTF generation for kernel modules and extend BTF infra in kernel
e.g. support for split BTF loading and validation, from Andrii Nakryiko.
2) Support for pointers beyond pkt_end to recognize LLVM generated patterns
on inlined branch conditions, from Alexei Starovoitov.
3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh.
4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage
infra, from Martin KaFai Lau.
5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the
XDP_REDIRECT path, from Lorenzo Bianconi.
6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI
context, from Song Liu.
7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build
for different target archs on same source tree, from Jean-Philippe Brucker.
8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet.
9) Move functionality from test_tcpbpf_user into the test_progs framework so it
can run in BPF CI, from Alexander Duyck.
10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner.
Note that for the fix from Song we have seen a sparse report on context
imbalance which requires changes in sparse itself for proper annotation
detection where this is currently being discussed on linux-sparse among
developers [0]. Once we have more clarification/guidance after their fix,
Song will follow-up.
[0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/https://lore.kernel.org/linux-sparse/20201109221345.uklbp3lzgq6g42zb@ltop.local/T/
* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits)
net: mlx5: Add xdp tx return bulking support
net: mvpp2: Add xdp tx return bulking support
net: mvneta: Add xdp tx return bulking support
net: page_pool: Add bulk support for ptr_ring
net: xdp: Introduce bulking for xdp tx return path
bpf: Expose bpf_d_path helper to sleepable LSM hooks
bpf: Augment the set of sleepable LSM hooks
bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP
bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
bpf: Rename some functions in bpf_sk_storage
bpf: Folding omem_charge() into sk_storage_charge()
selftests/bpf: Add asm tests for pkt vs pkt_end comparison.
selftests/bpf: Add skb_pkt_end test
bpf: Support for pointers beyond pkt_end.
tools/bpf: Always run the *-clean recipes
tools/bpf: Add bootstrap/ to .gitignore
bpf: Fix NULL dereference in bpf_task_storage
tools/bpftool: Fix build slowdown
tools/runqslower: Build bpftool using HOSTCC
tools/runqslower: Enable out-of-tree build
...
====================
Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use the devm_reset_control_get_optional() and devm_clk_get_optional()
rather than open coding them.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Link: https://lore.kernel.org/r/20201112092606.5173aa6f@xhacker.debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We can simplify the for() condition and eliminate variable tx_left.
The change also considers that tp->cur_tx may be incremented by a
racing rtl8169_start_xmit().
In addition replace the write to tp->dirty_tx and the following
smp_mb() with an equivalent call to smp_store_mb(). This implicitly
adds a WRITE_ONCE() to the write.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/c2e19e5e-3d3f-d663-af32-13c3374f5def@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tp->dirty_tx and tp->cur_tx may be changed by a racing rtl_tx() or
rtl8169_start_xmit(). Use READ_ONCE() to annotate the races and ensure
that the compiler doesn't use cached values.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/5676fee3-f6b4-84f2-eba5-c64949a371ad@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We can treat SKB_GSO_GRE almost exactly the same as UDP tunnels, except
that we don't want to edit the outer UDP len (as there isn't one).
For SKB_GSO_GRE_CSUM, we have to use GSO_PARTIAL as the device doesn't
support offload of non-UDP outer L4 checksums.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin Habets <mhabets@solarflare.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
By asking the HW for the correct edits, we can make UDP tunnel TSO
work without needing GSO_PARTIAL. So don't specify it in our
netdev->gso_partial_features.
However, retain GSO_PARTIAL support, as this will be used for other
protocols later.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin Habets <mhabets@solarflare.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Our TSO descriptors got even more fussy.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin Habets <mhabets@solarflare.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
We always have to update the value of ret, otherwise the
error value may be the previous one.
Fixes: f6bd59526c ("net: ethernet: ti: introduce am654 common platform time sync driver")
Signed-off-by: Wang Qing <wangqing@vivo.com>
[grygorii.strashko@ti.com: fix build warn, subj add fixes tag]
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20201112164541.3223-1-grygorii.strashko@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 4c2703dfd7 ("net: marvell: prestera: Add PCI interface support")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Vadym Kochan <vadym.kochan@plvision.eu>
Acked-by: Vadym Kochan <vadym.kochan@plvision.eu>
Link: https://lore.kernel.org/r/20201113113236.71678-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Depending on the SoC/platform the CPSW can completely lose context after a
suspend/resume cycle, including CPSW wrapper (WR) which will cause reset of
WR_C0_MISC_EN register, so CPTS IRQ will became disabled.
Fix it by moving CPTS IRQ enabling in cpsw_ndo_open() where CPTS is
actually started.
Fixes: 84ea9c0a95 ("net: ethernet: ti: cpsw: enable cpts irq")
Reported-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20201112111546.20343-1-grygorii.strashko@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 8c7bd5a454 ("net: ethernet: mtk-star-emac: new driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Link: https://lore.kernel.org/r/1605180879-2573-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ndo_start_xmit() method must return NETDEV_TX_OK if the DMA mapping
fails, after freeing the socket buffer.
Fix the mtk_star_netdev_start_xmit() function accordingly.
Fixes: 8c7bd5a454 ("net: ethernet: mtk-star-emac: new driver")
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Acked-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Link: https://lore.kernel.org/r/20201112084833.21842-1-vincent.stehle@laposte.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Follow-up patchset introducing XMDR implementation is going to need
to distinguish write and update ops. Therefore introduce "update op"
and call "write op" only when new FIB entry is inserted.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case bulking is used, the entry that was previously added may not
be yet committed to the HW as it waits in the queue for bulk send. For
such entries, skip the deletion.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prepare for the low-level ops that need to store some data alongside
the fib_entry and introduce a per-fib_entry priv for ll ops.
The priv is reference counted as in the follow-up patch it is going
to be saved in pack() function and used later on in commit() even in
case the related fib_entry gets freed in the middle.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Get the max size needed for FIB entry op context and allocate it once
for the instance. Use it repeatedly from the scheduled work.
By this, allow to extend the context to hold more data than it is wise
to do when it was on the stack. Make sure to signalize that the context
needs to be initialized in case families of subsequent FIB entries differ.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For XMDR register it is possible to carry multiple FIB entry
operations in a single write. However the FW does not restrict mixing
the types of operations, make the code easier and indicate the bulking
is ok only in case the bulk contains FIB operations of the same family
and event.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With follow-up introduction of XM implementation, XMDR register is
going to be optionally used instead of RALUE register. Push the RALUE
packing helpers and write call into low-level router ops.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Unify the RALUE register payload packing and use the
__mlxsw_sp_fib_entry_ralue_pack() helper from
__mlxsw_sp_router_set_abort_trap().
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for the change that is going to be done in the next
patch, allow to pass NULL pointer to mlxsw_reg_ralue_pack4() and
mlxsw_reg_ralue_pack6() helpers.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of passing destination IP as a u32 value, pass it as pointer to
u32. Avoid using local variable for the pointer store.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As the RALUE packing is going to be put into op, make the user from
IPIP code use the same helper as the router code does.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As the RALUE packing is going to be pushed into an op, in preparation
for that push the code into a separate function in the meantime.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, RALUE payload is defined locally in the function that is
calling the register write. With introduction of alternative register to
RALUE, XMDR, it has to be possible to put multiple FIB entry
operations into single register write.
So in order to prepare for that, have per-work entry operation context
and propagate it all the way down to the functions writing RALUE.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, every FIB event is queued-up as a separate work to be
processed. However, that allows to process only one FIB entry per work
callback.
In preparation of future XMDR register bulking of multiple FIB entries,
convert to FIB event queue. Implement this by a list_head, adding new
events to the end of the list in the FIB notify callback. That allows to
process multiple events from the list inside the work callback.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the write/delete of FIB entry is going to be implemented by XMDR
register for XM implementation, introduce RALUE-independent enum for op
so the enum could be used in both RALUE and XMDR.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Don't pass RALXX register enum and rather pass enum mlxsw_sp_l3proto
to __mlxsw_sp_router_set_abort_trap(). This is in preparation to fib
entry pack implementation by XMDR register.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Improve the build testing of these SMSC drivers by enabling them when
COMPILE_TEST is selected.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drivers/net/ethernet/smsc/smc911x.c: In function ‘smc911x_hardware_send_pkt’:
drivers/net/ethernet/smsc/smc911x.c:471:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
471 | cmdA = (((u32)skb->data & 0x3) << 16) |
When built on 64bit targets, the skb->data pointer cannot be cast to a
u32 in a meaningful way. Use uintptr_t instead.
Suggested-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that the compiler always sees the parameters passed to the DBG()
macro, it gives an error message about wrong parameters. The comment
says it all:
/* ndev is not valid yet, so avoid passing it in. */
DBG(SMC_DEBUG_FUNC, "--> %s\n", __func__);
You cannot not just pass a parameter!
The DBG does not seem to have any real value, to just remove it.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drivers/net/ethernet/smsc/smc911x.c: In function ‘smc911x_timeout’:
drivers/net/ethernet/smsc/smc911x.c:1251:6: warning: variable ‘status’ set but not used [-Wunused-but-set-variable]
1251 | int status, mask;
The status is read in order to print it via the DBG macro. However,
due to the way DBG is disabled, the compiler never sees it being used.
Change the DBG macro to actually make use of the passed parameters,
and the leave the optimiser to remove the unwanted code inside the
while (0).
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drivers/net/ethernet/smsc/smc911x.c: In function ‘smc911x_phy_interrupt’:
drivers/net/ethernet/smsc/smc911x.c:976:6: warning: variable ‘status’ set but not used [-Wunused-but-set-variable]
976 | int status;
A comment indicates the status needs to be read from the PHY,
otherwise bad things happen. But due to the macro magic, it is hard to
perform the read without assigning it to a variable. So add
_always_unused attribute to status to tell the compiler we don't
expect to use the value.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drivers/net/ethernet/smsc/smc91x.c:2199: warning: Function parameter or member 'dev' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2199: warning: Function parameter or member 'desc' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2199: warning: Function parameter or member 'name' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2199: warning: Function parameter or member 'index' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2199: warning: Function parameter or member 'value' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2199: warning: Function parameter or member 'nsdelay' not described in 'try_toggle_control_gpio'
Document these parameters.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drivers/net/ethernet/smsc/smc91x.c:706:51: warning: variable ‘pkt_len’ set but not used [-Wunused-but-set-variable]
706 | unsigned int saved_packet, packet_no, tx_status, pkt_len;
The read of the packet length in the descriptor probably needs to be
kept in order to keep the hardware happy. So tell the compiler we
don't expect to use the value by using the __always_unused attribute.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To improve build testing of this driver, add COMPILE_TEST support.
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drivers/net/ethernet//xilinx/xilinx_emaclite.c:341:35: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
341 | addr = (void __iomem __force *)((u32 __force)addr ^
Use uintptr_t instead of u32 to avoid problems on 64 bit systems.
Also, cast the address to an unsigned long for printing.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The txqueue parameter to the watchdog callback is unused in this
driver. But it still needs to be documented.
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nfp_cpp_from_nfp6000_pcie() returns ERR_PTR() and never returns
NULL. The NULL test should be removed, also return correct err.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Link: https://lore.kernel.org/r/20201112145852.6580-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When no devicetree is present, the driver will use an
uninitialized variable.
Fix by initializing this variable.
Fixes: 902a66e08c ("lan743x: correctly handle chips with internal PHY")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201112152513.1941-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2020-11-10
This series contains updates to i40e and igc drivers and the MAINTAINERS
file.
Slawomir fixes updating VF MAC addresses to fix various issues related
to reporting and setting of these addresses for i40e.
Dan Carpenter fixes a possible used before being initialized issue for
i40e.
Vinicius fixes reporting of netdev stats for igc.
Tony updates repositories for Intel Ethernet Drivers.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
MAINTAINERS: Update repositories for Intel Ethernet Drivers
igc: Fix returning wrong statistics
i40e, xsk: uninitialized variable in i40e_clean_rx_irq_zc()
i40e: Fix MAC address setting for a VF via Host/VM
====================
Link: https://lore.kernel.org/r/20201111001955.533210-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In function ‘strncpy’,
inlined from ‘sky2_name’ at drivers/net/ethernet/marvell/sky2.c:4903:3,
inlined from ‘sky2_probe’ at drivers/net/ethernet/marvell/sky2.c:5049:2:
./include/linux/string.h:297:30: warning: ‘__builtin_strncpy’ specified bound 16 equals destination size [-Wstringop-truncation]
None of the device names are 16 characters long, so it was never an
issue. But replace the strncpy with an snprintf() to prevent the
theoretical overflow.
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20201110023222.1479398-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stop the queue and ask for the credits if queue reaches to
threashold.
Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
context id and port id should be filled while sending tcb update.
Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If TCP congestion caused a very small packets which only has some
part fo the TAG, and that too is not till the end. HW can't handle
such case, so falling back to sw crypto in such cases.
v1->v2:
- Marked chcr_ktls_sw_fallback() static.
Fixes: dc05f3df8f ("chcr: Handle first or middle part of record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If its a last packet and fin is set. Make sure FIN is informed
to HW before skb gets freed.
Fixes: 429765a149 ("chcr: handle partial end part of a record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There could be a case where ACK for tls exchanges prior to start
marker is missed out, and by the time tls is offloaded. This pkt
should not be discarded and handled carefully. It could be
plaintext alone or plaintext + finish as well.
Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If a record starts in middle, reset TCB UNA so that we could
avoid sending out extra packet which is needed to make it 16
byte aligned to start AES CTR.
Check also considers prev_seq, which should be what is
actually sent, not the skb data length.
Avoid updating partial TAG to HW at any point of time, that's
why we need to check if remaining part is smaller than TAG
size, then reset TX_MAX to be TAG starting sequence number.
Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If an skb has only header part which doesn't start from
beginning, is not being handled properly.
Fixes: dc05f3df8f ("chcr: Handle first or middle part of record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
trimmed length calculation goes wrong if skb has only tag part
to send. It should be zero if there is no data bytes apart from
TAG.
Fixes: dc05f3df8f ("chcr: Handle first or middle part of record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Creating SKB per tls record and freeing the original one causes
panic. There will be race if connection reset is requested. By
freeing original skb, refcnt will be decremented and that means,
there is no pending record to send, and so tls_dev_del will be
requested in control path while SKB of related connection is in
queue.
Better approach is to use same SKB to send one record (partial
data) at a time. We still have to create a new SKB when partial
last part of a record is requested.
This fix introduces new API cxgb4_write_partial_sgl() to send
partial part of skb. Present cxgb4_write_sgl can only provide
feasibility to start from an offset which limits to header only
and it can write sgls for the whole skb len. But this new API
will help in both. It can start from any offset and can end
writing in middle of the skb.
v4->v5:
- Removed extra changes.
Fixes: 429765a149 ("chcr: handle partial end part of a record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Checksum update was missing in the WR.
Fixes: 429765a149 ("chcr: handle partial end part of a record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a possibility of linear skbs coming in. Correcting
the length extraction logic.
v2->v3:
- Separated un-related changes from this patch.
Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If skb has retransmit data starting before start marker, e.g. ccs,
decrypted bit won't be set for that, and if it has some data to
encrypt, then it must be given to crypto ULD. So in place of
decrypted, check if socket is tls offloaded. Also, unless skb has
some data to encrypt, no need to give it for tls offload handling.
v2->v3:
- Removed ifdef.
Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The return pointer of dwc_eth_dwmac_data's .probe isn't used, and
"probe" usually return int, so change the prototype to follow standard
way. Secondly, it can simplify the tegra_eqos_probe() code.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Link: https://lore.kernel.org/r/20201109160440.3a736ee3@xhacker.debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the net core, the struct net_device_ops -> ndo_set_rx_mode()
callback is called with the dev->addr_list_lock spinlock held.
However, this driver's ndo_set_rx_mode callback eventually calls
lan743x_dp_write(), which acquires a mutex. Mutex acquisition
may sleep, and this is not allowed when holding a spinlock.
Fix by removing the dp_lock mutex entirely. Its purpose is to
prevent concurrent accesses to the data port. No concurrent
accesses are possible, because the dev->addr_list_lock
spinlock in the core only lets through one thread at a time.
Fixes: 23f0703c12 ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201109203828.5115-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 6f197fb638 ("lan743x: Added fixed link and RGMII support")
assumes that chips with an internal PHY will never have a devicetree
entry. This is incorrect: even for these chips, a devicetree entry
can be useful e.g. to pass the mac address from bootloader to chip:
&pcie {
status = "okay";
host@0 {
reg = <0 0 0 0 0>;
#address-cells = <3>;
#size-cells = <2>;
lan7430: ethernet@0 {
/* LAN7430 with internal PHY */
compatible = "microchip,lan743x";
status = "okay";
reg = <0 0 0 0 0>;
/* filled in by bootloader */
local-mac-address = [00 00 00 00 00 00];
};
};
};
If a devicetree entry is present, the driver will not attach the chip
to its internal phy, and the chip will be non-operational.
Fix by tweaking the phy connection algorithm:
- first try to connect to a phy specified in the devicetree
(could be 'real' phy, or just a 'fixed-link')
- if that doesn't succeed, try to connect to an internal phy, even
if the chip has a devnode
Tested on a LAN7430 with internal PHY. I cannot test a device using
fixed-link, as I do not have access to one.
Fixes: 6f197fb638 ("lan743x: Added fixed link and RGMII support")
Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # lan7430
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201108171224.23829-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The '!=' expression itself is bool, no need to convert it to bool.
Fix the following coccicheck warning:
./drivers/net/ethernet/aquantia/atlantic/aq_nic.c:1477:34-39: WARNING: conversion to bool not needed here
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Link: https://lore.kernel.org/r/1604797919-10157-1-git-send-email-kaixuxia@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
'igc_update_stats()' was not updating 'netdev->stats', so the returned
statistics, for example, requested by:
$ ip -s link show dev enp3s0
were not being updated and were always zero.
Fix by returning a set of statistics that are actually being
updated (adapter->stats64).
Fixes: c9a11c23ce ("igc: Add netdev")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The "failure" variable is used without being initialized. It should be
set to false.
Fixes: 8cbf741499 ("i40e, xsk: move buffer allocation out of the Rx processing loop")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix MAC setting flow for the PF driver.
Update the unicast VF's MAC address in VF structure if it is
a new setting in i40e_vc_add_mac_addr_msg.
When unicast MAC address gets deleted, record that and
set the new unicast MAC address that is already waiting in the filter
list. This logic is based on the order of messages arriving to
the PF driver.
Without this change the MAC address setting was interpreted
incorrectly in the following use cases:
1) Print incorrect VF MAC or zero MAC
ip link show dev $pf
2) Don't preserve MAC between driver reload
rmmod iavf; modprobe iavf
3) Update VF MAC when macvlan was set
ip link add link $vf address $mac $vf.1 type macvlan
4) Failed to update mac address when VF was trusted
ip link set dev $vf address $mac
This includes all other configurations including above commands.
Fixes: f657a6e131 ("i40e: Fix VF driver MAC address configuration")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Set all EHL/TGL phy_addr to -1 so that the driver will automatically
detect it at run-time by probing all the possible 32 addresses.
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Link: https://lore.kernel.org/r/20201106094341.4241-1-vee.khee.wong@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With CONFIG_BRIDGE=m the compilation fails:
ld: drivers/net/ethernet/marvell/prestera/prestera_switchdev.o: in function `prestera_bridge_port_event':
prestera_switchdev.c:(.text+0x2ebd): undefined reference to `br_vlan_enabled'
in case the driver is statically enabled.
Fix it by adding 'BRIDGE || BRIDGE=n' dependency.
Fixes: e1189d9a5f ("net: marvell: prestera: Add Switchdev driver implementation")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20201106161128.24069-1-vadym.kochan@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl+kXcMACgkQSD+KveBX
+j61Zgf+IBCrevYpytPWmbrLnpWP6sVNkg0Plw8cmhRcLCCeN8mEMPIrhNDysmJu
WOen+yVz7K65WEA5LQ1nUM+VY8PdI3sAre81rTrfyEm2Nkd0aO22BoWSESrNZowf
XDNwn4lEJ5MToxvozfV158Gi921EwGtB/JKS6If7Cyjf+Ok8+ie8ayGfvPi7SiNK
+2Bnai7RJSeYYt8xZjvvg17b1+ZbTeoJF/waEXpFsM/BWVKg5ZOawcfRKr4MuvEJ
4WDzCjdaoeDZ8T6cppuWnznpoAeEf6Avq5DG/wMnUQxXRk9X5i043jr1Iydo5Bj9
MvE31Mg9v15CecaKBldVs4cMSv86gw==
=pwvB
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2020-11-03
v1->v2:
- Fix fixes line tag in patch #1
- Toss ktls refcount leak fix, Maxim will look further into the root
cause.
- Toss eswitch chain 0 prio patch, until we determine if it is needed
for -rc and net.
* tag 'mlx5-fixes-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix incorrect access of RCU-protected xdp_prog
net/mlx5e: Fix VXLAN synchronization after function reload
net/mlx5: E-switch, Avoid extack error log for disabled vport
net/mlx5: Fix deletion of duplicate rules
net/mlx5e: Use spin_lock_bh for async_icosq_lock
net/mlx5e: Protect encap route dev from concurrent release
net/mlx5e: Fix modify header actions memory leak
====================
Link: https://lore.kernel.org/r/20201105202129.23644-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RTL8125B has same or similar short packet hw padding bug as RTL8168evl.
The main workaround has been extended accordingly, however we have to
disable also hw checksumming for short packets on affected new chip
versions. Instead of checking for an affected chip version let's
simply disable hw checksumming for short packets in general.
v2:
- remove the version checks and disable short packet hw csum in general
- reflect this in commit title and message
Fixes: 0439297be9 ("r8169: add support for RTL8125B")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/7fbb35f0-e244-ef65-aa55-3872d7d38698@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The caller of rtl8169_tso_csum_v2() frees the skb if false is returned.
eth_skb_pad() internally frees the skb on error what would result in a
double free. Therefore use __skb_put_padto() directly and instruct it
to not free the skb on error.
Fixes: b423e9ae49 ("r8169: fix offloaded tx checksum for small packets.")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/f7e68191-acff-9ded-4263-c016428a8762@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix the gcc warning:
drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c:2673:9: warning: this 'for' clause does not guard... [-Wmisleading-indentation]
2673 | for (i = 0; i < n; ++i) \
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Link: https://lore.kernel.org/r/1604467444-23043-1-git-send-email-kaixuxia@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MDIO spec does not require an MDC at all times, only when MDIO
transactions are occurring. This patch allows the xilinx_axienet
driver to disable the MDC when not in use, and re-enable it when
needed. It also simplifies the driver by removing MDC disable
and enable in device reset sequence.
Signed-off-by: Clayton Rayment <clayton.rayment@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce helper functions to enable/disable MDIO interface clock. This
change serves a preparatory patch for the coming feature to dynamically
control the management bus clock.
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 16b5f5ce35
where it restructures do_reset. There are patches being tested that
would require major rework if this is committed first.
We will resend this after the other patches have been applied.
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20201106191745.1679846-1-drt@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
and netfilter subtrees.
Current release - bugs in new features:
- can: isotp: isotp_rcv_cf(): enable RX timeout handling in
listen-only mode
Previous release - regressions:
- mac80211:
- don't require VHT elements for HE on 2.4 GHz
- fix regression where EAPOL frames were sent in plaintext
- netfilter:
- ipset: Update byte and packet counters regardless of whether
they match
- ip_tunnel: fix over-mtu packet send by allowing fragmenting even
if inner packet has IP_DF (don't fragment) set in its header
(when TUNNEL_DONT_FRAGMENT flag is not set on the tunnel dev)
- net: fec: fix MDIO probing for some FEC hardware blocks
- ip6_tunnel: set inner ipproto before ip6_tnl_encap to un-break
gso support
- sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian
platforms, sparse-related fix used the wrong integer size
Previous release - always broken:
- netfilter: use actual socket sk rather than skb sk when routing
harder
- r8169: work around short packet hw bug on RTL8125 by padding frames
- net: ethernet: ti: cpsw: disable PTPv1 hw timestamping
advertisement, the hardware does not support it
- chelsio/chtls: fix always leaking ctrl_skb and another leak caused
by a race condition
- fix drivers incorrectly writing into skbs on TX:
- cadence: force nonlinear buffers to be cloned
- gianfar: Account for Tx PTP timestamp in the skb headroom
- gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP
- can: flexcan:
- remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
- add ECC initialization for VF610 and LX2160A
- flexcan_remove(): disable wakeup completely
- can: fix packet echo functionality:
- peak_canfd: fix echo management when loopback is on
- make sure skbs are not freed in IRQ context in case they need
to be dropped
- always clone the skbs to make sure they have a reference on
the socket, and prevent it from disappearing
- fix real payload length return value for RTR frames
- can: j1939: return failure on bind if netdev is down, rather than
waiting indefinitely
Misc:
- IPv6: reply ICMP error if the first fragment don't include all
headers to improve compliance with RFC 8200
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+kTDcACgkQMUZtbf5S
IrtC9A//f9rwNFI7sRaz9FYi6ljtWY7paPxdOxy3pWRoNzbfffjTGSPheNvy1pQb
IPaLsNwRrckQNSEPTbQqlUYcjzk1W74ffvq0sQOan4kNKxjX3uf78E6RuWARJsRC
dLqfcJctO6bFi6sEMwIFZ2tLOO5lUIA+Pd0GbjhSdObWzl3uqJ26v7wC6vVk29vS
116Mmhe8/TDVtCOzwlZnBPHqBJkTAirB+MAEX4Sp6FB9YirlcNZbWyHX5L6ejGqC
WQVjU2tPBBugeo0j72tc+y0mD3iK0aLcPL+dk0EQQYHRDMVTebl+gxNPUXCo9Out
HGe5z4e4qrR4Rx1W6MQ3pKwTYuCdwKjMRGd72JAi428/l4NN3y9W/HkI2Zuppd2l
7ifURkNQllYjGCSoHBviJbajyFBeA1nkFJgMSJiRs4T167K3zTbsyjNnfa4LnsvS
B3SrYMGqIH+oR20R9EoV8prVX+Alj1hh/jX02J8zsCcHmBqF2yZi17NarVAWoarm
v/AAqehlP+D1vjAmbCG9DeborrjaNi+v6zFTKK6ZadvLXRJX/N+wEPIpG4KjiK8W
DWKIVlee0R+kgCXE1n9AuZaZLWb7VwrAjkG1Pmfi3vkZhWeAhOW4X98ehhi/hVR/
Gq+e48ZECW5yuOA1q4hbsCYkGr2qAn/LPbsXxhEmW8qwkJHZYkI=
=5R2w
-----END PGP SIGNATURE-----
Merge tag 'net-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.10-rc3, including fixes from wireless, can, and
netfilter subtrees.
Current merge window - bugs in new features:
- can: isotp: isotp_rcv_cf(): enable RX timeout handling in
listen-only mode
Previous releases - regressions:
- mac80211:
- don't require VHT elements for HE on 2.4 GHz
- fix regression where EAPOL frames were sent in plaintext
- netfilter:
- ipset: Update byte and packet counters regardless of whether
they match
- ip_tunnel: fix over-mtu packet send by allowing fragmenting even if
inner packet has IP_DF (don't fragment) set in its header (when
TUNNEL_DONT_FRAGMENT flag is not set on the tunnel dev)
- net: fec: fix MDIO probing for some FEC hardware blocks
- ip6_tunnel: set inner ipproto before ip6_tnl_encap to un-break gso
support
- sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian
platforms, sparse-related fix used the wrong integer size
Previous releases - always broken:
- netfilter: use actual socket sk rather than skb sk when routing
harder
- r8169: work around short packet hw bug on RTL8125 by padding frames
- net: ethernet: ti: cpsw: disable PTPv1 hw timestamping
advertisement, the hardware does not support it
- chelsio/chtls: fix always leaking ctrl_skb and another leak caused
by a race condition
- fix drivers incorrectly writing into skbs on TX:
- cadence: force nonlinear buffers to be cloned
- gianfar: Account for Tx PTP timestamp in the skb headroom
- gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP
- can: flexcan:
- remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
- add ECC initialization for VF610 and LX2160A
- flexcan_remove(): disable wakeup completely
- can: fix packet echo functionality:
- peak_canfd: fix echo management when loopback is on
- make sure skbs are not freed in IRQ context in case they need to
be dropped
- always clone the skbs to make sure they have a reference on the
socket, and prevent it from disappearing
- fix real payload length return value for RTR frames
- can: j1939: return failure on bind if netdev is down, rather than
waiting indefinitely
Misc:
- IPv6: reply ICMP error if the first fragment don't include all
headers to improve compliance with RFC 8200"
* tag 'net-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
ionic: check port ptr before use
r8169: work around short packet hw bug on RTL8125
net: openvswitch: silence suspicious RCU usage warning
chelsio/chtls: fix always leaking ctrl_skb
chelsio/chtls: fix memory leaks caused by a race
can: flexcan: flexcan_remove(): disable wakeup completely
can: flexcan: add ECC initialization for VF610
can: flexcan: add ECC initialization for LX2160A
can: flexcan: remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
can: mcp251xfd: remove unneeded break
can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings
can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages
can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is on
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
can: peak_usb: add range checking in decode operations
can: xilinx_can: handle failure cases of pm_runtime_get_sync
can: ti_hecc: ti_hecc_probe(): add missed clk_disable_unprepare() in error path
can: isotp: padlen(): make const array static, makes object smaller
can: isotp: isotp_rcv_cf(): enable RX timeout handling in listen-only mode
can: isotp: Explain PDU in CAN_ISOTP help text
...
This series includes updates to mlx5 software steering component.
1) Few improvements in the DR area, such as removing unneeded checks,
renaming to better general names, refactor in some places, etc.
2) Software steering (DR) Memory management improvements
This patch series contains SW Steering memory management improvements:
using buddy allocator instead of an existing bucket allocator, and
several other optimizations.
The buddy system is a memory allocation and management algorithm
that manages memory in power of two increments.
The algorithm is well-known and well-described, such as here:
https://en.wikipedia.org/wiki/Buddy_memory_allocation
Linux uses this algorithm for managing and allocating physical pages,
as described here:
https://www.kernel.org/doc/gorman/html/understand/understand009.html
In our case, although the algorithm in principal is similar to the
Linux physical page allocator, the "building blocks" and the circumstances
are different: in SW steering, buddy allocator doesn't really allocates
a memory, but rather manages ICM (Interconnect Context Memory) that was
previously allocated and registered.
The ICM memory that is used in SW steering is always power
of 2 (order), so buddy system is a good fit for this.
Patches in this series:
[PATH 4] net/mlx5: DR, Add buddy allocator utilities
This patch adds a modified implementation of a well-known buddy allocator,
adjusted for SW steering needs: the algorithm in principal is similar to
the Linux physical page allocator, but in our case buddy allocator doesn't
really allocate a memory, but rather manages ICM memory that was previously
allocated and registered.
[PATH 5] net/mlx5: DR, Handle ICM memory via buddy allocation instead of bucket management
This patch changes ICM management of SW steering to use buddy-system mechanism
Instead of the previous bucket management.
[PATH 6] net/mlx5: DR, Sync chunks only during free
This patch makes syncing happen only when freeing memory chunks.
[PATH 7] net/mlx5: DR, ICM memory pools sync optimization
This patch adds tracking of pool's "hot" memory and makes the
check whether steering sync is required much shorter and faster.
[PATH 8] net/mlx5: DR, Free buddy ICM memory if it is unused
This patch adds tracking buddy's used ICM memory,
and frees the buddy if all its memory becomes unused.
3) Misc code cleanups
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl+kW/sACgkQSD+KveBX
+j7pRwf+KH6vIkwKlt0I2cYUmUINAkz0o1E82wXGx5Q81iMJLeIeMPKiatai6/0r
BIYD8t8oagOVw5OU+H9DbVIR47wz6pB90bkQIrIJk0S3ocVcDfLlN0ssbwCdEDlH
jj56SB6jjVVj1LlTXSVfUrXEKJ2FBjTxPbNtVL/NLW0GQkoJM5RaYjK++ZB58o+O
YI1Gb2w+FT5vVHdRVxs/2a/NTy71VQOrYctkhql4/8P0SsstPvhgOD/oRU26l8Il
Qpl68H/0N9KM04SlOlD91AwaWhPEIBP1qFSgRnfStmqxAfQlAfqajOoSntK8Vkz0
0yRF7pVLELhzxx+IR/W9Vpdf7uzjeA==
=rQcg
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2020-11-03
This series includes updates to mlx5 software steering component.
1) Few improvements in the DR area, such as removing unneeded checks,
renaming to better general names, refactor in some places, etc.
2) Software steering (DR) Memory management improvements
This patch series contains SW Steering memory management improvements:
using buddy allocator instead of an existing bucket allocator, and
several other optimizations.
The buddy system is a memory allocation and management algorithm
that manages memory in power of two increments.
The algorithm is well-known and well-described, such as here:
https://en.wikipedia.org/wiki/Buddy_memory_allocation
Linux uses this algorithm for managing and allocating physical pages,
as described here:
https://www.kernel.org/doc/gorman/html/understand/understand009.html
In our case, although the algorithm in principal is similar to the
Linux physical page allocator, the "building blocks" and the circumstances
are different: in SW steering, buddy allocator doesn't really allocates
a memory, but rather manages ICM (Interconnect Context Memory) that was
previously allocated and registered.
The ICM memory that is used in SW steering is always power
of 2 (order), so buddy system is a good fit for this.
Patches in this series:
[PATH 4] net/mlx5: DR, Add buddy allocator utilities
This patch adds a modified implementation of a well-known buddy allocator,
adjusted for SW steering needs: the algorithm in principal is similar to
the Linux physical page allocator, but in our case buddy allocator doesn't
really allocate a memory, but rather manages ICM memory that was previously
allocated and registered.
[PATH 5] net/mlx5: DR, Handle ICM memory via buddy allocation instead of bucket management
This patch changes ICM management of SW steering to use buddy-system mechanism
Instead of the previous bucket management.
[PATH 6] net/mlx5: DR, Sync chunks only during free
This patch makes syncing happen only when freeing memory chunks.
[PATH 7] net/mlx5: DR, ICM memory pools sync optimization
This patch adds tracking of pool's "hot" memory and makes the
check whether steering sync is required much shorter and faster.
[PATH 8] net/mlx5: DR, Free buddy ICM memory if it is unused
This patch adds tracking buddy's used ICM memory,
and frees the buddy if all its memory becomes unused.
3) Misc code cleanups
* tag 'mlx5-updates-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net: mlx5: Replace in_irq() usage
net/mlx5: Cleanup kernel-doc warnings
net/mlx4: Cleanup kernel-doc warnings
net/mlx5e: Validate stop_room size upon user input
net/mlx5: DR, Free unused buddy ICM memory
net/mlx5: DR, ICM memory pools sync optimization
net/mlx5: DR, Sync chunks only during free
net/mlx5: DR, Handle ICM memory via buddy allocation instead of buckets
net/mlx5: DR, Add buddy allocator utilities
net/mlx5: DR, Rename matcher functions to be more HW agnostic
net/mlx5: DR, Rename builders HW specific names
net/mlx5: DR, Remove unused member of action struct
====================
Link: https://lore.kernel.org/r/20201105201242.21716-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rq->xdp_prog is RCU-protected and should be accessed only with
rcu_access_pointer for the NULL check in mlx5e_poll_rx_cq.
rq->xdp_prog may change on the fly only from one non-NULL value to
another non-NULL value, so the checks in mlx5e_xdp_handle and
mlx5e_poll_rx_cq will have the same result during one NAPI cycle,
meaning that no additional synchronization is needed.
Fixes: fe45386a20 ("net/mlx5e: Use RCU to protect rq->xdp_prog")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
During driver reload, perform firmware tear-down which results in
firmware losing the configured VXLAN ports. These ports are still
available in the driver's database. Fix this by cleaning up driver's
VXLAN database in the nic unload flow, before firmware tear-down. With
that, minimize mlx5_vxlan_destroy() to remove only what was added in
mlx5_vxlan_create() and warn on leftover UDP ports.
Fixes: 18a2b7f969 ("net/mlx5: convert to new udp_tunnel infrastructure")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When E-switch vport is disabled, querying its hardware address is
unsupported.
Avoid setting extack error log message in such case.
Fixes: f099fde16d ("net/mlx5: E-switch, Support querying port function mac address")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When a rule is duplicated, the refcount of the rule is increased so only
the second deletion of the rule should cause destruction of the FTE.
Currently, the FTE will be destroyed in the first deletion of rule since
the modify_mask will be 0.
Fix it and call to destroy FTE only if all the rules (FTE's children)
have been removed.
Fixes: 718ce4d601 ("net/mlx5: Consolidate update FTE for all removal changes")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
async_icosq_lock may be taken from softirq and non-softirq contexts. It
requires protection with spin_lock_bh, otherwise a softirq may be
triggered in the middle of the critical section, and it may deadlock if
it tries to take the same lock. This patch fixes such a scenario by
using spin_lock_bh to disable softirqs on that CPU while inside the
critical section.
Fixes: 8d94b590f1 ("net/mlx5e: Turn XSK ICOSQ into a general asynchronous one")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In functions mlx5e_route_lookup_ipv{4|6}() route_dev can be arbitrary net
device and not necessary mlx5 eswitch port representor. As such, in order
to ensure that route_dev is not destroyed concurrent the code needs either
explicitly take reference to the device before releasing reference to
rtable instance or ensure that caller holds rtnl lock. First approach is
chosen as a fix since rtnl lock dependency was intentionally removed from
mlx5 TC layer.
To prevent unprotected usage of route_dev in encap code take a reference to
the device before releasing rt. Don't save direct pointer to the device in
mlx5_encap_entry structure and use ifindex instead. Modify users of
route_dev pointer to properly obtain the net device instance from its
ifindex.
Fixes: 61086f3910 ("net/mlx5e: Protect encap hash table with mutex")
Fixes: 6707f74be8 ("net/mlx5e: Update hw flows when encap source mac changed")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Modify header actions are allocated during parse tc actions and only
freed during the flow creation, however, on error flow the allocated
memory is wrongly unfreed.
Fix this by calling dealloc_mod_hdr_actions in __mlx5e_add_fdb_flow
and mlx5e_add_nic_flow error flow.
Fixes: d7e75a325c ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Fixes: 2f4fe4cab0 ("net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5_eq_async_int() uses in_irq() to decide whether eq::lock needs to be
acquired and released with spin_[un]lock() or the irq saving/restoring
variants.
The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
seperated or the context be conveyed in an argument passed by the caller,
which usually knows the context.
mlx5_eq_async_int() knows the context via the action argument already so
using it for the lock variant decision is a straight forward replacement
for in_irq().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
$ git ls-files *.[ch] | egrep drivers/net/ethernet/mellanox/ | \
xargs scripts/kernel-doc -none
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:57:
warning: Enum value 'MLX5_FPGA_ACCESS_TYPE_I2C' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:57:
warning: Enum value 'MLX5_FPGA_ACCESS_TYPE_DONTCARE' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:118:
warning: Function parameter or member 'cb_arg' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:160:
warning: Function parameter or member 'conn' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:160:
warning: Excess function parameter 'fdev' description ...
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reported-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Stop room is a space that may be taken by WQEs in the SQ during a packet
transmit. It is used to check if next packet has enough room in the SQ.
Stop room guarantees this packet can be served and if not, the queue is
stopped, so no more packets are passed to the driver until it's ready.
Currently, stop_room size is calculated and validated upon tx queues
allocation. This makes it impossible to know if user provided valid
input for certain parameters when interface is down.
Instead, store stop_room in mlx5e_sq_param and create
mlx5e_validate_params(), to validate its fields upon user input even
when the interface is down.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Track buddy's used ICM memory, and free it if all
of the buddy's memory bacame unused.
Do this only for STEs.
MODIFY_ACTION buddies are much smaller, so in case there
is a large amount of modify_header actions, which result
in large amount of MODIFY_ACTION buddies, doing this
cleanup during sync will result in performance hit while
not freeing significant amount of memory.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Track the pool's hot ICM memory when freeing/allocating
chunk, so that when checking if the sync is required, just
check if the pool hot memory has reached the sync threshold.
Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When freeing chunks, we want to sync the steering
so that all the "hot" memory will be written to ICM
and all the chunks that are in the hot_list will be
actually destroyed.
When allocating from the pool, we don't have a need
to sync the steering, as we're not freeing anything,
and sync might just hurt the performance in terms of
flow-per-second offloaded.
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Till now in order to manage the ICM memory we used bucket
mechanism, which kept a bucket per specified size (sizes were
between 1 block to 2^21 blocks).
Now changing that with buddy-system mechanism, which gives us much
more flexible way to manage the ICM memory.
Its biggest advantage over the bucket is by using the same ICM memory
area for all the sizes of blocks, which reduces the memory consumption.
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add implementation of SW Steering variation of buddy allocator.
The buddy system for ICM memory uses 2 main data structures:
- Bitmap per order, that keeps the current state of allocated
blocks for this order
- Indicator for the number of available blocks per each order
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Remove flex parser from the matcher function names since
the matcher should not be aware of such HW specific details.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
We will support multiple STE versions.
The existing naming is not suitable for newer versions.
Removed the HW specific details and renamed with a more
general names.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Network problems with RTL8125B have been reported [0] and with help
from Realtek it turned out that this chip version has a hw problem
with short packets (similar to RTL8168evl). Having said that activate
the same workaround as for RTL8168evl.
Realtek suggested to activate the workaround for RTL8125A too, even
though they're not 100% sure yet which RTL8125 versions are affected.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=209839
Fixes: 0439297be9 ("r8169: add support for RTL8125B")
Reported-by: Maxim Plotnikov <wgh@torlan.ru>
Tested-by: Maxim Plotnikov <wgh@torlan.ru>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/8002c31a-60b9-58f1-f0dd-8fd07239917f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tx checksumming has been defeatured and completely removed
from the h/w reference manual. Made a little cleanup for the
TSE case as this is complementary code.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201103140213.3294-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
patch adds a logic to utilize multiple queues to process requests.
The queue selection logic uses a round-robin distribution technique
using a counter.
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201102162832.22344-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
An incorrect sizeof() is being used, sizeof(u64 *) is not correct,
it should be sizeof(*sq->sqb_ptrs).
Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201102134601.698436-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver uses in_irq() + in_serving_softirq() magic to decide if NAPI
scheduling is required or packet processing.
The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.
Use the `sched_napi' argument passed by the callback. It is set true if
called from the interrupt handler and NAPI should be scheduled.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Aymen Sghaier <aymen.sghaier@nxp.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Tested-by: Camelia Groza <camelia.groza@nxp.com>
dpaa_eth_napi_schedule() and caam_qi_napi_schedule() schedule NAPI if
invoked from:
- Hard interrupt context
- Any context which is not serving soft interrupts
Any context which is not serving soft interrupts includes hard interrupts
so the in_irq() check is redundant. caam_qi_napi_schedule() has a comment
about this:
/*
* In case of threaded ISR, for RT kernels in_irq() does not return
* appropriate value, so use in_serving_softirq to distinguish between
* softirq and irq contexts.
*/
if (in_irq() || !in_serving_softirq())
This has nothing to do with RT. Even on a non RT kernel force threaded
interrupts run obviously in thread context and therefore in_irq() returns
false when invoked from the handler.
The extension of the in_irq() check with !in_serving_softirq() was there
when the drivers were added, but in the out of tree FSL BSP the original
condition was in_irq() which got extended due to failures on RT.
The usage of in_xxx() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context. Right he is, the above construct is
clearly showing why.
The following callchains have been analyzed to end up in
dpaa_eth_napi_schedule():
qman_p_poll_dqrr()
__poll_portal_fast()
fq->cb.dqrr()
dpaa_eth_napi_schedule()
portal_isr()
__poll_portal_fast()
fq->cb.dqrr()
dpaa_eth_napi_schedule()
Both need to schedule NAPI.
The crypto part has another code path leading up to this:
kill_fq()
empty_retired_fq()
qman_p_poll_dqrr()
__poll_portal_fast()
fq->cb.dqrr()
dpaa_eth_napi_schedule()
kill_fq() is called from task context and ends up scheduling NAPI, but
that's pointless and an unintended side effect of the !in_serving_softirq()
check.
The code path:
caam_qi_poll() -> qman_p_poll_dqrr()
is invoked from NAPI and I *assume* from crypto's NAPI device and not
from qbman's NAPI device. I *guess* it is okay to skip scheduling NAPI
(because this is what happens now) but could be changed if it is wrong
due to `budget' handling.
Add an argument to __poll_portal_fast() which is true if NAPI needs to be
scheduled. This requires propagating the value to the caller including
`qman_cb_dqrr' typedef which is used by the dpaa and the crypto driver.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Aymen Sghaier <aymen.sghaier@nxp.com>
Cc: Herbert XS <herbert@gondor.apana.org.au>
Cc: Li Yang <leoyang.li@nxp.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Tested-by: Camelia Groza <camelia.groza@nxp.com>
This is the 3rd revision of the patch fix for potential null pointer dereference
with lan743x card.
The simpliest way to reproduce: boot with bare lan743x and issue "ethtool ethN"
commant where ethN is the interface with lan743x card. Example:
$ sudo ethtool eth7
dmesg:
[ 103.510336] BUG: kernel NULL pointer dereference, address: 0000000000000340
...
[ 103.510836] RIP: 0010:phy_ethtool_get_wol+0x5/0x30 [libphy]
...
[ 103.511629] Call Trace:
[ 103.511666] lan743x_ethtool_get_wol+0x21/0x40 [lan743x]
[ 103.511724] dev_ethtool+0x1507/0x29d0
[ 103.511769] ? avc_has_extended_perms+0x17f/0x440
[ 103.511820] ? tomoyo_init_request_info+0x84/0x90
[ 103.511870] ? tomoyo_path_number_perm+0x68/0x1e0
[ 103.511919] ? tty_insert_flip_string_fixed_flag+0x82/0xe0
[ 103.511973] ? inet_ioctl+0x187/0x1d0
[ 103.512016] dev_ioctl+0xb5/0x560
[ 103.512055] sock_do_ioctl+0xa0/0x140
[ 103.512098] sock_ioctl+0x2cb/0x3c0
[ 103.512139] __x64_sys_ioctl+0x84/0xc0
[ 103.512183] do_syscall_64+0x33/0x80
[ 103.512224] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 103.512274] RIP: 0033:0x7f54a9cba427
...
Previous versions can be found at:
v1:
initial version
https://lkml.org/lkml/2020/10/28/921
v2:
do not return from lan743x_ethtool_set_wol if netdev->phydev == NULL, just skip
the call of phy_ethtool_set_wol() instead.
https://lkml.org/lkml/2020/10/31/380
v3:
in function lan743x_ethtool_set_wol:
use ternary operator instead of if-else sentence (review by Markus Elfring)
return -ENETDOWN insted of -EIO (review by Andrew Lunn)
Signed-off-by: Sergej Bauer <sbauer@blackbox.su>
Link: https://lore.kernel.org/r/20201101223556.16116-1-sbauer@blackbox.su
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We had to remove flag IRQF_NO_THREAD because it conflicts with shared
interrupts in case legacy interrupts are used. Following up on the
linked discussion set IRQF_NO_THREAD if MSI or MSI-X is used, because
both guarantee that interrupt won't be shared.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://www.spinics.net/lists/netdev/msg695341.html
Link: https://lore.kernel.org/r/446cf5b8-dddd-197f-cb96-66783141ade4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lowest number of tx descriptors used in the vendor drivers is 256 in
r8169. r8101/r8168/r8125 use 1024 what seems to be the hw limit. Stay
on the safe side and go with 256, same as number of rx descriptors.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/a52a6de4-f792-5038-ae2f-240d3b7860eb@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for support of XM router implementation which uses
different registers to work with trees and FIB entries, introduce
a structure to hold low-level ops and implement tree manipulation
register ops.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a couple of registers used to manipulate LPM trees on XM:
The XRALTA is used to allocate the XLT LPM trees.
The XRALST is used to set and query the structure of an XLT LPM tree.
The XRALTB register is used to bind virtual router and protocol to
an allocated LPM tree.
Since the XM registers are identical to the legacy router registers
with a fixed offset, re-use their pack functions.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit b27507bb59 ("net/ibmvnic: unlock rtnl_lock in reset so
linkwatch_event can run") introduced do_change_param_reset function to
solve the rtnl lock issue. Majority of the code in do_change_param_reset
duplicates do_reset. Also, we can handle the rtnl lock issue in do_reset
itself. Hence merge do_change_param_reset back into do_reset to clean up
the code.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20201031094645.17255-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Correct skb refcount in alloc_ctrl_skb(), causing skb memleak
when chtls_send_abort() called with NULL skb.
it was always leaking the skb, correct it by incrementing skb
refs by one.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201102173909.24826-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
race between user context and softirq causing memleak,
consider the call sequence scenario
chtls_setkey() //user context
chtls_peer_close()
chtls_abort_req_rss()
chtls_setkey() //user context
work request skb queued in chtls_setkey() won't be freed
because resources are already cleaned for this connection,
fix it by not queuing work request while socket is closing.
v1->v2:
- fix W=1 warning.
v2->v3:
- separate it out from another memleak fix.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201102173650.24754-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Put the preparation phase of switchdev VLAN objects to some good use,
and move the check we already had, for preventing the existence of more
than one egress-untagged VLAN per port, to the preparation phase of the
addition.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the ocelot_port_set_native_vlan() function starts dropping
untagged and prio-tagged traffic when the native VLAN is removed?
What is the native VLAN? It is the only egress-untagged VLAN that ocelot
supports on a port. If the port is a trunk with 100 VLANs, one of those
VLANs can be transmitted as egress-untagged, and that's the native VLAN.
Is it wrong to drop untagged and prio-tagged traffic if there's no
native VLAN? Yes and no.
In this case, which is more typical, it's ok to apply that drop
configuration:
$ bridge vlan add dev swp0 vid 1 pvid untagged <- this is the native VLAN
$ bridge vlan add dev swp0 vid 100
$ bridge vlan add dev swp0 vid 101
$ bridge vlan del dev swp0 vid 1 <- delete the native VLAN
But only because the pvid and the native VLAN have the same ID.
In this case, it isn't:
$ bridge vlan add dev swp0 vid 1 pvid
$ bridge vlan add dev swp0 vid 100 untagged <- this is the native VLAN
$ bridge vlan del dev swp0 vid 101
$ bridge vlan del dev swp0 vid 100 <- delete the native VLAN
It's wrong, because the switch will drop untagged and prio-tagged
traffic now, despite having a valid pvid of 1.
The confusion seems to stem from the fact that the native VLAN is an
egress setting, while the PVID is an ingress setting. It would be
correct to drop untagged and prio-tagged traffic only if there was no
pvid on the port. So let's do just that.
Background:
https://lore.kernel.org/netdev/CA+h21hrRMrLH-RjBGhEJSTZd6_QPRSd3RkVRQF-wNKkrgKcRSA@mail.gmail.com/#t
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently we are checking in some places whether the port has a native
VLAN on egress or not, by comparing the ocelot_port->vid value with zero.
That works, because VID 0 can never be a native VLAN configured by the
bridge, but now we want to make similar checks for the pvid. That won't
work, because there are cases when we do have the pvid set to 0 (not by
the bridge, by ourselves, but still.. it's confusing). And we can't
encode a negative value into an u16, so add a bool to the structure.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I have no idea why this code is here, but I have 2 hypotheses:
1.
A desperate attempt to keep untagged traffic working when the bridge
deletes the pvid on a port.
There was a fairly okay discussion here:
https://lore.kernel.org/netdev/CA+h21hrRMrLH-RjBGhEJSTZd6_QPRSd3RkVRQF-wNKkrgKcRSA@mail.gmail.com/#t
which established that in vlan_filtering=1 mode, the absence of a pvid
should denote that the ingress port should drop untagged and priority
tagged traffic. While in vlan_filtering=0 mode, nothing should change.
So in vlan_filtering=1 mode, we should simply let things happen, and not
attempt to save the day. And in vlan_filtering=0 mode, the pvid is 0
anyway, no need to do anything.
2.
The driver encodes the native VLAN (ocelot_port->vid) value of 0 as
special, meaning "not valid". There are checks based on that. But there
are no such checks for the ocelot_port->pvid value of 0. In fact, that's
a perfectly valid value, which is used in standalone mode. Maybe there
was some confusion and the author thought that 0 means "invalid" here as
well.
In conclusion, delete the code*.
*in fact we'll add it back later, in a slightly different form, but for
an entirely different reason than the one for which this exists now.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, mscc_ocelot ports configure pvid=0 in standalone mode, and
inherit the pvid from the bridge when one is present.
When the bridge has vlan_filtering=0, the software semantics are that
packets should be received regardless of whether there's a pvid
configured on the ingress port or not. However, ocelot does not observe
those semantics today.
Moreover, changing the PVID is also a problem with vlan_filtering=0.
We are privately remapping the VID of FDB, MDB entries to the port's
PVID when those are VLAN-unaware (i.e. when the VID of these entries
comes to us as 0). But we have no logic of adjusting that remapping when
the user changes the pvid and vlan_filtering is 0. So stale entries
would be left behind, and untagged traffic will stop matching on them.
And even if we were to solve that, there's an even bigger problem. If
swp0 has pvid 1, and swp1 has pvid 2, and both are under a vlan_filtering=0
bridge, they should be able to forward traffic between one another.
However, with ocelot they wouldn't do that.
The simplest way of fixing this is to never configure the pvid based on
what the bridge is asking for, when vlan_filtering is 0. Only if there
was a VLAN that the bridge couldn't mangle, that we could use as pvid....
So, turns out, there's 0 just for that. And for a reason: IEEE
802.1Q-2018, page 247, Table 9-2-Reserved VID values says:
The null VID. Indicates that the tag header contains only
priority information; no VID is present in the frame.
This VID value shall not be configured as a PVID or a member
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
of a VID Set, or configured in any FDB entry, or used in any
Management operation.
So, aren't we doing exactly what 802.1Q says not to? Well, in a way, but
what we're doing here is just driver-level bookkeeping, all for the
better. The fact that we're using a pvid of 0 is not observable behavior
from the outside world: the network stack does not see the classified
VLAN that the switch uses, in vlan_filtering=0 mode. And we're also more
consistent with the standalone mode now.
And now that we use the pvid of 0 in this mode, there's another advantage:
we don't need to perform any VID remapping for FDB and MDB entries either,
we can just use the VID of 0 that the bridge is passing to us.
The only gotcha is that every time we change the vlan_filtering setting,
we need to reapply the pvid (either to 0, or to the value from the bridge).
A small side-effect visible in the patch is that ocelot_port_set_pvid
needs to be moved above ocelot_port_vlan_filtering, so that it can be
called from there without forward-declarations.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 5a18e1e0c1 introduced the 'failover_pending' state to track
the "failover pending window" - where we wait for the partner to become
ready (after a transport event) before actually attempting to failover.
i.e window is between following two events:
a. we get a transport event due to a FAILOVER
b. later, we get CRQ_INITIALIZED indicating the partner is
ready at which point we schedule a FAILOVER reset.
and ->failover_pending is true during this window.
If during this window, we attempt to open (or close) a device, we pretend
that the operation succeded and let the FAILOVER reset path complete the
operation.
This is fine, except if the transport event ("a" above) occurs during the
open and after open has already checked whether a failover is pending. If
that happens, we fail the open, which can cause the boot scripts to leave
the interface down requiring administrator to manually bring up the device.
This fix "extends" the failover pending window till we are _actually_
ready to perform the failover reset (i.e until after we get the RTNL
lock). Since open() holds the RTNL lock, we can be sure that we either
finish the open or if the open() fails due to the failover pending window,
we can again pretend that open is done and let the failover complete it.
We could try and block the open until failover is completed but a) that
could still timeout the application and b) Existing code "pretends" that
failover occurred "just after" open succeeded, so marks the open successful
and lets the failover complete the open. So, mark the open successful even
if the transport event occurs before we actually start the open.
Fixes: 5a18e1e0c1 ("ibmvnic: Fix failover case for non-redundant configuration")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Acked-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20201030170711.1562994-1-sukadev@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use new dev_err_probe() API to handle deferred probe properly and simplify
the code.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds final multi-port support to TI AM65x CPSW driver path in
preparation for adding support for multi-port devices, like Main CPSW0 on
K3 J721E SoC or future CPSW3g on K3 AM64x SoC.
- the separate netdev is created for every enabled external Port;
- DMA channels are common/shared for all external Ports and the RX/TX NAPI
and DMA processing assigned to first available netdev;
- external Ports are configured in mac-only mode, which is similar to TI
"dual-mac" mode for legacy TI CPSW - packets are sent to the Host port only
in ingress and directly to the Port on egress. No packet switching between
external ports happens.
- every port supports the same features as current AM65x CPSW on external
device.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds multi-port support to TI AM65x CPSW driver xmit/rx path in
preparation for adding support for multi-port devices, like Main CPSW0 on
K3 J721E SoC or future CPSW3g on K3 AM64x SoC.
Hence DMA channels are common/shared for all ext Ports and the RX/TX NAPI
and DMA processing going to be assigned to first available netdev this patch:
- ensures all RX descriptors fields are initialized;
- adds synchronization for TX DMA push/pop operation (locking) as
Networking core locks are not enough any more;
- updates TX bql processing for every packet in
am65_cpsw_nuss_tx_compl_packets() as every completed TX skb can have
different ndev assigned (come from different netdevs).
To avoid performance issues for existing one-port CPSW2g devices the above
changes are done only for multi-port devices by splitting xmit path for
one-port and multi-port devices.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current implementation uses .ndo_set_features() callback to track
NETIF_F_HW_CSUM feature changes and update generic
CPSW_P0_CONTROL_REG.RX_CHECKSUM_EN option accordingly. It's not going to
work in case of multi-port devices as TX csum offload can be changed per
netdev.
On K3 CPSWxG devices TX csum offload enabled in the following way:
- the CPSW_P0_CONTROL_REG.RX_CHECKSUM_EN option enables TX csum offload in
generic and affects all TX DMA channels and packets;
- corresponding fields in TX DMA descriptor have to be filed properly when
upper layer wants to offload TX csum (skb->ip_summed == CHECKSUM_PARTIAL)
and it's per-packet option.
The Linux Network core is expected to never request TX csum offload if
netdev NETIF_F_HW_CSUM feature is disabled, and, as result, TX DMA
descriptors should not be modified, and per-packet TX csum offload will be
disabled (or enabled) on per-netdev basis. Which, in turn, makes it safe to
enable the CPSW_P0_CONTROL_REG.RX_CHECKSUM_EN option unconditionally.
Hence, fix TX csum offload for multi-port devices by:
- enabling the CPSW_P0_CONTROL_REG.RX_CHECKSUM_EN option in
am65_cpsw_nuss_common_open() unconditionally
- and removing .ndo_set_features() callback implementation, which was used
only NETIF_F_HW_CSUM feature update purposes
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some K3 CPSW NUSS instances can lose context after PM runtime ON->OFF->ON
transition depending on integration (including all submodules: CPTS, MDIO,
etc), like J721E Main CPSW (CPSW9G).
In case CPTS is enabled it's initialized during probe and does not expect
to be reset. Hence, keep K3 CPSW active by forbidding PM runtime if CPTS is
enabled.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The VLAN offload for AM65x CPSW2G is implemented using existing ALE APIs,
which are also used by legacy CPSW drivers.
So, now it always adds current Ext. Port and Host as VLAN members when VLAN
is added by 8021Q core (.ndo_vlan_rx_add_vid) and forcibly removes VLAN
from ALE table in .ndo_vlan_rx_kill_vid(). This works as for AM65x CPSW2G
(which has only one Ext. Port) as for legacy CPSW devices (which can't
support same VLAN on more then one Port in multi mac (dual-mac) mode). But
it doesn't work for the new J721E and AM64x multi port CPSWxG versions
doesn't have such restrictions and allow to offload the same VLAN on any
number of ports.
Now the attempt to add same VLAN on two (or more) K3 CPSWxG Ports will
cause:
- VLAN members mask overwrite when VLAN is added
- VLAN removal from ALE table when any Port removes VLAN
This patch fixes an issue by:
- switching to use cpsw_ale_vlan_add_modify() instead of
cpsw_ale_add_vlan() when VLAN is added to ALE table, so VLAN members
mask will not be overwritten;
- Updates cpsw_ale_del_vlan() as:
if more than one ext. Port is in VLAN member mask
then remove only current port from VLAN member mask
else remove VLAN ALE entry
Example:
add: P1 | P0 (Host) -> members mask: P1 | P0
add: P2 | P0 -> members mask: P2 | P1 | P0
rem: P1 | P0 -> members mask: P2 | P0
rem: P2 | P0 -> members mask: -
The VLAN is forcibly removed if port_mask=0 passed to cpsw_ale_del_vlan()
to preserve existing legacy CPSW drivers functionality.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add/export cpsw_ale_vlan_del_modify() and use it in cpsw_switchdev instead
of generic cpsw_ale_del_vlan() to avoid mixing 8021Q and switchdev VLAN
offload. This is preparation patch equired by follow up changes.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation of adding more multi-port K3 CPSW versions move free
descriptor queue mode selection in am65_cpsw_pdata, so it can be selected
basing on DT compatibility property.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation of adding more multi-port K3 CPSW versions move ALE
selection in am65_cpsw_pdata, so it can be selected basing on DT
compatibility property.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Improve the build testing of this davicom driver by enabling it when
COMPILE_TEST is selected.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drivers/net/ethernet/davicom//dm9000.c: In function ‘dm9000_dumpblk_8bit’:
drivers/net/ethernet/davicom//dm9000.c:235:6: warning: variable ‘tmp’ set but not used [-Wunused-but-set-variable]
The driver needs to read packet data from the device even when the
packet is known bad. There is no need to assign the data to a variable
during this discard operation.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When compiled for platforms other than __i386__ or __x86_64__:
drivers/net/ethernet/dec/tulip/tulip_core.c: In function ‘tulip_init_one’:
drivers/net/ethernet/dec/tulip/tulip_core.c:1296:13: warning: variable ‘last_irq’ set but not used [-Wunused-but-set-variable]
1296 | static int last_irq;
Add more #if defined() to totally remove the code when not needed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201031005445.1060112-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
phy-handle can't be handled well for ast2400/2500 which has an embedded
MDIO controller. Add ftgmac100_mdio_setup for ast2400/2500 and initialize
PHYs from mdio child node with of_mdiobus_register.
Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Split MDIO registration and PHY connect into ftgmac100_setup_mdio and
ftgmac100_mii_probe.
Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The TI CPTS does not natively support PTPv1, only PTPv2. But, as it
happens, the CPTS can provide HW timestamp for PTPv1 Sync messages, because
CPTS HW parser looks for PTP messageType id in PTP message octet 0 which
value is 0 for PTPv1. As result, CPTS HW can detect Sync messages for PTPv1
and PTPv2 (Sync messageType = 0 for both), but it fails for any other PTPv1
messages (Delay_req/resp) and will return PTP messageType id 0 for them.
The commit e9523a5a32 ("net: ethernet: ti: cpsw: enable
HWTSTAMP_FILTER_PTP_V1_L4_EVENT filter") added PTPv1 hw timestamping
advertisement by mistake, only to make Linux Kernel "timestamping" utility
work, and this causes issues with only PTPv1 compatible HW/SW - Sync HW
timestamped, but Delay_req/resp are not.
Hence, fix it disabling PTPv1 hw timestamping advertisement, so only PTPv1
compatible HW/SW can properly roll back to SW timestamping.
Fixes: e9523a5a32 ("net: ethernet: ti: cpsw: enable HWTSTAMP_FILTER_PTP_V1_L4_EVENT filter")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20201029190910.30789-1-grygorii.strashko@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The headroom reserved for received frames needs to be aligned to an
RX specific value. There is currently a discrepancy between the values
used in the Ethernet driver and the values passed to the FMan.
Coincidentally, the resulting aligned values are identical.
Fixes: 3c68b8fffb ("dpaa_eth: FMan erratum A050385 workaround")
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Impose a larger RX private data area only when the A050385 erratum is
present on the hardware. A smaller buffer size is sufficient in all
other scenarios. This enables a wider range of linear Jumbo frame
sizes in non-erratum scenarios, instead of turning to multi
buffer Scatter/Gather frames. The maximum linear frame size is
increased by 128 bytes for non-erratum arm64 platforms.
Cleanup the hardware annotations header defines in the process.
Fixes: 3c68b8fffb ("dpaa_eth: FMan erratum A050385 workaround")
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In my test setup, I had a SAMA5D27 device configured with ip forwarding, and
second device with usb ethernet (r8152) sending ICMP packets. If the packet
was larger than about 220 bytes, the SAMA5 device would "oops" with the
following trace:
kernel BUG at net/core/skbuff.c:1863!
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in: xt_MASQUERADE ppp_async ppp_generic slhc iptable_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 can_raw can bridge stp llc ipt_REJECT nf_reject_ipv4 sd_mod cdc_ether usbnet usb_storage r8152 scsi_mod mii o
ption usb_wwan usbserial micrel macb at91_sama5d2_adc phylink gpio_sama5d2_piobu m_can_platform m_can industrialio_triggered_buffer kfifo_buf of_mdio can_dev fixed_phy sdhci_of_at91 sdhci_pltfm libphy sdhci mmc_core ohci_at91 ehci_atmel o
hci_hcd iio_rescale industrialio sch_fq_codel spidev prox2_hal(O)
CPU: 0 PID: 0 Comm: swapper Tainted: G O 5.9.1-prox2+ #1
Hardware name: Atmel SAMA5
PC is at skb_put+0x3c/0x50
LR is at macb_start_xmit+0x134/0xad0 [macb]
pc : [<c05258cc>] lr : [<bf0ea5b8>] psr: 20070113
sp : c0d01a60 ip : c07232c0 fp : c4250000
r10: c0d03cc8 r9 : 00000000 r8 : c0d038c0
r7 : 00000000 r6 : 00000008 r5 : c59b66c0 r4 : 0000002a
r3 : 8f659eff r2 : c59e9eea r1 : 00000001 r0 : c59b66c0
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c53c7d Table: 2640c059 DAC: 00000051
Process swapper (pid: 0, stack limit = 0x75002d81)
<snipped stack>
[<c05258cc>] (skb_put) from [<bf0ea5b8>] (macb_start_xmit+0x134/0xad0 [macb])
[<bf0ea5b8>] (macb_start_xmit [macb]) from [<c053e504>] (dev_hard_start_xmit+0x90/0x11c)
[<c053e504>] (dev_hard_start_xmit) from [<c0571180>] (sch_direct_xmit+0x124/0x260)
[<c0571180>] (sch_direct_xmit) from [<c053eae4>] (__dev_queue_xmit+0x4b0/0x6d0)
[<c053eae4>] (__dev_queue_xmit) from [<c05a5650>] (ip_finish_output2+0x350/0x580)
[<c05a5650>] (ip_finish_output2) from [<c05a7e24>] (ip_output+0xb4/0x13c)
[<c05a7e24>] (ip_output) from [<c05a39d0>] (ip_forward+0x474/0x500)
[<c05a39d0>] (ip_forward) from [<c05a13d8>] (ip_sublist_rcv_finish+0x3c/0x50)
[<c05a13d8>] (ip_sublist_rcv_finish) from [<c05a19b8>] (ip_sublist_rcv+0x11c/0x188)
[<c05a19b8>] (ip_sublist_rcv) from [<c05a2494>] (ip_list_rcv+0xf8/0x124)
[<c05a2494>] (ip_list_rcv) from [<c05403c4>] (__netif_receive_skb_list_core+0x1a0/0x20c)
[<c05403c4>] (__netif_receive_skb_list_core) from [<c05405c4>] (netif_receive_skb_list_internal+0x194/0x230)
[<c05405c4>] (netif_receive_skb_list_internal) from [<c0540684>] (gro_normal_list.part.0+0x14/0x28)
[<c0540684>] (gro_normal_list.part.0) from [<c0541280>] (napi_complete_done+0x16c/0x210)
[<c0541280>] (napi_complete_done) from [<bf14c1c0>] (r8152_poll+0x684/0x708 [r8152])
[<bf14c1c0>] (r8152_poll [r8152]) from [<c0541424>] (net_rx_action+0x100/0x328)
[<c0541424>] (net_rx_action) from [<c01012ec>] (__do_softirq+0xec/0x274)
[<c01012ec>] (__do_softirq) from [<c012d6d4>] (irq_exit+0xcc/0xd0)
[<c012d6d4>] (irq_exit) from [<c0160960>] (__handle_domain_irq+0x58/0xa4)
[<c0160960>] (__handle_domain_irq) from [<c0100b0c>] (__irq_svc+0x6c/0x90)
Exception stack(0xc0d01ef0 to 0xc0d01f38)
1ee0: 00000000 0000003d 0c31f383 c0d0fa00
1f00: c0d2eb80 00000000 c0d2e630 4dad8c49 4da967b0 0000003d 0000003d 00000000
1f20: fffffff5 c0d01f40 c04e0f88 c04e0f8c 30070013 ffffffff
[<c0100b0c>] (__irq_svc) from [<c04e0f8c>] (cpuidle_enter_state+0x7c/0x378)
[<c04e0f8c>] (cpuidle_enter_state) from [<c04e12c4>] (cpuidle_enter+0x28/0x38)
[<c04e12c4>] (cpuidle_enter) from [<c014f710>] (do_idle+0x194/0x214)
[<c014f710>] (do_idle) from [<c014fa50>] (cpu_startup_entry+0xc/0x14)
[<c014fa50>] (cpu_startup_entry) from [<c0a00dc8>] (start_kernel+0x46c/0x4a0)
Code: e580c054 8a000002 e1a00002 e8bd8070 (e7f001f2)
---[ end trace 146c8a334115490c ]---
The solution was to force nonlinear buffers to be cloned. This was previously
reported by Klaus Doth (https://www.spinics.net/lists/netdev/msg556937.html)
but never formally submitted as a patch.
This is the third revision, hopefully the formatting is correct this time!
Suggested-by: Klaus Doth <krnl@doth.eu>
Fixes: 653e92a917 ("net: macb: add support for padding and fcs computation")
Signed-off-by: Mark Deneen <mdeneen@saucontech.com>
Link: https://lore.kernel.org/r/20201030155814.622831-1-mdeneen@saucontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We can safely runtime-suspend the chip if rtl_open() fails. Therefore
switch the error path to use pm_runtime_put_sync() as well.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/aa093b1e-f295-5700-1cb7-954b54dd8f17@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tp->dirty_tx isn't changed outside rtl_tx(). Therefore I see no need
to guarantee a specific order of reading tp->dirty_tx and tp->cur_tx.
Having said that we can remove the memory barrier.
In addition use READ_ONCE() when reading tp->cur_tx because it can
change in parallel to rtl_tx().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/2264563a-fa9e-11b0-2c42-31bc6b8e2790@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds support for 10GBASE-R interface to the linux driver for
Cadence's ethernet controller.
This controller has separate MAC's and PCS'es for low and high speed paths.
High speed PCS supports 100M, 1G, 2.5G, 5G and 10G through rate adaptation
implementation. However, since it doesn't support auto negotiation, linux
driver is modified to support 10GBASE-R instead of USXGMII.
Signed-off-by: Parshuram Thombare <pthombar@cadence.com>
Link: https://lore.kernel.org/r/1603975627-18338-1-git-send-email-pthombar@cadence.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hi Linus,
Please, pull the following patches that replace zero-length arrays with
flexible-array members.
Thanks
--
Gustavo
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAl+cjRUACgkQRwW0y0cG
2zGWAhAAjUfTsAmXWhKNaWFSCYR0Q822puTUWOKfiBd+jjGaO04luTtr2gjv2Dkb
Vgad8H4N8oZU79xfh5JZ5PUyScaso8wE6ZJTh2PLKXpKmNd213f5x/pIt78CCDTa
Y1L/eR41mmveTL3VNS3sf6WaZpT9owxJKGIY8JgdiOmSjxJQpX5zdaC1KYso4eXr
lIXIRo9VLEmVLhhHhZi+QmX6+aQ05E1D9K0ENe4/uEnRsV525W78iwZ4fYeLzr+A
krEOdgx6sPgzajPYnHoayrrcKNKxD5YY1SWuVSm2tqYYIhlRoK3f5xgLOd10RiHE
YMgx8aWzGmGJwoUhgp1bo/l9EZ7O8OWRqM/GOP4x6Wgjdhqw2x5jgskmhsKNGEXu
/BlbS+qL5aUrMCxhvNbApuZW6xBiBbva76MH3vU9vFhZbVz1CHLQdGI0tfxggYWS
jc2UPgoxL9OQlf3jSc+gK7RMFhBGNWn2Aiy8GQas3BxPYXuYPvwOj+irDOG/qZ9D
VZ5swUw4+th+DsF5K53mEFeLv0fONMgL9Ka5bNR6+k6HG0WNLYYVOiet3xYUDo1f
eZbMZthfc+QW7R8cwG0WuFk6rC6mLqE+A9nQuLZoJD+VMuJd4pwW9+6EW8nDX08w
FS4/o92xUFJfOCgaLRS61FSAuSmFENieN+yoKMK/Uf6PJVdNMb4=
=vyu3
-----END PGP SIGNATURE-----
Merge tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull more flexible-array member conversions from Gustavo A. R. Silva:
"Replace zero-length arrays with flexible-array members"
* tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
printk: ringbuffer: Replace zero-length array with flexible-array member
net/smc: Replace zero-length array with flexible-array member
net/mlx5: Replace zero-length array with flexible-array member
mei: hw: Replace zero-length array with flexible-array member
gve: Replace zero-length array with flexible-array member
Bluetooth: btintel: Replace zero-length array with flexible-array member
scsi: target: tcmu: Replace zero-length array with flexible-array member
ima: Replace zero-length array with flexible-array member
enetc: Replace zero-length array with flexible-array member
fs: Replace zero-length array with flexible-array member
Bluetooth: Replace zero-length array with flexible-array member
params: Replace zero-length array with flexible-array member
tracepoint: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_proto: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_commands: Replace zero-length array with flexible-array member
mailbox: zynqmp-ipi-message: Replace zero-length array with flexible-array member
dmaengine: ti-cppi5: Replace zero-length array with flexible-array member
Unlike earlier silicon variants, OcteonTx2 98xx
silicon has 2 NIX blocks and each of the CGX is
mapped to either of the NIX blocks. Each NIX
block supports 100G. Mapping btw NIX blocks and
CGX is done by firmware based on CGX speed config
to have a maximum possible network bandwidth.
Since the mapping is not fixed, it's difficult
for a user to figure out. Hence added a debugfs
entry which displays mapping between CGX LMAC,
NIX block and RVU PF.
Sample result of this entry ::
~# cat /sys/kernel/debug/octeontx2/rvu_pf_cgx_map
PCI dev RVU PF Func NIX block CGX LMAC
0002:02:00.0 0x400 NIX0 CGX0 LMAC0
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If NIX1 block is also implemented then add a new
directory for NIX1 in debugfs root. Stats of
NIX1 block can be read/writen from/to the files
in directory "/sys/kernel/debug/octeontx2/nix1/".
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
CGX links are followed by LBK links but number of
CGX and LBK links varies between platforms. Hence
get the number of links present in hardware from
AF and use it to calculate LBK link number.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch puts together all mailbox changes
for 98xx silicon:
Attach ->
Modify resource attach mailbox handler to
request LFs from a block address out of multiple
blocks of same type. If a PF/VF need LFs from two
blocks of same type then attach mbox should be
called twice.
Example:
struct rsrc_attach *attach;
.. Allocate memory for message ..
attach->cptlfs = 3; /* 3 LFs from CPT0 */
.. Send message ..
.. Allocate memory for message ..
attach->modify = 1;
attach->cpt_blkaddr = BLKADDR_CPT1;
attach->cptlfs = 2; /* 2 LFs from CPT1 */
.. Send message ..
Detach ->
Update detach mailbox and its handler to detach
resources from CPT1 and NIX1 blocks.
MSIX ->
Updated the MSIX mailbox and its handler to return
MSIX offsets for the new block CPT1.
Free resources ->
Update free_rsrc mailbox and its handler to return
the free resources count of new blocks NIX1 and CPT1
Links ->
Number of CGX,LBK and SDP links may vary between
platforms. For example, in 98xx number of CGX and LBK
links are more than 96xx. Hence the info about number
of links present in hardware is useful for consumers to
request link configuration properly. This patch sends
this info in nix_lf_alloc_rsp.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On 98xx silicon, NPC block has additional
mcam entries, counters and NIX1 interfaces.
Extended set of registers are present for the
new mcam entries and counters.
This patch does the following:
- updates the register accessing macros
to use extended set if present.
- configures the MKEX profile for NIX1 interfaces also.
- updates mcam entry write functions to use assigned
NIX0/1 interfaces for the PF/VF.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Initialize MCE context for the assigned NIX0/1
block for a CGX mapped PF. Modified rvu_nix_aq_enq_inst
function to work with nix_hw so that MCE contexts
for both NIX blocks can be inited.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Firmware configures NIX block mapping for all CGXs
to achieve maximum throughput. This patch reads
the configuration and create mapping between RVU
PF and NIX blocks. And for LBK VFs assign NIX0 for
even numbered VFs and NIX1 for odd numbered VFs.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch modifies NIX functions to operate
with nix_hw context so that existing functions
can be used for both NIX0 and NIX1 blocks. And
the NIX blocks present in the system are initialized
during driver init and freed during exit.
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
AF manages the tasks of allocating, freeing
LFs from RVU blocks to PF and VFs. With new
NIX1 and CPT1 blocks in 98xx, this patch
adds support for handling new blocks too.
Co-developed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since multiple blocks of same type are present in
98xx, modify functions which get resource count and
which update resource count to work with individual
block address instead of block type.
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update the axienet driver to properly support the Xilinx PCS/PMA PHY
component which is used for 1000BaseX and SGMII modes, including
properly configuring the auto-negotiation mode of the PHY and reading
the negotiated state from the PHY.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Link: https://lore.kernel.org/r/20201028171429.1699922-1-robert.hancock@calian.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After switching to the net core rx/tx byte/packet counters we can
remove the now unused private version.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Switch to the net core rx/tx byte/packet counter infrastructure.
This simplifies the code, only small drawback is some memory overhead
because we use just one queue, but allocate the counters per cpu.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver uses in_irq() to determine if the tlan_priv::lock has to be
acquired in tlan_mii_read_reg() and tlan_mii_write_reg().
The interrupt handler acquires the lock outside of these functions so the
in_irq() check is meant to prevent a lock recursion deadlock. But this
check is incorrect when interrupt force threading is enabled because then
the handler runs in thread context and in_irq() correctly returns false.
The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
seperated or the context be conveyed in an argument passed by the caller,
which usually knows the context.
tlan_set_timer() has this conditional as well, but this function is only
invoked from task context or the timer callback itself. So it always has to
lock and the check can be removed.
tlan_mii_read_reg(), tlan_mii_write_reg() and tlan_phy_print() are invoked
from interrupt and other contexts.
Split out the actual function body into helper variants which are called
from interrupt context and make the original functions wrappers which
acquire tlan_priv::lock unconditionally.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Samuel Chessman <chessman@tux.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nv_update_stats() triggers a WARN_ON() when invoked from hard interrupt
context because the locks in use are not hard interrupt safe. It also has
an assert_spin_locked() which was the lock check before the lockdep era.
Lockdep has way broader locking correctness checks and covers both issues,
so replace the warning and the lock assert with lockdep_assert_held().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Rain River <rain.1986.08.12@gmail.com>
Cc: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
wait_for_cmd_complete() uses in_interrupt() to detect whether it is safe to
sleep or not.
The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be seperated or the context be conveyed in an argument passed by the
caller, which usually knows the context.
in_interrupt() also is only partially correct because it fails to chose the
correct code path when just preemption or interrupts are disabled.
Add an argument 'may_block' to both functions and adjust the callers to
pass the context information.
The following call chains which end up invoking wait_for_cmd_complete()
were analyzed to be safe to sleep:
s2io_card_up()
s2io_set_multicast()
init_nic()
init_tti()
s2io_close()
do_s2io_delete_unicast_mc()
do_s2io_add_mac()
s2io_set_mac_addr()
do_s2io_prog_unicast()
do_s2io_add_mac()
s2io_reset()
do_s2io_restore_unicast_mc()
do_s2io_add_mc()
do_s2io_add_mac()
s2io_open()
do_s2io_prog_unicast()
do_s2io_add_mac()
The following call chains which end up invoking wait_for_cmd_complete()
were analyzed to be safe to sleep:
__dev_set_rx_mode()
s2io_set_multicast()
s2io_txpic_intr_handle()
s2io_link()
init_tti()
Add a may_sleep argument to wait_for_cmd_complete(), s2io_set_multicast()
and init_tti() and hand the context information in from the call sites.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is one main difference in mscc_ocelot between IP multicast and L2
multicast. With IP multicast, destination ports are encoded into the
upper bytes of the multicast MAC address. Example: to deliver the
address 01:00:5E:11:22:33 to ports 3, 8, and 9, one would need to
program the address of 00:03:08:11:22:33 into hardware. Whereas for L2
multicast, the MAC table entry points to a Port Group ID (PGID), and
that PGID contains the port mask that the packet will be forwarded to.
As to why it is this way, no clue. My guess is that not all port
combinations can be supported simultaneously with the limited number of
PGIDs, and this was somehow an issue for IP multicast but not for L2
multicast. Anyway.
Prior to this change, the raw L2 multicast code was bogus, due to the
fact that there wasn't really any way to test it using the bridge code.
There were 2 issues:
- A multicast PGID was allocated for each MDB entry, but it wasn't in
fact programmed to hardware. It was dummy.
- In fact we don't want to reserve a multicast PGID for every single MDB
entry. That would be odd because we can only have ~60 PGIDs, but
thousands of MDB entries. So instead, we want to reserve a multicast
PGID for every single port combination for multicast traffic. And
since we can have 2 (or more) MDB entries delivered to the same port
group (and therefore PGID), we need to reference-count the PGIDs.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This saves a re-classification of the MDB address on deletion.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It is Not Needed, a comment will suffice.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since a helper is available for copying Ethernet addresses, let's use it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ocelot.h says:
/* MAC table entry types.
* ENTRYTYPE_NORMAL is subject to aging.
* ENTRYTYPE_LOCKED is not subject to aging.
* ENTRYTYPE_MACv4 is not subject to aging. For IPv4 multicast.
* ENTRYTYPE_MACv6 is not subject to aging. For IPv6 multicast.
*/
We don't want the permanent entries added with 'bridge mdb' to be
subject to aging.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
AIUI, the NETIF_F_TSO_MANGLEID flag is a signal to the stack that a
driver may _need_ to mangle IDs in order to do TSO, and conversely
a signal from the stack that the driver is permitted to do so.
Since we support both fixed and incrementing IPIDs, we should rely
on the SKB_GSO_FIXEDID flag on a per-skb basis, rather than using
the MANGLEID feature to make all TSOs fixed-id.
Includes other minor cleanups of ef100_make_tso_desc() coding style.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The NIC only needs to know where the headers it has to edit (TCP and
inner and outer IPv4) are, which fits GSO_PARTIAL nicely.
It also supports non-PARTIAL offload of UDP tunnels, again just
needing to be told the outer transport offset so that it can edit
the UDP length field.
(It's not clear to me whether the stack will ever use the non-PARTIAL
version with the netdev feature flags we're setting here.)
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need EFX_POPULATE_OWORD_17 for an encap TSO descriptor on EF100.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver does not implement a shutdown handler which leads to issues
when using kexec in certain scenarios. The NIC keeps on fetching
descriptors which gets flagged by the IOMMU with errors like this:
DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000
DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000
DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000
DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000
DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000
Signed-off-by: Moritz Fischer <mdf@kernel.org>
Link: https://lore.kernel.org/r/20201028172125.496942-1-mdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The
older style of one-element or zero-length arrays should no longer be
used[2].
Refactor the code according to the use of a flexible-array member in
struct gve_stats_report, instead of a zero-length array, and use the
struct_size() helper to calculate the size for the resource allocation.
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
This driver was only used on the EBSA110 platform, which is now
getting removed, so the driver is no longer needed either.
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This patch enables the HW LPI Timer which controls the automatic entry
and exit of the LPI state.
The EEE LPI timer value is configured through ethtool. The driver will
auto select the LPI HW timer if the value in the HW timer supported range.
Else, the driver will fallback to SW timer.
Signed-off-by: Vineetha G. Jaya Kumaran <vineetha.g.jaya.kumaran@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Link: https://lore.kernel.org/r/20201027160051.22898-1-weifeng.voon@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The commit "stmmac: intel: Adding ref clock 1us tic for LPI cntr"
introduced a regression which leads to the kernel panic duing loading
of the dwmac_intel module.
Move the code block after pci resources is obtained.
Fixes: b4c5f83ae3 ("stmmac: intel: Adding ref clock 1us tic for LPI cntr")
Cc: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Link: https://lore.kernel.org/r/20201029093228.1741-1-vee.khee.wong@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When PTP timestamping is enabled on Tx, the controller
inserts the Tx timestamp at the beginning of the frame
buffer, between SFD and the L2 frame header. This means
that the skb provided by the stack is required to have
enough headroom otherwise a new skb needs to be created
by the driver to accommodate the timestamp inserted by h/w.
Up until now the driver was relying on the second option,
using skb_realloc_headroom() to create a new skb to accommodate
PTP frames. Turns out that this method is not reliable, as
reallocation of skbs for PTP frames along with the required
overhead (skb_set_owner_w, consume_skb) is causing random
crashes in subsequent skb_*() calls, when multiple concurrent
TCP streams are run at the same time on the same device
(as seen in James' report).
Note that these crashes don't occur with a single TCP stream,
nor with multiple concurrent UDP streams, but only when multiple
TCP streams are run concurrently with the PTP packet flow
(doing skb reallocation).
This patch enforces the first method, by requesting enough
headroom from the stack to accommodate PTP frames, and so avoiding
skb_realloc_headroom() & co, and the crashes no longer occur.
There's no reason not to set needed_headroom to a large enough
value to accommodate PTP frames, so in this regard this patch
is a fix.
Reported-by: James Jurack <james.jurack@ametek.com>
Fixes: bee9e58c9e ("gianfar:don't add FCB length to hard_header_len")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201020173605.1173-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When PTP timestamping is enabled on Tx, the controller
inserts the Tx timestamp at the beginning of the frame
buffer, between SFD and the L2 frame header. This means
that the skb provided by the stack is required to have
enough headroom otherwise a new skb needs to be created
by the driver to accommodate the timestamp inserted by h/w.
Up until now the driver was relying on skb_realloc_headroom()
to create new skbs to accommodate PTP frames. Turns out that
this method is not reliable in this context at least, as
skb_realloc_headroom() for PTP frames can cause random crashes,
mostly in subsequent skb_*() calls, when multiple concurrent
TCP streams are run at the same time with the PTP flow
on the same device (as seen in James' report). I also noticed
that when the system is loaded by sending multiple TCP streams,
the driver receives cloned skbs in large numbers.
skb_cow_head() instead proves to be stable in this scenario,
and not only handles cloned skbs too but it's also more efficient
and widely used in other drivers.
The commit introducing skb_realloc_headroom in the driver
goes back to 2009, commit 93c1285c5d
("gianfar: reallocate skb when headroom is not enough for fcb").
For practical purposes I'm referencing a newer commit (from 2012)
that brings the code to its current structure (and fixes the PTP
case).
Fixes: 9c4886e5e6 ("gianfar: Fix invalid TX frames returned on error queue when time stamping")
Reported-by: James Jurack <james.jurack@ametek.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201029081057.8506-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some (apparently older) versions of the FEC hardware block do not like
the MMFR register being cleared to avoid generation of MII events at
initialization time. The action of clearing this register results in no
future MII events being generated at all on the problem block. This means
the probing of the MDIO bus will find no PHYs.
Create a quirk that can be checked at the FECs MII init time so that
the right thing is done. The quirk is set as appropriate for the FEC
hardware blocks that are known to need this.
Fixes: f166f890c8 ("net: ethernet: fec: Replace interrupt driven MDIO with polled IO")
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Acked-by: Fugang Duan <fugand.duan@nxp.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Link: https://lore.kernel.org/r/20201028052232.1315167-1-gerg@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current release regressions:
- r8169: fix forced threading conflicting with other shared
interrupts; we tried to fix the use of raise_softirq_irqoff
from an IRQ handler on RT by forcing hard irqs, but this
driver shares legacy PCI IRQs so drop the _irqoff() instead
- tipc: fix memory leak caused by a recent syzbot report fix
to tipc_buf_append()
Current release - bugs in new features:
- devlink: Unlock on error in dumpit() and fix some error codes
- net/smc: fix null pointer dereference in smc_listen_decline()
Previous release - regressions:
- tcp: Prevent low rmem stalls with SO_RCVLOWAT.
- net: protect tcf_block_unbind with block lock
- ibmveth: Fix use of ibmveth in a bridge; the self-imposed filtering
to only send legal frames to the hypervisor was too strict
- net: hns3: Clear the CMDQ registers before unmapping BAR region;
incorrect cleanup order was leading to a crash
- bnxt_en - handful of fixes to fixes:
- Send HWRM_FUNC_RESET fw command unconditionally, even
if there are PCIe errors being reported
- Check abort error state in bnxt_open_nic().
- Invoke cancel_delayed_work_sync() for PFs also.
- Fix regression in workqueue cleanup logic in bnxt_remove_one().
- mlxsw: Only advertise link modes supported by both driver
and device, after removal of 56G support from the driver
56G was not cleared from advertised modes
- net/smc: fix suppressed return code
Previous release - always broken:
- netem: fix zero division in tabledist, caused by integer overflow
- bnxt_en: Re-write PCI BARs after PCI fatal error.
- cxgb4: set up filter action after rewrites
- net: ipa: command payloads already mapped
Misc:
- s390/ism: fix incorrect system EID, it's okay to change since
it was added in current release
- vsock: use ns_capable_noaudit() on socket create to suppress
false positive audit messages
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+bGTcACgkQMUZtbf5S
IrtMvxAAldlA7x22atOHJ2HMTqUGK3rlIQYgxlWJbfDnA7Ui4rZTDa/K0VkuS4ey
rfaBf37XLDmzZkHgYvXG1qV2kB0MrXQqF7jJn+BNlAuM1kIsURt85Y2FxVu/+x6X
wWtBgg/D77VXpeMimGcp8wBg5xFlUDdTezo+tInSuY9ahi1dUQx3ZSBTgqz3a5Vn
wUwD7U0wkBEHkZFeLE6u0tdN9wY8IHH6cbMfzfnPxxIv6VVUOcQcvbomc+reEPhH
vxeCHg7tK3yxbe9cPEbuwVDpoapB8Y627rv08Njhfuxx6Yysp/OOvUNRIBeD/7Gi
TiZc6RMQ9XZ9QoGueaxFVSFIGRpRIQiO/gh+O5lWVX8dGsIjlKnw2E8gWmSS48YP
cMAez0Fe+CJ2S2QNFbGVyJJX6xOl5h6kQaf88OiEhudpEUgyz156MNVwbJnE4fYk
8GONCIea1hNjLQ1VUfcQEYdxChWVeAoUEZIFcK2YKA+1w9Ris6hV21j/aUxYXQRt
RGOALFUtCRIEX28ZW8eEyXgp1EdUvp7qcIK5YZEF6YHWlRxQ8LkU6qhD7Mm2oqkE
fydoMDz9TEBaWqFtpgQmZH76JYqd7btCsR2YPwnlKmcKQ3tEKtW0NKt1QH/DKcvm
nmDA6A+52XSbar1sRlVPnr3IGfodqGQ3A35sVFS8jkcmMvDRlbk=
=reLi
-----END PGP SIGNATURE-----
Merge tag 'net-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Current release regressions:
- r8169: fix forced threading conflicting with other shared
interrupts; we tried to fix the use of raise_softirq_irqoff from an
IRQ handler on RT by forcing hard irqs, but this driver shares
legacy PCI IRQs so drop the _irqoff() instead
- tipc: fix memory leak caused by a recent syzbot report fix to
tipc_buf_append()
Current release - bugs in new features:
- devlink: Unlock on error in dumpit() and fix some error codes
- net/smc: fix null pointer dereference in smc_listen_decline()
Previous release - regressions:
- tcp: Prevent low rmem stalls with SO_RCVLOWAT.
- net: protect tcf_block_unbind with block lock
- ibmveth: Fix use of ibmveth in a bridge; the self-imposed filtering
to only send legal frames to the hypervisor was too strict
- net: hns3: Clear the CMDQ registers before unmapping BAR region;
incorrect cleanup order was leading to a crash
- bnxt_en - handful of fixes to fixes:
- Send HWRM_FUNC_RESET fw command unconditionally, even if there
are PCIe errors being reported
- Check abort error state in bnxt_open_nic().
- Invoke cancel_delayed_work_sync() for PFs also.
- Fix regression in workqueue cleanup logic in bnxt_remove_one().
- mlxsw: Only advertise link modes supported by both driver and
device, after removal of 56G support from the driver 56G was not
cleared from advertised modes
- net/smc: fix suppressed return code
Previous release - always broken:
- netem: fix zero division in tabledist, caused by integer overflow
- bnxt_en: Re-write PCI BARs after PCI fatal error.
- cxgb4: set up filter action after rewrites
- net: ipa: command payloads already mapped
Misc:
- s390/ism: fix incorrect system EID, it's okay to change since it
was added in current release
- vsock: use ns_capable_noaudit() on socket create to suppress false
positive audit messages"
* tag 'net-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
r8169: fix issue with forced threading in combination with shared interrupts
netem: fix zero division in tabledist
ibmvnic: fix ibmvnic_set_mac
mptcp: add missing memory scheduling in the rx path
tipc: fix memory leak caused by tipc_buf_append()
gtp: fix an use-before-init in gtp_newlink()
net: protect tcf_block_unbind with block lock
ibmveth: Fix use of ibmveth in a bridge.
net/sched: act_mpls: Add softdep on mpls_gso.ko
ravb: Fix bit fields checking in ravb_hwtstamp_get()
devlink: Unlock on error in dumpit()
devlink: Fix some error codes
chelsio/chtls: fix memory leaks in CPL handlers
chelsio/chtls: fix deadlock issue
net: hns3: Clear the CMDQ registers before unmapping BAR region
bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally.
bnxt_en: Check abort error state in bnxt_open_nic().
bnxt_en: Re-write PCI BARs after PCI fatal error.
bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.
bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one().
...
As reported by Serge flag IRQF_NO_THREAD causes an error if the
interrupt is actually shared and the other driver(s) don't have this
flag set. This situation can occur if a PCI(e) legacy interrupt is
used in combination with forced threading.
There's no good way to deal with this properly, therefore we have to
remove flag IRQF_NO_THREAD. For fixing the original forced threading
issue switch to napi_schedule().
Fixes: 424a646e07 ("r8169: fix operation under forced interrupt threading")
Link: https://www.spinics.net/lists/netdev/msg694960.html
Reported-by: Serge Belyshev <belyshev@depni.sinp.msu.ru>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Serge Belyshev <belyshev@depni.sinp.msu.ru>
Link: https://lore.kernel.org/r/b5b53bfe-35ac-3768-85bf-74d1290cf394@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski brought up a concern in ibmvnic_set_mac().
ibmvnic_set_mac() does this:
ether_addr_copy(adapter->mac_addr, addr->sa_data);
if (adapter->state != VNIC_PROBED)
rc = __ibmvnic_set_mac(netdev, addr->sa_data);
So if state == VNIC_PROBED, the user can assign an invalid address to
adapter->mac_addr, and ibmvnic_set_mac() will still return 0.
The fix is to validate ethernet address at the beginning of
ibmvnic_set_mac(), and move the ether_addr_copy to
the case of "adapter->state != VNIC_PROBED".
Fixes: c26eba03e4 ("ibmvnic: Update reset infrastructure to support tunable parameters")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20201027220456.71450-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The check for src mac address in ibmveth_is_packet_unsupported is wrong.
Commit 6f2275433a wanted to shut down messages for loopback packets,
but now suppresses bridged frames, which are accepted by the hypervisor
otherwise bridging won't work at all.
Fixes: 6f2275433a ("ibmveth: Detect unsupported packets before sending to the hypervisor")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Link: https://lore.kernel.org/r/20201026104221.26570-1-msuchanek@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the function ravb_hwtstamp_get() in ravb_main.c with the existing
values for RAVB_RXTSTAMP_TYPE_V2_L2_EVENT (0x2) and RAVB_RXTSTAMP_TYPE_ALL
(0x6)
if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_V2_L2_EVENT)
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT;
else if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_ALL)
config.rx_filter = HWTSTAMP_FILTER_ALL;
if the test on RAVB_RXTSTAMP_TYPE_ALL should be true,
it will never be reached.
This issue can be verified with 'hwtstamp_config' testing program
(tools/testing/selftests/net/hwtstamp_config.c). Setting filter type
to ALL and subsequent retrieving it gives incorrect value:
$ hwtstamp_config eth0 OFF ALL
flags = 0
tx_type = OFF
rx_filter = ALL
$ hwtstamp_config eth0
flags = 0
tx_type = OFF
rx_filter = PTP_V2_L2_EVENT
Correct this by converting if-else's to switch.
Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Link: https://lore.kernel.org/r/20201026102130.29368-1-andrew_gabbasov@mentor.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
CPL handler functions chtls_pass_open_rpl() and
chtls_close_listsrv_rpl() should return CPL_RET_BUF_DONE
so that caller function will do skb free to avoid leak.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201025194228.31271-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In chtls_pass_establish() we hold child socket lock using bh_lock_sock
and we are again trying bh_lock_sock in add_to_reap_list, causing deadlock.
Remove bh_lock_sock in add_to_reap_list() as lock is already held.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201025193538.31112-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the AER or firmware reset flow, if we are in fatal error state or
if pci_channel_offline() is true, we don't send any commands to the
firmware because the commands will likely not reach the firmware and
most commands don't matter much because the firmware is likely to be
reset imminently.
However, the HWRM_FUNC_RESET command is different and we should always
attempt to send it. In the AER flow for example, the .slot_reset()
call will trigger this fw command and we need to try to send it to
effect the proper reset.
Fixes: b340dc680e ("bnxt_en: Avoid sending firmware messages when AER error is detected.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a PCIe fatal error occurs, the internal latched BAR addresses
in the chip get reset even though the BAR register values in config
space are retained.
pci_restore_state() will not rewrite the BAR addresses if the
BAR address values are valid, causing the chip's internal BAR addresses
to stay invalid. So we need to zero the BAR registers during PCIe fatal
error to force pci_restore_state() to restore the BAR addresses. These
write cycles to the BAR registers will cause the proper BAR addresses to
latch internally.
Fixes: 6316ea6db9 ("bnxt_en: Enable AER support.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As part of the commit b148bb238c
("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."),
cancel_delayed_work_sync() is called only for VFs to fix a possible
crash by cancelling any pending delayed work items. It was assumed
by mistake that the flush_workqueue() call on the PF would flush
delayed work items as well.
As flush_workqueue() does not cancel the delayed workqueue, extend
the fix for PFs. This fix will avoid the system crash, if there are
any pending delayed work items in fw_reset_task() during driver's
.remove() call.
Unify the workqueue cleanup logic for both PF and VF by calling
cancel_work_sync() and cancel_delayed_work_sync() directly in
bnxt_remove_one().
Fixes: b148bb238c ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A recent patch has moved the workqueue cleanup logic before
calling unregister_netdev() in bnxt_remove_one(). This caused a
regression because the workqueue can be restarted if the device is
still open. Workqueue cleanup must be done after unregister_netdev().
The workqueue will not restart itself after the device is closed.
Call bnxt_cancel_sp_work() after unregister_netdev() and
call bnxt_dl_fw_reporters_destroy() after that. This fixes the
regession and the original NULL ptr dereference issue.
Fixes: b16939b59c ("bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Each EMAD transaction stores the skb used to issue the EMAD request
('trans->tx_skb') so that the request could be retried in case of a
timeout. The skb can be freed when a corresponding response is received
or as part of the retry logic (e.g., failed retransmit, exceeded maximum
number of retries).
The two tasks (i.e., response processing and retransmits) are
synchronized by the atomic 'trans->active' field which ensures that
responses to inactive transactions are ignored.
In case of a failed retransmit the transaction is finished and all of
its resources are freed. However, the current code does not mark it as
inactive. Syzkaller was able to hit a race condition in which a
concurrent response is processed while the transaction's resources are
being freed, resulting in a use-after-free [1].
Fix the issue by making sure to mark the transaction as inactive after a
failed retransmit and free its resources only if a concurrent task did
not already do that.
[1]
BUG: KASAN: use-after-free in consume_skb+0x30/0x370
net/core/skbuff.c:833
Read of size 4 at addr ffff88804f570494 by task syz-executor.0/1004
CPU: 0 PID: 1004 Comm: syz-executor.0 Not tainted 5.8.0-rc7+ #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xf6/0x16e lib/dump_stack.c:118
print_address_description.constprop.0+0x1c/0x250
mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
check_memory_region_inline mm/kasan/generic.c:186 [inline]
check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192
instrument_atomic_read include/linux/instrumented.h:56 [inline]
atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
refcount_read include/linux/refcount.h:147 [inline]
skb_unref include/linux/skbuff.h:1044 [inline]
consume_skb+0x30/0x370 net/core/skbuff.c:833
mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
mlxsw_emad_process_response drivers/net/ethernet/mellanox/mlxsw/core.c:651 [inline]
mlxsw_emad_rx_listener_func+0x5c9/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:672
mlxsw_core_skb_receive+0x4df/0x770 drivers/net/ethernet/mellanox/mlxsw/core.c:2063
mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline]
mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651
tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550
__do_softirq+0x223/0x964 kernel/softirq.c:292
asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711
Allocated by task 1006:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:494 [inline]
__kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
slab_post_alloc_hook mm/slab.h:586 [inline]
slab_alloc_node mm/slub.c:2824 [inline]
slab_alloc mm/slub.c:2832 [inline]
kmem_cache_alloc+0xcd/0x2e0 mm/slub.c:2837
__build_skb+0x21/0x60 net/core/skbuff.c:311
__netdev_alloc_skb+0x1e2/0x360 net/core/skbuff.c:464
netdev_alloc_skb include/linux/skbuff.h:2810 [inline]
mlxsw_emad_alloc drivers/net/ethernet/mellanox/mlxsw/core.c:756 [inline]
mlxsw_emad_reg_access drivers/net/ethernet/mellanox/mlxsw/core.c:787 [inline]
mlxsw_core_reg_access_emad+0x1ab/0x1420 drivers/net/ethernet/mellanox/mlxsw/core.c:1817
mlxsw_reg_trans_query+0x39/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:1831
mlxsw_sp_sb_pm_occ_clear drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:260 [inline]
mlxsw_sp_sb_occ_max_clear+0xbff/0x10a0 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:1365
mlxsw_devlink_sb_occ_max_clear+0x76/0xb0 drivers/net/ethernet/mellanox/mlxsw/core.c:1037
devlink_nl_cmd_sb_occ_max_clear_doit+0x1ec/0x280 net/core/devlink.c:1765
genl_family_rcv_msg_doit net/netlink/genetlink.c:669 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:714 [inline]
genl_rcv_msg+0x617/0x980 net/netlink/genetlink.c:731
netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2470
genl_rcv+0x24/0x40 net/netlink/genetlink.c:742
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg+0x150/0x190 net/socket.c:671
____sys_sendmsg+0x6d8/0x840 net/socket.c:2359
___sys_sendmsg+0xff/0x170 net/socket.c:2413
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2446
do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 73:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
kasan_set_free_info mm/kasan/common.c:316 [inline]
__kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455
slab_free_hook mm/slub.c:1474 [inline]
slab_free_freelist_hook mm/slub.c:1507 [inline]
slab_free mm/slub.c:3072 [inline]
kmem_cache_free+0xbe/0x380 mm/slub.c:3088
kfree_skbmem net/core/skbuff.c:622 [inline]
kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:616
__kfree_skb net/core/skbuff.c:679 [inline]
consume_skb net/core/skbuff.c:837 [inline]
consume_skb+0xe1/0x370 net/core/skbuff.c:831
mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
mlxsw_emad_transmit_retry.isra.0+0x9d/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:613
mlxsw_emad_trans_timeout_work+0x43/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:625
process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
kthread+0x355/0x470 kernel/kthread.c:291
ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293
The buggy address belongs to the object at ffff88804f5703c0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 212 bytes inside of
224-byte region [ffff88804f5703c0, ffff88804f5704a0)
The buggy address belongs to the page:
page:ffffea00013d5c00 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0
flags: 0x100000000000200(slab)
raw: 0100000000000200 dead000000000100 dead000000000122 ffff88806c625400
raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88804f570380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff88804f570400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88804f570480: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88804f570500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88804f570580: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
Fixes: caf7297e7a ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During port creation the driver instructs the device to advertise all
the supported link modes queried from the device.
Since cited commit not all the link modes supported by the device are
supported by the driver. This can result in the device negotiating a
link mode that is not recognized by the driver causing ethtool to show
an unsupported speed:
$ ethtool swp1
...
Speed: Unknown!
This is especially problematic when the netdev is enslaved to a bond, as
the bond driver uses unknown speed as an indication that the link is
down:
[13048.900895] net_ratelimit: 86 callbacks suppressed
[13048.900902] t_bond0: (slave swp52): failed to get link speed/duplex
[13048.912160] t_bond0: (slave swp49): failed to get link speed/duplex
Fix this by making sure that only link modes that are supported by both
the device and the driver are advertised.
Fixes: b97cd89126 ("mlxsw: Remove 56G speed support")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current code sets up the filter action field before
rewrites are set up. When the action 'switch' is used
with rewrites, this may result in initial few packets
that get switched out don't have rewrites applied
on them.
So, make sure filter action is set up along with rewrites
or only after everything else is set up for rewrites.
Fixes: 12b276fbf6 ("cxgb4: add support to create hash filters")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Link: https://lore.kernel.org/r/20201023115852.18262-1-rajur@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Smatch complains that "ret" might be uninitialized if we don't enter
the loop. We do always enter the loop so it's a false positive, but
it's cleaner to just return a literal zero and that silences the
warning as well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20201023112212.GA282278@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a mlx5 core devlink instance is reloaded in different net namespace,
its associated IB device is deleted and recreated.
Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo
mlx5 IB device needs to attach and detach the netdevice to it through the
netdev notifier chain during load and unload sequence. A below call graph
of the unload flow.
cleanup_net()
down_read(&pernet_ops_rwsem); <- first sem acquired
ops_pre_exit_list()
pre_exit()
devlink_pernet_pre_exit()
devlink_reload()
mlx5_devlink_reload_down()
mlx5_unload_one()
[...]
mlx5_ib_remove()
mlx5_ib_unbind_slave_port()
mlx5_remove_netdev_notifier()
unregister_netdevice_notifier()
down_write(&pernet_ops_rwsem);<- recurrsive lock
Hence, when net namespace is deleted, mlx5 reload results in deadlock.
When deadlock occurs, devlink mutex is also held. This not only deadlocks
the mlx5 device under reload, but all the processes which attempt to
access unrelated devlink devices are deadlocked.
Hence, fix this by mlx5 ib driver to register for per net netdev notifier
instead of global one, which operats on the net namespace without holding
the pernet_ops_rwsem.
Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Clang warns about the extra parentheses in this comparison:
drivers/net/ethernet/freescale/ucc_geth.c:1361:28:
warning: equality comparison with extraneous parentheses
if ((ugeth->phy_interface == PHY_INTERFACE_MODE_SGMII))
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
It seems clear the intent here is to do a comparison not an
assignment, so drop the extra parentheses to avoid any confusion.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023033236.3296988-1-mpe@ellerman.id.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The sentinel descriptor entry was getting missed in the
traverse of the ring from head to tail, so change to a
loop of 0 to the end.
Fixes: f1d2e894f1 ("ionic: use index not pointer for queue tracking")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kmemleak pointed out to us that ionic_rx_flush() is sending
skbs into napi_gro_XXX with a disabled napi context, and these
end up getting lost and leaked. We can safely remove the flush.
Fixes: 0f3154e6bc ("ionic: Add Tx and Rx handling")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The sparse complaints around the static_asserts were obscuring
more useful complaints. So, don't check the static_asserts,
and fix the remaining sparse complaints.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
chtls_pt_recvmsg() receives a skb with tls header and subsequent
skb with data, need to finalize the data copy whenever next skb
with tls header is available. but here current tls header is
overwritten by next available tls header, ends up corrupting
user buffer data. fixing it by finalizing current record whenever
next skb contains tls header.
v1->v2:
- Improved commit message.
Fixes: 17a7d24aa8 ("crypto: chtls - generic handling of data and hdr")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201022190556.21308-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-tree/merge window issues:
- rtl8150: don't incorrectly assign random MAC addresses; fix late
in the 5.9 cycle started depending on a return code from
a function which changed with the 5.10 PR from the usb subsystem
Current release - regressions:
- Revert "virtio-net: ethtool configurable RXCSUM", it was causing
crashes at probe when control vq was not negotiated/available
Previous releases - regressions:
- ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
bus, only first device would be probed correctly
- nexthop: Fix performance regression in nexthop deletion by
effectively switching from recently added synchronize_rcu()
to synchronize_rcu_expedited()
- netsec: ignore 'phy-mode' device property on ACPI systems;
the property is not populated correctly by the firmware,
but firmware configures the PHY so just keep boot settings
Previous releases - always broken:
- tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
bulk transfers getting "stuck"
- icmp: randomize the global rate limiter to prevent attackers from
getting useful signal
- r8169: fix operation under forced interrupt threading, make the
driver always use hard irqs, even on RT, given the handler is
light and only wants to schedule napi (and do so through
a _irqoff() variant, preferably)
- bpf: Enforce pointer id generation for all may-be-null register
type to avoid pointers erroneously getting marked as null-checked
- tipc: re-configure queue limit for broadcast link
- net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
tunnels
- fix various issues in chelsio inline tls driver
Misc:
- bpf: improve just-added bpf_redirect_neigh() helper api to support
supplying nexthop by the caller - in case BPF program has already
done a lookup we can avoid doing another one
- remove unnecessary break statements
- make MCTCP not select IPV6, but rather depend on it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+R+5UACgkQMUZtbf5S
Irt9KxAAiYme2aSvMOni0NQsOgQ5mVsy7tk0/4dyRqkAx0ggrfGcFuhgZYNm8ZKY
KoQsQyn30Wb/2wAp1vX2I4Fod67rFyBfQg/8iWiEAu47X7Bj1lpPPJexSPKhF9/X
e0TuGxZtoaDuV9C3Su/FOjRmnShGSFQu1SCyJThshwaGsFL3YQ0Ut07VRgRF8x05
A5fy2SVVIw0JOQgV1oH0GP5oEK3c50oGnaXt8emm56PxVIfAYY0oq69hQUzrfMFP
zV9R0XbnbCIibT8R3lEghjtXavtQTzK5rYDKazTeOyDU87M+yuykNYj7MhgDwl9Q
UdJkH2OpMlJylEH3asUjz/+ObMhXfOuj/ZS3INtO5omBJx7x76egDZPMQe4wlpcC
NT5EZMS7kBdQL8xXDob7hXsvFpuEErSUGruYTHp4H52A9ke1dRTH2kQszcKk87V3
s+aVVPtJ5bHzF3oGEvfwP0DFLTF6WvjD0Ts0LmTY2DhpE//tFWV37j60Ni5XU21X
fCPooihQbLOsq9D8zc0ydEvCg2LLWMXM5ovCkqfIAJzbGVYhnxJSryZwpOlKDS0y
LiUmLcTZDoNR/szx0aJhVHdUUVgXDX/GsllHoc1w7ZvDRMJn40K+xnaF3dSMwtIl
imhfc5pPi6fdBgjB0cFYRPfhwiwlPMQ4YFsOq9JvynJzmt6P5FQ=
=ceke
-----END PGP SIGNATURE-----
Merge tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Cross-tree/merge window issues:
- rtl8150: don't incorrectly assign random MAC addresses; fix late in
the 5.9 cycle started depending on a return code from a function
which changed with the 5.10 PR from the usb subsystem
Current release regressions:
- Revert "virtio-net: ethtool configurable RXCSUM", it was causing
crashes at probe when control vq was not negotiated/available
Previous release regressions:
- ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
bus, only first device would be probed correctly
- nexthop: Fix performance regression in nexthop deletion by
effectively switching from recently added synchronize_rcu() to
synchronize_rcu_expedited()
- netsec: ignore 'phy-mode' device property on ACPI systems; the
property is not populated correctly by the firmware, but firmware
configures the PHY so just keep boot settings
Previous releases - always broken:
- tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
bulk transfers getting "stuck"
- icmp: randomize the global rate limiter to prevent attackers from
getting useful signal
- r8169: fix operation under forced interrupt threading, make the
driver always use hard irqs, even on RT, given the handler is light
and only wants to schedule napi (and do so through a _irqoff()
variant, preferably)
- bpf: Enforce pointer id generation for all may-be-null register
type to avoid pointers erroneously getting marked as null-checked
- tipc: re-configure queue limit for broadcast link
- net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
tunnels
- fix various issues in chelsio inline tls driver
Misc:
- bpf: improve just-added bpf_redirect_neigh() helper api to support
supplying nexthop by the caller - in case BPF program has already
done a lookup we can avoid doing another one
- remove unnecessary break statements
- make MCTCP not select IPV6, but rather depend on it"
* tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
tcp: fix to update snd_wl1 in bulk receiver fast path
net: Properly typecast int values to set sk_max_pacing_rate
netfilter: nf_fwd_netdev: clear timestamp in forwarding path
ibmvnic: save changed mac address to adapter->mac_addr
selftests: mptcp: depends on built-in IPv6
Revert "virtio-net: ethtool configurable RXCSUM"
rtnetlink: fix data overflow in rtnl_calcit()
net: ethernet: mtk-star-emac: select REGMAP_MMIO
net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
mptcp: depends on IPV6 but not as a module
sfc: move initialisation of efx->filter_sem to efx_init_struct()
mpls: load mpls_gso after mpls_iptunnel
net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()
net: dsa: bcm_sf2: make const array static, makes object smaller
mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
...
After mac address change request completes successfully, the new mac
address need to be saved to adapter->mac_addr as well as
netdev->dev_addr. Otherwise, adapter->mac_addr still holds old
data.
Fixes: 62740e9788 ("net/ibmvnic: Update MAC address settings after adapter reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20201020223919.46106-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver depends on mmio regmap API but doesn't select the appropriate
Kconfig option. This fixes it.
Fixes: 8c7bd5a454 ("net: ethernet: mtk-star-emac: new driver")
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Link: https://lore.kernel.org/r/20201020073515.22769-1-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
efx_probe_filters() has not been called yet when EF100 calls into
efx_mcdi_filter_table_probe(), for which it wants to take the
filter_sem.
Fixes: a9dc3d5612 ("sfc_ef100: RX filter table management and related gubbins")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Link: https://lore.kernel.org/r/24fad43e-887d-051e-25e3-506f23f63abf@solarflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix build errors when TLS=m, TLS_TOE=y, and CRYPTO_DEV_CHELSIO_TLS=y.
Having (tristate) CRYPTO_DEV_CHELSIO_TLS depend on (bool) TLS_TOE
is not strong enough to prevent the bad combination of TLS=m and
CRYPTO_DEV_CHELSIO_TLS=y, so add a dependency on TLS to prevent the
problematic kconfig combination.
Fixes these build errors:
hppa-linux-ld: drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.o: in function `chtls_free_uld':
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c:165: undefined reference to `tls_toe_unregister_device'
hppa-linux-ld: drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.o: in function `chtls_register_dev':
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c:204: undefined reference to `tls_toe_register_device'
Fixes: 53b4414a70 ("net/tls: allow compiling TLS TOE out")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20201019181059.22634-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When chtls_sock *csk is freed, same memory can be allocated
to different csk in chtls_sock_create().
csk->cdev = NULL; statement might ends up modifying wrong
csk, eventually causing kernel panic.
removing (csk->cdev = NULL) statement as it is not required.
Fixes: 3a0a978389 ("crypto/chtls: Fix chtls crash in connection cleanup")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add the logic to compare net_device returned by ip_dev_find()
with the net_device list in cdev->ports[] array and return
net_device if matched else NULL.
Fixes: 6abde0b241 ("crypto/chtls: IPv6 support for inline TLS")
Signed-off-by: Venkatesh Ellapu <venkatesh.e@chelsio.com>
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Netdev is filled in egress_dev when connection is established,
If connection is closed before establishment, then egress_dev
is NULL, Fix it using ip_dev_find() rather then extracting from
egress_dev.
Fixes: 6abde0b241 ("crypto/chtls: IPv6 support for inline TLS")
Signed-off-by: Venkatesh Ellapu <venkatesh.e@chelsio.com>
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In chtls_sendpage() socket lock is released but not acquired,
fix it by taking lock.
Fixes: 36bedb3f2e ("crypto: chtls - Inline TLS record Tx")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e rx/tx
delay config"), the Realtek PHY driver will override any TX/RX delay
set by hardware straps if the phy-mode device property does not match.
This is causing problems on SynQuacer based platforms (the only SoC
that incorporates the netsec hardware), since many were built with
this Realtek PHY, and shipped with firmware that defines the phy-mode
as 'rgmii', even though the PHY is configured for TX and RX delay using
pull-ups.
From the driver's perspective, we should not make any assumptions in
the general case that the PHY hardware does not require any initial
configuration. However, the situation is slightly different for ACPI
boot, since it implies rich firmware with AML abstractions to handle
hardware details that are not exposed to the OS. So in the ACPI case,
it is reasonable to assume that the PHY comes up in the right mode,
regardless of whether the mode is set by straps, by boot time firmware
or by AML executed by the ACPI interpreter.
So let's ignore the 'phy-mode' device property when probing the netsec
driver in ACPI mode, and hardcode the mode to PHY_INTERFACE_MODE_NA,
which should work with any PHY provided that it is configured by the
time the driver attaches to it. While at it, document that omitting
the mode is permitted for DT probing as well, by setting the phy-mode
DT property to the empty string.
Fixes: 533dd11a12 ("net: socionext: Add Synquacer NetSec driver")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201018163625.2392-1-ardb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fixes gcc warning:
passing argument 1 of 'kfree' makes pointer from integer without a cast
Fixes: 3af5f0f5c7 ("net: korina: fix kfree of rx/tx descriptor array")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Link: https://lore.kernel.org/r/20201018184255.28989-1-vvidic@valentin-vidic.from.hr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For several network drivers it was reported that using
__napi_schedule_irqoff() is unsafe with forced threading. One way to
fix this is switching back to __napi_schedule, but then we lose the
benefit of the irqoff version in general. As stated by Eric it doesn't
make sense to make the minimal hard irq handlers in drivers using NAPI
a thread. Therefore ensure that the hard irq handler is never
thread-ified.
Fixes: 9a899a35b0 ("r8169: switch to napi_schedule_irqoff")
Link: https://lkml.org/lkml/2020/10/18/19
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/4d3ef84a-c812-5072-918a-22a6f6468310@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ian reports that after upgrade from v5.8.14 to v5.9 only one
of his 4 ixgbe netdevs appear in the system.
Quoting the comment on ixgbe_x550em_a_has_mii():
* Returns true if hw points to lowest numbered PCI B:D.F x550_em_a device in
* the SoC. There are up to 4 MACs sharing a single MDIO bus on the x550em_a,
* but we only want to register one MDIO bus.
This matches the symptoms, since the return value from
ixgbe_mii_bus_init() is no longer ignored we need to handle
the higher ports of x550em without an error.
Fixes: 09ef193fef ("net: ethernet: ixgbe: check the return value of ixgbe_mii_bus_init()")
Reported-by: Ian Kumlien <ian.kumlien@gmail.com>
Tested-by: Ian Kumlien <ian.kumlien@gmail.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20201016232006.3352947-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The typical set of driver updates across the subsystem:
- Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma, hns,
usnic, qib, qedr, cxgb4, hns, bnxt_re
- Various rtrs fixes and updates
- Bug fix for mlx4 CM emulation for virtualization scenarios where MRA
wasn't working right
- Use tracepoints instead of pr_debug in the CM code
- Scrub the locking in ucma and cma to close more syzkaller bugs
- Use tasklet_setup in the subsystem
- Revert the idea that 'destroy' operations are not allowed to fail at
the driver level. This proved unworkable from a HW perspective.
- Revise how the umem API works so drivers make fewer mistakes using it
- XRC support for qedr
- Convert uverbs objects RWQ and MW to new the allocation scheme
- Large queue entry sizes for hns
- Use hmm_range_fault() for mlx5 On Demand Paging
- uverbs APIs to inspect the GID table instead of sysfs
- Move some of the RDMA code for building large page SGLs into
lib/scatterlist
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl+J37MACgkQOG33FX4g
mxrKfRAAnIecwdE8df0yvVU5k0Eg6qVjMy9MMHq4va9m7g6GpUcNNI0nIlOASxH2
l+9vnUQS3ebgsPeECaDYzEr0hh/u53+xw2g4WV5ts/hE8KkQ6erruXb9kasCe8yi
5QWJ9K36T3c03Cd3EeH6JVtytAxuH42ombfo9BkFLPVyfG/R2tsAzvm5pVi73lxk
46wtU1Bqi4tsLhyCbifn1huNFGbHp08OIBPAIKPUKCA+iBRPaWS+Dpi+93h3g3Bp
oJwDhL9CBCGcHM+rKWLzek3Dy87FnQn7R1wmTpUFwkK+4AH3U/XazivhX035w1vL
YJyhakVU0kosHlX9hJTNKDHJGkt0YEV2mS8dxAuqilFBtdnrVszb5/MirvlzC310
/b5xCPSEusv9UVZV0G4zbySVNA9knZ4YaRiR3VDVMLKl/pJgTOwEiHIIx+vs3ejk
p8GRWa1SjXw5LfZEQcq39J689ljt6xjCTonyuBSv7vSQq5v8pjBxvHxiAe2FIa2a
ZyZeSCYoSh0SwJQukO2VO7aprhHP3TcCJ/987+X03LQ8tV2VWPktHqm62YCaDcOl
fgiQuQdPivRjDDkJgMfDWDGKfZeHoWLKl5XsJhWByt0lablVrsvc+8ylUl1UI7gI
16hWB/Qtlhfwg10VdApn+aOFpIS+s5P4XIp8ik57MZO+VeJzpmE=
=LKpl
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"A usual cycle for RDMA with a typical mix of driver and core subsystem
updates:
- Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
hns, usnic, qib, qedr, cxgb4, hns, bnxt_re
- Various rtrs fixes and updates
- Bug fix for mlx4 CM emulation for virtualization scenarios where
MRA wasn't working right
- Use tracepoints instead of pr_debug in the CM code
- Scrub the locking in ucma and cma to close more syzkaller bugs
- Use tasklet_setup in the subsystem
- Revert the idea that 'destroy' operations are not allowed to fail
at the driver level. This proved unworkable from a HW perspective.
- Revise how the umem API works so drivers make fewer mistakes using
it
- XRC support for qedr
- Convert uverbs objects RWQ and MW to new the allocation scheme
- Large queue entry sizes for hns
- Use hmm_range_fault() for mlx5 On Demand Paging
- uverbs APIs to inspect the GID table instead of sysfs
- Move some of the RDMA code for building large page SGLs into
lib/scatterlist"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
RDMA/ucma: Fix use after free in destroy id flow
RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
RDMA: Explicitly pass in the dma_device to ib_register_device
lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
IB/mlx4: Convert rej_tmout radix-tree to XArray
RDMA/rxe: Fix bug rejecting all multicast packets
RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
RDMA/rxe: Remove duplicate entries in struct rxe_mr
IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
IB/rdmavt: Fix sizeof mismatch
MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
RDMA/umem: Move to allocate SG table from pages
lib/scatterlist: Add support in dynamic allocation of SG table from pages
tools/testing/scatterlist: Show errors in human readable form
tools/testing/scatterlist: Rejuvenate bit-rotten test
RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
RDMA/uverbs: Expose the new GID query API to user space
...
The new HW arbitration feature on Aspeed ast2600 will cause MAC TX to
hang when handling scatter-gather DMA. Disable the problematic feature
by setting MAC register 0x58 bit28 and bit27.
Fixes: 39bfab8844 ("net: ftgmac100: Add support for DT phy-handle property")
Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
From Maor Gottlieb says:
====================
This series extends __sg_alloc_table_from_pages to allow chaining of new
pages to an already initialized SG table.
This allows for drivers to utilize the optimization of merging contiguous
pages without a need to pre allocate all the pages and hold them in a very
large temporary buffer prior to the call to SG table initialization.
The last patch changes the Infiniband core to use the new API. It removes
duplicate functionality from the code and benefits from the optimization
of allocating dynamic SG table from pages.
In huge pages system of 2MB page size, without this change, the SG table
would contain x512 SG entries.
====================
* branch 'dynamic_sg':
RDMA/umem: Move to allocate SG table from pages
lib/scatterlist: Add support in dynamic allocation of SG table from pages
tools/testing/scatterlist: Show errors in human readable form
tools/testing/scatterlist: Rejuvenate bit-rotten test
Add redirect_neigh() BPF packet redirect helper, allowing to limit stack
traversal in common container configs and improving TCP back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
Expand netlink policy support and improve policy export to user space.
(Ge)netlink core performs request validation according to declared
policies. Expand the expressiveness of those policies (min/max length
and bitmasks). Allow dumping policies for particular commands.
This is used for feature discovery by user space (instead of kernel
version parsing or trial and error).
Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge.
Allow more than 255 IPv4 multicast interfaces.
Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
In Multi-patch TCP (MPTCP) support concurrent transmission of data
on multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
Support SMC-Dv2 version of SMC, which enables multi-subnet deployments.
Allow more calls to same peer in RxRPC.
Support two new Controller Area Network (CAN) protocols -
CAN-FD and ISO 15765-2:2016.
Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
Add TC actions for implementing MPLS L2 VPNs.
Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary notifications
and make it easier to offload nexthops into HW by converting
to a blocking notifier.
Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific
TCP option use.
Reorganize TCP congestion control (CC) initialization to simplify life
of TCP CC implemented in BPF.
Add support for shipping BPF programs with the kernel and loading them
early on boot via the User Mode Driver mechanism, hence reusing all the
user space infra we have.
Support sleepable BPF programs, initially targeting LSM and tracing.
Add bpf_d_path() helper for returning full path for given 'struct path'.
Make bpf_tail_call compatible with bpf-to-bpf calls.
Allow BPF programs to call map_update_elem on sockmaps.
Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
Support listing and getting information about bpf_links via the bpf
syscall.
Enhance kernel interfaces around NIC firmware update. Allow specifying
overwrite mask to control if settings etc. are reset during update;
report expected max time operation may take to users; support firmware
activation without machine reboot incl. limits of how much impact
reset may have (e.g. dropping link or not).
Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
Adopt or extend devlink use for debug, monitoring, fw update
in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw,
mv88e6xxx, dpaa2-eth).
In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
Add XDP support for Intel's igb driver.
Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
Improve performance for packets which don't require much offloads
on recent Mellanox NICs by 20% by making multiple packets share
a descriptor entry.
Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto
subtree to drivers/net. Move MDIO drivers out of the phy directory.
Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+ItRwACgkQMUZtbf5S
IrtTMg//UxpdR/MirT1DatBU0K/UGAZY82hV7F/UC8tPgjfHZeHvWlDFxfi3YP81
PtPKbhRZ7DhwBXefUp6nY3UdvjftrJK2lJm8prJUPSsZRye8Wlcb7y65q7/P2y2U
Efucyopg6RUrmrM0DUsIGYGJgylQLHnMYUl/keCsD4t5Bp4ksyi9R2t5eitGoWzh
r3QGdbSa0AuWx4iu0i+tqp6Tj0ekMBMXLVb35dtU1t0joj2KTNEnSgABN3prOa8E
iWYf2erOau68Ogp3yU3miCy0ZU4p/7qGHTtzbcp677692P/ekak6+zmfHLT9/Pjy
2Stq2z6GoKuVxdktr91D9pA3jxG4LxSJmr0TImcGnXbvkMP3Ez3g9RrpV5fn8j6F
mZCH8TKZAoD5aJrAJAMkhZmLYE1pvDa7KolSk8WogXrbCnTEb5Nv8FHTS1Qnk3yl
wSKXuvutFVNLMEHCnWQLtODbTST9DI/aOi6EctPpuOA/ZyL1v3pl+gfp37S+LUTe
owMnT/7TdvKaTD0+gIyU53M6rAWTtr5YyRQorX9awIu/4Ha0F0gYD7BJZQUGtegp
HzKt59NiSrFdbSH7UdyemdBF4LuCgIhS7rgfeoUXMXmuPHq7eHXyHZt5dzPPa/xP
81P0MAvdpFVwg8ij2yp2sHS7sISIRKq17fd1tIewUabxQbjXqPc=
=bc1U
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).
- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.
- Allow more than 255 IPv4 multicast interfaces.
- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.
- Allow more calls to same peer in RxRPC.
- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.
- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
- Add TC actions for implementing MPLS L2 VPNs.
- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.
- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.
- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.
- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.
- Support sleepable BPF programs, initially targeting LSM and tracing.
- Add bpf_d_path() helper for returning full path for given 'struct
path'.
- Make bpf_tail_call compatible with bpf-to-bpf calls.
- Allow BPF programs to call map_update_elem on sockmaps.
- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
- Support listing and getting information about bpf_links via the bpf
syscall.
- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).
- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).
- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
- Add XDP support for Intel's igb driver.
- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.
- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.
- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...
- rework the non-coherent DMA allocator
- move private definitions out of <linux/dma-mapping.h>
- lower CMA_ALIGNMENT (Paul Cercueil)
- remove the omap1 dma address translation in favor of the common
code
- make dma-direct aware of multiple dma offset ranges (Jim Quinlan)
- support per-node DMA CMA areas (Barry Song)
- increase the default seg boundary limit (Nicolin Chen)
- misc fixes (Robin Murphy, Thomas Tai, Xu Wang)
- various cleanups
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl+IiPwLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYPKEQ//TM8vxjucnRl/pklpMin49dJorwiVvROLhQqLmdxw
286ZKpVzYYAPc7LnNqwIBugnFZiXuHu8xPKQkIiOa2OtNDTwhKNoBxOAmOJaV6DD
8JfEtZYeX5mKJ/Nqd2iSkIqOvCwZ9Wzii+aytJ2U88wezQr1fnyF4X49MegETEey
FHWreSaRWZKa0MMRu9AQ0QxmoNTHAQUNaPc0PeqEtPULybfkGOGw4/ghSB7WcKrA
gtKTuooNOSpVEHkTas2TMpcBp6lxtOjFqKzVN0ml+/nqq5NeTSDx91VOCX/6Cj76
mXIg+s7fbACTk/BmkkwAkd0QEw4fo4tyD6Bep/5QNhvEoAriTuSRbhvLdOwFz0EF
vhkF0Rer6umdhSK7nPd7SBqn8kAnP4vBbdmB68+nc3lmkqysLyE4VkgkdH/IYYQI
6TJ0oilXWFmU6DT5Rm4FBqCvfcEfU2dUIHJr5wZHqrF2kLzoZ+mpg42fADoG4GuI
D/oOsz7soeaRe3eYfWybC0omGR6YYPozZJ9lsfftcElmwSsFrmPsbO1DM5IBkj1B
gItmEbOB9ZK3RhIK55T/3u1UWY3Uc/RVr+kchWvADGrWnRQnW0kxYIqDgiOytLFi
JZNH8uHpJIwzoJAv6XXSPyEUBwXTG+zK37Ce769HGbUEaUrE71MxBbQAQsK8mDpg
7fM=
=Bkf/
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- rework the non-coherent DMA allocator
- move private definitions out of <linux/dma-mapping.h>
- lower CMA_ALIGNMENT (Paul Cercueil)
- remove the omap1 dma address translation in favor of the common code
- make dma-direct aware of multiple dma offset ranges (Jim Quinlan)
- support per-node DMA CMA areas (Barry Song)
- increase the default seg boundary limit (Nicolin Chen)
- misc fixes (Robin Murphy, Thomas Tai, Xu Wang)
- various cleanups
* tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits)
ARM/ixp4xx: add a missing include of dma-map-ops.h
dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling
dma-direct: factor out a dma_direct_alloc_from_pool helper
dma-direct check for highmem pages in dma_direct_alloc_pages
dma-mapping: merge <linux/dma-noncoherent.h> into <linux/dma-map-ops.h>
dma-mapping: move large parts of <linux/dma-direct.h> to kernel/dma
dma-mapping: move dma-debug.h to kernel/dma/
dma-mapping: remove <asm/dma-contiguous.h>
dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h>
dma-contiguous: remove dma_contiguous_set_default
dma-contiguous: remove dev_set_cma_area
dma-contiguous: remove dma_declare_contiguous
dma-mapping: split <linux/dma-mapping.h>
cma: decrease CMA_ALIGNMENT lower limit to 2
firewire-ohci: use dma_alloc_pages
dma-iommu: implement ->alloc_noncoherent
dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods
dma-mapping: add a new dma_alloc_pages API
dma-mapping: remove dma_cache_sync
53c700: convert to dma_alloc_noncoherent
...
Minor conflicts in net/mptcp/protocol.h and
tools/testing/selftests/net/Makefile.
In both cases code was added on both sides in the same place
so just keep both.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch changes the module name to "ch_ipsec" and prepends
"ch_ipsec" string instead of "chcr" in all debug messages and
function names.
V1->V2:
-Removed inline keyword from functions.
-Removed CH_IPSEC prefix from pr_debug.
-Used proper indentation for the continuation line of the function
arguments.
V2->V3:
Fix the checkpatch.pl warnings.
Fixes: 1b77be4639 ("crypto/chcr: Moving chelsio's inline ipsec functionality to /drivers/net")
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ingress large send packets are identified by either:
The IBMVETH_RXQ_LRG_PKT flag in the receive buffer
or with a -1 placed in the ip header checksum.
The method used depends on firmware version. Frame
geometry and sufficient header validation is performed by the
hypervisor eliminating the need for further header checks here.
Fixes: 7b5967389f ("ibmveth: set correct gso_size and gso_type")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Reviewed-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Reviewed-by: Cristobal Forno <cris.forno@ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ibmveth_rx_csum_helper() must be called after ibmveth_rx_mss_helper()
as ibmveth_rx_csum_helper() may alter ip and tcp checksum values.
Fixes: 66aa0678ef ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Reviewed-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Reviewed-by: Cristobal Forno <cris.forno@ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The 4-tuple NAT offload via PEDIT always overwrites all the 4-tuple
fields even if they had not been explicitly enabled. If any fields in
the 4-tuple are not enabled, then the hardware overwrites the
disabled fields with zeros, instead of ignoring them.
So, add a parser that can translate the enabled 4-tuple PEDIT fields
to one of the NAT mode combinations supported by the hardware and
hence avoid overwriting disabled fields to 0. Any rule with
unsupported NAT mode combination is rejected.
Signed-off-by: Herat Ramani <herat@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Here are some SPDX-specific changes for 5.10-rc1.
They include:
- driver fixes to make spdxcheck.pl work properly
- add GFDL licenses as "deprecated" but required due to some of
our documentation using them
- add Zlib license as "deprecated" but required because we have
code with this license in the tree.
- convert some drivers to have SPDX identifiers that previously
didn't have them.
All have been in linux-next for a very long time with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX4c6oA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yl35ACg2i+pP5CBExSzQUtA1Tx/UD2CVNMAoIAQChwj
SHZurDuyHkEiCdB+5n1u
=C9qR
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
"Here are some SPDX-specific changes for 5.10-rc1.
They include:
- driver fixes to make spdxcheck.pl work properly
- add GFDL licenses as "deprecated" but required due to some of our
documentation using them
- add Zlib license as "deprecated" but required because we have code
with this license in the tree.
- convert some drivers to have SPDX identifiers that previously
didn't have them.
All have been in linux-next for a very long time with no reported
issues"
* tag 'spdx-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
scripts/spdxcheck.py: handle license identifiers in XML comments
net/mlx5: IPsec: make spdxcheck.py happy
LICENSES/deprecated: add Zlib license text
LICENSE: add GFDL deprecated licenses
net/qla3xxx: Convert to SPDX license identifiers
net/qlge: Convert to SPDX license identifiers
net/qlcnic: Convert to SPDX license identifiers
scsi/qla2xxx: Convert to SPDX license identifiers
scsi/qla4xxx: Convert to SPDX license identifiers
The e1000_clear_vfta function was triggering a warning in kbuild-bot
testing. It's actually a bug but has no functional impact.
drivers/net/ethernet/intel/e1000/e1000_hw.c:4415:58: warning: Same expression in both branches of ternary operator. [duplicateExpressionTernary]
Fix this warning by removing the offending code and simplifying
the routine to do exactly what it did before, no functional
change.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Starting with API version 1.10 firmware for X722 devices has ability
to change FEC settings in PHY. Code added in this patch allows
changing FEC settings if the capability flag indicates the device
supports this feature.
Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A helper for checking whether a net_device belongs to mscc_ocelot
already existed and did not need to be rewritten. Use it.
Fixes: 319e4dd11a ("net: mscc: ocelot: introduce conversion helpers between port and netdev")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20201011092041.3535101-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The at91rm9200 variant used by a few chips including the MSC313 supports
two Tx descriptors (one frame being serialized and another one queued).
However the driver only implemented a single one, which adds a dead time
after each transfer to receive and process the interrupt and wake the
queue up, preventing from reaching line rate.
This patch implements a very basic 2-deep queue to address this limitation.
The tests run on a Breadbee board equipped with an MSC313E show that at
1 GHz, HTTP traffic on medium-sized objects (45kB) was limited to exactly
50 Mbps before this patch, and jumped to 76 Mbps with this patch. And tests
on a single TCP stream with an MTU of 576 jump from 10kpps to 15kpps. With
1500 byte packets it's now possible to reach line rate versus 75 Mbps
before.
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
Cc: Daniel Palmer <daniel@0x0f.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20201011090944.10607-4-w@1wt.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The RM9200 supports one frame being sent while another one is waiting in
queue. This avoids the dead time that follows the emission of a frame
and which prevents one from reaching line speed.
Right now the driver supports only a single skb, so we'll first replace
the rm9200-specific skb info with an array of two macb_tx_skb (already
used by other drivers). This patch only moves the skb_length to
txq[0].size and skb_physaddr to skb[0].mapping but doesn't perform any
other change. It already uses [desc] in order to minimize future changes.
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
Cc: Daniel Palmer <daniel@0x0f.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20201011090944.10607-3-w@1wt.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Transmit Buffer Register Empty replaces TXERR on RM9200 and signals the
sender may try to send again becase the last queued frame is no longer
in queue (being transmitted or already transmitted).
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
Cc: Daniel Palmer <daniel@0x0f.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20201011090944.10607-2-w@1wt.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull copy_and_csum cleanups from Al Viro:
"Saner calling conventions for csum_and_copy_..._user() and friends"
[ Removing 800+ lines of code and cleaning stuff up is good - Linus ]
* 'work.csum_and_copy' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ppc: propagate the calling conventions change down to csum_partial_copy_generic()
amd64: switch csum_partial_copy_generic() to new calling conventions
sparc64: propagate the calling convention changes down to __csum_partial_copy_...()
xtensa: propagate the calling conventions change down into csum_partial_copy_generic()
mips: propagate the calling convention change down into __csum_partial_copy_..._user()
mips: __csum_partial_copy_kernel() has no users left
mips: csum_and_copy_{to,from}_user() are never called under KERNEL_DS
sparc32: propagate the calling conventions change down to __csum_partial_copy_sparc_generic()
i386: propagate the calling conventions change down to csum_partial_copy_generic()
sh: propage the calling conventions change down to csum_partial_copy_generic()
m68k: get rid of zeroing destination on error in csum_and_copy_from_user()
arm: propagate the calling convention changes down to csum_partial_copy_from_user()
alpha: propagate the calling convention changes down to csum_partial_copy.c helpers
saner calling conventions for csum_and_copy_..._user()
csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum
csum_partial_copy_nocheck(): drop the last argument
unify generic instances of csum_partial_copy_nocheck()
icmp_push_reply(): reorder adding the checksum up
skb_copy_and_csum_bits(): don't bother with the last argument
Add new FTE in TX IPsec FT per IPsec state. It has the
same matching criteria as the RX steering rule.
The IPsec FT is created/destroyed when the first/last rule
is added/deleted respectively.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add new namespace that represents the NIC TX domain.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently the error exit path err_free kfree's attr. In the case where
flow and parse_attr failed to be allocated this return path will free
the uninitialized pointer attr, which is not correct. In the other
case where attr fails to allocate attr does not need to be freed. So
in both error exits via err_free attr should not be freed, so remove
it.
Addresses-Coverity: ("Uninitialized pointer read")
Fixes: ff7ea04ad5 ("net/mlx5e: Fix potential null pointer dereference")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This patch adds FW versions stored in the flash to devlink info_get
callback. Return the correct fw.psid running version using the
newly added bp->nvm_cfg_ver.
v2:
Ensure stored pkg_name string is NULL terminated when copied to
devlink.
Return directly from the last call to bnxt_dl_info_put().
If the FW call to get stored version fails for any reason, return
success immediately to devlink without the stored versions.
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-10-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new function bnxt_dl_info_put() to simplify the code, as there
are more stored firmware version fields to be added in the next patch.
Also, rename fw_ver variable name to ncsi_ver for better naming while
copying to devlink info_get cb.
v2:
Ensure active_pkg_name string is NULL terminated when copied to
devlink.
Return directly from the last call to bnxt_dl_info_put().
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-9-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new bnxt_hwrm_nvm_get_dev_info() to query firmware version
information via NVM_GET_DEV_INFO firmware command. Use it to
get the running version of the NVM configuration information.
This new function will also be used in subsequent patches to get the
stored firmware versions.
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-8-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the VF virtual link is set to always enabled, the speed may be
unknown when the physical link is down. The driver currently logs
the link speed as 4294967295 Mbps which is SPEED_UNKNOWN. Modify
the link up log message as "speed unknown" which makes more sense.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-7-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
event_data1 and event_data2 are used when processing most events.
Store these in local variables at the beginning of the function to
simplify many of the case statements.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-5-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, bp->msg_enable has default value of 0. It is more useful
to have the commonly used NETIF_MSG_DRV and NETIF_MSG_HW enabled by
default.
v2: Change the fall back bnxt_reset_task() inside bnxt_rx_ring_reset()
to silent mode. With older fw, we would take the fall back path and
it would be very noisy.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-4-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Online self tests are not disruptive and can be run in NPAR mode
and in multi-host NIC as well.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-3-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If NVRAM resources are locked, NVM writes are not permitted. In such
scenarios, firmware returns HWRM_ERR_CODE_RESOURCE_LOCKED error to
firmware commands.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-2-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The phy_reset_after_clk_enable() is always called with ndev->phydev,
however that pointer may be NULL even though the PHY device instance
already exists and is sufficient to perform the PHY reset.
This condition happens in fec_open(), where the clock must be enabled
first, then the PHY must be reset, and then the PHY IDs can be read
out of the PHY.
If the PHY still is not bound to the MAC, but there is OF PHY node
and a matching PHY device instance already, use the OF PHY node to
obtain the PHY device instance, and then use that PHY device instance
when triggering the PHY reset.
Fixes: 1b0a83ac04 ("net: fec: add phy_reset_after_clk_enable() support")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Christoph Niedermaier <cniedermaier@dh-electronics.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Richard Leitner <richard.leitner@skidata.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
netcons calls napi_poll with a budget of 0 to transmit packets.
Handle this by:
- skipping RX processing
- do not try to recycle TX packets to the RX cache
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
kmalloc returns KSEG0 addresses so convert back from KSEG1
in kfree. Also make sure array is freed when the driver is
unloaded from the kernel.
Fixes: ef11291bcd ("Add support the Korina (IDT RC32434) Ethernet MAC")
Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The VCAP_IS1_ACT_VID_REPLACE_ENA action, from the VCAP IS1 ingress TCAM,
changes the classified VLAN.
We are only exposing this ability for switch ports that are under VLAN
aware bridges. This is because in standalone ports mode and under a
bridge with vlan_filtering=0, the ocelot driver configures the switch to
operate as VLAN-unaware, so the classified VLAN is not derived from the
802.1Q header from the packet, but instead is always equal to the
port-based VLAN ID of the ingress port. We _can_ still change the
classified VLAN for packets when operating in this mode, but the end
result will most likely be a drop, since both the ingress and the egress
port need to be members of the modified VLAN. And even if we install the
new classified VLAN into the VLAN table of the switch, the result would
still not be as expected: we wouldn't see, on the output port, the
modified VLAN tag, but the original one, even though the classified VLAN
was indeed modified. This is because of how the hardware works: on
egress, what is pushed to the frame is a "port tag", which gives us the
following options:
- Tag all frames with port tag (derived from the classified VLAN)
- Tag all frames with port tag, except if the classified VLAN is 0 or
equal to the native VLAN of the egress port
- No port tag
Needless to say, in VLAN-unaware mode we are disabling the port tag.
Otherwise, the existing VLAN tag would be ignored, and a second VLAN
tag (the port tag), holding the classified VLAN, would be pushed
(instead of replacing the existing 802.1Q tag). This is definitely not
what the user wanted when installing a "vlan modify" action.
So it is simply not worth bothering with VLAN modify rules under other
configurations except when the ports are fully VLAN-aware.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is a methodical transition of the driver from phylib
to phylink, following the guidelines from sfp-phylink.rst.
The MAC register configurations based on interface mode
were moved from the probing path to the mac_config() hook.
MAC enable and disable commands (enabling Rx and Tx paths
at MAC level) were also extracted and assigned to their
corresponding phylink hooks.
As part of the migration to phylink, the serdes configuration
from the driver was offloaded to the PCS_LYNX module,
introduced in commit 0da4c3d393 ("net: phy: add Lynx PCS module"),
the PCS_LYNX module being a mandatory component required to
make the enetc driver work with phylink.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.cionei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Decouple internal mdio bus creation from serdes
configuration, as a prerequisite to offloading
serdes configuration to a different module.
Group together mdio bus creation routines, cleanup.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Decouple level MAC configuration based on phy interface type
from general port configuration.
Group together MAC and link configuration code.
Decouple external mdio bus creation from interface type
parsing. No longer return an (unhandled) error code when
phy_node not found, use phy_node to indicate whether the
port has a phy or not. No longer fall-through when serdes
configuration fails for the link modes that require
internal link configuration.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When packets are received on the error queue, this function under
net_ratelimit():
netif_err(priv, hw, net_dev, "Err FD status = 0x%08x\n");
does not get printed. Instead we only see:
[ 3658.845592] net_ratelimit: 244 callbacks suppressed
[ 3663.969535] net_ratelimit: 230 callbacks suppressed
[ 3669.085478] net_ratelimit: 228 callbacks suppressed
Enabling NETIF_MSG_HW fixes this issue, and we can see some information
about the frame descriptors of packets.
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Factor out handling the private packet/byte counters to new
functions rtl_get_priv_stats() and rtl_inc_priv_stats().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make use of the new struct_size() helper instead of the offsetof() idiom.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A subsequent addition of an IP4 or IP6 rule after other rules would
overwrite any existing TCAM entries of related L4 protocols(ex: tcp4 or
udp6). This was due to the mask including too many TCAM entries. Add new
packet type masks with bits properly excluded so rules are not overwritten.
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Tested-by: Brijesh Behera <brijeshx.behera@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pointers should be casted to unsigned long to avoid
-Wpointer-to-int-cast warnings:
drivers/net/ethernet/intel/ice/ice_flow.h:197:33: warning:
cast from pointer to integer of different size
drivers/net/ethernet/intel/ice/ice_flow.h:198:32: warning:
cast to pointer from integer of different size
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While debugging a recent failure to update the flash of an ice device,
I found it helpful to add additional logging which helped determine the
root cause of the problem being a timeout issue.
Add some extra dev_dbg() logging messages which can be enabled using the
dynamic debug facility, including one for ice_aq_wait_for_event that
will use jiffies to capture a rough estimate of how long we waited for
the completion of a firmware command.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Brijesh Behera <brijeshx.behera@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the devlink_port structure is stored within the ice_pf. This
made sense because we create a single devlink_port for each PF. This
setup does not mesh with the abstractions in the driver very well, and
led to a flow where we accidentally call devlink_port_unregister twice
during error cleanup.
In particular, if devlink_port_register or devlink_port_unregister are
called twice, this leads to a kernel panic. This appears to occur during
some possible flows while cleaning up from a failure during driver
probe.
If register_netdev fails, then we will call devlink_port_unregister in
ice_cfg_netdev as it cleans up. Later, we again call
devlink_port_unregister since we assume that we must cleanup the port
that is associated with the PF structure.
This occurs because we cleanup the devlink_port for the main PF even
though it was not allocated. We allocated the port within a per-VSI
function for managing the main netdev, but did not release the port when
cleaning up that VSI, the allocation and destruction are not aligned.
Instead of attempting to manage the devlink_port as part of the PF
structure, manage it as part of the PF VSI. Doing this has advantages,
as we can match the de-allocation of the devlink_port with the
unregister_netdev associated with the main PF VSI.
Moving the port to the VSI is preferable as it paves the way for
handling devlink ports allocated for other purposes such as SR-IOV VFs.
Since we're changing up how we allocate the devlink_port, also change
the indexing. Originally, we indexed the port using the PF id number.
This came from an old goal of sharing a devlink for each physical
function. Managing devlink instances across multiple function drivers is
not workable. Instead, lets set the port number to the logical port
number returned by firmware and set the index using the VSI index
(sometimes referred to as VSI handle).
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add "fw.app.bundle_id" to display the DDP Track ID of the active DDP
package. This id is similar to "fw.bundle_id" and is a unique identifier
for the DDP package that is loaded in the device. Each new DDP has
a unique Track ID generated for it, and the ID can be used to identify
and track the DDP package.
Add documentation for the new devlink info version.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ice_info_get_dsn always returns 0, so just make it void.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A new test in checkpatch detects repeated words; cleanup all pre-existing
occurrences of those now.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Co-developed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use %*phD format to print small buffer as hex string.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for devlink reload action fw_activate with reload limit
no_reset which does firmware live patching, updating the firmware image
without reset, no downtime and no configuration lose. The driver checks
if the firmware is capable of handling the pending firmware changes as a
live patch. If it is then it triggers firmware live patching flow.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Firmware live patch event notifies the driver that the firmware was just
updated using live patch. In such case the driver should not reload or
re-initiate entities, part to updating the firmware version and
re-initiate the firmware tracer which can be updated by live patch with
new strings database to help debugging an issue.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The enable_remote_dev_reset devlink param flags that the host admin
allows resets by other hosts. In case it is cleared mlx5 host PF driver
will send NACK on pci sync for firmware update reset request and the
command will fail.
By default enable_remote_dev_reset parameter is true, so pci sync for
firmware update reset is enabled.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for devlink reload action fw_activate. To activate firmware
image the mlx5 driver resets the firmware and reloads it from flash. If
a new image was stored on flash it will be loaded. Once this reload
command is executed the driver initiates fw sync reset flow, where the
firmware synchronizes all PFs on coming reset and driver reload.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If firmware sends sync_reset_abort to driver the driver should clear the
reset requested mode as reset is not expected any more.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On sync_reset_now event the driver does reload and PCI link toggle to
activate firmware upgrade reset. When the firmware sends this event it
syncs the event on all PFs, so all PFs will do PCI link toggle at once.
To do PCI link toggle, the driver ensures that no other device ID under
the same bridge by checking that all the PF functions under the same PCI
bridge have same device ID. If no other device it uses PCI bridge link
control to turn link down and up.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Once the driver gets sync_reset_request from firmware it prepares for the
coming reset and sends acknowledge.
After getting this event the driver expects device reset, either it will
trigger PCI reset on sync_reset_now event or such PCI reset will be
triggered by another PF of the same device. So it moves to reset
requested mode and if it gets PCI reset triggered by the other PF it
detect the reset and reloads.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Set capability to notify the firmware that this host driver is capable
of handling pci sync for firmware update events.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add functions to query and set the MFRL reset options supported by
firmware.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add reload limit to demand restrictions on reload actions.
Reload limits supported:
no_reset: No reset allowed, no down time allowed, no link flap and no
configuration is lost.
By default reload limit is unspecified and so no constraints on reload
actions are required.
Some combinations of action and limit are invalid. For example, driver
can not reinitialize its entities without any downtime.
The no_reset reload limit will have usecase in this patchset to
implement restricted fw_activate on mlx5.
Have the uapi parameter of reload limit ready for future support of
multiselection.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add devlink reload action to allow the user to request a specific reload
action. The action parameter is optional, if not specified then devlink
driver re-init action is used (backward compatible).
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.
Reload actions supported are:
driver_reinit: driver entities re-initialization, applying devlink-param
and devlink-resource values.
fw_activate: firmware activate.
command examples:
$devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
driver_reinit
$devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
driver_reinit fw_activate
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The phy_reset_after_clk_enable() does a PHY reset, which means the PHY
loses its register settings. The fec_enet_mii_probe() starts the PHY
and does the necessary calls to configure the PHY via PHY framework,
and loads the correct register settings into the PHY. Therefore,
fec_enet_mii_probe() should be called only after the PHY has been
reset, not before as it is now.
Fixes: 1b0a83ac04 ("net: fec: add phy_reset_after_clk_enable() support")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Richard Leitner <richard.leitner@skidata.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Christoph Niedermaier <cniedermaier@dh-electronics.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Without these definitions, the driver will crash in:
mscc_ocelot_probe
-> ocelot_init
-> ocelot_vcap_init
-> __ocelot_target_read_ix
I missed this because I did not have the VSC7514 hardware to test, only
the VSC9959 and VSC9953, and the probing part is different.
Fixes: e3aea296d8 ("net: mscc: ocelot: add definitions for VCAP ES0 keys, actions and target")
Fixes: a61e365d7c ("net: mscc: ocelot: add definitions for VCAP IS1 keys, actions and target")
Reported-by: Divya Koppera <Divya.Koppera@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Small conflict around locking in rxrpc_process_event() -
channel_lock moved to bundle in next, while state lock
needs _bh() from net.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some firmware files trigger a PHY soft reset and don't wait for it to
be finished. PHY register writes directly after applying the firmware
may fail or provide unexpected results therefore. Fix this by waiting
for bit BMCR_RESET to be cleared after applying firmware.
There's nothing wrong with the referenced change, it's just that the
fix will apply cleanly only after this change.
Fixes: 89fbd26cca ("r8169: fix firmware not resetting tp->ocp_base")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mediadetect is another name for the EDPD (energy detect power down).
This feature allows device to save extra power when no link is available.
PHY goes into the extreme power saving mode and only periodically wakes up
and checks for the link.
AQC devices has fixed check period of 6 seconds
The feature may increase linkup time.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
PHY downshift allows phy to try renegotiate if link is unstable
and can carry higher speed.
AQC devices has integrated PHY which is controlled by MAC firmware.
Thus, driver defines new ethtool callbacks to implement phy tunables
via netdev.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is an upper bound to the value that a watermark may hold. That
upper bound is not immediately obvious during configuration, and it
might be possible to have accidental truncation.
Actually this has happened already, add a warning to prevent it from
happening again.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tail dropping is enabled for a port when:
1. A source port consumes more packet buffers than the watermark encoded
in SYS:PORT:ATOP_CFG.ATOP.
AND
2. Total memory use exceeds the consumption watermark encoded in
SYS:PAUSE_CFG:ATOP_TOT_CFG.
The unit of these watermarks is a 60 byte memory cell. That unit is
programmed properly into ATOP_TOT_CFG, but not into ATOP. Actually when
written into ATOP, it would get truncated and wrap around.
Fixes: a556c76adc ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rejecting non-native endian BTF overlapped with the addition
of support for it.
The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.
Signed-off-by: David S. Miller <davem@davemloft.net>
A driver may refuse to enable VLAN filtering for any reason beyond what
the DSA framework cares about, such as:
- having tc-flower rules that rely on the switch being VLAN-aware
- the particular switch does not support VLAN, even if the driver does
(the DSA framework just checks for the presence of the .port_vlan_add
and .port_vlan_del pointers)
- simply not supporting this configuration to be toggled at runtime
Currently, when a driver rejects a configuration it cannot support, it
does this from the commit phase, which triggers various warnings in
switchdev.
So propagate the prepare phase to drivers, to give them the ability to
refuse invalid configurations cleanly and avoid the warnings.
Since we need to modify all function prototypes and check for the
prepare phase from within the drivers, take that opportunity and move
the existing driver restrictions within the prepare phase where that is
possible and easy.
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Landen Chao <Landen.Chao@mediatek.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Jonathan McDowell <noodles@earth.li>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
clang static analysis reports this problem:
drivers/net/ethernet/marvell/mvneta.c:3465:2: warning:
Attempt to free released memory
kfree(txq->buf);
^~~~~~~~~~~~~~~
When mvneta_txq_sw_init() fails to alloc txq->tso_hdrs,
it frees without poisoning txq->buf. The error is caught
in the mvneta_setup_txqs() caller which handles the error
by cleaning up all of the txqs with a call to
mvneta_txq_sw_deinit which also frees txq->buf.
Since mvneta_txq_sw_deinit is a general cleaner, all of the
partial cleaning in mvneta_txq_sw_deinit()'s error handling
is not needed.
Fixes: 2adb719d74 ("net: mvneta: Implement software TSO")
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the driver will schedule RX ring reset when we get a buffer
error in the RX completion record. These RX buffer errors can be due
to normal out-of-buffer conditions or a permanent error in the RX
ring. Because the driver cannot distinguish between these 2
conditions, we assume all these buffer errors require reset.
This is very disruptive when it is just a normal out-of-buffer
condition. Newer firmware will now monitor the rings for the permanent
failure and will send a notification to the driver when it happens.
This allows the driver to reset only when such a notification is
received. In environments where we have predominently out-of-buffer
conditions, we now can avoid these unnecessary resets.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is logic in the RX path to detect unexpected handles in the
RX completion. We'll print a warning and schedule a reset. The
next expected handle is then set to 0xffff which is guaranteed to
not match any valid handle. This will force all remaining packets in
the ring to be discarded before the reset. There can be hundreds of
these packets remaining in the ring and there is no need to print the
warnings for these forced errors.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a per ring rx_resets counter to count these RX resets.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some older chips, it is necessary to do a reset when we get buffer
errors associated with an RX ring. These buffer errors may become
frequent if the RX ring underruns under heavy traffic. The current
code does a global reset of all reasources when this happens. This
works but creates a big disruption of all rings when one RX ring is
having problem. This patch implements a localized RX ring reset of
just the RX ring having the issue. All other rings including all
TX rings will not be affected by this single RX ring reset.
Only the older chips prior to the P5 class supports this reset.
Because it is not a global reset, packets may still be arriving
while we are calling firmware to reset that ring. We need to be
sure that we don't post any buffers during this time while the
ring is undergoing reset. After firmware completes successfully,
the ring will be in the reset state with no buffers and we can start
filling it with new buffers and posting them.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_init_one_rx_ring() includes logic to initialize the BDs for one RX
ring and to allocate the buffers. Separate the allocation logic into a
new bnxt_alloc_one_rx_ring() function. The allocation function will be
used later to allocate new buffers for one specified RX ring when we
reset that RX ring.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_free_rx_skbs() frees all the allocated buffers and SKBs for
every RX ring. Refactor this function by calling a new function
bnxt_free_one_rx_ring_skbs() to free these buffers on one specified
RX ring at a time. This is preparation work for resetting one RX
ring during run-time.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If firmware does not come out of reset, log FW health status info
to provide more information on firmware status.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NS3 SoC platforms require assistance from the OP-TEE to recover
firmware if a crash occurs while no driver is bound. The
CRASHED_NO_MASTER condition is recorded in the firmware status register
during the crash to indicate when driver intervension is needed to
coordinate a firmware reload. This condition is detected during early
driver initialization in order to effect a firmware fastboot on
supported platforms when necessary.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware now supports device independent discovery of the status
register location. This status register can provide more detailed
information about firmware errors, especially if problems occur
before the HWRM interface is functioning. Attempt to map this
register if it is present and report the firmware status on firmware
init failures.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>