Currently, the tx unicast promisc is always enabled when promisc
mode on. If tx unicast promisc on, a function will receive all
unicast packet from other functions belong to the same port.
Add a ethtool private flag to control whether enable tx
unicast promisc. Then the function is able to filter the
unknown unicast packets from other function.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For DEVICE_VERSION_V2, the hardware supports enable tx and rx
promiscuous separately. But tx or rx promiscuous is active for
unicast, multicast and broadcast promiscuous simultaneously.
To support traffics between functions belong to the same port,
we always enable tx promiscuous for broadcast promiscuous, so
tx promiscuous for unicast and multicast promiscuous is also
enabled.
For DEVICE_VERSION_V3, the hardware decouples the above
relationship. Tx unicast promiscuous, rx unicast promiscuous,
tx multicast promiscuous, rx multicast promiscuous, tx broadcast
promiscuous and rx broadcast promiscuous can be enabled separately.
So add support for the new promiscuous command.
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Create a function to fill the fields of struct mlx5e_create_cq_param
based on a channel. The purpose is code reuse between normal CQs, XSK
CQs and the upcoming QoS CQs.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Use the new FW caps to advertise for ip-in-ip tunnel support separately
for RX and TX.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Fix smatch warnings:
drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c:105 esw_acl_egress_lgcy_setup() warn: passing zero to 'PTR_ERR'
drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c:177 esw_acl_egress_ofld_setup() warn: passing zero to 'PTR_ERR'
drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c:184 esw_acl_ingress_lgcy_setup() warn: passing zero to 'PTR_ERR'
drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_ofld.c:262 esw_acl_ingress_ofld_setup() warn: passing zero to 'PTR_ERR'
esw_acl_table_create() never returns NULL, so
NULL test should be removed.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, when more than one EQ is sharing an IRQ, and this IRQ is
being interrupted, all the EQs sharing the IRQ will be armed. This is
done regardless of whether an EQ has EQE.
When multiple EQs are sharing an IRQ, one or more EQs can have valid
EQEs.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Since kvzalloc will initialize the allocated memory, it is not
necessary to initialize it once again.
Fixes: 11b717d615 ("net/mlx5: E-Switch, Get reg_c0 value on CQE")
Signed-off-by: Zhu Yanjun <yanjunz@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Transmitted packet timestamping accuracy can be improved when using
timestamp from the port, instead of packet CQE creation timestamp, as
it better reflects the actual time of a packet's transmit.
TX port timestamping is supported starting from ConnectX6-DX hardware.
Although at the original completion, only CQE timestamp can be attached,
we are able to get TX port timestamping via an additional completion over
a special CQ associated with the SQ (in addition to the regular CQ).
Driver to ignore the original packet completion timestamp, and report
back the timestamp of the special CQ completion. If the absolute timestamp
diff between the two completions is greater than 1 / 128 second, ignore
the TX port timestamp as it has a jitter which is too big.
No skb will be generate out of the extra completion.
Allocate additional CQ per ptpsq, to receive the TX port timestamp.
Driver to hold an skb FIFO in order to map between transmitted skb to
the two expected completions. When using ptpsq, hold double refcount on
the skb, to gaurantee it will not get released before both completions
arrive.
Expose dedicated counters of the ptp additional CQ and connect it to the
TX health reporter.
This patch improves TX Hardware timestamping offset to be less than 40ns
at a 100Gbps line rate, compared to 600ns before.
With that, making our HW compliant with G.8273.2 class C, and allow Linux
systems to be deployed in the 5G telco edge, where this standard is a
must.
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add TX PTP port object support for better TX timestamping accuracy.
Currently, driver supports CQE based TX port timestamp. Device
also offers TX port timestamp, which has less jitter and better
reflects the actual time of a packet's transmit.
Define new driver layout called ptpsq, on which driver will create
SQs that will support TX port timestamp for their transmitted packets.
Driver to identify PTP TX skbs and steer them to these dedicated SQs
as part of the select queue ndo.
Driver to hold ptpsq per TC and report them at
netif_set_real_num_tx_queues().
Add support for all needed functionality in order to xmit and poll
completions received via ptpsq.
Add ptpsq to the TX reporter recover, diagnose and dump methods.
Creation of ptpsqs is disabled by default, and can be enabled via
tx_port_ts private flag.
This patch steer all timestamp related packets to a ptpsq, but it
does not open the port timestamp support for it. The support will
be added in the following patch.
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
MLX5E_RX_ERR_CQE Macro is used only in data-path, move it to the
appropriate header file.
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
SW group counter update function aggregates sw stats out of many
mlx5e_*_stats resides in a given mlx5e_channel_stats struct.
Split the function into a few helper functions.
This will be used later in the series to calculate specific
mlx5e_*_stats which are not defined inside mlx5e_channel_stats.
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The skb fifo push/pop API used pre-defined attributes within the
mlx5e_txqsq.
In order to share the skb fifo API with other non-SQ use cases,
change the API input to get newly defined mlx5e_skb_fifo struct.
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In order to be able to create an SQ outside of a channel context, remove
sq->channel direct pointer. This requires adding a direct pointer to:
netdevice, priv and mlx5_core in order to support SQs that are part of
mlx5e_channel. Use channel_stats from the corresponding CQ.
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In order to be able to create an RQ outside of a channel context, remove
rq->channel direct pointer. This requires adding a direct pointer to:
ICOSQ and priv in order to support RQs that are part of mlx5e_channel.
Use channel_stats from the corresponding CQ.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In order to be able to create a CQ outside of a channel context, remove
cq->channel direct pointer. This requires adding a direct pointer to
channel statistics, netdevice, priv and to mlx5_core in order to support
CQs that are a part of mlx5e_channel.
In addition, parameters the were previously derived from the channel
like napi, NUMA node, channel stats and index are now assembled in
struct mlx5e_create_cq_param which is given to mlx5e_open_cq() instead
of channel pointer. Generalizing mlx5e_open_cq() allows opening CQ
outside of channel context which will be used in following patches in
the patch-set.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The drop RQ has very limited objects to be freed, and differs
from regular RQs in the context that it is freed from.
Add a dedicated function for it, use it where needed, and remove
the drop_rq-specific checks in the generic function.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Saeed Mahameed says:
====================
mlx5-next auxbus support
This pull request is targeting net-next and rdma-next branches.
This series provides mlx5 support for auxiliary bus devices.
It starts with a merge commit of tag 'auxbus-5.11-rc1' from
gregkh/driver-core into mlx5-next, then the mlx5 patches that will convert
mlx5 ulp devices (netdev, rdma, vdpa) to use the proper auxbus
infrastructure instead of the internal mlx5 device and interface management
implementation, which Leon is deleting at the end of this patchset.
Link: https://lore.kernel.org/alsa-devel/20201026111849.1035786-1-leon@kernel.org/
Thanks to everyone for the joint effort !
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
RDMA/mlx5: Remove IB representors dead code
net/mlx5: Simplify eswitch mode check
net/mlx5: Delete custom device management logic
RDMA/mlx5: Convert mlx5_ib to use auxiliary bus
net/mlx5e: Connect ethernet part to auxiliary bus
vdpa/mlx5: Connect mlx5_vdpa to auxiliary bus
net/mlx5: Register mlx5 devices to auxiliary virtual bus
vdpa/mlx5: Make hardware definitions visible to all mlx5 devices
net/mlx5_core: Clean driver version and name
net/mlx5: Properly convey driver version to firmware
driver core: auxiliary bus: minor coding style tweaks
driver core: auxiliary bus: make remove function return void
driver core: auxiliary bus: move slab.h from include file
Add auxiliary bus support
====================
Link: https://lore.kernel.org/r/20201207053349.402772-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The m250_sel mux clock uses bit 4 in the PRG_ETH0 register. Fix this by
shifting the PRG_ETH0_CLK_M250_SEL_MASK accordingly as the "mask" in
struct clk_mux expects the mask relative to the "shift" field in the
same struct.
While here, get rid of the PRG_ETH0_CLK_M250_SEL_SHIFT macro and use
__ffs() to determine it from the existing PRG_ETH0_CLK_M250_SEL_MASK
macro.
Fixes: 566e825162 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20201205213207.519341-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add an 'of_node_put()' call when a tested device node is not available.
Fixes: 94ae899b20 ("dpaa2-mac: add PCS support through the Lynx module")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20201206151339.44306-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We sometimes run into situations where a soft/hard reset of the adapter
takes a long time or fails to complete. Having additional messages that
include important adapter state info will hopefully help understand what
is happening, reduce the guess work and minimize requests to reproduce
problems with debug patches.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Link: https://lore.kernel.org/r/20201205022235.2414110-1-sukadev@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Noticed some inconsistencies in packet statistics reporting.
This patch adds the missing Tx packet counter registers to
ethtool reporting and fixes the information strings for a
few of them.
Fixes: 16eb4c85c9 ("enetc: Add ethtool statistics")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201204171505.21389-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
hclge_dbg_reg_info[] is defined as an array of packed structure
accidentally. However, this array contains pointers, which are
no longer aligned naturally, and cannot be relocated on PPC64.
Hence, when compile-testing this driver on PPC64 with
CONFIG_RELOCATABLE=y (e.g. PowerPC allyesconfig), there will be
some warnings.
Since each field in structure hclge_qos_pri_map_cmd and
hclge_dbg_bitmap_cmd is type u8, the pragma packed is unnecessary
for these two structures as well, so remove the pragma packed in
hclge_debugfs.h to fix this issue, and this increases
hclge_dbg_reg_info[] by 4 bytes per entry.
Fixes: a582b78dfc ("net: hns3: code optimization for debugfs related to "dump reg"")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Turned out that mlxsw_sp_ipip_fib_entry_op_gre4() does not need to
figure out the IP address and virtual router id. Those are exactly
the same as in the fib_entry it is called for. So just use that and
reduce mlxsw_sp_ipip_fib_entry_op_gre4() function to only call
mlxsw_sp_ipip_fib_entry_op_gre4_rtdp() make the ipip decap op
code similar to nve.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The indicated version fixes an issue whereby the MOMTE register would by
default enable mirroring of ECN-marked traffic from all traffic classes,
once the ECN mirroring was configured. This fix is necessary for offload
of RED "ecn_mark" qevent.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suppresses the following coccinelle warning:
drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c:139:3-7:
WARNING use flexible-array member instead
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suppresses the following coccinelle warning:
drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:18:15-19: WARNING use flexible-array member instead
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, mlxsw triggers the 'devlink:devlink_hwmsg' tracepoint
whenever a request is sent to the device and whenever a response is
received from it. However, the tracepoint is not triggered when an event
(e.g., port up / down) is received from the device.
Also trace EMAD events in order to log a more complete picture of all
the exchanged hardware messages.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case a router interface (RIF) is configured for a LAG, make sure its
configuration is applied on the new LAG member.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide mlx5_core device instead of "priv" pointer while checking
eswith mode.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
After conversion to use auxiliary bus, all custom device management is
not needed anymore, delete it.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
The conversion to auxiliary bus solves long standing issue with
existing mlx5_ib<->mlx5_core coupling. It required to have both
modules in initramfs if one of them needed for the boot.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Change module registration logic to use auxiliary bus instead of custom
made mlx5 register interface.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
After recent changes there's no need any longer to define NUM_RX_DESC
as an unsigned value.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no need to check min(budget, NUM_RX_DESC). At first budget
(NAPI_POLL_WEIGHT = 64) is less then NUM_RX_DESC (256).
And more important: Even in case of budget > NUM_RX_DESC we could
safely continue processing descriptors as long as they are owned by
the CPU. In addition replace rx_left with a normal counter variable,
this allows to simplify the code. Last but not least there's no need
any longer to pass the budget as an u32.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current assumption is that the felix DSA driver has flooding knobs
per traffic class, while ocelot switchdev has a single flooding knob.
This was correct for felix VSC9959 and ocelot VSC7514, but with the
introduction of seville VSC9953, we see a switch driven by felix.c which
has a single flooding knob.
So it is clear that we must do what should have been done from the
beginning, which is not to overwrite the configuration done by ocelot.c
in felix, but instead to teach the common ocelot library about the
differences in our switches, and set up the flooding PGIDs centrally.
The effect that the bogus iteration through FELIX_NUM_TC has upon
seville is quite dramatic. ANA_FLOODING is located at 0x00b548, and
ANA_FLOODING_IPMC is located at 0x00b54c. So the bogus iteration will
actually overwrite ANA_FLOODING_IPMC when attempting to write
ANA_FLOODING[1]. There is no ANA_FLOODING[1] in sevile, just ANA_FLOODING.
And when ANA_FLOODING_IPMC is overwritten with a bogus value, the effect
is that ANA_FLOODING_IPMC gets the value of 0x0003CF7D:
MC6_DATA = 61,
MC6_CTRL = 61,
MC4_DATA = 60,
MC4_CTRL = 0.
Because MC4_CTRL is zero, this means that IPv4 multicast control packets
are not flooded, but dropped. An invalid configuration, and this is how
the issue was actually spotted.
Reported-by: Eldar Gasanov <eldargasanov2@gmail.com>
Reported-by: Maxim Kochetkov <fido_max@inbox.ru>
Tested-by: Eldar Gasanov <eldargasanov2@gmail.com>
Fixes: 84705fc165 ("net: dsa: felix: introduce support for Seville VSC9953 switch")
Fixes: 3c7b51bd39 ("net: dsa: felix: allow flooding for all traffic classes")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201204175416.1445937-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When CONFIG_OF is disabled, there is a harmless warning about
an unused variable:
enetc_pf.c: In function 'enetc_phylink_create':
enetc_pf.c:981:17: error: unused variable 'dev' [-Werror=unused-variable]
Slightly rearrange the code to pass around the of_node as a
function argument, which avoids the problem without hurting
readability.
Fixes: 71b77a7a27 ("enetc: Migrate to PHYLINK and PCS_LYNX")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201204120800.17193-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 501ef3066c ("net: marvell: prestera: Add driver for Prestera family ASIC devices")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1607071782-34006-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When CONFIG_IPV6 is disabled, clang complains that a variable
is uninitialized for non-IPv4 data:
drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:1046:6: error: variable 'cntrl1' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
if (tx_info->ip_family == AF_INET) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:1059:2: note: uninitialized use occurs here
cntrl1 |= T6_TXPKT_ETHHDR_LEN_V(maclen - ETH_HLEN) |
^~~~~~
Replace the preprocessor conditional with the corresponding C version,
and make the ipv4 case unconditional in this configuration to improve
readability and avoid the warning.
Fixes: 86716b51d1 ("ch_ktls: Update cheksum information")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20201203222641.964234-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A number of ethernet drivers require crc32 functionality to be
avaialable in the kernel, causing a link error otherwise:
arm-linux-gnueabi-ld: drivers/net/ethernet/agere/et131x.o: in function `et1310_setup_device_for_multicast':
et131x.c:(.text+0x5918): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/cadence/macb_main.o: in function `macb_start_xmit':
macb_main.c:(.text+0x4b88): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/faraday/ftgmac100.o: in function `ftgmac100_set_rx_mode':
ftgmac100.c:(.text+0x2b38): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/freescale/fec_main.o: in function `set_multicast_list':
fec_main.c:(.text+0x6120): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/freescale/fman/fman_dtsec.o: in function `dtsec_add_hash_mac_address':
fman_dtsec.c:(.text+0x830): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/freescale/fman/fman_dtsec.o:fman_dtsec.c:(.text+0xb68): more undefined references to `crc32_le' follow
arm-linux-gnueabi-ld: drivers/net/ethernet/netronome/nfp/nfpcore/nfp_hwinfo.o: in function `nfp_hwinfo_read':
nfp_hwinfo.c:(.text+0x250): undefined reference to `crc32_be'
arm-linux-gnueabi-ld: nfp_hwinfo.c:(.text+0x288): undefined reference to `crc32_be'
arm-linux-gnueabi-ld: drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.o: in function `nfp_resource_acquire':
nfp_resource.c:(.text+0x144): undefined reference to `crc32_be'
arm-linux-gnueabi-ld: nfp_resource.c:(.text+0x158): undefined reference to `crc32_be'
arm-linux-gnueabi-ld: drivers/net/ethernet/nxp/lpc_eth.o: in function `lpc_eth_set_multicast_list':
lpc_eth.c:(.text+0x1934): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/rocker/rocker_ofdpa.o: in function `ofdpa_flow_tbl_do':
rocker_ofdpa.c:(.text+0x2e08): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/rocker/rocker_ofdpa.o: in function `ofdpa_flow_tbl_del':
rocker_ofdpa.c:(.text+0x3074): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/rocker/rocker_ofdpa.o: in function `ofdpa_port_fdb':
arm-linux-gnueabi-ld: drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.o: in function `mlx5dr_ste_calc_hash_index':
dr_ste.c:(.text+0x354): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/microchip/lan743x_main.o: in function `lan743x_netdev_set_multicast':
lan743x_main.c:(.text+0x5dc4): undefined reference to `crc32_le'
Add the missing 'select CRC32' entries in Kconfig for each of them.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Acked-by: Mark Einon <mark.einon@gmail.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Link: https://lore.kernel.org/r/20201203232114.1485603-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When building FSL_DPAA_ETH the following build error shows up:
/tmp/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c: In function ‘dpaa_fq_init’:
/tmp/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:1135:9: error: too few arguments to function ‘xdp_rxq_info_reg’
1135 | err = xdp_rxq_info_reg(&dpaa_fq->xdp_rxq, dpaa_fq->net_dev,
| ^~~~~~~~~~~~~~~~
Commit b02e5a0ebb ("xsk: Propagate napi_id to XDP socket Rx path")
added an extra argument to function xdp_rxq_info_reg and commit
d57e57d0cd ("dpaa_eth: add XDP_TX support") didn't know about that
extra argument.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/r/20201203144343.790719-1-anders.roxell@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-12-03
The main changes are:
1) Support BTF in kernel modules, from Andrii.
2) Introduce preferred busy-polling, from Björn.
3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh.
4) Memcg-based memory accounting for bpf objects, from Roman.
5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits)
selftests/bpf: Fix invalid use of strncat in test_sockmap
libbpf: Use memcpy instead of strncpy to please GCC
selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module
selftests/bpf: Add tp_btf CO-RE reloc test for modules
libbpf: Support attachment of BPF tracing programs to kernel modules
libbpf: Factor out low-level BPF program loading helper
bpf: Allow to specify kernel module BTFs when attaching BPF programs
bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier
selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF
selftests/bpf: Add support for marking sub-tests as skipped
selftests/bpf: Add bpf_testmod kernel module for testing
libbpf: Add kernel module BTF support for CO-RE relocations
libbpf: Refactor CO-RE relocs to not assume a single BTF object
libbpf: Add internal helper to load BTF data by FD
bpf: Keep module's btf_data_size intact after load
bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address()
selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP
bpf: Adds support for setting window clamp
samples/bpf: Fix spelling mistake "recieving" -> "receiving"
bpf: Fix cold build of test_progs-no_alu32
...
====================
Link: https://lore.kernel.org/r/20201204021936.85653-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove exposed driver version as it was done in other drivers,
so module version will work correctly by displaying the kernel
version for which it is compiled.
And move mlx5_core module name to general include, so auxiliary drivers
will be able to use it as a basis for a name in their device ID tables.
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
mlx5 firmware expects driver version in specific format X.X.X, so
make it always correct and based on real kernel version aligned with
the driver.
Fixes: 012e50e109 ("net/mlx5: Set driver version into firmware")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
STEs format for Connect-X5 and Connect-X6DX different. Currently, on
Connext-X6DX the SW steering would break at some point when building STEs
w/o giving a proper error message. Fix this by checking the STE format of
the current device when initializing domain: add mlx5_ifc definitions for
Connect-X6DX SW steering, read FW capability to get the current format
version, and check this version when domain is being created.
Fixes: 26d688e33f ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Checksum calculation cannot be done in SW for TX kTLS HW offloaded
packets.
Offload it to the device, disregard the declared state of the TX
csum offload feature.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix build when CONFIG_IPV6 is not enabled by making a function
be built conditionally.
Fixes these build errors and warnings:
../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c: In function 'accel_fs_tcp_set_ipv6_flow':
../include/net/sock.h:380:34: error: 'struct sock_common' has no member named 'skc_v6_daddr'; did you mean 'skc_daddr'?
380 | #define sk_v6_daddr __sk_common.skc_v6_daddr
| ^~~~~~~~~~~~
../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c:55:14: note: in expansion of macro 'sk_v6_daddr'
55 | &sk->sk_v6_daddr, 16);
| ^~~~~~~~~~~
At top level:
../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c:47:13: warning: 'accel_fs_tcp_set_ipv6_flow' defined but not used [-Wunused-function]
47 | static void accel_fs_tcp_set_ipv6_flow(struct mlx5_flow_spec *spec, struct sock *sk)
Fixes: 5229a96e59 ("net/mlx5e: Accel, Expose flow steering API for rules add/del")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When command interface is down, driver to reclaim all 4K page chucks that
were hold by the Firmeware. Fix a bug for 64K page size systems, where
driver repeatedly released only the first chunk of the page.
Define helper function to fill 4K chunks for a given Firmware pages.
Iterate over all unreleased Firmware pages and call the hepler per each.
Fixes: 5adff6a088 ("net/mlx5: Fix incorrect page count when in internal error")
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return negative error code -ENOENT from invalid configuration
error handling case instead of 0, as done elsewhere in this function.
Fixes: 4bb0432628 ("net: mvpp2: phylink support")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201203141806.37966-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "skb" is freed by the transmit code in cxgb4_ofld_send() and we
shouldn't use it again. But in the current code, if we hit an error
later on in the function then the clean up code will call kfree_skb(skb)
and so it causes a double free.
Set the "skb" to NULL and that makes the kfree_skb() a no-op.
Fixes: d25f2f71f6 ("crypto: chtls - Program the TLS session Key")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/X8ilb6PtBRLWiSHp@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver core ignores the return value of struct device_driver::remove
because there is only little that can be done. For the shutdown callback
it's ps3_system_bus_shutdown() which ignores the return value.
To simplify the quest to make struct device_driver::remove return void,
let struct ps3_system_bus_driver::remove return void, too. All users
already unconditionally return 0, this commit makes it obvious that
returning an error code is a bad idea and ensures future users behave
accordingly.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126165950.2554997-2-u.kleine-koenig@pengutronix.de
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 72b05b9940 ("pasemi_mac: RX/TX ring management cleanup")
Fixes: 8d636d8bc5 ("pasemi_mac: jumbo frame support")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1606903035-1838-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: b1fb1f280d ("cxgb3 - Fix dma mapping error path")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Raju Rangoju <rajur@chelsio.com>
Link: https://lore.kernel.org/r/1606902965-1646-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Low level mlx5 updates required by both netdev and rdma trees:
net/mlx5: Treat host PF vport as other (non eswitch manager) vport
net/mlx5: Enable host PF HCA after eswitch is initialized
net/mlx5: Rename peer_pf to host_pf
net/mlx5: Make API mlx5_core_is_ecpf accept const pointer
net/mlx5: Export steering related functions
net/mlx5: Expose other function ifc bits
net/mlx5: Expose IP-in-IP TX and RX capability bits
net/mlx5: Update the hardware interface definition for vhca state
net/mlx5: Update the list of the PCI supported devices
net/mlx5: Avoid exposing driver internal command helpers
net/mlx5: Add ts_cqe_to_dest_cqn related bits
net/mlx5: Add misc4 to mlx5_ifc_fte_match_param_bits
net/mlx5: Check dr mask size against mlx5_match_param size
net/mlx5: Add sampler destination type
net/mlx5: Add sample offload hardware bits and structures
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl/IOZcACgkQSD+KveBX
+j4J8wgAuxwflrYrbCWXV7LE08J7T7ZHRDE+jEbaZ0Zp9mOsYDDpcifpKwy2EVRf
RKcpMYh/GzAljmEpeWIAlMxmlpXhKWXTDruWCx73r1jvdXf/RU24/zQHa6BjeiDo
rMB8bgiW4a66+z4LcN/U6ahbVM5gScBNEt2sS1OIi9ZInngGVo9FgfhYMpERPNcH
3+mcHulCnGBNbbLwoTllOcgbxexn+xoByukg5Z0ddBJp007DMjzBIWDpDS0y2HaT
jGo1LYONgRc3zoGVmdeu9F+tSsWBIgsaiyGxKj1T/8sZUaNz2TKj9VOiYIj9BLff
cp6GRc88k7HWA4tImSHQiLbK6cx+yA==
=mjvI
-----END PGP SIGNATURE-----
Merge tag 'mlx5-next-2020-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5-next-2020-12-02
Low level mlx5 updates required by both netdev and rdma trees.
* tag 'mlx5-next-2020-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Treat host PF vport as other (non eswitch manager) vport
net/mlx5: Enable host PF HCA after eswitch is initialized
net/mlx5: Rename peer_pf to host_pf
net/mlx5: Make API mlx5_core_is_ecpf accept const pointer
net/mlx5: Export steering related functions
net/mlx5: Expose other function ifc bits
net/mlx5: Expose IP-in-IP TX and RX capability bits
net/mlx5: Update the hardware interface definition for vhca state
net/mlx5: Update the list of the PCI supported devices
net/mlx5: Avoid exposing driver internal command helpers
net/mlx5: Add ts_cqe_to_dest_cqn related bits
net/mlx5: Add misc4 to mlx5_ifc_fte_match_param_bits
net/mlx5: Check dr mask size against mlx5_match_param size
net/mlx5: Add sampler destination type
net/mlx5: Add sample offload hardware bits and structures
====================
Link: https://lore.kernel.org/r/20201203011010.213440-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These debugfs never return NULL so all this code will never be run.
In the normal case, (and in this case particularly), the debugfs
functions are not supposed to be checked for errors so all this error
checking code can be safely deleted.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/X8c6vpapJDYI2eWI@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The timestamp fields should be copied to new skb too in
A-050385 workaround for later TX timestamping handling.
Fixes: 3c68b8fffb ("dpaa_eth: FMan erratum A050385 workaround")
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Link: https://lore.kernel.org/r/20201201075258.1875-1-yangbo.lu@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On chip versions supporting tally counter reset we currently update
the counters after a reset although we know all counters are zero.
Skip this unnecessary step.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/526618b2-b1bf-1844-b82a-dab2df7bdc8f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Order of operations is slightly more correct in the driver
to change the netdev->mtu after the queues have been stopped
rather than before.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove memory allocation fail messages where the OOM stack
trace will make it obvious which allocation request failed.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After adding support for QinQ, a.k.a 802.1ad protocol, there are a few
scenarios that should be vetoed.
The vetoes are motivated by various ASIC limitations.
For example, a port that is member in a 802.1ad bridge cannot have 802.1q
uppers as the port needs to be configured to treat 802.1q packets as
untagged packets.
Veto all those unsupported scenarios and return suitable messages.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
802.1ad, also known as QinQ is an extension to the 802.1q standard, which
is concerned with passing possibly 802.1q-tagged packets through another
VLAN-like tunnel. The format of 802.1ad tag is the same as 802.1q, except
it uses the EtherType of 0x88a8, unlike 802.1q's 0x8100.
Add support for 802.1ad protocol. Most of the configuration is the same
as 802.1q. The difference is that before a port joins an 802.1ad bridge it
needs to be configured to recognize 802.1ad packets as tagged and other
packets (e.g., 802.1q) as untagged.
VXLAN is not currently supported with 802.1ad bridge, so return an error
with an appropriate extack message.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The code in mlxsw_sp_bridge_8021q_port_{join, leave}() can be used also
for 802.1ad bridge.
Move the code to functions called
mlxsw_sp_bridge_vlan_aware_port_{join, leave}() and call them from
mlxsw_sp_bridge_8021q_port_{join, leave}() respectively to enable code
reuse.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, when pushing a PVID at ingress, mlxsw always uses 802.1q
EtherType.
Make this EtherType configurable by extending mlxsw_sp_port_pvid_set()
with an EtherType argument.
This is a preparation for QinQ support, that needs to push a PVID with
802.1ad EtherType.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
By default, the device considers both 802.1ad and 802.1q packets as tagged,
but this is not supported by the driver. It only supports VLAN and bridge
devices that use 802.1q protocol.
Instead, configure the device to only treat 802.1q packets as tagged
packets.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
et_vlan field is used to configure which EtherType is used when VLAN is
pushed at ingress (for untagged packets or for QinQ push mode).
It will be used to configure tagging with ether_type1 (i.e., 0x88A8) for
QinQ mode.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
SPVC configures the port to identify packets as untagged / single tagged /
double tagged packets based on the packet EtherTypes.
It will be used to classify 802.1q packets as untagged and 802.1ad packets
as tagged when received by ports member in a 802.1ad bridge.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bellow HNAE3_DEVICE_VERSION_V3, MAC pause mode just support one
TC, when enabled multiple TCs, force enable PFC mode.
HNAE3_DEVICE_VERSION_V3 can support MAC pause mode on multiple
TCs, so when enable multiple TCs, just keep MAC pause mode,
and enable PFC mode just according to the user settings.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For the device whose version is above V3(include V3), the hardware
can do checksum offload for the non-tunnel udp packet, who has
a dest port as the IANA assigned. So add a check for devcie's verion
in hns3_tunnel_csum_bug().
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since TX hardware checksum and RX completion checksum have been
supported now, so add related information in hns3_dbg_bd_info().
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For the device who has the capability to handle udp tunnel
checksum segmentation, add support for it.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, device V1 and V2 do not support segmentation
offload for UDP based tunnel packet who needs outer UDP
checksum offload, so there is a workaround in the driver
to set the checksum of the outer UDP checksum as zero. This
is not what the user wants, so remove this feature for
device V1 and V2, add support for it later(when the device
has the ability to do that).
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For the device that supports TX hardware checksum, the hardware
can calculate the checksum from the start and fill the checksum
to the offset position, which reduces the operations of
calculating the type and header length of L3/L4. So add this
feature for the HNS3 ethernet driver.
The previous simple BD description is unsuitable, rename it as
HW TX CSUM.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In some cases (for example ip fragment), hardware will
calculate the checksum of whole packet in RX, and setup
the HNS3_RXD_L2_CSUM_B flag in the descriptor, so add
support to utilize this checksum.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The CNIC kconfig symbol selects UIO and UIO depends on MMU.
Since 'select' does not follow dependency chains, add the same MMU
dependency to CNIC.
Quietens this kconfig warning:
WARNING: unmet direct dependencies detected for UIO
Depends on [n]: MMU [=n]
Selected by [m]:
- CNIC [=m] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_BROADCOM [=y] && PCI [=y] && (IPV6 [=m] || IPV6 [=m]=n)
Fixes: adfc5217e9 ("broadcom: Move the Broadcom drivers")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Rasesh Mody <rmody@marvell.com>
Cc: GR-Linux-NIC-Dev@marvell.com
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20201129070843.3859-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TX completions received with an error return code are not
being processed properly. When an error code is seen, do not
proceed to the next completion before cleaning up the existing
entry's data structures.
Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure that received Subordinate Command-Response Queue (SCRQ)
entries are properly read in order by the driver. These queues
are used in the ibmvnic device to process RX buffer and TX completion
descriptors. dma_rmb barriers have been added after checking for a
pending descriptor to ensure the correct descriptor entry is checked
and after reading the SCRQ descriptor to ensure the entire
descriptor is read before processing.
Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
there is kernel panic in inet_twsk_free() while chtls
module unload when socket is in TIME_WAIT state because
sk_prot_creator was not preserved on connection socket.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Udai Sharma <udai.sharma@chelsio.com>
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201125214913.16938-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For XDP TX, even tough we start out with correctly aligned buffers, the
XDP program might change the data's alignment. For REDIRECT, we have no
control over the alignment either.
Create a new workaround for xdp_frame structures to verify the erratum
conditions and move the data to a fresh buffer if necessary. Create a new
xdp_frame for managing the new buffer and free the old one using the XDP
API.
Due to alignment constraints, all frames have a 256 byte headroom that
is offered fully to XDP under the erratum. If the XDP program uses all
of it, the data needs to be move to make room for the xdpf backpointer.
Disable the metadata support since the information can be lost.
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Explicitly point that the current workaround addresses skbs. This change is
in preparation for adding a workaround for XDP scenarios.
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After transmission, the frame is returned on confirmation queues for
cleanup. For this, store a backpointer to the xdp_frame in the private
reserved area at the start of the TX buffer.
No TX batching support is implemented at this time.
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use an xdp_frame structure for managing the frame. Store a backpointer to
the structure at the start of the buffer before enqueueing for cleanup
on TX confirmation. Reserve DPAA_TX_PRIV_DATA_SIZE bytes from the frame
size shared with the XDP program for this purpose. Use the XDP
API for freeing the buffer when it returns to the driver on the TX
confirmation path.
The frame queues are shared with the netstack. The DPAA driver is a LLTX
driver so no explicit locking is required on transmission.
This approach will be reused for XDP REDIRECT.
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement the ndo_change_mtu callback to prevent users from setting an
MTU that would permit processing of S/G frames. The maximum MTU size
is dependent on the buffer size.
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement the XDP_DROP and XDP_PASS actions.
Avoid draining and reconfiguring the buffer pool at each XDP
setup/teardown by increasing the frame headroom and reserving
XDP_PACKET_HEADROOM bytes from the start. Since we always reserve an
entire page per buffer, this change only impacts Jumbo frame scenarios
where the maximum linear frame size is reduced by 256 bytes. Multi
buffer Scatter/Gather frames are now used instead in these scenarios.
Allow XDP programs to access the entire buffer.
The data in the received frame's headroom can be overwritten by the XDP
program. Extract the relevant fields from the headroom while they are
still available, before running the XDP program.
Since the headroom might be resized before the frame is passed up to the
stack, remove the check for a fixed headroom value when building an skb.
Allow the meta data to be updated and pass the information up the stack.
Scatter/Gather frames are dropped when XDP is enabled.
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We maintain an skb backpointer in the software annotations area of Tx
frames. Introduce a structure for explicit handling.
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add napi_id to the xdp_rxq_info structure, and make sure the XDP
socket pick up the napi_id in the Rx path. The napi_id is used to find
the corresponding NAPI structure for socket busy polling.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com
Reduce the wait time for Command Response Queue response from 30 seconds
to 20 seconds, as recommended by VIOS and Power Hypervisor teams.
Fixes: bd0b672313 ("ibmvnic: Move login and queue negotiation into ibmvnic_open")
Fixes: 53da09e929 ("ibmvnic: Add set_link_state routine for setting adapter link state")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reset timeout is going off right after adapter reset. This patch ensures
that timeout is scheduled if it has been 5 seconds since the last reset.
5 seconds is the default watchdog timeout.
Fixes: ed651a1087 ("ibmvnic: Updated reset handling")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
send_login() does not check for the result of ibmvnic_send_crq() of the
login request. This results in the driver needlessly retrying the login
10 times even when CRQ is no longer active. Check the return code and
give up in case of errors in sending the CRQ.
The only time we want to retry is if we get a PARITALSUCCESS response
from the partner.
Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If after ibmvnic sends a LOGIN it gets a FAILOVER, it is possible that
the worker thread will start reset process and free the login response
buffer before it gets a (now stale) LOGIN_RSP. The ibmvnic tasklet will
then try to access the login response buffer and crash.
Have ibmvnic track pending logins and discard any stale login responses.
Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If auto-priority failover is enabled, the backing device needs time
to settle if hard resetting fails for any reason. Add a delay of 60
seconds before retrying the hard-reset.
Fixes: 2770a7984d ("ibmvnic: Introduce hard reset recovery")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In a failed reset, driver could end up in VNIC_PROBED or VNIC_CLOSED
state and cannot recover in subsequent resets, leaving it offline.
This patch restores the adapter state to reset_state, the original
state when reset was called.
Fixes: b27507bb59 ("net/ibmvnic: unlock rtnl_lock in reset so linkwatch_event can run")
Fixes: 2770a7984d ("ibmvnic: Introduce hard reset recovery")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
scrq->msgs could be NULL during device reset, causing Linux to crash.
So, check before memset scrq->msgs.
Fixes: c8b2ad0a4a ("ibmvnic: Sanitize entire SCRQ buffer on reset")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When ibmvnic fails to reset, it breaks out of the reset loop and frees
all of the remaining resets from the workqueue. Doing so prevents the
adapter from recovering if no reset is scheduled after that. Instead,
have the driver continue to process resets on the workqueue.
Remove the no longer need free_all_rwi().
Fixes: ed651a1087 ("ibmvnic: Updated reset handling")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Inconsistent login with the vnicserver is causing the device to be
removed. This does not give the device a chance to recover from error
state. This patch schedules a FATAL reset instead to bring the adapter
up.
Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Trivial conflict in CAN, keep the net-next + the byteswap wrapper.
Conflicts:
drivers/net/can/usb/gs_usb.c
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The device supports an operation that allows the driver to issue one
request to update the adjacency index for all the routes in a given
virtual router (VR) from old index and size to new ones. This is useful
in case the configuration of a certain nexthop group is updated and its
adjacency index changes.
Currently, the driver does not use this operation in an efficient
manner. It iterates over all the routes using the nexthop group and
issues an update request for the VR if it is not the same as the
previous VR.
Instead, use the VR tracking added in the previous patch to update the
adjacency index once for each VR currently using the nexthop group.
Example:
8k IPv6 routes were added in an alternating manner to two VRFs. All the
routes are using the same nexthop object ('nhid 1').
Before:
# perf stat -e devlink:devlink_hwmsg --filter='incoming==0' -- ip nexthop replace id 1 via 2001:db8:1::2 dev swp3
Performance counter stats for 'ip nexthop replace id 1 via 2001:db8:1::2 dev swp3':
16,385 devlink:devlink_hwmsg
4.255933213 seconds time elapsed
0.000000000 seconds user
0.666923000 seconds sys
Number of EMAD transactions corresponds to number of routes using the
nexthop group.
After:
# perf stat -e devlink:devlink_hwmsg --filter='incoming==0' -- ip nexthop replace id 1 via 2001:db8:1::2 dev swp3
Performance counter stats for 'ip nexthop replace id 1 via 2001:db8:1::2 dev swp3':
3 devlink:devlink_hwmsg
0.077655094 seconds time elapsed
0.000000000 seconds user
0.076698000 seconds sys
Number of EMAD transactions corresponds to number of VRFs / VRs.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For each nexthop group, track in which virtual routers (VRs) the group is
used. This is going to be used by the next patch to perform a more
efficient adjacency index update whenever the group's adjacency index
changes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the rare case where the adjacency pointer cannot be updated for a
given virtual router, rollback the operation so that virtual routers
that are already using the new index will use the old one again.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mlxsw_sp_adj_index_mass_update_vr() only needs the virtual router's
identifier and protocol, so pass them directly. In a subsequent patch
the caller will not have access to the pointer.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When eswitch manager is running on ECPF, host PF should be treated
as non eswitch manager port, similar to other VF vports.
Fail to do so, results in firmware treating PF's vport as ECPF
vport for eswitch ACL tables.
Non zero check to figure out if a given vport is other vport or not
is not sufficient becase PF vport number = 0 on ECPF.
Hence, create esw acl tables with an attribute of other vport.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently ECPF enables external host PF too early in the initialization
sequence for Ethernet links when ECPF is eswitch manager.
Due to this, when external host PF driver is loaded, host PF's HCA CAP has
inner_ip_version supported by NIC RX flow table.
This capability is later updated by firmware after ECPF driver enables
ENCAP/DECAP as eswitch manager.
This results into a timing race condition, where CREATE_TIR command
fails with a below syndrome on host PF.
mlx5_cmd_check:775:(pid 510): CREATE_TIR(0x900) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0x562b00)
Hence, enable the external host PF after necessary eswitch and per vport
initialization is completed.
Continue to enable host PF when eswitch manager capability is off for a
ECPF.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
To match the hardware spec, rename peer_pf to host_pf.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Export
mlx5_create_flow_table()
mlx5_create_flow_group()
mlx5_destroy_flow_group().
These symbols are required by the VDPA implementation to create rules
that consume VDPA specific traffic.
We do not deal with putting the prototypes in a header file since they
already exist in include/linux/mlx5/fs.h.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5 command init and cleanup routines are internal to mlx5_core driver.
Hence, avoid exporting them and move their definition to mlx5_core
driver's internal file mlx5_core.h
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add misc4 match params to enable matching on prog_sample_fields.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This is to allow passing misc4 match param from userspace when
function like ib_flow_matcher_create is called.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The flow sampler object is a new destination type. Add a new member
for the flow destination.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2020-11-24
This series contains updates to i40e and igbvf drivers.
Marek removes a redundant assignment for i40e.
Stefan Assmann corrects reporting of VF link speed for i40e.
Karen revises a couple of error messages to warnings for igbvf as they
could be misinterpreted as issues when they are not.
v2: Dropped PTP patch as it's being updated.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
igbvf: Refactor traces
i40e: report correct VF link speed when link state is set to enable
i40e: remove redundant assignment
====================
Link: https://lore.kernel.org/r/20201124165245.2844118-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently lock gets freed only if timeout expires, but missed a
case when HW returns failure and goes for cleanup.
Fixes: efca3878a5 ("ch_ktls: Issue if connection offload fails")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Link: https://lore.kernel.org/r/20201125072626.10861-1-rohitm@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The tc-taprio base time indicates the beginning of the tc-taprio
schedule, which is cyclic by definition (where the length of the cycle
in nanoseconds is called the cycle time). The base time is a 64-bit PTP
time in the TAI domain.
Logically, the base-time should be a future time. But that imposes some
restrictions to user space, which has to retrieve the current PTP time
from the NIC first, then calculate a base time that will still be larger
than the base time by the time the kernel driver programs this value
into the hardware. Actually ensuring that the programmed base time is in
the future is still a problem even if the kernel alone deals with this.
Luckily, the enetc hardware already advances a base-time that is in the
past into a congruent time in the immediate future, according to the
same formula that can be found in the software implementation of taprio
(in taprio_get_start_time):
/* Schedule the start time for the beginning of the next
* cycle.
*/
n = div64_s64(ktime_sub_ns(now, base), cycle);
*start = ktime_add_ns(base, (n + 1) * cycle);
There's only one problem: the driver doesn't let the hardware do that.
It interferes with the base-time passed from user space, by special-casing
the situation when the base-time is zero, and replaces that with the
current PTP time. This changes the intended effective base-time of the
schedule, which will in the end have a different phase offset than if
the base-time of 0.000000000 was to be advanced by an integer multiple
of the cycle-time.
Fixes: 34c6adf197 ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20201124220259.3027991-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 7579262478 ("net: stmmac: add flexible PPS to dwmac
4.10a") was intended to modify the struct dwmac410_ops, but it got
somehow badly merged and modified the struct dwmac4_ops.
Revert the modification in struct dwmac4_ops and re-apply it
properly in struct dwmac410_ops.
Fixes: 7579262478 ("net: stmmac: add flexible PPS to dwmac 4.10a")
Signed-off-by: Antonio Borneo <antonio.borneo@st.com>
Reported-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Link: https://lore.kernel.org/r/20201124223729.886992-1-antonio.borneo@st.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Based on the discussion with Sukadev Bhattiprolu and Dany Madden,
we believe that checking adapter->resetting bit is preferred
since RESETTING state flag is not as strict as resetting bit.
RESETTING state flag is removed since it is verbose now.
Fixes: 7d7195a026 ("ibmvnic: Do not process device remove during device reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver's ISR sends a 'software interrupt' event to the probe()
thread using the following method:
- probe(): write 0 to flag, enable s/w interrupt
- probe(): poll on flag, relax using usleep_range()
- ISR : write 1 to flag
Replace with wake_up() / wait_event_timeout(). Besides being easier
to get right, this abstraction has better timing and memory
consistency properties.
Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # lan7430
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201123191529.14908-2-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For no apparent reason, this function reads the INT_STS register, and
checks if the software interrupt bit is set. These things have already
been carried out by this function's only caller.
Clean up by removing the redundant code.
Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # lan7430
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201123191529.14908-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch fixes two lines in which the rx_offset received by the device
wasn't taken into account:
- prefetch function:
In our driver the copied data would reside in
rx_info->page + rx_headroom + rx_offset
so the prefetch function is changed accordingly.
- setting page_offset to zero for descriptors > 1:
for every descriptor but the first, the rx_offset is zero. Hence
the page_offset value should be set to rx_headroom.
The previous implementation changed the value of rx_info after
the descriptor was added to the SKB (essentially providing wrong
page offset).
Fixes: 68f236df93 ("net: ena: add support for the rx offset feature")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ENA driver uses the readless mechanism, which uses DMA, to find
out what the DMA mask is supposed to be.
If DMA is used without setting the dma_mask first, it causes the
Intel IOMMU driver to think that ENA is a 32-bit device and therefore
disables IOMMU passthrough permanently.
This patch sets the dma_mask to be ENA_MAX_PHYS_ADDR_SIZE_BITS=48
before readless initialization in
ena_device_init()->ena_com_mmio_reg_read_request_init(),
which is large enough to workaround the intel_iommu issue.
DMA mask is set again to the correct value after it's received from the
device after readless is initialized.
The patch also changes the driver to use dma_set_mask_and_coherent()
function instead of the two pci_set_dma_mask() and
pci_set_consistent_dma_mask() ones. Both methods achieve the same
effect.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Mike Cui <mikecui@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After request id is checked in validate_rx_req_id() its value is still
used in the line
rx_ring->free_ids[next_to_clean] =
rx_ring->ena_bufs[i].req_id;
even if it was found to be out-of-bound for the array free_ids.
The patch moves the request id to an earlier stage in the napi routine and
makes sure its value isn't used if it's found out-of-bounds.
Fixes: 30623e1ed1 ("net: ena: avoid memory access violation by validating req_id properly")
Signed-off-by: Ido Segev <idose@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Build skb_shared_info on mvneta_rx_swbm stack and sync it to xdp_buff
skb_shared_info area only on the last fragment. Leftover cache miss in
mvneta_swbm_rx_frame will be addressed introducing mb bit in
xdp_buff/xdp_frame struct
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pass skb_shared_info pointer from mvneta_xdp_put_buff caller. This is a
preliminary patch to reduce accesses to skb_shared_info area and reduce
cache misses.
Remove napi parameter in mvneta_xdp_put_buff signature since it is always
run in NAPI context
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tx/Rx FIFO is a HW resource limited by total size, but shared
by all ports of same CP110 and impacting port-performance.
Do not divide the FIFO for ports which are not enabled in DTS,
so active ports could have more FIFO.
No change in FIFO allocation if all 3 ports on the communication
processor enabled in DTS.
The active port mapping should be done in probe before FIFO-init.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/1606154073-28267-1-git-send-email-stefanc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The dpaa2 driver depends on devlink, so it should select
NET_DEVLINK in order to fix compile errors, such as:
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.o: in function `dpaa2_eth_rx_err':
dpaa2-eth.c:(.text+0x3cec): undefined reference to `devlink_trap_report'
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth-devlink.o: in function `dpaa2_eth_dl_info_get':
dpaa2-eth-devlink.c:(.text+0x160): undefined reference to `devlink_info_driver_name_put'
Fixes: ceeb03ad8e ("dpaa2-eth: add basic devlink support")
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20201123163553.1666476-1-ciorneiioana@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Register with devlink the blackhole_nexthop trap so that mlxsw will be
able to report packets dropped due to a blackhole nexthop.
The internal trap identifier is "DISCARD_ROUTER3", which traps packets
dropped in the adjacency table.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for blackhole nexthops by programming them to the adjacency
table with a discard action.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The two are the same, but for blackhole nexthops we will not have an
associated neighbour struct, so resolve the RIF from the nexthop struct
itself instead.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that the driver creates a loopback RIF during its initialization, it
can be used to program the adjacency entries for unresolved nexthops
instead of other RIFs. The loopback RIF is guaranteed to exist for the
entire life time of the driver, unlike other RIFs that come and go.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Unresolved nexthops are currently written to the adjacency table with a
discard action. Packets hitting such entries are trapped to the CPU via
the 'DISCARD_ROUTER3' trap which can be enabled or disabled on demand,
but is always enabled in order to ensure the kernel can resolve the
unresolved neighbours.
This trap will be needed for blackhole nexthops support. Therefore, move
unresolved nexthops to explicitly program the adjacency entries with a
trap action and a different trap identifier, 'RTR_EGRESS0'.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Up until now RIFs (router interfaces) were created on demand (e.g.,
when an IP address was added to a netdev). However, sometimes the device
needs to be provided with a RIF when one might not be available.
For example, adjacency entries that drop packets need to be programmed
with an egress RIF despite the RIF not being used to forward packets.
Create such a RIF during initialization so that it could be used later
on to support blackhole nexthops.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When performing IPv6 forwarding, there is an expectation that SKBs
will have some headroom. When forwarding a packet from the aquantia
driver, this does not always happen, triggering a kernel warning.
aq_ring.c has this code (edited slightly for brevity):
if (buff->is_eop && buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
skb = build_skb(aq_buf_vaddr(&buff->rxdata), AQ_CFG_RX_FRAME_MAX);
} else {
skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);
There is a significant difference between the SKB produced by these
2 code paths. When napi_alloc_skb creates an SKB, there is a certain
amount of headroom reserved. However, this is not done in the
build_skb codepath.
As the hardware buffer that build_skb is built around does not
handle the presence of the SKB header, this code path is being
removed and the napi_alloc_skb path will always be used. This code
path does have to copy the packet header into the SKB, but it adds
the packet data as a frag.
Fixes: 018423e90b ("net: ethernet: aquantia: Add ring support code")
Signed-off-by: Lincoln Ramsay <lincoln.ramsay@opengear.com>
Link: https://lore.kernel.org/r/MWHPR1001MB23184F3EAFA413E0D1910EC9E8FC0@MWHPR1001MB2318.namprd10.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This driver uses a normal timer for TX coalescing, which means that the
with the default tx-usecs of 1000 microseconds the cleanups actually
happen 10 ms or more later with HZ=100. This leads to very low
througput with TCP when bridged to a slow link such as a 4G modem. Fix
this by using an hrtimer instead.
On my ARM platform with HZ=100 and the default TX coalescing settings
(tx-frames 25 tx-usecs 1000), with "tc qdisc add dev eth0 root netem
delay 60ms 40ms rate 50Mbit" run on the server, netperf's TCP_STREAM
improves from ~5.5 Mbps to ~100 Mbps.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Link: https://lore.kernel.org/r/20201120150208.6838-1-vincent.whitchurch@axis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Refactoring "PF still resetting" and changing "Failed
to add vlan id" to "Vlan id is not added"
messages because previous version looked like a bug
- it informed about changes that worked as
designed but might confuse users
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When the virtual link state was set to "enable" ethtool would report
link speed as 40000Mb/s regardless of the underlying device.
Report the correct link speed.
Example from a XXV710 NIC.
Before:
$ ip link set ens3f0 vf 0 state auto
$ ethtool enp8s2 | grep Speed
Speed: 25000Mb/s
$ ip link set ens3f0 vf 0 state enable
$ ethtool enp8s2 | grep Speed
Speed: 40000Mb/s
After:
$ ip link set ens3f0 vf 0 state auto
$ ethtool enp8s2 | grep Speed
Speed: 25000Mb/s
$ ip link set ens3f0 vf 0 state enable
$ ethtool enp8s2 | grep Speed
Speed: 25000Mb/s
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Remove a redundant assignment of the software ring pointer in the i40e
driver. The variable is assigned twice with no use in between, so just
get rid of the first occurrence.
Fixes: 3b4f0b66c2 ("i40e, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL")
Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Get rid of the __call_single_node union and cleanup the API a little
to avoid external code relying on the structure layout as much.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
linux/netdevice.h is included in very many places, touching any
of its dependecies causes large incremental builds.
Drop the linux/ethtool.h include, linux/netdevice.h just needs
a forward declaration of struct ethtool_ops.
Fix all the places which made use of this implicit include.
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20201120225052.1427503-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Memory allocation are done with 'dma_alloc_coherent()'. Be consistent
and use 'dma_free_coherent()' to free the corresponding memory.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20201121090330.1332543-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
'pci_set_dma_mask()' + 'pci_set_consistent_dma_mask()' can be replaced by
an equivalent 'dma_set_mask_and_coherent()' which is much less verbose.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20201121090302.1332491-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prevent VFs from resetting when PF driver is being unloaded:
- introduce new pf state: __I40E_VF_RESETS_DISABLED;
- check if pf state has __I40E_VF_RESETS_DISABLED state set,
if so, disable any further VFLR event notifications;
- when i40e_remove (rmmod i40e) is called, disable any resets on
the VFs;
Previously if there were bare-metal VFs passing traffic and PF
driver was removed, there was a possibility of VFs triggering a Tx
timeout right before iavf_remove. This was causing iavf_close to
not be called because there is a check in the beginning of iavf_remove
that bails out early if adapter->state < IAVF_DOWN_PENDING. This
makes it so some resources do not get cleaned up.
Fixes: 6a9ddb36ee ("i40e: disable IOV before freeing resources")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20201120180640.3654474-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As pointed out by Herbert in a recent related patch, the LSM hooks do
not have the necessary address family information to use the flowi
struct safely. As none of the LSMs currently use any of the protocol
specific flowi information, replace the flowi pointers with pointers
to the address family independent flowi_common struct.
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add support to choose RSS flow key algorithm with IPv4 transport protocol
field included in hashing input data. This will be enabled by default.
There-by enabling 3/5 tuple hash
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: George Cherian <george.cherian@marvell.com>
Link: https://lore.kernel.org/r/20201120093906.2873616-1-george.cherian@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sometimes it takes longer than 5 seconds (watchdog timeout) to complete
failover, migration, and other resets. In stead of scheduling another
timeout reset, we wait for the current one to complete.
Suggested-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 61d3e1d9bc ("ibmvnic: Remove netdev notify for failover resets")
excluded the failover case for notify call because it said
netdev_notify_peers() can cause network traffic to stall or halt.
Current testing does not show network traffic stall
or halt because of the notify call for failover event.
netdev_notify_peers may be used when a device wants to inform the
rest of the network about some sort of a reconfiguration
such as failover or migration.
It is unnecessary to call that in other events like
FATAL, NON_FATAL, CHANGE_PARAM, and TIMEOUT resets
since in those scenarios the hardware does not change.
If the driver must do a hard reset, it is necessary to notify peers.
Fixes: 61d3e1d9bc ("ibmvnic: Remove netdev notify for failover resets")
Suggested-by: Brian King <brking@linux.vnet.ibm.com>
Suggested-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When netdev_notify_peers was substituted in
commit 986103e792 ("net/ibmvnic: Fix RTNL deadlock during device reset"),
call_netdevice_notifiers(NETDEV_RESEND_IGMP, dev) was missed.
Fix it now.
Fixes: 986103e792 ("net/ibmvnic: Fix RTNL deadlock during device reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Adds debugfs to dump new shaping parameters: rate and flag.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the calculation of the driver is fixed, if the number of
queue or clock changed, the calculated result may be inaccurate.
So for compatible and maintainable, add a new flag to tell the
firmware to calculate the shaping parameters with the specified
rate.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For HNAE3_DEVICE_VERSION_V3, a maximum of 1281 interrupt
resources are supported. To utilize these new resources,
extend the corresponding field or variable to 16bit type,
and remove the restriction of NIC client that only use a
maximum of 65 interrupt vectors. In addition, the I/O address
of the extended interrupt resources are different, so an extra
handler is needed.
Currently, the total number of interrupts is the sum of RoCE's
number and RoCE's offset (RoCE is in front of NIC), since
the number of both NIC and RoCE are same. For readability,
rewrite the corresponding field of the command, rename the
RoCE's offset field as the number of NIC interrupts, then
the total number of interrupts is sum of the number of RoCE
and NIC, and replace vport->back with hdev in
hclge_init_roce_base_info() for simplifying the code.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For device who has device memory accessed through the PCI BAR4,
IO descriptor push of NIC and direct WQE(Work Queue Element) of
RoCE will use this device memory, so add support for mapping
this device memory, and add this info to the RoCE client whose
new feature needs.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For DEVICE_VERSION_V1/2, there are total 1024 queues and
queue sets. For DEVICE_VERSION_V3, it increases to 1280,
and can be assigned to one pf, so remove the limitation
of 1024.
To keep compatible with DEVICE_VERSION_V1/2 and old driver
version, the queue number is split into two part:
tqp_num(range 0~1023) and ext_tqp_num(range 1024~1279).
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit 9d2e5e9eeb ("cxgb4/ch_ktls: decrypted bit is not enough")
whenever CONFIG_TLS=m and CONFIG_CHELSIO_T4=y, the following build
failure occurs:
ld: drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.o: in function
`cxgb_select_queue':
cxgb4_main.c:(.text+0x2dac): undefined reference to `tls_validate_xmit_skb'
Fix this by ensuring that if TLS is set to be a module, CHELSIO_T4 will
also be compiled as a module. As otherwise the cxgb4 driver will not be
able to access TLS' symbols.
Fixes: 9d2e5e9eeb ("cxgb4/ch_ktls: decrypted bit is not enough")
Signed-off-by: Tom Seewald <tseewald@gmail.com>
Link: https://lore.kernel.org/r/20201120192528.615-1-tseewald@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reduce the amount of time spent replenishing RX buffers by only doing
so once available buffers has fallen under a certain threshold, in this
case half of the total number of buffers, or if the polling loop exits
before the packets processed is less than its budget. Non-exhaustion of
NAPI budget implies lower incoming packet pressure, allowing the leeway
to refill the buffers in preparation for any impending burst.
Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Take advantage of the additional optimizations in netdev_alloc_skb when
allocating socket buffers to be used for packet reception.
Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Acked-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the current NAPI polling loop exits without completing it's
budget, only re-enable interrupts if there are no entries remaining
in the queue and napi_complete_done is successful. If there are entries
remaining on the queue that were missed, restart the polling loop.
Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
PCI bus slowdowns were observed on IBM VNIC devices as a result
of partial cache line writes and non-cache aligned full cache line writes.
Ensure that packet data buffers are cache-line aligned to avoid these
slowdowns.
Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It is not longer used, so remove it.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Acked-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove unused and superfluous code and members in
existing TX implementation and data structures.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Include support for the xmit_more feature utilizing the
H_SEND_SUB_CRQ_INDIRECT hypervisor call which allows the sending
of multiple subordinate Command Response Queue descriptors in one
hypervisor call via a DMA-mapped buffer. This update reduces hypervisor
calls and thus hypervisor call overhead per TX descriptor.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Utilize the H_SEND_SUB_CRQ_INDIRECT hypervisor call to send
multiple RX buffer descriptors to the device in one hypervisor
call operation. This change will reduce the number of hypervisor
calls and thus hypervisor call overhead needed to transmit
RX buffer descriptors to the device.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch introduces the infrastructure to send batched subordinate
Command Response Queue descriptors, which are used by the ibmvnic
driver to send TX frame and RX buffer descriptors.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Acked-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some chip versions have a hw bug resulting in lost door bell rings.
To work around this the doorbell is also rung whenever we still have
tx descriptors in flight after having cleaned up tx descriptors.
These PCI(e) writes come at a cost, therefore let's reduce the number
of extra doorbell rings.
If skb is NULL then this means:
- last cleaned-up descriptor belongs to a skb with at least one fragment
and last fragment isn't marked as sent yet
- hw is in progress sending the skb, therefore no extra doorbell ring
is needed for this skb
- once last fragment is marked as transmitted hw will trigger
a tx done interrupt and we come here again (with skb != NULL)
and ring the doorbell if needed
Therefore skip the workaround doorbell ring if skb is NULL.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/0a15a83c-aecf-ab51-8071-b29d9dcd529a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Explicitly enable the FSL_XGMAC_MDIO Kconfig option in order to have
MDIO access to internal and external PHYs.
Fixes: 7194792308 ("dpaa2-eth: add MAC/PHY support through phylink")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20201119145106.712761-1-ciorneiioana@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that the driver supports nexthop objects, the check is no longer
necessary. Remove it.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the FIB info (i.e, 'struct fib_info', 'struct fib6_info') uses a
nexthop object, then use the object's identifier to resolve the nexthop
group.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Register a listener to the nexthop notification chain and parse notified
nexthop objects into the existing mlxsw nexthop data structures.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The call to pc_delete_flow can kfree the iter object, so the following
dev_err message that accesses iter->entry can accessmemory that has
just been kfree'd. Fix this by adding a temporary variable 'entry'
that has a copy of iter->entry and also use this when indexing into
the array mcam->entry2target_pffunc[]. Also print the unsigned value
using the %u format specifier rather than %d.
Addresses-Coverity: ("Read from pointer after free")
Fixes: 55307fcb92 ("octeontx2-af: Add mbox messages to install and delete MCAM rules")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201118143803.463297-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the variable err may be uninitialized if several of the if
statements are not executed in function nix_tx_vtag_decfg and a garbage
value in err is returned. Fix this by initialized ret at the start of
the function.
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: 9a946def26 ("octeontx2-af: Modify nix_vtag_cfg mailbox to support TX VTAG entries")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201118132502.461098-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The shifting of the u16 result from ntohs(proto) by 16 bits to the
left will be promoted to a 32 bit signed int and then sign-extended
to a u64. In the event that the top bit of the return from ntohs(proto)
is set then all then all the upper 32 bits of a 64 bit long end up as
also being set because of the sign-extension. Fix this by casting to
a u64 long before the shift.
Addresses-Coverity: ("Unintended sign extension")
Fixes: f0c2982aaf ("octeontx2-pf: Add support for SR-IOV management function")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201118130520.460365-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
SMT entry is allocated only when loopback Source MAC
rewriting is requested. Accessing SMT entry for non
smac rewrite cases results in kernel panic.
Fix the panic caused by non smac rewrite
Fixes: 937d842058 ("cxgb4: set up filter action after rewrites")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Link: https://lore.kernel.org/r/20201118143213.13319-1-rajur@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On OcteonTX2 SoC, the admin function (AF) is the only one with all
priviliges to configure HW and alloc resources, PFs and it's VFs
have to request AF via mailbox for all their needs. This patch adds
a mailbox interface for CPT PFs and VFs to allocate resources
for cryptography. It also adds hardware CPT AF register defines.
Signed-off-by: Suheil Chandran <schandran@marvell.com>
Signed-off-by: Lukasz Bartosik <lbartosik@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On OcteonTX2 platform CPT instruction enqueue and NIX
packet send are only possible via LMTST operations which
uses LDEOR instruction. This patch moves lmt flush
function from OcteonTX2 nic driver to include/linux/soc
since it will be used by OcteonTX2 CPT and NIC driver for
LMTST.
Signed-off-by: Suheil Chandran <schandran@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Performance analysis counters are maintained under the MLX4_EN_PERF_STAT
definition, which is never set.
Clean them up, with all related structures and logic.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Link: https://lore.kernel.org/r/20201118103427.4314-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jump to init_err_release to cleanup. bnxt_unmap_bars() will also be
called but it will do nothing if the BARs are not mapped yet.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1605858271-8209-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert netsec driver to xdp_return_frame_bulk APIs.
Rely on xdp_return_frame_rx_napi for XDP_TX in order to try to recycle
the page in the "in-irq" page_pool cache.
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/01487b8f5167d62649339469cdd0c6d8df885902.1605605531.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the control buffer descriptor (cbd) fields have endianness
restrictions while the commands passed into the control buffers
don't (with one exception). This patch fixes offending code,
by adding endianness accessors for cbd fields and removing the
unnecessary ones in case of data buffer fields. Currently there's
no need to convert all commands to little endian format, the patch
only focuses on fixing current endianness issues reported by sparse.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These particular fields are specified in the H/W reference
manual as having network byte order format, so enforce big
endian annotation for them and clear the related sparse
warnings in the process.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Link: https://lore.kernel.org/r/1605792621-6268-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: c213eae8d3 ("bnxt_en: Improve VF/PF link change logic.")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Link: https://lore.kernel.org/r/1605701851-20270-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When performing a flash update via devlink, device drivers may inform
user space of status updates via
devlink_flash_update_(begin|end|timeout|status)_notify functions.
It is expected that drivers do not send any status notifications unless
they send a begin and end message. If a driver sends a status
notification without sending the appropriate end notification upon
finishing (regardless of success or failure), the current implementation
of the devlink userspace program can get stuck endlessly waiting for the
end notification that will never come.
The current ice driver implementation may send such a status message
without the appropriate end notification in rare cases.
Fixing the ice driver is relatively simple: we just need to send the
begin_notify at the start of the function and always send an end_notify
no matter how the function exits.
Rather than assuming driver authors will always get this right in the
future, lets just fix the API so that it is not possible to get wrong.
Make devlink_flash_update_begin_notify and
devlink_flash_update_end_notify static, and call them in devlink.c core
code. Always send the begin_notify just before calling the driver's
flash_update routine. Always send the end_notify just after the routine
returns regardless of success or failure.
Doing this makes the status notification easier to use from the driver,
as it no longer needs to worry about catching failures and cleaning up
by calling devlink_flash_update_end_notify. It is now no longer possible
to do the wrong thing in this regard. We also save a couple of lines of
code in each driver.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All drivers which implement the devlink flash update support, with the
exception of netdevsim, use either request_firmware or
request_firmware_direct to locate the firmware file. Rather than having
each driver do this separately as part of its .flash_update
implementation, perform the request_firmware within net/core/devlink.c
Replace the file_name parameter in the struct devlink_flash_update_params
with a pointer to the fw object.
Use request_firmware rather than request_firmware_direct. Although most
Linux distributions today do not have the fallback mechanism
implemented, only about half the drivers used the _direct request, as
compared to the generic request_firmware. In the event that
a distribution does support the fallback mechanism, the devlink flash
update ought to be able to use it to provide the firmware contents. For
distributions which do not support the fallback userspace mechanism,
there should be essentially no difference between request_firmware and
request_firmware_direct.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
and <crypto/sha3.h> contains declarations for SHA-3.
This organization is inconsistent, but more importantly SHA-1 is no
longer considered to be cryptographically secure. So to the extent
possible, SHA-1 shouldn't be grouped together with any of the other SHA
versions, and usage of it should be phased out.
Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
<crypto/sha2.h>, and make everyone explicitly specify whether they want
the declarations for SHA-1, SHA-2, or both.
This avoids making the SHA-1 declarations visible to files that don't
want anything to do with SHA-1. It also prepares for potentially moving
sha1.h into a new insecure/ or dangerous/ directory.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Slave function read the following capabilities from the wrong offset:
1. log_mc_entry_sz
2. fs_log_entry_sz
3. log_mc_hash_sz
Fix that by adjusting these capabilities offset to match firmware
layout.
Due to the wrong offset read, the following issues might occur:
1+2. Negative value reported at max_mcast_qp_attach.
3. Driver to init FW with multicast hash size of zero.
Fixes: a40ded6043 ("net/mlx4_core: Add masking for a few queries on HCA caps")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20201118081922.553-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In rtl_tx() the released descriptors are zero'ed by
rtl8169_unmap_tx_skb(). And in the beginning of rtl8169_start_xmit()
we check that enough descriptors are free, therefore there's no way
the DescOwn bit can be set here.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/6965d665-6c50-90c5-70e6-0bb335d4ea47@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl+0KZ4ACgkQSD+KveBX
+j5DDAgAip1LGfOUq7RISh1OWQ9zLl2KT/mmbdSioObGgjKunU4lMGZAPrLB0bpe
RH1RodslY+1bepcz7VQ/QxbL0EU605SZn08xLZIQAYYXT0Sar+hhg7h7vhD0iSA1
dwDF/XLQYHf8snc3x/tu6/zp9BzfzRfuiieYAbC1ri97Iz9uoYO+QcRI7OyqgXWf
LmK9/s6WzgaC0DKg8XXEQGp7F1MbZfJph6tThZnF62NY9Rrkut6AEah85p1QaqF7
9WLjqOtJwo893fzbN+WCXJTYo2UX9oU18nUSvSoYid2LEofiwY3PcT8R0gKSPZto
cVcjm1oGsotbonBnPlR2eMZq8LdPDQ==
=MKi5
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2020-11-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2020-11-17
This series introduces some fixes to mlx5 driver.
* tag 'mlx5-fixes-2020-11-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: fix error return code in mlx5e_tc_nic_init()
net/mlx5: E-Switch, Fail mlx5_esw_modify_vport_rate if qos disabled
net/mlx5: Disable QoS when min_rates on all VFs are zero
net/mlx5: Clear bw_share upon VF disable
net/mlx5: Add handling of port type in rule deletion
net/mlx5e: Fix check if netdev is bond slave
net/mlx5e: Fix IPsec packet drop by mlx5e_tc_update_skb
net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.
net/mlx5e: Fix refcount leak on kTLS RX resync
====================
Link: https://lore.kernel.org/r/20201117195702.386113-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function is responsible for allocating the adjacency entries used by
the nexthop group and populating them with the adjacency information
such as egress RIF and MAC address.
Allow the function to return an error when it encounters a problem and
have the relevant call sites check it.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>