netcons calls napi_poll with a budget of 0 to transmit packets.
Handle this by:
- skipping RX processing
- do not try to recycle TX packets to the RX cache
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
kmalloc returns KSEG0 addresses so convert back from KSEG1
in kfree. Also make sure array is freed when the driver is
unloaded from the kernel.
Fixes: ef11291bcd ("Add support the Korina (IDT RC32434) Ethernet MAC")
Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The VCAP_IS1_ACT_VID_REPLACE_ENA action, from the VCAP IS1 ingress TCAM,
changes the classified VLAN.
We are only exposing this ability for switch ports that are under VLAN
aware bridges. This is because in standalone ports mode and under a
bridge with vlan_filtering=0, the ocelot driver configures the switch to
operate as VLAN-unaware, so the classified VLAN is not derived from the
802.1Q header from the packet, but instead is always equal to the
port-based VLAN ID of the ingress port. We _can_ still change the
classified VLAN for packets when operating in this mode, but the end
result will most likely be a drop, since both the ingress and the egress
port need to be members of the modified VLAN. And even if we install the
new classified VLAN into the VLAN table of the switch, the result would
still not be as expected: we wouldn't see, on the output port, the
modified VLAN tag, but the original one, even though the classified VLAN
was indeed modified. This is because of how the hardware works: on
egress, what is pushed to the frame is a "port tag", which gives us the
following options:
- Tag all frames with port tag (derived from the classified VLAN)
- Tag all frames with port tag, except if the classified VLAN is 0 or
equal to the native VLAN of the egress port
- No port tag
Needless to say, in VLAN-unaware mode we are disabling the port tag.
Otherwise, the existing VLAN tag would be ignored, and a second VLAN
tag (the port tag), holding the classified VLAN, would be pushed
(instead of replacing the existing 802.1Q tag). This is definitely not
what the user wanted when installing a "vlan modify" action.
So it is simply not worth bothering with VLAN modify rules under other
configurations except when the ports are fully VLAN-aware.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is a methodical transition of the driver from phylib
to phylink, following the guidelines from sfp-phylink.rst.
The MAC register configurations based on interface mode
were moved from the probing path to the mac_config() hook.
MAC enable and disable commands (enabling Rx and Tx paths
at MAC level) were also extracted and assigned to their
corresponding phylink hooks.
As part of the migration to phylink, the serdes configuration
from the driver was offloaded to the PCS_LYNX module,
introduced in commit 0da4c3d393 ("net: phy: add Lynx PCS module"),
the PCS_LYNX module being a mandatory component required to
make the enetc driver work with phylink.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.cionei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Decouple internal mdio bus creation from serdes
configuration, as a prerequisite to offloading
serdes configuration to a different module.
Group together mdio bus creation routines, cleanup.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Decouple level MAC configuration based on phy interface type
from general port configuration.
Group together MAC and link configuration code.
Decouple external mdio bus creation from interface type
parsing. No longer return an (unhandled) error code when
phy_node not found, use phy_node to indicate whether the
port has a phy or not. No longer fall-through when serdes
configuration fails for the link modes that require
internal link configuration.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When packets are received on the error queue, this function under
net_ratelimit():
netif_err(priv, hw, net_dev, "Err FD status = 0x%08x\n");
does not get printed. Instead we only see:
[ 3658.845592] net_ratelimit: 244 callbacks suppressed
[ 3663.969535] net_ratelimit: 230 callbacks suppressed
[ 3669.085478] net_ratelimit: 228 callbacks suppressed
Enabling NETIF_MSG_HW fixes this issue, and we can see some information
about the frame descriptors of packets.
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Factor out handling the private packet/byte counters to new
functions rtl_get_priv_stats() and rtl_inc_priv_stats().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make use of the new struct_size() helper instead of the offsetof() idiom.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A subsequent addition of an IP4 or IP6 rule after other rules would
overwrite any existing TCAM entries of related L4 protocols(ex: tcp4 or
udp6). This was due to the mask including too many TCAM entries. Add new
packet type masks with bits properly excluded so rules are not overwritten.
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Tested-by: Brijesh Behera <brijeshx.behera@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pointers should be casted to unsigned long to avoid
-Wpointer-to-int-cast warnings:
drivers/net/ethernet/intel/ice/ice_flow.h:197:33: warning:
cast from pointer to integer of different size
drivers/net/ethernet/intel/ice/ice_flow.h:198:32: warning:
cast to pointer from integer of different size
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While debugging a recent failure to update the flash of an ice device,
I found it helpful to add additional logging which helped determine the
root cause of the problem being a timeout issue.
Add some extra dev_dbg() logging messages which can be enabled using the
dynamic debug facility, including one for ice_aq_wait_for_event that
will use jiffies to capture a rough estimate of how long we waited for
the completion of a firmware command.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Brijesh Behera <brijeshx.behera@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the devlink_port structure is stored within the ice_pf. This
made sense because we create a single devlink_port for each PF. This
setup does not mesh with the abstractions in the driver very well, and
led to a flow where we accidentally call devlink_port_unregister twice
during error cleanup.
In particular, if devlink_port_register or devlink_port_unregister are
called twice, this leads to a kernel panic. This appears to occur during
some possible flows while cleaning up from a failure during driver
probe.
If register_netdev fails, then we will call devlink_port_unregister in
ice_cfg_netdev as it cleans up. Later, we again call
devlink_port_unregister since we assume that we must cleanup the port
that is associated with the PF structure.
This occurs because we cleanup the devlink_port for the main PF even
though it was not allocated. We allocated the port within a per-VSI
function for managing the main netdev, but did not release the port when
cleaning up that VSI, the allocation and destruction are not aligned.
Instead of attempting to manage the devlink_port as part of the PF
structure, manage it as part of the PF VSI. Doing this has advantages,
as we can match the de-allocation of the devlink_port with the
unregister_netdev associated with the main PF VSI.
Moving the port to the VSI is preferable as it paves the way for
handling devlink ports allocated for other purposes such as SR-IOV VFs.
Since we're changing up how we allocate the devlink_port, also change
the indexing. Originally, we indexed the port using the PF id number.
This came from an old goal of sharing a devlink for each physical
function. Managing devlink instances across multiple function drivers is
not workable. Instead, lets set the port number to the logical port
number returned by firmware and set the index using the VSI index
(sometimes referred to as VSI handle).
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add "fw.app.bundle_id" to display the DDP Track ID of the active DDP
package. This id is similar to "fw.bundle_id" and is a unique identifier
for the DDP package that is loaded in the device. Each new DDP has
a unique Track ID generated for it, and the ID can be used to identify
and track the DDP package.
Add documentation for the new devlink info version.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ice_info_get_dsn always returns 0, so just make it void.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A new test in checkpatch detects repeated words; cleanup all pre-existing
occurrences of those now.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Co-developed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use %*phD format to print small buffer as hex string.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for devlink reload action fw_activate with reload limit
no_reset which does firmware live patching, updating the firmware image
without reset, no downtime and no configuration lose. The driver checks
if the firmware is capable of handling the pending firmware changes as a
live patch. If it is then it triggers firmware live patching flow.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Firmware live patch event notifies the driver that the firmware was just
updated using live patch. In such case the driver should not reload or
re-initiate entities, part to updating the firmware version and
re-initiate the firmware tracer which can be updated by live patch with
new strings database to help debugging an issue.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The enable_remote_dev_reset devlink param flags that the host admin
allows resets by other hosts. In case it is cleared mlx5 host PF driver
will send NACK on pci sync for firmware update reset request and the
command will fail.
By default enable_remote_dev_reset parameter is true, so pci sync for
firmware update reset is enabled.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for devlink reload action fw_activate. To activate firmware
image the mlx5 driver resets the firmware and reloads it from flash. If
a new image was stored on flash it will be loaded. Once this reload
command is executed the driver initiates fw sync reset flow, where the
firmware synchronizes all PFs on coming reset and driver reload.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If firmware sends sync_reset_abort to driver the driver should clear the
reset requested mode as reset is not expected any more.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On sync_reset_now event the driver does reload and PCI link toggle to
activate firmware upgrade reset. When the firmware sends this event it
syncs the event on all PFs, so all PFs will do PCI link toggle at once.
To do PCI link toggle, the driver ensures that no other device ID under
the same bridge by checking that all the PF functions under the same PCI
bridge have same device ID. If no other device it uses PCI bridge link
control to turn link down and up.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Once the driver gets sync_reset_request from firmware it prepares for the
coming reset and sends acknowledge.
After getting this event the driver expects device reset, either it will
trigger PCI reset on sync_reset_now event or such PCI reset will be
triggered by another PF of the same device. So it moves to reset
requested mode and if it gets PCI reset triggered by the other PF it
detect the reset and reloads.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Set capability to notify the firmware that this host driver is capable
of handling pci sync for firmware update events.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add functions to query and set the MFRL reset options supported by
firmware.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add reload limit to demand restrictions on reload actions.
Reload limits supported:
no_reset: No reset allowed, no down time allowed, no link flap and no
configuration is lost.
By default reload limit is unspecified and so no constraints on reload
actions are required.
Some combinations of action and limit are invalid. For example, driver
can not reinitialize its entities without any downtime.
The no_reset reload limit will have usecase in this patchset to
implement restricted fw_activate on mlx5.
Have the uapi parameter of reload limit ready for future support of
multiselection.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add devlink reload action to allow the user to request a specific reload
action. The action parameter is optional, if not specified then devlink
driver re-init action is used (backward compatible).
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.
Reload actions supported are:
driver_reinit: driver entities re-initialization, applying devlink-param
and devlink-resource values.
fw_activate: firmware activate.
command examples:
$devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
driver_reinit
$devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
driver_reinit fw_activate
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The phy_reset_after_clk_enable() does a PHY reset, which means the PHY
loses its register settings. The fec_enet_mii_probe() starts the PHY
and does the necessary calls to configure the PHY via PHY framework,
and loads the correct register settings into the PHY. Therefore,
fec_enet_mii_probe() should be called only after the PHY has been
reset, not before as it is now.
Fixes: 1b0a83ac04 ("net: fec: add phy_reset_after_clk_enable() support")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Richard Leitner <richard.leitner@skidata.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Christoph Niedermaier <cniedermaier@dh-electronics.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Without these definitions, the driver will crash in:
mscc_ocelot_probe
-> ocelot_init
-> ocelot_vcap_init
-> __ocelot_target_read_ix
I missed this because I did not have the VSC7514 hardware to test, only
the VSC9959 and VSC9953, and the probing part is different.
Fixes: e3aea296d8 ("net: mscc: ocelot: add definitions for VCAP ES0 keys, actions and target")
Fixes: a61e365d7c ("net: mscc: ocelot: add definitions for VCAP IS1 keys, actions and target")
Reported-by: Divya Koppera <Divya.Koppera@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Small conflict around locking in rxrpc_process_event() -
channel_lock moved to bundle in next, while state lock
needs _bh() from net.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some firmware files trigger a PHY soft reset and don't wait for it to
be finished. PHY register writes directly after applying the firmware
may fail or provide unexpected results therefore. Fix this by waiting
for bit BMCR_RESET to be cleared after applying firmware.
There's nothing wrong with the referenced change, it's just that the
fix will apply cleanly only after this change.
Fixes: 89fbd26cca ("r8169: fix firmware not resetting tp->ocp_base")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mediadetect is another name for the EDPD (energy detect power down).
This feature allows device to save extra power when no link is available.
PHY goes into the extreme power saving mode and only periodically wakes up
and checks for the link.
AQC devices has fixed check period of 6 seconds
The feature may increase linkup time.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
PHY downshift allows phy to try renegotiate if link is unstable
and can carry higher speed.
AQC devices has integrated PHY which is controlled by MAC firmware.
Thus, driver defines new ethtool callbacks to implement phy tunables
via netdev.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is an upper bound to the value that a watermark may hold. That
upper bound is not immediately obvious during configuration, and it
might be possible to have accidental truncation.
Actually this has happened already, add a warning to prevent it from
happening again.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tail dropping is enabled for a port when:
1. A source port consumes more packet buffers than the watermark encoded
in SYS:PORT:ATOP_CFG.ATOP.
AND
2. Total memory use exceeds the consumption watermark encoded in
SYS:PAUSE_CFG:ATOP_TOT_CFG.
The unit of these watermarks is a 60 byte memory cell. That unit is
programmed properly into ATOP_TOT_CFG, but not into ATOP. Actually when
written into ATOP, it would get truncated and wrap around.
Fixes: a556c76adc ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rejecting non-native endian BTF overlapped with the addition
of support for it.
The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.
Signed-off-by: David S. Miller <davem@davemloft.net>
A driver may refuse to enable VLAN filtering for any reason beyond what
the DSA framework cares about, such as:
- having tc-flower rules that rely on the switch being VLAN-aware
- the particular switch does not support VLAN, even if the driver does
(the DSA framework just checks for the presence of the .port_vlan_add
and .port_vlan_del pointers)
- simply not supporting this configuration to be toggled at runtime
Currently, when a driver rejects a configuration it cannot support, it
does this from the commit phase, which triggers various warnings in
switchdev.
So propagate the prepare phase to drivers, to give them the ability to
refuse invalid configurations cleanly and avoid the warnings.
Since we need to modify all function prototypes and check for the
prepare phase from within the drivers, take that opportunity and move
the existing driver restrictions within the prepare phase where that is
possible and easy.
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Landen Chao <Landen.Chao@mediatek.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Jonathan McDowell <noodles@earth.li>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
clang static analysis reports this problem:
drivers/net/ethernet/marvell/mvneta.c:3465:2: warning:
Attempt to free released memory
kfree(txq->buf);
^~~~~~~~~~~~~~~
When mvneta_txq_sw_init() fails to alloc txq->tso_hdrs,
it frees without poisoning txq->buf. The error is caught
in the mvneta_setup_txqs() caller which handles the error
by cleaning up all of the txqs with a call to
mvneta_txq_sw_deinit which also frees txq->buf.
Since mvneta_txq_sw_deinit is a general cleaner, all of the
partial cleaning in mvneta_txq_sw_deinit()'s error handling
is not needed.
Fixes: 2adb719d74 ("net: mvneta: Implement software TSO")
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the driver will schedule RX ring reset when we get a buffer
error in the RX completion record. These RX buffer errors can be due
to normal out-of-buffer conditions or a permanent error in the RX
ring. Because the driver cannot distinguish between these 2
conditions, we assume all these buffer errors require reset.
This is very disruptive when it is just a normal out-of-buffer
condition. Newer firmware will now monitor the rings for the permanent
failure and will send a notification to the driver when it happens.
This allows the driver to reset only when such a notification is
received. In environments where we have predominently out-of-buffer
conditions, we now can avoid these unnecessary resets.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is logic in the RX path to detect unexpected handles in the
RX completion. We'll print a warning and schedule a reset. The
next expected handle is then set to 0xffff which is guaranteed to
not match any valid handle. This will force all remaining packets in
the ring to be discarded before the reset. There can be hundreds of
these packets remaining in the ring and there is no need to print the
warnings for these forced errors.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a per ring rx_resets counter to count these RX resets.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some older chips, it is necessary to do a reset when we get buffer
errors associated with an RX ring. These buffer errors may become
frequent if the RX ring underruns under heavy traffic. The current
code does a global reset of all reasources when this happens. This
works but creates a big disruption of all rings when one RX ring is
having problem. This patch implements a localized RX ring reset of
just the RX ring having the issue. All other rings including all
TX rings will not be affected by this single RX ring reset.
Only the older chips prior to the P5 class supports this reset.
Because it is not a global reset, packets may still be arriving
while we are calling firmware to reset that ring. We need to be
sure that we don't post any buffers during this time while the
ring is undergoing reset. After firmware completes successfully,
the ring will be in the reset state with no buffers and we can start
filling it with new buffers and posting them.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_init_one_rx_ring() includes logic to initialize the BDs for one RX
ring and to allocate the buffers. Separate the allocation logic into a
new bnxt_alloc_one_rx_ring() function. The allocation function will be
used later to allocate new buffers for one specified RX ring when we
reset that RX ring.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_free_rx_skbs() frees all the allocated buffers and SKBs for
every RX ring. Refactor this function by calling a new function
bnxt_free_one_rx_ring_skbs() to free these buffers on one specified
RX ring at a time. This is preparation work for resetting one RX
ring during run-time.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If firmware does not come out of reset, log FW health status info
to provide more information on firmware status.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NS3 SoC platforms require assistance from the OP-TEE to recover
firmware if a crash occurs while no driver is bound. The
CRASHED_NO_MASTER condition is recorded in the firmware status register
during the crash to indicate when driver intervension is needed to
coordinate a firmware reload. This condition is detected during early
driver initialization in order to effect a firmware fastboot on
supported platforms when necessary.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware now supports device independent discovery of the status
register location. This status register can provide more detailed
information about firmware errors, especially if problems occur
before the HWRM interface is functioning. Attempt to map this
register if it is present and report the firmware status on firmware
init failures.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The allocator for the firmware health structure conflates allocation
and capability checks, limiting the reusability of the code. This patch
separates out the capability check and disablement and improves the
warning message to better describe the consequences of an allocation
failure.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Main changes is to extend hwrm_nvm_get_dev_info_output() for stored
firmware versions and a new flag is added to fw_status_reg.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix many (lots deleted here) build errors in hinic by selecting NET_DEVLINK.
ld: drivers/net/ethernet/huawei/hinic/hinic_hw_dev.o: in function `mgmt_watchdog_timeout_event_handler':
hinic_hw_dev.c:(.text+0x30a): undefined reference to `devlink_health_report'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_fw_reporter_dump':
hinic_devlink.c:(.text+0x1c): undefined reference to `devlink_fmsg_u32_pair_put'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_fw_reporter_dump':
hinic_devlink.c:(.text+0x126): undefined reference to `devlink_fmsg_binary_pair_put'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_hw_reporter_dump':
hinic_devlink.c:(.text+0x1ba): undefined reference to `devlink_fmsg_string_pair_put'
ld: hinic_devlink.c:(.text+0x227): undefined reference to `devlink_fmsg_u8_pair_put'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_alloc':
hinic_devlink.c:(.text+0xaee): undefined reference to `devlink_alloc'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_free':
hinic_devlink.c:(.text+0xb04): undefined reference to `devlink_free'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_register':
hinic_devlink.c:(.text+0xb26): undefined reference to `devlink_register'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_unregister':
hinic_devlink.c:(.text+0xb46): undefined reference to `devlink_unregister'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_health_reporters_create':
hinic_devlink.c:(.text+0xb75): undefined reference to `devlink_health_reporter_create'
ld: hinic_devlink.c:(.text+0xb95): undefined reference to `devlink_health_reporter_create'
ld: hinic_devlink.c:(.text+0xbac): undefined reference to `devlink_health_reporter_destroy'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_health_reporters_destroy':
Fixes: 51ba902a16 ("net-next/hinic: Initialize hw interface")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Bin Luo <luobin9@huawei.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Cc: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ethtool manual stated that the tx-timer is the "the amount of time the
device should stay in idle mode prior to asserting its Tx LPI". The
previous implementation for "ethtool --set-eee tx-timer" sets the LPI TW
timer duration which is not correct. Hence, this patch fixes the
"ethtool --set-eee tx-timer" to configure the EEE LPI timer.
The LPI TW Timer will be using the defined default value instead of
"ethtool --set-eee tx-timer" which follows the EEE LS timer implementation.
Changelog V2
*Not removing/modifying the eee_timer.
*EEE LPI timer can be configured through ethtool and also the eee_timer
module param.
*EEE TW Timer will be configured with default value only, not able to be
configured through ethtool or module param. This follows the implementation
of the EEE LS Timer.
Fixes: d765955d2a ("stmmac: add the Energy Efficient Ethernet support")
Signed-off-by: Vineetha G. Jaya Kumaran <vineetha.g.jaya.kumaran@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the new group of devlink traps - PARSER_ERROR_DROPS.
This consists of registering the array of parser error drops supported,
controlling their action through the .trap_group_action_set() callback
and reporting an erroneous skb received on the error queue
appropriately.
DPAA2 devices do not support controlling the action of independent
parser error traps, thus the .trap_action_set() callback just returns an
EOPNOTSUPP while .trap_group_action_set() actually notifies the hardware
what it should do with a frame marked as having a header error.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add basic support in dpaa2-eth for devlink. For the moment, just
register the device with devlink, add the corresponding devlink port and
implement the .info_get() callback.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the new firmware image downladed for update is corrupted
or is a bad format, the download process will report a status
code specifically for that.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the lif's ident information for the uc and mc filter
counts rather than the ionic's version, to be sure
we're getting the info that is specific to this lif.
While we're thinking about it, add some missing error
checking where we get the lif's identity information.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
After we do a fw upgrade and refill the ionic->ident.dev, we
also need to update the other identity info. Since the lif
identity needs to be updated each time the ionic identity is
refreshed, we can pull it into ionic_identify().
The debugfs entry is moved so that it doesn't cause an
error message when the data is refreshed after the fw upgrade.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some time ago we short-circuited the queue disables on a timeout
error in order to not have to wait on every queue when we already
know it will time out. However, this meant that we're not
properly stopping all the interrupts and napi contexts. This
changes queue disable to always call ionic_qcq_disable() and to
give it an argument to know when to not do the adminq request.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a couple of error recovery paths that can come through
ionic_qcq_disable() without having set up the qcq, so we need
to make sure we have a valid qcq pointer before using it.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clear our link check requested flag on an allocation error.
We end up dropping this link check request, but that should
be fine as our watchdog will come back a few seconds later
and request it again.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check through our work list for additional items. This normally
will only have one item, but occasionally may have another
job waiting. There really is no need reschedule ourself here.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The event notification queue is set up a little differently in the
NIC and so the notifyq q and cq descriptor structures need to be
contiguous, which got missed in an earlier patch that separated
out the q and cq descriptor allocations. That patch was aimed at
making the big tx and rx descriptor queue allocations easier to
manage - the notifyq is much smaller and doesn't need to be split.
This patch simply adds an if/else and slightly different code for
the notifyq descriptor allocation.
Fixes: ea5a8b09dc ("ionic: reduce contiguous memory allocation requirement")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl93ap4ACgkQSD+KveBX
+j5+eAf/dvxx+WyWr5pV3gxd0x7K/wV+F1JFVe99k8yH6kYbpo56U+oRQGP4kvnG
4Ggb/XE7hSahvReRVD5vn4LKk2RQ/GMWEurF/GQDPklaHZyZHtcI3+2C/azEHnf+
vGgbM1xDT0gNZoa+2pA7LBgruJF/k+gRbth6EHrjlcxqiqt2k4d5Hs0m/Xd5R0TC
D+Yks3uHAOrTiP2idOWNoWmd5AOmh802wX0w4iyKZ9ZfJGMN3t2AKyZDIhNfPyPf
avebIihkr5y5DsYGZE+HjZjK0+vXaKVAGgzDbeLZ2sPVdCJJAFRZptG6mPJ0d0N3
TPwcBZcs5BsJkEQ5XqBA0IwJjksVeA==
=421I
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2020-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
From: Saeed Mahameed <saeedm@nvidia.com>
====================
This series introduces some fixes to mlx5 driver.
v1->v2:
- Patch #1 Don't return while mutex is held. (Dave)
v2->v3:
- Drop patch #1, will consider a better approach (Jakub)
- use cpu_relax() instead of cond_resched() (Jakub)
- while(i--) to reveres a loop (Jakub)
- Drop old mellanox email sign-off and change the committer email
(Jakub)
Please pull and let me know if there is any problem.
For -stable v4.15
('net/mlx5e: Fix VLAN cleanup flow')
('net/mlx5e: Fix VLAN create flow')
For -stable v4.16
('net/mlx5: Fix request_irqs error flow')
For -stable v5.4
('net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU')
('net/mlx5: Avoid possible free of command entry while timeout comp handler')
For -stable v5.7
('net/mlx5e: Fix return status when setting unsupported FEC mode')
For -stable v5.8
('net/mlx5e: Fix race condition on nhe->n pointer in neigh update')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Via the OCELOT_MASK_MODE_REDIRECT flag put in the IS2 action vector, it
is possible to replace previous forwarding decisions with the port mask
installed in this rule.
I have studied Table 54 "MASK_MODE and PORT_MASK Combinations" from the
VSC7514 documentation and it appears to behave sanely when this rule is
installed in either lookup 0 or 1. Namely, a redirect in lookup 1 will
overwrite the forwarding decision taken by any entry in lookup 0.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The issue which led to the introduction of this check was that MAC_ETYPE
rules, such as filters on dst_mac and src_mac, would only match non-IP
frames. There is a knob in VCAP_S2_CFG which forces all IP frames to be
treated as non-IP, which is what we're currently doing if the user
requested a dst_mac filter, in order to maintain sanity.
But that knob is actually per IS2 lookup. And the good thing with
exposing the lookups to the user via tc chains is that we're now able to
offload MAC_ETYPE keys to one lookup, and IP keys to the other lookup.
So let's do that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We were installing TCAM rules with the LOOKUP field as unmasked, meaning
that all entries were matching on all lookups. Now that lookups are
exposed as individual chains, let's make the LOOKUP explicit when
offloading TCAM entries.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VCAP ES0 is an egress VCAP operating on all outgoing frames.
This patch added ES0 driver to support vlan push action of tc filter.
Usage:
tc filter add dev swp1 egress protocol 802.1Q flower indev swp0 skip_sw \
vlan_id 1 vlan_prio 1 action vlan push id 2 priority 2
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VCAP IS1 is a VCAP module which can filter on the most common L2/L3/L4
Ethernet keys, and modify the results of the basic QoS classification
and VLAN classification based on those flow keys.
There are 3 VCAP IS1 lookups, mapped over chains 10000, 11000 and 12000.
Currently the driver is hardcoded to use IS1_ACTION_TYPE_NORMAL half
keys.
Note that the VLAN_MANGLE has been omitted for now. In hardware, the
VCAP_IS1_ACT_VID_REPLACE_ENA field replaces the classified VLAN
(metadata associated with the frame) and not the VLAN from the header
itself. There are currently some issues which need to be addressed when
operating in standalone, or in bridge with vlan_filtering=0 modes,
because in those cases the switch ports have VLAN awareness disabled,
and changing the classified VLAN to anything other than the pvid causes
the packets to be dropped. Another issue is that on egress, we expect
port tagging to push the classified VLAN, but port tagging is disabled
in the modes mentioned above, so although the classified VLAN is
replaced, it is not visible in the packet transmitted by the switch.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For Ocelot switches, there are 2 ingress pipelines for flow offload
rules: VCAP IS1 (Ingress Classification) and IS2 (Security Enforcement).
IS1 and IS2 support different sets of actions. The pipeline order for a
packet on ingress is:
Basic classification -> VCAP IS1 -> VCAP IS2
Furthermore, IS1 is looked up 3 times, and IS2 is looked up twice (each
TCAM entry can be configured to match only on the first lookup, or only
on the second, or on both etc).
Because the TCAMs are completely independent in hardware, and because of
the fixed pipeline, we actually have very limited options when it comes
to offloading complex rules to them while still maintaining the same
semantics with the software data path.
This patch maps flow offload rules to ingress TCAMs according to a
predefined chain index number. There is going to be a script in
selftests that clarifies the usage model.
There is also an egress TCAM (VCAP ES0, the Egress Rewriter), which is
modeled on top of the default chain 0 of the egress qdisc, because it
doesn't have multiple lookups.
Suggested-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Co-developed-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the mscc_ocelot_switch_lib is common between a pure switchdev and
a DSA driver, the procedure of retrieving a net_device for a certain
port index differs, as those are registered by their individual
front-ends.
Up to now that has been dealt with by always passing the port index to
the switch library, but now, we're going to need to work with net_device
pointers from the tc-flower offload, for things like indev, or mirred.
It is not desirable to refactor that, so let's make sure that the flower
offload core has the ability to translate between a net_device and a
port index properly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At this stage, the tc-flower offload of mscc_ocelot can only delegate
rules to the VCAP IS2 security enforcement block. These rules have, in
hardware, separate bits for policing and for overriding the destination
port mask and/or copying to the CPU. So it makes sense that we attempt
to expose some more of that low-level complexity instead of simply
choosing between a single type of action.
Something similar happens with the VCAP IS1 block, where the same action
can contain enable bits for VLAN classification and for QoS
classification at the same time.
So model the action structure after the hardware description, and let
the high-level ocelot_flower.c construct an action vector from multiple
tc actions.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When interface is attached while in promiscuous mode and with VLAN
filtering turned off, both configurations are not respected and VLAN
filtering is performed.
There are 2 flows which add the any-vid rules during interface attach:
VLAN creation table and set rx mode. Each is relaying on the other to
add any-vid rules, eventually non of them does.
Fix this by adding any-vid rules on VLAN creation regardless of
promiscuous mode.
Fixes: 9df30601c8 ("net/mlx5e: Restore vlan filter after seamless reset")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Prior to this patch unloading an interface in promiscuous mode with RX
VLAN filtering feature turned off - resulted in a warning. This is due
to a wrong condition in the VLAN rules cleanup flow, which left the
any-vid rules in the VLAN steering table. These rules prevented
destroying the flow group and the flow table.
The any-vid rules are removed in 2 flows, but none of them remove it in
case both promiscuous is set and VLAN filtering is off. Fix the issue by
changing the condition of the VLAN table cleanup flow to clean also in
case of promiscuous mode.
mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 20 wasn't destroyed, refcount > 1
mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 19 wasn't destroyed, refcount > 1
mlx5_core 0000:00:08.0: mlx5_destroy_flow_table:2112:(pid 28729): Flow table 262149 wasn't destroyed, refcount > 1
...
...
------------[ cut here ]------------
FW pages counter is 11560 after reclaiming all pages
WARNING: CPU: 1 PID: 28729 at
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:660
mlx5_reclaim_startup_pages+0x178/0x230 [mlx5_core]
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
mlx5_function_teardown+0x2f/0x90 [mlx5_core]
mlx5_unload_one+0x71/0x110 [mlx5_core]
remove_one+0x44/0x80 [mlx5_core]
pci_device_remove+0x3e/0xc0
device_release_driver_internal+0xfb/0x1c0
device_release_driver+0x12/0x20
pci_stop_bus_device+0x68/0x90
pci_stop_and_remove_bus_device+0x12/0x20
hv_eject_device_work+0x6f/0x170 [pci_hyperv]
? __schedule+0x349/0x790
process_one_work+0x206/0x400
worker_thread+0x34/0x3f0
? process_one_work+0x400/0x400
kthread+0x126/0x140
? kthread_park+0x90/0x90
ret_from_fork+0x22/0x30
---[ end trace 6283bde8d26170dc ]---
Fixes: 9df30601c8 ("net/mlx5e: Restore vlan filter after seamless reset")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Verify the configured FEC mode is supported by at least a single link
mode before applying the command. Otherwise fail the command and return
"Operation not supported".
Prior to this patch, the command was successful, yet it falsely set all
link modes to FEC auto mode - like configuring FEC mode to auto. Auto
mode is the default configuration if a link mode doesn't support the
configured FEC mode.
Fixes: b5ede32d33 ("net/mlx5e: Add support for FEC modes based on 50G per lane links")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Declare GRE offload support with respect to the inner protocol. Add a
list of supported inner protocols on which the driver can offload
checksum and GSO. For other protocols, inform the stack to do the needed
operations. There is no noticeable impact on GRE performance.
Fixes: 2729984149 ("net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit introduced the following coverity issue at function
mlx5_tc_ct_rule_to_tuple_nat:
- Memory - corruptions (OVERRUN)
Overrunning array "tuple->ip.src_v6.in6_u.u6_addr32" of 4 4-byte
elements at element index 7 (byte offset 31) using index
"ip6_offset" (which evaluates to 7).
In case of IPv6 destination address rewrite, ip6_offset values are
between 4 to 7, which will cause memory overrun of array
"tuple->ip.src_v6.in6_u.u6_addr32" to array
"tuple->ip.dst_v6.in6_u.u6_addr32".
Fixed by writing the value directly to array
"tuple->ip.dst_v6.in6_u.u6_addr32" in case ip6_offset values are
between 4 to 7.
Fixes: bc562be967 ("net/mlx5e: CT: Save ct entries tuples in hashtables")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Prior to this fix, in Striding RQ mode the driver was vulnerable when
receiving packets in the range (stride size - headroom, stride size].
Where stride size is calculated by mtu+headroom+tailroom aligned to the
closest power of 2.
Usually, this filtering is performed by the HW, except for a few cases:
- Between 2 VFs over the same PF with different MTUs
- On bluefield, when the host physical function sets a larger MTU than
the ARM has configured on its representor and uplink representor.
When the HW filtering is not present, packets that are larger than MTU
might be harmful for the RQ's integrity, in the following impacts:
1) Overflow from one WQE to the next, causing a memory corruption that
in most cases is unharmful: as the write happens to the headroom of next
packet, which will be overwritten by build_skb(). In very rare cases,
high stress/load, this is harmful. When the next WQE is not yet reposted
and points to existing SKB head.
2) Each oversize packet overflows to the headroom of the next WQE. On
the last WQE of the WQ, where addresses wrap-around, the address of the
remainder headroom does not belong to the next WQE, but it is out of the
memory region range. This results in a HW CQE error that moves the RQ
into an error state.
Solution:
Add a page buffer at the end of each WQE to absorb the leak. Actually
the maximal overflow size is headroom but since all memory units must be
of the same size, we use page size to comply with UMR WQEs. The increase
in memory consumption is of a single page per RQ. Initialize the mkey
with all MTTs pointing to a default page. When the channels are
activated, UMR WQEs will redirect the RX WQEs to the actual memory from
the RQ's pool, while the overflow MTTs remain mapped to the default page.
Fixes: 73281b78a3 ("net/mlx5e: Derive Striding RQ size from MTU")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Increase granularity of the error path to avoid unneeded free/release.
Fix the cleanup to be symmetric to the order of creation.
Fixes: 0ddf543226 ("xdp/mlx5: setup xdp_rxq_info")
Fixes: 422d4c401e ("net/mlx5e: RX, Split WQ objects for different RQ types")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In case of pci is offline reclaim_pages_cmd() will still try to call
the FW to release FW pages, cmd_exec() in this case will return a silent
success without actually calling the FW.
This is wrong and will cause page leaks, what we should do is to detect
pci offline or command interface un-available before tying to access the
FW and manually release the FW pages in the driver.
In this patch we share the code to check for FW command interface
availability and we call it in sensitive places e.g. reclaim_pages_cmd().
Alternative fix:
1. Remove MLX5_CMD_OP_MANAGE_PAGES form mlx5_internal_err_ret_value,
command success simulation list.
2. Always Release FW pages even if cmd_exec fails in reclaim_pages_cmd().
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
It is possible that new command entry index allocation will temporarily
fail. The new command holds the semaphore, so it means that a free entry
should be ready soon. Add one second retry mechanism before returning an
error.
Patch "net/mlx5: Avoid possible free of command entry while timeout comp
handler" increase the possibility to bump into this temporarily failure
as it delays the entry index release for non-callback commands.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Once driver detects a command interface command timeout, it warns the
user and returns timeout error to the caller. In such case, the entry of
the command is not evacuated (because only real event interrupt is allowed
to clear command interface entry). If the HW event interrupt
of this entry will never arrive, this entry will be left unused forever.
Command interface entries are limited and eventually we can end up without
the ability to post a new command.
In addition, if driver will not consume the EQE of the lost interrupt and
rearm the EQ, no new interrupts will arrive for other commands.
Add a resiliency mechanism for manually polling the command EQ in case of
a command timeout. In case resiliency mechanism will find non-handled EQE,
it will consume it, and the command interface will be fully functional
again. Once the resiliency flow finished, wait another 5 seconds for the
command interface to complete for this command entry.
Define mlx5_cmd_eq_recover() to manage the cmd EQ polling resiliency flow.
Add an async EQ spinlock to avoid races between resiliency flows and real
interrupts that might run simultaneously.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Upon command completion timeout, driver simulates a forced command
completion. In a rare case where real interrupt for that command arrives
simultaneously, it might release the command entry while the forced
handler might still access it.
Fix that by adding an entry refcount, to track current amount of allowed
handlers. Command entry to be released only when this refcount is
decremented to zero.
Command refcount is always initialized to one. For callback commands,
command completion handler is the symmetric flow to decrement it. For
non-callback commands, it is wait_func().
Before ringing the doorbell, increment the refcount for the real completion
handler. Once the real completion handler is called, it will decrement it.
For callback commands, once the delayed work is scheduled, increment the
refcount. Upon callback command completion handler, we will try to cancel
the timeout callback. In case of success, we need to decrement the callback
refcount as it will never run.
In addition, gather the entry index free and the entry free into a one
flow for all command types release.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As part of driver unload, it destroys the commands EQ (via FW command).
As the commands EQ is destroyed, FW will not generate EQEs for any command
that driver sends afterwards. Driver should poll for later commands status.
Driver commands mode metadata is updated before the commands EQ is
actually destroyed. This can lead for double completion handle by the
driver (polling and interrupt), if a command is executed and completed by
FW after the mode was changed, but before the EQ was destroyed.
Fix that by using the mlx5_cmd_allowed_opcode mechanism to guarantee
that only DESTROY_EQ command can be executed during this time period.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Use netif_msg_init() to process param settings
and use only the proper initialized value of
ei_local->msg_level for later processing;
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some EtherAVB variants support internal clock delay configuration, which
can add larger delays than the delays that are typically supported by
the PHY (using an "rgmii-*id" PHY mode, and/or "[rt]xc-skew-ps"
properties).
Historically, the EtherAVB driver configured these delays based on the
"rgmii-*id" PHY mode. This caused issues with PHY drivers that
implement PHY internal delays properly[1]. Hence a backwards-compatible
workaround was added by masking the PHY mode[2].
Add proper support for explicit configuration of the MAC internal clock
delays using the new "[rt]x-internal-delay-ps" properties.
Fall back to the old handling if none of these properties is present.
[1] Commit bcf3440c6d ("net: phy: micrel: add phy-mode support for
the KSZ9031 PHY")
[2] Commit 9b23203c32 ("ravb: Mask PHY mode to avoid inserting
delays twice").
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, full delay handling is done in both the probe and resume
paths. Split it in two parts, so the resume path doesn't have to redo
the parsing part over and over again.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr reported that after resume from suspend RTL8402 partially
truncates incoming packets, and re-initializing register RxConfig
before the actual chip re-initialization sequence is needed to avoid
the issue.
Reported-by: Petr Tesarik <ptesarik@suse.cz>
Proposed-by: Petr Tesarik <ptesarik@suse.cz>
Tested-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr reported that system freezes on r8169 driver load on a system
using ether_clk. The original change was done under the assumption
that the clock isn't needed for basic operations like chip register
access. But obviously that was wrong.
Therefore effectively revert the original change, and in addition
leave the clock active when suspending and WoL is enabled. Chip may
not be able to process incoming packets otherwise.
Fixes: 9f0b54cd16 ("r8169: move switching optional clock on/off to pll power functions")
Reported-by: Petr Tesarik <ptesarik@suse.cz>
Tested-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calls to kzalloc() and kvzalloc() should be null-checked
in order to avoid any potential failures. In this case,
a potential null pointer dereference.
Fix this by adding null checks for _parse_attr_ and _flow_
right after allocation.
Addresses-Coverity-ID: 1497154 ("Dereference before null check")
Fixes: c620b77215 ("net/mlx5: Refactor tc flow attributes structure")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This code frees "shared_counter" and then dereferences on the next line
to get the error code.
Fixes: 1edae2335a ("net/mlx5e: CT: Use the same counter for both directions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When removing a flow from the slow path fdb, a flow attr struct is
allocated for the rule removal process. If the allocation fails the
code prints a warning message but continues with the removal flow
which include dereferencing a pointer which could be null.
Fix this by exiting the function in case the attr allocation failed.
Fixes: c620b77215 ("net/mlx5: Refactor tc flow attributes structure")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Use the PCI device directly for dma accesses as non PCI device unlikely
support IOMMU and dma mappings.
Introduce and use helper routine to access DMA device.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Set flow source as hint for local vport.
Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently devlink eswitch ports are registered and unregistered by the
representor layer.
However it is better to register them at eswitch layer so that in future
user initiated command port add and delete commands can also
register/unregister devlink ports without depending on representor layer.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
To register and unregister devlink ports when loading/unload
representors, refactor the code to helper functions.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently only VF vports need egress ACL table.
Add a generic helper to check whether a vport need egress
ACL table or not.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently only 256 vports can be supported as only 8 bits are
reserved for them and 8 bits are reserved for vhca_ids in
metadata reg c0. To support more than 256 vports, replace
vhca_id with a unique shorter 4-bit PF number which covers
upto 16 PF's. Use remaining 12 bits for vports ranging 1-4095.
This will continue to generate unique metadata even if
multiple PCI devices have same switch_id.
Signed-off-by: sunils <sunils@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Skip the rule according to flow arrival source, in case of RX and the
source is local port skip and in case of TX and the source is uplink
skip, we get this info according to the flow source hint we get from
upper layers when creating the rule.
This is needed because for example in case of FDB table which has a TX
and RX tables and we are inserting a rule with an encap action which
is only a TX action, in this case rule will fail on RX, so we can rely
on the flow source hint and skip RX in such case.
Until now we relied on metadata regc_0 that upper layer mapped the
port in the regc_0, but the problem is that upper layer did not always
use regc_0 for port mapping, so now we added support to flow source
hint which upper layers will pass to SW steering when creating a rule.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Instead of getting the tag in each function, call the builder
directly with the tag. This will allow to use the same function
for building the tag and the bitmask.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The misc3 variable is used only once and can be dropped.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When we create a matcher we check that all fields are consumed.
There is no need for this specific check. This keeps the STE
builder functions simple and clean.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Mask validity for ste builders is checked by mlx5dr_ste_build_pre_check
during matcher creation.
It already checks the mask value of source_vport, so removing
this duplicated check.
Also, moving there the check of source_eswitch_owner_vhca_id mask.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Validity check is done by reading the next lu_type from the STE,
this check can be replaced by checking the refcount.
This will make the check independent on internal STE structure.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In one corner case scenario, the driver device lif setup can
get delayed such that the ionic_watchdog_cb() timer goes off
before the ionic->lif is set, thus causing a NULL pointer panic.
We catch the problem by checking for a NULL lif just a little
earlier in the callback.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to be better at making sure we don't have a link check
watchdog go off while we're shutting things down, so let's stop
the timer as soon as we start the remove.
Meanwhile, since that was the only thing in
ionic_dev_teardown(), simplify and remove that function.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mbox implementation in octeontx2 driver has three states
alloc, send and reset in mbox response. VF allocate and
sends message to PF for processing, PF ACKs them back and
reset the mbox memory. In some case we see synchronization
issue where after msgs_acked is incremented and before
mbox_reset API is called, if current execution is scheduled
out and a different thread is scheduled in which checks for
msgs_acked. Since the new thread sees msgs_acked == msgs_sent
it will try to allocate a new message and to send a new mbox
message to PF.Now if mbox_reset is scheduled in, PF will see
'0' in msgs_send.
This patch fixes the issue by calling mbox_reset before
incrementing msgs_acked flag for last processing message and
checks for valid message size.
Fixes: d424b6c02 ("octeontx2-pf: Enable SRIOV and added VF mbox handling")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently in otx2_open on failure of nix_lf_start
transmit queues are not stopped which are already
started in link_event. Since the tx queues are not
stopped network stack still try's to send the packets
leading to driver crash while access the device resources.
Fixes: 50fe6c02e ("octeontx2-pf: Register and handle link notifications")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For TCP/UDP checksum offload feature in Octeontx2
expects L3TYPE to be set irrespective of IP header
checksum is being offloaded or not. Currently for
IPv6 frames L3TYPE is not being set resulting in
packet drop with checksum error. This patch fixes
this issue.
Fixes: 3ca6c4c88 ("octeontx2-pf: Add packet transmission support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet replication feature present in Octeontx2
is a hardware linked list of PF and its VF
interfaces so that broadcast packets are sent
to all interfaces present in the list. It is
driver job to add and delete a PF/VF interface
to/from the list when the interface is brought
up and down. This patch fixes the
npc_enadis_default_entries function to handle
broadcast replication properly if packet replication
feature is present.
Fixes: 40df309e41 ("octeontx2-af: Support to enable/disable default MCAM entries")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct macb_platform_data is only used by macb_pci to register the platform
device, move its definition to cadence/macb.h and remove platform_data/macb.h
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the driver initializes in safe mode, it will call
ice_set_safe_mode_caps. This results in clearing the capabilities
structures, in order to set them up for operating in safe mode, ensuring
many features are disabled.
This has a side effect of also clearing the capability bits that relate
to NVM update. The result is that the device driver will not indicate
support for unified update, even if the firmware is capable.
Fix this by adding the relevant capability fields to the list of values
we preserve. To simplify the code, use a common_cap structure instead of
a handful of local variables. To reduce some duplication of the
capability name, introduce a couple of macros used to restore the
capabilities values from the cached copy.
Fixes: de9b277ee0 ("ice: Add support for unified NVM update flow capability")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Brijesh Behera <brijeshx.behera@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice driver needs to wait for a firmware response to each command to
write a block of data to the scratch area used to update the device
firmware. The driver currently waits for up to 1 second for this to be
returned.
It turns out that firmware might take longer than 1 second to return
a completion in some cases. If this happens, the flash update will fail
to complete.
Fix this by increasing the maximum time that the driver will wait for
both writing a block of data, and for activating the new NVM bank. The
timeout for an erase command is already several minutes, as the firmware
had to erase the entire bank which was already expected to take a minute
or more in the worst case.
In the case where firmware really won't respond, we will now take longer
to fail. However, this ensures that if the firmware is simply slow to
respond, the flash update can still complete. This new maximum timeout
should not adversely increase the update time, as the implementation for
wait_event_interruptible_timeout, and should wake very soon after we get
a completion event. It is better for a flash update be slow but still
succeed than to fail because we gave up too quickly.
Fixes: d69ea414c9 ("ice: implement device flash update via devlink")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Brijesh Behera <brijeshx.behera@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently a new filter is created, containing just enough correct
information to be able to call ocelot_vcap_block_find_filter_by_index()
on it.
This will be limiting us in the future, when we'll have more metadata
associated with a filter, which will matter in the stats() and destroy()
callbacks, and which we can't make up on the spot. For example, we'll
start "offloading" some dummy tc filter entries for the TCAM skeleton,
but we won't actually be adding them to the hardware, or to block->rules.
So, it makes sense to avoid deleting those rules too. That's the kind of
thing which is difficult to determine unless we look up the real filter.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And rename the existing find to ocelot_vcap_block_find_filter_by_index.
The index is the position in the TCAM, and the id is the flow cookie
given by tc.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'cnt' variable is actually used for 2 purposes, to hold the number
of sub-words per VCAP entry, and the number of sub-words per VCAP
action.
In fact, I'm pretty sure these 2 numbers can never be different from one
another. By hardware definition, the entry (key) TCAM rows are divided
into the same number of sub-words as its associated action RAM rows.
But nonetheless, let's at least rename the variables such that
observations like this one are easier to make in the future.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This gets rid of one of the 2 variables named, very generically,
"count".
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When calculating the offsets for the current entry within the row and
placing them inside struct vcap_data, the function assumes half key
entry (2 keys per row).
This patch modifies the vcap_data_offset_get() function to calculate a
correct data offset when the setting VCAP Type-Group of a key to
VCAP_TG_FULL or VCAP_TG_QUARTER.
This is needed because, for example, VCAP ES0 only supports full keys.
Also rename the 'count' variable to 'num_entries_per_row' to make the
function just one tiny bit easier to follow.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we'll make the switch to multiple chain offloading, we'll want to
know first what VCAP block the rule is offloaded to. This impacts what
keys are available. Since the VCAP block is determined by what actions
are used, parse the action first.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we are deriving these from the constants exposed by the
hardware, we can delete the static info we're keeping in the driver.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The numbers in struct vcap_props are not intuitive to derive, because
they are not a straightforward copy-and-paste from the reference manual
but instead rely on a fairly detailed level of understanding of the
layout of an entry in the TCAM and in the action RAM. For this reason,
bugs are very easy to introduce here.
Ease the work of hardware porters and read from hardware the constants
that were exported for this particular purpose. Note that this implies
that struct vcap_props can no longer be const.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a preparation step for the offloading to ES0, let's create the
infrastructure for talking with this hardware block.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a preparation step for the offloading to IS1, let's create the
infrastructure for talking with this hardware block.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the Ocelot switches there are 3 TCAMs: VCAP ES0, IS1 and IS2, which
have the same configuration interface, but different sets of keys and
actions. The driver currently only supports VCAP IS2.
In preparation of VCAP IS1 and ES0 support, the existing code must be
generalized to work with any VCAP.
In that direction, we should move the structures that depend upon VCAP
instantiation, like vcap_is2_keys and vcap_is2_actions, out of struct
ocelot and into struct vcap_props .keys and .actions, a structure that
is replicated 3 times, once per VCAP. We'll pass that structure as an
argument to each function that does the key and action packing - only
the control logic needs to distinguish between ocelot->vcap[VCAP_IS2]
or IS1 or ES0.
Another change is to make use of the newly introduced ocelot_target_read
and ocelot_target_write API, since the 3 VCAPs have the same registers
but put at different addresses.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although it doesn't look like it is possible to hit these conditions
from user space, there are 2 separate, but related, issues.
First, the ocelot_vcap_block_get_filter_index function, née
ocelot_ace_rule_get_index_id prior to the aae4e500e1 ("net: mscc:
ocelot: generalize the "ACE/ACL" names") rename, does not do what the
author probably intended. If the desired filter entry is not present in
the ACL block, this function returns an index equal to the total number
of filters, instead of -1, which is maybe what was intended, judging
from the curious initialization with -1, and the "++index" idioms.
Either way, none of the callers seems to expect this behavior.
Second issue, the callers don't actually check the return value at all.
So in case the filter is not found in the rule list, propagate the
return code.
So update the callers and also take the opportunity to get rid of the
odd coding idioms that appear to work but don't.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some targets (register blocks) in the Ocelot switch that are
instantiated more than once. For example, the VCAP IS1, IS2 and ES0
blocks all share the same register layout for interacting with the cache
for the TCAM and the action RAM.
For the VCAPs, the procedure for servicing them is actually common. We
just need an API specifying which VCAP we are talking to, and we do that
via these raw ocelot_target_read and ocelot_target_write accessors.
In plain ocelot_read, the target is encoded into the register enum
itself:
u16 target = reg >> TARGET_OFFSET;
For the VCAPs, the registers are currently defined like this:
enum ocelot_reg {
[...]
S2_CORE_UPDATE_CTRL = S2 << TARGET_OFFSET,
S2_CORE_MV_CFG,
S2_CACHE_ENTRY_DAT,
S2_CACHE_MASK_DAT,
S2_CACHE_ACTION_DAT,
S2_CACHE_CNT_DAT,
S2_CACHE_TG_DAT,
[...]
};
which is precisely what we want to avoid, because we'd have to duplicate
the same register map for S1 and for S0, and then figure out how to pass
VCAP instance-specific registers to the ocelot_read calls (basically
another lookup table that undoes the effect of shifting with
TARGET_OFFSET).
So for some targets, propose a more raw API, similar to what is
currently done with ocelot_port_readl and ocelot_port_writel. Those
targets can only be accessed with ocelot_target_{read,write} and not
with ocelot_{read,write} after the conversion, which is fine.
The VCAP registers are not actually modified to use this new API as of
this patch. They will be modified in the next one.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not use rx_desc pointers if possible since rx descriptors are stored in
uncached memory and dereferencing rx_desc pointers generate extra loads.
This patch improves XDP_DROP performance of ~ 110Kpps (700Kpps vs 590Kpps)
on Marvell Espressobin
Analyzed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VIA Technologies VT8251 South Bridge's integrated Rhine-II
Ethernet MAC comes has a PCI revision value of 0x7c. This was
verified on ASUS P5V800-VM mainboard.
Signed-off-by: Kevin Brace <kevinbrace@bracecomputerlab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rhine_resume() and rhine_suspend(), the code calls netif_running()
to see if the network interface is down or not. If it is down (i.e.,
netif_running() returning false), they will skip any housekeeping work
within the function relating to the hardware. This becomes a problem
when the hardware resumes from a standby since it is counting on
rhine_resume() to map its MMIO and power up rest of the hardware.
Not getting its MMIO remapped and rest of the hardware powered
up lead to a soft reset failure and hardware disappearance. The
solution is to map its MMIO and power up rest of the hardware inside
rhine_open() before soft reset is to be performed. This solution was
verified on ASUS P5V800-VM mainboard's integrated Rhine-II Ethernet
MAC inside VIA Technologies VT8251 South Bridge.
Signed-off-by: Kevin Brace <kevinbrace@bracecomputerlab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace panic() call in lib8390.c with BUILD_BUG_ON()
since checking the size of struct e8390_pkt_hdr should
happen at compile-time.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
vxge_os_dma_malloc() and vxge_os_dma_malloc_async() are both called from
callchains which use GFP_KERNEL allocations unconditionally or have other
requirements to be called from fully preemptible task context..
vxge_os_dma_malloc():
1) __vxge_hw_blockpool_create() <- GFP_KERNEL
2) __vxge_hw_mempool_grow() <- vzalloc()
__vxge_hw_blockpool_malloc()
vxge_os_dma_malloc_async():
1 __vxge_hw_mempool_grow() <- vzalloc()
__vxge_hw_blockpool_malloc()
__vxge_hw_blockpool_blocks_add()
2) vxge_hw_vpath_open() <- vzalloc()
__vxge_hw_blockpool_block_allocate()
That means neither of these functions needs a conditional allocation mode.
Remove the in_interrupt() conditional and use GFP_KERNEL.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
lance_interrupt() contains two pointless checks:
- A check whether the 'dev_id' argument is NULL. 'dev_id' is the pointer
which was handed in to request_irq() and the interrupt handler will
always be invoked with that pointer as 'dev_id' argument by the core
code.
- A check for interrupt reentrancy. The core code already guarantees
non-reentrancy of interrupt handlers.
Remove these check.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
bigmac_init_rings() has an argument signaling if it is called from the
interrupt handler. This is used to decide between GFP_KERNEL and GFP_ATOMIC
for memory allocations.
But it also checks in_interrupt() to handle invocations which come from the
timer callback bigmac_timer() via bigmac_hw_init(), which is invoked with
'in_irq = 0'. While the timer callback is clearly not in hard interrupt
context it is still not sleepable context.
Rename the argument to `non_blocking' and set it to true if invoked from
the timer callback or the interrupt handler which allows to remove the
in_interrupt() check and makes the code consistent.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_ef10_try_update_nic_stats_vf() is now only invoked from thread context
and can sleep after efx::stats_lock is dropped.
Change the allocation mode from GFP_ATOMIC to GFP_KERNEL.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_ef10_try_update_nic_stats_vf() used in_interrupt() to figure out
whether it is safe to sleep (for MCDI) or not.
The only caller from which it was not is efx_net_stats(), which can be
invoked under dev_base_lock from net-sysfs::netstat_show().
So add a new update_stats_atomic() method to struct efx_nic_type, and call
it from efx_net_stats(), removing the need for
efx_ef10_try_update_nic_stats_vf() to behave differently for this case
(which it wasn't doing correctly anyway).
For all nic_types other than EF10 VF, this method is NULL so the the
regular update_stats() methods are invoked , which are happy with being
called from atomic contexts.
Fixes: f00bf2305c ("sfc: don't update stats on VF when called in atomic context")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be seperated or the context be conveyed in an argument passed by the
caller, which usually knows the context.
sonic_quiesce() uses 'in_interrupt() || irqs_disabled()' to chose either
udelay() or usleep_range() in the wait loop.
In all callchains leading to it the context is well defined and known.
Add a 'may_sleep' argument and pass it through the various callchains
leading to this function.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
in_interrupt() is ill defined and does not provide what the name
suggests. The usage especially in driver code is deprecated and a tree wide
effort to clean up and consolidate the (ab)usage of in_interrupt() and
related checks is happening.
In this case the check covers only parts of the contexts in which these
functions cannot be called. It fails to detect preemption or interrupt
disabled invocations.
As the functions which are invoked from ionic_adminq_post() and
ionic_dev_cmd_wait() contain a broad variety of checks (always enabled or
debug option dependent) which cover all invalid conditions already, there
is no point in having inconsistent warnings in those drivers.
Just remove them.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The in_interrupt() usage in this driver tries to figure out which context
may sleep and which context may not sleep. in_interrupt() is not really
suitable as it misses both preemption disabled and interrupt disabled
invocations from task context.
Conditionals like that in driver code are frowned upon in general because
invocations of functions from invalid contexts might not be detected
as the conditional papers over it.
ionic_lif_addr() and _ionoc_lif_rx_mode() can be called from:
1) ->ndo_set_rx_mode() which is under netif_addr_lock_bh()) so it must not
sleep.
2) Init and setup functions which are in fully preemptible task context.
ionic_link_status_check_request() has two call paths:
1) NAPI which obviously cannot sleep
2) Setup which is again fully preemptible task context
Add arguments which convey the execution context to the affected functions
and let the callers provide the context instead of letting the functions
deduce it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
in_interrupt() is ill defined and does not provide what the name
suggests. The usage especially in driver code is deprecated and a tree wide
effort to clean up and consolidate the (ab)usage of in_interrupt() and
related checks is happening.
In this case the checks cover only parts of the contexts in which these
functions cannot be called. They fail to detect preemption or interrupt
disabled invocations.
As the functions which are invoked from the various places contain already
a broad variety of checks (always enabled or debug option dependent) cover
all invalid conditions already, there is no point in having inconsistent
warnings in those drivers.
Just remove them.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be seperated or the context be conveyed in an argument passed by the
caller, which usually knows the context.
mpc52xx_fec_stop() uses in_interrupt() to check if it is safe to sleep. All
callers run in well defined contexts.
Pass an argument from the callers indicating whether it is safe to sleep.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
e100_hw_init() invokes e100_self_test() only if in_interrupt() returns
false as e100_self_test() uses msleep() which requires sleepable task
context. The in_interrupt() check is incomplete because in_interrupt()
cannot catch callers from contexts which have just preemption or interrupts
disabled.
e100_hw_init() is invoked from:
- e100_loopback_test() which clearly is sleepable task context as the
function uses msleep() itself.
- e100_up() which clearly is sleepable task context as well because it
invokes e100_alloc_cbs() abd request_irq() which both require sleepable
task context due to GFP_KERNEL allocations and mutex_lock() operations.
Remove the pointless in_interrupt() check.
As a side effect of this analysis it turned out that e100_rx_alloc_list()
which is only invoked from e100_loopback_test() and e100_up() pointlessly
uses a GFP_ATOMIC allocation. The next invoked function e100_alloc_cbs() is
using GFP_KERNEL already.
Change the allocation mode in e100_rx_alloc_list() to GFP_KERNEL as well.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
t4_sge_stop() is only ever called from task context and the in_interrupt()
check is presumably a leftover from copying t3_sge_stop().
Aside of in_interrupt() being deprecated because it's not providing what it
claims to provide, this check would paper over illegitimate callers.
The functions invoked from t4_sge_stop() contain already warnings to catch
invocations from invalid contexts.
Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
t3_sge_stop() is called from task context and from error handlers in
interrupt context. It relies on in_interrupt() to differentiate the
contexts.
in_interrupt() is deprecated as it is ill defined and does not provide what
it suggests.
Instead of replacing it with some other construct, simply split the
function into t3_sge_stop_dma(), which can be called from any context, and
t3_sge_stop() which can be only called from task context.
This has the advantage that any bogus invocation of t3_sge_stop() from
wrong contexts can be caught by debug kernels instead of being papered over
by the conditional.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
in_interrupt() is ill defined and does not provide what the name
suggests. The usage especially in driver code is deprecated and a tree wide
effort to clean up and consolidate the (ab)usage of in_interrupt() and
related checks is happening.
In this case the check covers only parts of the contexts in which these
functions cannot be called. It fails to detect preemption or interrupt
disabled invocations.
As the functions which are invoked from at*_reinit_locked() contain a broad
variety of checks (always enabled or debug option dependent) which cover
all invalid conditions already, there is no point in having inconsistent
warnings in those drivers.
Just remove them.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
enic_dev_wait() has a BUG_ON(in_interrupt()).
Chasing the callers of enic_dev_wait() revealed the gems of enic_reset()
and enic_tx_hang_reset() which are both invoked through work queues in
order to be able to call rtnl_lock(). So far so good.
After locking rtnl both functions acquire enic::enic_api_lock which
serializes against the (ab)use from infiniband. This is where the
trainwreck starts.
enic::enic_api_lock is a spin_lock() which implicitly disables preemption,
but both functions invoke a ton of functions under that lock which can
sleep. The BUG_ON(in_interrupt()) does not trigger in that case because it
can't detect the preempt disabled condition.
This clearly has never been tested with any of the mandatory debug options
for 7+ years, which would have caught that for sure.
Cure it by adding a enic_api_busy member to struct enic, which is modified
and evaluated with enic::enic_api_lock held.
If enic_api_devcmd_proxy_by_index() observes enic::enic_api_busy as true,
it drops enic::enic_api_lock and busy waits for enic::enic_api_busy to
become false.
It would be smarter to wait for a completion of that busy period, but
enic_api_devcmd_proxy_by_index() is called with other spin locks held which
obviously can't sleep.
Remove the BUG_ON(in_interrupt()) check as well because it's incomplete and
with proper debugging enabled the problem would have been caught from the
debug checks in schedule_timeout().
Fixes: 0b038566c0 ("drivers/net: enic: Add an interface for USNIC to interact with firmware")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the ktls stats were at adapter level, but now changing it
to port level.
Fixes: 62370a4f34 ("cxgb4/chcr: Add ipv6 support and statistics")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing these logs to dynamic debugs. If issue is seen, these
logs can be enabled at run time.
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since driver first return success to tls_dev_add, if req to HW is
successful, but later if HW returns failure, that connection traffic
fails permanently and connection status remains unknown to stack.
v1->v2:
- removed conn_up from all places.
v2->v3:
- Corrected timeout handling.
Fixes: 34aba2c450 ("cxgb4/chcr : Register to tls add and del callback")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds debugfs to dump tqp enable status.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to query specifications of the device, add a new debugfs
command "dev spec" to do that.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NETIF_F_HW_VLAN_CTAG_FILTER is not set in netdev->hw_feature,
but set in netdev->features.
So the handler of NETIF_F_HW_VLAN_CTAG_FILTER in hns3_self_test() is
always true, remove it.
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add RoCE VF client reset support by notifying the RoCE VF client
when hns3 VF is resetting and adding a interface to query whether
CMDQ is ready to work.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for UDP segmentation offload to the HNS3 driver
when the device can do it.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the maximun BD number may not be 8 now, so rename
hns3_over_8bd() to hns3_over_max_bd().
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the driver is able to query the device's specifications,
which includes the maximum BD number of non TSO packet, so replace
macro HNS3_MAX_NON_TSO_BD_NUM with the queried value, and rewrite
macro HNS3_MAX_NON_TSO_SIZE whose value depends on the the maximum
BD number of non TSO packet.
Also, add a new parameter max_non_tso_bd_num to function
hns3_tx_bd_num() and hns3_skb_need_linearized(), then they can get
the maximum BD number of non TSO packet from the caller instead of
calculating by themself, The note of hns3_skb_need_linearized()
should be update as well.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for NAT-T-ESP to KPU parser configuration. NAT ESP is a UDP
based protocol. So move ESP to LE so that both UDP and ESP can be
extracted.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Acked-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 fragmented packet may not contain completed layer 4 information.
So stop KPU parsing after setting ipv6 fragmentation flag.
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Acked-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added some IPv6 protocol fields to the default MKEX profile.
They include everything from the beginning of IP header and up to
source address. The pattern occupies full KW2 in MCAM entry.
Only one out of two LD registers for this protocol is used.
Signed-off-by: Vidhya Vidhyaraman <vraman@marvell.com>
Acked-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
KPU profile interpret Extended DSA and eDSA by looking source dev. This
was incorrect and it restricts to use few source device ids and also
created confusion while parsing regular DSA tag. With below patch lookup
was based on bit 12 of Word0. This is always zero for DSA tag and it
should be one for Extended DSA and eDSA.
Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Acked-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marvell Prestera switches supports distributed switch architecture
by inserting Forward DSA tag of 4 bytes right after ethernet SMAC.
This tag don't have a tpid field.
This patch provides parser and extraction support for the same.
Default ldata extraction profile added for FDSA such that Src_port
is extracted and placed inplace of vlanid field. Like extended DSA
and eDSA tags,a special PKIND of 62 is used for this tag.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Acked-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor KPU related NPC code gathering all configuration data in a
structured format and putting it in one place (npc_profile.h).
This increases readability and makes it easier to extend the profile
configuration (as opposed to jumping between multiple header and source
files).
To do this:
* Gather all KPU profile related data into a single adapter struct.
* Convert the built-in MKEX definition to a structured one to streamline
the MKEX loading.
* Convert LT default register configuration into a structure, keeping
default protocol settings in same file where identifiers for those
protocols are defined.
* Add a single point for KPU profile loading, so that its source may
change in the future once proper interfaces for loading such config
are in place.
Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Acked-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since LD contains LTYPE definitions tweaked toward efficient
NIX_AF_RX_FLOW_KEY_ALG(0..31)_FIELD(0..4) usage, the original location
of NPC_LT_LD_CUSTOM0/1 was aliased with MPLS_IN_* definitions.
Moving custom frame to value 6 and 7 removes the aliasing at the cost of
custom frames being also considered when TCP/UDP RSS algo is configured.
However since the goal of CUSTOM frames is to classify them to a
separate set of RQs, this cost is acceptable.
Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Acked-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PCI devices support two variants of the D3 power state: D3hot (main power
present) D3cold (main power removed). Previously struct pci_dev contained:
unsigned int d3_delay; /* D3->D0 transition time in ms */
unsigned int d3cold_delay; /* D3cold->D0 transition time in ms */
"d3_delay" refers specifically to the D3hot state. Rename it to
"d3hot_delay" to avoid ambiguity and align with the ACPI "_DSM for
Specifying Device Readiness Durations" in the PCI Firmware spec r3.2,
sec 4.6.9.
There is no change to the functionality.
Link: https://lore.kernel.org/r/20200730210848.1578826-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].
Refactor the code according to the use of a flexible-array member in
struct qed_ll2_tx_packet, instead of a one-element array and use the
struct_size() helper to calculate the size for the allocations. Commit
f5823fe689 ("qed: Add ll2 option to limit the number of bds per packet")
was used as a reference point for these changes.
Also, it's important to notice that flexible-array members should occur
last in any structure, and structures containing such arrays and that
are members of other structures, must also occur last in the containing
structure. That's why _cur_completing_packet_ is now moved to the bottom
in struct qed_ll2_tx_queue. _descq_mem_ and _cur_send_packet_ are also
moved for unification.
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays
Tested-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/lkml/5f707198.PA1UCZ8MYozYZYAR%25lkp@intel.com/
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding reference clock (1us tic) for all LPI timer on Intel platforms.
The reference clock is derived from ptp clk. This also enables all LPI
counter.
Signed-off-by: Rusaimi Amira Ruslan <rusaimi.amira.rusaimi@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Factor send_control_ip_offload out of handle_query_ip_offload_rsp.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Factor send_query_ip_offload out of handle_request_cap_rsp to
pair with handle_query_ip_offload_rsp.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new name send_query_map pairs with handle_query_map_rsp.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new name send_request_cap pairs with handle_request_cap_rsp.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new name send_query_cap pairs with handle_query_cap_rsp.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set up the speed according to crq->query_phys_parms.rsp.speed.
Fix IBMVNIC_10GBPS typo.
Fixes: f8d6ae0d27 ("ibmvnic: Report actual backing device speed and duplex values")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Functions related to nested interface infrastructure such as
netdev_walk_all_{ upper | lower }_dev() pass both private functions
and "data" pointer to handle their own things.
At this point, the data pointer type is void *.
In order to make it easier to expand common variables and functions,
this new netdev_nested_priv structure is added.
In the following patch, a new member variable will be added into this
struct to fix the lockdep issue.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Try to recycle the xdp tx buffer into the in-irq page_pool cache if
mvneta_txq_bufs_free is executed in the NAPI context for XDP_TX use case
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add devices IDs for the next LOM generations that will be
available on the next Intel Client platform (Meteor Lake)
This patch provides the initial support for these devices
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
flash_bank_size and flash_base_addr field not in use and can
be removed from a nvm_info structure
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When we set the BASET registers of i225 with a base_time in the
future, i225 will "hold" all packets until that base_time is reached,
causing a lot of TX Hangs.
As this behaviour seems contrary to the expectations of the IEEE
802.1Q standard (section 8.6.9, especially 8.6.9.4.5), let's start by
rejecting these types of schedules. If this is too limiting, we can
for example, setup a timer to configure the BASET registers closer to
the start time, only blocking the packets for a "short" while.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The next patch will need a way to retrieve the current timestamp from
the NIC's PTP clock.
The 'i225' suffix is removed, if anything model specific is needed,
those specifics should be hidden by this function.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Boolean reset disable flag not applicable for i225 device and
could be removed.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Many TSN features depend on the internal PTP clock, so the internal
PTP jumping when the adapter is reset can cause problems, usually in
the form of "TX Hangs" warnings in the driver.
The solution is to save the PTP time before a reset and restore it
after the reset is done. The value of the PTP time is saved before a
reset and we use the difference from CLOCK_MONOTONIC from reset time
to now, to correct what's going to be the new PTP time.
This is heavily inspired by commit bf4bf09bdd ("i40e: save PTP time
before a device reset").
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In i225, it's no longer necessary to use the SYSTIMR register to
latch the timer value, the timestamp is latched when SYSTIML is read.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Completion to commit 900d1e8b34 ("igc: Add LPI counters")
LPI counters exposed by statistics update method.
A EEE TX LPI counter reflect the transmitter entries EEE (IEEE 802.3az)
into the LPI state. A EEE RX LPI counter reflect the receiver link
partner entries into EEE(IEEE 802.3az) LPI state.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i225 advanced receive descriptor doesn't have the following extend error
bits: CE, SE, SEQ, CXE. In addition to that, the bit TCPE is called L4E
in the datasheet.
Clean up the code accordingly, and get rid of the macro
IGC_RXDEXT_ERR_FRAME_ERR_MASK since it doesn't make much sense anymore.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The Tx timestamp timeout is already checked by the watchdog_task
which runs periodically. In addition to that, from the ptp_tx work
perspective, if __IGC_PTP_TX_IN_PROGRESS flag is set we always want
handle the timestamp stored in hardware and update the skb. So remove
the timeout check in igc_ptp_tx_work() function.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ptp_tx work is scheduled only if TSICR.TXTS bit is set, therefore
TSYNCTXCTL.TXTT_0 bit is expected to be set when we check it igc_ptp_tx_
work(). If it isn't, something is really off and rescheduling the ptp_tx
work to check it later doesn't help much. This patch changes the code to
WARN_ON_ONCE() if this situation ever happens.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Rename the IGC_TSYNCTXCTL_VALID macro to IGC_TSYNCTXCTL_TXTT_0 so it
matches the datasheet.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add new device ID's for the next step of the silicon and
reflect i221 and i226 parts
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fixed flash presence check for 82576 controllers so the part
number string is read and displayed correctly.
Signed-off-by: Gal Hammer <ghammer@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add XDP support to the IGB driver.
The implementation follows the IXGBE XDP implementation
closely and I used the following patches as basis:
1. commit 9247080816 ("ixgbe: add XDP support for pass and drop actions")
2. commit 33fdc82f08 ("ixgbe: add support for XDP_TX action")
3. commit ed93a39871 ("ixgbe: tweak page counting for XDP_REDIRECT")
Due to the hardware constraints of the devices using the
IGB driver we must share the TX queues with XDP which
means locking the TX queue for XDP.
I ran tests on an older device to get better numbers.
Test machine:
Intel(R) Atom(TM) CPU C2338 @ 1.74GHz (2 Cores)
2x Intel I211
Routing Original Driver Network Stack: 382 Kpps
Routing XDP Redirect (xdp_fwd_kern): 1.48 Mpps
XDP Drop: 1.48 Mpps
Using XDP we can achieve line rate forwarding even on
an older Intel Atom CPU.
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Convert ice to the new infra, use share port tables.
Leave a tiny bit more error checking in place than usual,
because this driver really does quite a bit of magic.
We need to calculate the number of VxLAN and GENEVE entries
the firmware has reserved.
Thanks to the conversion the driver will no longer sleep in
an atomic section.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ice_get_open_tunnel_port() is always passed TNL_ALL
as the second parameter.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the "shared port table" to convert i40e to the new
infra.
i40e did not have any reference tracking, locking is also dodgy
because rtnl gets released while talking to FW, so port may get
removed from the table while it's getting added etc.
On the good side i40e does not seem to be using the ports for
TX so we can remove the table from the driver state completely.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_alloc_etherdev() to simplify the code instead of alloc_etherdev().
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mistakenly bit 2 was set instead of bit 3 as in the vendor driver.
Fixes: a7a92cf815 ("r8169: sync PCIe PHY init with vendor driver 8.047.01")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current logic that calculates the preset maximum value for combined
channel does not take into account the rings used for XDP and mqprio
TCs. Each of these features will reduce the number of TX rings. Add
the logic to divide the TX rings accordingly based on whether the
device is currently in XDP mode and whether TCs are in use.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This feature allows the user to set the different FEC modes on the NIC
port. Any new setting will take effect immediately after a link toggle.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code is reporting the FEC configured settings during link up.
Change it to report the more useful active FEC encoding that may be
negotiated or auto detected.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement .get_fecparam() method to report the configured and active FEC
settings. Also report the supported and advertised FEC settings to
the .get_link_ksettings() method.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PORT_PHY_CONFIG is always sent with REQ_FLAGS_RESET_PHY set. This flag
must be set in order for the firmware to institute the requested PHY
change immediately, but it results in a link flap. This is unnecessary
and results in an improved user experience if the PHY reconfiguration
is avoided when the user requested speed does not constitute a change.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some 200G dual port NICs, if one port is configured to 200G,
firmware will disable the ethernet link on the other port. Firmware
will send notification to the driver for the disabled port when this
happens. Define a new field in the link_info structure to keep track
of this state. The new phy_state field replaces the unused loop_back
field.
Log a message when the phy_state changes state. In the disabled state,
disallow any PHY configurations on the disabled port as the firmware
will fail all calls to configure the PHY in this state.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ethtool PAM4 link modes for:
50000baseCR_Full
100000baseCR2_Full
200000baseCR4_Full
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The firmware interface has added support for new link speeds using
PAM4 modulation. Expand the bnxt_link_info structure to closely
mirror the new firmware structures. Add logic to copy the PAM4
capabilities and settings from the firmware.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It will be necessary to update more than one field in the link_info
structure when PAM4 speeds are added in a later patch. Instead of
merely translating ethtool speed values to firmware speed values,
change the responsiblity of this function to update all the necessary
link_info fields required to force the speed change to the desired
ethtool value. This also reduces code duplication somewhat at the two
call sites, which otherwise both have to independently update link_info
fields to turn off auto negotiation advertisements.
Also use the appropriate REQ_FORCE_LINK_SPEED definitions. These happen
to have the same values, but req_link_speed is utilimately passed as
force_link_speed in HWRM_PORT_PHY_CFG which is not defined in terms of
REQ_AUTO_LINK_SPEED.
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract the code for determining an advertised speed is no longer
supported into a separate function. This will avoid some code
duplication in a later patch when supporting PAM4 speeds, since
these speeds are specified in a separate field.
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The main changes include FEC, ECN statistics, HWRM_PORT_PHY_QCFG
response size reduction, and a new counter added to
ctx_hw_stats_ext struct to support the new 58818 chip.
The ctx_hw_stats_ext structure is now the superset supporting the new
58818 chips and the prior P5 chips. Add a new flag to identify the new
chip and use constants for the chip specific ring statistics sizes
instead of the size of the structure.
Because the HWRM_PORT_PHY_QCFG response structure size has shrunk back
to 96 bytes, the workaround added earlier to limit the size of this
message for forwarding to the VF can be removed.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rivers/net/ethernet/marvell/mvpp2/mvpp2_main.c:7084:36: warning: ‘mvpp2_acpi_match’ defined but not used [-Wunused-const-variable=]
7084 | static const struct acpi_device_id mvpp2_acpi_match[] = {
| ^~~~~~~~~~~~~~~~
Wrap the definition inside #ifdef/#endif.
Compile tested only.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add structures for port statistics which read from core and not directly
from registers.
When netdev's ethtool statistics are queried, query the corresponding
module's overheat counter from core and expose it as
"transceiver_overheat".
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Module temperature warning events are enabled for modules that have a
temperature sensor and configured according to the temperature
thresholds queried from the module.
When a module is unplugged we are guaranteed not to get temperature
warning events. However, when a module is plugged in we need to
potentially update its current settings (i.e., event enablement and
thresholds).
Register to port module plug/unplug events and update module's settings
upon plug in events.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The overheat counter is a per-module counter, but it is exposed as part
of the corresponding netdev's statistics. It should therefore be
presented to user space relative to the netdev's lifetime.
Query the counter just before registering the netdev, so that the value
exposed to user space will be relative to this initial value.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MTWE (Management Temperature Warning Event) is triggered for sensors
whose temperature event enable bit is enabled in the MTMP register.
Enable events for all the modules that have a temperature sensor.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MTWE (Management Temperature Warning Event) is triggered when module's
temperature is higher than its threshold.
Register for MTWE events and increase the module's overheat counter when
its corresponding sensor goes above the configured threshold.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialize an array that stores per-module overheat state and a counter
indicating how many times the module was in overheat state.
Export a function to query the counter according to module number.
Will be used later on by the switch driver (i.e., mlxsw_spectrum) to expose
module's overheat counter as part of ethtool statistics.
Initialize mlxsw_env after driver initialization to be able to query
number of modules from MGPIR register.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MTMP register controls various temperature settings on a per-sensor
basis. Subsequent patches are going to alter some of these settings for
sensors found on port modules in response to certain events.
In order to prevent the current callers that write to MTMP from
overriding these settings, have them first query the register and then
change only the relevant register fields.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PMAOS register configures and retrieves the per module status.
The register is used also for enabling event for status change.
It will be used to enable PMPE (Port Module Plug/Unplug) event.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PMPE register reports any operational status change of a module.
It will be used for enabling temperature warning event when a module is
plugged in.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add MTWE (Management Temperature Warning Event) register, which is used
for over temperature warning.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As function hclge_shaper_para_calc() has too many arguments to add
more, so encapsulate its three arguments ir_b, ir_u, ir_s into a
structure.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device specifications querying is unsupported by the old
firmware, in this case, these specifications are 0. However,
some specifications should not be 0 or will cause problem.
So after querying from firmware, some device specifications
are needed to check their value and set to default value if
their values are 0.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The max tm rate is a fixed value(100Gb/s) now as it is defined by a
macro. In order to support other rates in different kinds of device,
it is better to use specification queried from firmware to replace
this macro.
As function hclge_shaper_para_calc() has too many arguments to add
more, so encapsulate its three arguments ir_b, ir_u, ir_s into a
structure.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To improve code maintainability and compatibility, new commands
HCLGE_OPC_QUERY_DEV_SPECS for PF and HCLGEVF_OPC_QUERY_DEV_SPECS
for VF are introduced to query device specifications, instead of
statically defining specifications by checking the hardware version
or other methods.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds debugfs to dump each device capability whether is supported.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to improve code maintainability and compatibility, the
capabilities of new features are queried from firmware.
The member flag in struct hnae3_ae_dev indicates not only
capabilities, but some initialized status. As capabilities bits
queried from firmware is too many, it is better to use new member
to indicate them. So adds member capabs in struce hnae3_ae_dev.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the revision of the pci device is used to identify
whether FEC is supported, which is not good for maintainability
and compatibility. So use a capability flag to do that.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to improve code maintainability and compatibility,
add support to query the device capability by expanding the
existing version query command. The device capability refers
to the features supported by the device.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fibre device of PCI revision 0x20 don't support autoneg, and the ops
get_autoneg() return AUTONEG_DISABLE so function hns3_nway_reset()
will return earlier than judging PCI revision.
Function hclge_handle_rocee_ras_error() don't need to judge PCI
revision again because its caller hclge_handle_hw_ras_error() has
judged once.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To better identify the device version, struct hnae3_handle adds a
member dev_version to replace pci revision. The dev_version consists
of hardware version and PCI revision. The hardware version is queried
from firmware by an existing firmware version query command.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If mlxsw_sp_acl_tcam_group_id_get() fails, the mutex initialized earlier
is not destroyed.
Fix this by initializing the mutex after calling the function. This is
symmetric to mlxsw_sp_acl_tcam_group_del().
Fixes: 5ec2ee28d2 ("mlxsw: spectrum_acl: Introduce a mutex to guard region list updates")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the ocelot_configure_cpu() function, which was in fact bringing
up 2 ports: the CPU port module, which both switchdev and DSA have, and
the NPI port, which only DSA has.
The (non-Ethernet) CPU port module is at a fixed index in the analyzer,
whereas the NPI port is selected through the "ethernet" property in the
device tree.
Therefore, the function to set up an NPI port is DSA-specific, so we
move it there, simplifying the ocelot switch library a little bit.
Cc: Horatiu Vultur <horatiu.vultur@microchip.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: UNGLinuxDriver <UNGLinuxDriver@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support the recently added DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK
parameter in the ice flash update handler. Convert the overwrite mask
bitfield into the appropriate preservation level used by the firmware
when updating.
Because there is no equivalent preservation level for overwriting only
identifiers, this combination is rejected by the driver as not supported
with an appropriate extended ACK message.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The devlink core recently gained support for checking whether the driver
supports a flash_update parameter, via `supported_flash_update_params`.
However, parameters are specified as function arguments. Adding a new
parameter still requires modifying the signature of the .flash_update
callback in all drivers.
Convert the .flash_update function to take a new `struct
devlink_flash_update_params` instead. By using this structure, and the
`supported_flash_update_params` bit field, a new parameter to
flash_update can be added without requiring modification to existing
drivers.
As before, all parameters except file_name will require driver opt-in.
Because file_name is a necessary field to for the flash_update to make
sense, no "SUPPORTED" bitflag is provided and it is always considered
valid. All future additional parameters will require a new bit in the
supported_flash_update_params bitfield.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Bin Luo <luobin9@huawei.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When implementing .flash_update, drivers which do not support
per-component update are manually checking the component parameter to
verify that it is NULL. Without this check, the driver might accept an
update request with a component specified even though it will not honor
such a request.
Instead of having each driver check this, move the logic into
net/core/devlink.c, and use a new `supported_flash_update_params` field
in the devlink_ops. Drivers which will support per-component update must
now specify this by setting DEVLINK_SUPPORT_FLASH_UPDATE_COMPONENT in
the supported_flash_update_params in their devlink_ops.
This helps ensure that drivers do not forget to check for a NULL
component if they do not support per-component update. This also enables
a slightly better error message by enabling the core stack to set the
netlink bad attribute message to indicate precisely the unsupported
attribute in the message.
Going forward, any new additional parameter to flash update will require
a bit in the supported_flash_update_params bitfield.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Bin Luo <luobin9@huawei.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Danielle Ratson <danieller@mellanox.com>
Cc: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver subfolder files refer parent folder includes in an
absolute manner.
Makefile contains a -I for this, but apparently that does not
work if object tree is separated.
Adding srctree to fix that.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a null-check for _pcs_, but it is being dereferenced
prior to this null-check. So, if _pcs_ can actually be null,
then there is a potential null pointer dereference that should
be fixed by null-checking _pcs_ before being dereferenced.
Addresses-Coverity-ID: 1497159 ("Dereference before null check")
Fixes: 94ae899b20 ("dpaa2-mac: add PCS support through the Lynx module")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When SHARED_FS is enabled on a DPNI object the flow steering tables are
shared between all the traffic classes. Modify the driver so that we
only add a new flow steering entry on the TC#0 when this new option is
enabled.
Signed-off-by: Ionut-robert Aron <ionut-robert.aron@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call to dpaa2_eth_link_state_update() is a leftover from the time
when on DPAA2 platforms the PHYs were started at boot time so when an
ifconfig was issued on the associated interface, the link status needed
to be checked directly from the ndo_open() callback.
This is not needed anymore since we are now properly integrated with the
PHY layer thus a link interrupt will come directly from the PHY
eventually without the need to call the sync function.
Fix this up by removing the call to dpaa2_eth_link_state_update().
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to check if both the MDIO controller node and its
child node, the PCS device, are available since there is no chance that
the child node would be enabled when the parent it's not.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When adding the support for TBF offload, the improper command version
was added even though the command format is for the V2 of
dpni_set_tx_shaping(). This does not affect the functionality of TBF
since the only change between these two versions is the addition of the
exceeded parameters which are not used in TBF. Still, fix the bug so
that we keep things in sync.
Fixes: 39344a8962 ("dpaa2-eth: add API for Tx shaping")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To start also "phy state machine", with UP state as it should be,
the phy_start() has to be used, in another case machine even is not
triggered. After this change negotiation is supposed to be triggered
by SM workqueue.
It's not correct usage, but it appears after the following patch,
so add it as a fix.
Fixes: 74a992b359 ("net: phy: add phy_check_link_status")
Signed-off-by: Ivan Khoronzhuk <ikhoronz@cisco.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
While unloading the dwmac-intel driver, clk_disable_unprepare() is
being called twice in stmmac_dvr_remove() and
intel_eth_pci_remove(). This causes kernel panic on the second call.
Removing the second call of clk_disable_unprepare() in
intel_eth_pci_remove().
Fixes: 09f012e64e ("stmmac: intel: Fix clock handling on error and remove paths")
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add option in plat_stmmacenet_data struct to enable VLAN Filter Fail
Queuing. This option allows packets that fail VLAN filter to be routed
to a specific Rx queue when Receive All is also set.
When this option is enabled:
- Enable VFFQ only when entering promiscuous mode, because Receive All
will pass up all rx packets that failed address filtering (similar to
promiscuous mode).
- VLAN-promiscuous mode is never entered to allow rx packet to fail VLAN
filters and get routed to selected VFFQ Rx queue.
Reviewed-by: Voon Weifeng <weifeng.voon@intel.com>
Reviewed-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Chuah, Kim Tatt <kim.tatt.chuah@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of the W=1 cleanups for ethernet, a million [*] driver
comments had to be cleaned up to get the W=1 compilation to
succeed. This change finally makes the drivers/net/ethernet tree
compile with W=1 set on the command line. NOTE: The kernel uses
kdoc style (see Documentation/process/kernel-doc.rst) when
documenting code, not doxygen or other styles.
After this patch the x86_64 build has no warnings from W=1, however
scripts/kernel-doc says there are 1545 more warnings in source files, that
I need to develop a script to fix in a followup patch.
The errors fixed here are all kdoc of a few classes, with a few outliers:
In file included from drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c:10:
drivers/net/ethernet/qlogic/netxen/netxen_nic.h:1193:18: warning: ‘FW_DUMP_LEVELS’ defined but not used [-Wunused-const-variable=]
1193 | static const u32 FW_DUMP_LEVELS[] = { 0x3, 0x7, 0xf, 0x1f, 0x3f, 0x7f, 0xff };
| ^~~~~~~~~~~~~~
... repeats 4 times...
drivers/net/ethernet/sun/cassini.c:2084:24: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body]
2084 | RX_USED_ADD(page, i);
drivers/net/ethernet/natsemi/ns83820.c: In function ‘phy_intr’:
drivers/net/ethernet/natsemi/ns83820.c:603:6: warning: variable ‘tbisr’ set but not used [-Wunused-but-set-variable]
603 | u32 tbisr, tanar, tanlpar;
| ^~~~~
drivers/net/ethernet/natsemi/ns83820.c: In function ‘ns83820_get_link_ksettings’:
drivers/net/ethernet/natsemi/ns83820.c:1207:11: warning: variable ‘tanar’ set but not used [-Wunused-but-set-variable]
1207 | u32 cfg, tanar, tbicr;
| ^~~~~
drivers/net/ethernet/packetengines/yellowfin.c:1063:18: warning: variable ‘yf_size’ set but not used [-Wunused-but-set-variable]
1063 | int data_size, yf_size;
| ^~~~~~~
Normal kdoc fixes:
warning: Function parameter or member 'x' not described in 'y'
warning: Excess function parameter 'x' description in 'y'
warning: Cannot understand <string> on line <NNN> - I thought it was a doc line
[*] - ok it wasn't quite a million, but it felt like it.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kernel-doc script as used by W=1, is confused by the macro
usage inside the header describing the efx_ptp_data struct.
drivers/net/ethernet/sfc/ptp.c:345: warning: Function parameter or member 'MC_CMD_PTP_IN_TRANSMIT_LENMAX' not described in 'efx_ptp_data'
After some discussion on the list, break this patch out to
a separate one, and fix the issue through a creative
macro declaration.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of the W=1 series for ethernet, these drivers were
discovered to be using kdoc style comments but were not actually
doing kdoc. The kernel uses kdoc style when documenting code, not
doxygen or other styles.
Fixed Warnings:
drivers/net/ethernet/amazon/ena/ena_com.c:613: warning: Function parameter or member 'ena_dev' not described in 'ena_com_set_llq'
drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c:1540: warning: Cannot understand * @brief Set VLAN filter table
drivers/net/ethernet/xilinx/ll_temac_main.c:114: warning: Function parameter or member 'lp' not described in 'temac_indirect_busywait'
drivers/net/ethernet/xilinx/ll_temac_main.c:129: warning: Function parameter or member 'lp' not described in 'temac_indirect_in32'
drivers/net/ethernet/xilinx/ll_temac_main.c:129: warning: Function parameter or member 'reg' not described in 'temac_indirect_in32'
drivers/net/ethernet/xilinx/ll_temac_main.c:147: warning: Function parameter or member 'lp' not described in 'temac_indirect_in32_locked'
drivers/net/ethernet/xilinx/ll_temac_main.c:147: warning: Function parameter or member 'reg' not described in 'temac_indirect_in32_locked'
drivers/net/ethernet/xilinx/ll_temac_main.c:172: warning: Function parameter or member 'lp' not described in 'temac_indirect_out32'
drivers/net/ethernet/xilinx/ll_temac_main.c:172: warning: Function parameter or member 'reg' not described in 'temac_indirect_out32'
drivers/net/ethernet/xilinx/ll_temac_main.c:172: warning: Function parameter or member 'value' not described in 'temac_indirect_out32'
drivers/net/ethernet/xilinx/ll_temac_main.c:188: warning: Function parameter or member 'lp' not described in 'temac_indirect_out32_locked'
drivers/net/ethernet/xilinx/ll_temac_main.c:188: warning: Function parameter or member 'reg' not described in 'temac_indirect_out32_locked'
drivers/net/ethernet/xilinx/ll_temac_main.c:188: warning: Function parameter or member 'value' not described in 'temac_indirect_out32_locked'
drivers/net/ethernet/xilinx/ll_temac_main.c:212: warning: Function parameter or member 'lp' not described in 'temac_dma_in32_be'
drivers/net/ethernet/xilinx/ll_temac_main.c:212: warning: Function parameter or member 'reg' not described in 'temac_dma_in32_be'
drivers/net/ethernet/xilinx/ll_temac_main.c:228: warning: Function parameter or member 'lp' not described in 'temac_dma_out32_be'
drivers/net/ethernet/xilinx/ll_temac_main.c:228: warning: Function parameter or member 'reg' not described in 'temac_dma_out32_be'
drivers/net/ethernet/xilinx/ll_temac_main.c:228: warning: Function parameter or member 'value' not described in 'temac_dma_out32_be'
drivers/net/ethernet/xilinx/ll_temac_main.c:247: warning: Function parameter or member 'lp' not described in 'temac_dma_dcr_in'
drivers/net/ethernet/xilinx/ll_temac_main.c:247: warning: Function parameter or member 'reg' not described in 'temac_dma_dcr_in'
drivers/net/ethernet/xilinx/ll_temac_main.c:255: warning: Function parameter or member 'lp' not described in 'temac_dma_dcr_out'
drivers/net/ethernet/xilinx/ll_temac_main.c:255: warning: Function parameter or member 'reg' not described in 'temac_dma_dcr_out'
drivers/net/ethernet/xilinx/ll_temac_main.c:255: warning: Function parameter or member 'value' not described in 'temac_dma_dcr_out'
drivers/net/ethernet/xilinx/ll_temac_main.c:265: warning: Function parameter or member 'lp' not described in 'temac_dcr_setup'
drivers/net/ethernet/xilinx/ll_temac_main.c:265: warning: Function parameter or member 'op' not described in 'temac_dcr_setup'
drivers/net/ethernet/xilinx/ll_temac_main.c:265: warning: Function parameter or member 'np' not described in 'temac_dcr_setup'
drivers/net/ethernet/xilinx/ll_temac_main.c:300: warning: Function parameter or member 'ndev' not described in 'temac_dma_bd_release'
drivers/net/ethernet/xilinx/ll_temac_main.c:330: warning: Function parameter or member 'ndev' not described in 'temac_dma_bd_init'
drivers/net/ethernet/xilinx/ll_temac_main.c:600: warning: Function parameter or member 'ndev' not described in 'temac_setoptions'
drivers/net/ethernet/xilinx/ll_temac_main.c:600: warning: Function parameter or member 'options' not described in 'temac_setoptions'
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A couple of drivers had a "generic documentation" section that
would trigger a "can't understand" message from W=1 compiles.
Fix by using correct DOC: tags in the generic sections.
Fixed Warnings:
drivers/net/ethernet/arc/emac_arc.c:4: info: Scanning doc for c
drivers/net/ethernet/cadence/macb_pci.c:3: warning: missing initial short description on line:
* Cadence GEM PCI wrapper.
drivers/net/ethernet/cadence/macb_pci.c:3: info: Scanning doc for Cadence
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While fixing the W=1 builds, this warning came up because the
developers used a very tricky way to get structures initialized
to a non-zero value, but this causes GCC to warn about an
override. In this case the override was intentional, so just
disable the warning for this code with a kernel macro that results
in disabling the warning for compiles on GCC versions after 8.
It is not appropriate to change the struct to initialize all the
values as it will just add a lot more code for no value. The code
is completely correct as is, we just want to acknowledge that
this code could generate a warning and we're ok with that.
NOTE: the __diag_ignore macro currently only accepts a second
argument of 8 (version 80000), it's either use this one or
open code the pragma.
Fixed Warnings example (all the same):
drivers/net/ethernet/renesas/sh_eth.c:51:12: warning: initialized field overwritten [-Woverride-init]
drivers/net/ethernet/renesas/sh_eth.c:52:12: warning: initialized field overwritten [-Woverride-init]
drivers/net/ethernet/renesas/sh_eth.c:53:13: warning: initialized field overwritten [-Woverride-init]
+ 256 more...
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The W=1 builds showed a few files exporting functions
(non-static) that were not prototyped. What actually happened is
that there were prototypes, but the include file was forgotten in
the implementation file.
Add the include file and remove the warnings.
Fixed Warnings:
drivers/net/ethernet/cavium/liquidio/cn68xx_device.c:124:5: warning: no previous prototype for ‘lio_setup_cn68xx_octeon_device’ [-Wmissing-prototypes]
drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c:159:1: warning: no previous prototype for ‘octeon_pci_read_core_mem’ [-Wmissing-prototypes]
drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c:168:1: warning: no previous prototype for ‘octeon_pci_write_core_mem’ [-Wmissing-prototypes]
drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c:176:5: warning: no previous prototype for ‘octeon_read_device_mem64’ [-Wmissing-prototypes]
drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c:185:5: warning: no previous prototype for ‘octeon_read_device_mem32’ [-Wmissing-prototypes]
drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c:194:6: warning: no previous prototype for ‘octeon_write_device_mem32’ [-Wmissing-prototypes]
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c:453:6: warning: no previous prototype for ‘hclge_dcb_ops_set’ [-Wmissing-prototypes]
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of the W=1 compliation series, these lines all created
warnings about unused variables that were assigned a value. Most
of them are from register reads, but some are just picking up
a return value from a function and never doing anything with it.
Fixed warnings:
.../ethernet/brocade/bna/bnad.c:3280:6: warning: variable ‘rx_count’ set but not used [-Wunused-but-set-variable]
.../ethernet/brocade/bna/bnad.c:3280:6: warning: variable ‘rx_count’ set but not used [-Wunused-but-set-variable]
.../ethernet/cortina/gemini.c:512:6: warning: variable ‘val’ set but not used [-Wunused-but-set-variable]
.../ethernet/cortina/gemini.c:2110:21: warning: variable ‘config0’ set but not used [-Wunused-but-set-variable]
.../ethernet/cavium/liquidio/octeon_device.c:1327:6: warning: variable ‘val32’ set but not used [-Wunused-but-set-variable]
.../ethernet/cavium/liquidio/octeon_device.c:1358:6: warning: variable ‘val32’ set but not used [-Wunused-but-set-variable]
.../ethernet/dec/tulip/media.c:322:8: warning: variable ‘setup’ set but not used [-Wunused-but-set-variable]
.../ethernet/dec/tulip/de4x5.c:4928:13: warning: variable ‘r3’ set but not used [-Wunused-but-set-variable]
.../ethernet/micrel/ksz884x.c:1652:7: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable]
.../ethernet/micrel/ksz884x.c:1652:7: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable]
.../ethernet/micrel/ksz884x.c:1652:7: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable]
.../ethernet/micrel/ksz884x.c:1652:7: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable]
.../ethernet/micrel/ksz884x.c:4981:6: warning: variable ‘rx_status’ set but not used [-Wunused-but-set-variable]
.../ethernet/micrel/ksz884x.c:6510:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable]
.../ethernet/micrel/ksz884x.c:6087: warning: cannot understand function prototype: 'struct hw_regs '
.../ethernet/microchip/lan743x_main.c:161:6: warning: variable ‘int_en’ set but not used [-Wunused-but-set-variable]
.../ethernet/microchip/lan743x_main.c:1702:6: warning: variable ‘int_sts’ set but not used [-Wunused-but-set-variable]
.../ethernet/microchip/lan743x_main.c:3041:6: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable]
.../ethernet/natsemi/ns83820.c:603:6: warning: variable ‘tbisr’ set but not used [-Wunused-but-set-variable]
.../ethernet/natsemi/ns83820.c:1207:11: warning: variable ‘tanar’ set but not used [-Wunused-but-set-variable]
.../ethernet/marvell/mvneta.c:754:6: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable]
.../ethernet/neterion/vxge/vxge-traffic.c:33:6: warning: variable ‘val64’ set but not used [-Wunused-but-set-variable]
.../ethernet/neterion/vxge/vxge-traffic.c:160:6: warning: variable ‘val64’ set but not used [-Wunused-but-set-variable]
.../ethernet/neterion/vxge/vxge-traffic.c:490:6: warning: variable ‘val32’ set but not used [-Wunused-but-set-variable]
.../ethernet/neterion/vxge/vxge-traffic.c:2378:6: warning: variable ‘val64’ set but not used [-Wunused-but-set-variable]
.../ethernet/packetengines/yellowfin.c:1063:18: warning: variable ‘yf_size’ set but not used [-Wunused-but-set-variable]
.../ethernet/realtek/8139cp.c:1242:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable]
.../ethernet/mellanox/mlx4/en_tx.c:858:6: warning: variable ‘ring_cons’ set but not used [-Wunused-but-set-variable]
.../ethernet/sis/sis900.c:792:6: warning: variable ‘status’ set but not used [-Wunused-but-set-variable]
.../ethernet/sfc/falcon/farch.c:878:11: warning: variable ‘rx_ev_pkt_type’ set but not used [-Wunused-but-set-variable]
.../ethernet/sfc/falcon/farch.c:877:23: warning: variable ‘rx_ev_mcast_pkt’ set but not used [-Wunused-but-set-variable]
.../ethernet/sfc/falcon/farch.c:877:7: warning: variable ‘rx_ev_hdr_type’ set but not used [-Wunused-but-set-variable]
.../ethernet/sfc/falcon/farch.c:876:7: warning: variable ‘rx_ev_other_err’ set but not used [-Wunused-but-set-variable]
.../ethernet/sfc/falcon/farch.c:1646:21: warning: variable ‘buftbl_min’ set but not used [-Wunused-but-set-variable]
.../ethernet/sfc/falcon/farch.c:2535:32: warning: variable ‘spec’ set but not used [-Wunused-but-set-variable]
.../ethernet/via/via-velocity.c:880:6: warning: variable ‘curr_status’ set but not used [-Wunused-but-set-variable]
.../ethernet/ti/tlan.c:656:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable]
.../ethernet/ti/davinci_emac.c:1230:6: warning: variable ‘num_tx_pkts’ set but not used [-Wunused-but-set-variable]
.../ethernet/synopsys/dwc-xlgmac-common.c:516:8: warning: variable ‘str’ set but not used [-Wunused-but-set-variable]
.../ethernet/ti/cpsw_new.c:1662:22: warning: variable ‘priv’ set but not used [-Wunused-but-set-variable]
The register reads should be OK, because the current
implementation of readl and friends will always execute even
without an lvalue.
When it makes sense, just remove the lvalue assignment and the
local. Other times, just remove the offending code, and
occasionally, just mark the variable as maybe unused since it
could be used in an ifdef or debug scenario.
Only compile tested with W=1.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove variables that were storing a return value from a register
read or other read, where the return value wasn't used. Those
conversions to remove the lvalue of the assignment should be safe
because the readl memory mapped reads are marked volatile and
should not be optimized out without an lvalue (I suspect a very
long time ago this wasn't guaranteed as it is today).
These changes are part of a separate patch to make it easier to review.
Warnings Fixed:
.../intel/e100.c:2596:9: warning: variable ‘err’ set but not used [-Wunused-but-set-variable]
.../intel/ixgb/ixgb_hw.c:101:6: warning: variable ‘icr_reg’ set but not used [-Wunused-but-set-variable]
.../intel/ixgb/ixgb_hw.c:277:6: warning: variable ‘ctrl_reg’ set but not used [-Wunused-but-set-variable]
.../intel/ixgb/ixgb_hw.c:952:15: warning: variable ‘temp_reg’ set but not used [-Wunused-but-set-variable]
.../intel/ixgb/ixgb_hw.c:1164:7: warning: variable ‘mdio_reg’ set but not used [-Wunused-but-set-variable]
.../intel/e1000/e1000_hw.c:132:6: warning: variable ‘ret_val’ set but not used [-Wunused-but-set-variable]
.../intel/e1000/e1000_hw.c:380:6: warning: variable ‘icr’ set but not used [-Wunused-but-set-variable]
.../intel/e1000/e1000_hw.c:2378:6: warning: variable ‘signal’ set but not used [-Wunused-but-set-variable]
.../intel/e1000/e1000_hw.c:2374:6: warning: variable ‘ctrl’ set but not used [-Wunused-but-set-variable]
.../intel/e1000/e1000_hw.c:2373:6: warning: variable ‘rxcw’ set but not used [-Wunused-but-set-variable]
.../intel/e1000/e1000_hw.c:4678:15: warning: variable ‘temp’ set but not used [-Wunused-but-set-variable]
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This takes care of all of the trivial W=1 fixes in the Intel
Ethernet drivers, which allows developers and maintainers to
build more of the networking tree with more complete warning
checks.
There are three classes of kdoc warnings fixed:
- cannot understand function prototype: 'x'
- Excess function parameter 'x' description in 'y'
- Function parameter or member 'x' not described in 'y'
All of the changes were trivial comment updates on
function headers.
Inspired by Lee Jones' series of wireless work to do the same.
Compile tested only, and passes simple test of
$ git ls-files *.[ch] | egrep drivers/net/ethernet/intel | \
xargs scripts/kernel-doc -none
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During ice_vsi_setup, if ice_cfg_vsi_lan fails, it does not properly
release memory associated with the VSI rings. If we had used devres
allocations for the rings, this would be ok. However, we use kzalloc and
kfree_rcu for these ring structures.
Using the correct label to cleanup the rings during ice_vsi_setup
highlights an issue in the ice_vsi_clear_rings function: it can leave
behind stale ring pointers in the q_vectors structure.
When releasing rings, we must also ensure that no q_vector associated
with the VSI will point to this ring again. To resolve this, loop over
all q_vectors and release their ring mapping. Because we are about to
free all rings, no q_vector should remain pointing to any of the rings
in this VSI.
Fixes: 5513b920a4 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_setup_pf_sw function can cause a memory leak if register_netdev
fails, due to accidentally failing to free the VSI rings. Fix the memory
leak by using ice_vsi_release, ensuring we actually go through the full
teardown process.
This should be safe even if the netdevice is not registered because we
will have set the netdev pointer to NULL, ensuring ice_vsi_release won't
call unregister_netdev.
An alternative fix would be moving management of the PF VSI netdev into
the main VSI setup code. This is complicated and likely requires
significant refactor in how we manage VSIs
Fixes: 3a858ba392 ("ice: Add support for VSI allocation and deallocation")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
It appears that the ice_suspend flow is missing a call to pci_save_state
and this is triggering the message "State of device not saved by
ice_suspend" and a call trace. Fix it.
Fixes: 769c500dcc ("ice: Add advanced power mgmt for WoL")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When calling iavf_resume there was a crash because wrong
function was used to get iavf_adapter and net_device pointers.
Changed how iavf_resume is getting iavf_adapter and net_device
pointers from pci_dev.
Fixes: 5eae00c57f ("i40evf: main driver core")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use the new non-coherent DMA API including proper ownership transfers.
This includes adding additional calls to dma_sync_desc_dev as the
old syncing was rather ad-hoc.
Thanks to Thomas Bogendoerfer for debugging the ownership transfer
issues.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Use the new non-coherent DMA API including proper ownership transfers.
This includes moving the DMA helpers to lib82596 based of an ifdef to
avoid include order problems.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> (SNI part)
This allows us to get rid of the LIB82596_DMA_ATTR defined and prepare
for untangling the coherent vs non-coherent DMA allocation API.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> (SNI part)
The au1000-eth driver contains none of the manual cache synchronization
required for using DMA_ATTR_NON_CONSISTENT. From what I can tell it
can be used on both dma coherent and non-coherent DMA platforms, but
I suspect it has been buggy on the non-coherent platforms all along.
Signed-off-by: Christoph Hellwig <hch@lst.de>
VF devices do not have speed division, its speed is depended on its PF.
So macro name of PCI device id of VF is incorrent to have 100G info, it
should be renamed by removing 100G info.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 200G device has a new device id 0xA228, so adds this device id to
pci table, then the driver can probe it.
As speed_ability queried from firmware has only 8 bits and already be
used up, so firmware adds extra speed_ability_ext to indicate more
speed abilities to support 200G and driver needs to parse it.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pf's interrupt resources will be changed with the number of
enabled pf. Dumping this resource information will be helpful
for debugging.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In hns3_process_hw_error(), the hardware error detection of the
ROCEE AXI RESP error type is added. When this error occurs,
the client needs to be notified of this error and take
corresponding operation.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a variable is assigned a value before it is used, it's no
need to assign an initial value to the variable. So remove
these redundant operations.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove some unnecessary parameters of hclge_title_idx_print(),
and rename this function for readability.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MDIO reads can happen during PHY probing, and printing an error with
dev_err can result in a large number of error messages during device
probe. On a platform with a serial console this can result in
excessively long boot times in a way that looks like an infinite loop
when multiple busses are present. Since 0f183fd151 (net/fsl: enable
extended scanning in xgmac_mdio) we perform more scanning so there are
potentially more failures.
Reduce the logging level to dev_dbg which is consistent with the
Freescale enetc driver.
Cc: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify the comment typo: "compliment" -> "complement".
Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It should also be regarded as an error when hw return status=4 for PF's
setting mac cmd. Only if PF return status=4 to VF should this cmd be
taken special treatment.
Fixes: 7dd29ee128 ("hinic: add sriov feature support")
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This series includes mlx5 updates
1) Add support for Connection Tracking offload in NIC mode.
Supporting CT offload in NIC mode on Mellanox cards is useful for
scenarios where the dual port NIC serves as a gateway between 2
networks and forwards traffic between these networks.
Since the traffic is not terminated on the host in this case,
no use of SRIOV VFs and/or switchdev mode is required.
Today Mellanox NIC cards already support offloading of packet forwarding
between physical ports without going to the host so combining it with CT
offloading allows users to create a gateway with forwarding and CT
(Including NAT) offloading capabilities in non-switchdev mode.
To support connection tracking in non-Switchdev mode (Single NIC mode),
we need to make use of the current Connection tracking infrastructure
implemented on top of E-Switch and the mlx5 generic flow table chains
APIs, to make it work on non-Eswitch steering domain e.g. NIC RX domain,
the following was performed:
1.1) Refactor current flow steering chains infrastructure and
updates TC nic mode implementation to use flow table chains.
1.2) Refactor current Connection Tracking (CT) infrastructure to not
assume E-switch backend, and make the CT layer agnostic to
underlying steering mode (E-Switch/NIC)
1.3) Plumbing to support CT offload in NIC mode.
2) Trivial code cleanups.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl9rz9cACgkQSD+KveBX
+j7K/gf/ZysTFuFuC7MCo7xJO2vxlGGE1r6/ENsqonvUT2tcoZCdK9bZMw1Mx17Z
r1nyn0xQ3MwRheXMSpqXngTPpfGM6eNgV9CDfFXm62z6WXMYieen0t/LrM/mxo+2
s74Okp53peyGNpePyseewEUGV7zaR6F6uukkKvr441gvAOF3Fcfaz+dIv7KzxKNS
+b78yw0b6mGc4foYLSuJcDQlSwqjeIpdSib8xmETMZwRzCt20GCEBDsBAaKt0wzM
1fTZttY+kuLd/m/q+sh3s/4lN2kOO+dwK5NGf+RWtiOaDWT+J/ogVmI2ywXIwsg7
U63nhjGAr7GPqkaG0Jv3aS7na6pbSA==
=sByc
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2020-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2020-09-22
This series includes mlx5 updates
1) Add support for Connection Tracking offload in NIC mode.
Supporting CT offload in NIC mode on Mellanox cards is useful for
scenarios where the dual port NIC serves as a gateway between 2
networks and forwards traffic between these networks.
Since the traffic is not terminated on the host in this case,
no use of SRIOV VFs and/or switchdev mode is required.
Today Mellanox NIC cards already support offloading of packet forwarding
between physical ports without going to the host so combining it with CT
offloading allows users to create a gateway with forwarding and CT
(Including NAT) offloading capabilities in non-switchdev mode.
To support connection tracking in non-Switchdev mode (Single NIC mode),
we need to make use of the current Connection tracking infrastructure
implemented on top of E-Switch and the mlx5 generic flow table chains
APIs, to make it work on non-Eswitch steering domain e.g. NIC RX domain,
the following was performed:
1.1) Refactor current flow steering chains infrastructure and
updates TC nic mode implementation to use flow table chains.
1.2) Refactor current Connection Tracking (CT) infrastructure to not
assume E-switch backend, and make the CT layer agnostic to
underlying steering mode (E-Switch/NIC)
1.3) Plumbing to support CT offload in NIC mode.
2) Trivial code cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Include PCS support in the dpaa2-eth driver by integrating it with the
new Lynx PCS module. There is not much to talk about in terms of changes
needed in the dpaa2-eth driver since the only steps necessary are to
find the MDIO device representing the PCS, register it to the Lynx PCS
module and then let phylink know if its existence also.
After this, the PCS callbacks will be treated directly by Lynx, without
interraction from dpaa2-eth's part.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, ocelot switchdev passes the skb directly to the function that
enqueues it to the list of skb's awaiting a TX timestamp. Whereas the
felix DSA driver first clones the skb, then passes the clone to this
queue.
This matters because in the case of felix, the common IRQ handler, which
is ocelot_get_txtstamp(), currently clones the clone, and frees the
original clone. This is useless and can be simplified by using
skb_complete_tx_timestamp() instead of skb_tstamp_tx().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
EEE should be only be enabled during stmmac_mac_link_up() when the
link are up and being set up properly. set_eee should only do settings
configuration and disabling the eee.
Without this fix, turning on EEE using ethtool will return
"Operation not supported". This is due to the driver is in a dead loop
waiting for eee to be advertised in the for eee to be activated but the
driver will only configure the EEE advertisement after the eee is
activated.
Ethtool should only return "Operation not supported" if there is no EEE
capbility in the MAC controller.
Fixes: 8a7493e58a ("net: stmmac: Fix a race in EEE enable callback")
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Acked-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TX DMA channel data is accessed by the xrx200_start_xmit() and the
xrx200_tx_housekeeping() function from different threads. Make sure the
accesses are synchronized by acquiring the netif_tx_lock() in the
xrx200_tx_housekeeping() function too. This lock is acquired by the
kernel before calling xrx200_start_xmit().
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to control rx-flow-hash based on VLAN.
By default VLAN plus 4-tuple based hashing is enabled.
Changes can be done runtime using ethtool
To enable 2-tuple plus VLAN based flow distribution
# ethtool -N <intf> rx-flow-hash <prot> sdv
To enable 4-tuple plus VLAN based flow distribution
# ethtool -N <intf> rx-flow-hash <prot> sdfnv
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for PF/VF drivers to choose RSS flow key algorithm
with VLAN tag included in hashing input data. Only CTAG is considered.
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 1838d6c62f.
This commit moved the ravb_mdio_init() call (and thus the
of_mdiobus_register() call) from the ravb_probe() to the ravb_open()
call. This causes a regression during system resume (s2idle/s2ram), as
new PHY devices cannot be bound while suspended.
During boot, the Micrel PHY is detected like this:
Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached PHY driver [Micrel KSZ9031 Gigabit PHY] (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=228)
ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
During system suspend, (A) defer_all_probes is set to true, and (B)
usermodehelper_disabled is set to UMH_DISABLED, to avoid drivers being
probed while suspended.
A. If CONFIG_MODULES=n, phy_device_register() calling device_add()
merely adds the device, but does not probe it yet, as
really_probe() returns early due to defer_all_probes being set:
dpm_resume+0x128/0x4f8
device_resume+0xcc/0x1b0
dpm_run_callback+0x74/0x340
ravb_resume+0x190/0x1b8
ravb_open+0x84/0x770
of_mdiobus_register+0x1e0/0x468
of_mdiobus_register_phy+0x1b8/0x250
of_mdiobus_phy_device_register+0x178/0x1e8
phy_device_register+0x114/0x1b8
device_add+0x3d4/0x798
bus_probe_device+0x98/0xa0
device_initial_probe+0x10/0x18
__device_attach+0xe4/0x140
bus_for_each_drv+0x64/0xc8
__device_attach_driver+0xb8/0xe0
driver_probe_device.part.11+0xc4/0xd8
really_probe+0x32c/0x3b8
Later, phy_attach_direct() notices no PHY driver has been bound,
and falls back to the Generic PHY, leading to degraded operation:
Generic PHY e6800000.ethernet-ffffffff:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=POLL)
ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
B. If CONFIG_MODULES=y, request_module() returns early with -EBUSY due
to UMH_DISABLED, and MDIO initialization fails completely:
mdio_bus e6800000.ethernet-ffffffff:00: error -16 loading PHY driver module for ID 0x00221622
ravb e6800000.ethernet eth0: failed to initialize MDIO
PM: dpm_run_callback(): ravb_resume+0x0/0x1b8 returns -16
PM: Device e6800000.ethernet failed to resume: error -16
Ignoring -EBUSY in phy_request_driver_module(), like was done for
-ENOENT in commit 21e194425a ("net: phy: fix issue with loading
PHY driver w/o initramfs"), would makes it fall back to the Generic
PHY, like in the CONFIG_MODULES=n case.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: stable@vger.kernel.org
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With tracepoints support present in the mailbox
code this patch adds tracepoints in PF and VF drivers
at places where mailbox messages are allocated,
sent and at message interrupts.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added tracepoints in mailbox code so that
the mailbox operations like message allocation,
sending message and message interrupts are traced.
Also the mailbox errors occurred like timeout
or wrong responses are traced.
These will help in debugging mailbox issues.
Here's an example output showing one of the mailbox
messages sent by PF to AF and AF responding to it:
~# mount -t tracefs none /sys/kernel/tracing/
~# echo 1 > /sys/kernel/tracing/events/rvu/enable
~# ifconfig eth0 up
~# cat /sys/kernel/tracing/trace
~# cat /sys/kernel/tracing/trace
tracer: nop
_-----=> irqs-off
/ _----=> need-resched
| / _---=> hardirq/softirq
|| / _--=> preempt-depth
||| / delay
TASK-PID CPU# |||| TIMESTAMP FUNCTION
| | | |||| | |
ifconfig-2382 [002] .... 756.161892: otx2_msg_alloc: [0002:02:00.0] msg:(0x400) size:40
ifconfig-2382 [002] ...1 756.161895: otx2_msg_send: [0002:02:00.0] sent 1 msg(s) of size:48
<idle>-0 [000] d.h1 756.161902: otx2_msg_interrupt: [0002:01:00.0] mbox interrupt PF(s) to AF (0x2)
kworker/u49:0-1165 [000] .... 756.162049: otx2_msg_process: [0002:01:00.0] msg:(0x400) error:0
kworker/u49:0-1165 [000] ...1 756.162051: otx2_msg_send: [0002:01:00.0] sent 1 msg(s) of size:32
kworker/u49:0-1165 [000] d.h. 756.162056: otx2_msg_interrupt: [0002:02:00.0] mbox interrupt AF to PF (0x1)
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The comment "holders of db->lock must always block IRQs" and related
code to do irqsave and irqrestore don't make sense since we are in a
IRQ-disabled hardIRQ context.
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A number of static variables were not modified. Make them const to allow
the compiler to put them in read-only memory. In order to do so,
constify a couple of input pointers as well as some local pointers.
This moves about 35Kb to read-only memory as seen by the output of the
size command.
Before:
text data bss dec hex filename
404938 111534 640 517112 7e3f8 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge.ko
After:
text data bss dec hex filename
439499 76974 640 517113 7e3f9 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge.ko
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The last return statement is unreachable code. I'm not sure if it will
provoke any warnings, but it looks ugly.
Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Memory ft->g in accel_fs_tcp_create_groups() is allocaed with kcalloc().
It's excessive to free ft->g with kvfree(). Use kfree() instead.
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Variables flow_group_in, spec in rx_fs_create() are allocated with
kvzalloc(). It's incorrect to free them with kfree(). Use kvfree()
instead.
Fixes: 5e46634529 ("net/mlx5e: IPsec: Add IPsec steering in local NIC RX")
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Keep and use a direct reference to the mlx5 core device in all of
tc_ct code instead of accessing it via a pointer to mlx5 eswitch
in order to support nic mode ct offload for VF devices that don't
have a valid eswitch pointer set.
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
priv is never used in this function
Fixes: 7e36feeb04 ("net/mlx5e: CT: Don't offload tuple rewrites for established tuples")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
A connection is represented by two 5-tuple entries, one for each direction.
Currently, each direction allocates its own hw counter, which is
inefficient as ct aging is managed per connection.
Share the counter that was allocated for the original direction with the
reverse direction.
Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Adding support to perform CT related tc actions and
matching on CT states for nic flows.
The ct flows management and handling will be done using a new
instance of the ct database that is declared in this patch to
keep it separate from the eswitch ct flows database.
Offloading and unoffloading ct flows will be done using the
existing ct offload api by providing it the relevant ct
database reference in each mode.
In addition, refactoring the tc ct api is introduced to make it
agnostic to the flow type and perform the resource allocations
and rule insertion to the proper steering domain in the device.
In the initialization call, the api requests and stores in the ct
database instance all the relevant information that distinguishes
between nic flows and esw flows, such as chains database, steering
namespace and mod hdr table.
This way the operations of adding and removing ct flows to the device
can later performed agnostically to the flow type.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The changes are:
- Use mlx5_core print macros instead of netdev_warn since
netdev is not always initialized at that stage.
- Print a warning message in case the issue is with lack of
support for CT offload without indicating an error.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Allow adding nic tc flow rules with goto chain action.
Connecting the nic flows to the mlx5 chains infrastructure in previous
patches allows us to support the creation of chained flow tables and
rules that direct to another chain for further packet processing.
This is a required preparation to support CT offloads for nic tc flows.
We allow the creation of 256 different chains for nic flows since we
have 8 bits available for the chain restore tag in case of a miss.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In order to support chains and connection tracking offload for
nic flows, there's a need to introduce a common flow attributes
struct so that these features can be agnostic and have access to
a single attributes struct, regardless of the flow type.
Therefore, a new tc flow attributes format is introduced to allow
access to attributes that are common to eswitch and nic flows.
The common attributes will always get allocated for the new flows,
regardless of their type, while the type specific attributes are
separated into different structs and will be allocated based on the
flow type to avoid memory waste.
When allocating the flow attributes the caller provides the flow
steering namespace and according the namespace type the additional
space for the extra, type specific, attributes is determined and
added to the total attribute allocation size.
In addition, the attributes that are going to be common to both
flow types are moved to the common attributes struct.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
For future support of CT offload with nic tc flows, where
the flow rule is not created immediately but rather following
a future event, the patch is splitting the nic rule creation
and deletion into 2 parts:
1. Creating/Deleting and setting the rule attributes.
2. Creating/Deleting the flow table and flow rule itself.
This way the attributes can be prepared and stored in the
flow handle when the tc flow is created but the rule can
actually be created at any point in the future, using these
pre allocated attributes.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Change nic tc flows offload path to use the chains and prios
infrastructure for the flow table creation as a preparation to
support tc multi chains and priorities for nic flows.
Adding an instance of the table chaining database to the nic tc struct
and perform the root table creation and desctuction via the chains api
while keeping the limit of a single chain (0) in nic tc mode.
This will be extendable to supporting multiple chains in the following
patches.
The flow table sizes and default miss table parameters that are provided
to the chains creation api are kept the same.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Allow setting a flow table with a lower level
as a rule destination in nic rx tables.
This is required in order to support table chaining
of tc nic flows.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Decouple the chains infrastructure from eswitch and make
it generic to support other steering namespaces.
The change defines an agnostic data structure to keep
all the relevant information for maintaining flow table
chaining in any steering namespace. Each namespace that
requires table chaining will be required to allocate
such data structure.
The chains creation code will receive the steering namespace
and flow table parameters from the caller so it will operate
agnosticly when creating the required resources to
maintain the table chaining function while Parts of the code
that are relevant to eswitch specific functionality are moved
to eswitch files.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/realtek/8139cp.c: In function cp_tx_timeout:
drivers/net/ethernet/realtek/8139cp.c:1242:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable]
`rc` is never used, so remove it.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the warnings about function header comments when building hinic
driver with "W=1" option.
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-09-23
The following pull-request contains BPF updates for your *net-next* tree.
We've added 95 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 4211 insertions(+), 2040 deletions(-).
The main changes are:
1) Full multi function support in libbpf, from Andrii.
2) Refactoring of function argument checks, from Lorenz.
3) Make bpf_tail_call compatible with functions (subprograms), from Maciej.
4) Program metadata support, from YiFei.
5) bpf iterator optimizations, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/microchip/lan743x_main.c: In function lan743x_pm_suspend:
`ret` is set but not used. In fact, `pci_prepare_to_sleep` function value should
be the right value of `lan743x_pm_suspend` function, therefore, fix it.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Multi packet TX descriptor support for SKBs.
This series introduces some refactoring of the regular TX data path in
mlx5 and adds the Enhanced TX MPWQE feature support. MPWQE stands for
multi-packet work queue element, and it can serve multiple packets,
reducing the PCI bandwidth spent on control traffic. It should improve
performance in scenarios where PCI is the bottleneck, and xmit_more is
signaled by the kernel. The refactoring done in this series also
improves the packet rate on its own.
MPWQE is already implemented in the XDP tx path, this series adds the
support of MPWQE for regular kernel SKB tx path.
MPWQE is supported from ConnectX-5 and onward, for legacy devices we need
to keep backward compatibility for regular (Single packet) WQE descriptor.
MPWQE is not compatible with certain offloads and features, such as TLS
offload, TSO, nonlinear SKBs. If such incompatible features are in use,
the driver gracefully falls back to non-MPWQE per SKB.
Prior to the final patch "net/mlx5e: Enhanced TX MPWQE for SKBs" that adds
the actual support, Maxim did some refactoring to the tx data path to
split it into stages and smaller helper functions that can be utilized and
reused for both legacy and new MPWQE feature.
Performance testing:
UDP performance is improved in a single stream pktgen test:
Packet rate: 16.86 Mpps (±0.15 Mpps) -> 20.94 Mpps (±0.33 Mpps)
Instructions per packet: 434 -> 329
Cycles per packet: 158 -> 123
Instructions per cycle: 2.75 -> 2.67
TCP and XDP_TX single stream tests show no performance difference.
MPWQE can reduce PCI bandwidth:
PCI Gen2, pktgen at fixed rate of 36864000 pps on 24 CPU cores:
Inbound PCI utilization with MPWQE off: 80.3%
Inbound PCI utilization with MPWQE on: 59.0%
PCI Gen3, pktgen at fixed rate of 56064000 pps on 24 CPU cores:
Inbound PCI utilization with MPWQE off: 65.4%
Inbound PCI utilization with MPWQE on: 49.3%
MPWQE can also reduce CPU load, increasing the packet rate in case of
CPU bottleneck:
PCI Gen2, pktgen at full rate on 24 CPU cores:
Packet rate with MPWQE off: 37.5 Mpps
Packet rate with MPWQE on: 49.0 Mpps
PCI Gen3, pktgen at full rate on 24 CPU cores:
Packet rate with MPWQE off: 57.0 Mpps
Packet rate with MPWQE on: 66.8 Mpps
Burst size in all pktgen tests is 32.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64)
NIC: Mellanox ConnectX-6 Dx
GCC 10.2.0
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl9pZE0ACgkQSD+KveBX
+j5V/Qf+M0PI/ZyTsOlHbl78Mz7acgGSZTjFBPo0MQ7U0ReY8C25YVDycKazlwwZ
XL8Ip1gV08uDbROB92ozQcDekIyiTyae04ACXa+oCl/lxJydxN5ZDAiJV+bUhb0E
Ti4rBrgPH46FMbKso2XPFxdk9f9krqOLA2Jl7Am+R+W1nYgdBkqumTRXGkDEV8oi
p1YeFb/ldBXS6En/QQAZ89FbHaoV+V4Z2uHhdoWjLPhumgplk14BwRMT0UCRn3IK
6Q8jk55gW7lE9vdhQuOHZeU3SRr2+VcyYii2/htfvdQjsGrBVrAm1gWcF2KrUa6C
VxuDQ1oXh3r/eibnTq/XReadRiGSVg==
=ouzY
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2020-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2020-09-21
Multi packet TX descriptor support for SKBs.
This series introduces some refactoring of the regular TX data path in
mlx5 and adds the Enhanced TX MPWQE feature support. MPWQE stands for
multi-packet work queue element, and it can serve multiple packets,
reducing the PCI bandwidth spent on control traffic. It should improve
performance in scenarios where PCI is the bottleneck, and xmit_more is
signaled by the kernel. The refactoring done in this series also
improves the packet rate on its own.
MPWQE is already implemented in the XDP tx path, this series adds the
support of MPWQE for regular kernel SKB tx path.
MPWQE is supported from ConnectX-5 and onward, for legacy devices we need
to keep backward compatibility for regular (Single packet) WQE descriptor.
MPWQE is not compatible with certain offloads and features, such as TLS
offload, TSO, nonlinear SKBs. If such incompatible features are in use,
the driver gracefully falls back to non-MPWQE per SKB.
Prior to the final patch "net/mlx5e: Enhanced TX MPWQE for SKBs" that adds
the actual support, Maxim did some refactoring to the tx data path to
split it into stages and smaller helper functions that can be utilized and
reused for both legacy and new MPWQE feature.
Performance testing:
UDP performance is improved in a single stream pktgen test:
Packet rate: 16.86 Mpps (±0.15 Mpps) -> 20.94 Mpps (±0.33 Mpps)
Instructions per packet: 434 -> 329
Cycles per packet: 158 -> 123
Instructions per cycle: 2.75 -> 2.67
TCP and XDP_TX single stream tests show no performance difference.
MPWQE can reduce PCI bandwidth:
PCI Gen2, pktgen at fixed rate of 36864000 pps on 24 CPU cores:
Inbound PCI utilization with MPWQE off: 80.3%
Inbound PCI utilization with MPWQE on: 59.0%
PCI Gen3, pktgen at fixed rate of 56064000 pps on 24 CPU cores:
Inbound PCI utilization with MPWQE off: 65.4%
Inbound PCI utilization with MPWQE on: 49.3%
MPWQE can also reduce CPU load, increasing the packet rate in case of
CPU bottleneck:
PCI Gen2, pktgen at full rate on 24 CPU cores:
Packet rate with MPWQE off: 37.5 Mpps
Packet rate with MPWQE on: 49.0 Mpps
PCI Gen3, pktgen at full rate on 24 CPU cores:
Packet rate with MPWQE off: 57.0 Mpps
Packet rate with MPWQE on: 66.8 Mpps
Burst size in all pktgen tests is 32.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64)
NIC: Mellanox ConnectX-6 Dx
GCC 10.2.0
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Two minor conflicts:
1) net/ipv4/route.c, adding a new local variable while
moving another local variable and removing it's
initial assignment.
2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
One pretty prints the port mode differently, whilst another
changes the driver to try and obtain the port mode from
the port node rather than the switch node.
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds support for Enhanced TX MPWQE feature in the regular
(SKB) data path. A MPWQE (multi-packet work queue element) can serve
multiple packets, reducing the PCI bandwidth on control traffic.
Two new stats (tx*_mpwqe_blks and tx*_mpwqe_pkts) are added. The feature
is on by default and controlled by the skb_tx_mpwqe private flag.
In a MPWQE, eseg is shared among all packets, so eseg-based offloads
(IPSEC, GENEVE, checksum) run on a separate eseg that is compared to the
eseg of the current MPWQE session to decide if the new packet can be
added to the same session.
MPWQE is not compatible with certain offloads and features, such as TLS
offload, TSO, nonlinear SKBs. If such incompatible features are in use,
the driver gracefully falls back to non-MPWQE.
This change has no performance impact in TCP single stream test and
XDP_TX single stream test.
UDP pktgen, 64-byte packets, single stream, MPWQE off:
Packet rate: 16.96 Mpps (±0.12 Mpps) -> 17.01 Mpps (±0.20 Mpps)
Instructions per packet: 421 -> 429
Cycles per packet: 156 -> 161
Instructions per cycle: 2.70 -> 2.67
UDP pktgen, 64-byte packets, single stream, MPWQE on:
Packet rate: 16.96 Mpps (±0.12 Mpps) -> 20.94 Mpps (±0.33 Mpps)
Instructions per packet: 421 -> 329
Cycles per packet: 156 -> 123
Instructions per cycle: 2.70 -> 2.67
Enabling MPWQE can reduce PCI bandwidth:
PCI Gen2, pktgen at fixed rate of 36864000 pps on 24 CPU cores:
Inbound PCI utilization with MPWQE off: 80.3%
Inbound PCI utilization with MPWQE on: 59.0%
PCI Gen3, pktgen at fixed rate of 56064000 pps on 24 CPU cores:
Inbound PCI utilization with MPWQE off: 65.4%
Inbound PCI utilization with MPWQE on: 49.3%
Enabling MPWQE can also reduce CPU load, increasing the packet rate in
case of CPU bottleneck:
PCI Gen2, pktgen at full rate on 24 CPU cores:
Packet rate with MPWQE off: 37.5 Mpps
Packet rate with MPWQE on: 49.0 Mpps
PCI Gen3, pktgen at full rate on 24 CPU cores:
Packet rate with MPWQE off: 57.0 Mpps
Packet rate with MPWQE on: 66.8 Mpps
Burst size in all pktgen tests is 32.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64)
NIC: Mellanox ConnectX-6 Dx
GCC 10.2.0
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5e_txwqe_complete performs some actions that can be taken to separate
functions:
1. Update the flags needed for hardware timestamping.
2. Stop the TX queue if it's full.
Take these actions into separate functions to be reused by the MPWQE
code in the following commit and to maintain clear responsibilities of
functions.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As preparation for the upcoming TX MPWQE support for SKBs, rename struct
mlx5e_xdp_mpwqe to mlx5e_tx_mpwqe and move it above struct mlx5e_txqsq.
This structure will be reused in the regular SQ and in the regular TX
data path. Also rename mlx5e_xdp_xmit_data to mlx5e_xmit_data - it will
be used in the upcoming TX MPWQE flow.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As preparation for the upcoming TX MPWQE for SKBs, create a function
(mlx5e_tx_mpwqe_is_full) to check whether an MPWQE session is full. This
function will be shared by MPWQE code for XDP and for SKBs. Defines are
renamed and moved to make them not XDP-specific.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
TX MPWQE support for SKBs is coming in one of the following patches, and
a single MPWQE can send multiple SKBs. This commit prepares the TX path
code to handle such cases:
1. An additional FIFO for SKBs is added, just like the FIFO for DMA
chunks.
2. struct mlx5e_tx_wqe_info will contain num_fifo_pkts. If a given WQE
contains only one packet, num_fifo_pkts will be zero, and the SKB will
be stored in mlx5e_tx_wqe_info, as usual. If num_fifo_pkts > 0, the SKB
pointer will be NULL, and the SKBs will be stored in the FIFO.
This change has no performance impact in TCP single stream test and
XDP_TX single stream test.
When compiled with a recent GCC, this change shows no visible
performance impact on UDP pktgen (burst 32) single stream test either:
Packet rate: 16.95 Mpps (±0.15 Mpps) -> 16.96 Mpps (±0.12 Mpps)
Instructions per packet: 429 -> 421
Cycles per packet: 160 -> 156
Instructions per cycle: 2.69 -> 2.70
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64)
NIC: Mellanox ConnectX-6 Dx
GCC 10.2.0
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Before this patch, mlx5e_ktls_tx_handle_resync_dump_comp checked for
resync_dump_frag_page. It happened for all WQEs without an SKB,
including padding WQEs, and required a function call. Normally, padding
WQEs happen more often than TLS resyncs. Take this check out of the
function and put it to an inline function to save a call on all padding
WQEs.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
A constant for the number of DS in an empty WQE (i.e. a WQE without data
segments) is needed in multiple places (normal TX data path, MPWQE in
XDP), but currently we have a constant for XDP and an inline formula in
normal TX. This patch introduces a common constant.
Additionally, mlx5e_xdp_mpwqe_session_start is converted to use struct
assignment, because the code nearby is touched.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Use MLX5E_XDP_MPW_MAX_WQEBBS to reserve space for a MPWQE, because it's
actually the maximal size a MPWQE can take.
Reorganize the logic that checks when to close the MPWQE session:
1. Put all checks into a single function.
2. When inline is on, make only one comparison - if it's false, the less
strict one will also be false. The compiler probably optimized it out
anyway, but it's clearer to also reflect it in the code.
The MLX5E_XDP_INLINE_WQE_* defines are also changed to make the
calculations more correct from the logical point of view. Though
MLX5E_XDP_INLINE_WQE_MAX_DS_CNT used to be 16 and didn't change its
value, the calculation used to be DIV_ROUND_UP(max inline packet size,
MLX5_SEND_WQE_DS), and the numerator should have included sizeof(struct
mlx5_wqe_inline_seg).
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
A huge function mlx5e_sq_xmit was split into several to achieve multiple
goals:
1. Reuse the code in IPoIB.
2. Better intergrate with TLS, IPSEC, GENEVE and checksum offloads. Now
it's possible to reserve space in the WQ before running eseg-based
offloads, so:
2.1. It's not needed to copy cseg and eseg after mlx5e_fill_sq_frag_edge
anymore.
2.2. mlx5e_txqsq_get_next_pi will be used instead of the legacy
mlx5e_fill_sq_frag_edge for better code maintainability and reuse.
3. Prepare for the upcoming TX MPWQE for SKBs. It will intervene after
mlx5e_sq_calc_wqe_attr to check if it's possible to use MPWQE, and the
code flow will split into two paths: MPWQE and non-MPWQE.
Two high-level functions are provided to send packets:
* mlx5e_xmit is called by the networking stack, runs offloads and sends
the packet. In one of the following patches, MPWQE support will be added
to this flow.
* mlx5e_sq_xmit_simple is called by the TLS offload, runs only the
checksum offload and sends the packet.
This change has no performance impact in TCP single stream test and
XDP_TX single stream test.
When compiled with a recent GCC, this change shows no visible
performance impact on UDP pktgen (burst 32) single stream test either:
Packet rate: 16.86 Mpps (±0.15 Mpps) -> 16.95 Mpps (±0.15 Mpps)
Instructions per packet: 434 -> 429
Cycles per packet: 158 -> 160
Instructions per cycle: 2.75 -> 2.69
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64)
NIC: Mellanox ConnectX-6 Dx
GCC 10.2.0
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Move mlx5e_tx_wqe_inline_mode from en/txrx.h to en_tx.c as it's only
used there.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Struct assignment guarantees that all fields of the structure are
initialized (those that are not mentioned are zeroed). It makes code
mode robust and reduces chances for unpredictable behavior when one
forgets to reset some field and it holds an old value from previous
iterations of using the structure.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As preparation for the next patch, don't increase ihs to calculate
ds_cnt and then decrease it, but rather calculate the intermediate value
temporarily. This code has the same amount of arithmetic operations, but
now allows to split out ds_cnt calculation, which will be performed in
the next patch.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The IS2 IP4_TCP_UDP key offsets do not correspond to the VSC7514
datasheet. Whether they work or not is unknown to me. On VSC9959 and
VSC9953, with the same mistake and same discrepancy from the
documentation, tc-flower src_port and dst_port rules did not work, so I
am assuming the same is true here.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl9pQ8EACgkQSD+KveBX
+j7I2wf/cu9W3mC8sNeJaZKIbJ+H6KhgZsGbeLud5tFscjcf5IaCpR97hyeZPfEG
doNRtcsT9Pj5YJn458L/p+zTVeWOuaOGPMsV8pdP/8OlFzjJW/rGXnBrEUt0ehkS
Sa//xGD6V8+nW9Z34fwQqrrqJeZik3H9V/RkriZUTsJ/zR/otLF3fVOQFwrS9Ka2
/dl1ERFepjBWupY39PSMFS2S2BZ6LYY8G/ewgHKeexbqLykxU27P3+mFz46YPmP6
jdIMmvo+fuPqyu9Tjtg6pGjYpCWttnBBtDmeSg+ewf61qW4mSemJzfGcbZYY2XT6
CxRsm4aTJ5COTEx05JFOqIhpP5LuAA==
=Hcsv
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2020-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes-2020-09-18
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
v1->v2:
Remove missing patch from -stable list.
For -stable v5.1
('net/mlx5: Fix FTE cleanup')
For -stable v5.3
('net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported')
('net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported')
For -stable v5.7
('net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready')
For -stable v5.8
('net/mlx5e: Use RCU to protect rq->xdp_prog')
('net/mlx5e: Fix endianness when calculating pedit mask first bit')
('net/mlx5e: Use synchronize_rcu to sync with NAPI')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The spinlock only needed when accessing the channel's icosq, grab the lock
after the buf allocation in resync_post_get_progress_params() to avoid
kzalloc(GFP_KERNEL) in atomic context.
Fixes: 0419d8c9d8 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Reported-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Using synchronize_rcu() is sufficient to wait until running NAPI quits.
See similar upstream fix with detailed explanation:
("net/mlx5e: Use synchronize_rcu to sync with NAPI")
This change also fixes a possible use-after-free as the NAPI
might be already released at this stage.
Fixes: 0419d8c9d8 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The set of TLS TX global SW counters in mlx5e_tls_sw_stats_desc
is updated from all rings by using atomic ops.
This set of stats is used only in the FPGA TLS use case, not in
the Connect-X TLS one, where regular per-ring counters are used.
Do not expose them in the Connect-X use case, as this would cause
counter duplication. For example, tx_tls_drop_no_sync_data would
appear twice in the ethtool stats.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit started to reuse function mlx5e_update_ndo_stats() for
the representors as well.
However, the function is hard-coded to work on mlx5e_nic_stats_grps only.
Due to this issue, the representors statistics were not updated in the
output of "ip -s".
Fix it to work with the correct group by extracting it from the caller's
profile.
Also, while at it and since this function became generic, move it to
en_stats.c and rename it accordingly.
Fixes: 8a236b1514 ("net/mlx5e: Convert rep stats to mlx5e_stats_grp-based infra")
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently the FW does not generate events for counters other than error
counters. Unlike ".get_ethtool_stats", ".ndo_get_stats64" (which ip -s
uses) might run in atomic context, while the FW interface is non atomic.
Thus, 'ip' is not allowed to issue FW commands, so it will only display
cached counters in the driver.
Add a SW counter (mcast_packets) in the driver to count rx multicast
packets. The counter also counts broadcast packets, as we consider it a
special case of multicast.
Use the counter value when calling "ip -s"/"ifconfig".
Fixes: f62b8bb8f2 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The field mask value is provided in network byte order and has to
be converted to host byte order before calculating pedit mask
first bit.
Fixes: 88f30bbcba ("net/mlx5e: Bit sized fields rewrite support")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit creates peer miss group during switchdev mode
initialization in order to handle miss packets correctly while in VF
LAG mode. This is done regardless of FW support of such groups which
could cause rules setups failure later on.
Fix by adding FW capability check before creating peer groups/rule.
Fixes: ac004b8321 ("net/mlx5e: E-Switch, Add peer miss rules")
Signed-off-by: Maor Dickman <maord@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add missing mapping remove call when removing ct rule,
as the mapping was allocated when ct rule was adding with ct_label.
Also there is a missing mapping remove call in error flow.
Fixes: 54b154ecfb ("net/mlx5e: CT: Map 128 bits labels to 32 bit map ID")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When deleting vxlan flow rule under multipath, tun_info in parse_attr is
not freed when the rule is not ready.
Fixes: ef06c9ee89 ("net/mlx5e: Allow one failure when offloading tc encap rules under multipath")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
As described in the previous commit, napi_synchronize doesn't quite fit
the purpose when we just need to wait until the currently running NAPI
quits. Its implementation waits until NAPI is not running by polling and
waiting for 1ms in between. In cases where we need to deactivate one
queue (e.g., recovery flows) or where we deactivate them one-by-one
(deactivate channel flow), we may get stuck in napi_synchronize forever
if other queues keep NAPI active, causing a soft lockup. Depending on
kernel configuration (CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC), it may result
in a kernel panic.
To fix the issue, use synchronize_rcu to wait for NAPI to quit, and wrap
the whole NAPI in rcu_read_lock.
Fixes: acc6c5953a ("net/mlx5e: Split open/close channels to stages")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, the RQs are temporarily deactivated while hot-replacing the
XDP program, and napi_synchronize is used to make sure rq->xdp_prog is
not in use. However, napi_synchronize is not ideal: instead of waiting
till the end of a NAPI cycle, it polls and waits until NAPI is not
running, sleeping for 1ms between the periodic checks. Under heavy
workloads, this loop will never end, which may even lead to a kernel
panic if the kernel detects the hangup. Such workloads include XSK TX
and possibly also heavy RX (XSK or normal).
The fix is inspired by commit 326fe02d1e ("net/mlx4_en: protect
ring->xdp_prog with rcu_read_lock"). As mlx5e_xdp_handle is already
protected by rcu_read_lock, and bpf_prog_put uses call_rcu to free the
program, there is no need for additional synchronization if proper RCU
functions are used to access the pointer. This patch converts all
accesses to rq->xdp_prog to use RCU functions.
Fixes: 86994156c7 ("net/mlx5e: XDP fast RX drop bpf programs support")
Fixes: db05815b36 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, when an FTE is allocated, its refcount is decreased to 0
with the purpose it will not be a stand alone steering object and every
rule (destination) of the FTE would increase the refcount.
When mlx5_cleanup_fs is called while not all rules were deleted by the
steering users, it hit refcount underflow on the FTE once clean_tree
calls to tree_remove_node after the deleted rules already decreased
the refcount to 0.
FTE is no longer destroyed implicitly when the last rule (destination)
is deleted. mlx5_del_flow_rules avoids it by increasing the refcount on
the FTE and destroy it explicitly after all rules were deleted. So we
can avoid the refcount underflow by making FTE as stand alone object.
In addition need to set del_hw_func to FTE so the HW object will be
destroyed when the FTE is deleted from the cleanup_tree flow.
refcount_t: underflow; use-after-free.
WARNING: CPU: 2 PID: 15715 at lib/refcount.c:28 refcount_warn_saturate+0xd9/0xe0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
tree_put_node+0xf2/0x140 [mlx5_core]
clean_tree+0x4e/0xf0 [mlx5_core]
clean_tree+0x4e/0xf0 [mlx5_core]
clean_tree+0x4e/0xf0 [mlx5_core]
clean_tree+0x5f/0xf0 [mlx5_core]
clean_tree+0x4e/0xf0 [mlx5_core]
clean_tree+0x5f/0xf0 [mlx5_core]
mlx5_cleanup_fs+0x26/0x270 [mlx5_core]
mlx5_unload+0x2e/0xa0 [mlx5_core]
mlx5_unload_one+0x51/0x120 [mlx5_core]
mlx5_devlink_reload_down+0x51/0x90 [mlx5_core]
devlink_reload+0x39/0x120
? devlink_nl_cmd_reload+0x43/0x220
genl_rcv_msg+0x1e4/0x420
? genl_family_rcv_msg_attrs_parse+0x100/0x100
netlink_rcv_skb+0x47/0x110
genl_rcv+0x24/0x40
netlink_unicast+0x217/0x2f0
netlink_sendmsg+0x30f/0x430
sock_sendmsg+0x30/0x40
__sys_sendto+0x10e/0x140
? handle_mm_fault+0xc4/0x1f0
? do_page_fault+0x33f/0x630
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x48/0x130
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 718ce4d601 ("net/mlx5: Consolidate update FTE for all removal changes")
Fixes: bd71b08ec2 ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/natsemi/ns83820.c: In function ns83820_get_link_ksettings:
drivers/net/ethernet/natsemi/ns83820.c:1210:11: warning: variable ‘tanar’ set but not used [-Wunused-but-set-variable]
`tanar` is never used, so remove it.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This failure path should return a negative error code but it currently
returns success.
Fixes: 51b35a454e ("sfc: skeleton EF100 PF driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After running Sparse checker on the driver using
make C=1 M=drivers/net/ethernet/amazon/ena
the only error that is thrown is:
sparse: sparse: Using plain integer as NULL pointer
about the line
struct ena_calc_queue_size_ctx calc_queue_ctx = { 0 };
This patch fixes this warning, thus making our driver free (for now) of
Sparse errors/warnings.
To make a more complete work, this patch also fixes all static warnings
that were found using an internal static checker.
Signed-off-by: Ido Segev <idose@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The formal name changes to "ENA_ADMIN_RSS_INDIRECTION_TABLE_CONFIG".
Indirection is the ability to reference "something" using "something else"
instead of the value itself.
Indirection table, as the name implies, is the ability to reference
CPU/Queue value using hash-to-CPU table instead of CPU/Queue itself.
This patch renames the variable keys_num, which describes the number of
words in the RSS hash key, to key_parts which makes its purpose clearer
in RSS context.
Signed-off-by: Amit Bernstein <amitbern@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The placement policy is printed in the process of queue creation in
ena_up(). No need to print it in ena_probe().
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Capitalize all log strings printed by the ena driver to make their
format uniform across it.
Also fix indentation, spelling mistakes and comments to improve code
readability. This also includes adding comments to macros/enums whose
purpose might be difficult to understand.
Separate some code into functions to make it easier to understand the
purpose of these lines.
Signed-off-by: Amit Bernstein <amitbern@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make log prints in ena_netdev use the same log functions as the rest of
the driver.
For the sake of consistency, all prints in ena_netdev file were
converted into netif_* format except where netdev struct isn't yet
defined. For these places, dev_* log functions are used (similar to
the patch for ena_com files).
This commit leaves some corner cases which would be changed in a
future patch.
Signed-off-by: Amit Bernstein <amitbern@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All ena files should now use SPDX format in their license string. This
doesn't change the license of the files, but rather states the same
license in fewer words.
Also update the license years in some of the files.
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify the return expression.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify the return expression.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sg_init_table zeroes its first argument, so the allocation of that argument
doesn't have to.
the semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x,n,flags;
@@
x =
- kcalloc
+ kmalloc_array
(n,sizeof(struct scatterlist),flags)
...
sg_init_table(x,n)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The wrong flag value caused the firmware call to return actual port
counters instead of the counter masks. This messed up the counter
overflow logic and caused erratic extended port counters to be
displayed under ethtool -S.
Fixes: 531d1d269c ("bnxt_en: Retrieve hardware masks for port counters.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix it to set the required fid input parameter. The firmware call
fails without this patch.
Fixes: d752d0536c ("bnxt_en: Retrieve hardware counter masks from firmware if available.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Debug firmware commands are not supported on VFs to read registers.
This patch avoids logging unnecessary access_denied error on VFs
when user calls ETHTOOL_GREGS.
By returning error in get_regs_len() method on the VF, the get_regs()
method will not be called.
Fixes: b5d600b027 ("bnxt_en: Add support for 'ethtool -d'")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All changes related to bp->link_info require the protection of the
link_lock mutex. It's not sufficient to rely just on RTNL.
Fixes: 163e9ef636 ("bnxt_en: Fix race when modifying pause settings.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Returning "unknown" as a temperature value violates the hwmon interface
rules. Appropriate error codes should be returned via device_attribute
show instead. These will ultimately be propagated to the user via the
file system interface.
In addition to the corrected error handling, it is an even better idea to
not present the sensor in sysfs at all if it is known that the read will
definitely fail. Given that temp1_input is currently the only sensor
reported, ensure no hwmon registration if TEMP_MONITOR_QUERY is not
supported or if it will fail due to access permissions. Something smarter
may be needed if and when other sensors are added.
Fixes: 12cce90b93 ("bnxt_en: fix HWRM error when querying VF temperature")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using strlcpy() to copy from VPD is not correct because VPD strings
are not necessarily NULL terminated. Use memcpy() to copy the VPD
length up to the destination buffer size - 1. The destination is
zeroed memory so it will always be NULL terminated.
Fixes: a0d0fd70fe ("bnxt_en: Read partno and serialno of the board from VPD")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/marvell/mvneta.c:754:6: warning:
variable 'dummy' set but not used [-Wunused-but-set-variable]
754 | u32 dummy;
| ^~~~~
This variable is not used in function mvneta_mib_counters_clear(), so
remove it to avoid build warning.
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid copying skb_shared_info frags array in mvneta_swbm_build_skb() since
__build_skb_around() does not overwrite it
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recycle the received page into the page_pool cache if the dma descriptors
arrived in a wrong order
Fixes: ca0e014609 ("net: mvneta: move skb build after descriptors processing")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This addresses the following coccinelle warning:
drivers/net/ethernet/ti/cpsw.c:1599:2-17: WARNING: Assignment of 0/1 to
bool variable
drivers/net/ethernet/ti/cpsw.c:1300:2-17: WARNING: Assignment of 0/1 to
bool variable
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This addresses the following coccinelle warning:
drivers/net/ethernet/realtek/8139too.c:981:2-8: WARNING: Assignment of
0/1 to bool variable
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This addresses the following coccinelle warning:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:15415:1-26: WARNING:
Assignment of 0/1 to bool variable
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:12393:2-17: WARNING:
Assignment of 0/1 to bool variable
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:15497:2-27: WARNING:
Assignment of 0/1 to bool variable
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This addresses the following coccinelle warning:
drivers/net/ethernet/qlogic/qed/qed_rdma.c:1465:2-13: WARNING:
Assignment of 0/1 to bool variable
drivers/net/ethernet/qlogic/qed/qed_rdma.c:1468:2-14: WARNING:
Assignment of 0/1 to bool variable
drivers/net/ethernet/qlogic/qed/qed_rdma.c:1471:2-13: WARNING:
Assignment of 0/1 to bool variable
drivers/net/ethernet/qlogic/qed/qed_rdma.c:1472:2-14: WARNING:
Assignment of 0/1 to bool variable
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This addresses the following coccinelle warning:
drivers/net/ethernet/broadcom/b44.c:2213:6-20: WARNING: Assignment of
0/1 to bool variable
drivers/net/ethernet/broadcom/b44.c:2218:2-16: WARNING: Assignment of
0/1 to bool variable
drivers/net/ethernet/broadcom/b44.c:2226:3-17: WARNING: Assignment of
0/1 to bool variable
drivers/net/ethernet/broadcom/b44.c:2230:3-17: WARNING: Assignment of
0/1 to bool variable
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/micrel/ksz884x.c: In function rx_proc:
drivers/net/ethernet/micrel/ksz884x.c:4981:6: warning: variable ‘rx_status’ set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/micrel/ksz884x.c: In function netdev_get_ethtool_stats:
drivers/net/ethernet/micrel/ksz884x.c:6512:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable]
these variable is never used, so remove it.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/intel/e1000/e1000_hw.c: In function e1000_phy_init_script:
drivers/net/ethernet/intel/e1000/e1000_hw.c:132:6: warning: variable ‘ret_val’ set but not used [-Wunused-but-set-variable]
`ret_val` is never used, so remove it.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/cavium/liquidio/octeon_device.c: In function lio_pci_readq:
drivers/net/ethernet/cavium/liquidio/octeon_device.c:1327:6: warning: variable ‘val32’ set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/cavium/liquidio/octeon_device.c: In function lio_pci_writeq:
drivers/net/ethernet/cavium/liquidio/octeon_device.c:1358:6: warning: variable ‘val32’ set but not used [-Wunused-but-set-variable]
these variable is never used, so remove it.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the region to be snapshotted to the function performing the
snapshot. This allows one function to operate on numerous regions.
v4:
Add missing kerneldoc for ICE
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is a good measure to ensure correctness if the structures that are
meant to remain constant are only processed by functions that thake
constant arguments.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the header file containing a function's prototype isn't included by
the sourcefile containing the associated function, the build system
complains of missing prototypes.
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c:453:6: warning: no previous prototype for ‘hclge_dcb_ops_set’ [-Wmissing-prototypes]
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the header file containing a function's prototype isn't included by
the sourcefile containing the associated function, the build system
complains of missing prototypes.
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/cavium/liquidio/cn68xx_device.c:124:5: warning: no previous prototype for ‘lio_setup_cn68xx_octeon_device’ [-Wmissing-prototypes]
drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c:159:1: warning: no previous prototype for ‘octeon_pci_read_core_mem’ [-Wmissing-prototypes]
drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c:168:1: warning: no previous prototype for ‘octeon_pci_write_core_mem’ [-Wmissing-prototypes]
drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c:176:5: warning: no previous prototype for ‘octeon_read_device_mem64’ [-Wmissing-prototypes]
drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c:185:5: warning: no previous prototype for ‘octeon_read_device_mem32’ [-Wmissing-prototypes]
drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c:194:6: warning: no previous prototype for ‘octeon_write_device_mem32’ [-Wmissing-prototypes]
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix below sparse warning in dpmac.c.
warning: cast to restricted __le64
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make a distinction between different irqs by netdev name or pci name.
Signed-off-by: Luo bin <luobin9@huawei.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/cortina/gemini.c: In function gmac_get_ringparam:
drivers/net/ethernet/cortina/gemini.c:2125:21: warning: variable ‘config0’ set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/cortina/gemini.c: In function gmac_init:
drivers/net/ethernet/cortina/gemini.c:512:6: warning: variable ‘val’ set but not used [-Wunused-but-set-variable]
these variable is never used, so remove it.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to add PTP sync packet one-step timestamping support.
Before egress, one-step timestamping enablement needs,
- Enabling timestamp and FAS (Frame Annotation Status) in
dpni buffer layout.
- Write timestamp to frame annotation and set PTP bit in
FAS to mark as one-step timestamping event.
- Enabling one-step timestamping by dpni_set_single_step_cfg()
API, with offset provided to insert correction time on frame.
The offset must respect all MAC headers, VLAN tags and other
protocol headers accordingly. The correction field update can
consider delays up to one second. So PTP frame needs to be
filtered and parsed, and written timestamp into Sync frame
originTimestamp field.
The operation of API dpni_set_single_step_cfg() has to be done
when no one-step timestamping frames are in flight. So we have
to make sure the last one-step timestamping frame has already
been transmitted on hardware before starting to send the current
one. The resolution is,
- Utilize skb->cb[0] to mark timestamping request per packet.
If it is one-step timestamping PTP sync packet, queue to skb queue.
If not, transmit immediately.
- Schedule a work to transmit skbs in skb queue.
- mutex lock is used to ensure the last one-step timestamping packet
has already been transmitted on hardware through TX confirmation queue
before transmitting current packet.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a preparation for next hardware one-step timestamping
support. For DPAA2, the one step timestamping configuration on
hardware registers has to be done when there is no one-step timestamping
packet in flight. So we will have to use workqueue and skb queue
for such packets transmitting, to make sure waiting the last packet has
already been sent on hardware, and starting to transmit the current one.
So the tx timestamping flag in private data may not reflect the actual
request for the one-step timestamping packets of skb queue. This also
affects skb headroom allocation. Let's use skb->cb[0] to mark the
timestamping request for each skb.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Invoke dpaa2_eth_enable_tx_tstamp() once in code after building FD,
rather than calling it in dpaa2_eth_build_single_fd(),
dpaa2_eth_build_sg_fd_single_buf(), and dpaa2_eth_build_sg_fd().
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define a global ptp_qoriq structure pointer, and export to use.
The ptp clock operations will be used in dpaa2-eth driver.
For example, supporting one step timestamping needs to write
current time to hardware frame annotation before sending and
then hardware inserts the delay time on frame during sending.
So in driver, at least clock gettime operation will be needed
to make sure right time is written to hardware frame annotation
for one step timestamping.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to add APIs of 1588 single step timestamping.
- dpni_set_single_step_cfg
- dpni_get_single_step_cfg
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call netif_tx_disable firstly before starting doing self-test to
avoid sending packet from networking core and self-test packet
simultaneously which may cause self-test failure or hw abnormal.
Fixes: 4aa218a4fe ("hinic: add self test support")
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 550f4d46af.
adapter->from_passive_init may be changed in ibmvnic_handle_crq
while ibmvnic_reset_init is waiting for the completion of
adapter->init_done.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for firmware update through the devlink interface.
This update copies the firmware object into the device, asks
the current firmware to install it, then asks the firmware to
select the new firmware for the next boot-up.
The install and select steps are launched as asynchronous
requests, which are then followed up with status request
commands. These status request commands will be answered with
an EAGAIN return value and will try again until the request
has completed or reached the timeout specified.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the rest of the firmware api bits needed to support the
driver running a firmware update.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently mscc_ocelot_init_ports() will skip initializing a port when it
doesn't have a phy-handle, so the ocelot->ports[port] pointer will be
NULL. Take this into consideration when tearing down the driver, and add
a new function ocelot_deinit_port() to the switch library, mirror of
ocelot_init_port(), which needs to be called by the driver for all ports
it has initialized.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver was not unregistering its network interfaces on unbind.
Now it is.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mscc_ocelot_probe() is already pretty large and hard to follow. So move
the code for parsing ports in a separate function.
This makes it easier for the next patch to just call
mscc_ocelot_release_ports from the error path of mscc_ocelot_init_ports.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ocelot_init() allocates memory, resets the switch and polls for a status
register, things which can fail. Stop probing the driver in that case,
and propagate the error result.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not proceed probing if we couldn't allocate memory for the ports
array, just error out.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ocelot_port->ts_id is used to:
(a) populate skb->cb[0] for matching the TX timestamp in the PTP IRQ
with an skb.
(b) populate the REW_OP from the injection header of the ongoing skb.
Only then is ocelot_port->ts_id incremented.
This is a problem because, at least theoretically, another timestampable
skb might use the same ocelot_port->ts_id before that is incremented.
Normally all transmit calls are serialized by the netdev transmit
spinlock, but in this case, ocelot_port_add_txtstamp_skb() is also
called by DSA, which has started declaring the NETIF_F_LLTX feature
since commit 2b86cb8299 ("net: dsa: declare lockless TX feature for
slave ports"). So the logic of using and incrementing the timestamp id
should be atomic per port.
The solution is to use the global ocelot_port->ts_id only while
protected by the associated ocelot_port->ts_id_lock. That's where we
populate skb->cb[0]. Note that for ocelot, ocelot_port_add_txtstamp_skb
is called for the actual skb, but for felix, it is called for the skb's
clone. That is something which will also be changed in the future.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TX-timestampable skb is added late to the ocelot_port->tx_skbs. It
is in a race with the TX timestamp IRQ, which checks that queue trying
to match the timestamp with the skb by the ts_id. The skb should be
added to the queue before the IRQ can fire.
Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky says:
====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.
* branch 'mlx5_active_speed':
RDMA: Fix link active_speed size
RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
net/mlx5: Refactor query port speed functions
struct ethtool_fecparam carries bitmasks not bit numbers.
We want to return 1 (NONE), not 0.
Fixes: 0d08709383 ("nfp: implement ethtool FEC mode settings")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rx_request_irq(), it will just return what irq_set_affinity_hint()
returns. If it is failed, the napi and irq requested are not freed
properly. So add exits for failures to handle these.
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two interfaces to configure ETS: qdiscs and DCB. Historically,
DCB ETS configuration was projected to ingress as well, and configured port
buffers. Qdisc was not.
Keep qdiscs behaving this way, and if an offloaded qdisc is configured on a
port, move this port's headroom to a manual mode, thus allowing
configuration of port buffers through dcbnl_setbuffer.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add dcbnl_setbuffer, which bounces requests if a headroom is in DCB mode.
Implement dcbnl_getbuffer such that it can always be used to determine
port-buffer configuration, regardless of headroom mode.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two interfaces to configure ETS: qdiscs and DCB. Historically,
DCB ETS configuration was projected to ingress as well, and configured port
buffers. Qdisc was not.
So as not to break clients that today use DCB ETS and PFC and rely on
getting a reasonable ingress buffer priomap, keep the ETS mirroring in
effect.
Since qdiscs have not done this mirroring historically, it is reasonable
not to introduce it, but rather permit manual ingress configuration through
dcbnl_setbuffer only in the qdisc mode.
This will require a toggle to indicate whether buffer sizes should be
autocomputed or taken from dcbnl_setbuffer, and likewise for priomaps.
Introduce such and initialize it, and guard port buffer size configuration
as appropriate. The toggle is currently left in the DCB position. In a
following patch, qdisc code will switch it.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool API provides support for the configuration of the following
features: speed and duplex, auto-negotiation, MDI-x, forward error
correction, port media type. The API also provides information about the
port status, hardware and software statistic. The following limitation
exists:
- port media type should be configured before speed setting
- ethtool -m option is not supported
- ethtool -p option is not supported
- ethtool -r option is supported for RJ45 port only
- the following combination of parameters is not supported:
ethtool -s sw1pX port XX autoneg on
- forward error correction feature is supported only on SFP ports, 10G
speed
- auto-negotiation and MDI-x features are not supported on
Copper-to-Fiber SFP module
Co-developed-by: Andrii Savka <andrii.savka@plvision.eu>
Signed-off-by: Andrii Savka <andrii.savka@plvision.eu>
Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add very basic support for devlink interface:
- driver name
- fw version
- devlink ports
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add PCI interface driver for Prestera Switch ASICs family devices, which
provides:
- Firmware loading mechanism
- Requests & events handling to/from the firmware
- Access to the firmware on the bus level
The firmware has to be loaded each time the device is reset. The driver
is loading it from:
/lib/firmware/mrvl/prestera/mvsw_prestera_fw-v{MAJOR}.{MINOR}.img
The full firmware image version is located within the internal header
and consists of 3 numbers - MAJOR.MINOR.PATCH. Additionally, driver has
hard-coded minimum supported firmware version which it can work with:
MAJOR - reflects the support on ABI level between driver and loaded
firmware, this number should be the same for driver and loaded
firmware.
MINOR - this is the minimum supported version between driver and the
firmware.
PATCH - indicates only fixes, firmware ABI is not changed.
Firmware image file name contains only MAJOR and MINOR numbers to make
driver be compatible with any PATCH version.
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marvell Prestera 98DX326x integrates up to 24 ports of 1GbE with 8
ports of 10GbE uplinks or 2 ports of 40Gbps stacking for a largely
wireless SMB deployment.
The current implementation supports only boards designed for the Marvell
Switchdev solution and requires special firmware.
The core Prestera switching logic is implemented in prestera_main.c,
there is an intermediate hw layer between core logic and firmware. It is
implemented in prestera_hw.c, the purpose of it is to encapsulate hw
related logic, in future there is a plan to support more devices with
different HW related configurations.
This patch contains only basic switch initialization and RX/TX support
over SDMA mechanism.
Currently supported devices have DMA access range <= 32bit and require
ZONE_DMA to be enabled, for such cases SDMA driver checks if the skb
allocated in proper range supported by the Prestera device.
Also meanwhile there is no TX interrupt support in current firmware
version so recycling work is scheduled on each xmit.
Port's mac address is generated from the switch base mac which may be
provided via device-tree (static one or as nvme cell), or randomly
generated. This is required by the firmware.
Co-developed-by: Andrii Savka <andrii.savka@plvision.eu>
Signed-off-by: Andrii Savka <andrii.savka@plvision.eu>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Co-developed-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu>
Signed-off-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu>
Co-developed-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu>
Co-developed-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the memory leak in mps during module unload
path by freeing mps reference entries if the list
adpter->mps_ref is not already empty
Fixes: 28b3870578 ("cxgb4: Re-work the logic for mps refcounting")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use napi_consume_skb() to batch consuming skb when cleaning
tx desc in NAPI polling.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
writel() can be used to order I/O vs memory by default when
writing portable drivers. Use writel() to replace wmb() +
writel_relaxed(), and writel() is dma_wmb() + writel_relaxed()
for ARM64, so there is an optimization here because dma_wmb()
is a lighter barrier than wmb().
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently HNS3_RING_RX_RING_FBDNUM_REG register is read to determine
how many rx desc can be cleaned. To avoid the register read operation
in the critical data path, use the valid bit in the rx desc to determine
if a specific rx desc can be cleaned.
The hns3 driver clear valid bit in the rx desc before notifying the
rx desc to the hw, and hw will only set the valid bit of the rx desc
after corresponding buffer is filled with packet data and other field
in the rx desc is set accordingly.
Add hns3_rx_ring_move_fw() function to clear the valid bit in the rx
desc before moving rx ring's next_to_clean forward to avoid double
cleaning a rx desc, also add a dma_rmb() barrier in hns3_handle_rx_bd()
to make sure valid bit is set before reading other field in the rx desc.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently HNS3_RING_TX_RING_HEAD_REG register is read to determine
how many tx desc can be cleaned. To avoid the register read operation
in the critical data path, use the valid bit in the tx desc to determine
if a specific tx desc can be cleaned.
The hns3 driver sets valid bit in the tx desc before ringing a doorbell
to the hw, and hw will only clear the valid bit of the tx desc after
corresponding packet is sent out to the wire. And because next_to_use
for tx ring is a changing variable when the driver is filling the tx
desc, so reuse the pull_len for rx ring to record the tx desc that has
notified to the hw, so that hns3_nic_reclaim_desc() can decide how many
tx desc's valid bit need checking when reclaiming tx desc.
And io_err_cnt stat is also removed for it is not used anymore.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use netdev_xmit_more() to defer the tx doorbell operation when
the skb is passed to the driver continuously. By doing this we
can improve the overall xmit performance by avoid some doorbell
operations.
Also, the tx_err_cnt stat is not used, so rename it to tx_more
stat.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Batch the page reference count updates instead of doing them
one at a time. By doing this we can improve the overall receive
performance by avoid some atomic increment operations when the
rx page is reused.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Combine two same enums to avoid duplication.
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
The functions mlx5_query_port_link_width_oper and
mlx5_query_port_ib_proto_oper are always called together, so combine them
to a new function called mlx5_query_port_oper to avoid duplication.
And while the mlx5i_get_port_settings is the same as
mlx5_query_port_oper therefore let's remove it.
According to the IB spec link_width_oper and ib_proto_oper should be u16
and not as written u8, so perform casting as a preparation to cross-RDMA
patch which will fix that type for all drivers in the RDMA subsystem.
Fixes: ada68c31ba ("net/mlx5: Introduce a new header file for physical port functions")
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the dim library to manage dynamic interrupt
moderation in ionic.
v3: rebase
v2: untangled declarations in ionic_dim_work()
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add support to --show-ring & --set-ring Ethtool functions:
- Adding min, max, power of two check to new ring parameter's value.
- Bring down the network interface before changing the value of ring
parameters.
- Bring up the network interface after changing the value of ring
parameters.
Signed-off-by: Song, Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Traffic mirroring modes that are in-chip implemented on egress need an
internal buffer to work. As the only client, the SPAN module was managing
the buffer so far. However logically it belongs to the buffers module. E.g.
buffer size validation needs to take the size of the internal buffer into
account.
Therefore move the related code from SPAN to spectrum_buffers. Move over
the callbacks that determine the minimum buffer size as a function of
maximum speed and MTU. Add a field describing the internal buffer to struct
mlxsw_sp_hdroom. Extend mlxsw_sp_hdroom_bufs_reset_sizes() to take care of
sizing the internal buffer as well. Change the SPAN module to invoke that
function and mlxsw_sp_hdroom_configure() like all the other hdroom clients.
Drop the now-unnecessary mlxsw_sp_span_port_buffer_disable().
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The size of the internal buffer is currently calculated in the SPAN module.
Logically it belongs to the spectrum_buffers module, where it should be
moved. However, that being a chip-specific operation, it needs dynamic
dispatch. There currently is a chip-specific structure for description of
shared buffer values, struct mlxsw_sp_sb_vals. However placing ops into
this structure would be confusing. Therefore introduce a new per-chip
structure, currently empty, and initialize the ops pointer as appropriate.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently mlxsw_sp_port_headroom_init() configures both priomap and buffers
by hand. Additionally, for port buffers, it configures buffer 0 with a size
that it will never again have if PFC configuration is touched.
Rewrite the init code to become a client of the new hdroom code. The only
difference in invocation is that the configuration is forced, so that it is
issued even if the desired configuration happens to match what is contained
in (hitherto not initialized with meaningful values) mlxsw_sp_port->hdroom.
Since now mlxsw_sp_port_headroom_init() initializes all the PG buffers to
meaningful values, mlxsw_sp_hdroom_configure_buffers() can avoid querying
the current configuration, and can fill the whole PBMC itself.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function is now only used from the buffers module, and is a trivial
field reference. Just inline it and drop the related artifacts.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move all the headroom code to the spectrum_buffers module, where it
belongs.
Rename mlxsw_sp_pg_buf_threshold_get() and mlxsw_sp_pg_buf_pack() to
..._hdroom_... to match the naming convention of the new headroom code.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ETS handler performs the headroom configuration in three steps: first
it resizes the buffers and adds any new ones. Then it redirects priorities
to the new buffers. And finally it sets the size of the now-unused buffers
to zero. This way no packet drops are introduced.
This sort of careful approach will also be useful for configuring port
buffer sizes and priority map by hand, through dcbnl_setbuffer. Therefore
move the code from the DCB handler to the generic headroom function.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new hdroom code has certain conventions: iteration over priorities is
done through a variable named `prio', configuration is not pushed unless it
is dirty, but a `force' flag can be used to override this, updated
configuration is written to port. Convert the function
mlxsw_sp_port_pg_prio_map() to use these conventions and rename
appropriately to fit in.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ETS handler performs the headroom configuration in three steps: first
it resizes the buffers and adds any new ones. Then it redirects priorities
to the new buffers. And finally it sets the size of the now-unused buffers
to zero. This way no packet drops are introduced.
Both of the buffer size configuration operations are simply buffer size
configurations, there is no material difference between setting buffers to
zero and any other value. Therefore simply invoke the same
mlxsw_sp_hdroom_configure(), and drop mlxsw_sp_port_pg_destroy() and
mlxsw_sp_ets_has_pg() which are now unused.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split mlxsw_sp_port_headroom_set() to three functions.
mlxsw_sp_hdroom_bufs_reset_sizes() changes the sizes of the individual PG
buffers, and mlxsw_sp_hdroom_configure_buffers() will actually apply the
configuration. A third function, mlxsw_sp_hdroom_bufs_fit(), verifies that
the requested buffer configuration matches total headroom size
requirements.
Add wrappers, mlxsw_sp_hdroom_configure() and __..., that will eventually
perform full headroom configuration, but for now, only have them verify the
configured headroom size, and invoke mlxsw_sp_hdroom_configure_buffers().
Have them take the `force` argument to prepare for a later patch, even
though it is currently unused.
Note that the loop in mlxsw_sp_hdroom_configure_buffers() only goes through
DCBX_MAX_BUFFERS. Since there is no logic to configure the control buffer,
it needs to keep the values queried from the FW. Eventually this function
should configure all the PGs.
Note that conversion of __mlxsw_sp_dcbnl_ieee_setets() is not trivial. That
function performs the headroom configuration in three steps: first it
resizes the buffers and adds any new ones. Then it redirects priorities to
the new buffers. And finally it sets the size of the now-unused buffers to
zero. This way no packet drops are introduced.
So after invoking mlxsw_sp_hdroom_bufs_reset_sizes(), tweak the
configuration to keep the old sizes of PG buffers for those buffers whose
size was set to zero.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, port buffers were always autoconfigured. When dcbnl_setbuffer
callback is implemented, it will allow the user to change the buffer size
configuration by hand. The sizes therefore need to be a configuration
parameter, not always deduced, and therefore belong to struct
mlxsw_sp_hdroom, where the configuration routine should take them from.
Update mlxsw_sp_port_headroom_set() to update these sizes. Have the
function update the sizes even for the case that a given buffer is not
used.
Additionally, change the loop iteration end to DCBX_MAX_BUFFERS instead of
IEEE_8021QAZ_MAX_TCS. The value is the same, but the semantics differ.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Client-side configuration has lossiness as an attribute of a priority.
Therefore add a "lossy" attribute to struct mlxsw_sp_hdroom_prio.
To a Spectrum ASIC, lossiness is a feature of a port buffer. Therefore add
struct mlxsw_sp_hdroom_buf, which in the following patches will get more
attributes, but right now only use it to track port buffer lossiness.
Instead of passing around the primary indicators of PFC and pause_en, add a
function mlxsw_sp_hdroom_bufs_reset_lossiness() to compute the buffer
lossiness from the priority map and priority lossiness. Change
mlxsw_sp_port_headroom_set() to take the buffer lossy flag from the
headroom configuration. Have the PFC and pause handlers configure priority
lossiness in mlxsw_sp_hdroom, from where it will propagate.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mapping from priorities to buffers determines which buffers should be
configured. Lossiness of these priorities combined with the mapping
determines whether a given buffer should be lossy.
Currently this configuration is stored implicitly in DCB ETS, PFC and
ethtool PAUSE configuration. Keeping it together with the rest of the
headroom configuration and deriving it as needed from PFC / ETS / PAUSE
will make things clearer. To that end, add a field "prios" to struct
mlxsw_sp_hdroom.
Previously, __mlxsw_sp_port_headroom_set() took prio_tc as an argument, and
assumed that the same mapping as we use on the egress should be used on
ingress as well. Instead, track this configuration at each priority, so
that it can be adjusted flexibly.
In the following patches, as dcbnl_setbuffer is implemented, it will need
to store its own mapping, and it will also be sometimes necessary to revert
back to the original ETS mapping. Therefore track two buffer indices: the
one for chip configuration (buf_idx), and the source one (ets_buf_idx).
Introduce a function to configure the chip-level buffer index, and for now
have it simply copy the ETS mapping over to the chip mapping.
Update the ETS handler to project prio_tc to the ets_buf_idx and invoke the
buf_idx recomputation.
Now that there is a canonical place to look for this configuration,
mlxsw_sp_port_headroom_set() does not need to invent def_prio_tc to use if
DCB is compiled out.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MTU influences sizes of auto-allocated buffers. Make it a part of port
buffer configuration and have __mlxsw_sp_port_headroom_set() take it from
there, instead of as an argument.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a priority is marked as lossless using DCB PFC, or when pause frames
are enabled on a port, mlxsw adds to port buffers an extra space to cover
the traffic that will arrive between the time that a pause or PFC frame is
emitted, and the time traffic actually stops. This is called the delay. The
concept is the same in PFC and pause, however the way the extra buffer
space is calculated differs.
In this patch, unify this handling. Delay is to be measured in bytes of
extra space, and will not include MTU. PFC handler sets the delay directly
from the parameter it gets through the DCB interface.
To convert pause handler, move MLXSW_SP_PAUSE_DELAY to ethtool module,
convert to bytes, and reduce it by maximum MTU, and divide by two. Then it
has the same meaning as the delay_bytes set by the PFC handler.
Keep the delay_bytes value in struct mlxsw_sp_hdroom introduced in the
previous patch. Change PFC and pause handlers to store the new delay value
there and have __mlxsw_sp_port_headroom_set() take it from there.
Instead of mlxsw_sp_pfc_delay_get() and mlxsw_sp_pg_buf_delay_get(),
introduce mlxsw_sp_hdroom_buf_delay_get() to calculate the delay provision.
Drop the unnecessary MLXSW_SP_CELL_FACTOR, and instead add an explanatory
comment describing the formula used.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The port headroom handling is currently strewn across several modules and
tricky to follow: MTU, DCB PFC, DCB ETS and ethtool pause all influence the
settings, and then there is the completely separate initial configuraion in
spectrum_buffers. A following patch will implement the dcbnl_setbuffer
callback, which is going to further complicate the landscape.
In order to simplify work with port buffers, the following patches are
going to centralize all port-buffer handling in spectrum_buffers. As a
first step, introduce a (currently empty) struct mlxsw_sp_hdroom that will
keep the configuration parameters, and allocate and free it in appropriate
places.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Various updates to mlx5 driver,
1) Eli adds support for TC trap action.
2) Eran, minor improvements to clock.c code structure
3) Better handling of error reporting in LAG from Jianbo
4) IPv6 traffic class (DSCP) header rewrite support from Maor
5) Ofer Levi adds support for CQE compression of multi-strides packets
6) Vu, Enables use of vport meta data by default.
7) Some minor code cleanup
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl9hDykACgkQSD+KveBX
+j6qAQgAn4HWJp7Bu7S7okRbv1bg+uj7mQgU1oEU7P1xzpx2gfZcD0ejjwoxGV/8
iK/FC2KQeuBKqIkLPnQC1o4CH8Fk9kr2HuhmX46Gkn07ohyObf6w8fFVrGv/5QrB
fWUWhu+TQJNA/qnMlCfQ5t5Jt+XYL0m7VdfhCHE3R5rmpcZ2PHhxmvoG/NlBLUUK
kjggjtjX6Vv1CRit0w08FJwsJbqHy3wqpciX4Xc+wZp9A+D5VAyVtXP6ngaDIsAA
RcUzGyH8x4gphnplySkvj/LXboaqiMtd8sPeXCOax2HlYarFAAnNG//7fwhfYIHe
c/509buvfjSFsIwQYRem7d/abkU5Rw==
=4r5e
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2020-09-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2020-09-15
Various updates to mlx5 driver,
1) Eli adds support for TC trap action.
2) Eran, minor improvements to clock.c code structure
3) Better handling of error reporting in LAG from Jianbo
4) IPv6 traffic class (DSCP) header rewrite support from Maor
5) Ofer Levi adds support for CQE compression of multi-strides packets
6) Vu, Enables use of vport meta data by default.
7) Some minor code cleanup
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All files related to this driver contain the following notice:
See LICENSE.qla3xxx for copyright and licensing details.
LICENSE.qla3xxx can be found in
Documentation/networking/device_drivers/qlogic/. The file contains:
- A copyright notice
This copyright notice is redundant as all files contain the same
copyright notice already
- A license notice
You may modify and redistribute the device driver code under the GNU
General Public License (a copy of which is attached hereto as Exhibit
A) published by the Free Software Foundation (version 2 or a later
version).
This can be replaced with the corresponding SPDX license identifier
(GPL-2.0-or-later) in the source files which reference this license
file.
- A license for the device firmware
This license is pointless in the context of the kernel as the firmware
is not distributed as part of the kernel.
LICENSE.qla2xxx contained exactly the same firmware license which was
removed with commit bc3f957c06 ("[SCSI] qla2xxx: Update
LICENSE.qla2xxx.").
The firmware license is there due to the fact that the out of tree
driver tarball which was available from the qlogic website contained
the firmware binary. The firmware license in the qla3xxx license file
got probably forgotten when the other qlogic license files were
updated.
Remove the notices and add the SPDX license identifier GPL-2.0-or-later to
the source files.
Finally remove the now redundant LICENSE.qla3xxx file.
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All files in this driver directory contain the following notice:
See LICENSE.qlcnic for copyright and licensing details.
LICENSE.qlacnic can be found in
Documentation/networking/device_drivers/qlogic/. The file contains:
- A copyright notice
This copyright notice is redundant as all files contain the same
copyright notice already
- A license notice
You may modify and redistribute the device driver code under the
GNU General Public License (a copy of which is attached hereto as
Exhibit A) published by the Free Software Foundation (version 2).
This can be replaced with the corresponding SPDX license identifier
(GPL-2.0-only) in the source files which reference this license
file.
- The full GPLv2 license text
A redundant copy of LICENSES/preferred/GPL-2.0
Remove the notices and add the SPDX license identifier GPL-2.0-only to the
source files.
Finally remove the now redundant LICENSE.qlcnic file.
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To support modifying the used_maps array, we use a mutex to protect
the use of the counter and the array. The mutex is initialized right
after the prog aux is allocated, and destroyed right before prog
aux is freed. This way we guarantee it's initialized for both cBPF
and eBPF.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: YiFei Zhu <zhuyifei1999@gmail.com>
Link: https://lore.kernel.org/bpf/20200915234543.3220146-2-sdf@google.com
As CHELSIO_INLINE_CRYPTO is bool, and CHELSIO_T4 is tristate, the
dependency of CHELSIO_INLINE_CRYPTO on CHELSIO_T4 is not sufficient to
protect CRYPTO_DEV_CHELSIO_TLS and CHELSIO_IPSEC_INLINE. The latter two
are also tristate, hence if CHELSIO_T4=n, they cannot be builtin, as
that would lead to link failures like:
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c:259: undefined reference to `cxgb4_port_viid'
and
drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c:752: undefined reference to `cxgb4_reclaim_completed_tx'
Fix this by re-adding dependencies on CHELSIO_T4 to tristate symbols.
The dependency of CHELSIO_INLINE_CRYPTO on CHELSIO_T4 is kept to avoid
asking the user.
Fixes: 6bd860ac1c ("chelsio/chtls: CHELSIO_INLINE_CRYPTO should depend on CHELSIO_T4")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce devlink health reporter to report FW fatal events. Implement
the event listener using MFDE trap and enable the events to be
propagated using MFGD register configuration.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce MFGD register that is used to configure firmware debugging.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce MFDE register that is passed through MFDE trap in case of
fatal FW event.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the fw flashing code was moved to core.c, move the param which is
related to it there as well. Remove unnecessary parentheses on the way.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract the code calling params register/unregister driver ops into
separate functions. Call publish/unpublish unconditionally.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the firmware flashing is not specific to Spectrum, move the code to
core.c and avoid one op call and 2 exported symbols. Also, this allows
to do flash before call of driver->init function and possibly do other
core calls in between.
Do some small renaming here and there on the way to be consistent with
the rest of core.c code.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Among other changes, this version supports FW monitoring.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation of stmmac_stop_all_queues() and
stmmac_start_all_queues() will not work correctly when the value of
tx_queues_to_use is changed through ethtool -L DEVNAME rx N tx M command.
Also, netif_tx_start|stop_all_queues() are only needed in driver open()
and close() only.
Fixes: c22a3f48 net: stmmac: adding multiple napi mechanism
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netif_set_real_num_tx_queues() & netif_set_real_num_rx_queues() should be
used to inform network stack about the real Tx & Rx queue (active) number
in both stmmac_open() and stmmac_resume(), therefore, we move the code
from stmmac_dvr_probe() to stmmac_hw_setup().
Fixes: c02b7a9145 net: stmmac: use netif_set_real_num_{rx,tx}_queues
Signed-off-by: Aashish Verma <aashishx.verma@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Restructure NAPI add and delete process so that we can call them
accordingly in open() and ethtool_set_channels() accordingly.
Introduced stmmac_reinit_queues() to handle the transition needed
for changing Rx & Tx channels accordingly.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check if the pause stats are reported by HW by checking the bitmap.
Calculation is based on the order of strings in main_strings from
ethtool -S. Hopefully the semantics of these stats match the standard..
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Plumb through all the indirection and copy some code from
ethtool -S. The names of the group indicate that these are
the stats we are after (and Saeed confirms it).
v3:
- fix build in mlx5_rep
v2:
- drop the ethool helper and call stats directly
- don't pass 0 as initialized to in buffer
- use local buffer
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report standard pause frame stats. They are already aggregated
in struct ixgbe_hw_stats.
The combination of the registers is suggested as equivalent to
PAUSEMACCtrlFramesTransmitted / PAUSEMACCtrlFramesReceived
by the Intel 82576EB datasheet, I could not find any information
in the HW actually supported by ixgbe.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These stats are already reported in ethtool -S.
Michael confirms they are equivalent to standard stats.
v2: - fix sparse warning about endian by using the macro
- use u64 for pointer type
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add CQE compression support for completions of packets that span
multiple strides in a Striding RQ, per the HW capability.
In our memory model, we use small strides (256B as of today) for the
non-linear SKB mode. This feature allows CQE compression to work also
for multiple strides packets. In this case decompressing the mini CQE
array will use stride index provided by HW as part of the mini CQE.
Before this feature, compression was possible only for single-strided
packets, i.e. for packets of size up to 256 bytes when in non-linear
mode, and the index was maintained by SW.
This feature is supported for ConnectX-5 and above.
Feature performance test:
This was whitebox-tested, we reduced the PCI speed from 125Gb/s to
62.5Gb/s to overload pci and manipulated mlx5 driver to drop incoming
packets before building the SKB to achieve low cpu utilization.
Outcome is low cpu utilization and bottleneck on pci only.
Test setup:
Server: Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz server, 32 cores
NIC: ConnectX-6 DX.
Sender side generates 300 byte packets at full pci bandwidth.
Receiver side configuration:
Single channel, one cpu processing with one ring allocated. Cpu utilization
is ~20% while pci bandwidth is fully utilized.
For the generated traffic and interface MTU of 4500B (to activate the
non-linear SKB mode), packet rate improvement is about 19% from ~17.6Mpps
to ~21Mpps.
Without this feature, counters show no CQE compression blocks for
this setup, while with the feature, counters show ~20.7Mpps compressed CQEs
in ~500K compression blocks.
Signed-off-by: Ofer Levi <oferle@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add support for rewriting of IPV6 DSCP part of traffic class field.
Next commands, for example, can be used to offload rewrite action:
OVS:
$ ovs-ofctl add-flow ovs-sriov "tcpv6, in_port=REP, \
actions=mod_nw_tos:68, output:NIC"
iproute2:
$ tc filter add dev REP ingress protocol ipv6 prio 1 flower skip_sw \
ip_proto tcp \
action pedit ex munge ip6 traffic_class set 68 retain 0xfc pipe \
action mirred egress redirect dev NIC
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Support tc trap such that packets can explicitly be forwarded to slow
path if they match a specific rule.
In the example below, we want packets with src IP equals 7.7.7.8 to be
forwarded to software, in which case it will get to the appropriate
representor net device.
$ tc filter add dev eth1 protocol ip prio 1 root flower skip_sw \
src_ip 7.7.7.8 action trap
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Multiple features use metadata matching such as bond vport
in live migration, multi-port RoCE mode, stacked devices;
hence, enable vport metadata matching by default.
Fixes: 1e62e222db ("net/mlx5: E-Switch, Use vport metadata matching only when mandatory")
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In merged eswitch configuration, peer miss rule is setup for all
vports. If metadata is enabled, peer miss rule with metadata matching
will be configured instead of source port matching; however, some
vports that have not yet been enabled don't have default_metadata
setup and their default_metadata will be zero.
Hence, setup/cleanup default metadata for all vports when eswitch moves
in/out of offloads mode.
Fixes: 133dcfc577 ("net/mlx5: E-Switch, Alloc and free unique metadata for match")
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Uplink vport must have a dedicated metadata with vhca_id
being part of the metadata.
Fixes: 133dcfc577 ("net/mlx5: E-Switch, Alloc and free unique metadata for match")
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Check E-Switch capabilities and enable metadata support flag
before using it to setup other features that need metadata.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
LAG offload can't be enabled if the enslaved PF is not lag master,
which is indicated by HCA capabilities bit. It is cleared if more than
64 VFs are configured for this PF.
Previously, a data structure is created to store lag info, including
PFs to be enslaved, then a handler is registered for netdev notifier.
However, this initialization is skipped if PF is not lag master. So
PF can't handle CHANGEUPPER event from upper bond device. Even worse,
PF is enslaved silently, and LAG offload is not activated.
Fix this by registering netdev notifier for PFs which are not lag
masters. When CHANGEUPPER event is received, and both physical ports
(and only them) on the same NIC are about to be enslaved, a warning is
returned for user to know it.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If bond tx type is not active-backup or hash, LAG offload can't be
enabled. When CHANGEUPPER event is received, and both PFs (and only
them) under the same lag master are about to be enslaved, a warning
is returned for user to know the offload failure, otherwise PFs are
enslaved silently without LAG offload activated.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Change the return value to -ENOENT, to make it more meaningful.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Before calling timecounter_cyc2time(), clock->lock must be taken.
Use mlx5_timecounter_cyc2time instead which guarantees a safe access.
Fixes: afc98a0b46 ("net/mlx5: Update ptp_clock_event foreach PPS event")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Holding the clock lock is not required when scheduling a PPS work.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Clock struct is part of struct mlx5_core_dev. Code was inconsistent, on
some cases used container_of and on another used clock->mdev.
Align code to use container_of amd remove clock->mdev pointer.
While here, fix reverse xmas tree coding style.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
This isn't a fall through because it was after a return statement. The
fall through annotation leads to a Smatch warning:
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:246
mlx5e_ethtool_get_sset_count() warn: ignoring unreachable code.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Add variable initialization to eliminate the warning
"variable may be used uninitialized".
Fixes: 5f29458b77 ("net/mlx5e: Support dump callback in TX reporter")
Signed-off-by: Moshe Tal <moshet@mellanox.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Clean and rebuild the debugfs info for the queues being swapped.
Fixes: a34e25ab97 ("ionic: change the descriptor ring length without full reset")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The napi_schedule() call will only schedule the NAPI if it is not
already running. To make sure that we do not deactivate interrupts
without scheduling NAPI only deactivate the interrupts in case NAPI also
gets scheduled.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use napi_complete_done() and activate the interrupts when this function
returns true. This way the generic NAPI code can take care of activating
the interrupts.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
netif_tx_napi_add() should be used for NAPI in the TX direction instead
of the netif_napi_add() function.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call to netif_wake_queue() when the TX descriptors were freed was
missing. When there are no TX buffers available the TX queue will be
stopped, but it was not started again when they are available again,
this is fixed in this patch.
Fixes: fe1a56420c ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SBIB register configures the size of an internal buffer that the
Spectrum ASICs use when mirroring traffic on egress. This size should be
taken into account when validating that the port headroom buffers are not
larger than the chip can handle. Up until now this was not done, which is
incidentally not a problem, because the priority group buffers that mlxsw
auto-configures are small enough that the boundary condition could not be
violated.
However when dcbnl_setbuffer is implemented, the user has control over
sizes of PG buffers, and they might overshoot the headroom capacity.
However the size of the SBIB buffer depends on port speed, and that cannot
be vetoed. Therefore SBIB size should be deduced from maximum port speed.
Additionally, once the buffers are configured by hand, the user could get
into an uncomfortable situation where their MTU change requests get vetoed,
because the SBIB does not fit anymore. Therefore derive SBIB size from
maximum permissible MTU as well.
Remove all the code that adjusted the SBIB size whenever speed or MTU
changed.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The maximum port speed depends on link modes supported by the port, and for
Ethernet ports is constant. The maximum speed will be handy when setting
SBIB, the internal buffer used for traffic mirroring. Therefore, keep it in
struct mlxsw_sp_port for easy access.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The maximum port MTU depends on port type. On Spectrum, mlxsw configures
all ports as Ethernet ports, and the maximum MTU therefore never changes.
Besides checking MTU configuration, maximum MTU will also be handy when
setting SBIB, the internal buffer used for traffic mirroring. Therefore,
keep it in struct mlxsw_sp_port for easy access.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SBIB register configures the size of an internal buffer that the
Spectrum ASICs use when mirroring traffic on egress. This size should be
taken into account when validating that the port headroom buffers are not
larger than the chip can handle. Up until now this was not done, which is
incidentally not a problem, because the priority group buffers that mlxsw
auto-configures are small enough that the boundary condition could not be
violated.
When dcbnl_setbuffer is implemented, the user gets control over sizes of PG
buffers, and they might overshoot the headroom capacity. However the size
of the SBIB buffer depends on port speed, which cannot be vetoed. There is
obviously no way to retroactively push back on requests for overlarge PG
buffers, or reject an overlarge MTU, or cancel losslessness of a certain
PG.
Therefore, instead of taking into account the current speed when
calculating SBIB buffer size, take into account the maximum speed that a
port with given Ethernet protocol capabilities can have.
To that end, add a new ethtool callback, ptys_max_speed, which determines
this maximum speed.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to allow reusing the logic, extract from
mlxsw_sp_port_get_link_ksettings() the code to obtain Ethernet protocol
attributes, mlxsw_sp_port_ptys_query().
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2020-09-14
This series contains updates to i40e driver only.
Li RongQing removes binding affinity mask to a fixed CPU and sets
prefetch of Rx buffer page to occur conditionally.
Björn provides AF_XDP performance improvements by not prefetching HW
descriptors, using 16 byte descriptors, and moving buffer allocation
out of Rx processing loop.
v2: Define prefetch_page_address in a common header for patch 2.
Dropped, previous, patch 5 as it is being reworked to be more
generalized.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add NETIF_F_GSO_UDP_TUNNEL and NETIF_F_GSO_UDP_TUNNEL_CSUM features
to support vxlan segmentation and checksum offload. Ipip and ipv6
tunnel packets are regarded as non-tunnel pkt for hw and as for other
type of tunnel pkts, checksum offload is disabled.
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c:661:6: warning:
variable 'val' set but not used [-Wunused-but-set-variable]
661 | u32 val;
| ^~~
After commit 7f9664525f ("qlcnic: 83xx memory map and HW access
routines"), variable 'val' is never used in qlcnic_83xx_cam_unlock(), so
removing it to avoid build warning.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/marvell/pxa168_eth.c:1190:6: warning:
variable 'retval' set but not used [-Wunused-but-set-variable]
1190 | int retval;
| ^~~~~~
Function pxa168_eth_change_mtu() always return zero, so variable 'retval'
is redundant, just remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/freescale/fec_ptp.c:523:6: warning:
variable 'ns' set but not used [-Wunused-but-set-variable]
523 | u64 ns;
| ^~
After commit 6605b730c0 ("FEC: Add time stamping code and a PTP
hardware clock"), variable 'ns' is never used in fec_time_keep(),
so removing it to avoid build warning.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/dnet.c:510:6: warning:
variable 'tx_status' set but not used [-Wunused-but-set-variable]
u32 tx_status, irq_enable;
^~~~~~~~~
After commit 4796417417 ("dnet: Dave DNET ethernet controller driver
(updated)"), variable 'tx_status' is never used in dnet_start_xmit(),
so removing it to avoid build warning.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of checking in each iteration of the Rx packet processing
loop, move the allocation out of the loop and do it once for each napi
activation.
For AF_XDP the rx_drop benchmark was improved by 6%.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The i40e NIC supports two flavors of HW descriptors, 16 and 32
byte. The latter has, obviously, room for more offloading
information. However, the only fields of the 32B HW descriptor that is
being used by the driver, is also available in the 16B descriptor.
In other words; Reading and writing 32 bytes instead of 16 byte is a
waste of bus bandwidth.
This commit starts using 16 byte descriptors instead of 32 byte
descriptors.
For AF_XDP the rx_drop benchmark was improved by 2%.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The software prefetching of HW descriptors has a negative impact on
the performance. Therefore, it is now removed.
Performance for the rx_drop benchmark increased with 2%.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
refcount of rx_buffer page will be added here originally, so prefetchw
is needed, but after commit 1793668c3b ("i40e/i40evf: Update code to
better handle incrementing page count"), and refcount is not added
every time, so change prefetchw as prefetch.
Now it mainly services page_address(), but which accesses struct page
only when WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL is defined otherwise
it returns address based on offset, so we prefetch it conditionally.
Jakub suggested to define prefetch_page_address in a common header.
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After commit 759dc4a7e6 ("i40e: initialize our affinity_mask
based on cpu_possible_mask"), NAPI IRQ affinity_mask is bind to
all possible CPUs, not a fixed CPU
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Richard Cochran points out that SKBTX_IN_PROGRESS should be set when
the skbuff is queued for timestamping. Add this.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'w89c840_open()' is incorrectly reported in a debug message. Use __func__
instead.
While at it, fix some style issue in the same function.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
We found a set but not used variable 'ring_cons' in mlx4_en_xmit(), it will
cause a warning when build the kernel. And after checking the commit record
of this function, we found that it was introduced by a previous patch.
So, We delete this redundant assignment code.
Fixes: 488a9b48e3 ("net/mlx4_en: Wake TX queues only when there's enough room")
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than doing request_irq and then disabling the irq immediately, it
should be safer to use IRQ_NOAUTOEN flag for the irq. It removes any gap
between request_irq() and disable_irq().
Cc: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missed suspend/resume callbacks to properly restore networking after
suspend/resume cycle.
Fixes: ed3525eda4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TI J721E (CPSW9g) ALE version is similar, in general, to Sitara AM3/4/5
CPSW ALE, but has more extended functions and different ALE VLAN entry
format.
This patch adds support for for multi port TI J721E (CPSW9g) ALE variant.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ALE VLAN entries are too much differ between different TI CPSW ALE
versions. So, handling them using flags, defines and get/set functions
became over-complicated.
This patch introduces tables to describe the ALE VLAN entries fields, which
are different between TI CPSW ALE versions, and new get/set access
functions. It also allows to detect incorrect access to not available ALL
entry fields.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The AM65x ALE supports HW auto-ageing which can be enabled by programming
ageing interval in ALE_AGING_TIMER register. For this CPSW fck_clk
frequency has to be know by ALE.
This patch extends cpsw_ale_params with bus_freq field and enables ALE HW
auto ageing for AM65x CPSW2G ALE version.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hence all existing driver updated to use ALE dev_id the usage of ale dev_id
can be made mandatory and cpsw_ale_create() can be updated to use
"features" property from ALE static configuration.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patch has introduced possibility to select CPSW ALE by using
ALE dev_id identifier. Switch TI TI AM65x/J721E CPSW NUSS driver to use
dev_id.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patch has introduced possibility to select CPSW ALE by using
ALE dev_id identifier. Switch TI Keystone 2 NETCP driver to use dev_id and
perform clean up by removing "ale_entries" configuration code.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patch has introduced possibility to select CPSW ALE by using
ALE dev_id identifier. Switch TI cpsw driver to use dev_id="cpsw" and
perform clean up by removing "ale_entries" configuration code.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As existing, as newly introduced CPSW ALE versions have differences in
supported features and ALE table formats. Especially it's actual for the
recent AM65x/J721E/J7200 SoC and feature AM64x, which supports features
like: auto-aging, classifiers, Link aggregation, additional hw filtering,
etc.
Existing ALE configuration interface is not practical in terms of adding
new features and requires consumers to program a lot static parameters. Any
attempt to add new options will case endless adding and maintaining
different combination of flags and options.
Hence CPSW ALE configuration is static and fixed for SoC (or set of SoC) It
is reasonable to add support for static ALE configurations inside ALE
module. This patch adds static ALE configuration table for different ALE
versions and provides option for consumers to select required ALE
configuration by providing ALE const char *dev_id identifier.
This feature is not enabled by default until existing CPSW drivers will be
modified by follow up patches.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add cpsw_ale_get_num_entries() API to return number of ALE table entries
and update existing drivers to use it.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves complete nic tls offload (kTLS) code from crypto
directory to drivers/net/ethernet/chelsio/inline_crypto/ch_ktls
directory. nic TLS is made a separate ULD of cxgb4.
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When calling hinic_close in hinic_set_channels, all queues are
stopped after netif_tx_disable, but some queue may be rewaken in
free_tx_poll by mistake while drv is handling tx irq. If one queue
is rewaken core may call hinic_xmit_frame to send pkt after
netif_tx_disable within a short time which may results in accessing
memory that has been already freed in hinic_close. So we call
napi_disable before netif_tx_disable in hinic_close to fix this bug.
Fixes: 2eed5a8b61 ("hinic: add set_channels ethtool_ops support")
Signed-off-by: Luo bin <luobin9@huawei.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2020-09-09
This series contains updates to i40e and igc drivers.
Stefan Assmann changes num_vlans to u16 to fix may be used uninitialized
error and propagates error in i40_set_vsi_promisc() for i40e.
Vinicius corrects timestamping latency values for i225 devices and
accounts for TX timestamping delay for igc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Necessitates an .ndo_features_check, as the EF10 datapath has several
limitations on what it can handle.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
>From the 8000 series onwards, EF10 NICs with suitable firmware are able
to perform TSO within VXLAN or NVGRE encapsulation.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the tx_queue->handle_tso function pointer, and just use
tx_queue->tso_version to decide which function to call, thus removing
an indirect call from the fast path.
Instead of passing a tso_v2 flag to efx_mcdi_tx_init(), set the desired
tx_queue->tso_version before calling it.
In efx_mcdi_tx_init(), report back failure to obtain a TSOv2 context by
setting tx_queue->tso_version to 0, which will cause the TX path to
use the GSO-based fallback.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Won't actually be exercised until we start advertising the corresponding
offload features.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the MC reports the VXLAN_NVGRE datapath capability, then these queues
can be used for checksum offload of encapsulated packets.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nothing yet creates inner csum TXQs; just change all references to
EFX_TXQ_TYPE_OFFLOAD to the new EFX_TXQ_TYPE_OUTER_CSUM.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make it possible to have an arbitrary mapping from types to labels,
because when we add inner-csum-offload TXQs there will no longer be a
convenient nesting hierarchy of NIC types (EF10 will have inner-csum
TXQs, while Siena will have HIGHPRI).
Correct a misleading comment on efx_hard_start_xmit().
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are never modified, so constify them to allow the compiler to
place them in read-only memory. This moves about 25kB to read-only
memory as seen by the output of the size command.
Before:
text data bss dec hex filename
296203 65464 1248 362915 589a3 drivers/net/ethernet/marvell/octeontx2/af/octeontx2_af.ko
After:
text data bss dec hex filename
321003 40664 1248 362915 589a3 drivers/net/ethernet/marvell/octeontx2/af/octeontx2_af.ko
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The old_channel argument is never used, so remove it.
The function is only called from elsewhere in efx_channels.c, so make
it static and remove the declaration from the header file.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The statement above it already returns, so there is no way to get here.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_init_struct already calls this, we don't need to do it again.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the correct resolution for the conflict from
merging the "net" tree fix:
commit 26cb7085c8 ("enetc: Remove the mdio bus on PF probe bailout")
with the "net-next" new work:
commit 07095c025a ("net: enetc: Use DT protocol information to set up the ports")
that moved mdio bus allocation to an ealier stage of
the PF probing routine.
Fixes: a57066b1a0 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add flow control support. The functionality was tested on AR9331 SoC and
confirmed by iperf3 results and HW counters exported over ethtool.
Following test configurations was used:
iMX6S receiver <--- TL-SG1005D switch <---- AR9331 sender
The switch is supporting symmytric flow control:
Settings for eth0:
Supported ports: [ MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
--->> Link partner advertised pause frame use: Symmetric
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 100Mb/s
Duplex: Full
Auto-negotiation: on
Port: MII
PHYAD: 4
Transceiver: external
Link detected: yes
The iMX6S system was configured to 10Mbit, to let the switch use flow
control:
- ethtool -s eth0 speed 10
With flow control disabled on AR9331:
- ethtool -A eth0 rx off tx off
- iperf3 -u -c 172.17.0.1 -b100M -l1472 -t10
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-10.00 sec 66.2 MBytes 55.5 Mbits/sec 0.000 ms 0/47155 (0%) sender
[ 5] 0.00-10.04 sec 11.5 MBytes 9.57 Mbits/sec 1.309 ms 38986/47146 (83%) receiver
With flow control enabled on AR9331:
- ethtool -A eth0 rx on tx on
- iperf3 -u -c 172.17.0.1 -b100M -l1472 -t10
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-10.00 sec 15.1 MBytes 12.6 Mbits/sec 0.000 ms 0/10727 (0%) sender
[ 5] 0.00-10.05 sec 11.5 MBytes 9.57 Mbits/sec 1.371 ms 2525/10689 (24%) receiver
Similar results are get in opposite direction by introducing extra CPU
load on AR9331:
- chrt 40 dd if=/dev/zero of=/dev/null &
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add basic ethtool support. The functionality was tested on AR9331 SoC.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We found the following warning when using W=1 to build kernel:
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3634:6: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable]
int ret, coe = priv->hw->rx_csum;
When digging stmmac_get_rx_header_len(), dwmac4_get_rx_header_len() and
dwxgmac2_get_rx_header_len() return 0 only, without any error code to
report. Therefore, it's better to define get_rx_header_len() as void.
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change allows the driver to report the device link speed
when the ethtool command:
ethtool <nic name>
is run.
Getting the link speed is done via a new admin queue command:
ReportLinkSpeed.
Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes the driver better aware of the connectivity status of the
device. Based on the device's status register, the driver can call
netif_carrier_{on,off}.
Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Patricio Noyola <patricion@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for batching AQ commands and uses it for creating and
destroying queues.
Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds per queue NIC stats to ethtool stats and to report-stats.
These stats are always exposed to guest whether or not the
report-stats flag is turned on.
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds functionality to report driver stats to Hypervisor.
(Users may want to turn this feature off as a matter of privacy
so a "report-stats" flag is added as an ethtool priv option.
It is also disabled by default.)
The hypervisor would trigger a stats report in case "too many"
packets dropped; the stats would be useful in debugging stuck
queues.
A "stats_report_trigger_cnt" stat is added to count the number of times
the hypervisor attempts to trigger stats report.
A timer is also added so that when report-stats is enabled, stat are
updated once every 20 seconds.
Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Kuo Zhao <kuozhao@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the driver to use dev_info/err instead of netif_info/err.
Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for getting and setting the RX copybreak
value via ethtool.
Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Kuo Zhao <kuozhao@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MTU change on ethtool is currently not supported for iWARP. Notify qedr
driver so that appropriate logging can take place.
Link: https://lore.kernel.org/r/20200902165741.8355-6-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
zbva is always false, so fbo is never read.
A 'zero-based-virtual-address' is simply IOVA == 0, and the driver already
supports this.
Link: https://lore.kernel.org/r/16-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Fixes the following warning when using W=1 to build kernel:
drivers/net/ethernet/smsc/smc91x.c: In function ‘smc_phy_configure’:
drivers/net/ethernet/smsc/smc91x.c:1039:6: warning: variable ‘status’ set but not used [-Wunused-but-set-variable]
int status;
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase Rx ring size to address issue where hardware is reaching
the receive work limit.
Before:
[ 102.223342] de2104x 0000:17:00.0 eth0: rx work limit reached
[ 102.245695] de2104x 0000:17:00.0 eth0: rx work limit reached
[ 102.251387] de2104x 0000:17:00.0 eth0: rx work limit reached
[ 102.267444] de2104x 0000:17:00.0 eth0: rx work limit reached
Signed-off-by: Lucy Yan <lucyyan@google.com>
Reviewed-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c:115: warning: Excess function parameter 'hw_handler' description in 'hinic_aeq_register_hw_cb'
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c:382: warning: Excess function parameter 'size' description in 'api_cmd'
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/hisilicon/hns/hns_enet.c:1841: warning: Excess function parameter 'netdev' description in 'hns_set_multicast_list'
drivers/net/ethernet/hisilicon/hns/hns_enet.c:1841: warning: Excess function parameter 'p' description in 'hns_set_multicast_list'
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c:137: warning: Excess function parameter 'drv' description in 'hns_xgmac_enable'
drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c:497: warning: Excess function parameter 'cmd' description in 'hns_xgmac_get_regs'
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename cdev to hdev.
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/hisilicon/hns/hnae.c:444: warning: Excess function parameter 'cdev' description in 'hnae_ae_unregister'
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/huawei/hinic/hinic_hw_io.c:373: warning: Excess function parameter 'sq_msix_entry' description in 'hinic_io_create_qps'
drivers/net/ethernet/huawei/hinic/hinic_hw_io.c:373: warning: Excess function parameter 'rq_msix_entry' description in 'hinic_io_create_qps'
Rename these wrong names.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the functions mvpp2_isr_handle_xlg() and
mvpp2_isr_handle_gmac_internal(), the bool variable link is assigned a
true value in the case that a given bit of val is set. However, if the
bit is unset, no value is assigned to link and it is then passed to
mvpp2_isr_handle_link() without being initialised. Fix by assigning to
link the value of the bit test.
Build-tested on x86.
Fixes: 36cfd3a6e5 ("net: mvpp2: restructure "link status" interrupt handling")
Signed-off-by: Alex Dewar <alex.dewar90@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:2209: warning: Excess function parameter 'adapter' description in 'clear_sge_ctxt'
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:2975: warning: Excess function parameter 'adapter' description in 't3_set_proto_sram'
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using XDP every ingress packet is passed to an eBPF (xdp) program
which returns an action for this packet.
This patch adds counters for the number of times each such action was
received. It also counts all the invalid actions received from the eBPF
program.
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added statistics for TX queues that are used for XDP TX. The statistics
are the same as the ones printed for regular non-XDP TX queues.
The XDP queue statistics can be queried using
`ethtool -S <ifname>`
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new metrics provide granular visibility along multiple network
dimensions and enable troubleshooting and remediation of issues caused
by instances exceeding network performance allowances.
The new statistics can be queried using ethtool command.
Signed-off-by: Guy Tzalik <gtzalik@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The type of all stat fields is u64, therefore when iterating over stat
fields in a stats struct, it makes sense to use an offset in 64 bit
resolution. Doing so allows us to drop some of the casting that is
currently used when referencing stats.
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Release first buffer as last one since it contains references
to subsequent fragments. This code will be optimized introducing
multi-buffer bit in xdp_buff structure.
Fixes: ca0e014609 ("net: mvneta: move skb build after descriptors processing")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove mvneta_stats from mvneta_swbm_rx_frame signature since now stats
are accounted in mvneta_run_xdp routine
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We allow drivers to call napi_hash_del() before calling
netif_napi_del() to batch RCU grace periods. This makes
the API asymmetric and leaks internal implementation details.
Soon we will want the grace period to protect more than just
the NAPI hash table.
Restructure the API and have drivers call a new function -
__netif_napi_del() if they want to take care of RCU waits.
Note that only core was checking the return status from
napi_hash_del() so the new helper does not report if the
NAPI was actually deleted.
Some notes on driver oddness:
- veth observed the grace period before calling netif_napi_del()
but that should not matter
- myri10ge observed normal RCU flavor
- bnx2x and enic did not actually observe the grace period
(unless they did so implicitly)
- virtio_net and enic only unhashed Rx NAPIs
The last two points seem to indicate that the calls to
napi_hash_del() were a left over rather than an optimization.
Regardless, it's easy enough to correct them.
This patch may introduce extra synchronize_net() calls for
interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on
free_netdev() to call netif_napi_del(). This seems inevitable
since we want to use RCU for netpoll dev->napi_list traversal,
and almost no drivers set IFF_DISABLE_NETPOLL.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even tho mlx4_core registers the devlink ports, it's mlx4_en
and mlx4_ib which set their type. In situations where one of
the two is not built yet the machine has ports of given type
we see the devlink warning from devlink_port_type_warn() trigger.
Having ports of a type not supported by the kernel may seem
surprising, but it does occur in practice - when the unsupported
port is not plugged in to a switch anyway users are more than happy
not to see it (and potentially allocate any resources to it).
Set the type in mlx4_core if type-specific driver is not built.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to easily change the rx buffer size, rely on
MVNETA_MAX_RX_BUF_SIZE instead of PAGE_SIZE in mvneta_swbm_rx_frame
routine for rx buffer split. Currently this is not an issue since we set
MVNETA_MAX_RX_BUF_SIZE to PAGE_SIZE - MVNETA_SKB_PAD but it is a good to
have to configure a different rx buffer size.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When timestamping a packet there's a delay between the start of the
packet and the point where the hardware actually captures the
timestamp. This difference needs to be considered if we want accurate
timestamps.
This was done on the RX side, but not on the TX side.
Fixes: 2c344ae245 ("igc: Add support for TX timestamping")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The previous timestamping latency numbers were obtained by
interpolating the i210 numbers with the i225 crystal clock value. That
calculation was wrong.
Use the correct values from real measurements.
Fixes: 81b055205e ("igc: Add support for RX timestamping")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The for loop in i40e_set_vsi_promisc() reports errors via dev_err() but
does not propagate the error up the call chain. Instead it continues the
loop and potentially overwrites the reported error value.
This results in the error being recorded in the log buffer, but the
caller might never know anything went the wrong way.
To avoid this situation i40e_set_vsi_promisc() needs to temporarily store
the error after reporting it. This is still not optimal as multiple
different errors may occur, so store the first error and hope that's
the main issue.
Fixes: 37d318d780 (i40e: Remove scheduling while atomic possibility)
Reported-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c: In function ‘i40e_set_vsi_promisc’:
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1176:14: error: ‘aq_ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
i40e_status aq_ret;
In case the code inside the if statement and the for loop does not get
executed aq_ret will be uninitialized when the variable gets returned at
the end of the function.
Avoid this by changing num_vlans from int to u16, so aq_ret always gets
set. Making this change in additional places as num_vlans should never
be negative.
Fixes: 37d318d780 ("i40e: Remove scheduling while atomic possibility")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix the assert during VF driver installation when the personality is iWARP
Fixes: 1fe614d10f ("qed: Relax VF firmware requirements")
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some configurations ARFS cannot be used, so disable it if device
is not capable.
Fixes: e4917d46a6 ("qede: Add aRFS support")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In CMT and NPAR the PF is unknown when the GFS block processes the
packet. Therefore cannot use searcher as it has a per PF database,
and thus ARFS must be disabled.
Fixes: d51e4af5c2 ("qed: aRFS infrastructure support")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for timestamping transmit packets. We allocate SYNC
messages to queue 1, every other message to queue 0.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for receive timestamping. When enabled, the hardware adds
a timestamp into the receive queue descriptor for all received packets
with no filtering. Hence, we can only support NONE or ALL receive
filter modes.
The timestamp in the receive queue contains two bit sof seconds and
the full nanosecond timestamp. This has to be merged with the remainder
of the seconds from the TAI clock to arrive at a full timestamp before
we can convert it to a ktime for the skb hardware timestamp field.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the TAI block in the mvpp2.2 hardware.
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the first level interrupt status registers to determine how to
further process the port interrupt. We will need this to know whether
to invoke the link status processing and/or the PTP processing for
both XLG and GMAC.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The link interrupt is used for way more than just the link status; it
comes from a collection of units to do with the port. The Marvell
documentation describes the interrupt as "GOP port X interrupt".
Since we are adding PTP support, and the PTP interrupt uses this,
rename it to be more inline with the documentation.
This interrupt is also mis-named in the DT binding, but we leave that
alone.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "link status" interrupt is used for more than just link status.
Restructure mvpp2_link_status_isr() so we can add additional handling.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
A devlink port may be for a controller consist of PCI device.
A devlink instance holds ports of two types of controllers.
(1) controller discovered on same system where eswitch resides
This is the case where PCI PF/VF of a controller and devlink eswitch
instance both are located on a single system.
(2) controller located on external host system.
This is the case where a controller is located in one system and its
devlink eswitch ports are located in a different system.
When a devlink eswitch instance serves the devlink ports of both
controllers together, PCI PF/VF numbers may overlap.
Due to this a unique phys_port_name cannot be constructed.
For example in below such system controller-0 and controller-1, each has
PCI PF pf0 whose eswitch ports can be present in controller-0.
These results in phys_port_name as "pf0" for both.
Similar problem exists for VFs and upcoming Sub functions.
An example view of two controller systems:
---------------------------------------------------------
| |
| --------- --------- ------- ------- |
----------- | | vf(s) | | sf(s) | |vf(s)| |sf(s)| |
| server | | ------- ----/---- ---/----- ------- ---/--- ---/--- |
| pci rc |=== | pf0 |______/________/ | pf1 |___/_______/ |
| connect | | ------- ------- |
----------- | | controller_num=1 (no eswitch) |
------|--------------------------------------------------
(internal wire)
|
---------------------------------------------------------
| devlink eswitch ports and reps |
| ----------------------------------------------------- |
| |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
| |pf0 | pf0vfN | pf0sfN | pf1 | pf1vfN |pf1sfN | |
| ----------------------------------------------------- |
| |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
| |pf1 | pf1vfN | pf1sfN | pf1 | pf1vfN |pf0sfN | |
| ----------------------------------------------------- |
| |
| |
| --------- --------- ------- ------- |
| | vf(s) | | sf(s) | |vf(s)| |sf(s)| |
| ------- ----/---- ---/----- ------- ---/--- ---/--- |
| | pf0 |______/________/ | pf1 |___/_______/ |
| ------- ------- |
| |
| local controller_num=0 (eswitch) |
---------------------------------------------------------
An example devlink port for external controller with controller
number = 1 for a VF 1 of PF 0:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
function:
hw_addr 00:00:00:00:00:00
$ devlink port show pci/0000:06:00.0/2 -jp
{
"port": {
"pci/0000:06:00.0/2": {
"type": "eth",
"netdev": "ens2f0pf0vf1",
"flavour": "pcivf",
"controller": 1,
"pfnum": 0,
"vfnum": 1,
"external": true,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:00:00"
}
}
}
}
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A devlink eswitch port may represent PCI PF/VF ports of a controller.
A controller either located on same system or it can be an external
controller located in host where such NIC is plugged in.
Add the ability for driver to specify if a port is for external
controller.
Use such flag in the mlx5_core driver.
An example of an external controller having VF1 of PF0 belong to
controller 1.
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false
function:
hw_addr 00:00:00:00:00:00
$ devlink port show pci/0000:06:00.0/2 -jp
{
"port": {
"pci/0000:06:00.0/2": {
"type": "eth",
"netdev": "ens2f0pf0vf1",
"flavour": "pcivf",
"pfnum": 0,
"vfnum": 1,
"external": true,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:00:00"
}
}
}
}
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ECPF supports one external host controller. Read controller number
from the device.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because clk_prepare_enable() and clk_disable_unprepare() already checked
NULL clock parameter, so the additional checks are unnecessary, just
remove them.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because clk_prepare_enable() and clk_disable_unprepare() already checked
NULL clock parameter, so the additional checks are unnecessary, just
remove them.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename ptp to ptp_info.
Fix W=1 compile warnings (invalid kerneldoc):
drivers/net/ethernet/cavium/common/cavium_ptp.c:94: warning: Excess function parameter 'ptp' description in 'cavium_ptp_adjfine'
drivers/net/ethernet/cavium/common/cavium_ptp.c:141: warning: Excess function parameter 'ptp' description in 'cavium_ptp_adjtime'
drivers/net/ethernet/cavium/common/cavium_ptp.c:163: warning: Excess function parameter 'ptp' description in 'cavium_ptp_gettime'
drivers/net/ethernet/cavium/common/cavium_ptp.c:185: warning: Excess function parameter 'ptp' description in 'cavium_ptp_settime'
drivers/net/ethernet/cavium/common/cavium_ptp.c:208: warning: Excess function parameter 'ptp' description in 'cavium_ptp_enable'
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As ch_ipsec was removed without clearing xfrmdev_ops and netdev
feature(esp-hw-offload). When a recalculation of netdev feature is
triggered by changing tls feature(tls-hw-tx-offload) from user
request, it causes a page fault due to absence of valid xfrmdev_ops.
Fixes: 6dad4e8ab3 ("chcr: Add support for Inline IPSec")
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix typo/spello of "functionality".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Cc: Jiri Kosina <trivial@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code recently moved into this file contained a number of coding style
issues, about which checkpatch and xmastree complained. Fix them.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c:4238: warning: Excess function parameter 'netdev' description in 'bnx2x_setup_tc'
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c:4238: warning: Excess function parameter 'tc' description in 'bnx2x_setup_tc'
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/smsc/smsc911x.c: In function ‘smsc911x_rx_fastforward’:
drivers/net/ethernet/smsc/smsc911x.c:1199:16: warning: variable ‘temp’ set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/smsc/smsc911x.c: In function ‘smsc911x_eeprom_write_location’:
drivers/net/ethernet/smsc/smsc911x.c:2058:6: warning: variable ‘temp’ set but not used [-Wunused-but-set-variable]
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hns3_update_promisc_mode is defined, but not be used, so remove it.
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several macros related queue defined, but never
used, so remove them.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'tc_num_last_time' is defined, but never used, so remove it.
Reported-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'io_base' has been defined and initialized, but never used,
so remove it.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The member link of struct hclge_mac stores the link status of
MAC and PHY if PHY exists, but its annotation uses word "exit",
so fix it.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When reset fails, if there are some pending jobs for the periodic
service task, it does not do anything except print error each
time the task is scheduled. So skip the periodic service task if
reset failed.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since variable send_msg and ret only used in if branch, so move
their definition into the if branch.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph says[1] that dma_set_mask_and_coherent() is smart enough to
truncate the mask itself if it's too long. So we can get rid of our
"lop off one bit and retry" loop in efx_init_io().
[1]: https://www.spinics.net/lists/netdev/msg677266.html
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Per-module versions for in-tree drivers are deprecated.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the reported PHY capabilities do not include a given FEC mode, don't
attempt to select that FEC mode anyway. If the user tries to set a mode
through ethtool that is not supported, return an error.
The _REQUESTED bits don't appear in the supported caps, but are implied
by the corresponding FEC bits.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Originally there were several implementations of PHY operations for the
several different PHYs used on Falcon boards. But Falcon is now in a
separate driver, and all sfc NICs since then have had MCDI-managed PHYs.
Thus, there is no need to indirect through function pointers in
efx->phy_op; we can simply call the efx_mcdi_phy_* functions directly.
This also hooks up these functions for EF100, which was previously using
the dummy_phy_ops.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
dev_close(), by way of ef100_net_stop(), already brings down the filter
table, so there's no need to do it again (which just causes lots of
WARN_ONs).
Similarly, don't bring it up ourselves, as dev_open() -> ef100_net_open()
will do it, and will fail if it's already been brought up.
Fixes: a9dc3d5612 ("sfc_ef100: RX filter table management and related gubbins")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Indentation and logic clearly show that this code is missing
parenthesis.
Fixes: 9f13457377 ("ibmvnic fix NULL tx_pools and rx_tools issue at do_reset")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/dnet.c: In function dnet_start_xmit
drivers/net/ethernet/dnet.c:511:15: warning: variable ‘len’ set but not used [-Wunused-but-set-variable]
commit 4796417417 ("dnet: Dave DNET ethernet controller driver (updated)")
involved this unused variable, remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Because clk_prepare_enable() and clk_disable_unprepare() already checked
NULL clock parameter, so the additional checks are unnecessary, just
remove them.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Because clk_prepare_enable() and clk_disable_unprepare() already checked
NULL clock parameter, so the additional checks are unnecessary, just
remove them.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Because clk_prepare_enable() and clk_disable_unprepare() already checked
NULL clock parameter, so the additional checks are unnecessary, just
remove them.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Because clk_prepare_enable() and clk_disable_unprepare() already checked
NULL clock parameter, so the additional checks are unnecessary, just
remove them.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bnxt_fw_reset_task() which runs from a workqueue can race with
bnxt_remove_one(). For example, if firmware reset and VF FLR are
happening at about the same time.
bnxt_remove_one() already cancels the workqueue and waits for it
to finish, but we need to do this earlier before the devlink
reporters are destroyed. This will guarantee that
the devlink reporters will always be valid when bnxt_fw_reset_task()
is still running.
Fixes: b148bb238c ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the driver goes through PCIe AER reset in error state, all
firmware messages will timeout because the PCIe bus is no longer
accessible. This can lead to AER reset taking many minutes to
complete as each firmware command takes time to timeout.
Define a new macro BNXT_NO_FW_ACCESS() to skip these firmware messages
when either firmware is in fatal error state or when
pci_channel_offline() is true. It now takes a more reasonable 20 to
30 seconds to complete AER recovery.
Fixes: b4fff2079d ("bnxt_en: Do not send firmware messages if firmware is in error state.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.
When memory is allocated in 'vnic_dev_classifier()', 'vnic_dev_fw_info()',
'vnic_dev_notify_set()' and 'vnic_dev_stats_dump()' (vnic_dev.c) GFP_ATOMIC
must be used because its callers take a spinlock before calling these
functions.
When memory is allocated in '__enic_set_rsskey()' and 'enic_set_rsscpu()'
GFP_ATOMIC must be used because they can be called with a spinlock.
The call chain is:
enic_reset <-- takes 'enic->enic_api_lock'
--> enic_set_rss_nic_cfg
--> enic_set_rsskey
--> __enic_set_rsskey <-- uses dma_alloc_coherent
--> enic_set_rsscpu <-- uses dma_alloc_coherent
When memory is allocated in 'vnic_dev_init_prov2()' GFP_ATOMIC must be used
because a spinlock is hidden in the ENIC_DEVCMD_PROXY_BY_INDEX macro, when
this function is called in 'enic_set_port_profile()'.
When memory is allocated in 'vnic_dev_alloc_desc_ring()' GFP_KERNEL can be
used because it is only called from 5 functions ('vnic_dev_init_devcmd2()',
'vnic_cq_alloc()', 'vnic_rq_alloc()', 'vnic_wq_alloc()' and
'enic_wq_devcmd2_alloc()'.
'vnic_dev_init_devcmd2()': already uses GFP_KERNEL and no lock is taken
in the between.
'enic_wq_devcmd2_alloc()': is called from ' vnic_dev_init_devcmd2()'
which already uses GFP_KERNEL and no lock is taken in the between.
'vnic_cq_alloc()', 'vnic_rq_alloc()', 'vnic_wq_alloc()': are called
from 'enic_alloc_vnic_resources()'
'enic_alloc_vnic_resources()' has only 2 call chains:
1) enic_probe
--> enic_dev_init
--> enic_alloc_vnic_resources
'enic_probe()' is a probe function and no lock is taken in the between
2) enic_set_ringparam
--> enic_alloc_vnic_resources
'enic_set_ringparam()' is a .set_ringparam function (see struct
ethtool_ops). It seems to only take a mutex and no spinlock.
So all paths are safe to use GFP_KERNEL.
@@
@@
- PCI_DMA_BIDIRECTIONAL
+ DMA_BIDIRECTIONAL
@@
@@
- PCI_DMA_TODEVICE
+ DMA_TO_DEVICE
@@
@@
- PCI_DMA_FROMDEVICE
+ DMA_FROM_DEVICE
@@
@@
- PCI_DMA_NONE
+ DMA_NONE
@@
expression e1, e2, e3;
@@
- pci_alloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3;
@@
- pci_zalloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3, e4;
@@
- pci_free_consistent(e1, e2, e3, e4)
+ dma_free_coherent(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_single(e1, e2, e3, e4)
+ dma_map_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_single(e1, e2, e3, e4)
+ dma_unmap_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4, e5;
@@
- pci_map_page(e1, e2, e3, e4, e5)
+ dma_map_page(&e1->dev, e2, e3, e4, e5)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_page(e1, e2, e3, e4)
+ dma_unmap_page(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_sg(e1, e2, e3, e4)
+ dma_map_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_sg(e1, e2, e3, e4)
+ dma_unmap_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+ dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_device(e1, e2, e3, e4)
+ dma_sync_single_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+ dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+ dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2;
@@
- pci_dma_mapping_error(e1, e2)
+ dma_mapping_error(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_dma_mask(e1, e2)
+ dma_set_mask(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_consistent_dma_mask(e1, e2)
+ dma_set_coherent_mask(&e1->dev, e2)
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It's nice if the phy is online before we register the netdev
so try to do that first.
Stop trying to do "second tried" to register the phy, it
works perfectly fine the first time.
Stop remvoving the phy in uninit. Remove it when the
driver is remove():d, symmetric to where it is added, in
probe().
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
PAE bit of NCFGR register, when set, pauses transmission
if a non-zero 802.3 classic pause frame is received.
Fixes: 7897b071ac ("net: macb: convert to phylink")
Signed-off-by: Parshuram Thombare <pthombar@cadence.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pass the correct offset to clear the stale filter hit
bytes counter. Otherwise, the counter starts incrementing
from the stale information, instead of 0.
Fixes: 12b276fbf6 ("cxgb4: add support to create hash filters")
Signed-off-by: Ganji Aravind <ganji.aravind@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Firmware erases the entire flash region which may take several
seconds before flashing, so we bump up the timeout to ensure this
cmd won't return failure.
Fixes: 5e126e7c4e ("hinic: add firmware update support")
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We free memory regardless of the return value of SET_FUNC_STATE
cmd in hinic_close function to avoid memory leak and this cmd may
timeout when fw is busy with handling other cmds, so we bump up the
timeout of this cmd to ensure it won't return failure.
Fixes: 00e57a6d4a ("net-next/hinic: Add Tx operation")
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use efx_for_each_channel_tx_queue() rather than efx_tx_queue_partner().
Make some related simplifications of efx_nic_tx_is_empty() to remove
entry points that aren't used.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of open-coding the calculation with efx_tx_queue_partner(), use
the functions that iterate over numbers of queues other than 2 with
efx_for_each_channel_tx_queue().
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As in the Siena/EF10 case, it minimises cacheline ping-pong between
the TX and completion paths.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This should cause no functional change; merely make there only be one
design of xmit_more handling to understand. As with the EF10/Siena
version, we set tx_queue->xmit_pending when we queue up a TX, and
clear it when we ring the doorbell (in ef100_notify_tx_desc).
While we're at it, make ef100_notify_tx_desc static since nothing
outside of ef100_tx.c uses it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of using efx_tx_queue_partner(), which relies on the assumption
that tx_queues_per_channel is 2, efx_tx_send_pending() iterates over
txqs with efx_for_each_channel_tx_queue().
We unconditionally set tx_queue->xmit_pending (renamed from
xmit_more_available), then condition on xmit_more for the call to
efx_tx_send_pending(), which will clear xmit_pending. Thus, after an
xmit_more TX, the doorbell is un-rung and xmit_pending is true.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We got slightly different patches removing a double word
in a comment in net/ipv4/raw.c - picked the version from net.
Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached
values instead of VNIC login response buffer (following what
commit 507ebe6444 ("ibmvnic: Fix use-after-free of VNIC login
response buffer") did).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from David Miller:
1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
Kivilinna.
2) Fix loss of RTT samples in rxrpc, from David Howells.
3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu.
4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.
5) We disable BH for too lokng in sctp_get_port_local(), add a
cond_resched() here as well, from Xin Long.
6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.
7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
Yonghong Song.
8) Missing of_node_put() in mt7530 DSA driver, from Sumera
Priyadarsini.
9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.
10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.
11) Memory leak in rxkad_verify_response, from Dinghao Liu.
12) In tipc, don't use smp_processor_id() in preemptible context. From
Tuong Lien.
13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.
14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter.
15) Fix ABI mismatch between driver and firmware in nfp, from Louis
Peens.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
net/smc: fix sock refcounting in case of termination
net/smc: reset sndbuf_desc if freed
net/smc: set rx_off for SMCR explicitly
net/smc: fix toleration of fake add_link messages
tg3: Fix soft lockup when tg3_reset_task() fails.
doc: net: dsa: Fix typo in config code sample
net: dp83867: Fix WoL SecureOn password
nfp: flower: fix ABI mismatch between driver and firmware
tipc: fix shutdown() of connectionless socket
ipv6: Fix sysctl max for fib_multipath_hash_policy
drivers/net/wan/hdlc: Change the default of hard_header_len to 0
net: gemini: Fix another missing clk_disable_unprepare() in probe
net: bcmgenet: fix mask check in bcmgenet_validate_flow()
amd-xgbe: Add support for new port mode
net: usb: dm9601: Add USB ID of Keenetic Plus DSL
vhost: fix typo in error message
net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
pktgen: fix error message with wrong function name
net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
cxgb4: fix thermal zone device registration
...
It is necessary to manage the Wake-on-LAN clock to turn on the
appropriate blocks for MPD or CFP-based packet matching to work
otherwise we will not be able to reliably match packets during suspend.
Reported-by: Blair Prescott <blair.prescott@broadcom.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We disable clocks shortly after probing the device to save as much power as
possible in case the interface is never used. When bcm_sysport_open() is
invoked, clocks are enabled, and disabled in bcm_sysport_stop().
A similar scheme is applied to the suspend/resume functions.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While CHELSIO_INLINE_CRYPTO is a guard symbol, and just enabling it does
not cause any additional code to be compiled in, all configuration
options protected by it depend on CONFIG_CHELSIO_T4. Hence it doesn't
make much sense to bother the user with the guard symbol question when
CONFIG_CHELSIO_T4 is disabled.
Fix this by moving the dependency from the individual config options to
the guard symbol.
Fixes: 44fd1c1fd8 ("chelsio/chtls: separate chelsio tls driver from crypto driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split the XLG and GMAC PCS implementations and switch between them
during the mac_prepare() method.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert mvpp2 to phylink's new pcs support.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the GMAC reset handling into mac_prepare() / mac_finish()
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure that the port is forced down while reconfiguring, controlling
this via mac_prepare() and mac_finish() so that it is down while we
are configuring our (future) PCS.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert mvpp2 to use the mac_prepare() and mac_finish() methods in
preparation to converting mvpp2 to split-PCS support.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tidy up the ACPI hack so that we can minimise the function prototypes
for this. This avoids adding further prototypes unnecessarily.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
If tg3_reset_task() fails, the device state is left in an inconsistent
state with IFF_RUNNING still set but NAPI state not enabled. A
subsequent operation, such as ifdown or AER error can cause it to
soft lock up when it tries to disable NAPI state.
Fix it by bringing down the device to !IFF_RUNNING state when
tg3_reset_task() fails. tg3_reset_task() running from workqueue
will now call tg3_close() when the reset fails. We need to
modify tg3_reset_task_cancel() slightly to avoid tg3_close()
calling cancel_work_sync() to cancel tg3_reset_task(). Otherwise
cancel_work_sync() will wait forever for tg3_reset_task() to
finish.
Reported-by: David Christensen <drc@linux.vnet.ibm.com>
Reported-by: Baptiste Covolato <baptiste@arista.com>
Fixes: db21997379 ("tg3: Schedule at most one tg3_reset_task run")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new attributes to hwmon object for exposing critical and emergency
alarms.
In case that current temperature is higher than emergency threshold,
EMERGENCY alarm will be reported in sensors utility:
$ sensors
...
front panel 025: +55.0°C (crit = +35.0°C, emerg = +40.0°C) ALARM(EMERGENCY)
In case that current temperature is higher than critical threshold,
CRIT alarm will be reported in sensors utility:
$ sensors
...
front panel 025: +54.0°C (crit = +35.0°C, emerg = +80.0°C) ALARM(CRIT)
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the value of MLXSW_HWMON_ATTR_COUNT is calculated not really
accurate.
Add several defines to make the calculation clearer and easier to
change.
Calculate the precise high bound of number of attributes that may be
needed.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw_hwmon_module_temp_show(), mlxsw_hwmon_module_temp_critical_show()
and mlxsw_hwmon_module_temp_emergency_show() query the relevant
temperature from firmware and fill the value in provided buffers.
Split the temperature querying functionality to individual get()
functions and call them from the show() functions.
The get() functions will be used by subsequent patches in the set.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix an issue where the driver wrongly detected ipv6 neighbour updates
from the NFP as corrupt. Add a reserved field on the kernel side so
it is similar to the ipv4 version of the struct and has space for the
extra bytes from the card.
Fixes: 9ea9bfa122 ("nfp: flower: support ipv6 tunnel keep-alive messages from fw")
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add parenthesis to clarify a boolean usage.
Pointed out in https://lore.kernel.org/lkml/202008060413.VgrMuqLJ%25lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only thing calling ionic_napi any more is the adminq
processing, so combine and simplify.
Co-developed-by: Neel Patel <neel@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove some unnecessary struct fields and related code.
Co-developed-by: Neel Patel <neel@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move a few active struct fields to the front of the struct
for a little better cache use and performance.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The internal page handling can be cleaned up by passing our
local page struct rather than dma addresses, and by putting
more of the mgmt code into the alloc and free routines.
Co-developed-by: Neel Patel <neel@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
We recently added some calls to clk_disable_unprepare() but we missed
the last error path if register_netdev() fails.
I made a couple cleanups so we avoid mistakes like this in the future.
First I reversed the "if (!ret)" condition and pulled the code in one
indent level. Also, the "port->netdev = NULL;" is not required because
"port" isn't used again outside this function so I deleted that line.
Fixes: 4d5ae32f5e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VALIDATE_MASK(eth_mask->h_source) is checked twice in a row in
bcmgenet_validate_flow(). Add VALIDATE_MASK(eth_mask->h_dest)
instead.
Fixes: 3e37095228 ("net: bcmgenet: add support for ethtool rxnfc flows")
Signed-off-by: Denis Efremov <efremov@linux.com>
Acked-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for a new port mode that is a backplane connection without
support for auto negotiation.
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As qedr driver supports both RoCE and iWarp, make sure to set the
max_pkeys only when running in RoCE mode.
Link: https://lore.kernel.org/r/20200827141655.406185-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Defines UDP segmentation algorithm in hardware and supports
offloading UDP segmentation.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some static functions in the dpaa2-eth driver don't have the dpaa2_eth_
prefix and this is becoming an inconvenience when looking at, for
example, a perf top output and trying to determine easily which entries
are dpaa2-eth related. Ammend this by adding the prefix to all the
functions.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some static functions in the dpaa2-eth driver don't have the dpaa2_eth_
prefix and this is becoming an inconvenience when looking at, for
example, a perf top output and trying to determine easily which entries
are dpaa2-eth related. Ammend this by adding the prefix to all the
functions.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some static functions in the dpaa2-eth driver don't have the dpaa2_eth_
prefix and this is becoming an inconvenience when looking at, for
example, a perf top output and trying to determine easily which entries
are dpaa2-eth related. Ammend this by adding the prefix to all the
functions.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases, the device or firmware may be busy when the
driver attempts to perform the CRQ initialization handshake.
If the partner is busy, the hypervisor will return the H_CLOSED
return code. The aim of this patch is that, if the device is not
ready, to query the device a number of times, with a small wait
time in between queries. If all initialization requests fail,
the driver will remain in a dormant state, awaiting a signal
from the device that it is ready for operation.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-09-01
The following pull-request contains BPF updates for your *net-next* tree.
There are two small conflicts when pulling, resolve as follows:
1) Merge conflict in tools/lib/bpf/libbpf.c between 88a8212028 ("libbpf: Factor
out common ELF operations and improve logging") in bpf-next and 1e891e513e
("libbpf: Fix map index used in error message") in net-next. Resolve by taking
the hunk in bpf-next:
[...]
scn = elf_sec_by_idx(obj, obj->efile.btf_maps_shndx);
data = elf_sec_data(obj, scn);
if (!scn || !data) {
pr_warn("elf: failed to get %s map definitions for %s\n",
MAPS_ELF_SEC, obj->path);
return -EINVAL;
}
[...]
2) Merge conflict in drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c between
9647c57b11 ("xsk: i40e: ice: ixgbe: mlx5: Test for dma_need_sync earlier for
better performance") in bpf-next and e20f0dbf20 ("net/mlx5e: RX, Add a prefetch
command for small L1_CACHE_BYTES") in net-next. Resolve the two locations by retaining
net_prefetch() and taking xsk_buff_dma_sync_for_cpu() from bpf-next. Should look like:
[...]
xdp_set_data_meta_invalid(xdp);
xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
net_prefetch(xdp->data);
[...]
We've added 133 non-merge commits during the last 14 day(s) which contain
a total of 246 files changed, 13832 insertions(+), 3105 deletions(-).
The main changes are:
1) Initial support for sleepable BPF programs along with bpf_copy_from_user() helper
for tracing to reliably access user memory, from Alexei Starovoitov.
2) Add BPF infra for writing and parsing TCP header options, from Martin KaFai Lau.
3) bpf_d_path() helper for returning full path for given 'struct path', from Jiri Olsa.
4) AF_XDP support for shared umems between devices and queues, from Magnus Karlsson.
5) Initial prep work for full BPF-to-BPF call support in libbpf, from Andrii Nakryiko.
6) Generalize bpf_sk_storage map & add local storage for inodes, from KP Singh.
7) Implement sockmap/hash updates from BPF context, from Lorenz Bauer.
8) BPF xor verification for scalar types & add BPF link iterator, from Yonghong Song.
9) Use target's prog type for BPF_PROG_TYPE_EXT prog verification, from Udip Pant.
10) Rework BPF tracing samples to use libbpf loader, from Daniel T. Lee.
11) Fix xdpsock sample to really cycle through all buffers, from Weqaar Janjua.
12) Improve type safety for tun/veth XDP frame handling, from Maciej Żenczykowski.
13) Various smaller cleanups and improvements all over the place.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
On machines with much memory (> 2 TByte) and log_mtts_per_seg == 0, a
max_order of 31 will be passed to mlx_buddy_init(), which results in
s = BITS_TO_LONGS(1 << 31) becoming a negative value, leading to
kvmalloc_array() failure when it is converted to size_t.
mlx4_core 0000:b1:00.0: Failed to initialize memory region table, aborting
mlx4_core: probe of 0000:b1:00.0 failed with error -12
Fix this issue by changing the left shifting operand from a signed literal to
an unsigned one.
Fixes: 225c7b1fee ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unneeded return value cast.
This is detected by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove dma_alloc_coherent return value cast.
This is detected by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In RMII link mode it's required to set bit 15 IFCTL_A in MAC_SL MAC_CONTROL
register to enable support for 100Mbit link speed.
Fixes: 93a7653031 ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new bit TX_GENF_CLR_EN has been added in AM65x SR2.0 to fix i2083
errata, which can be just set unconditionally for all SoCs.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
W=1 warnings indicated that 'rc' was unused in efx_mcdi_set_id_led();
change the function to return int instead of void and plumb the rc
through the caller efx_ethtool_phys_id().
Since (post-Falcon) all sfc NICs use MCDI for this, there's no point in
indirecting through a nic_type method, so remove that and just call
efx_mcdi_set_id_led() directly.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Missing 'struct' keyword caused "cannot understand function prototype"
warnings.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanks to some past refactor, 'spec' is not actually used in this
function; the code using it moved to the callee efx_farch_filter_remove.
Remove the variable to fix a W=1 warning.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of these RX-event flags aren't used at all, so remove them.
Others are used only #ifdef DEBUG to log a message; suppress the
unused-var warnings #ifndef DEBUG with a void cast.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add debugfs node for querying function table, for example:
cat /sys/kernel/debug/hinic/0000:15:00.0/func_table/valid
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add debugfs node for querying rq info, for example:
cat /sys/kernel/debug/hinic/0000:15:00.0/RQs/0x0/rq_hw_pi
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add debugfs node for querying sq info, for example:
cat /sys/kernel/debug/hinic/0000:15:00.0/SQs/0x0/sq_pi
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take the tx accounting out of the work_done calculation to
prevent a possible duplicate napi_schedule call when under
high Tx stress but low Rx traffic.
Fixes: b14e4e95f9 ("ionic: tx separate servicing")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test for dma_need_sync earlier to increase
performance. xsk_buff_dma_sync_for_cpu() takes an xdp_buff as
parameter and from that the xsk_buff_pool reference is dug out. Perf
shows that this dereference causes a lot of cache misses. But as the
buffer pool is now sent down to the driver at zero-copy initialization
time, we might as well use this pointer directly, instead of going via
the xsk_buff and we can do so already in xsk_buff_dma_sync_for_cpu()
instead of in xp_dma_sync_for_cpu. This gets rid of these cache
misses.
Throughput increases with 3% for the xdpsock l2fwd sample application
on my machine.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-11-git-send-email-magnus.karlsson@intel.com
Rename the AF_XDP zero-copy driver interface functions to better
reflect what they do after the replacement of umems with buffer
pools in the previous commit. Mostly it is about replacing the
umem name from the function names with xsk_buff and also have
them take the a buffer pool pointer instead of a umem. The
various ring functions have also been renamed in the process so
that they have the same naming convention as the internal
functions in xsk_queue.h. This so that it will be clearer what
they do and also for consistency.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-3-git-send-email-magnus.karlsson@intel.com
Replace the explicit umem reference passed to the driver in AF_XDP
zero-copy mode with the buffer pool instead. This in preparation for
extending the functionality of the zero-copy mode so that umems can be
shared between queues on the same netdev and also between netdevs. In
this commit, only an umem reference has been added to the buffer pool
struct. But later commits will add other entities to it. These are
going to be entities that are different between different queue ids
and netdevs even though the umem is shared between them.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com
Convert tx_timeout handler to not do the full reset. As this was
the last user of ionic_reset_queues(), we can drop it.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add to our new ionic_reconfigure_queues() to also be able to change
the number of queues in use, and to change the queue interrupt layout
between split and combined.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original way of changing ring length was to completely
tear down the lif's queue structure and then rebuild it, while
running the risk of allocations that might fail in the middle
and leave us with a broken driver.
Instead, we can set up all the new queue and descriptor
allocations first, then swap them out and delete the old
allocations. If the new allocations fail, we report the error,
stay with the old setup and continue running. This gives us
a safer path, and a smaller window of time where we're not
processing traffic.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
We really don't need to tear down and rebuild the whole queue structure
when changing the MTU; we can simply stop the queues, clean and refill,
then restart the queues.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use index counters rather than pointers for tracking head
and tail in the queues to save a little memory and to perhaps
slightly faster queue processing.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split out the queue descriptor blocks into separate dma
allocations to make for smaller blocks.
Co-developed-by: Neel Patel <neel@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
ionic_open() and ionic_stop() are not referenced outside of their
defining file, so make them static.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a block of stats structs attached to the lif instead of
little ones attached to each qcq. This simplifies our memory
management and gets rid of a lot of unnecessary indirection.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we aren't yet supporting multiple lifs, we can remove
complexity by removing the list concept and related code,
to be re-engineered later when actually needed.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use kcalloc for allocating arrays of structures.
Following along after
commit e71642009cbdA ("ionic_lif: Use devm_kcalloc() in ionic_qcq_alloc()")
there are a couple more array allocations that can be converted
to using devm_kcalloc().
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NIC might tell us its minimum MTU, but let's be sure not
to use something smaller than ETH_MIN_MTU.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a bunch of issues in cpsw_ndo_vlan_rx_kill_vid()
- pm_runtime_get_sync() returns non zero value. This results in
non zero value return to caller which will be interpreted as error.
So overwrite ret with zero.
- If VID matches with port VLAN VID, then set error code.
- Currently when VLAN interface is deleted, all of the VLAN mc addresses
are removed from ALE table, however the return values from ale function
calls are not checked. These functions can return error code -ENOENT.
But that shouldn't happen in a normal case. So add error print to
catch the situations so that these can be investigated and addressed.
return zero in these cases as these are not real error case, but only
serve to catch ALE table update related issues and help address the
same in the driver.
Fixes: ed3525eda4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This header file is currently included into the ethernet driver via a
relative path into the PHY subsystem. This is bad practice, and causes
issues for the upcoming move of the MDIO driver. Move the header file
into include/linux to clean this up.
v2:
Move header to include/linux/mdio
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create drivers/net/pcs and move the Synopsys DesignWare XPCS into the
new directory. Move the header file into a subdirectory
include/linux/pcs
Start a naming convention of all PCS files use the prefix pcs-, and
rename the XPCS files to fit.
v2:
Add include/linux/pcs
v4:
Fix include path in stmmac.
Remove PCS_DEVICES to avoid new prompts
Cc: Jose Abreu <Jose.Abreu@synopsys.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only usage of bb_ops is to assign its address to the ops field in
the mdiobb_ctrl struct, which is a const pointer. Make it const to allow
the compiler to put it in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only usage of bb_ops is to assign its address to the ops field in
the mdiobb_ctrl struct, which is a const pointer. Make it const to allow
the compiler to put it in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only usage of qca_serdev_ops is to pass its address to
serdev_device_set_client_ops() which takes a const pointer. Make it
const to allow the compiler to put it in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These two functions share the majority of the code, hence merge
them together. In the meanwhile, add a reset pass-in parameter
to differentiate them. Thus, the code is easier to read and to tell
the difference between reset_init and regular init.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the beginning of the function, from_passive_init is set false by
"adapter->from_passive_init = false;",
hence the if statement will never run.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When H_SEND_CRQ command returns with H_CLOSED, it means the
server's CRQ is not ready yet. Instead of resetting immediately,
we wait for the server to launch passive init.
ibmvnic_init() and ibmvnic_reset_init() should also return the
error code from ibmvnic_send_crq_init() call.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of comparing (adapter->init_done_rc == 1), let it
be (adapter->init_done_rc == PARTIALSUCCESS).
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A single cacheline might not contain the packet header for
small L1_CACHE_BYTES values.
Use net_prefetch() as it issues an additional prefetch
in this case.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A single cacheline might not contain the packet header for
small L1_CACHE_BYTES values.
Use net_prefetch() as it issues an additional prefetch
in this case.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many device drivers use the same prefetch code structure to
deal with small L1 cacheline size.
Take this code into a function and call it from the drivers.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add dwmac-intel-plat to enable the stmmac driver in Intel Keem Bay.
Also add fix_mac_speed and tx_clk in order to change link speeds.
This is required as mac_speed_o is not connected in the
Intel Keem Bay SoC.
Signed-off-by: Rusaimi Amira Ruslan <rusaimi.amira.rusaimi@intel.com>
Signed-off-by: Vineetha G. Jaya Kumaran <vineetha.g.jaya.kumaran@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
when skb->encapsulation is 0, skb->ip_summed is CHECKSUM_PARTIAL
and it is udp packet, which has a dest port as the IANA assigned.
the hardware is expected to do the checksum offload, but the
hardware will not do the checksum offload when udp dest port is
6081.
This patch fixes it by doing the checksum in software.
Reported-by: Li Bing <libing@winhong.com>
Signed-off-by: Yi Li <yili@winhong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent changes to support user-defined RSS map assume that RX
rings are always reserved and the default RSS map is set after the
RX rings are successfully reserved. If the firmware spec is older
than 1.6.1, no ring reservations are required and the default RSS
map is not setup at all. In another scenario where the fw Resource
Manager is older, RX rings are not reserved and we also end up with
no valid RSS map.
Fix both issues in bnxt_need_reserve_rings(). In both scenarios
described above, we don't need to reserve RX rings so we need to
call this new function bnxt_check_rss_map_no_rmgr() to setup the
default RSS map when needed.
Without valid RSS map, the NIC won't receive packets properly.
Fixes: 1667cbf6a4 ("bnxt_en: Add logical RSS indirection table structure.")
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are no VF rings available during probe when the device is configured
using the Minimal-Static reservation strategy. In this case, the RSS
indirection table can only be initialized later, during bnxt_open_nic().
However, this was not happening because the rings will already have been
reserved via bnxt_init_dflt_ring_mode(), causing bnxt_need_reserve_rings()
to return false in bnxt_reserve_rings() and bypass the RSS table init.
Solve this by pushing the call to bnxt_set_dflt_rss_indir_tbl() into
__bnxt_reserve_rings(), which is common to both paths and is called
whenever ring configuration is changed. After doing this, the RSS table
init that must be called from bnxt_init_one() happens implicitly via
bnxt_set_default_rings(), necessitating doing the allocation earlier in
order to avoid a null pointer dereference.
Fixes: bd3191b5d8 ("bnxt_en: Implement ethtool -X to set indirection table.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware returns RESOURCE_ACCESS_DENIED for HWRM_TEMP_MONITORY_QUERY for
VFs. This produces unpleasing error messages in the log when temp1_input
is queried via the hwmon sysfs interface from a VF.
The error is harmless and expected, so silence it and return unknown as
the value. Since the device temperature is not particularly sensitive
information, provide flexibility to change this policy in future by
silencing the error rather than avoiding the HWRM call entirely for VFs.
Fixes: cde49a42a9 ("bnxt_en: Add hwmon sysfs support to read temperature")
Cc: Marc Smith <msmith626@gmail.com>
Reported-by: Marc Smith <msmith626@gmail.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_fw_reset_task() is run from a delayed workqueue. The current
code is not cancelling the workqueue in the driver's .remove()
method and it can potentially crash if the device is removed with
the workqueue still pending.
The fix is to clear the BNXT_STATE_IN_FW_RESET flag and then cancel
the delayed workqueue in bnxt_remove_one(). bnxt_queue_fw_reset_work()
also needs to check that this flag is set before scheduling. This
will guarantee that no rescheduling will be done after it is cancelled.
Fixes: 230d1f0de7 ("bnxt_en: Handle firmware reset.")
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a PCI error is detected the PCI state could be corrupt, save
the PCI state after initialization and restore it after the slot
reset.
Fixes: 6316ea6db9 ("bnxt_en: Enable AER support.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are returning the wrong count for ETH_SS_STATS in get_sset_count()
when XDP or TCs are enabled. In a recent commit, we got rid of
irrelevant counters when the ring is RX only or TX only, but we
did not make the proper adjustments for the count. As a result,
when we have XDP or TCs enabled, we are returning an excess count
because some of the rings are TX only. This causes ethtool -S to
display extra counters with no counter names.
Fix bnxt_get_num_ring_stats() by not assuming that all rings will
always have RX and TX counters in combined mode.
Fixes: 125592fbf4 ("bnxt_en: show only relevant ethtool stats for a TX or RX ring")
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If firmware goes into unstable state, HWRM_NVM_GET_DIR_INFO firmware
command may return zero dir entries. Return error in such case to
avoid zero length dma buffer request.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rare conditions like two stage OS installation, the
ethtool's get_channels function may be called when the
device is in D3 state, leading to uncorrectable PCI error.
Check netif_running() first before making any query to FW
which involves writing to BAR.
Fixes: db4723b3cd ("bnxt_en: Check max_tx_scheduler_inputs value from firmware.")
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the time of do_rest, ibmvnic tries to re-initalize the tx_pools
and rx_pools to avoid re-allocating the long term buffer. However
there is a window inside do_reset that the tx_pools and
rx_pools were freed before re-initialized making it possible to deference
null pointers.
This patch fix this issue by always check the tx_pool
and rx_pool are not NULL after ibmvnic_login. If so, re-allocating
the pools. This will avoid getting into calling reset_tx/rx_pools with
NULL adapter tx_pools/rx_pools pointer. Also add null pointer check in
reset_tx_pools and reset_rx_pools to safe handle NULL pointer case.
Signed-off-by: Mingming Cao <mmc@linux.vnet.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To flush the vid + mc entries from ALE, which is required when a VLAN
interface is removed, driver needs to call cpsw_ale_flush_multicast()
with ALE_PORT_HOST for port mask as these entries are added only for
host port. Without this, these entries remain in the ALE table even
after removing the VLAN interface. cpsw_ale_flush_multicast() calls
cpsw_ale_flush_mcast which expects a port mask to do the job.
Fixes: ed3525eda4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To flush the vid + mc entries from ALE, which is required when a VLAN
interface is removed, driver needs to call cpsw_ale_flush_multicast()
with ALE_PORT_HOST for port mask as these entries are added only for
host port. Without this, these entries remain in the ALE table even
after removing the VLAN interface. cpsw_ale_flush_multicast() calls
cpsw_ale_flush_mcast which expects a port mask to do the job.
Fixes: 15180eca56 ("net: ethernet: ti: cpsw: fix vlan mcast")
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cxgb4 does not look for HASHTBLMEMCRCERR and CMDTIDERR
bits in LE_DB_INT_CAUSE register, but these are enabled
in LE_DB_INT_ENABLE. So, add error handlers to LE
interrupt handler to emit a warning or alert message
for hash table mem crc and cmd tid errors
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds PTP clock and uses it in Octeontx2
network device. PTP clock uses mailbox calls to
access the hardware counter on the RVU side.
Co-developed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Aleksey Makarov <amakarov@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Precision Timestamping block found on Octeontx2
platform is an independent coprocessor and has
internal PTP hardware clock. Once configured PTP
runs independently and when a packet arrives
CGX hardware block gets the current timestamp
from PTP block and forwards the packet to NIX
by prepending timestamp to the packet.
This patch adds the pci driver for PTP block.
The driver gets registered by AF driver and does
initial configuration and exposes a mailbox function to
read and adjust PTP hardware clock. The mailbox function
is called by AF consumers like netdev drivers or
userspace drivers. Since PTP being a single block
in platform this driver helps in accessing PTP
block by any AF consumer.
Co-developed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Aleksey Makarov <amakarov@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Four new mbox messages ids and handler are added in order to
enable or disable timestamping procedure on tx and rx side.
Additionally when PTP is enabled, the packet parser must skip
over 8 bytes and start analyzing packet data there. To make NPC
profiles work seemlesly PTR_ADVANCE of IKPU is set so that
parsing can be done as before when all data pointers
are shifted by 8 bytes automatically.
Co-developed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Zyta Szpak <zyta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
check_fcs() was returning bool as 0/1, which was a sign that the sense
of the function was unclear: false was good, which doesn't really match
a name like 'check_$thing'. So rename it to ef100_has_fcs_error(), and
use proper booleans in the return statements.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case recovery was not successful, netdev still should be
present. But we should clear cdev if something bad happens
on recovery.
We also check cdev for null on dev close. That could be a case
if recovery was not successful.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gather and push out full device dump to devlink.
Device dump is the same as with `ethtool -d`, but now its generated
exactly at the moment bad thing happens.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove forcible recovery trigger and put it as a normal devlink
callback.
This allows user to enable/disable it via
devlink health set pci/0000:03:00.0 reporter fw_fatal auto_recover false
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devlink_health_report to push error indications.
We implement this in qede via callback function to make it possible
to reuse the same for other drivers sitting on top of qed in future.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here we declare health reporter ops (empty for now)
and register these in qed probe and remove callbacks.
This way we get devlink attached to all kind of qed* PCI
device entities: networking or storage offload entity.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here we return existing fw & mfw versions, we also fetch device's
serial number:
~$ sudo ~/iproute2/devlink/devlink dev info
pci/0000:01:00.1:
driver qed
board.serial_number REE1915E44552
versions:
running:
fw.app 8.42.2.0
stored:
fw.mgmt 8.52.10.0
MFW and FW are different firmwares on device.
Management is a firmware responsible for link configuration and
various control plane features. Its permanent and resides in NVM.
Running FW (or fastpath FW) is an embedded microprogram implementing
all the packet processing, offloads, etc. This FW is being loaded
on each start by the driver from FW binary blob.
The base device specific structure (qed_dev_info) was not directly
available to the base driver before. Thus, here we create and store
a private copy of this structure in qed_dev root object to
access the data.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces stubs in kconfig help entries with an actual description.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devlink instance lifecycle was linked to qed_dev object,
that caused devlink to be recreated on each recovery.
Changing it by making higher level driver (qede) responsible for its
life. This way devlink now survives recoveries.
qede now stores devlink structure pointer as a part of its device
object, devlink private data contains a linkage structure,
qed_devlink.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are extending devlink infrastructure, thus move the existing
stuff into a new file qed_devlink.c
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When devm_gpiod_get_optional() fails, bus should be
freed just like when of_mdiobus_register() fails.
Fixes: 1bddd96cba ("net: arc_emac: support the phy reset for emac driver")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>