Everytime driver registers with firmware, driver is required to
register for async event notifications as well. These 2 calls
are done using the same firmware command and can be combined.
We are also missing the 2nd step to register for async events
in the suspend/resume path and this will fix it. Prior to this,
we were getting only default notifications.
ULP can register for additional async events for the RDMA driver,
so we add a parameter to the new function to only do step 2 when
it is called from ULP.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
In the bnxt_init_one() failure path, if the driver has already called
firmware to register the driver, it is not undoing the driver
registration. Add this missing step to unregister for correctness,
so that the firmware knows that the driver has unloaded.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Fix BNXT_CHIP_NUM_5645X() to include 57452 and 56454 chip IDs, so
that these chips will be properly classified as P4 chips to take
advantage of the P4 fixes and features.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Report health status update to devlink health reporter, once
reset is completed.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If firmware supports hot reset, extend ETHTOOL_RESET to support
hot reset driver which does not require a driver reload after
ETHTOOL_RESET. The driver will go through the same coordinated
reset sequence as a firmware initiated fatal/non-fatal reset.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When hardware reports RX buffer errors, the latest 57500 chips do not
require reset. The packet is discarded by the hardware and the
ring will continue to operate.
Also, add an rx_buf_errors counter for this type of error. It can help
the user to identify if the aggregation ring is too small.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The aRFS ring table interface has changed for the 57500 chips. Updating
it accordingly so it will work with the latest production firmware.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The decap (VXLAN tunnel) flow rules are not getting offloaded with
upstream kernel. This is because TC block callback infrastructure has
been updated to use indirect callbacks to get offloaded rules from
other higher level devices (such as tunnels), instead of ndo_setup_tc().
Since the decap rules are applied to the tunnel devices (e.g, vxlan_sys),
the driver should register for indirect TC callback with tunnel devices
to get the rules for offloading. This patch updates the driver to
register and process indirect TC block callbacks from VXLAN tunnels.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver supports 2 types of core dumps.
1. Live dump - Firmware dump when system is up and running.
2. Crash dump - Dump which is collected during firmware crash
that can be retrieved after recovery.
Crash dump is currently supported only on specific 58800 chips
which can be retrieved using OP-TEE API only, as firmware cannot
access this region directly.
User needs to set the dump flag using following command before
initiating the dump collection:
$ ethtool -W|--set-dump eth0 N
Where N is "0" for live dump and "1" for crash dump
Command to collect the dump after setting the flag:
$ ethtool -w eth0 data Filename
v3: Modify set_dump to support even when CONFIG_TEE_BNXT_FW=n.
Also change log message to netdev_info().
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Sheetal Tigadoli <sheetal.tigadoli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In error recovery process when firmware indicates that it is
completely down, initiate a firmware reset by calling OP-TEE API.
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Sheetal Tigadoli <sheetal.tigadoli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This new state is required when firmware indicates that the error
recovery process requires polling for firmware state to be completely
down before initiating reset. For example, firmware may take some
time to collect the crash dump before it is down and ready to be
reset.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware coredump messages take much longer than standard messages,
so increase the timeout accordingly.
Fixes: 6c5657d085 ("bnxt_en: Add support for ethtool get dump.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Health show command example and output:
$ devlink health show pci/0000:af:00.0 reporter fw_fatal
pci/0000:af:00.0:
name fw_fatal
state healthy error 1 recover 1 grace_period 0 auto_recover true
Fatal events from firmware or missing periodic heartbeats will
be reported and recovery will be handled.
We also turn on the support flags when we register with the firmware to
enable this health and recovery feature in the firmware.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This call will handle fatal firmware errors by forcing a reset on the
firmware. The master function driver will carry out the forced reset.
The sequence will go through the same bnxt_fw_reset_task() workqueue.
This fatal reset differs from the non-fatal reset at the beginning
stages. From the BNXT_FW_RESET_STATE_ENABLE_DEV state onwards where
the firmware is coming out of reset, it is practically identical to the
non-fatal reset.
The next patch will add the periodic heartbeat check and the devlink
reporter to report the fatal event and to initiate the bnxt_fw_exception()
call.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a flag to mark that the firmware has encountered fatal condition.
The driver will not send any more firmware messages and will return
error to the caller. Fix up some clean up functions to continue
and not abort when the firmware message function returns error.
This is preparation work to fully handle firmware error recovery
under fatal conditions.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add devlink health reporter for the firmware reset event. Once we get
the notification from firmware about the impending reset, the driver
will report this to devlink and the call to bnxt_fw_reset() will be
initiated to complete the reset sequence.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the bnxt_fw_reset() main function to handle firmware reset. This
is triggered by firmware to initiate an orderly reset, for example
when a non-fatal exception condition has been detected. bnxt_fw_reset()
will first wait for all VFs to shutdown and then start the
bnxt_fw_reset_task() work queue to go through the sequence of reset,
re-probe, and re-initialization.
The next patch will add the devlink reporter to start the sequence and
call bnxt_fw_reset().
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This event from firmware signals a coordinated reset initiated by the
firmware. It may be triggered by some error conditions encountered
in the firmware or other orderly reset conditions.
We store the parameters from this event. Subsequent patches will
add logic to handle reset itself using devlink reporters.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create new FW devlink_health_reporter, to know the current health
status of FW.
Command example and output:
$ devlink health show pci/0000:af:00.0 reporter fw
pci/0000:af:00.0:
name fw
state healthy error 0 recover 0
FW status: Healthy; Reset count: 1
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new flag will be set in subsequent patches when firmware is
going through reset. If bnxt_close() is called while the new flag
is set, the FW reset sequence will have to be aborted because the
NIC is prematurely closed before FW reset has completed. We also
reject SRIOV configurations while FW reset is in progress.
v2: No longer drop rtnl_lock() in close and wait for FW reset to complete.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle the async event from the firmware that enables firmware health
monitoring. Store initial health metrics.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pre-map the GRC registers for periodic firmware health monitoring.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call the new firmware API HWRM_ERROR_RECOVERY_QCFG if it is supported
to discover the firmware health and recovery capabilities and settings.
This feature allows the driver to reset the chip if firmware crashes and
becomes unresponsive.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During IF_UP, newer firmware has a new status flag that indicates that
firmware has reset. Add new function bnxt_fw_init_one() to re-probe the
firmware and re-setup VF resources on the PF if necessary. If the
re-probe fails, set a flag to prevent bnxt_open() from proceeding again.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
This cleans up a lot of unneeded code and logic around the debugfs
files, making all of this much simpler and easier to understand.
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define the 57508, 57504, and 57502 chip IDs that are all part of the
BNXT_CHIP_P5 family of chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the new TPA feature in the 57500 chips, we need to discover the
feature first before setting up the netdev features. Refactor the
the firmware probe and init logic more cleanly into 2 functions and
and make these calls before setting up the netdev features.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new TPA implemantation has additional TPA counters that extend the
per-ring statistics block. Allocate the proper size accordingly.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new TPA feature on 57500 supports a larger number of concurrent TPAs
(up to 1024) divided among the functions. We need to add some logic to
map the hardware TPA ID to a software index that keeps track of each TPA
in progress. A 1:1 direct mapping without translation would be too
wasteful as we would have to allocate 1024 TPA structures for each RX
ring on each PCI function.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an aggregation array to bnxt_tpa_info struct to keep track of the
aggregation completions. The aggregation completions are not
completed at the TPA_END completion on 57500 chips so we need to
keep track of them. The array is only allocated on the new chips
when required. An agg_count field is also added to keep track of the
number of these completions.
The maximum concurrent TPA is now discovered from firmware instead of
the hardcoded 64. Add a new bp->max_tpa to keep track of maximum
configured TPA.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new chips have a slightly modified TPA interface for LRO/GRO_HW.
Modify the TPA structures so that the same structures can also be
used on the new chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes contention over page allocation for XDP_REDIRECT actions by
adding page_pool support per queue for the driver. The performance for
XDP_REDIRECT actions scales linearly with the number of cores performing
redirect actions when using the page pools instead of the standard page
allocator.
v2: Fix up the error path from XDP registration, noted by Ilias Apalodimas.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds basic support for XDP_REDIRECT in the bnxt_en driver. Next
patch adds the more optimized page pool support.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__bnxt_xmit_xdp() is used by XDP_TX and ethtool loopback packet transmit.
Refactor it so that it can be re-used by the XDP_REDIRECT logic.
Restructure the TX interrupt handler logic to cleanly separate XDP_TX
logic in preparation for XDP_REDIRECT.
Acked-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Tal Gilboa and Yamin Fridman
Implement net DIM over a generic DIM library, add RDMA DIM
dim.h lib exposes an implementation of the DIM algorithm for
dynamically-tuned interrupt moderation for networking interfaces.
We want a similar functionality for other protocols, which might need to
optimize interrupts differently. Main motivation here is DIM for NVMf
storage protocol.
Current DIM implementation prioritizes reducing interrupt overhead over
latency. Also, in order to reduce DIM's own overhead, the algorithm might
take some time to identify it needs to change profiles. While this is
acceptable for networking, it might not work well on other scenarios.
Here we propose a new structure to DIM. The idea is to allow a slightly
modified functionality without the risk of breaking Net DIM behavior for
netdev. We verified there are no degradations in current DIM behavior with
the modified solution.
Suggested solution:
- Common logic is implemented in lib/dim/dim.c
- Net DIM (existing) logic is implemented in lib/dim/net_dim.c, which uses
the common logic in dim.c
- Any new DIM logic will be implemented in "lib/dim/new_dim.c".
This new implementation will expose modified versions of profiles,
dim_step() and dim_decision().
- DIM API is declared in include/linux/dim.h for all implementations.
Pros for this solution are:
- Zero impact on existing net_dim implementation and usage
- Relatively more code reuse (compared to two separate solutions)
- Increased extensibility
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0SiDAACgkQSD+KveBX
+j4mlQgAoabbtxwIapMg62tKhnlr/3hEEyuEjiQRhzMUb5y5zOIJNRL82Nerv4K3
oD7v76emw/Xb1ErPe4yJJRdkpbpabEAFKbqkM0R7xFPNK33ZbphlU1d9YGgUKp1W
Uieydn5kfubzi6p7R1TC7c2qW1Wr6rE1wnNJyC/sy8zFEP/OEH0NmgPDyOs7ckWI
xaoJ1lhNWMoGN+Qk+fxazd1VtsKdcbc9JvLD2e8ON5qWdnuNiLhawcJVoWzfsCAa
V/Tdff+Rj9bVFsi468u20og4EdV3ceAOTzMnhCYpAtYZHVbVk6MFgh87uQYkTrfY
sW8BDmWKkRdHtzHd3a3yQiVDnz/iuw==
=bAZn
-----END PGP SIGNATURE-----
Merge tag 'blk-dim-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mamameed says:
====================
Generic DIM
From: Tal Gilboa and Yamin Fridman
Implement net DIM over a generic DIM library, add RDMA DIM
dim.h lib exposes an implementation of the DIM algorithm for
dynamically-tuned interrupt moderation for networking interfaces.
We want a similar functionality for other protocols, which might need to
optimize interrupts differently. Main motivation here is DIM for NVMf
storage protocol.
Current DIM implementation prioritizes reducing interrupt overhead over
latency. Also, in order to reduce DIM's own overhead, the algorithm might
take some time to identify it needs to change profiles. While this is
acceptable for networking, it might not work well on other scenarios.
Here we propose a new structure to DIM. The idea is to allow a slightly
modified functionality without the risk of breaking Net DIM behavior for
netdev. We verified there are no degradations in current DIM behavior with
the modified solution.
Suggested solution:
- Common logic is implemented in lib/dim/dim.c
- Net DIM (existing) logic is implemented in lib/dim/net_dim.c, which uses
the common logic in dim.c
- Any new DIM logic will be implemented in "lib/dim/new_dim.c".
This new implementation will expose modified versions of profiles,
dim_step() and dim_decision().
- DIM API is declared in include/linux/dim.h for all implementations.
Pros for this solution are:
- Zero impact on existing net_dim implementation and usage
- Relatively more code reuse (compared to two separate solutions)
- Increased extensibility
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Moved all logic from dim.h and net_dim.h to dim.c and net_dim.c.
This is both more structurally appealing and would allow to only
expose externally used functions.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Removed 'net' prefix from functions and structs used by external drivers.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Skip RDMA context memory allocations, reduce to 1 ring, and disable
TPA when running in the kdump kernel. Without this patch, the driver
fails to initialize with memory allocation errors when running in a
typical kdump kernel.
Fixes: cf6daed098 ("bnxt_en: Increase context memory allocations on 57500 chips for RDMA.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When making configuration changes, the driver calls bnxt_close_nic()
and then bnxt_open_nic() for the changes to take effect. A parameter
irq_re_init is passed to the call sequence to indicate if IRQ
should be re-initialized. This irq_re_init parameter needs to
be included in the bnxt_reserve_rings() call. bnxt_reserve_rings()
can only call pci_disable_msix() if the irq_re_init parameter is
true, otherwise it may hit BUG() because some IRQs may not have been
freed yet.
Fixes: 41e8d79837 ("bnxt_en: Modify the ring reservation functions for 57500 series chips.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Query support for the aRFS ring table index in the firmware.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In newer firmware, the context memory for MR (Memory Region)
and AH (Address Handle) to support RDMA are specified separately.
Modify driver to specify and allocate the 2 context memory types
separately when supported by the firmware.
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HWRM_VER_GET firmware command returns package name that is running
actively on the adapter. Use this version instead of parsing from
the package log in NVRAM.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer firmware now advertises the capability for extended stats
support. Check the new capability in addition to the existing
version check.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gather periodic PCIe statistics for ethtool -S.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three conflicts, one of which, for marvell10g.c is non-trivial and
requires some follow-up from Heiner or someone else.
The issue is that Heiner converted the marvell10g driver over to
use the generic c45 code as much as possible.
However, in 'net' a bug fix appeared which makes sure that a new
local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
is cleared.
Signed-off-by: David S. Miller <davem@davemloft.net>
The code waits up to 20 usec for the firmware response to complete
once we've seen the valid response header in the buffer. It turns
out that in some scenarios, this wait time is not long enough.
Extend it to 150 usec and use usleep_range() instead of udelay().
Fixes: 9751e8e714 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer firmware understands the concept of a trusted VF, so propagate the
trusted VF attribute set by the PF admin. to the firmware. Also, check
the firmware trusted setting when considering the VF MAC address change
and reporting the trusted setting to the user.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BNXT only supports SWITCHDEV_ATTR_ID_PORT_PARENT_ID, which makes it a
great candidate to be converted to use the ndo_get_port_parent_id() NDO
instead of implementing switchdev_port_attr_get(). The conversion is
straight forward here since the PF and VF code use the same getter.
Since bnxt makes uses of switchdev_port_same_parent_id() convert it to
use netdev_port_same_parent_id().
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register devlink_port with devlink and create initial port params
table for bnxt_en. The table consists of a generic parameter:
wake_on_lan: Enables Wake on Lan for this port when magic packet
is received with this port's MAC address using ACPI pattern.
If enabled, the controller asserts a wake pin upon reception of
WoL packet. ACPI (Advanced Configuration and Power Interface) is
an industry specification for the efficient handling of power
consumption in desktop and mobile computers.
v2->v3:
- Modify bnxt_dl_wol_validate(), to throw error message when user gives
value other than DEVLINK_PARAM_WAKE_MAGIC ot to disable WOL.
- Use netdev_err() instead of netdev_warn(), when devlink_port_register()
and devlink_port_params_register() returns error. Also, don't log rc
in this message.
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Older firmware only supports 16-bit flow handle, because of which the
number of flows that can be offloaded can’t scale beyond a point.
Newer firmware supports 64-bit flow handle enabling the host to scale
upto millions of flows. With the new 64-bit flow handle support, driver
has to query flow stats in a different way compared to the older approach.
This patch adds support for 64-bit flow handle and new way to query
flow stats.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If RDMA is supported on the 57500 chip, increase context memory
allocations for the resources used by RDMA.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the new functions bnxt_alloc_ctx_pg_tbls()/bnxt_free_ctx_pg_tbls()
to allocate and free pages for context memory. The new functions
will handle the different levels of paging support and allocate/free
the pages accordingly using the existing functions.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To support level 2 context page memory structures, enhance the
bnxt_ring_mem_info structure with a "depth" field to specify the page
level and add a flag to specify using full pages for L1 and L2 page
tables. This is needed to support RDMA functionality on 57500 chips
since RDMA requires more context memory.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Earlier, some of the firmware commands (ex: CFA_FLOW_*) which are processed
by KONG processor were sent to the CHIMP processor from the host. This
approach was taken as there was no direct message channel to KONG.
CHIMP in turn used to send them to KONG. Newer firmware supports a new
message channel which the host can send messages directly to the KONG
processor.
This patch adds support for required changes needed in the driver
to support direct KONG message channel. This speeds up flow related
messages sent to the firmware for CLS_FLOWER offload.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These routines will be enhanced in the subsequent patch to
return the 2nd firmware comm. channel's hwrm response address &
sequence id respectively.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for adding a 2nd communication channel to firmware.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set hwrm_intr_seq_id value to its inverted value instead of
HWRM_SEQ_INVALID, when an hwrm completion of type
CMPL_BASE_TYPE_HWRM_DONE is received. This will enable us to use
the complete 16-bit sequence ID space.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the current driver, the statistics reported by .ndo_get_stats64()
are reset when the device goes down. Store a snapshot of the
rtnl_link_stats64 before shutdown. This snapshot is added to the
current counters in .ndo_get_stats64() so that the counters will not
get reset when the device is down.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Display the CoS counters as additional priority counters by looking up
the priority to CoS queue mapping. If the TX extended port statistics
block size returned by firmware is big enough to cover the CoS counters,
then we will display the new priority counters. We call firmware to get
the up-to-date pri2cos mapping to convert the CoS counters to
priority counters.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some minor differences when assigning VF resources on the
new chips. The MSIX (NQ) resource has to be assigned and ring group
is not needed on the new chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_en requires same number of stat_ctxs as CP rings but RDMA
requires only 1 stat_ctx. Also add a new parameter resv_stat_ctxs
to better keep track of stat_ctxs reserved including resources used
by RDMA. Add a stat_ctxs parameter to all the relevant resource
reservation functions so we can reserve the correct number of
stat_ctxs.
Prior to this patch, we were not reserving the extra stat_ctx for
RDMA and RDMA would not work on the new 57500 chips.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling bnxt_set_max_func_stat_ctxs() to modify max stat_ctxs requested
or freed by the RDMA driver is wrong. After introducing reservation of
resources recently, the driver has to keep track of all stat_ctxs
including the ones used by the RDMA driver. This will provide a better
foundation for accurate accounting of the stat_ctxs.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For bnxt_en driver, stat_ctxs created will always be same as
cp_nr_rings. Remove extra variable that duplicates the value.
Also introduce bnxt_get_avail_stat_ctxs_for_en() helper to get
available stat_ctxs and bnxt_get_ulp_stat_ctxs() helper to return
number of stat_ctxs used by RDMA.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The available CP rings are calculated differently on the new 57500
chips, so add this helper to do this calculation correctly. The
VFs will be assigned these available CP rings.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PF has a pool of NQs and MSIX vectors assigned to it based on
NVRAM configurations. The number of usable MSIX vectors on the PF
is the minimum of the NQs and MSIX vectors. Any excess NQs without
associated MSIX may be used for the VFs, so we need to store this
max_nqs value. max_nqs minus the NQs used by the PF will be the
available NQs for the VFs.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new 57500 chips use 1 NQ per MSIX vector, whereas legacy chips use
1 CP ring per MSIX vector. To better unify this, add a resv_irqs
field to struct bnxt_hw_resc. On legacy chips, we initialize resv_irqs
with resv_cp_rings. On new chips, we initialize it with the allocated
MSIX resources.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To keep track of the number of times the workaround code for 57500 A0
has been triggered. This is a per NQ counter.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hardware can sometimes not generate NQ MSIX with a single pending
CP ring entry. This seems to always happen at the last entry of
the CP ring before it wraps. Add logic to check all the CP rings for
pending entries without the CP ring consumer index advancing. Calling
HWRM_DBG_RING_INFO_GET to read the context of the CP ring will flush
out the NQ entry and MSIX.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The struct type was copied from the line before but it should be "tx"
instead of "rx". I have reviewed the code and I can't immediately see
that this bug causes a runtime issue.
Fixes: 36e53349b6 ("bnxt_en: Add additional extended port statistics.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new poll function that polls for NQ events. If the NQ event is
a CQ notification, we locate the CP ring from the cq_handle and call
__bnxt_poll_work() to handle RX/TX events on the CP ring.
Add a new has_more_work field in struct bnxt_cp_ring_info to indicate
budget has been reached. __bnxt_poll_cqs_done() is called to update or
ARM the CP rings if budget has not been reached or not. If budget
has been reached, the next bnxt_poll_p5() call will continue to poll
from the CQ rings directly. Otherwise, the NQ will be ARMed for the
next IRQ.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate the CP ring polling logic in bnxt_poll_work() into 2 separate
functions __bnxt_poll_work() and __bnxt_poll_work_done(). Since the logic
is separated, we need to add tx_pkts and events fields to struct bnxt_napi
to keep track of the events to handle between the 2 functions. We also
add had_work_done field to struct bnxt_cp_ring_info to indicate whether
some work was performed on the CP ring.
This is needed to better support the 57500 chips. We need to poll up to
2 separate CP rings before we update or ARM the CP rings on the 57500 chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the new 57500 chips, we need to allocate one RSS context for every
64 RX rings. In previous chips, only one RSS context per vnic is
required regardless of the number of RX rings. So increase the max
RSS context array count to 8.
Hardware ring groups are not used on the new chips. Note that the
software ring group structure is still maintained in the driver to
keep track of the rings associated with the vnic.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware ring allocation semantics are slightly different for most
ring types on 57500 chips. Allocation/deallocation for NQ rings are
also added for the new chips.
A CP ring handle is also added so that from the NQ interrupt event,
we can locate the CP ring.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On 57500 chips, the original bnxt_cp_ring_info struct now refers to the
NQ. bp->cp_nr_rings refer to the number of NQs on 57500 chips. There
are now 2 pointers for the CP rings associated with RX and TX rings.
Modify bnxt_alloc_cp_rings() and bnxt_free_cp_rings() accordingly.
With multiple CP rings per NAPI, we need to add a pointer in
bnxt_cp_ring_info struct to point back to the bnxt_napi struct.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 57500 series chips have a new 64-bit doorbell format. Use a new
bnxt_db_info structure to unify the new and the old 32-bit doorbells.
Add a new bnxt_set_db() function to set up the doorbell addreses and
doorbell keys ahead of time. Modify and introduce new doorbell
helpers to help abstract and unify the old and new doorbells.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
57500 series is a new chip class (P5) that requires some driver changes
in the next several patches. This adds basic chip ID, doorbells, and
the notification queue (NQ) structures. Each MSIX is associated with an
NQ instead of a CP ring in legacy chips. Each NQ has up to 2 associated
CP rings for RX and TX. The same bnxt_cp_ring_info struct will be used
for the NQ.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New device requires host context memory as a backing store. Call
firmware to check for context memory requirements and store the
parameters. Allocate host pages accordingly.
We also need to move the call bnxt_hwrm_queue_qportcfg() earlier
so that all the supported hardware queues and the IDs are known
before checking and allocating context memory.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer chips require the PTU_PTE_VALID bit to be set for every page
table entry for context memory and rings. Additional bits are also
required for page table entries for all rings. Add a flags field to
bnxt_ring_mem_info struct to specify these additional bits to be used
when setting up the pages tables as needed.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the DMA page table and vmem fields in bnxt_ring_struct to a new
bnxt_ring_mem_info struct. This will allow context memory management
for a new device to re-use some of the existing infrastructure.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New firmware spec. allows interrupt coalescing parameters, such as
maximums, timer units, supported features to be queried. Update
the driver to make use of the new call to query these parameters
and provide the legacy defaults if the call is not available.
Replace the hard-coded values with these parameters.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support the max_ext_req_len field from the HWRM_VER_GET_RESPONSE.
If this field is valid and greater than the mailbox size, use the
short command format to send firmware messages greater than the
mailbox size. Newer devices use this method to send larger messages
to the firmware.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Latest firmware spec. has some additional rx extended port stats and new
tx extended port stats added. We now need to check the size of the
returned rx and tx extended stats and determine how many counters are
valid. New counters added include CoS byte and packet counts for rx
and tx.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Among the new changes are trusted VF support, 200Gbps support, and new
API to dump ring information on the new chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the driver adjusts the bp->hw_resc.max_cp_rings by the number
of MSIX vectors used by RDMA. There is one code path in open that needs
to check the true max_cp_rings including any used by RDMA. This code
is now checking for the reduced max_cp_rings which will fail when the
number of cp rings is very small.
To fix this in a clean way, we don't adjust max_cp_rings anymore.
Instead, we add a helper bnxt_get_max_func_cp_rings_for_en() to get the
reduced max_cp_rings when appropriate.
Fixes: ec86f14ea5 ("bnxt_en: Add ULP calls to stop and restart IRQs.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unused bnxt_subtract_ulp_resources(). Change
bnxt_get_max_func_irqs() to static since it is only locally used.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expand the .ieee_setapp() and ieee_delapp() DCBNL methods to support
DSCP. This allows DSCP values to user priority mappings instead
of using VLAN priorities. Each DSCP mapping is added or deleted one
entry at a time using the firmware API. The firmware call can only be
made from a PF.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Export temperature sensor reading via hwmon sysfs.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use latest firmware API to notify firmware about IF state changes.
Firmware has the option to clean up resources during IF down and
to require the driver to reserve resources again during IF up.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The flags field is almost getting full. Move firmware capability flags
to a new fw_cap field to better organize these firmware flags.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The BNXT_FLAG_NEW_RM flag is checked a lot in the code to determine if
the new resource manager is in effect. Define a macro to perform
this check.
Signed-off-by: Michael Chan <michael.chan@broadocm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the default hash mode flag in HWRM_VNIC_RSS_CFG to signal to the
firmware that the driver is compliant with the latest spec. With
that, the firmware can return expanded RSS profile IDs that the driver
checks to setup the proper gso_type for GRO-HW packets. But instead
of checking for the new profile IDs, we check the IP_TYPE flag
in TPA_START which is more straight forward than checking a list of
profile IDs.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new mode is "minimal-static" to be used when resources are more
limited to support a large number of VFs, for example The PF driver
will provision guaranteed minimum resources of 0. Each VF has no
guranteed resources until it tries to reserve resources during device
open.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During hotplug, the driver's open function can be called almost
immediately after power on reset. The PHY may not be ready and the
firmware may return failure when the driver tries to update PHY
settings. Add retry logic fired from the driver's timer to retry
the operation for 5 seconds.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add code to detect firmware support for external loopback and the extra
test entry for external loopback.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New interface has firmware core dump support, new extended port
statistics, and IF state change notifications to the firmware.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hwrm_dbg_resp_addr and hwrm_dbg_resp_dma_addr are never used
and can be removed.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling bnxt_set_max_func_irqs() to modify the max IRQ count requested or
freed by the RDMA driver is flawed. The max IRQ count is checked when
re-initializing the IRQ vectors and this can happen multiple times
during ifup or ethtool -L. If the max IRQ is reduced and the RDMA
driver is operational, we may not initailize IRQs correctly. This
problem shows up on VFs with very small number of MSIX.
There is no other logic that relies on the IRQ count excluding the ones
used by RDMA. So we fix it by just removing the call to subtract or
add the IRQs used by RDMA.
Fixes: a588e4580a ("bnxt_en: Add interface to support RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For SFP+ modules, 0xA2 page is available only when Diagnostic Monitoring
Type [Address A0h, Byte 92] is implemented. Extend bnxt_get_module_info(),
to read optical diagnostics support at offset 92(0x5c) and set eeprom_len
length to ETH_MODULE_SFF_8436_LEN (to exclude A2 page), if dianostics is
not supported.
Also in bnxt_get_module_info(), module id is read from offset 0x5e which
is not correct. It was working by accident, as offset was not effective
without setting enables flag in the firmware request. SFP module id is
present at location 0. Fix this by removing the offset and read it
from location 0.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent change to reduce delay granularity waiting for firmware
reponse has caused a regression. With a tighter delay loop,
the driver may see the beginning part of the response faster.
The original 5 usec delay to wait for the rest of the message
is not long enough and some messages are detected as invalid.
Increase the maximum wait time from 5 usec to 20 usec. Also, fix
the debug message that shows the total delay time for the response
when the message times out. With the new logic, the delay time
is not fixed per iteration of the loop, so we define a macro to
show the total delay time.
Fixes: 9751e8e714 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>