Firmware resets initiated by the user are not errors and should not
be reported via devlink. Once only unsolicited resets remain, it is no
longer sensible to maintain a separate fw_reset reporter.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recovery election messages are often mistaken for errors. Improve
the wording to clarify the meaning of these frequent and expected
events. Also, take the first step towards more inclusive language.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reported parameter value should not take into account the state
of remote drivers. Firmware will reject remote resets as appropriate,
thus it is not strictly necessary to check HOT_RESET_ALLOWED before
attempting to initiate a reset. But we add the check so that we can
provide more intuitive messages when reset is not permitted.
This firmware setting needs to be restored from all functions after
a firmware reset.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to reload driver_reinit, the RTNL lock is held across reload
down and up to prevent interleaving state changes. But we need to
subsequently release the RTNL lock while waiting for firmware reset
to complete.
Also keep a statistic on fw_activate resets initiated remotely from
other functions.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RTNL lock must be held between down and up to prevent interleaving
state changes, especially since external state changes might release
and allocate different driver resource subsets that would otherwise
need to be tracked and carefully handled. If the down function fails,
then devlink will not call the corresponding up function, thus the
lock is released in the down error paths.
v2: Don't use devlink_reload_disable() and devlink_reload_enable().
Instead, check that the netdev is not in unregistered state before
proceeding with reload.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resource reservations will also need to be reset after FUNC_DRV_UNRGTR
in the following devlink driver_reinit patch. Extract this logic into a
reusable function.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device info logged during probe will be reused by the devlink
driver_reinit code in a following patch. Extract this logic into
the new bnxt_print_device_info() function. The board index needs
to be saved in the driver context so that the board information
can be retrieved at a later time, outside of the probe function.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Spectrum ASIC allows configuration of maximum shaper on all levels of
the scheduling hierarchy: TCs, subgroups, groups and also ports. Currently,
TBF always configures a subgroup. But a user could reasonably express the
intent to configure port shaper by putting TBF to a root position, around
ETS / PRIO. Accept this usage and offload appropriately.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Return an error code if devm_kmemdup() fails in ice_get_recp_frm_fw().
Fixes: fd2a6b71e3 ("ice: create advanced switch recipe")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Clang warns:
drivers/net/ethernet/intel/ice/ice_lib.c:1906:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
default:
^
drivers/net/ethernet/intel/ice/ice_lib.c:1906:2: note: insert 'break;' to avoid fall-through
default:
^
break;
1 error generated.
Clang is a little more pedantic than GCC, which does not warn when
falling through to a case that is just break or return. Clang's version
is more in line with the kernel's own stance in deprecated.rst, which
states that all switch/case blocks must end in either break,
fallthrough, continue, goto, or return. Add the missing break to silence
the warning.
Link: https://github.com/ClangBuiltLinux/linux/issues/1482
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Some devices have support for loading the PHY FW and in some cases this
can fail. When this fails, the FW will set the corresponding bit in the
link info structure. Also, the FW will send a link event if the correct
link event mask bit is set. Add support for printing an error message
when the PHY FW load fails during any link configuration flow and the
link event flow.
Since ice_check_module_power() is already doing something very similar
add a new function ice_check_link_cfg_err() so any failures reported in
the link info's link_cfg_err member can be printed in this one function.
Also, add the new ICE_FLAG_PHY_FW_LOAD_FAILED bit to the PF's flags so
we don't constantly print this error message during link polling if the
value never changed.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This change adds support for changing the MTU on a port representor in
switchdev mode, by setting the min/max MTU values on the port representor
netdev. Previously the MTU could only be changed within a limited
default range (68-1500).
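A minimal sketch of the idea (ICE_MAX_MTU and the repr field names here are assumptions standing in for whatever the driver actually uses):
    /* sketch: widen the allowed MTU range on the representor netdev so
     * the stack accepts MTU changes beyond the 68-1500 default
     */
    repr->netdev->min_mtu = ETH_MIN_MTU;
    repr->netdev->max_mtu = ICE_MAX_MTU;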
Signed-off-by: Marcin Szycik <marcin.szycik@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Some virtchnl messages are handled differently in switchdev mode in order
to block configuring VFs from the iavf driver side. This blocking was
done by doing nothing and returning success, without even sending a
response.
Not sending a response for opcodes that aren't supported in switchdev mode
ends up blocking iavf driver message handling. This happens, for example,
when a VLAN is configured at VF configuration time (the VLAN module is
already loaded).
To get rid of this, the ice driver should answer every VF message. In
switchdev mode:
- for adding/deleting a VLAN, the driver should answer success without doing
anything, to allow creating VLAN devices on VFs
- for enabling/disabling VLAN stripping and promiscuous mode, the driver
should answer "not supported"; in switchdev mode these features can only be
set from the host side (a sketch of this handling follows below)
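A rough sketch of that opcode handling (the reply helper and exact opcode set are assumptions based on the description above):
    /* sketch: in switchdev mode, always send a reply to the VF */
    switch (v_opcode) {
    case VIRTCHNL_OP_ADD_VLAN:
    case VIRTCHNL_OP_DEL_VLAN:
        /* pretend success so VLAN devices can be created on the VF */
        v_ret = VIRTCHNL_STATUS_SUCCESS;
        break;
    case VIRTCHNL_OP_ENABLE_VLAN_STRIPPING:
    case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING:
    case VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE:
        /* configurable only from the host side in switchdev mode */
        v_ret = VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
        break;
    default:
        break;
    }
    ice_vc_send_msg_to_vf(vf, v_opcode, v_ret, NULL, 0); /* reply helper, name assumed */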
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Mostly reuse code from Geneve and VXLAN in the TC parsing code. Add a new
GRE header to match on the correct fields. Create new dummy packets with
GRE fields.
Instead of checking whether any encap values are present in TC flower,
check whether the device is a tunnel type or the redirect is to a tunnel
device. This allows adding all combinations of rules, for example filters
with only inner fields.
Return an error in case the device isn't a tunnel but encap values are present.
GRE example:
- create tunnel device
ip l add $NVGRE_DEV type gretap remote $NVGRE_REM_IP local $VF1_IP \
dev $PF
- add tc filter (in switchdev mode)
tc filter add dev $NVGRE_DEV protocol ip parent ffff: flower dst_ip \
$NVGRE1_IP action mirred egress redirect dev $VF1_PR
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add definition of UDP tunnel dummy packets. Fill destination port value
in filter based on UDP tunnel port. Append tunnel flags to switch filter
definition in case of matching the tunnel.
Both VXLAN and Geneve are UDP tunnels, so only one new header is needed.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add definitions for VXLAN and Geneve dummy packets. Define VXLAN and
Geneve field types to match on the correct UDP tunnel header.
Parse tunnel-specific fields from the TC tool, like outer MACs, outer IPs,
outer destination port and VNI. Save values and masks in the outer header
struct and move the header pointer to the inner headers to simplify parsing
inner values.
There are two cases for the redirect action:
- from uplink to VF - the TC filter is added on the tunnel device
- from VF to uplink - the TC filter is added on the PR; for this case, check
if the redirect device is a tunnel device
VXLAN example:
- create tunnel device
ip l add $VXLAN_DEV type vxlan id $VXLAN_VNI dstport $VXLAN_PORT \
dev $PF
- add TC filter (in switchdev mode)
tc filter add dev $VXLAN_DEV protocol ip parent ffff: flower \
enc_dst_ip $VF1_IP enc_key_id $VXLAN_VNI action mirred egress \
redirect dev $VF1_PR
Geneve example:
- create tunnel device
ip l add $GENEVE_DEV type geneve id $GENEVE_VNI dstport $GENEVE_PORT \
remote $GENEVE_IP
- add TC filter (in switchdev mode)
tc filter add dev $GENEVE_DEV protocol ip parent ffff: flower \
enc_key_id $GENEVE_VNI dst_ip $GENEVE1_IP action mirred egress \
redirect dev $VF1_PR
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement an indirect notification mechanism to support offloading TC rules
on tunnel devices.
Keep the indirect block list in netdev priv. The notification callback
invokes the TC cls_flower setup function. For now only the ingress block
type can be offloaded; return not supported for other flow block binder
types, as sketched below.
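A condensed sketch of the mechanism (callback arguments abbreviated; the callback and priv names are assumptions):
    /* sketch: register for indirect TC block notifications */
    err = flow_indr_dev_register(ice_indr_setup_tc_cb, np);
    ...
    /* in the flow block handler: only ingress offload is supported */
    if (f->binder_type != FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
        return -EOPNOTSUPP;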
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This patch displays channel and channel_mask for each RX
interface of an NPC MCAM rule.
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CN10k SoCs use atomic stores of up to 128 bytes to submit
packets/instructions into co-processor cores. The enqueueing is performed
using Large Memory Transaction Store (LMTST) operations. They allow for
lockless enqueue operations - i.e., two different CPU cores can submit
instructions to the same queue without needing to lock the queue or
synchronize their accesses.
This patch implements a new debugfs entry for dumping the LMTST map
table present on CN10K, as this can be very useful for debugging issues
when an LMTST region is shared among multiple PCI functions.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Bhaskara Budiredla <bbudiredla@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A few changes in the rvu_debugfs.c file: remove unwanted characters,
fix code indentation, add a new comment line, etc.
Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a possible null pointer dereference in the files
"rvu_debugfs.c" and "rvu_nix.c".
Fixes: 8756828a81 ("octeontx2-af: Add NPA aura and pool contexts to debugfs")
Fixes: 9a946def26 ("octeontx2-af: Modify nix_vtag_cfg mailbox to support TX VTAG entries")
Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we are using a fixed buffer of length 2048 to display the
rsrc_alloc output. As a result, a maximum of 2048 characters of
rsrc_alloc output is displayed, which can sometimes result in only
partial output. This patch removes the dependency on a maximum buffer
size and displays all PF/VF entries.
Each column of the debugfs entry "rsrc_alloc" uses a fixed width of 12
characters to print the list of LFs of each block for a PF/VF. If the
list of LFs of a block exceeds this fixed width, the list gets truncated
and only a part of it is displayed. This patch fixes this by using the
maximum possible length of the list of LFs among all blocks of all PF
and VF entries as the column width.
Fixes: f788409714 ("octeontx2-af: Formatting debugfs entry rsrc_alloc.")
Fixes: 23205e6d06 ("octeontx2-af: Dump current resource provisioning status")
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <Sunil.Goutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While displaying ingress policer information in debugfs,
check whether ingress policers exist in the hardware,
because some platforms (CN9XXX) do not have this feature.
Fixes: e7d8971763 ("octeontx2-af: cn10k: Debugfs support for bandwidth")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver allocates skbs during ndo_open with GFP_ATOMIC, which has a high chance of failure when there are multiple instances.
GFP_KERNEL is sufficient in the open path; use GFP_ATOMIC only from interrupt context.
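Roughly, the allocation helper takes a gfp_t so each caller can pick the right context (names below are illustrative, not the exact lan743x symbols):
    /* sketch: plumb gfp flags down to the skb allocation */
    static struct sk_buff *rx_alloc_skb(struct lan743x_rx *rx, gfp_t gfp)
    {
        return __netdev_alloc_skb(rx->adapter->netdev, rx->buffer_size, gfp);
    }

    rx_alloc_skb(rx, GFP_KERNEL); /* ndo_open / ring setup path, may sleep */
    rx_alloc_skb(rx, GFP_ATOMIC); /* NAPI / interrupt refill path */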
Fixes: 23f0703c12 ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable err is reassigned on subsequent branches, and this
assignment does not perform any related value operation. This makes
the inner parentheses redundant, so they should be deleted.
clang_analyzer complains as follows:
drivers/net/ethernet/marvell/sky2.c:4988: warning:
Although the value stored to 'err' is used in the enclosing expression,
the value is never actually read from 'err'.
Changes in v2:
- modify title category from octeontx2-af to sky2
- delete the inner parentheses
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
An MTU change is refused whenever the new MTU value is bigger than
the maximum packet size that fits in NFP Cluster Target Memory (CTM).
However, an eBPF program doesn't always need to access the whole
packet data.
The maximum direct packet access (DPA) offset has always been
calculated by the verifier and stored in the max_pkt_offset field of
the prog aux data.
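Conceptually, the MTU check can then be relaxed along these lines (identifier names are placeholders; the real driver derives the CTM budget from its buffer layout):
    u32 max_data = prog->aux->max_pkt_offset; /* DPA bound computed by the verifier */

    /* sketch: accept a large MTU as long as the program never reads
     * beyond the packet bytes that are resident in CTM
     */
    if (new_mtu + pkt_hdr_room > ctm_pkt_bytes &&
        max_data + pkt_hdr_room > ctm_pkt_bytes)
        return -EOPNOTSUPP;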
Signed-off-by: Yu Xiao <yu.xiao@corigine.com>
Reviewed-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Reviewed-by: Niklas Soderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mvpp2_phylink_validate() no longer needs to check for
PHY_INTERFACE_MODE_NA as phylink will walk the supported interface
types to discover the link mode capabilities. Remove these checks.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have a better method to select SFP interface modes, we
no longer need to use phylink_helper_basex_speed() in a driver's
validation function, and we can also get rid of our hack to indicate
both 1000base-X and 2500base-X if the comphy is present to make that
work. Remove this hack and use of phylink_helper_basex_speed().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Populate the phy interface mode bitmap for the Marvell mvpp2 driver
with interfaces modes supported by the MAC.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 4d98bb0d7e ("net: macb: Use mdio child node for MDIO bus if it
exists") added code to the macb driver to detect whether an 'mdio' child
node exists. The added code does, however, not actually check if the child
node exists, but if the parent node exists. This results in errors such as
macb 10090000.ethernet eth0: Could not attach PHY (-19)
if there is no 'mdio' child node. Fix the code to actually check for
the child node.
Fixes: 4d98bb0d7e ("net: macb: Use mdio child node for MDIO bus if it exists")
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://lore.kernel.org/r/20211026173950.353636-1-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch makes the r8169 driver pick up the Realtek Semiconductor Co.,
Ltd. device [10ec:8162].
Signed-off-by: Janghyub Seo <jhyub06@gmail.com>
Suggested-by: Rushab Shah <rushabshah32@gmail.com>
Link: https://lore.kernel.org/r/1635231849296.1489250046.441294000@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a better method to select SFP interface modes, we
no longer need to use phylink_helper_basex_speed() in a driver's
validation function, and we can also get rid of our hack to indicate
both 1000base-X and 2500base-X if the comphy is present to make that
work. Remove this hack and use of phylink_helper_basex_speed().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Populate the phy_interface_t bitmap for the Marvell mvneta driver with
interfaces modes supported by the MAC.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adjusts the string spacing of some parameters of tx bd info in
debugfs according to their maximum needs.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The specified buffer length for three debugfs files fd_tcam, uc and tqp
is not enough for their maximum needs, so this patch fixes them.
Fixes: b5a0b70d77 ("net: hns3: refactor dump fd tcam of debugfs")
Fixes: 1556ea9120 ("net: hns3: refactor dump mac list of debugfs")
Fixes: d96b0e5946 ("net: hns3: refactor dump reg of debugfs")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the width of the packet number registers is 32 bits, they need at most
10 characters for decimal printing (2^32 - 1 = 4294967295), but the current
string spacing is not enough, so this patch fixes it.
Fixes: e44c495d95 ("net: hns3: refactor queue info of debugfs")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The member data in struct hclge_desc is of type __le32 and needs endian
conversion before use, but some debugfs functions did not do that,
so this patch fixes it.
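The pattern is simply (a minimal sketch, not the exact hclge code):
    /* desc->data[] is __le32 coming from firmware; convert before use */
    u32 reg_val = le32_to_cpu(desc[i].data[j]);

    seq_printf(s, "0x%08x\n", reg_val);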
Fixes: c0ebebb9cc ("net: hns3: Add "dcb register" status information query function")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the workqueue of hclge/hclgevf is executed on
the CPU that initiates scheduling requests by default. In
stress scenarios, the CPU may be busy and workqueue scheduling
is completed after a long period of time. To avoid this
situation and implement proper scheduling, use the WQ_UNBOUND
mode instead. In this way, the workqueue can be performed on
a relatively idle CPU.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a TP port is configured by the following steps:
1.ethtool -s ethx autoneg off speed 100 duplex full
2.ethtool -A ethx rx on tx on
3.ethtool -s ethx autoneg on(rx&tx negotiated pause results are off)
4.ethtool -s ethx autoneg off speed 100 duplex full
In step 3, the driver will set the rx & tx pause parameters of the hardware
to off, as the pause parameters negotiated with the link partner are off.
After step 4, the "ethtool -a ethx" command shows both rx and tx pause
parameters as on. However, the pause parameters of the hardware are still
off and the port actually has no flow control.
To fix this problem, if autoneg is disabled, the driver uses its saved
parameters to restore the pause settings of the hardware. If the speed is
not changed in this case, there is no link state change for the phy, so the
pause parameters would not take effect; therefore we need to force the phy
to go down and up.
Fixes: aacbe27e82 ("net: hns3: modify how pause options is displayed")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit introduces HW-GRO offload using the SHAMPO feature:
- Add a set-feature handler for HW-GRO.
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This patch adds HW_GRO counters to RX packets statistics:
- gro_match_packets: counter of received packets with set match flag.
- gro_packets: counter of received packets over the HW_GRO feature,
this counter is increased by one for every received
HW_GRO cqe.
- gro_bytes: counter of received bytes over the HW_GRO feature,
this counter is increased by the received bytes for every
received HW_GRO cqe.
- gro_skbs: counter of built HW_GRO skbs,
increased by one when we flush HW_GRO skb
(when we call a napi_gro_receive with hw_gro skb).
- gro_large_hds: counter of received packets with large headers size,
in case the packet needs new SKB, the driver will allocate
new one and will not use the headers entry to build it.
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This patch updates the SHAMPO CQE handler to support HW_GRO.
Changes in the SHAMPO CQE handler:
- The CQE match and flush fields are used to determine whether to build a
new skb using the newly received packet, or to add the received packet data
to the existing RQ.hw_gro_skb; these fields are also used to determine when
to flush the skb.
- At the end of mlx5e_poll_rx_cq, the RQ.hw_gro_skb is flushed.
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The header buffer is used to store the headers of the rx packets.
The header buffer size is deduced from the WorkQueue size plus the
restriction of max packets per WorkQueueElement.
This commit adds the functionality for posting/updating memory for
the header buffer during the posting/updating of WQEs.
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This patch adds the new CQE SHAMPO fields:
- flush: indicates that we must close the current session and pass the SKB
to the network stack.
- match: indicates that the current packet matches the opened session;
the packet will be merged into the current SKB.
- header_size: the size of the packet headers that are written into the
headers buffer.
- header_entry_index: the entry index in the headers buffer.
- data_offset: the packet's data offset in the WQE.
Also a new cqe handler is added to handle SHAMPO packets:
- The new handler uses the CQE SHAMPO fields to build the SKB.
The CQE's flush and match fields are not used in this patch; packets are
not merged in this patch.
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This commit introduces the control path infrastructure for SHAMPO feature.
SHAMPO feature enables packet stitching by splitting packets to
header and payload, the header is placed on a dedicated buffer
and the payload on the RX ring, this allows stitching the data part
of a flow together continuously in the receive buffer.
The SHAMPO feature is implemented as a linked-list striding RQ feature.
To support packets splitting and payload stitching:
- Enlarge the ICOSQ and the corresponding CQ to support the header buffer
memory regions.
- Add support to create linked list striding RQ with SHAMPO feature set
in the open_rq function.
- Add a deallocation function and corresponding calls for the SHAMPO header
buffer.
- Add mlx5e_create_umr_klm_mkey to support KLM mkey for the header
buffer.
- Rename mlx5e_create_umr_mkey to mlx5e_create_umr_mtt_mkey.
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This commit adds the needed definitions for using the klm_umr_wqe.
UMR stands for user-mode memory registration, a mechanism to alter the
address translation properties of an MKEY by posting a WorkQueueElement
(WQE) on a send queue.
MKEY stands for memory key; MKEYs are used to describe a region in memory
that can later be used by the HW.
KLM stands for {Key, Length, MemVa}; a KLM_MKEY is an indirect MKEY that
enables mapping multiple memory spaces with different sizes in a unified
MKEY.
klm_umr_wqe is a UMR WQE used to update a KLM_MKEY.
The SHAMPO feature uses a KLM_MKEY for memory registration of its header
buffer.
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This series introduces a new packet merge type, therefore rename the lro
functions to packet merge to support the new merge type:
- Generalize + rename mlx5e_build_tir_ctx_lro to
mlx5e_build_tir_ctx_packet_merge.
- Rename mlx5e_modify_tirs_lro to mlx5e_modify_tirs_packet_merge.
- Rename lro bit in mlx5_ifc_modify_tir_bitmask_bits to packet_merge.
- Rename lro_en in mlx5e_params to packet_merge_type type and combine
packet_merge params into one struct mlx5e_packet_merge_param.
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This commit adds SHAMPO bit to hca_cap and SHAMPO capabilities structure,
SHAMPO related HW spec hardware fields and enumerations.
SHAMPO stands for: split headers and merge payload offload.
SHAMPO new fields:
WQ:
- headers_mkey: mkey that represents the headers buffer, where the packets
headers will be written by the HW.
- shampo_enable: flag to verify if the WQ supports SHAMPO feature.
- log_reservation_size: the log of the reservation size where the data of
the packet will be written by the HW.
- log_max_num_of_packets_per_reservation: log of the maximum number of
packets that can be written to the same reservation.
- log_headers_entry_size: log of the header entry size of the headers buffer.
- log_headers_buffer_entry_num: log of the number of entries in the headers buffer.
RQ:
- shampo_no_match_alignment_granularity: the HW alignment granularity
in case the received packet doesn't match the current session.
- shampo_match_criteria_type: the type of match criteria.
- reservation_timeout: the maximum time that the HW will hold the
reservation.
mlx5_ifc_shampo_cap_bits, the capabilities of the SHAMPO feature:
- shampo_log_max_reservation_size: the maximum allowed value of the field
WQ.log_reservation_size.
- log_reservation_size: the minimum allowed value of the field
WQ.log_reservation_size.
- shampo_min_mss_size: the minimum payload size of packet that can open
a new session or be merged to a session.
- shampo_max_log_headers_entry_size: the maximum allowed value of the field
WQ.log_headers_entry_size
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
TIR stands for transport interface receive, the TIR object is
responsible for performing all transport related operations on
the receive side like packet processing, demultiplexing the packets
to different RQ's, etc.
lro_timeout is a field in the TIR that is used to set the timeout for lro
session, this series introduces new packet merge type, therefore rename
lro_timeout to packet_merge_timeout for all packet merge types.
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
revert commit 46ae40b94d ("net/mlx5: Let user configure io_eq_size param")
revert commit a6cb08daa3 ("net/mlx5: Let user configure event_eq_size param")
revert commit 5546040619 ("net/mlx5: Let user configure max_macs param")
The EQE parameters are applicable to more drivers, they should
be configured via standard API, probably ethtool. Example of
another driver needing something similar:
https://lore.kernel.org/all/1633454136-14679-3-git-send-email-sbhatta@marvell.com/
The last param for "max_macs" is probably fine but the documentation
is severely lacking. The meaning and implications for changing the
param need to be stated.
Link: https://lore.kernel.org/r/20211026152939.3125950-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Clang warns:
drivers/net/ethernet/asix/ax88796c_main.c:851:24: error: address of
array 'ax_local->phydev->advertising' will always evaluate to 'true'
[-Werror,-Wpointer-bool-conversion]
if (ax_local->phydev->advertising &&
~~~~~~~~~~~~~~~~~~^~~~~~~~~~~ ~~
advertising cannot be NULL here if ax_local is not NULL, which cannot
happen due to the check in ax88796c_probe(). Remove the check.
Link: https://github.com/ClangBuiltLinux/linux/issues/1492
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clang warns:
drivers/net/ethernet/asix/ax88796c_main.c:696:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
case SPEED_10:
^
drivers/net/ethernet/asix/ax88796c_main.c:696:2: note: insert 'break;' to avoid fall-through
case SPEED_10:
^
break;
drivers/net/ethernet/asix/ax88796c_main.c:706:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
case DUPLEX_HALF:
^
drivers/net/ethernet/asix/ax88796c_main.c:706:2: note: insert 'break;' to avoid fall-through
case DUPLEX_HALF:
^
break;
Clang is a little more pedantic than GCC, which permits implicit
fallthroughs to cases that contain just break or return. Clang's version
is more in line with the kernel's own stance in deprecated.rst, which
states that all switch/case blocks must end in either break,
fallthrough, continue, goto, or return. Add the missing breaks to fix
the warning.
Link: https://github.com/ClangBuiltLinux/linux/issues/1491
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing code doesn't allow setting the number of queues while the
NIC is down.
Update the ethtool handler functions to support setting the number of
queues while the NIC is in the down state.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose via devlink-resource the maximum number of RIF MAC profiles and
their current occupancy, so it can be used for debug and writing generic
tests, like in the next patch.
Example for Spectrum-2 output:
$ devlink resource show pci/0000:06:00.0
...
name rif_mac_profiles size 4 occ 0 unit entry dpipe_tables none
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, mlxsw enforces that all the router interfaces (RIFs) have the
same MAC prefix.
Relax this limitation by using RIF MAC profiles. Each profile is
associated with a particular MAC prefix and multiple RIFs can use the
same profile. Therefore, the number of possible MAC prefixes is no
longer one, but the number of profiles supported by the device.
Store the profiles in an IDR and reference count them according to the
number of RIFs using them.
Associate a RIF with a profile when the RIF is created and remove the
association when the RIF is deleted.
Change the association following 'NETDEV_CHANGEADDR' events, except when
only one RIF is using the profile, in which case the MAC prefix of the
profile itself is changed instead of associating the RIF with a new profile.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The next patch will set the MAC profile of a router interface (RIF) as
part of its configure() callback. The operation can fail in case the
maximum number of profiles was exceeded.
Add extack to mlxsw_sp_rif_ops::configure() in order to communicate such
failures to user space.
In addition, the MAC profile of a RIF can change following a
'NETDEV_CHANGEADDR' notification. Propagate extack to
mlxsw_sp_router_port_change_event() so that failures could be
communicated in this path as well.
No functional changes intended.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a resource identifier for maximum RIF MAC profiles so that it could
be later used to query the information from firmware.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add MAC profile ID field to RITR register so that it could be used for
associating a RIF with a MAC profile ID by a later patch.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-10-25
This series contains updates to ice driver only.
Dave adds event handler for LAG NETDEV_UNREGISTER to unlink device from
link aggregate.
Yongxin Liu adds a check for PTP support during release which would
cause a call trace on non-PTP supported devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-updates-2021-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-10-25
Misc updates for mlx5 driver:
1) Misc updates and cleanups:
- Don't write directly to netdev->dev_addr, From Jakub Kicinski
- Remove unnecessary checks for slow path flag in tc module
- Fix unused function warning of mlx5i_flow_type_mask
- Bridge, support replacing existing FDB entry
2) Sub Functions, Reduction in memory usage:
- Reduce flow counters bulk query buffer size
- Implement max_macs devlink parameter
- Add devlink vendor params to control Event Queue sizes
- Added SF life cycle trace points by Parav/
3) From Aya, Firmware health buffer reporting improvements
- Print health buffer by log level and more missing information
- Periodic update of host time to firmware
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the check of !rc in (!rc && !resc_lock_params.b_granted) since it
is always true.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the check of !rc in (!rc && !params.b_granted) since it is always
true.
We should also use constant 0 for return.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the driver fails to allocate a new Rx buffer, it passes an empty Rx
descriptor (contains zero address and size) to the device and marks it
as invalid by setting the skb pointer in the descriptor's metadata to
NULL.
After processing enough Rx descriptors, the driver will try to process
the invalid descriptor, but will return immediately seeing that the skb
pointer is NULL. Since the driver no longer passes new Rx descriptors to
the device, the Rx queue will eventually become full and the device will
start to drop packets.
Fix this by recycling the received packet if allocation of the new
packet failed. This means that allocation is no longer performed at the
end of the Rx routine, but at the start, before tearing down the DMA
mapping of the received packet.
Remove the comment about the descriptor being zeroed as it is no longer
correct. This is OK because we either use the descriptor as-is (when
recycling) or overwrite its address and size fields with that of the
newly allocated Rx buffer.
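In pseudo-form, the receive completion path now looks roughly like this (mlxsw_pci_rdq_skb_alloc() appears in the trace below; the surrounding names are simplified):
    /* sketch: allocate the replacement buffer first; if that fails,
     * re-post (recycle) the old buffer instead of consuming it
     */
    err = mlxsw_pci_rdq_skb_alloc(mlxsw_pci, elem_info);
    if (err) {
        dev_err_ratelimited(&pdev->dev, "Failed to alloc skb for RDQ\n");
        goto out; /* old skb and its DMA mapping stay in the descriptor */
    }
    /* only now tear down the DMA mapping and pass the old skb up the stack */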
The issue was discovered when a process ("perf") consumed too much
memory and put the system under memory pressure. It can be reproduced by
injecting slab allocation failures [1]. After the fix, the Rx queue no
longer comes to a halt.
[1]
# echo 10 > /sys/kernel/debug/failslab/times
# echo 1000 > /sys/kernel/debug/failslab/interval
# echo 100 > /sys/kernel/debug/failslab/probability
FAULT_INJECTION: forcing a failure.
name failslab, interval 1000, probability 100, space 0, times 8
[...]
Call Trace:
<IRQ>
dump_stack_lvl+0x34/0x44
should_fail.cold+0x32/0x37
should_failslab+0x5/0x10
kmem_cache_alloc_node+0x23/0x190
__alloc_skb+0x1f9/0x280
__netdev_alloc_skb+0x3a/0x150
mlxsw_pci_rdq_skb_alloc+0x24/0x90
mlxsw_pci_cq_tasklet+0x3dc/0x1200
tasklet_action_common.constprop.0+0x9f/0x100
__do_softirq+0xb5/0x252
irq_exit_rcu+0x7a/0xa0
common_interrupt+0x83/0xa0
</IRQ>
asm_common_interrupt+0x1e/0x40
RIP: 0010:cpuidle_enter_state+0xc8/0x340
[...]
mlxsw_spectrum2 0000:06:00.0: Failed to alloc skb for RDQ
Fixes: eda6500a98 ("mlxsw: Add PCI bus implementation")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20211024064014.1060919-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rx_dropped, tx_dropped, rx_frame_errors and rx_crc_errors are being
wrongly fetched from the target container rather than source percpu
ones.
No idea if that comes from the vendor driver or was a braino during
the refactoring, but fix it either way.
Fixes: a97c69ba4f ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Łukasz Stelmach <l.stelmach@samsung.com>
Link: https://lore.kernel.org/r/20211023121148.113466-1-alobakin@pm.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using the VPD API allows us to simplify the code and completely get
rid of t3_seeprom_write().
Link: https://lore.kernel.org/r/a0291004-dda3-ea08-4d6c-a2f8826c8527@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Use the standard VPD API to replace t3_seeprom_write(); this prepares for
removing that function. Chelsio T3 maps the EEPROM write-protect flag
to an arbitrary place in VPD address space, therefore we have to use
pci_write_vpd_any().
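The write then collapses to a single VPD API call, approximately (the offset and flag symbols are placeholders for wherever T3 maps the write-protect flag):
    u32 val = enable ? T3_VPD_WRITE_PROTECT : 0; /* placeholder names */
    ssize_t ret;

    /* the flag lives outside the standard VPD window, hence the _any variant */
    ret = pci_write_vpd_any(adapter->pdev, T3_VPD_WP_ADDR, sizeof(val), &val);
    if (ret != sizeof(val))
        return ret < 0 ? ret : -EIO;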
Link: https://lore.kernel.org/r/f768fdbe-3a16-d539-57d2-c7c908294336@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Using the VPD API allows us to simplify the code and completely get rid
of t3_seeprom_read(). Note that we don't have to use pci_read_vpd_any()
here because a VPD quirk sets dev->vpd.len to the full EEPROM size.
Tested with a T320 card.
Link: https://lore.kernel.org/r/68ef15bb-b6bf-40ad-160c-aaa72c4a70f8@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Currently, max_macs takes 70 Kbytes of memory per function. This
size is not needed in all use cases, and is critical at large scale.
Hence, allow the user to configure the number of max_macs.
For example, to reduce the number of max_macs to 1, execute::
$ devlink dev param set pci/0000:00:0b.0 name max_macs value 1 \
cmode driverinit
$ devlink dev reload pci/0000:00:0b.0
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The event EQ is an EQ which receives notifications of almost all the
events generated by the NIC.
Currently, each event EQ takes 512KB of memory. This size is not
needed in most use cases, and is critical at large scale. Hence,
allow the user to configure the size of the event EQ.
For example to reduce event EQ size to 64, execute::
$ devlink resource set pci/0000:00:0b.0 path /event_eq_size/ size 64
$ devlink dev reload pci/0000:00:0b.0
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, each I/O EQ takes 128KB of memory. This size
is not needed in all use cases, and is critical at large scale.
Hence, allow the user to configure the size of the I/O EQs.
For example, to reduce I/O EQ size to 64, execute:
$ devlink resource set pci/0000:00:0b.0 path /io_eq_size/ size 64
$ devlink dev reload pci/0000:00:0b.0
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The SWITCHDEV_FDB_ADD_TO_DEVICE event is used both for adding a new FDB
entry and for replacing an existing one. Implement support for replacing
existing FDB entries in the mlx5 offload code.
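In outline, the add handler gains a replace branch along these lines (helper names are illustrative, not the exact mlx5 symbols):
    /* sketch: SWITCHDEV_FDB_ADD_TO_DEVICE may target an already-offloaded entry */
    entry = bridge_fdb_entry_lookup(bridge, addr, vid);
    if (entry) {
        /* notify the software bridge that the old entry is gone, then drop it */
        bridge_fdb_del_notify(bridge, entry);
        bridge_fdb_entry_cleanup(bridge, entry);
    }
    entry = bridge_fdb_entry_create(bridge, addr, vid);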
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The following two patterns in the bridge code are used in multiple places
where similar code is duplicated:
- Lookup FDB entry from hashtable by address+vid pair.
- Notify software bridge and then delete existing FDB entry.
In order to improve code quality and prepare for following patch series
that also uses described patterns, extract the codes to dedicated helper
functions.
This commit doesn't change functionality.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The firmware also logs its asserts to non-volatile memory. In order to
reduce drift between the NIC and the host, the driver sets the host
epoch-time on the firmware every hour.
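A minimal sketch of such periodic syncing, assuming a delayed work item on the health structure and a (hypothetical) helper that pushes the host time to firmware:
    #define HOST_TIME_SYNC_PERIOD (60 * 60 * HZ) /* once an hour */

    static void host_time_sync_work(struct work_struct *work)
    {
        struct mlx5_core_health *health =
            container_of(work, struct mlx5_core_health, sync_work.work); /* field assumed */

        set_fw_host_time(health, ktime_get_real_seconds()); /* assumed helper */
        schedule_delayed_work(&health->sync_work, HOST_TIME_SYNC_PERIOD);
    }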
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add a log macro which takes the log level as a parameter. Use the severity
read from the health buffer and the new log macro to log the health buffer
with the severity as the log level. Prior to this patch, the health buffer
was printed at error log level regardless of its severity. Now the user may
filter dmesg (--level) or change the kernel log level to focus on different
severity levels of firmware errors.
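The macro is essentially a thin wrapper over dev_printk(), along these lines (a sketch; the severity-to-KERN_* mapping helper shown is an assumption):
    /* sketch: print at a caller-supplied log level instead of a hard-coded err */
    #define mlx5_log(dev, level, fmt, ...) \
        dev_printk(level, (dev)->device, fmt, ##__VA_ARGS__)

    mlx5_log(dev, health_severity_to_kern_level(severity), /* assumed helper */
             "Health issue observed, severity %d\n", severity);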
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, the flow counters bulk query buffer takes a little more than
512KB of memory, which is aligned to the next power of 2, to 1MB.
The buffer size determines the maximum number of flow counters that can
be queried at a time. Thus, having a bigger buffer can improve
performance for users that need to query many flow counters.
SFs don't use many flow counters and don't need a big buffer. Since this
size is critical with large scale, reduce the size of the bulk query
buffer for SFs.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit causes an unused-function warning [1] when
CONFIG_MLX5_EN_RXNFC is not set.
Fix this by moving the function into the ifdef where it is used.
[1]
warning: ‘mlx5i_flow_type_mask’ defined but not used [-Wunused-function]
Fixes: 9fbe1c25ec ("net/mlx5i: Enable Rx steering for IPoIB via ethtool")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
After previous changes, caller (mlx5e_tc_offload_fdb_rules()) already
checks for the slow path flag, and if set won't call offload/unoffload
sample.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
PTP is currently only supported on E810 devices; this is checked
in ice_ptp_init(). However, there is no such check in ice_ptp_release(),
so for other E800 series devices ice_ptp_release() is wrongly executed.
Fix the following call trace.
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
Workqueue: ice ice_service_task [ice]
Call Trace:
dump_stack_lvl+0x5b/0x82
dump_stack+0x10/0x12
register_lock_class+0x495/0x4a0
? find_held_lock+0x3c/0xb0
__lock_acquire+0x71/0x1830
lock_acquire+0x1e6/0x330
? ice_ptp_release+0x3c/0x1e0 [ice]
? _raw_spin_lock+0x19/0x70
? ice_ptp_release+0x3c/0x1e0 [ice]
_raw_spin_lock+0x38/0x70
? ice_ptp_release+0x3c/0x1e0 [ice]
ice_ptp_release+0x3c/0x1e0 [ice]
ice_prepare_for_reset+0xcb/0xe0 [ice]
ice_do_reset+0x38/0x110 [ice]
ice_service_task+0x138/0xf10 [ice]
? __this_cpu_preempt_check+0x13/0x20
process_one_work+0x26a/0x650
worker_thread+0x3f/0x3b0
? __kthread_parkme+0x51/0xb0
? process_one_work+0x650/0x650
kthread+0x161/0x190
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x30
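The fix amounts to an early return when PTP was never initialized, something like the following (the exact flag name is an assumption):
    void ice_ptp_release(struct ice_pf *pf)
    {
        /* nothing to release on devices where PTP was never set up */
        if (!test_bit(ICE_FLAG_PTP, pf->flags))
            return;

        /* existing teardown (timestamp tracker flush, etc.) continues here */
    }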
Fixes: 4dd0d5c33c ("ice: add lock around Tx timestamp tracker flush")
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When the PF is a member of a link aggregate, and the driver
is removed, the process will hang unless we respond to the
NETDEV_UNREGISTER event that is sent to the event_handler
for LAG.
Add a case statement for the ice_lag_event_handler to unlink
the PF from the link aggregate.
Also remove code that was incorrectly applying a dev_hold to
peer_netdevs that were associated with the ice driver.
Fixes: df006dd4b1 ("ice: Add initial support framework for LAG")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit 406f42fa0d ("net-next: When a bond have a massive amount of
VLANs...") introduced a rbtree for faster Ethernet address look up. To
maintain netdev->dev_addr in this tree we need to make all the writes to
it go through appropriate helpers.
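Concretely, direct writes such as memcpy(netdev->dev_addr, mac, ETH_ALEN) become helper calls, e.g.:
    /* before: memcpy(ndev->dev_addr, mac, ETH_ALEN); */
    eth_hw_addr_set(ndev, mac); /* keeps the address rbtree in sync */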
Link: https://lore.kernel.org/r/20211019182604.1441387-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Use 'bitmap_zalloc()' to simplify code, improve the semantic and avoid
some open-coded arithmetic in allocator arguments.
Also change the corresponding 'kfree()' into 'bitmap_free()' to keep
consistency.
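The transformation is mechanical, roughly (variable names illustrative):
    /* before */
    map = kcalloc(BITS_TO_LONGS(nbits), sizeof(unsigned long), GFP_KERNEL);
    ...
    kfree(map);

    /* after */
    map = bitmap_zalloc(nbits, GFP_KERNEL);
    ...
    bitmap_free(map);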
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A hard hang is observed whenever the ethernet interface is brought
down. If the PHY is stopped before the LPC core block is reset,
the SoC will hang. Comparing lpc_eth_close() and lpc_eth_open(), I
re-arranged the ordering of the function calls in lpc_eth_close() to
reset the hardware before stopping the PHY.
Fixes: b7370112f5 ("lpc32xx: Added ethernet driver")
Signed-off-by: Trevor Woerner <twoerner@gmail.com>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A widely deployed driver has a bug that will cause the driver not
to load when a max_mtu > 2048 is present in the device descriptor.
To avoid this bug while still enabling jumbo frames, we present a lower
max_mtu in the device descriptor and pass the actual max_mtu in
a separate device option.
The driver supports 2 different queue formats. To enable features
on one queue format, but not the other, a supported_features mask
was added to the device options in the device descriptor.
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This enables the driver to receive RX packets spread across multiple
buffers:
For a given multi-fragment packet the "packet continuation" bit is set
on all descriptors except the last one. These descriptors' payloads are
combined into a single SKB before the SKB is handed to the
networking stack.
This change adds a "packet buffer size" notion for RX queues. The
CreateRxQueue AdminQueue command sent to the device now includes the
packet_buffer_size.
We opt for a packet_buffer_size of PAGE_SIZE / 2 to give the
driver the opportunity to flip pages where we can instead of copying.
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This refactor moves the skb_head and skb_tail fields into a new
gve_rx_ctx struct. This new struct will contain information about the
current packet being processed. This is in preparation for
multi-descriptor RX packets.
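At this stage the context struct is little more than the carried-over skb pointers (a simplified sketch of the shape, not the full definition):
    /* per-queue context for the packet currently being assembled */
    struct gve_rx_ctx {
        struct sk_buff *skb_head; /* first buffer of the packet */
        struct sk_buff *skb_tail; /* where the next fragment is chained */
    };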
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the himac error recovery module, and the link_error and
ptp_error types for himac.
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds one bus-related RAS error for RoCE; this error
includes RRESP/BRESP and read poison errors.
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the ethtool advertised link modes of a FIBRE port are cleared to
zero when autoneg is off, so the user cannot get the advertised link modes
info directly from the "ethtool <dev>" command.
In order to ameliorate this situation, update the speed, FEC and pause data
of the advertised link modes when autoneg is off.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions converting speed ability to ethtool link mode currently only
support setting mac->supported. To reuse these functions to set the ethtool
link mode for other fields (i.e. advertising), delete the mac argument and
add a link_mode argument.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mac statistics add pause/pfc durations in device version V3; we can
get the total active cycles of pause/pfc from these durations.
As the driver gets the register number from the firmware to calculate the
descriptor number needed to query mac statistics, it needs to set the mac
statistics extended enable bit in firmware command 0x701A to tell the
firmware that the driver supports extended mac statistics; otherwise the
firmware only returns the register number of version V1.
As pause/pfc durations are not supported by hardware of older versions, they
should not be shown by the "ethtool -S ethX" command in this case, so add a
check of the maximum register number of each mac statistic against its
version. If the maximum register number of one mac statistic is greater than
the register number got from the firmware, it means the hardware does not
support this mac statistic, so ignore this statistic when getting the string
and data of mac statistics.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the driver queries the number of mac statistics before querying
mac statistics. As the number of mac statistics is a fixed value in the
firmware, it is redundant to query this number every time before querying
mac statistics; it can just be queried once in the initialization process
and saved in the device specifications.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After querying mac statistics from the firmware, the driver copies data from
the descriptors to struct mac_stats of hdev, and the number of copied values
is just the register number queried from the firmware. The problem is that
if the register number queried from the firmware is larger than the number
of data members in struct mac_stats, it will cause a copy overflow. So if
the firmware adds more mac statistics in a later version, it is not
compatible with drivers of older versions.
To fix this problem, the number of copied values needs to be the minimum of
the register number queried from the firmware and the number of data members
in struct mac_stats.
The first descriptor has three data values and one reserved; to optimize the
copy process, add this reserved value to struct mac_stats.
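The copy bound then becomes the smaller of the two sizes, roughly (struct and field names follow the wording above rather than the exact source):
    /* sketch: never copy more values than struct mac_stats can hold */
    u32 stats_num = min_t(u32, reg_num_from_fw,
                          sizeof(struct mac_stats) / sizeof(u64));

    memcpy(&hdev->mac_stats, desc_data, stats_num * sizeof(u64));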
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the user may need to check the current configuration of
interrupt coalescing, add debugfs support for querying this info.
Create a single file "coalesce_info" for it, and query it by
"cat coalesce_info", returning the result to userspace.
For devices whose version is V3 or above, the GL register
contains both the usecs and the 1us unit configuration. When getting the
usecs configuration from this register, it would include the confusing unit
configuration, so add a GL mask to get the correct value, and add
a QL mask for the frames configuration as well.
The display style is below:
$ cat coalesce_info
tx interrupt coalesce info:
VEC_ID ALGO_STATE PROFILE_ID CQE_MODE TUNE_STATE STEPS_LEFT...
0 IN_PROG 4 EQE ON_TOP 0...
1 START 3 EQE LEFT 1...
rx interrupt coalesce info:
VEC_ID ALGO_STATE PROFILE_ID CQE_MODE TUNE_STATE STEPS_LEFT...
0 IN_PROG 3 EQE LEFT 1...
1 IN_PROG 0 EQE ON_TOP 0...
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA would like to remove the rtnl_lock from its
SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE handlers, and the felix driver uses
the same MAC table functions as ocelot.
This means that the MAC table functions will no longer be implicitly
serialized with respect to each other by the rtnl_mutex, so we need to add
a dedicated lock in ocelot for the non-atomic operations of selecting a
MAC table row, reading/writing what we want and polling for completion.
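A minimal sketch of the serialization this adds, assuming a mutex named
mact_lock and a hypothetical locked helper:

	mutex_lock(&ocelot->mact_lock);
	/* non-atomic sequence: select a MAC table row, program the
	 * access command and poll for completion
	 */
	err = ocelot_mact_learn_locked(ocelot, port, addr, vid); /* hypothetical */
	mutex_unlock(&ocelot->mact_lock);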
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows explicitly specifying which children are present on the mdio
bus. Additionally, it allows for non-phy MDIO devices on the bus.
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 16nm internal EPHY that is present in 7712 is actually a 16nm
Gigabit PHY which has been forced to operate in 10/100 mode. Its
controls are therefore via the EXT_GPHY_CTRL registers and not via the
EXT_EPHY_CTRL which are used for all GENETv5 adapters. Add a match on
the 7712 compatible string to allow that differentiation to happen.
On previous GENETv4 chips the EXT_CFG_IDDQ_GLOBAL_PWR bit was cleared by
default, but this is not the case with this chip, so we need to make
sure we clear it to power on the EPHY.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMA failure was reported in the Raspberry Pi GitHub (issue #4117):
https://github.com/raspberrypi/linux/issues/4117
Using dma_set_mask_and_coherent() fixes the issue.
Tested on 32/64-bit raspberry pi CM4 and 64-bit ubuntu x86 PC with EVB-LAN7430.
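A minimal sketch of the fix, assuming the usual probe-time pattern of
trying 64-bit DMA first and falling back to 32-bit:

	ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
	if (ret) {
		ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
		if (ret)
			return ret;
	}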
Fixes: 23f0703c12 ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver needs to clean up and return when the initialization fails on resume.
Fixes: 23f0703c12 ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the current KPU profile, we have 2 reserved entries which can
be loaded from firmware to parse custom headers. Add changes
to increase these reserved entries to 6.
Also remove the KPU entries for unused LTYPEs like
NPC_LT_LA_IH_8_ETHER, NPC_LT_LA_IH_4_ETHER and NPC_LT_LA_IH_2_ETHER.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resolve a silent merge conflict between these two:
3d677735d3 ("net/mlx5: Lag, move lag files into directory")
14fe2471c6 ("net/mlx5: Lag, change multipath and bonding to be mutually exclusive")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ASIX AX88796[1] is a versatile ethernet adapter chip that can be
connected to a CPU with an 8/16-bit bus or with an SPI. This driver
supports the SPI connection.
The driver has been ported from the vendor kernel for ARTIK5[2]
boards. Several changes were made to adapt it to the current kernel,
which include:
+ updated DT configuration,
+ clock configuration moved to DT,
+ new timer, ethtool and gpio APIs,
+ dev_* instead of pr_* and custom printk() wrappers,
+ removed awkward vendor power management,
+ introduced an ethtool tunable to control SPI compression.
[1] https://www.asix.com.tw/products.php?op=pItemdetail&PItemID=104;65;86&PLine=65
[2] https://git.tizen.org/cgit/profile/common/platform/kernel/linux-3.10-artik/
The other ax88796 driver is for the NE2000-compatible AX88796L chip. These
chips are not compatible. Hence, two separate drivers are required.
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The code checks whether the skb had one-step TX timestamping enabled, in
order to schedule the work item for emptying the priv->tx_skbs queue.
That code checks "tx_swbd->skb" directly, when we already have an skb
retrieved using enetc_tx_swbd_get_skb(tx_swbd) - a TX software BD can
also hold an XDP_TX packet or an XDP frame. But since the direct tx_swbd
dereference is in an "if" block guarded by the non-NULL check of
"skb", accessing "tx_swbd->skb" directly is not wrong, just confusing.
Just use the local variable named "skb".
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "priv" variable is needed in the "check_writeback" scope since
commit d398231219 ("enetc: add hardware timestamping support").
Since commit 7294380c52 ("enetc: support PTP Sync packet one-step
timestamping"), we also need "priv" in the larger function scope.
So the local variable from the "if" block scope is not needed, and
actually shadows the other one. Delete it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The enetc driver does not implement .ndo_change_mtu, instead it
configures the MAC register field PTC{Traffic Class}MSDUR[MAXSDU]
statically to a large value during probe time.
The driver used to configure only the max SDU for traffic class 0, and
that was fine while the driver could only use traffic class 0. But with
the introduction of mqprio, sending a large frame into any other TC than
0 is broken.
This patch fixes that by replicating per traffic class the static
configuration done in enetc_configure_port_mac().
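A sketch of the per-TC replication; the register macro, write helper and
frame-size constant are written from memory and should be treated as
assumptions rather than the exact patch:

	int tc;

	/* program the max SDU for every traffic class, not just TC 0 */
	for (tc = 0; tc < 8; tc++)
		enetc_port_wr(hw, ENETC_PTCMSDUR(tc), ENETC_MAC_MAXFRM_SIZE);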
Fixes: cbe9e83594 ("enetc: Enable TC offloading with mqprio")
Reported-by: Richie Pearn <richard.pearn@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20211020173340.1089992-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are two counters named "MAC tx frames", and one of them is actually
incorrect. The correct name for that counter should be "MAC tx error
frames", which is symmetric to the existing "MAC rx error frames".
Fixes: 16eb4c85c9 ("enetc: Add ethtool statistics")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20211020165206.1069889-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use pci_info instead to avoid unnamed/uninitialized noise:
[197088.688729] sfc 0000:01:00.0: Solarflare NIC detected
[197088.690333] sfc 0000:01:00.0: Part Number : SFN5122F
[197088.729061] sfc 0000:01:00.0 (unnamed net_device) (uninitialized): no SR-IOV VFs probed
[197088.729071] sfc 0000:01:00.0 (unnamed net_device) (uninitialized): no PTP support
Inspired by fa44821a4d ("sfc: don't use netif_info et al before
net_device is registered") from Heiner Kallweit.
Signed-off-by: Erik Ekman <erik@kryo.se>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 1/10GbaseT modes were set up for cards with SFP+ cages in
3497ed8c85 ("sfc: report supported link speeds on SFP connections").
10GbaseT was likely used since no 10G fibre mode existed.
The missing fibre modes for 1/10G were added to ethtool.h in 5711a98221
("net: ethtool: add support for 1000BaseX and missing 10G link modes")
shortly thereafter.
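A hedged sketch of reporting the additional fibre modes with the standard
ethtool link-mode helpers (the exact sfc call site is an assumption):

	ethtool_link_ksettings_add_link_mode(ks, supported, 1000baseX_Full);
	ethtool_link_ksettings_add_link_mode(ks, supported, 10000baseCR_Full);
	ethtool_link_ksettings_add_link_mode(ks, supported, 10000baseSR_Full);
	ethtool_link_ksettings_add_link_mode(ks, supported, 10000baseLR_Full);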
The user guide available at https://support-nic.xilinx.com/wp/drivers
lists support for the following cable and transceiver types in section 2.9:
- QSFP28 100G Direct Attach Cables
- QSFP28 100G SR Optical Transceivers (with SR4 modules listed)
- SFP28 25G Direct Attach Cables
- SFP28 25G SR Optical Transceivers
- QSFP+ 40G Direct Attach Cables
- QSFP+ 40G Active Optical Cables
- QSFP+ 40G SR4 Optical Transceivers
- QSFP+ to SFP+ Breakout Direct Attach Cables
- QSFP+ to SFP+ Breakout Active Optical Cables
- SFP+ 10G Direct Attach Cables
- SFP+ 10G SR Optical Transceivers
- SFP+ 10G LR Optical Transceivers
- SFP 1000BASE‐T Transceivers
- 1G Optical Transceivers
(From user guide issue 28. Issue 16 which also includes older cards like
SFN5xxx/SFN6xxx has matching lists for 1/10/40G transceiver types.)
Regarding SFP+ 10GBASE‐T transceivers the latest guide says:
"Solarflare adapters do not support 10GBASE‐T transceiver modules."
Tested using SFN5122F-R7 (with 2 SFP+ ports). Supported link modes do not change
depending on module used (tested with 1000BASE-T, 1000BASE-BX10, 10GBASE-LR).
Before:
$ ethtool ext
Settings for ext:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full
10000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Link partner advertised link modes: Not reported
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: No
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: off
Port: FIBRE
PHYAD: 255
Transceiver: internal
Current message level: 0x000020f7 (8439)
drv probe link ifdown ifup rx_err tx_err hw
Link detected: yes
After:
$ ethtool ext
Settings for ext:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full
1000baseX/Full
10000baseCR/Full
10000baseSR/Full
10000baseLR/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Link partner advertised link modes: Not reported
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: No
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: off
Port: FIBRE
PHYAD: 255
Transceiver: internal
Supports Wake-on: g
Wake-on: d
Current message level: 0x000020f7 (8439)
drv probe link ifdown ifup rx_err tx_err hw
Link detected: yes
Signed-off-by: Erik Ekman <erik@kryo.se>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-10-20
Sudheer Mogilappagari says:
This series introduces initial support for Application Device Queues (ADQ)
in the ice driver. ADQ provides traffic isolation for application flows in
hardware and the ability to steer traffic to a given traffic class. This
helps align NIC queues to application threads.
Traffic classes are configured using the mqprio framework of the tc command
and mapped to HW channels (VSIs) in the driver. The queue set of each
traffic class is managed by the corresponding VSI. Each traffic channel
can be configured with bandwidth rate limits and is offloaded
to the hardware through the mqprio framework by specifying the mode
option as 'channel' and the shaper option as 'bw_rlimit'.
Next, application flows can be steered into a given traffic class
using the "tc filter" command. The option "skip_sw hw_tc x" indicates
hw-offload of filtering and steers the filtered traffic into the specified TC.
Non-matching traffic flows through TC0.
When the channel configuration is removed, the queue configuration is reset to
default and the filters configured on individual traffic classes are deleted.
example:
$ ethtool -K eth0 hw-tc-offload on
Configure 3 traffic classes and map priorities 0, 1, 2 to TC0, TC1 and TC2
respectively. TC0 has 2 queues from offset 0, TC1 has 8 queues from
offset 2 and TC2 has 4 queues from offset 10. Enable hardware offload
of channels.
$ tc qdisc add dev eth0 root mqprio num_tc 3 map 0 1 2 queues \
2@0 8@2 4@10 hw 1 mode channel
$ tc qdisc show dev eth0
qdisc mqprio 8001: root tc 2 map 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0
queues:(0:1) (2:9) (10:13)
mode:channel
Configure two filters to match based on dst ipaddr, dst tcp port and
redirect to TC1 and TC2.
$ tc qdisc add dev eth0 clsact
$ tc filter add dev eth0 protocol ip ingress prio 1 flower\
dst_ip 192.168.1.1/32 ip_proto tcp dst_port 80\
skip_sw hw_tc 1
$ tc filter add dev eth0 protocol ip ingress prio 1 flower\
dst_ip 192.168.1.1/32 ip_proto tcp dst_port 5001\
skip_sw hw_tc 2
$ tc filter show dev eth0 ingress
Delete traffic classes configuration:
$ sudo tc qdisc del dev eth0 root
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have a list of struct ocelot_bridge_vlan entries, we can
rewrite the pvid logic to simply point to one of those structures,
instead of having a separate structure with a "bool valid".
The NULL pointer will represent the lack of a bridge pvid (not to be
confused with the lack of a hardware pvid on the port, that is present
at all times).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ocelot switchdev driver does not include the CPU port in the list of
flooding destinations for unknown traffic, instead that traffic is
supposed to match FDB entries to reach the CPU.
The addresses it installs are:
(a) the station MAC address, in ocelot_probe_port() and later during
runtime in ocelot_port_set_mac_address(). These are the VLAN-unaware
addresses. The VLAN-aware addresses are in ocelot_vlan_vid_add().
(b) multicast addresses added with dev_mc_add() (not bridge host MDB
entries) in ocelot_mc_sync()
(c) multicast destination MAC addresses for MRP in ocelot_mrp_save_mac(),
to make sure those are dropped (not forwarded) by the bridging
service, just trapped to the CPU
So we can see that the logic is slightly buggy ever since the initial
commit a556c76adc ("net: mscc: Add initial Ocelot switch support").
This is because, when ocelot_probe_port() runs, the port pvid is 0.
Then, when we join a VLAN-aware bridge, the pvid becomes 1 and we call
ocelot_port_set_mac_address(), which learns the new MAC address in VID 1
(it also fails to forget the old one, since it thinks it's in VID 1, but
that's not so important). Then when we leave the VLAN-aware bridge, the
outside world is unable to ping our new MAC address because it isn't
learned in VID 0, the VLAN-unaware pvid.
[ note: this is strictly based on static analysis, I don't have hardware
to test. But there are also many more corner cases ]
The basic idea is that we should have a separation of concerns, and the
FDB entries used for standalone operation should be managed by the
driver, and the FDB entries used by the bridging service should be
managed by the bridge. So the standalone and VLAN-unaware bridge FDB
entries should not follow the bridge PVID, because that will only be
active when the bridge is VLAN-aware. So since the port pvid is
coincidentally zero during probe time, just make those entries
statically go to VID 0.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At present, the ocelot driver accepts a single egress-untagged bridge
VLAN, meaning that this sequence of operations:
ip link add br0 type bridge vlan_filtering 1
ip link set swp0 master br0
bridge vlan add dev swp0 vid 2 pvid untagged
fails because the bridge automatically installs VID 1 as a pvid & untagged
VLAN, and vid 2 would be the second untagged VLAN on this port. It is
necessary to delete VID 1 before proceeding to add VID 2.
This limitation comes from the fact that we operate the port tag, when
it has an egress-untagged VID, in the OCELOT_PORT_TAG_NATIVE mode.
The ocelot switches do not have full flexibility and can either have one
single VID as egress-untagged, or all of them.
There are use cases for having all VLANs as egress-untagged as well, and
this patch adds support for that.
The change rewrites ocelot_port_set_native_vlan() into a more generic
ocelot_port_manage_port_tag() function. Because the software bridge's
state, transmitted to us via switchdev, can become very complex, we
don't attempt to track all possible state transitions, but instead take
a more declarative approach and just make ocelot_port_manage_port_tag()
figure out which mode to operate in:
- port is VLAN-unaware: the classified VLAN (internal, unrelated to the
802.1Q header) is not inserted into packets on egress
- port is VLAN-aware:
- port has tagged VLANs:
-> port has no untagged VLAN: set up as pure trunk
-> port has one untagged VLAN: set up as trunk port + native VLAN
-> port has more than one untagged VLAN: this is an invalid config
which is rejected by ocelot_vlan_prepare
- port has no tagged VLANs
-> set up as pure egress-untagged port
We don't keep counters of the number of tagged and untagged VLANs; we just
count the structures we keep.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
First and foremost, the driver currently allocates a constant sized
4K * u32 (16KB memory) array for the VLAN masks. However, a typical
application might not need so many VLANs, so if we dynamically allocate
the memory as needed, we might actually save some space.
Secondly, we'll need to keep more advanced bookkeeping of the VLANs we
have, notably we'll have to check how many untagged and how many tagged
VLANs we have. This will have to stay in a structure, and allocating
another 16 KB array for that is again a bit too much.
So refactor the bridge VLANs into a linked list of structures.
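The bookkeeping structure could look roughly like the sketch below; the
field names are illustrative assumptions, not the exact driver definitions:

	struct ocelot_bridge_vlan {
		u16 vid;
		unsigned long portmask;	/* ports that are members of this VLAN */
		unsigned long untagged;	/* ports that egress this VLAN untagged */
		struct list_head list;	/* linked into the per-switch VLAN list */
	};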
The hook points inside the driver are ocelot_vlan_member_add() and
ocelot_vlan_member_del(), which previously used to operate on the
ocelot->vlan_mask[vid] array element.
ocelot_vlan_member_add() and ocelot_vlan_member_del() used to call
ocelot_vlan_member_set() to commit to the ocelot->vlan_mask.
Additionally, we had two calls to ocelot_vlan_member_set() from outside
those callers, and those were directly from ocelot_vlan_init().
Those calls do not set up bridging service VLANs, instead they:
- clear the VLAN table on reset
- set the port pvid to the value used by this driver for VLAN-unaware
standalone port operation (VID 0)
So now, when we have a structure which represents actual bridge VLANs,
VID 0 doesn't belong in that structure, since it is not part of the
bridging layer.
So delete the middle man, ocelot_vlan_member_set(), and let
ocelot_vlan_init() call ocelot_vlant_set_mask() directly, which forgoes
any data structure and writes directly to hardware, which is all that we
need.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a cosmetic patch which clarifies what the port tagging
options for Ocelot switches are.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A small series to clean up the mlx5 mkey code across the mlx5_core and
InfiniBand.
* branch 'mlx5_mkey':
RDMA/mlx5: Attach ndescs to mlx5_ib_mkey
RDMA/mlx5: Move struct mlx5_core_mkey to mlx5_ib
RDMA/mlx5: Replace struct mlx5_core_mkey by u32 key
RDMA/mlx5: Remove pd from struct mlx5_core_mkey
RDMA/mlx5: Remove size from struct mlx5_core_mkey
RDMA/mlx5: Remove iova from struct mlx5_core_mkey
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Add support to add/delete channel-specific filters using tc-flower.
For now, the only supported action is "skip_sw hw_tc <tc_num>".
The filter criteria are specific to a channel and can be a
combination of L3, L3+L4 or L2+L4.
Example:
MATCH criteria Action
---------------------------
src and/or dest IPv4[6]/mask -> Forward to "hw_tc <tc_num>"
dest IPv4[6]/mask + dest L4 port -> Forward to "hw_tc <tc_num>"
dest MAC + dest L4 port -> Forward to "hw_tc <tc_num>"
src IPv4[6]/mask + src L4 port -> Forward to "hw_tc <tc_num>"
src MAC + src L4 port -> Forward to "hw_tc <tc_num>"
Adding tc-flower filter for channel using "hw_tc"
-------------------------------------------------
tc qdisc add dev <ethX> clsact
The above steps are only needed the first time a
tc-flower filter is added.
tc filter add dev <ethX> protocol ip ingress prio 1 flower \
dst_ip 192.168.0.1/32 ip_proto tcp dst_port 5001 \
skip_sw hw_tc 1
tc filter show dev <ethX> ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1 hw_tc 1
eth_type ipv4
ip_proto tcp
dst_ip 192.168.0.1
dst_port 5001
skip_sw
in_hw in_hw_count 1
Delete specific filter:
-------------------------
tc filter del dev <ethx> ingress pref 1 handle 0x1 flower
Delete All filters:
------------------
tc filter del dev <ethX> ingress
Co-developed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add support in the driver for TC_QDISC_SETUP_MQPRIO. This support
enables instantiation of channels in HW using the existing MQPRIO
infrastructure, which is extended to be offloadable. This
provides a mechanism to configure a dedicated set of queues for
each TC.
Configuring channels using "tc mqprio":
--------------------------------------
tc qdisc add dev <ethX> root mqprio num_tc 3 map 0 1 2 \
queues 4@0 4@4 4@8 hw 1 mode channel
Above command configures 3 TCs having 4 queues each. "hw 1 mode channel"
implies offload of channel configuration to HW. When driver processes
configuration received via "ndo_setup_tc: QDISC_SETUP_MQPRIO", each
TC maps to HW VSI with specified queues.
The user can optionally specify minimum and maximum bandwidth rate limits
per TC (see the example below). If shaper parameters like minimum and/or
maximum bandwidth rate limits are specified, the driver configures a
VSI-specific rate limiter in HW.
Configuring channels and bandwidth shaper parameters using "tc mqprio":
----------------------------------------------------------------
tc qdisc add dev <ethX> root mqprio \
num_tc 4 map 0 1 2 3 queues 4@0 4@4 4@8 4@12 hw 1 mode channel \
shaper bw_rlimit min_rate 1Gbit 2Gbit 3Gbit 4Gbit \
max_rate 4Gbit 5Gbit 6Gbit 7Gbit
Command to view configured TCs:
-----------------------------
tc qdisc show dev <ethX>
Deleting TCs:
------------
tc qdisc del dev <ethX> root mqprio
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add infrastructure required for "ndo_setup_tc:qdisc_mqprio".
ice_vsi_setup is modified to configure traffic classes based
on mqprio data received from the stack. This includes low-level
functions to configure min, max rate-limit parameters in hardware
for traffic classes. Each traffic class gets mapped to a hardware
channel (VSI) which can be individually configured with different
bandwidth parameters.
Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The current Work Queue Entry (WQE) checksum (csum) flags in the ethernet
segment (eseg) of the IPsec crypto offload datapath are not aligned
with PRM/HW expectations.
Currently the driver always sets the l3_inner_csum flag in the IPsec case
because of the wrong usage of skb->encapsulation as an indicator for an inner
IPsec header; skb->encapsulation is always ON for IPsec packets
since IPsec itself is an encapsulation protocol. This forced
failing attempts to calculate the csum of non-existing segments (like in
the IP|ESP|TCP packet case, which does not have an l3_inner), which led
to lots of packet drops and hence the low throughput.
Fix by using xo->inner_ipproto as the indicator for an inner IPsec header
instead of skb->encapsulation, in addition to setting the csum flags
as follows:
* Tunnel Mode:
* Pkt: MAC IP ESP IP L4
* CSUM: l3_cs | l3_inner_cs | l4_inner_cs
*
* Transport Mode:
* Pkt: MAC IP ESP L4
* CSUM: l3_cs [ | l4_cs (checksum partial case)]
*
* Tunnel(VXLAN TCP/UDP) over Transport Mode
* Pkt: MAC IP ESP UDP VXLAN IP L4
* CSUM: l3_cs | l3_inner_cs | l4_inner_cs
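A minimal sketch of the new indicator, using the standard mlx5 eseg csum
flag names; the surrounding helper and the exact flag combinations per mode
are simplified assumptions:

	struct xfrm_offload *xo = xfrm_offload(skb);

	eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
	if (xo && xo->inner_ipproto) {
		/* an inner IP packet exists: checksum the inner headers too */
		eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM |
				  MLX5_ETH_WQE_L4_INNER_CSUM;
	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
		eseg->cs_flags |= MLX5_ETH_WQE_L4_CSUM;
	}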
Fixes: f1267798c9 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The current Software Parser (SWP) field settings for IPsec crypto offload in
the ethernet segment (eseg) are not aligned with PRM/HW expectations.
Among others, in the case of an IP|ESP|TCP packet the current driver sets the
offsets for inner_l3 and inner_l4 although there are no inner l3/l4
headers relative to the ESP header in such packets.
SWP provides the offsets for HW, so it can be used to find the csum fields
to offload the checksum; however, these are not necessarily used by HW
and serve as a fallback in case HW fails to parse the packet, e.g.
when performing IPsec transport aware (IP | ESP | TCP) offload there is no
need to add SW parse on the inner packet. So in some cases the packet csum
was calculated correctly, whereas in other cases it failed. The latter
faced csum errors (caused by wrong packet length calculations) which
led to lots of packet drops and hence the low throughput.
Fix by setting the SWP fields as expected in an IP|ESP|TCP packet.
The following describes the expected SWP offsets:
* Tunnel Mode:
* SWP: OutL3 InL3 InL4
* Pkt: MAC IP ESP IP L4
*
* Transport Mode:
* SWP: OutL3 OutL4
* Pkt: MAC IP ESP L4
*
* Tunnel(VXLAN TCP/UDP) over Transport Mode
* SWP: OutL3 InL3 InL4
* Pkt: MAC IP ESP UDP VXLAN IP L4
Fixes: f1267798c9 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
During the suspend flow the driver calls mlx5e_destroy_vlan_table(), which
not only deletes the vlan steering flow rules, but also frees the
data on currently active vlans, so it is not restored during the resume
flow.
This fix keeps the vlan data during the suspend flow and frees it only in the
driver remove flow.
Fixes: 6783f0a21a ("net/mlx5e: Dynamic alloc vlan table for netdev when needed")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Dan Carpenter reports:
The patch f47e04eb96: "net/mlx5: E-switch, Allow setting share/max
tx rate limits of rate groups" from May 31, 2021, leads to the
following Smatch static checker warning:
drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c:483 esw_qos_create_rate_group()
warn: passing zero to 'ERR_PTR'
If min rate normalization failed then the error code may be overwritten to 0
if the scheduling element destruction succeeds. Ignore this value and always
return the initial one.
Fixes: f47e04eb96 ("net/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Both multipath and bonding events change the HW LAG state
independently.
Handling one feature's events while the other is already
enabled can cause unwanted behavior; for example, handling a
bonding event while multipath is enabled will disable the lag and
cause multipath to stop working.
Fix it by ignoring bonding events while in multipath mode and ignoring FIB
events while in bonding mode.
Fixes: 544fe7c2e6 ("net/mlx5e: Activate HW multipath and handle port affinity based on FIB events")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As part of support for E810 XXV devices, some device ids were
inadvertently left out. Add those missing ids.
Fixes: 195fb97766 ("ice: add additional E810 device id")
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
The device ID for I226_K was incorrectly assigned, update the device
ID to the correct one.
Fixes: bfa5e98c9d ("igc: Add new device ID")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Update the HW MAC initialization flow. Do not gate DMA clock from
the modPHY block. Keeping this clock will prevent dropped packets
sent in burst mode on the Kumeran interface.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213651
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213377
Fixes: fb776f5d57 ("e1000e: Add support for Tiger Lake")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Mark Pearson <markpearson@lenovo.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
We have the same LAN controller on different PCHs. Separate the TGP board
type from SPT, which will allow specific fixes to be applied for
TGP platforms.
Suggested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Mark Pearson <markpearson@lenovo.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When utilizing the End to End delay mechanism, the following error messages show up:
|root@ehl1:~# ptp4l --tx_timestamp_timeout=50 -H -i eno2 -E -m
|ptp4l[950.573]: selected /dev/ptp3 as PTP clock
|ptp4l[950.586]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
|ptp4l[950.586]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
|ptp4l[952.879]: port 1: new foreign master 001395.fffe.4897b4-1
|ptp4l[956.879]: selected best master clock 001395.fffe.4897b4
|ptp4l[956.879]: port 1: assuming the grand master role
|ptp4l[956.879]: port 1: LISTENING to GRAND_MASTER on RS_GRAND_MASTER
|ptp4l[962.017]: port 1: received DELAY_REQ without timestamp
|ptp4l[962.273]: port 1: received DELAY_REQ without timestamp
|ptp4l[963.090]: port 1: received DELAY_REQ without timestamp
Commit f2fb6b6275 ("net: stmmac: enable timestamp snapshot for required PTP
packets in dwmac v5.10a") already addresses this problem for the dwmac
v5.10. However, the same holds true for all dwmacs above version v4.10. Correct the
check accordingly. Afterwards everything works as expected.
Tested on Intel Atom(R) x6414RE Processor.
Fixes: 14f347334b ("net: stmmac: Correctly take timestamp for PTPv2")
Fixes: f2fb6b6275 ("net: stmmac: enable timestamp snapshot for required PTP packets in dwmac v5.10a")
Suggested-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Coverity complains of a possible dereference of a null return value.
5. returned_null: kzalloc returns NULL. [show details]
6. var_assigned: Assigning: si_data = NULL return value from kzalloc.
488 si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
489 cbd.length = cpu_to_le16(data_size);
490
491 dma = dma_map_single(&priv->si->pdev->dev, si_data,
492 data_size, DMA_FROM_DEVICE);
While this kzalloc() is unlikely to fail, I did notice that the function
returned without unmapping si_data.
Fix this by refactoring the error paths and checking for kzalloc()
failure.
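A hedged sketch of the reworked error handling; the labels and variable
names are illustrative:

	si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
	if (!si_data) {
		err = -ENOMEM;
		goto err_cbd;		/* common cleanup label, name assumed */
	}

	dma = dma_map_single(&priv->si->pdev->dev, si_data,
			     data_size, DMA_FROM_DEVICE);
	if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
		err = -ENOMEM;
		goto err_free_data;	/* frees si_data before returning */
	}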
Fixes: 888ae5a395 ("net: enetc: add tc flower psfp offload driver")
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org (open list)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-10-19
This series contains updates to ice driver only.
Brett implements support for ndo_set_vf_rate allowing for min_tx_rate
and max_tx_rate to be set for a VF.
Jesse updates DIM moderation to improve latency and resolves problems
with reported rate limit and extra software generated interrupts.
Wojciech moves a check for trusted VFs to the correct function,
disables lb_en for switchdev offloads, and refactors ethtool ops due
to differences between PF and port representor support.
Cai Huoqing utilizes the helper function devm_add_action_or_reset().
Gustavo A. R. Silva replaces allocations with devm_kcalloc() where
applicable.
Dan Carpenter propagates an error instead of returning success.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
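The conversion pattern described above, as a minimal sketch (the register
accessor is a hypothetical placeholder; the same pattern applies to the
other conversions in this series):

	u8 addr[ETH_ALEN];
	int i;

	for (i = 0; i < ETH_ALEN; i++)
		addr[i] = example_read_mac_byte(priv, i);	/* hypothetical */

	/* write through the helper instead of assigning netdev->dev_addr */
	eth_hw_addr_set(netdev, addr);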
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, do the swapping, then
call eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Break the address up into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HNS3 driver includes hns3.ko, hnae3.ko and hclge.ko.
hns3.ko includes the network stack and pci_driver, hclge.ko includes
the HW device actions, algo_ops and timer task, and hnae3.ko includes some
register functions.
When SRIOV is enabled and hclge.ko is removed, the HW device is unloaded
but the VFs still exist; the PF will not reply to VF mbx messages, which
causes errors.
This patch fixes it by disabling SRIOV before removing hclge.ko.
Fixes: e2cb1dec97 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The task of VF reset is performed through the workqueue. It checks the
value of hdev->reset_pending to determine whether to exit the loop.
However, the value of hdev->reset_pending may also be assigned by
the interrupt function hclgevf_misc_irq_handle(), which may cause the
loop to fail to exit and keep occupying the workqueue. This loop is not
necessary, so remove it; the workqueue will be rescheduled if the
reset needs to be retried or a new reset occurs.
Fixes: 1cc9bc6e58 ("net: hns3: split hclgevf_reset() into preparing and rebuilding part")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when there is an rx page allocation failure, it is
possible that polling may be stopped if there are no more packets
to be received, which may cause a queue stall problem under memory
pressure.
This patch makes sure polling is scheduled again when there is
any rx page allocation failure, and polling will try to allocate
receive buffers until it succeeds.
Now that the allocation retry is added, it is unnecessary to do the rx
page allocation at the end of rx cleaning, so remove it. And reset
the unused_count to zero after calling hns3_nic_alloc_rx_buffers()
to avoid calling hns3_nic_alloc_rx_buffers() repeatedly under
memory pressure.
Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An rx unused desc is a desc that needs a new buffer attached
before being refilled to hw to receive a new packet; the number of
descs needing a new buffer attached is calculated using next_to_use
and next_to_clean. When next_to_use == next_to_clean, the hns3
driver currently assumes that all the descs have a buffer attached,
but 'next_to_use == next_to_clean' also means all the descs need
a new buffer attached if hw has consumed all the descs and the
driver has not attached any buffer to the descs yet.
This patch adds 'refill' in desc_cb to indicate whether a new
buffer has been refilled to a desc.
Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the max tx size supported by the hw is calculated
using the max BD num supported by the hw. According to the hw
user manual, the max tx size is a fixed value for both non-TSO and
TSO skbs.
This patch updates the max tx size according to the manual.
Fixes: 8ae10cfb5089 ("net: hns3: support tx-scatter-gather-fraglist feature")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the ets dwrr bandwidth of a tc is set to 0, the hardware will switch to SP
mode. In this case, this tc may occupy all the tx bandwidth if it has
huge traffic, which violates the purpose of the user setting.
To fix this problem, require the ets dwrr bandwidth to be greater than 0.
Fixes: cacde272dd ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the DWRR of a tc is initialized to a fixed value when this tc
is enabled, but it is not reset to 0 when this tc is disabled. This
causes a problem where the DWRR of an unused tc is not 0 after using the tc
tool to add and delete multi-tc parameters.
For example, after enabling 4 TCs and restoring to 1 TC with the following
tc commands:
$ tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3 0 1 2 3 queues \
8@0 8@8 8@16 8@24 hw 1 mode channel
$ tc qdisc del dev eth0 root
Now just one TC is enabled for eth0, but the tc info queried via
debugfs is shown as follows:
$ cat /mnt/hns3/0000:7d:00.0/tm/tc_sch_info
enabled tc number: 1
weight_offset: 14
TC MODE WEIGHT
0 dwrr 100
1 dwrr 100
2 dwrr 100
3 dwrr 100
4 dwrr 0
5 dwrr 0
6 dwrr 0
7 dwrr 0
This patch fixes it by resetting the DWRR of a tc to 0 when the tc is disabled.
Fixes: 848440544b ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add configuration of the interrupt type and fifo interrupt enable of the
TM QCN error event if enabled; otherwise this event will not be reported
when there is an error.
Fixes: d914971df0 ("net: hns3: remove redundant query in hclge_config_tm_hw_err_int()")
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Return the error code if ice_eswitch_configure() fails. Don't return
success.
Fixes: 1c54c83993 ("ice: enable/disable switchdev when managing VFs")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use 2-factor multiplication argument form devm_kcalloc() instead
of devm_kzalloc().
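For illustration, the shape of the change (names are placeholders):

	/* before: open-coded multiplication */
	items = devm_kzalloc(dev, num * sizeof(*items), GFP_KERNEL);

	/* after: 2-factor form, which guards against multiplication overflow */
	items = devm_kcalloc(dev, num, sizeof(*items), GFP_KERNEL);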
Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The helper function devm_add_action_or_reset() will internally
call devm_add_action(), and if devm_add_action() fails it will
execute the action mentioned and return the error code. So
use devm_add_action_or_reset() instead of devm_add_action()
to simplify the error handling and reduce the code.
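A minimal sketch of the pattern (the release callback and resource pointer
are placeholders):

	err = devm_add_action_or_reset(dev, example_release, res);
	if (err)
		return err;	/* example_release(res) has already been called */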
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This patch improves a few things:
- it fixes an issue where ethtool -i reports that the PR supports
priv-flags and tests when in fact it does not support them
- instead of using the same functions for both PF and PR ethtool ops,
this patch introduces separate ops for both cases and internal
functions with the core logic
- prevent accessing the VF VSI while the VF is not ready by calling
ice_check_vf_ready_for_cfg
- all PR-specific functions in ethtool.c were moved to one place in
the file
- instead of overwriting n_priv_flags in ice_repr_get_drvinfo, the
priv-flags code was moved from __ice_get_drvinfo to ice_get_drvinfo
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently it is not possible to set/unset the lb_en and lan_en flags
for advanced rules during their creation. Both flags are enabled
by default. In the case of switchdev offloads for egress traffic we
need lb_en to be disabled. Because of that, we work around it by
updating the rule immediately after its creation.
This change allows us to set/unset those flags right away and it
gets rid of the old workaround as well. Using the ice_adv_rule_flags_info
structure we can pass info about the flags we want to be set for
a given advanced rule. Flags are stored in flags_info.act.
Values from act are used only if act_valid was set to true;
otherwise default values are used.
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Merge issues caused the check for switchdev mode to be inserted
in the wrong place. It should be in ice_set_vf_trust, not in ice_set_vf_mac.
Trusted VFs are forbidden in switchdev mode because they should
be configured only from the host side.
Fixes: 1c54c83993 ("ice: enable/disable switchdev when managing VFs")
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The driver tried to work around missing completion events that occurred
while interrupts were disabled, by triggering a software interrupt
whenever we exit polling (but we had to have polled at least once).
This was causing a *lot* of extra interrupts for some workloads like
NVMe over TCP, which resulted in regressions in performance. It was also
visible when polling didn't prevent interrupts when busy_poll was
enabled.
Fix the extra interrupts by utilizing our previously unused 3rd ITR
(interrupt throttle) index, setting it to 20K interrupts per second, and
then triggering a software interrupt within that rate limit.
While here, slightly refactor the code to avoid an overwrite of a local
variable in the case of wb_en = true.
Fixes: b7306b42be ("ice: manage interrupts during poll exit")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If the adaptive settings are changed with
ethtool -C ethx adaptive-rx off adaptive-tx off
then the interrupt rate limit should be maintained as a user-set value,
but only if BOTH adaptive settings are off. Fix a bug where the rate
limit that was being used in adaptive mode was staying set in the
register but was not reported correctly by ethtool -c ethx. Due to long
lines, include a small refactor of the q_vector variable.
Fixes: b8b4772377 ("ice: refactor interrupt moderation writes")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The driver was having trouble with unreliable latency when doing single
threaded ping-pong tests. This was root caused to the DIM algorithm
landing on a too slow interrupt value, which caused high latency, and it
was especially present when queues were being switched frequently by the
scheduler as happens on default setups today.
In attempting to improve this, we allow the upper rate limit for
interrupts to move to a rate limit of 4 microseconds as a maximum, which
means that no vector can generate more than 250,000 interrupts per second.
The old config allowed up to 100,000. The driver previously tried to program
the rate limit too frequently, and if the receive and transmit sides were
both active on the same vector, the INTRL would be set incorrectly; this
change fixes that issue as a side effect of the redesign.
This driver will operate from now on with a slightly changed DIM table
with more emphasis towards latency sensitivity by having more table
entries with lower latency than with high latency (high being >= 64
microseconds).
The driver also resets the DIM algorithm state with a new stats set when
there is no work done and the data becomes stale (older than 1 second),
for the respective receive or transmit portion of the interrupt.
Add a new helper for setting rate limit, which will be used more
in a followup patch.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
A small series to clean up the mlx5 mkey code across the mlx5_core and
InfiniBand.
* branch 'mlx5_mkey':
RDMA/mlx5: Attach ndescs to mlx5_ib_mkey
RDMA/mlx5: Move struct mlx5_core_mkey to mlx5_ib
RDMA/mlx5: Replace struct mlx5_core_mkey by u32 key
RDMA/mlx5: Remove pd from struct mlx5_core_mkey
RDMA/mlx5: Remove size from struct mlx5_core_mkey
RDMA/mlx5: Remove iova from struct mlx5_core_mkey
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Implement ndo_set_vf_rate to support setting of min_tx_rate and
max_tx_rate; set the appropriate bandwidth in the scheduler for the
node representing the specified VF VSI.
Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Replacing dma_pool_alloc/memset() with dma_pool_zalloc()
to simplify the code.
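For illustration (placeholder names):

	/* before */
	buf = dma_pool_alloc(pool, GFP_KERNEL, &dma_handle);
	if (buf)
		memset(buf, 0, buf_size);

	/* after: allocation and zeroing in one call */
	buf = dma_pool_zalloc(pool, GFP_KERNEL, &dma_handle);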
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable will be assigned again later in the if condition,
so the assignment here is meaningless.
drivers/net/ethernet/broadcom/tg3.c:5750:2 warning:
Value stored to 'current_link_up' is never read.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
During driver probing, the probe function should return < 0
on failure; otherwise, the kernel will treat a value > 0 as success.
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This assignment statement is meaningless, because execution
proceeds to the "set_itr_now" label regardless.
The clang_analyzer complains as follows:
drivers/net/ethernet/intel/e1000e/netdev.c:2552:3 warning:
Value stored to 'current_itr' is never read.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Vadym and Taras report that the current behavior of the driver
is not exactly what is expected and that it's better to add the port id in,
like other drivers do.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Break the address up into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Break the address up into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Invert the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In mlx5_core and vdpa there is no use of mlx5_core_mkey members except
for the key itself.
As preparation for moving mlx5_core_mkey to mlx5_ib, the occurrences of
struct mlx5_core_mkey in all modules except for mlx5_ib are replaced by
a u32 key.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
There is no read of mkey->pd, only write. Remove it.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
mkey->size is already stored in ibmr->length, no need to store it here.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
iova is already stored in ibmr->iova, no need to store it here.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Permit offloading qdiscs below RED and TBF. In order to avoid having to
implement trivial propagating callbacks for get_prio_bitmap and
get_tclass_num, extend mlxsw_sp_qdisc_get_prio_bitmap() and
..._get_tclass_num() to handle the lack of the callback as a cue to forward
the request to the parent.
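A rough sketch of the "missing callback means ask the parent" cue
(simplified; the ops and parent field names are assumptions based on the
description above):
    static int
    mlxsw_sp_qdisc_get_tclass_num(struct mlxsw_sp_port *mlxsw_sp_port,
                                  struct mlxsw_sp_qdisc *mlxsw_sp_qdisc)
    {
            if (!mlxsw_sp_qdisc->ops->get_tclass_num)
                    /* no callback: defer the question to the parent qdisc */
                    return mlxsw_sp_qdisc_get_tclass_num(mlxsw_sp_port,
                                                         mlxsw_sp_qdisc->parent);
            return mlxsw_sp_qdisc->ops->get_tclass_num(mlxsw_sp_port,
                                                       mlxsw_sp_qdisc);
    }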
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A following patch will enable offloading qdiscs that are deeper than
directly under root qdisc. Currently the topology validation consists of
demanding a root qdisc position for ETS and PRIO. Since RED and TBF are
considered classless, this is enough. In order to prevent some nonsensical
combinations when RED and TBF become classful, introduce a more general
topology validator.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Spectrum, there are no per-TC TX counters. Instead, mlxsw uses per-prio
counters and aggregates them according to the priomap. Therefore when
priomap changes, the counter base values need to be reset to reflect the
change. Previously, this was only done for the sole child qdisc, but a
following patch makes RED and TBF classful. Thus apply the request to the
whole sub-tree.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qdisc graft operations have so far been reported at PRIO, ETS and RED, with
RED events ignored, because RED was not considered a classful qdisc. A
following patch will make mlxsw recognize RED and TBF as classful qdiscs,
and thus it is necessary to validate grafting at these qdiscs as well.
Rename the existing graft validator to make it clear that it is a generic
function, and invoke for RED and TBF graft events as well. Drop the
unnecessary PRIO helper and invoke the graft validator directly for PRIO as
well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently ETS and PRIO are the only offloaded classful qdiscs. Since they
are both similar, their destroy handler is the same, and it handles
children destruction itself. But now it is possible to do it generically
for any classful qdisc. Therefore promote the recursive destruction from
the ETS handler to mlxsw_sp_qdisc_destroy(), so that RED and TBF pick it up
in follow-up patches.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract from __mlxsw_sp_qdisc_ets_replace() two helpers for handling of one
future FIFO resp. reinitializing the array of future FIFOs.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when keeping track of qdiscs, mlxsw notes the TC and priomap
corresponding to each qdisc. That is fine currently, as there only ever is
one level of qdiscs to update: the direct children of ETS / PRIO. However
as deeper structures are made offloadable, ETS would need to update these
values for the complete subtree, and interim qdiscs would need to remember
to propagate the value.
Instead, reverse the responsibility: child qdiscs can ask their parent what
their TC and priomap are. ETS / PRIO know the answer right away, or there
are defaults for when the root qdisc does not assign them (e.g. when RED is
used as root qdisc). When RED and TBF become classful, they will simply
forward the request up to their parent.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-updates-2021-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
mlx5-updates-2021-10-18
Maor Gottlieb says:
========================
Use hash to select the affinity port in VF LAG
The current VF LAG architecture is based on QP association with a port.
A QP must be created after LAG is enabled to allow association with a non-native port.
VM packets going on the slow path to the eSwitch manager (SW path or hairpin) will be
transmitted through a different QP than the VM's. This means that different packets of
the same flow might egress from different physical ports.
This patch-set solves this issue by moving the port selection to be based on the hash function
defined by the bond.
When the device is moved to VF LAG mode, the driver creates TTC (traffic type classifier) flow
tables in order to classify the packet and steer it to the relevant hash function, similar to what
is done in the mlx5 RSS implementation.
Each rule in the TTC table forwards the packet to a port selection flow table, which has one hash
split flow group containing two "catch all" flow table entries. Each entry points to the
respective uplink port, as shown below:
-------------------
| FT |
TTC rule -> | ----------- |
| FG| FTE --|-|-----> uplink of port #1
| | FTE --|-|-----> uplink of port #2
| ----------- |
-------------------
A hash split flow group is a flow group created with type HASH_SPLIT and associated with a match definer.
The match definer defines the fields that are included in the hash calculation.
The driver creates the match definer according to the xmit hash policy of the bond driver.
Patches overview:
========================
Minor E-Switch updates:
- Patch #12, dynamic allocation of dest array
- Patch #13, increase number of forward destinations to 32
Signed-off-by: David S. Miller <davem@davemloft.net>
Mateusz Palczewski says:
====================
40GbE Intel Wired LAN Driver Updates 2021-10-18
Use a single state machine both for driver initialization
and for servicing an already initialized driver. The init state
machine implemented in init_task() is merged
into the watchdog_task(). The init_task() function
is removed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase supported number of forward destinations in the same rule, local
and remote, from 2 to 32.
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Use dynamic allocation for the dest array in preparation for
the next patch, which increases MLX5_MAX_FLOW_FWD_VPORTS and
would cause the stack allocation to be bigger than 1024 bytes.
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Use the steering based solution to select the affinity port
when the LAG mode is based on hash policy and the device supports
the port selection flow table.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add a create function that builds the steering tables, TTC and definers
according to the LAG hash type.
The destroy function destroys all the steering components.
The modify function is used when the bond mapping changes; it
iterates over all the rules in the definers and modifies them to steer
the packet to the relevant active ports.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add support to create inner and outer TTC tables for LAG port
selection. These tables are used to classify the packets in
order to select the related definer.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Every definer will consist of a flow table with a single hash group
with exactly two flow table entries, one for each device port.
The destination of these entries is the uplink vport according to the
port state and hash policy.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Set the related bits in the match definer mask according to the
TT mapping.
This mask will be used to create the match definers.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Generate a traffic type bitmap that will define which
steering objects we need to create for the steering
based LAG.
Bits in this bitmap are set according to the LAG hash type.
In addition, have a field that indicates whether the LAG is in encap
mode or not.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Downstream patches add another lag related file so it makes
sense to have all the lag files in a dedicated directory.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The uplink destination type should be used in rules to steer the
packet to the uplink when the device is in steering based LAG mode.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Introduce new APIs to create and destroy a flow matcher
for a given format id.
The flow match definer object is used for defining the fields and
mask used for the hash calculation. The user should mask the desired
fields as is done in the match criteria.
This object is assigned to a flow group of type hash. In this flow
group type, packet lookup is done based on the hash result.
This patch also adds the required bits to create such a flow group.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add a new port selection flow steering namespace. Flow steering rules in
this namespace are used to determine the physical port for egress
packets.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add bitmasks to ttc_params to indicate whether a rule is valid or not.
This allows creating a TTC table that supports only part of the
traffic types.
In later patches, which introduce the steering based LAG port selection,
the TTC will be created with only part of the rules according to the hash
type.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Rename the common TCP variable "iscsi_ooo" to "ooo_opq".
This variable is common to all the TCP L5 protocols and is not
specific to iSCSI.
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Link: https://lore.kernel.org/r/20211015124118.29041-2-smalin@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A new warning in clang points out two places in this driver where
boolean expressions are being used with a bitwise OR instead of a
logical one:
drivers/net/ethernet/netronome/nfp/nfp_asm.c:199:20: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
reg->src_lmextn = swreg_lmextn(lreg) | swreg_lmextn(rreg);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
drivers/net/ethernet/netronome/nfp/nfp_asm.c:199:20: note: cast one or both operands to int to silence this warning
drivers/net/ethernet/netronome/nfp/nfp_asm.c:280:20: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
reg->src_lmextn = swreg_lmextn(lreg) | swreg_lmextn(rreg);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
drivers/net/ethernet/netronome/nfp/nfp_asm.c:280:20: note: cast one or both operands to int to silence this warning
2 errors generated.
The motivation for the warning is that logical operations short circuit
while bitwise operations do not. In this case, short circuiting does not
appear to be harmful, so implement the suggested change to a logical
operation to silence the warning.
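The change on the lines in question is simply:
    /* after the fix (logical OR; the result is still 0 or 1) */
    reg->src_lmextn = swreg_lmextn(lreg) || swreg_lmextn(rreg);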
Link: https://github.com/ClangBuiltLinux/linux/issues/1479
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20211018193101.2340261-1-nathan@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use a single state machine both for driver initialization and for
servicing an already initialized driver. The init state machine
implemented in init_task() is merged into the watchdog_task(). The
init_task() function is removed.
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This commit adds a new state, __IAVF_INIT_FAILED, to the state machine.
From now on, initialization functions report errors not by returning an
error value, but by changing the state to indicate that something went
wrong.
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Replace state changes of the iavf state machine
with a method that also tracks the previous
state the machine was in.
This change is required for further work on
refactoring the init and watchdog state machines.
Tracking the previous state will help us
recover iavf after a failure has occurred.
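A minimal sketch of what such a helper could look like (the helper and
field names are assumptions drawn from the description above):
    static void iavf_change_state(struct iavf_adapter *adapter,
                                  enum iavf_state_t state)
    {
            if (adapter->state == state)
                    return;
            /* remember where we came from so failures can be recovered */
            adapter->last_state = adapter->state;
            adapter->state = state;
    }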
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
mlx5_tout_ms() returns a u64, which we can't directly divide.
This is not a problem here: @timeout, which is the value
that actually matters, is already a ulong, so storing the
return value of mlx5_tout_ms() in a ulong
should be fine.
This fixes:
ERROR: modpost: "__udivdi3" [drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko] undefined!
Fixes: 32def4120e ("net/mlx5: Read timeout values from DTOR")
Link: https://lore.kernel.org/r/20211018172608.1069754-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Everything except the first 32 bits was lost when the pause flags were
added. This makes the 50000baseCR2 mode flag (bit 34) not appear.
I have tested this with a 10G card (SFN5122F-R7) by modifying it to
return a non-legacy link mode (10000baseCR).
Signed-off-by: Erik Ekman <erik@kryo.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
During driver probing, the probe function should return < 0
on failure; otherwise, the kernel will treat a value > 0 as success.
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix following coccicheck warning:
./drivers/net/ethernet/mscc/ocelot_vsc7514.c:946:1-33: WARNING: Function
for_each_available_child_of_node should have of_node_put() before goto.
Early exits from for_each_available_child_of_node should decrement the
node reference counter.
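The pattern the warning asks for looks like this (illustrative;
do_port_setup() is a placeholder for the real per-child work):
    for_each_available_child_of_node(np, child) {
            err = do_port_setup(priv, child);
            if (err) {
                    /* the iterator holds a reference on 'child';
                     * drop it before bailing out */
                    of_node_put(child);
                    goto cleanup;
            }
    }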
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix following coccicheck warning:
./drivers/net/ethernet/microchip/sparx5/sparx5_main.c:723:1-33: WARNING: Function
for_each_available_child_of_node should have of_node_put() before goto
Early exits from for_each_available_child_of_node should decrement the
node reference counter.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address lookup.
To maintain netdev->dev_addr in this tree we need to make all the
writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Building the VF and PF side of this driver differently, with one being
a loadable module and the other one built-in results in a link failure
for the common PTP driver:
ld.lld: error: undefined symbol: __this_module
>>> referenced by otx2_ptp.c
>>> net/ethernet/marvell/octeontx2/nic/otx2_ptp.o:(otx2_ptp_init) in archive drivers/built-in.a
>>> referenced by otx2_ptp.c
>>> net/ethernet/marvell/octeontx2/nic/otx2_ptp.o:(otx2_ptp_init) in archive drivers/built-in.a
Move the otx2_ptp.c code into a separate module that gets built for
both configurations, making it built-in if at least one of the other
two is built-in.
Fixes: 43510ef4dd ("octeontx2-nicvf: Add PTP hardware clock support to NIX VF")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add basic support for UniPhier NX1 SoC. This includes a compatible string
and SoC-dependent data.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up to now w5100_remove() returns zero unconditionally. Make it return
void instead which makes it easier to see in the callers that there is
no error to handle.
Also the return value of platform and spi remove callbacks is ignored
anyway.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up to now ks8851_remove_common() returns zero unconditionally. Make it
return void instead which makes it easier to see in the callers that
there is no error to handle.
Also the return value of platform and spi remove callbacks is ignored
anyway.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only factor differentiating per-CPU bstats data type (struct
gnet_stats_basic_cpu) from the packed non-per-CPU one (struct
gnet_stats_basic_packed) was a u64_stats sync point inside the former.
The two data types are now equivalent: earlier commits added a u64_stats
sync point to the latter.
Combine both data types into "struct gnet_stats_basic_sync". This
eliminates redundancy and simplifies the bstats read/write APIs.
Use u64_stats_t for bstats "packets" and "bytes" data types. On 64-bit
architectures, u64_stats sync points do not use sequence counter
protection.
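The combined type then has roughly this shape (a sketch based on the
description above):
    struct gnet_stats_basic_sync {
            u64_stats_t bytes;
            u64_stats_t packets;
            struct u64_stats_sync syncp;
    };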
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address lookup.
To maintain netdev->dev_addr in this tree we need to make all the
writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set(). ixgb_get_ee_mac_addr() is used with
a non-netdev->dev_addr pointer, so we can't deal with the problem
inside it.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We'll want to make netdev->dev_addr const, so remove the local
helper, which is missing a const qualifier on its argument,
and use ether_addr_to_u64() instead.
Similar story to mlx4.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address lookup.
To maintain netdev->dev_addr in this tree we need to make all the
writes to it go through appropriate helpers.
Pass a netdev into the helper instead of just the address,
read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address lookup.
To maintain netdev->dev_addr in this tree we need to make all the
writes to it go through appropriate helpers.
Copy the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address lookup.
To maintain netdev->dev_addr in this tree we need to make all the
writes to it go through appropriate helpers.
Use a zeroed array on the stack, then call eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address lookup.
To maintain netdev->dev_addr in this tree we need to make all the
writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address lookup.
To maintain netdev->dev_addr in this tree we need to make all the
writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address lookup.
To maintain netdev->dev_addr in this tree we need to make all the
writes to it go through appropriate helpers.
Use an array on the stack, then call eth_hw_addr_set().
eth_hw_addr_set() is called after error checking; this should
be fine, as the error propagates all the way to failing probe.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address lookup.
To maintain netdev->dev_addr in this tree we need to make all the
writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address lookup.
To maintain netdev->dev_addr in this tree we need to make all the
writes to it go through appropriate helpers.
Break the address apart into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address lookup.
To maintain netdev->dev_addr in this tree we need to make all the
writes to it go through appropriate helpers.
macaddr[] is a module param, and an int array, so copy the address into
an array of u8 on the stack, then call eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address lookup.
To maintain netdev->dev_addr in this tree we need to make all the
writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-updates-2021-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-10-15
1) From Rongwei Liu:
Use system_image_guid and native_port_num when bonding.
Don't rely on PCIe ids anymore.
With some specific NICs, the physical devices may have PCIe IDs like
0001:01:00.0/1 and 0002:02:00.0/1. All of these devices should have
the same system_image_guid, and the device index can be queried from
native_port_num.
For matching sibling devices/port of the same HCA, compare the HCA
GUID reported on each device rather than just assuming PCIe ids have
similar attributes.
2) From Amir Tzin: Use HCA defined timeouts
Replace hard-coded timeouts with values stored by firmware in the default
timeouts register (DTOR). Timeouts are read during driver load. If DTOR
is not supported by the firmware, then fall back to hard-coded defaults
instead.
3) From Shay Drory: Disable RoCE at HCA level
Disable RoCE in firmware when the devlink roce parameter is set to off.
4) A small set of trivial cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With specific NICs, the PFs may have different PCIe ids like
0001:01:00.0/1 and 0002:02:00:00/1.
For PFs with the same system_image_guid, the driver should consider
them to be under the same physical NIC, and they are legal to bond together.
If firmware doesn't support system_image_guid, set it to zero and
fall back to using PCIe ids.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When querying system_image_guid from firmware, we should check the return
value first. The buffer content is valid only if the query succeeds.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.
So, refactor the code a bit to use the purpose specific kcalloc()
function instead of the argument size * count in the kzalloc() function.
[1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
Signed-off-by: Len Baker <len.baker@gmx.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As EOPNOTSUPP and EINVAL are returned from multiple places in the driver,
it is difficult to understand the reason from the error code alone.
With a netlink extack message the exact reason will be known and will
aid in debugging.
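The typical shape of such a change (illustrative; the condition and
message text here are made up):
    if (!feature_is_supported(priv)) {      /* placeholder check */
            NL_SET_ERR_MSG_MOD(extack, "Match criteria not supported for offload");
            return -EOPNOTSUPP;
    }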
Signed-off-by: Abhiram R N <abhiramrn@gmail.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If CT fails to initialize its rhashtables, it doesn't destroy
the ct nat global table.
Destroy the ct nat global table on ct init failure.
Fixes: d7cade5137 ("net/mlx5e: check return value of rhashtable_init")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, when a user disables RoCE via the devlink param, this change
isn't passed down to the device.
If the device allows disabling RoCE at the device level, make use of it. This
instructs the device to skip memory allocations related to RoCE
functionality, which are otherwise done by the device.
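For context, the parameter in question is the generic devlink RoCE knob,
toggled along these lines (illustrative invocation and PCI address):
    devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit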
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Enable steering IPoIB packets via ethtool, the same way it is done today
for Ethernet packets.
Signed-off-by: Moosa Baransi <moosab@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, SMFS mode doesn't support rx-loopback flows, which causes bridge
egress rules to be rejected because, without a hint, rules for both rx and tx
destinations are created by default. Provide explicit flow source hints for
compatibility with SMFS.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Replace hard-coded timeouts with values stored by firmware in the default
timeouts register (DTOR). Timeouts are read during driver load. If DTOR
is not supported by the firmware, then fall back to hard-coded defaults
instead.
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Replace hard-coded timeouts with values stored in the firmware's init
segment. Timeouts are read from the init segment during driver load. If init
segment timeouts are not supported, then fall back to hard-coded defaults
instead. Also move pre-initialization timeouts, which cannot be read from
firmware, to the new mechanism.
Signed-off-by: Amir Tzin <amirtz@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Go through the code base and use the ice_for_each_* macros. While at it,
introduce the ice_for_each_xdp_txq() macro, which can be used for looping over
the xdp_rings array.
This commit does not introduce any new functionality.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Under rare circumstances there might be a situation where the requirement
of having an XDP Tx queue per CPU cannot be fulfilled and some of the Tx
resources have to be shared between CPUs. This creates a need for placing
accesses to xdp_ring inside a critical section protected by a spinlock.
These accesses happen to be in the hot path, so introduce a static
branch that will be triggered from the control plane when the driver
cannot provide a Tx queue dedicated to XDP on each CPU.
Currently, the chosen design is to allow any number of XDP
Tx queues that is at least half the number of CPUs the platform has.
For a lower number, the driver will bail out with a response to the user
that there were not enough Tx resources to allow configuring XDP. The
sharing of rings is signalled via static branch enablement, which in turn
indicates that the lock for xdp_ring accesses needs to be taken in the hot
path.
The approach based on a static branch has no impact on the performance of
the non-fallback path. One thing that needs to be mentioned is that
the static branch will act as a global driver switch, meaning that
if one PF runs out of Tx resources, then the other PFs that the ice driver
is servicing will suffer. However, given that the HW the ice driver
handles has 1024 Tx queues per PF, this is currently an
unlikely scenario.
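In outline, the hot path gains a conditionally taken lock guarded by the
static branch (a simplified sketch; the key, lock and transmit helper
names are assumptions based on the description):
    if (static_branch_unlikely(&ice_xdp_locking_key))
            spin_lock(&xdp_ring->tx_lock);
    err = ice_xmit_xdp_buff(xdp, xdp_ring);   /* the actual XDP_TX transmit */
    if (static_branch_unlikely(&ice_xdp_locking_key))
            spin_unlock(&xdp_ring->tx_lock);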
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Optimize Tx descriptor cleaning for XDP. The current approach doesn't
really scale and chokes when multiple flows are handled.
Introduce two ring fields, @next_dd and @next_rs, that will keep track of
the descriptor that should be looked at when the need for cleaning arises
and the descriptor that should have the RS bit set, respectively.
Note that at this point the threshold is a constant (32), but it is
something that we could make configurable.
The first thing is to get away from setting the RS bit on each descriptor.
Let's do this only once NTU is higher than the current @next_rs value. In
such a case, grab tx_desc[next_rs], set the RS bit in the descriptor and
advance @next_rs by 32.
The second thing is to clean the Tx ring only when there are fewer than 32
free entries. For that case, look up tx_desc[next_dd] for the DD bit.
This bit is written back by HW to let the driver know that the xmit was
successful. It will happen only for those descriptors that had the RS bit
set. Clean only 32 descriptors and advance @next_dd accordingly.
The actual cleaning routine is moved from ice_napi_poll() down to
ice_xmit_xdp_ring(). It is safe to do so as the XDP ring will not get any
SKBs that would rely on interrupts for the cleaning. A nice side
effect is that for the rare case of the Tx fallback path (which the next
patch is going to introduce) we don't have to trigger the SW irq to clean
the ring.
With those two concepts, the ring is kept almost full, but it is
guaranteed that the driver will be able to produce Tx descriptors.
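A simplified sketch of the RS-bit batching on the transmit side (field,
macro and threshold names follow the text above and are assumptions; error
handling is omitted):
    if (xdp_ring->next_to_use > xdp_ring->next_rs) {
            /* request a completion write-back for this batch only */
            tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs);
            tx_desc->cmd_type_offset_bsz |=
                    cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
            xdp_ring->next_rs += ICE_TX_THRESH;
    }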
This approach seems to work out well even though the Tx descriptors are
produced in a one-by-one manner. The test was conducted with the ice HW
bombarded with packets from a HW generator configured to generate 30
flows.
Xdp2 sample yields the following results:
<snip>
proto 17: 79973066 pkt/s
proto 17: 80018911 pkt/s
proto 17: 80004654 pkt/s
proto 17: 79992395 pkt/s
proto 17: 79975162 pkt/s
proto 17: 79955054 pkt/s
proto 17: 79869168 pkt/s
proto 17: 79823947 pkt/s
proto 17: 79636971 pkt/s
</snip>
As that sample reports the Rx'ed frames, let's look at sar output.
It says that what we Rx'ed we do actually Tx, no noticeable drops.
Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Average: ens4f1 79842324.00 79842310.40 4678261.17 4678260.38 0.00 0.00 0.00 38.32
with tx_busy staying calm.
When compared to a state before:
Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Average: ens4f1 90919711.60 42233822.60 5327326.85 2474638.04 0.00 0.00 0.00 43.64
it can be observed that the amount of txpck/s is almost doubled, meaning
that the performance is improved by around 90%. All of this is due to the
drops in the driver; previously the tx_busy stat was bumped at a 7 Mpps
rate.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
With rings being split, it is now convenient to introduce a pointer to the
XDP ring within the Rx ring. For XDP_TX workloads this means that the
xdp_rings array access, which was executed per each processed frame,
will be skipped.
Also, read the XDP prog once per NAPI and, if the prog is present, set up
the local xdp_ring pointer. Reading the prog a single time was discussed
in [1] with some concern raised by Toke around dispatcher handling and
the need for going through the RCU grace period in the ndo_bpf driver
callback, but ice currently is tearing down NAPI instances regardless of
the prog presence on the VSI.
Although the pointer to the XDP ring introduced to the Rx ring makes things
a lot slimmer/simpler, I still feel that a single prog read per NAPI
lifetime is beneficial.
A further patch that introduces the fallback path will also profit
from this, as the xdp_ring pointer will be set during the XDP rings
setup.
[1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
xdp_frame is not needed for the XDP_TX data path in the ice driver's case.
For this data path, cleaning of the sent descriptor will not happen anywhere
outside of the driver, which means that carrying the information about
the underlying memory model via xdp_frame will not be used. Therefore,
this conversion can simply be dropped, which relieves the CPU a bit.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There has been a long-lasting issue of improper xdp_rings indexing for
XDP_TX and XDP_REDIRECT actions. Given that currently rx_ring->q_index
is mixed with smp_processor_id(), there could be a situation where Tx
descriptors are produced onto an XDP Tx ring, but the tail is never bumped -
for example, when a particular queue id is pinned to a non-matching IRQ line.
Address this problem by ignoring the user ring count setting and always
initializing the xdp_rings array to be of num_possible_cpus() size. Then,
always use smp_processor_id() as the index into the xdp_rings array. This
provides serialization, as at a given time only a single softirq can run on
a particular CPU.
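The resulting lookup on the XDP_TX/XDP_REDIRECT path then reduces to
roughly this (sketch; the vsi back-pointer is assumed from the driver's
ring structures):
    /* one XDP Tx ring per possible CPU; softirq context pins us to one CPU */
    xdp_ring = rx_ring->vsi->xdp_rings[smp_processor_id()];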
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
While it was convenient to have a generic ring structure that served
both Tx and Rx sides, next commits are going to introduce several
Tx-specific fields, so in order to avoid hurting the Rx side, let's
pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs.
Rx ring could be handled by the old ice_ring which would reduce the code
churn within this patch, but this would make things asymmetric.
Make the union out of the ring container within ice_q_vector so that it
is possible to iterate over newly introduced ice_tx_ring.
Remove the @size as it's only accessed from control path and it can be
calculated pretty easily.
Change definitions of ice_update_ring_stats and
ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be
used for both Rx and Tx rings.
Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In
Rx ring xdp_rxq_info occupies its own cacheline, so it's the major
difference now.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently ice_container_type is scoped only to ice_ethtool.c. The next
commit, which will split the ice_ring struct into Rx/Tx specific ring
structs, is also going to modify the type of the linked list of rings
within ice_ring_container. Therefore, the functions that take an
ice_ring_container as an input argument will need to be aware of the ring
type that will be looked up.
Embed ice_container_type within ice_ring_container and initialize it
properly when allocating the q_vectors.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This field is dead and the driver is not making any use of it. Simply
remove it.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add support for adaptive interrupt coalescing to the dpaa2-eth driver.
First of all, ETHTOOL_COALESCE_USE_ADAPTIVE_RX is defined as a supported
coalesce parameter and the requested state is configured through the
dpio APIs added in the previous patch.
Besides the ethtool API interaction, we keep track of how many bytes and
frames are dequeued per CDAN (Channel Data Availability Notification)
and update the Net DIM instance through the dpaa2_io_update_net_dim()
API.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the newly exported dpio driver API to manually configure the IRQ
coalescing parameters requested by the user.
The .get_coalesce() and .set_coalesce() net_device callbacks are
implemented and directly export or setup the rx-usecs on all the
channels configured.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use of the percpu_counter structure to track the count of orphaned
sockets is causing problems on modern hosts with 256 cpus
or more.
Stefan Bach reported a serious spinlock contention in real workloads,
that I was able to reproduce with a netfilter rule dropping
incoming FIN packets.
53.56% server [kernel.kallsyms] [k] queued_spin_lock_slowpath
|
---queued_spin_lock_slowpath
|
--53.51%--_raw_spin_lock_irqsave
|
--53.51%--__percpu_counter_sum
tcp_check_oom
|
|--39.03%--__tcp_close
| tcp_close
| inet_release
| inet6_release
| sock_close
| __fput
| ____fput
| task_work_run
| exit_to_usermode_loop
| do_syscall_64
| entry_SYSCALL_64_after_hwframe
| __GI___libc_close
|
--14.48%--tcp_out_of_resources
tcp_write_timeout
tcp_retransmit_timer
tcp_write_timer_handler
tcp_write_timer
call_timer_fn
expire_timers
__run_timers
run_timer_softirq
__softirqentry_text_start
As explained in commit cf86a086a1 ("net/dst: use a smaller percpu_counter
batch for dst entries accounting"), default batch size is too big
for the default value of tcp_max_orphans (262144).
But even if we reduce batch sizes, there would still be cases
where the estimated count of orphans is beyond the limit,
and where tcp_too_many_orphans() has to call the expensive
percpu_counter_sum_positive().
One solution is to use plain per-cpu counters, and have
a timer periodically refresh a cached total.
Updating this cache every 100ms seems about right; TCP pressure
state does not change radically over shorter periods.
percpu_counter was nice 15 years ago, when hosts had fewer
than 16 cpus, but not anymore by current standards.
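Roughly, the scheme looks like the following sketch (names are
illustrative, not the actual kernel symbols; the initial timer arming is
omitted):
    static DEFINE_PER_CPU(int, orphan_count);
    static int orphan_cache;            /* refreshed every 100ms, read lock-free */
    static void orphan_cache_refresh(struct timer_list *unused);
    static DEFINE_TIMER(orphan_timer, orphan_cache_refresh);

    static void orphan_cache_refresh(struct timer_list *unused)
    {
            int total = 0, cpu;

            for_each_possible_cpu(cpu)
                    total += per_cpu(orphan_count, cpu);
            WRITE_ONCE(orphan_cache, total);
            /* re-arm for the next 100ms window */
            mod_timer(&orphan_timer, jiffies + msecs_to_jiffies(100));
    }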
v2: Fix the build issue for CONFIG_CRYPTO_DEV_CHELSIO_TLS=m,
reported by kernel test robot <lkp@intel.com>
Remove unused socket argument from tcp_too_many_orphans()
Fixes: dd24c00191 ("net: Use a percpu_counter for orphan_count")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stefan Bach <sfb@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support to flush or invalidate CPT CTX entries as part of FLR
and also provides a mailbox to flush CPT CTX entries in case of
graceful exit.
This patch also adds support for AF -> CPT PF uplink mailbox messages
and adds a new mbox message to submit a CPT instruction from AF.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Perform CPT LF teardown in the non-FLR path as well, via cpt_lf_free().
Currently the CPT LF teardown and reset sequence is only
done when FLR is handled with the CPT LF still attached.
This patch also fixes cpt_lf_alloc() to set EXEC_LDWB in
CPT_AF_LFX_CTL2 when it is being completely overwritten, as that is
the default value and is better for performance.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch enables and registers interrupt handler for CPT HW
interrupts.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On i386, when builtin (not a loadable module), the winbond-840 driver
inspects boot_cpu_data to see what CPU family it is running on, and
then acts on that data. The "family" struct member (x86) does not exist
when running on UML, so prevent that test and do the default action.
Prevents this build error on UML + i386:
../drivers/net/ethernet/dec/tulip/winbond-840.c: In function ‘init_registers’:
../drivers/net/ethernet/dec/tulip/winbond-840.c:882:19: error: ‘struct cpuinfo_um’ has no member named ‘x86’
if (boot_cpu_data.x86 <= 4) {
Fixes: 68f5d3f3b6 ("um: add PCI over virtio emulation driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-um@lists.infradead.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://lore.kernel.org/r/20211014050606.7288-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On a UML build, the igc_ptp driver uses CONFIG_X86_TSC for timestamp
conversion. The function that is used is not available on UML builds,
so have the function use the default system_counterval_t timestamp
instead for UML builds.
Prevents this build error on UML:
../drivers/net/ethernet/intel/igc/igc_ptp.c: In function ‘igc_device_tstamp_to_system’:
../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: implicit declaration of function ‘convert_art_ns_to_tsc’ [-Werror=implicit-function-declaration]
return convert_art_ns_to_tsc(tstamp);
../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: incompatible types when returning type ‘int’ but ‘struct system_counterval_t’ was expected
return convert_art_ns_to_tsc(tstamp);
Fixes: 68f5d3f3b6 ("um: add PCI over virtio emulation driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-um@lists.infradead.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/r/20211014050516.6846-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On i386, when builtin (not a loadable module), the fealnx driver
inspects boot_cpu_data to see what CPU family it is running on, and
then acts on that data. The "family" struct member (x86) does not exist
when running on UML, so prevent that test and do the default action.
Prevents this build error on UML + i386:
../drivers/net/ethernet/fealnx.c: In function ‘netdev_open’:
../drivers/net/ethernet/fealnx.c:861:19: error: ‘struct cpuinfo_um’ has no member named ‘x86’
Fixes: 68f5d3f3b6 ("um: add PCI over virtio emulation driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-um@lists.infradead.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://lore.kernel.org/r/20211014050500.5620-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the PTP_PEROUT_DUTY_CYCLE flag is set, then check if the
request_on value in ptp_perout_request matches the pre-defined
values or a toggle option.
Return a failure if the value is not supported.
Preserve the old behavior if the PTP_PEROUT_DUTY_CYCLE flag is not
set.
Tested with an oscilloscope on EVB-LAN7430,
e.g., to output a PPS with a 1 sec period and 500 ms on (high) on GPIO 2:
./testptp -L 2,2
./testptp -p 1000000000 -w 500000000
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Link: https://lore.kernel.org/r/1634046593-64312-1-git-send-email-yuiko.oshino@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tools/testing/selftests/net/ioam6.sh
7b1700e009 ("selftests: net: modify IOAM tests for undef bits")
bf77b1400a ("selftests: net: Test for the IOAM encapsulation with IPv6")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently when a user uses "devlink dev info", the fw.mgmt.api will be
the major.minor numbers as shown below:
devlink dev info pci/0000:3b:00.0
pci/0000:3b:00.0:
driver ice
serial_number 00-01-00-ff-ff-00-00-00
versions:
fixed:
board.id K91258-000
running:
fw.mgmt 6.1.2
fw.mgmt.api 1.7 <--- No patch number included
fw.mgmt.build 0xd75e7d06
fw.mgmt.srev 5
fw.undi 1.2992.0
fw.undi.srev 5
fw.psid.api 3.10
fw.bundle_id 0x800085cc
fw.app.name ICE OS Default Package
fw.app 1.3.27.0
fw.app.bundle_id 0xc0000001
fw.netlist 3.10.2000-3.1e.0
fw.netlist.build 0x2a76e110
stored:
fw.mgmt.srev 5
fw.undi 1.2992.0
fw.undi.srev 5
fw.psid.api 3.10
fw.bundle_id 0x800085cc
fw.netlist 3.10.2000-3.1e.0
fw.netlist.build 0x2a76e110
There are many features in the driver that depend on the major, minor,
and patch version of the FW. Without the patch number in the output for
fw.mgmt.api, debugging issues related to the FW API version is difficult.
Also, using major.minor.patch aligns with the existing firmware version,
which uses a 3-digit value.
Fix this by making fw.mgmt.api print the major.minor.patch
versions. Shown below is the result:
devlink dev info pci/0000:3b:00.0
pci/0000:3b:00.0:
driver ice
serial_number 00-01-00-ff-ff-00-00-00
versions:
fixed:
board.id K91258-000
running:
fw.mgmt 6.1.2
fw.mgmt.api 1.7.9 <--- patch number included
fw.mgmt.build 0xd75e7d06
fw.mgmt.srev 5
fw.undi 1.2992.0
fw.undi.srev 5
fw.psid.api 3.10
fw.bundle_id 0x800085cc
fw.app.name ICE OS Default Package
fw.app 1.3.27.0
fw.app.bundle_id 0xc0000001
fw.netlist 3.10.2000-3.1e.0
fw.netlist.build 0x2a76e110
stored:
fw.mgmt.srev 5
fw.undi 1.2992.0
fw.undi.srev 5
fw.psid.api 3.10
fw.bundle_id 0x800085cc
fw.netlist 3.10.2000-3.1e.0
fw.netlist.build 0x2a76e110
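The reporting change itself is essentially a format-string update along
these lines (sketch; the exact field names in struct ice_hw may differ):
    snprintf(buf, len, "%u.%u.%u", hw->api_maj_ver, hw->api_min_ver,
             hw->api_patch);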
Fixes: ff2e5c700e ("ice: add basic handler for devlink .info_get")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Correct the parameter order in the call to the ice_tunnel_idx_to_entry
function. An entry in the sparse port table is correct when idx is 0. For
idx 1, one correct entry should be skipped, for idx 2 two of them should
be skipped, etc. Change the if condition to be true when idx is 0, which
means that the previous valid entries of this tunnel type were skipped.
Fixes: b20e6c17c4 ("ice: convert to new udp_tunnel infrastructure")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In the remove path, there is an attempt to free the aux_idx IDA whether
it was allocated or not. This can potentially cause a crash when
unloading the driver on systems that do not initialize support for RDMA.
But this free cannot be gated by the status bit for RDMA, since the IDA is
allocated if the driver detects support for RDMA at probe time, and the
driver can enter a state where RDMA is not supported after the IDA
has been allocated at probe time, which would lead to a memory leak.
Initialize aux_idx to an invalid value and check for a valid value when
unloading to determine if an IDA free is necessary.
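The shape of the fix as described (sketch; assumes the "not allocated"
sentinel is a negative value):
    /* only free the IDA entry if one was actually allocated at probe */
    if (pf->aux_idx >= 0)
            ida_free(&ice_aux_ida, pf->aux_idx);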
Fixes: d25a0fc41c ("ice: Initialize RDMA support")
Reported-by: Jun Miao <jun.miao@windriver.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, if the VSI is rebuilt/removed and the RDMA PF driver is active,
the RDMA Tx queue scheduler node configuration will not be cleaned up.
This will cause the rebuild/re-add of the VSI to fail due to the
software structures not being correctly cleaned up for the VSI index.
Fix this by always calling ice_rm_vsi_rdma_cfg() for all VSIs. If there
are no RDMA scheduler nodes created, then there is no harm in calling
ice_rm_vsi_rdma_cfg(). This change applies to all VSI types, so if
RDMA support is added for other VSI types they will also get this
change.
Fixes: 348048e724 ("ice: Implement iidc operations")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Jerzy Wiktor Jurkowski <jerzy.wiktor.jurkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
random_ether_addr() was the original name of the helper which
was kept for backward compatibility (?) after the rename in
commit 0a4dd59498 ("etherdevice: Rename random_ether_addr
to eth_random_addr").
We have a single random_ether_addr() caller left in tree
while there are 70 callers of eth_random_addr().
Time to drop this define.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20211013205450.328092-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address lookup.
To maintain netdev->dev_addr in this tree we need to make all the
writes to it go through appropriate helpers.
This patch takes care of drivers which cast netdev->dev_addr to
a 16bit type, often with an explicit byte order.
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A handful of drivers contain loops assigning the mac
addr byte by byte. Convert those to eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A straggler I somehow missed in the automated conversion in
commit 9ca01b25df ("ethernet: use of_get_ethdev_address()").
Use the new helper instead of using netdev->dev_addr directly.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A handful of drivers use sizeof(addr) for the size of
the address; after manually confirming the size is
indeed 6, convert them to eth_hw_addr_set().
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Petko Manolov <petkan@nucleusys.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A number of drivers use eth_random_addr(netdev->dev_addr),
thus writing to netdev->dev_addr directly and not setting
the address type. Switch them to eth_hw_addr_random().
@@
expression netdev;
@@
- eth_random_addr(netdev->dev_addr);
+ eth_hw_addr_random(netdev);
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This big patch sprinkles const on local variables and
function arguments which may refer to netdev->dev_addr.
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Some of the changes here are not strictly required - const
is sometimes cast off but the pointer is not used for writing.
It still seems better to add the const in case
the code changes later or the relevant -W flags get enabled
for the build.
No functional changes.
Link: https://lore.kernel.org/r/20211014142432.449314-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Expose SMA and U.FL connectors as ptp_pins on E810-T based adapters and
allow controlling them.
E810-T adapters are equipped with:
- 2 external bidirectional SMA connectors
- 1 internal TX U.FL
- 1 internal RX U.FL
The U.FL connectors share signal lines with the SMA connectors: TX U.FL1
shares a line with SMA1 and RX U.FL2 shares a line with SMA2.
This dependency is handled by ice_verify_pin_e810t().
Additionally, add support for E810-T-based devices which don't use the
SMA/U.FL controller. If the IO expander is not detected, don't expose the
pins and instead use 2 predefined 1PPS input and output pins.
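Such a shared-line constraint is enforced through the PTP pin verify
callback; a sketch of the idea follows (paired_pin_conflicts() is an
assumed helper, not the driver's actual ice_verify_pin_e810t() code):

static int e810t_pin_verify(struct ptp_clock_info *info, unsigned int pin,
			    enum ptp_pin_function func, unsigned int chan)
{
	/* SMA1 pairs with TX U.FL1 and SMA2 pairs with RX U.FL2, so a
	 * request for a pin is only accepted if its paired connector is
	 * not already configured in a conflicting way.
	 */
	if (paired_pin_conflicts(pin, func))	/* assumed helper */
		return -EOPNOTSUPP;

	return 0;
}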
Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
E810-T adapters have two external bidirectional SMA connectors and two
internal unidirectional U.FL connectors. Multiplexing between U.FL and
SMA, as well as the SMA direction, is controlled using the PCA9575 expander.
Add support for the PCA9575 detection and control of the respective pins
of the SMA/U.FL multiplexer using the GPIO AQ API.
Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement ice_aq_get_gpio and ice_aq_set_gpio for reading and changing
the state of GPIO pins described in the topology.
Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Separate link topo parameters in struct ice_aqc_link_topo_addr into
new struct ice_aqc_link_topo_params.
This keeps input parameters for the get_link_topo command in a separate
structure and is required by future commands that operate only on link
topo params without the node handle.
Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, mlxsw allows cooling states to be set above the maximum
cooling state supported by the driver:
# cat /sys/class/thermal/thermal_zone2/cdev0/type
mlxsw_fan
# cat /sys/class/thermal/thermal_zone2/cdev0/max_state
10
# echo 18 > /sys/class/thermal/thermal_zone2/cdev0/cur_state
# echo $?
0
This results in out-of-bounds memory accesses when thermal state
transition statistics are enabled (CONFIG_THERMAL_STATISTICS=y), as the
transition table is accessed with a too large index (state) [1].
According to the thermal maintainer, it is the responsibility of the
driver to reject such operations [2].
Therefore, return an error when the state to be set exceeds the maximum
cooling state supported by the driver.
To avoid dead code, as suggested by the thermal maintainer [3],
partially revert commit a421ce088a ("mlxsw: core: Extend cooling
device with cooling levels") that tried to interpret these invalid
cooling states (above the maximum) in a special way. The cooling levels
array is not removed in order to prevent the fans going below 20% PWM,
which would cause them to get stuck at 0% PWM.
[1]
BUG: KASAN: slab-out-of-bounds in thermal_cooling_device_stats_update+0x271/0x290
Read of size 4 at addr ffff8881052f7bf8 by task kworker/0:0/5
CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.15.0-rc3-custom-45935-gce1adf704b14 #122
Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2FO"/"SA000874", BIOS 4.6.5 03/08/2016
Workqueue: events_freezable_power_ thermal_zone_device_check
Call Trace:
dump_stack_lvl+0x8b/0xb3
print_address_description.constprop.0+0x1f/0x140
kasan_report.cold+0x7f/0x11b
thermal_cooling_device_stats_update+0x271/0x290
__thermal_cdev_update+0x15e/0x4e0
thermal_cdev_update+0x9f/0xe0
step_wise_throttle+0x770/0xee0
thermal_zone_device_update+0x3f6/0xdf0
process_one_work+0xa42/0x1770
worker_thread+0x62f/0x13e0
kthread+0x3ee/0x4e0
ret_from_fork+0x1f/0x30
Allocated by task 1:
kasan_save_stack+0x1b/0x40
__kasan_kmalloc+0x7c/0x90
thermal_cooling_device_setup_sysfs+0x153/0x2c0
__thermal_cooling_device_register.part.0+0x25b/0x9c0
thermal_cooling_device_register+0xb3/0x100
mlxsw_thermal_init+0x5c5/0x7e0
__mlxsw_core_bus_device_register+0xcb3/0x19c0
mlxsw_core_bus_device_register+0x56/0xb0
mlxsw_pci_probe+0x54f/0x710
local_pci_probe+0xc6/0x170
pci_device_probe+0x2b2/0x4d0
really_probe+0x293/0xd10
__driver_probe_device+0x2af/0x440
driver_probe_device+0x51/0x1e0
__driver_attach+0x21b/0x530
bus_for_each_dev+0x14c/0x1d0
bus_add_driver+0x3ac/0x650
driver_register+0x241/0x3d0
mlxsw_sp_module_init+0xa2/0x174
do_one_initcall+0xee/0x5f0
kernel_init_freeable+0x45a/0x4de
kernel_init+0x1f/0x210
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff8881052f7800
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 1016 bytes inside of
1024-byte region [ffff8881052f7800, ffff8881052f7c00)
The buggy address belongs to the page:
page:0000000052355272 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1052f0
head:0000000052355272 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffffea0005034800 0000000300000003 ffff888100041dc0
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8881052f7a80: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
ffff8881052f7b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8881052f7b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff8881052f7c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8881052f7c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[2] https://lore.kernel.org/linux-pm/9aca37cb-1629-5c67-1895-1fdc45c0244e@linaro.org/
[3] https://lore.kernel.org/linux-pm/af9857f2-578e-de3a-e62b-6baff7e69fd4@linaro.org/
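The added check amounts to something like the following set_cur_state
callback sketch (names approximate the mlxsw code rather than quote it):

static int mlxsw_thermal_set_cur_state(struct thermal_cooling_device *cdev,
				       unsigned long state)
{
	struct mlxsw_thermal *thermal = cdev->devdata;

	/* Reject states above the advertised max_state instead of trying
	 * to interpret them as special cooling levels.
	 */
	if (state > MLXSW_THERMAL_MAX_STATE)
		return -EINVAL;

	return mlxsw_thermal_set_fan_pwm(thermal, state); /* assumed helper */
}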
CC: Daniel Lezcano <daniel.lezcano@linaro.org>
Fixes: a50c1e3565 ("mlxsw: core: Implement thermal zone")
Fixes: a421ce088a ("mlxsw: core: Extend cooling device with cooling levels")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20211012174955.472928-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After recent cleanups, gcc started warning about a suspicious
memcpy() call during the s2io_io_resume() function:
In function '__dev_addr_set',
inlined from 'eth_hw_addr_set' at include/linux/etherdevice.h:318:2,
inlined from 's2io_set_mac_addr' at drivers/net/ethernet/neterion/s2io.c:5205:2,
inlined from 's2io_io_resume' at drivers/net/ethernet/neterion/s2io.c:8569:7:
arch/x86/include/asm/string_32.h:182:25: error: '__builtin_memcpy' accessing 6 bytes at offsets 0 and 2 overlaps 4 bytes at offset 2 [-Werror=restrict]
182 | #define memcpy(t, f, n) __builtin_memcpy(t, f, n)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/netdevice.h:4648:9: note: in expansion of macro 'memcpy'
4648 | memcpy(dev->dev_addr, addr, len);
| ^~~~~~
What apparently happened is that an old cleanup changed the calling
conventions for s2io_set_mac_addr() from taking an ethernet address
as a character array to taking a struct sockaddr, but one of the
callers was not changed at the same time.
Change it to instead call the low-level do_s2io_prog_unicast() function
that still takes the old argument type.
Fixes: 2fd3768845 ("S2io: Added support set_mac_address driver entry point")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211013143613.2049096-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Qdisc code in mlxsw used to report a number of packets ECN-marked on a
port. Because reporting a per-port value as a per-TC value was misleading,
this was removed in commit 8a29581eb0 ("mlxsw: spectrum: Move the
ECN-marked packet counter to ethtool").
On Spectrum-3, a per-TC number of ECN-marked packets is available in per-TC
congestion counter group. Add a new array for the ECN counter, fetch the
values from the per-TC congestion group, and pick the value indicated by
tclass_num as appropriate.
On Spectrum-1 and Spectrum-2, this per-TC value is not available, and
zeroes will be reported, as they currently are.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The PPCNT register retrieves per port performance counters. The
ecn_marked_tc field in per-TC Congestion counter group contains a count of
packets marked as ECN or potentially marked as ECN.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The name does not make sense as it is. Clearly there is a typo and the
suffix should have been _CNT, like the other enumerators. Fix accordingly.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is no such thing as "traffic group". The group that this is a heading
of is "per traffic class counters". Fix the heading.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This was supposed to be a check for whether dma_alloc_coherent() failed,
but it has a copy-and-paste bug so it does not work.
Fixes: fb8629e2cb ("net: enetc: add support for software TSO")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20211013080456.GC6010@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix the following build/link error by adding a dependency on the CRC32
routines:
ld: drivers/net/ethernet/korina.o: in function `korina_multicast_list':
korina.c:(.text+0x1af): undefined reference to `crc32_le'
Fixes: ef11291bcd ("Add support the Korina (IDT RC32434) Ethernet MAC")
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20211012152509.21771-1-vegard.nossum@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix the typo AVB->DMAC in comment, as the code following the comment
is for DMAC on Gigabit Ethernet IP.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch enables the Receive/Transmit ports of the TOE and removes
the setting of the promiscuous bit from the EMAC configuration mode register.
This patch also updates the EMAC configuration mode comment from
"PAUSE prohibition" to "EMAC Mode: PAUSE prohibition; Duplex; TX;
RX; CRC Pass Through".
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Document PFRI register bit, as it is documented on R-Car Gen3 and
RZ/G2L hardware manuals.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rename the feature bit "nc_queue" to "nc_queues", as the AVB DMAC has
RX and TX NC queues.
There is no functional change.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rename the variable "tsrq" to "tccr_mask", as we are passing a
TCCR mask to the ravb_wait() function.
There is no functional change.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for retrieving stats information for GbEthernet.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RZ/G2L E-MAC supports carrier counters.
Add a carrier_counter hw feature bit to struct ravb_hw_info
to add this feature only for RZ/G2L.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fill out the ravb_rx_gbeth() function to support RZ/G2L.
This patch also renames ravb_rcar_rx to ravb_rx_rcar to be
consistent with the naming convention used in sh_eth driver.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fill out the ravb_rx_ring_format_gbeth() function to support RZ/G2L.
This patch also renames ravb_rx_ring_format to ravb_rx_ring_format_rcar
to be consistent with the naming convention used in sh_eth driver.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fill out the ravb_rx_ring_free_gbeth() function to support RZ/G2L.
This patch also renames ravb_rx_ring_free to ravb_rx_ring_free_rcar
to be consistent with the naming convention used in sh_eth driver.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fill out the ravb_alloc_rx_desc_gbeth() function to support RZ/G2L.
This patch also renames ravb_alloc_rx_desc to ravb_alloc_rx_desc_rcar
to be consistent with the naming convention used in sh_eth driver.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The R-Car AVB-DMAC has a maximum RX buffer size of 2K, whereas on RZ/G2L
it is 8K. We need to allow changing the MTU within the limit
of the maximum size of a descriptor.
Add a rx_max_buf_size variable to struct ravb_hw_info to handle
this difference.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use the ALIGN macro to calculate the value of max_rx_len.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix the following build/link error by adding a dependency on the CRC32
routines:
ld: drivers/net/ethernet/arc/emac_main.o: in function `arc_emac_set_rx_mode':
emac_main.c:(.text+0xb11): undefined reference to `crc32_le'
The crc32_le() call comes through the ether_crc_le() call in
arc_emac_set_rx_mode().
[v2: moved the select to ARC_EMAC_CORE; the Makefile is a bit confusing,
but the error comes from emac_main.o, which is part of the arc_emac module,
which in turn is enabled by CONFIG_ARC_EMAC_CORE. Note that arc_emac is
different from emac_arc...]
Fixes: 775dd682e2 ("arc_emac: implement promiscuous mode and multicast filtering")
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Link: https://lore.kernel.org/r/20211012093446.1575-1-vegard.nossum@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The enum dbg_grc_params enumerator list starts from 0.
When grc_param is declared as enum dbg_grc_params, (grc_param < 0) is
always false, so remove the check of this expression.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Acked-by: Shai Malin <smalin@marvell.com>
Link: https://lore.kernel.org/r/20211012074645.12864-1-sakiwit@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For those architectures which do not define _HAVE_ARCH_IPV6_CSUM, we need
to include ip6_checksum.h, which provides the csum_ipv6_magic() function.
Fixes: fb8629e2cb ("net: enetc: add support for software TSO")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20211012121358.16641-1-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Don't try to unregister the devlink if it hasn't been registered
yet. This bit of error cleanup code got missed in the recent
devlink registration changes.
Fixes: 7911c8bd54 ("ionic: Move devlink registration to be last devlink command")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20211012231520.72582-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As explained here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
DSA tagging protocol drivers cannot depend on symbols exported by switch
drivers, because this creates a circular dependency that breaks module
autoloading.
The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function
exported by the common ocelot switch lib. This function looks at
OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the
DSA tag, for PTP timestamping (the command: one-step/two-step, and the
TX timestamp identifier).
None of that requires deep insight into the driver; it is quite
stateless, as it only depends upon the skb->cb. So let's make it a
static inline function and put it in include/linux/dsa/ocelot.h, a
file that despite its name is used by the ocelot switch driver for
populating the injection header too - since commit 40d3f295b5 ("net:
mscc: ocelot: use common tag parsing code with DSA").
With that function declared as static inline, its body is expanded
inside each call site, so the dependency is broken and the DSA tagger
can be built without the switch library, upon which the felix driver
depends.
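The shape of the fix, roughly (the body below is only an approximation of
the real helper, which derives REW_OP from the PTP command and timestamp ID
stored in OCELOT_SKB_CB(skb)):

/* include/linux/dsa/ocelot.h */
static inline u32 ocelot_ptp_rew_op(struct sk_buff *skb)
{
	u8 ptp_cmd = OCELOT_SKB_CB(skb)->ptp_cmd;
	u32 rew_op = ptp_cmd;

	/* Everything needed lives in skb->cb, so the tagger can expand
	 * this inline without importing a symbol from the switch library.
	 */
	if (ptp_cmd == IFH_REW_OP_TWO_STEP_PTP)
		rew_op |= OCELOT_SKB_CB(skb)->ts_id << 3;

	return rew_op;
}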
Fixes: 39e5308b32 ("net: mscc: ocelot: support PTP Sync one-step timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The sad reality is that when a PTP frame with a TX timestamping request
is transmitted, it isn't guaranteed that it will make it all the way to
the wire (due to congestion inside the switch), that a timestamp will
be taken by the hardware and placed in the timestamp FIFO, and that an
IRQ will be raised for it.
The implication is that if enough PTP frames are silently dropped by the
hardware such that the timestamp ID has rolled over, it is possible to
match a timestamp to an old skb.
Furthermore, nobody will match on the real skb corresponding to this
timestamp, since we stupidly matched on a previous one that was stale in
the queue, and stopped there.
So PTP timestamping will be broken and there will be no way to recover.
It looks like the hardware parses the sequenceID from the PTP header,
and also provides that metadata for each timestamp. The driver currently
ignores this, but it shouldn't.
As an extra resiliency measure, do the following:
- check whether the PTP sequenceID also matches between the skb and the
timestamp, treat the skb as stale otherwise and free it
- if we see a stale skb, don't stop there and try to match an skb one
more time, chances are there's one more skb in the queue with the same
timestamp ID, otherwise we wouldn't have ever found the stale one (it
is by timestamp ID that we matched it).
While this does not prevent PTP packet drops, it at least prevents
the catastrophic consequences of incorrect timestamp matching.
Since we already call ptp_classify_raw in the TX path, save the result
in the skb->cb of the clone, and just use that result in the interrupt
code path.
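A simplified sketch of the resulting matching logic (seqid_of() stands in
for reading the sequenceID that the TX path saved in the clone's skb->cb):

struct sk_buff *skb, *tmp, *skb_match = NULL;
unsigned long flags;

spin_lock_irqsave(&port->tx_skbs.lock, flags);
skb_queue_walk_safe(&port->tx_skbs, skb, tmp) {
	if (OCELOT_SKB_CB(skb)->ts_id != ts_id)
		continue;

	if (seqid_of(skb) != fifo_seqid) {
		/* Stale: the frame for this skb never got a timestamp
		 * and its ts_id was reused. Drop it and keep searching,
		 * a newer skb with the same ts_id may still be queued.
		 */
		__skb_unlink(skb, &port->tx_skbs);
		kfree_skb(skb);
		continue;
	}

	__skb_unlink(skb, &port->tx_skbs);
	skb_match = skb;
	break;
}
spin_unlock_irqrestore(&port->tx_skbs.lock, flags);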
Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It appears that Ocelot switches cannot timestamp non-PTP frames.
I tested this using the isochron program at:
https://github.com/vladimiroltean/tsn-scripts
with the result that the driver increments the ocelot_port->ts_id
counter as expected, puts it in the REW_OP, but the hardware seems to
not timestamp these packets at all, since no IRQ is emitted.
Therefore check whether we are sending PTP frames, and refuse to
populate REW_OP otherwise.
Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When skb_match is NULL, it means we received a PTP IRQ for a timestamp
ID that the kernel has no idea about, since there is no skb in the
timestamping queue with that timestamp ID.
This is a grave error and not something to just "continue" over.
So print a big warning in case this happens.
Also, move the check above ocelot_get_hwtimestamp(), there is no point
in reading the full 64-bit current PTP time if we're not going to do
anything with it anyway for this skb.
Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
PTP packets with 2-step TX timestamp requests are matched to packets
based on the egress port number and a 6-bit timestamp identifier.
All PTP timestamps are held in a common FIFO that is 128 entry deep.
This patch ensures that back-to-back timestamping requests cannot exceed
the hardware FIFO capacity. If that happens, simply send the packets
without requesting a TX timestamp to be taken (in the case of felix,
since the DSA API has a void return code in ds->ops->port_txtstamp) or
drop them (in the case of ocelot).
I've moved the ts_id_lock from a per-port basis to a per-switch basis,
because we need separate accounting for both the per-port and per-switch
numbers of PTP frames in flight. And since we need locking to inc/dec the
per-switch counter,
that also offers protection for the per-port counter and hence there is
no reason to have a per-port counter anymore.
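In outline, the admission check described above looks like this (field and
constant names are assumptions based on the description, not necessarily
the driver's):

spin_lock(&ocelot->ts_id_lock);

if (ocelot_port->ptp_skbs_in_flight == OCELOT_MAX_PTP_ID ||
    ocelot->ptp_skbs_in_flight == OCELOT_PTP_FIFO_DEPTH) {
	spin_unlock(&ocelot->ts_id_lock);
	/* felix: transmit without requesting a timestamp;
	 * ocelot: drop the frame instead.
	 */
	return -EBUSY;
}

ocelot_port->ptp_skbs_in_flight++;
ocelot->ptp_skbs_in_flight++;

spin_unlock(&ocelot->ts_id_lock);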
Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At present, there is a problem when user space bombards a port with PTP
event frames which have TX timestamping requests (or when a tc-taprio
offload is installed on a port, which delays the TX timestamps by a
significant amount of time). The driver will happily roll over the 2-bit
timestamp ID and this will cause incorrect matches between an skb and
the TX timestamp collected from the FIFO.
The Ocelot switches have a 6-bit PTP timestamp identifier, and the value
63 is reserved, so that leaves identifiers 0-62 to be used.
The timestamp identifiers are selected by the REW_OP packet field, and
are actually shared between CPU-injected frames and frames which match a
VCAP IS2 rule that modifies the REW_OP. The hardware supports
partitioning between the two uses of the REW_OP field through the
PTP_ID_LOW and PTP_ID_HIGH registers, and by default reserves the PTP
IDs 0-3 for CPU-injected traffic and the rest for VCAP IS2.
The driver does not use VCAP IS2 to set REW_OP for 2-step timestamping,
and it also writes 0xffffffff to both PTP_ID_HIGH and PTP_ID_LOW in
ocelot_init_timestamp() which makes all timestamp identifiers available
to CPU injection.
Therefore, we can make use of all 63 timestamp identifiers, which should
allow more timestampable packets to be in flight on each port. This is
only part of the solution, more issues will be addressed in future changes.
Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit a0c76345e3 ("devlink: disallow reload operation during device
cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload
operation from racing with device probe/dismantle.
After recent changes to move devlink_register() to the end of device
probe and devlink_unregister() to the beginning of device dismantle,
these races can no longer happen. Reload operations will be denied if
the devlink instance is unregistered and devlink_unregister() will block
until all in-flight operations are done.
Therefore, remove these devlink_reload_{enable,disable}() APIs.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A multiport slave device doesn't support devlink reload, so instead of
complicating the initialization flow with devlink_reload_enable(), which
will be removed in the next patch, don't set the DEVLINK_F_RELOAD feature
bit for such devices.
This fixes an issue where reload counters were exposed (and equal to zero)
for a mode that is not supported at all.
Fixes: d89ddaae17 ("net/mlx5: Disable devlink reload for multi port slave device")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce a new devlink call to set the feature mask that controls devlink
behavior during the device initialization phase, after devlink_alloc()
has already been called.
This allows us to set reload ops based on a device property which
is not known at the beginning of driver initialization.
For the sake of simplicity, this API lacks any type of locking and
needs to be called before devlink_register() to make sure that no
parallel access to the ops is possible at this stage.
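Expected usage in a driver probe path, sketched (the multiport check is a
placeholder for whatever device property is discovered during init, and
exact signatures may differ by kernel version):

devlink = devlink_alloc(&my_devlink_ops, sizeof(struct my_priv), dev);

/* ... device init, at some point the relevant property is known ... */

if (!multiport_slave)				/* placeholder condition */
	devlink_set_features(devlink, DEVLINK_F_RELOAD);

/* devlink_set_features() has no locking, so it must happen before this */
devlink_register(devlink);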
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Declaring struct devlink in a general header invites situations where
internal fields are accidentally used by driver authors. In order to
reduce such possibilities, let's reduce the namespace exposure of
struct devlink.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace dev->driver_name() by dev_driver_string() for the corresponding
struct device. This is a step toward removing pci_dev->driver.
[bhelgaas: split to separate patch]
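The substitution has this shape (an illustrative example, not a specific
driver's code):

- strlcpy(drvinfo->driver, pdev->driver->name, sizeof(drvinfo->driver));
+ strlcpy(drvinfo->driver, dev_driver_string(&pdev->dev),
+         sizeof(drvinfo->driver));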
Link: https://lore.kernel.org/r/20211004125935.2300113-8-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Replace dev->driver_name() by dev_driver_string() for the corresponding
struct device. This is a step toward removing pci_dev->driver.
[bhelgaas: split to separate patch]
Link: https://lore.kernel.org/r/20211004125935.2300113-8-u.kleine-koenig@pengutronix.de
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Replace dev->driver_name() by dev_driver_string() for the corresponding
struct device. This is a step toward removing pci_dev->driver.
[bhelgaas: split to separate patch]
Link: https://lore.kernel.org/r/20211004125935.2300113-8-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Replace dev->driver_name() by dev_driver_string() for the corresponding
struct device. This is a step toward removing pci_dev->driver.
[bhelgaas: split to separate patch]
Link: https://lore.kernel.org/r/20211004125935.2300113-8-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Commit 846d6da1fc ("net/mlx5e: Fix division by 0 in
mlx5e_select_queue") makes mlx5e_build_nic_params assign a non-zero
initial value to priv->num_tc_x_num_ch, so that mlx5e_select_queue
doesn't fail with division by 0 if called before the first activation of
channels. However, the initialization flow of representors doesn't call
mlx5e_build_nic_params, so this bug can still happen with representors.
This commit fixes the bug by adding the missing assignment to
mlx5e_build_rep_params.
Fixes: 846d6da1fc ("net/mlx5e: Fix division by 0 in mlx5e_select_queue")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Due to current HW arch limitations, RX-FCS (scattering FCS frame field
to software) and RX-port-timestamp (improved timestamp accuracy on the
receive side) can't work together.
RX-port-timestamp is not controlled by the user and it is enabled by
default when supported by the HW/FW.
This patch sets RX-port-timestamp to the opposite of the RX-FCS configuration.
Fixes: 102722fc68 ("net/mlx5e: Add support for RXFCS feature flag")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Before this patch, mlx5 representors advertised the
NETIF_F_VLAN_CHALLENGED bit. This could lead to missing features when
using reps with vxlan/bridge and possibly other virtual interfaces,
since such interfaces inherit this bit and block vlan usage in their
topology.
Example:
$ip link add dev bridge type bridge
# add representor interface to the bridge
$ip link set dev pf0hpf master
$ip link add link bridge name vlan10 type vlan id 10 protocol 802.1q
Error: 8021q: VLANs not supported on device.
Reps are perfectly capable of handling vlan traffic even though they
don't implement the vlan_{add,kill}_vid ndos; hence, remove the
NETIF_F_VLAN_CHALLENGED advertisement.
Fixes: cb67b83292 ("net/mlx5e: Introduce SRIOV VF representors")
Reported-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Do not allow configurations of MQPRIO channel mode that do not
fully define and utilize the channels' txqs.
Fixes: ec60c4581b ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
For dependencies in the following patches.
* mellanox/mlx5-next:
net/mlx5: Add priorities for counters in RDMA namespaces
net/mlx5: Add ifc bits to support optional counters
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
I recently added a variable called addr to tulip_init_one(),
but for sparc there's already a variable with that name halfway
through the function. Rename it to fix the build.
Fixes: ca87931755 ("ethernet: tulip: remove direct netdev->dev_addr writes")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 4dd0d5c33c ("ice: add lock around Tx timestamp tracker flush")
added a lock around the Tx timestamp tracker flow which is used to
clean up any leftover SKBs and prepare for device removal.
This lock is problematic because it is being held around a call to
ice_clear_phy_tstamp. The clear function takes a mutex to send a PHY
write command to firmware. This could lead to a deadlock if the mutex
actually sleeps, and causes the following warning on a kernel with
preemption debugging enabled:
[ 715.419426] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:573
[ 715.427900] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3100, name: rmmod
[ 715.435652] INFO: lockdep is turned off.
[ 715.439591] Preemption disabled at:
[ 715.439594] [<0000000000000000>] 0x0
[ 715.446678] CPU: 52 PID: 3100 Comm: rmmod Tainted: G W OE 5.15.0-rc4+ #42 bdd7ec3018e725f159ca0d372ce8c2c0e784891c
[ 715.458058] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
[ 715.468483] Call Trace:
[ 715.470940] dump_stack_lvl+0x6a/0x9a
[ 715.474613] ___might_sleep.cold+0x224/0x26a
[ 715.478895] __mutex_lock+0xb3/0x1440
[ 715.482569] ? stack_depot_save+0x378/0x500
[ 715.486763] ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.494979] ? kfree+0xc1/0x520
[ 715.498128] ? mutex_lock_io_nested+0x12a0/0x12a0
[ 715.502837] ? kasan_set_free_info+0x20/0x30
[ 715.507110] ? __kasan_slab_free+0x10b/0x140
[ 715.511385] ? slab_free_freelist_hook+0xc7/0x220
[ 715.516092] ? kfree+0xc1/0x520
[ 715.519235] ? ice_deinit_lag+0x16c/0x220 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.527359] ? ice_remove+0x1cf/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.535133] ? pci_device_remove+0xab/0x1d0
[ 715.539318] ? __device_release_driver+0x35b/0x690
[ 715.544110] ? driver_detach+0x214/0x2f0
[ 715.548035] ? bus_remove_driver+0x11d/0x2f0
[ 715.552309] ? pci_unregister_driver+0x26/0x250
[ 715.556840] ? ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.564799] ? __do_sys_delete_module.constprop.0+0x2d8/0x4e0
[ 715.570554] ? do_syscall_64+0x3b/0x90
[ 715.574303] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 715.579529] ? start_flush_work+0x542/0x8f0
[ 715.583719] ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.591923] ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.599960] ? wait_for_completion_io+0x250/0x250
[ 715.604662] ? lock_acquire+0x196/0x200
[ 715.608504] ? do_raw_spin_trylock+0xa5/0x160
[ 715.612864] ice_sbq_rw_reg+0x1e6/0x2f0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.620813] ? ice_reset+0x130/0x130 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.628497] ? __debug_check_no_obj_freed+0x1e8/0x3c0
[ 715.633550] ? trace_hardirqs_on+0x1c/0x130
[ 715.637748] ice_write_phy_reg_e810+0x70/0xf0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.646220] ? do_raw_spin_trylock+0xa5/0x160
[ 715.650581] ? ice_ptp_release+0x910/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.658797] ? ice_ptp_release+0x255/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.667013] ice_clear_phy_tstamp+0x2c/0x110 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.675403] ice_ptp_release+0x408/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.683440] ice_remove+0x560/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.691037] ? _raw_spin_unlock_irqrestore+0x46/0x73
[ 715.696005] pci_device_remove+0xab/0x1d0
[ 715.700018] __device_release_driver+0x35b/0x690
[ 715.704637] driver_detach+0x214/0x2f0
[ 715.708389] bus_remove_driver+0x11d/0x2f0
[ 715.712489] pci_unregister_driver+0x26/0x250
[ 715.716857] ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.724637] __do_sys_delete_module.constprop.0+0x2d8/0x4e0
[ 715.730210] ? free_module+0x6d0/0x6d0
[ 715.733963] ? task_work_run+0xe1/0x170
[ 715.737803] ? exit_to_user_mode_loop+0x17f/0x1d0
[ 715.742509] ? rcu_read_lock_sched_held+0x12/0x80
[ 715.747215] ? trace_hardirqs_on+0x1c/0x130
[ 715.751401] do_syscall_64+0x3b/0x90
[ 715.754981] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 715.760033] RIP: 0033:0x7f4dfe59000b
[ 715.763612] Code: 73 01 c3 48 8b 0d 6d 1e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 1e 0c 00 f7 d8 64 89 01 48
[ 715.782357] RSP: 002b:00007ffe8c891708 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 715.789923] RAX: ffffffffffffffda RBX: 00005558a20468b0 RCX: 00007f4dfe59000b
[ 715.797054] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005558a2046918
[ 715.804189] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 715.811319] R10: 00007f4dfe603ac0 R11: 0000000000000206 R12: 00007ffe8c891940
[ 715.818455] R13: 00007ffe8c8920a3 R14: 00005558a20462a0 R15: 00005558a20468b0
Notice that this is the only case where we use the lock in this way. In
the cleanup kthread and work kthread the lock is only taken around the
bit accesses. This was done intentionally to avoid this kind of issue.
The way the lock is used, we only protect ordering of bit sets vs bit
clears. The Tx writers in the hot path don't need to be protected
against the entire kthread loop. The Tx queue threads only need to
ensure that they do not re-use an index that is currently in use. The
cleanup loop does not need to block all new set bits, since it will
re-queue itself if new timestamps are present.
Fix the tracker flow so that it uses the same flow as the standard
cleanup thread. In addition, ensure the in_use bitmap actually gets
cleared properly.
This fixes the warning and also avoids the potential deadlock that might
have occurred otherwise.
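In outline, the reworked flush mirrors the cleanup thread: the lock only
covers the bitmap and skb pointer, while the firmware/PHY access happens
outside of it (a simplified sketch, not the literal driver code):

for_each_set_bit(idx, tx->in_use, tx->len) {
	struct sk_buff *skb;

	spin_lock(&tx->lock);
	skb = tx->tstamps[idx].skb;
	tx->tstamps[idx].skb = NULL;
	clear_bit(idx, tx->in_use);
	spin_unlock(&tx->lock);

	/* may sleep on the firmware mutex, so it must not hold tx->lock */
	ice_clear_phy_tstamp(hw, block, idx);
	dev_kfree_skb_any(skb);
}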
Fixes: 4dd0d5c33c ("ice: add lock around Tx timestamp tracker flush")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I missed removing i from the array index when converting
from a loop to a direct copy.
Fixes: ca87931755 ("ethernet: tulip: remove direct netdev->dev_addr writes")
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.
So, take the opportunity to refactor the hnae_handle structure to switch
the last member to a flexible array, changing the code accordingly. Also,
fix the comment in the hnae_vf_cb structure to note that the ae_handle
member must be the last member.
Then, use the struct_size() helper to do the arithmetic instead of the
argument "size + count * size" in the kzalloc() function.
This code was detected with the help of Coccinelle and audited and fixed
manually.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
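The general shape of the conversion (a generic sketch, not the hns code
itself):

struct handle {
	int qnum;
	struct queue q[];		/* flexible array member, must be last */
};

/* before: open-coded arithmetic in the allocator argument can overflow */
h = kzalloc(sizeof(*h) + qnum * sizeof(struct queue), GFP_KERNEL);

/* after: struct_size() saturates to SIZE_MAX on overflow, so the
 * allocation fails instead of silently becoming too small
 */
h = kzalloc(struct_size(h, q, qnum), GFP_KERNEL);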
Signed-off-by: Len Baker <len.baker@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following coccicheck warning:
drivers/net/ethernet/mscc/ocelot.c:474:duplicated argument to & or |
drivers/net/ethernet/mscc/ocelot.c:476:duplicated argument to & or |
drivers/net/ethernet/mscc/ocelot_net.c:1627:duplicated argument
to & or |
The DEV_CLOCK_CFG_MAC_TX_RST argument is duplicated here.
One of them should be DEV_CLOCK_CFG_MAC_RX_RST.
Fixes: e6e12df625 ("net: mscc: ocelot: convert to phylink")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RED "mark" qevent can be offloaded under similar conditions as the RED
"early_drop" qevent. Therefore recognize its binding type in the
TC_SETUP_BLOCK handler and translate to the right SPAN trigger, with the
right set of supported actions.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One block can be bound to several qevents. The qevent type that the block
is bound to determines which actions make sense in a given context. In the
particular case of mlxsw, trap cannot be offloaded on a RED mark qevent,
because the trap contract specifies that the packet is dropped in the HW
datapath, and the HW trigger that the action is offloaded to is always
forwarding the packet (in addition to marking it).
Therefore keep track of which actions are permissible at each binding
block. When an attempt is made to bind a certain action at a binding point
where it is not supported, bounce the request.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following patches will configure the MLXSW_SP_SPAN_TRIGGER_ECN
mirroring trigger. This trigger is considered "egress", unlike the
previously-offloaded _EARLY_DROP. Add a helper to spectrum_span,
mlxsw_sp_span_trigger_is_ingress(), to classify triggers to ingress and
egress. Pass result of this instead of hardcoding true when calling
mlxsw_sp_span_analyzed_port_get()/_put().
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function will report a new failure in the following patches.
Pass extack so that the failure is explicable.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-10-11
Wojciech Drewek says:
This series adds support for adding/removing advanced switch filters
in ice driver. Advanced filters are building blocks for HW acceleration
of TC orchestration. Add ndo_setup_tc callback implementation for PF and
VF port representors (when device is configured in switchdev mode).
Define dummy packet headers to allow adding advanced rules in HW.
Supported headers, and thus filters, are:
- MAC + IPv4 + UDP
- MAC + VLAN + IPv4 + UDP
- MAC + IPv4 + TCP
- MAC + VLAN + IPv4 + TCP
- MAC + IPv6 + UDP
- MAC + VLAN + IPv6 + UDP
- MAC + IPv6 + TCP
- MAC + VLAN + IPv6 + TCP
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The rx_buf_alloc_fail counter wasn't getting updated.
Fixes: 433e274b8f ("gve: Add stats for gve.")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Half pages are just used for small enough packets. This change allows
this to also apply for systems with pages larger than 4 KB.
Fixes: 02b0e0c18b ("gve: Rx Buffer Recycling")
Signed-off-by: Jordan Kim <jrkim@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Configure XPS when adding tx queues to the notification blocks.
Fixes: dbdaa67540 ("gve: Move some static functions to a common file")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't always reset the driver on a TX timeout. Attempt to
recover by kicking the queue in case an IRQ was missed.
Fixes: 9e5f7d26a4 ("gve: Add workqueue and reset support")
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the TX queue is full, attempt to process enough TX completions
to avoid stalling the queue.
Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Tao Liu <xliutaox@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a pagecnt bias field to the rx buffer info struct to eliminate
the need to increment the atomic page ref count on every pass in the
rx hotpath.
Also prefetch two packet pages ahead.
Fixes: ede3fcf5ec ("gve: Add support for raw addressing to the rx path")
Signed-off-by: Yanchun Fu <yangchun@google.com>
Signed-off-by: Nathan Lewis <npl@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use napi_complete_done to allow for the use of gro_flush_timeout.
Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tc-flower support for VF port representor devices.
Implement the ndo_setup_tc callback for TC HW offload on VF port
representor devices. Implement both methods: adding and deleting
tc-flower flows.
Mark the NETIF_F_HW_TC bit in the net device's feature set to enable the
TC offload infrastructure for the port representor.
Implement a TC filter replay function, required to restore filter settings
while the switchdev configuration is rebuilt.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement ndo_setup_tc net device callback for TC HW offload on PF device.
ndo_setup_tc provides support for HW offloading various TC filters.
Add support for configuring the following filter with tc-flower:
- default L2 filters (src/dst mac addresses, ethertype, VLAN)
- variations of L3, L3+L4, L2+L3+L4 filters using advanced filters
(including ipv4 and ipv6 addresses).
Allow for adding/removing TC flows when PF device is configured in
eswitch switchdev mode. Two types of actions are supported at the
moment: FLOW_ACTION_DROP and FLOW_ACTION_REDIRECT.
Co-developed-by: Priyalee Kushwaha <priyalee.kushwaha@intel.com>
Signed-off-by: Priyalee Kushwaha <priyalee.kushwaha@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There is no way to change the default lan_en and lb_en flags while
adding a new rule. Add a function that allows changing these flags
on a rule identified by rule id and recipe id.
The function checks whether the rule is present on the regular rules list
or the advanced rules list and calls the appropriate function to update
the rule entry.
As rules with the ICE_SW_LKUP_DFLT recipe aren't tracked in a list,
implement a function which updates the flags based only on the rule id,
without searching for the rule.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Change ICE_SW_LKUP_LAST to ICE_MAX_NUM_RECIPES, as there can now also
be recipes other than the default ones.
Free all structures created for advanced recipes in the cleanup function.
Write a function to clean up the structures allocated for advanced rule info.
Signed-off-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
To remove an advanced rule, the same protocol list as used when adding
should be passed to the function. Based on this information, the list of
advanced rules is searched to find the correct rule id.
Remove the advanced rule if it forwards to only one VSI. If it forwards
to a list of VSIs, remove only the input VSI from that list.
Introduce a function to remove a rule by id. It is used in case a rule
needs to be removed even if it forwards to a list of VSIs.
Allow removing all advanced rules from a particular VSI. This is useful
when rebuilding the VSI path.
Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Shivanshu Shukla <shivanshu.shukla@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Define dummy packet headers to allow adding advanced rules in HW. Such a
header is used as an admin queue command parameter for adding a rule.
The firmware will extract the correct fields and will use them in lookups.
Define each supported packet header and the offsets to the words used in
the recipe.
Supported headers:
- MAC + IPv4 + UDP
- MAC + VLAN + IPv4 + UDP
- MAC + IPv4 + TCP
- MAC + VLAN + IPv4 + TCP
- MAC + IPv6 + UDP
- MAC + VLAN + IPv6 + UDP
- MAC + IPv6 + TCP
- MAC + VLAN + IPv6 + TCP
Add code for creating an advanced rule. The rule needs to match a defined
dummy packet; if it doesn't, return an error, which means that this type
of rule isn't currently supported.
The first step in adding an advanced rule is searching for an advanced
recipe matching this kind of rule. If one doesn't exist, a new recipe is
created. The dummy packet has to be filled with the correct header field
values from the rule definition. It will be used to do the lookup in HW.
Support searching for an existing advanced rule entry. It is used in case
the same rule is added on a different VSI. In this case, instead of
creating a new rule, the existing one should be updated with a refreshed
VSI list.
Add initialization of the prof_res_bm_init flag to zero so that the
possible resources for the field vectors (fv) in the files can be
initialized.
Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
These changes introduce code for creating advanced recipes for the
switch in hardware.
There are a couple of recipes already defined in the HW. They apply to
matching on basic protocol headers, like MAC, VLAN, MACVLAN,
ethertype or direction (promiscuous), etc. If the user wants to match on
other protocol headers (e.g. IP address, src/dst port, etc.) or on a
different variation of an already supported protocol, a new, more
complex recipe needs to be created. That new recipe is referred to as an
'advanced recipe', and the filtering rule created on top of that recipe
is called an 'advanced rule'.
One recipe can have up to 5 words, but the first word is always reserved
for match on switch id, so the driver can define up to 4 words for one
recipe. To support recipes with more words up to 5 recipes can be
chained, so 20 words can be programmed for look up.
The input to the add recipe function is a list of protocols to support.
Based on this list the correct profile is chosen; a correct profile is
one that contains all protocol types from the list. Each profile has up
to 48 field vector words, and each of these words has a protocol id and
an offset. These two fields need to match the input data of the add
recipe function. If the correct profile can't be found, the function
returns an error.
The next step after finding the correct profile is grouping words into
groups. One group can have up to 4 words. This is done to simplify
sending recipes to HW (because a recipe also can have up to 4 words).
In case of chaining (when the lookup consists of more than 4 words), the
last recipe will always have the results from the previous recipes used
as words.
A recipe-to-profile map is used to store information about which profiles
are associated with each recipe. This map is an array of 64 elements (max
number of recipes), and each element is a 256-bit bitmap (max number of
profiles).
A profile-to-recipe map is used to store information about which recipes
are associated with each profile. This map is an array of 256 elements
(max number of profiles), and each element is a 64-bit bitmap (max number
of recipes).
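The two maps boil down to bitmap arrays of roughly this shape (the struct
and field names here are illustrative, not quoted from the driver):

#define MAX_NUM_RECIPES		64
#define MAX_NUM_PROFILES	256

struct recipe_profile_maps {
	/* for each recipe: the profiles it is associated with */
	DECLARE_BITMAP(recipe_to_profile[MAX_NUM_RECIPES], MAX_NUM_PROFILES);
	/* for each profile: the recipes associated with it */
	DECLARE_BITMAP(profile_to_recipe[MAX_NUM_PROFILES], MAX_NUM_RECIPES);
};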
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement functions to manage profiles and field vectors in hardware.
In hardware, there are up to 256 profiles and each of these profiles can
have 48 field vector words. Each field vector word is described by
protocol id and offset in the packet. To add a new recipe, all used
profiles need to be searched. If a profile contains all the required
protocol ids and offsets from the recipe, it can be used. The driver has
to add this recipe-to-profile association to tell hardware that the newly
added recipe is going to be associated with this profile.
The number of used profiles depends on the package. To avoid searching
across unused profiles, the max profile id value is calculated at init
time. A profile is considered unused when all field vector words in the
profile are invalid (protocol id 0xff and offset 0x1ff).
Profiles are read from the package section ICE_SID_FLD_VEC_SW. Empty
field vector words can be used for recipe results. Store all unused field
vector words in prof_res_bm. It is a 256-element array (max number of
profiles); each element is a 48-bit bitmap (max number of field vector
words).
For now, support only non-tunnel profiles type.
Co-developed-by: Grishma Kotecha <grishma.kotecha@intel.com>
Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add code to manage recipes and profiles at the admin queue layer.
Allow the driver to add a new recipe and update an existing one. Get
recipe and get recipe-to-profile association are mostly used in the code
that updates existing recipes.
Only default recipes can be updated. An update is done by reading recipes
from HW, changing their params and calling the add recipe command.
Support following admin queue commands:
- ice_aqc_opc_add_recipe (0x0290) - create a recipe with protocol
header information and other details that determine how this recipe
filter works
- ice_aqc_opc_recipe_to_profile (0x0291) - associate a switch recipe
to a profile
- ice_aqc_opc_get_recipe (0x0292) - get details of an existing recipe
- ice_aqc_opc_get_recipe_to_profile (0x0293) - get a recipe associated
with profile ID
Define ICE_AQC_RES_TYPE_RECIPE resource type to hold a switch
recipe. It is needed when a new switch recipe needs to be created.
Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix for this build problem:
drivers/net/ethernet/sun/ldmvsw.c: In function 'vsw_alloc_netdev':
drivers/net/ethernet/sun/ldmvsw.c:243:2: error: expected ';' before 'sprintf'
sprintf(dev->name, "vif%d.%d", (int)handle, (int)port_id);
^~~~~~~
Fixes: a7639279c9 ("ethernet: sun: remove direct netdev->dev_addr writes")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20211011173424.7743035d@canb.auug.org.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch separates the logic of configuring the hardware maximum
transmit frame size and the receive frame size. This simplifies the
logic to calculate the receive buffer size and allows using CQE
descriptors of different sizes.
Also, an additional skb_shared_info structure's worth of space is
allocated for each receive buffer pointer given to hardware, which is
not necessary. Hence change the size calculation to remove the size of
skb_shared_info. Add a check for array out of bounds while adding
fragments to the network stack.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.
Remove the redundant 'flush_workqueue()' calls.
This was generated with coccinelle:
@@
expression E;
@@
- flush_workqueue(E);
destroy_workqueue(E);
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com> #mlx*
Signed-off-by: David S. Miller <davem@davemloft.net>
The open-coded check dev->priv_flags & IFF_RXFH_CONFIGURED already has a
helper defined in netdevice.h. Use the netif_is_rxfh_configured()
function instead of the open code. This patch doesn't change the logic.
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Override the automatic AdminQ error message in order to
capture the potential No Space message when we hit the
max vlan limit, and add additional messaging to detail
what filter failed.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The AdminQ handler has an error handler that automatically prints
an error message when the request has failed. However, there are
situations where the caller can expect that it might fail and has
an alternative strategy, thus may not want the error message sent
to the log, such as hitting -ENOSPC when adding a new vlan id.
We add a new interface to the AdminQ API to allow overriding the
default behavior, and an interface for using the standard error
message formatting.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add vlans to the existing rx_filter_sync mechanics currently
used for managing mac filters.
Older versions of our firmware had no enforced limits on the
number of vlans that the LIF could request, but requesting large
numbers of vlans caused issues in FW memory management, so an
arbitrary limit was added in the FW. The FW now returns -ENOSPC
when it hits that limit, which the driver needs to handle.
Unfortunately, the FW doesn't advertise the vlan id limit,
as it does with mac filters, so the driver won't know the
limit until it bumps into it. We'll grab the current vlan id
count and use that as the limit from there on and thus prevent
getting any more -ENOSPC errors.
Just as is done for the mac filters, the driver puts the device
into promiscuous mode when -ENOSPC is seen for vlan ids, and
the driver will track the vlans that aren't synced to the FW.
When vlans are removed, the driver will retry the un-synced
vlans. If all outstanding vlans are synced, the promiscuous
mode will be disabled.
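A condensed sketch of the limit discovery and promiscuous fallback
described above (the field and flag names are assumptions for
illustration, not the driver's):

  /* Called with the result of a vlan filter add request to the FW. */
  static void ionic_vlan_add_done(struct ionic_lif *lif, int err)
  {
      if (err != -ENOSPC)
          return;

      /* The FW doesn't advertise its vlan id limit, so learn it the
       * first time we bump into it and stop asking beyond it.
       */
      if (!lif->max_vlans)
          lif->max_vlans = lif->nvlans;

      /* Mirror the mac filter behavior: stay promiscuous while any
       * vlan remains unsynced; the sync work drops out of promisc
       * once the list drains.
       */
      set_bit(IONIC_LIF_F_VLAN_OVERFLOW, lif->state);
  }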
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to the filter add, make a generic filter delete.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for adding vlan overflow management, rework
the ionic_lif_addr_add() function to something a little more
generic that can be used for other filter types.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for enhancing vlan filter management,
add a filter search routine that can figure out for
itself which type of filter search is needed.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The overflow flags really aren't useful and we don't need lif
struct elements to track them.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The routines that add and delete mac addresses from the
firmware really should be in the file with the rest of
the filter management. This simply moves the functions
with no logic changes.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dump the filter list to debugfs - includes the device-assigned
filter id and the sync'd-to-hardware status.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error code is missing in this code scenario, add the error code
'-EINVAL' to the return value 'rc'.
Eliminate the follow smatch warning:
drivers/net/ethernet/qlogic/qed/qed_main.c:1298 qed_slowpath_start()
warn: missing error code 'rc'.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: d51e4af5c2 ("qed: aRFS infrastructure support")
Signed-off-by: chongjiapeng <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bridging, and possibly other upper stack gizmos, adds the
lower device's netdev->dev_addr to its own uc list, and
then requests it be deleted when the upper bridge device is
removed. This delete request also happens when the bridge's
vlan_filtering is enabled and then disabled.
Bonding has a similar behavior with the uc list, but since it
also uses set_mac to manage netdev->dev_addr, it doesn't have
the same failure case.
Because we store our netdev->dev_addr in our uc list, we need
to ignore the delete request from dev_uc_sync so as to not
lose the address and all hope of communicating. Note that
ndo_set_mac_address is expressly changing netdev->dev_addr,
so no limitation is set there.
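The guard amounts to an address compare in the uc-sync delete callback;
a sketch (the callback and the filter helper names are assumptions for
illustration):

  #include <linux/etherdevice.h>

  static int ionic_addr_del(struct net_device *netdev, const u8 *addr)
  {
      /* Ignore requests to delete our own station address; a bridge
       * or similar upper device may withdraw it from its uc list,
       * and losing it here would leave the port unreachable.
       */
      if (ether_addr_equal(netdev->dev_addr, addr))
          return 0;

      return ionic_lif_filter_del(netdev_priv(netdev), addr); /* assumed */
  }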
Fixes: 2a654540be ("ionic: Add Rx filter and rx_mode ndo support")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
8390 contains a lot of loops assigning netdev->dev_addr
byte by byte. Convert what's possible directly to
eth_hw_addr_set(), use local buf in other places.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consify temporary variables pointing to netdev->dev_addr.
A few places need local storage but it's a pretty simple conversion
overall. Note that macaddr[] is an array of ints, so we need
to keep the loops.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consify the casts of netdev->dev_addr.
Convert to eth_hw_addr_set() where possible.
Use local buffers in a number of places.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
tg3 does various forms of direct writes to netdev->dev_addr.
Use a local buffer. Make sure local buffer is aligned since
eth_platform_get_mac_address() may call ether_addr_copy().
tg3_get_device_address() returns whenever it finds a method
that found a valid address. Instead of modifying all the exit
points pass the buffer from the outside and commit the address
in the caller.
Constify the argument of the set addr helper.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
forcedeth writes to dev_addr byte by byte, make it use
a local buffer instead. Commit the changes with
eth_hw_addr_set() at the end.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add additional flow steering priorities in the RDMA namespace.
This allows adding flow counters to count filtered RDMA traffic and then
continue processing in the regular RDMA steering flow.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
mlxsw is using helpers to get / set fields in messages exchanged with
the device. It is possible that some fields are only set or only get.
This causes LLVM to emit warnings such as the following when building
with W=1 [1]:
drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c:2022:1: warning: unused function 'mlxsw_afa_sampler_mirror_agent_get'
The fact that some fields are only set or only get is very much
intentional and not indicative of functions that need to be removed.
Therefore, annotate the item helpers with '__maybe_unused' to suppress
these warnings.
[1] https://lkml.org/lkml/2021/9/29/685
Cc: Nathan Chancellor <nathan@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20211008132315.90211-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the redundant check of (tg3_asic_rev(tp) == ASIC_REV_5705) after
it is checked to be true.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20211008063147.1421-1-sakiwit@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch fixes below compliation error in case CONFIG_QED_SRIOV not
defined.
drivers/net/ethernet/qlogic/qed/qed_dev.c: In function
‘qed_fw_err_handler’:
drivers/net/ethernet/qlogic/qed/qed_dev.c:2390:3: error: implicit
declaration of function ‘qed_sriov_vfpf_malicious’; did you mean
‘qed_iov_vf_task’? [-Werror=implicit-function-declaration]
qed_sriov_vfpf_malicious(p_hwfn, &data->err_data);
^~~~~~~~~~~~~~~~~~~~~~~~
qed_iov_vf_task
drivers/net/ethernet/qlogic/qed/qed_dev.c: In function
‘qed_common_eqe_event’:
drivers/net/ethernet/qlogic/qed/qed_dev.c:2410:10: error: implicit
declaration of function ‘qed_sriov_eqe_event’; did you mean
‘qed_common_eqe_event’? [-Werror=implicit-function-declaration]
return qed_sriov_eqe_event(p_hwfn, opcode, echo, data,
^~~~~~~~~~~~~~~~~~~
qed_common_eqe_event
Fixes: fe40a830dc ("qed: Update qed_hsi.h for fw 8.59.1.0")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for driver level TSO in the enetc driver using
the TSO API.
Besides using the usual tso_build_hdr() and tso_build_data(), this
specific implementation also has to compute the checksum, both IP and
L4, for each resulting segment. This is because the ENETC controller
does not support Tx checksum offload, which is needed in order to take
advantage of TSO.
With the workaround for the ENETC MDIO erratum in place, the Tx path of
the driver is forced to lock/unlock for each skb sent. This is why, even
though we are computing the checksum by hand, we see the following
improvement in TCP termination on the LS1028A SoC, on a single A72 core
running at 1.3GHz:
before: 1.63 Gbits/sec
after: 2.34 Gbits/sec
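As a concrete illustration of the per-segment checksum step mentioned
above, here is a simplified sketch for an IPv4/TCP segment using the
kernel's generic checksum helpers (the function name and buffer layout
are assumptions, not the driver's exact code):

  #include <net/ip.h>
  #include <net/tcp.h>
  #include <net/checksum.h>

  /* hdr points at the segment's IP header built by tso_build_hdr();
   * data/data_len describe this segment's payload.
   */
  static void enetc_tso_csum(void *hdr, int ip_hdr_len, int tcp_hdr_len,
                             const void *data, int data_len)
  {
      struct iphdr *iph = hdr;
      struct tcphdr *tcph = hdr + ip_hdr_len;
      __wsum csum;

      /* IP header checksum for this (already re-sized) segment */
      iph->check = 0;
      iph->check = ip_fast_csum(iph, iph->ihl);

      /* L4 checksum: TCP header + payload + pseudo header */
      tcph->check = 0;
      csum = csum_partial(tcph, tcp_hdr_len, 0);
      csum = csum_partial(data, data_len, csum);
      tcph->check = csum_tcpudp_magic(iph->saddr, iph->daddr,
                                      tcp_hdr_len + data_len,
                                      IPPROTO_TCP, csum);
  }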
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is just a preparation patch for software TSO in the enetc driver.
Unfortunately, ENETC does not support Tx checksum offload, which would
normally render TSO, even software TSO, impossible.
Declare NETIF_F_HW_CSUM as part of the feature set and resolve it at
driver level using skb_csum_hwoffload_help() so that we can move forward
and also add support for TSO in the next patch.
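A minimal sketch of how the advertised-but-software checksum can be
resolved at transmit time (the xmit function name here is illustrative;
skb_csum_hwoffload_help() is the kernel helper mentioned above):

  #include <linux/netdevice.h>
  #include <linux/skbuff.h>

  static netdev_tx_t enetc_start_xmit_sketch(struct sk_buff *skb,
                                             struct net_device *ndev)
  {
      /* NETIF_F_HW_CSUM is advertised but the hardware can't do it,
       * so resolve the checksum in software before building the Tx
       * descriptors. Passing 0 as features forces a SW checksum.
       */
      if (skb->ip_summed == CHECKSUM_PARTIAL &&
          skb_csum_hwoffload_help(skb, 0)) {
          dev_kfree_skb_any(skb);
          return NETDEV_TX_OK;
      }

      /* ... map buffers and ring the Tx doorbell as usual ... */
      return NETDEV_TX_OK;
  }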
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dwmac 3.40a is an old IP version that can be found on SPEAr3xx SoCs.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some old IPs do not provide the hardware feature register.
On these IPs, this register reads as 0x00000000.
Older driver versions handled this case, but a regression came with
commit f10a6a3541 ("stmmac: rework get_hw_feature function").
Indeed, this commit removed the return value from dma->get_hw_feature().
This return value was used to indicate the validity of the retrieved
information and was used later on in stmmac_hw_init() to override
priv->plat data only if this hardware feature information was valid.
This patch restores the return code in ->get_hw_feature() in order
to indicate the hardware feature validity and overrides priv->plat
data only if the hardware feature information is valid.
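The shape of the fix, in condensed form (register and struct names as in
the dwmac1000 variant, but abbreviated; the real callback decodes many
more capability bits):

  #include <linux/io.h>

  /* Return 0 only when the hardware feature register is implemented;
   * old IPs read it back as all-zero, in which case the caller keeps
   * the priv->plat defaults instead of overriding them.
   */
  static int dwmac1000_get_hw_feature(void __iomem *ioaddr,
                                      struct dma_features *dma_cap)
  {
      u32 hw_cap = readl(ioaddr + DMA_HW_FEATURE);

      if (!hw_cap)
          return -EINVAL;

      dma_cap->mbps_10_100 = (hw_cap & DMA_HW_FEAT_MIISEL);
      /* ... decode the remaining capability bits ... */
      return 0;
  }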
Fixes: f10a6a3541 ("stmmac: rework get_hw_feature function")
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new platform_get_ethdev_address() helper for the cases
where dev->dev_addr is passed in directly as the destination.
@@
expression dev, net;
@@
- eth_platform_get_mac_address(dev, net->dev_addr)
+ platform_get_ethdev_address(dev, net)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-10-07
Michal Swiatkowski says:
The following patch series introduces basic switchdev model
support in ice driver. Implement the following blocks of
switchdev framework:
- VF port representors creation
- control plane VSI definition
- exception path (a. k. a. "slow-path") - to allow a virtual
switch or linux bridge to receive any packet that doesn't match
any hw filter
- link state management of virtual ports
- query virtual port statistics
Hardware offload support in switchdev mode is out of scope of
this patchset. Devlink interface is used to toggle between
switchdev and legacy (the default) modes of the driver.
---
Note: This series includes the use of enum ice_status; however, we have
patches in our queue to remove it from the driver [1]. We are working
through the patches that precede the removal series.
[1] https://patchwork.ozlabs.org/project/intel-wired-lan/list/?series=265957
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce the following ethtool operations for a VF's representor:
- get_drvinfo
- get_strings
- get_ethtool_stats
- get_sset_count
- get_link
In all cases, existing operations were used with minor
changes which allow us to detect whether the ethtool op was called for
a representor. Only VF VSI stats will be available for a representor.
Implement ndo_get_stats64 for port representor. This will update
VF VSI stats and read them.
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Slow path means allowing a packet to go from the uplink to a
representor and from a representor to the correct VF on the Rx side,
and from a VF to its representor and on to the uplink on the Tx side.
To accomplish this, the driver has to set the correct Tx descriptor.
When a packet is sent from a representor to a VF, the destination
should be set to the VF VSI. When a packet is sent from the uplink
port, the destination should be the uplink, to bypass the switch
infrastructure and send the packet outside.
On the Rx side the driver should check the source VSI field in the Rx
descriptor and, based on that, forward the packet to the correct
netdev. To allow this there is a target netdevs table in the control
plane VSI struct.
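On the Rx side the lookup reduces to indexing the target netdev table by
the source VSI from the descriptor; a sketch of the idea (the field and
q_vector access below are assumptions for illustration):

  /* Hand a packet received on the control plane VSI to the netdev of
   * the port representor that owns the source VSI.
   */
  static void ice_eswitch_rx_fwd(struct ice_vsi *ctrl_vsi,
                                 struct sk_buff *skb, u16 src_vsi)
  {
      struct net_device *dst;

      if (src_vsi >= ctrl_vsi->num_target_netdevs)   /* assumed field */
          goto drop;

      dst = ctrl_vsi->target_netdevs[src_vsi];       /* assumed field */
      if (!dst)
          goto drop;

      skb->dev = dst;
      napi_gro_receive(&ctrl_vsi->q_vectors[0]->napi, skb);
      return;

  drop:
      dev_kfree_skb_any(skb);
  }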
Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
As resetting all VFs behaves mostly like creating new VFs, the
eswitch infrastructure also has to be recreated. The easiest way to
do that is to rebuild the eswitch after resetting the VFs.
Implement helper functions to start and stop all representor
queues. This is used to disable traffic on port representors.
In the rebuild path:
- NAPI has to be disabled
- the eswitch environment has to be set up
- new port representors have to be created, because the old
ones had pointers to VFs that no longer exist
- the new control plane VSI ring should be remapped
- NAPI has to be enabled
- rxdid has to be set to FLEX_NIC_2, because this descriptor id
supports source_vsi, which is needed on control plane VSI queues
- port representor queues have to be started
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The only way to enable switchdev is to create VFs while the eswitch
mode is set to switchdev. Check that the correct mode is set and
enable switchdev in the function which creates VFs.
Disable switchdev when the user changes the number of VFs to 0.
Changing the eswitch mode back to legacy while VFs created in
switchdev mode exist isn't allowed.
As switchdev takes care of managing filter rules, adding new
rules on a VF is blocked.
In case of a VF reset, the driver has to update the pointer in the
ice_repr struct, because VSI related things can change after the reset.
Co-developed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
A new type of VSI has to be defined for the switchdev control plane
VSI. The number of allocated Tx and Rx queues has to be equal to the
number of VFs, because each port representor should have one Tx and
one Rx queue.
Also, to not increase the number of used IRQs too much, the control
plane VSI uses only one q_vector and handles all queues in one IRQ.
To allow handling all queues in one IRQ, a new function to clean the
MSI-X vector for the eswitch was introduced. This function schedules
NAPI for each representor instead of scheduling it only for one like
the normal clean IRQ function does.
Only one additional MSI-X vector has to be requested. Always try to
request it in the ice_ena_msix_range function.
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The switchdev environment has to be set up when the user creates VFs
and the eswitch mode is switchdev. Release is done when the user
deletes all VFs.
The data path in this implementation is based on the control plane
VSI. This VSI is used to pass traffic from port representors to the
corresponding VFs and vice versa. A default Tx rule has to be added to
forward packets to the control plane VSI. This will redirect packets
from VFs which don't match other rules to the control plane VSI.
On the Rx side a default rule is added on the uplink VSI to receive
all traffic that doesn't match other rules. When setting up the
switchdev environment all other rules from VFs should be removed.
Packets to VFs will be forwarded by the control plane VSI.
As a VF without any mac rules can't send any packet because of the
antispoof mechanism, VSI antispoof should be turned off on each VF.
To send a packet from a representor to the correct VSI, the
destination VSI field in the Tx descriptor will have to be filled in.
Allow that by setting the destination override bit in the control
plane VSI security config.
Packets from VFs will be received on the control plane VSI. The driver
should decide to which netdev to forward each packet. The decision is
made based on the src_vsi field from the descriptor. There is a target
netdev list in the control plane VSI struct which selects the netdev
based on the src_vsi number.
Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There is no way to change the default lan_en and lb_en flags while
adding a new rule. Add a function that allows changing these flags
on the ICE_SW_LKUP_DFLT recipe and any rule id.
lan_en allows a packet to go outside if the rule is matched. Clearing
this bit will prevent the packet from being sent outside.
lb_en allows a packet to be forwarded to another VSI. Clearing
this bit will prevent the packet from being forwarded to another VSI.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement functions to make setting the VSI security config easier.
The main function ice_update_security fills the security section field
and checks for errors when updating the VSI. The rest of the functions
are responsible for correctly filling the config according to user
expectations.
This helper is needed because the destination override is located in
this section. The driver has to set this bit to allow steering Tx
packets to a VSI based on the value in the Tx descriptors.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In switchdev mode the driver shouldn't add MAC, VLAN and promisc
filters on iavf demand, but should return success so as not to
break the normal iavf flow.
Achieve that by creating a table of function pointers with the
default functions used to parse iavf commands. When parsing an
iavf command, call the correct function from the table instead of
calling the function directly.
When port representors are being created, change the functions
in the table to new ones that behave correctly for the switchdev
purpose (ignoring new filters), as sketched below.
Change back to the default ops when the representors are being
removed.
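A condensed sketch of the ops-table swap (all names below are
illustrative assumptions; the point is the indirection, not the exact
driver symbols):

  struct ice_vc_ops {
      int (*add_mac)(struct ice_vf *vf, u8 *msg);
      int (*add_vlan)(struct ice_vf *vf, u8 *msg);
      int (*cfg_promisc)(struct ice_vf *vf, u8 *msg);
  };

  /* Default table: the usual handlers that program filters for the VF */
  static const struct ice_vc_ops ice_vc_default_ops = {
      .add_mac     = ice_vc_handle_add_mac,
      .add_vlan    = ice_vc_handle_add_vlan,
      .cfg_promisc = ice_vc_handle_cfg_promisc,
  };

  /* Switchdev table: acknowledge the request but program nothing, so
   * the iavf driver keeps working while the eswitch owns filtering.
   */
  static int ice_vc_repr_ack_only(struct ice_vf *vf, u8 *msg)
  {
      return ice_vc_respond_ok(vf);     /* assumed reply helper */
  }

  static const struct ice_vc_ops ice_vc_repr_ops = {
      .add_mac     = ice_vc_repr_ack_only,
      .add_vlan    = ice_vc_repr_ack_only,
      .cfg_promisc = ice_vc_repr_ack_only,
  };

  /* Swapped when representors are created / removed:
   *   vf->vc_ops = &ice_vc_repr_ops;  /  vf->vc_ops = &ice_vc_default_ops;
   */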
Co-developed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
A port representor is used to manage a VF from the host side. To allow
this, each created representor registers a netdevice with a random hw
address. A devlink port is also created for each representor.
The port representor name is based on the switch id, or is managed by
the devlink core if the devlink port was registered successfully.
The open and stop ndo ops are implemented to allow managing the VF
link state. The link state is tracked in the VF struct.
Struct ice_netdev_priv is extended with a pointer to the representor.
This is needed to get the correct representor from the netdev struct,
mostly in ndo calls.
Implement helper functions to check if a given netdev is a port
representor netdev (ice_is_port_repr_netdev) and to get the
representor from a netdev (ice_netdev_to_repr).
As the driver will mostly create or destroy port representors for all
VFs instead of a single one, write functions to add and remove a
representor for each VF.
The representor struct contains a pointer to the source VSI (the VSI
configured on the VF), a backpointer to the VF, a backpointer to the
netdev, a q_vector pointer and a metadata_dst which will be used in
the data path.
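The two helpers mentioned above can be little more than an ndo-table
compare plus a priv lookup; a sketch (assuming the representor's
net_device_ops table is called ice_repr_netdev_ops and that
ice_netdev_priv carries the new repr pointer):

  static bool ice_is_port_repr_netdev(struct net_device *netdev)
  {
      return netdev && netdev->netdev_ops == &ice_repr_netdev_ops;
  }

  static struct ice_repr *ice_netdev_to_repr(struct net_device *netdev)
  {
      struct ice_netdev_priv *np = netdev_priv(netdev);

      return np->repr;
  }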
Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Keeping the devlink port inside the VSI data structure causes some
issues. Since the VF VSI is released during reset, we have to
unregister the devlink port and register it again every time a reset
is triggered. With the new changes in the devlink API this might cause
deadlock issues. After calling
devlink_port_register/devlink_port_unregister the devlink API is going
to take the rtnl_mutex. That's an issue when a VF reset is triggered in
netlink operation context (like setting the VF MAC address or VLAN),
because rtnl_lock is already taken by netlink. Another rtnl_lock call
from the devlink API results in a deadlock.
By moving the devlink port to the PF/VF we avoid creating/destroying it
during reset. With this patch, devlink ports are created during
ice_probe and destroyed during ice_remove for the PF, and created
during ice_repr_add and destroyed during ice_repr_rem for VFs.
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Write the set and get eswitch mode functions used by devlink
ops. Use the new pf struct member eswitch_mode to track the current
eswitch mode in the driver.
Changing the eswitch mode is only allowed when there are no
VFs created.
Create a new file for eswitch related code.
Add the config flag ICE_SWITCHDEV to allow the user to choose whether
switchdev support should be enabled or disabled.
Use case examples:
- show current eswitch mode ('legacy' is the default one)
[root@localhost]# devlink dev eswitch show pci/0000:03:00.1
pci/0000:03:00.1: mode legacy
- move to 'switchdev' mode
[root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode
switchdev
[root@localhost]# devlink dev eswitch show pci/0000:03:00.1
pci/0000:03:00.1: mode switchdev
- create 2 VFs
[root@localhost]# echo 2 > /sys/class/net/ens4f1/device/sriov_numvfs
- unsuccessful attempt to change eswitch mode while VFs are created
[root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy
devlink answers: Operation not supported
- destroy VFs
[root@localhost]# echo 0 > /sys/class/net/ens4f1/device/sriov_numvfs
- restore 'legacy' mode
[root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy
[root@localhost]# devlink dev eswitch show pci/0000:03:00.1
pci/0000:03:00.1: mode legacy
Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Convert a few drivers to device_get_ethdev_address(),
saving a few LoC.
The check whether the addr is valid in netsec is superfluous,
device_get_ethdev_address() already checks that (in
fwnode_get_mac_address()).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new device_get_ethdev_address() helper for the cases
where dev->dev_addr is passed in directly as the destination.
@@
expression dev, np;
@@
- device_get_mac_address(np, dev->dev_addr, ETH_ALEN)
+ device_get_ethdev_address(np, dev)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All callers pass in ETH_ALEN and the function itself
will return -EINVAL for any other address length.
Just assume it's ETH_ALEN like all other mac address
helpers (nvm, of, platform).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fwnode_get_mac_address() and device_get_mac_address()
return a pointer to the buffer that was passed to them
on success or NULL on failure. None of the callers
care about the actual value, only if it's NULL or not.
These semantics differ from of_get_mac_address() which
returns an int so to avoid confusion make the device
helpers return an errno.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new of_get_ethdev_address() helper for the cases
where dev->dev_addr is passed in directly as the destination.
@@
expression dev, np;
@@
- of_get_mac_address(np, dev->dev_addr)
+ of_get_ethdev_address(np, dev)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rob suggests moving of_net.c from under drivers/of/ somewhere
into the networking code.
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the transceiver module extended state and sub-state
added in previous patch. The extended state is meant to describe link
issues related to transceiver modules.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement support for ethtool_ops::get_module_power_mode and
ethtool_ops::set_module_power_mode.
The get operation is implemented using the Management Cable IO and
Notifications (MCION) register that reports the operational power mode
of the module and its presence. In case a module is not present, its
operational power mode is not reported to ethtool and user space. If not
set before, the power mode policy is reported as "high", which is the
default on Mellanox systems.
The set operation is implemented using the Port Module Memory Map
Properties (PMMP) register. The register instructs the device's firmware
to transition a plugged-in module to / out of low power mode by writing
to its memory map.
When the power mode policy is set to 'auto', a module will not
transition to low power mode as long as any ports using it are
administratively up. Example:
# devlink port split swp11 count 4
# ethtool --set-module swp11s0 power-mode-policy auto
$ ethtool --show-module swp11s0
Module parameters for swp11s0:
power-mode-policy auto
power-mode low
# ip link set dev swp11s0 up
# ip link set dev swp11s1 up
$ ethtool --show-module swp11s0
Module parameters for swp11s0:
power-mode-policy auto
power-mode high
# ip link set dev swp11s1 down
$ ethtool --show-module swp11s0
Module parameters for swp11s0:
power-mode-policy auto
power-mode high
# ip link set dev swp11s0 down
$ ethtool --show-module swp11s0
Module parameters for swp11s0:
power-mode-policy auto
power-mode low
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add the Management Cable IO and Notifications register. It will be used
in subsequent patches to retrieve the power mode status of a module and
whether a module is present in a cage.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add the Port Module Memory Map Properties register. It will be used to
set the power mode of a module in subsequent patches.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The crit_lock mutex could be unlocked twice as reported here
https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210823/025525.html
Remove the superfluous unlock. Technically the problem was already
present before 5ac49f3c27, as that commit only replaced the locking
primitive with no functional change.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 5ac49f3c27 ("iavf: use mutexes for locking of critical sections")
Fixes: bac8486116 ("iavf: Refactor the watchdog state machine")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When VSI setup failed in i40e_probe() as part of the PF switch setup,
the driver was trying to free misc IRQ vectors in
i40e_clear_interrupt_scheme and produced a kernel Oops:
Trying to free already-free IRQ 266
WARNING: CPU: 0 PID: 5 at kernel/irq/manage.c:1731 __free_irq+0x9a/0x300
Workqueue: events work_for_cpu_fn
RIP: 0010:__free_irq+0x9a/0x300
Call Trace:
? synchronize_irq+0x3a/0xa0
free_irq+0x2e/0x60
i40e_clear_interrupt_scheme+0x53/0x190 [i40e]
i40e_probe.part.108+0x134b/0x1a40 [i40e]
? kmem_cache_alloc+0x158/0x1c0
? acpi_ut_update_ref_count.part.1+0x8e/0x345
? acpi_ut_update_object_reference+0x15e/0x1e2
? strstr+0x21/0x70
? irq_get_irq_data+0xa/0x20
? mp_check_pin_attr+0x13/0xc0
? irq_get_irq_data+0xa/0x20
? mp_map_pin_to_irq+0xd3/0x2f0
? acpi_register_gsi_ioapic+0x93/0x170
? pci_conf1_read+0xa4/0x100
? pci_bus_read_config_word+0x49/0x70
? do_pci_enable_device+0xcc/0x100
local_pci_probe+0x41/0x90
work_for_cpu_fn+0x16/0x20
process_one_work+0x1a7/0x360
worker_thread+0x1cf/0x390
? create_worker+0x1a0/0x1a0
kthread+0x112/0x130
? kthread_flush_work_fn+0x10/0x10
ret_from_fork+0x1f/0x40
The problem is that at that point the misc IRQ vectors
were not allocated yet, so we get a call trace saying that the
driver is trying to free already-free IRQ vectors.
Add a check in i40e_clear_interrupt_scheme for __I40E_MISC_IRQ_REQUESTED
PF state before calling i40e_free_misc_vector. This state is set only if
misc IRQ vectors were properly initialized.
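The guard described above, in sketch form (simplified to the relevant
lines; the rest of the teardown is elided):

  static void i40e_clear_interrupt_scheme(struct i40e_pf *pf)
  {
      /* Only free the misc vector if it was actually requested;
       * probe can fail before ever reaching that point.
       */
      if (test_bit(__I40E_MISC_IRQ_REQUESTED, pf->state))
          i40e_free_misc_vector(pf);

      /* ... release the remaining vectors and MSI-X resources ... */
  }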
Fixes: c17401a1dd ("i40e: use separate state bit for miscellaneous IRQ setup")
Reported-by: PJ Waskiewicz <pwaskiewicz@jumptrading.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The loop in i40e_get_capabilities can never end. The problem is that
although i40e_aq_discover_capabilities returns with an error if there's
a firmware problem, the returned error is not checked. There is a check for
pf->hw.aq.asq_last_status but that value is set to I40E_AQ_RC_OK on most
firmware problems.
When i40e_aq_discover_capabilities encounters a firmware problem, it will
encounter the same problem on its next invocation. As a result, the loop
becomes endless. We hit this with I40E_ERR_ADMIN_QUEUE_TIMEOUT but looking
at the code, it can happen with a range of other firmware errors.
I don't know what the correct behavior should be: whether the firmware
should be retried a few times, or whether pf->hw.aq.asq_last_status should
be always set to the encountered firmware error (but then it would be
pointless and can be just replaced by the i40e_aq_discover_capabilities
return value). However, the current behavior with an endless loop under the
rtnl mutex(!) is unacceptable and Intel has not submitted a fix, although we
explained the bug to them 7 months ago.
This may not be the best possible fix but it's better than hanging the whole
system on a firmware bug.
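One way to bound the loop is to bail out on any admin queue error other
than the "buffer too small" case that legitimately requires a retry; a
loop fragment sketching that shape (variable setup and the buffer
(re)allocation are elided, and the argument names are approximations):

  do {
      /* allocate a buffer of buf_len bytes for the reply (elided) */
      err = i40e_aq_discover_capabilities(&pf->hw, buf, buf_len,
                                          &needed, list_type, NULL);
      /* free the buffer (elided) */

      /* A genuine AQ failure will repeat forever, so give up instead
       * of retrying; only the "buffer too small" case is retried
       * with a larger buffer.
       */
      if (err && pf->hw.aq.asq_last_status != I40E_AQ_RC_ENOMEM)
          return -ENODEV;

      buf_len = needed;
  } while (pf->hw.aq.asq_last_status == I40E_AQ_RC_ENOMEM);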
Fixes: 56a62fc868 ("i40e: init code and hardware support")
Tested-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There is a spelling mistake in a DP_VERBOSE message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Static checkers and runtime checkers such as KMSAN will complain that
we do not initialize the last 6 bytes of "cb_priv". The caller only
uses the first two bytes so it doesn't cause a runtime issue. Still
worth fixing though.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The err variable is checked for true or false a few lines above. When
!err is checked again, it always evaluates to true. Therefore we should
skip this check.
We should also group the adjacent statements together for readability.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the setting of the filter-sync-needed bit to the error
case in the filter add routine to be sure we're checking the
live filter status rather than a copy of the pre-sync status.
Fixes: 969f843946 ("ionic: sync the filters in the work task")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each tx queue maintains a 64-bit counter for bytes; there is
no reason to truncate this to 32 bits (or the reason has not been
documented).
Fixes: 24aeb56f2d ("gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yangchun Fu <yangchun@google.com>
Cc: Kuo Zhao <kuozhao@google.com>
Cc: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gve_get_stats() can report wrong numbers if/when u64_stats_fetch_retry()
returns true.
What is needed here is to sample values in temporary variables,
and only use them after each loop is ended.
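The corrected pattern reads into temporaries inside the retry loop and
only folds them into the output once a consistent snapshot has been
obtained; a per-ring sketch using the kernel's u64_stats helpers (the
ring field names are approximations):

  #include <linux/u64_stats_sync.h>

  u64 packets, bytes;
  unsigned int start;

  do {
      start = u64_stats_fetch_begin(&rx->statss);
      /* Read into temporaries only; s->rx_packets must not be
       * touched until the snapshot is known to be consistent.
       */
      packets = rx->rpackets;
      bytes = rx->rbytes;
  } while (u64_stats_fetch_retry(&rx->statss, start));

  s->rx_packets += packets;
  s->rx_bytes += bytes;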
Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Catherine Sullivan <csully@google.com>
Cc: Sagi Shahar <sagis@google.com>
Cc: Jon Olson <jonolson@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Luigi Rizzo <lrizzo@google.com>
Cc: Jeroen de Borst <jeroendb@google.com>
Cc: Tao Liu <xliutaox@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ignored errors would result in a crash.
Fixes: ede3fcf5ec ("gve: Add support for raw addressing to the rx path")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prevent possible crashes when cleaning up after unsuccessful
initializations.
Fixes: 893ce44df5 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Signed-off-by: Tao Liu <xliutaox@google.com>
Signed-off-by: Catherine Sully <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qpl_map_size is rounded up to a multiple of sizeof(long), but the
number of qpls doesn't have to be.
Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation enables the PCS EEE feature on link
up, but the PCS EEE feature is not disabled on link down.
This patch makes sure the PCS EEE feature is disabled on link down.
Fixes: 656ed8b015 ("net: stmmac: fix EEE init issue when paired with EEE capable PHYs")
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert all Ethernet drivers from memcpy(... dev->addr_len)
to eth_hw_addr_set():
@@
expression dev, np;
@@
- memcpy(dev->dev_addr, np, dev->addr_len)
+ eth_hw_addr_set(dev, np)
In theory addr_len may not be ETH_ALEN, but we don't expect
non-Ethernet devices to live under this directory, and only
the following cases of setting addr_len exist:
- cxgb4 for mgmt device,
and the drivers which set it to ETH_ALEN: s2io, mlx4, vxge.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev->dev_addr will become const soon. Make sure all
functions which pass it around mark appropriate args
as const.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx4_en_u64_to_mac() takes the dev->dev_addr pointer and writes
to it byte by byte. It also clears the two bytes _after_ ETH_ALEN
which seems unnecessary. dev->addr_len is set to ETH_ALEN just
before the call.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx4_u64_to_mac() predates the common helper but doesn't
make the argument constant.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx4_mac_to_u64() predates and opencodes ether_addr_to_u64().
It doesn't make the argument constant so it'll be problematic
when dev->dev_addr becomes a const. Convert to the generic helper.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-updates-2021-10-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-10-04
Misc updates for mlx5 driver
1) Add TX max rate support for MQPRIO channel mode
2) Trivial TC action and modify header refactoring
3) TC support for accept action in fdb offloads
4) Allow single IRQ for PCI functions
5) Bridge offload: Pop PVID VLAN header on egress miss
Vlad Buslov says:
=================
With current architecture of mlx5 bridge offload it is possible for a
packet to match in ingress table by source MAC (resulting VLAN header push
in case of port with configured PVID) and then miss in egress table when
destination MAC is not in FDB. Due to the lack of hardware learning in
NICs, this, in turn, results packet going to software data path with PVID
VLAN already added by hardware. This doesn't break software bridge since it
accepts either untagged packets or packets with any provisioned VLAN on
ports with PVID, but can break ingress TC, if affected part of Ethernet
header is matched by classifier.
Improve compatibility with software TC by restoring the packet header on
egress miss. Effectively, this change implements atomicity of mlx5 bridge
offload implementation - packet is either modified and redirected to
destination port or appears unmodified in software.
=================
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the Ethernet controller DT node for an "mdio" subnode and use it
with of_mdiobus_register() when present. That allows specifying the MDIO
bus and its PHY devices in a standard DT based way.
This is required for BCM53573 SoC support. That family is sometimes
called Northstar (by marketing?) but is quite different from it. It uses
different CPU(s) and many different hw blocks.
One of the shared blocks in BCM53573 is the Ethernet controller. The
switch however is not SRAB accessible (as it is in Northstar) but is
MDIO attached.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Use info from DT if available
It allows describing for example a fixed link. It's more accurate than
just guessing there may be one (depending on a chipset).
2. Verify PHY ID before trying to connect PHY
PHY addr 0x1e (30) is special in Broadcom routers and means a switch
connected as an MDIO device instead of a real PHY. Don't try connecting
to it.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to cast the pointer; unlike memcpy(), eth_hw_addr_set()
does not take void *. The driver already casts &port->mac_addr
to u8 * in other places.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: a96d317fb1 ("ethernet: use eth_hw_addr_set()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to this patch the driver required two IRQs to function properly,
one IRQ for control and at least one IRQ for IO.
This requirement can be relaxed to one, as the driver now allows
sharing of IRQs, so control and IO EQs can share the same IRQ.
This is needed for high-scale numbers of VFs.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The control IRQ is the first IRQ vector. This complicates handling of
completion IRQs as we need to offset them by one.
In the next patch, there are scenarios where completion and control EQs
will share the same IRQ, for example: functions with a single IRQ. To
ease such scenarios, we shift the control IRQ to the end of the IRQ
array.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Create the lowest priority flow group in the egress table with a single
rule that matches on the special reg_c1 value that is set on ingress
VLAN push, with a single action that pops the VLAN. The flow destination
is the skip table, which is used to skip any further processing of the
packet in the FDB bridge priority.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
On ingress VLAN push also assign value 0x7FE to reg_c1 tunnel id+opts
bits (tunnel id 0, which is not a valid tunnel id, and option 0x7FE which
was reserved by one of the previous patches in the series). In a
following patch the reg value is matched on egress miss to restore the
packet to its
original state by removing the VLAN before passing it to the software data
path.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Following patches in series need to pop VLAN when packet misses on egress.
To reuse existing bridge VLAN pop handling code, extract it to dedicated
helpers mlx5_esw_bridge_pkt_reformat_vlan_pop_supported() and
mlx5_esw_bridge_pkt_reformat_vlan_pop_create().
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Several functions in bridge.c excessively obtain a pointer to the
parent eswitch instance by dereferencing br_offloads->esw on every
usage, and following patches in this series add even more usages of the
eswitch. Introduce a local
variable 'esw' and use it instead.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Support TC generic 'accept' action in mlx5 by introducing
MLX5_ESW_ATTR_FLAG_ACCEPT attribute flag. Flag has similar semantics to
existing MLX5_ESW_ATTR_FLAG_SLOW_PATH flag, however, dedicated flag is
required because existing 'slow path' flag can be flipped by tunneling
subsystem when neighbor changes state.
Introduce new helper function mlx5_esw_attr_flags_skip() to check whether
attribute flags for 'slow path' or 'accept' action are set and use it in
eswitch code instead of direct bit manipulation.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There is a use case where the local and remote VTEPs are in the same
host. Currently, the output ifindex is not specified when looking up the
encap route for offloads. So in this case, a local route is returned
and the route dev is lo.
Actual tunnel interface can be created with a parameter "dev" [1],
which specifies the physical device to use for tunnel endpoint
communication. Pass this parameter to driver when looking up encap
route for offloads. So that a unicast route will be returned.
[1] ip link add name vxlan1 type vxlan id 100 dev enp4s0f0 remote 1.1.1.1 dstport 4789
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reserve one more value from TC tunnel options range to be used by bridge
offload in following patches.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The parse fdb/nic actions functions parse the actions and then call
actions_match_supported() for a final check.
Move the related check in parse_tc_fdb_actions() into
actions_match_supported_fdb() for more organized code.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There will probably be more checks, some for nic flows, some for fdb
flows and some are shared checks. Split it for fdb and nic to avoid
the function getting too big.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Move mod hdr allocation chunk from parse_tc_fdb_actions() and
parse_tc_nic_actions() to a shared function.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Refactor sample unoffload to be symmetric to sample offload.
Use the existing del_post_rule() to release the post rule.
Also mlx5e_tc_sample_unoffload() should not return post_rule
which is NULL when post actions are supported.
Sample offload works with this NULL because many places in the
code use IS_ERR() instead of IS_ERR_OR_NULL() to check that the rule is
valid, and when a rule is detected as sample offload the code does not
use the rule. Let's be consistent and avoid returning NULL anyway and
return the pre rule, like in the CT case, which is not NULL.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Let the caller of mlx5e_open_txqsq() directly pass the SQ stats
structure pointer.
This replaces logic involving the qos_queue_group_id parameter,
and helps generalizing its role in the next patch.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Merge tag 'v5.15-rc4' into rdma.git for-next
Merged due to dependencies in following patches.
Conflict in drivers/infiniband/hw/hfi1/ipoib_tx.c resolved by hand to take
the %p change and txq stats rename together.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Update three drivers to use the new phylink_set_10g_modes() helper:
Cadence macb, Freescale DPAA2 and Marvell PP2.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the stats ID of an LL2 (light L2) queue exceeds the total number
of statistics counters, it may cause a system crash upon enabling
RDMA on all PFs.
This patch makes sure that the stats ID of the LL2 queue doesn't exceed
the max allowed value.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialize the 2 MSL timeout value used for the TCP TIME_WAIT state to
a non-zero default.
This patch also removes magic number from qedi/qedi_main.c.
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Nikolay Assa <nassa@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the TCP silly-window-syndrome timeout for the cases where the
initiator's small TCP window size prevents the FW from transmitting
packets on the connection. The timeout causes the FW to retransmit
window probes if needed, preventing an I/O stall if the initiator
ignores the first window probe.
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Nikolay Assa <nassa@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qed_debug features are updated to support FW version 8.59.1.0 along
with a few enhancements.
- Removal of _BB_K2 from register defines.
- Add new condition cond14.
- Add dump of new area sw-platform, epoch, iscsi_task_pages,
fcoe_task_pages, roce_task_pages and eth_task_pages.
- Introduced the new function qed_dbg_phy_size().
- Update in qed_mcp_nvm_rd_cmd() declaration.
- Allow QED to control init/exit at pf level.
- Dump partial "ILT-dump" if buffer size is not sufficient.
This patch also fixes the existing checkpatch warnings and few important
checks.
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GTT (Global translation table) is a fast-access window in the BAR into
the register space, which only maps certain register addresses.
This change helps enforce that only those addresses which are indeed
mapped by the GTT are being accessed through it.
Add the '_GTT' suffix to the IRO FW memory ("RAM") macros that
access the GTT-able region in FW memories ("RAM") and use the GTT
macros to access the RAM BAR from the drivers.
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qed_init_fw_func.c and qed_init_ops.c updated to support FW
version 8.59.1.0.
- Support 16-bit VPORT WFQ (weighted fair queueing) weights.
- Support WFQ (weighted fair queueing) weight per VPORT + TC.
- Support allocation of Tx PQs(physical queues) per PF,VF.
- Modify Global RL (rate limiter) upper bound configuration.
- Update FW operation functions.
- Update iro_arr[] array.
This patch also fixes the existing checkpatch warnings and few important
checks.
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qed_iro_hsi.h contains HSI changes related to storm memory access.
The existing code is based on hard-coded indices.
Use the enum as defined for FW HSI 8.59.1.0 instead of hard-coded
indices.
This patch also removes unnecessary header file inclusions.
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qed_hsi.h has been updated to support the new FW version 8.59.1.0
with the following changes.
- Updates FW HSI (Hardware Software interface) structures.
- Addition/update in function declaration and defines as per HSI.
- Add generic infrastructure for FW error reporting as part of
common event queue handling.
- Move malicious VF error reporting to FW error reporting
infrastructure.
- Move consolidation queue initialization from FW context to ramrod
message.
qed_hsi.h header file changes lead to change in many files to ensure
compilation.
This patch also fixes the existing checkpatch warnings and few important
checks.
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qed_mfw_hsi.h contains the HSI (Hardware Software Interface) changes
related to the management firmware. It has been updated to support the
new FW version 8.59.1.0 with the changes below.
- New defines for the VF bitmap.
- fec_mode and extended_speed defines updated in struct eth_phy_cfg.
- Updated structures lldp_system_tlvs_buffer_s, public_global,
public_port, public_func, drv_union_data and public_drv_mb,
with all dependent new structures.
- Updates in NVM-related structures and defines.
- Message defines added to enum drv_msg_code and fw_msg_code.
- Updated/added new defines.
This patch also fixes the existing checkpatch warnings and a few important
checks.
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The common_hsi.h has been updated for FW version 8.59.1.0 with the
changes below.
- FW and Tools version.
- New structures related to the search table and packet duplication.
- A structure for the doorbell address for legacy mode without DEM.
- Enhanced union rdma_eqe_data for RoCE Suspend Event Data.
- New defines.
This patch also fixes the existing checkpatch warnings and a few important
checks.
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qed_hsi.h is a huge header file containing HSI (Hardware Software
Interface) definitions for storm memory access, debug, general and
management firmware functionality. In order to better organize the HSI
definitions, this patch splits the code across multiple files, i.e.
- storm memory access HSI : qed_iro_hsi.h
- debug related HSI : qed_dbg_hsi.h
- management firmware HSI : qed_mfw_hsi.h
- general HSI : qed_hsi.h
In addition, this patch also fixes existing checkpatch warnings and a
few important checks.
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing qed/qede/qedr/qedi/qedf code uses chip-specific naming in
structures, functions, variables and defines in the FW HSI (Hardware
Software Interface).
The new FW version introduced a generic naming convention in the HSI,
in which the same code will be used across different versions for
simpler maintainability. It also eases providing support for new
features.
With this patch, the "_e4" and "e4_" prefixes and suffixes are no longer
needed and are removed.
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Reviewed-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes all the qed and qede kernel-doc warnings
according to the guidelines that are described in
Documentation/doc-guide/kernel-doc.rst.
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch caches the doorbell address directly in struct mlx4_en_tx_ring.
This removes the need to bring the whole struct mlx4_uar into CPU caches
in the fast path.
Note that mlx4_uar is not guaranteed to be on a local node,
because mlx4_bf_alloc() uses a single free list (priv->bf_list)
regardless of its node parameter.
This kind of change matters in the presence of light/moderate traffic.
Under high stress, this read-only line would be kept hot in caches.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
eth_hw_addr_set() takes a u8 pointer, like other
etherdevice helpers. Convert the few drivers which
require casts because they memcpy from "endian marked"
types.
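For illustration, a minimal sketch of the kind of conversion meant here (the struct and helper below are made up, not taken from any of the converted drivers):
#include <linux/etherdevice.h>

/* MAC address kept in "endian marked" storage, hence the cast. */
struct foo_hw {
	__le16 mac[3];
};

static void foo_set_mac(struct net_device *ndev, const struct foo_hw *hw)
{
	/* eth_hw_addr_set() expects a plain u8 pointer */
	eth_hw_addr_set(ndev, (const u8 *)hw->mac);
}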
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are manual conversions because we need to reach a member that is
inside an array in order to get the u8 pointer which eth_hw_addr_set()
expects.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the Chelsio drivers from memcpy() and ether_addr_copy() to
eth_hw_addr_set(). They lack the necessary includes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert Ethernet from ether_addr_copy() to eth_hw_addr_set():
@@
expression dev, np;
@@
- ether_addr_copy(dev->dev_addr, np)
+ eth_hw_addr_set(dev, np)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert all Ethernet drivers from memcpy(..., ETH_ALEN)
to eth_hw_addr_set():
@@
expression dev, np;
@@
- memcpy(dev->dev_addr, np, ETH_ALEN)
+ eth_hw_addr_set(dev, np)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VLAN TCI contains more than the VLAN ID, it also has the VLAN PCP
and Drop Eligibility Indicator.
If the ocelot driver is going to write the VLAN header inside the DSA
tag, it could just as well write the entire TCI.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the ocelot driver does support the 'vlan modify' action, but
in the ingress chain, and it is offloaded to VCAP IS1. This action
changes the classified VLAN before the packet enters the bridging
service, and the bridging works with the classified VLAN modified by
VCAP IS1.
That is good for some use cases, but there are others where the VLAN
must be modified at the stage of the egress port, after the packet has
exited the bridging service. One example is simulating IEEE 802.1CB
active stream identification filters ("active" means that the rule not
only matches on a packet flow, but is also able to change some headers).
For example, a stream is replicated on two egress ports, but
they must have different VLAN IDs on egress ports A and B.
This seems like a task for the VCAP ES0, but that currently only
supports pushing the ES0 tag A, which is specified in the rule. Pushing
another VLAN header is not what we want, but rather overwriting the
existing one.
It looks like when we push the ES0 tag A, it is actually possible to not
only take the ES0 tag A's value from the rule itself (VID_A_VAL), but
derive it from the following formula:
ES0_TAG_A = Classified VID + VID_A_VAL
Otherwise said, ES0_TAG_A can be used to increment with a given value
the VLAN ID that the packet was already classified to, and the packet
will have this value as an outer VLAN tag. This new VLAN ID value then
gets stripped on egress (or not) according to the value of the native
VLAN from the bridging service.
While the hardware will happily increment the classified VLAN ID for all
packets that match the ES0 rule, in practice this would be rather
insane, so we only allow this kind of ES0 action if the ES0 filter
contains a VLAN ID too, so as to restrict the matching on a known
classified VLAN. If we program VID_A_VAL with the delta between the
desired final VLAN (ES0_TAG_A) and the classified VLAN, we obtain the
desired behavior.
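As a rough numeric illustration of the delta programming described above (the values are hypothetical):
/* filter matches classified VLAN 100; egress port A must send VLAN 200 */
u16 classified_vid = 100;
u16 desired_vid = 200;
u16 vid_a_val = desired_vid - classified_vid;	/* ES0_TAG_A = classified VID + VID_A_VAL = 200 */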
It doesn't look like it is possible with the tc-vlan action to modify
the VLAN ID but not the PCP. In hardware it is possible to leave the PCP
to the classified value, but we unconditionally program it to overwrite
it with the PCP value from the rule.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the lif parameter for use in an error message, and
to better match the style of most of the function calls.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify the code a little by keeping the send_to_hw decision
inside of ionic_qcq_disable rather than in the callers. Also,
add ENXIO to the decision expression.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split the adminq wait into smaller polling periods in order
to watch for broken firmware and not have to wait for the full
adminq devcmd_timeout.
Generally, adminq commands take fewer than 2 msecs. If the
FW is busy they can take longer, but usually still under 100
msecs. We set the polling period to 100 msecs in order to
start snooping on FW status when a command is taking longer
than usual.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Widen the coverage of the queue_lock to be sure the lif init
and lif deinit actions are protected. This addresses a hang
seen when a Tx Timeout action was attempted at the same time
as a FW Reset was started.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move creation and deletion of the lif mutex out a level, to lif
creation and deletion rather than init and deinit.
This ensures that nothing gets hung waiting on the mutex while
the driver is clearing the lif during the fw_down/fw_up cycle.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the PCI connection is broken, reading the FW version string
will only get 0xff bytes, which shouldn't be printed. Check the
first byte and print only the first 4 bytes if it is not ASCII.
Also, add a limit to the string length printed when a valid
string is found, just in case it is not properly terminated.
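A minimal sketch of the described behavior, assuming a hypothetical buffer name and length define (not the actual ionic symbols):
#include <linux/ctype.h>
#include <linux/device.h>

#define FOO_FWVERS_LEN 32	/* hypothetical length limit */

static void foo_print_fw_version(struct device *dev, const u8 *fw_vers)
{
	if (!isprint(fw_vers[0]))
		/* likely 0xff garbage from a dead PCI link: dump 4 bytes as hex */
		dev_info(dev, "fw_version bytes: %02x %02x %02x %02x\n",
			 fw_vers[0], fw_vers[1], fw_vers[2], fw_vers[3]);
	else
		/* limit the length in case the string is not terminated */
		dev_info(dev, "fw_version %.*s\n", FOO_FWVERS_LEN, fw_vers);
}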
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
These debug stats are not really useful, their collection is
likely detrimental to performance, and they suck up a lot
of memory which never gets used if no one ever enables the
priv-flag to print them, so just remove these bits.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialize GbEthernet E-MAC found on RZ/G2L SoC.
This patch also renames ravb_set_rate to ravb_set_rate_rcar and
ravb_rcar_emac_init to ravb_emac_init_rcar to be consistent with
the naming convention used in sh_eth driver.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RZ/G2L supports half duplex mode.
Add a half_duplex hw feature bit to struct ravb_hw_info for
supporting half duplex mode for RZ/G2L.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
E-MAC on R-Car supports magic packet detection, whereas RZ/G2L
does not support this feature. Add magic_pkt to struct ravb_hw_info
and enable this feature only for R-Car.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
R-Car AVB-DMAC has 4 Transmit start request queues, whereas
RZ/G2L has only 1 Transmit start request queue.
Add a tsrq variable to struct ravb_hw_info to handle this
difference.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
R-Car supports the gPTP feature whereas RZ/G2L does not support it.
This patch excludes gPTP feature support for RZ/G2L.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialize GbEthernet DMAC found on RZ/G2L SoC.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RZ/G2L SoC has Gigabit Ethernet IP consisting of Ethernet controller
(E-MAC), Internal TCP/IP Offload Engine (TOE) and Dedicated Direct
memory access controller (DMAC).
This patch adds compatible string for RZ/G2L and fills up the
ravb_hw_info struct. Function stubs are added which will be used by
gbeth_hw_info and will be filled incrementally.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
R-Car supports network control queue whereas RZ/G2L does not support
it. Add nc_queue to struct ravb_hw_info, so that NC queue is handled
only by R-Car.
This patch also renames ravb_rcar_dmac_init to ravb_dmac_init_rcar
to be consistent with the naming convention used in sh_eth driver.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the variable "no_ptp_cfg_active" to "gptp" and
"ptp_cfg_active" to "ccc_gac" to match the HW features.
There is no functional change.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename "ravb_set_features_rx_csum" function to "ravb_set_features_rcar" and
replace the function pointer "set_rx_csum_feature" with "set_feature".
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
bpf-next 2021-10-02
We've added 85 non-merge commits during the last 15 day(s) which contain
a total of 132 files changed, 13779 insertions(+), 6724 deletions(-).
The main changes are:
1) Massive update on test_bpf.ko coverage for JITs as preparatory work for
an upcoming MIPS eBPF JIT, from Johan Almbladh.
2) Add a batched interface for RX buffer allocation in AF_XDP buffer pool,
with driver support for i40e and ice from Magnus Karlsson.
3) Add legacy uprobe support to libbpf to complement recently merged legacy
kprobe support, from Andrii Nakryiko.
4) Add bpf_trace_vprintk() as variadic printk helper, from Dave Marchevsky.
5) Support saving the register state in verifier when spilling <8byte bounded
scalar to the stack, from Martin Lau.
6) Add libbpf opt-in for stricter BPF program section name handling as part
of libbpf 1.0 effort, from Andrii Nakryiko.
7) Add a document to help clarifying BPF licensing, from Alexei Starovoitov.
8) Fix skel_internal.h to propagate errno if the loader indicates an internal
error, from Kumar Kartikeya Dwivedi.
9) Fix build warnings with -Wcast-function-type so that the option can later
be enabled by default for the kernel, from Kees Cook.
10) Fix libbpf to ignore STT_SECTION symbols in legacy map definitions as it
otherwise errors out when encountering them, from Toke Høiland-Jørgensen.
11) Teach libbpf to recognize specialized maps (such as for perf RB) and
internally remove BTF type IDs when creating them, from Hengqi Chen.
12) Various fixes and improvements to BPF selftests.
====================
Link: https://lore.kernel.org/r/20211002001327.15169-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 2d26f6e39a ("net: stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings"),
while getting rid of a runtime PM warning, ended up breaking ethernet
on rk3399-based devices. By dropping an extra reference to the device,
the commit ends up enabling suspend/resume of the ethernet device,
which appears to be broken.
While the issue with runtime PM is being investigated, partially
revert commit 2d26f6e39a to restore networking on rk3399.
Fixes: 2d26f6e39a ("net: stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings")
Suggested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
Cc: Michael Riesch <michael.riesch@wolfvision.net>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20210929135049.3426058-1-punitagrawal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When ocelot_flower.c calls ocelot_vcap_filter_add(), the filter has a
given filter->id.cookie. This filter is added to the block->rules list.
However, when ocelot_flower.c calls ocelot_vcap_block_find_filter_by_id(),
passing the cookie as argument, the filter is never found by
filter->id.cookie when searching through the block->rules list.
This is unsurprising, since the filter->id.cookie is an unsigned long,
but the cookie argument provided to ocelot_vcap_block_find_filter_by_id()
is a signed int, and the comparison fails.
Fixes: 50c6cc5b92 ("net: mscc: ocelot: store a namespaced VCAP filter ID")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210930125330.2078625-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'mlx5-updates-2021-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-09-30
1) From Yevgeny Kliteynik:
This patch series deals with vport handling in SW steering.
For every vport, SW steering queries FW for this vport's properties,
such as RX/TX ICM addresses to be able to add this vport as dest action.
The following patches rework vport capabilities managements and add support
for Scalable Functions (SFs).
- Patch 1 fixes the vport number data type all over the DR code to 16 bits
in accordance with HW spec.
- Patch 2 replaces local SW steering WIRE_PORT macro with the existing
mlx5 define.
- Patch 3 adds missing query for vport 0 and and handles eswitch manager
capabilities for ECPF (BlueField in embedded CPU mode).
- Patch 4 fixes error messages for failure to obtain vport caps from
different locations in the code to have the same verbosity level and
similar wording.
- Patch 5 adds support for csum recalculation flow tables on SFs: it
implements these FTs management in XArray instead of the fixed size array,
thus adding support for csum recalculation table for any valid vport.
- Patch 6 is the main patch of this whole series: it refactors vports
capabilities handling and adds SFs support.
2) Minor and trivial updates and cleanups
* tag 'mlx5-updates-2021-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Use array_size() helper
net/mlx5: Use struct_size() helper in kvzalloc()
net/mlx5: Use kvcalloc() instead of kvzalloc()
net/mlx5: Tolerate failures in debug features while driver load
net/mlx5: Warn for devlink reload when there are VFs alive
net/mlx5: DR, Add missing string for action type SAMPLER
net/mlx5: DR, init_next_match only if needed
net/mlx5: DR, Fix typo 'offeset' to 'offset'
net/mlx5: DR, Increase supported num of actions to 32
net/mlx5: DR, Add support for SF vports
net/mlx5: DR, Support csum recalculation flow table on SFs
net/mlx5: DR, Align error messages for failure to obtain vport caps
net/mlx5: DR, Add missing query for vport 0
net/mlx5: DR, Replace local WIRE_PORT macro with the existing MLX5_VPORT_UPLINK
net/mlx5: DR, Fix vport number data type to u16
====================
Link: https://lore.kernel.org/r/20210930232050.41779-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'mlx5-fixes-2021-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-09-30
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use array_size() helper to aid in 2-factor allocation instances.
Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows
that, in the worst-case scenario, could lead to heap overflows.
Link: https://github.com/KSPP/linux/issues/160
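The conversion pattern is roughly the following (the struct here is illustrative, not the actual mlx5 one):
#include <linux/overflow.h>
#include <linux/slab.h>

struct foo_table {
	u32 num;
	u32 entries[];		/* flexible array member */
};

static struct foo_table *foo_table_alloc(u32 num)
{
	/* before: kvzalloc(sizeof(*t) + num * sizeof(t->entries[0]), ...) */
	struct foo_table *t = kvzalloc(struct_size(t, entries, num), GFP_KERNEL);

	if (t)
		t->num = num;
	return t;
}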
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Use 2-factor argument form kvcalloc() instead of kvzalloc().
Link: https://github.com/KSPP/linux/issues/162
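In other words, the change has this shape (names and sizes are illustrative):
- arr = kvzalloc(num_elems * sizeof(*arr), GFP_KERNEL);
+ arr = kvcalloc(num_elems, sizeof(*arr), GFP_KERNEL);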
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
FW tracer and resource dump are debug features. Although failing to
initialize them may indicate an error, don't let this stop device
loading.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When performing a PF reload, the VF can't communicate with the FW until
it recovers and reloads as well.
Add a warning message when performing devlink reload while
VFs are still present, thus giving notice of the unfavorable
behavior that might occur as a result of the consequent reloads
and the interruption of VF recovery they can cause.
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Allocate the next steering table entry only if the remaining space requires it.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Increase max supported number of actions in the same rule.
Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Move all the vport capabilities to a separate struct and store vport caps
in an XArray: SF vport numbers will not come in the same range as VF vports,
so the existing implementation of vport capabilities as a fixed-size array
is not suitable here.
XArray is a perfect fit: it is efficient when the indices used are densely
clustered. In addition to being a perfect fit as a dynamic data structure,
XArray also provides locking - it uses RCU and an internal spinlock to
synchronise access, so no additional protection is needed.
Now except for the eswitch manager vport, all other vports (including the
uplink vport) are handled in the same way: when a new go-to-vport action
is added, this vport's caps are loaded from the xarray. If it is the first
time for this particular vport number, then its capabilities are queried
from FW and filled into the appropriate entry.
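A minimal sketch of this lookup-or-query flow, assuming a hypothetical caps struct and FW query helper (the real mlx5 code differs in names and details):
#include <linux/xarray.h>
#include <linux/slab.h>

struct vport_caps {
	u64 icm_rx_addr;
	u64 icm_tx_addr;
};

/* hypothetical FW query helper */
static struct vport_caps *query_vport_caps_from_fw(u16 vport);

static struct vport_caps *vport_caps_get(struct xarray *caps_xa, u16 vport)
{
	struct vport_caps *caps = xa_load(caps_xa, vport);

	if (caps)
		return caps;			/* already queried from FW */

	caps = query_vport_caps_from_fw(vport);
	if (!caps)
		return NULL;

	/* XArray synchronises internally, no extra lock is needed here */
	if (xa_err(xa_store(caps_xa, vport, caps, GFP_KERNEL))) {
		kfree(caps);
		return NULL;
	}
	return caps;
}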
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Implement the csum recalculation flow tables in an XArray instead of a fixed
array, thus adding support for the csum recalc table on any valid vport
number, which enables this support for SFs.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Print similar error messages when an invalid vport number is
provided during action creation and during STEv0/1 creation.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, vport 0 capabilities are not set.
To fix this, we now query both the eswitch manager and vport 0.
The eswitch manager has access to all the vports - for an eswitch manager
PF, all vports can be referred to as other vports. The exception is
embedded CPU mode, where there is the vport 0 of the ECPF and the PF vport 0.
Here is how the vports are queried:
For ConnectX-5/6:
PF vport (0) and vports 1..n: vport number, other = true
esw_manager is vport 0 (PF)
For BlueField (in embedded CPU mode):
ECPF vport: vport = 0, other = false
PF vport (0) and 1..n: vport number, other = true
esw_manager = vport 0 (ECPF)
Also, note that there is no need for the other_vport parameter in
dr_domain_query_vport - this value is now deduced locally in the
function.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
SW steering defines its own macro for uplink vport number.
Replace this macro with an already existing mlx5 macro.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
According to the HW spec, vport number is a 16-bit value.
Fix vport usage all over the code to u16 data type.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
TX-port-TS hijacks the PTP traffic to a specific HW TX-queue. This
conflicts with MQPRIO in channel mode, which specifies explicitly which
TC accepts the packet. This patch makes the two configurations mutually
exclusive.
Fixes: ec60c4581b ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When setting the number of completion EQs of the SF, consider the number
of online CPUs.
Without this consideration, when the number of online CPUs is less than 8,
8 completion EQs are allocated unnecessarily.
Fixes: c36326d38d ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The maximum irq_index can be 2047. This means irq_name should reserve 4
characters for the irq_index; hence, increase it to 4.
Fixes: 3af26495a2 ("net/mlx5: Enlarge interrupt field in CREATE_EQ")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When in real-time mode, the HW clock is synced with the PTP daemon. Hence
the driver should not re-calibrate the next pulse (via the MTPPSE
repetitive events mechanism).
This patch arms repetitive events only in free-running mode.
Fixes: 432119de33 ("net/mlx5: Add cyc2time HW translation mode support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Allow configuration of the 1PPS start time only with a time-stamp
representing a round second. Prior to this patch, the driver allowed
setting a non-round second, which is not supported by the device. Avoid
unexpected behavior by restricting start-time configuration to a round
second.
Fixes: 4272f9b88d ("net/mlx5e: Change 1PPS out scheme")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
* Add netdev->tc_to_txq rollback in case of failure in
mlx5e_update_netdev_queues().
* Fix broken transition between the two modes:
MQPRIO DCB mode with tc==8, and MQPRIO channel mode.
* Disable MQPRIO channel mode if re-attaching with a different number
of channels.
* Improve code sharing.
Fixes: ec60c4581b ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The value for maximum number of channels is first calculated based
on the netdev's profile and current function resources (specifically,
number of MSIX vectors, which depends among other things on the number
of online cores in the system).
This value is then used to calculate the netdev's number of rxqs/txqs.
Once created (by alloc_etherdev_mqs), the number of netdev's rxqs/txqs
is constant and we must not exceed it.
To achieve this, keep the maximum number of channels in sync upon any
netdevice re-attach.
Use mlx5e_get_max_num_channels() for calculating the number of the netdev's
rxqs/txqs. After the netdev is created, use mlx5e_calc_max_nch() (which
considers core device resources, profile, and netdev) to init or
update priv->max_nch.
Before this patch, the value of priv->max_nch might get out of sync,
mistakenly allowing accesses to out-of-bounds objects, which would
crash the system.
Track the number of channels stats structures used in a separate
field, as they are persistent to suspend/resume operations. All the
collected stats of every channel index that ever existed should be
preserved. They are reset only when struct mlx5e_priv is,
in mlx5e_priv_cleanup(), which is part of the profile changing flow.
There is no point anymore in blocking a profile change due to max_nch
mismatch in mlx5e_netdev_change_profile(). Remove the limitation.
Fixes: a1f240f180 ("net/mlx5e: Adjust to max number of channles when re-attaching")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, in the Rx data path, IPsec crypto offloaded packets use the
csum_none flag, so the checksum is handled by the stack; this naturally
has some performance/CPU utilization impact on such flows. As Nvidia
NICs starting from ConnectX-6 Dx provide a checksum complete value out of
the box also for such flows, there is no sense in taking the csum_none
path; furthermore, the stack (xfrm) has a method to handle checksum
complete corrections for such flows, i.e. IPsec trailer removal and the
consequent checksum value adjustment.
Because of the above, and since ConnectX-6 Dx is the first HW
which supports IPsec crypto offload, it is safe to report checksum
complete for IPsec offloaded traffic.
Fixes: b2ac7541e3 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add counters for XDP REDIRECT success and failure. This brings the
redirect path in line with metrics gathered via the other XDP paths.
Signed-off-by: Joshua Roys <roysjosh@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the STMMAC is paired with an Energy-Efficient Ethernet (EEE) capable
PHY, and the PHY is advertising EEE by default, we need to enable EEE on
the xPCS side too, instead of requiring the user to manually trigger the
enabling via ethtool.
Fix this by adding an xpcs_config_eee() call in stmmac_eee_init().
Fixes: 7617af3d1a ("net: pcs: Introducing support for DWC xpcs Energy Efficient Ethernet")
Cc: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally, the ixgbe driver doesn't allow attaching xdpdrv if the
server has more than 64 CPUs online, so loading xdpdrv fails with
"NOMEM".
Actually, we can adjust the algorithm and make it work by mapping the
current CPU to some XDP ring under the protection of @tx_lock.
Here are some numbers before/after applying this patch with xdp-example
loaded on the eth0X:
As client (tx path):
Before After
TCP_STREAM send-64 734.14 714.20
TCP_STREAM send-128 1401.91 1395.05
TCP_STREAM send-512 5311.67 5292.84
TCP_STREAM send-1k 9277.40 9356.22 (not stable)
TCP_RR send-1 22559.75 21844.22
TCP_RR send-128 23169.54 22725.13
TCP_RR send-512 21670.91 21412.56
As server (rx path):
Before After
TCP_STREAM send-64 1416.49 1383.12
TCP_STREAM send-128 3141.49 3055.50
TCP_STREAM send-512 9488.73 9487.44
TCP_STREAM send-1k 9491.17 9356.22 (not stable)
TCP_RR send-1 23617.74 23601.60
...
Notice: the TCP_RR mode is unstable, as the official documentation explains.
I tested many times with different parameter combinations through netperf.
Though the results are not that accurate, I cannot see much influence from
this patch. The static key is placed on the hot path, but it theoretically
shouldn't cause a huge regression.
Co-developed-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable pin is being initialized with a value that is never
read; it is updated later in only one case of a switch statement.
The assignment is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The macb PTP support currently implements the `gettime64` callback to allow
to retrieve the hardware clock time. Update the implementation to provide
the `gettimex64` callback instead.
The difference between the two is that with `gettime64` a snapshot of the
system clock is taken before and after invoking the callback. Whereas
`gettimex64` expects the callback itself to take the snapshots.
To get the time from the macb Ethernet core multiple register accesses have
to be done. Only one of which will happen at the time reported by the
function. This leads to a non-symmetric delay and adds a slight offset
between the hardware and system clock time when using the `gettime64`
method. This offset can be a few 100 nanoseconds. Switching to the
`gettimex64` method allows for a more precise correlation of the hardware
and system clocks and results in a lower offset between the two.
On a Xilinx ZynqMP system, `phc2sys` reports a delay of 1120 ns before and
300 ns after the patch, with the latter being mostly symmetric.
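The rough shape of a gettimex64 callback is shown below; this is an illustrative sketch (the foo_* names are hypothetical), not the actual macb implementation:
#include <linux/ptp_clock_kernel.h>

struct foo_priv {
	struct ptp_clock_info ptp_info;
	/* device state */
};

/* hypothetical register access returning the current PTP time */
static void foo_hw_read_clock(struct foo_priv *priv, struct timespec64 *ts);

static int foo_ptp_gettimex64(struct ptp_clock_info *ptp,
			      struct timespec64 *ts,
			      struct ptp_system_timestamp *sts)
{
	struct foo_priv *priv = container_of(ptp, struct foo_priv, ptp_info);

	ptp_read_system_prets(sts);	/* system clock snapshot just before the HW read */
	foo_hw_read_clock(priv, ts);
	ptp_read_system_postts(sts);	/* system clock snapshot just after the HW read */

	return 0;
}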
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add XDP_PASS, XDP_TX, XDP_DROP and XDP_REDIRECT support
for the netdev PF.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the case of ltype NPC_LT_LA_CPT_HDR, the LA pointer points to the
start of the CPT parse header. Since the CPT parse header has variable
length padding, this is a problem for DMAC extraction. Add KPU profile
changes to adjust the LA pointer to start at the Ethernet header in the
case of a CPT parse header by
- Adding a ptr advance in pkind 58 to a fixed value of 40.
- Adding a variable length offset of 7 and mask 7 (pad len in
CPT_PARSE_HDR).
Also add the missing static declaration for the
npc_set_var_len_offset_pkind() function.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use 2-factor argument form kvcalloc() instead of kvzalloc().
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use array_size() helper instead of the open-coded version in
copy_to_user(). These sorts of multiplication factors need
to be wrapped in array_size().
Link: https://github.com/KSPP/linux/issues/160
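The resulting pattern looks roughly like this (buffer names are illustrative):
/* before: copy_to_user(dst, src, num * sizeof(*src)) */
if (copy_to_user(dst, src, array_size(num, sizeof(*src))))
	return -EFAULT;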
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the firmware compatible features are enabled in the PF driver
initialization process, but they are not disabled in the PF driver
deinitialization process, so firmware keeps these features in enabled
status.
In this case, if an old PF driver (for example, in a VM) which does not
support the firmware compatible features is loaded, firmware will still
send a mailbox message to the PF when the link status changes, and the PF
will print "un-supported mailbox message, code = 201".
To fix this problem, disable these firmware compatible features in the PF
driver deinitialization process.
Fixes: ed8fb4b262 ("net: hns3: add link change event report")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the rx vlan filter is always disabled before selftest and
enabled after selftest, as the rx vlan filter feature is fixed on in
old devices earlier than V3.
However, this feature is not fixed on in some new devices and can be
disabled by the user. In that case, it is wrong to enable the rx vlan
filter after selftest. So fix it.
Fixes: bcc26e8dc4 ("net: hns3: remove unused code in hns3_self_test()")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the unicast MAC address table is full and the user adds a new MAC
address, unicast promiscuous mode needs to be enabled so that the new
unicast MAC address can be used. The same goes for multicast promiscuous
mode.
This feature has already been implemented for the PF, and it should be
implemented for the VF too: when the MAC table of a VF overflows, the PF
will enable promiscuous mode for this VF.
Fixes: 1e6e76101f ("net: hns3: configure promisc mode for VF asynchronously")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, if a function adds an existing unicast MAC address, the driver
will not add this address into hardware, yet hclge_add_uc_addr_common()
returns 0. This leaves the state of this unicast MAC address as ACTIVE in
the driver, when it should be in the TO-ADD state.
To fix this problem, make hclge_add_uc_addr_common() return -EEXIST if the
MAC address already exists, and delete two error logs to avoid printing
them all the time after this modification.
Fixes: 72110b5674 ("net: hns3: return 0 and print warning when hit duplicate MAC")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HCLGE_FLAG_MQPRIO_ENABLE is supposed to be set when enabling multiple
TCs with tc mqprio, and HCLGE_FLAG_DCB_ENABLE is supposed to be set when
enabling multiple TCs with ETS. But the driver mixed up the flags when
updating the TM configuration.
Furthermore, PFC should also be available with HCLGE_FLAG_MQPRIO_ENABLE,
so remove the unnecessary limitation.
Fixes: 5a5c909174 ("net: hns3: add support for tc mqprio offload")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Destroying mqprio is irreversible in the stack, so it's unnecessary
to roll back the TC configuration when destroying mqprio fails.
Otherwise, the configuration may become inconsistent between the
driver and the network stack.
As the failure is usually caused by reset, and the driver will
restore the configuration after reset, the configuration can be kept
consistent between the driver and the hardware.
Fixes: 5a5c909174 ("net: hns3: add support for tc mqprio offload")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, in hns3_nic_set_real_num_queue(), the driver doesn't report
the queue count and offset for a disabled TC. If the user enables
multiple TCs but only maps user priorities to some of them, the queue
range of the unmapped TCs may be displayed abnormally.
Fix it by removing the TC enable check and ensuring the queue
count is not zero.
With this change, tc_en is now useless, so remove it.
Fixes: a75a8efa00 ("net: hns3: Fix tc setup when netdev is first up")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ixgbe driver currently generates a NULL pointer dereference on
some machines (online CPUs < 63). This is due to the fact that the
maximum value of num_xdp_queues is nr_cpu_ids. The code is in
ixgbe_set_rss_queues().
Here's how the problem reproduces:
On some machine (online CPUs < 63), the user sets num_queues to 63 through
ethtool. The code in ixgbe_set_channels() does:
adapter->ring_feature[RING_F_FDIR].limit = count;
It becomes 63.
When the user uses XDP, ixgbe_set_rss_queues() will set the queue numbers:
adapter->num_rx_queues = rss_i;
adapter->num_tx_queues = rss_i;
adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
And rss_i's value is from
f = &adapter->ring_feature[RING_F_FDIR];
rss_i = f->indices = f->limit;
So "num_rx_queues" > "num_xdp_queues", when run to "ixgbe_xdp_setup",
for (i = 0; i < adapter->num_rx_queues; i++)
if (adapter->xdp_ring[i]->xsk_umem)
It leads to panic.
Call trace:
[exception RIP: ixgbe_xdp+368]
RIP: ffffffffc02a76a0 RSP: ffff9fe16202f8d0 RFLAGS: 00010297
RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000001c RDI: ffffffffa94ead90
RBP: ffff92f8f24c0c18 R8: 0000000000000000 R9: 0000000000000000
R10: ffff9fe16202f830 R11: 0000000000000000 R12: ffff92f8f24c0000
R13: ffff9fe16202fc01 R14: 000000000000000a R15: ffffffffc02a7530
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
7 [ffff9fe16202f8f0] dev_xdp_install at ffffffffa89fbbcc
8 [ffff9fe16202f920] dev_change_xdp_fd at ffffffffa8a08808
9 [ffff9fe16202f960] do_setlink at ffffffffa8a20235
10 [ffff9fe16202fa88] rtnl_setlink at ffffffffa8a20384
11 [ffff9fe16202fc78] rtnetlink_rcv_msg at ffffffffa8a1a8dd
12 [ffff9fe16202fcf0] netlink_rcv_skb at ffffffffa8a717eb
13 [ffff9fe16202fd40] netlink_unicast at ffffffffa8a70f88
14 [ffff9fe16202fd80] netlink_sendmsg at ffffffffa8a71319
15 [ffff9fe16202fdf0] sock_sendmsg at ffffffffa89df290
16 [ffff9fe16202fe08] __sys_sendto at ffffffffa89e19c8
17 [ffff9fe16202ff30] __x64_sys_sendto at ffffffffa89e1a64
18 [ffff9fe16202ff38] do_syscall_64 at ffffffffa84042b9
19 [ffff9fe16202ff50] entry_SYSCALL_64_after_hwframe at ffffffffa8c0008c
So fix ixgbe_max_channels() so that it will not allow setting a number of
queues higher than num_online_cpus(). And in ixgbe_xdp_setup(), take the
smaller of num_rx_queues and num_xdp_queues.
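The second half of that amounts to something like the following in the loop quoted above (a sketch of the bound change only, not the exact upstream hunk):
int nr = min(adapter->num_rx_queues, adapter->num_xdp_queues);

for (i = 0; i < nr; i++)
	if (adapter->xdp_ring[i]->xsk_umem)
		xsk_in_use = true;	/* placeholder for the existing per-ring handling */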
Fixes: 4a9b32f30f ("ixgbe: fix potential RX buffer starvation for AF_XDP")
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-09-28
This series contains updates to ice driver only.
Dave adds support for QoS DSCP allowing for DSCP to TC mapping via APP
TLVs.
Ani adds enforcement of DSCP to only supported devices with the
introduction of a feature bitmap and corrects messaging of unsupported
modules based on link mode.
Jake refactors devlink info functions to be void as the functions no
longer return errors.
Jeff fixes a macro name to properly reflect the value.
Len Baker converts a kzalloc allocation to, the preferred, kcalloc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds PTP PHC support to NIX VF interfaces. This enables
a VF to run a PTP master/slave instance. Since the PTP block is a shared
hardware resource, it is recommended to avoid running multiple
PTP instances in the system, as that will impact the PTP clock
accuracy.
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever the interface is brought up/down, the set_rx_mode
function is called by the stack, which enables the promisc/allmulti
MCAM entries. But there are cases when the driver itself brings the
interface down and then up, such as while changing the number
of channels. In these cases the promisc/allmulti MCAM entries
are left disabled, as the set_rx_mode callback is not called.
This patch enables these MCAM entries in all such cases.
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.
In this case these are not actually dynamic sizes: both sides of the
multiplication are constant values. However, it is best to refactor this
anyway, just to keep the open-coded math idiom out of the code.
So, use the purpose specific kcalloc() function instead of the argument
size * count in the kzalloc() function.
[1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
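So the change has this shape (names are illustrative):
- buf = kzalloc(count * sizeof(*buf), GFP_KERNEL);
+ buf = kcalloc(count, sizeof(*buf), GFP_KERNEL);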
Signed-off-by: Len Baker <len.baker@gmx.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In the IPv4 header, the fragment flags indicate whether the packet needs
to be fragmented or not. The value 0x20 represents MF (More Fragments);
fix the macro name to match this.
Signed-off-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After commit a8f89fa277 ("ice: do not abort devlink info if board
identifier can't be found"), the getter/fallback() functions no longer
report an error. Convert the interface to a void so that it is no
longer possible to add a version field that is fatal. This makes
sense, because we should not fail to report other versions just
because one of the version pieces could not be found.
Finally, clean up the getter functions line wrapping so that none of
them take more than 80 columns, as is the usual style for networking
files.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The messaging for unsupported module detection is different for
lenient mode and strict mode. Update the code to print the right
messaging for a given link mode.
Media topology conflict is not an error in lenient mode, so return
an error code only if not in lenient mode.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
DSCP a.k.a L3 QoS is only supported on certain devices. To enforce this,
this patch introduces a bitmap of features and helper functions.
The feature bitmap is set based on device IDs on driver init. Currently,
DSCP is the only feature in this bitmap, but there will be more in the
future. In the DCB netlink flow, check if the feature bit is set before
exercising DSCP.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement code to handle submission of APP TLVs
containing DSCP to TC mapping.
The first such mapping received on an interface
will cause that PF to switch to L3 DSCP QoS mode,
apply the default config for that mode, and apply
the received mapping.
Only one such mapping will be allowed per DSCP value,
and when the last DSCP mapping is deleted, the PF
will switch back into L2 VLAN QoS mode, applying the
appropriate default QoS settings.
L3 DSCP QoS mode will only be allowed in SW DCBx
mode, in other words, when the FW LLDP engine is
disabled. Commands that break this mutual exclusivity
will be blocked.
Co-developed-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
A UID field was added to the alloc_uar and dealloc_uar PRM commands to
specify the DevX UID for the UAR. This change enables firmware to validate
user access to its own UAR resources.
For kernel-allocated UARs, the UID will stay 0 as of today.
Signed-off-by: Meir Lichtinger <meirl@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
The use of dma_unmap_addr()/dma_unmap_len() in the driver causes
multiple warnings when these macros are defined as empty, e.g.
in an ARCH=i386 allmodconfig build:
drivers/net/ethernet/google/gve/gve_tx_dqo.c: In function 'gve_tx_add_skb_no_copy_dqo':
drivers/net/ethernet/google/gve/gve_tx_dqo.c:494:40: error: unused variable 'buf' [-Werror=unused-variable]
494 | struct gve_tx_dma_buf *buf =
This is not how the NEED_DMA_MAP_STATE macros are meant to work,
as they rely on never using local variables or a temporary structure
like gve_tx_dma_buf.
Remove the gve_tx_dma_buf definition and open-code the contents
in all places to avoid the warning. This causes some rather long
lines but otherwise ends up making the driver slightly smaller.
Fixes: a57e5de476 ("gve: DQO: Add TX path")
Link: https://lore.kernel.org/netdev/20210723231957.1113800-1-bcf@google.com/
Link: https://lore.kernel.org/netdev/20210721151100.2042139-1-arnd@kernel.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current driver uses the software CQ head pointer to poll on the CQE
header in memory to determine if a CQE is valid. Software needs
to make sure that the reads of the CQE do not get re-ordered
so much that it ends up with an inconsistent view of the CQE.
To ensure that, a DMB barrier is needed after reading the first CQE
cacheline and before reading the rest of the CQE.
But having a barrier for every CQE read will impact performance;
instead, use the hardware CQ head and tail pointers to find the
valid number of CQEs.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PTP hardware block can be configured to utilize
an external clock. Also, the current PTP timestamp
can be captured when an external trigger is applied on
a GPIO pin. These features are required in scenarios
like connecting an external timing device to the chip
for time synchronization. The timing device provides
the clock and trigger (PPS signal) to the PTP block.
This patch does the following:
1. configures the PTP block to use the external clock
frequency and to capture timestamps on the external event.
2. sends PTP_REQ_EXTTS events to the kernel PTP PHC subsystem
with the captured timestamps.
3. aligns the PPS edge to the adjusted PTP clock in the PTP device
by setting PPS_THRESH to the remainder of the last
timestamp value captured by the external PPS.
Signed-off-by: Yi Guo <yig@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The input clock frequency of the PTP block is currently figured
out from the hardware reset block. The firmware
data already has this info in sclk. Hence, simplify the
PTP driver to use sclk from the firmware data.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MAC on CN10K supports hardware timestamping such that an additional
8-byte header is prepended to incoming packets. This patch does the
necessary configuration to enable hardware timestamping upon receiving a
request from PF netdev interfaces.
Timestamp configuration is different between the MAC (CGX) on OcteonTx2
silicon and the MAC (RPM) on OcteonTX3 CN10K. Based on the silicon variant,
the appropriate fn() pointer is called. Refactor the MAC-specific mbox
messages to remove unnecessary gaps in mbox IDs.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon receiving a PTP config request from the netdev interface, the
OcteonTx2 MAC block CGX is configured to append a timestamp to every
incoming packet and the NPC config is updated with the DMAC offset change.
Currently this configuration is not reset in the FLR handler. This patch
resets it.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function copies strings around between multiple buffers
including a large on-stack array that causes a build warning
on 32-bit systems:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c: In function 'hclge_dbg_dump_tm_pg':
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:782:1: error: the frame size of 1424 bytes is larger than 1400 bytes [-Werror=frame-larger-than=]
The function can probably be cleaned up a lot, to go back to
printing directly into the output buffer, but dynamically allocating
the structure is a simpler workaround for now.
Fixes: 04d96139dd ("net: hns3: refine function hclge_dbg_dump_tm_pri()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_INET is not set, there are failing references to IPv4
functions, so make this driver depend on INET.
Fixes these build errors:
sparc64-linux-ld: drivers/net/ethernet/sun/sunvnet_common.o: in function `sunvnet_start_xmit_common':
sunvnet_common.c:(.text+0x1a68): undefined reference to `__icmp_send'
sparc64-linux-ld: drivers/net/ethernet/sun/sunvnet_common.o: in function `sunvnet_poll_common':
sunvnet_common.c:(.text+0x358c): undefined reference to `ip_send_check'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Aaron Young <aaron.young@oracle.com>
Cc: Rashmi Narasimhan <rashmi.narasimhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't print stats for which we haven't reserved space as it can
cause nasty memory bashing and related bad behaviors.
Fixes: aa620993b1 ("ionic: pull per-q stats work out of queue loops")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-09-27
This series contains updates to e100 driver only.
Jake corrects under-allocation of the register buffer due to incorrect
calculations and fixes a buffer overrun of the register dump.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
An object file cannot be built for both loadable module and built-in
use at the same time:
arm-linux-gnueabi-ld: drivers/net/ethernet/micrel/ks8851_common.o: in function `ks8851_probe_common':
ks8851_common.c:(.text+0xf80): undefined reference to `__this_module'
Change the ks8851_common code to be a standalone module instead,
and use Makefile logic to ensure this is built-in if at least one
of its two users is.
Fixes: 797047f875 ("net: ks8851: Implement Parallel bus operations")
Link: https://lore.kernel.org/netdev/20210125121937.3900988-1-arnd@kernel.org/
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marek Vasut <marex@denx.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
My previous patch had an off-by-one error in the added sanity
check: the arrays are MTL_MAX_{RX,TX}_QUEUES long, so if the
index equals that number, it has already overflowed.
The patch silenced the warning anyway because the strings could
no longer overlap with the input, but they could still overlap
with other fields.
Fixes: 3e0d5699a9 ("net: stmmac: fix gcc-10 -Wrestrict warning")
Reported-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
clang warns about arithmetic on NULL pointers:
drivers/net/ethernet/ti/am65-cpsw-ethtool.c:71:2: error: performing pointer subtraction with a null pointer has undefined behavior [-Werror,-Wnull-pointer-subtraction]
AM65_CPSW_REGDUMP_REC(AM65_CPSW_REGDUMP_MOD_NUSS, 0x0, 0x1c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/ti/am65-cpsw-ethtool.c:64:29: note: expanded from macro 'AM65_CPSW_REGDUMP_REC'
.hdr.len = (((u32 *)(end)) - ((u32 *)(start)) + 1) * sizeof(u32) * 2 + \
^ ~~~~~~~~~~~~~~~~
The expression here is easily changed to a calculation based on integers
that is no less readable.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When rhashtable_init() fails, it returns -EINVAL.
However, since the error return value of rhashtable_init() is not
checked, uninitialized pointers may be used later.
So, handle the errors of rhashtable_init().
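A minimal sketch of this kind of check (the table and params names here
are illustrative, only rhashtable_init() itself is the real API):
	err = rhashtable_init(&priv->flow_table, &flow_ht_params);
	if (err)
		return err;	/* don't continue with an uninitialized table */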
Signed-off-by: MichelleJin <shjy180909@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new xsk batched rx allocation interface for the zero-copy data
path. As the array of struct xdp_buff pointers kept by the driver is
really a ring that wraps, the allocation routine is modified to detect
a wrap and in that case call the allocation function twice. The
allocation function cannot deal with wrapped rings, only arrays. As we
now know exactly how many buffers we get and that there is no
wrapping, the allocation function can be simplified even more as all
if-statements in the allocation loop can be removed, improving
performance.
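A hedged sketch of the wrap handling described above; xsk_buff_alloc_batch()
is the real interface, while the ring, pool and index names are illustrative:
	/* allocate 'count' buffers into a ring of xdp_buff pointers that
	 * wraps at ring_size; the batch API only understands flat arrays,
	 * so a wrap is split into two calls */
	if (ntu + count <= ring_size) {
		nb = xsk_buff_alloc_batch(pool, &xdp_ring[ntu], count);
	} else {
		u32 first = ring_size - ntu;

		nb = xsk_buff_alloc_batch(pool, &xdp_ring[ntu], first);
		if (nb == first)
			nb += xsk_buff_alloc_batch(pool, &xdp_ring[0],
						   count - first);
	}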
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-6-magnus.karlsson@gmail.com
Use the new xsk batched rx allocation interface for the zero-copy data
path. As the array of struct xdp_buff pointers kept by the driver is
really a ring that wraps, the allocation routine is modified to detect
a wrap and in that case call the allocation function twice. The
allocation function cannot deal with wrapped rings, only arrays. As we
now know exactly how many buffers we get and that there is no
wrapping, the allocation function can be simplified even more as all
if-statements in the allocation loop can be removed, improving
performance.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-5-magnus.karlsson@gmail.com
In order to use the new xsk batched buffer allocation interface, a
pointer to an array of struct xsk_buff pointers needs to be provided so
that the function can put the result of the allocation there. In the
ice driver, we already have a ring that stores pointers to
xdp_buffs. This is only used for the xsk zero-copy driver and is a
union with the structure that is used for the regular non zero-copy
path. Unfortunately, that structure is larger than the xdp_buffs
pointers which mean that there will be a stride (of 20 bytes) between
each xdp_buff pointer. And feeding this into the xsk_buff_alloc_batch
interface will not work since it assumes a regular array of xdp_buff
pointers (each 8 bytes with 0 bytes in-between them on a 64-bit
system).
To fix this, remove the xdp_buff pointer from the rx_buf union and
move it one step higher to the union above which only has pointers to
arrays in it. This solves the problem and we can directly feed the SW
ring of xdp_buff pointers straight into the allocation function in the
next patch when that interface is used. This will improve performance.
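A hedged illustration of the layout issue (the structures are simplified,
not the driver's actual definitions): the batch allocator expects a flat
array of pointers, so the pointer cannot live inside a larger per-buffer
structure:
	/* not usable by the batch API: the pointer is embedded in a larger
	 * per-buffer structure, so consecutive pointers are not adjacent */
	struct rx_buf {
		struct xdp_buff *xdp;
		/* ... other per-buffer state ... */
	} *rx_buf;

	/* usable: a flat array of pointers, 8 bytes apart on 64-bit */
	struct xdp_buff **xdp;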
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-4-magnus.karlsson@gmail.com
The e100_get_regs function is used to implement a simple register dump
for the e100 device. The data is broken into a couple of MAC control
registers, and then a series of PHY registers, followed by a memory dump
buffer.
The total length of the register dump is defined as (1 + E100_PHY_REGS)
* sizeof(u32) + sizeof(nic->mem->dump_buf).
The logic for filling in the PHY registers uses a convoluted inverted
count for loop which counts from E100_PHY_REGS (0x1C) down to 0, and
assigns the slots 1 + E100_PHY_REGS - i. The first loop iteration will
fill in [1] and the final loop iteration will fill in [1 + 0x1C]. This
is actually one more than the supposed number of PHY registers.
The memory dump buffer is then filled into the space at
[2 + E100_PHY_REGS] which will cause that memcpy to assign 4 bytes past
the total size.
The end result is that we overrun the total buffer size allocated by the
kernel, which could lead to a panic or other issues due to memory
corruption.
It is difficult to determine the actual total number of registers
here. The only 8255x datasheet I could find indicates there are 28 total
MDI registers. However, we're reading 29 here, and reading them in
reverse!
In addition, the ethtool e100 register dump interface appears to read
the first PHY register to determine if the device is in MDI or MDIx
mode. This doesn't appear to be documented anywhere within the 8255x
datasheet. I can only assume it must be in register 28 (the extra
register we're reading here).
Let's not change any of the intended meaning of what we copy here. Just
extend the space by 4 bytes to account for the extra register and
continue copying the data out in the same order.
Change the E100_PHY_REGS value to be the correct total (29) so that the
total register dump size is calculated properly. Fix the offset for
where we copy the dump buffer so that it doesn't overrun the total size.
Re-write the for loop to use counting up instead of the convoluted
down-counting. Correct the mdio_read offset to use the 0-based register
offsets, but maintain the bizarre reverse ordering so that we have the
ABI expected by applications like ethtool. This requires an additional
subtraction of 1. It seems a bit odd but it makes the flow of assignment
into the register buffer easier to follow.
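A hedged sketch of the rewritten loop following the description above
(struct and helper names approximate the driver's internals):
#define E100_PHY_REGS	0x1D	/* 29 registers in total */

	/* buff[0] still holds the MAC control registers, as before */
	for (i = 0; i < E100_PHY_REGS; i++)
		/* count up, but keep the reversed register order expected
		 * by existing tools; hence the extra '- 1' */
		buff[1 + i] = mdio_read(netdev, nic->mii.phy_id,
					E100_PHY_REGS - 1 - i);
	memcpy(&buff[1 + E100_PHY_REGS], nic->mem->dump_buf,
	       sizeof(nic->mem->dump_buf));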
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Felicitas Hetzelt <felicitashetzelt@gmail.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
commit abf9b90205 ("e100: cleanup unneeded math") tried to simplify
e100_get_regs_len and remove a double 'divide and then multiply'
calculation that the e100_get_regs_len function did.
This change broke the size calculation entirely as it failed to account
for the fact that the numbered registers are actually 4 bytes wide and
not 1 byte. This resulted in a significant under allocation of the
register buffer used by e100_get_regs.
Fix this by properly multiplying the register count by u32 first before
adding the size of the dump buffer.
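In other words, a sketch of the corrected calculation based on the sizes
described above:
	return (1 + E100_PHY_REGS) * sizeof(u32) + sizeof(nic->mem->dump_buf);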
Fixes: abf9b90205 ("e100: cleanup unneeded math")
Reported-by: Felicitas Hetzelt <felicitashetzelt@gmail.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This commit extends the supported ethtool operations to allow MAC
level flow control to be configured for the bcmgenet driver.
The ethtool utility can be used to change the configuration of
auto-negotiated symmetric and asymmetric modes as well as manually
configuring support for RX and TX Pause frames individually.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit separates out the MAC configuration that occurs on a
PHY state change into a function named bcmgenet_mac_config().
This allows the function to be called directly elsewhere.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PHY state machine has been fixed to only call the adjust_link
callback when the link state has changed. Therefore the old link
state variables are no longer needed to detect a change in link
state.
This commit effectively reverts
commit 5ad6e6c508 ("net: bcmgenet: improve bcmgenet_mii_setup()")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bcmgenet_mii_setup() function is registered as the adjust_link
callback from the phylib for the GENET driver.
The phylib always sets the netif_carrier according to phydev->link
prior to invoking the adjust_link callback, so there is no need to
repeat that in the link down case within the network driver.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change prevents users from accessing the device before devlink is
fully configured.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change prevents users from accessing the device before devlink is
fully configured.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change prevents users from accessing the device before devlink is
fully configured.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Open user space access to the devlink after the driver is probed.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Open access to the devlink interface when the driver is fully initialized.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure that devlink is open to receive user input when all
parameters are initialized.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor the code to make sure that devlink_register() is the last
command during initialization stage.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate devlink registrations and traps registrations so devlink will
be registered when the driver is fully initialized.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change prevents users from accessing the device before devlink is fully
configured. It also allows us to delete the call to devlink_params_publish()
and an impossible check during the unregister flow.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the devlink registration routine to be the last command, when the
device is fully initialized.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move devlink registration to be the last command in device activation,
so that the driver accepts devlink commands from user space only when it
is fully initialized.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move devlink_register to be the last command in the initialization
sequence.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The liquidio driver is broken by design. It initializes PCI devices
in separate delayed works, which leads to a situation where the device
lock is dropped during the initialize and remove sequences.
That lock is part of the driver core and is needed to protect against
races during init, destroy and bus invocations.
In addition to the lack of locking protection, it has an incorrect order
of destroy flows and a very questionable synchronization scheme based on
atomic_t.
This change doesn't fix that driver, but it makes sure that the rest of
the netdev subsystem doesn't suffer from this lack of basic protection,
by adding device_lock over the devlink_*() APIs and by moving
devlink_register() to be the last command in setup_nic_devices().
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move devlink_register() to be the last command in the devlink configuration
sequence, so no user space access is possible until the devlink instance
is fully operable. As part of this change, the devlink_params_publish()
call is removed as no longer needed.
This change also fixes a forgotten devlink_params_unpublish().
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
clang-14 does not like the custom offsetof() macro in vsc7326:
drivers/net/ethernet/chelsio/cxgb/vsc7326.c:597:3: error: performing pointer subtraction with a null pointer has undefined behavior [-Werror,-Wnull-pointer-subtraction]
HW_STAT(RxUnicast, RxUnicastFramesOK),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/chelsio/cxgb/vsc7326.c:594:56: note: expanded from macro 'HW_STAT'
{ reg, (&((struct cmac_statistics *)NULL)->stat_name) - (u64 *)NULL }
Rewrite this to use the version provided by the kernel.
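A hedged sketch of one way to express the same element index with the
kernel's offsetof() (the field address arithmetic above divides out to an
offset in u64 units):
#define HW_STAT(reg, stat_name) \
	{ reg, offsetof(struct cmac_statistics, stat_name) / sizeof(u64) }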
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc-10 and later warn about a theoretical array overrun when
accessing priv->int_name_rx_irq[i] with an out of bounds value
of 'i':
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c: In function 'stmmac_request_irq_multi_msi':
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3528:17: error: 'snprintf' argument 4 may overlap destination object 'dev' [-Werror=restrict]
3528 | snprintf(int_name, int_name_len, "%s:%s-%d", dev->name, "tx", i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3404:60: note: destination object referenced by 'restrict'-qualified argument 1 was declared here
3404 | static int stmmac_request_irq_multi_msi(struct net_device *dev)
| ~~~~~~~~~~~~~~~~~~~^~~
The warning is a bit strange since it's not actually about the array
bounds but rather about possible string operations with overlapping
arguments, but it's not technically wrong.
Avoid the warning by adding an extra bounds check.
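A minimal sketch of such a check, reusing the names from the warning above:
	if (i >= MTL_MAX_TX_QUEUES)
		break;
	snprintf(int_name, int_name_len, "%s:%s-%d", dev->name, "tx", i);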
Fixes: 8532f613bc ("net: stmmac: introduce MSI Interrupt routines for mac, safety, RX & TX")
Link: https://lore.kernel.org/all/20210421134743.3260921-1-arnd@kernel.org/
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
of_get_mac_address() reads the same "local-mac-address" property.
... But goes above and beyond by checking the MAC value properly.
printk+message seems outdated too,
so let's put dev_err in the queue.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use resource_size function on resource object
instead of explicit computation.
Clean up coccicheck warning:
./drivers/net/ethernet/microchip/sparx5/sparx5_main.c:237:19-22: ERROR:
Missing resource_size with iores [ idx ]
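In other words (a sketch; 'size' is an illustrative variable),
resource_size() encodes the open-coded span:
	/* before: iores[idx]->end - iores[idx]->start + 1 */
	size = resource_size(iores[idx]);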
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replacing kmalloc/kfree/dma_map_single/dma_unmap_single()
with dma_alloc_coherent()/dma_free_coherent() helps to reduce
code size and simplify the code, and coherent DMA does not
require cache maintenance on every access.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit d437f5aa23.
The code has been duplicated by commit 273c29e944bd ("ibmvnic: check
failover_pending in login response").
Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use dma_alloc_coherent() instead of pci_alloc_consistent(),
because only dma_alloc_coherent() is called here.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use dma_xxx_xxx() instead of pci_xxx_xxx(),
because the pci function wrappers are not called here.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use dma_alloc_coherent() instead of pci_alloc_consistent(),
because only dma_alloc_coherent() is called here.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use dma_alloc_coherent() instead of pci_alloc_consistent(),
because only dma_alloc_coherent() is called here.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use dma_map_single() instead of pci_map_single(),
because the pci function wrappers are not called here.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a replication of Christian Lamparter's "net: bgmac-bcma:
handle deferred probe error due to mac-address" patch for the
bgmac-platform driver [1].
As is the case with the bgmac-bcma driver, this change is to cover the
scenario where the MAC address cannot yet be discovered due to reliance
on an nvmem provider which is yet to be instantiated, resulting in a
random address being assigned that has to be manually overridden.
[1] https://lore.kernel.org/netdev/20210919115725.29064-1-chunkeey@gmail.com
Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a spelling mistake in a dev_err error message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Optimized KPU1 entry processing for variable-length custom L2 headers
of size 24B and 90B by:
- Moving LA LTYPE parsing for 24B and 90B headers to PKIND.
- Removing LA flags assignment for 24B and 90B headers.
- Reserving PKIND 55 to parse variable-length headers.
Also, a new mailbox (NPC_SET_PKIND) is added to configure PKIND with the
corresponding variable-length offset, mask, and shift count
(NPC_AF_KPUX_ENTRYX_ACTION0).
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With current KPU profile, while parsing GTPU packets, GTPU payload
is also being parsed and GTPU PDU payload is being treated as IPV4
data, which is not correct. In case of GTPU packets, parsing should
be stopped after identifying the GTPU. Adding changes to limit KPU
profile parsing for GTPU payload.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.
Use struct_group() around members queue_id, min_bw, max_bw, tsa, pri_lvl,
and bw_weight so they can be referenced together. This will allow memcpy()
and sizeof() to more easily reason about sizes, improve readability,
and avoid future warnings about writing beyond the end of queue_id.
"pahole" shows no size nor member offset changes to struct bnxt_cos2bw_cfg.
"objdump -d" shows no meaningful object code changes (i.e. only source
line number induced differences and optimizations).
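A hedged sketch of the struct_group() annotation; only the member names
come from the text above, while the types, padding and packing are
approximations:
	struct bnxt_cos2bw_cfg {
		u8	pad[3];
		struct_group(cfg,
			u8	queue_id;
			__le32	min_bw;
			__le32	max_bw;
			u8	tsa;
			u8	pri_lvl;
			u8	bw_weight;
		);
	};

	/* copy all grouped members at once without writing past queue_id */
	memcpy(&dst->cfg, &src->cfg, sizeof(dst->cfg));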
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/lkml/CACKFLinDc6Y+P8eZ=450yA1nMC7swTURLtcdyiNR=9J6dfFyBg@mail.gmail.com
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/lkml/20210728044517.GE35706@embeddedor
Support offloading of TC rules that filter ingress traffic from a MACVLAN
device, which is attached to uplink representor.
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Support offloading of TC rules that mirror/redirect egress traffic to a
MACVLAN device, which is attached to mlx5 representor net device.
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In switchdev mode we insert steering rules to eswitch that
make sure packets can't be looped back.
Modify the self tests infra and have flags per test.
Add a flag for tests that need to be skipped in switchdev mode.
Before this commit:
$ ethtool --test enp8s0f0
The test result is FAIL
The test extra info:
Link Test 0
Speed Test 0
Health Test 0
Loopback Test 1
After this commit:
$ ethtool --test enp8s0f0
The test result is PASS
The test extra info:
Link Test 0
Speed Test 0
Health Test 0
Example output in dmesg:
enp8s0f0: Self test begin..
enp8s0f0: [0] Link Test start..
enp8s0f0: [0] Link Test end: result(0)
enp8s0f0: [1] Speed Test start..
enp8s0f0: [1] Speed Test end: result(0)
enp8s0f0: [2] Health Test start..
enp8s0f0: [2] Health Test end: result(0)
enp8s0f0: Self test out: status flags(0x1)
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This is for consistency and adds the module name to the error message.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Instead of having sparse ifdefs in source files use a single
ifdef in the tc sample header file and use stubs.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The priv argument is not being used. Remove it.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The driver should add offloaded rules with either a fwd or drop action.
The check existed in parsing fdb flows but not when parsing nic flows.
Move the test into actions_match_supported() which is called for
checking nic flows and fdb flows.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Do it when parsing, as with other actions, instead of when
checking whether goto is supported in the current scenario.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
A user is expected to explicitly request a fwd or drop action.
It is not correct to implicitly add a fwd action for the user
when the modify header action flag exists.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
modify_header_match_supported() should return bool but
it returns the value returned by is_action_keys_supported(),
which is of type int.
is_action_keys_supported() always returns either -EOPNOTSUPP
or 0, and this shouldn't change since the purpose of the function
is to check for support, so just make the function return
a bool.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Prior to this patch, ethtool -X could fail while the user received a
success status. Try to roll back on failure and return the status
accordingly.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
devlink is a software interface that doesn't depend on any hardware
capabilities. The failure in SW means memory issues, wrong parameters,
programmer error, etc.
Like any other such interface in the kernel, the returned status of
devlink APIs should be checked and propagated further and not ignored.
Fixes: 755f982bb1 ("qed/qede: make devlink survive recovery")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PF pointer is always valid when PCI core calls its .shutdown() and
.remove() callbacks. There is no need to check it again.
Fixes: 837f08fdec ("ice: Add basic driver framework for Intel(R) E800 Series")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver doesn't have any port parameters and registers
devlink port parameters with an empty table. Remove the useless
calls to devlink_port_params_register and _unregister.
Fixes: da203dfa89 ("Revert "devlink: Add a generic wake_on_lan port parameter"")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
devlink is a software interface that doesn't depend on any hardware
capabilities. The failure in SW means memory issues, wrong parameters,
programmer error, etc.
Like any other such interface in the kernel, the returned status of
devlink APIs should be checked and propagated further and not ignored.
Fixes: 4ab0c6a8ff ("bnxt_en: add support to enable VF-representors")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The enetc phylink .mac_config handler intends to clear the IFMODE field
(bits 1:0) of the PM0_IF_MODE register, but incorrectly clears all the
other fields instead.
For normal operation, the bug was inconsequential, due to the fact that
we write the PM0_IF_MODE register in two stages, first in
phylink .mac_config (which incorrectly cleared out a bunch of stuff),
then we update the speed and duplex to the correct values in
phylink .mac_link_up.
Judging by the code (not tested), it looks like maybe loopback mode was
broken, since this is one of the settings in PM0_IF_MODE which is
incorrectly cleared.
Fixes: c76a97218d ("net: enetc: force the RGMII speed and duplex instead of operating in inband mode")
Reported-by: Pavel Machek (CIP) <pavel@denx.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, mlxsw driver supports IP-in-IP only with IPv4 underlay.
Add support for IPv6 underlay for Spectrum-2 and above.
Most of the configuration is the same as for IPv4; the main difference between
IPv4 and IPv6 is related to saving IP addresses.
IPv6 addresses are saved as part of KVD and the relevant registers hold
pointer to them.
Add API for that as part of ipip_ops, so then only Spectrum-2 and above
will save IPv6 addresses in this way.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Spectrum ASIC has a configurable limit on how deep into the packet
it parses. By default, the limit is 96 bytes.
For IP-in-IP packets, with IPv6 outer and inner headers, the default
parsing depth is not enough and without increasing it such packets cannot
be properly decapsulated.
Use the existing API to set parsing depth, call it once for each
decapsulation entry when it is created/destroyed.
There is no need to protect the code with a new mutex because 'router->lock'
is already taken in these code paths.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for allocating and freeing KVD entries for IPv6 addresses.
These addresses are programmed by the RIPS register and referenced by
the RATR and RTDP registers for IPv6 underlay encapsulation and
decapsulation, respectively.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add operations for IP-in-IP GRE6. For now the function can_offload()
returns false and the other functions warn as they should never be called.
A later patch will add dedicated operations for Spectrum-2 which will
support IPv6 underlay, so use 'mlxsw_sp1_' prefix for the new operations.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, there is support for IP-in-IP only with IPv4 underlay for all
supported Spectrum ASICs.
The next patches will add support for IPv6 underlay only for Spectrum-2
and above.
Add infrastructure for splitting IP-in-IP support between different
ASICs - create separate ipip_ops_arr and add ipips_init function to set the
right ops.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RITR register is used to configure the router interface table.
For IP-in-IP, it stores the underlay source IP address for encapsulation
and also the ingress RIF for the underlay lookup.
Add support for IPv6 IP-in-IP configuration.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RATR register is used to configure the Router Adjacency (next-hop)
Table.
For an IP-in-IP entry, the underlay IPv4 destination is saved as part of
this register, while the underlay IPv6 destination is saved via the RIPS
register and RATR holds a pointer to it.
Add function for setting IPv6 IP-in-IP configuration.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RTDP register is used for configuring the tunnel decapsulation
properties of NVE and IP-in-IP.
Linux tunnels verify packets before decapsulation based on the packet's
source IP, which must match tunnel remote IP.
RTDP is used to configure decapsulation so that it filters out packets that
are not IPv6 or have the wrong source IP or wrong GRE key.
For an IP-in-IP entry, the IPv4 source is saved as part of this register,
while the IPv6 source is saved via the RIPS register and RTDP holds a
pointer to it.
Create common function for configuring both IPv4 and IPv6 and add
dedicated functions for each protocol.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RIPS register is used to store IPv6 addresses for use by the NVE and
IP-in-IP.
For IPv6 underlay support, RATR register needs to hold a pointer to the
remote IPv6 address for encapsulation and RTDP register needs to hold a
pointer to the local IPv6 address for decapsulation check.
Add the required register for saving IPv6 addresses.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function __mlxsw_sp_ipip_netdev_ul_dev_get() returns the underlay
device that corresponds to the overlay device that it gets.
Currently, this function assumes that the tunnel is IPv4 GRE, because it
is the only one that is supported by mlxsw driver.
This assumption will no longer be correct when IPv6 GRE support is added,
resulting in wrong underlay device being returned.
Instead, check 'ol_dev->type' and return the underlay device accordingly.
Move the function to spectrum_ipip.c because spectrum_router.c should not
be aware of the tunnel type.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function mlxsw_sp_ipip_ol_netdev_change_gre4() contains code that
can be shared between IPv4 and IPv6.
The only difference is the way that arguments are taken from tunnel
parameters, which are different between IPv4 and IPv6.
For that, add structure 'mlxsw_sp_ipip_parms' to hold all the required
parameters for the function and save it as part of
'struct mlxsw_sp_ipip_entry' instead of the existing structure that is
not shared between IPv4 and IPv6. Add new operation as part of
'mlxsw_sp_ipip_ops' to initialize the new structure.
Then mlxsw_sp_ipip_ol_netdev_change_gre{4,6}() will prepare the new
structure and both will call the same function.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suppress the following checkpatch.pl check [1] by adding a variable to
store the IP-in-IP options. Noticed while adding equivalent IPv6 code in
subsequent patches.
[1]
CHECK: Alignment should match open parenthesis
+ mlxsw_reg_ritr_loopback_ipip4_pack(ritr_pl, lb_cf.lb_ipipt,
+
+ MLXSW_REG_RITR_LOOPBACK_IPIP_OPTIONS_GRE_KEY_PRESET,
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, there are several functions that handle 'struct ip_tunnel_parm'
and 'struct __ip6_tnl_parm'.
Change the mentioned functions to get IP tunnel parameters by reference
and as 'const'.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw_sp_fib4_entry_type_unset() is not specific for IPv4 FIB entry,
move the code to mlxsw_sp_fib_entry_type_unset(), and call this function
from mlxsw_sp_fib4_entry_type_unset() so then it will be used for IPv6
also.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/mptcp/protocol.c
977d293e23 ("mptcp: ensure tx skbs always have the MPTCP ext")
efe686ffce ("mptcp: ensure tx skbs always have the MPTCP ext")
same patch merged in both trees, keep net-next.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After fixing hibernation resume flow, another usecase was found which
should be explicitly handled - resume when device is in "down" state.
Invoke aq_nic_init jointly with aq_nic_start only if ndev was already
up during suspend/hibernate. We still need to perform nic_deinit() if
the caller requests it, to handle the freeze/resume scenarios.
Fixes: 57f780f1c4 ("atlantic: Fix driver resume flow.")
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver doesn't support aRFS for encapsulated packets, return early error
in such a case.
Fixes: 1eb8c695bd ("net/mlx4_en: Add accelerated RFS support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The blamed commit made the fatally incorrect assumption that ports which
aren't in the FORWARDING STP state should not have packets forwarded
towards them, and that is all that needs to be done.
However, that logic alone permits BLOCKING ports to forward to
FORWARDING ports, which of course allows packet storms to occur when
there is an L2 loop.
The ocelot_get_bridge_fwd_mask should not only ask "what can the bridge
do for you", but "what can you do for the bridge". This way, only
FORWARDING ports forward to the other FORWARDING ports from the same
bridging domain, and we are still compatible with the idea of multiple
bridges.
Fixes: df291e54cc ("net: ocelot: support multiple bridges")
Suggested-by: Colin Foster <colin.foster@in-advantage.com>
Reported-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes multiple CLS_REPLACE calls are issued for the same connection.
rhashtable_insert_fast does not check for these duplicates, so multiple
hardware flow entries can be created.
Fix this by checking for an existing entry early.
Fixes: 502e84e238 ("net: ethernet: mtk_eth_soc: add flow offloading support")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the HW device is during recovery, the HW resources will never return,
hence we shouldn't wait for the CID (HW context ID) bitmaps to clear.
This fix speeds up the error recovery flow.
Fixes: 64515dc899 ("qed: Add infrastructure for error detection and recovery")
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Start using the trap adjacency entry that was added in the previous
patch and remove the existing one which is no longer needed.
Note that the name of the old entry was inaccurate as the entry did not
discard packets, but trapped them.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 0c3cbbf96d ("mlxsw: Add specific trap for packets routed via
invalid nexthops"), mlxsw started allocating a new adjacency entry
during driver initialization, to trap packets routed via invalid
nexthops.
This behavior was later altered in commit 983db6198f ("mlxsw:
spectrum_router: Allocate discard adjacency entry when needed") to only
allocate the entry upon the first route that requires it. The motivation
for the change is explained in the commit message.
The problem with the current behavior is that the entry shows up as a
"leak" in a new BPF resource monitoring tool [1]. This is caused by the
asymmetry of the allocation/free scheme. While the entry is allocated
upon the first route that requires it, it is only freed during
de-initialization of the driver.
Instead, track the number of active nexthop groups and allocate the
adjacency entry upon the creation of the first group. Free it when the
number of active groups reaches zero.
The next patch will convert mlxsw to start using the new entry and
remove the old one.
[1] https://github.com/Mellanox/mlxsw/tree/master/Debugging/libbpf-tools/resmon
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
devlink_register() can't fail and always returns success, but all drivers
are obligated to check the returned status anyway. This adds a lot of
boilerplate code to handle an impossible flow.
Make devlink_register() void and simplify the drivers that use that
API call.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Vladimir Oltean <olteanv@gmail.com> # dsa
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When interfacing with a Broadcom PHY, request the auto-power down, DLL
disable and IDDQ-SR modes to be enabled.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hclge_get_reset_status() should return the tqp reset status.
However, if the CMDQ fails, the caller mistakenly takes the return
value as a successful tqp reset status. Therefore, use a parameter to
return the tqp reset status instead.
Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The input parameters may not be reliable, so check the vlan id before
using it; otherwise a wrong vlan id may be written into hardware.
Fixes: dc8131d846 ("net: hns3: Fix for packet loss due wrong filter config in VLAN tbls")
Signed-off-by: liaoguojia <liaoguojia@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The input parameters may not be reliable. Before using the
queue id, we should check this parameter. Otherwise, memory
overwriting may occur.
Fixes: d341001846 ("net: hns3: refactor the mailbox message between PF and VF")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vport_id includes the PF and VFs: vport_id = 0 means the PF, other values
mean VFs. So the actual vf id is equal to vport_id minus 1.
Some VF print logs actually refer to the vport, and logs of the vf id
actually use the vport id, so this patch fixes them.
Fixes: ac887be5b0 ("net: hns3: change print level of RAS error log from warning to error")
Fixes: adcf738b80 ("net: hns3: cleanup some print format warning")
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vf id from ethtool is incremented by 1 before being configured into
the driver. So it's necessary to subtract 1 when printing it, in order
to stay consistent with the user's configuration.
Fixes: dd74f815dd ("net: hns3: Add support for rule add/delete for flow director")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the user changes the RSS 'hfunc' without setting the RSS 'hkey'
via the ethtool -X command, the driver ignores the 'hfunc' because the
hkey is NULL. That is unreasonable, so fix it.
Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Fixes: 374ad29176 ("net: hns3: Add RSS general configuration support for VF")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The smallest TX ring size we support must fit a TX SKB with MAX_SKB_FRAGS
+ 1. Because the first TX BD for a packet is always a long TX BD, we
need an extra TX BD to fit this packet. Define BNXT_MIN_TX_DESC_CNT with
this value to make this more clear. The current code uses a minimum
that is off by 1. Fix it using this constant.
The tx_wake_thresh to determine when to wake up the TX queue is half the
ring size but we must have at least BNXT_MIN_TX_DESC_CNT for the next
packet which may have maximum fragments. So the comparison of the
available TX BDs with tx_wake_thresh should be >= instead of > in the
current code. Otherwise, at the smallest ring size, we will never wake
up the TX queue and will cause TX timeout.
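A hedged sketch of the constant and comparison described above (helper
and field names approximate the driver's):
#define BNXT_MIN_TX_DESC_CNT	(MAX_SKB_FRAGS + 2)	/* frags + 1, plus the long first BD */

	if (bnxt_tx_avail(bp, txr) >= bp->tx_wake_thresh)	/* was '>' */
		netif_tx_wake_queue(txq);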
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for jumbo frames. Full support for jumbo frames requires
changes in the DSA switch driver (lantiq_gswip.c).
Tested on BT Home Hub 5A.
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per a HW erratum, an AQ modification to a CQ could be discarded under
heavy traffic. This patch implements a workaround: after each CQ write
via the AQ, check whether the requested fields (except those which HW
can update, e.g. avg_level) were properly updated. If the CQ context was
not updated, perform the AQ write again.
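A hedged sketch of the read-back-and-retry flow described above; every
helper name here is hypothetical:
	do {
		err = otx2_aq_write_cq_ctx(pfvf, cq, ctx);	/* hypothetical */
		if (err)
			return err;
		err = otx2_aq_read_cq_ctx(pfvf, cq, &readback);	/* hypothetical */
		if (err)
			return err;
		/* skip fields HW may update on its own, e.g. avg_level */
	} while (!otx2_cq_ctx_matches(ctx, &readback));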
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Any link state change that's done prior to net device registration
isn't reflected on the state, thus the operational state is left
obsolete, with 'UNKNOWN' status.
To resolve the issue, query link state from FW upon open operations
to ensure operational state is updated.
Fixes: c27a02cd94 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to the inclusion of nvmem handling into the mac-address getter
function of_get_mac_address() by
commit d01f449c00 ("of_net: add NVMEM support to of_get_mac_address")
it is now possible to get a -EPROBE_DEFER return code. Which did cause
bgmac to assign a random ethernet address.
This exact issue happened on my Meraki MR32. The nvmem provider is
an EEPROM (at24c64) which gets instantiated once the module
driver is loaded... This happens once the filesystem becomes available.
With this patch, bgmac_probe() will propagate the -EPROBE_DEFER error.
Then the driver subsystem will reschedule the probe at a later time.
Cc: Petr Štetiar <ynezz@true.cz>
Cc: Michael Walle <michael@walle.cc>
Fixes: d01f449c00 ("of_net: add NVMEM support to of_get_mac_address")
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MODULE_DEVICE_TABLE already creates proper alias for platform
driver. Having another MODULE_ALIAS causes the alias to be duplicated.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When updating ocelot to use phylink, a second write to DEV_CLOCK_CFG was
mistakenly left in. It used the variable "speed" which, previously, would
have been assigned a value of OCELOT_SPEED_1000. In phylink the variable
is SPEED_1000, which is invalid for the DEV_CLOCK_LINK_SPEED macro.
Remove it as unnecessary and buggy.
Fixes: e6e12df625 ("net: mscc: ocelot: convert to phylink")
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A useless write to ANA_PFC_PFC_CFG was left in while refactoring ocelot to
phylink. Since priority flow control is disabled, writing the speed has no
effect.
Further, it was using ethtool.h SPEED_ instead of OCELOT_SPEED_ macros,
which are incorrectly offset for GENMASK.
Lastly, for priority flow control to properly function, some scenarios
would rely on the rate adaptation from the PCS while the MAC speed would
be fixed. So it isn't used, and even if it was, neither "speed" nor
"mac_speed" are necessarily the correct values to be used.
Fixes: e6e12df625 ("net: mscc: ocelot: convert to phylink")
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When IGC=y and PTP_1588_CLOCK=m, the ptp_*() interface family is
not available to the igc driver. Make this driver depend on
PTP_1588_CLOCK_OPTIONAL so that it will build without errors.
Various igc commits have used ptp_*() functions without checking
that PTP_1588_CLOCK is enabled. Fix all of these here.
Fixes these build errors:
ld: drivers/net/ethernet/intel/igc/igc_main.o: in function `igc_msix_other':
igc_main.c:(.text+0x6494): undefined reference to `ptp_clock_event'
ld: igc_main.c:(.text+0x64ef): undefined reference to `ptp_clock_event'
ld: igc_main.c:(.text+0x6559): undefined reference to `ptp_clock_event'
ld: drivers/net/ethernet/intel/igc/igc_ethtool.o: in function `igc_ethtool_get_ts_info':
igc_ethtool.c:(.text+0xc7a): undefined reference to `ptp_clock_index'
ld: drivers/net/ethernet/intel/igc/igc_ptp.o: in function `igc_ptp_feature_enable_i225':
igc_ptp.c:(.text+0x330): undefined reference to `ptp_find_pin'
ld: igc_ptp.c:(.text+0x36f): undefined reference to `ptp_find_pin'
ld: drivers/net/ethernet/intel/igc/igc_ptp.o: in function `igc_ptp_init':
igc_ptp.c:(.text+0x11cd): undefined reference to `ptp_clock_register'
ld: drivers/net/ethernet/intel/igc/igc_ptp.o: in function `igc_ptp_stop':
igc_ptp.c:(.text+0x12dd): undefined reference to `ptp_clock_unregister'
Fixes: 64433e5bf4 ("igc: Enable internal i225 PPS")
Fixes: 60dbede0c4 ("igc: Add support for ethtool GET_TS_INFO command")
Fixes: 87938851b6 ("igc: enable auxiliary PHC functions for the i225")
Fixes: 5f2958052c ("igc: Add basic skeleton for PTP")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Ederson de Souza <ederson.desouza@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phylink documentation says:
Note that the PHY may be able to transform from one connection
technology to another, so, eg, don't clear 1000BaseX just
because the MAC is unable to BaseX mode. This is more about
clearing unsupported speeds and duplex settings. The port modes
should not be cleared; phylink_set_port_modes() will help with this.
So add the missing 10G modes.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Marek Behún <kabel@kernel.org>
Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only struct dim_sample member that does not get
initialized by dim_update_sample() is comp_ctr. (There
is special API to initialize comp_ctr:
dim_update_sample_with_comps(), and it is currently used
only for RDMA.) comp_ctr is used to compute curr_stats->cmps
and curr_stats->cpe_ratio (see dim_calc_stats()) which in
turn are consumed by the rdma_dim_*() API. Therefore,
functionally, the net_dim*() API consumers are not affected.
Nevertheless, fix the computation of statistics based
on an uninitialized variable, even if the mentioned statistics
are not used at the moment.
Fixes: ae0e6a5d16 ("enetc: Add adaptive interrupt coalescing")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
irq_set_affinity_hint() stores a reference to the cpumask_t
parameter in the irq descriptor, and that reference can be
accessed later from irq_affinity_hint_proc_show(). Since
the cpu_mask parameter passed to irq_set_affinity_hint() has
only temporary storage (it's on the stack memory), so later
accesses to it are illegal. Thus reads from the corresponding
procfs affinity_hint file can result in a paging request oops.
The issue is fixed by the get_cpu_mask() helper, which provides
a permanent storage for the cpumask_t parameter.
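A minimal sketch of the fix described above (the irq and cpu variables
are illustrative):
	/* get_cpu_mask() returns a pointer to permanent per-CPU mask
	 * storage, unlike a cpumask_t built on the stack */
	irq_set_affinity_hint(irq, get_cpu_mask(cpu));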
Fixes: d4fd0404c1 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both MAC IPs available on SAMA7G5 support MII on RGMII feature.
Enable these by adding proper capability to proper macb_config
objects.
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cadence IP has option to enable MII support on RGMII interface. This
could be selected though bit 28 of network control register. This option
is not enabled on all the IP versions thus add a software capability to
be selected by the proper implementation of this IP.
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Align for OSSMODE offset.
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add description for SRTSM bit.
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we are using a dedicated PHY driver (not the Generic PHY driver)
chances are that it is going to configure RGMII delays and do that in a
way that is incompatible with our incorrect interpretation of the
phy_interface value.
Add a quirk in order to reverse the PHY_INTERFACE_MODE_RGMII to the
value of PHY_INTERFACE_MODE_RGMII_ID such that the MAC continues to be
configured the way it used to be, but the PHY driver can account for
adding delays. Conversely when PHY_INTERFACE_MODE_RGMII_TXID is
specified, return PHY_INTERFACE_MODE_RGMII_RXID to the PHY since we will
have enabled a TXC MAC delay (id_mode_dis=0, meaning there is a delay
inserted).
This is not considered a bug fix at this point since it only affects
Broadcom STB platforms shipping with a Device Tree blob that is not
updatable in the field (quite a few devices out there) and which was
generated using the scripted Device Tree environment shipped with those
platforms' SDK.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sky2 is parsing the VPD and adds the parsed information to its debugfs
file. This isn't needed in kernel, userspace tools like lspci can be
used to display such information nicely. Therefore remove this from
the driver.
lspci -vv:
Capabilities: [50] Vital Product Data
Product Name: Marvell Yukon 88E8070 Gigabit Ethernet Controller
Read-only fields:
[PN] Part number: Yukon 88E8070
[EC] Engineering changes: Rev. 1.0
[MN] Manufacture ID: Marvell
[SN] Serial number: AbCdEfG970FD4
[CP] Extended capability: 01 10 cc 03
[RV] Reserved: checksum good, 9 byte(s) reserved
Read/write fields:
[RW] Read-write area: 1 byte(s) free
End
Relevant part in debugfs file:
0000:01:00.0 Product Data
Marvell Yukon 88E8070 Gigabit Ethernet Controller
Part Number: Yukon 88E8070
Engineering Level: Rev. 1.0
Manufacturer: Marvell
Serial Number: AbCdEfG970FD4
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/bbaee8ab-9b2e-de04-ee7b-571e094cc5fe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MODULE_DEVICE_TABLE already creates proper alias for spi driver.
Having another MODULE_ALIAS causes the alias to be duplicated.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When possible, use the dev_err_probe() helper to properly deal with the
-EPROBE_DEFER error; the benefit is that the deferral reason will be
logged in the devices_deferred debugfs file. Using dev_err_probe() also
reduces code size, and the error value gets printed.
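A hedged example of the dev_err_probe() pattern (the clock lookup is
purely illustrative):
	priv->clk = devm_clk_get(dev, NULL);
	if (IS_ERR(priv->clk))
		/* logs the error, records a deferral reason in
		 * devices_deferred, and returns the error value */
		return dev_err_probe(dev, PTR_ERR(priv->clk),
				     "failed to get clock\n");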
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Acked-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable blkaddr is being initialized with a value that is never
read, it is being updated later on in a for-loop. The assignment is
redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the case where the condition !is_rvu_otx2(rvu) is false, the variable
val is not initialized and can contain a garbage value. Fix this by
initializing val to zero and bit-wise or'ing in BIT_ULL(51) to val
for the true condition case of !is_rvu_otx2(rvu).
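In other words, the fix boils down to something like this sketch (surrounding code omitted):

	u64 val = 0;			/* was left uninitialized */

	if (!is_rvu_otx2(rvu))
		val |= BIT_ULL(51);	/* OR in the bit for the !is_rvu_otx2() case */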
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: 4b5a3ab17c ("octeontx2-af: Hardware configuration for inline IPsec")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NXP Legal insists that the following are not fine:
- Saying "NXP Semiconductors" instead of "NXP", since the company's
registered name is "NXP"
- Putting a "(c)" sign in the copyright string
- Putting a comma in the copyright string
The only accepted copyright string format is "Copyright <year-range> NXP".
This patch changes the copyright headers in the networking files that
were sent by me, or derived from code sent by me.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After turning on CONFIG_LOCK_STAT=y, insmod e1000e.ko reports:
[ 5.641579] e1000e: Unknown symbol mutex_lock (err -2)
[ 90.775705] e1000e: Unknown symbol mutex_lock (err -2)
[ 132.252339] e1000e: Unknown symbol mutex_lock (err -2)
This problem is fixed by including <linux/mutex.h>.
Signed-off-by: Hao Chen <chenhaoa@uniontech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When possible use dev_err_probe help to properly deal with the
PROBE_DEFER error, the benefit is that DEFER issue will be logged
in the devices_deferred debugfs file.
And using dev_err_probe() can reduce code size, and the error value
gets printed.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver isn't enabled most places because of the ISA config
dependency, but alpha still has it. And I think the 'Jensen' actually
did have an ISA slot.
However, it doesn't build cleanly, because the "Vortex bus master" code
just casts the skb->data pointer to 'int':
outl((int) (skb->data), ioaddr + Wn7_MasterAddr);
which is all kinds of broken. Even on a good old traditional PC/AT it
would be broken because the high bits will be random kernel address
bits, but presumably the hardware ignores those bits. I mean, it's ISA.
We're talking 16MB dma limits. The "good old days".
Make the build happy with this kind of craziness by using the proper
isa_virt_to_bus() handling that the full bus master code uses anyway
(the Vortex bus mastering is a limited special case).
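The change therefore amounts to roughly the following (sketch based on the quoted line):

/* before: truncates a 64-bit kernel virtual address to 'int' */
outl((int) (skb->data), ioaddr + Wn7_MasterAddr);

/* after: translate to an ISA bus address first */
outl(isa_virt_to_bus(skb->data), ioaddr + Wn7_MasterAddr);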
Who knows, this might even work.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On the OcteonTX2/CN10K SoC, the admin function (AF) is the only one
with all privileges to configure HW and allocate resources; PFs and
their VFs have to request the AF via mailbox for all their needs.
This patch adds new mailbox messages for CPT PFs and VFs to configure
HW resources for inline-IPsec.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Vidya Sagar Velumuri <vvelumuri@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The network interface managed by the mlxbf_gige driver can
get into a problem state where traffic does not flow.
In this state, the interface will be up and enabled, but
will stop processing received packets. This problem state
will happen if three specific conditions occur:
1) the driver has received more than (N * RxRingSize) packets but
fewer than ((N+1) * RxRingSize) packets, where N is an odd number
Note: the command "ethtool -g <interface>" will display the
current receive ring size, which currently defaults to 128
2) the driver's interface was disabled via "ifconfig oob_net0 down"
during the window described in #1.
3) the driver's interface is re-enabled via "ifconfig oob_net0 up"
This patch ensures that the driver's "valid_polarity" field is
cleared during the open() method so that it always matches the
receive polarity used by hardware. Without this fix, the driver
needs to be unloaded and reloaded to correct this problem state.
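A hedged sketch of what the fix looks like; only the polarity reset is shown and the rest of open() is elided:

static int mlxbf_gige_open(struct net_device *netdev)
{
	struct mlxbf_gige *priv = netdev_priv(netdev);

	/* resynchronize with the receive polarity hardware starts from */
	priv->valid_polarity = 0;

	/* remainder of open(): ring setup, IRQ request, napi_enable, ... */
	return 0;
}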
Fixes: f92e1869d7 ("Add Mellanox BlueField Gigabit Ethernet driver")
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CN10K MAC block (RPM) differs in number of stats compared to Octeontx2
MAC block (CGX). RPM supports stats for each class of PFC and error
packets, etc. It would be difficult for the user to read stats from ethtool
and map them to their definitions.
A new debugfs file has already been added to read RPM stats along with their
definitions. This patch adds proper checks so that RPM stats will not
be part of ethtool.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While checking tunnel offloading, it turned out that offloading doesn't work
as expected. The following script can be used to reproduce the issue.
Call it as `testscript DEVICE LOCALIP REMOTEIP NETMASK'
=== SNIP ===
if [ $# -ne 4 ]
then
echo "Usage $0 DEVICE LOCALIP REMOTEIP NETMASK"
exit 1
fi
DEVICE="$1"
LOCAL_ADDRESS="$2"
REMOTE_ADDRESS="$3"
NWMASK="$4"
echo "Driver: $(ethtool -i ${DEVICE} | awk '/^driver:/{print $2}') "
ethtool -k "${DEVICE}" | grep tx-udp
echo
echo "Set up NIC and tunnel..."
ip addr add "${LOCAL_ADDRESS}/${NWMASK}" dev "${DEVICE}"
ip link set "${DEVICE}" up
sleep 2
ip link add vxlan1 type vxlan id 42 \
remote "${REMOTE_ADDRESS}" \
local "${LOCAL_ADDRESS}" \
dstport 0 \
dev "${DEVICE}"
ip addr add fc00::1/64 dev vxlan1
ip link set vxlan1 up
sleep 2
rm -f vxlan.pcap
echo "Running tcpdump and iperf3..."
( nohup tcpdump -i any -w vxlan.pcap >/dev/null 2>&1 ) &
sleep 2
iperf3 -c fc00::2 >/dev/null
pkill tcpdump
echo
echo -n "Max. Paket Size: "
tcpdump -r vxlan.pcap -nnle 2>/dev/null \
| grep "${LOCAL_ADDRESS}.*> ${REMOTE_ADDRESS}.*OTV" \
| awk '{print $8}' | awk -F ':' '{print $1}' \
| sort -n | tail -1
echo
ip link del vxlan1
ip addr del ${LOCAL_ADDRESS}/${NWMASK} dev "${DEVICE}"
=== SNAP ===
The expected outcome is
Max. Paket Size: 64904
This is what you see on igb, the code igc has been taken from.
However, on igc the output is
Max. Paket Size: 1516
so the GSO aggregate packets are segmented by the kernel before calling
igc_xmit_frame. Inside the subsequent call to igc_tso, the check for
skb_is_gso(skb) fails and the function returns prematurely.
It turns out that this occurs because the feature flags aren't set
entirely correctly in igc_probe. In contrast to the original code
from igb_probe, igc_probe neglects to set the flags required to allow
tunnel offloading.
Setting the same flags as igb fixes the issue on igc.
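For context, these are the kind of tunnel-offload flags igb advertises in probe; the exact set applied to igc by this patch may differ:

	netdev->gso_partial_features = NETIF_F_GSO_GRE |
				       NETIF_F_GSO_GRE_CSUM |
				       NETIF_F_GSO_IPXIP4 |
				       NETIF_F_GSO_IPXIP6 |
				       NETIF_F_GSO_UDP_TUNNEL |
				       NETIF_F_GSO_UDP_TUNNEL_CSUM;
	netdev->features |= NETIF_F_GSO_PARTIAL | netdev->gso_partial_features;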
Fixes: 34428dff36 ("igc: Add GSO partial support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Corinna Vinschen <vinschen@redhat.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the assert from the callback priv lookup function since it does
not require RTNL lock and is already protected by flow_indr_block_lock.
This will avoid warnings from being emitted to dmesg if the driver
registers its callback after an ingress qdisc was created for a
netdevice.
The warnings started after the following patch was merged:
commit 74fc4f8287 ("net: Fix offloading indirect devices dependency on qdisc order creation")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When possible, use the dev_err_probe() helper to properly handle the
-EPROBE_DEFER error. The benefit is that the deferral reason will be logged
in the devices_deferred debugfs file.
Using dev_err_probe() can also reduce code size and simplify the code.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When possible, use the dev_err_probe() helper to properly handle the
-EPROBE_DEFER error. The benefit is that the deferral reason will be logged
in the devices_deferred debugfs file.
Using dev_err_probe() can also reduce code size and simplify the code.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When possible, use the dev_err_probe() helper to properly handle the
-EPROBE_DEFER error. The benefit is that the deferral reason will be logged
in the devices_deferred debugfs file.
Using dev_err_probe() can also reduce code size and simplify the code.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When possible, use the dev_err_probe() helper to properly handle the
-EPROBE_DEFER error. The benefit is that the deferral reason will be logged
in the devices_deferred debugfs file.
Using dev_err_probe() can also reduce code size and simplify the code.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When possible, use the dev_err_probe() helper to properly handle the
-EPROBE_DEFER error. The benefit is that the deferral reason will be logged
in the devices_deferred debugfs file.
Using dev_err_probe() can also reduce code size and simplify the code.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When possible, use the dev_err_probe() helper to properly handle the
-EPROBE_DEFER error. The benefit is that the deferral reason will be logged
in the devices_deferred debugfs file.
Using dev_err_probe() can also reduce code size and simplify the code.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When possible, use the dev_err_probe() helper to properly handle the
-EPROBE_DEFER error. The benefit is that the deferral reason will be logged
in the devices_deferred debugfs file.
Using dev_err_probe() can also reduce code size and simplify the code.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When possible, use the dev_err_probe() helper to properly handle the
-EPROBE_DEFER error. The benefit is that the deferral reason will be logged
in the devices_deferred debugfs file.
Using dev_err_probe() can also reduce code size and simplify the code.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc 11.x reports the following compiler warning/error.
drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
arch/m68k/include/asm/string.h:72:25: error:
'__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]
Use absolute_pointer() to work around the problem.
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement support for ethtool_ops::reset in order to reset transceiver
modules. The module backing the netdev is reset when the 'ETH_RESET_PHY'
flag is set. After a successful reset, the flag is cleared by the driver
and other flags are ignored. This is in accordance with the interface
documentation:
"The reset() operation must clear the flags for the components which
were actually reset. On successful return, the flags indicate the
components which were not reset, either because they do not exist in the
hardware or because they cannot be reset independently. The driver must
never reset any components that were not requested."
Reset is useful in order to allow a module to transition out of a fault
state. From section 6.3.2.12 in CMIS 5.0: "Except for a power cycle, the
only exit path from the ModuleFault state is to perform a module reset
by taking an action that causes the ResetS transition signal to become
TRUE (see Table 6-11)".
An error is returned when the netdev is administratively up:
# ip link set dev swp11 up
# ethtool --reset swp11 phy
ETHTOOL_RESET 0x40
Cannot issue ETHTOOL_RESET: Invalid argument
# ip link set dev swp11 down
# ethtool --reset swp11 phy
ETHTOOL_RESET 0x40
Components reset: 0x40
An error is returned when the module is shared by multiple ports (split
ports) and the "phy-shared" flag is not set:
# devlink port split swp11 count 4
# ethtool --reset swp11s0 phy
ETHTOOL_RESET 0x40
Cannot issue ETHTOOL_RESET: Invalid argument
# ethtool --reset swp11s0 phy-shared
ETHTOOL_RESET 0x400000
Components reset: 0x400000
# devlink port unsplit swp11s0
# ethtool --reset swp11 phy
ETHTOOL_RESET 0x40
Components reset: 0x40
An error is also returned when one of the ports using the module is
administratively up:
# devlink port split swp11 count 4
# ip link set dev swp11s1 up
# ethtool --reset swp11s0 phy-shared
ETHTOOL_RESET 0x400000
Cannot issue ETHTOOL_RESET: Invalid argument
# ip link set dev swp11s1 down
# ethtool --reset swp11s0 phy-shared
ETHTOOL_RESET 0x400000
Components reset: 0x400000
Reset is performed by writing to the "rst" bit of the PMAOS register,
which instructs the firmware to assert the reset signal connected to the
module for a fixed amount of time.
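For reference, a minimal sketch of an ethtool_ops::reset handler that follows the contract quoted above; the names and the hardware call are illustrative, not the actual mlxsw implementation:

static int example_reset(struct net_device *dev, u32 *flags)
{
	if (!(*flags & ETH_RESET_PHY))
		return -EOPNOTSUPP;

	if (netif_running(dev))
		return -EINVAL;		/* refuse while administratively up */

	/* trigger the module reset in hardware here */

	*flags &= ~ETH_RESET_PHY;	/* clear only what was actually reset */
	return 0;
}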
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PMAOS register has enable bits (e.g., PMAOS.ee) that allow changing
only a subset of the fields, which is exactly what subsequent patches
will need to do. Instead of passing multiple arguments to its pack
function, only pass the module index and let the rest be set by the
different callers.
No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Ports Module Administrative and Operational Status (PMAOS) register
configures and retrieves the per-module status. Extend it with fields
required to support various module settings such as reset and power
mode.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the common port module core, track the number of logical ports that
are mapped to the port module and the number of logical ports using it
that are administratively up.
This will be used by later patches to potentially veto and control
certain operations on the module, such as reset and setting its power
mode.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The return value is never checked. This allows us to simplify a later patch.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The return value is not checked by the networking stack. This allows us
to simplify a later patch.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the previous patch, the lock is always taken in process context so
it can be converted to a mutex. It is needed for future changes where we
will need to be able to sleep when holding the lock.
Convert the lock to a mutex.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Module temperature events are currently handled in softIRQ context,
requiring the 'module_info_lock' to be a spin lock. In future patchsets
we will need to be able to hold the lock while sleeping.
Therefore, defer handling of these events using a work queue so that the
next patch will be able to convert the lock to a mutex.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the previous patch, the switch driver is always initialized last,
making this function redundant.
Remove it.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 961cf99a07 ("mlxsw: core: Re-order initialization sequence")
changed the initialization sequence so that the switch driver (e.g.,
mlxsw_spectrum) is initialized before registration with the hwmon and
thermal subsystems.
This was done in order to avoid situations where hwmon/thermal code uses
features not supported by current firmware version, which is only
validated as part of switch driver initialization.
Later, commit b79cb787ac ("mlxsw: Move fw flashing code into core.c")
moved firmware validation and flashing code from the switch driver to
mlxsw_core so that it is performed before driver initialization.
Therefore, change the initialization sequence back to its original form.
In addition to being more straightforward, it will allow us to simplify
parts of the code in subsequent patches and future patchsets.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The devlink parameters were published in two steps despite being static
and known in advance.
The first step was to call devlink_params_publish(), which iterated over all
parameters known up to that point and sent notification messages.
In the second step, devlink_param_publish() was called, looping over the
same parameter list and sending notifications for the new parameters.
In order to simplify the API, move devlink_params_publish() so that it is
called after all parameters have been added, saving the need to iterate over
the parameter list again.
As a side effect, this change fixes the error unwind flow, in which
parameters were not marked as unpublished.
Fixes: 82e6c96f04 ("net/mlx5: Register to devlink ingress VLAN filter trap")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than releasing the tx pools on every close and reallocating
them on open, reuse the tx pools unless the pool parameters (number
of pools, size of each pool or size of each buffer in a pool) have
changed.
If the pool parameters changed, then release the old pools (if
any) and allocate new ones.
Specifically release tx pools, if:
- adapter is removed,
- pool parameters change during reset,
- we encounter an error when opening the adapter in response
to a user request (in ibmvnic_open()).
and don't release them:
- in __ibmvnic_close() or
- on errors in __ibmvnic_open()
in the hope that we can reuse them during this or next reset.
With these changes reset_tx_pools() can be dropped because its
optimization is now included in init_tx_pools() itself.
cleanup_tx_pools() releases all the skbs associated with the pool and
is called from ibmvnic_cleanup(), which is called on every reset. Since
we want to reuse skbs across resets, move cleanup_tx_pools() out of
ibmvnic_cleanup() and call it only when user closes the adapter.
Add two new adapter fields, ->prev_mtu, ->prev_tx_pool_size to track the
previous values and use them to decide whether to reuse or realloc the
pools.
Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than releasing the rx pools and reallocating them on every
reset, reuse the rx pools unless the pool parameters (number of pools,
size of each pool or size of each buffer in a pool) have changed.
If the pool parameters changed, then release the old pools (if any)
and allocate new ones.
Specifically release rx pools, if:
- adapter is removed,
- pool parameters change during reset,
- we encounter an error when opening the adapter in response
to a user request (in ibmvnic_open()).
and don't release them:
- in __ibmvnic_close() or
- on errors in __ibmvnic_open()
in the hope that we can reuse them on the next reset.
With these changes, reset_rx_pools() can be dropped because its optimization is
now included in init_rx_pools() itself.
cleanup_rx_pools() releases all the skbs associated with the pool and
is called from ibmvnic_cleanup(), which is called on every reset. Since
we want to reuse skbs across resets, move cleanup_rx_pools() out of
ibmvnic_cleanup() and call it only when user closes the adapter.
Add two new adapter fields, ->prev_rx_buf_sz, ->prev_rx_pool_size to
keep track of the previous values and use them to decide whether to
reuse or realloc the pools.
Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reuse the long term buffer during a reset as long as its size has
not changed. If the size has changed, free it and allocate a new
one of the appropriate size.
When we do this, alloc_long_term_buff() and reset_long_term_buff()
become identical. Drop reset_long_term_buff().
Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a follow-on patch, we will reuse long term buffers when possible.
When doing so we have to be careful to properly assign map ids. We
can no longer assign them sequentially because a lower map id may be
available and we could wrap at 255 and collide with an in-use map id.
Instead, use a bitmap to track active map ids and to find a free map id.
We don't need to take locks here since the map_id only changes during reset,
and at that time only the reset worker thread should be using the adapter.
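A hedged sketch of the bitmap-based tracking; the bitmap size and helper are illustrative:

#define LTB_MAX_MAP_ID	255

static DECLARE_BITMAP(map_ids, LTB_MAX_MAP_ID);

/* find and claim a free map id, e.g. id = alloc_map_id(map_ids); */
static int alloc_map_id(unsigned long *ids)
{
	int id = find_first_zero_bit(ids, LTB_MAX_MAP_ID);

	if (id == LTB_MAX_MAP_ID)
		return -ENOSPC;		/* no free map id */
	set_bit(id, ids);
	return id;
}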
Noticed this when analyzing an error Dany Madden ran into with the
patch set.
Reported-by: Dany Madden <drt@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In init_tx_pools() move some loop-invariant code out of the loop.
Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use/rename local variables in init_tx_pools() for consistency with
init_rx_pools() and for readability. Also add some comments.
Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To make the code more readable, use/rename some local variables.
Basically we have a set of pools, num_pools. Each pool has a set of
buffers, pool_size and each buffer is of size buff_size.
pool_size is a bit ambiguous (whether size in bytes or buffers). Add
a comment in the header file to make it explicit.
Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add/update some comments/function headers and fix up some messages.
Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For better readability, consolidate related code in replenish_rx_pool()
and add some comments.
Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Configure the burst length in Ethernet drivers. This improves
Ethernet performance by 58%. According to the vendor BSP,
8W burst length is supported by ar9 and newer SoCs.
The NAT benchmark results on xRX200 (Down/Up):
* 2W: 330 Mb/s
* 4W: 432 Mb/s 372 Mb/s
* 8W: 520 Mb/s 389 Mb/s
Tested on xRX200 and xRX330.
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function is called to enable SR-IOV when available;
not enabling interfaces without VFs was a regression.
Fixes: 65161c3555 ("bnx2x: Fix missing error code in bnx2x_iov_init_one()")
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Reported-by: YunQiang Su <wzssyqa@gmail.com>
Tested-by: YunQiang Su <wzssyqa@gmail.com>
Cc: stable@vger.kernel.org
Acked-by: Shai Malin <smalin@marvell.com>
Link: https://lore.kernel.org/r/20210912190523.27991-1-bunk@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The new firmware supports dividing the whole multicast MAC address space
equally among the functions of all PFs, and calculates the space size of
each PF according to its function number.
To support this feature, the PF driver adds querying of the multicast MAC
address space size from firmware and limits the number of used addresses
according to the space size.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, there are two ways for a PF to set the unicast MAC address space
size: specified by config parameters in firmware, or set to a default value.
That means if the config parameters in firmware are zero, the driver will
divide the whole unicast MAC address space equally among 8 PFs. However, in
this case, much of the unicast MAC address space is wasted when the hardware
actually has fewer than 8 PFs. On the other hand, if one PF has many more
VFs than the other PFs, then each function of this PF will have much less
address space than the functions of the other PFs.
In order to ameliorate the above two situations, introduce a third way of
unicast MAC address space assignment: firmware divides the whole unicast
MAC address space equally among the functions of all PFs, and calculates the
space size of each PF according to its function number. The PF queries the
space size with the query device specification command during
initialization.
The third assignment method has lower priority than the config parameters:
it is used only when the config parameters are zero. If firmware does not
support the third method, the driver still divides the whole unicast MAC
address space equally among 8 PFs.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a follow-up to beb401ec50 ("r8169: deprecate support for
RTL_GIGA_MAC_VER_27") that came with 5.12. Nobody complained, so let's
remove support for this chip version.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not used anymore, remove it.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The newly introduced PMTDB register provides all the needed info
about a particular requested port split configuration. Use it instead of
figuring the info out manually in the driver.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PMTDB register allows querying the possible module<->local port
mappings that can be used in PMLP. It does not represent the actual/current
mapping of local port to module. The actual mapping is only defined by PMLP.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of relying on the values coming from the PMLP register, use PLLP
to get the information about port front panel number and split number.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PLLP register returns the mapping from Local Port into Label Port.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During port creation, mlxsw_core_port_init() is called with the front
panel port number and the split port sub-number. Currently, this
information is determined by the driver without firmware assistance.
Subsequent patches are going to query this information from firmware,
but this requires the port to be assigned to a SWID.
Therefore, move port SWID assignment before mlxsw_core_port_init().
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During port creation, mlxsw_core_port_init() is called with the front
panel port number and the split port sub-number. Currently, this
information is determined by the driver without firmware assistance.
Subsequent patches are going to query this information from firmware,
but this requires the port to be mapped to a module.
Therefore, move port mapping before mlxsw_core_port_init().
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add latest verified version of Nvidia Spectrum-family switch firmware,
for Spectrum (13.2008.3326), Spectrum-2 (29.2008.3326) and Spectrum-3
(30.2008.3326).
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the VF does not clear the interrupt source immediately after
receiving an interrupt. As a result, if a second interrupt task is
triggered while the first is being processed, clearing the interrupt
source before exiting will clear the interrupt sources of both tasks at
the same time, so no interrupt is triggered for the second task. The VF
detects the missed message only when the next interrupt is generated.
Clearing it immediately after executing check_evt_cause ensures that:
1. Even if two interrupt tasks are triggered at the same time, they can
be processed.
2. If the second task is triggered during the processing of the first
task and the interrupt source is not cleared, the interrupt is reported
after vector0 is enabled.
Fixes: b90fcc5bd9 ("net: hns3: add reset handling for VF when doing Core/Global/IMP reset")
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the command for querying IMP info is issued to the firmware,
if the firmware does not support the command, the returned value
of bd num is 0.
Add a protection mechanism before allocating memory to prevent
requesting a 0-length allocation.
Fixes: 0b198b0d80 ("net: hns3: refactor dump m7 info of debugfs")
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The firmware will not disable the MAC during the FLR process. Therefore,
the driver needs to proactively disable the MAC during FLR, the same as
it does for function reset.
Fixes: 35d93a3004 ("net: hns3: adjust the process of PF reset")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, affinity_mask is set to a single CPU. As a result,
irqbalance becomes ineffective in SUBSET or EXACT mode. To solve
this problem, change affinity_mask to the NUMA node range. In this
way, irqbalance can be performed among the CPUs of the NUMA node.
Fixes: 0812545487 ("net: hns3: add interrupt affinity support for misc interrupt")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware cannot handle tunnel frames shorter than 65 bytes,
which causes the VLAN tag to go missing. So pad tunnel frames to
65 bytes to fix this bug.
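A hedged sketch of the padding; the 65-byte minimum comes from the text above, while the exact placement and condition in the hns3 TX path are assumptions:

	if (skb->encapsulation && unlikely(skb->len < 65)) {
		int ret = skb_put_padto(skb, 65);	/* frees the skb on failure */

		if (ret)
			return ret;
	}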
Fixes: 3db084d28dc0 ("net: hns3: Fix for vxlan tx checksum bug")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When page pool support was added to the hns3 driver, it was always
enabled unconditionally, which means the split page handling
in the hns3 driver is dead code.
As there is a requirement to compare the performance of the driver's
split page handling against the page pool, add a module
parameter to support disabling the page pool.
When the page pool is proved to perform better in most cases,
the split page handling in the driver can be removed.
Fixes: 93188e9642 ("net: hns3: support skb's frag page recycling based on page pool")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.
So, use the struct_size() helper to do the arithmetic instead of the
argument "size + count * size" in the kzalloc() function.
[1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
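A generic illustration of the pattern; the structure here is hypothetical, not the one touched by this patch:

struct item_table {
	u32 count;
	struct item entries[];		/* flexible array member */
};

/* before: open-coded arithmetic that can overflow */
tbl = kzalloc(sizeof(*tbl) + count * sizeof(struct item), GFP_KERNEL);

/* after: struct_size() saturates on overflow, so the allocation fails
 * instead of silently being undersized */
tbl = kzalloc(struct_size(tbl, entries, count), GFP_KERNEL);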
Signed-off-by: Len Baker <len.baker@gmx.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As reported and discussed in: https://lore.kernel.org/lkml/CAHk-=whF9F89vsfH8E9TGc0tZA-yhzi2Di8wOtquNB5vRkFX5w@mail.gmail.com/
This patch improves the stack space usage of qede_config_rx_mode() by
splitting filter_config() into 3 functions and removing the
union qed_filter_type_params.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We recently changed the completion ring page arrays to be dynamically
allocated to better support the expanded range of ring depths. The
cleanup path for this was not quite complete. It might cause the
shutdown path to crash if we need to abort before the completion ring
arrays have been allocated and initialized.
Fix it by initializing the ring_mem->pg_arr to NULL after freeing the
completion ring page array. Add a check in bnxt_free_ring() to skip
referencing the rmem->pg_arr if it is NULL.
Fixes: 03c7448790 ("bnxt_en: Don't use static arrays for completion ring pages")
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call to bnxt_free_mem(..., false) in the bnxt_half_open_nic() error
path will deallocate ring descriptor memory via bnxt_free_?x_rings(),
but because irq_re_init is false, the ring info itself is not freed.
To simplify error paths, deallocation functions have generally been
written to be safe when called on unallocated memory. It should always
be safe to call dev_close(), which calls bnxt_free_skbs() a second time,
even in this semi-allocated ring state.
Calling bnxt_free_skbs() a second time with the rings already freed will
cause NULL pointer dereference. Fix it by checking the rings are valid
before proceeding in bnxt_free_tx_skbs() and
bnxt_free_one_rx_ring_skbs().
Fixes: 975bc99a4a ("bnxt_en: Refactor bnxt_free_rx_skbs().")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent patch introduced a regression by not reading the reset
count in the ERROR_RECOVERY async event handler. We may have just
gone through a reset and the reset count has just incremented. If
we don't update the reset count in the ERROR_RECOVERY event handler,
the health check timer will see that the reset count has changed and
will initiate an unintended reset.
Restore the unconditional update of the reset count in
bnxt_async_event_process() if error recovery watchdog is enabled.
Also, update the reset count at the end of the reset sequence to
make it even more robust.
Fixes: 1b2b918319 ("bnxt_en: Fix possible unintended driver initiated error recovery")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.
So, use the struct_size() helper to do the arithmetic instead of the
argument "size + count * size" in the kzalloc() function.
[1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
Signed-off-by: Len Baker <len.baker@gmx.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Synopsys Ethernet IP uses the CSR clock as a base clock for MDC.
The divisor used is set in the MAC_MDIO_Address register field CR
(Clock Rate).
The divisor is there to change the CSR clock into a clock that falls
below the IEEE 802.3 specified max frequency of 2.5MHz.
If the CSR clock is 300MHz, the code falls back to using the reset
value in the MAC_MDIO_Address register, as described in the comment
above this code.
However, 300MHz is actually an allowed value and the proper divider
can be estimated quite easily (it's just 1Hz difference!)
A CSR frequency of 300MHz with the maximum clock rate value of 0x5
(STMMAC_CSR_250_300M, a divisor of 124) gives somewhere around
~2.42MHz which is below the IEEE 802.3 specified maximum.
For the ARTPEC-8 SoC, the CSR clock is this problematic 300MHz,
and unfortunately, the reset-value of the MAC_MDIO_Address CR field
is 0x0.
This leads to a clock rate of zero and a divisor of 42, and gives an
MDC frequency of ~7.14MHz.
Allow CSR clock of 300MHz by making the comparison inclusive.
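An abridged sketch of the comparison change; the constant and macro names follow my reading of the stmmac code and may not match exactly:

	/* before: exactly 300 MHz fell through to the (zero) reset value */
	else if ((clk_rate >= CSR_F_250M) && (clk_rate < CSR_F_300M))
		priv->clk_csr = STMMAC_CSR_250_300M;

	/* after: 300 MHz selects the 250-300 MHz divisor (~2.42 MHz MDC) */
	else if ((clk_rate >= CSR_F_250M) && (clk_rate <= CSR_F_300M))
		priv->clk_csr = STMMAC_CSR_250_300M;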
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable ret is being initialized with a value that is never read;
it is updated later on. The assignment is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle MFW (management FW) error response in order to avoid a crash
during recovery flows.
Changes from v1:
- Add "Fixes tag".
Fixes: 5e7ba042fd ("qed: Fix reading stale configuration information")
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A number of users have reported that they were not able to get the PHY
to successfully link up, especially after commit c36757eb9d ("net:
phy: consider AN_RESTART status when reading link status") where we
stopped reading just BMSR, but we also read BMCR to determine the link
status.
Andrius at NetBSD did a wonderful job at debugging the problem
and found out that the MDIO bus clock frequency would be incorrectly set
back to its default value which would prevent the MDIO bus controller
from reading PHY registers properly. Back when we only read BMSR, if we
read all 1s, we could falsely indicate a link status, though in general
there is a cable plugged in, so this went unnoticed. After a second read
of BMCR was added, a wrong read will lead to the inability to determine
a link UP condition which is when it started to be visibly broken, even
if it was long before that.
The fix consists of restoring the value of the MD_CSR register that was
set prior to the MAC reset.
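A hedged sketch of that approach; the register name follows the text above, the struct layout is assumed and the reset body is omitted:

static void r6040_reset_mac(struct r6040_private *lp)
{
	void __iomem *ioaddr = lp->base;
	u16 mdclk = ioread16(ioaddr + MD_CSR);	/* save MDIO clock setting */

	/* existing MAC reset sequence runs here */

	iowrite16(mdclk, ioaddr + MD_CSR);	/* restore it after the reset */
}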
Link: http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=53494
Fixes: 90f750a81a ("r6040: consolidate MAC reset to its own function")
Reported-by: Andrius V <vezhlys@gmail.com>
Reported-by: Darek Strugacz <darek.strugacz@op.pl>
Tested-by: Darek Strugacz <darek.strugacz@op.pl>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two cases where the current PF does not support RDMA
functionality. The first is if the NVM loaded on the device is set
to not support RDMA (common_caps.rdma is false). The second is if
the kernel bonding driver has included the current PF in an active
link aggregate.
When the driver has determined that this PF does not support RDMA, then
auxiliary devices should not be created on the auxiliary bus. Without
a device on the auxiliary bus, even if the irdma driver is present, there
will be no RDMA activity attempted on this PF.
Currently, in the reset flow, an attempt to create auxiliary devices is
performed without regard to the ability of the PF. There needs to be a
check in ice_aux_plug_dev (as the central point that creates auxiliary
devices) to see if the PF is in a state to support the functionality.
When disabling and re-enabling RDMA due to the inclusion/removal of the PF
in a link aggregate, we also need to set/clear the bit which controls
auxiliary device creation so that a reset recovery in a link aggregate
situation doesn't try to create auxiliary devices when it shouldn't.
Fixes: f9f5301e7e ("ice: Register auxiliary device to provide RDMA")
Reported-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Building alpha:allmodconfig results in the following error.
drivers/net/ethernet/amd/ni65.c: In function 'ni65_stop_start':
drivers/net/ethernet/amd/ni65.c:751:37: error:
cast from pointer to integer of different size
buffer[i] = (u32) isa_bus_to_virt(tmdp->u.buffer);
'buffer[]' is declared as unsigned long, so replace the typecast to u32
with a typecast to unsigned long to fix the problem.
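The change therefore amounts to roughly:

/* before */
buffer[i] = (u32) isa_bus_to_virt(tmdp->u.buffer);
/* after */
buffer[i] = (unsigned long) isa_bus_to_virt(tmdp->u.buffer);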
Cc: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patch addressed the situation of having some free resources for
XDP TX but not enough for one TX queue per CPU. This patch addresses the
worst case of not having resources for XDP TX at all.
Instead of using queues dedicated to XDP, the normal queues used by the
network stack are shared for both cases, using __netif_tx_lock for
synchronization. Queue stop/restart must also be considered in the XDP
path to avoid freezing the queue.
This is not the ideal situation we might want to be in, and a performance
penalty is expected both for normal and XDP traffic, but at least XDP
will work in all possible situations (with a warning in the logs),
easing the pain of not knowing in which situations it can be used
and in which it cannot.
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there are not enough resources to allocate one TX queue per core for
XDP TX, it was completely disabled.
This patch implements a fallback solution for sharing the available
queues, using __netif_tx_lock for synchronization. In the normal case where
there is one TX queue per CPU, no locking is done, as before.
With this fallback solution, XDP TX will work in many more cases that
were previously failing, especially on machines with many CPUs. It's hard
for XDP users to know which features are supported across different NICs
and configurations, so they will benefit from having wider support.
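A hedged sketch of the shared-queue path; this is an excerpt and the variable names are illustrative rather than the exact sfc symbols:

	struct netdev_queue *txq = netdev_get_tx_queue(dev, queue_index);
	bool shared = num_xdp_queues < num_online_cpus();	/* fallback case */

	if (shared)
		__netif_tx_lock(txq, smp_processor_id());

	/* push the XDP frames onto the (possibly shared) TX ring here,
	 * honouring queue stop/restart just like normal stack traffic */

	if (shared)
		__netif_tx_unlock(txq);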
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use __maybe_unused for the noirq_suspend()/noirq_resume() hooks to avoid
a build warning with !CONFIG_PM_SLEEP:
>> drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:796:12: error: 'stmmac_pltfr_noirq_resume' defined but not used [-Werror=unused-function]
796 | static int stmmac_pltfr_noirq_resume(struct device *dev)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:775:12: error: 'stmmac_pltfr_noirq_suspend' defined but not used [-Werror=unused-function]
775 | static int stmmac_pltfr_noirq_suspend(struct device *dev)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
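The change amounts to annotating both hooks, roughly:

/* before */
static int stmmac_pltfr_noirq_suspend(struct device *dev)
/* after: no warning when CONFIG_PM_SLEEP is disabled */
static int __maybe_unused stmmac_pltfr_noirq_suspend(struct device *dev)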
Fixes: 276aae3772 ("net: stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
plat_dev->dev->platform_data is released by platform_device_unregister(),
so the subsequent use of pclk and hclk is a use-after-free. Since device
unregister won't need the clk devices, we adjust the function call
sequence to fix this issue.
[ 31.261225] BUG: KASAN: use-after-free in macb_remove+0x77/0xc6 [macb_pci]
[ 31.275563] Freed by task 306:
[ 30.276782] platform_device_release+0x25/0x80
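A hedged sketch of the reordering, based on the description above (the exact remove() body may differ):

static void macb_remove(struct pci_dev *pdev)
{
	struct platform_device *plat_dev = pci_get_drvdata(pdev);
	struct macb_platform_data *plat_data = dev_get_platdata(&plat_dev->dev);

	/* unregister the clocks while platform_data is still alive ... */
	clk_unregister(plat_data->pclk);
	clk_unregister(plat_data->hclk);
	/* ... and only then drop the platform device that owns it */
	platform_device_unregister(plat_dev);
}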
Suggested-by: Nicolas Ferre <Nicolas.Ferre@microchip.com>
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a failover occurs before a login response is received, the login
response buffer may be undefined. Check that there was no failover
before accessing the login response buffer.
Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5f58591323 ("net: stmmac: delete the eee_ctrl_timer after
napi disabled") tried to fix a system hang caused by eee_ctrl_timer;
unfortunately, it only resolves it for the system reboot stress test. The
system hang can also be reproduced easily during the system suspend/resume
stress test when mounting NFS on an i.MX8MP EVK board.
In the stmmac driver, the EEE feature is integrated with the phylink
framework. On system suspend, phylink_stop() queues delayed work that
invokes stmmac_mac_link_down(), which deactivates eee_ctrl_timer
synchronously.
The above commit tried to fix the issue by explicitly deactivating
eee_ctrl_timer, but that is not enough. Looking into the timer expiry
callback stmmac_eee_ctrl_timer(), it can enable hardware EEE mode again.
What is unexpected is that the LPI interrupt (MAC_Interrupt_Enable.LPIEN
bit) is always asserted. This interrupt can be issued on LPI state
entry/exit from the MAC, and at that time the clocks could already have
been disabled. The result is a system hang when the driver tries to touch
registers from the interrupt handler.
The reason the above commit can fix the system hang in stmmac_release()
is that it deactivates eee_ctrl_timer not just after NAPI is disabled,
but also after the IRQ is freed.
In conclusion, the hardware can generate an LPI interrupt while the clocks
are disabled during suspend or resume, since the hardware is in EEE mode
with the LPI interrupt enabled.
Interrupts at the MAC, MTL and DMA levels are enabled and never disabled
during system suspend, so postponing clock management from the suspend
stage to the noirq suspend stage should be safer.
Fixes: 5f58591323 ("net: stmmac: delete the eee_ctrl_timer after napi disabled")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert noticed a warning on MIPS TX49xx, Atari and presumably other
platforms when the driver is built-in but NETDEV_LEGACY_INIT is
disabled:
drivers/net/ethernet/8390/ne.c:909:20: warning: ‘ne_add_devices’ defined but not used [-Wunused-function]
Merge the two module init functions into a single one with an
IS_ENABLED() check to replace the incorrect #ifdef.
Fixes: 4228c39428 ("make legacy ISA probe optional")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-fixes-2021-09-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-09-07
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
Included here, a patch which solves a build warning reported on
linux-kernel mailing list [1]:
Fix commit ("net/mlx5: Bridge, fix uninitialized variable usage")
I hope this series can make it to rc1.
[1] https://www.spinics.net/lists/netdev/msg765481.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If a failover occurs before a login response is received, the login
response buffer may be undefined. Check that there was no failover
before accessing the login response buffer.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When activating the PTP-RQ, redirect the RQT from drop-RQ to PTP-RQ.
Use mlx5e_channels_get_ptp_rqn to retrieve the rqn. This helper returns
a boolean (not a status), hence the caller should treat a return value of
0 as a failure. Change the caller's interpretation of the return value.
Fixes: 43ec0f41fa ("net/mlx5e: Hide all implementation details of mlx5e_rx_res")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Some profiles of the driver don't support a dedicated PTP-RQ, hence can't
support HW TS and CQE compression simultaneously. When HW TS is enabled
the CQE compression is disabled, and should be restored when the HW TS
is turned off. Add rx_filter as an input to modifying CQE compression to
enforce this restriction.
Fixes: 256f79d13c ("net/mlx5e: Fix HW TS with CQE compression according to profile")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Fix the below flow of sleeping in atomic context by releasing
the RCU lock before calling free_match_list().
build_match_list() <- disables preempt
-> free_match_list()
-> tree_put_node()
-> down_write_ref_node() <- take write lock
Fixes: 693c6883bb ("net/mlx5: Add hash table for flow groups in flow table")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In NICs that don't support LAG, the LAG control structure won't be
allocated. If it wasn't allocated, it means LAG doesn't exist and can be
skipped.
Fixes: cac1eb2cf2 ("net/mlx5: Lag, properly lock eswitch if needed")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
RDMA auxdev parameter registration was skipped for eswitch manager PCI PF.
Due to this when devlink parameter is read, it reads as false in below
code flow.
$ devlink dev reload pci/0000:06:00.0
devlink_reload()
mlx5_load_one()
mlx5_attach_device()
is_ib_enabled()
devlink_param_driverinit_value_get()
Due to this, is_ib_enabled() returns false for the RDMA auxiliary
device. This results in skipping RDMA auxiliary device creation on
reload.
There is no need to check for eswitch manager capability to support RDMA
auxiliary device. Hence, fix it by skipping eswitch manager capability.
Fixes: 87158cedf0 ("net/mlx5: Support enable_rdma devlink dev param")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In some conditions the variable 'err' is not assigned a value in the
mlx5_esw_bridge_port_obj_attr_set() and mlx5_esw_bridge_port_changeupper()
functions after recent changes to support LAG. Initialize the variable to
zero in both cases.
Reported-by: Colin King <colin.king@canonical.com>
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
CC: linux-kernel@vger.kernel.org
Fixes: ff9b752146 ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Merge tag 'net-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes and stragglers from Jakub Kicinski:
"Networking stragglers and fixes, including changes from netfilter,
wireless and can.
Current release - regressions:
- qrtr: revert check in qrtr_endpoint_post(), fixes audio and wifi
- ip_gre: validate csum_start only on pull
- bnxt_en: fix 64-bit doorbell operation on 32-bit kernels
- ionic: fix double use of queue-lock, fix a sleeping in atomic
- can: c_can: fix null-ptr-deref on ioctl()
- cs89x0: disable compile testing on powerpc
Current release - new code bugs:
- bridge: mcast: fix vlan port router deadlock, consistently disable
BH
Previous releases - regressions:
- dsa: tag_rtl4_a: fix egress tags, only port 0 was working
- mptcp: fix possible divide by zero
- netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex
- netfilter: socket: icmp6: fix use-after-scope
- stmmac: fix MAC not working when system resume back with WoL active
Previous releases - always broken:
- ip/ip6_gre: use the same logic as SIT interfaces when computing
v6LL address
- seg6: set fc_nlinfo in nh_create_ipv4, nh_create_ipv6
- mptcp: only send extra TCP acks in eligible socket states
- dsa: lantiq_gswip: fix maximum frame length
- stmmac: fix overall budget calculation for rxtx_napi
- bnxt_en: fix firmware version reporting via devlink
- renesas: sh_eth: add missing barrier to fix freeing wrong tx
descriptor
Stragglers:
- netfilter: conntrack: switch to siphash
- netfilter: refuse insertion if chain has grown too large
- ncsi: add get MAC address command to get Intel i210 MAC address"
* tag 'net-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
ieee802154: Remove redundant initialization of variable ret
net: stmmac: fix MAC not working when system resume back with WoL active
net: phylink: add suspend/resume support
net: renesas: sh_eth: Fix freeing wrong tx descriptor
bonding: 3ad: pass parameter bond_params by reference
cxgb3: fix oops on module removal
can: c_can: fix null-ptr-deref on ioctl()
can: rcar_canfd: add __maybe_unused annotation to silence warning
net: wwan: iosm: Unify IO accessors used in the driver
net: wwan: iosm: Replace io.*64_lo_hi() with regular accessors
net: qcom/emac: Replace strlcpy with strscpy
ip6_gre: Revert "ip6_gre: add validation for csum_start"
net: hns3: make hclgevf_cmd_caps_bit_map0 and hclge_cmd_caps_bit_map0 static
selftests/bpf: Test XDP bonding nest and unwind
bonding: Fix negative jump label count on nested bonding
MAINTAINERS: add VM SOCKETS (AF_VSOCK) entry
stmmac: dwmac-loongson:Fix missing return value
iwlwifi: fix printk format warnings in uefi.c
net: create netdev->dev_addr assignment helpers
bnxt_en: Fix possible unintended driver initiated error recovery
...
We can reproduce this issue with the steps below:
1) enable WoL on the host
2) suspend the host system
3) have a remote client send wakeup packets
The host system resumes, but the network no longer works; for example, ping fails.
After a bit of digging, this issue turns out to be introduced by commit 46f69ded98
("net: stmmac: Use resolved link config in mac_link_up()"), which uses
the finalised link parameters in mac_link_up() rather than the
parameters in mac_config().
There are two scenarios for MAC suspend/resume in the STMMAC driver:
1) MAC suspend with WoL inactive: stmmac_suspend() calls
phylink_mac_change() to notify the phylink machine of a change in MAC
state, so the .mac_link_down callback is invoked. It then
calls phylink_stop() to stop the phylink instance. When the MAC resumes,
phylink_start() is called first to start the phylink instance, then
phylink_mac_change() is called, which finally triggers the phylink machine to
invoke the .mac_config and .mac_link_up callbacks. All is fine, since the
configuration in these two callbacks is reinitialized, which means the MAC
can restore its state.
2) MAC suspend with WoL active: phylink_mac_change() puts the link
down, but there is no phylink_stop() to stop the phylink instance, so the
link comes up again. That means .mac_config and .mac_link_up are
invoked before the system is suspended. After the system resumes, the driver
performs DMA initialization and a SW reset, which makes the MAC lose its
hardware settings (i.e. the MAC_Configuration register (offset 0x0) is reset).
Since the link was already up before the system suspended, .mac_link_up is not
invoked after resume, so there is no chance to reinitialize the
configuration in the .mac_link_up callback; as a result, the MAC can't work any
longer.
After discussing with Russell King [1], we confirmed that the phylink framework
has not taken WoL into consideration yet. This patch calls the
phylink_suspend()/phylink_resume() functions newly introduced
by Russell King to fix this issue.
[1] https://lore.kernel.org/netdev/20210901090228.11308-1-qiangqing.zhang@nxp.com/
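A rough sketch of the resulting suspend/resume hooks; the device_may_wakeup()/priv->plat->pmt condition and the exact call sites are assumptions for illustration, not the driver's verbatim code:
/* stmmac_suspend(), sketch */
if (device_may_wakeup(priv->device) && priv->plat->pmt)
	phylink_suspend(priv->phylink, true);	/* WoL handled by the MAC: keep the link programmed */
else
	phylink_suspend(priv->phylink, false);
/* stmmac_resume(), sketch */
phylink_resume(priv->phylink);	/* lets phylink replay .mac_config/.mac_link_up as needed */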
Fixes: 46f69ded98 ("net: stmmac: Use resolved link config in mac_link_up()")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cur_tx counter must be incremented after the TACT bit of
txdesc->status is set. However, the CPU may reorder
instructions and/or memory accesses between cur_tx and
txdesc->status, and if a TX interrupt happens at such a
moment, sh_eth_tx_free() may free the wrong descriptor.
So, add wmb() before cur_tx++;
otherwise a NETDEV WATCHDOG timeout can occur.
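A minimal sketch of the ordering fix (the surrounding descriptor code is paraphrased, not quoted):
txdesc->status |= cpu_to_le32(TD_TACT);	/* hand the descriptor over to the hardware */
wmb();					/* order the status write before the index update */
mdp->cur_tx++;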
Fixes: 86a74ff21a ("net: sh_eth: add support for Renesas SuperH Ethernet")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
strlcpy() should not be used because it doesn't limit the source
length. As Linus says, it's a completely useless function if you
can't implicitly trust the source string - but that is almost always
why people think they should use it! All in all, the BSD function
can lead to potential bugs.
strscpy(), by contrast, doesn't require reading memory from the src string
beyond the specified "count" bytes, and its return value is
easier to error-check than strlcpy()'s. In addition, the implementation
is robust to the string changing out from underneath it, unlike the
current strlcpy() implementation.
Thus, we prefer using strscpy() instead of strlcpy().
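The conversion itself is mechanical; an illustrative (not verbatim) example:
/* before */
strlcpy(dst, src, sizeof(dst));
/* after: same bound, but src is never read beyond sizeof(dst) bytes */
strscpy(dst, src, sizeof(dst));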
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These symbols are not used outside of hclge_cmd.c and hclgevf_cmd.c, so
mark them static.
Fix the following sparse warnings:
drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c:345:35:
warning: symbol 'hclgevf_cmd_caps_bit_map0' was not declared. Should it
be static?
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c:365:33: warning:
symbol 'hclge_cmd_caps_bit_map0' was not declared. Should it be static?
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: chongjiapeng <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If error recovery is already enabled, bnxt_timer() will periodically
check the heartbeat register and the reset counter. If we get an
error recovery async. notification from the firmware (e.g. change in
primary/secondary role), we will immediately read and update the
heartbeat register and the reset counter. If the timer for the next
health check expires soon after this, we may read the heartbeat register
again in quick succession and find that it hasn't changed. This will
trigger error recovery unintentionally.
The likelihood is small because we also reset fw_health->tmr_counter
which will reset the interval for the next health check. But the
update is not protected and bnxt_timer() can miss the update and
perform the health check without waiting for the full interval.
Fix it by only reading the heartbeat register and reset counter in
bnxt_async_event_process() if error recovery is transitioning to the
enabled state. Also add proper memory barriers so that when enabling
for the first time, bnxt_timer() will see the tmr_counter interval and
perform the health check after the full interval has elapsed.
Fixes: 7e914027f7 ("bnxt_en: Enable health monitoring.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current logic assumes that when the driver sends the message to the
firmware to add the VXLAN or Geneve port, the firmware will never fail
the operation. The UDP ports are always stored and are used to check
the tunnel packets in .ndo_features_check(). These tunnel packets
will fail to offload on the transmit side if firmware fails the call to
add the UDP ports.
To fix the problem, bp->vxlan_port and bp->nge_port will only be set to
the offloaded ports when the HWRM_TUNNEL_DST_PORT_ALLOC firmware call
succeeds. When deleting a UDP port, we first check that the port was
previously added successfully by checking the FW ID.
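A sketch of the resulting pattern, using the helper and field names from the description above (the real code differs in detail):
rc = bnxt_hwrm_tunnel_dst_port_alloc(bp, port, tunnel_type);
if (rc)
	return rc;		/* firmware rejected the port: do not record it */
if (is_vxlan)			/* illustrative; the real code switches on tunnel_type */
	bp->vxlan_port = port;
else
	bp->nge_port = port;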
Fixes: 1698d600b3 ("bnxt_en: Implement .ndo_features_check().")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current asic.rev is incomplete and does not include the metal
revision. Add the metal revision and decode the complete asic
revision into the more common and readable form (A0, B0, etc).
Fixes: 7154917a12 ("bnxt_en: Refactor bnxt_dl_info_get().")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
P5 devices store NVM arrays using a different internal representation.
This implementation detail permeates into the HWRM API, requiring the
caller to explicitly index the array elements in HWRM_NVM_GET_VARIABLE
on these devices. Conversely, older devices do not support the indexed
mode of operation and require reading the raw NVM content.
Fixes: db28b6c77f ("bnxt_en: Fix devlink info's stored fw.psid version format.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The FW_PSID version components are 8 bits wide, not 4.
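In other words, the stored fw.psid string should be built from 8-bit fields, roughly as follows (the buffer and variable names, and the exact format string, are assumptions):
snprintf(buf, buflen, "%u.%u",
	 nvm_cfg_ver >> 8, nvm_cfg_ver & 0xff);	/* 8-bit major and minor components */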
Fixes: db28b6c77f ("bnxt_en: Fix devlink info's stored fw.psid version format.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tx_done is not used for napi_complete_done(). Thus, the NAPI busy-polling
mechanism driven by gro_flush_timeout and napi_defer_hard_irqs cannot
be triggered after a packet is transmitted when there is no receive
packet.
Fix this by taking the maximum of tx_done and rx_done as the
overall budget completed by the rxtx NAPI poll to ensure the XDP Tx ZC
operation keeps polling for the next Tx frame. This gives the
benefit of lower packet submission processing latency and jitter
under XDP Tx ZC mode.
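Not the driver's code, but a self-contained illustration of the completion accounting:
static int rxtx_napi_work_done(int rx_done, int tx_done, int budget)
{
	int rxtx_done = max(rx_done, tx_done);	/* count TX-only progress as work done */

	return min(rxtx_done, budget);
}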
Performance of tx-only using xdp-sock on Intel ADL-S platform is
the same with and without this patch.
root@intel-corei7-64:~# ./xdpsock -i enp0s30f4 -t -z -q 1 -n 10
sock0@enp0s30f4:1 txonly xdp-drv
pps pkts 10.00
rx 0 0
tx 511630 8659520
sock0@enp0s30f4:1 txonly xdp-drv
pps pkts 10.00
rx 0 0
tx 511625 13775808
sock0@enp0s30f4:1 txonly xdp-drv
pps pkts 10.00
rx 0 0
tx 511619 18892032
Fixes: 132c32ee5b ("net: stmmac: Add TX via XDP zero-copy socket")
Cc: <stable@vger.kernel.org> # 5.13.x
Co-developed-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Memory allocated before 'lmac' is stored in 'cgx->lmac_idmap[]' must be
freed explicitly. Otherwise, in case of error, it will leak.
Rename the 'err_irq' label to better describe what is done at this place in
the error handling path.
Fixes: 6f14078e3e ("octeontx2-af: DMAC filter support in MAC block")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to match 'rvu_alloc_bitmap()', add a 'rvu_free_bitmap()' function.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
When adapter->registered_device_map is NULL, the value of err is
uncertain, so we set err to -EINVAL to avoid ambiguity.
Clean up smatch warning:
drivers/net/ethernet/chelsio/cxgb/cxgb2.c:1114 init_one() warn: missing
error code 'err'
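The fix amounts to setting the error code on that path; roughly (the message and label are approximate):
if (!adapter->registered_device_map) {
	pr_err("%s: could not register any net devices\n",
	       pci_name(pdev));
	err = -EINVAL;		/* previously left holding a stale value */
	goto out_release_adapter_res;
}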
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous commit 68233c583a removed the qlcnic_rom_lock() call
in qlcnic_pinit_from_rom(), but kept its corresponding
unlock call, which is odd. I'm not very sure whether the
lock is missing or the unlock is redundant. This bug was
suggested by a static analysis tool; please advise.
Fixes: 68233c583a ("qlcnic: updated reset sequence")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
This code is holding spin_lock_bh(&lif->rx_filters.lock); so the
allocation needs to be atomic.
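A sketch of the constraint being fixed (the allocation site is paraphrased):
spin_lock_bh(&lif->rx_filters.lock);
f = kzalloc(sizeof(*f), GFP_ATOMIC);	/* GFP_KERNEL may sleep, which is not allowed here */
if (!f)
	err = -ENOMEM;
spin_unlock_bh(&lif->rx_filters.lock);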
Fixes: 969f843946 ("ionic: sync the filters in the work task")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Link: https://lore.kernel.org/r/20210903131856.GA25934@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ISA DMA API is inconsistent between architectures, and while
powerpc implements most of what the others have, it does not provide
isa_virt_to_bus():
../drivers/net/ethernet/cirrus/cs89x0.c: In function ‘net_open’:
../drivers/net/ethernet/cirrus/cs89x0.c:897:20: error: implicit declaration of function ‘isa_virt_to_bus’ [-Werror=implicit-function-declaration]
(unsigned long)isa_virt_to_bus(lp->dma_buff));
../drivers/net/ethernet/cirrus/cs89x0.c:894:3: note: in expansion of macro ‘cs89_dbg’
cs89_dbg(1, debug, "%s: dma %lx %lx\n",
I tried a couple of approaches to handle this consistently across
all architectures, but as this driver is really only used on
ARM, I ended up taking the easy way out and just disable compile
testing on powerpc.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 47fd22f2b8 ("cs89x0: rework driver configuration")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are various function arguments that are not indented correctly,
clean these up with correct indentation.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a statement that is not indented correctly, add in the
missing tab.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Deadlock seen in an instance where the hwstamp configuration
is changed while the driver is running:
[ 3988.736671] schedule_preempt_disabled+0xe/0x10
[ 3988.736676] __mutex_lock.isra.5+0x276/0x4e0
[ 3988.736683] __mutex_lock_slowpath+0x13/0x20
[ 3988.736687] ? __mutex_lock_slowpath+0x13/0x20
[ 3988.736692] mutex_lock+0x2f/0x40
[ 3988.736711] ionic_stop_queues_reconfig+0x16/0x40 [ionic]
[ 3988.736726] ionic_reconfigure_queues+0x43e/0xc90 [ionic]
[ 3988.736738] ionic_lif_config_hwstamp_rxq_all+0x85/0x90 [ionic]
[ 3988.736751] ionic_lif_hwstamp_set_ts_config+0x29c/0x360 [ionic]
[ 3988.736763] ionic_lif_hwstamp_set+0x76/0xf0 [ionic]
[ 3988.736776] ionic_eth_ioctl+0x33/0x40 [ionic]
[ 3988.736781] dev_ifsioc+0x12c/0x420
[ 3988.736785] dev_ioctl+0x316/0x720
This can be demonstrated with "ptp4l -m -i <intf>"
To fix this, we pull the use of the queue_lock further up above the
callers of ionic_reconfigure_queues() and ionic_stop_queues_reconfig().
Fixes: 7ee99fc5ed ("ionic: pull hwstamp queue_lock up a level")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"This is quite a small cycle, no major series stands out. The HNS and
rxe drivers saw the most activity this cycle, with rxe being broken
for a good chunk of time. The significant deleted line count is due to
a SPDX cleanup series.
Summary:
- Various cleanup and small features for rtrs
- kmap_local_page() conversions
- Driver updates and fixes for: efa, rxe, mlx5, hfi1, qed, hns
- Cache the IB subnet prefix
- Rework how CRC is calculated in rxe
- Clean reference counting in iwpm's netlink
- Pull object allocation and lifecycle for user QPs to the uverbs
core code
- Several small hns features and continued general code cleanups
- Fix the scatterlist confusion of orig_nents/nents introduced in an
earlier patch creating the append operation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (90 commits)
RDMA/mlx5: Relax DCS QP creation checks
RDMA/hns: Delete unnecessary blank lines.
RDMA/hns: Encapsulate the qp db as a function
RDMA/hns: Adjust the order in which irq are requested and enabled
RDMA/hns: Remove RST2RST error prints for hw v1
RDMA/hns: Remove dqpn filling when modify qp from Init to Init
RDMA/hns: Fix QP's resp incomplete assignment
RDMA/hns: Fix query destination qpn
RDMA/hfi1: Convert to SPDX identifier
IB/rdmavt: Convert to SPDX identifier
RDMA/hns: Bugfix for incorrect association between dip_idx and dgid
RDMA/hns: Bugfix for the missing assignment for dip_idx
RDMA/hns: Bugfix for data type of dip_idx
RDMA/hns: Fix incorrect lsn field
RDMA/irdma: Remove the repeated declaration
RDMA/core/sa_query: Retry SA queries
RDMA: Use the sg_table directly and remove the opencoded version from umem
lib/scatterlist: Fix wrong update of orig_nents
lib/scatterlist: Provide a dedicated function to support table append
RDMA/hns: Delete unused hns bitmap interface
...
Merge tag 'for-5.15/parisc' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture updates from Helge Deller:
- Fix a kernel crash when a signal is delivered to bad userspace stack
- Fix fall-through warnings in math-emu code
- Increase size of gcc stack frame check
- Switch coding from 'pci_' to 'dma_' API
- Make struct parisc_driver::remove() return void
- Some parisc related Makefile changes
- Minor cleanups, e.g. change to octal permissions, fix macro
collisions, fix PMD_ORDER collision, replace spaces with tabs
* tag 'for-5.15/parisc' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: math-emu: Fix fall-through warnings
parisc: fix crash with signals and alloca
parisc: Fix compile failure when building 64-bit kernel natively
parisc: ccio-dma.c: Added tab instead of spaces
parisc/parport_gsc: switch from 'pci_' to 'dma_' API
parisc: move core-y in arch/parisc/Makefile to arch/parisc/Kbuild
parisc: switch from 'pci_' to 'dma_' API
parisc: Make struct parisc_driver::remove() return void
parisc: remove unused arch/parisc/boot/install.sh and its phony target
parisc: Rename PMD_ORDER to PMD_TABLE_ORDER
parisc: math-emu: Avoid "fmt" macro collision
parisc: Increase size of gcc stack frame check
parisc: Replace symbolic permissions with octal permissions
drivers/net/ethernet/i825xx/sun3_82586.c: In function ‘sun3_82586_probe’:
drivers/net/ethernet/i825xx/sun3_82586.c:317:9: warning: returning ‘struct net_device *’ from a function with return type ‘int’ makes integer from pointer without a cast [-Wint-conversion]
317 | return dev;
| ^~~
The return type of sun3_82586_probe() was changed, but one return
statement was not updated accordingly.
Fixes: e179d78ee1 ("m68k: remove legacy probing")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parameter names in the comments did not match the function arguments.
Fixes: 2138081708 ("bnxt_en: add support for HWRM request slices")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210901185315.57137-1-edwin.peer@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Here is the big set of driver core patches for 5.15-rc1.
These do change a number of different things across different
subsystems, and because of that, there were 2 stable tags created that
might have already come into your tree from different pulls that did the
following
- changed the bus remove callback to return void
- sysfs iomem_get_mapping rework
The latter one will cause a tiny merge issue with your tree, as there
was a last-minute fix for this in 5.14 in your tree, but the fixup
should be "obvious". If you want me to provide a fixed merge for this,
please let me know.
Other than those two things, there's only a few small things in here:
- kernfs performance improvements for huge numbers of sysfs
users at once
- tiny api cleanups
- other minor changes
All of these have been in linux-next for a while with no reported
problems, other than the before-mentioned merge issue.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big set of driver core patches for 5.15-rc1.
These do change a number of different things across different
subsystems, and because of that, there were 2 stable tags created that
might have already come into your tree from different pulls that did
the following
- changed the bus remove callback to return void
- sysfs iomem_get_mapping rework
Other than those two things, there's only a few small things in here:
- kernfs performance improvements for huge numbers of sysfs users at
once
- tiny api cleanups
- other minor changes
All of these have been in linux-next for a while with no reported
problems, other than the before-mentioned merge issue"
* tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (33 commits)
MAINTAINERS: Add dri-devel for component.[hc]
driver core: platform: Remove platform_device_add_properties()
ARM: tegra: paz00: Handle device properties with software node API
bitmap: extend comment to bitmap_print_bitmask/list_to_buf
drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI
topology: use bin_attribute to break the size limitation of cpumap ABI
lib: test_bitmap: add bitmap_print_bitmask/list_to_buf test cases
cpumask: introduce cpumap_print_list/bitmask_to_buf to support large bitmask and list
sysfs: Rename struct bin_attribute member to f_mapping
sysfs: Invoke iomem_get_mapping() from the sysfs open callback
debugfs: Return error during {full/open}_proxy_open() on rmmod
zorro: Drop useless (and hardly used) .driver member in struct zorro_dev
zorro: Simplify remove callback
sh: superhyway: Simplify check in remove callback
nubus: Simplify check in remove callback
nubus: Make struct nubus_driver::remove return void
kernfs: dont call d_splice_alias() under kernfs node lock
kernfs: use i_lock to protect concurrent inode updates
kernfs: switch kernfs to use an rwsem
kernfs: use VFS negative dentry caching
...
This patch reserves the LMTST lines per CPU instead
of separate LMTST lines for NPA (buffer free) and NIX (sqe flush).
The LMTST line of the core on which the SQ or RQ is processed is used
for the LMTST operation.
This patch also replaces STEOR with STEORL release semantics and
updates the driver name in the ethtool file.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check one more time before exiting the API with an error.
Fix the API to poll at least twice, in case there are other high priority
tasks and this API doesn't get CPU cycles across multiple jiffies updates.
In addition, increase the timeout from usecs_to_jiffies(10000) to
usecs_to_jiffies(20000), to prevent the case where, with CONFIG_100HZ,
the timeout would be a single jiffy.
A single jiffy results in an actual timeout that can be anywhere between
1 usec and 10 msec. To solve this, a value of usecs_to_jiffies(20000)
ensures that the timeout is 2 jiffies.
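The resulting poll loop looks roughly like this; the ready-check helper and sleep interval are illustrative, not the driver's actual names:
unsigned long timeout = jiffies + usecs_to_jiffies(20000);	/* >= 2 jiffies even at HZ=100 */

do {
	if (mbox_rsp_ready(mbox))	/* hypothetical ready check */
		return 0;
	usleep_range(100, 200);
} while (time_before(jiffies, timeout));

/* poll one last time in case we were starved of CPU past the deadline */
return mbox_rsp_ready(mbox) ? 0 : -EBUSY;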
Signed-off-by: Smadar Fuks <smadarf@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove repeated include of linux/module.h as it has been included
at line 8.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver requires 64-bit doorbell writes to be atomic on 32-bit
architectures. So we redefined writeq as a new macro with spinlock
protection on 32-bit architectures. This created a new warning when
we added a new file in a recent patchset. writeq is defined on many
32-bit architectures to do the memory write non-atomically and it
generated a new macro redefined warning. This warning was fixed
incorrectly in the recent patch.
Fix this properly by adding a new bnxt_writeq() function that will
do the non-atomic write under spinlock on 32-bit systems. All callers
in the driver will now call bnxt_writeq() instead.
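The helper ends up shaped like this (the lock name is an assumption; the in-tree function may differ slightly):
static void bnxt_writeq(struct bnxt *bp, u64 val, volatile void __iomem *addr)
{
#if BITS_PER_LONG == 32
	spin_lock(&bp->db_lock);
	lo_hi_writeq(val, addr);	/* two 32-bit writes, low word first */
	spin_unlock(&bp->db_lock);
#else
	writeq(val, addr);
#endif
}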
v2: Need to pass in bp to bnxt_writeq()
Use lo_hi_writeq() [suggested by Florian]
Reported-by: kernel test robot <lkp@intel.com>
Fixes: f9ff578251 ("bnxt_en: introduce new firmware message API based on DMA pools")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use pci_vpd_find_id_string() to find the VPD ID string. This simplifies the
code and avoids the need for pci_vpd_lrdt_size().
Link: https://lore.kernel.org/r/19ea2e9b-6e94-288a-6612-88db01b1b417@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The current settings may produce a build error when
CONFIG_OF_NET is disabled. CONFIG_OF_NET controls
a header file, <linux/of.h>, and some functions
in <linux/of_net.h>.
Signed-off-by: Slark Xiao <slark_xiao@163.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch removes some unnecessary spaces for cleanup.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some required spaces to improve readability.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
abs() returns a signed long, which cannot be implicitly converted
to an unsigned type, and that may cause a type-mismatch warning from
static analysis tools. To fix it, this patch uses a variable to save
the result of abs() and does an explicit conversion.
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the driver sets default feature for netdev->features,
netdev->hw_features, netdev->vlan_features and
netdev->hw_enc_features separately. It's fussy, because most
of the feature bits are the same. So refine it by copying the value from
netdev->features.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It will cause a null-ptr-deref if platform_get_resource() returns NULL,
so we need to check the return value.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the devm_platform_ioremap_resource_byname() helper instead of
calling platform_get_resource_byname() and devm_ioremap_resource()
separately.
Use the devm_platform_ioremap_resource() helper instead of
calling platform_get_resource() and devm_ioremap_resource()
separately.
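For illustration, the pattern being replaced ("ctrl" is a made-up resource name):
/* before */
res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "ctrl");
base = devm_ioremap_resource(&pdev->dev, res);
/* after */
base = devm_platform_ioremap_resource_byname(pdev, "ctrl");
and likewise for the index-based variant:
/* before */
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
base = devm_ioremap_resource(&pdev->dev, res);
/* after */
base = devm_platform_ioremap_resource(pdev, 0);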
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the current config, for packets with IPv4 checksum errors, the
errorcode is being set to UNKNOWN. Hence add separate
errorcodes for the outer and inner IPv4 checksums and change the
NPC configuration accordingly.
Also turn on the L2 multicast address check in the NPC protocol check block.
Fixes: 6b3321bacc ("octeontx2-af: Enable packet length and csum validation")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the issues reported by a static code analyzer
in rvu_npc.c. The reported errors are operands of different sizes
in bitops and returning uninitialized values.
Fixes: 651cd26523 ("octeontx2-af: MCAM entry installation support")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the npc_update_vf_flow_entry function, the loop cursor
'index' is being changed inside the loop, causing
the loop to spin forever. This in turn hogs the kworker
thread forever and no other mbox message is processed
by the AF driver after that. Fix this by using
another variable in the loop.
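The fix pattern in the abstract (not the function's actual variables):
for (index = 0; index < count; index++) {
	int entry = start + index;	/* scratch value; the cursor 'index' itself is never modified */

	/* ... operate on 'entry' ... */
}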
Fixes: 55307fcb92 ("octeontx2-af: Add mbox messages to install and delete MCAM rules")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the given counter does not belong to the entry,
the code ends up in an infinite loop because the loop
cursor, 'entry', is not updated further. This
patch fixes that by updating 'entry' on every iteration.
Fixes: a958dd59f9 ("octeontx2-af: Map or unmap NPC MCAM entry and counter")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
strlcpy() should not be used because it doesn't limit the source
length. As Linus says, it's a completely useless function if you
can't implicitly trust the source string - but that is almost always
why people think they should use it! All in all, the BSD function
can lead to potential bugs.
strscpy(), by contrast, doesn't require reading memory from the src string
beyond the specified "count" bytes, and its return value is
easier to error-check than strlcpy()'s. In addition, the implementation
is robust to the string changing out from underneath it, unlike the
current strlcpy() implementation.
Thus, we prefer using strscpy() instead of strlcpy().
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For better performance, set the hardware to use NDC TX for reading the
packet data specified in NIX_SEND_SG_S.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on tests the QCA7000 doesn't support checksum offloading. So assume
ip_summed is CHECKSUM_NONE and let the kernel take care of the checksum
handling. This fixes data transfer issues in noisy environments.
Reported-by: Michael Heimpold <michael.heimpold@in-tech.com>
Fixes: 291ab06ecf ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In [1], Christoph Hellwig has proposed to remove the wrappers in
include/linux/pci-dma-compat.h.
Some reasons why this API should be removed have been given by Julia
Lawall in [2].
A coccinelle script has been used to perform the needed transformation.
Only relevant parts are given below.
An 'unlikely()' has been removed when calling 'dma_mapping_error()' because
this function, which is inlined, already has such an annotation.
@@ @@
- PCI_DMA_TODEVICE
+ DMA_TO_DEVICE
@@ @@
- PCI_DMA_FROMDEVICE
+ DMA_FROM_DEVICE
@@
expression e1, e2, e3, e4;
@@
- pci_map_single(e1, e2, e3, e4)
+ dma_map_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_single(e1, e2, e3, e4)
+ dma_unmap_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4, e5;
@@
- pci_map_page(e1, e2, e3, e4, e5)
+ dma_map_page(&e1->dev, e2, e3, e4, e5)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_page(e1, e2, e3, e4)
+ dma_unmap_page(&e1->dev, e2, e3, e4)
@@
expression e1, e2;
@@
- pci_dma_mapping_error(e1, e2)
+ dma_mapping_error(&e1->dev, e2)
[1]: https://lore.kernel.org/kernel-janitors/20200421081257.GA131897@infradead.org/
[2]: https://lore.kernel.org/kernel-janitors/alpine.DEB.2.22.394.2007120902170.2424@hadrien/
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/bc6cd281eae024b26fd9c7ef6678d2d1dc9d74fd.1630150008.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In HTB offload mode, qdiscs of leaf classes are grafted to netdev
queues. sch_htb expects the dev_queue field of these qdiscs to point to
the corresponding queues. However, qdisc creation may fail, and in that
case noop_qdisc is used instead. Its dev_queue doesn't point to the
right queue, so sch_htb can lose track of used netdev queues, which will
cause internal inconsistencies.
This commit fixes this bug by keeping track of the netdev queue inside
struct htb_class. All reads of cl->leaf.q->dev_queue are replaced by the
new field, the two values are synced on writes, and WARNs are added to
assert equality of the two values.
The driver API has changed: when TC_HTB_LEAF_DEL needs to move a queue,
the driver used to pass the old and new queue IDs to sch_htb. Now that
there is a new field (offload_queue) in struct htb_class that needs to
be updated on this operation, the driver will pass the old class ID to
sch_htb instead (it already knows the new class ID).
Fixes: d03b195b5a ("sch_htb: Hierarchical QoS hardware offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210826115425.1744053-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
From Maor Gottlieb
====================
Fix the use of nents and orig_nents in the sg table append helpers. The
nents should be used by the DMA layer to store the number of DMA-mapped
sges, while orig_nents is the number of CPU sges.
Since the sg append logic doesn't always create an SGL with exactly
orig_nents entries, store a total_nents as well to allow the table to be
properly freed, and reorganize the freeing logic to share across all the
use cases.
====================
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* 'sg_nents':
RDMA: Use the sg_table directly and remove the opencoded version from umem
lib/scatterlist: Fix wrong update of orig_nents
lib/scatterlist: Provide a dedicated function to support table append
This adds device tree probing support for the PTP module
adjacent to the ethernet module. It is pretty straightforward:
all resources are in the device tree as they
come to the platform device.
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver is being passed interrupts, then looking up the
same interrupts as GPIOs a second time to convert them into
interrupts and set properties on them.
This is pointless: the GPIO and irqchip APIs of a GPIO chip
are orthogonal. Just request the interrupts and be done
with it, drop reliance on any GPIO functions or definitions.
Use devres-managed functions and add a small devres quirk
to unregister the clock as well, so we can rely on devres
to handle all the resources and cut down a bunch of
boilerplate in the process.
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the driver to use portable integer types to avoid warnings
during compile testing, including:
drivers/net/ethernet/xscale/ixp4xx_eth.c:721:21: error: cast to 'u32 *' (aka 'unsigned int *') from smaller integer type 'int' [-Werror,-Wint-to-pointer-cast]
memcpy_swab32(mem, (u32 *)((int)skb->data & ~3), bytes / 4);
^
drivers/net/ethernet/xscale/ixp4xx_eth.c:963:12: error: incompatible pointer types passing 'u32 *' (aka 'unsigned int *') to parameter of type 'dma_addr_t *' (aka 'unsigned long long *') [-Werror,-Wincompatible-pointer-types]
&port->desc_tab_phys)))
^~~~~~~~~~~~~~~~~~~~
include/linux/dmapool.h:27:20: note: passing argument to parameter 'handle' here
dma_addr_t *handle);
^
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the recent ixp4xx cleanups, the ptp driver has gained a
build failure in some configurations:
drivers/net/ethernet/xscale/ptp_ixp46x.c: In function 'ptp_ixp_init':
drivers/net/ethernet/xscale/ptp_ixp46x.c:290:51: error: 'IXP4XX_TIMESYNC_BASE_VIRT' undeclared (first use in this function)
Avoid the last bit of hardcoded constants from platform headers
by turning the ptp driver bit into a platform driver and passing
the IRQ and MMIO address as resources.
This is a bit tricky:
- The interface between the two drivers is now the new
ixp46x_ptp_find() function, replacing the global
ixp46x_phc_index variable. The call is done as late
as possible, in hwtstamp_set(), to ensure that the
ptp device is fully probed.
- As the ptp driver is now called by the network driver, the
link dependency is reversed, which in turn requires a small
Makefile hack
- The GPIO number is still left hardcoded. This is clearly not
great, but it can be addressed later. Note that commit 98ac0cc270
("ARM: ixp4xx: Convert to MULTI_IRQ_HANDLER") changed the
IRQ number to something meaningless. Passing the correct IRQ
in a resource fixes this.
- When the PTP driver is disabled, ethtool .get_ts_info()
now correctly lists only software timestamping regardless
of the hardware.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[Fix a missing include]
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The parameter name of hclge_ptp_clean_tx_hwts() in its declaration is "dev",
but the definition of this function uses the common name "hdev" like
other functions, so modify it.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the "? :" statement with max() to simplify the code.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The type of tqp_vector->vector_irq is int, so modify its print format
to "%d".
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To improve the flexibility, simplicity and maintainability of dumping info
for every element of tm priority, add a struct hclge_dbg_item array for tm
priority and fill the string for each entry according to this array.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reconstructs the function hclge_ets_validate() to reduce the
cyclomatic complexity and make the code more concise.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reconstructs the function hns3_self_test to reduce the
cyclomatic complexity and make the code more concise.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To make the format of each member initialization of structure array
clearer, initialize each member on a separate line.
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add infrastructure to maintain a pending list of HWRM commands awaiting
completion and reduce the scope of the hwrm_cmd_lock mutex so that it
protects only the request mailbox. The mailbox is free to use for one
or more concurrent commands after receiving deferred response events.
For uniformity and completeness, use the same pending list for
collecting completions for commands that respond via a completion ring.
These commands are only used for freeing rings and for IRQ test and
we only support one such command in flight.
Note deferred responses are also only supported on the main channel.
The secondary channel (KONG) does not support deferred responses.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are no longer any callers relying on the old API.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conversion follows this general pattern for most of the calls:
1. The input message is changed from a stack variable initialized
using bnxt_hwrm_cmd_hdr_init() to a pointer allocated and initialized
using hwrm_req_init().
2. If we don't need to read the firmware response, the hwrm_send_message()
call is replaced with hwrm_req_send().
3. If we need to read the firmware response, the mutex lock is replaced
by hwrm_req_hold() to hold the response. When the response is read, the
mutex unlock is replaced by hwrm_req_drop().
If additional DMA buffers are needed for firmware response data, the
hwrm_req_dma_slice() is used instead of calling dma_alloc_coherent().
Some minor refactoring is also done while doing these conversions.
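As an example of the pattern when the response must be read, with an illustrative message type and field (not taken from a specific conversion in this series):
struct hwrm_func_qcfg_output *resp;
struct hwrm_func_qcfg_input *req;
u16 flags = 0;
int rc;

rc = hwrm_req_init(bp, req, HWRM_FUNC_QCFG);
if (rc)
	return rc;
req->fid = cpu_to_le16(0xffff);		/* query our own function */
resp = hwrm_req_hold(bp, req);		/* keep the response buffer after send */
rc = hwrm_req_send(bp, req);
if (!rc)
	flags = le16_to_cpu(resp->flags);
hwrm_req_drop(bp, req);			/* releases both request and response */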
v2: Fix uninitialized variable warnings in __bnxt_hwrm_get_tx_rings()
and bnxt_approve_mac()
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently use the hwrm_cmd_lock to serialize the update of the
firmware's link status response data and the copying of link status data
to the VF. This won't work when we update the firmware message APIs, so
we use the link_lock mutex instead. All link_info data should be
updated under the link_lock mutex. Also add link_lock to functions that
touch link_info in __bnxt_open_nic() and bnxt_probe_phy(). The locking
is probably not strictly necessary during probe, but it's more consistent.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Slices are a mechanism for suballocating DMA mapped regions from the
request buffer. Such regions can be used for indirect command data
instead of creating new mappings with dma_alloc_coherent().
The advantage of using a slice is that the lifetime of the slice is
bound to the request and will be automatically unmapped when the
request is consumed.
A single external region is also supported. This allows for regions
that will not fit inside the spare request buffer space such that
the same API can be used consistently even for larger mappings.
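Usage is roughly as follows; the 'host_addr' request field is a placeholder for whichever address field the specific command defines:
data = hwrm_req_dma_slice(bp, req, data_len, &dma_handle);
if (!data) {
	hwrm_req_drop(bp, req);
	return -ENOMEM;
}
req->host_addr = cpu_to_le64(dma_handle);	/* point the command at the slice */
rc = hwrm_req_send(bp, req);			/* the slice is unmapped when the request is consumed */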
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hwrm_req_replace() provides an assignment like operation to replace a
managed HWRM request object with data from a pre-built source. This is
useful for handling request data provided by higher layer HWRM clients.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During firmware crash recovery, it is possible for firmware to respond
to stale HWRM commands that have already timed out. Because response
buffers may be reused, any out of sequence responses need to be ignored
and only the matching seq_id should be accepted.
Also, READ_ONCE should be used for the reads from the DMA buffer to
ensure that the necessary loads are scheduled.
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change constitutes a major step towards supporting multiple
firmware commands in flight by maintaining a separate response buffer
for the duration of each request. These firmware commands are also
known as Hardware Resource Manager (HWRM) commands. Using separate
response buffers requires an API change in order for callers to be
able to free the buffer when done.
It is impossible to keep the existing APIs unchanged. The existing
usage for a simple HWRM message request such as the following:
struct input req = {0};
bnxt_hwrm_cmd_hdr_init(bp, &req, REQ_TYPE, -1, -1);
rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
if (rc)
/* error */
changes to:
struct input *req;
rc = hwrm_req_init(bp, req, REQ_TYPE);
if (rc)
/* error */
rc = hwrm_req_send(bp, req); /* consumes req */
if (rc)
/* error */
The key changes are:
1. The req is no longer allocated on the stack.
2. The caller must call hwrm_req_init() to allocate a req buffer and
check for a valid buffer.
3. The req buffer is automatically released when hwrm_req_send() returns.
4. If the caller wants to check the firmware response, the caller must
call hwrm_req_hold() to take ownership of the response buffer and
release it afterwards using hwrm_req_drop(). The caller is no longer
required to explicitly hold the hwrm_cmd_lock mutex to read the
response.
5. Because the firmware commands and responses all have different sizes,
some safeguards are added to the code.
This patch maintains legacy API compatibility, implementing the old
API in terms of the new. The follow-on patches will convert all
callers to use the new APIs.
v2: Fix redefined writeq with parisc .config
Fix "cast from pointer to integer of different size" warning in
hwrm_calc_sentinel()
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move all firmware messaging functions and definitions to new
bnxt_hwrm.[ch]. The follow-on patches will make major modifications
to these APIs.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor the code so that __bnxt_hwrm_ver_get() does not call
bnxt_hwrm_do_send_msg() directly. The new APIs will not expose this
internal call. Add a new bnxt_hwrm_poll() to poll the HWRM_VER_GET
firmware call silently. The other bnxt_hwrm_ver_get() function will
send the HWRM_VER_GET message directly with error logs enabled.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The additional response buffer serves no useful purpose. There can
be only one firmware command in flight due to the hwrm_cmd_lock mutex,
which is taken for the entire duration of any command completion,
KONG or otherwise. It is thus safe to share a single DMA buffer.
Removing the code associated with the additional mapping will simplify
matters in the next patch, which allocates response buffers from DMA
pools on a per request basis.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The caller of this function (parisc_driver_remove() in
arch/parisc/kernel/drivers.c) ignores the return value, so better don't
return any value at all to not wake wrong expectations in driver authors.
The only function that could return a non-zero value before was
ipmi_parisc_remove() which returns the return value of
ipmi_si_remove_by_dev(). Make this function return void, too, as for all
other callers the value is ignored, too.
Also fold in a small checkpatch fix for:
WARNING: Unnecessary space before function pointer arguments
+ void (*remove) (struct parisc_device *dev);
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> (for drivers/input)
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Acked-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
In [1], Christoph Hellwig has proposed to remove the wrappers in
include/linux/pci-dma-compat.h.
Some reasons why this API should be removed have been given by Julia
Lawall in [2].
A coccinelle script has been used to perform the needed transformation.
Only relevant parts are given below.
It has been hand modified to use 'dma_set_mask_and_coherent()' instead of
'pci_set_dma_mask()/pci_set_consistent_dma_mask()' when applicable.
This is less verbose.
Finally, the now useless 'dma_mask' variable has been removed.
It has been compile tested.
@@
expression e1, e2;
@@
- pci_set_dma_mask(e1, e2)
+ dma_set_mask(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_consistent_dma_mask(e1, e2)
+ dma_set_coherent_mask(&e1->dev, e2)
[1]: https://lore.kernel.org/kernel-janitors/20200421081257.GA131897@infradead.org/
[2]: https://lore.kernel.org/kernel-janitors/alpine.DEB.2.22.394.2007120902170.2424@hadrien/
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
In [1], Christoph Hellwig has proposed to remove the wrappers in
include/linux/pci-dma-compat.h.
Some reasons why this API should be removed have been given by Julia
Lawall in [2].
A coccinelle script has been used to perform the needed transformation.
Only relevant parts are given below.
@@ @@
- PCI_DMA_BIDIRECTIONAL
+ DMA_BIDIRECTIONAL
@@ @@
- PCI_DMA_TODEVICE
+ DMA_TO_DEVICE
@@ @@
- PCI_DMA_FROMDEVICE
+ DMA_FROM_DEVICE
@@
expression e1, e2, e3, e4;
@@
- pci_map_single(e1, e2, e3, e4)
+ dma_map_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_single(e1, e2, e3, e4)
+ dma_unmap_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2;
@@
- pci_dma_mapping_error(e1, e2)
+ dma_mapping_error(&e1->dev, e2)
[1]: https://lore.kernel.org/kernel-janitors/20200421081257.GA131897@infradead.org/
[2]: https://lore.kernel.org/kernel-janitors/alpine.DEB.2.22.394.2007120902170.2424@hadrien/
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NPC extraction profile marks layer types
NPC_LT_LB_CTAG for CTAG and NPC_LT_LB_STAG_QINQ for
STAG after parsing the input packet. Those layer types
can be used to install ntuple filters using the
vlan-etype option. Below are the commands and the
corresponding behavior with this patch in place.
> alias nt "ethtool -U eth0 flow-type ether"
> nt vlan 5 m 0xf000 action 0
Input packets with outer VLAN id 5, i.e.
stag packets with VLAN id 5 and ctag packets with
VLAN id 5, are hit.
> nt vlan-etype 0x8100 action 0
All input ctag packets with any VLAN id are hit.
> nt vlan-etype 0x88A8 action 0
All input stag packets with any VLAN id are hit.
> nt vlan-etype 0x8100 vlan 5 m 0xf000 action 0
All input ctag packets with VLAN id 5 are hit.
> nt vlan-etype 0x88A8 vlan 5 m 0xf000 action 0
All input stag packets with VLAN id 5 are hit.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver crashes when restoring from hibernation. In the resume flow,
the driver needs to clean up the older nic/vec objects and re-initialize them.
Fixes: 8aaa112a57 ("net: atlantic: refactoring pm logic")
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed inconsistent license text across the RVU admin
function driver.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed inconsistent license text across the netdev
drivers.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
queue
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2021-08-27
Aravindhan Gunasekaran says:
This adds support for Credit-based shaper qdisc offload from
Traffic Control system. It enables traffic prioritization and
bandwidth reservation via the Credit-Based Shaper which is
implemented in hardware by i225 controller.
Patch 1/3 adds a default cycle-time for TSN mode to be configured.
Patch 2/3 helps to separate TSN mode programming on the fly and
during reset sequence. It also simplifies handling features flags
for various TSN modes supported by i225 in the driver.
Patch 3/3 adds support for IEEE802.1Qav(CBS) standard
implemented in i225 HW. Two sets of CBS HW shapers are present
in i225 and driver enables them in the two high priority queues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The queues can be freed in ionic_close(). They need to be recreated
after ionic_open(). It doesn't need to replay the whole config. It
only needs to create the timestamping queues again.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the hwstamp configuration use of queue_lock up
a level to simplify use and error handling.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the queue configuration lock to ionic_open() and
ionic_stop() so that they don't collide with other in parallel
queue configuration actions such as MTU changes as can be
demonstrated with a tight loop of ifup/change-mtu/ifdown.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure the ctx struct has the new mac address before
any save operations happen.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the heartbeat check will already have complained about
the firmware status, don't bother complaining about the
DEVCMD failing. We'll keep the print but demote it
to a debug message so that we normally no longer see it.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases of fw_down it was called because there was a
fw_generation change, and the firmware is already back up.
In order to keep the down time to a minimum, don't wait for
the next watchdog polling cycle, fire another watchdog off
as soon as we can - an out-of-cycle check won't hurt, and
may well speed up the recovery.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some required spaces in comment for cleanup.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some local variable declarations have no need for "static", so remove it.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function hclge_tm_dwrr_cfg() will be called twice in function
hclge_ieee_setets() when map_changed is true. The calling flow is:
hclge_ieee_setets()
hclge_map_update()
| hclge_tm_schd_setup_hw()
| hclge_tm_dwrr_cfg()
hclge_notify_init_up()
hclge_tm_dwrr_cfg()
There is actually no need to call hclge_tm_dwrr_cfg() twice, so just
return after calling hclge_notify_init_up().
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the function hclge_check_port_speed() uses a switch/case statement
to get the speed bit according to the speed. To reuse this part of code and
improve code readability and maintainability, add a new function
hclge_get_speed_bit() to get the speed bit according to the mapping
of speed to speed bit defined in the array speed_bit_map.
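A sketch of the table-driven lookup (the entries shown are illustrative; the real table covers all supported speeds):
struct hclge_speed_bit_map {
	u32 speed;
	u32 speed_bit;
};

static const struct hclge_speed_bit_map speed_bit_map[] = {
	{HCLGE_MAC_SPEED_10M, HCLGE_SUPPORT_10M_BIT},
	{HCLGE_MAC_SPEED_100M, HCLGE_SUPPORT_100M_BIT},
	{HCLGE_MAC_SPEED_1G, HCLGE_SUPPORT_1G_BIT},
};

static int hclge_get_speed_bit(u32 speed, u32 *speed_bit)
{
	u16 i;

	for (i = 0; i < ARRAY_SIZE(speed_bit_map); i++) {
		if (speed == speed_bit_map[i].speed) {
			*speed_bit = speed_bit_map[i].speed_bit;
			return 0;
		}
	}
	return -EINVAL;
}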
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function hclgevf_parse_capability() will gain more if statements in the
future. To improve code readability, maintainability and simplicity,
refactor this function to use a bit mapping array of IMP capabilities
and driver capabilities.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function hclge_parse_capability() uses too many if statements, and
it may gain more in the future. To improve code readability, maintainability
and simplicity, refactor this function by using a bit mapping array of IMP
capabilities and driver capabilities.
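A hedged sketch of the bit-mapping idea is shown below; the bit positions are illustrative only and do not correspond to the real IMP or driver capability layout:

struct caps_bit_map {
	u16 imp_bit;	/* capability bit reported by IMP firmware */
	u16 drv_bit;	/* corresponding driver capability bit */
};

static const struct caps_bit_map caps_bit_map[] = {
	{ 0, 0 },	/* e.g. UDP GSO */
	{ 1, 1 },	/* e.g. PTP */
	{ 2, 2 },	/* e.g. FD forward to TC */
};

static void parse_capability(const unsigned long *imp_caps,
			     unsigned long *drv_caps)
{
	unsigned int i;

	for (i = 0; i < ARRAY_SIZE(caps_bit_map); i++)
		if (test_bit(caps_bit_map[i].imp_bit, imp_caps))
			set_bit(caps_bit_map[i].drv_bit, drv_caps);
}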
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a trace to get the info of the PF's response to the mailbox message of the VF.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Count packets dropped due to buffer or skb allocation errors.
Report as part of rx_dropped.
v2: drop the ethtool -S entry [Vladimir]
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bnxt may discard packets if Rx completions are consumed
in an attempt to let netpoll make progress. It should be
extremely rare in practice but nonetheless such events
should be counted.
Since completion ring memory is allocated dynamically use
a similar scheme to what is done for HW stats to save them.
Report the stats in rx_dropped and per-netdev ethtool
counter. Chances that users care which ring dropped are
very low.
v3: only save the stat to rx_dropped on reset,
rx_total_netpoll_discards will now only show drops since
last reset, similar to other "total_discard" counters.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When we enabled auxiliary input/output support for the E810 device, we
forgot to add logic to restart the output when we change time. This is
important as the periodic output will be incorrect after a time change
otherwise.
This unfortunately includes the adjust time function, even though it
uses an atomic hardware interface. The atomic adjustment can still cause
the pin output to stall permanently, so we need to stop and restart it.
Introduce wrapper functions to temporarily disable and then re-enable
the clock outputs.
Fixes: 172db5f91d ("ice: add support for auxiliary input/output pins")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sunitha D Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement support for Credit-based shaper(CBS) Qdisc hardware
offload mode in the driver. There are two sets of IEEE802.1Qav
(CBS) HW logic in i225 controller and this patch supports
enabling them in the top two priority TX queues.
The driver is implemented as recommended by the Foxville External
Architecture Specification v0.993. Idleslope and Hi-credit are
the CBS tunable parameters for the i225 NIC, programmed in the TQAVCC
and TQAVHC registers respectively.
In order for the IEEE 802.1Qav (CBS) algorithm to work as intended
and provide BW reservation, CBS should be enabled in the highest
priority queue first. If we enable CBS on any of the low priority
queues, the traffic in the high priority queue does not allow the low
priority queue to be selected for transmission and bandwidth
reservation is not guaranteed.
Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
Signed-off-by: Mallikarjuna Chilakala <mallikarjuna.chilakala@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Separate the procedure done during reset from applying a
configuration; knowing when the code is executing allows us to
better separate what changes the hardware state from what
changes only the driver state.
Introduce a flag for bookkeeping the driver state of TSN
features. When Qav and frame preemption are also implemented,
this flag makes it easier to keep track of whether a TSN feature's
driver state is enabled or not through controller state changes,
say, during a reset.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
Signed-off-by: Mallikarjuna Chilakala <mallikarjuna.chilakala@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Set default values for each queue's cycle start and cycle end.
This allows some simplification in the handling of these
configurations, as most TSN features in i225 require a cycle
to be configured.
In i225, the cycle start and end times are required to be programmed
for CBS to work properly.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
Signed-off-by: Mallikarjuna Chilakala <mallikarjuna.chilakala@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The driver didn't take the lock while flushing the Tx tracker, which
could cause a race where one thread is trying to read timestamps out
while another thread is trying to read the tracker to check the
timestamps.
Avoid this by ensuring that flushing is locked against read accesses.
Fixes: ea9b847cda ("ice: enable transmit timestamps for E810 devices")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
We have code in the ice driver which allocates the pin_config structure
if n_pins is > 0, but we never set n_pins to be greater than zero.
There's no reason to keep this code until we actually have pin_config
support. Remove this. We can re-add it properly when we implement
support for pin_config for E810-T devices.
Fixes: 172db5f91d ("ice: add support for auxiliary input/output pins")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The driver accidentally copied the ice_for_each_rxq iterator when
implementing enablement of the ptp_tx bit for the Tx rings. We still
load the Tx rings and set the ptp_tx field, but we iterate over the
count of the num_rxq.
If the number of Tx and Rx queues differ, this could either cause
a buffer overrun when accessing the tx_rings list if num_txq is greater
than num_rxq, or it could cause us to fail to enable Tx timestamps for
some rings.
This was not noticed originally as we generally have the same number of
Tx and Rx queues.
Fixes: ea9b847cda ("ice: enable transmit timestamps for E810 devices")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Enhance the mailbox scope to support important configurations
like enabling scheduled LMTST, disabling LMTLINE prefetch, and disabling
early completion for ordered LMTST, as requested by the
application. On FLR these configurations will be reset to defaults.
This patch also adds the 95XXO silicon version to the octeontx2 silicon
list.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The parameter cmd in the function definitions of hns3_dbg_bd_file_init and
hns3_dbg_common_file_init uses type u32; this patch makes the function
declarations use type u32 as well.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some repetitive macros that have the same meaning and value;
this patch merges them to make the code cleaner.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch factors out two new functions to simplify the function
hclgevf_mbx_handler(), reducing the cyclomatic complexity
and making the code more concise.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The param msg_q is redundant; copying &req->msg to
hdev->arq.msg_q[hdev->arq.tail] directly makes the code cleaner.
So remove the redundant param msg_q.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use memcpy to copy req->msg.resp_data to resp->additional_info,
to simplify the code and improve efficiency a little.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the redundant param mbx_event_pending.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To improve the readability and maintainability, add hns3_state_init() to
initialize the state, and this new function will be used to add more state
initialization in the future.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To improve code readability, replace the magic numbers of MAC speeds
defined by the firmware command with macros.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-updates-2021-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
This patch series contains various fixes, additions and improvements to
mlx5 software steering.
Patch 1:
adds support for REMOVE_HEADER packet reformat - a new reformat type
that is supported starting with ConnectX-6 DX, and allows removing an
arbitrary size packet segment at a selected position.
Patches 2 and 3:
add support for VLAN pop on TX and VLAN push on RX flows.
Patch 4:
enables retransmission mechanism for the SW Steering RC QP.
Patch 5:
does some improvements to error flow in building STE array and adds
a more informative printout of an invalid actions sequence.
Patch 6:
improves error flow on SW Steering QP error.
Patch 7:
reduces the log level of a message that is printed when a table is
connected to a lower/same level destination table, as this case proves to
be not as rare as it was in the past.
Patch 8:
adds missing support for matching on IPv6 flow label for devices
older than ConnectX-6 DX.
Patch 9:
replaces uintN_t types with kernel-style types.
Patch 10:
allows for using the right API for updating flow tables - if it is
a FW-owned table, then FW API will be used.
Patch 11:
adds support for 'ignore_flow_level' on multi-destination flow
tables that are created by SW Steering.
Patch 12:
optimizes FDB RX steering rule by skipping matching on source port,
as the source port for all incoming packets equals to wire.
Patch 13:
is a small code refactoring - it merges several DR_STE_SIZE enums
into a single enum.
Patch 14:
does some additional refactoring and removes HW-specific STE type
from NIC domain.
Patch 15:
removes rehash ctrl struct from dr_htbl struct and saves some memory.
Patch 16:
does a more significant improvement in terms of memory consumption
and was able to save about 1.6 Gb for 8M rules.
Patch 17:
adds support for update FTE, which is needed for cases where there
are multiple rules with the same match.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for update FTE, which is needed for cases where there are
multiple rules with the same match. In such a case fs_core will merge the
actions and call update FTE to update the current FTE. Since we don't want to
disrupt the traffic, we will add the new duplicate rule, and only then
remove the old duplicate rule.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
To track each STE of a rule, a rule member was allocated; each
member would point to one STE. This means that we would allocate
40B (rule member) * number of STEs per rule.
To reduce this per-rule allocation, we use the STE tree pointers
for next_htbl and the pointing STE to navigate the tree; this allows
us to keep only the pointer to the last STE of the rule (always unique).
From the last rule STE we are able to traverse and rebuild all of
the STEs that construct the rule.
In our testing with 8M rules, each consisting of 7 STEs, we were able
to reduce memory usage by 1.6GB.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The calculations to decide the maximum allowed collision threshold
are simple, and there is no reason to save them on the htbl struct.
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Instead of using the HW specific STEv0 type, it is better to use
an enum to indicate if this is an RX or TX nic domain.
This means that now we will need to convert the nic domain type
to the corresponding STE type.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Merge DR_STE_SIZE enums - no need for a separate enum for reduced STE size.
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The FDB RX pipe is connected to the wire, and the source port for all
incoming packets equals the wire port. Since there is a single uplink port
per PF, there is no point in matching on the source port in such a case.
Once we recognize such a case, we will optimize the RX steering rule.
Note that in such a case we clear both source_eswitch_owner_vhca_id and
source_port.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When creating an FTE, we might need to create multi-destination flow table,
which is eventually created by FW. In such case, this FW table should
include all the FTE properties as requested by the upper layer, including
the ability to point to another flow table with level lower or equal to
the current table - indicated by the "ignore_flow_level" property.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Call the DR API only when it is a DR table.
To update a FW-owned table the driver should call the FW API.
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add missing support for matching on IPv6 flow label for STEv0.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There are use cases with Connection Tracking that have such a connection
as the default; printing this warning in dmesg confuses the user.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In the event of SW steering QP entering error state, SW steering
cannot insert more rules, and will silently ignore the insertion
after issuing a warning.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Under high stress, SW steering might get stuck on polling for completion
that never comes.
For such cases QP needs to have protocol retransmission mechanism enabled.
Currently the retransmission timeout is defined as 0 (unlimited). Fix this
by defining a real timeout.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Enable pop VLAN action in TX and push VLAN in RX.
These actions are supported only on STEv1.
On TX: when a host sends a packet, VLAN is popped at the beginning.
On RX: just before passing the packet to the host the VLAN is pushed.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Split the modify-vlan state in the actions state machine into pop-vlan
and push-vlan states. This enables using pop/push VLAN without
restrictions (e.g. pop VLAN on TX in STEv1).
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
ConnectX supports offloading of various encapsulations and decapsulations
(e.g. VXLAN), which are performed by 'Packet Reformat' action. Starting
with ConnectX-6 DX, a new reformat type is supported - REMOVE_HEADER, which
allows deleting an arbitrary size chunk at the selected position in the packet.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In line 849 (#1), "mlx5dr_htbl_put(cur_htbl);" drops the reference to
cur_htbl and may cause cur_htbl to be freed.
However, cur_htbl is subsequently used in the next line, which may result
in a use-after-free bug.
Fix this by calling mlx5dr_err() before the cur_htbl is put.
Signed-off-by: Wentao_Liang <Wentao_Liang_g@163.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If link aggregation is used with stacked devices, the driver rejects encap
rules if the PF of the VF tunnel device is down. This happens because the
route is resolved for the other PF, and its eswitch instance is used to
determine the correct vport.
To fix that, use the devcom feature to retrieve the other eswitch instance if
the vport is not found for the 1st eswitch and LAG is active.
Fixes: 10742efc20 ("net/mlx5e: VF tunnel TX traffic offloading")
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When an indirect forward group is created, the flow is added with a vhca id
but without setting the vhca id valid flag, which violates the PRM.
Fix this by setting the missing vhca id valid flag.
Fixes: 34ca65352d ("net/mlx5: E-Switch, Indirect table infrastructure")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
After a neigh-update-add failure we are still left with a slow path rule, but
the driver always assumes the rule is an fdb rule.
Fix neigh-update-del by checking the slow path tc flag on the flow.
Also fix neigh-update-add in the same way for when neigh-update-del fails.
Fixes: 5dbe906ff1 ("net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The call to mlx5_unregister_device() means that mlx5_core driver is
removed. In such scenario, we need to disregard all other flags like
attach/detach and forcibly remove all auxiliary devices.
Fixes: a5ae8fc905 ("net/mlx5e: Don't create devices during unload flow")
Tested-and-Reported-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When handling FIB_EVENT_ENTRY_REPLACE event for a new multipath route,
lag activation can be missed if a stale (struct lag_mp)->mfi pointer
exists, which was associated with an older multipath route that had been
removed.
Normally, when a route is removed, it triggers mlx5_lag_fib_event(),
which handles FIB_EVENT_ENTRY_DEL and clears mfi pointer. But, if
mlx5_lag_check_prereq() condition isn't met, for example when eswitch is
in legacy mode, the fib event is skipped and mfi pointer becomes stale.
Fix by resetting mfi pointer to NULL in mlx5_deactivate_lag().
Fixes: 8a66e45859 ("net/mlx5: Change ownership model for lag")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, when the PFC configuration is queried by dcbtool, the driver returns
the PFC enable status based on TC. As all priorities are mapped to TC0 by
default, if TC0 is enabled, then all priorities mapped to TC0 will be
shown as enabled when querying the PFC setting, even though some
priorities have never been set.
for example:
$ dcb pfc show dev eth0
pfc-cap 4 macsec-bypass off delay 0
prio-pfc 0:off 1:off 2:off 3:off 4:off 5:off 6:off 7:off
$ dcb pfc set dev eth0 prio-pfc 0:on 1:on 2:on 3:on
$ dcb pfc show dev eth0
pfc-cap 4 macsec-bypass off delay 0
prio-pfc 0:on 1:on 2:on 3:on 4:on 5:on 6:on 7:on
To fix this problem, just return the user's PFC configuration parameters
saved in the driver.
Fixes: cacde272dd ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The GRO configuration is enabled by default after reset. This is
incorrect; it should be restored to the user-configured value.
So this restoration is added during reset initialization.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the cmd index is obtained in debugfs by comparing file names.
However, this method may cause errors when processing more complex file
names. So, change this method by saving cmd in private data and comparing
it when getting cmd index in debugfs for optimization.
Fixes: 5e69ea7ee2 ("net: hns3: refactor the debugfs process")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A duplicate VLAN node should not be added to the VLAN list; otherwise it would
cause an "add failed" error when restoring VLANs from the VLAN list. So this
patch adds a VLAN ID check before adding a node to the VLAN list.
Fixes: c6075b1934 ("net: hns3: Record VF vlan tables")
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In bonding mode 4, when the link goes down and up repeatedly, the bond may get
an unknown speed, and then this port cannot work.
The driver calls netif_carrier_on() before updating the link state. When the
bond receives the carrier-on event, it queries the speed of the port; if this
query happens before the link state is updated, it gets an unknown
speed. So netif_carrier_on() needs to be called after updating the link state.
Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Fixes: e2cb1dec97 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After the cmdq registers are cleared, the firmware may take time to
clear out possible leftover commands in the cmdq. The driver must release
cmdq memory only after the firmware has completed processing of the
leftover commands.
Fixes: 232d0d55fc ("net: hns3: uninitialize command queue while unloading PF driver")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If a PF is bonded to a virtual machine and the virtual machine exits
unexpectedly, some hardware resources cannot be cleared. In this case,
loading the driver may cause exceptions. Therefore, the hardware resources
need to be cleared when the driver is loaded.
Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
LiteX is a soft system-on-chip that targets FPGAs. LiteETH is a basic
network device that is commonly used in LiteX designs.
The driver was first written in 2017 and has been maintained by the
LiteX community in various trees. Thank you to all who have contributed.
Co-developed-by: Gabriel Somlo <gsomlo@gmail.com>
Co-developed-by: David Shah <dave@ds0.me>
Co-developed-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Tested-by: Gabriel Somlo <gsomlo@gmail.com>
Reviewed-by: Gabriel Somlo <gsomlo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a function for what has been magic register writes so far.
It's based on recent changes to vendor drivers r8101, r8168, r8125,
and deals with events that trigger an early ASPM L1 exit.
Description of the bits has been kindly provided by Realtek.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If otx2_mbox_get_rsp() fails, otx2_set_flowkey_cfg() needs to return an
error code.
Fixes: e793836545 ("octeontx2-pf: Fix algorithm index in MCAM rules with RSS action")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When adapter init fails, the blocked freelist bitmap is already freed
up and should not be touched. So, move the bitmap zeroing closer to
where it was successfully allocated. Also handle adapter init failure
unwind path immediately and avoid setting up RDMA memory windows.
Fixes: 5b377d114f ("cxgb4: Add debugfs facility to inject FL starvation")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure we go into PROMISC mode when we have too many
filters by specifically counting the filters that successfully
get saved to the firmware.
The device advertises max_ucast_filters and max_mcast_filters,
but really only has max_ucast_filters slots available for
uc and mc filters combined.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The filter counting in ionic_lif_addr() really isn't useful,
and potentially misleading, especially when we're checking in
ionic_lif_rx_mode() to see if we need to go into PROMISC mode.
We can safely refactor this and remove a calling layer.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to separate the atomic needs of __dev_uc_sync()
and __dev_mc_sync() from the safe rx_mode handling, we need
to have the ndo handler manipulate the driver's filter list,
and later have the driver sync the filters to the firmware,
outside of the atomic context.
Here we put __dev_mc_sync() and __dev_uc_sync() back into the
ndo callback to give them their netif_addr_lock context and
have them update the driver's filter list, flagging changes
that should be made to the device filter list. Later, in the
rx_mode handler, we read those hints and sync up the device's
list as needed.
It is possible for multiple add/delete requests to come from
the stack before the rx_mode task processes the list, but the
handling of the sync status flag should keep everything sorted
correctly. For example, if a delete of an existing filter is
followed by another add before the rx_mode task is run, as can
happen when going in and out of a bond, the add will cancel
the delete and no actual changes will be sent to the device.
We also add a check in the watchdog to see if there are any
stray unsync'd filters, possibly left over from a filter
overflow and waiting to get sync'd after some other filter
gets removed to make room.
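A rough sketch of this two-stage scheme is shown below, assuming a simple per-filter state flag; the structures and the filter_find/filter_list_add/filter_list_del and dev_add_filter/dev_del_filter helpers are hypothetical stand-ins, not the actual ionic code:

enum f_state { F_SYNCED, F_ADD_PENDING, F_DEL_PENDING };

struct mac_filter {
	struct list_head node;
	u8 addr[ETH_ALEN];
	enum f_state state;
};

/* ndo_set_rx_mode context (atomic, under netif_addr_lock): touch the list only. */
static void filter_mark_add(struct list_head *list, const u8 *addr)
{
	struct mac_filter *f = filter_find(list, addr);

	if (f) {
		if (f->state == F_DEL_PENDING)
			f->state = F_SYNCED;	/* add cancels the pending delete */
	} else {
		filter_list_add(list, addr, F_ADD_PENDING);
	}
}

/* rx_mode task (can sleep): push the pending changes to the device. */
static void filter_sync(struct list_head *list)
{
	struct mac_filter *f, *tmp;

	list_for_each_entry_safe(f, tmp, list, node) {
		if (f->state == F_ADD_PENDING && !dev_add_filter(f->addr))
			f->state = F_SYNCED;
		else if (f->state == F_DEL_PENDING && !dev_del_filter(f->addr))
			filter_list_del(f);
	}
}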
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since only two functions call through ionic_set_rx_mode(), one
that can sleep and one that can't, we can split the function
and put the bits of code into the callers. This removes an
unnecessary calling layer.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the move of mac filter handling to outside of the
ndo_rx_mode context using the IONIC_DW_TYPE_RX_MODE,
we no longer are using IONIC_DW_TYPE_RX_ADDR_ADD and
IONIC_DW_TYPE_RX_ADDR_DEL and they can be removed.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added mbox for PF/VF drivers to retrieve current ingress bandwidth
profile free count. Also added current policer timeunit
configuration info based on which ratelimiting decisions can be
taken by PF/VF drivers.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New use cases are popping up wherein the user wants to install common MCAM
filters for all interfaces. Having channel verification would result in
duplicating such MCAM filters for each ingress interface. Hence the channel
verification is removed.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CN10K silicon has a different device id for the PTP device.
Hence this patch updates the driver with the new id.
Though the PTP driver is a separate driver, the AF manages
configuring the PTP block for all PFs. To manage PTP, the AF
driver checks in its probe whether
1. the PTP hardware device is found on the silicon
2. a driver is bound to the PTP device
3. the PTP driver probe is successful
In failure cases 1 and 3, the AF proceeds without PTP,
and for case 2 it defers the probe. This patch also refactors the
code to check all the PTP device ids given in the
PTP device ids table for case 1.
Also added the PTP device ID for 95O silicon.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon receiving the MBOX_MSG_FREE_RSRC_CNT, the AF will find out the
current number of free resources and reply it back to the requester. No
guarantee is given on the future state of the free resources yet.
If another requester sends MBOX_MSG_ATTACH_RESOURCES after this call,
the number of available resources might change.
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for packet IO via SDP links, which is used when
Octeon is connected as an end-point. Traffic from host to end-point
and vice versa flows through SDP links. This patch also supports the
dual SDP blocks present in 98xx silicon.
Signed-off-by: Radha Mohan Chintakuntla <radhac@marvell.com>
Signed-off-by: Nalla Pradeep <pnalla@marvell.com>
Signed-off-by: Subrahmanyam Nilla <snilla@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In 98xx, there are 2 NIX blocks and 4 LBK blocks present. The way
these NIX-LBK should be configured depends on the use case. By
default loopback functionality is supported in AF VF pairs which
are attached to NIX0 and NIX1 LFs alternatively to ensure load
balancing. NIX0 transmits a packet to LBK1 which will be received
by NIX1 and packet transmitted by NIX1 will get received by NIX0 via
LBK2.
There are some requirements where only one AF VF is used and the respective
NIX is expected to operate in a mode where it can receive its own packet
back. This can be achieved if NIX0 sends the packet to LBK0 and not LBK1.
Adding a flag in LF alloc request mailbox which can setup NIX0 to use
LBK0 and NIX1 can use LBK3.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike OcteonTx2, the channel numbers used by CGX/RPM
and LBK on CN10K silicons aren't fixed in HW. They are
SW programmable, hence we cannot derive transmit link
from static channel numbers anymore. Get the same from
admin function via mailbox.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before the C0 HW revision, the RSS adder was computed based on the
following static formula:
rss_adder<7:0> = flow_tag<7:0> ^ flow_tag<15:8> ^
flow_tag<23:16> ^ flow_tag<31:24>
The above scheme has the following drawbacks:
1) It is not in line with other standard NIC behavior.
2) There can be an SW use case where SW computes the hash
upfront using the Toeplitz function and predicts the queue selection
to optimize some packet lookup function. The nonstandard
way of doing XOR prevents the consumer from predicting the queue selection.
From the C0 HW revision onwards, the HW can be configured to use
rss_adder<7:0> = flow_tag<7:0>, to align with standard NICs.
This patch adds an option to select legacy RSS adder mode
vs standard NIC behavior by setting the NIX_LF_RSS_TAG_LSB_AS_ADDER flag.
Since this bit field is reserved in old HW revisions,
no additional HW version check is needed.
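Written out in C from the formula above, the two adder computations look roughly like this (the C0 form simply takes flow_tag<7:0>):

/* Legacy (pre-C0) adder: XOR of the four flow tag bytes. */
static inline u8 rss_adder_legacy(u32 flow_tag)
{
	return (flow_tag & 0xff) ^
	       ((flow_tag >> 8) & 0xff) ^
	       ((flow_tag >> 16) & 0xff) ^
	       ((flow_tag >> 24) & 0xff);
}

/* C0 onwards, with NIX_LF_RSS_TAG_LSB_AS_ADDER set: standard NIC behavior. */
static inline u8 rss_adder_lsb(u32 flow_tag)
{
	return flow_tag & 0xff;
}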
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Starting from 96xx C0 onwards all silicons support traffic shaping.
This patch enables that feature along with other changes
- When PIR/CIR shaping config is modified, toggle SW_XOFF
for config to take effect
- Before SMQ flush, clear SW_XOFF at all parent schedulers
- Support to read current transmit scheduler configuration via mbox
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's helpful for compile testing on other platforms (e.g. x86).
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NIX_AF_TX_LINKX_NORM_CREDIT holds a running counter of
tx credits available per link. But tx credits should be
configured based on the MTU config, so an MTU change needs a tx
credit count update.
An issue exists whereby, when both PF & VF are enabled and
PF traffic is flowing, if the VF requests an MTU update,
updating the NORM_CREDIT register will lead to corruption
of the credit count and subsequent deadlock of the tx link, as
the NORM_CREDIT register holds a running count.
This patch provides a workaround by pausing link traffic
using NIX_AF_TL1X_SW_XOFF, waiting for existing packets to
drain and used credits to be returned, before updating the new
credit count.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clear and disable the interrupt before queueing work, as there is
a chance that the work completes faster on another core and
the interrupt enable done as part of the work happens before the
interrupt disable in the interrupt context. This would leave the
interrupt permanently disabled.
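A hedged sketch of the resulting ordering in an interrupt handler follows; the register offsets and the work item are illustrative, not the actual octeontx2 code:

struct my_af {
	void __iomem *reg_base;
	struct workqueue_struct *wq;
	struct work_struct work;
};

static irqreturn_t af_irq_handler(int irq, void *data)
{
	struct my_af *af = data;

	/* Ack and mask first, so that the re-enable done at the end of the
	 * queued work cannot be overtaken by a disable performed here.
	 */
	writeq(BIT_ULL(0), af->reg_base + 0x10);	/* hypothetical INT reg (W1C) */
	writeq(BIT_ULL(0), af->reg_base + 0x18);	/* hypothetical INT_ENA reg (W1C) */

	queue_work(af->wq, &af->work);	/* the work re-enables the interrupt */

	return IRQ_HANDLED;
}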
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set NPA batch allocation engine to process 35 cache lines
per turn on CN10k platform.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reset support is present on R-Car. Let's support it, if it is
available.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The E-MAC IP on the R-Car AVB module has different initialization
parameters for RX frame size, duplex settings, different offset
for transfer speed setting and has magic packet detection support
compared to E-MAC on RZ/G2L Gigabit Ethernet module. Factorise
the ravb_emac_init function to support the later SoC.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMAC IP on the R-Car AVB module has different initialization
parameters for RCR, TGC, TCCR, RIC0, RIC2, and TIC compared to
DMAC IP on the RZ/G2L Gigabit Ethernet module. Factorise the
ravb_dmac_init function to support the later SoC.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RZ/G2L supports HW checksum on RX and TX, whereas R-Car supports it on RX only.
Factorise ravb_set_features to support this feature.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
R-Car supports 100 and 1000 Mbps transfer speeds, whereas RZ/G2L
additionally supports 10 Mbps. Factorise the ravb_adjust_link function
in order to support the 10 Mbps speed.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
R-Car uses an extended descriptor in RX whereas, RZ/G2L uses
normal descriptor in RX. Factorise the ravb_rx function to
support the later SoC.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ravb_ring_init function uses an extended descriptor in RX for
R-Car and normal descriptor for RZ/G2L. Add a helper function
for RX ring buffer allocation to support later SoC.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ravb_ring_format function uses an extended descriptor in RX
for R-Car compared to the normal descriptor for RZ/G2L. Factorise
RX ring buffer buildup to extend the support for later SoC.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
R-Car uses extended descriptor in RX, whereas RZ/G2L uses normal
descriptor. Factorise ravb_ring_free function so that it can
support later SoC.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some H/W differences for the gPTP feature between
R-Car Gen3, R-Car Gen2, and RZ/G2L as below.
1) On R-Car Gen3, gPTP support is active in config mode.
2) On R-Car Gen2, gPTP support is not active in config mode.
3) RZ/G2L does not support the gPTP feature.
Add a ptp_cfg_active hw feature bit to struct ravb_hw_info for
supporting gPTP active in config mode for R-Car Gen3.
This patch also removes enum ravb_chip_id, chip_id from both
struct ravb_hw_info and struct ravb_private, as it is unused.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some H/W differences for the gPTP feature between
R-Car Gen3, R-Car Gen2, and RZ/G2L as below.
1) On R-Car Gen2, gPTP support is not active in config mode.
2) On R-Car Gen3, gPTP support is active in config mode.
3) RZ/G2L does not support the gPTP feature.
Add a no_ptp_cfg_active hw feature bit to struct ravb_hw_info for
handling gPTP for R-Car Gen2.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
R-Car Gen3 supports separate interrupts for E-MAC and DMA queues,
whereas R-Car Gen2 and RZ/G2L have a single interrupt instead.
Add a multi_irq hw feature bit to struct ravb_hw_info to enable
this only for R-Car Gen3.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For addressing the 4-byte alignment restriction on the transmission
buffer for R-Car Gen2 we use 2 descriptors, whereas a single
descriptor suffices for the other cases.
Replace the macros NUM_TX_DESC_GEN[23] with magic numbers and
add a comment to explain them.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Suggested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanks to Kees Cook, who detected the problem of a memset that starts
not at the first member but is sized for the whole struct.
The better change is to remove the redundant memset and to clear
only the msix_cnt member.
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is not an expected case normally.
Add WARN_ON_ONCE in case of CQE read overflow, instead of failing
silently.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing code uses (1 + #vPorts * #Queues) MSIXs, which may exceed
the device limit.
Support EQ sharing, so that multiple vPorts (NICs) can share the same
set of MSIXs.
And, report the EQ-sharing capability bit to the host, which means the
host can potentially offer more vPorts and queues to the VM.
Also update the resource limit checking and error handling for better
robustness.
Now, we support up to 256 virtual ports per VF (it was 16/VF), and
support up to 64 queues per vPort (it was 16).
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing code has NAPI threads polling on EQ directly. To prepare
for EQ sharing among vPorts, move NAPI from EQ to CQ so that one EQ
can serve multiple CQs from different vPorts.
The "arm bit" is only set when CQ processing is completed to reduce
the number of EQ entries, which in turn reduce the number of interrupts
on EQ.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function 'netxen_rom_fast_read' is declared twice, so remove the
repeated declaration.
Cc: Manish Chopra <manishc@marvell.com>
Cc: Rahul Verma <rahulv@marvell.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clang warns:
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2785:2: error: variable 'kw_offset' is uninitialized when used here [-Werror,-Wuninitialized]
FIND_VPD_KW(i, "RV");
^~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2776:39: note: expanded from macro 'FIND_VPD_KW'
var = pci_vpd_find_info_keyword(vpd, kw_offset, vpdr_len, name); \
^~~~~~~~~
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2748:34: note: initialize the variable 'kw_offset' to silence this warning
unsigned int vpdr_len, kw_offset, id_len;
^
= 0
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2785:2: error: variable 'vpdr_len' is uninitialized when used here [-Werror,-Wuninitialized]
FIND_VPD_KW(i, "RV");
^~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2776:50: note: expanded from macro 'FIND_VPD_KW'
var = pci_vpd_find_info_keyword(vpd, kw_offset, vpdr_len, name); \
^~~~~~~~
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2748:23: note: initialize the variable 'vpdr_len' to silence this warning
unsigned int vpdr_len, kw_offset, id_len;
^
= 0
2 errors generated.
The series "PCI/VPD: Convert more users to the new VPD API functions"
was applied to net-next when it should have been applied to the PCI tree
because of build errors. However, commit 82e34c8a9b ("Revert "Revert
"cxgb4: Search VPD with pci_vpd_find_ro_info_keyword()""") reapplied a
change, resulting in the warning above.
Properly revert commit 8d63ee602d ("cxgb4: Search VPD with
pci_vpd_find_ro_info_keyword()") to fix the warning and restore proper
functionality. This also reverts commit 3a93bedea0 ("cxgb4: Remove
unused vpd_param member ec") to avoid future merge conflicts, as that
change has been applied to the PCI tree.
Link: https://lore.kernel.org/r/20210823120929.7c6f7a4f@canb.auug.org.au/
Link: https://lore.kernel.org/r/1ca29408-7bc7-4da5-59c7-87893c9e0442@gmail.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure the XSK buffer is valid before proceeding to free the xdp buffer.
The following kernel panic is observed without this patch:
RIP: 0010:xp_free+0x5/0x40
Call Trace:
stmmac_napi_poll_rxtx+0x332/0xb30 [stmmac]
? stmmac_tx_timer+0x3c/0xb0 [stmmac]
net_rx_action+0x13d/0x3d0
__do_softirq+0xfc/0x2fb
? smpboot_register_percpu_thread+0xe0/0xe0
run_ksoftirqd+0x32/0x70
smpboot_thread_fn+0x1d8/0x2c0
kthread+0x169/0x1a0
? kthread_park+0x90/0x90
ret_from_fork+0x1f/0x30
---[ end trace 0000000000000002 ]---
Fixes: bba2556efa ("net: stmmac: Enable RX via AF_XDP zero-copy")
Cc: <stable@vger.kernel.org> # 5.13.x
Suggested-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After freeing the xsk_pool, there is a possibility that NAPI polling is still
running in the middle, which causes a kernel crash due to a NULL
pointer dereference of rx_q->xsk_pool and tx_q->xsk_pool.
Fix this by changing the XDP pool setup sequence to:
1. disable NAPI before freeing the xsk_pool
2. enable NAPI after initializing the xsk_pool
The following kernel panic is observed without this patch:
RIP: 0010:xsk_uses_need_wakeup+0x5/0x10
Call Trace:
stmmac_napi_poll_rxtx+0x3a9/0xae0 [stmmac]
__napi_poll+0x27/0x130
net_rx_action+0x233/0x280
__do_softirq+0xe2/0x2b6
run_ksoftirqd+0x1a/0x20
smpboot_thread_fn+0xac/0x140
? sort_range+0x20/0x20
kthread+0x124/0x150
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x30
---[ end trace a77c8956b79ac107 ]---
Fixes: bba2556efa ("net: stmmac: Enable RX via AF_XDP zero-copy")
Cc: <stable@vger.kernel.org> # 5.13.x
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2021-08-24
Vinicius Costa Gomes says:
This adds support for PCIe PTM (Precision Time Measurement) to the igc
driver. PCIe PTM allows the NIC and Host clocks to be compared more
precisely, improving the clock synchronization accuracy.
Patch 1/4 reverts a commit that made pci_enable_ptm() private to the
PCI subsystem, reverting makes it possible for it to be called from
the drivers.
Patch 2/4 adds the pcie_ptm_enabled() helper.
Patch 3/4 calls pci_enable_ptm() from the igc driver.
Patch 4/4 implements the PCIe PTM support. Exposing it via the
.getcrosststamp() API implies that the time measurements are made
synchronously with the ioctl(). The hardware was implemented so the
most convenient way to retrieve that information would be
asynchronously. So, to follow the expectations of the ioctl() we have
to use less convenient ways, triggering a PCIe PTM dialog every time
an ioctl() is received.
Some questions are raised (also pointed out in the commit message):
1. Using convert_art_ns_to_tsc() is too x86 specific, there should be
a common way to create a 'system_counterval_t' from a timestamp.
2. convert_art_ns_to_tsc() says that it should only be used when
X86_FEATURE_TSC_KNOWN_FREQ is true, but during tests it works even
when it returns false. Should that check be done?
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
macb_ptp_desc will not return NULL under most circumstances with a correct
Kconfig and IP design config register. But for the sake of the extreme
corner case, check for NULL when using the helper. In the case of rx_tstamp,
no action is necessary except to return (similar to timestamping disabled)
and warn. In the case of TX, return -EINVAL to let the skb be freed. Perform
this check before marking the skb as in progress.
Fixes coverity warning:
(4) Event dereference:
Dereferencing a null pointer "desc_ptp"
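As a rough sketch of the TX-side check (the variable names are assumed, not copied from the driver):

	desc_ptp = macb_ptp_desc(bp, desc);
	if (!desc_ptp) {
		/* Not expected with a sane Kconfig/design config; warn and bail. */
		WARN_ONCE(1, "macb: PTP descriptor is NULL\n");
		return -EINVAL;		/* caller frees the skb */
	}
	/* only now mark the skb as in progress and capture the timestamp */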
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables automatic recovery by default in case of various
error conditions like FW assert, hardware error, etc.
It also ensures the driver can handle multiple iterations of assertion
conditions.
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 2c896fb02e
"net: stmmac: dwmac-rk: add pd_gmac support for rk3399" and fixes
unbalanced pm_runtime_enable warnings.
In the commit to be reverted, support for power management was
introduced to the Rockchip glue code. Later, power management support
was introduced to the stmmac core code, resulting in multiple
invocations of pm_runtime_{enable,disable,get_sync,put_sync}.
The multiple invocations happen in rk_gmac_powerup and
stmmac_{dvr_probe, resume} as well as in rk_gmac_powerdown and
stmmac_{dvr_remove, suspend}, respectively, which are always called
in conjunction.
Fixes: 5ec5582343 ("net: stmmac: add clocks management for gmac driver")
Signed-off-by: Michael Riesch <michael.riesch@wolfvision.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enables PCIe PTM (Precision Time Measurement) support in the igc
driver. Notifies the PCI devices that PCIe PTM should be enabled.
PCIe PTM is a protocol similar to PTP (Precision Time Protocol), running
in the PCIe fabric; it allows devices to report time measurements from
their internal clocks and the correlation with the PCIe root clock.
The i225 NIC has registers that expose those time
measurements; those registers will be used, in later patches, to
implement the PTP_SYS_OFFSET_PRECISE ioctl().
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use pci_vpd_alloc() to dynamically allocate a properly sized buffer and
read the full VPD data into it.
This simplifies the code, and we no longer have to make assumptions about
VPD size.
Link: https://lore.kernel.org/r/62522a24-f39a-2b35-1577-1fbb41695bed@gmail.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Use pci_vpd_find_ro_info_keyword() to search for keywords in VPD to
simplify the code.
Use strncasecmp() to match Vendor ID instead of comparing with lower- and
upper-case hex string.
[bhelgaas: convert to strncasecmp()]
Link: https://lore.kernel.org/r/a9f730cf-e31e-902b-7b39-0ff2e99636e0@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Use pci_vpd_alloc() to dynamically allocate a properly sized buffer and
read the full VPD data into it.
This simplifies the code, and we no longer have to make assumptions about
VPD size.
Link: https://lore.kernel.org/r/821a334d-ff9d-386e-5f42-9b620ab3dbfa@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Read NVRAM directly into buffer and use swab32s() to byte swap it in-place
instead of reading it into the end of the buffer and swapping it manually
while copying it.
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/e4ac6229-1df5-8760-3a87-1ad0ace87137@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Use pci_vpd_find_ro_info_keyword() to search for keywords in VPD to
simplify the code.
Replace netif_err() with pci_err() because the netdevice isn't registered
yet, which results in very ugly messages.
Use kmemdup_nul() instead of open-coding it.
This is the same as 37838aa437 ("sfc: Search VPD with
pci_vpd_find_ro_info_keyword()"), just for the falcon chip version.
Link: https://lore.kernel.org/r/898282a1-13bd-17bc-2e9a-d3dcd336b46c@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Use pci_vpd_alloc() to dynamically allocate a properly sized buffer and
read the full VPD data into it.
This avoids having to allocate a buffer on the stack, and we don't have to
make any assumptions on VPD size and location of information in VPD.
This is the same as 5119e20fac ("sfc: Read VPD with pci_vpd_alloc()"),
just for the falcon chip version.
Link: https://lore.kernel.org/r/2a8d069e-9516-50d8-6520-2614222c8f5f@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add support in ethtool for switching EQE/CQE mode.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For devices whose version is V3 or above, the GL can
select EQE or CQE mode, so add support for it.
In CQE mode, the coalesced timer will restart when the first new
completion occurs, while in EQE mode the timer will not restart.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In order to support more coalesce parameters through netlink,
add two new parameter kernel_coal and extack for .set_coalesce
and .get_coalesce, then some extra info can return to user with
the netlink API.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ASPM is disabled completely because we've seen different types of
problems in the past. However it seems these problems occurred with
L1 or L1 sub-states only. On all the chip versions I've seen the
acceptable L0s exit latency is 512ns. This should be short enough not
to cause problems. If the actual L0s exit latency of the PCIe link
is bigger than 512ns then the PCI core will disable L0s anyway.
So let's give it a try and disable L1 and L1 sub-states only.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For VFs we should return an error if we didn't get the exact number of
MSI-X vectors that we requested; not doing so leads to a crash when
starting queues for the VF.
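A hedged sketch of the kind of check being added; the helper used and the
'want'/'is_vf' names are illustrative, not the exact qed/qede code:
	rc = pci_enable_msix_range(pdev, entries, 1, want);
	if (rc < 0)
		return rc;
	if (is_vf && rc != want) {
		/* a VF must get exactly the requested vector count */
		pci_disable_msix(pdev);
		return -EINVAL;
	}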
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the interface name and PCI address are printed twice, because
netdev_info() is printing this information implicitly already. This results
in messages like the following; remove the duplicated information.
cxgb4 0000:81:00.4 eth3: eth3: Chelsio T6225-OCP-SO (0000:81:00.4) 1G/10G/25GBASE-SFP28
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Retrieve OF match data; it's better and cleaner to use
'of_device_get_match_data' than 'of_match_device'.
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Retrieve OF match data; it's better and cleaner to use
'of_device_get_match_data' than 'of_match_device'.
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The usage of these macros has been removed in commit db1a8611c8
("sunhme: Convert to pure OF driver."). So they can be removed.
This simplifies code and helps for removing the wrappers in
include/linux/pci-dma-compat.h.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the Armada XP datasheet, the bit at position 0 corresponds
to the TxInProg indication.
Fixes: c5aff18204 ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When taprio offload is not enabled, the error handling path causes a
kernel crash due to a NULL pointer dereference.
Fix this by adding a check for NULL before attempting to access
'plat->est' for the mutex_lock() call.
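A hedged sketch of the guard described above; the surrounding error path
in the stmmac taprio code differs in detail:
	if (priv->plat->est) {
		mutex_lock(&priv->plat->est->lock);
		priv->plat->est->enable = false;
		mutex_unlock(&priv->plat->est->lock);
	}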
The following kernel panic is observed without this patch:
RIP: 0010:mutex_lock+0x10/0x20
Call Trace:
tc_setup_taprio+0x482/0x560 [stmmac]
kmem_cache_alloc_trace+0x13f/0x490
taprio_disable_offload.isra.0+0x9d/0x180 [sch_taprio]
taprio_destroy+0x6c/0x100 [sch_taprio]
qdisc_create+0x2e5/0x4f0
tc_modify_qdisc+0x126/0x740
rtnetlink_rcv_msg+0x12b/0x380
_raw_spin_lock_irqsave+0x19/0x40
_raw_spin_unlock_irqrestore+0x18/0x30
create_object+0x212/0x340
rtnl_calcit.isra.0+0x110/0x110
netlink_rcv_skb+0x50/0x100
netlink_unicast+0x191/0x230
netlink_sendmsg+0x243/0x470
sock_sendmsg+0x5e/0x60
____sys_sendmsg+0x20b/0x280
copy_msghdr_from_user+0x5c/0x90
__mod_memcg_state+0x87/0xf0
___sys_sendmsg+0x7c/0xc0
lru_cache_add+0x7f/0xa0
_raw_spin_unlock+0x16/0x30
wp_page_copy+0x449/0x890
handle_mm_fault+0x921/0xfc0
__sys_sendmsg+0x59/0xa0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
---[ end trace b1f19b24368a96aa ]---
Fixes: b60189e039 ("net: stmmac: Integrate EST with TAPRIO scheduler API")
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-08-20
This series contains updates to igc and e1000e drivers.
Aaron Ma resolves a page fault which occurs when thunderbolt is
unplugged for igc.
Toshiki Nishioka fixes Tx queue looping to use actual number of queues
instead of max value for igc.
Sasha fixes an incorrect latency comparison by decoding the values before
comparing and prevents attempted writes to read-only NVMs for e1000e.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A successful 'xge_mdio_config()' call should be balanced by a corresponding
'xge_mdio_remove()' call in the error handling path of the probe, as
already done in the remove function.
Update the error handling path accordingly.
Fixes: ea8ab16ab2 ("drivers: net: xgene-v2: Add MDIO support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
This pulls mlx5-next branch which includes patches already reviewed on
net-next and rdma mailing lists.
1) mlx5 single E-Switch FDB for lag
2) IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
3) Add DCS caps & fields support
We need this in net-next as multiple features are dependent on the
single FDB feature.
====================
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* mellanox/mlx5-next:
net/mlx5: Lag, Create shared FDB when in switchdev mode
net/mlx5: E-Switch, add logic to enable shared FDB
net/mlx5: Lag, move lag destruction to a workqueue
net/mlx5: Lag, properly lock eswitch if needed
net/mlx5: Add send to vport rules on paired device
net/mlx5: E-Switch, Add event callback for representors
net/mlx5e: Use shared mappings for restoring from metadata
net/mlx5e: Add an option to create a shared mapping
net/mlx5: E-Switch, set flow source for send to uplink rule
RDMA/mlx5: Add shared FDB support
{net, RDMA}/mlx5: Extend send to vport rules
RDMA/mlx5: Fill port info based on the relevant eswitch
net/mlx5: Lag, add initial logic for shared FDB
net/mlx5: Return mdev from eswitch
IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
Use pci_vpd_find_ro_info_keyword() to search for keywords in VPD to
simplify the code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Validate the VPD checksum with pci_vpd_check_csum() to simplify the code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use pci_vpd_find_ro_info_keyword() to search for keywords in VPD to
simplify the code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use pci_vpd_alloc() to dynamically allocate a properly sized buffer and
read the full VPD data into it.
This simplifies the code, and we no longer have to make assumptions about
VPD size.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use pci_vpd_find_ro_info_keyword() to search for keywords in VPD to
simplify the code.
str_id_reg and str_id_cap hold the same string and are used in the same
comparison. This doesn't make sense, use one string str_id instead.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use pci_vpd_alloc() to dynamically allocate a properly sized buffer and
read the full VPD data into it.
This simplifies the code, and we no longer have to make assumptions about
VPD size.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use pci_vpd_find_ro_info_keyword() to search for keywords in VPD to
simplify the code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the same as 37838aa437 "sfc: Search VPD with
pci_vpd_find_ro_info_keyword()", just for the falcon chip version.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the same as 5119e20fac "sfc: Read VPD with pci_vpd_alloc()",
just for the falcon chip version.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 01848e05f8 ("mlxsw: spectrum_router: Add support for inner
layer 3 multipath hash policy") and commit daeabf89eb ("mlxsw:
spectrum_router: Add support for custom multipath hash policy") added
support for multipath hash policies where the hash is calculated based
on inner packet fields.
For IPv6-in-IPv6 packets, the default parsing depth (96 bytes) is not
enough when these policies are used.
Therefore, for such cases, call the new API to increase / decrease the
parsing depth as necessary. Care is taken to ensure the API is not
called multiple times.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patches added new API to handle parsing depth and converted
the existing code to use it.
Remove the old infrastructure which is not needed anymore.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert VxLAN and PTP modules to increase parsing depth using new API
that was added in the previous patch.
Separate MPRS register's configuration to VxLAN related configuration
and parsing depth configuration. Handle each one using the appropriate
API.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum ASICs have a configurable limit on how deep into the packet
they parse. By default, the limit is 96 bytes.
There are several cases where this parsing depth is not enough and there
is a need to increase it. Currently, increasing parsing depth is
maintained as part of VxLAN module, because the MPRS register which
configures parsing depth also configures UDP destination port number
used for VxLAN encapsulation and decapsulation.
Add an API for increasing parsing depth as part of spectrum.c code, so
that it will be possible to use it from other modules. In addition, add
an API for setting UDP destination port and protect it using a dedicated
lock for saving parsing configurations. The lock is needed as not all
the callers hold RTNL lock.
Maintain a counter for increased parsing depth consumers. For first
consumer subscription, increase the parsing depth and for last consumer
unsubscription, set parsing depth to default value.
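A hedged sketch of the refcounting described above; the function, field
and constant names are illustrative, not the actual mlxsw code:
	int parsing_depth_inc(struct mlxsw_sp *mlxsw_sp)
	{
		int err = 0;

		mutex_lock(&mlxsw_sp->parsing.lock);
		if (mlxsw_sp->parsing.ref_count == 0)
			err = set_parsing_depth(mlxsw_sp, INCREASED_DEPTH);
		if (!err)
			mlxsw_sp->parsing.ref_count++;
		mutex_unlock(&mlxsw_sp->parsing.lock);
		return err;
	}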
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RVU SMMU widget stores the final translated PA at
RVU_AF_SMMU_TLN_FLIT0<57:18> instead of FLIT1 register. This patch
fixes the address translation logic to use the correct register.
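An illustrative extraction of the translated PA per the bit range given
above; the accessor arguments and mask width are assumptions:
	u64 flit0 = rvu_read64(rvu, BLKADDR_RVUM, RVU_AF_SMMU_TLN_FLIT0);
	u64 pa = (flit0 >> 18) & GENMASK_ULL(39, 0);	/* bits 57:18 */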
Fixes: 893ae97214 ("octeontx2-af: cn10k: Support configurable LMTST regions")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Other than setting the action as RSS in the NPC MCAM entry, the RSS
flowkey algorithm index also needs to be set. Otherwise, whatever
algorithm is defined at flowkey index '0' will be used by HW and packet
flows will be distributed as such.
Fix this by saving the flowkey index sent by the admin function while
initializing RSS and then using it when framing MCAM rules.
Fixes: 81a4362016 ("octeontx2-pf: Add RSS multi group support")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever the user changes the interface MAC address, both the default
DMAC based MCAM rule and the VLAN offload (for strip) rule are updated
with the new MAC address. To update or install the VLAN offload rule,
the PF driver needs the interface's receive channel info, which is
retrieved from the admin function at the time of NIXLF initialization.
If the user changes the MAC address before the interface is UP, VLAN
offload rule installation will fail and throw an error, as the receive
channel is not valid. To avoid this, skip VLAN offload rule installation
if the netdev is not UP; the rule will be reinstalled anyway as part of
the open() call.
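A minimal sketch of the skip described above; the function it lands in is
not shown and the early-return value is an assumption:
	/* Interface not UP yet: receive channel is unknown, so skip the
	 * VLAN offload rule; it is reinstalled from open().
	 */
	if (!netif_running(netdev))
		return 0;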
Fixes: fd9d7859db ("octeontx2-pf: Implement ingress/egress VLAN offload")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bandwidth profiles (ipolicer structure) are implemented only on the CN10K
platform, but the current code tries to free the ipolicer memory without
checking the capability flag, leading to a driver crash on the OCTEONTX2
platform. This patch fixes the issue by adding a capability flag check.
Fixes: e8e095b3b3 ("octeontx2-af: cn10k: Bandwidth profiles config support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CN10K platform requires physically contiguous memory for LMTST
operations that span more than a single page. Without physically
contiguous memory, HW will fetch transmit descriptors from a wrong
memory location.
Hence use DMA_ATTR_FORCE_CONTIGUOUS attribute while allocating
LMTST regions.
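A hedged sketch of such an allocation; the size, device pointer and error
handling are illustrative:
	void *lmt_base;
	dma_addr_t iova;

	lmt_base = dma_alloc_attrs(dev, lmt_size, &iova, GFP_KERNEL,
				   DMA_ATTR_FORCE_CONTIGUOUS);
	if (!lmt_base)
		return -ENOMEM;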
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects the erroneous VLAN priority mask field that was
sent to npc_install_flow_req.
Fixes: 1d4d9e42c2 ("octeontx2-pf: Add tc flower hardware offload on ingress traffic")
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Supported link modes are updated by firmware in a shared structure per
interface. The kernel uses this value to display supported link modes
via ethtool.
Currently there is extra validation where the firmware-updated modes are
checked against an internal list of supported modes. As the internal
list is not updated frequently, new modes supported by firmware do not
show up in ethtool. Hence remove the extra validation and report all
firmware-updated modes.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Print a debug message if the reset of any of the RVU hardware blocks
fails.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per hardware, the base channel number configured for the programmable
channels of a block must be a multiple of the number of channels of that
block. This condition is currently not met for the SDP base channel.
Hence this patch ensures the base channel numbers of all blocks are
multiples of the number of channels present in the blocks. Also, instead
of hardcoding the SDP number of channels, the value is read from the
NIX_AF_CONST1 register.
Fixes: 242da43921 ("octeontx2-af: cn10k: Add support for programmable")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'bp_ena' in the Aura context is a NIX block index; setting it to zero
will always backpressure the NIX0 block, even if the NIXLF belongs to
NIX1. Hence fix this by setting it appropriately based on the NIX block
address.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit a955318fe6 ("stmmac: align RX buffers"),
which breaks at least one platform (Nvidia Jetson-X1), causing
packet corruption. This is 100% reproducible, and reverting
the patch results in a working system again.
Given that it is "only" a performance optimisation, let's
return to a known working configuration until we can have a
good understanding of what is happening here.
Fixes: a955318fe6 ("stmmac: align RX buffers")
Cc: Matteo Croce <mcroce@linux.microsoft.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Link: https://lore.kernel.org/netdev/871r71azjw.wl-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210820183002.457226-1-maz@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use pci_vpd_alloc() to dynamically allocate a properly sized buffer and
read the full VPD data into it.
This simplifies the code, and we no longer have to make assumptions about
VPD size.
Link: https://lore.kernel.org/r/bd3cd19c-b74f-9704-5786-476bf35ab5de@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Use pci_vpd_find_ro_info_keyword() to search for keywords in VPD to
simplify the code.
Replace netif_err() with pci_err() because the netdevice isn't registered
yet, which results in very ugly messages.
Use kmemdup_nul() instead of open-coding it.
Link: https://lore.kernel.org/r/bf5d4ba9-61a9-2bfe-19ec-75472732d74d@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Use pci_vpd_alloc() to dynamically allocate a properly sized buffer and
read the full VPD data into it.
This avoids having to allocate a buffer on the stack, and we don't have to
make any assumptions on VPD size and location of information in VPD.
Link: https://lore.kernel.org/r/e58f1e40-c043-0266-9a0f-e5a7f3f6883c@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
On new platforms, the NVM is read-only. Attempting to update the NVM
is causing a lockup to occur. Do not attempt to write to the NVM
on platforms where it's not supported.
Emit an error message when the NVM checksum is invalid.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213667
Fixes: fb776f5d57 ("e1000e: Add support for Tiger Lake")
Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Suggested-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
We should decode the latency and the max_latency before comparing them
directly. The latency is encoded as lat_enc = scale x value, decoded as:
lat_enc_d = (lat_enc & 0x3ff) x (1U << (5 * ((lat_enc & 0x1c00) >> 10)))
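A small sketch of that decode; this is illustrative arithmetic only and
does not use the driver's own macro names:
	static u32 ltr_decode(u16 lat_enc)
	{
		u16 value = lat_enc & 0x03ff;		/* bits 9:0 */
		u16 scale = (lat_enc & 0x1c00) >> 10;	/* bits 12:10 */

		return value * (1U << (5 * scale));
	}

	/* compare ltr_decode(lat_enc) against ltr_decode(max_ltr_enc) */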
Fixes: cf8fb73c23 ("e1000e: add support for LTR on I217/I218")
Suggested-by: Yee Li <seven.yi.lee@gmail.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use num_tx_queues rather than the IGC_MAX_TX_QUEUES fixed number 4 when
iterating over the tx_ring queues, since the instantiated queue count
can be less than 4 when the online CPU count is less than 4.
Fixes: ec50a9d437 ("igc: Add support for taprio offloading")
Signed-off-by: Toshiki Nishioka <toshiki.nishioka@intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After unplugging a Thunderbolt dock with an i225, a pciehp interrupt is
triggered and the remove call will read/write an MMIO address that is
already disconnected, causing a page fault and hanging the system.
Check PCI state to remove device safely.
Trace:
BUG: unable to handle page fault for address: 000000000000b604
Oops: 0000 [#1] SMP NOPTI
RIP: 0010:igc_rd32+0x1c/0x90 [igc]
Call Trace:
igc_ptp_suspend+0x6c/0xa0 [igc]
igc_ptp_stop+0x12/0x50 [igc]
igc_remove+0x7f/0x1c0 [igc]
pci_device_remove+0x3e/0xb0
__device_release_driver+0x181/0x240
Fixes: 13b5b7fd6a ("igc: Add support for Tx/Rx rings")
Fixes: b03c49cde6 ("igc: Save PTP time before a reset")
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This patch ensures that mcam flows are allocated
before adding or destroying the flows.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a mostly cosmetic patch that creates some helpers for accessing
the VLAN table. These helpers are also a bit more careful in that they
do not modify the ocelot->vlan_mask unless the hardware operation
succeeded.
Not all callers check the return value (the init code doesn't), but anyway.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to transmit more restrictions in future patches, convert this
one to netlink extack.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to reject some more configurations in future patches, convert
the existing one to netlink extack.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing ocelot device trees, like ocelot_pcb123.dts for example,
have SERDES ports (ports 4 and higher) that do not have status = "disabled";
but on the other hand do not have a phy-handle or a fixed-link either.
So from the perspective of phylink, they have broken DT bindings.
Since the blamed commit, probing for the entire switch will fail when
such a device tree binding is encountered on a port. There used to be
this piece of code which skipped ports without a phy-handle:
phy_node = of_parse_phandle(portnp, "phy-handle", 0);
if (!phy_node)
continue;
but now it is gone.
Anyway, fixed-link setups are a thing which should work out of the box
with phylink, so it would not be in the best interest of the driver to
add that check back.
Instead, let's look at what other drivers do. Since commit 86f8b1c01a
("net: dsa: Do not make user port errors fatal"), DSA continues after a
switch port fails to register, and works only with the ports that
succeeded.
We can achieve the same behavior in ocelot by unregistering the devlink
port for ports where ocelot_port_phylink_create() failed (called via
ocelot_probe_port), and clear the bit in devlink_ports_registered for
that port. This will make the next iteration reconsider the port that
failed to probe as an unused port, and re-register a devlink port of
type UNUSED for it. No other cleanup should need to be performed, since
ocelot_probe_port() should be self-contained when it fails.
Fixes: e6e12df625 ("net: mscc: ocelot: convert to phylink")
Reported-and-tested-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are cases where we would like to continue probing the switch even
if one port has failed to probe. When that happens, we need to
unregister a devlink_port of type DEVLINK_PORT_FLAVOUR_PHYSICAL and
re-register it of type DEVLINK_PORT_FLAVOUR_UNUSED.
This is fine, except when calling devlink_port_attrs_set on a structure
on which devlink_port_register has been previously called, there is a
WARN_ON in devlink_port_attrs_set that devlink_port->devlink must be
NULL.
So don't assume that the memory behind dlp is clean when calling
ocelot_port_devlink_init, just zero-initialize it.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when probing returns an error, the netdev is freed but
phylink_disconnect is not called.
Create a common function between the unbind path and the error path,
call it the opposite of dpaa2_switch_probe_port: dpaa2_switch_remove_port,
and call it from both the unbind and the error path.
Fixes: 84cba72956 ("dpaa2-switch: integrate the MAC endpoint support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is an ASSERT_RTNL in phylink_disconnect_phy which triggers
whenever dpaa2_switch_port_disconnect_mac is called.
To follow the pattern established by dpaa2_eth_disconnect_mac, take the
rtnl_mutex every time we call dpaa2_switch_port_disconnect_mac.
Fixes: 84cba72956 ("dpaa2-switch: integrate the MAC endpoint support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds frame DMA functionality to the Sparx5 platform.
Ethernet frames can be extracted or injected autonomously to or from the
device’s DDR3/DDR3L memory and/or PCIe memory space. Linked list data
structures in memory are used for injecting or extracting Ethernet frames.
The FDMA generates interrupts when frame extraction or injection is done
and when the linked lists need updating.
The FDMA implements two extraction channels, one per switch core port
towards the VCore CPU system and a total of six injection channels.
Extraction channels are mapped one-to-one to the CPU ports, while injection
channels can be individually assigned to any CPU port.
- FDMA channel 0 through 5 corresponds to CPU port 0 injection direction when
FDMA_CH_CFG[channel].CH_INJ_PORT is set to 0.
- FDMA channel 0 through 5 corresponds to CPU port 1 injection direction when
FDMA_CH_CFG[channel].CH_INJ_PORT is set to 1.
- FDMA channel 6 corresponds to CPU port 0 extraction direction.
- FDMA channel 7 corresponds to CPU port 1 extraction direction.
The FDMA implements a strict priority scheme among channels. Extraction
channels are prioritized over injection channels and secondarily channels
with higher channel number are prioritized over channels with lower number.
On the other hand, ports are being served on an equal-bandwidth principle
both on injection and extraction directions. The equal-bandwidth principle
will not force an equal bandwidth. Instead, it ensures that the ports
perform at their best considering the operating conditions.
When more than one injection channel is enabled for injection on the same
CPU port, priority determines which channel can inject data. Ownership
is re-arbitrated on frame boundaries.
The FDMA processes linked lists of DMA Control Block Structures (DCBs). The
DCBs have the same basic structure for both injection and extraction. A DCB
must be placed on a 64-bit word-aligned address in memory. Each DCB has a
per-channel configurable amount of associated data blocks in memory, where
the frame data is stored.
The data blocks that are used by extraction channels must be placed on
64-bit word aligned addresses in memory, and their length must be a
multiple of 128 bytes.
A DCB carries the pointer to the next DCB of the linked list, the INFO word
which holds information for the DCB, and a pair of status word and memory
pointer for every data block that it is associated with.
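An illustrative C layout of a DCB as described above; the field names and
the data block count are assumptions, not the actual sparx5 definitions:
	struct fdma_db {
		u64 dataptr;	/* 64-bit aligned pointer to frame data */
		u64 status;	/* block length, done/abort flags */
	};

	struct fdma_dcb {
		u64 nextptr;	/* next DCB in the linked list */
		u64 info;	/* per-DCB information word */
		struct fdma_db db[15];	/* per-channel configurable count */
	};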
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement eswitch API that allows updating rate groups. If group
pointer is NULL, then move the vport to internal unlimited group zero.
Implement devlink_ops->rate_parent_node_set() callback in the terms of
the new eswitch group update API.
Enable QoS for all group's elements if a group has allocated BW share.
Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Provide eswitch API to allow controlling group rate limits. Use it to
implement devlink_ops->mlx5_devlink_rate_node_tx_{share|max}_set().
The share rate will create relative bandwidth share on the groups level
while within the group the user can set shared rate on the member vports
of that group and this rate will be relative to the group's share rate.
The group with the highest shared rate will get a BW share of 100 and
the rest of the groups will get a value that reflects the ratio between
their share rate and the maximum share rate.
Example:
Created four rate groups with tx_share limits:
$ devlink port function rate add \
pci/0000:06:00.0/group_1 tx_share 30gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_2 tx_share 20gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_3 tx_share 20gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_4 tx_share 10gbit
Assuming link speed is 50 Gbit/sec ratio divider will be
50 / (30+20+20+10) = 0.625. Normalized rate values for the groups:
<group_1> 30 * 0.625 = 18.75 Gbit/sec
<group_2> 20 * 0.625 = 12.5 Gbit/sec
<group_3> 20 * 0.625 = 12.5 Gbit/sec
<group_4> 10 * 0.625 = 6.25 Gbit/sec
A rate group with an unlimited tx_share rate will receive the minimum BW
value (1 Mbit/sec) if any group with a tx_share rate limit is present.
This avoids dropping all of its packets in case of heavy traffic.
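The worked example above in integer form (Mbit/sec, values taken from the
commands; variable names are illustrative):
	unsigned int link = 50000;
	unsigned int total = 30000 + 20000 + 20000 + 10000;	/* 80000 */
	unsigned int g1 = 30000 * link / total;	/* 18750 -> 18.75 Gbit/sec */
	unsigned int g4 = 10000 * link / total;	/*  6250 ->  6.25 Gbit/sec */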
Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Extend eswitch API with rate limiting groups:
- Define new struct mlx5_esw_rate_group that is used to hold all
internal group data.
- Implement functions that allow creation, destruction and cleanup of
groups.
- Assign all vports to internal unlimited zero group by default.
This commit lays the groundwork for group rate limiting by implementing
devlink_ops->rate_node_{new|del}() callbacks to support creating and
deleting groups through devlink rate node objects. APIs that allows
setting rates and adding/removing members are implemented in following
patches.
Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Register devlink rate leaf object for every eswitch vport.
Implement devlink ops that enable setting shared and max tx rates
through devlink API.
Extract common eswitch code from existing tx rate set function that is
accessed through NDO to be reused for the devlink. Values configured
with NDO API are not visible for the devlink API, therefore shouldn't be
used simultaneously.
When normalizing the BW share value, dividing the desired minimum rate
by the common divider results in losing information since the quotient
is rounded down. This has a significant effect on configurations of low
rate, where the round down eliminates a large percentage of the total
rate. To improve the formula, round up the division result to make sure
that the BW share is at least the value it was supposed to be and won't
lose a significant amount of the expected value.
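A sketch of the rounding change with illustrative numbers:
	u32 divider = 1000;
	u32 min_rate = 1500;	/* low-rate request, in mbps */

	u32 share_down = min_rate / divider;		/* 1: loses a third */
	u32 share_up = DIV_ROUND_UP(min_rate, divider);	/* 2: at least the intended share */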
Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Move eswitch QoS related code into dedicated file. Provide eswitch API
to access this code meaning it is isolated and restricted to be used
only by eswitch.c. Exception is legacy NDO vf set rate, which moved to
esw/legacy.c.
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Currently the sample offload actions send the encapsulated packet
to software. This commit decapsulates the packet before performing
the sampling and sets the tunnel properties on the skb metadata
fields to make the behavior consistent with OVS sFlow.
Since we decapsulate first, we can't use the same match as before in the
default table. So instantiate a post action instance to continue
processing the action list. If HW can preserve reg_c, also use the
post action instance.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently the sample offload actions send the encapsulated packet
to software. sFlow expects tunneled packets to be decapsulated while
having the tunnel properties on the skb metadata fields.
Reuse the functions used by connection tracking to map the outer
header properties to a unique id. The next patch will use that id
to restore the tunnel information of decapsulated packets onto the
skb.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
CONFIG_NET_TC_SKB_EXT controls the SKB extension support for
restoring chain ids. SKB extension is not required for tunnel
restoration.
Remove the CONFIG_NET_TC_SKB_EXT dependency as a pre-step for
using the tunnel restore methods for sample offload use cases.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Move post action table management to common library providing
add/del/get API. Refactor the ct action offload to use the common
API.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Some tc actions are modeled in hardware using multiple tables
causing a tc action list split. For example, CT action is modeled
by jumping to a ct table which is controlled by nf flowtable.
sFlow jumps in hardware to a sample table, which continues to a
"default table" where it should continue processing the action list.
Multi table actions are modeled in hardware using a unique fte_id.
The fte_id is set before jumping to a table. Split actions continue
to a post-action table where the matched fte_id value continues the
execution of the tc action list.
Currently the post-action design is implemented only by the ct
action. Introduce post action infrastructure as a pre-step for
reusing it with the sFlow offload feature. Init and destroy the
common post action table. Refactor the ct offload to use the
common post table infrastructure in the next patch.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
IDR is deprecated. Use xarray instead.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently it is in eswitch attribute. Move it to flow attribute to
reflect the change in previous patch.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Module sample belongs to en/tc instead of esw. Move it and rename
accordingly.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5/esw/sample.c doesn't really need mlx5e_priv object, we can remove
this redundant dependency by passing the eswitch object directly to
the sample object constructor.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
The devlink dev info command reports version information about the
device and firmware running on the board. This includes the "board.id"
field which is supposed to represent an identifier of the board design.
The ice driver uses the Product Board Assembly identifier for this.
In some cases, the PBA is not present in the NVM. If this happens,
devlink dev info will fail with an error. Instead, modify the
ice_info_pba function to just exit without filling in the context
buffer. This will cause the board.id field to be skipped. Log a dev_dbg
message in case someone wants to confirm why board.id is not showing up
for them.
Fixes: e961b679fb ("ice: add board identifier info to devlink .info_get")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20210819223451.245613-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A recent change added error checking messages and failed to remove one
of the previous error checks. There are now two checks on variable err
so the second one is redundant dead code and can be removed.
Addresses-Coverity: ("Logically dead code")
Fixes: a83bdada06 ("octeontx2-af: Add debug messages for failures")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210818130927.33895-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently dpaa2_switch_takedown has a funny name and does not do the
opposite of dpaa2_switch_init, which makes probing fail when we need to
handle an -EPROBE_DEFER.
A sketch of what dpaa2_switch_init does:
dpsw_open
dpaa2_switch_detect_features
dpsw_reset
for (i = 0; i < ethsw->sw_attr.num_ifs; i++) {
dpsw_if_disable
dpsw_if_set_stp
dpsw_vlan_remove_if_untagged
dpsw_if_set_tci
dpsw_vlan_remove_if
}
dpsw_vlan_remove
alloc_ordered_workqueue
dpsw_fdb_remove
dpaa2_switch_ctrl_if_setup
When dpaa2_switch_takedown is called from the error path of
dpaa2_switch_probe(), the control interface, enabled by
dpaa2_switch_ctrl_if_setup from dpaa2_switch_init, remains enabled,
because dpaa2_switch_takedown does not call
dpaa2_switch_ctrl_if_teardown.
Since dpaa2_switch_probe might fail due to EPROBE_DEFER of a PHY, this
means that a second probe of the driver will happen with the control
interface directly enabled.
This will trigger a second error:
[ 93.273528] fsl_dpaa2_switch dpsw.0: dpsw_ctrl_if_set_pools() failed
[ 93.281966] fsl_dpaa2_switch dpsw.0: fsl_mc_driver_probe failed: -13
[ 93.288323] fsl_dpaa2_switch: probe of dpsw.0 failed with error -13
Which if we investigate the /dev/dpaa2_mc_console log, we find out is
caused by:
[E, ctrl_if_set_pools:2211, DPMNG] ctrl_if must be disabled
So make dpaa2_switch_takedown do the opposite of dpaa2_switch_init (in
reasonable limits, no reason to change STP state, re-add VLANs etc), and
rename it to something more conventional, like dpaa2_switch_teardown.
Fixes: 613c0a5810 ("staging: dpaa2-switch: enable the control interface")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20210819141755.1931423-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make changes to the MAC address dependent on the response of the PF.
Disallow changes to the HW MAC address and MAC filter from an untrusted
VF; thanks to that, ping is not lost if the VF tries to change the MAC.
Add a new field in iavf_mac_filter to indicate whether there was a
response from the PF for a given filter. Based on this field, pass or
discard the filter.
Previously, when an untrusted VF tried to change its address, the address
was not changed, but the filter was; because of that, ping couldn't go
through.
Fixes: c5c922b3e0 ("iavf: fix MAC address setting for VFs when filter is rejected")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Gurucharan G <Gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Without this patch, ATR does not work. Receive/transmit uses queue
selection based on SW DCB hashing method.
If traffic classes are not configured for PF, then use
netdev_pick_tx function for selecting queue for packet transmission.
Instead of calling i40e_swdcb_skb_tx_hash, call netdev_pick_tx,
which ensures that packet is transmitted/received from CPU that is
running the application.
Reproduction steps:
1. Load i40e driver
2. Map each MSI interrupt of i40e port for each CPU
3. Disable ntuple, enable ATR i.e.:
ethtool -K $interface ntuple off
ethtool --set-priv-flags $interface flow-director-atr
4. Run application that is generating traffic and is bound to a
single CPU, i.e.:
taskset -c 9 netperf -H 1.1.1.1 -t TCP_RR -l 10
5. Observe behavior:
Application's traffic should be restricted to the CPU provided in
taskset.
Fixes: 89ec1f0886 ("i40e: Fix queue-to-TC mapping on Tx")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Don't populate the array stpa on the stack but instead make it
static const. This makes the object code smaller by 81 bytes:
Before:
text data bss dec hex filename
54993 17248 0 72241 11a31 ./drivers/net/ethernet/ti/cpsw_new.o
After:
text data bss dec hex filename
54784 17376 0 72160 119e0 ./drivers/net/ethernet/ti/cpsw_new.o
(gcc version 10.3.0)
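A generic illustration of the change (made-up values, not the cpsw array):
	/* before: built on the stack on every call */
	int sum_before(void)
	{
		const unsigned char stpa[] = { 1, 2, 3, 4 };
		int i, s = 0;

		for (i = 0; i < 4; i++)
			s += stpa[i];
		return s;
	}

	/* after: emitted once into .rodata */
	static const unsigned char stpa_ro[] = { 1, 2, 3, 4 };

	int sum_after(void)
	{
		int i, s = 0;

		for (i = 0; i < 4; i++)
			s += stpa_ro[i];
		return s;
	}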
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't populate the array spec_opcode on the stack but instead make it
static const. This makes the object code smaller by 158 bytes:
Before:
text data bss dec hex filename
12271 3976 128 16375 3ff7 .../hisilicon/hns3/hns3pf/hclge_cmd.o
After:
text data bss dec hex filename
12017 4072 128 16217 3f59 .../hisilicon/hns3/hns3pf/hclge_cmd.o
(gcc version 10.3.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't populate the array speeds on the stack but instead make it
static const. This makes the object code smaller by 17 bytes:
Before:
text data bss dec hex filename
39987 14200 64 54251 d3eb .../huawei/hinic/hinic_sriov.o
After:
text data bss dec hex filename
39906 14264 64 54234 d3da .../huawei/hinic/hinic_sriov.o
(gcc version 10.3.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mii_ethtool_gset() does not return any errors, so error handling can be
omitted to make code more simple.
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The register for retrieving TX counters is present only on R-Car Gen3
and RZ/G2L; it is not present on R-Car Gen2.
Add the tx_counters hw feature bit to struct ravb_hw_info, to enable this
feature specifically for R-Car Gen3 now and later extend it to RZ/G2L.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
R-Car Gen3 supports TX and RX clock internal delay modes, whereas R-Car
Gen2 and RZ/G2L do not support it.
Add an internal_delay hw feature bit to struct ravb_hw_info to enable this
only for R-Car Gen3.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On R-Car the checksum calculation on RX frames is done by the E-MAC
module, whereas on RZ/G2L it is done by the TOE.
The TOE calculates the checksum of frames received from the E-MAC and
outputs it to the DMAC. The TOE also calculates the checksum of
transmission frames from the DMAC and outputs it to the E-MAC.
Add net_features and net_hw_features to struct ravb_hw_info, to support
subsequent SoCs without any code changes in the ravb_probe function.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device stats strings for R-Car and RZ/G2L are different.
R-Car provides 30 device stats, whereas RZ/G2L provides only 15. In
addition, RZ/G2L has stats "rx_queue_0_csum_offload_errors" instead of
"rx_queue_0_missed_errors".
Add structure variables gstrings_stats and gstrings_size to struct
ravb_hw_info, so that subsequent SoCs can be added without any code
changes in the ravb_get_strings function.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
R-Car provides 30 device stats, whereas RZ/G2L provides only 15. In
addition, RZ/G2L has stats "rx_queue_0_csum_offload_errors" instead of
"rx_queue_0_missed_errors".
Replace RAVB_STATS_LEN macro with a structure variable stats_len to
struct ravb_hw_info, to support subsequent SoCs without any code changes
to the ravb_get_sset_count function.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The maximum descriptor size that can be specified on the reception side for
R-Car is 2048 bytes, whereas for RZ/G2L it is 8096.
Add the max_rx_len variable to struct ravb_hw_info for allocating different
RX skb buffer sizes for R-Car and RZ/G2L using the netdev_alloc_skb
function.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
R-Car Gen2 needs a 4byte aligned address for the transmission buffer,
whereas R-Car Gen3 doesn't have any such restriction.
Add aligned_tx to struct ravb_hw_info to let the driver choose between
aligned and unaligned TX buffers.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMAC and EMAC blocks of Gigabit Ethernet IP found on RZ/G2L SoC are
similar to the R-Car Ethernet AVB IP. With a few changes in the driver we
can support both IPs.
This patch adds the struct ravb_hw_info to hold hw features, driver data
and function pointers to support both the IPs. It also replaces the driver
data chip type with struct ravb_hw_info by moving chip type to it.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of TX descriptors per packet is an unsigned value and
the variable for holding this information should be unsigned.
This patch replaces the data type of num_tx_desc variable in struct
ravb_private from 'int' to 'unsigned int'.
This patch also updates the data type of local variables to unsigned int,
where the local variables are evaluated using num_tx_desc.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we are unable to ping a bridge on top of a felix switch which
uses the ocelot-8021q tagger. The packets are dropped on the ingress of
the user port and the 'drop_local' counter increments (the counter which
denotes drops due to no valid destinations).
Dumping the PGID tables, it becomes clear that the PGID_SRC of the user
port is zero, so it has no valid destinations.
But looking at the code, the cpu_fwd_mask (the bit mask of DSA tag_8021q
ports) is clearly missing from the forwarding mask of ports that are
under a bridge. So this has always been broken.
Looking at the version history of the patch, in v7
https://patchwork.kernel.org/project/netdevbpf/patch/20210125220333.1004365-12-olteanv@gmail.com/
the code looked like this:
/* Standalone ports forward only to DSA tag_8021q CPU ports */
unsigned long mask = cpu_fwd_mask;
(...)
} else if (ocelot->bridge_fwd_mask & BIT(port)) {
mask |= ocelot->bridge_fwd_mask & ~BIT(port);
while in v8 (the merged version)
https://patchwork.kernel.org/project/netdevbpf/patch/20210129010009.3959398-12-olteanv@gmail.com/
it looked like this:
unsigned long mask;
(...)
} else if (ocelot->bridge_fwd_mask & BIT(port)) {
mask = ocelot->bridge_fwd_mask & ~BIT(port);
So the breakage was introduced between v7 and v8 of the patch.
Fixes: e21268efbe ("net: dsa: felix: perform switch setup for tag_8021q")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210817160425.3702809-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change the type of probe argument in functions which implement reset
methods from int to bool to make the context and intent clear.
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20210817180500.1253-10-ameynarkhede03@gmail.com
Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The ARRAY_SIZE macro is defined to get an array's size which is
more compact and more formal in linux source. Thus, we can replace
the long sizeof(arr)/sizeof(arr[0]) with the compact ARRAY_SIZE.
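For illustration (ARRAY_SIZE comes from the kernel headers):
	static const int levels[] = { 1, 2, 4, 8 };

	size_t n_old = sizeof(levels) / sizeof(levels[0]);	/* before */
	size_t n_new = ARRAY_SIZE(levels);			/* after  */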
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210817121106.44189-1-wangborong@cdjrlc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
VLAN TCI is a 16-bit field which includes Priority (3 bits), CFI (1 bit)
and VID (12 bits). Currently ntuple filters support
installing rules to steer packets based on VID only.
This patch extends that support such that filters can
be installed for entire VLAN TCI.
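A sketch of the TCI layout being matched (masks per IEEE 802.1Q; the
variable names are illustrative):
	u16 tci = ntohs(vlan_tci);
	u16 vid = tci & 0x0fff;		/* bits 11:0 */
	u8 cfi = (tci >> 12) & 0x1;	/* bit 12 */
	u8 prio = (tci >> 13) & 0x7;	/* bits 15:13 */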
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In ixgbe_xsk_pool_enable(), if ixgbe_xsk_wakeup() fails, we should
restore the previous state and clean up the resources. Add the missing
clearing of af_xdp_zc_qps and the missing DMA unmap to fix this bug.
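A hedged sketch of the unwind described above; the exact context in
ixgbe_xsk_pool_enable() may differ:
	err = ixgbe_xsk_wakeup(adapter->netdev, qid, XDP_WAKEUP_RX);
	if (err) {
		clear_bit(qid, adapter->af_xdp_zc_qps);
		xsk_pool_dma_unmap(pool, IXGBE_RX_DMA_ATTR);
		return err;
	}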
Fixes: d49e286d35 ("ixgbe: add tracking of AF_XDP zero-copy state for each queue pair")
Fixes: 4a9b32f30f ("ixgbe: fix potential RX buffer starvation for AF_XDP")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20210817203736.3529939-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
"reset_fn" indicates whether the device supports any reset mechanism.
Remove the use of reset_fn in favor of the reset_methods array that tracks
supported reset mechanisms of a device and their ordering.
The octeon driver incorrectly used reset_fn to detect whether the device
supports FLR or not. Use pcie_reset_flr() to probe whether it supports FLR.
Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20210817180500.1253-5-ameynarkhede03@gmail.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
There is a spelling mistake in a dev_info message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
As follow-up to the discussion with Jakub Kicinski about iavf locking
being insufficient [1] convert iavf to use mutexes instead of bitops.
The locking logic is kept as is, just a drop-in replacement of
enum iavf_critical_section_t with separate mutexes.
The only difference is that the mutexes will be destroyed before the
module is unloaded.
[1] https://lwn.net/ml/netdev/20210316150210.00007249%40intel.com/
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
qlcnic_83xx_unlock_flash() is called on all paths after we call
qlcnic_83xx_lock_flash(), except for one error path on failure
of QLCRD32(), which may cause a deadlock. This bug is suggested
by a static analysis tool, please advise.
Fixes: 81d0aeb0a4 ("qlcnic: flash template based firmware reset recovery")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Link: https://lore.kernel.org/r/20210816131405.24024-1-dinghao.liu@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On CN10K, the higher bits in the channel number represent the CPT
channel number. Mask out these higher bits in the NPC configuration
to allow packets from CPT to be parsed.
Signed-off-by: Vidya <vvelumuri@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The way SW can identify the number of NPC counters supported by the
silicon has changed for CN10K. This patch addresses this by reading the
appropriate registers to find out the number of counters available.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the MCAM entry allocation request is from a PF and NOT a priority
allocation request, then allocate low priority entries so that PF
entries always have lower priority than those of its VFs. This is
required so that entries with (base) MCAM match criteria have lower
priority compared to entries with (base + additional) match criteria.
This patch considers only the best case scenario, where PF entries are
allocated from the low priority zone if the low priority zone has free
space.
There are worst case scenarios like:
1. VFs allocating hundreds of MCAM entries leading to VFs
using all mid priority zone and low priority zone entries
hence no entries free from low priority zone for PF.
2. All the PFs and VFs in the system allocating and freeing
entries causing fragmentation in MCAM space and all the
entries requested by PF could not fit in low priority
zone for allocation.
This patch does not handle these worst case scenarios.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for setting or modifying MCAM entry count at
runtime via devlink params.
commands:
devlink dev param show
pci/0002:02:00.0:
name mcam_count type driver-specific
values:
cmode runtime value 16
devlink dev param set pci/0002:02:00.0 name mcam_count
value 64 cmode runtime
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variables used for TC flow management, like the maximum number
of flows, number of flows installed, etc., are a copy of the ntuple
flow management variables. Since TC and NTUPLE are not
supported at the same time, it's better to unify these into
common variables.
This patch addresses this unification and also does some other
minor TC-related cleanup.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Per single mailbox request a maximum of 256 MCAM entries
can be allocated. If more than 256 are being allocated, then
the mcam indices in the final list could get jumbled. Hence
sort the indices.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
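A userspace illustration of the sorting step: indices gathered from two
hypothetical 256-entry mailbox batches are merged and sorted. Kernel code
would use sort() from linux/sort.h; qsort() is used here only so the sketch
is self-contained, and the index values are made up.

#include <stdio.h>
#include <stdlib.h>

static int cmp_u16(const void *a, const void *b)
{
	return *(const unsigned short *)a - *(const unsigned short *)b;
}

int main(void)
{
	/* indices as returned by two made-up mailbox batches, out of order */
	unsigned short idx[] = { 300, 301, 302, 10, 11, 12 };
	size_t i, n = sizeof(idx) / sizeof(idx[0]);

	qsort(idx, n, sizeof(idx[0]), cmp_u16);

	for (i = 0; i < n; i++)
		printf("%u ", (unsigned int)idx[i]);
	printf("\n");
	return 0;
}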
Add packet flow classification support for both LMAC mapped virtual
functions and loopback VFs. This patch adds support for the ntuple
offload feature.
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enabled NETIF_F_RXALL support for VF driver.
Also removed MTU range comments which are no longer valid.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added debug messages for various failures during probe.
This will help in quickly identifying the API where the failure
is happening.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add appropriate error codes to be used when returning from AF
mailbox handlers due to some error condition.
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When installing a flow using the npc_install_flow
mailbox there are a number of reasons to reject
the request, such as the caller not being permitted,
an invalid channel specified in the request, the flow
not being supported by the extraction profile and so on.
Hence define new error codes for NPC flows and use
them instead of generic error codes.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-updates-2021-08-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-08-16
The following patchset provides two separate mlx5 updates
1) Ethtool RSS context and MQPRIO channel mode support:
1.1) enable mlx5e netdev driver to allow creating Transport Interface RX
(TIRs) objects on the fly to be used for ethtool RSS contexts and
TX MQPRIO channel mode
1.2) Introduce mlx5e_rss object to manage such TIRs.
1.3) Ethtool support for RSS context
1.4) Support MQPRIO channel mode
2) Bridge offloads Lag support:
to allow adding bond net devices to mlx5 bridge
2.1) Address bridge port by (vport_num, esw_owner_vhca_id) pair
since vport_num is only unique per eswitch and in lag mode we
need to manage ports from both eswitches.
2.2) Allow connectivity between representors of different eswitch
instances that are attached to same bridge
2.3) Bridge LAG: require representors to be in shared FDB mode and
introduce local and peer port representors,
match on paired eswitch metadata in peer FDB entries,
and finally support addition/deletion and aging of peer flows.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow adding bond net devices to mlx5 bridge with the following changes:
- Modify bridge representor code to obtain the uplink representor that belongs
to eswitch that is registered for notification. Require representor to be
in shared FDB mode. If representor is the lag master, then consider its
port as local, otherwise treat it as peer.
- Use devcom to match on paired eswitch metadata in peer FDB entries. This
is necessary for shared FDB LAG to function since packets are always
received on active eswitch instance as opposed to parent eswitch of port.
- Support for deleting peer flows when receiving
SWITCHDEV_FDB_DEL_TO_BRIDGE notification was implemented in one of previous
patches in series. Now also implement support for handling
SWITCHDEV_FDB_ADD_TO_BRIDGE which can be generated on peer by bridge update
workqueue task in LAG configuration. Refresh the flow 'lastuse' timestamp
to current jiffies when receiving such notification on eswitch that manages
the local FDB entry. This allows peer entries to prevent ageing of the FDB.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Allow connectivity between representors of different eswitch instances that
are attached to same bridge when merged_eswitch capability is enabled. Add
ports of peer eswitch to bridge instance and mark them with
MLX5_ESW_BRIDGE_PORT_FLAG_PEER. Mark FDBs offloaded on peer ports with
MLX5_ESW_BRIDGE_FLAG_PEER flag. Such FDBs can only be aged out on their
local eswitch instance, which then sends SWITCHDEV_FDB_DEL_TO_BRIDGE event.
Listen to the event on mlx5 bridge implementation and delete peer FDBs in
event handler.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
SWITCHDEV_FDB_DEL_TO_BRIDGE notification is generated in multiple places in
bridge code. Following patch in series changes the condition for the
notification. Extract the notification into dedicated helper function
mlx5_esw_bridge_fdb_del_notify() to only modify it in single place in the
future changes.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Following patches in series allow traffic between vports of different
eswitch instances, which requires addressing bridge port by
vport_num+esw_owner_vhca_id pair since vport_num is only unique
per-eswitch. As a preparation, extend struct mlx5_esw_bridge_port with
'esw_owner_vhca_id' field and use it as part of key for
mlx5_esw_bridge->vports xarray.
With this change we can't rely on switchdev_handle_port_obj_add() helper to
get mlx5 representor from stacked device because we need specifically
representor from parent eswitch that registered the callback to obtain
correct esw_owner_vhca_id. The helper doesn't allow passing additional
parameters to predicate function and doesn't provide access to the notifier
block to obtain eswitch through br_offloads. Implement custom helpers to
obtain mlx5 representor and use them in
mlx5_esw_bridge_port_obj_{add|del|attr_set}() implementations.
Remove direct pointer to parent bridge from struct mlx5_vport as it is no
longer needed.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Following patches in series will pass bond device to bridge, which means
the code can't assume the device is mlx5 representor. Moreover, the core
device can be easily obtained from eswitch instance, so there is no reason
for more complex code that obtains struct mlx5_priv from net_device in
order to use its mdev. Refactor the code to use esw->dev instead of
priv->mdev.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Refactor mlx5_esw_bridge_vport_link() to release the bridge instance if
mlx5_esw_bridge_vport_init() returned an error instead of relying on it to
release the bridge. This improves the design because object instance is
taken and released in same layer and simplifies following patches that add
more logic to mlx5_esw_bridge_vport_link().
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add support for MQPRIO channel mode, in which a partition into TCs
is defined over the channels. We allow partitions with contiguous
queue indices, with no holes within. We do not allow modifying
the number of channels while this MQPRIO mode is active.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This is in preparation for supporting MQPRIO CHANNEL mode in a
downstream patch, in addition to the DCB mode that's supported today.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Abstract the MQPRIO params into a struct.
Use a getter for DCB mode num_tcs.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Extend the existing flow classification support, to steer
flows not only directly to a receive ring, but also into
the new RSS contexts.
Create needed TIR objects on demand, and hold reference
on the RSS context.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add support to multiple RSS contexts. Resources of the non-default
RSS contexts are allocated and created on demand. Each RSS context
can be controlled and configured separately, via the implemented
ethtool ops. Here we limit the total number of contexts to 16.
We do not enforce any kind of new limitation over the indirection table
content. More specifically, two separate contexts can be configured to
fully or partially point to the same set of receive rings.
The default RSS context (index 0) is created with its full set of TIRs.
All other contexts are created with an empty set, then TIRs are added
upon first usage when steering rules are added.
We use a reference counting mechanism to make sure an RSS context is
not removed before the rules pointing to it.
Block ethtool set_channels operations when multiple RSS contexts exist,
as currently the kernel doesn't protect against inconsistent channels
configs that break non-default RSS contexts.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
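A tiny sketch of the reference-counting rule described above: a context
cannot be removed while rules still point at it. The structure and names are
illustrative, not the mlx5e implementation.

#include <stdio.h>

struct rss_ctx_sketch {
	int refcount;	/* number of steering rules pointing at this context */
};

static int rss_ctx_remove(struct rss_ctx_sketch *c)
{
	if (c->refcount > 0)
		return -16;	/* -EBUSY: rules still reference the context */
	printf("context removed\n");
	return 0;
}

int main(void)
{
	struct rss_ctx_sketch ctx = { .refcount = 1 };

	if (rss_ctx_remove(&ctx))
		printf("busy, remove the rule first\n");
	ctx.refcount--;		/* rule deleted, reference dropped */
	return rss_ctx_remove(&ctx);
}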
Move from static to dynamic memory allocations for TIR.
This is in preparation for supporting on-demand TIR operations in
downstream patches, where every RSS context will be initialized with an
empty set of TIRs.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Code related to RSS is now encapsulated into a dedicated object and put
into new files en/rss.{c,h}. All usages are converted.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Bring all fields that define and maintain RSS behavior together
into a new structure.
Align all usages with this new structure. Keep it hidden within
rx_res.c.
This helps supporting multiple RSS contexts in downstream patch.
Use dynamic allocations for the RSS context.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Take TIR control operations in rx_res into functions.
This is in preparation for supporting on-demand TIR operations in
downstream patches.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
All calls to mlx5e_rx_res_rss_set_indir_uniform() occur while the RSS
state is inactive, i.e. the RQT is pointing to the drop RQ, not to the
channels' RQs.
It means that the "apply" part of the function is not called.
Remove this part from the function, and document the change. It will be
useful for the next patches in the series, as it allows code
simplifications when multiple RSS contexts are introduced.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In order to know the reason for a link up failure, add support for the
ethtool extended link state. The driver reads the link status code from
firmware when in the link down state and converts it to the ethtool
extended link state.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new file hns3_ethtool.h, and move struct type definitions from
hns3_ethtool.c to hns3_ethtool.h.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add ethtool per-queue statistics support to show the number of interrupts
generated at DMA tx and DMA rx. All the counters are incremented in the
dwmac4_dma_interrupt() function.
Signed-off-by: Vijayakannan Ayyathurai <vijayakannan.ayyathurai@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a generic ethtool per-queue statistics framework to display the
statistics for each rx/tx queue. In the future, users can use it to add
more per-queue counters. The number of rx/tx queues displayed depends
on the available rx/tx queues in that particular MAC configuration and
is limited to the MTL_MAX_{RX|TX}_QUEUES defined in the driver.
The ethtool per-queue statistics display will look like the below when
users start adding more counters.
Example:
q0_tx_statA:
q0_tx_statB:
q0_tx_statC:
|
q0_tx_statX:
.
.
.
qMAX_tx_statA:
qMAX_tx_statB:
qMAX_tx_statC:
|
qMAX_tx_statX:
q0_rx_statA:
q0_rx_statB:
q0_rx_statC:
|
q0_rx_statX:
.
.
.
qMAX_rx_statA:
qMAX_rx_statB:
qMAX_rx_statC:
|
qMAX_rx_statX:
In addition, this patch adds support for displaying the number of
packets received and transmitted per queue.
Signed-off-by: Vijayakannan Ayyathurai <vijayakannan.ayyathurai@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
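A small userspace sketch of how such per-queue stat strings could be
generated; the queue counts and the "pkt_n" suffix are placeholders, not the
actual stmmac names.

#include <stdio.h>

#define RX_QUEUES 2
#define TX_QUEUES 2

int main(void)
{
	char name[32];
	int q;

	for (q = 0; q < TX_QUEUES; q++) {
		snprintf(name, sizeof(name), "q%d_tx_pkt_n", q);
		printf("%s\n", name);
	}
	for (q = 0; q < RX_QUEUES; q++) {
		snprintf(name, sizeof(name), "q%d_rx_pkt_n", q);
		printf("%s\n", name);
	}
	return 0;
}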
The DMA channel status "Transmit buffer unavailable (TBU)" bit is not
considered a successful DMA tx. Hence, it should not affect
the irq count statistics.
Fixes: 1103d3a553 ("net: stmmac: dwmac4: Also use TBU interrupt to clean TX path")
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Vijayakannan Ayyathurai <vijayakannan.ayyathurai@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each completion ring entry has a valid bit to indicate that the entry
contains a valid completion event. The driver's main poll loop
__bnxt_poll_work() has the proper dma_rmb() to make sure the valid
bit of the next entry has been checked before proceeding further.
But when we call bnxt_rx_pkt() to process the RX event, the RX
completion event consists of two completion entries and only the
first entry has been checked to be valid. We need the same barrier
after checking the next completion entry. Add missing dma_rmb()
barriers in bnxt_rx_pkt() and other similar locations.
Fixes: 67a95e2022 ("bnxt_en: Need memory barrier when processing the completion ring.")
Reported-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Lance Richardson <lance.richardson@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
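A compile-only sketch of the ordering requirement: both valid bits must be
observed before any other completion fields are read. dma_rmb() is stubbed
with a generic barrier here purely so the example builds; the real driver
uses the kernel's dma_rmb(), and the structure layout is invented.

#include <stdio.h>

#define dma_rmb() __sync_synchronize()	/* stand-in for the kernel barrier */

struct cmpl_sketch {
	unsigned int valid;
	unsigned int data;
};

static void rx_pkt_sketch(volatile struct cmpl_sketch *c1,
			  volatile struct cmpl_sketch *c2)
{
	if (!c1->valid || !c2->valid)
		return;
	/* both entries are valid; order the valid-bit reads before the payload reads */
	dma_rmb();
	printf("data %u %u\n", c1->data, c2->data);
}

int main(void)
{
	struct cmpl_sketch a = { 1, 42 }, b = { 1, 43 };

	rx_pkt_sketch(&a, &b);
	return 0;
}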
212 firmware broke aRFS, so disable it. Traffic may stop after ntuple
filters are inserted and deleted by the 212 firmware.
Fixes: ae10ae740a ("bnxt_en: Add new hardware RFS mode.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a possible null-pointer dereference in qed_rdma_create_qp().
Changes from V2:
- Revert checkpatch fixes.
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid a qed ll2 race condition and NULL pointer dereference as part
of the remove and recovery flows.
Changes from V1:
- Change (!p_rx->set_prod_addr).
- qed_ll2.c checkpatch fixes.
Change from V2:
- Revert "qed_ll2.c checkpatch fixes".
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the function to reflect what it's doing. Also add a description
of the register values as kindly provided by Realtek.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The felix DSA driver, which is a wrapper over the same hardware class as
ocelot, is integrated with phylink, but ocelot is using the plain PHY
library. It makes sense to bring together the two implementations, which
is what this patch achieves.
This is a large patch and hard to break up, but it does the following:
The existing ocelot_adjust_link writes some registers, and
felix_phylink_mac_link_up writes some registers, some of them are
common, but both functions write to some registers to which the other
doesn't.
The main reasons for this are:
- Felix switches so far have used an NXP PCS so they had no need to
write the PCS1G registers that ocelot_adjust_link writes
- Felix switches have the MAC fixed at 1G, so some of the MAC speed
changes actually break the link and must be avoided.
The naming conventions for the functions introduced in this patch are:
- vsc7514_phylink_{mac_config,validate} are specific to the Ocelot
instantiations and placed in ocelot_net.c which is built only for the
ocelot switchdev driver.
- ocelot_phylink_mac_link_{up,down} are shared between the ocelot
switchdev driver and the felix DSA driver (they are put in the common
lib).
One by one, the registers written by ocelot_adjust_link are:
DEV_MAC_MODE_CFG - felix_phylink_mac_link_up had no need to write this
register since its out-of-reset value was fine and
did not need changing. The write is moved to the
common ocelot_phylink_mac_link_up and on felix it is
guarded by a quirk bit that makes the written value
identical with the out-of-reset one
DEV_PORT_MISC - runtime invariant, was moved to vsc7514_phylink_mac_config
PCS1G_MODE_CFG - same as above
PCS1G_SD_CFG - same as above
PCS1G_CFG - same as above
PCS1G_ANEG_CFG - same as above
PCS1G_LB_CFG - same as above
DEV_MAC_ENA_CFG - both ocelot_adjust_link and ocelot_port_disable
touched this. felix_phylink_mac_link_{up,down} also
do. We go with what felix does and put it in
ocelot_phylink_mac_link_up.
DEV_CLOCK_CFG - ocelot_adjust_link and felix_phylink_mac_link_up both
write this, but to different values. Move to the common
ocelot_phylink_mac_link_up and make sure via the quirk
that the old values are preserved for both.
ANA_PFC_PFC_CFG - ocelot_adjust_link wrote this, felix_phylink_mac_link_up
did not. Runtime invariant, speed does not matter since
PFC is disabled via the RX_PFC_ENA bits which are cleared.
Move to vsc7514_phylink_mac_config.
QSYS_SWITCH_PORT_MODE_PORT_ENA - both ocelot_adjust_link and
felix_phylink_mac_link_{up,down} wrote
this. Ocelot also wrote this register
from ocelot_port_disable. Keep what
felix did, move in ocelot_phylink_mac_link_{up,down}
and delete ocelot_port_disable.
ANA_POL_FLOWC - same as above
SYS_MAC_FC_CFG - same as above, except slight behavior change. Whereas
ocelot always enabled RX and TX flow control, felix
listened to phylink (for the most part, at least - see
the 2500base-X comment).
The registers which only felix_phylink_mac_link_up wrote are:
SYS_PAUSE_CFG_PAUSE_ENA - this is why I am not sure that flow control
worked on ocelot. Now it should, since the
code is shared with felix where it does.
ANA_PORT_PORT_CFG - this is a Frame Analyzer block register, phylink
should be the one touching them, deleted.
Other changes:
- The old phylib registration code was in mscc_ocelot_init_ports. It is
hard to work with 2 levels of indentation already in, and with hard to
follow teardown logic. The new phylink registration code was moved
inside ocelot_probe_port(), right between alloc_etherdev() and
register_netdev(). It could not be done before (=> outside of)
ocelot_probe_port() because ocelot_probe_port() allocates the struct
ocelot_port which we then use to assign ocelot_port->phy_mode to. It
is more preferable to me to have all PHY handling logic inside the
same function.
- On the same topic: struct ocelot_port_private :: serdes is only used
in ocelot_port_open to set the SERDES protocol to Ethernet. This is
logically a runtime invariant and can be done just once, when the port
registers with phylink. We therefore don't even need to keep the
serdes reference inside struct ocelot_port_private, or to use the devm
variant of of_phy_get().
- Phylink needs a valid phy-mode for phylink_create() to succeed, and
the existing device tree bindings in arch/mips/boot/dts/mscc/ocelot_pcb120.dts
don't define one for the internal PHY ports. So we patch
PHY_INTERFACE_MODE_NA into PHY_INTERFACE_MODE_INTERNAL.
- There was a strategically placed:
switch (priv->phy_mode) {
case PHY_INTERFACE_MODE_NA:
continue;
which made the code skip the serdes initialization for the internal
PHY ports. Frankly that is not all that obvious, so now we explicitly
initialize the serdes under an "if" condition and not rely on code
jumps, so everything is clearer.
- There was a write of OCELOT_SPEED_1000 to DEV_CLOCK_CFG for QSGMII
ports. Since that is in fact the default value for the register field
DEV_CLOCK_CFG_LINK_SPEED, I can only guess the intention was to clear
the adjacent fields, MAC_TX_RST and MAC_RX_RST, aka take the port out
of reset, which does match the comment. I don't even want to know why
this code is placed there, but if there is indeed an issue that all
ports that share a QSGMII lane must all be up, then this logic is
already buggy, since mscc_ocelot_init_ports iterates using
for_each_available_child_of_node, so nobody prevents the user from
putting a 'status = "disabled";' for some QSGMII ports which would
break the driver's assumption.
In any case, in the eventuality that I'm right, we would have yet
another issue if ocelot_phylink_mac_link_down would reset those ports
and that would be forbidden, so since the ocelot_adjust_link logic did
not do that (maybe for a reason), add another quirk to preserve the
old logic.
The ocelot driver teardown goes through all ports in one fell swoop.
When initialization of one port fails, the ocelot->ports[port] pointer
for that is reset to NULL, and teardown is done only for non-NULL ports,
so there is no reason to do partial teardowns, let the central
mscc_ocelot_release_ports() do its job.
Tested bind, unbind, rebind, link up, link down, speed change on mock-up
hardware (modified the driver to probe on Felix VSC9959). Also
regression tested the felix DSA driver. Could not test the Ocelot
specific bits (PCS1G, SERDES, device tree bindings).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ocelot_port_enable touches ANA_PORT_PORT_CFG, which has the following
fields:
- LOCKED_PORTMOVE_CPU, LEARNDROP, LEARNCPU, LEARNAUTO, RECV_ENA, all of
which are written with their hardware default values, also runtime
invariants. So it makes no sense to write these during every .ndo_open.
- PORTID_VAL: this field has an out-of-reset value of zero for all ports
and must be initialized by software. Additionally, the
ocelot_setup_logical_port_ids() code path sets up different logical
port IDs for the ports in a hardware LAG, and we absolutely don't want
.ndo_open to interfere there and reset those values.
So in fact the write from ocelot_port_enable can better be moved to
ocelot_init_port, and the .ndo_open hook deleted.
ocelot_port_disable touches DEV_MAC_ENA_CFG and QSYS_SWITCH_PORT_MODE_PORT_ENA,
in an attempt to undo what ocelot_adjust_link did. But since .ndo_stop
does not get called each time the link falls (i.e. this isn't a
substitute for .phylink_mac_link_down), felix already does better at
this by writing those registers already in felix_phylink_mac_link_down.
So keep ocelot_port_disable (for now, until ocelot is converted to
phylink too), and just delete the felix call to it, which is not
necessary.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The devlink pointer always exists after hclge_devlink_init() succeeds.
Remove that check together with the NULL setting after release and ensure
that devlink_register() is the last command prior to the call to
devlink_reload_enable().
Fixes: b741269b27 ("net: hns3: add support for registering devlink for PF")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'imply' keyword does not do what most people think it does, it only
politely asks Kconfig to turn on another symbol, but does not prevent
it from being disabled manually or built as a loadable module when
its user is built-in. In the ICE driver, the latter now causes a link failure:
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl':
ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config'
ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config'
aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config'
ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config'
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset':
ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release'
ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release'
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild':
This is a recurring problem in many drivers, and we have discussed
it several times before, without reaching a consensus. I'm providing
a link to the previous email thread for reference, which discusses
some related problems.
To solve the dependency issue better than the 'imply' keyword, introduce a
separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver
can depend on if it is able to use PTP support when available, but works
fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are
then prevented from being built-in, the same way as with a 'depends on
PTP_1588_CLOCK || !PTP_1588_CLOCK' dependency that does the same trick,
but that can be rather confusing when you first see it.
Since this should cover the dependencies correctly, the IS_REACHABLE()
hack in the header is no longer needed now, and can be turned back
into a normal IS_ENABLED() check. Any driver that gets the dependency
wrong will now cause a link time failure rather than being unable to use
PTP support when that is in a loadable module.
However, the two recently added ptp_get_vclocks_index() and
ptp_convert_timestamp() interfaces are only called from builtin code with
ethtool and socket timestamps, so keep the current behavior by stubbing
those out completely when PTP is in a loadable module. This should be
addressed properly in a follow-up.
As Richard suggested, we may want to actually turn PTP support into a
'bool' option later on, preventing it from being a loadable module
altogether, which would be one way to solve the problem with the ethtool
interface.
Fixes: 06c16d89d2 ("ice: register 1588 PTP clock device object for E810 devices")
Link: https://lore.kernel.org/netdev/20210804121318.337276-1-arnd@kernel.org/
Link: https://lore.kernel.org/netdev/CAK8P3a06enZOf=XyZ+zcAwBczv41UuCTz+=0FMf2gBz1_cOnZQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/CAK8P3a3=eOxE-K25754+fB_-i_0BZzf9a9RfPTX3ppSwu9WZXw@mail.gmail.com/
Link: https://lore.kernel.org/netdev/20210726084540.3282344-1-arnd@kernel.org/
Acked-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210812183509.1362782-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Internal tests found that the latest code doesn't bring up the 1PPS output
as expected. As a result of an incorrect define used to round the time up,
the time was rounded down to the previous second boundary.
Fix the define used for rounding in ice_ptp_cfg_clkout so that it properly
rounds up to the next top of second.
Fixes: 172db5f91d ("ice: add support for auxiliary input/output pins")
Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20210813165018.2196013-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The menuconfig FSL_DPAA_ETH selects config FSL_FMAN_MAC, but the config
FSL_FMAN_MAC never existed in the kernel tree.
Hence, ./scripts/checkkconfigsymbols.py warns:
FSL_FMAN_MAC
Referencing files: drivers/net/ethernet/freescale/dpaa/Kconfig
Remove this dead select in menuconfig FSL_DPAA_ETH.
Fixes: 9ad1a37493 ("dpaa_eth: add support for DPAA Ethernet")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
By default the FEC driver treats irq[0] (i.e. int0 described in the
dt-binding) as the wakeup interrupt, but this changed on the i.MX8M
series, where SoC integration routes the wakeup interrupt signal onto
the int2 interrupt line.
This patch introduces FEC_QUIRK_WAKEUP_FROM_INT2 to indicate int2 as the
wakeup interrupt for i.MX8MQ.
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20210812070948.25797-1-qiangqing.zhang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The EtherAVB instances on the R-Car E3/D3 and RZ/G2E SoCs do not support
TX clock internal delay modes, and the EtherAVB driver prints a warning
if an unsupported "rgmii-*id" PHY mode is specified, to catch buggy
DTBs.
Commit a6f51f2efa ("ravb: Add support for explicit internal
clock delay configuration") deprecated deriving the internal delay mode
from the PHY mode, in favor of explicit configuration using the now
mandatory "rx-internal-delay-ps" and "tx-internal-delay-ps" properties,
thus delegating the warning to the legacy fallback code.
Since explicit configuration of a (valid) internal clock delay
configuration is enforced by validating device tree source files against
DT binding files, and all upstream DTS files have been converted as of
commit a5200e63af ("arm64: dts: renesas: rzg2: Convert EtherAVB to
explicit delay handling"), the checks in the legacy fallback code can be
removed.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://lore.kernel.org/r/2037542ac56e99413b9807e24049711553cc88a9.1628696778.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Drivers should count packets they are dropping.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
skbs are freed on error and not put on the ring. We may, however,
be in a situation where we're freeing the last skb of a batch,
and there is a doorbell ring pending because of xmit_more() being
true earlier. Make sure we ring the door bell in such situations.
Since errors are rare don't pay attention to xmit_more() and just
always flush the pending frames.
The busy case should be safe to be left alone because it can
only happen if start_xmit races with completions and they
both enable the queue. In that case the kick can't be pending.
Noticed while reading the code.
Fixes: 4d172f21ce ("bnxt_en: Implement xmit_more.")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
napi schedules DIM, so napi has to be disabled first,
then DIM canceled.
Noticed while reading the code.
Fixes: 0bc0b97fca ("bnxt_en: cleanup DIM work on device shutdown")
Fixes: 6a8788f256 ("bnxt_en: add support for software dynamic interrupt moderation")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We can't take the tx lock from the napi poll routine, because
netpoll can poll napi at any moment, including with the tx lock
already held.
The tx lock is protecting against two paths - the disable
path, and (as Michael points out) the NETDEV_TX_BUSY case
which may occur if NAPI completions race with start_xmit
and both decide to re-enable the queue.
For the disable/ifdown path use synchronize_net() to make sure
closing the device does not race with restarting the queues.
Annotate accesses to dev_state against data races.
For the NAPI cleanup vs start_xmit path - appropriate barriers
are already in place in the main spot where Tx queue is stopped
but we need to do the same careful dance in the TX_BUSY case.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'mlx5-updates-2021-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 updates 2021-08-11
This series provides misc updates to mlx5.
For more information please see tag log below.
Please pull and let me know if there is any problem.
mlx5-updates-2021-08-11
Misc. cleanup for mlx5.
1) Typos and use of netdev_warn()
2) smatch cleanup
3) Minor fix to inner TTC table creation
4) Dynamic capability cache allocation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The ocelot driver makes use of regmap, wrapping it with driver specific
operations that are thin wrappers around the core regmap APIs. These are
exported with EXPORT_SYMBOL, dropping the _GPL from the core regmap
exports which is frowned upon. Add _GPL suffixes to at least the APIs that
are doing register I/O.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, four reset types are supported for the HNS3 ethernet
driver: IMP reset, global reset, function reset, and FLR. Only
FLR can now be triggered by the user. To restore the device when
an exception occurs, add support for triggering reset by ethtool.
Run the "ethtool --reset DEVNAME mgmt | all | dedicated" to
trigger the IMP | global | function reset manually.
In addition, VF can only trigger function reset.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Link: https://lore.kernel.org/r/1628602128-15640-1-git-send-email-huangguangbin2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ocelot driver makes use of regmap, wrapping it with driver specific
operations that are thin wrappers around the core regmap APIs. These are
exported with EXPORT_SYMBOL, dropping the _GPL from the core regmap
exports which is frowned upon. Add _GPL suffixes to at least the APIs that
are doing register I/O.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210810123748.47871-1-broonie@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace printk(KERN_WARNING ...) with netdev_warn().
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Fix the following smatch warning:
wait_func_handle_exec_timeout() warn: should '1 << ent->idx' be a 64 bit type?
Use 1ULL, to have a 64 bit type variable.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Subsequent patches make use of numa node affinity for memory
allocations. Initialize it for PCI PF, VF and SF devices.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently mlx5_core_dev contains an array of capabilities. It contains 19
valid capabilities of the device, 2 reserved entries and 12 holes.
Because of this, for the 14 unused entries, mlx5_core_dev allocates
14 * 8K = 112K bytes of memory which is never used. As a result, the
mlx5_core_dev structure size is 270Kbytes odd, and this allocation is
further aligned to the next power of 2, 512Kbytes.
By skipping the non-existent entries,
(a) 112Kbyte is saved,
(b) mlx5_core_dev reduces to 8KB with alignment,
(c) 350KB is saved in alignment.
In future individual capability allocation can be used to skip its
allocation when such capability is disabled at the device level. This
patch prepares mlx5_core_dev to hold capability using a pointer instead
of inline array.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
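The shape of the change, as a standalone sketch: capabilities become
individually allocated objects reachable through pointers, so holes and
reserved slots never consume memory. Field names and sizes are made up for
the example and are not the real mlx5 definitions.

#include <stdlib.h>

#define CAP_DWORDS    16	/* placeholder, not the real capability size */
#define NUM_CAP_TYPES 19

struct cap_sketch {
	unsigned int cur[CAP_DWORDS];
	unsigned int max[CAP_DWORDS];
};

struct dev_caps_sketch {
	/* pointers instead of an inline array: unused types stay NULL and cost nothing */
	struct cap_sketch *type[NUM_CAP_TYPES];
};

int main(void)
{
	static struct dev_caps_sketch dev;

	dev.type[0] = calloc(1, sizeof(*dev.type[0]));	/* allocate only what exists */
	free(dev.type[0]);
	return 0;
}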
In the current code, the current and maximal capabilities are
maintained in separate arrays which are both per type. In order to
allow the creation of such a basic structure as a dynamically
allocated array, we move curr and max fields to a unified
structure so that specific capabilities can be allocated as one unit.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, all access to mlx5 IRQs is done under a lock. Hence, there
is no reason to have a kref in struct mlx5_irq.
Switch it to an integer.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When the allocated MSI-X vectors are not enough for SFs to have dedicated
MSI-X vectors, the kernel log buffer gets too many entries.
Hence, only enable such logs at debug level.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The new mlx5_core device structure is allocated through devlink_alloc
with kzalloc, which ensures that all fields, including ->state, are
equal to zero.
That means that the checks of that field in mlx5_init_one() are
completely redundant, because that function is called only once
at the beginning of the mlx5_core_dev lifetime.
PCI:
.probe()
-> probe_one()
-> mlx5_init_one()
The recovery flow can't run at that time or before it, because the relevant
work is initialized later in mlx5_init_once().
Such an initialization flow ensures that dev->state can't be
MLX5_DEVICE_STATE_UNINITIALIZED at all, so remove such impossible
checks.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Fix typo of the cited commit that calls to mlx5_create_ttc_table, instead
of mlx5_create_inner_ttc_table.
Fixes: f4b45940e9 ("net/mlx5: Embed mlx5_ttc_table")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Enable the user to disable the VDPA net auxiliary device when it is not
required.
For example,
$ devlink dev param set pci/0000:06:00.0 \
name enable_vnet value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0
At this point the devlink instance does not create the auxiliary device
mlx5_core.vnet.2 for the VDPA net functionality.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable the user to disable the RDMA auxiliary device when it is not
required.
For example,
$ devlink dev param set pci/0000:06:00.0 \
name enable_rdma value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0
At this point the devlink instance does not create the auxiliary device
mlx5_core.rdma.2 for the RDMA functionality.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable the user to disable the Ethernet auxiliary device when it is not
required.
For example,
$ devlink dev param set pci/0000:06:00.0 \
name enable_eth value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0
At this point the devlink instance does not create the mlx5_core.eth.2
auxiliary device for the Ethernet functionality.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cleanup routine missed unpublishing the parameters. Add it.
Fixes: e890acd5ff ("net/mlx5: Add devlink flow_steering_mode parameter")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The blamed commit added a new field to struct switchdev_notifier_fdb_info,
but did not make sure that all call paths set it to something valid.
For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE
notifier, and since the 'is_local' flag is not set, it contains junk
from the stack, so the bridge might interpret those notifications as
being for local FDB entries when that was not intended.
To avoid that now and in the future, zero-initialize all
switchdev_notifier_fdb_info structures created by drivers, so that
newly added fields do not require touching the drivers again.
Fixes: 2c4eca3ef7 ("net: bridge: switchdev: include local flag in FDB notifications")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20210810115024.1629983-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
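A minimal illustration of the zero-initialization pattern; the struct below
is a stand-in for switchdev_notifier_fdb_info, and only the idea matters: a
newly added field such as 'is_local' starts as zero instead of stack junk.

#include <stdio.h>
#include <stdbool.h>

struct fdb_info_sketch {
	const unsigned char *addr;
	unsigned short vid;
	bool added_by_user;
	bool is_local;		/* newly added field */
};

int main(void)
{
	struct fdb_info_sketch info = { 0 };	/* zero-init: is_local is false, not junk */

	info.vid = 10;
	printf("is_local=%d\n", info.is_local);
	return 0;
}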
Saeed Mahameed says:
====================
pull-request: mlx5-next 2020-08-9
This pulls mlx5-next branch which includes patches already reviewed on
net-next and rdma mailing lists.
1) mlx5 single E-Switch FDB for lag
2) IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
3) Add DCS caps & fields support
[1] https://patchwork.kernel.org/project/netdevbpf/cover/20210803231959.26513-1-saeed@kernel.org/
[2] 0e3364dab7.1626609184.git.leonro@nvidia.com/
[3] 55e1d69bef.1624258894.git.leonro@nvidia.com/
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Lag, Create shared FDB when in switchdev mode
net/mlx5: E-Switch, add logic to enable shared FDB
net/mlx5: Lag, move lag destruction to a workqueue
net/mlx5: Lag, properly lock eswitch if needed
net/mlx5: Add send to vport rules on paired device
net/mlx5: E-Switch, Add event callback for representors
net/mlx5e: Use shared mappings for restoring from metadata
net/mlx5e: Add an option to create a shared mapping
net/mlx5: E-Switch, set flow source for send to uplink rule
RDMA/mlx5: Add shared FDB support
{net, RDMA}/mlx5: Extend send to vport rules
RDMA/mlx5: Fill port info based on the relevant eswitch
net/mlx5: Lag, add initial logic for shared FDB
net/mlx5: Return mdev from eswitch
IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
net/mlx5: Add DCS caps & fields support
====================
Link: https://lore.kernel.org/r/20210809202522.316930-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'mlx5-fixes-2021-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-08-09
This series introduces fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The CQ destroy is performed based on the IRQ number that is stored in
cq->irqn. That number wasn't set explicitly during CQ creation and as
expected some of the API users of mlx5_core_create_cq() forgot to update
it.
This caused the synchronization call to be made on the wrong IRQ, with
number 0 instead of the real one.
As a fix, set the IRQ number directly in the mlx5_core_create_cq() and
update all users accordingly.
Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Fixes: ef1659ade3 ("IB/mlx5: Add DEVX support for CQ events")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Free the offload sample action on error.
Fixes: f94d6389f6 ("net/mlx5e: TC, Add support to offload sample action")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently irq->pool is set after the irq is inserted into the xarray.
Set irq->pool before the irq is inserted into the xarray.
Fixes: 71e084e264 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Change order of functions in mlx5_irq_detach_nb() so it will be
a mirror of mlx5_irq_attach_nb.
Fixes: 71e084e264 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Since switchdev mode can't support devlink traps, verify there are
no active devlink traps before moving eswitch to switchdev mode. If
there are active traps, prevent the switchdev mode configuration.
Fixes: eb3862a052 ("net/mlx5e: Enable traps according to link state")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5e_close_xdpsq does the cleanup: it calls mlx5e_free_xdpsq_descs to
free the outstanding descriptors, which relies on
mlx5e_page_release_dynamic and page_pool_release_page. However,
page_pool_destroy is already called by this point, because
mlx5e_close_rq runs before mlx5e_close_xdpsq.
This commit fixes the use-after-free by swapping mlx5e_close_xdpsq and
mlx5e_close_rq.
The commit cited below started calling page_pool_destroy directly from
the driver. Previously, the page pool was destroyed under a call_rcu
from xdp_rxq_info_unreg_mem_model, which would defer the deallocation
until after the XDPSQ is cleaned up.
Fixes: 1da4bbeffe ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The ageing time is not converted from clock_t to jiffies, which results
in an incorrect ageing timeout calculation in the workqueue update task.
Fix it by applying clock_t_to_jiffies() to the provided value.
Fixes: c636a0f0f3 ("net/mlx5: Bridge, dynamic entry ageing")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
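A userspace arithmetic sketch of the missing conversion: bridge ageing time
arrives in clock_t units (USER_HZ, typically 100 per second) and must be
scaled to kernel jiffies (HZ) before being used as a timeout. The HZ value
and the simplified formula below are assumptions for illustration; the
kernel helper also handles overflow.

#include <stdio.h>

#define USER_HZ 100
#define HZ      250	/* example kernel tick rate */

static unsigned long clock_t_to_jiffies_sketch(unsigned long x)
{
	return x * HZ / USER_HZ;
}

int main(void)
{
	unsigned long ageing_clock_t = 30000;	/* 300 seconds as passed by the bridge layer */

	printf("%lu jiffies\n", clock_t_to_jiffies_sketch(ageing_clock_t));
	return 0;
}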
It could be that local and remote are on the same machine, in which case
the route lookup returns a local route, which results in creating an
encap id with a src/dst MAC address of 0.
Fixes: a54e20b4fc ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
While processing an encapsulated packet on RX, one of the fields that is
checked is the inner packet length. If the length specified in the header
doesn't match the actual inner packet length, the packet is invalid
and should be dropped. However, such a packet caused the NIC to hang.
This patch turns on a 'fail_on_error' HW bit which allows HW to drop
such an invalid packet while processing RX packet and trying to decap it.
Fixes: ad17dc8cf9 ("net/mlx5: DR, Move STEv0 action apply logic")
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This patch adds skb frag page recycling support based on
the frag page support in the page pool.
Performance improves by 10~20% for a single-thread iperf
TCP flow with the IOMMU disabled when the iperf server and irq/NAPI
are on different CPUs.
Performance improves by about 135% (14 Gbit to 33 Gbit) for a single-
thread iperf TCP flow when the IOMMU is in strict mode and the iperf
server shares the same CPU with irq/NAPI.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, page->pp is cleared and set every time the page
is recycled, which is unnecessary.
So only set page->pp when the page is added to the page
pool and only clear it when the page is released from the
page pool.
This is also a preparation for supporting frag page allocation
in the page pool.
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
iavf driver should set RSS LUT and key unconditionally in reset
path. Currently, the driver does not do that. This patch fixes
this issue.
Fixes: 2c86ac3c70 ("i40evf: create a generic config RSS function")
Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In some circumstances, such as with bridging, it's possible that the
stack will add the device's own MAC address to its unicast address list.
If, later, the stack deletes this address, the driver will receive a
request to remove this address.
The driver stores its current MAC address as part of the VSI MAC filter
list instead of separately. So, this causes a problem when the device's
MAC address is deleted unexpectedly, which results in traffic failure in
some cases.
The following configuration steps will reproduce the previously
mentioned problem:
> ip link set eth0 up
> ip link add dev br0 type bridge
> ip link set br0 up
> ip addr flush dev eth0
> ip link set eth0 master br0
> echo 1 > /sys/class/net/br0/bridge/vlan_filtering
> modprobe -r veth
> modprobe -r bridge
> ip addr add 192.168.1.100/24 dev eth0
The following ping command fails due to the netdev->dev_addr being
deleted when removing the bridge module.
> ping <link partner>
Fix this by making sure to not delete the netdev->dev_addr during MAC
address sync. After fixing this issue it was noticed that the
netdev_warn() in .set_mac was overly verbose, so change it to
netdev_dbg().
Also, there is a possibility of a race condition between .set_mac and
.set_rx_mode. Fix this by calling netif_addr_lock_bh() and
netif_addr_unlock_bh() on the device's netdev when the netdev->dev_addr
is going to be updated in .set_mac.
Fixes: e94d447866 ("ice: Implement filter sync, NDO operations and bump version")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Liang Li <liali@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When VFs are setup and torn down in quick succession, it is possible
that a VF is torn down by the PF while the VF's virtchnl requests are
still in the PF's mailbox ring. Processing the VF's virtchnl request
when the VF itself doesn't exist results in undefined behavior. Fix
this by adding a check to stop processing virtchnl requests when VF
teardown is in progress.
Fixes: ddf30f7ff8 ("ice: Add handler to configure SR-IOV")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The userspace utility "driverctl" can be used to change/override the
system's default driver choices. This is useful in some situations
(buggy driver, old driver missing a device ID, trying a workaround,
etc.) where the user needs to load a different driver.
However, this is also prone to user error, where a driver is mapped
to a device it's not designed to drive. For example, if the ice driver
is mapped to drive iavf devices, the ice driver crashes.
Add a check to return an error if the ice driver is being used to
probe a virtual function.
Fixes: 837f08fdec ("ice: Add basic driver framework for Intel(R) E800 Series")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
All kernel devlink implementations call devlink_alloc() during the
initialization routine for the specific device, which is used later as
the parent device for devlink_register().
Such late device assignment leads to a situation which requires us to
call device_register() before setting other parameters, but that call
opens devlink to the world and makes it accessible to netlink users.
Any attempt to move devlink_register() to be the last call generates the
following error due to access to the devlink->dev pointer.
[ 8.758862] devlink_nl_param_fill+0x2e8/0xe50
[ 8.760305] devlink_param_notify+0x6d/0x180
[ 8.760435] __devlink_params_register+0x2f1/0x670
[ 8.760558] devlink_params_register+0x1e/0x20
The simple API change of setting the devlink device in devlink_alloc()
instead of devlink_register() fixes all of the above and ensures that
everything is already set prior to calling devlink_register().
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CPSW switchdev driver inherited a fix from commit 9421c90150 ("net:
ethernet: ti: cpsw: fix min eth packet size") which changed the min TX
packet size to 64 bytes (VLAN_ETH_ZLEN, excluding ETH_FCS). It was done to
fix a HW packet drop issue when packets are sent from the Host to a port
with PVID and un-tagging enabled. Unfortunately this breaks some other
non-switch specific use-cases, like:
- [1] CPSW port as DSA CPU port with the DSA tag applied at the end of the
packet
- [2] Some industrial protocols, which expect a min TX packet size of
60 bytes (excluding FCS).
Fix it by configuring the min TX packet size depending on the driver mode:
- 60 bytes (ETH_ZLEN) for multi-mac (dual-mac) mode
- 64 bytes (VLAN_ETH_ZLEN) for switch mode
Update it on driver mode changes and annotate the accesses with
READ_ONCE()/WRITE_ONCE(), since the value can be read by NAPI while it is
being written.
[1] https://lore.kernel.org/netdev/20210531124051.GA15218@cephalopod/
[2] https://e2e.ti.com/support/arm/sitara_arm/f/791/t/701669
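A rough illustration of the approach (structure and function names are not
the actual cpsw code): the minimum is written once on mode change and read
with READ_ONCE() on the hot xmit path.

#include <linux/if_vlan.h>
#include <linux/skbuff.h>

struct example_common {
        unsigned int tx_packet_min;
};

static void example_set_mode(struct example_common *common, bool switch_mode)
{
        WRITE_ONCE(common->tx_packet_min,
                   switch_mode ? VLAN_ETH_ZLEN : ETH_ZLEN);
}

static int example_xmit_pad(struct example_common *common, struct sk_buff *skb)
{
        /* skb_put_padto() frees the skb and returns an error on failure */
        return skb_put_padto(skb, READ_ONCE(common->tx_packet_min));
}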
Cc: stable@vger.kernel.org
Fixes: ed3525eda4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
Reported-by: Ben Hutchings <ben.hutchings@essensium.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some older Broadcom debug tools use window 5 and may conflict, so switch
to use window 6 instead.
Fixes: 118612d519 ("bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods")
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New firmware interface requires the PTP sequence ID header offset to
be passed to the firmware to properly find the matching timestamp
for all protocols.
Fixes: 83bb623c96 ("bnxt_en: Transmit and retrieve packet timestamps")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The key change is the firmware call to retrieve the PTP TX timestamp.
The header offset for the PTP sequence number field is now added.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devlink port already has a pointer to the devlink instance, and all API
calls that forward devlink ports to the drivers perform the same
"devlink_port->devlink" assignment before the actual call.
This patch removes the useless parameter and allows us, in the future,
to create specific devlink_port_ops to manage user space access with
reliable ops assignment.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When compiling with clang in certain configurations, an objtool warning
appears:
drivers/net/ethernet/stmicro/stmmac/dwmac-ipq806x.o: warning: objtool:
ipq806x_gmac_probe() falls through to next function phy_modes()
This happens because the unreachable annotation in the third switch
statement is not eliminated. The compiler should know that the first
default case prevents the second and third from being reached, as the
comment notes, but sanitizer options can make it harder for the
compiler to reason this out.
Help the compiler out by eliminating the unreachable() annotation and
unifying the default case error handling so that there is no objtool
warning, the meaning of the code stays the same, and there is less
duplication.
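A simplified illustration of the pattern, not the actual dwmac-ipq806x
code: every default case returns an error instead of relying on an
unreachable() annotation, so control visibly never falls off the end.

#include <linux/errno.h>

static int example_get_clk_div(unsigned int phy_mode, unsigned int speed)
{
        switch (phy_mode) {
        case 0:
                return 5;
        case 1:
                switch (speed) {
                case 1000:
                        return 1;
                default:
                        return -EINVAL; /* was: unreachable() */
                }
        default:
                return -EINVAL;
        }
}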
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'startable' condition in the ternary operator is always false
because the preceding if statement already handles the case where
startable is true. Hence the ternary check is dead code and can be
removed.
Addresses-Coverity: ("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original L0 and L1 entrance latencies of the RTL8106e are 4us, and
they delay the link-up interrupt when ASPM is enabled. Change the L0
entrance latency to 7us and the L1 entrance latency to 32us to avoid
the issue.
Tested-by: Koba Ko <koba.ko@canonical.com>
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 1ee8856de8.
This is used to re-enable ASPM on RTL8106e, if it is possible.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On s390, the following build warning occurs:
drivers/net/ethernet/marvell/mvpp2/mvpp2.h:844:2: warning: overflow in
conversion from 'long unsigned int' to 'int' changes value from
'18446744073709551584' to '-32' [-Woverflow]
844 | ((total_size) - MVPP2_SKB_HEADROOM - MVPP2_SKB_SHINFO_SIZE)
This happens because MVPP2_SKB_SHINFO_SIZE, which is 320 bytes (which is
already 64-byte aligned) on some architectures, actually gets ALIGN'd up
to 512 bytes in the s390 case.
So then, when this is invoked:
MVPP2_RX_MAX_PKT_SIZE(MVPP2_BM_SHORT_FRAME_SIZE)
...that turns into:
704 - 224 - 512 == -32
...which is not a good frame size to end up with! The warning above is a
bit lucky: it notices a signed/unsigned bad behavior here, which leads
to the real problem of a frame that is too short for its contents.
Increase MVPP2_BM_SHORT_FRAME_SIZE by 32 (from 704 to 736), which is
just exactly big enough. (The other values can't readily be changed
without causing a lot of other problems.)
Fixes: 07dd0a7aae ("mvpp2: add basic XDP support")
Cc: Sven Auhagen <sven.auhagen@voleatech.de>
Cc: Matteo Croce <mcroce@microsoft.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables support for the hard IRQ deferral feature from Eric
Dumazet [1] in the TI K3 CPSW driver by using napi_complete_done() in the
TX completion path.
Depending on gro_flush_timeout and napi_defer_hard_irqs, it gives up to 30%
CPU utilization reduction:
gro_flush_timeout=50000
napi_defer_hard_irqs=2
netperf -l 10 -H 192.168.1.1 -t UDP_STREAM -c -C -- -m 1470
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
Socket Message Elapsed Messages CPU Service
Size Size Time Okay Errors Throughput Util Demand
bytes bytes secs # # 10^6bits/sec % SS us/KB
before:
212992 1470 10.00 809632 0 952.0 42.98 14.792
212992 10.00 809630 952.0 50.66 8.719
after:
212992 1470 10.00 813686 0 956.8 32.14 11.009
212992 10.00 813686 956.8 50.05 8.570
[1] https://lore.kernel.org/netdev/20200422161329.56026-1-edumazet@google.com/
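A minimal sketch of the TX completion pattern (the helpers are
hypothetical): napi_complete_done() lets the core honor gro_flush_timeout
and napi_defer_hard_irqs before hard IRQs are re-enabled.

#include <linux/netdevice.h>

static int example_tx_clean(struct napi_struct *napi, int budget)
{
        /* reclaim completed TX descriptors here; return packets cleaned */
        return 0;
}

static void example_enable_tx_irq(struct napi_struct *napi)
{
        /* re-enable the TX completion interrupt here */
}

static int example_tx_poll(struct napi_struct *napi, int budget)
{
        int work_done = example_tx_clean(napi, budget);

        if (work_done < budget && napi_complete_done(napi, work_done))
                example_enable_tx_irq(napi);

        return work_done;
}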
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the TI K3 am64x platform an issue with the RX IRQ is observed - it
becomes disabled forever after .ndo_stop(). The K3 CPSW driver manipulates
the RX IRQ using the standard Linux enable_irq()/disable_irq_nosync() API,
as there are no IRQ enable/disable options in the CPSW HW itself. As a
result, the following sequence happens during .ndo_stop():
phy_stop()
teardown TX/RX channels
wait for TX tdown complete
napi_disable(TX)
clean up TX channels
(a)
napi_disable(RX)
At point (a) it is not possible to predict whether the RX IRQ was
triggered or not. If the RX IRQ was triggered, it is also not possible to
say definitively whether the RX NAPI ran or was only scheduled and
immediately cancelled by napi_disable(RX). The latter case causes the RX
IRQ to be permanently disabled.
Another observed issue is that the RX IRQ enable counter becomes
unbalanced if (gro_flush_timeout != 0) while (napi_defer_hard_irqs == 0):
Unbalanced enable for IRQ 44
WARNING: CPU: 0 PID: 10 at ../kernel/irq/manage.c:776 __enable_irq+0x38/0x80
__enable_irq+0x38/0x80
enable_irq+0x54/0xb0
am65_cpsw_nuss_rx_poll+0x2f4/0x368
__napi_poll+0x34/0x1b8
net_rx_action+0xe4/0x220
_stext+0x11c/0x284
run_ksoftirqd+0x4c/0x60
To avoid the above issues, introduce a flag indicating whether the RX IRQ
was actually disabled before enabling it in am65_cpsw_nuss_rx_poll(), and
restore the RX IRQ state in .ndo_open().
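A minimal sketch of the flag handling (names are illustrative, not the
actual am65-cpsw code; the .ndo_open() restore path is omitted):

#include <linux/interrupt.h>
#include <linux/netdevice.h>

struct example_rx {
        struct napi_struct napi_rx;
        bool rx_irq_disabled;
        int irq;
};

static irqreturn_t example_rx_irq(int irq, void *dev_id)
{
        struct example_rx *rx = dev_id;

        /* remember that the IRQ was really disabled from the handler */
        rx->rx_irq_disabled = true;
        disable_irq_nosync(irq);
        napi_schedule(&rx->napi_rx);
        return IRQ_HANDLED;
}

static int example_rx_poll(struct napi_struct *napi, int budget)
{
        struct example_rx *rx = container_of(napi, struct example_rx, napi_rx);
        int work_done = 0;      /* RX descriptor processing elided */

        /* only re-enable the IRQ if it was actually disabled, keeping the
         * enable/disable calls balanced
         */
        if (work_done < budget && napi_complete_done(napi, work_done) &&
            rx->rx_irq_disabled) {
                rx->rx_irq_disabled = false;
                enable_irq(rx->irq);
        }
        return work_done;
}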
Fixes: 4f7cce2724 ("net: ethernet: ti: am65-cpsw: add support for am64x cpsw3g")
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that all users of davinci_cpdma have switched to skb_put_padto(), the
frame padding can be removed from it.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use skb_put_padto() instead of skb_padto() so that skb->len also gets
updated, as preparation for removing the frame padding from cpdma.
It also makes the xmit path clearer and more linear.
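An illustrative xmit fragment: unlike skb_padto(), skb_put_padto() also
advances skb->len over the padding, so skb->len can be used directly as
the DMA length.

#include <linux/if_ether.h>
#include <linux/skbuff.h>

static int example_pad_small_frame(struct sk_buff *skb)
{
        /* returns 0 on success; on failure the skb has been freed */
        return skb_put_padto(skb, ETH_ZLEN);
}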
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use skb_put_padto() instead of skb_padto() so that skb->len also gets
updated, as preparation for removing the frame padding from cpdma.
It also makes the xmit path clearer and more linear.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Build failure in drivers/net/wwan/mhi_wwan_mbim.c:
add missing parameter (0, assuming we don't want buffer pre-alloc).
Conflict in drivers/net/dsa/sja1105/sja1105_main.c between:
589918df93 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
0fac6aa098 ("net: dsa: sja1105: delete the best_effort_vlan_filtering mode")
Follow the instructions from the commit message of the former commit
- removed the if conditions. When looking at commit 589918df93 ("net:
dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
note that the mask_iotag fields get removed by the following patch.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If both eswitches are in switchdev mode and the uplink representors
are enslaved to the same bond device, create a shared FDB configuration.
When moving to shared FDB mode, not only does the hardware need to be
configured, but the RDMA driver also needs to reconfigure itself.
When such a change is made, unload the RDMA devices, configure the
hardware and load the RDMA representors.
When destroying the lag (which can happen if a PCI function is unbound,
the driver is unloaded, or a netdev is simply removed from the bond), make
sure to restore the system to the previous state only if possible.
For example, if a PCI function is unbound there is no need to load the
representors, as the device is going away.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Shared FDB allows traffic from all the vports in the HCA to be directed
to a single eswitch. In order to do that, three things are needed:
1) Point the ingress ACL of the slave uplink to that of the master.
With this, wire traffic from both uplinks will reach the same eswitch
with the same metadata, where a single steering rule can catch traffic
from both ports.
2) Set the FDB root flow table of the slave's eswitch to that of the
master. As this flow table can change dynamically, make sure to
sync it on any set-root-flow-table FDB command.
This makes sure traffic from SFs, VFs, ECPFs and PFs reaches the
master eswitch.
3) Split wire traffic at the eswitch manager egress ACL so that it is
directed to the native eswitch manager. We only treat wire traffic
from both ports the same at the eswitch level. If such traffic wasn't
handled in the eswitch, it needs to reach the right representor to be
processed by software. For example, LACP packets should *always*
reach the right uplink representor for correct operation.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If a netdev is removed from the lag, the lag should be destroyed.
With downstream patches this might trigger a reconfiguration of
representors on a different eswitch, and we don't have the proper
locking to do so from this path. Move the destruction to be done by the
workqueue.
As the destruction won't affect the netdev side, it is okay to do so.
The RDMA side will be reconfigured, and it is already coded to handle
such reconfiguration.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, when doing hardware lag, we check the eswitch mode, but as
this isn't done under a lock the check isn't valid.
As the code needs to synchronize between two different devices, extra
care is needed.
- When going to change eswitch mode, if hardware lag is active destroy it.
- While changing eswitch modes block any hardware bond creation.
- Delay handling bonding events until there are no mode changes in
progress.
- When attaching a new mdev to lag, block until there is no mode change
in progress. In order for the mode change to finish the interface lock
will have to be taken. Release the lock and sleep for 100ms to
allow forward progress. As this is a very rare condition (can happen if
the user unbinds and binds a PCI function while also changing eswitch
mode of the other PCI function) it has no real world impact.
As taking multiple eswitch mode locks is now required lockdep will
complain about a possible deadlock. Register a key per eswitch to make
lockdep happy.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When two mlx5 devices are paired in switchdev mode, always offload the
send-to-vport rule to the peer E-Switch. This allows abstracting the
logic of when this is really necessary (single FDB) and combining the
logic of both cases into one.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This callback will allow notifying representors about relevant events
when in OFFLOADS mode. In downstream patches, this will be used to notify
about PAIR/UNPAIR devcom events.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
FTEs are added with mapped metadata which is saved per eswitch.
When uplink reps are bonded and we are in single FDB mode,
we could fail to find metadata which was stored in one eswitch's mapping
but not the other's, or was stored with a different id.
To resolve this issue, use a shared mapping between the eswitch ports.
There is no conflict in using a single mapping, per type, between the
ports.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Set the flow source param to local vport for the uplink rep
send-to-vport rule.
This will comply with the recent changes in SW steering that
use the flow source as an indication for the rule type - rx or tx.
Since the uplink send-to-vport rule forwards traffic to the wire, it has
to indicate that it is an sx rule and can't use the 'any port' value in
the flow source.
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In shared FDB there is only one eswitch which is active and it receives
traffic from all representors and all vports in the HCA.
While the Ethernet representor will always reside on its native PF, the
IB representor will not. Extend send-to-vport rule creation to support
such flows. We need to account for the source vport that sends the
traffic (on which the representor resides) and the target eswitch that
the traffic should reach.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As shared FDB requires changes in two subsystems, first expose the needed
core functions so the RDMA side can be changed.
mlx5_lag_is_master(): returns true if a given mlx5 device is the lag
master.
mlx5_lag_is_shared_fdb(): returns true if the lag mode is shared FDB.
mlx5_lag_get_peer_mdev(): returns the peer mdev in the lag.
The mentioned functions will be used by downstream patches in order
to add support for shared FDB for the RDMA side.
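A hedged usage sketch, from the RDMA side, of the helpers listed above;
the decision logic shown is illustrative only:

#include <linux/mlx5/driver.h>

static bool example_should_own_shared_fdb_reps(struct mlx5_core_dev *dev)
{
        /* in shared FDB mode only the lag master instantiates representors */
        return mlx5_lag_is_shared_fdb(dev) && mlx5_lag_is_master(dev);
}

static struct mlx5_core_dev *example_lag_peer(struct mlx5_core_dev *dev)
{
        return mlx5_lag_get_peer_mdev(dev);
}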
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Export a function so users can retrieve the Mellanox device that manages
the eswitch from the eswitch device.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Smatch says:
drivers/net/ethernet/neterion/vxge/vxge-main.c:3518 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
drivers/net/ethernet/neterion/vxge/vxge-main.c:3518 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
drivers/net/ethernet/neterion/vxge/vxge-main.c:3520 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
drivers/net/ethernet/neterion/vxge/vxge-main.c:3520 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
Since the vdev pointer is netdev private data, accessing it after the
free_netdev() call can cause a use-after-free bug. Fix it by moving the
free_netdev() call to the end of the function.
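The general shape of the fix (the structure and logging are illustrative):
anything obtained via netdev_priv() must not be touched after
free_netdev(), so the free moves to the very end of the teardown.

#include <linux/netdevice.h>

struct example_priv {
        char name[16];
};

static void example_device_unregister(struct net_device *ndev)
{
        struct example_priv *priv = netdev_priv(ndev);

        unregister_netdev(ndev);
        pr_info("device %s removed\n", priv->name);     /* last use of priv */
        free_netdev(ndev);      /* priv is invalid from this point on */
}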
Fixes: 6cca200362 ("vxge: cleanup probe error paths")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Smatch says:
drivers/net/ethernet/freescale/fec_main.c:3994 fec_drv_remove() error: Using fep after free_{netdev,candev}(ndev);
drivers/net/ethernet/freescale/fec_main.c:3995 fec_drv_remove() error: Using fep after free_{netdev,candev}(ndev);
Since the fep pointer is netdev private data, accessing it after the
free_netdev() call can cause a use-after-free bug. Fix it by moving the
free_netdev() call to the end of the function.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: a31eda65ba ("net: fec: fix clock count mis-match")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Reviewed-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
am65_cpsw_port_offload_fwd_mark_update() causes a NULL pointer
dereference crash when there is at least one disabled port and any other
port is added to the bridge for the first time.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000858
pc : am65_cpsw_port_offload_fwd_mark_update+0x54/0x68
lr : am65_cpsw_netdevice_event+0x8c/0xf0
Call trace:
am65_cpsw_port_offload_fwd_mark_update+0x54/0x68
notifier_call_chain+0x54/0x98
raw_notifier_call_chain+0x14/0x20
call_netdevice_notifiers_info+0x34/0x78
__netdev_upper_dev_link+0x1c8/0x290
netdev_master_upper_dev_link+0x1c/0x28
br_add_if+0x3f0/0x6d0 [bridge]
Fix it by adding a proper check for port->ndev != NULL.
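A sketch of the guard (structure names illustrative): skip ports that
never got a netdev, such as disabled ports, while walking the port list.

#include <linux/netdevice.h>

struct example_port {
        struct net_device *ndev;
};

static void example_fwd_mark_update(struct example_port *ports, int nports)
{
        int i;

        for (i = 0; i < nports; i++) {
                if (!ports[i].ndev)
                        continue;       /* disabled port, no netdev */

                /* update offload_fwd_mark for ports[i].ndev here */
        }
}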
Fixes: 2934db9bcb ("net: ti: am65-cpsw-nuss: Add netdevice notifiers")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the error code if bnx2x_alloc_fw_stats_mem() fails. The current
code returns success.
Fixes: ad5afc8936 ("bnx2x: Separate VF and PF logic")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit b0e8181762. Explicit
driver dependency on the bridge is no longer needed since
switchdev_bridge_port_{,un}offload() is no longer implemented by the
bridge driver but by switchdev.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the introduction of explicit offloading API in switchdev in commit
2f5dc00f7a ("net: bridge: switchdev: let drivers inform which bridge
ports are offloaded"), we started having Ethernet switch drivers calling
directly into a function exported by net/bridge/br_switchdev.c, which is
a function exported by the bridge driver.
This means that drivers that did not have an explicit dependency on the
bridge before, like cpsw and am65-cpsw, now do - otherwise it is not
possible to call a symbol exported by a driver that can be built as
module unless you are a module too.
There was an attempt to solve the dependency issue in the form of commit
b0e8181762 ("net: build all switchdev drivers as modules when the
bridge is a module"). Grygorii Strashko, however, says about it:
| In my opinion, the problem is a bit bigger here than just fixing the
| build :(
|
| In case, of ^cpsw the switchdev mode is kinda optional and in many
| cases (especially for testing purposes, NFS) the multi-mac mode is
| still preferable mode.
|
| There were no such tight dependency between switchdev drivers and
| bridge core before and switchdev serviced as independent, notification
| based layer between them, so ^cpsw still can be "Y" and bridge can be
| "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE
| will need to be set as "Y", or we will have to update drivers to
| support build with BRIDGE=n and maintain separate builds for
| networking vs non-networking testing. But is this enough? Wouldn't
| it cause 'chain reaction' required to add more and more "Y" options
| (like CONFIG_VLAN_8021Q)?
|
| PS. Just to be sure we on the same page - ARM builds will be forced
| (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our
| automation testing will just fail with omap2plus_defconfig.
In the light of this, it would be desirable for some configurations to
avoid dependencies between switchdev drivers and the bridge, and have
the switchdev mode as completely optional within the driver.
Arnd Bergmann also tried to write a patch which better expressed the
build time dependency for Ethernet switch drivers where the switchdev
support is optional, like cpsw/am65-cpsw, and this made the drivers
follow the bridge (compile as module if the bridge is a module) only if
the optional switchdev support in the driver was enabled in the first
place:
https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/
but this still did not solve the fact that cpsw and am65-cpsw now must
be built as modules when the bridge is a module - it just expressed
correctly that optional dependency. But the new behavior is an apparent
regression from Grygorii's perspective.
So to support the use case where the Ethernet driver is built-in,
NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we
need a framework that can handle the possible absence of the bridge from
the running system, i.e. runtime bloatware as opposed to build-time
bloatware.
Luckily we already have this framework, since switchdev has been using
it extensively. Events from the bridge side are transmitted to the
driver side using notifier chains - this was originally done so that
unrelated drivers could snoop for events emitted by the bridge towards
ports that are implemented by other drivers (think of a switch driver
with LAG offload that listens for switchdev events on a bonding/team
interface that it offloads).
There are also events which are transmitted from the driver side to the
bridge side, which again are modeled using notifiers.
SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with
notifying the bridge that a MAC address has been dynamically learned.
So there is a precedent we can use for modeling the new framework.
The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work
that the bridge needs to do when a port becomes offloaded is blocking in
its nature: replay VLANs, MDBs etc. The calling context is indeed
blocking (we are under rtnl_mutex), but the existing switchdev
notification chain that the bridge is subscribed to is only the atomic
one. So we need to subscribe the bridge to the blocking switchdev
notification chain too.
This patch:
- keeps the driver-side perception of the switchdev_bridge_port_{,un}offload
unchanged
- moves the implementation of switchdev_bridge_port_{,un}offload from
the bridge module into the switchdev module.
- makes everybody that is subscribed to the switchdev blocking notifier
chain "hear" offload & unoffload events
- makes the bridge driver subscribe and handle those events
- moves the bridge driver's handling of those events into 2 new
functions called br_switchdev_port_{,un}offload. These functions
contain in fact the core of the logic that was previously in
switchdev_bridge_port_{,un}offload, just that now we go through an
extra indirection layer to reach them.
Unlike all the other switchdev notification structures, the structure
used to carry the bridge port information, struct
switchdev_notifier_brport_info, does not contain a "bool handled".
This is because in the current usage pattern, we always know that a
switchdev bridge port offloading event will be handled by the bridge,
because the switchdev_bridge_port_offload() call was initiated by a
NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a
bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event
couldn't have happened.
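A hedged sketch of the consumer side of these events; the handler body is
illustrative and not the actual br_switchdev implementation, and it assumes
the SWITCHDEV_BRPORT_{,UN}OFFLOADED notifier types introduced here:

#include <linux/module.h>
#include <linux/notifier.h>
#include <net/switchdev.h>

static int example_switchdev_blocking_event(struct notifier_block *nb,
                                            unsigned long event, void *ptr)
{
        struct net_device *dev = switchdev_notifier_info_to_dev(ptr);

        switch (event) {
        case SWITCHDEV_BRPORT_OFFLOADED:
                netdev_dbg(dev, "bridge port offloaded\n");
                /* replay VLANs, MDBs, etc. for this port here */
                return NOTIFY_OK;
        case SWITCHDEV_BRPORT_UNOFFLOADED:
                netdev_dbg(dev, "bridge port unoffloaded\n");
                return NOTIFY_OK;
        }
        return NOTIFY_DONE;
}

static struct notifier_block example_switchdev_blocking_nb = {
        .notifier_call = example_switchdev_blocking_event,
};

static int __init example_init(void)
{
        return register_switchdev_blocking_notifier(&example_switchdev_blocking_nb);
}
module_init(example_init);
MODULE_LICENSE("GPL");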
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid reconfig problems due to failures in netif_set_real_num_tx_queues()
by using netif_set_real_num_queues().
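A minimal sketch (the wrapper name is illustrative): updating both queue
counts through one call avoids being left half-reconfigured if the TX
update succeeds but the RX one would have failed, since the helper is
designed to roll back on error.

#include <linux/netdevice.h>

static int example_set_queue_counts(struct net_device *dev,
                                    unsigned int txq, unsigned int rxq)
{
        return netif_set_real_num_queues(dev, txq, rxq);
}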
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The devlink trap group is registered but not released in the error flow;
add the missing devlink_trap_groups_unregister() call.
Fixes: 0a9003f45e ("net: marvell: prestera: devlink: add traps/groups implementation")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a switch port is connected to a MAC, use the common dpaa2-mac support
for exporting the available MAC statistics.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the next patch, we'll add support for also exporting the MAC
statistics in the ethtool stats. Annotate already present HW stats with
a suggestive prefix.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Integrate the common MAC endpoint management support into the
dpaa2-switch driver as well. Nothing special happens here, just that the
already available dpaa2-mac functions are also called from dpaa2-switch.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of a switch DPAA2 object, the interface ID is also needed when
querying for the object endpoint. Extend fsl_mc_get_endpoint() so that
users can also pass the interface ID they are interested in.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call to dpaa2_switch_port_link_state_update is a leftover from the
time when, on DPAA2 platforms, the PHYs were started at boot time, so when
an ifconfig was issued on the associated interface the link status
needed to be checked directly from the ndo_open() callback. This is not
needed anymore since we are now properly integrated with the PHY layer;
a link interrupt will eventually come directly from the PHY without
the need to call the sync function.
Fix this up by removing the call to dpaa2_switch_port_link_state_update.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>