Perform CPT LF teardown in non FLR path as well via cpt_lf_free()
Currently CPT LF teardown and reset sequence is only
done when FLR is handled with CPT LF still attached.
This patch also fixes cpt_lf_alloc() to set EXEC_LDWB in
CPT_AF_LFX_CTL2 when being completely overwritten as that is
the default value and is better for performance.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch enables and registers interrupt handler for CPT HW
interrupts.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This big patch sprinkles const on local variables and
function arguments which may refer to netdev->dev_addr.
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Some of the changes here are not strictly required - const
is sometimes cast off but pointer is not used for writing.
It seems like it's still better to add the const in case
the code changes later or relevant -W flags get enabled
for the build.
No functional changes.
Link: https://lore.kernel.org/r/20211014142432.449314-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace dev->driver_name() by dev_driver_string() for the corresponding
struct device. This is a step toward removing pci_dev->driver.
[bhelgaas: split to separate patch]
Link: https://lore.kernel.org/r/20211004125935.2300113-8-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This patch separates the logic of configuring hardware
maximum transmit frame size and receive frame size.
This simplifies the logic to calculate receive buffer
size and using cqe descriptor of different size.
Also additional size of skb_shared_info structure is
allocated for each receive buffer pointer given to
hardware which is not necessary. Hence change the
size calculation to remove the size of
skb_shared_info. Add a check for array out of
bounds while adding fragments to the network stack.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.
Remove the redundant 'flush_workqueue()' calls.
This was generated with coccinelle:
@@
expression E;
@@
- flush_workqueue(E);
destroy_workqueue(E);
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com> #mlx*
Signed-off-by: David S. Miller <davem@davemloft.net>
All callers pass in ETH_ALEN and the function itself
will return -EINVAL for any other address length.
Just assume it's ETH_ALEN like all other mac address
helpers (nvm, of, platform).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fwnode_get_mac_address() and device_get_mac_address()
return a pointer to the buffer that was passed to them
on success or NULL on failure. None of the callers
care about the actual value, only if it's NULL or not.
These semantics differ from of_get_mac_address() which
returns an int so to avoid confusion make the device
helpers return an errno.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new of_get_ethdev_address() helper for the cases
where dev->dev_addr is passed in directly as the destination.
@@
expression dev, np;
@@
- of_get_mac_address(np, dev->dev_addr)
+ of_get_ethdev_address(np, dev)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert all Ethernet drivers from memcpy(... dev->addr_len)
to eth_hw_addr_set():
@@
expression dev, np;
@@
- memcpy(dev->dev_addr, np, dev->addr_len)
+ eth_hw_addr_set(dev, np)
In theory addr_len may not be ETH_ALEN, but we don't expect
non-Ethernet devices to live under this directory, and only
the following cases of setting addr_len exist:
- cxgb4 for mgmt device,
and the drivers which set it to ETH_ALEN: s2io, mlx4, vxge.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update three drivers to use the new phylink_set_10g_modes() helper:
Cadence macb, Freescale DPAA2 and Marvell PP2.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert Ethernet from ether_addr_copy() to eth_hw_addr_set():
@@
expression dev, np;
@@
- ether_addr_copy(dev->dev_addr, np)
+ eth_hw_addr_set(dev, np)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert all Ethernet drivers from memcpy(... ETH_ADDR)
to eth_hw_addr_set():
@@
expression dev, np;
@@
- memcpy(dev->dev_addr, np, ETH_ALEN)
+ eth_hw_addr_set(dev, np)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable pin is being initialized with a value that is never
read, it is being updated later on in only one case of a switch
statement. The assignment is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds XDP_PASS, XDP_TX, XDP_DROP and XDP_REDIRECT support
for netdev PF.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of ltype NPC_LT_LA_CPT_HDR, LA pointer is pointing to the
start of cpt parse header. Since cpt parse header has veriable
length padding, this will be a problem for DMAC extraction. Adding
KPU profile changes to adjust the LA pointer to start at ether header
in case of cpt parse header by
- Adding ptr advance in pkind 58 to a fixed value 40
- Adding variable length offset 7 and mask 7 (pad len in
CPT_PARSE_HDR).
Also added the missing static declaration for npc_set_var_len_offset_pkind
function.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds PTP PHC support to NIX VF interfaces. This enables
a VF to run PTP master/slave instance. PTP block being a shared
hardware resource it is recommended to avoid running multiple
PTP instances in the system which will impact the PTP clock
accuracy.
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever the interface is brought up/down then set_rx_mode
function is called by the stack which enables promisc/allmulti
MCAM entries. But there are cases when driver brings
interface down and then up such as while changing number
of channels. In these cases promisc/allmulti MCAM entries
are left disabled as set_rx_mode callback is not called.
This patch enables these MCAM entries in all such cases.
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current driver uses software CQ head pointer to poll on CQE
header in memory to determine if CQE is valid. Software needs
to make sure, that the reads of the CQE do not get re-ordered
so much that it ends up with an inconsistent view of the CQE.
To ensure that DMB barrier after read to first CQE cacheline
and before reading of the rest of the CQE is needed.
But having barrier for every CQE read will impact the performance,
instead use hardware CQ head and tail pointers to find the
valid number of CQEs.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PTP hardware block can be configured to utilize
the external clock. Also the current ptp timestamp
can be captured when external trigger is applied on
a gpio pin. These features are required in scenarios
like connecting a external timing device to the chip
for time synchronization. The timing device provides
the clock and trigger(PPS signal) to the PTP block.
This patch does the following:
1. configures PTP block to use external clock
frequency and timestamp capture on external event.
2. sends PTP_REQ_EXTTS events to kernel ptp phc susbsytem
with captured timestamps
3. aligns PPS edge to adjusted ptp clock in the ptp device
by setting the PPS_THRESH to the reminder of the last
timestamp value captured by external PPS
Signed-off-by: Yi Guo <yig@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The input clock frequency of PTP block is figured
out from hardware reset block currently. The firmware
data already has this info in sclk. Hence simplify
ptp driver to use sclk from firmware data.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MAC on CN10K support hardware timestamping such that 8 bytes addition
header is prepended to incoming packets. This patch does necessary
configuration to enable Hardware time stamping upon receiving request
from PF netdev interfaces.
Timestamp configuration is different on MAC (CGX) Octeontx2 silicon
and MAC (RPM) OcteonTX3 CN10k. Based on silicon variant appropriate
fn() pointer is called. Refactor MAC specific mbox messages to remove
unnecessary gaps in mboxids.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon receiving ptp config request from netdev interface , Octeontx2 MAC
block CGX is configured to append timestamp to every incoming packet
and NPC config is updated with DMAC offset change.
Currently this configuration is not reset in FLR handler. This patch
resets the same.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate devlink registrations and traps registrations so devlink will
be registered when driver is fully initialized.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change prevents from users to access device before devlink is fully
configured. This change allows us to delete call to devlink_params_publish()
and impossible check during unregister flow.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Optimized KPU1 entry processing for variable-length custom L2 headers
of size 24B, 90B by
- Moving LA LTYPE parsing for 24B and 90B headers to PKIND.
- Removing LA flags assignment for 24B and 90B headers.
- Reserving a PKIND 55 to parse variable length headers.
Also, new mailbox(NPC_SET_PKIND) added to configure PKIND with
corresponding variable-length offset, mask, and shift count
(NPC_AF_KPUX_ENTRYX_ACTION0).
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With current KPU profile, while parsing GTPU packets, GTPU payload
is also being parsed and GTPU PDU payload is being treated as IPV4
data, which is not correct. In case of GTPU packets, parsing should
be stopped after identifying the GTPU. Adding changes to limit KPU
profile parsing for GTPU payload.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
devlink_register() can't fail and always returns success, but all drivers
are obligated to check returned status anyway. This adds a lot of boilerplate
code to handle impossible flow.
Make devlink_register() void and simplify the drivers that use that
API call.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Vladimir Oltean <olteanv@gmail.com> # dsa
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per HW errata AQ modification to CQ could be discarded on heavy
traffic. This patch implements workaround for the same after each
CQ write by AQ check whether the requested fields (except those
which HW can update eg: avg_level) are properly updated or not.
If CQ context is not updated then perform AQ write again.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sky2 is parsing the VPD and adds the parsed information to its debugfs
file. This isn't needed in kernel, userspace tools like lspci can be
used to display such information nicely. Therefore remove this from
the driver.
lspci -vv:
Capabilities: [50] Vital Product Data
Product Name: Marvell Yukon 88E8070 Gigabit Ethernet Controller
Read-only fields:
[PN] Part number: Yukon 88E8070
[EC] Engineering changes: Rev. 1.0
[MN] Manufacture ID: Marvell
[SN] Serial number: AbCdEfG970FD4
[CP] Extended capability: 01 10 cc 03
[RV] Reserved: checksum good, 9 byte(s) reserved
Read/write fields:
[RW] Read-write area: 1 byte(s) free
End
Relevant part in debugfs file:
0000:01:00.0 Product Data
Marvell Yukon 88E8070 Gigabit Ethernet Controller
Part Number: Yukon 88E8070
Engineering Level: Rev. 1.0
Manufacturer: Marvell
Serial Number: AbCdEfG970FD4
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/bbaee8ab-9b2e-de04-ee7b-571e094cc5fe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The variable blkaddr is being initialized with a value that is never
read, it is being updated later on in a for-loop. The assignment is
redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the case where the condition !is_rvu_otx2(rvu) is false variable
val is not initialized and can contain a garbage value. Fix this by
initializing val to zero and bit-wise or'ing in BIT_ULL(51) to val
for the true condition case of !is_rvu_otx2(rvu).
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: 4b5a3ab17c ("octeontx2-af: Hardware configuration for inline IPsec")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On OcteonTX2/CN10K SoC, the admin function (AF) is the only one
with all priviliges to configure HW and alloc resources, PFs and
it's VFs have to request AF via mailbox for all their needs.
This patch adds new mailbox messages for CPT PFs and VFs to configure
HW resources for inline-IPsec.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Vidya Sagar Velumuri <vvelumuri@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CN10K MAC block (RPM) differs in number of stats compared to Octeontx2
MAC block (CGX). RPM supports stats for each class of PFC and error
packets etc. It would be difficult for user to read stats from ethtool
and map to their definition.
New debugfs file is already added to read RPM stats along with their
definition. This patch adds proper checks such that RPM stats will not
be part of ethtool.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Memory allocated before 'lmac' is stored in 'cgx->lmac_idmap[]' must be
freed explicitly. Otherwise, in case of error, it will leak.
Rename the 'err_irq' label to better describe what is done at this place in
the error handling path.
Fixes: 6f14078e3e ("octeontx2-af: DMAC filter support in MAC block")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to match 'rvu_alloc_bitmap()', add a 'rvu_free_bitmap()' function
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reserves the LMTST lines per cpu instead
of separate LMTST lines for NPA(buffer free) and NIX(sqe flush).
LMTST line of the core on which SQ or RQ is processed is used
for LMTST operation.
This patch also replace STEOR with STEORL release semantics and
updates driver name in ethtool file.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check one more time before exiting the API with an error.
Fix API to poll at least twice, in case there are other high priority
tasks and this API doesn't get CPU cycles for multiple jiffies update.
In addition, increase timeout from usecs_to_jiffies(10000) to
usecs_to_jiffies(20000), to prevent the case that for CONFIG_100HZ
timeout will be a single jiffies.
A single jiffies results actual timeout that can be any time between
1usec and 10msec. To solve this, a value of usecs_to_jiffies(20000)
ensures that timeout is 2 jiffies.
Signed-off-by: Smadar Fuks <smadarf@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With current config, for packets with IPv4 checksum errors,
errorcode is being set to UNKNOWN. Hence added a separate
errorcodes for outer and inner IPv4 checksum and changed
NPC configuration accordingly.
Also turn on L2 multicast address check in NPC protocol check block.
Fixes: 6b3321bacc ("octeontx2-af: Enable packet length and csum validation")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the static code analyzer reported issues
in rvu_npc.c. The reported errors are different sizes of
operands in bitops and returning uninitialized values.
Fixes: 651cd26523 ("octeontx2-af: MCAM entry installation support")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In npc_update_vf_flow_entry function the loop cursor
'index' is being changed inside the loop causing
the loop to spin forever. This in turn hogs the kworker
thread forever and no other mbox message is processed
by AF driver after that. Fix this by using
another variable in the loop.
Fixes: 55307fcb92 ("octeontx2-af: Add mbox messages to install and delete MCAM rules")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the given counter does not belong to the entry
then code ends up in infinite loop because the loop
cursor, entry is not getting updated further. This
patch fixes that by updating entry for every iteration.
Fixes: a958dd59f9 ("octeontx2-af: Map or unmap NPC MCAM entry and counter")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For better performance set hardware to use NDC TX for reading packet
data specified NIX_SEND_SG_S.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NPC extraction profile marks layer types
NPC_LT_LB_CTAG for CTAG and NPC_LT_LB_STAG_QINQ for
STAG after parsing input packet. Those layer types
can be used to install ntuple filters using
vlan-etype option. Below are the commands and
corresponding behavior with this patch in place.
> alias nt "ethtool -U eth0 flow-type ether"
> nt vlan 5 m 0xf000 action 0
Input packets with outer VLAN id as 5 i.e,
stag packets with VLAN id 5 and ctag packets with
VLAN id as 5 are hit.
> nt vlan-etype 0x8100 action 0
All input ctag packets with any VLAN id are hit.
> nt vlan-etype 0x88A8 action 0
All input stag packets with any VLAN id are hit.
> nt vlan-etype 0x8100 vlan 5 m 0xf000 action 0
All input ctag packets with VLAN id 5 are hit.
> nt vlan-etype 0x88A8 vlan 5 m 0xf000 action 0
All input stag packets with VLAN id 5 are hit.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed inconsistent license text across the RVU admin
function driver.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed inconsistent license text across the netdev
drivers.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enhancing the mailbox scope to support important configurations
like enabling scheduled LMTST, disable LMTLINE prefetch, disable
early completion for ordered LMTST, as per request from the
application. On FLR these configurations will be reset to default.
This patch also adds the 95XXO silicon version to octeontx2 silicon
list.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If otx2_mbox_get_rsp() fails, otx2_set_flowkey_cfg() need return an
error code.
Fixes: e793836545 ("octeontx2-pf: Fix algorithm index in MCAM rules with RSS action")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added mbox for PF/VF drivers to retrieve current ingress bandwidth
profile free count. Also added current policer timeunit
configuration info based on which ratelimiting decisions can be
taken by PF/VF drivers.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New usecases are popping up where in user wants to install common MCAM
filters for all interfaces. Having channel verification will result in
duplicating such MCAM filters for each of the ingress interface. Hence
removed channel verification.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CN10K slicon has different device id for PTP device.
Hence this patch updates the driver with new id.
Though ptp driver being a separate driver AF manages
configuring PTP block by all PFs. To manage ptp, AF
driver checks in its probe whether
1. ptp hardware device found on silicon
2. A driver is bound to ptp device
3. The ptp driver probe is successful
In failure of cases 1 and 3, AF proceeds with out ptp
and for case 2 defers the probe. This patch refactors
code also to check for all the PTP device ids given in
ptp device ids table for case 1.
Also added PTP device ID for 95O silicon
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon receiving the MBOX_MSG_FREE_RSRC_CNT, the AF will find out the
current number of free resources and reply it back to the requester. No
guarantee is given on the future state of the free resources yet.
If another requester sends MBOX_MSG_ATTACH_RESOURCES after this call,
the number of available resources might change.
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for packet IO via SDK links which is used when
Octeon is connected as a end-point. Traffic host to end-point
and vice versa flow through SDP links. This patch also support
dual SDP blocks supported in 98xx silicon.
Signed-off-by: Radha Mohan Chintakuntla <radhac@marvell.com>
Signed-off-by: Nalla Pradeep <pnalla@marvell.com>
Signed-off-by: Subrahmanyam Nilla <snilla@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In 98xx, there are 2 NIX blocks and 4 LBK blocks present. The way
these NIX-LBK should be configured depends on the use case. By
default loopback functionality is supported in AF VF pairs which
are attached to NIX0 and NIX1 LFs alternatively to ensure load
balancing. NIX0 transmits a packet to LBK1 which will be received
by NIX1 and packet transmitted by NIX1 will get received by NIX0 via
LBK2.
There are some requirements where only one AF VF is used and respective
NIX is expected to operate in a mode where it can receive it own packet
back. This can be achieved if NIX0 sends packet to LBK0 and not LBK1.
Adding a flag in LF alloc request mailbox which can setup NIX0 to use
LBK0 and NIX1 can use LBK3.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike OcteonTx2, the channel numbers used by CGX/RPM
and LBK on CN10K silicons aren't fixed in HW. They are
SW programmable, hence we cannot derive transmit link
from static channel numbers anymore. Get the same from
admin function via mailbox.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before C0 HW revision, The RSS adder was computed based the
following static formula.
rss_adder<7:0> = flow_tag<7:0> ^ flow_tag<15:8> ^
flow_tag<23:16> ^ flow_tag<31:24>
The above scheme has the following drawbacks:
1) It is not in line with other standard NIC behavior.
2) There can be an SW use case where SW can compute the hash
upfront using Toeplitz function and predict the queue selection
to optimize some packet lookup function. The nonstandard
way of doing XOR makes the consumer to not predict the queue selection.
C0 HW revision onwards, The HW can configure the
rss_adder<7:0> as flow_tag<7:0> to align with standard NICs.
This patch adds an option to select legacy RSS adder mode
vs standard NIC behavior by setting NIX_LF_RSS_TAG_LSB_AS_ADDER flag.
Since this bit field is used as reserved in old HW revisions,
No need to have an additional HW version check.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Starting from 96xx C0 onwards all silicons support traffic shaping.
This patch enables that feature along with other changes
- When PIR/CIR shaping config is modified, toggle SW_XOFF
for config to take effect
- Before SMQ flush, clear SW_XOFF at all parent schedulers
- Support to read current transmit scheduler configuration via mbox
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NIX_AF_TX_LINKX_NORM_CREDIT holds running counter of
tx credits available per link. But, tx credits should be
configured based on MTU config. So MTU change needs tx
credit count update.
An issue exists whereby when both PF & VF are enabled and
PF traffic is flowing, if VF requests for MTU update,
updating the NORM_CREDIT register will lead to corruption
of credit count and subsequent deadlock of tx link as
the NORM_CREDIT register holds running count.
This patch provides workaround by pausing link traffic
using NIX_AF_TL1X_SW_XOFF, waiting for existing packets to
drain, and used credits be returned before updating new
credit count.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clear and disable interrupt before queueing work as there might be
a chance that work gets completed on other core faster and
interrupt enable as a part of the work completes before
interrupt disable in the interrupt context. This leads to
permanent disable of interrupt.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set NPA batch allocation engine to process 35 cache lines
per turn on CN10k platform.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to support more coalesce parameters through netlink,
add two new parameter kernel_coal and extack for .set_coalesce
and .get_coalesce, then some extra info can return to user with
the netlink API.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
According to Armada XP datasheet bit at 0 position is corresponding for
TxInProg indication.
Fixes: c5aff18204 ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RVU SMMU widget stores the final translated PA at
RVU_AF_SMMU_TLN_FLIT0<57:18> instead of FLIT1 register. This patch
fixes the address translation logic to use the correct register.
Fixes: 893ae97214 ("octeontx2-af: cn10k: Support configurable LMTST regions")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherthan setting action as RSS in NPC MCAM entry, RSS flowkey
algorithm index also needs to be set. Otherwise whatever algorithm
is defined at flowkey index '0' will be considered by HW and pkt
flows will be distributed as such.
Fix this by saving the flowkey index sent by admin function while
initializing RSS and then use it when framing MCAM rules.
Fixes: 81a4362016 ("octeontx2-pf: Add RSS multi group support")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever user changes interface MAC address both default DMAC based
MCAM rule and VLAN offload (for strip) rules are updated with new
MAC address. To update or install VLAN offload rule PF driver needs
interface's receive channel info, which is retrieved from admin
function at the time of NIXLF initialization.
If user changes MAC address before interface is UP, VLAN offload rule
installation will fail and throw error as receive channel is not valid.
To avoid this, skip VLAN offload rule installation if netdev is not UP.
This rule will anyway be reinslatted as part of open() call.
Fixes: fd9d7859db ("octeontx2-pf: Implement ingress/egress VLAN offload")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bandwidth profiles (ipolicer structure)is implemented only on CN10K
platform. But current code try to free the ipolicer memory without
checking the capibility flag leading to driver crash on OCTEONTX2
platform. This patch fixes the issue by add capability flag check.
Fixes: e8e095b3b3 ("octeontx2-af: cn10k: Bandwidth profiles config support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CN10K platform requires physically contiguous memory for LMTST
operations which goes beyond a single page. Not having physically
contiguous memory will result in HW fetching transmit descriptors from
a wrong memory location.
Hence use DMA_ATTR_FORCE_CONTIGUOUS attribute while allocating
LMTST regions.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects the erroneous vlan priority mask field that was
send to npc_install_flow_req.
Fixes: 1d4d9e42c2 ("octeontx2-pf: Add tc flower hardware offload on ingress traffic")
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Supported link modes are updated by firmware in shared
structure per interface. Kernel uses this value to display
supported link modes via ethtool.
Currently there is extra validation that firmware updated
modes are validated against internal list of supported modes.
As intenal list of supported modes are not updated frequently
new modes supported by firmware are not updated to ethtool.
Hence remove extra validation and report all firmware updated
modes.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Print debug message if any of the RVU hardware blocks
reset fails.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per hardware the base channel number configured
for programmable channels of a block must be multiple
of number of channels of that block. This condition
is not met for SDP base channel currently. Hence this
patch ensures all the base channel numbers of all
blocks are multiple of number of channels present in
the blocks. Also instead of hardcoding SDP number
of channels the same is read from the NIX_AF_CONST1
register.
Fixes: 242da43921 ("octeontx2-af: cn10k: Add support for programmable")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'bp_ena' in Aura context is NIX block index, setting it
zero will always backpressure NIX0 block, even if NIXLF
belongs to NIX1. Hence fix this by setting it appropriately
based on NIX block address.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ensures that mcam flows are allocated
before adding or destroying the flows.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent change added error checking messages and failed to remove one
of the previous error checks. There are now two checks on variable err
so the second one is redundant dead code and can be removed.
Addresses-Coverity: ("Logically dead code")
Fixes: a83bdada06 ("octeontx2-af: Add debug messages for failures")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210818130927.33895-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
VLAN TCI is a 16 bit field which includes Priority(3 bits),
CFI(1 bit) and VID(12 bits). Currently ntuple filters support
installing rules to steer packets based on VID only.
This patch extends that support such that filters can
be installed for entire VLAN TCI.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On CN10K, the higher bits in the channel number represents the CPT
channel number. Mask out these higher bits in the npc configuration
to allow packets from cpt for parsing.
Signed-off-by: Vidya <vvelumuri@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The way SW can identify the number NPC counters supported by silicon
has changed for CN10K. This patch addresses this reading appropriate
registers to find out number of counters available.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the mcam entry allocation request is from PF
and NOT a priority allocation request then allocate
low priority entries so that PF entries always have
lower priority than its VFs. This is required so
that entries with (base) MCAM match criteria have lower
priority compared to entries with (base + additional)
match criteria. This patch considers only best case
scenario where PF entries are allocated from low
priority zone if low priority zone has free space.
There are worst case scenarios like:
1. VFs allocating hundreds of MCAM entries leading to VFs
using all mid priority zone and low priority zone entries
hence no entries free from low priority zone for PF.
2. All the PFs and VFs in the system allocating and freeing
entries causing fragmentation in MCAM space and all the
entries requested by PF could not fit in low priority
zone for allocation.
This patch do not handle worst case scenarios.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for setting or modifying MCAM entry count at
runtime via devlink params.
commands:
devlink dev param show
pci/0002:02:00.0:
name mcam_count type driver-specific
values:
cmode runtime value 16
devlink dev param set pci/0002:02:00.0 name mcam_count
value 64 cmode runtime
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variables used for TC flow management like maximum number
of flows, number of flows installed etc are a copy of ntuple
flow management variables. Since both TC and NTUPLE are not
supported at the same time, it's better to unify these with
common variables.
This patch addresses this unification and also does cleanup of
other minor stuff wrt TC.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Per single mailbox request a maximum of 256 MCAM entries
can be allocated. If more than 256 are being allocated, then
the mcam indices in the final list could get jumbled. Hence
sort the indices.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add packet flow classification support for both LMAC mapped virtual
functions and loopback VFs. This patch adds supports for ntuple
offload feature.
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enabled NETIF_F_RXALL support for VF driver.
Also removed MTU range comments which are no longer valid.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added debug messages for various failures during probe.
This will help in quickly identifying the API where the failure
is happening.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add appropriate error codes to be used when returning from AF
mailbox handlers due to some error condition.
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When installing a flow using npc_install_flow
mailbox there are number of reasons to reject
the request like caller is not permitted,
invalid channel specified in request, flow
not supported in extraction profile and so on.
Hence define new error codes for npc flows and use
them instead of generic error codes.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'imply' keyword does not do what most people think it does, it only
politely asks Kconfig to turn on another symbol, but does not prevent
it from being disabled manually or built as a loadable module when the
user is built-in. In the ICE driver, the latter now causes a link failure:
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl':
ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config'
ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config'
aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config'
ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config'
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset':
ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release'
ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release'
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild':
This is a recurring problem in many drivers, and we have discussed
it several times befores, without reaching a consensus. I'm providing
a link to the previous email thread for reference, which discusses
some related problems.
To solve the dependency issue better than the 'imply' keyword, introduce a
separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver
can depend on if it is able to use PTP support when available, but works
fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are
then prevented from being built-in, the same way as with a 'depends on
PTP_1588_CLOCK || !PTP_1588_CLOCK' dependency that does the same trick,
but that can be rather confusing when you first see it.
Since this should cover the dependencies correctly, the IS_REACHABLE()
hack in the header is no longer needed now, and can be turned back
into a normal IS_ENABLED() check. Any driver that gets the dependency
wrong will now cause a link time failure rather than being unable to use
PTP support when that is in a loadable module.
However, the two recently added ptp_get_vclocks_index() and
ptp_convert_timestamp() interfaces are only called from builtin code with
ethtool and socket timestamps, so keep the current behavior by stubbing
those out completely when PTP is in a loadable module. This should be
addressed properly in a follow-up.
As Richard suggested, we may want to actually turn PTP support into a
'bool' option later on, preventing it from being a loadable module
altogether, which would be one way to solve the problem with the ethtool
interface.
Fixes: 06c16d89d2 ("ice: register 1588 PTP clock device object for E810 devices")
Link: https://lore.kernel.org/netdev/20210804121318.337276-1-arnd@kernel.org/
Link: https://lore.kernel.org/netdev/CAK8P3a06enZOf=XyZ+zcAwBczv41UuCTz+=0FMf2gBz1_cOnZQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/CAK8P3a3=eOxE-K25754+fB_-i_0BZzf9a9RfPTX3ppSwu9WZXw@mail.gmail.com/
Link: https://lore.kernel.org/netdev/20210726084540.3282344-1-arnd@kernel.org/
Acked-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210812183509.1362782-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The blamed commit added a new field to struct switchdev_notifier_fdb_info,
but did not make sure that all call paths set it to something valid.
For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE
notifier, and since the 'is_local' flag is not set, it contains junk
from the stack, so the bridge might interpret those notifications as
being for local FDB entries when that was not intended.
To avoid that now and in the future, zero-initialize all
switchdev_notifier_fdb_info structures created by drivers such that all
newly added fields to not need to touch drivers again.
Fixes: 2c4eca3ef7 ("net: bridge: switchdev: include local flag in FDB notifications")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20210810115024.1629983-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, page->pp is cleared and set everytime the page
is recycled, which is unnecessary.
So only set the page->pp when the page is added to the page
pool and only clear it when the page is released from the
page pool.
This is also a preparation to support allocating frag page
in page pool.
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All kernel devlink implementations call to devlink_alloc() during
initialization routine for specific device which is used later as
a parent device for devlink_register().
Such late device assignment causes to the situation which requires us to
call to device_register() before setting other parameters, but that call
opens devlink to the world and makes accessible for the netlink users.
Any attempt to move devlink_register() to be the last call generates the
following error due to access to the devlink->dev pointer.
[ 8.758862] devlink_nl_param_fill+0x2e8/0xe50
[ 8.760305] devlink_param_notify+0x6d/0x180
[ 8.760435] __devlink_params_register+0x2f1/0x670
[ 8.760558] devlink_params_register+0x1e/0x20
The simple change of API to set devlink device in the devlink_alloc()
instead of devlink_register() fixes all this above and ensures that
prior to call to devlink_register() everything already set.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On s390, the following build warning occurs:
drivers/net/ethernet/marvell/mvpp2/mvpp2.h:844:2: warning: overflow in
conversion from 'long unsigned int' to 'int' changes value from
'18446744073709551584' to '-32' [-Woverflow]
844 | ((total_size) - MVPP2_SKB_HEADROOM - MVPP2_SKB_SHINFO_SIZE)
This happens because MVPP2_SKB_SHINFO_SIZE, which is 320 bytes (which is
already 64-byte aligned) on some architectures, actually gets ALIGN'd up
to 512 bytes in the s390 case.
So then, when this is invoked:
MVPP2_RX_MAX_PKT_SIZE(MVPP2_BM_SHORT_FRAME_SIZE)
...that turns into:
704 - 224 - 512 == -32
...which is not a good frame size to end up with! The warning above is a
bit lucky: it notices a signed/unsigned bad behavior here, which leads
to the real problem of a frame that is too short for its contents.
Increase MVPP2_BM_SHORT_FRAME_SIZE by 32 (from 704 to 736), which is
just exactly big enough. (The other values can't readily be changed
without causing a lot of other problems.)
Fixes: 07dd0a7aae ("mvpp2: add basic XDP support")
Cc: Sven Auhagen <sven.auhagen@voleatech.de>
Cc: Matteo Croce <mcroce@microsoft.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Build failure in drivers/net/wwan/mhi_wwan_mbim.c:
add missing parameter (0, assuming we don't want buffer pre-alloc).
Conflict in drivers/net/dsa/sja1105/sja1105_main.c between:
589918df93 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
0fac6aa098 ("net: dsa: sja1105: delete the best_effort_vlan_filtering mode")
Follow the instructions from the commit message of the former commit
- removed the if conditions. When looking at commit 589918df93 ("net:
dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
note that the mask_iotag fields get removed by the following patch.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Devlink trap group is registered but not released in error flow,
add the missing devlink_trap_groups_unregister() call.
Fixes: 0a9003f45e ("net: marvell: prestera: devlink: add traps/groups implementation")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't populate the const array name on the stack but instead it
static. Makes the object code smaller by 28 bytes. Add a missing
const to clean up a checkpatch warning.
Before:
text data bss dec hex filename
124565 31565 384 156514 26362 drivers/net/ethernet/marvell/sky2.o
After:
text data bss dec hex filename
124441 31661 384 156486 26346 drivers/net/ethernet/marvell/sky2.o
(gcc version 10.2.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210801150647.145728-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Program SQ, MDQ, TL4 to TL2 transmit scheduler queues' DWRR
weight based on DWRR MTU programmed at NIX_AF_DWRR_RPM_MTU.
The DWRR MTU from admin function is retrieved via mbox.
On OcteaonTx2 silicon, admin function driver responds with DWRR
MTU as '1'. This helps to avoid silicon specific transmit
scheduler DWRR quantum/weight configuration logic.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On OcteonTx2 DWRR quantum is directly configured into each of
the transmit scheduler queues. And PF/VF drivers were free to
config any value upto 2^24.
On CN10K, HW is modified, the quantum configuration at scheduler
queues is in terms of weight. And SW needs to setup a base DWRR MTU
at NIX_AF_DWRR_RPM_MTU / NIX_AF_DWRR_SDP_MTU. HW will do
'DWRR MTU * weight' to get the quantum. For LBK traffic, value
programmed into NIX_AF_DWRR_RPM_MTU register is considered as
DWRR MTU.
This patch programs a default DWRR MTU of 8192 into HW and also
provides a way to change this via devlink params.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most users of ndo_do_ioctl are ethernet drivers that implement
the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware
timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.
Separate these from the few drivers that use ndo_do_ioctl to
implement SIOCBOND, SIOCBR and SIOCWANDEV commands.
This is a purely cosmetic change intended to help readers find
their way through the implementation.
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Vladimir Oltean <olteanv@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NIX_RX_SW_SYNC ensures all existing transactions are finished and
pkts are written to LLC/DRAM, queues should be teared down after
successful SW_SYNC. Due to a HW errata, in some rare scenarios
an existing transaction might end after SW_SYNC operation. To
ensure operation is fully done, do the SW_SYNC twice.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid configure backpressure for LBK links as they
don't support it and enable lmacs before configuration
pause frames.
Fixes: 75f3627099 ("octeontx2-pf: Support to enable/disable pause frames via ethtool")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the existing code while changing the number of TX/RX
queues using ethtool the PF/VF interface resources are
freed and reallocated (otx2_stop and otx2_open is called)
if the device is in running state. If any resource allocation
fails in otx2_open, driver free already allocated resources
and return. But again, when the number of queues changes
as the device state still running oxt2_stop is called.
In which we try to free already freed resources leading
to driver crash.
This patch fixes the issue by setting the INTF_DOWN flag on
error and free the resources in otx2_stop only if the flag is
not set.
Fixes: 50fe6c02e5 ("octeontx2-pf: Register and handle link notifications")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <Sunil.Goutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently PKINDs are not assigned to LBK channels.
The default value of LBK_CHX_PKIND (channel to PKIND mapping) register
is zero, which is resulting in a overlap of pkind between LBK and CGX
LMACs. When KPU1 parser config is modified when PTP timestamping is
enabled on the CGX LMAC interface it is impacting traffic on LBK
interfaces as well.
This patch fixes the issue by reserving the PKIND#0 for LBK devices.
CGX mapped PF pkind starts from 1 and also fixes the max pkind available.
Fixes: 421572175b ("octeontx2-af: Support to enable/disable HW timestamping")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get the number of VFs of a PF correctly by calling
rvu_get_pf_numvfs in rvu_switch_disable function.
Also hwvf is not required hence remove it.
Fixes: 23109f8dd0 ("octeontx2-af: Introduce internal packet switching")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added mailbox id to name translation on trace entry for
better tracing output.
Before the change:
otx2_msg_process: [0002:01:00.0] msg:(0x03) error:0
After the change:
otx2_msg_process: [0002:01:00.0] msg:(DETACH_RESOURCES) error:0
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow switchdevs to forward frames from the CPU in accordance with the
bridge configuration in the same way as is done between bridge
ports. This means that the bridge will only send a single skb towards
one of the ports under the switchdev's control, and expects the driver
to deliver the packet to all eligible ports in its domain.
Primarily this improves the performance of multicast flows with
multiple subscribers, as it allows the hardware to perform the frame
replication.
The basic flow between the driver and the bridge is as follows:
- When joining a bridge port, the switchdev driver calls
switchdev_bridge_port_offload() with tx_fwd_offload = true.
- The bridge sends offloadable skbs to one of the ports under the
switchdev's control using skb->offload_fwd_mark = true.
- The switchdev driver checks the skb->offload_fwd_mark field and lets
its FDB lookup select the destination port mask for this packet.
v1->v2:
- convert br_input_skb_cb::fwd_hwdoms to a plain unsigned long
- introduce a static key "br_switchdev_fwd_offload_used" to minimize the
impact of the newly introduced feature on all the setups which don't
have hardware that can make use of it
- introduce a check for nbp->flags & BR_FWD_OFFLOAD to optimize cache
line access
- reorder nbp_switchdev_frame_mark_accel() and br_handle_vlan() in
__br_forward()
- do not strip VLAN on egress if forwarding offload on VLAN-aware bridge
is being used
- propagate errors from .ndo_dfwd_add_station() if not EOPNOTSUPP
v2->v3:
- replace the solution based on .ndo_dfwd_add_station with a solution
based on switchdev_bridge_port_offload
- rename BR_FWD_OFFLOAD to BR_TX_FWD_OFFLOAD
v3->v4: rebase
v4->v5:
- make sure the static key is decremented on bridge port unoffload
- more function and variable renaming and comments for them:
br_switchdev_fwd_offload_used to br_switchdev_tx_fwd_offload
br_switchdev_accels_skb to br_switchdev_frame_uses_tx_fwd_offload
nbp_switchdev_frame_mark_tx_fwd to nbp_switchdev_frame_mark_tx_fwd_to_hwdom
nbp_switchdev_frame_mark_accel to nbp_switchdev_frame_mark_tx_fwd_offload
fwd_accel to tx_fwd_offload
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove devm_kfree of memory where VLAN entry to RVU PF mapping
info is saved. This will be freed anyway at driver exit.
Having this could result in warning from devm_kfree() if
the memory is not allocated due to errors in rvu_nix_block_init()
before nix_setup_txvlan().
Fixes: 9a946def26 ("octeontx2-af: Modify nix_vtag_cfg mailbox to support TX VTAG entries")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Starting with commit 4f2673b3a2 ("net: bridge: add helper to replay
port and host-joined mdb entries"), DSA has introduced some bridge
helpers that replay switchdev events (FDB/MDB/VLAN additions and
deletions) that can be lost by the switchdev drivers in a variety of
circumstances:
- an IP multicast group was host-joined on the bridge itself before any
switchdev port joined the bridge, leading to the host MDB entries
missing in the hardware database.
- during the bridge creation process, the MAC address of the bridge was
added to the FDB as an entry pointing towards the bridge device
itself, but with no switchdev ports being part of the bridge yet, this
local FDB entry would remain unknown to the switchdev hardware
database.
- a VLAN/FDB/MDB was added to a bridge port that is a LAG interface,
before any switchdev port joined that LAG, leading to the hardware
database missing those entries.
- a switchdev port left a LAG that is a bridge port, while the LAG
remained part of the bridge, and all FDB/MDB/VLAN entries remained
installed in the hardware database of the switchdev port.
Also, since commit 0d2cfbd41c ("net: bridge: ignore switchdev events
for LAG ports which didn't request replay"), DSA introduced a method,
based on a const void *ctx, to ensure that two switchdev ports under the
same LAG that is a bridge port do not see the same MDB/VLAN entry being
replayed twice by the bridge, once for every bridge port that joins the
LAG.
With so many ordering corner cases being possible, it seems unreasonable
to expect a switchdev driver writer to get it right from the first try.
Therefore, now that DSA has experimented with the bridge replay helpers
for a little bit, we can move the code to the bridge driver where it is
more readily available to all switchdev drivers.
To convert the switchdev object replay helpers from "pull mode" (where
the driver asks for them) to a "push mode" (where the bridge offers them
automatically), the biggest problem is that the bridge needs to be aware
when a switchdev port joins and leaves, even when the switchdev is only
indirectly a bridge port (for example when the bridge port is a LAG
upper of the switchdev).
Luckily, we already have a hook for that, in the form of the newly
introduced switchdev_bridge_port_offload() and
switchdev_bridge_port_unoffload() calls. These offer a natural place for
hooking the object addition and deletion replays.
Extend the above 2 functions with:
- pointers to the switchdev atomic notifier (for FDB replays) and the
blocking notifier (for MDB and VLAN replays).
- the "const void *ctx" argument required for drivers to be able to
disambiguate between which port is targeted, when multiple ports are
lowers of the same LAG that is a bridge port. Most of the drivers pass
NULL to this argument, except the ones that support LAG offload and have
the proper context check already in place in the switchdev blocking
notifier handler.
Also unexport the replay helpers, since nobody except the bridge calls
them directly now.
Note that:
(a) we abuse the terminology slightly, because FDB entries are not
"switchdev objects", but we count them as objects nonetheless.
With no direct way to prove it, I think they are not modeled as
switchdev objects because those can only be installed by the bridge
to the hardware (as opposed to FDB entries which can be propagated
in the other direction too). This is merely an abuse of terms, FDB
entries are replayed too, despite not being objects.
(b) the bridge does not attempt to sync port attributes to newly joined
ports, just the countable stuff (the objects). The reason for this
is simple: no universal and symmetric way to sync and unsync them is
known. For example, VLAN filtering: what to do on unsync, disable or
leave it enabled? Similarly, STP state, ageing timer, etc etc. What
a switchdev port does when it becomes standalone again is not really
up to the bridge's competence, and the driver should deal with it.
On the other hand, replaying deletions of switchdev objects can be
seen a matter of cleanup and therefore be treated by the bridge,
hence this patch.
We make the replay helpers opt-in for drivers, because they might not
bring immediate benefits for them:
- nbp_vlan_init() is called _after_ netdev_master_upper_dev_link(),
so br_vlan_replay() should not do anything for the new drivers on
which we call it. The existing drivers where there was even a slight
possibility for there to exist a VLAN on a bridge port before they
join it are already guarded against this: mlxsw and prestera deny
joining LAG interfaces that are members of a bridge.
- br_fdb_replay() should now notify of local FDB entries, but I patched
all drivers except DSA to ignore these new entries in commit
2c4eca3ef7 ("net: bridge: switchdev: include local flag in FDB
notifications"). Driver authors can lift this restriction as they
wish, and when they do, they can also opt into the FDB replay
functionality.
- br_mdb_replay() should fix a real issue which is described in commit
4f2673b3a2 ("net: bridge: add helper to replay port and host-joined
mdb entries"). However most drivers do not offload the
SWITCHDEV_OBJ_ID_HOST_MDB to see this issue: only cpsw and am65_cpsw
offload this switchdev object, and I don't completely understand the
way in which they offload this switchdev object anyway. So I'll leave
it up to these drivers' respective maintainers to opt into
br_mdb_replay().
So most of the drivers pass NULL notifier blocks for the replay helpers,
except:
- dpaa2-switch which was already acked/regression-tested with the
helpers enabled (and there isn't much of a downside in having them)
- ocelot which already had replay logic in "pull" mode
- DSA which already had replay logic in "pull" mode
An important observation is that the drivers which don't currently
request bridge event replays don't even have the
switchdev_bridge_port_{offload,unoffload} calls placed in proper places
right now. This was done to avoid unnecessary rework for drivers which
might never even add support for this. For driver writers who wish to
add replay support, this can be used as a tentative placement guide:
https://patchwork.kernel.org/project/netdevbpf/patch/20210720134655.892334-11-vladimir.oltean@nxp.com/
Cc: Vadym Kochan <vkochan@marvell.com>
Cc: Taras Chornyi <tchornyi@marvell.com>
Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
Cc: Lars Povlsen <lars.povlsen@microchip.com>
Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch
Signed-off-by: David S. Miller <davem@davemloft.net>
On reception of an skb, the bridge checks if it was marked as 'already
forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it
is, it assigns the source hardware domain of that skb based on the
hardware domain of the ingress port. Then during forwarding, it enforces
that the egress port must have a different hardware domain than the
ingress one (this is done in nbp_switchdev_allowed_egress).
Non-switchdev drivers don't report any physical switch id (neither
through devlink nor .ndo_get_port_parent_id), therefore the bridge
assigns them a hardware domain of 0, and packets coming from them will
always have skb->offload_fwd_mark = 0. So there aren't any restrictions.
Problems appear due to the fact that DSA would like to perform software
fallback for bonding and team interfaces that the physical switch cannot
offload.
+-- br0 ---+
/ / | \
/ / | \
/ | | bond0
/ | | / \
swp0 swp1 swp2 swp3 swp4
There, it is desirable that the presence of swp3 and swp4 under a
non-offloaded LAG does not preclude us from doing hardware bridging
beteen swp0, swp1 and swp2. The bandwidth of the CPU is often times high
enough that software bridging between {swp0,swp1,swp2} and bond0 is not
impractical.
But this creates an impossible paradox given the current way in which
port hardware domains are assigned. When the driver receives a packet
from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to
something.
- If we set it to 0, then the bridge will forward it towards swp1, swp2
and bond0. But the switch has already forwarded it towards swp1 and
swp2 (not to bond0, remember, that isn't offloaded, so as far as the
switch is concerned, ports swp3 and swp4 are not looking up the FDB,
and the entire bond0 is a destination that is strictly behind the
CPU). But we don't want duplicated traffic towards swp1 and swp2, so
it's not ok to set skb->offload_fwd_mark = 0.
- If we set it to 1, then the bridge will not forward the skb towards
the ports with the same switchdev mark, i.e. not to swp1, swp2 and
bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should
have forwarded the skb there.
So the real issue is that bond0 will be assigned the same hardware
domain as {swp0,swp1,swp2}, because the function that assigns hardware
domains to bridge ports, nbp_switchdev_add(), recurses through bond0's
lower interfaces until it finds something that implements devlink (calls
dev_get_port_parent_id with bool recurse = true). This is a problem
because the fact that bond0 can be offloaded by swp3 and swp4 in our
example is merely an assumption.
A solution is to give the bridge explicit hints as to what hardware
domain it should use for each port.
Currently, the bridging offload is very 'silent': a driver registers a
netdevice notifier, which is put on the netns's notifier chain, and
which sniffs around for NETDEV_CHANGEUPPER events where the upper is a
bridge, and the lower is an interface it knows about (one registered by
this driver, normally). Then, from within that notifier, it does a bunch
of stuff behind the bridge's back, without the bridge necessarily
knowing that there's somebody offloading that port. It looks like this:
ip link set swp0 master br0
|
v
br_add_if() calls netdev_master_upper_dev_link()
|
v
call_netdevice_notifiers
|
v
dsa_slave_netdevice_event
|
v
oh, hey! it's for me!
|
v
.port_bridge_join
What we do to solve the conundrum is to be less silent, and change the
switchdev drivers to present themselves to the bridge. Something like this:
ip link set swp0 master br0
|
v
br_add_if() calls netdev_master_upper_dev_link()
|
v bridge: Aye! I'll use this
call_netdevice_notifiers ^ ppid as the
| | hardware domain for
v | this port, and zero
dsa_slave_netdevice_event | if I got nothing.
| |
v |
oh, hey! it's for me! |
| |
v |
.port_bridge_join |
| |
+------------------------+
switchdev_bridge_port_offload(swp0, swp0)
Then stacked interfaces (like bond0 on top of swp3/swp4) would be
treated differently in DSA, depending on whether we can or cannot
offload them.
The offload case:
ip link set bond0 master br0
|
v
br_add_if() calls netdev_master_upper_dev_link()
|
v bridge: Aye! I'll use this
call_netdevice_notifiers ^ ppid as the
| | switchdev mark for
v | bond0.
dsa_slave_netdevice_event | Coincidentally (or not),
| | bond0 and swp0, swp1, swp2
v | all have the same switchdev
hmm, it's not quite for me, | mark now, since the ASIC
but my driver has already | is able to forward towards
called .port_lag_join | all these ports in hw.
for it, because I have |
a port with dp->lag_dev == bond0. |
| |
v |
.port_bridge_join |
for swp3 and swp4 |
| |
+------------------------+
switchdev_bridge_port_offload(bond0, swp3)
switchdev_bridge_port_offload(bond0, swp4)
And the non-offload case:
ip link set bond0 master br0
|
v
br_add_if() calls netdev_master_upper_dev_link()
|
v bridge waiting:
call_netdevice_notifiers ^ huh, switchdev_bridge_port_offload
| | wasn't called, okay, I'll use a
v | hwdom of zero for this one.
dsa_slave_netdevice_event : Then packets received on swp0 will
| : not be software-forwarded towards
v : swp1, but they will towards bond0.
it's not for me, but
bond0 is an upper of swp3
and swp4, but their dp->lag_dev
is NULL because they couldn't
offload it.
Basically we can draw the conclusion that the lowers of a bridge port
can come and go, so depending on the configuration of lowers for a
bridge port, it can dynamically toggle between offloaded and unoffloaded.
Therefore, we need an equivalent switchdev_bridge_port_unoffload too.
This patch changes the way any switchdev driver interacts with the
bridge. From now on, everybody needs to call switchdev_bridge_port_offload
and switchdev_bridge_port_unoffload, otherwise the bridge will treat the
port as non-offloaded and allow software flooding to other ports from
the same ASIC.
Note that these functions lay the ground for a more complex handshake
between switchdev drivers and the bridge in the future.
For drivers that will request a replay of the switchdev objects when
they offload and unoffload a bridge port (DSA, dpaa2-switch, ocelot), we
place the call to switchdev_bridge_port_unoffload() strategically inside
the NETDEV_PRECHANGEUPPER notifier's code path, and not inside
NETDEV_CHANGEUPPER. This is because the switchdev object replay helpers
need the netdev adjacency lists to be valid, and that is only true in
NETDEV_PRECHANGEUPPER.
Cc: Vadym Kochan <vkochan@marvell.com>
Cc: Taras Chornyi <tchornyi@marvell.com>
Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
Cc: Lars Povlsen <lars.povlsen@microchip.com>
Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch: regression
Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch
Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> # ocelot-switch
Signed-off-by: David S. Miller <davem@davemloft.net>
The character sequence ??! is a trigraph and causes the following
clang warning:
drivers/net/ethernet/marvell/mvneta.c:2604:39: warning: trigraph ignored [-Wtrigraphs]
Clean this by replacing it with single ?.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The documentation for Armada 8040 says:
Bit 2 Field InBandAnEn In-band Auto-Negotiation enable. ...
When <PortType> = 1 (1000BASE-X) this field must be set to 1.
We presently ignore whether userspace requests autonegotiation or not
through the ethtool ksettings interface. However, we have some network
interfaces that wish to do this. To offer a consistent API across
network interfaces, deny the ability to disable autonegotiation on
mvpp2 hardware when in 1000BASE-X and 2500BASE-X.
This means the only way to switch between 2500BASE-X and 1000BASE-X
on SFPs that support this will be:
# ethtool -s ethX advertise 0x20000006000 # 1000BASE-X Pause AsymPause
# ethtool -s ethX advertise 0xe000 # 2500BASE-X Pause AsymPause
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Marek Behún <kabel@kernel.org>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The documentation for Armada 38x says:
Bit 2 Field InBandAnEn In-band Auto-Negotiation enable. ...
When <PortType> = 1 (1000BASE-X) this field must be set to 1.
We presently ignore whether userspace requests autonegotiation or not
through the ethtool ksettings interface. However, we have some network
interfaces that wish to do this. To offer a consistent API across
network interfaces, deny the ability to disable autonegotiation on
mvneta hardware when in 1000BASE-X and 2500BASE-X.
This means the only way to switch between 2500BASE-X and 1000BASE-X
on SFPs that support this will be:
# ethtool -s ethX advertise 0x20000002000 # 1000BASE-X Pause
# ethtool -s ethX advertise 0xa000 # 2500BASE-X Pause
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As of now any communication between CGXs PFs and
their VFs within the system is possible only by
external switches sending packets back to the
system. This patch adds internal switching support.
Broadcast packet replication is not covered here.
RVU admin function (AF) maintains MAC addresses
of all interfaces in the system. When switching is
enabled, MCAM entries are allocated to install rules
such that packets with DMAC matching any of the
internal interface MAC addresses is punted back
into the system via the loopback channel.
On the receive side the default unicast rules
are modified to not check for ingress channel.
So any packet with matching DMAC irrespective of
which interface it is coming from will be forwarded
to the respective PF/VF interface.
The transmit side rules and default unicast rules
are updated if user changes MAC address of an interface.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AF till now only manages the allocation and freeing of
MCAM rules for other PF/VFs in system. To implement
L2 switching between all CGX mapped PF and VFs, AF
requires MCAM entries for DMAC rules for each PF and VF.
This patch modifies AF driver such that AF can also
allocate MCAM rules and install rules for other
PFs and VFs. All the checks like channel verification
for RX rules and PF_FUNC verification for TX rules are
relaxed in case AF is allocating or installing rules.
Also all the entry and counter to owner mappings are
set to NPC_MCAM_INVALID_MAP when they are free indicating
those are not allocated to AF nor PF/VFs.
This patch also ensures that AF allocated and installed
entries are displayed in debugfs.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For enabling VF-VF switching the packets egressing
out of CGX mapped VFs needed to be sent to LBK
so that same packets are received back to the system.
But the LBK link also needs to be enabled in addition
to a VF's mapped CGX_LMAC link otherwise hardware
raises send error interrupt indicating selected LBK
link is not enabled in NIX_AF_TL3_TL2X_LINKX_CFG register.
Hence this patch enables all LBK links in
TL3_TL2_LINKX_CFG registers.
Also to enable packet flow between PFs/VFs of NIX0
to PFs/VFs of NIX1(in 98xx silicon) the NPC TX DMAC
rules has to be installed such that rules must be hit
for any TX interface i.e., NIX0-TX or NIX1-TX provided
DMAC match creteria is met. Hence this patch changes the
behavior such that MCAM is programmed to match with any
NIX0/1-TX interface for TX rules.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current release - regressions:
- sock: fix parameter order in sock_setsockopt()
Current release - new code bugs:
- netfilter: nft_last:
- fix incorrect arithmetic when restoring last used
- honor NFTA_LAST_SET on restoration
Previous releases - regressions:
- udp: properly flush normal packet at GRO time
- sfc: ensure correct number of XDP queues; don't allow enabling the
feature if there isn't sufficient resources to Tx from any CPU
- dsa: sja1105: fix address learning getting disabled on the CPU port
- mptcp: addresses a rmem accounting issue that could keep packets
in subflow receive buffers longer than necessary, delaying
MPTCP-level ACKs
- ip_tunnel: fix mtu calculation for ETHER tunnel devices
- do not reuse skbs allocated from skbuff_fclone_cache in the napi
skb cache, we'd try to return them to the wrong slab cache
- tcp: consistently disable header prediction for mptcp
Previous releases - always broken:
- bpf: fix subprog poke descriptor tracking use-after-free
- ipv6:
- allocate enough headroom in ip6_finish_output2() in case
iptables TEE is used
- tcp: drop silly ICMPv6 packet too big messages to avoid
expensive and pointless lookups (which may serve as a DDOS
vector)
- make sure fwmark is copied in SYNACK packets
- fix 'disable_policy' for forwarded packets (align with IPv4)
- netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state
- netfilter: conntrack: do not mark RST in the reply direction coming
after SYN packet for an out-of-sync entry
- mptcp: cleanly handle error conditions with MP_JOIN and syncookies
- mptcp: fix double free when rejecting a join due to port mismatch
- validate lwtstate->data before returning from skb_tunnel_info()
- tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
- mt76: mt7921: continue to probe driver when fw already downloaded
- bonding: fix multiple issues with offloading IPsec to (thru?) bond
- stmmac: ptp: fix issues around Qbv support and setting time back
- bcmgenet: always clear wake-up based on energy detection
Misc:
- sctp: move 198 addresses from unusable to private scope
- ptp: support virtual clocks and timestamping
- openvswitch: optimize operation for key comparison
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDu3mMACgkQMUZtbf5S
Irsjxg//UwcPJMYFmXV+fGkEsWYe1Kf29FcUDEeANFtbltfAcIfZ0GoTbSDRnrVb
HcYAKcm4XRx5bWWdQrQsQq/yiLbnS/rSLc7VRB+uRHWRKl3eYcaUB2rnCXsxrjGw
wQJgOmztDCJS4BIky24iQpF/8lg7p/Gj2Ih532gh93XiYo612FrEJKkYb2/OQfYX
GkbnZ0kL2Y1SV+bhy6aT5azvhHKM4/3eA4fHeJ2p8e2gOZ5ni0vpX0xEzdzKOCd0
vwR/Wu3h/+2QuFYVcSsVguuM++JXACG8MAS/Tof78dtNM4a3kQxzqeh5Bv6IkfTu
rokENLq4pjNRy+nBAOeQZj8Jd0K0kkf/PN9WMdGQtplMoFhjjV25R6PeRrV9wwPo
peozIz2MuQo7Kfof1D+44h2foyLfdC28/Z0CvRbDpr5EHOfYynvBbrnhzIGdQp6V
xgftKTOdgz2Djgg8HiblZund1FA44OYerddVAASrIsnSFnIz1VLVQIsfV+GLBwwc
FawrIZ6WfIjzRSrDGOvDsbAQI47T/1jbaPJeK6XgjWkQmjEd6UtRWRZLYCxemQEw
4HP3sWC96BOehuD8ylipVE1oFqrxCiOB/fZxezXqjo8dSX3NLdak4cCHTHoW5SuZ
eEAxQRaBliKd+P7hoy9cZ57CAu3zUa8kijfM5QRlCAHF+zSxaPs=
=QFnb
-----END PGP SIGNATURE-----
Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski.
"Including fixes from bpf and netfilter.
Current release - regressions:
- sock: fix parameter order in sock_setsockopt()
Current release - new code bugs:
- netfilter: nft_last:
- fix incorrect arithmetic when restoring last used
- honor NFTA_LAST_SET on restoration
Previous releases - regressions:
- udp: properly flush normal packet at GRO time
- sfc: ensure correct number of XDP queues; don't allow enabling the
feature if there isn't sufficient resources to Tx from any CPU
- dsa: sja1105: fix address learning getting disabled on the CPU port
- mptcp: addresses a rmem accounting issue that could keep packets in
subflow receive buffers longer than necessary, delaying MPTCP-level
ACKs
- ip_tunnel: fix mtu calculation for ETHER tunnel devices
- do not reuse skbs allocated from skbuff_fclone_cache in the napi
skb cache, we'd try to return them to the wrong slab cache
- tcp: consistently disable header prediction for mptcp
Previous releases - always broken:
- bpf: fix subprog poke descriptor tracking use-after-free
- ipv6:
- allocate enough headroom in ip6_finish_output2() in case
iptables TEE is used
- tcp: drop silly ICMPv6 packet too big messages to avoid
expensive and pointless lookups (which may serve as a DDOS
vector)
- make sure fwmark is copied in SYNACK packets
- fix 'disable_policy' for forwarded packets (align with IPv4)
- netfilter: conntrack:
- do not renew entry stuck in tcp SYN_SENT state
- do not mark RST in the reply direction coming after SYN packet
for an out-of-sync entry
- mptcp: cleanly handle error conditions with MP_JOIN and syncookies
- mptcp: fix double free when rejecting a join due to port mismatch
- validate lwtstate->data before returning from skb_tunnel_info()
- tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
- mt76: mt7921: continue to probe driver when fw already downloaded
- bonding: fix multiple issues with offloading IPsec to (thru?) bond
- stmmac: ptp: fix issues around Qbv support and setting time back
- bcmgenet: always clear wake-up based on energy detection
Misc:
- sctp: move 198 addresses from unusable to private scope
- ptp: support virtual clocks and timestamping
- openvswitch: optimize operation for key comparison"
* tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
sfc: add logs explaining XDP_TX/REDIRECT is not available
sfc: ensure correct number of XDP queues
sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
net: fddi: fix UAF in fza_probe
net: dsa: sja1105: fix address learning getting disabled on the CPU port
net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
net: Use nlmsg_unicast() instead of netlink_unicast()
octeontx2-pf: Fix uninitialized boolean variable pps
ipv6: allocate enough headroom in ip6_finish_output2()
net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
net: bridge: multicast: fix MRD advertisement router port marking race
net: bridge: multicast: fix PIM hello router port marking race
net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
dsa: fix for_each_child.cocci warnings
virtio_net: check virtqueue_add_sgs() return value
mptcp: properly account bulk freed memory
selftests: mptcp: fix case multiple subflows limited by server
mptcp: avoid processing packet if a subflow reset
mptcp: fix syncookie process if mptcp can not_accept new subflow
...
In the case where act->id is FLOW_ACTION_POLICE and also
act->police.rate_bytes_ps > 0 or act->police.rate_pkt_ps is not > 0
the boolean variable pps contains an uninitialized value when
function otx2_tc_act_set_police is called. Fix this by initializing
pps to false.
Addresses-Coverity: ("Uninitialized scalar variable)"
Fixes: 68fbff68db ("octeontx2-pf: Add police action for TC flower")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when the call to otx2_mbox_alloc_msg_cgx_mac_addr_update fails
the error return variable rc is being assigned -ENOMEM and does not
return early. rc is then re-assigned and the error case is not handled
correctly. Fix this by returning -ENOMEM rather than assigning rc.
Addresses-Coverity: ("Unused value")
Fixes: 79d2be385e ("octeontx2-pf: offload DMAC filters to CGX/RPM block")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Always set skb_shared_info data structure in mvneta_swbm_add_rx_fragment
routine even if the fragment contains only the ethernet FCS.
Fixes: 039fbc47f9 ("net: mvneta: alloc skb_shared_info on the mvneta_rx_swbm stack")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Added option for per CPU threads to the hwlat tracer
- Have hwlat tracer handle hotplug CPUs
- New tracer: osnoise, that detects latency caused by interrupts, softirqs
and scheduling of other tasks.
- Added timerlat tracer that creates a thread and measures in detail what
sources of latency it has for wake ups.
- Removed the "success" field of the sched_wakeup trace event.
This has been hardcoded as "1" since 2015, no tooling should be looking
at it now. If one exists, we can revert this commit, fix that tool and
try to remove it again in the future.
- tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids.
- New boot command line option "tp_printk_stop", as tp_printk causes trace
events to write to console. When user space starts, this can easily live
lock the system. Having a boot option to stop just after boot up is
useful to prevent that from happening.
- Have ftrace_dump_on_oops boot command line option take numbers that match
the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops.
- Bootconfig clean ups, fixes and enhancements.
- New ktest script that tests bootconfig options.
- Add tracepoint_probe_register_may_exist() to register a tracepoint
without triggering a WARN*() if it already exists. BPF has a path from
user space that can do this. All other paths are considered a bug.
- Small clean ups and fixes
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYN8YPhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qhxLAP9Mo5hHv7Hg6W7Ddv77rThm+qclsMR/
yW0P+eJpMm4+xAD8Cq03oE1DimPK+9WZBKU5rSqAkqG6CjgDRw6NlIszzQQ=
=WEPR
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- Added option for per CPU threads to the hwlat tracer
- Have hwlat tracer handle hotplug CPUs
- New tracer: osnoise, that detects latency caused by interrupts,
softirqs and scheduling of other tasks.
- Added timerlat tracer that creates a thread and measures in detail
what sources of latency it has for wake ups.
- Removed the "success" field of the sched_wakeup trace event. This has
been hardcoded as "1" since 2015, no tooling should be looking at it
now. If one exists, we can revert this commit, fix that tool and try
to remove it again in the future.
- tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids.
- New boot command line option "tp_printk_stop", as tp_printk causes
trace events to write to console. When user space starts, this can
easily live lock the system. Having a boot option to stop just after
boot up is useful to prevent that from happening.
- Have ftrace_dump_on_oops boot command line option take numbers that
match the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops.
- Bootconfig clean ups, fixes and enhancements.
- New ktest script that tests bootconfig options.
- Add tracepoint_probe_register_may_exist() to register a tracepoint
without triggering a WARN*() if it already exists. BPF has a path
from user space that can do this. All other paths are considered a
bug.
- Small clean ups and fixes
* tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (49 commits)
tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
tracing: Simplify & fix saved_tgids logic
treewide: Add missing semicolons to __assign_str uses
tracing: Change variable type as bool for clean-up
trace/timerlat: Fix indentation on timerlat_main()
trace/osnoise: Make 'noise' variable s64 in run_osnoise()
tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
tracing: Fix spelling in osnoise tracer "interferences" -> "interference"
Documentation: Fix a typo on trace/osnoise-tracer
trace/osnoise: Fix return value on osnoise_init_hotplug_support
trace/osnoise: Make interval u64 on osnoise_main
trace/osnoise: Fix 'no previous prototype' warnings
tracing: Have osnoise_main() add a quiescent state for task rcu
seq_buf: Make trace_seq_putmem_hex() support data longer than 8
seq_buf: Fix overflow in seq_buf_putmem_hex()
trace/osnoise: Support hotplug operations
trace/hwlat: Support hotplug operations
trace/hwlat: Protect kdata->kthread with get/put_online_cpus
trace: Add timerlat tracer
trace: Add osnoise tracer
...
DMAC filtering can be achieved by either NPC MCAM rules or
CGX/RPM MAC filters. Currently we are achieving this by NPC
MCAM rules. This patch offloads DMAC filters to CGX/RPM MAC
filters instead of NPC MCAM rules. Offloading DMAC filter to
CGX/RPM block helps in reducing traffic to NPC block and
save MCAM rules
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MAC block supports 32 dmac filters which are logically
divided among all attached LMACS.
For example MAC block0 having one LMAC then maximum supported
filters are 32 where as MAC block1 having 4 enabled LMACS
them maximum supported filteres are 8 for each LMAC.
This patch adds mbox handlers to add/delete/update mac entry
in DMAC filter table.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current driver uses static LMTST region allocated by firmware.
This memory gets populated as PF/VF BAR2. RVU PF/VF driver ioremap
the memory as device memory for NIX/NPA operation. Since the memory
is mapped as device memory we see performance degration. To address
this issue this patch implements runtime memory allocation.
RVU PF/VF allocates memory during device probe and share the base
address with RVU AF. RVU AF then configure the LMT MAP table
accordingly.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch extends the lmtst_tbl_setup_req mbox to support run time
LMTST configuration.
RVU PF/VF and DPDK/ODP allocates a LMT region, creates a translation
entry for a device via VFIO IOCTLs.
This IOVA is shared with AF through above mbox. AF then uses
RVU_SMMU transulation Widget and gets PA for the IOVA and updates
the LMTtable entry for that device.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introducing a new mailbox to support updating lmt entries
and common lmt base address scheme i.e. multiple pcifuncs
can share lmt region to reduce L1 cache pressure for application.
Parameters passed to mailbox includes the primary pcifunc
value whose lmt regions will be shared by other secondary
pcifuncs. Here secondary pcifunc will be the one who is
calling the mailbox.
For example:
By default each pcifunc has its own LMT base address:
PCIFUNC1 LMT_BASE_ADDR A
PCIFUNC2 LMT_BASE_ADDR B
PCIFUNC3 LMT_BASE_ADDR C
PCIFUNC4 LMT_BASE_ADDR D
Application will choose PCIFUNC1 as base/primary pcifunc
and as and when other pcifunc(secondary pcifuncs) gets
probed, this mailbox will be called and LMTST table will
be updated as:
PCIFUNC1 LMT_BASE_ADDR A
PCIFUNC2 LMT_BASE_ADDR A
PCIFUNC3 LMT_BASE_ADDR A
PCIFUNC4 LMT_BASE_ADDR A
On FLR lmtst map table gets resetted to the default lmt
base addresses for all secondary pcifuncs.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The __assign_str macro has an unusual ending semicolon but the vast
majority of uses of the macro already have semicolon termination.
$ git grep -P '\b__assign_str\b' | wc -l
551
$ git grep -P '\b__assign_str\b.*;' | wc -l
480
Add semicolons to the __assign_str() uses without semicolon termination
and all the other uses without semicolon termination via additional defines
that are equivalent to __assign_str() with the eventual goal of removing
the semicolon from the __assign_str() macro definition.
Link: https://lore.kernel.org/lkml/1e068d21106bb6db05b735b4916bb420e6c9842a.camel@perches.com/
Link: https://lkml.kernel.org/r/48a056adabd8f70444475352f617914cef504a45.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-06-28
The following pull-request contains BPF updates for your *net-next* tree.
We've added 37 non-merge commits during the last 12 day(s) which contain
a total of 56 files changed, 394 insertions(+), 380 deletions(-).
The main changes are:
1) XDP driver RCU cleanups, from Toke Høiland-Jørgensen and Paul E. McKenney.
2) Fix bpf_skb_change_proto() IPv4/v6 GSO handling, from Maciej Żenczykowski.
3) Fix false positive kmemleak report for BPF ringbuf alloc, from Rustam Kovhaev.
4) Fix x86 JIT's extable offset calculation for PROBE_LDX NULL, from Ravi Bangoria.
5) Enable libbpf fallback probing with tracing under RHEL7, from Jonathan Edwards.
6) Clean up x86 JIT to remove unused cnt tracking from EMIT macro, from Jiri Olsa.
7) Netlink cleanups for libbpf to please Coverity, from Kumar Kartikeya Dwivedi.
8) Allow to retrieve ancestor cgroup id in tracing programs, from Namhyung Kim.
9) Fix lirc BPF program query to use user-provided prog_cnt, from Sean Young.
10) Add initial libbpf doc including generated kdoc for its API, from Grant Seltzer.
11) Make xdp_rxq_info_unreg_mem_model() more robust, from Jakub Kicinski.
12) Fix up bpfilter startup log-level to info level, from Gary Lin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In the case where the driver asks for a replay of a certain type of
event (port object or attribute) for a bridge port that is a LAG, it may
do so because this port has just joined the LAG.
But there might already be other switchdev ports in that LAG, and it is
preferable that those preexisting switchdev ports do not act upon the
replayed event.
The solution is to add a context to switchdev events, which is NULL most
of the time (when the bridge layer initiates the call) but which can be
set to a value controlled by the switchdev driver when a replay is
requested. The driver can then check the context to figure out if all
ports within the LAG should act upon the switchdev event, or just the
ones that match the context.
We have to modify all switchdev_handle_* helper functions as well as the
prototypes in the drivers that use these helpers too, because these
helpers hide the underlying struct switchdev_notifier_info from us and
there is no way to retrieve the context otherwise.
The context structure will be populated and used in later patches.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The newly implemented fwnode_mdbiobus_register turned out to be
problematic - in case the fwnode_/of_/acpi_mdio are built as
modules, a dependency cycle can be observed during the depmod phase of
modules_install, eg.:
depmod: ERROR: Cycle detected: fwnode_mdio -> of_mdio -> fwnode_mdio
depmod: ERROR: Found 2 modules in dependency cycles!
OR:
depmod: ERROR: Cycle detected: acpi_mdio -> fwnode_mdio -> acpi_mdio
depmod: ERROR: Found 2 modules in dependency cycles!
A possible solution could be to rework fwnode_mdiobus_register,
so that to merge the contents of acpi_mdiobus_register and
of_mdiobus_register. However feasible, such change would
be very intrusive and affect huge amount of the of_mdiobus_register
users.
Since there are currently 2 users of ACPI and MDIO
(xgmac_mdio and mvmdio), withdraw the fwnode_mdbiobus_register
and roll back to a simple 'if' condition in affected drivers.
Fixes: 62a6ef6a99 ("net: mdiobus: Introduce fwnode_mdbiobus_register()")
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mvneta and mvpp2 drivers have rcu_read_lock()/rcu_read_unlock() pairs
around XDP program invocations. However, the actual lifetime of the objects
referred by the XDP program invocation is longer, all the way through to
the call to xdp_do_flush(), making the scope of the rcu_read_lock() too
small. This turns out to be harmless because it all happens in a single
NAPI poll cycle (and thus under local_bh_disable()), but it makes the
rcu_read_lock() misleading.
Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/bpf/20210624160609.292325-13-toke@redhat.com
This is a preliminary patch to add hw csum hint support to
mvneta/mvpp2 xdp implementation
Tested-by: Matteo Croce <mcroce@linux.microsoft.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.
To avoid having memcpy() think a u64 "prof" is being written beyond,
adjust the prof member type by adding struct nix_bandprof_s to the union
to match the other structs. This silences the following future warning:
In file included from ./include/linux/string.h:253,
from ./include/linux/bitmap.h:10,
from ./include/linux/cpumask.h:12,
from ./arch/x86/include/asm/cpumask.h:5,
from ./arch/x86/include/asm/msr.h:11,
from ./arch/x86/include/asm/processor.h:22,
from ./arch/x86/include/asm/timex.h:5,
from ./include/linux/timex.h:65,
from ./include/linux/time32.h:13,
from ./include/linux/time.h:60,
from ./include/linux/stat.h:19,
from ./include/linux/module.h:13,
from drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:11:
In function '__fortify_memcpy_chk',
inlined from '__fortify_memcpy' at ./include/linux/fortify-string.h:310:2,
inlined from 'rvu_nix_blk_aq_enq_inst' at drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:910:5:
./include/linux/fortify-string.h:268:4: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); please use struct_group() [-Wattribute-warning]
268 | __write_overflow_field();
| ^~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:
...
else if (req->ctype == NIX_AQ_CTYPE_BANDPROF)
memcpy(&rsp->prof, ctx,
sizeof(struct nix_bandprof_s));
...
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Subbaraya Sundeep<sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'has_phy' field from struct mvpp2_port is no longer used.
Remove it.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the MDIO and phylink are supported in the ACPI
world, enable to use them in the mvpp2 driver. Ensure a backward
compatibility with the firmware whose ACPI description does
not contain the necessary elements for the proper phy handling
and fall back to relying on the link interrupts instead.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introducing ACPI support for the mvmdio driver by adding
acpi_match_table with two entries:
* "MRVL0100" for the SMI operation
* "MRVL0101" for the XSMI mode
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0571a753cb cancelled delayed work too late, keeping small
race. Cancel work sooner to close it completely.
Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
Fixes: 0571a753cb ("net: pxa168_eth: Fix a potential data race in pxa168_eth_remove")
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify the return expression in the rvu_npc_fs.c
Signed-off-by: dingsenjie <dingsenjie@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Introduce matchall filter support
- Add SPAN API to configure port mirroring.
- Add tc mirror action.
At this moment, only mirror (egress) action is supported.
Example:
tc filter ... action mirred egress mirror dev DEV
Co-developed-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Vadym Kochan <vkochan@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a spelling mistake in a dev_err message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added police action for ingress TC flower
hardware offload. With this rate limiting can be
done per flow. Since rate limiting is tied to
RQs in hardware the number of TC flower filters
with action as police is limited to number
of receive queues of the interface. Both bps
and pps modes are supported.
Examples to rate limit a flow:
$ ethtool -K eth0 hw-tc-offload on
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 parent ffff: protocol ip \
flower ip_proto udp dst_port 80 action \
police rate 100Mbit burst 32Kbit
$ tc filter add dev eth0 parent ffff: \
protocol ip flower dst_mac 5e:b2:34:ee:29:49 \
action police pkts_rate 5000 pkts_burst 2048
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies all netdev_err messages in
tc code to NL_SET_ERR_MSG_MOD. NL_SET_ERR_MSG_MOD
does not support format specifiers yet hence
netdev_err messages with only strings are modified.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TC_MATCHALL ingress ratelimiting offload support with POLICE
action for entire traffic coming into the interface.
Eg: To ratelimit ingress traffic to 100Mbps
$ ethtool -K eth0 hw-tc-offload on
$ tc qdisc add dev eth0 clsact
$ tc filter add dev eth0 ingress matchall skip_sw \
action police rate 100Mbit burst 32Kbit
To support this, a leaf level bandwidth profile is allocated and all
RQs' contexts used by this interface are updated to point to it.
And the leaf level bandwidth profile is configured with user specified
rate and burst sizes.
Co-developed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for dumping current resource status of bandwidth
profiles and contexts of allocated profiles via debugfs.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CN10K silicons supports hierarchial ingress packet ratelimiting.
There are 3 levels of profilers supported leaf, mid and top.
Ratelimiting is done after packet forwarding decision is taken
and a NIXLF's RQ is identified to DMA the packet. RQ's context
points to a leaf bandwidth profile which can be configured
to achieve desired ratelimit.
This patch adds logic for management of these bandwidth profiles
ie profile alloc, free, context update etc.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>