The Injection Frame Header and Extraction Frame Header that the switch
prepends to frames over the NPI port is also prepended to frames
delivered over the CPU port module's queues.
Let's unify the handling of the frame headers by making the ocelot
driver call some helpers exported by the DSA tagger. Among other things,
this allows us to get rid of the strange cpu_to_be32 when transmitting
the Injection Frame Header on ocelot, since the packing API uses
network byte order natively (when "quirks" is 0).
The comments above ocelot_gen_ifh talk about setting pop_cnt to 3, and
the cpu extraction queue mask to something, but the code doesn't do it,
so we don't do it either.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ocelot switch has been supporting LAG offload since its initial
commit, however felix could not make use of that, due to lack of a LAG
abstraction in DSA. Now that we have that, let's forward DSA's calls
towards the ocelot library, who will deal with setting up the bonding.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At present there is an issue when ocelot is offloading a bonding
interface, but one of the links of the physical ports goes down. Traffic
keeps being hashed towards that destination, and of course gets dropped
on egress.
Monitor the netdev notifier events emitted by the bonding driver for
changes in the physical state of lower interfaces, to determine which
ports are active and which ones are no longer.
Then extend ocelot_get_bond_mask to return either the configured bonding
interfaces, or the active ones, depending on a boolean argument. The
code that does rebalancing only needs to do so among the active ports,
whereas the bridge forwarding mask and the logical port IDs still need
to look at the permanently bonded ports.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since switchdev/DSA exposes network interfaces that fulfill many of the
same user space expectations that dedicated NICs do, it makes sense to
not deny bonding interfaces with a bonding policy that we cannot offload,
but instead allow the bonding driver to select the egress interface in
software.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are several error handling bugs in mscc_ocelot_init_ports(). I
went through the code, and carefully audited it and made fixes and
cleanups.
1) The ocelot_probe_port() function didn't have a mirror release function
so it was hard to follow. I created the ocelot_release_port()
function.
2) In the ocelot_probe_port() function, if the register_netdev() call
failed, then it lead to a double free_netdev(dev) bug. Fix this by
setting "ocelot->ports[port] = NULL" on the error path.
3) I was concerned that the "port" which comes from of_property_read_u32()
might be out of bounds so I added a check for that.
4) In the original code if ocelot_regmap_init() failed then the driver
tried to continue but I think that should be a fatal error.
5) If ocelot_probe_port() failed then the most recent devlink was leaked.
The fix for mostly came Vladimir Oltean. Get rid of "registered_ports"
and just set a bit in "devlink_ports_registered" to say when the
devlink port has been registered (and needs to be unregistered on
error). There are fewer than 32 ports so a u32 is large enough for
this purpose.
6) The error handling if the final ocelot_port_devlink_init() failed had
two problems. The "while (port-- >= 0)" loop should have been
"--port" pre-op instead of a post-op to avoid a buffer underflow.
The "if (!registered_ports[port])" condition was reversed leading to
resource leaks and double frees.
Fixes: 6c30384eb1 ("net: mscc: ocelot: register devlink ports")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/YBkXhqRxHtRGzSnJ@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using devlink-sb, we can configure 12/16 (the important 75%) of the
switch's controlling watermarks for congestion drops, and we can monitor
50% of the watermark occupancies (we can monitor the reservation
watermarks, but not the sharing watermarks, which are exposed as pool
sizes).
The following definitions can be made:
SB_BUF=0 # The devlink-sb for frame buffers
SB_REF=1 # The devlink-sb for frame references
POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances
# have one of these.
POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances
# have one of these.
Editing the hardware watermarks is done in the following way:
BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING
REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING
BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR
REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR
Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly
by modifying the corresponding pool size. By default, the pool size has
maximum size, so this can be skipped.
devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \
size 129840 thtype static
Since by default there is no buffer reservation, the above command has
maxed out BUF_COL_SHR_I(dp=0).
Configuring the per-port reservation watermark (P_RSRV) is done in the
following way:
devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \
pool $POOL_ING th 1000
The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this
command, the sharing watermarks are internally reconfigured with 1000
bytes less, i.e. from 129840 bytes to 128840 bytes.
Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in
the following way:
for tc in {0..7}; do
devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \
type ingress pool $POOL_ING \
th 3000
done
The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes.
The sharing watermarks are again reconfigured with 24000 bytes less.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is meant to be a gentle introduction into the world of watermarks
on ocelot. The code is placed in ocelot_devlink.c because it will be
integrated with devlink, even if it isn't right now.
My first step was intended to be to replicate the default configuration
of the congestion watermarks programatically, since they are now going
to be tuned by the user.
But after studying and understanding through trial and error how they
work, I now believe that the configuration used out of reset does not do
justice to the word "reservation", since the sum of all reservations
exceeds the total amount of resources (otherwise said, all reservations
cannot be fulfilled at the same time, which means that, contrary to the
reference manual, they don't guarantee anything).
As an example, here's a dump of the reservation watermarks for frame
buffers, for port 0 (for brevity, the ports 1-6 were omitted, but they
have the same configuration):
BUF_Q_RSRV_I(port 0, prio 0) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 1) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 2) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 3) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 4) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 5) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 6) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 7) = max 3000 bytes
Otherwise said, every port-tc has an ingress reservation of 3000 bytes,
and there are 7 ports in VSC9959 Felix (6 user ports and 1 CPU port).
Concentrating only on the ingress reservations, there are, in total,
8 [traffic classes] x 7 [ports] x 3000 [bytes] = 168,000 bytes of memory
reserved on ingress.
But, surprise, Felix only has 128 KB of packet buffer in total...
A similar thing happens with Seville, which has a larger packet buffer,
but also more ports, and the default configuration is also overcommitted.
This patch disables the (apparently) bogus reservations and moves all
resources to the shared area. This way, real reservations can be set up
by the user, using devlink-sb.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add devlink integration into the mscc_ocelot switchdev driver. All
physical ports (i.e. the unused ones as well) except the CPU port module
at ocelot->num_phys_ports are registered with devlink, and that requires
keeping the devlink_port structure outside struct ocelot_port_private,
since the latter has a 1:1 mapping with a struct net_device (which does
not exist for unused ports).
Since we use devlink_port_type_eth_set to link the devlink port to the
net_device, we can as well remove the .ndo_get_phys_port_name and
.ndo_get_port_parent_id implementations, since devlink takes care of
retrieving the port name and number automatically, once
.ndo_get_devlink_port is implemented.
Note that the felix DSA driver is already integrated with devlink by
default, since that is a thing that the DSA core takes care of. This is
the reason why these devlink stubs were put in ocelot_net.c and not in
the common library. It is also the reason why ocelot::devlink is a
pointer and not a full structure embedded inside struct ocelot: because
the mscc_ocelot driver allocates that by itself (as the container of
struct ocelot, in fact), but in the case of felix, it is DSA who
allocates the devlink, and felix just propagates the pointer towards
struct ocelot.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is a leftover of commit 69df578c5f ("net: mscc: ocelot: eliminate
confusion between CPU and NPI port") which renamed that function.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is one main difference in mscc_ocelot between IP multicast and L2
multicast. With IP multicast, destination ports are encoded into the
upper bytes of the multicast MAC address. Example: to deliver the
address 01:00:5E:11:22:33 to ports 3, 8, and 9, one would need to
program the address of 00:03:08:11:22:33 into hardware. Whereas for L2
multicast, the MAC table entry points to a Port Group ID (PGID), and
that PGID contains the port mask that the packet will be forwarded to.
As to why it is this way, no clue. My guess is that not all port
combinations can be supported simultaneously with the limited number of
PGIDs, and this was somehow an issue for IP multicast but not for L2
multicast. Anyway.
Prior to this change, the raw L2 multicast code was bogus, due to the
fact that there wasn't really any way to test it using the bridge code.
There were 2 issues:
- A multicast PGID was allocated for each MDB entry, but it wasn't in
fact programmed to hardware. It was dummy.
- In fact we don't want to reserve a multicast PGID for every single MDB
entry. That would be odd because we can only have ~60 PGIDs, but
thousands of MDB entries. So instead, we want to reserve a multicast
PGID for every single port combination for multicast traffic. And
since we can have 2 (or more) MDB entries delivered to the same port
group (and therefore PGID), we need to reference-count the PGIDs.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the mscc_ocelot_switch_lib is common between a pure switchdev and
a DSA driver, the procedure of retrieving a net_device for a certain
port index differs, as those are registered by their individual
front-ends.
Up to now that has been dealt with by always passing the port index to
the switch library, but now, we're going to need to work with net_device
pointers from the tc-flower offload, for things like indev, or mirred.
It is not desirable to refactor that, so let's make sure that the flower
offload core has the ability to translate between a net_device and a
port index properly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently Felix and Ocelot share the same bit layout in these per-port
registers, but Seville does not. So we need reg_fields for that.
Actually since these are per-port registers, we need to also specify the
number of ports, and register size per port, and use the regmap API for
multiple ports.
There's a more subtle point to be made about the other 2 register
fields:
- QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG
- QSYS_SWITCH_PORT_MODE_INGRESS_DROP_MODE
which we are not writing any longer, for 2 reasons:
- Using the previous API (ocelot_write_rix), we were only writing 1 for
Felix and Ocelot, which was their hardware-default value, and which
there wasn't any intention in changing.
- In the case of SCH_NEXT_CFG, in fact Seville does not have this
register field at all, and therefore, if we want to have common code
we would be required to not write to it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the moment, there are some minimal register differences between
VSC7514 Ocelot and VSC9959 Felix. To be precise, the PCS1G registers are
missing from Felix because it was integrated with an NXP PCS.
But with VSC9953 Seville (not yet introduced), the register differences
are more pronounced. The MAC registers are located at different offsets
within the DEV_GMII target. So we need to refactor the driver to keep a
regmap even for per-port registers. The callers of the ocelot_port_readl
and ocelot_port_writel were kept unchanged, only the implementation is
now more generic.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current procedure for installing a multicast address is hardcoded
for IPv4. But, in the ocelot hardware, there are 3 different procedures
for IPv4, IPv6 and for regular L2 multicast.
For IPv6 (33-33-xx-xx-xx-xx), it's the same as for IPv4
(01-00-5e-xx-xx-xx), except that the destination port mask is stuffed
into first 2 bytes of the MAC address except into first 3 bytes.
For plain Ethernet multicast, there's no port-in-address stuffing going
on, instead the DEST_IDX (pointer to PGID) is used there, just as for
unicast. So we have to use one of the nonreserved multicast PGIDs that
the hardware has allocated for this purpose.
This patch classifies the type of multicast address based on its first
bytes, then redirects to one of the 3 different hardware procedures.
Note that this gives us a really better way of redirecting PTP frames
sent at 01-1b-19-00-00-00 to the CPU. Previously, Yangbo Lu tried to add
a trapping rule for PTP EtherType but got a lot of pushback:
https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/
But right now, that isn't needed at all. The application stack (ptp4l)
does this for the PTP multicast addresses it's interested in (which are
configurable, and include 01-1b-19-00-00-00):
memset(&mreq, 0, sizeof(mreq));
mreq.mr_ifindex = index;
mreq.mr_type = PACKET_MR_MULTICAST;
mreq.mr_alen = MAC_LEN;
memcpy(mreq.mr_address, addr1, MAC_LEN);
err1 = setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP, &mreq,
sizeof(mreq));
Into the kernel, this translates into a dev_mc_add on the switch network
interfaces, and our drivers know that it means they should translate it
into a host MDB address (make the CPU port be the destination).
Previously, this was broken because all mdb addresses were treated as
IPv4 (which 01-1b-19-00-00-00 obviously is not).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the mdb hooks in felix and exports the mdb functions from
ocelot.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ocelot hardware library shouldn't contain too much net_device
specific code, since it is shared with DSA which abstracts that
structure away. So much as much of this code as possible into the
mscc_ocelot driver and outside of the common library.
We're making an exception for MDB and LAG code. That is not yet exported
to DSA, but when it will, most of the code that's already in ocelot.c
will remain there. So, there's no point in moving code to ocelot_net.c
just to move it back later.
We could have moved all net_device code to ocelot_vsc7514.c directly,
but let's operate under the assumption that if a new switchdev ocelot
driver gets added, it'll define its SoC-specific stuff in a new
ocelot_vsc*.c file and it'll reuse the rest of the code.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ocelot_regs.c actually shouldn't be part of the common library. It
describes the register map of the VSC7514 switch. The way ocelot
switches work, they'll have highly optimized register maps, so another
SoC will likely have the same registers but laid out completely
different in memory (so there's little room for reusing this structure).
So move it to ocelot_vsc7514.c instead.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Ocelot PTP clock driver had been embedded into ocelot.c driver.
It had supported basic gettime64/settime64/adjtime/adjfine functions
by now which were used by both Ocelot switch and Felix switch.
This patch is to move current ptp clock code out of ocelot.c driver
maintaining as a single ocelot_ptp.c.
For futher new features implementation, the common code could be put
in ocelot_ptp.c and the switch specific code should be in specific
switch driver. The interrupt implementation in SoC is different
between Ocelot and Felix.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To rehash a previous explanation given in commit 1c44ce560b ("net:
mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is
up"), the switch driver operates the in a mode where a single VLAN can
be transmitted as untagged on a particular egress port. That is the
"native VLAN on trunk port" use case.
The configuration for this native VLAN is driven in 2 ways:
- Set the egress port rewriter to strip the VLAN tag for the native
VID (as it is egress-untagged, after all).
- Configure the ingress port to drop untagged and priority-tagged
traffic, if there is no native VLAN. The intention of this setting is
that a trunk port with no native VLAN should not accept untagged
traffic.
Since both of the above configurations for the native VLAN should only
be done if VLAN awareness is requested, they are actually done from the
ocelot_port_vlan_filtering function, after the basic procedure of
toggling the VLAN awareness flag of the port.
But there's a problem with that simplistic approach: we are trying to
juggle with 2 independent variables from a single function:
- Native VLAN of the port - its value is held in port->vid.
- VLAN awareness state of the port - currently there are some issues
here, more on that later*.
The actual problem can be seen when enslaving the switch ports to a VLAN
filtering bridge:
0. The driver configures a pvid of zero for each port, when in
standalone mode. While the bridge configures a default_pvid of 1 for
each port that gets added as a slave to it.
1. The bridge calls ocelot_port_vlan_filtering with vlan_aware=true.
The VLAN-filtering-dependent portion of the native VLAN
configuration is done, considering that the native VLAN is 0.
2. The bridge calls ocelot_vlan_add with vid=1, pvid=true,
untagged=true. The native VLAN changes to 1 (change which gets
propagated to hardware).
3. ??? - nobody calls ocelot_port_vlan_filtering again, to reapply the
VLAN-filtering-dependent portion of the native VLAN configuration,
for the new native VLAN of 1. One can notice that after toggling "ip
link set dev br0 type bridge vlan_filtering 0 && ip link set dev br0
type bridge vlan_filtering 1", the new native VLAN finally makes it
through and untagged traffic finally starts flowing again. But
obviously that shouldn't be needed.
So it is clear that 2 independent variables need to both re-trigger the
native VLAN configuration. So we introduce the second variable as
ocelot_port->vlan_aware.
*Actually both the DSA Felix driver and the Ocelot driver already had
each its own variable:
- Ocelot: ocelot_port_private->vlan_aware
- Felix: dsa_port->vlan_filtering
but the common Ocelot library needs to work with a single, common,
variable, so there is some refactoring done to move the vlan_aware
property from the private structure into the common ocelot_port
structure.
Fixes: 97bb69e1e3 ("net: mscc: ocelot: break apart ocelot_vlan_port_apply")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Compared to other DSA switches, in the Ocelot cores, the RX filtering is
a much more important concern.
Firstly, the primary use case for Ocelot is non-DSA, so there isn't any
secondary Ethernet MAC [the DSA master's one] to implicitly drop frames
having a DMAC we are not interested in. So the switch driver itself
needs to install FDB entries towards the CPU port module (PGID_CPU) for
the MAC address of each switch port, in each VLAN installed on the port.
Every address that is not whitelisted is implicitly dropped. This is in
order to achieve a behavior similar to N standalone net devices.
Secondly, even in the secondary use case of DSA, such as illustrated by
Felix with the NPI port mode, that secondary Ethernet MAC is present,
but its RX filter is bypassed. This is because the DSA tags themselves
are placed before Ethernet, so the DMAC that the switch ports see is
not seen by the DSA master too (since it's shifter to the right).
So RX filtering is pretty important. A good RX filter won't bother the
CPU in case the switch port receives a frame that it's not interested
in, and there exists no other line of defense.
Ocelot is pretty strict when it comes to RX filtering: non-IP multicast
and broadcast traffic is allowed to go to the CPU port module, but
unknown unicast isn't. This means that traffic reception for any other
MAC addresses than the ones configured on each switch port net device
won't work. This includes use cases such as macvlan or bridging with a
non-Ocelot (so-called "foreign") interface. But this seems to be fine
for the scenarios that the Linux system embedded inside an Ocelot switch
is intended for - it is simply not interested in unknown unicast
traffic, as explained in Allan Nielsen's presentation [0].
On the other hand, the Felix DSA switch is integrated in more
general-purpose Linux systems, so it can't afford to drop that sort of
traffic in hardware, even if it will end up doing so later, in software.
Actually, unknown unicast means more for Felix than it does for Ocelot.
Felix doesn't attempt to perform the whitelisting of switch port MAC
addresses towards PGID_CPU at all, mainly because it is too complicated
to be feasible: while the MAC addresses are unique in Ocelot, by default
in DSA all ports are equal and inherited from the DSA master. This adds
into account the question of reference counting MAC addresses (delayed
ocelot_mact_forget), not to mention reference counting for the VLAN IDs
that those MAC addresses are installed in. This reference counting
should be done in the DSA core, and the fact that it wasn't needed so
far is due to the fact that the other DSA switches don't have the DSA
tag placed before Ethernet, so the DSA master is able to whitelist the
MAC addresses in hardware.
So this means that even regular traffic termination on a Felix switch
port happens through flooding (because neither Felix nor Ocelot learn
source MAC addresses from CPU-injected frames).
So far we've explained that whitelisting towards PGID_CPU:
- helps to reduce the likelihood of spamming the CPU with frames it
won't process very far anyway
- is implemented in the ocelot driver
- is sufficient for the ocelot use cases
- is not feasible in DSA
- breaks use cases in DSA, in the current status (whitelisting enabled
but no MAC address whitelisted)
So the proposed patch allows unknown unicast frames to be sent to the
CPU port module. This is done for the Felix DSA driver only, as Ocelot
seems to be happy without it.
[0]: https://www.youtube.com/watch?v=B1HhxEcU7Jg
Suggested-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the Felix DSA driver is implementing its own PHYLINK instance due
to SoC differences, it needs access to the few registers that are
common, mainly for flow control.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Ocelot switchdev driver and the Felix DSA one need it for different
reasons. Felix (or at least the VSC9959 instantiation in NXP LS1028A) is
integrated with the traditional NXP Layerscape PCS design which does not
support runtime configuration of SerDes protocol. So it needs to
pre-validate the phy-mode from the device tree and prevent PHYLINK from
attempting to change it. For this, it needs to cache it in a private
variable.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The method getting TX timestamp by reading timestamp FIFO and
matching skbs list is common for DSA Felix driver too.
So move code out of ocelot_board.c, convert to use
ocelot_get_txtstamp() function and export it.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Felix DSA driver needs to write to SYS_RAM_INIT_RAM_INIT for its own
chip initialization process.
Also update the MAINTAINERS file such that the headers exported by the
ocelot driver are under the same maintainers' umbrella as the driver
itself.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We will be registering another switch driver based on ocelot, which
lives under drivers/net/dsa.
Make sure the Felix DSA front-end has the necessary abstractions to
implement a new Ocelot driver instantiation. This includes the function
prototypes for implementing DSA callbacks.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Felix switch has a different reset procedure, so a function pointer
needs to be created and added to the ocelot_ops structure.
The reset procedure has been moved into ocelot_init.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using the NPI port, the DSA tag is passed through Ethernet, so the
switch's MAC needs to accept it as it comes from the DSA master. Increase
the MTU on the external CPU port to account for the length of the
injection header.
Without this patch, MTU-sized frames are dropped by the switch's CPU
port on xmit, which is especially obvious in TCP sessions.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This constant will be used in a future patch to increase the MTU on NPI
ports, and will also be used in the tagger driver for Felix.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The adjust_link routine should be generic enough to be (re)used by
any SoC that integrates a switch core compatible with the Ocelot
core switch driver. Currently all configurations are generic except
for the PCS settings that are SoC specific. Move these out to the
Ocelot SoC/board instance.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let's make this ioremap and regmap init code common. It should not
be platform dependent as it should be usable by PCI devices too.
Use better names where necessary to avoid clashes.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the places that configure routing destinations for the CPU port
have been marked as such, allow callers to specify their own CPU port
that is different than ocelot->num_phys_ports. A user will be the Felix
DSA driver, where the CPU port is one of the physical ports (NPI mode).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ocelot and ocelot_port structures will be used by a new DSA driver,
so the ocelot_board.c file will have to allocate and work with a private
structure (ocelot_port_private), which embeds the generic struct
ocelot_port. This is because in DSA, at least one interface does not
have a net_device, and the DSA driver API does not interact with that
anyway.
The ocelot_port structure is equivalent to dsa_port, and ocelot to
dsa_switch. The members of ocelot_port which have an equivalent in
dsa_port (such as dp->vlan_filtering) have been moved to
ocelot_port_private.
We want to enforce the coding convention that "ocelot_port" refers to
the structure, and "port" refers to the integer index. One can retrieve
the structure at any time from ocelot->ports[port].
The patch is large but only contains variable renaming and mechanical
movement of fields from one structure to another.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "read-modify-write register index" function is declared with a
confusing prototype: the "mask" and "reg" arguments are swapped.
Fortunately, this does not affect callers so far. Both arguments are
u32, and the wrapper macros (ocelot_rmw_ix etc) have the arguments in
the correct order (the one from ocelot_io.c).
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for PTP Hardware Clock (PHC) to the Ocelot
switch for both PTP 1-step and 2-step modes.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In struct frame_info, the cpuq member is never used. This cosmetic patch
removes it from the structure, and from the parsing of the frame header
as it's only set but never used.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for using the PTP register range, and adds a
description of its registers. This bank is used when configuring PTP.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ACL support using the TCAM. Using ACL it is possible to create rules
in hardware to filter/redirect frames.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The phylink conflict was between a bug fix by Russell King
to make sure we have a consistent PHY interface mode, and
a change in net-next to pull some code in phylink_resolve()
into the helper functions phylink_mac_link_{up,down}()
On the dp83867 side it's mostly overlapping changes, with
the 'net' side removing a condition that was supposed to
trigger for RGMII but because of how it was coded never
actually could trigger.
Signed-off-by: David S. Miller <davem@davemloft.net>
Hardware offload of matchall classifier and police action are now
supported via the tc command.
Supported police parameters are: rate and burst.
Example:
Add:
tc qdisc add dev eth3 handle ffff: ingress
tc filter add dev eth3 parent ffff: prio 1 handle 2 \
matchall skip_sw \
action police rate 100Mbit burst 10000
Show:
tc -s -d qdisc show dev eth3
tc -s -d filter show dev eth3 ingress
Delete:
tc filter del dev eth3 parent ffff: prio 1
tc qdisc del dev eth3 handle ffff: ingress
Signed-off-by: Joergen Andreasen <joergen.andreasen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doing kmalloc in atomic context is always an issue,
more so for a list that can grow significantly.
Turns out that the driver only uses the duplicated
list of multicast mac addresses to keep track of
what addresses to delete from h/w before committing
the new list from kernel to h/w back again via set_rx_mode,
every time this list gets updated by the kernel.
Given that the h/w knows how to add and delete mac addresses
based on the mac address value alone, __dev_mc_sync should be
the much better choice of kernel API for these operations
avoiding the considerable overhead of maintaining a duplicated
list in the driver.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following patches will change the way we communicate setting a port's
attribute and use notifiers to perform those tasks.
Ocelot does not currently have an atomic notifier registered for
switchdev events, so we need to register one in order to deal with
atomic context SWITCHDEV_PORT_ATTR_SET events.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following patches will change the way of distributing port object
changes from a switchdev operation to a switchdev notifier. The
switchdev code currently recursively descends through layers of lower
devices, eventually calling the op on a front-panel port device. The
notifier will instead be sent referencing the bridge port device, which
may be a stacking device that's one of front-panel ports uppers, or a
completely unrelated device.
Dispatch the new events to ocelot_port_obj_add() resp. _del() to
maintain the same behavior that the switchdev operation based code
currently has. Pass through switchdev_handle_port_obj_add() / _del() to
handle the recursive descend, because Ocelot supports LAG uppers.
Register to the new switchdev blocking notifier chain to get the new
events when they start getting distributed.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, the SerDes muxing was hardcoded to a given mode in the MAC
controller driver. Now, the SerDes muxing is configured within the
Device Tree and is enforced in the MAC controller driver so we can have
a lot of different SerDes configurations.
Make use of the SerDes PHYs in the MAC controller to set up the SerDes
according to the SerDes<->switch port mapping and the communication mode
with the Ethernet PHY.
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since HSIO address space can be accessed by different drivers, let's
simplify the register address definitions so that it can be easily used
by all drivers and put the register address definition in the
include/soc/mscc/ocelot_hsio.h header file.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since HSIO address space can be used by different drivers (PLL, SerDes
muxing, temperature sensor), let's move it somewhere it can be included
by all drivers.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add link aggregation hardware offload support for Ocelot.
ocelot_get_link_ksettings() is not great but it does work until the driver
is reworked to switch to phylink.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a driver for Microsemi Ocelot Ethernet switch support.
This makes two modules:
mscc_ocelot_common handles all the common features that doesn't depend on
how the switch is integrated in the SoC. Currently, it handles offloading
bridging to the hardware. ocelot_io.c handles register accesses. This is
unfortunately needed because the register layout is packed and then depends
on the number of ports available on the switch. The register definition
files are automatically generated.
ocelot_board handles the switch integration on the SoC and on the board.
Frame injection and extraction to/from the CPU port is currently done using
register accesses which is quite slow. DMA is possible but the port is not
able to absorb the whole switch bandwidth.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>