Files
linux/drivers/net/ethernet/mscc/ocelot_vsc7514.c

656 lines
18 KiB
C
Raw Normal View History

// SPDX-License-Identifier: (GPL-2.0 OR MIT)
/*
* Microsemi Ocelot Switch driver
*
* Copyright (c) 2017 Microsemi Corporation
*/
#include <linux/dsa/ocelot.h>
#include <linux/interrupt.h>
#include <linux/iopoll.h>
#include <linux/module.h>
#include <linux/of_net.h>
#include <linux/netdevice.h>
net: mscc: ocelot: convert to phylink The felix DSA driver, which is a wrapper over the same hardware class as ocelot, is integrated with phylink, but ocelot is using the plain PHY library. It makes sense to bring together the two implementations, which is what this patch achieves. This is a large patch and hard to break up, but it does the following: The existing ocelot_adjust_link writes some registers, and felix_phylink_mac_link_up writes some registers, some of them are common, but both functions write to some registers to which the other doesn't. The main reasons for this are: - Felix switches so far have used an NXP PCS so they had no need to write the PCS1G registers that ocelot_adjust_link writes - Felix switches have the MAC fixed at 1G, so some of the MAC speed changes actually break the link and must be avoided. The naming conventions for the functions introduced in this patch are: - vsc7514_phylink_{mac_config,validate} are specific to the Ocelot instantiations and placed in ocelot_net.c which is built only for the ocelot switchdev driver. - ocelot_phylink_mac_link_{up,down} are shared between the ocelot switchdev driver and the felix DSA driver (they are put in the common lib). One by one, the registers written by ocelot_adjust_link are: DEV_MAC_MODE_CFG - felix_phylink_mac_link_up had no need to write this register since its out-of-reset value was fine and did not need changing. The write is moved to the common ocelot_phylink_mac_link_up and on felix it is guarded by a quirk bit that makes the written value identical with the out-of-reset one DEV_PORT_MISC - runtime invariant, was moved to vsc7514_phylink_mac_config PCS1G_MODE_CFG - same as above PCS1G_SD_CFG - same as above PCS1G_CFG - same as above PCS1G_ANEG_CFG - same as above PCS1G_LB_CFG - same as above DEV_MAC_ENA_CFG - both ocelot_adjust_link and ocelot_port_disable touched this. felix_phylink_mac_link_{up,down} also do. We go with what felix does and put it in ocelot_phylink_mac_link_up. DEV_CLOCK_CFG - ocelot_adjust_link and felix_phylink_mac_link_up both write this, but to different values. Move to the common ocelot_phylink_mac_link_up and make sure via the quirk that the old values are preserved for both. ANA_PFC_PFC_CFG - ocelot_adjust_link wrote this, felix_phylink_mac_link_up did not. Runtime invariant, speed does not matter since PFC is disabled via the RX_PFC_ENA bits which are cleared. Move to vsc7514_phylink_mac_config. QSYS_SWITCH_PORT_MODE_PORT_ENA - both ocelot_adjust_link and felix_phylink_mac_link_{up,down} wrote this. Ocelot also wrote this register from ocelot_port_disable. Keep what felix did, move in ocelot_phylink_mac_link_{up,down} and delete ocelot_port_disable. ANA_POL_FLOWC - same as above SYS_MAC_FC_CFG - same as above, except slight behavior change. Whereas ocelot always enabled RX and TX flow control, felix listened to phylink (for the most part, at least - see the 2500base-X comment). The registers which only felix_phylink_mac_link_up wrote are: SYS_PAUSE_CFG_PAUSE_ENA - this is why I am not sure that flow control worked on ocelot. Not it should, since the code is shared with felix where it does. ANA_PORT_PORT_CFG - this is a Frame Analyzer block register, phylink should be the one touching them, deleted. Other changes: - The old phylib registration code was in mscc_ocelot_init_ports. It is hard to work with 2 levels of indentation already in, and with hard to follow teardown logic. The new phylink registration code was moved inside ocelot_probe_port(), right between alloc_etherdev() and register_netdev(). It could not be done before (=> outside of) ocelot_probe_port() because ocelot_probe_port() allocates the struct ocelot_port which we then use to assign ocelot_port->phy_mode to. It is more preferable to me to have all PHY handling logic inside the same function. - On the same topic: struct ocelot_port_private :: serdes is only used in ocelot_port_open to set the SERDES protocol to Ethernet. This is logically a runtime invariant and can be done just once, when the port registers with phylink. We therefore don't even need to keep the serdes reference inside struct ocelot_port_private, or to use the devm variant of of_phy_get(). - Phylink needs a valid phy-mode for phylink_create() to succeed, and the existing device tree bindings in arch/mips/boot/dts/mscc/ocelot_pcb120.dts don't define one for the internal PHY ports. So we patch PHY_INTERFACE_MODE_NA into PHY_INTERFACE_MODE_INTERNAL. - There was a strategically placed: switch (priv->phy_mode) { case PHY_INTERFACE_MODE_NA: continue; which made the code skip the serdes initialization for the internal PHY ports. Frankly that is not all that obvious, so now we explicitly initialize the serdes under an "if" condition and not rely on code jumps, so everything is clearer. - There was a write of OCELOT_SPEED_1000 to DEV_CLOCK_CFG for QSGMII ports. Since that is in fact the default value for the register field DEV_CLOCK_CFG_LINK_SPEED, I can only guess the intention was to clear the adjacent fields, MAC_TX_RST and MAC_RX_RST, aka take the port out of reset, which does match the comment. I don't even want to know why this code is placed there, but if there is indeed an issue that all ports that share a QSGMII lane must all be up, then this logic is already buggy, since mscc_ocelot_init_ports iterates using for_each_available_child_of_node, so nobody prevents the user from putting a 'status = "disabled";' for some QSGMII ports which would break the driver's assumption. In any case, in the eventuality that I'm right, we would have yet another issue if ocelot_phylink_mac_link_down would reset those ports and that would be forbidden, so since the ocelot_adjust_link logic did not do that (maybe for a reason), add another quirk to preserve the old logic. The ocelot driver teardown goes through all ports in one fell swoop. When initialization of one port fails, the ocelot->ports[port] pointer for that is reset to NULL, and teardown is done only for non-NULL ports, so there is no reason to do partial teardowns, let the central mscc_ocelot_release_ports() do its job. Tested bind, unbind, rebind, link up, link down, speed change on mock-up hardware (modified the driver to probe on Felix VSC9959). Also regression tested the felix DSA driver. Could not test the Ocelot specific bits (PCS1G, SERDES, device tree bindings). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-15 04:47:48 +03:00
#include <linux/phylink.h>
#include <linux/of_mdio.h>
#include <linux/of_platform.h>
#include <linux/mfd/syscon.h>
#include <linux/skbuff.h>
#include <net/switchdev.h>
#include <soc/mscc/ocelot_vcap.h>
#include <soc/mscc/ocelot_hsio.h>
#include <soc/mscc/vsc7514_regs.h>
#include "ocelot_fdma.h"
#include "ocelot.h"
#define VSC7514_VCAP_POLICER_BASE 128
#define VSC7514_VCAP_POLICER_MAX 191
#define MEM_INIT_SLEEP_US 1000
#define MEM_INIT_TIMEOUT_US 100000
static const u32 *ocelot_regmap[TARGET_MAX] = {
[ANA] = vsc7514_ana_regmap,
[QS] = vsc7514_qs_regmap,
[QSYS] = vsc7514_qsys_regmap,
[REW] = vsc7514_rew_regmap,
[SYS] = vsc7514_sys_regmap,
[S0] = vsc7514_vcap_regmap,
[S1] = vsc7514_vcap_regmap,
[S2] = vsc7514_vcap_regmap,
[PTP] = vsc7514_ptp_regmap,
[DEV_GMII] = vsc7514_dev_gmii_regmap,
};
static const struct reg_field ocelot_regfields[REGFIELD_MAX] = {
[ANA_ADVLEARN_VLAN_CHK] = REG_FIELD(ANA_ADVLEARN, 11, 11),
[ANA_ADVLEARN_LEARN_MIRROR] = REG_FIELD(ANA_ADVLEARN, 0, 10),
[ANA_ANEVENTS_MSTI_DROP] = REG_FIELD(ANA_ANEVENTS, 27, 27),
[ANA_ANEVENTS_ACLKILL] = REG_FIELD(ANA_ANEVENTS, 26, 26),
[ANA_ANEVENTS_ACLUSED] = REG_FIELD(ANA_ANEVENTS, 25, 25),
[ANA_ANEVENTS_AUTOAGE] = REG_FIELD(ANA_ANEVENTS, 24, 24),
[ANA_ANEVENTS_VS2TTL1] = REG_FIELD(ANA_ANEVENTS, 23, 23),
[ANA_ANEVENTS_STORM_DROP] = REG_FIELD(ANA_ANEVENTS, 22, 22),
[ANA_ANEVENTS_LEARN_DROP] = REG_FIELD(ANA_ANEVENTS, 21, 21),
[ANA_ANEVENTS_AGED_ENTRY] = REG_FIELD(ANA_ANEVENTS, 20, 20),
[ANA_ANEVENTS_CPU_LEARN_FAILED] = REG_FIELD(ANA_ANEVENTS, 19, 19),
[ANA_ANEVENTS_AUTO_LEARN_FAILED] = REG_FIELD(ANA_ANEVENTS, 18, 18),
[ANA_ANEVENTS_LEARN_REMOVE] = REG_FIELD(ANA_ANEVENTS, 17, 17),
[ANA_ANEVENTS_AUTO_LEARNED] = REG_FIELD(ANA_ANEVENTS, 16, 16),
[ANA_ANEVENTS_AUTO_MOVED] = REG_FIELD(ANA_ANEVENTS, 15, 15),
[ANA_ANEVENTS_DROPPED] = REG_FIELD(ANA_ANEVENTS, 14, 14),
[ANA_ANEVENTS_CLASSIFIED_DROP] = REG_FIELD(ANA_ANEVENTS, 13, 13),
[ANA_ANEVENTS_CLASSIFIED_COPY] = REG_FIELD(ANA_ANEVENTS, 12, 12),
[ANA_ANEVENTS_VLAN_DISCARD] = REG_FIELD(ANA_ANEVENTS, 11, 11),
[ANA_ANEVENTS_FWD_DISCARD] = REG_FIELD(ANA_ANEVENTS, 10, 10),
[ANA_ANEVENTS_MULTICAST_FLOOD] = REG_FIELD(ANA_ANEVENTS, 9, 9),
[ANA_ANEVENTS_UNICAST_FLOOD] = REG_FIELD(ANA_ANEVENTS, 8, 8),
[ANA_ANEVENTS_DEST_KNOWN] = REG_FIELD(ANA_ANEVENTS, 7, 7),
[ANA_ANEVENTS_BUCKET3_MATCH] = REG_FIELD(ANA_ANEVENTS, 6, 6),
[ANA_ANEVENTS_BUCKET2_MATCH] = REG_FIELD(ANA_ANEVENTS, 5, 5),
[ANA_ANEVENTS_BUCKET1_MATCH] = REG_FIELD(ANA_ANEVENTS, 4, 4),
[ANA_ANEVENTS_BUCKET0_MATCH] = REG_FIELD(ANA_ANEVENTS, 3, 3),
[ANA_ANEVENTS_CPU_OPERATION] = REG_FIELD(ANA_ANEVENTS, 2, 2),
[ANA_ANEVENTS_DMAC_LOOKUP] = REG_FIELD(ANA_ANEVENTS, 1, 1),
[ANA_ANEVENTS_SMAC_LOOKUP] = REG_FIELD(ANA_ANEVENTS, 0, 0),
[ANA_TABLES_MACACCESS_B_DOM] = REG_FIELD(ANA_TABLES_MACACCESS, 18, 18),
[ANA_TABLES_MACTINDX_BUCKET] = REG_FIELD(ANA_TABLES_MACTINDX, 10, 11),
[ANA_TABLES_MACTINDX_M_INDEX] = REG_FIELD(ANA_TABLES_MACTINDX, 0, 9),
[QSYS_TIMED_FRAME_ENTRY_TFRM_VLD] = REG_FIELD(QSYS_TIMED_FRAME_ENTRY, 20, 20),
[QSYS_TIMED_FRAME_ENTRY_TFRM_FP] = REG_FIELD(QSYS_TIMED_FRAME_ENTRY, 8, 19),
[QSYS_TIMED_FRAME_ENTRY_TFRM_PORTNO] = REG_FIELD(QSYS_TIMED_FRAME_ENTRY, 4, 7),
[QSYS_TIMED_FRAME_ENTRY_TFRM_TM_SEL] = REG_FIELD(QSYS_TIMED_FRAME_ENTRY, 1, 3),
[QSYS_TIMED_FRAME_ENTRY_TFRM_TM_T] = REG_FIELD(QSYS_TIMED_FRAME_ENTRY, 0, 0),
[SYS_RESET_CFG_CORE_ENA] = REG_FIELD(SYS_RESET_CFG, 2, 2),
[SYS_RESET_CFG_MEM_ENA] = REG_FIELD(SYS_RESET_CFG, 1, 1),
[SYS_RESET_CFG_MEM_INIT] = REG_FIELD(SYS_RESET_CFG, 0, 0),
/* Replicated per number of ports (12), register size 4 per port */
[QSYS_SWITCH_PORT_MODE_PORT_ENA] = REG_FIELD_ID(QSYS_SWITCH_PORT_MODE, 14, 14, 12, 4),
[QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG] = REG_FIELD_ID(QSYS_SWITCH_PORT_MODE, 11, 13, 12, 4),
[QSYS_SWITCH_PORT_MODE_YEL_RSRVD] = REG_FIELD_ID(QSYS_SWITCH_PORT_MODE, 10, 10, 12, 4),
[QSYS_SWITCH_PORT_MODE_INGRESS_DROP_MODE] = REG_FIELD_ID(QSYS_SWITCH_PORT_MODE, 9, 9, 12, 4),
[QSYS_SWITCH_PORT_MODE_TX_PFC_ENA] = REG_FIELD_ID(QSYS_SWITCH_PORT_MODE, 1, 8, 12, 4),
[QSYS_SWITCH_PORT_MODE_TX_PFC_MODE] = REG_FIELD_ID(QSYS_SWITCH_PORT_MODE, 0, 0, 12, 4),
[SYS_PORT_MODE_DATA_WO_TS] = REG_FIELD_ID(SYS_PORT_MODE, 5, 6, 12, 4),
[SYS_PORT_MODE_INCL_INJ_HDR] = REG_FIELD_ID(SYS_PORT_MODE, 3, 4, 12, 4),
[SYS_PORT_MODE_INCL_XTR_HDR] = REG_FIELD_ID(SYS_PORT_MODE, 1, 2, 12, 4),
[SYS_PORT_MODE_INCL_HDR_ERR] = REG_FIELD_ID(SYS_PORT_MODE, 0, 0, 12, 4),
[SYS_PAUSE_CFG_PAUSE_START] = REG_FIELD_ID(SYS_PAUSE_CFG, 10, 18, 12, 4),
[SYS_PAUSE_CFG_PAUSE_STOP] = REG_FIELD_ID(SYS_PAUSE_CFG, 1, 9, 12, 4),
[SYS_PAUSE_CFG_PAUSE_ENA] = REG_FIELD_ID(SYS_PAUSE_CFG, 0, 1, 12, 4),
};
static const struct ocelot_stat_layout ocelot_stats_layout[OCELOT_NUM_STATS] = {
OCELOT_COMMON_STATS,
};
static void ocelot_pll5_init(struct ocelot *ocelot)
{
/* Configure PLL5. This will need a proper CCF driver
* The values are coming from the VTSS API for Ocelot
*/
regmap_write(ocelot->targets[HSIO], HSIO_PLL5G_CFG4,
HSIO_PLL5G_CFG4_IB_CTRL(0x7600) |
HSIO_PLL5G_CFG4_IB_BIAS_CTRL(0x8));
regmap_write(ocelot->targets[HSIO], HSIO_PLL5G_CFG0,
HSIO_PLL5G_CFG0_CORE_CLK_DIV(0x11) |
HSIO_PLL5G_CFG0_CPU_CLK_DIV(2) |
HSIO_PLL5G_CFG0_ENA_BIAS |
HSIO_PLL5G_CFG0_ENA_VCO_BUF |
HSIO_PLL5G_CFG0_ENA_CP1 |
HSIO_PLL5G_CFG0_SELCPI(2) |
HSIO_PLL5G_CFG0_LOOP_BW_RES(0xe) |
HSIO_PLL5G_CFG0_SELBGV820(4) |
HSIO_PLL5G_CFG0_DIV4 |
HSIO_PLL5G_CFG0_ENA_CLKTREE |
HSIO_PLL5G_CFG0_ENA_LANE);
regmap_write(ocelot->targets[HSIO], HSIO_PLL5G_CFG2,
HSIO_PLL5G_CFG2_EN_RESET_FRQ_DET |
HSIO_PLL5G_CFG2_EN_RESET_OVERRUN |
HSIO_PLL5G_CFG2_GAIN_TEST(0x8) |
HSIO_PLL5G_CFG2_ENA_AMPCTRL |
HSIO_PLL5G_CFG2_PWD_AMPCTRL_N |
HSIO_PLL5G_CFG2_AMPC_SEL(0x10));
}
static int ocelot_chip_init(struct ocelot *ocelot, const struct ocelot_ops *ops)
{
int ret;
ocelot->map = ocelot_regmap;
ocelot->stats_layout = ocelot_stats_layout;
ocelot->num_mact_rows = 1024;
ocelot->ops = ops;
ret = ocelot_regfields_init(ocelot, ocelot_regfields);
if (ret)
return ret;
ocelot_pll5_init(ocelot);
eth_random_addr(ocelot->base_mac);
ocelot->base_mac[5] &= 0xf0;
return 0;
}
static irqreturn_t ocelot_xtr_irq_handler(int irq, void *arg)
{
struct ocelot *ocelot = arg;
int grp = 0, err;
while (ocelot_read(ocelot, QS_XTR_DATA_PRESENT) & BIT(grp)) {
struct sk_buff *skb;
err = ocelot_xtr_poll_frame(ocelot, grp, &skb);
if (err)
goto out;
skb->dev->stats.rx_bytes += skb->len;
skb->dev->stats.rx_packets++;
if (!skb_defer_rx_timestamp(skb))
netif_rx(skb);
}
out:
if (err < 0)
net: dsa: tag_ocelot_8021q: add support for PTP timestamping For TX timestamping, we use the felix_txtstamp method which is common with the regular (non-8021q) ocelot tagger. This method says that skb deferral is needed, prepares a timestamp request ID, and puts a clone of the skb in a queue waiting for the timestamp IRQ. felix_txtstamp is called by dsa_skb_tx_timestamp() just before the tagger's xmit method. In the tagger xmit, we divert the packets classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based injection registers, and we declare them as dead towards dsa_slave_xmit. If not PTP, we proceed with normal tag_8021q stuff. Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is matched to the TX timestamp retrieved from the switch's FIFO based on the timestamp request ID, and the clone is delivered to the stack. On RX, thanks to the VCAP IS2 rule that redirects the frames with an EtherType for 1588 towards two destinations: - the CPU port module (for MMIO based extraction) and - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port the relevant data path processing starts in the ptp_classify_raw BPF classifier installed by DSA in the RX data path (post tagger, which is completely unaware that it saw a PTP packet). This time we can't reuse the same implementation of .port_rxtstamp that also works with the default ocelot tagger. That is because felix_rxtstamp is given an skb with a freshly stripped DSA header, and it says "I don't need deferral for its RX timestamp, it's right in it, let me show you"; and it just points to the header right behind skb->data, from where it unpacks the timestamp and annotates the skb with it. The same thing cannot happen with tag_ocelot_8021q, because for one thing, the skb did not have an extraction frame header in the first place, but a VLAN tag with no timestamp information. So the code paths in felix_rxtstamp for the regular and 8021q tagger are completely independent. With tag_8021q, the timestamp must come from the packet's duplicate delivered to the CPU port module, but there is potentially complex logic to be handled [ and prone to reordering ] if we were to just start reading packets from the CPU port module, and try to match them to the one we received over Ethernet and which needs an RX timestamp. So we do something simple: we tell DSA "give me some time to think" (we request skb deferral by returning false from .port_rxtstamp) and we just drop the frame we got over Ethernet with no attempt to match it to anything - we just treat it as a notification that there's data to be processed from the CPU port module's queues. Then we proceed to read the packets from those, one by one, which we deliver up the stack, timestamped, using netif_rx - the same function that any driver would use anyway if it needed RX timestamp deferral. So the assumption is that we'll come across the PTP packet that triggered the CPU extraction notification eventually, but we don't know when exactly. Thanks to the VCAP IS2 trap/redirect rule and the exclusion of the CPU port module from the flooding replicators, only PTP frames should be present in the CPU port module's RX queues anyway. There is just one conflict between the VCAP IS2 trapping rule and the semantics of the BPF classifier. Namely, ptp_classify_raw() deems general messages as non-timestampable, but still, those are trapped to the CPU port module since they have an EtherType of ETH_P_1588. So, if the "no XTR IRQ" workaround is in place, we need to run another BPF classifier on the frames extracted over MMIO, to avoid duplicates being sent to the stack (once over Ethernet, once over MMIO). It doesn't look like it's possible to install VCAP IS2 rules based on keys extracted from the 1588 frame headers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 00:38:01 +02:00
ocelot_drain_cpu_queue(ocelot, 0);
return IRQ_HANDLED;
}
static irqreturn_t ocelot_ptp_rdy_irq_handler(int irq, void *arg)
{
struct ocelot *ocelot = arg;
ocelot_get_txtstamp(ocelot);
return IRQ_HANDLED;
}
static const struct of_device_id mscc_ocelot_match[] = {
{ .compatible = "mscc,vsc7514-switch" },
{ }
};
MODULE_DEVICE_TABLE(of, mscc_ocelot_match);
static int ocelot_mem_init_status(struct ocelot *ocelot)
{
unsigned int val;
int err;
err = regmap_field_read(ocelot->regfields[SYS_RESET_CFG_MEM_INIT],
&val);
return err ?: val;
}
static int ocelot_reset(struct ocelot *ocelot)
{
int err;
u32 val;
err = regmap_field_write(ocelot->regfields[SYS_RESET_CFG_MEM_INIT], 1);
if (err)
return err;
err = regmap_field_write(ocelot->regfields[SYS_RESET_CFG_MEM_ENA], 1);
if (err)
return err;
/* MEM_INIT is a self-clearing bit. Wait for it to be cleared (should be
* 100us) before enabling the switch core.
*/
err = readx_poll_timeout(ocelot_mem_init_status, ocelot, val, !val,
MEM_INIT_SLEEP_US, MEM_INIT_TIMEOUT_US);
if (err)
return err;
err = regmap_field_write(ocelot->regfields[SYS_RESET_CFG_MEM_ENA], 1);
if (err)
return err;
return regmap_field_write(ocelot->regfields[SYS_RESET_CFG_CORE_ENA], 1);
}
/* Watermark encode
* Bit 8: Unit; 0:1, 1:16
* Bit 7-0: Value to be multiplied with unit
*/
static u16 ocelot_wm_enc(u16 value)
{
WARN_ON(value >= 16 * BIT(8));
if (value >= BIT(8))
return BIT(8) | (value / 16);
return value;
}
static u16 ocelot_wm_dec(u16 wm)
{
if (wm & BIT(8))
return (wm & GENMASK(7, 0)) * 16;
return wm;
}
static void ocelot_wm_stat(u32 val, u32 *inuse, u32 *maxuse)
{
*inuse = (val & GENMASK(23, 12)) >> 12;
*maxuse = val & GENMASK(11, 0);
}
static const struct ocelot_ops ocelot_ops = {
.reset = ocelot_reset,
.wm_enc = ocelot_wm_enc,
.wm_dec = ocelot_wm_dec,
.wm_stat = ocelot_wm_stat,
.port_to_netdev = ocelot_port_to_netdev,
.netdev_to_port = ocelot_netdev_to_port,
};
static struct vcap_props vsc7514_vcap_props[] = {
[VCAP_ES0] = {
.action_type_width = 0,
.action_table = {
[ES0_ACTION_TYPE_NORMAL] = {
.width = 73, /* HIT_STICKY not included */
.count = 1,
},
},
.target = S0,
.keys = vsc7514_vcap_es0_keys,
.actions = vsc7514_vcap_es0_actions,
},
[VCAP_IS1] = {
.action_type_width = 0,
.action_table = {
[IS1_ACTION_TYPE_NORMAL] = {
.width = 78, /* HIT_STICKY not included */
.count = 4,
},
},
.target = S1,
.keys = vsc7514_vcap_is1_keys,
.actions = vsc7514_vcap_is1_actions,
},
[VCAP_IS2] = {
.action_type_width = 1,
.action_table = {
[IS2_ACTION_TYPE_NORMAL] = {
.width = 49,
.count = 2
},
[IS2_ACTION_TYPE_SMAC_SIP] = {
.width = 6,
.count = 4
},
},
.target = S2,
.keys = vsc7514_vcap_is2_keys,
.actions = vsc7514_vcap_is2_actions,
},
};
static struct ptp_clock_info ocelot_ptp_clock_info = {
.owner = THIS_MODULE,
.name = "ocelot ptp",
.max_adj = 0x7fffffff,
.n_alarm = 0,
.n_ext_ts = 0,
.n_per_out = OCELOT_PTP_PINS_NUM,
.n_pins = OCELOT_PTP_PINS_NUM,
.pps = 0,
.gettime64 = ocelot_ptp_gettime64,
.settime64 = ocelot_ptp_settime64,
.adjtime = ocelot_ptp_adjtime,
.adjfine = ocelot_ptp_adjfine,
.verify = ocelot_ptp_verify,
.enable = ocelot_ptp_enable,
};
static void mscc_ocelot_teardown_devlink_ports(struct ocelot *ocelot)
{
int port;
for (port = 0; port < ocelot->num_phys_ports; port++)
ocelot_port_devlink_teardown(ocelot, port);
}
static void mscc_ocelot_release_ports(struct ocelot *ocelot)
{
int port;
for (port = 0; port < ocelot->num_phys_ports; port++) {
struct ocelot_port *ocelot_port;
ocelot_port = ocelot->ports[port];
if (!ocelot_port)
continue;
ocelot_deinit_port(ocelot, port);
net: mscc: ocelot: fix error handling bugs in mscc_ocelot_init_ports() There are several error handling bugs in mscc_ocelot_init_ports(). I went through the code, and carefully audited it and made fixes and cleanups. 1) The ocelot_probe_port() function didn't have a mirror release function so it was hard to follow. I created the ocelot_release_port() function. 2) In the ocelot_probe_port() function, if the register_netdev() call failed, then it lead to a double free_netdev(dev) bug. Fix this by setting "ocelot->ports[port] = NULL" on the error path. 3) I was concerned that the "port" which comes from of_property_read_u32() might be out of bounds so I added a check for that. 4) In the original code if ocelot_regmap_init() failed then the driver tried to continue but I think that should be a fatal error. 5) If ocelot_probe_port() failed then the most recent devlink was leaked. The fix for mostly came Vladimir Oltean. Get rid of "registered_ports" and just set a bit in "devlink_ports_registered" to say when the devlink port has been registered (and needs to be unregistered on error). There are fewer than 32 ports so a u32 is large enough for this purpose. 6) The error handling if the final ocelot_port_devlink_init() failed had two problems. The "while (port-- >= 0)" loop should have been "--port" pre-op instead of a post-op to avoid a buffer underflow. The "if (!registered_ports[port])" condition was reversed leading to resource leaks and double frees. Fixes: 6c30384eb1de ("net: mscc: ocelot: register devlink ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YBkXhqRxHtRGzSnJ@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 12:12:38 +03:00
ocelot_release_port(ocelot_port);
}
}
static int mscc_ocelot_init_ports(struct platform_device *pdev,
struct device_node *ports)
{
struct ocelot *ocelot = platform_get_drvdata(pdev);
net: mscc: ocelot: fix error handling bugs in mscc_ocelot_init_ports() There are several error handling bugs in mscc_ocelot_init_ports(). I went through the code, and carefully audited it and made fixes and cleanups. 1) The ocelot_probe_port() function didn't have a mirror release function so it was hard to follow. I created the ocelot_release_port() function. 2) In the ocelot_probe_port() function, if the register_netdev() call failed, then it lead to a double free_netdev(dev) bug. Fix this by setting "ocelot->ports[port] = NULL" on the error path. 3) I was concerned that the "port" which comes from of_property_read_u32() might be out of bounds so I added a check for that. 4) In the original code if ocelot_regmap_init() failed then the driver tried to continue but I think that should be a fatal error. 5) If ocelot_probe_port() failed then the most recent devlink was leaked. The fix for mostly came Vladimir Oltean. Get rid of "registered_ports" and just set a bit in "devlink_ports_registered" to say when the devlink port has been registered (and needs to be unregistered on error). There are fewer than 32 ports so a u32 is large enough for this purpose. 6) The error handling if the final ocelot_port_devlink_init() failed had two problems. The "while (port-- >= 0)" loop should have been "--port" pre-op instead of a post-op to avoid a buffer underflow. The "if (!registered_ports[port])" condition was reversed leading to resource leaks and double frees. Fixes: 6c30384eb1de ("net: mscc: ocelot: register devlink ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YBkXhqRxHtRGzSnJ@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 12:12:38 +03:00
u32 devlink_ports_registered = 0;
struct device_node *portnp;
int port, err;
u32 reg;
ocelot->ports = devm_kcalloc(ocelot->dev, ocelot->num_phys_ports,
sizeof(struct ocelot_port *), GFP_KERNEL);
if (!ocelot->ports)
return -ENOMEM;
ocelot->devlink_ports = devm_kcalloc(ocelot->dev,
ocelot->num_phys_ports,
sizeof(*ocelot->devlink_ports),
GFP_KERNEL);
if (!ocelot->devlink_ports)
return -ENOMEM;
for_each_available_child_of_node(ports, portnp) {
struct ocelot_port_private *priv;
struct ocelot_port *ocelot_port;
struct devlink_port *dlp;
struct regmap *target;
struct resource *res;
char res_name[8];
if (of_property_read_u32(portnp, "reg", &reg))
continue;
port = reg;
net: mscc: ocelot: fix error handling bugs in mscc_ocelot_init_ports() There are several error handling bugs in mscc_ocelot_init_ports(). I went through the code, and carefully audited it and made fixes and cleanups. 1) The ocelot_probe_port() function didn't have a mirror release function so it was hard to follow. I created the ocelot_release_port() function. 2) In the ocelot_probe_port() function, if the register_netdev() call failed, then it lead to a double free_netdev(dev) bug. Fix this by setting "ocelot->ports[port] = NULL" on the error path. 3) I was concerned that the "port" which comes from of_property_read_u32() might be out of bounds so I added a check for that. 4) In the original code if ocelot_regmap_init() failed then the driver tried to continue but I think that should be a fatal error. 5) If ocelot_probe_port() failed then the most recent devlink was leaked. The fix for mostly came Vladimir Oltean. Get rid of "registered_ports" and just set a bit in "devlink_ports_registered" to say when the devlink port has been registered (and needs to be unregistered on error). There are fewer than 32 ports so a u32 is large enough for this purpose. 6) The error handling if the final ocelot_port_devlink_init() failed had two problems. The "while (port-- >= 0)" loop should have been "--port" pre-op instead of a post-op to avoid a buffer underflow. The "if (!registered_ports[port])" condition was reversed leading to resource leaks and double frees. Fixes: 6c30384eb1de ("net: mscc: ocelot: register devlink ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YBkXhqRxHtRGzSnJ@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 12:12:38 +03:00
if (port < 0 || port >= ocelot->num_phys_ports) {
dev_err(ocelot->dev,
"invalid port number: %d >= %d\n", port,
ocelot->num_phys_ports);
continue;
}
snprintf(res_name, sizeof(res_name), "port%d", port);
res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
res_name);
target = ocelot_regmap_init(ocelot, res);
net: mscc: ocelot: fix error handling bugs in mscc_ocelot_init_ports() There are several error handling bugs in mscc_ocelot_init_ports(). I went through the code, and carefully audited it and made fixes and cleanups. 1) The ocelot_probe_port() function didn't have a mirror release function so it was hard to follow. I created the ocelot_release_port() function. 2) In the ocelot_probe_port() function, if the register_netdev() call failed, then it lead to a double free_netdev(dev) bug. Fix this by setting "ocelot->ports[port] = NULL" on the error path. 3) I was concerned that the "port" which comes from of_property_read_u32() might be out of bounds so I added a check for that. 4) In the original code if ocelot_regmap_init() failed then the driver tried to continue but I think that should be a fatal error. 5) If ocelot_probe_port() failed then the most recent devlink was leaked. The fix for mostly came Vladimir Oltean. Get rid of "registered_ports" and just set a bit in "devlink_ports_registered" to say when the devlink port has been registered (and needs to be unregistered on error). There are fewer than 32 ports so a u32 is large enough for this purpose. 6) The error handling if the final ocelot_port_devlink_init() failed had two problems. The "while (port-- >= 0)" loop should have been "--port" pre-op instead of a post-op to avoid a buffer underflow. The "if (!registered_ports[port])" condition was reversed leading to resource leaks and double frees. Fixes: 6c30384eb1de ("net: mscc: ocelot: register devlink ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YBkXhqRxHtRGzSnJ@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 12:12:38 +03:00
if (IS_ERR(target)) {
err = PTR_ERR(target);
of_node_put(portnp);
net: mscc: ocelot: fix error handling bugs in mscc_ocelot_init_ports() There are several error handling bugs in mscc_ocelot_init_ports(). I went through the code, and carefully audited it and made fixes and cleanups. 1) The ocelot_probe_port() function didn't have a mirror release function so it was hard to follow. I created the ocelot_release_port() function. 2) In the ocelot_probe_port() function, if the register_netdev() call failed, then it lead to a double free_netdev(dev) bug. Fix this by setting "ocelot->ports[port] = NULL" on the error path. 3) I was concerned that the "port" which comes from of_property_read_u32() might be out of bounds so I added a check for that. 4) In the original code if ocelot_regmap_init() failed then the driver tried to continue but I think that should be a fatal error. 5) If ocelot_probe_port() failed then the most recent devlink was leaked. The fix for mostly came Vladimir Oltean. Get rid of "registered_ports" and just set a bit in "devlink_ports_registered" to say when the devlink port has been registered (and needs to be unregistered on error). There are fewer than 32 ports so a u32 is large enough for this purpose. 6) The error handling if the final ocelot_port_devlink_init() failed had two problems. The "while (port-- >= 0)" loop should have been "--port" pre-op instead of a post-op to avoid a buffer underflow. The "if (!registered_ports[port])" condition was reversed leading to resource leaks and double frees. Fixes: 6c30384eb1de ("net: mscc: ocelot: register devlink ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YBkXhqRxHtRGzSnJ@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 12:12:38 +03:00
goto out_teardown;
}
err = ocelot_port_devlink_init(ocelot, port,
DEVLINK_PORT_FLAVOUR_PHYSICAL);
if (err) {
of_node_put(portnp);
goto out_teardown;
}
net: mscc: ocelot: convert to phylink The felix DSA driver, which is a wrapper over the same hardware class as ocelot, is integrated with phylink, but ocelot is using the plain PHY library. It makes sense to bring together the two implementations, which is what this patch achieves. This is a large patch and hard to break up, but it does the following: The existing ocelot_adjust_link writes some registers, and felix_phylink_mac_link_up writes some registers, some of them are common, but both functions write to some registers to which the other doesn't. The main reasons for this are: - Felix switches so far have used an NXP PCS so they had no need to write the PCS1G registers that ocelot_adjust_link writes - Felix switches have the MAC fixed at 1G, so some of the MAC speed changes actually break the link and must be avoided. The naming conventions for the functions introduced in this patch are: - vsc7514_phylink_{mac_config,validate} are specific to the Ocelot instantiations and placed in ocelot_net.c which is built only for the ocelot switchdev driver. - ocelot_phylink_mac_link_{up,down} are shared between the ocelot switchdev driver and the felix DSA driver (they are put in the common lib). One by one, the registers written by ocelot_adjust_link are: DEV_MAC_MODE_CFG - felix_phylink_mac_link_up had no need to write this register since its out-of-reset value was fine and did not need changing. The write is moved to the common ocelot_phylink_mac_link_up and on felix it is guarded by a quirk bit that makes the written value identical with the out-of-reset one DEV_PORT_MISC - runtime invariant, was moved to vsc7514_phylink_mac_config PCS1G_MODE_CFG - same as above PCS1G_SD_CFG - same as above PCS1G_CFG - same as above PCS1G_ANEG_CFG - same as above PCS1G_LB_CFG - same as above DEV_MAC_ENA_CFG - both ocelot_adjust_link and ocelot_port_disable touched this. felix_phylink_mac_link_{up,down} also do. We go with what felix does and put it in ocelot_phylink_mac_link_up. DEV_CLOCK_CFG - ocelot_adjust_link and felix_phylink_mac_link_up both write this, but to different values. Move to the common ocelot_phylink_mac_link_up and make sure via the quirk that the old values are preserved for both. ANA_PFC_PFC_CFG - ocelot_adjust_link wrote this, felix_phylink_mac_link_up did not. Runtime invariant, speed does not matter since PFC is disabled via the RX_PFC_ENA bits which are cleared. Move to vsc7514_phylink_mac_config. QSYS_SWITCH_PORT_MODE_PORT_ENA - both ocelot_adjust_link and felix_phylink_mac_link_{up,down} wrote this. Ocelot also wrote this register from ocelot_port_disable. Keep what felix did, move in ocelot_phylink_mac_link_{up,down} and delete ocelot_port_disable. ANA_POL_FLOWC - same as above SYS_MAC_FC_CFG - same as above, except slight behavior change. Whereas ocelot always enabled RX and TX flow control, felix listened to phylink (for the most part, at least - see the 2500base-X comment). The registers which only felix_phylink_mac_link_up wrote are: SYS_PAUSE_CFG_PAUSE_ENA - this is why I am not sure that flow control worked on ocelot. Not it should, since the code is shared with felix where it does. ANA_PORT_PORT_CFG - this is a Frame Analyzer block register, phylink should be the one touching them, deleted. Other changes: - The old phylib registration code was in mscc_ocelot_init_ports. It is hard to work with 2 levels of indentation already in, and with hard to follow teardown logic. The new phylink registration code was moved inside ocelot_probe_port(), right between alloc_etherdev() and register_netdev(). It could not be done before (=> outside of) ocelot_probe_port() because ocelot_probe_port() allocates the struct ocelot_port which we then use to assign ocelot_port->phy_mode to. It is more preferable to me to have all PHY handling logic inside the same function. - On the same topic: struct ocelot_port_private :: serdes is only used in ocelot_port_open to set the SERDES protocol to Ethernet. This is logically a runtime invariant and can be done just once, when the port registers with phylink. We therefore don't even need to keep the serdes reference inside struct ocelot_port_private, or to use the devm variant of of_phy_get(). - Phylink needs a valid phy-mode for phylink_create() to succeed, and the existing device tree bindings in arch/mips/boot/dts/mscc/ocelot_pcb120.dts don't define one for the internal PHY ports. So we patch PHY_INTERFACE_MODE_NA into PHY_INTERFACE_MODE_INTERNAL. - There was a strategically placed: switch (priv->phy_mode) { case PHY_INTERFACE_MODE_NA: continue; which made the code skip the serdes initialization for the internal PHY ports. Frankly that is not all that obvious, so now we explicitly initialize the serdes under an "if" condition and not rely on code jumps, so everything is clearer. - There was a write of OCELOT_SPEED_1000 to DEV_CLOCK_CFG for QSGMII ports. Since that is in fact the default value for the register field DEV_CLOCK_CFG_LINK_SPEED, I can only guess the intention was to clear the adjacent fields, MAC_TX_RST and MAC_RX_RST, aka take the port out of reset, which does match the comment. I don't even want to know why this code is placed there, but if there is indeed an issue that all ports that share a QSGMII lane must all be up, then this logic is already buggy, since mscc_ocelot_init_ports iterates using for_each_available_child_of_node, so nobody prevents the user from putting a 'status = "disabled";' for some QSGMII ports which would break the driver's assumption. In any case, in the eventuality that I'm right, we would have yet another issue if ocelot_phylink_mac_link_down would reset those ports and that would be forbidden, so since the ocelot_adjust_link logic did not do that (maybe for a reason), add another quirk to preserve the old logic. The ocelot driver teardown goes through all ports in one fell swoop. When initialization of one port fails, the ocelot->ports[port] pointer for that is reset to NULL, and teardown is done only for non-NULL ports, so there is no reason to do partial teardowns, let the central mscc_ocelot_release_ports() do its job. Tested bind, unbind, rebind, link up, link down, speed change on mock-up hardware (modified the driver to probe on Felix VSC9959). Also regression tested the felix DSA driver. Could not test the Ocelot specific bits (PCS1G, SERDES, device tree bindings). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-15 04:47:48 +03:00
err = ocelot_probe_port(ocelot, port, target, portnp);
if (err) {
net: mscc: ocelot: allow probing to continue with ports that fail to register The existing ocelot device trees, like ocelot_pcb123.dts for example, have SERDES ports (ports 4 and higher) that do not have status = "disabled"; but on the other hand do not have a phy-handle or a fixed-link either. So from the perspective of phylink, they have broken DT bindings. Since the blamed commit, probing for the entire switch will fail when such a device tree binding is encountered on a port. There used to be this piece of code which skipped ports without a phy-handle: phy_node = of_parse_phandle(portnp, "phy-handle", 0); if (!phy_node) continue; but now it is gone. Anyway, fixed-link setups are a thing which should work out of the box with phylink, so it would not be in the best interest of the driver to add that check back. Instead, let's look at what other drivers do. Since commit 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal"), DSA continues after a switch port fails to register, and works only with the ports that succeeded. We can achieve the same behavior in ocelot by unregistering the devlink port for ports where ocelot_port_phylink_create() failed (called via ocelot_probe_port), and clear the bit in devlink_ports_registered for that port. This will make the next iteration reconsider the port that failed to probe as an unused port, and re-register a devlink port of type UNUSED for it. No other cleanup should need to be performed, since ocelot_probe_port() should be self-contained when it fails. Fixes: e6e12df625f2 ("net: mscc: ocelot: convert to phylink") Reported-and-tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 19:49:58 +03:00
ocelot_port_devlink_teardown(ocelot, port);
continue;
}
net: mscc: ocelot: allow probing to continue with ports that fail to register The existing ocelot device trees, like ocelot_pcb123.dts for example, have SERDES ports (ports 4 and higher) that do not have status = "disabled"; but on the other hand do not have a phy-handle or a fixed-link either. So from the perspective of phylink, they have broken DT bindings. Since the blamed commit, probing for the entire switch will fail when such a device tree binding is encountered on a port. There used to be this piece of code which skipped ports without a phy-handle: phy_node = of_parse_phandle(portnp, "phy-handle", 0); if (!phy_node) continue; but now it is gone. Anyway, fixed-link setups are a thing which should work out of the box with phylink, so it would not be in the best interest of the driver to add that check back. Instead, let's look at what other drivers do. Since commit 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal"), DSA continues after a switch port fails to register, and works only with the ports that succeeded. We can achieve the same behavior in ocelot by unregistering the devlink port for ports where ocelot_port_phylink_create() failed (called via ocelot_probe_port), and clear the bit in devlink_ports_registered for that port. This will make the next iteration reconsider the port that failed to probe as an unused port, and re-register a devlink port of type UNUSED for it. No other cleanup should need to be performed, since ocelot_probe_port() should be self-contained when it fails. Fixes: e6e12df625f2 ("net: mscc: ocelot: convert to phylink") Reported-and-tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 19:49:58 +03:00
devlink_ports_registered |= BIT(port);
ocelot_port = ocelot->ports[port];
priv = container_of(ocelot_port, struct ocelot_port_private,
port);
dlp = &ocelot->devlink_ports[port];
devlink_port_type_eth_set(dlp, priv->dev);
}
/* Initialize unused devlink ports at the end */
for (port = 0; port < ocelot->num_phys_ports; port++) {
net: mscc: ocelot: fix error handling bugs in mscc_ocelot_init_ports() There are several error handling bugs in mscc_ocelot_init_ports(). I went through the code, and carefully audited it and made fixes and cleanups. 1) The ocelot_probe_port() function didn't have a mirror release function so it was hard to follow. I created the ocelot_release_port() function. 2) In the ocelot_probe_port() function, if the register_netdev() call failed, then it lead to a double free_netdev(dev) bug. Fix this by setting "ocelot->ports[port] = NULL" on the error path. 3) I was concerned that the "port" which comes from of_property_read_u32() might be out of bounds so I added a check for that. 4) In the original code if ocelot_regmap_init() failed then the driver tried to continue but I think that should be a fatal error. 5) If ocelot_probe_port() failed then the most recent devlink was leaked. The fix for mostly came Vladimir Oltean. Get rid of "registered_ports" and just set a bit in "devlink_ports_registered" to say when the devlink port has been registered (and needs to be unregistered on error). There are fewer than 32 ports so a u32 is large enough for this purpose. 6) The error handling if the final ocelot_port_devlink_init() failed had two problems. The "while (port-- >= 0)" loop should have been "--port" pre-op instead of a post-op to avoid a buffer underflow. The "if (!registered_ports[port])" condition was reversed leading to resource leaks and double frees. Fixes: 6c30384eb1de ("net: mscc: ocelot: register devlink ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YBkXhqRxHtRGzSnJ@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 12:12:38 +03:00
if (devlink_ports_registered & BIT(port))
continue;
err = ocelot_port_devlink_init(ocelot, port,
DEVLINK_PORT_FLAVOUR_UNUSED);
net: mscc: ocelot: fix error handling bugs in mscc_ocelot_init_ports() There are several error handling bugs in mscc_ocelot_init_ports(). I went through the code, and carefully audited it and made fixes and cleanups. 1) The ocelot_probe_port() function didn't have a mirror release function so it was hard to follow. I created the ocelot_release_port() function. 2) In the ocelot_probe_port() function, if the register_netdev() call failed, then it lead to a double free_netdev(dev) bug. Fix this by setting "ocelot->ports[port] = NULL" on the error path. 3) I was concerned that the "port" which comes from of_property_read_u32() might be out of bounds so I added a check for that. 4) In the original code if ocelot_regmap_init() failed then the driver tried to continue but I think that should be a fatal error. 5) If ocelot_probe_port() failed then the most recent devlink was leaked. The fix for mostly came Vladimir Oltean. Get rid of "registered_ports" and just set a bit in "devlink_ports_registered" to say when the devlink port has been registered (and needs to be unregistered on error). There are fewer than 32 ports so a u32 is large enough for this purpose. 6) The error handling if the final ocelot_port_devlink_init() failed had two problems. The "while (port-- >= 0)" loop should have been "--port" pre-op instead of a post-op to avoid a buffer underflow. The "if (!registered_ports[port])" condition was reversed leading to resource leaks and double frees. Fixes: 6c30384eb1de ("net: mscc: ocelot: register devlink ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YBkXhqRxHtRGzSnJ@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 12:12:38 +03:00
if (err)
goto out_teardown;
net: mscc: ocelot: fix error handling bugs in mscc_ocelot_init_ports() There are several error handling bugs in mscc_ocelot_init_ports(). I went through the code, and carefully audited it and made fixes and cleanups. 1) The ocelot_probe_port() function didn't have a mirror release function so it was hard to follow. I created the ocelot_release_port() function. 2) In the ocelot_probe_port() function, if the register_netdev() call failed, then it lead to a double free_netdev(dev) bug. Fix this by setting "ocelot->ports[port] = NULL" on the error path. 3) I was concerned that the "port" which comes from of_property_read_u32() might be out of bounds so I added a check for that. 4) In the original code if ocelot_regmap_init() failed then the driver tried to continue but I think that should be a fatal error. 5) If ocelot_probe_port() failed then the most recent devlink was leaked. The fix for mostly came Vladimir Oltean. Get rid of "registered_ports" and just set a bit in "devlink_ports_registered" to say when the devlink port has been registered (and needs to be unregistered on error). There are fewer than 32 ports so a u32 is large enough for this purpose. 6) The error handling if the final ocelot_port_devlink_init() failed had two problems. The "while (port-- >= 0)" loop should have been "--port" pre-op instead of a post-op to avoid a buffer underflow. The "if (!registered_ports[port])" condition was reversed leading to resource leaks and double frees. Fixes: 6c30384eb1de ("net: mscc: ocelot: register devlink ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YBkXhqRxHtRGzSnJ@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 12:12:38 +03:00
devlink_ports_registered |= BIT(port);
}
return 0;
out_teardown:
/* Unregister the network interfaces */
mscc_ocelot_release_ports(ocelot);
/* Tear down devlink ports for the registered network interfaces */
for (port = 0; port < ocelot->num_phys_ports; port++) {
net: mscc: ocelot: fix error handling bugs in mscc_ocelot_init_ports() There are several error handling bugs in mscc_ocelot_init_ports(). I went through the code, and carefully audited it and made fixes and cleanups. 1) The ocelot_probe_port() function didn't have a mirror release function so it was hard to follow. I created the ocelot_release_port() function. 2) In the ocelot_probe_port() function, if the register_netdev() call failed, then it lead to a double free_netdev(dev) bug. Fix this by setting "ocelot->ports[port] = NULL" on the error path. 3) I was concerned that the "port" which comes from of_property_read_u32() might be out of bounds so I added a check for that. 4) In the original code if ocelot_regmap_init() failed then the driver tried to continue but I think that should be a fatal error. 5) If ocelot_probe_port() failed then the most recent devlink was leaked. The fix for mostly came Vladimir Oltean. Get rid of "registered_ports" and just set a bit in "devlink_ports_registered" to say when the devlink port has been registered (and needs to be unregistered on error). There are fewer than 32 ports so a u32 is large enough for this purpose. 6) The error handling if the final ocelot_port_devlink_init() failed had two problems. The "while (port-- >= 0)" loop should have been "--port" pre-op instead of a post-op to avoid a buffer underflow. The "if (!registered_ports[port])" condition was reversed leading to resource leaks and double frees. Fixes: 6c30384eb1de ("net: mscc: ocelot: register devlink ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YBkXhqRxHtRGzSnJ@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 12:12:38 +03:00
if (devlink_ports_registered & BIT(port))
ocelot_port_devlink_teardown(ocelot, port);
}
return err;
}
static int mscc_ocelot_probe(struct platform_device *pdev)
{
struct device_node *np = pdev->dev.of_node;
int err, irq_xtr, irq_ptp_rdy;
struct device_node *ports;
struct devlink *devlink;
struct ocelot *ocelot;
struct regmap *hsio;
unsigned int i;
struct {
enum ocelot_target id;
char *name;
u8 optional:1;
} io_target[] = {
{ SYS, "sys" },
{ REW, "rew" },
{ QSYS, "qsys" },
{ ANA, "ana" },
{ QS, "qs" },
{ S0, "s0" },
{ S1, "s1" },
{ S2, "s2" },
{ PTP, "ptp", 1 },
{ FDMA, "fdma", 1 },
};
if (!np && !pdev->dev.platform_data)
return -ENODEV;
devlink =
devlink_alloc(&ocelot_devlink_ops, sizeof(*ocelot), &pdev->dev);
if (!devlink)
return -ENOMEM;
ocelot = devlink_priv(devlink);
ocelot->devlink = priv_to_devlink(ocelot);
platform_set_drvdata(pdev, ocelot);
ocelot->dev = &pdev->dev;
for (i = 0; i < ARRAY_SIZE(io_target); i++) {
struct regmap *target;
struct resource *res;
res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
io_target[i].name);
target = ocelot_regmap_init(ocelot, res);
if (IS_ERR(target)) {
if (io_target[i].optional) {
ocelot->targets[io_target[i].id] = NULL;
continue;
}
err = PTR_ERR(target);
goto out_free_devlink;
}
ocelot->targets[io_target[i].id] = target;
}
if (ocelot->targets[FDMA])
ocelot_fdma_init(pdev, ocelot);
hsio = syscon_regmap_lookup_by_compatible("mscc,ocelot-hsio");
if (IS_ERR(hsio)) {
dev_err(&pdev->dev, "missing hsio syscon\n");
err = PTR_ERR(hsio);
goto out_free_devlink;
}
ocelot->targets[HSIO] = hsio;
err = ocelot_chip_init(ocelot, &ocelot_ops);
if (err)
goto out_free_devlink;
irq_xtr = platform_get_irq_byname(pdev, "xtr");
if (irq_xtr < 0) {
err = irq_xtr;
goto out_free_devlink;
}
err = devm_request_threaded_irq(&pdev->dev, irq_xtr, NULL,
ocelot_xtr_irq_handler, IRQF_ONESHOT,
"frame extraction", ocelot);
if (err)
goto out_free_devlink;
irq_ptp_rdy = platform_get_irq_byname(pdev, "ptp_rdy");
if (irq_ptp_rdy > 0 && ocelot->targets[PTP]) {
err = devm_request_threaded_irq(&pdev->dev, irq_ptp_rdy, NULL,
ocelot_ptp_rdy_irq_handler,
IRQF_ONESHOT, "ptp ready",
ocelot);
if (err)
goto out_free_devlink;
/* Both the PTP interrupt and the PTP bank are available */
ocelot->ptp = 1;
}
ports = of_get_child_by_name(np, "ethernet-ports");
if (!ports) {
dev_err(ocelot->dev, "no ethernet-ports child node found\n");
err = -ENODEV;
goto out_free_devlink;
}
ocelot->num_phys_ports = of_get_child_count(ports);
net: mscc: ocelot: fix dropping of unknown IPv4 multicast on Seville The current assumption is that the felix DSA driver has flooding knobs per traffic class, while ocelot switchdev has a single flooding knob. This was correct for felix VSC9959 and ocelot VSC7514, but with the introduction of seville VSC9953, we see a switch driven by felix.c which has a single flooding knob. So it is clear that we must do what should have been done from the beginning, which is not to overwrite the configuration done by ocelot.c in felix, but instead to teach the common ocelot library about the differences in our switches, and set up the flooding PGIDs centrally. The effect that the bogus iteration through FELIX_NUM_TC has upon seville is quite dramatic. ANA_FLOODING is located at 0x00b548, and ANA_FLOODING_IPMC is located at 0x00b54c. So the bogus iteration will actually overwrite ANA_FLOODING_IPMC when attempting to write ANA_FLOODING[1]. There is no ANA_FLOODING[1] in sevile, just ANA_FLOODING. And when ANA_FLOODING_IPMC is overwritten with a bogus value, the effect is that ANA_FLOODING_IPMC gets the value of 0x0003CF7D: MC6_DATA = 61, MC6_CTRL = 61, MC4_DATA = 60, MC4_CTRL = 0. Because MC4_CTRL is zero, this means that IPv4 multicast control packets are not flooded, but dropped. An invalid configuration, and this is how the issue was actually spotted. Reported-by: Eldar Gasanov <eldargasanov2@gmail.com> Reported-by: Maxim Kochetkov <fido_max@inbox.ru> Tested-by: Eldar Gasanov <eldargasanov2@gmail.com> Fixes: 84705fc16552 ("net: dsa: felix: introduce support for Seville VSC9953 switch") Fixes: 3c7b51bd39b2 ("net: dsa: felix: allow flooding for all traffic classes") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201204175416.1445937-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04 19:54:16 +02:00
ocelot->num_flooding_pgids = 1;
ocelot->vcap = vsc7514_vcap_props;
ocelot->vcap_pol.base = VSC7514_VCAP_POLICER_BASE;
ocelot->vcap_pol.max = VSC7514_VCAP_POLICER_MAX;
ocelot->npi = -1;
err = ocelot_init(ocelot);
if (err)
goto out_put_ports;
err = mscc_ocelot_init_ports(pdev, ports);
if (err)
goto out_ocelot_devlink_unregister;
if (ocelot->fdma)
ocelot_fdma_start(ocelot);
net: mscc: ocelot: configure watermarks using devlink-sb Using devlink-sb, we can configure 12/16 (the important 75%) of the switch's controlling watermarks for congestion drops, and we can monitor 50% of the watermark occupancies (we can monitor the reservation watermarks, but not the sharing watermarks, which are exposed as pool sizes). The following definitions can be made: SB_BUF=0 # The devlink-sb for frame buffers SB_REF=1 # The devlink-sb for frame references POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances # have one of these. POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances # have one of these. Editing the hardware watermarks is done in the following way: BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly by modifying the corresponding pool size. By default, the pool size has maximum size, so this can be skipped. devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \ size 129840 thtype static Since by default there is no buffer reservation, the above command has maxed out BUF_COL_SHR_I(dp=0). Configuring the per-port reservation watermark (P_RSRV) is done in the following way: devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \ pool $POOL_ING th 1000 The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this command, the sharing watermarks are internally reconfigured with 1000 bytes less, i.e. from 129840 bytes to 128840 bytes. Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in the following way: for tc in {0..7}; do devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \ type ingress pool $POOL_ING \ th 3000 done The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes. The sharing watermarks are again reconfigured with 24000 bytes less. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 04:11:20 +02:00
err = ocelot_devlink_sb_register(ocelot);
if (err)
goto out_ocelot_release_ports;
if (ocelot->ptp) {
err = ocelot_init_timestamp(ocelot, &ocelot_ptp_clock_info);
if (err) {
dev_err(ocelot->dev,
"Timestamp initialization failed\n");
ocelot->ptp = 0;
}
}
register_netdevice_notifier(&ocelot_netdevice_nb);
register_switchdev_notifier(&ocelot_switchdev_nb);
register_switchdev_blocking_notifier(&ocelot_switchdev_blocking_nb);
of_node_put(ports);
devlink_register(devlink);
dev_info(&pdev->dev, "Ocelot switch probed\n");
return 0;
net: mscc: ocelot: configure watermarks using devlink-sb Using devlink-sb, we can configure 12/16 (the important 75%) of the switch's controlling watermarks for congestion drops, and we can monitor 50% of the watermark occupancies (we can monitor the reservation watermarks, but not the sharing watermarks, which are exposed as pool sizes). The following definitions can be made: SB_BUF=0 # The devlink-sb for frame buffers SB_REF=1 # The devlink-sb for frame references POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances # have one of these. POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances # have one of these. Editing the hardware watermarks is done in the following way: BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly by modifying the corresponding pool size. By default, the pool size has maximum size, so this can be skipped. devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \ size 129840 thtype static Since by default there is no buffer reservation, the above command has maxed out BUF_COL_SHR_I(dp=0). Configuring the per-port reservation watermark (P_RSRV) is done in the following way: devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \ pool $POOL_ING th 1000 The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this command, the sharing watermarks are internally reconfigured with 1000 bytes less, i.e. from 129840 bytes to 128840 bytes. Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in the following way: for tc in {0..7}; do devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \ type ingress pool $POOL_ING \ th 3000 done The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes. The sharing watermarks are again reconfigured with 24000 bytes less. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 04:11:20 +02:00
out_ocelot_release_ports:
mscc_ocelot_release_ports(ocelot);
mscc_ocelot_teardown_devlink_ports(ocelot);
out_ocelot_devlink_unregister:
ocelot_deinit(ocelot);
out_put_ports:
of_node_put(ports);
out_free_devlink:
devlink_free(devlink);
return err;
}
static int mscc_ocelot_remove(struct platform_device *pdev)
{
struct ocelot *ocelot = platform_get_drvdata(pdev);
if (ocelot->fdma)
ocelot_fdma_deinit(ocelot);
devlink_unregister(ocelot->devlink);
ocelot_deinit_timestamp(ocelot);
net: mscc: ocelot: configure watermarks using devlink-sb Using devlink-sb, we can configure 12/16 (the important 75%) of the switch's controlling watermarks for congestion drops, and we can monitor 50% of the watermark occupancies (we can monitor the reservation watermarks, but not the sharing watermarks, which are exposed as pool sizes). The following definitions can be made: SB_BUF=0 # The devlink-sb for frame buffers SB_REF=1 # The devlink-sb for frame references POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances # have one of these. POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances # have one of these. Editing the hardware watermarks is done in the following way: BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly by modifying the corresponding pool size. By default, the pool size has maximum size, so this can be skipped. devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \ size 129840 thtype static Since by default there is no buffer reservation, the above command has maxed out BUF_COL_SHR_I(dp=0). Configuring the per-port reservation watermark (P_RSRV) is done in the following way: devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \ pool $POOL_ING th 1000 The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this command, the sharing watermarks are internally reconfigured with 1000 bytes less, i.e. from 129840 bytes to 128840 bytes. Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in the following way: for tc in {0..7}; do devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \ type ingress pool $POOL_ING \ th 3000 done The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes. The sharing watermarks are again reconfigured with 24000 bytes less. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 04:11:20 +02:00
ocelot_devlink_sb_unregister(ocelot);
mscc_ocelot_release_ports(ocelot);
mscc_ocelot_teardown_devlink_ports(ocelot);
ocelot_deinit(ocelot);
unregister_switchdev_blocking_notifier(&ocelot_switchdev_blocking_nb);
unregister_switchdev_notifier(&ocelot_switchdev_nb);
unregister_netdevice_notifier(&ocelot_netdevice_nb);
devlink_free(ocelot->devlink);
return 0;
}
static struct platform_driver mscc_ocelot_driver = {
.probe = mscc_ocelot_probe,
.remove = mscc_ocelot_remove,
.driver = {
.name = "ocelot-switch",
.of_match_table = mscc_ocelot_match,
},
};
module_platform_driver(mscc_ocelot_driver);
MODULE_DESCRIPTION("Microsemi Ocelot switch driver");
MODULE_AUTHOR("Alexandre Belloni <alexandre.belloni@bootlin.com>");
MODULE_LICENSE("Dual MIT/GPL");