The addition of sja1105_port_status_ether structure into the
statistics causes the frame size to go over the warning limit:
drivers/net/dsa/sja1105/sja1105_ethtool.c:421:6: error: stack frame size of 1104 bytes in function 'sja1105_get_ethtool_stats' [-Werror,-Wframe-larger-than=]
Use dynamic allocation to avoid this.
Fixes: 336aa67bd0 ("net: dsa: sja1105: show more ethtool statistics counters for P/Q/R/S")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like the sja1105 external timestamping input is not as generic
as we thought. When fed a signal with 50% duty cycle, it will timestamp
both the rising and the falling edge. When fed a short pulse signal,
only the timestamp of the falling edge will be seen in the PTPSYNCTS
register, because that of the rising edge had been overwritten. So the
moral is: don't feed it short pulse inputs.
Luckily this is not a complete deal breaker, as we can still work with
1 Hz square waves. But the problem is that the extts polling period was
not dimensioned enough for this input signal. If we leave the period at
half a second, we risk losing timestamps due to jitter in the measuring
process. So we need to increase it to 4 times per second.
Also, the very least we can do to inform the user is to deny any other
flags combination than with PTP_RISING_EDGE and PTP_FALLING_EDGE both
set.
Fixes: 747e5eb31d ("net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit d1cbfd771c ("ptp_clock: Allow for it to be optional") changed
all PTP-capable Ethernet drivers from `select PTP_1588_CLOCK` to `imply
PTP_1588_CLOCK`, "in order to break the hard dependency between the PTP
clock subsystem and ethernet drivers capable of being clock providers."
As a result it is possible to build PTP-capable Ethernet drivers without
the PTP subsystem by deselecting PTP_1588_CLOCK. Drivers are required to
handle the missing dependency gracefully.
Some PTP-capable Ethernet drivers (e.g., TI_CPSW) factor their PTP code
out into separate drivers (e.g., TI_CPTS_MOD). The above commit also
changed these PTP-specific drivers to `imply PTP_1588_CLOCK`, making it
possible to build them without the PTP subsystem. But as Grygorii
Strashko noted in [1]:
On Wed, Apr 22, 2020 at 02:16:11PM +0300, Grygorii Strashko wrote:
> Another question is that CPTS completely nonfunctional in this case and
> it was never expected that somebody will even try to use/run such
> configuration (except for random build purposes).
In my view, enabling a PTP-specific driver without the PTP subsystem is
a configuration error made possible by the above commit. Kconfig should
not allow users to create a configuration with missing dependencies that
results in "completely nonfunctional" drivers.
I audited all network drivers that call ptp_clock_register() but merely
`imply PTP_1588_CLOCK` and found five PTP-specific drivers that are
likely nonfunctional without PTP_1588_CLOCK:
NET_DSA_MV88E6XXX_PTP
NET_DSA_SJA1105_PTP
MACB_USE_HWSTAMP
CAVIUM_PTP
TI_CPTS_MOD
Note how these symbols all reference PTP or timestamping in their name;
this is a clue that they depend on PTP_1588_CLOCK.
Change them from `imply PTP_1588_CLOCK` [2] to `depends on PTP_1588_CLOCK`.
I'm not using `select PTP_1588_CLOCK` here because PTP_1588_CLOCK has
its own dependencies, which `select` would not transitively apply.
Additionally, remove the `select NET_PTP_CLASSIFY` from CPTS_TI_MOD;
PTP_1588_CLOCK already selects that.
[1]: https://lore.kernel.org/lkml/c04458ed-29ee-1797-3a11-7f3f560553e6@ti.com/
[2]: NET_DSA_SJA1105_PTP had never declared any type of dependency on
PTP_1588_CLOCK (`imply` or otherwise); adding a `depends on PTP_1588_CLOCK`
here seems appropriate.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: d1cbfd771c ("ptp_clock: Allow for it to be optional")
Signed-off-by: Clay McClure <clay@daemons.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some boards do not have the RX_ER MII signal connected. Normally in such
situation, those pins would be grounded, but then again, some boards
left it electrically floating.
When sending traffic to those switch ports, one can see that the
N_SOFERR statistics counter is incrementing once per each packet. The
user manual states for this counter that it may count the number of
frames "that have the MII error input being asserted prior to or
up to the SOF delimiter byte". So the switch MAC is sampling an
electrically floating signal, and preventing proper traffic reception
because of that.
As a workaround, enable the internal weak pull-downs on the input pads
for the MII control signals. This way, a floating signal would be
internally tied to ground.
The logic levels of signals which _are_ externally driven should not be
bothered by this 40-50 KOhm internal resistor. So it is not an issue to
enable the internal pull-down unconditionally, irrespective of PHY
interface type (MII, RMII, RGMII, SGMII) and of board layout.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds complete support for manipulating the L2 Policing Tables
from this switch. There are 45 table entries, one entry per each port
and traffic class, and one dedicated entry for broadcast traffic for
each ingress port.
Policing entries are shareable, and we use this functionality to support
shared block filters.
We are modeling broadcast policers as simple tc-flower matches on
dst_mac. As for the traffic class policers, the switch only deduces the
traffic class from the VLAN PCP field, so it makes sense to model this
as a tc-flower match on vlan_prio.
How to limit broadcast traffic coming from all front-panel ports to a
cumulated total of 10 Mbit/s:
tc qdisc add dev sw0p0 ingress_block 1 clsact
tc qdisc add dev sw0p1 ingress_block 1 clsact
tc qdisc add dev sw0p2 ingress_block 1 clsact
tc qdisc add dev sw0p3 ingress_block 1 clsact
tc filter add block 1 flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
action police rate 10mbit burst 64k
How to limit traffic with VLAN PCP 0 (also includes untagged traffic) to
100 Mbit/s on port 0 only:
tc filter add dev sw0p0 ingress protocol 802.1Q flower skip_sw \
vlan_prio 0 action police rate 100mbit burst 64k
The broadcast, VLAN PCP and port policers are compatible with one
another (can be installed at the same time on a port).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds partial configuration support for the L2 Policing Table. Out
of the 45 policing entries, only 5 are used (one for each port), in a
shared manner. All 8 traffic classes, and the broadcast policer, are
redirected to a common instance which belongs to the ingress port.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like the P/Q/R/S series supports some more counters,
generically named "Ethernet statistics counter", which we were not
printing. Add them.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On this switch, the frame length enforcements are performed by the
ingress policers. There are 2 types of those: regular L2 (also called
best-effort) and Virtual Link policers (an ARINC664/AFDX concept for
defining L2 streams with certain QoS abilities). To avoid future
confusion, I prefer to call the reset reason "Best-effort policers",
even though the VL policers are not yet supported.
We also need to change the setup of the initial static config, such that
DSA calls to .change_mtu (which are expensive) become no-ops and don't
reset the switch 5 times.
A driver-level decision is to unconditionally allow single VLAN-tagged
traffic on all ports. The CPU port must accept an additional VLAN header
for the DSA tag, which is again a driver-level decision.
The policers actually count bytes not only from the SDU, but also from
the Ethernet header and FCS, so those need to be accounted for as well.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SJA1105 switch family has a PTP_CLK pin which emits a signal with
fixed 50% duty cycle, but variable frequency and programmable start time.
On the second generation (P/Q/R/S) switches, this pin supports even more
functionality. The use case described by the hardware documents talks
about synchronization via oneshot pulses: given 2 sja1105 switches,
arbitrarily designated as a master and a slave, the master emits a
single pulse on PTP_CLK, while the slave is configured to timestamp this
pulse received on its PTP_CLK pin (which must obviously be configured as
input). The difference between the timestamps then exactly becomes the
slave offset to the master.
The only trouble with the above is that the hardware is very much tied
into this use case only, and not very generic beyond that:
- When emitting a oneshot pulse, instead of being told when to emit it,
the switch just does it "now" and tells you later what time it was,
via the PTPSYNCTS register. [ Incidentally, this is the same register
that the slave uses to collect the ext_ts timestamp from, too. ]
- On the sync slave, there is no interrupt mechanism on reception of a
new extts, and no FIFO to buffer them, because in the foreseen use
case, software is in control of both the master and the slave pins,
so it "knows" when there's something to collect.
These 2 problems mean that:
- We don't support (at least yet) the quirky oneshot mode exposed by
the hardware, just normal periodic output.
- We abuse the hardware a little bit when we expose generic extts.
Because there's no interrupt mechanism, we need to poll at double the
frequency we expect to receive a pulse. Currently that means a
non-configurable "twice a second".
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The AVB table contains the CAS_MASTER field (to be added in the next
patch) which decides the direction of the PTP_CLK pin.
Reconfiguring this field dynamically is highly preferable to having to
reset the switch and upload a new static configuration, so we add
support for exactly that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because the PTP_CLK pin starts toggling only at a time higher than the
current PTP clock, this helper from the time-aware shaper code comes in
handy here as well. We'll use it to transform generic user input for the
perout request into valid input for the sja1105 hardware.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These fields configure the destination and source MAC address that the
switch will put in the Ethernet frames sent towards the CPU port that
contain RX timestamps for PTP.
These fields do not enable the feature itself, that is configured via
SEND_META0 and SEND_META1 in the General Params table.
The implication of this patch is that the AVB Params table will always
be present in the static config. Which doesn't really hurt.
This is needed because in a future patch, we will add another field from
this table, CAS_MASTER, for configuring the PTP_CLK pin function. That
can be configured irrespective of whether RX timestamping is enabled or
not, so always having this table present is going to simplify things a
bit.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SJA1105 switches R and S have one SerDes port with an 802.3z
quasi-compatible PCS, hardwired on port 4. The other ports are still
MII/RMII/RGMII. The PCS performs rate adaptation to lower link speeds;
the MAC on this port is hardwired at gigabit. Only full duplex is
supported.
The SGMII port can be configured as part of the static config tables, as
well as through a dedicated SPI address region for its pseudo-clause-22
registers. However it looks like the static configuration is not
able to change some out-of-reset values (like the value of MII_BMCR), so
at the end of the day, having code for it is utterly pointless. We are
just going to use the pseudo-C22 interface.
Because the PCS gets reset when the switch resets, we have to add even
more restoration logic to sja1105_static_config_reload, otherwise the
SGMII port breaks after operations such as enabling PTP timestamping
which require a switch reset.
>From PHYLINK perspective, the switch supports *only* SGMII (it doesn't
support 1000Base-X). It also doesn't expose access to the raw config
word for in-band AN in registers MII_ADV/MII_LPA.
It is able to work in the following modes:
- Forced speed
- SGMII in-band AN slave (speed received from PHY)
- SGMII in-band AN master (acting as a PHY)
The latter mode is not supported by this patch. It is even unclear to me
how that would be described. There is some code for it left in the
patch, but 'an_master' is always passed as false.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sja1105_init_mii_settings iterates over the port list, it prints
this message for disabled ports, because they don't have a valid
phy-mode:
[ 4.778702] sja1105 spi2.0: Unsupported PHY mode unknown!
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Suggested-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The switches supported so far by the driver only have non-SerDes ports,
so they should be configured in the PHYLINK callback that provides the
resolved PHY link parameters.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Validate 100baseT1_Full to make this driver work with TJA1102 PHY.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Propagate the resolved link configuration down via DSA's
phylink_mac_link_up() operation to allow split PCS/MAC to work.
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sja1105_parse_ports_node function was tested only on device trees
where all ports were enabled. Fix this check so that the driver
continues to probe only with the ports where status is not "disabled",
as expected.
Fixes: 8aa9ebccae ("net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible to stack multiple DSA switches in a way that they are not
part of the tree (disjoint) but the DSA master of a switch is a DSA
slave of another. When that happens switch drivers may have to know this
is the case so as to determine whether their tagging protocol has a
remove chance of working.
This is useful for specific switch drivers such as b53 where devices
have been known to be stacked in the wild without the Broadcom tag
protocol supporting that feature. This allows b53 to continue supporting
those devices by forcing the disabling of Broadcom tags on the outermost
switches if necessary.
The get_tag_protocol() function is therefore updated to gain an
additional enum dsa_tag_protocol argument which denotes the current
tagging protocol used by the DSA master we are attached to, else
DSA_TAG_PROTO_NONE for the top of the dsa_switch_tree.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are 3 things that are wrong with the DSA deferred xmit mechanism:
1. Its introduction has made the DSA hotpath ever so slightly more
inefficient for everybody, since DSA_SKB_CB(skb)->deferred_xmit needs
to be initialized to false for every transmitted frame, in order to
figure out whether the driver requested deferral or not (a very rare
occasion, rare even for the only driver that does use this mechanism:
sja1105). That was necessary to avoid kfree_skb from freeing the skb.
2. Because L2 PTP is a link-local protocol like STP, it requires
management routes and deferred xmit with this switch. But as opposed
to STP, the deferred work mechanism needs to schedule the packet
rather quickly for the TX timstamp to be collected in time and sent
to user space. But there is no provision for controlling the
scheduling priority of this deferred xmit workqueue. Too bad this is
a rather specific requirement for a feature that nobody else uses
(more below).
3. Perhaps most importantly, it makes the DSA core adhere a bit too
much to the NXP company-wide policy "Innovate Where It Doesn't
Matter". The sja1105 is probably the only DSA switch that requires
some frames sent from the CPU to be routed to the slave port via an
out-of-band configuration (register write) rather than in-band (DSA
tag). And there are indeed very good reasons to not want to do that:
if that out-of-band register is at the other end of a slow bus such
as SPI, then you limit that Ethernet flow's throughput to effectively
the throughput of the SPI bus. So hardware vendors should definitely
not be encouraged to design this way. We do _not_ want more
widespread use of this mechanism.
Luckily we have a solution for each of the 3 issues:
For 1, we can just remove that variable in the skb->cb and counteract
the effect of kfree_skb with skb_get, much to the same effect. The
advantage, of course, being that anybody who doesn't use deferred xmit
doesn't need to do any extra operation in the hotpath.
For 2, we can create a kernel thread for each port's deferred xmit work.
If the user switch ports are named swp0, swp1, swp2, the kernel threads
will be named swp0_xmit, swp1_xmit, swp2_xmit (there appears to be a 15
character length limit on kernel thread names). With this, the user can
change the scheduling priority with chrt $(pidof swp2_xmit).
For 3, we can actually move the entire implementation to the sja1105
driver.
So this patch deletes the generic implementation from the DSA core and
adds a new one, more adequate to the requirements of PTP TX
timestamping, in sja1105_main.c.
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I finally found out how the 4 management route slots are supposed to
be used, but.. it's not worth it.
The description from the comment I've just deleted in this commit is
still true: when more than 1 management slot is active at the same time,
the switch will match frames incoming [from the CPU port] on the lowest
numbered management slot that matches the frame's DMAC.
My issue was that one was not supposed to statically assign each port a
slot. Yes, there are 4 slots and also 4 non-CPU ports, but that is a
mere coincidence.
Instead, the switch can be used like this: every management frame gets a
slot at the right of the most recently assigned slot:
Send mgmt frame 1 through S0: S0 x x x
Send mgmt frame 2 through S1: S0 S1 x x
Send mgmt frame 3 through S2: S0 S1 S2 x
Send mgmt frame 4 through S3: S0 S1 S2 S3
The difference compared to the old usage is that the transmission of
frames 1-4 doesn't need to wait until the completion of the management
route. It is safe to use a slot to the right of the most recently used
one, because by protocol nobody will program a slot to your left and
"steal" your route towards the correct egress port.
So there is a potential throughput benefit here.
But mgmt frame 5 has no more free slot to use, so it has to wait until
_all_ of S0, S1, S2, S3 are full, in order to use S0 again.
And that's actually exactly the problem: I was looking for something
that would bring more predictable transmission latency, but this is
exactly the opposite: 3 out of 4 frames would be transmitted quicker,
but the 4th would draw the short straw and have a worse worst-case
latency than before.
Useless.
Things are made even worse by PTP TX timestamping, which is something I
won't go deeply into here. Suffice to say that the fact there is a
driver-level lock on the SPI bus offsets any potential throughput gains
that parallelism might bring.
So there's no going back to the multi-slot scheme, remove the
"mgmt_slot" variable from sja1105_port and the dummy static assignment
made at probe time.
While passing by, also remove the assignment to casc_port altogether.
Don't pretend that we support cascaded setups.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When disabling PTP timestamping, don't reset the switch with the new
static config until all existing PTP frames have been timestamped on the
RX path or dropped. There's nothing we can do with these afterwards.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And move the queue of skb's waiting for RX timestamps into the ptp_data
structure, since it isn't needed if PTP is not compiled.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For first-generation switches (SJA1105E and SJA1105T):
- TPID means C-Tag (typically 0x8100)
- TPID2 means S-Tag (typically 0x88A8)
While for the second generation switches (SJA1105P, SJA1105Q, SJA1105R,
SJA1105S) it is the other way around:
- TPID means S-Tag (typically 0x88A8)
- TPID2 means C-Tag (typically 0x8100)
In other words, E/T tags untagged traffic with TPID, and P/Q/R/S with
TPID2.
So the patch mentioned below fixed VLAN filtering for P/Q/R/S, but broke
it for E/T.
We strive for a common code path for all switches in the family, so just
lie in the static config packing functions that TPID and TPID2 are at
swapped bit offsets than they actually are, for P/Q/R/S. This will make
both switches understand TPID to be ETH_P_8021Q and TPID2 to be
ETH_P_8021AD. The meaning from the original E/T was chosen over P/Q/R/S
because E/T is actually the one with public documentation available
(UM10944.pdf).
Fixes: f9a1a7646c ("net: dsa: sja1105: Reverse TPID and TPID2")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check originates from the initial implementation which was not based
on PTP time but on a standalone clock source. In the meantime we can now
program the PTPSCHTM register at runtime with the dynamic base time
(actually with a value that is 200 ns smaller, to avoid writing DELTA=0
in the Schedule Entry Points Parameters Table). And we also have logic
for moving the actual base time in the future of the PHC's current time
base, so the check for zero serves no purpose, since even if the user
will specify zero, that's not what will end up in the static config
table where the limitation is.
Fixes: 86db36a347 ("net: dsa: sja1105: Implement state machine for TAS with PTP clock source")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When activating tc-taprio offload on the switch ports, the TAS state
machine will try to check whether it is running or not, but will find
both the STARTED and STOPPED bits as false in the
sja1105_tas_check_running function. So the function will return -EINVAL
(an abnormal situation) and the kernel will keep printing this from the
TAS FSM workqueue:
[ 37.691971] sja1105 spi0.1: An operation returned -22
The reason is that the underlying function that gets called,
sja1105_ptp_commit, does not actually do a SPI_READ, but a SPI_WRITE. So
the command buffer remains initialized with zeroes instead of retrieving
the hardware state. Fix that.
Fixes: 41603d78b3 ("net: dsa: sja1105: Make the PTP command read-write")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PTP egress timestamp N must be captured from register PTPEGR_TS[n],
where n = 2 * PORT + TSREG. There are 10 PTPEGR_TS registers, 2 per
port. We are only using TSREG=0.
As opposed to the management slots, which are 4 in number
(SJA1105_NUM_PORTS, minus the CPU port). Any management frame (which
includes PTP frames) can be sent to any non-CPU port through any
management slot. When the CPU port is not the last port (#4), there will
be a mismatch between the slot and the port number.
Luckily, the only mainline occurrence with this switch
(arch/arm/boot/dts/ls1021a-tsn.dts) does have the CPU port as #4, so the
issue did not manifest itself thus far.
Fixes: 47ed985e97 ("net: dsa: sja1105: Add logic for TX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function was using configuration of port 0 in devicetree for all ports.
In case CPU port was not 0, the delay settings was ignored. This resulted not
working communication between CPU and the switch.
Fixes: f5b8631c29 ("net: dsa: sja1105: Error out if RGMII delays are requested in DT")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't really need 10k species of reset. Remove everything except cold
reset which is what is actually used. Too bad the hardware designers
couldn't agree to use the same bit field for rev 1 and rev 2, so the
(*reset_cmd) function pointer is there to stay.
However let's simplify the prototype and give it a struct dsa_switch (we
want to avoid forward-declarations of structures, in this case struct
sja1105_private, wherever we can).
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested using the following bash script and the tc from iproute2-next:
#!/bin/bash
set -e -u -o pipefail
NSEC_PER_SEC="1000000000"
gatemask() {
local tc_list="$1"
local mask=0
for tc in ${tc_list}; do
mask=$((${mask} | (1 << ${tc})))
done
printf "%02x" ${mask}
}
if ! systemctl is-active --quiet ptp4l; then
echo "Please start the ptp4l service"
exit
fi
now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
# Phase-align the base time to the start of the next second.
sec=$(echo "${now}" | gawk -F. '{ print $1; }')
base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"
tc qdisc add dev swp5 parent root handle 100 taprio \
num_tc 8 \
map 0 1 2 3 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time ${base_time} \
sched-entry S $(gatemask 7) 100000 \
sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
clockid CLOCK_TAI flags 2
The "state machine" is a workqueue invoked after each manipulation
command on the PTP clock (reset, adjust time, set time, adjust
frequency) which checks over the state of the time-aware scheduler.
So it is not monitored periodically, only in reaction to a PTP command
typically triggered from a userspace daemon (linuxptp). Otherwise there
is no reason for things to go wrong.
Now that the timecounter/cyclecounter has been replaced with hardware
operations on the PTP clock, the TAS Kconfig now depends upon PTP and
the standalone clocksource operating mode has been removed.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PTPSTRTSCH and PTPSTOPSCH bits are actually readable and indicate
whether the time-aware scheduler is running or not. We will be using
that for monitoring the scheduler in the next patch, so refactor the PTP
command API in order to allow that.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes it can be quite opaque even for me why the driver decided to
reset the switch. So instead of adding dump_stack() calls each time for
debugging, just add a reset reason to sja1105_static_config_reload
calls which gets printed to the console.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The purpose here is to avoid ptp4l fail due to this condition:
timed out while polling for tx timestamp
increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
port 1: send peer delay request failed
So either reset the switch before the management frame was sent, or
after it was timestamped as well, but not in the middle.
The condition may arise either due to a true timeout (i.e. because
re-uploading the static config takes time), or due to the TX timestamp
actually getting lost due to reset. For the former we can increase
tx_timestamp_timeout in userspace, for the latter we need this patch.
Locking all traffic during switch reset does not make sense at all,
though. Forcing all CPU-originated traffic to potentially block waiting
for a sleepable context to send > 800 bytes over SPI is not a good idea.
Flows that are autonomously forwarded by the switch will get dropped
anyway during switch reset no matter what. So just let all other
CPU-originated traffic be dropped as well.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PTP time of the switch is not preserved when uploading a new static
configuration. Work around this hardware oddity by reading its PTP time
before a static config upload, and restoring it afterwards.
Static config changes are expected to occur at runtime even in scenarios
directly related to PTP, i.e. the Time-Aware Scheduler of the switch is
programmed in this way.
Perhaps the larger implication of this patch is that the PTP .gettimex64
and .settime functions need to be exposed to sja1105_main.c, where the
PTP lock needs to be held during this entire process. So their core
implementation needs to move to some common functions which get exposed
in sja1105_ptp.h.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Through the PTP_SYS_OFFSET_EXTENDED ioctl, it is possible for userspace
applications (i.e. phc2sys) to compensate for the delays incurred while
reading the PHC's time.
The task itself of taking the software timestamp is delegated to the SPI
subsystem, through the newly introduced API in struct spi_transfer. The
goal is to cross-timestamp I/O operations on the switch's PTP clock with
values in the local system clock (CLOCK_REALTIME). For that we need to
understand a bit of the hardware internals.
The 'read PTP time' message is a 12 byte structure, first 4 bytes of
which represent the SPI header, and the last 8 bytes represent the
64-bit PTP time. The switch itself starts processing the command
immediately after receiving the last bit of the address, i.e. at the
middle of byte 3 (last byte of header). The PTP time is shadowed to a
buffer register in the switch, and retrieved atomically during the
subsequent SPI frames.
A similar thing goes on for the 'write PTP time' message, although in
that case the switch waits until the 64-bit PTP time becomes fully
available before taking any action. So the byte that needs to be
software-timestamped is byte 11 (last) of the transfer.
The patch creates a common (and local) sja1105_xfer implementation for
the SPI I/O, and offers 3 front-ends:
- sja1105_xfer_u32 and sja1105_xfer_u64: these are capable of optionally
requesting a PTP timestamp
- sja1105_xfer_buf: this is for large transfers (e.g. the static config
buffer) and other misc data, and there is no point in giving
timestamping capabilities to this.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before this change of_get_phy_mode() returned an enum,
phy_interface_t. On error, -ENODEV etc, is returned. If the result of
the function is stored in a variable of type phy_interface_t, and the
compiler has decided to represent this as an unsigned int, comparision
with -ENODEV etc, is a signed vs unsigned comparision.
Fix this problem by changing the API. Make the function return an
error, or 0 on success, and pass a pointer, of type phy_interface_t,
where the phy mode should be stored.
v2:
Return with *interface set to PHY_INTERFACE_MODE_NA on error.
Add error checks to all users of of_get_phy_mode()
Fixup a few reverse christmas tree errors
Fixup a few slightly malformed reverse christmas trees
v3:
Fix 0-day reported errors.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only slightly tricky merge conflict was the netdevsim because the
mutex locking fix overlapped a lot of driver reload reorganization.
The rest were (relatively) trivial in nature.
Signed-off-by: David S. Miller <davem@davemloft.net>
An earlier bugfix introduced a dependency on CONFIG_NET_SCH_TAPRIO,
but this missed the case of NET_SCH_TAPRIO=m and NET_DSA_SJA1105=y,
which still causes a link error:
drivers/net/dsa/sja1105/sja1105_tas.o: In function `sja1105_setup_tc_taprio':
sja1105_tas.c:(.text+0x5c): undefined reference to `taprio_offload_free'
sja1105_tas.c:(.text+0x3b4): undefined reference to `taprio_offload_get'
drivers/net/dsa/sja1105/sja1105_tas.o: In function `sja1105_tas_teardown':
sja1105_tas.c:(.text+0x6ec): undefined reference to `taprio_offload_free'
Change the dependency to only allow selecting the TAS code when it
can link against the taprio code.
Fixes: a8d570de0c ("net: dsa: sja1105: Add dependency for NET_DSA_SJA1105_TAS")
Fixes: 317ab5b86c ("net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that ports are dynamically listed in the fabric, there is no need
to provide a special helper to allocate the dsa_switch structure. This
will give more flexibility to drivers to embed this structure as they
wish in their private structure.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Like the dsa_switch_tree structures, the dsa_port structures will be
allocated on switch registration.
The SJA1105 driver is the only one accessing the dsa_port structure
after the switch allocation and before the switch registration.
For that reason, move switch registration prior to assigning the priv
member of the dsa_port structures.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Do not let the drivers access the ds->ports static array directly
while there is a dsa_to_port helper for this purpose.
At the same time, un-const this helper since the SJA1105 driver
assigns the priv member of the returned dsa_port structure.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Adjusting the hardware clock (PTPCLKVAL, PTPCLKADD, PTPCLKRATE) is a
requirement for the auxiliary PTP functionality of the switch
(TTEthernet, PPS input, PPS output).
Therefore we need to switch to using these registers to keep a
synchronized time in hardware, instead of the timecounter/cyclecounter
implementation, which is reliant on the free-running PTPTSCLK.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects the SPDX License Identifier style
in header files related to Distributed Switch Architecture
drivers for NXP SJA1105 series Ethernet switch support.
It uses an expilict block comment for the SPDX License
Identifier.
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reworks the SPI transfer implementation to make use of more of the
SPI core features. The main benefit is to avoid the memcpy in
sja1105_xfer_buf().
The memcpy was only needed because the function was transferring a
single buffer at a time. So it needed to copy the caller-provided buffer
at buf + 4, to store the SPI message header in the "headroom" area.
But the SPI core supports scatter-gather messages, comprised of multiple
transfers. We can actually use those to break apart every SPI message
into 2 transfers: one for the header and one for the actual payload.
To keep the behavior the same regarding the chip select signal, it is
necessary to tell the SPI core to de-assert the chip select after each
chunk. This was not needed before, because each spi_message contained
only 1 single transfer.
The meaning of the per-transfer cs_change=1 is:
- If the transfer is the last one of the message, keep CS asserted
- Otherwise, deassert CS
We need to deassert CS in the "otherwise" case, which was implicit
before.
Avoiding the memcpy creates yet another opportunity. The device can't
process more than 256 bytes of SPI payload at a time, so the
sja1105_xfer_long_buf() function used to exist, to split the larger
caller buffer into chunks.
But these chunks couldn't be used as scatter/gather buffers for
spi_message until now, because of that memcpy (we would have needed more
memory for each chunk). So we can now remove the sja1105_xfer_long_buf()
function and have a single implementation for long and short buffers.
Another benefit is lower usage of stack memory. Previously we had to
store 2 SPI buffers for each chunk. Due to the elimination of the
memcpy, we can now send pointers to the actual chunks from the
caller-supplied buffer to the SPI core.
Since the patch merges two functions into a rewritten implementation,
the function prototype was also changed, mainly for cosmetic consistency
with the structures used within it.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a cosmetic patch that reduces some boilerplate in the SPI
interaction of the driver.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PTP command register contains enable bits for:
- Putting the 64-bit PTPCLKVAL register in add/subtract or write mode
- Taking timestamps off of the corrected vs free-running clock
- Starting/stopping the TTEthernet scheduling
- Starting/stopping PPS output
- Resetting the switch
When a command needs to be issued (e.g. "change the PTPCLKVAL from write
mode to add/subtract mode"), one cannot simply write to the command
register setting the PTPCLKADD bit to 1, because that would zeroize the
other settings. One also cannot do a read-modify-write (that would be
too easy for this hardware) because not all bits of the command register
are readable over SPI.
So this leaves us with the only option of keeping the value of the PTP
command register in the driver, and operating on that.
Actually there are 2 types of PTP operations now:
- Operations that modify the cached PTP command. These operate on
ptp_data->cmd as a pointer.
- Operations that apply all previously cached PTP settings, but don't
otherwise cache what they did themselves. The sja1105_ptp_reset
function is such an example. It copies the ptp_data->cmd on stack
before modifying and writing it to SPI.
This practically means that struct sja1105_ptp_cmd is no longer an
implementation detail, since it needs to be stored in full into struct
sja1105_ptp_data, and hence in struct sja1105_private. So the (*ptp_cmd)
function prototype can change and take struct sja1105_ptp_cmd as second
argument now.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a non-functional change with 2 goals (both for the case when
CONFIG_NET_DSA_SJA1105_PTP is not enabled):
- Reduce the size of the sja1105_private structure.
- Make the PTP code more self-contained.
Leaving priv->ptp_data.lock to be initialized in sja1105_main.c is not a
leftover: it will be used in a future patch "net: dsa: sja1105: Restore
PTP time after switch reset".
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new rule (as already started for sja1105_tas.h) is for functions of
optional driver components (ones which may be disabled via Kconfig - PTP
and TAS) to take struct dsa_switch *ds instead of struct sja1105_private
*priv as first argument.
This is so that forward-declarations of struct sja1105_private can be
avoided.
So make sja1105_ptp.h the second user of this rule.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need priv->ptp_caps to hold a structure and not just a pointer,
because we use container_of in the various PTP callbacks.
Therefore, the sja1105_ptp_caps structure declared in the global memory
of the driver serves no further purpose after copying it into
priv->ptp_caps.
So just populate priv->ptp_caps with the needed operations and remove
sja1105_ptp_caps.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sparse warnings:
drivers/net/dsa/sja1105/sja1105_spi.c:159:5: warning: symbol 'sja1105_xfer_long_buf' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amazingly, of all features, this does not require a switch reset.
Tested with:
tc qdisc add dev swp2 clsact
tc filter add dev swp2 ingress matchall skip_sw \
action mirred egress mirror dev swp3
tc filter show dev swp2 ingress
tc filter del dev swp2 ingress pref 49152
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The most commonly called function in the driver is long due for a
rename. The "packed" word is redundant (it doesn't make sense to
transfer an unpacked structure, since that is in CPU endianness yadda
yadda), and the "spi" word is also redundant since argument 2 of the
function is SPI_READ or SPI_WRITE.
As for the sja1105_spi_send_long_packed_buf function, it is only being
used from sja1105_spi.c, so remove its global prototype.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having a function that takes a variable number of unpacked bytes which
it generically calls an "int" is confusing and makes auditing patches
next to impossible.
We only use spi_send_int with the int sizes of 32 and 64 bits. So just
make the spi_send_int function less generic and replace it with the
appropriate two explicit functions, which can now type-check the int
pointer type.
Note that there is still a small weirdness in the u32 function, which
has to convert it to a u64 temporary. This is because of how the packing
API works at the moment, but the weirdness is at least hidden from
callers of sja1105_xfer_u32 now.
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently this stack trace can be seen with CONFIG_DEBUG_ATOMIC_SLEEP=y:
[ 41.568348] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909
[ 41.576757] in_atomic(): 1, irqs_disabled(): 0, pid: 208, name: ptp4l
[ 41.583212] INFO: lockdep is turned off.
[ 41.587123] CPU: 1 PID: 208 Comm: ptp4l Not tainted 5.3.0-rc6-01445-ge950f2d4bc7f-dirty #1827
[ 41.599873] [<c0313d7c>] (unwind_backtrace) from [<c030e13c>] (show_stack+0x10/0x14)
[ 41.607584] [<c030e13c>] (show_stack) from [<c1212d50>] (dump_stack+0xd4/0x100)
[ 41.614863] [<c1212d50>] (dump_stack) from [<c037dfc8>] (___might_sleep+0x1c8/0x2b4)
[ 41.622574] [<c037dfc8>] (___might_sleep) from [<c122ea90>] (__mutex_lock+0x48/0xab8)
[ 41.630368] [<c122ea90>] (__mutex_lock) from [<c122f51c>] (mutex_lock_nested+0x1c/0x24)
[ 41.638340] [<c122f51c>] (mutex_lock_nested) from [<c0c6fe08>] (sja1105_static_config_reload+0x30/0x27c)
[ 41.647779] [<c0c6fe08>] (sja1105_static_config_reload) from [<c0c7015c>] (sja1105_hwtstamp_set+0x108/0x1cc)
[ 41.657562] [<c0c7015c>] (sja1105_hwtstamp_set) from [<c0feb650>] (dev_ifsioc+0x18c/0x330)
[ 41.665788] [<c0feb650>] (dev_ifsioc) from [<c0febbd8>] (dev_ioctl+0x320/0x6e8)
[ 41.673064] [<c0febbd8>] (dev_ioctl) from [<c0f8b1f4>] (sock_ioctl+0x334/0x5e8)
[ 41.680340] [<c0f8b1f4>] (sock_ioctl) from [<c05404a8>] (do_vfs_ioctl+0xb0/0xa10)
[ 41.687789] [<c05404a8>] (do_vfs_ioctl) from [<c0540e3c>] (ksys_ioctl+0x34/0x58)
[ 41.695151] [<c0540e3c>] (ksys_ioctl) from [<c0301000>] (ret_fast_syscall+0x0/0x28)
[ 41.702768] Exception stack(0xe8495fa8 to 0xe8495ff0)
[ 41.707796] 5fa0: beff4a8c 00000001 00000011 000089b0 beff4a8c beff4a80
[ 41.715933] 5fc0: beff4a8c 00000001 0000000c 00000036 b6fa98c8 004e19c1 00000001 00000000
[ 41.724069] 5fe0: 004dcedc beff4a6c 004c0738 b6e7af4c
[ 41.729860] BUG: scheduling while atomic: ptp4l/208/0x00000002
[ 41.735682] INFO: lockdep is turned off.
Enabling RX timestamping will logically disturb the fastpath (processing
of meta frames). Replace bool hwts_rx_en with a bit that is checked
atomically from the fastpath and temporarily unset from the sleepable
context during a change of the RX timestamping process (a destructive
operation anyways, requires switch reset).
If found unset, the fastpath (net/dsa/tag_sja1105.c) will just drop any
received meta frame and not take the meta_lock at all.
Fixes: a602afd200 ("net: dsa: sja1105: Expose PTP timestamping ioctls to userspace")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In sja1105_static_config_upload, in two cases memory is leaked: when
static_config_buf_prepare_for_upload fails and when sja1105_inhibit_tx
fails. In both cases config_buf should be released.
Fixes: 8aa9ebccae ("net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch")
Fixes: 1a4c69406c ("net: dsa: sja1105: Prevent PHY jabbering during switch reset")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes the PTP synchronization on the switch 'jumps':
ptp4l[11241.155]: rms 8 max 16 freq -21732 +/- 11 delay 742 +/- 0
ptp4l[11243.157]: rms 7 max 17 freq -21731 +/- 10 delay 744 +/- 0
ptp4l[11245.160]: rms 33592410 max 134217731 freq +192422 +/- 8530253 delay 743 +/- 0
ptp4l[11247.163]: rms 811631 max 964131 freq +10326 +/- 557785 delay 743 +/- 0
ptp4l[11249.166]: rms 261936 max 533876 freq -304323 +/- 126371 delay 744 +/- 0
ptp4l[11251.169]: rms 48700 max 57740 freq -20218 +/- 30532 delay 744 +/- 0
ptp4l[11253.171]: rms 14570 max 30163 freq -5568 +/- 7563 delay 742 +/- 0
ptp4l[11255.174]: rms 2914 max 3440 freq -22001 +/- 1667 delay 744 +/- 1
ptp4l[11257.177]: rms 811 max 1710 freq -22653 +/- 451 delay 744 +/- 1
ptp4l[11259.180]: rms 177 max 218 freq -21695 +/- 89 delay 741 +/- 0
ptp4l[11261.182]: rms 45 max 92 freq -21677 +/- 32 delay 742 +/- 0
ptp4l[11263.186]: rms 14 max 32 freq -21733 +/- 11 delay 742 +/- 0
ptp4l[11265.188]: rms 9 max 14 freq -21725 +/- 12 delay 742 +/- 0
ptp4l[11267.191]: rms 9 max 16 freq -21727 +/- 13 delay 742 +/- 0
ptp4l[11269.194]: rms 6 max 15 freq -21726 +/- 9 delay 743 +/- 0
ptp4l[11271.197]: rms 8 max 15 freq -21728 +/- 11 delay 743 +/- 0
ptp4l[11273.200]: rms 6 max 12 freq -21727 +/- 8 delay 743 +/- 0
ptp4l[11275.202]: rms 9 max 17 freq -21720 +/- 11 delay 742 +/- 0
ptp4l[11277.205]: rms 9 max 18 freq -21725 +/- 12 delay 742 +/- 0
Background: the switch only offers partial RX timestamps (24 bits) and
it is up to the driver to read the PTP clock to fill those timestamps up
to 64 bits. But the PTP clock readout needs to happen quickly enough (in
0.135 seconds, in fact), otherwise the PTP clock will wrap around 24
bits, condition which cannot be detected.
Looking at the 'max 134217731' value on output line 3, one can see that
in hex it is 0x8000003. Because the PTP clock resolution is 8 ns,
that means 0x1000000 in ticks, which is exactly 2^24. So indeed this is
a PTP clock wraparound, but the reason might be surprising.
What is going on is that sja1105_tstamp_reconstruct(priv, now, ts)
expects a "now" time that is later than the "ts" was snapshotted at.
This, of course, is obvious: we read the PTP time _after_ the partial RX
timestamp was received. However, the workqueue is processing frames from
a skb queue and reuses the same PTP time, read once at the beginning.
Normally the skb queue only contains one frame and all goes well. But
when the skb queue contains two frames, the second frame that gets
dequeued might have been partially timestamped by the RX MAC _after_ we
had read our PTP time initially.
The code was originally like that due to concerns that SPI access for
PTP time readout is a slow process, and we are time-constrained anyway
(aka: premature optimization). But some timing analysis reveals that the
time spent until the RX timestamp is completely reconstructed is 1 order
of magnitude lower than the 0.135 s deadline even under worst-case
conditions. So we can afford to read the PTP time for each frame in the
RX timestamping queue, which of course ensures that the full PTP time is
in the partial timestamp's future.
Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_NET_DSA_SJA1105_TAS=y and CONFIG_NET_SCH_TAPRIO=n,
below error can be found:
drivers/net/dsa/sja1105/sja1105_tas.o: In function `sja1105_setup_tc_taprio':
sja1105_tas.c:(.text+0x318): undefined reference to `taprio_offload_free'
sja1105_tas.c:(.text+0x590): undefined reference to `taprio_offload_get'
drivers/net/dsa/sja1105/sja1105_tas.o: In function `sja1105_tas_teardown':
sja1105_tas.c:(.text+0x610): undefined reference to `taprio_offload_free'
make: *** [vmlinux] Error 1
sja1105_tas needs tc-taprio, so this patch add the dependency for it.
Fixes: 317ab5b86c ("net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
This qdisc offload is the closest thing to what the SJA1105 supports in
hardware for time-based egress shaping. The switch core really is built
around SAE AS6802/TTEthernet (a TTTech standard) but can be made to
operate similarly to IEEE 802.1Qbv with some constraints:
- The gate control list is a global list for all ports. There are 8
execution threads that iterate through this global list in parallel.
I don't know why 8, there are only 4 front-panel ports.
- Care must be taken by the user to make sure that two execution threads
never get to execute a GCL entry simultaneously. I created a O(n^4)
checker for this hardware limitation, prior to accepting a taprio
offload configuration as valid.
- The spec says that if a GCL entry's interval is shorter than the frame
length, you shouldn't send it (and end up in head-of-line blocking).
Well, this switch does anyway.
- The switch has no concept of ADMIN and OPER configurations. Because
it's so simple, the TAS settings are loaded through the static config
tables interface, so there isn't even place for any discussion about
'graceful switchover between ADMIN and OPER'. You just reset the
switch and upload a new OPER config.
- The switch accepts multiple time sources for the gate events. Right
now I am using the standalone clock source as opposed to PTP. So the
base time parameter doesn't really do much. Support for the PTP clock
source will be added in a future series.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a preparation patch for the tc-taprio offload (and potentially
for other future offloads such as tc-mqprio).
Instead of looking directly at skb->priority during xmit, let's get the
netdev queue and the queue-to-traffic-class mapping, and put the
resulting traffic class into the dsa_8021q PCP field. The switch is
configured with a 1-to-1 PCP-to-ingress-queue-to-egress-queue mapping
(see vlan_pmap in sja1105_main.c), so the effect is that we can inject
into a front-panel's egress traffic class through VLAN tagging from
Linux, completely transparently.
Unfortunately the switch doesn't look at the VLAN PCP in the case of
management traffic to/from the CPU (link-local frames at
01-80-C2-xx-xx-xx or 01-1B-19-xx-xx-xx) so we can't alter the
transmission queue of this type of traffic on a frame-by-frame basis. It
is only selected through the "hostprio" setting which ATM is harcoded in
the driver to 7.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to support tc-taprio offload, the TTEthernet egress scheduling
core registers must be made visible through the static interface.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The switch barely supports traffic I/O, and it does that by repurposing
VLANs when there is no bridge that is taking control of them.
Letting DSA declare this netdev feature as supported (see
dsa_slave_create) would mean that VLAN sub-interfaces created on sja1105
switch ports will be hardware offloaded. That means that
net/8021q/vlan_core.c would install the VLAN into the filter tables of
the switch, potentially interfering with the tag_8021q VLANs.
We need to prevent that from happening and not let the 8021q core
offload VLANs to the switch hardware tables. In vlan_filtering=0 modes
of operation, the switch ports can pass through VLAN-tagged frames with
no problem.
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/dsa/sja1105/sja1105_main.c: In function sja1105_fdb_dump:
drivers/net/dsa/sja1105/sja1105_main.c:1226:14: warning:
variable tx_vid set but not used [-Wunused-but-set-variable]
drivers/net/dsa/sja1105/sja1105_main.c:1226:6: warning:
variable rx_vid set but not used [-Wunused-but-set-variable]
They are not used since commit 6d7c7d948a ("net: dsa:
sja1105: Fix broken learning with vlan_filtering disabled")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IS_ERR_OR_NULL(priv->clock) check inside
sja1105_ptp_clock_unregister() is preventing cancel_delayed_work_sync
from actually being run.
Additionally, sja1105_ptp_clock_unregister() does not actually get run,
when placed in sja1105_remove(). The DSA switch gets torn down, but the
sja1105 module does not get unregistered. So sja1105_ptp_clock_unregister
needs to be moved to sja1105_teardown, to be symmetrical with
sja1105_ptp_clock_register which is called from the DSA sja1105_setup.
It is strange to fix a "fixes" patch, but the probe failure can only be
seen when the attached PHY does not respond to MDIO (issue which I can't
pinpoint the reason to) and it goes away after I power-cycle the board.
This time the patch was validated on a failing board, and the kernel
panic from the fixed commit's message can no longer be seen.
Fixes: 29dd908d35 ("net: dsa: sja1105: Cancel PTP delayed work on unregister")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like the FDB dump taken from first-generation switches also
contains information on whether entries are static or not. So use that
instead of searching through the driver's tables.
Fixes: d763778224 ("net: dsa: sja1105: Implement is_static for FDB entries on E/T")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When put under a bridge with vlan_filtering 0, the SJA1105 ports will
flood all traffic as if learning was broken. This is because learning
interferes with the rx_vid's configured by dsa_8021q as unique pvid's.
So learning technically still *does* work, it's just that the learnt
entries never get matched due to their unique VLAN ID.
The setting that saves the day is Shared VLAN Learning, which on this
switch family works exactly as desired: VLAN tagging still works
(untagged traffic gets the correct pvid) and FDB entries are still
populated with the correct contents including VID. Also, a frame cannot
violate the forwarding domain restrictions enforced by its classified
VLAN. It is just that the VID is ignored when looking up the FDB for
taking a forwarding decision (selecting the egress port).
This patch activates SVL, and the result is that frames with a learnt
DMAC are no longer flooded in the scenario described above.
Now exactly *because* SVL works as desired, we have to revisit some
earlier patches:
- It is no longer necessary to manipulate the VID of the 'bridge fdb
{add,del}' command when vlan_filtering is off. This is because now,
SVL is enabled for that case, so the actual VID does not matter*.
- It is still desirable to hide dsa_8021q VID's in the FDB dump
callback. But right now the dump callback should no longer hide
duplicates (one per each front panel port's pvid, plus one for the
VLAN that the CPU port is going to tag a TX frame with), because there
shouldn't be any (the switch will match a single FDB entry no matter
its VID anyway).
* Not really... It's no longer necessary to transform a 'bridge fdb add'
into 5 fdb add operations, but the user might still add a fdb entry with
any vid, and all of them would appear as duplicates in 'bridge fdb
show'. So force a 'bridge fdb add' to insert the VID of 0**, so that we
can prune the duplicates at insertion time.
** The VID of 0 is better than 1 because it is always guaranteed to be
in the ports' hardware filter. DSA also avoids putting the VID inside
the netlink response message towards the bridge driver when we return
this particular VID, which makes it suitable for FDB entries learnt
with vlan_filtering off.
Fixes: 227d07a07e ("net: dsa: sja1105: Add support for traffic through standalone ports")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Georg Waibel <georg.waibel@sensor-technik.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each iteration of for_each_child_of_node puts the previous node, but in
the case of a return from the middle of the loop, there is no put, thus
causing a memory leak. Hence add an of_node_put before the return.
Issue found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need a better way to signal this, perhaps in phylink_validate, but
for now just print this error message as guidance for other people
looking at this driver's code while trying to rework PHYLINK.
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PHYLINK being designed with PHYs in mind that can change MII protocol,
for correct operation it is necessary to ensure that the PHY interface
mode stays the same (otherwise clear the supported bit mask, as
required).
Because this is just a hypothetical situation for now, we don't bother
to check whether we could actually support the new PHY interface mode.
Actually we could modify the xMII table, reset the switch and send an
updated static configuration, but adding that would just be dead code.
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It has been pointed out that PHYLINK can call mac_config only to update
the phy_interface_type and without knowing what the AN results are.
Experimentally, when this was observed to happen, state->link was also
unset, and therefore was used as a proxy to ignore this call. However it
is also suggested that state->link is undefined for this callback and
should not be relied upon.
So let the previously-dead codepath for SPEED_UNKNOWN be called, and
update the comment to make sure the MAC's behavior is sane.
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first generation switches don't tell us through the dynamic config
interface whether the dumped FDB entries are static or not (the LOCKEDS
bit from P/Q/R/S).
However, now that we're keeping a mirror of all 'bridge fdb' commands in
the static config, this is an opportunity to compare a dumped FDB entry
to the driver's private database. After all, what makes an entry static
is that *we* added it.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A FDB entry means that "frames that match this VID and DMAC must be
forwarded to this port".
In the case of dsa_8021q however, the VID is not a single one (and
neither two, as my previous patch assumed). The VID can be set either by
the CPU port (1 tx_vid), or by any of the other front-panel port (n-1
rx_vid's).
Fixes: 93647594d8 ("net: dsa: sja1105: Hide the dsa_8021q VLANs from the bridge fdb command")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reason why this wasn't tackled earlier is that I had hoped I
understood the user manual wrong. But unfortunately hacks are required
in order to retrieve the static/dynamic nature of FDB entries on SJA1105
P/Q/R/S, since this info is stored in the writeback buffer of the
dynamic config command.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When trying to add support for LOCKEDS (static FDB entries) on SJA1105
P/Q/R/S, at first I didn't remember how the abstraction I created
worked, and actually thought it works by mistake.
To avoid other people staring at the code and not making much sense out
of it, add some comments at the top of the file.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 8456721dd4 ("net: dsa: sja1105: Add support for
configuring address ageing time"), we started to reset the switch rather
often (each time the bridge core changes the ageing time on a switch
port).
The unfortunate reality is that SJA1105 doesn't have any {cold, warm,
whatever} reset mode in which it accepts a new configuration stream
without flushing the FDB. Instead, in its world, the FDB *is* an
optional part of the static configuration.
So we play its game, and do what we also do for VLANs: for each 'bridge
fdb' command, we add the FDB entry through the dynamic interface, and we
append the in-kernel static config memory with info that we're going to
use later, when the next reset command is going to be issued.
The result is that 'bridge fdb' commands are now persistent (dynamically
learned entries are lost, but that's ok).
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the end of the commit 1da7382134 ("net: dsa: sja1105: Add FDB
operations for P/Q/R/S series") message, I said that:
At the moment only FDB entries installed statically through 'bridge fdb'
are visible in the dump callback - the dynamically learned ones are
still under investigation.
It looks like the reason why they were not visible in 'bridge fdb' was
that they were never learned - always flooded.
SJA1105 P/Q/R/S manual says about the MAXADDRP[port] field:
Specify the maximum number of MAC address dynamically learned from
the respective port. It is used to limit the number of learned MAC
addresses per port.
It looks like not providing a value in the static config (aka providing
zeroes) is enough for it to not store the learned addresses in the FDB.
For now we divide the 1024 entry FDB "equally" amongst the 5 ports. This
may be revisited if the situation calls for that - for now I'm happy
that learning works.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 1da7382134 ("net: dsa: sja1105: Add FDB operations for
P/Q/R/S series"), these bits were set in the static config, but
apparently they did not do anything. The reason is that the packing
accessors for them were part of a patch I forgot to send.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In SJA1105 there is no concept of 'default values' per se, everything
needs to be driver-supplied through the static configuration tables.
The issue is that the hardware manual says that 'at least the default
untagging VLAN' is mandatory to be provided through the static config.
But VLAN 0 isn't a very good initial pvid - its use is reserved for
priority-tagged frames, and the layers of the stack that care about
those already make sure that this VLAN is installed, as can be seen in
the message below:
8021q: adding VLAN 0 to HW filter on device swp2
So change the pvid provided through the static configuration to 1, which
matches the bridge core's defaults.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Arnd Bergmann pointed out in commit 78fe8a28fb ("net: dsa: sja1105:
fix ptp link error"), there is no point in having PTP support as a
separate loadable kernel module.
So remove the exported symbols and make sja1105.ko contain PTP support
or not based on CONFIG_NET_DSA_SJA1105_PTP.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to a reversed dependency, it is possible to build
the lower ptp driver as a loadable module and the actual
driver using it as built-in, causing a link error:
drivers/net/dsa/sja1105/sja1105_spi.o: In function `sja1105_static_config_upload':
sja1105_spi.c:(.text+0x6f0): undefined reference to `sja1105_ptp_reset'
drivers/net/dsa/sja1105/sja1105_spi.o:(.data+0x2d4): undefined reference to `sja1105et_ptp_cmd'
drivers/net/dsa/sja1105/sja1105_spi.o:(.data+0x604): undefined reference to `sja1105pqrs_ptp_cmd'
drivers/net/dsa/sja1105/sja1105_main.o: In function `sja1105_remove':
sja1105_main.c:(.text+0x8d4): undefined reference to `sja1105_ptp_clock_unregister'
drivers/net/dsa/sja1105/sja1105_main.o: In function `sja1105_rxtstamp_work':
sja1105_main.c:(.text+0x964): undefined reference to `sja1105_tstamp_reconstruct'
drivers/net/dsa/sja1105/sja1105_main.o: In function `sja1105_setup':
sja1105_main.c:(.text+0xb7c): undefined reference to `sja1105_ptp_clock_register'
drivers/net/dsa/sja1105/sja1105_main.o: In function `sja1105_port_deferred_xmit':
sja1105_main.c:(.text+0x1fa0): undefined reference to `sja1105_ptpegr_ts_poll'
sja1105_main.c:(.text+0x1fc4): undefined reference to `sja1105_tstamp_reconstruct'
drivers/net/dsa/sja1105/sja1105_main.o:(.rodata+0x5b0): undefined reference to `sja1105_get_ts_info'
Change the Makefile logic to always build the ptp module
the same way as the rest. Another option would be to
just add it to the same module and remove the exports,
but I don't know if there was a good reason to keep them
separate.
Fixes: bb77f36ac2 ("net: dsa: sja1105: Add support for the PTP clock")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sparse warnings:
drivers/net/dsa/sja1105/sja1105_main.c:1848:6:
warning: symbol 'sja1105_port_rxtstamp' was not declared. Should it be static?
drivers/net/dsa/sja1105/sja1105_main.c:1869:6:
warning: symbol 'sja1105_port_txtstamp' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per the DT phy-mode specification, RGMII delays are applied by the
MAC when there is no PHY present on the link.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pad_mii_tx registers point to the same memory region but were
unused. So convert to using these for RGMII I/O cell configuration, as
they bear a shorter name.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first fact that needs to be stated is that the per-MAC settings in
SJA1105 called EGRESS and INGRESS do *not* disable egress and ingress on
the MAC. They only prevent non-link-local traffic from being
sent/received on this port.
So instead of having .phylink_mac_config essentially mess with the STP
state and force it to DISABLED/BLOCKING (which also brings useless
complications in sja1105_static_config_reload), simply add the
.phylink_mac_link_down and .phylink_mac_link_up callbacks which inhibit
TX at the MAC level, while leaving RX essentially enabled.
Also stop from trying to put the link down in .phylink_mac_config, which
is incorrect.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be used to stop egress traffic in .phylink_mac_link_up.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the driver is now using PHYLINK exclusively, it makes sense to
remove all references to it and replace them with PHYLINK.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a cosmetic patch that replaces the link speed numbers used in
the driver with the corresponding ethtool macros.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This enables the PTP support towards userspace applications such as
linuxptp.
The switches can timestamp only trapped multicast MAC frames, and
therefore only the profiles of 1588 over L2 are supported.
TX timestamping can be enabled per port, but RX timestamping is enabled
globally. As long as RX timestamping is enabled, the switch will emit
metadata follow-up frames that will be processed by the tagger. It may
be a problem that linuxptp does not restore the RX timestamping settings
when exiting.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Meta frame reception relies on the hardware keeping its promise that it
will send no other traffic towards the CPU port between a link-local
frame and a meta frame. Otherwise there is no other way to associate
the meta frame with the link-local frame it's holding a timestamp of.
The receive function is made stateful, and buffers a timestampable frame
until its meta frame arrives, then merges the two, drops the meta and
releases the link-local frame up the stack.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without noticing any particular issue, this patch ensures that
management traffic is treated with the maximum priority on RX by the
switch. This is generally desirable, as the driver keeps a state
machine that waits for metadata follow-up frames as soon as a management
frame is received. Increasing the priority helps expedite the reception
(and further reconstruction) of the RX timestamp to the driver after the
MAC has generated it.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be used to keep state for RX timestamping. It is global
because the switch serializes timestampable and meta frames when
trapping them towards the CPU port (lower port indices have higher
priority) and therefore having one state machine per port would create
unnecessary complications.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>