Commit Graph

443 Commits

Author SHA1 Message Date
Nikolay Aleksandrov
58d913a326 net: bridge: multicast: add context support for host-joined groups
Adding bridge multicast context support for host-joined groups is easy
because we only need the proper timer value. We pass the already chosen
context and use its timer value.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 14:34:47 -07:00
Nikolay Aleksandrov
9dee572c38 net: bridge: vlan: add mcast snooping control
Add a new global vlan option which controls whether multicast snooping
is enabled or disabled for a single vlan. It controls the vlan private
flag: BR_VLFLAG_GLOBAL_MCAST_ENABLED.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov
743a53d963 net: bridge: vlan: add support for dumping global vlan options
Add a new vlan options dump flag which causes only global vlan options
to be dumped. The dumps are done only with bridge devices, ports are
ignored. They support vlan compression if the options in sequential
vlans are equal (currently always true).

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov
47ecd2dbd8 net: bridge: vlan: add support for global options
We can have two types of vlan options depending on context:
 - per-device vlan options (split in per-bridge and per-port)
 - global vlan options

The second type wasn't supported in the bridge until now, but we need
them for per-vlan multicast support, per-vlan STP support and other
options which require global vlan context. They are contained in the global
bridge vlan context even if the vlan is not configured on the bridge device
itself. This patch adds initial netlink attributes and support for setting
these global vlan options, they can only be set (RTM_NEWVLAN) and the
operation must use the bridge device. Since there are no such options yet
it shouldn't have any functional effect.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov
1e9ca45662 net: bridge: multicast: include router port vlan id in notifications
Use the port multicast context to check if the router port is a vlan and
in case it is include its vlan id in the notification.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov
4cdd0d10f3 net: bridge: multicast: check if should use vlan mcast ctx
Add helpers which check if the current bridge/port multicast context
should be used (i.e. they're not disabled) and use them for Rx IGMP/MLD
processing, timers and new group addition. It is important for vlans to
disable processing of timer/packet after the multicast_lock is obtained
if the vlan context doesn't have BR_VLFLAG_MCAST_ENABLED. There are two
cases when that flag is missing:
 - if the vlan is getting destroyed it will be removed and timers will
   be stopped
 - if the vlan mcast snooping is being disabled

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov
f4b7002a70 net: bridge: add vlan mcast snooping knob
Add a global knob that controls if vlan multicast snooping is enabled.
The proper contexts (vlan or bridge-wide) will be chosen based on the knob
when processing packets and changing bridge device state. Note that
vlans have their individual mcast snooping enabled by default, but this
knob is needed to turn on bridge vlan snooping. It is disabled by
default. To enable the knob vlan filtering must also be enabled, it
doesn't make sense to have vlan mcast snooping without vlan filtering
since that would lead to inconsistencies. Disabling vlan filtering will
also automatically disable vlan mcast snooping.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov
7b54aaaf53 net: bridge: multicast: add vlan state initialization and control
Add helpers to enable/disable vlan multicast based on its flags, we need
two flags because we need to know if the vlan has multicast enabled
globally (user-controlled) and if it has it enabled on the specific device
(bridge or port). The new private vlan flags are:
 - BR_VLFLAG_MCAST_ENABLED: locally enabled multicast on the device, used
   when removing a vlan, toggling vlan mcast snooping and controlling
   single vlan (kernel-controlled, valid under RTNL and multicast_lock)
 - BR_VLFLAG_GLOBAL_MCAST_ENABLED: globally enabled multicast for the
   vlan, used to control the bridge-wide vlan mcast snooping for a
   single vlan (user-controlled, can be checked under any context)

Bridge vlan contexts are created with multicast snooping enabled by
default to be in line with the current bridge snooping defaults. In
order to actually activate per vlan snooping and context usage a
bridge-wide knob will be added later which will default to disabled.
If that knob is enabled then automatically all vlan snooping will be
enabled. All vlan contexts are initialized with the current bridge
multicast context defaults.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 05:41:19 -07:00
Nikolay Aleksandrov
613d61dbef net: bridge: vlan: add global and per-port multicast context
Add global and per-port vlan multicast context, only initialized but
still not used. No functional changes intended.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 05:41:19 -07:00
Nikolay Aleksandrov
adc47037a7 net: bridge: multicast: use multicast contexts instead of bridge or port
Pass multicast context pointers to multicast functions instead of bridge/port.
This would make it easier later to switch these contexts to their per-vlan
versions. The patch is basically search and replace, no functional changes.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 05:41:19 -07:00
Nikolay Aleksandrov
d3d065c003 net: bridge: multicast: factor out bridge multicast context
Factor out the bridge's global multicast context into a separate
structure which will later be used for per-vlan global context.
No functional changes intended.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 05:41:19 -07:00
Nikolay Aleksandrov
9632233e7d net: bridge: multicast: factor out port multicast context
Factor out the port's multicast context into a separate structure which
will later be shared for per-port,vlan context. No functional changes
intended. We need the structure even if bridge multicast is not defined
to pass down as pointer to forwarding functions.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 05:41:19 -07:00
Tobias Waldekranz
6eb38bf8eb net: bridge: switchdev: send FDB notifications for host addresses
Treat addresses added to the bridge itself in the same way as regular
ports and send out a notification so that drivers may sync it down to
the hardware FDB.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29 10:46:23 -07:00
Jakub Kicinski
adc2e56ebe Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh

scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-18 19:47:02 -07:00
Nikolay Aleksandrov
58e2071742 net: bridge: fix vlan tunnel dst null pointer dereference
This patch fixes a tunnel_dst null pointer dereference due to lockless
access in the tunnel egress path. When deleting a vlan tunnel the
tunnel_dst pointer is set to NULL without waiting a grace period (i.e.
while it's still usable) and packets egressing are dereferencing it
without checking. Use READ/WRITE_ONCE to annotate the lockless use of
tunnel_id, use RCU for accessing tunnel_dst and make sure it is read
only once and checked in the egress path. The dst is already properly RCU
protected so we don't need to do anything fancy than to make sure
tunnel_id and tunnel_dst are read only once and checked in the egress path.

Cc: stable@vger.kernel.org
Fixes: 11538d039a ("bridge: vlan dst_metadata hooks in ingress and egress paths")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10 14:06:43 -07:00
Nikolay Aleksandrov
bbc6f2cca7 net: bridge: fix br_multicast_is_router stub when igmp is disabled
br_multicast_is_router takes two arguments when bridge IGMP is enabled
and just one when it's disabled, fix the stub to take two as well.

Fixes: 1a3065a268 ("net: bridge: mcast: prepare is-router function for mcast router split")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 10:35:20 -07:00
Linus Lüssing
a3c02e769e net: bridge: mcast: split multicast router state for IPv4 and IPv6
A multicast router for IPv4 does not imply that the same host also is a
multicast router for IPv6 and vice versa.

To reduce multicast traffic when a host is only a multicast router for
one of these two protocol families, keep router state for IPv4 and IPv6
separately. Similar to how querier state is kept separately.

For backwards compatibility for netlink and switchdev notifications
these two will still only notify if a port switched from either no
IPv4/IPv6 multicast router to any IPv4/IPv6 multicast router or the
other way round. However a full netlink MDB router dump will now also
include a multicast router timeout for both IPv4 and IPv6.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing
1a3065a268 net: bridge: mcast: prepare is-router function for mcast router split
In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants make br_multicast_is_router() protocol
family aware.

Note that for now br_ip6_multicast_is_router() uses the currently still
common ip4_mc_router_timer for now. It will be renamed to
ip6_mc_router_timer later when the split is performed.

While at it also renames the "1" and "2" constants in
br_multicast_is_router() to the MDB_RTR_TYPE_TEMP_QUERY and
MDB_RTR_TYPE_PERM enums.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing
44ebb081dc net: bridge: mcast: add wrappers for router node retrieval
In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants and to avoid IPv6 #ifdef clutter later add
two wrapper functions for router node retrieval in the payload
forwarding code.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing
ce6f709775 net: bridge: mcast: rename multicast router lists and timers
In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants, rename the affected variable to the IPv4
version first to avoid some renames in later commits.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Florian Fainelli
ae1ea84b33 net: bridge: propagate error code and extack from br_mc_disabled_update
Some Ethernet switches might only be able to support disabling multicast
snooping globally, which is an issue for example when several bridges
span the same physical device and request contradictory settings.

Propagate the return value of br_mc_disabled_update() such that this
limitation is transmitted correctly to user-space.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14 14:32:05 -07:00
Felix Fietkau
bcf2766b13 net: bridge: resolve forwarding path for VLAN tag actions in bridge devices
Depending on the VLAN settings of the bridge and the port, the bridge can
either add or remove a tag. When vlan filtering is enabled, the fdb lookup
also needs to know the VLAN tag/proto for the destination address
To provide this, keep track of the stack of VLAN tags for the path in the
lookup context

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:38 -07:00
Vladimir Oltean
c97f47e3c1 net: bridge: fix br_vlan_filter_toggle stub when CONFIG_BRIDGE_VLAN_FILTERING=n
The prototype of br_vlan_filter_toggle was updated to include a netlink
extack, but the stub definition wasn't, which results in a build error
when CONFIG_BRIDGE_VLAN_FILTERING=n.

Fixes: 9e781401cb ("net: bridge: propagate extack through store_bridge_parm")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-15 13:15:10 -08:00
Vladimir Oltean
dcbdf1350e net: bridge: propagate extack through switchdev_port_attr_set
The benefit is the ability to propagate errors from switchdev drivers
for the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING and
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL attributes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:38:11 -08:00
Vladimir Oltean
9e781401cb net: bridge: propagate extack through store_bridge_parm
The bridge sysfs interface stores parameters for the STP, VLAN,
multicast etc subsystems using a predefined function prototype.
Sometimes the underlying function being called supports a netlink
extended ack message, and we ignore it.

Let's expand the store_bridge_parm function prototype to include the
extack, and just print it to console, but at least propagate it where
applicable. Where not applicable, create a shim function in the
br_sysfs_br.c file that discards the extra function argument.

This patch allows us to propagate the extack argument to
br_vlan_set_default_pvid, br_vlan_set_proto and br_vlan_filter_toggle,
and from there, further up in br_changelink from br_netlink.c.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:38:11 -08:00
Vladimir Oltean
7a572964e0 net: bridge: remove __br_vlan_filter_toggle
This function is identical with br_vlan_filter_toggle.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:38:11 -08:00
Vladimir Oltean
078bbb851e net: bridge: don't print in br_switchdev_set_port_flag
For the netlink interface, propagate errors through extack rather than
simply printing them to the console. For the sysfs interface, we still
print to the console, but at least that's one layer higher than in
switchdev, which also allows us to silently ignore the offloading of
flags if that is ever needed in the future.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:08:04 -08:00
Nikolay Aleksandrov
89268b056e net: bridge: multicast: add per-port EHT hosts limit
Add a default limit of 512 for number of tracked EHT hosts per-port.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-27 17:40:35 -08:00
Nikolay Aleksandrov
d5a1022283 net: bridge: multicast: mark IGMPv3/MLDv2 fast-leave deletes
Mark groups which were deleted due to fast leave/EHT.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov
474ddb37fa net: bridge: multicast: add EHT allow/block handling
Add support for IGMPv3/MLDv2 allow/block EHT handling. Similar to how
the reports are processed we have 2 cases when the group is in include
or exclude mode, these are processed as follows:
 - group include
  - allow: create missing entries
  - block: remove existing matching entries and remove the corresponding
    S,G entries if there are no more set host entries, then possibly
    delete the whole group if there are no more S,G entries

 - group exclude
  - allow
    - host include: create missing entries
    - host exclude: remove existing matching entries and remove the
      corresponding S,G entries if there are no more set host entries
  - block
    - host include: remove existing matching entries and remove the
      corresponding S,G entries if there are no more set host entries,
      then possibly delete the whole group if there are no more S,G entries
    - host exclude: create missing entries

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov
8f07b83119 net: bridge: multicast: add EHT structures and definitions
Add EHT structures for tracking hosts and sources per group. We keep one
set for each host which has all of the host's S,G entries, and one set for
each multicast source which has all hosts that have joined that S,G. For
each host, source entry we record the filter_mode and we keep an expiry
timer. There is also one global expiry timer per source set, it is
updated with each set entry update, it will be later used to lower the
set's timer instead of lowering each entry's timer separately.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22 19:39:56 -08:00
Jakub Kicinski
46d5e62dd3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
xdp_return_frame_bulk() needs to pass a xdp_buff
to __xdp_return().

strlcpy got converted to strscpy but here it makes no
functional difference, so just keep the right code.

Conflicts:
	net/netfilter/nf_tables_api.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-11 22:29:38 -08:00
Joseph Huang
851d0a73c9 bridge: Fix a deadlock when enabling multicast snooping
When enabling multicast snooping, bridge module deadlocks on multicast_lock
if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
network.

The deadlock was caused by the following sequence: While holding the lock,
br_multicast_open calls br_multicast_join_snoopers, which eventually causes
IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
Since the destination Ethernet address is a multicast address, br_dev_xmit
feeds the packet back to the bridge via br_multicast_rcv, which in turn
calls br_multicast_add_group, which then deadlocks on multicast_lock.

The fix is to move the call br_multicast_join_snoopers outside of the
critical section. This works since br_multicast_join_snoopers only deals
with IP and does not modify any multicast data structures of the bridge,
so there's no need to hold the lock.

Steps to reproduce:
1. sysctl net.ipv6.conf.all.force_mld_version=1
2. have another querier
3. ip link set dev bridge type bridge mcast_snooping 0 && \
   ip link set dev bridge type bridge mcast_snooping 1 < deadlock >

A typical call trace looks like the following:

[  936.251495]  _raw_spin_lock+0x5c/0x68
[  936.255221]  br_multicast_add_group+0x40/0x170 [bridge]
[  936.260491]  br_multicast_rcv+0x7ac/0xe30 [bridge]
[  936.265322]  br_dev_xmit+0x140/0x368 [bridge]
[  936.269689]  dev_hard_start_xmit+0x94/0x158
[  936.273876]  __dev_queue_xmit+0x5ac/0x7f8
[  936.277890]  dev_queue_xmit+0x10/0x18
[  936.281563]  neigh_resolve_output+0xec/0x198
[  936.285845]  ip6_finish_output2+0x240/0x710
[  936.290039]  __ip6_finish_output+0x130/0x170
[  936.294318]  ip6_output+0x6c/0x1c8
[  936.297731]  NF_HOOK.constprop.0+0xd8/0xe8
[  936.301834]  igmp6_send+0x358/0x558
[  936.305326]  igmp6_join_group.part.0+0x30/0xf0
[  936.309774]  igmp6_group_added+0xfc/0x110
[  936.313787]  __ipv6_dev_mc_inc+0x1a4/0x290
[  936.317885]  ipv6_dev_mc_inc+0x10/0x18
[  936.321677]  br_multicast_open+0xbc/0x110 [bridge]
[  936.326506]  br_multicast_toggle+0xec/0x140 [bridge]

Fixes: 4effd28c12 ("bridge: join all-snoopers multicast address")
Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20201204235628.50653-1-Joseph.Huang@garmin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-07 17:14:43 -08:00
Heiner Kallweit
7609ecb2ed net: bridge: switch to net core statistics counters handling
Use netdev->tstats instead of a member of net_bridge for storing
a pointer to the per-cpu counters. This allows us to use core
functionality for statistics handling.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/9bad2be2-fd84-7c6e-912f-cee433787018@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21 14:40:50 -08:00
Heiner Kallweit
281cc2843b net: bridge: replace struct br_vlan_stats with pcpu_sw_netstats
Struct br_vlan_stats duplicates pcpu_sw_netstats (apart from
br_vlan_stats not defining an alignment requirement), therefore
switch to using the latter one.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/04d25c3d-c5f6-3611-6d37-c2f40243dae2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18 17:23:16 -08:00
Horatiu Vultur
0169b82054 bridge: mrp: Use hlist_head instead of list_head for mrp
Replace list_head with hlist_head for MRP list under the bridge.
There is no need for a circular list when a linear list will work.
This will also decrease the size of 'struct net_bridge'.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20201106215049.1448185-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-09 16:42:12 -08:00
Vladimir Oltean
c43fd36f7f net: bridge: mcast: fix stub definition of br_multicast_querier_exists
The commit cited below has changed only the functional prototype of
br_multicast_querier_exists, but forgot to do that for the stub
prototype (the one where CONFIG_BRIDGE_IGMP_SNOOPING is disabled).

Fixes: 955062b03f ("net: bridge: mcast: add support for raw L2 multicast groups")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20201101000845.190009-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 17:23:19 -07:00
Nikolay Aleksandrov
955062b03f net: bridge: mcast: add support for raw L2 multicast groups
Extend the bridge multicast control and data path to configure routes
for L2 (non-IP) multicast groups.

The uapi struct br_mdb_entry union u is extended with another variant,
mac_addr, which does not change the structure size, and which is valid
when the proto field is zero.

To be compatible with the forwarding code that is already in place,
which acts as an IGMP/MLD snooping bridge with querier capabilities, we
need to declare that for L2 MDB entries (for which there exists no such
thing as IGMP/MLD snooping/querying), that there is always a querier.
Otherwise, these entries would be flooded to all bridge ports and not
just to those that are members of the L2 multicast group.

Needless to say, only permanent L2 multicast groups can be installed on
a bridge port.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20201028233831.610076-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:49:19 -07:00
Henrik Bjoernlund
b6d0425b81 bridge: cfm: Netlink Notifications.
This is the implementation of Netlink notifications out of CFM.

Notifications are initiated whenever a state change happens in CFM.

IFLA_BRIDGE_CFM:
    Points to the CFM information.

IFLA_BRIDGE_CFM_MEP_STATUS_INFO:
    This indicate that the MEP instance status are following.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO:
    This indicate that the peer MEP status are following.

CFM nested attribute has the following attributes in next level.

IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE:
    The MEP instance number of the delivered status.
    The type is NLA_U32.
IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN:
    The MEP instance received CFM PDU with unexpected Opcode.
    The type is NLA_U32 (bool).
IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN:
    The MEP instance received CFM PDU with unexpected version.
    The type is NLA_U32 (bool).
IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN:
    The MEP instance received CCM PDU with MD level lower than
    configured level. This frame is discarded.
    The type is NLA_U32 (bool).

IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE:
    The MEP instance number of the delivered status.
    The type is NLA_U32.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID:
    The added Peer MEP ID of the delivered status.
    The type is NLA_U32.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT:
    The CCM defect status.
    The type is NLA_U32 (bool).
    True means no CCM frame is received for 3.25 intervals.
    IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI:
    The last received CCM PDU RDI.
    The type is NLA_U32 (bool).
IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE:
    The last received CCM PDU Port Status TLV value field.
    The type is NLA_U8.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE:
    The last received CCM PDU Interface Status TLV value field.
    The type is NLA_U8.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN:
    A CCM frame has been received from Peer MEP.
    The type is NLA_U32 (bool).
    This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN:
    A CCM frame with TLV has been received from Peer MEP.
    The type is NLA_U32 (bool).
    This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN:
    A CCM frame with unexpected sequence number has been received
    from Peer MEP.
    The type is NLA_U32 (bool).
    When a sequence number is not one higher than previously received
    then it is unexpected.
    This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.

Signed-off-by: Henrik Bjoernlund  <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur  <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-29 18:39:44 -07:00
Henrik Bjoernlund
e77824d81d bridge: cfm: Netlink GET status Interface.
This is the implementation of CFM netlink status
get information interface.

Add new nested netlink attributes. These attributes are used by the
user space to get status information.

GETLINK:
    Request filter RTEXT_FILTER_CFM_STATUS:
    Indicating that CFM status information must be delivered.

    IFLA_BRIDGE_CFM:
        Points to the CFM information.

    IFLA_BRIDGE_CFM_MEP_STATUS_INFO:
        This indicate that the MEP instance status are following.
    IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO:
        This indicate that the peer MEP status are following.

CFM nested attribute has the following attributes in next level.

GETLINK RTEXT_FILTER_CFM_STATUS:
    IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE:
        The MEP instance number of the delivered status.
        The type is u32.
    IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN:
        The MEP instance received CFM PDU with unexpected Opcode.
        The type is u32 (bool).
    IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN:
        The MEP instance received CFM PDU with unexpected version.
        The type is u32 (bool).
    IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN:
        The MEP instance received CCM PDU with MD level lower than
        configured level. This frame is discarded.
        The type is u32 (bool).

    IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE:
        The MEP instance number of the delivered status.
        The type is u32.
    IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID:
        The added Peer MEP ID of the delivered status.
        The type is u32.
    IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT:
        The CCM defect status.
        The type is u32 (bool).
        True means no CCM frame is received for 3.25 intervals.
        IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL.
    IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI:
        The last received CCM PDU RDI.
        The type is u32 (bool).
    IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE:
        The last received CCM PDU Port Status TLV value field.
        The type is u8.
    IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE:
        The last received CCM PDU Interface Status TLV value field.
        The type is u8.
    IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN:
        A CCM frame has been received from Peer MEP.
        The type is u32 (bool).
        This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
    IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN:
        A CCM frame with TLV has been received from Peer MEP.
        The type is u32 (bool).
        This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
    IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN:
        A CCM frame with unexpected sequence number has been received
        from Peer MEP.
        The type is u32 (bool).
        When a sequence number is not one higher than previously received
        then it is unexpected.
        This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.

Signed-off-by: Henrik Bjoernlund  <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur  <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-29 18:39:44 -07:00
Henrik Bjoernlund
5e312fc0e7 bridge: cfm: Netlink GET configuration Interface.
This is the implementation of CFM netlink configuration
get information interface.

Add new nested netlink attributes. These attributes are used by the
user space to get configuration information.

GETLINK:
    Request filter RTEXT_FILTER_CFM_CONFIG:
    Indicating that CFM configuration information must be delivered.

    IFLA_BRIDGE_CFM:
        Points to the CFM information.

    IFLA_BRIDGE_CFM_MEP_CREATE_INFO:
        This indicate that MEP instance create parameters are following.
    IFLA_BRIDGE_CFM_MEP_CONFIG_INFO:
        This indicate that MEP instance config parameters are following.
    IFLA_BRIDGE_CFM_CC_CONFIG_INFO:
        This indicate that MEP instance CC functionality
        parameters are following.
    IFLA_BRIDGE_CFM_CC_RDI_INFO:
        This indicate that CC transmitted CCM PDU RDI
        parameters are following.
    IFLA_BRIDGE_CFM_CC_CCM_TX_INFO:
        This indicate that CC transmitted CCM PDU parameters are
        following.
    IFLA_BRIDGE_CFM_CC_PEER_MEP_INFO:
        This indicate that the added peer MEP IDs are following.

CFM nested attribute has the following attributes in next level.

GETLINK RTEXT_FILTER_CFM_CONFIG:
    IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE:
        The created MEP instance number.
        The type is u32.
    IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN:
        The created MEP domain.
        The type is u32 (br_cfm_domain).
        It must be BR_CFM_PORT.
        This means that CFM frames are transmitted and received
        directly on the port - untagged. Not in a VLAN.
    IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION:
        The created MEP direction.
        The type is u32 (br_cfm_mep_direction).
        It must be BR_CFM_MEP_DIRECTION_DOWN.
        This means that CFM frames are transmitted and received on
        the port. Not in the bridge.
    IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX:
        The created MEP residence port ifindex.
        The type is u32 (ifindex).

    IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE:
        The deleted MEP instance number.
        The type is u32.

    IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE:
        The configured MEP instance number.
        The type is u32.
    IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC:
        The configured MEP unicast MAC address.
        The type is 6*u8 (array).
        This is used as SMAC in all transmitted CFM frames.
    IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL:
        The configured MEP unicast MD level.
        The type is u32.
        It must be in the range 1-7.
        No CFM frames are passing through this MEP on lower levels.
    IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID:
        The configured MEP ID.
        The type is u32.
        It must be in the range 0-0x1FFF.
        This MEP ID is inserted in any transmitted CCM frame.

    IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE:
        The configured MEP instance number.
        The type is u32.
    IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE:
        The Continuity Check (CC) functionality is enabled or disabled.
        The type is u32 (bool).
    IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL:
        The CC expected receive interval of CCM frames.
        The type is u32 (br_cfm_ccm_interval).
        This is also the transmission interval of CCM frames when enabled.
    IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID:
        The CC expected receive MAID in CCM frames.
        The type is CFM_MAID_LENGTH*u8.
        This is MAID is also inserted in transmitted CCM frames.

    IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE:
        The configured MEP instance number.
        The type is u32.
    IFLA_BRIDGE_CFM_CC_PEER_MEPID:
        The CC Peer MEP ID added.
        The type is u32.
        When a Peer MEP ID is added and CC is enabled it is expected to
        receive CCM frames from that Peer MEP.

    IFLA_BRIDGE_CFM_CC_RDI_INSTANCE:
        The configured MEP instance number.
        The type is u32.
    IFLA_BRIDGE_CFM_CC_RDI_RDI:
        The RDI that is inserted in transmitted CCM PDU.
        The type is u32 (bool).

    IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE:
        The configured MEP instance number.
        The type is u32.
    IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC:
        The transmitted CCM frame destination MAC address.
        The type is 6*u8 (array).
        This is used as DMAC in all transmitted CFM frames.
    IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE:
        The transmitted CCM frame update (increment) of sequence
        number is enabled or disabled.
        The type is u32 (bool).
    IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD:
        The period of time where CCM frame are transmitted.
        The type is u32.
        The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX
        must be done before timeout to keep transmission alive.
        When period is zero any ongoing CCM frame transmission
        will be stopped.
    IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV:
        The transmitted CCM frame update with Interface Status TLV
        is enabled or disabled.
        The type is u32 (bool).
    IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE:
        The transmitted Interface Status TLV value field.
        The type is u8.
    IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV:
        The transmitted CCM frame update with Port Status TLV is enabled
        or disabled.
        The type is u32 (bool).
    IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE:
        The transmitted Port Status TLV value field.
        The type is u8.

Signed-off-by: Henrik Bjoernlund  <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur  <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-29 18:39:43 -07:00
Henrik Bjoernlund
2be665c394 bridge: cfm: Netlink SET configuration Interface.
This is the implementation of CFM netlink configuration
set information interface.

Add new nested netlink attributes. These attributes are used by the
user space to create/delete/configure CFM instances.

SETLINK:
    IFLA_BRIDGE_CFM:
        Indicate that the following attributes are CFM.

    IFLA_BRIDGE_CFM_MEP_CREATE:
        This indicate that a MEP instance must be created.
    IFLA_BRIDGE_CFM_MEP_DELETE:
        This indicate that a MEP instance must be deleted.
    IFLA_BRIDGE_CFM_MEP_CONFIG:
        This indicate that a MEP instance must be configured.
    IFLA_BRIDGE_CFM_CC_CONFIG:
        This indicate that a MEP instance Continuity Check (CC)
        functionality must be configured.
    IFLA_BRIDGE_CFM_CC_PEER_MEP_ADD:
        This indicate that a CC Peer MEP must be added.
    IFLA_BRIDGE_CFM_CC_PEER_MEP_REMOVE:
        This indicate that a CC Peer MEP must be removed.
    IFLA_BRIDGE_CFM_CC_CCM_TX:
        This indicate that the CC transmitted CCM PDU must be configured.
    IFLA_BRIDGE_CFM_CC_RDI:
        This indicate that the CC transmitted CCM PDU RDI must be
        configured.

CFM nested attribute has the following attributes in next level.

SETLINK RTEXT_FILTER_CFM_CONFIG:
    IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE:
        The created MEP instance number.
        The type is u32.
    IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN:
        The created MEP domain.
        The type is u32 (br_cfm_domain).
        It must be BR_CFM_PORT.
        This means that CFM frames are transmitted and received
        directly on the port - untagged. Not in a VLAN.
    IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION:
        The created MEP direction.
        The type is u32 (br_cfm_mep_direction).
        It must be BR_CFM_MEP_DIRECTION_DOWN.
        This means that CFM frames are transmitted and received on
        the port. Not in the bridge.
    IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX:
        The created MEP residence port ifindex.
        The type is u32 (ifindex).

    IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE:
        The deleted MEP instance number.
        The type is u32.

    IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE:
        The configured MEP instance number.
        The type is u32.
    IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC:
        The configured MEP unicast MAC address.
        The type is 6*u8 (array).
        This is used as SMAC in all transmitted CFM frames.
    IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL:
        The configured MEP unicast MD level.
        The type is u32.
        It must be in the range 1-7.
        No CFM frames are passing through this MEP on lower levels.
    IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID:
        The configured MEP ID.
        The type is u32.
        It must be in the range 0-0x1FFF.
        This MEP ID is inserted in any transmitted CCM frame.

    IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE:
        The configured MEP instance number.
        The type is u32.
    IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE:
        The Continuity Check (CC) functionality is enabled or disabled.
        The type is u32 (bool).
    IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL:
        The CC expected receive interval of CCM frames.
        The type is u32 (br_cfm_ccm_interval).
        This is also the transmission interval of CCM frames when enabled.
    IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID:
        The CC expected receive MAID in CCM frames.
        The type is CFM_MAID_LENGTH*u8.
        This is MAID is also inserted in transmitted CCM frames.

    IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE:
        The configured MEP instance number.
        The type is u32.
    IFLA_BRIDGE_CFM_CC_PEER_MEPID:
        The CC Peer MEP ID added.
        The type is u32.
        When a Peer MEP ID is added and CC is enabled it is expected to
        receive CCM frames from that Peer MEP.

    IFLA_BRIDGE_CFM_CC_RDI_INSTANCE:
        The configured MEP instance number.
        The type is u32.
    IFLA_BRIDGE_CFM_CC_RDI_RDI:
        The RDI that is inserted in transmitted CCM PDU.
        The type is u32 (bool).

    IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE:
        The configured MEP instance number.
        The type is u32.
    IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC:
        The transmitted CCM frame destination MAC address.
        The type is 6*u8 (array).
        This is used as DMAC in all transmitted CFM frames.
    IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE:
        The transmitted CCM frame update (increment) of sequence
        number is enabled or disabled.
        The type is u32 (bool).
    IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD:
        The period of time where CCM frame are transmitted.
        The type is u32.
        The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX
        must be done before timeout to keep transmission alive.
        When period is zero any ongoing CCM frame transmission
        will be stopped.
    IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV:
        The transmitted CCM frame update with Interface Status TLV
        is enabled or disabled.
        The type is u32 (bool).
    IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE:
        The transmitted Interface Status TLV value field.
        The type is u8.
    IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV:
        The transmitted CCM frame update with Port Status TLV is enabled
        or disabled.
        The type is u32 (bool).
    IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE:
        The transmitted Port Status TLV value field.
        The type is u8.

Signed-off-by: Henrik Bjoernlund  <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur  <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-29 18:39:43 -07:00
Henrik Bjoernlund
86a14b79e1 bridge: cfm: Kernel space implementation of CFM. MEP create/delete.
This is the first commit of the implementation of the CFM protocol
according to 802.1Q section 12.14.

It contains MEP instance create, delete and configuration.

Connectivity Fault Management (CFM) comprises capabilities for
detecting, verifying, and isolating connectivity failures in
Virtual Bridged Networks. These capabilities can be used in
networks operated by multiple independent organizations, each
with restricted management access to each others equipment.

CFM functions are partitioned as follows:
    - Path discovery
    - Fault detection
    - Fault verification and isolation
    - Fault notification
    - Fault recovery

Interface consists of these functions:
br_cfm_mep_create()
br_cfm_mep_delete()
br_cfm_mep_config_set()
br_cfm_cc_config_set()
br_cfm_cc_peer_mep_add()
br_cfm_cc_peer_mep_remove()

A MEP instance is created by br_cfm_mep_create()
    -It is the Maintenance association End Point
     described in 802.1Q section 19.2.
    -It is created on a specific level (1-7) and is assuring
     that no CFM frames are passing through this MEP on lower levels.
    -It initiates and validates CFM frames on its level.
    -It can only exist on a port that is related to a bridge.
    -Attributes given cannot be changed until the instance is
     deleted.

A MEP instance can be deleted by br_cfm_mep_delete().

A created MEP instance has attributes that can be
configured by br_cfm_mep_config_set().

A MEP Continuity Check feature can be configured by
br_cfm_cc_config_set()
    The Continuity Check Receiver state machine can be
    enabled and disabled.
    According to 802.1Q section 19.2.8

A MEP can have Peer MEPs added and removed by
br_cfm_cc_peer_mep_add() and br_cfm_cc_peer_mep_remove()
    The Continuity Check feature can maintain connectivity
    status on each added Peer MEP.

Signed-off-by: Henrik Bjoernlund  <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur  <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-29 18:39:43 -07:00
Henrik Bjoernlund
f323aa54be bridge: cfm: Add BRIDGE_CFM to Kconfig.
This makes it possible to include or exclude the CFM
protocol according to 802.1Q section 12.14.

Signed-off-by: Henrik Bjoernlund  <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur  <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-29 18:39:43 -07:00
Henrik Bjoernlund
90c628dd47 net: bridge: extend the process of special frames
This patch extends the processing of frames in the bridge. Currently MRP
frames needs special processing and the current implementation doesn't
allow a nice way to process different frame types. Therefore try to
improve this by adding a list that contains frame types that need
special processing. This list is iterated for each input frame and if
there is a match based on frame type then these functions will be called
and decide what to do with the frame. It can process the frame then the
bridge doesn't need to do anything or don't process so then the bridge
will do normal forwarding.

Signed-off-by: Henrik Bjoernlund  <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur  <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-29 18:39:43 -07:00
Nikolay Aleksandrov
9116ffbf1d net: bridge: mcast: add support for blocked port groups
When excluding S,G entries we need a way to block a particular S,G,port.
The new port group flag is managed based on the source's timer as per
RFCs 3376 and 3810. When a source expires and its port group is in
EXCLUDE mode, it will be blocked.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov
8266a0491e net: bridge: mcast: handle port group filter modes
We need to handle group filter mode transitions and initial state.
To change a port group's INCLUDE -> EXCLUDE mode (or when we have added
a new port group in EXCLUDE mode) we need to add that port to all of
*,G ports' S,G entries for proper replication. When the EXCLUDE state is
changed from IGMPv3 report, br_multicast_fwd_filter_exclude() must be
called after the source list processing because the assumption is that
all of the group's S,G entries will be created before transitioning to
EXCLUDE mode, i.e. most importantly its blocked entries will already be
added so it will not get automatically added to them.
The transition EXCLUDE -> INCLUDE happens only when a port group timer
expires, it requires us to remove that port from all of *,G ports' S,G
entries where it was automatically added previously.
Finally when we are adding a new S,G entry we must add all of *,G's
EXCLUDE ports to it.
In order to distinguish automatically added *,G EXCLUDE ports we have a
new port group flag - MDB_PG_FLAGS_STAR_EXCL.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov
b08123684b net: bridge: mcast: install S,G entries automatically based on reports
This patch adds support for automatic install of S,G mdb entries based
on the port group's source list and the source entry's timer.
Once installed the S,G will be used when forwarding packets if the
approprate multicast/mld versions are set. A new source flag called
BR_SGRP_F_INSTALLED denotes if the source has a forwarding mdb entry
installed.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov
085b53c8be net: bridge: mcast: add sg_port rhashtable
To speedup S,G forward handling we need to be able to quickly find out
if a port is a member of an S,G group. To do that add a global S,G port
rhashtable with key: source addr, group addr, protocol, vid (all br_ip
fields) and port pointer.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov
8f8cb77e0b net: bridge: mcast: add rt_protocol field to the port group struct
We need to be able to differentiate between pg entries created by
user-space and the kernel when we start generating S,G entries for
IGMPv3/MLDv2's fast path. User-space entries are created by default as
RTPROT_STATIC and the kernel entries are RTPROT_KERNEL. Later we can
allow user-space to provide the entry rt_protocol so we can
differentiate between who added the entries specifically (e.g. clag,
admin, frr etc).

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov
88d4bd1804 net: bridge: mdb: add support for add/del/dump of entries with source
Add new mdb attributes (MDBE_ATTR_SOURCE for setting,
MDBA_MDB_EATTR_SOURCE for dumping) to allow add/del and dump of mdb
entries with a source address (S,G). New S,G entries are created with
filter mode of MCAST_INCLUDE. The same attributes are used for IPv4 and
IPv6, they're validated and parsed based on their protocol.
S,G host joined entries which are added by user are not allowed yet.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov
e12cec65b5 net: bridge: mcast: destroy all entries via gc
Since each entry type has timers that can be running simultaneously we need
to make sure that entries are not freed before their timers have finished.
In order to do that generalize the src gc work to mcast gc work and use a
callback to free the entries (mdb, port group or src).

v3: add IPv6 support
v2: force mcast gc on port del to make sure all port group timers have
    finished before freeing the bridge port

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 13:16:36 -07:00
Nikolay Aleksandrov
0436862e41 net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report
This patch adds handling for the ALLOW_NEW_SOURCES IGMPv3/MLDv2 report
types and limits them only when multicast_igmp_version == 3 or
multicast_mld_version == 2 respectively. Now that IGMPv3/MLDv2 handling
functions will be managing timers we need to delay their activation, thus
a new argument is added which controls if the timer should be updated.
We also disable host IGMPv3/MLDv2 handling as it's not yet implemented and
could cause inconsistent group state, the host can only join a group as
EXCLUDE {} or leave it.

v4: rename update_timer to igmpv2_mldv1 and use the passed value from
    br_multicast_add_group's callers
v3: Add IPv6/MLDv2 support

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov
81f1983852 net: bridge: mdb: use mdb and port entries in notifications
We have to use mdb and port entries when sending mdb notifications in
order to fill in all group attributes properly. Before this change we
would've used a fake br_mdb_entry struct to fill in only partial
information about the mdb. Now we can also reuse the mdb dump fill
function and thus have only a single central place which fills the mdb
attributes.

v3: add IPv6 support

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov
42c11ccfe8 net: bridge: mcast: add support for group query retransmit
We need to be able to retransmit group-specific and group-and-source
specific queries. The new timer takes care of those.

v3: add IPv6 support

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov
438ef2d027 net: bridge: mcast: add support for group-and-source specific queries
Allows br_multicast_alloc_query to build queries with the port group's
source lists and sends a query for sources over and under lmqt when
necessary as per RFCs 3376 and 3810 with the suppress flag set
appropriately.

v3: add IPv6 support

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 13:16:34 -07:00
Nikolay Aleksandrov
8b671779b7 net: bridge: mcast: add support for group source list
Initial functions for group source lists which are needed for IGMPv3
and MLDv2 include/exclude lists. Both IPv4 and IPv6 sources are supported.
User-added mdb entries are created with exclude filter mode, we can
extend that later to allow user-supplied mode. When group src entries
are deleted, they're freed from a workqueue to make sure their timers
are not still running. Source entries are protected by the multicast_lock
and rcu. The number of src groups per port group is limited to 32.

v4: use the new port group del function directly
    add igmpv2/mldv1 bool to denote if the entry was added in those
    modes, it will later replace the old update_timer bool
v3: add IPv6 support
v2: allow src groups to be traversed under rcu

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 13:16:34 -07:00
Nikolay Aleksandrov
681590bd4c net: bridge: mcast: factor out port group del
In order to avoid future errors and reduce code duplication we should
factor out the port group del sequence. This allows us to have one
function which takes care of all details when removing a port group.

v4: set pg's fast leave flag when deleting due to fast leave
    move the patch before adding source lists

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 13:16:34 -07:00
Nikolay Aleksandrov
6ec0d0ee66 net: bridge: mdb: arrange internal structs so fast-path fields are close
Before this patch we'd need 2 cache lines for fast-path, now all used
fields are in the first cache line.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 13:16:34 -07:00
Nikolay Aleksandrov
528ae84a34 net: bridge: fix undefined br_vlan_can_enter_range in tunnel code
If bridge vlan filtering is not defined we won't have
br_vlan_can_enter_range and thus will get a compile error as was
reported by Stephen and the build bot. So let's define a stub for when
vlan filtering is not used.

Fixes: 9433944368 ("net: bridge: notify on vlan tunnel changes done via the old api")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 11:22:55 -07:00
David S. Miller
71930d6102 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
All conflicts seemed rather trivial, with some guidance from
Saeed Mameed on the tc_ct.c one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-11 00:46:00 -07:00
Horatiu Vultur
df42ef227d bridge: mrp: Add br_mrp_fill_info
Add the function br_mrp_fill_info which populates the MRP attributes
regarding the status of each MRP instance.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-02 14:19:15 -07:00
Horatiu Vultur
9b14d1f8a7 bridge: mrp: Fix endian conversion and some other warnings
The following sparse warnings are fixed:
net/bridge/br_mrp.c:106:18: warning: incorrect type in assignment (different base types)
net/bridge/br_mrp.c:106:18:    expected unsigned short [usertype]
net/bridge/br_mrp.c:106:18:    got restricted __be16 [usertype]
net/bridge/br_mrp.c:281:23: warning: incorrect type in argument 1 (different modifiers)
net/bridge/br_mrp.c:281:23:    expected struct list_head *entry
net/bridge/br_mrp.c:281:23:    got struct list_head [noderef] *
net/bridge/br_mrp.c:332:28: warning: incorrect type in argument 1 (different modifiers)
net/bridge/br_mrp.c:332:28:    expected struct list_head *new
net/bridge/br_mrp.c:332:28:    got struct list_head [noderef] *
net/bridge/br_mrp.c:332:40: warning: incorrect type in argument 2 (different modifiers)
net/bridge/br_mrp.c:332:40:    expected struct list_head *head
net/bridge/br_mrp.c:332:40:    got struct list_head [noderef] *
net/bridge/br_mrp.c:682:29: warning: incorrect type in argument 1 (different modifiers)
net/bridge/br_mrp.c:682:29:    expected struct list_head const *head
net/bridge/br_mrp.c:682:29:    got struct list_head [noderef] *

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 2f1a11ae11 ("bridge: mrp: Add MRP interface.")
Fixes: 4b8d7d4c59 ("bridge: mrp: Extend bridge interface")
Fixes: 9a9f26e8f7 ("bridge: mrp: Connect MRP API with the switchdev API")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28 20:44:10 -07:00
David S. Miller
7bed145516 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor overlapping changes in xfrm_device.c, between the double
ESP trailing bug fix setting the XFRM_INIT flag and the changes
in net-next preparing for bonding encryption support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 19:29:51 -07:00
Thomas Martitz
206e732323 net: bridge: enfore alignment for ethernet address
The eth_addr member is passed to ether_addr functions that require
2-byte alignment, therefore the member must be properly aligned
to avoid unaligned accesses.

The problem is in place since the initial merge of multicast to unicast:
commit 6db6f0eae6 bridge: multicast to unicast

Fixes: 6db6f0eae6 ("bridge: multicast to unicast")
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Thomas Martitz <t.martitz@avm.de>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 12:38:16 -07:00
Nikolay Aleksandrov
31cbc39b63 net: bridge: add option to allow activity notifications for any fdb entries
This patch adds the ability to notify about activity of any entries
(static, permanent or ext_learn). EVPN multihoming peers need it to
properly and efficiently handle mac sync (peer active/locally active).
We add a new NFEA_ACTIVITY_NOTIFY attribute which is used to dump the
current activity state and to control if static entries should be monitored
at all. We use 2 bits - one to activate fdb entry tracking (disabled by
default) and the second to denote that an entry is inactive. We need
the second bit in order to avoid multiple notifications of inactivity.
Obviously this makes no difference for dynamic entries since at the time
of inactivity they get deleted, while the tracked non-dynamic entries get
the inactive bit set and get a notification.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24 14:36:33 -07:00
Vladimir Oltean
9eb8eff0cf net: bridge: allow enslaving some DSA master network devices
Commit 8db0a2ee2c ("net: bridge: reject DSA-enabled master netdevices
as bridge members") added a special check in br_if.c in order to check
for a DSA master network device with a tagging protocol configured. This
was done because back then, such devices, once enslaved in a bridge
would become inoperative and would not pass DSA tagged traffic anymore
due to br_handle_frame returning RX_HANDLER_CONSUMED.

But right now we have valid use cases which do require bridging of DSA
masters. One such example is when the DSA master ports are DSA switch
ports themselves (in a disjoint tree setup). This should be completely
equivalent, functionally speaking, from having multiple DSA switches
hanging off of the ports of a switchdev driver. So we should allow the
enslaving of DSA tagged master network devices.

Instead of the regular br_handle_frame(), install a new function
br_handle_frame_dummy() on these DSA masters, which returns
RX_HANDLER_PASS in order to call into the DSA specific tagging protocol
handlers, and lift the restriction from br_add_if.

Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10 19:52:33 -07:00
Eric Dumazet
f78ed2204d netpoll: accept NULL np argument in netpoll_send_skb()
netpoll_send_skb() callers seem to leak skb if
the np pointer is NULL. While this should not happen, we
can make the code more robust.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07 18:11:07 -07:00
Jason Yan
8741e18419 net: bridge: return false in br_mrp_enabled()
Fix the following coccicheck warning:

net/bridge/br_private.h:1334:8-9: WARNING: return of 0/1 in function
'br_mrp_enabled' with return type bool

Fixes: 6536993371 ("bridge: mrp: Integrate MRP into the bridge")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 13:57:30 -07:00
Horatiu Vultur
419dba8a49 net: bridge: Add checks for enabling the STP.
It is not possible to have the MRP and STP running at the same time on the
bridge, therefore add check when enabling the STP to check if MRP is already
enabled. In that case return error.

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:40:25 -07:00
Horatiu Vultur
6536993371 bridge: mrp: Integrate MRP into the bridge
To integrate MRP into the bridge, the bridge needs to do the following:
- detect if the MRP frame was received on MRP ring port in that case it would be
  processed otherwise just forward it as usual.
- enable parsing of MRP
- before whenever the bridge was set up, it would set all the ports in
  forwarding state. Add an extra check to not set ports in forwarding state if
  the port is an MRP ring port. The reason of this change is that if the MRP
  instance initially sets the port in blocked state by setting the bridge up it
  would overwrite this setting.

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:40:25 -07:00
Horatiu Vultur
4b8d7d4c59 bridge: mrp: Extend bridge interface
To integrate MRP into the bridge, first the bridge needs to be aware of ports
that are part of an MRP ring and which rings are on the bridge.
Therefore extend bridge interface with the following:
- add new flag(BR_MPP_AWARE) to the net bridge ports, this bit will be
  set when the port is added to an MRP instance. In this way it knows if
  the frame was received on MRP ring port
- add new flag(BR_MRP_LOST_CONT) to the net bridge ports, this bit will be set
  when the port lost the continuity of MRP Test frames.
- add a list of MRP instances

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:40:25 -07:00
Nikolay Aleksandrov
99f7c5e096 net: bridge: vlan options: rename br_vlan_opts_eq to br_vlan_opts_eq_range
It is more appropriate name as it shows the intent of why we need to
check the options' state. It also allows us to give meaning to the two
arguments of the function: the first is the current vlan (v_curr) being
checked if it could enter the range ending in the second one (range_end).

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 22:47:12 -07:00
Nikolay Aleksandrov
a580c76d53 net: bridge: vlan: add per-vlan state
The first per-vlan option added is state, it is needed for EVPN and for
per-vlan STP. The state allows to control the forwarding on per-vlan
basis. The vlan state is considered only if the port state is forwarding
in order to avoid conflicts and be consistent. br_allowed_egress is
called only when the state is forwarding, but the ingress case is a bit
more complicated due to the fact that we may have the transition between
port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still
allow the bridge to learn from the packet after vlan filtering and it will
be dropped after that. Also to optimize the pvid state check we keep a
copy in the vlan group to avoid one lookup. The state members are
modified with *_ONCE() to annotate the lockless access.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24 12:58:14 +01:00
Nikolay Aleksandrov
a5d29ae226 net: bridge: vlan: add basic option setting support
This patch adds support for option modification of single vlans and
ranges. It allows to only modify options, i.e. skip create/delete by
using the BRIDGE_VLAN_INFO_ONLY_OPTS flag. When working with a range
option changes we try to pack the notifications as much as possible.

v2: do full port (all vlans) notification only when creating/deleting
    vlans for compatibility, rework the range detection when changing
    options, add more verbose extack errors and check if a vlan should
    be used (br_vlan_should_use checks)

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24 12:58:14 +01:00
Nikolay Aleksandrov
7a53e718c5 net: bridge: vlan: add basic option dumping support
We'll be dumping the options for the whole range if they're equal. The
first range vlan will be used to extract the options. The commit doesn't
change anything yet it just adds the skeleton for the support. The dump
will happen when the first option is added.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24 12:58:14 +01:00
Nikolay Aleksandrov
f545923b4a net: bridge: vlan: notify on vlan add/delete/change flags
Now that we can notify, send a notification on add/del or change of flags.
Notifications are also compressed when possible to reduce their number
and relieve user-space of extra processing, due to that we have to
manually notify after each add/del in order to avoid double
notifications. We try hard to notify only about the vlans which actually
changed, thus a single command can result in multiple notifications
about disjoint ranges if there were vlans which didn't change inside.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 13:48:18 +01:00
Nikolay Aleksandrov
cf5bddb95c net: bridge: vlan: add rtnetlink group and notify support
Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN
and add support for sending vlan notifications (both single and ranges).
No functional changes intended, the notification support will be used by
later patches.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 13:48:18 +01:00
Nikolay Aleksandrov
f26b296585 net: bridge: vlan: add new rtm message support
Add initial RTM_NEWVLAN support which can only create vlans, operating
similar to the current br_afspec(). We will use it later to also change
per-vlan options. Old-style (flag-based) vlan ranges are not allowed
when using RTM messages, we will introduce vlan ranges later via a new
nested attribute which would allow us to have all the information about a
range encapsulated into a single nl attribute.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 13:48:17 +01:00
Nikolay Aleksandrov
8dcea18708 net: bridge: vlan: add rtm definitions and dump support
This patch adds vlan rtm definitions:
 - NEWVLAN: to be used for creating vlans, setting options and
   notifications
 - DELVLAN: to be used for deleting vlans
 - GETVLAN: used for dumping vlan information

Dumping vlans which can span multiple messages is added now with basic
information (vid and flags). We use nlmsg_parse() to validate the header
length in order to be able to extend the message with filtering
attributes later.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 13:48:17 +01:00
Nikolay Aleksandrov
8f4cc940a1 net: bridge: netlink: add extack error messages when processing vlans
Add extack messages on vlan processing errors. We need to move the flags
missing check after the "last" check since we may have "last" set but
lack a range end flag in the next entry.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 13:48:17 +01:00
Nikolay Aleksandrov
5a46facbbc net: bridge: vlan: add helpers to check for vlan id/range validity
Add helpers to check if a vlan id or range are valid. The range helper
must be called when range start or end are detected.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 13:48:17 +01:00
Vivien Didelot
de1799667b net: bridge: add STP xstats
This adds rx_bpdu, tx_bpdu, rx_tcn, tx_tcn, transition_blk,
transition_fwd xstats counters to the bridge ports copied over via
netlink, providing useful information for STP.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-12-14 20:02:36 -08:00
Nikolay Aleksandrov
5d1fcaf35d net: bridge: fdb: eliminate extra port state tests from fast-path
When commit df1c0b8468 ("[BRIDGE]: Packets leaking out of
disabled/blocked ports.") introduced the port state tests in
br_fdb_update() it was to avoid learning/refreshing from STP BPDUs, it was
also used to avoid learning/refreshing from user-space with NTF_USE. Those
two tests are done for every packet entering the bridge if it's learning,
but for the fast-path we already have them checked in br_handle_frame() and
is unnecessary to do it again. Thus push the checks to the unlikely cases
and drop them from br_fdb_update(), the new nbp_state_should_learn() helper
is used to determine if the port state allows br_fdb_update() to be called.
The two places which need to do it manually are:
 - user-space add call with NTF_USE set
 - link-local packet learning done in __br_handle_local_finish()

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04 11:15:27 -08:00
Nikolay Aleksandrov
be0c567797 net: bridge: fdb: br_fdb_update can take flags directly
If we modify br_fdb_update() to take flags directly we can get rid of
one test and one atomic bitop in the learning path.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 10:32:43 -07:00
Nikolay Aleksandrov
d38c6e3db0 net: bridge: fdb: convert offloaded to use bitops
Convert the offloaded field to a flag and use bitops.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 18:12:49 -07:00
Nikolay Aleksandrov
b5cd9f7c42 net: bridge: fdb: convert added_by_external_learn to use bitops
Convert the added_by_external_learn field to a flag and use bitops.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 18:12:49 -07:00
Nikolay Aleksandrov
ac3ca6af44 net: bridge: fdb: convert added_by_user to bitops
Straight-forward convert of the added_by_user field to bitops.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 18:12:49 -07:00
Nikolay Aleksandrov
e0458d9a73 net: bridge: fdb: convert is_sticky to bitops
Straight-forward convert of the is_sticky field to bitops.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 18:12:49 -07:00
Nikolay Aleksandrov
29e63fffd6 net: bridge: fdb: convert is_static to bitops
Convert the is_static to bitops, make use of the combined
test_and_set/clear_bit to simplify expressions in fdb_add_entry.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 18:12:49 -07:00
Nikolay Aleksandrov
6869c3b02b net: bridge: fdb: convert is_local to bitops
The patch adds a new fdb flags field in the hole between the two cache
lines and uses it to convert is_local to bitops.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 18:12:49 -07:00
Nikolay Aleksandrov
1bc844ee0f net: bridge: mdb: allow add/delete for host-joined groups
Currently this is needed only for user-space compatibility, so similar
object adds/deletes as the dumped ones would succeed. Later it can be
used for L2 mcast MAC add/delete.

v3: fix compiler warning (DaveM)
v2: don't send a notification when used from user-space, arm the group
    timer if no ports are left after host entry del

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17 12:36:57 -07:00
David S. Miller
13dfb3fa49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Just minor overlapping changes in the conflicts here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 18:44:57 -07:00
Nikolay Aleksandrov
091adf9ba6 net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
Most of the bridge device's vlan init bugs come from the fact that its
default pvid is created at the wrong time, way too early in ndo_init()
before the device is even assigned an ifindex. It introduces a bug when the
bridge's dev_addr is added as fdb during the initial default pvid creation
the notification has ifindex/NDA_MASTER both equal to 0 (see example below)
which really makes no sense for user-space[0] and is wrong.
Usually user-space software would ignore such entries, but they are
actually valid and will eventually have all necessary attributes.
It makes much more sense to send a notification *after* the device has
registered and has a proper ifindex allocated rather than before when
there's a chance that the registration might still fail or to receive
it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush
from br_vlan_flush() since that case can no longer happen. At
NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by
br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything
depending why it was called (if called due to NETDEV_REGISTER error
it'll still be == 1, otherwise it could be any value changed during the
device life time).

For the demonstration below a small change to iproute2 for printing all fdb
notifications is added, because it contained a workaround not to show
entries with ifindex == 0.
Command executed while monitoring: $ ip l add br0 type bridge
Before (both ifindex and master == 0):
$ bridge monitor fdb
36:7e:8a:b3:56:ba dev * vlan 1 master * permanent

After (proper br0 ifindex):
$ bridge monitor fdb
e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent

v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
v3: send the correct v2 patch with all changes (stub should return 0)
v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in
    the br_vlan_bridge_event stub when bridge vlans are disabled

[0] https://bugzilla.kernel.org/show_bug.cgi?id=204389

Reported-by: michael-dev <michael-dev@fami-braun.de>
Fixes: 5be5a2df40 ("bridge: Add filtering support for default_pvid")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05 13:32:53 -07:00
Nikolay Aleksandrov
3247b27204 net: bridge: mcast: add delete due to fast-leave mdb flag
In user-space there's no way to distinguish why an mdb entry was deleted
and that is a problem for daemons which would like to keep the mdb in
sync with remote ends (e.g. mlag) but would also like to converge faster.
In almost all cases we'd like to age-out the remote entry for performance
and convergence reasons except when fast-leave is enabled. In that case we
want explicit immediate remote delete, thus add mdb flag which is set only
when the entry is being deleted due to fast-leave.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 19:13:40 -04:00
David S. Miller
a6cdeeb16b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Some ISDN files that got removed in net-next had some changes
done in mainline, take the removals.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-07 11:00:14 -07:00
Pablo Neira Ayuso
3c171f496e netfilter: bridge: add connection tracking system
This patch adds basic connection tracking support for the bridge,
including initial IPv4 support.

This patch register two hooks to deal with the bridge forwarding path,
one from the bridge prerouting hook to call nf_conntrack_in(); and
another from the bridge postrouting hook to confirm the entry.

The conntrack bridge prerouting hook defragments packets before passing
them to nf_conntrack_in() to look up for an existing entry, otherwise a
new entry is allocated and it is attached to the skbuff. The conntrack
bridge postrouting hook confirms new conntrack entries, ie. if this is
the first packet seen, then it adds the entry to the hashtable and (if
needed) it refragments the skbuff into the original fragments, leaving
the geometry as is if possible. Exceptions are linearized skbuffs, eg.
skbuffs that are passed up to nfqueue and conntrack helpers, as well as
cloned skbuff for the local delivery (eg. tcpdump), also in case of
bridge port flooding (cloned skbuff too).

The packet defragmentation is done through the ip_defrag() call.  This
forces us to save the bridge control buffer, reset the IP control buffer
area and then restore it after call. This function also bumps the IP
fragmentation statistics, it would be probably desiderable to have
independent statistics for the bridge defragmentation/refragmentation.
The maximum fragment length is stored in the control buffer and it is
used to refragment the skbuff from the postrouting path.

The new fraglist splitter and fragment transformer APIs are used to
implement the bridge refragmentation code. The br_ip_fragment() function
drops the packet in case the maximum fragment size seen is larger than
the output port MTU.

This patchset follows the principle that conntrack should not drop
packets, so users can do it through policy via invalid state matching.

Like br_netfilter, there is no refragmentation for packets that are
passed up for local delivery, ie. prerouting -> input path. There are
calls to nf_reset() already in several spots in the stack since time ago
already, eg. af_packet, that show that skbuff fraglist handling from the
netif_rx path is supported already.

The helpers are called from the postrouting hook, before confirmation,
from there we may see packet floods to bridge ports. Then, although
unlikely, this may result in exercising the helpers many times for each
clone. It would be good to explore how to pass all the packets in a list
to the conntrack hook to do this handle only once for this case.

Thanks to Florian Westphal for handing me over an initial patchset
version to add support for conntrack bridge.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 14:18:18 -07:00
Thomas Gleixner
2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
Mike Manning
9c0ec2e718 bridge: support binding vlan dev link state to vlan member bridge ports
In the case of vlan filtering on bridges, the bridge may also have the
corresponding vlan devices as upper devices. A vlan bridge binding mode
is added to allow the link state of the vlan device to track only the
state of the subset of bridge ports that are also members of the vlan,
rather than that of all bridge ports. This mode is set with a vlan flag
rather than a bridge sysfs so that the 8021q module is aware that it
should not set the link state for the vlan device.

If bridge vlan is configured, the bridge device event handling results
in the link state for an upper device being set, if it is a vlan device
with the vlan bridge binding mode enabled. This also sets a
vlan_bridge_binding flag so that subsequent UP/DOWN/CHANGE events for
the ports in that bridge result in a link state update of the vlan
device if required.

The link state of the vlan device is up if there is at least one bridge
port that is a vlan member that is admin & oper up, otherwise its oper
state is IF_OPER_LOWERLAYERDOWN.

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-19 13:58:17 -07:00
Florian Westphal
223fd0adfa bridge: broute: make broute a real ebtables table
This makes broute a normal ebtables table, hooking at PREROUTING.
The broute hook is removed.

It uses skb->cb to signal to bridge rx handler that the skb should be
routed instead of being bridged.

This change is backwards compatible with ebtables as no userspace visible
parts are changed.

This means we can also remove the !ops test in ebt_register_table,
it was only there for broute table sake.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-12 01:47:50 +02:00