Merge tag 'mlx5-updates-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-06-06

1) Support 4 ports VF LAG, part 2/2
2) A few extra trivial cleanup patches

Shay Drory says:
================

Support 4 ports VF LAG, part 2/2

This series continues series [1], "Support 4 ports VF LAG, part 1/2",
and adds support for 4 ports VF LAG (single FDB E-Switch).

The patches refactor LAG code that assumes VF LAG supports only two
ports, and then enable 4 ports VF LAG.

Patch 1:
- Fix the IB rep code.
Patches 2-5:
- Refactor the LAG layer.
Patches 6-7:
- Block LAG types that don't support 4 ports.
Patch 8:
- Enable 4 ports VF LAG.

This series specifically allows HCAs with 4 ports to create a VF LAG
only across all 4 ports; creating a VF LAG with 2 or 3 ports on HCAs
that have 4 ports is not supported.

Currently, the Merged E-Switch feature only supports HCAs with 2 ports.
However, upcoming patches will introduce support for HCAs with 4 ports.

To activate VF LAG, a user can execute:

devlink dev eswitch set pci/0000:08:00.0 mode switchdev
devlink dev eswitch set pci/0000:08:00.1 mode switchdev
devlink dev eswitch set pci/0000:08:00.2 mode switchdev
devlink dev eswitch set pci/0000:08:00.3 mode switchdev
ip link add name bond0 type bond
ip link set dev bond0 type bond mode 802.3ad
ip link set dev eth2 master bond0
ip link set dev eth3 master bond0
ip link set dev eth4 master bond0
ip link set dev eth5 master bond0

where eth2, eth3, eth4 and eth5 are the net interfaces of pci/0000:08:00.0,
pci/0000:08:00.1, pci/0000:08:00.2 and pci/0000:08:00.3, respectively.
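
The mapping between a netdev and its PCI function can be confirmed with
standard tools, for example (an illustrative sketch; interface names vary
per system):

ethtool -i eth2 | grep bus-info    # expected to print "bus-info: 0000:08:00.0"
ls -l /sys/class/net/eth2/device   # symlink resolves to the backing PCI function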

A user can verify the LAG state and type via debugfs:
/sys/kernel/debug/mlx5/0000\:08\:00.0/lag/state
/sys/kernel/debug/mlx5/0000\:08\:00.0/lag/type
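
For example (a sketch only; the exact contents of these files depend on
the driver and bonding state):

cat /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/state   # whether the hardware LAG is active
cat /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/type    # which LAG type the driver selected
cat /proc/net/bonding/bond0                           # standard bonding view of the software side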

[1]
https://lore.kernel.org/netdev/20230601060118.154015-1-saeed@kernel.org/T/#mf1d2083780970ba277bfe721554d4925f03f36d1

================

* tag 'mlx5-updates-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: simplify condition after napi budget handling change
  mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager
  net/mlx5: Skip inline mode check after mlx5_eswitch_enable_locked() failure
  net/mlx5e: TC, refactor access to hash key
  net/mlx5e: Remove RX page cache leftovers
  net/mlx5e: Expose catastrophic steering error counters
  net/mlx5: Enable 4 ports VF LAG
  net/mlx5: LAG, block multiport eswitch LAG in case ldev have more than 2 ports
  net/mlx5: LAG, block multipath LAG in case ldev have more than 2 ports
  net/mlx5: LAG, change mlx5_shared_fdb_supported() to static
  net/mlx5: LAG, generalize handling of shared FDB
  net/mlx5: LAG, check if all eswitches are paired for shared FDB
  {net/RDMA}/mlx5: introduce lag_for_each_peer
  RDMA/mlx5: Free second uplink ib port
====================

Link: https://lore.kernel.org/r/20230607210410.88209-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Committed by Jakub Kicinski on 2023-06-08 19:28:20 -07:00
commit f84ad5cffd
18 changed files with 202 additions and 104 deletions


@ -290,6 +290,13 @@ Description of the vnic counters:
- nic_receive_steering_discard
number of packets that completed RX flow
steering but were discarded due to a mismatch in flow table.
- generated_pkt_steering_fail
number of packets generated by the VNIC experiencing unexpected steering
failure (at any point in steering flow).
- handled_pkt_steering_fail
number of packets handled by the VNIC experiencing unexpected steering
failure (at any point in steering flow owned by the VNIC, including the FDB
for the eswitch owner).
User commands examples:
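
A minimal usage sketch, assuming the reporter is registered under the
name "vnic" and reusing the PCI address from the cover letter above; the
two counters added here appear in the diagnose output:

devlink health diagnose pci/0000:08:00.0 reporter vnic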


@ -30,45 +30,65 @@ mlx5_ib_set_vport_rep(struct mlx5_core_dev *dev,
static void mlx5_ib_register_peer_vport_reps(struct mlx5_core_dev *mdev);
static void mlx5_ib_num_ports_update(struct mlx5_core_dev *dev, u32 *num_ports)
{
struct mlx5_core_dev *peer_dev;
int i;
mlx5_lag_for_each_peer_mdev(dev, peer_dev, i) {
u32 peer_num_ports = mlx5_eswitch_get_total_vports(peer_dev);
if (mlx5_lag_is_mpesw(peer_dev))
*num_ports += peer_num_ports;
else
/* Only 1 ib port is the representor for all uplinks */
*num_ports += peer_num_ports - 1;
}
}
static int
mlx5_ib_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep)
{
u32 num_ports = mlx5_eswitch_get_total_vports(dev);
struct mlx5_core_dev *lag_master = dev;
const struct mlx5_ib_profile *profile;
struct mlx5_core_dev *peer_dev;
struct mlx5_ib_dev *ibdev;
int second_uplink = false;
u32 peer_num_ports;
int new_uplink = false;
int vport_index;
int ret;
int i;
vport_index = rep->vport_index;
if (mlx5_lag_is_shared_fdb(dev)) {
peer_dev = mlx5_lag_get_peer_mdev(dev);
peer_num_ports = mlx5_eswitch_get_total_vports(peer_dev);
if (mlx5_lag_is_master(dev)) {
if (mlx5_lag_is_mpesw(dev))
num_ports += peer_num_ports;
else
num_ports += peer_num_ports - 1;
mlx5_ib_num_ports_update(dev, &num_ports);
} else {
if (rep->vport == MLX5_VPORT_UPLINK) {
if (!mlx5_lag_is_mpesw(dev))
return 0;
second_uplink = true;
new_uplink = true;
}
mlx5_lag_for_each_peer_mdev(dev, peer_dev, i) {
u32 peer_n_ports = mlx5_eswitch_get_total_vports(peer_dev);
vport_index += peer_num_ports;
dev = peer_dev;
if (mlx5_lag_is_master(peer_dev))
lag_master = peer_dev;
else if (!mlx5_lag_is_mpesw(dev))
/* Only 1 ib port is the representor for all uplinks */
peer_n_ports--;
if (mlx5_get_dev_index(peer_dev) < mlx5_get_dev_index(dev))
vport_index += peer_n_ports;
}
}
}
if (rep->vport == MLX5_VPORT_UPLINK && !second_uplink)
if (rep->vport == MLX5_VPORT_UPLINK && !new_uplink)
profile = &raw_eth_profile;
else
return mlx5_ib_set_vport_rep(dev, rep, vport_index);
return mlx5_ib_set_vport_rep(lag_master, rep, vport_index);
ibdev = ib_alloc_device(mlx5_ib_dev, ib_dev);
if (!ibdev)
@ -85,8 +105,8 @@ mlx5_ib_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep)
vport_index = rep->vport_index;
ibdev->port[vport_index].rep = rep;
ibdev->port[vport_index].roce.netdev =
mlx5_ib_get_rep_netdev(dev->priv.eswitch, rep->vport);
ibdev->mdev = dev;
mlx5_ib_get_rep_netdev(lag_master->priv.eswitch, rep->vport);
ibdev->mdev = lag_master;
ibdev->num_ports = num_ports;
ret = __mlx5_ib_add(ibdev, profile);
@ -94,8 +114,8 @@ mlx5_ib_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep)
goto fail_add;
rep->rep_data[REP_IB].priv = ibdev;
if (mlx5_lag_is_shared_fdb(dev))
mlx5_ib_register_peer_vport_reps(dev);
if (mlx5_lag_is_shared_fdb(lag_master))
mlx5_ib_register_peer_vport_reps(lag_master);
return 0;
@ -118,23 +138,27 @@ mlx5_ib_vport_rep_unload(struct mlx5_eswitch_rep *rep)
struct mlx5_ib_dev *dev = mlx5_ib_rep_to_dev(rep);
int vport_index = rep->vport_index;
struct mlx5_ib_port *port;
int i;
if (WARN_ON(!mdev))
return;
if (mlx5_lag_is_shared_fdb(mdev) &&
!mlx5_lag_is_master(mdev)) {
struct mlx5_core_dev *peer_mdev;
if (rep->vport == MLX5_VPORT_UPLINK)
return;
peer_mdev = mlx5_lag_get_peer_mdev(mdev);
vport_index += mlx5_eswitch_get_total_vports(peer_mdev);
}
if (!dev)
return;
if (mlx5_lag_is_shared_fdb(mdev) &&
!mlx5_lag_is_master(mdev)) {
if (rep->vport == MLX5_VPORT_UPLINK && !mlx5_lag_is_mpesw(mdev))
return;
for (i = 0; i < dev->num_ports; i++) {
if (dev->port[i].rep == rep)
break;
}
if (WARN_ON(i == dev->num_ports))
return;
vport_index = i;
}
port = &dev->port[vport_index];
write_lock(&port->roce.netdev_lock);
port->roce.netdev = NULL;
@ -143,14 +167,19 @@ mlx5_ib_vport_rep_unload(struct mlx5_eswitch_rep *rep)
port->rep = NULL;
if (rep->vport == MLX5_VPORT_UPLINK) {
if (mlx5_lag_is_shared_fdb(mdev) && !mlx5_lag_is_master(mdev))
return;
if (mlx5_lag_is_shared_fdb(mdev)) {
struct mlx5_core_dev *peer_mdev;
struct mlx5_eswitch *esw;
if (mlx5_lag_is_shared_fdb(mdev)) {
peer_mdev = mlx5_lag_get_peer_mdev(mdev);
mlx5_lag_for_each_peer_mdev(mdev, peer_mdev, i) {
esw = peer_mdev->priv.eswitch;
mlx5_eswitch_unregister_vport_reps(esw, REP_IB);
}
}
__mlx5_ib_remove(dev, dev->profile, MLX5_IB_STAGE_MAX);
}
}
@ -163,14 +192,14 @@ static const struct mlx5_eswitch_rep_ops rep_ops = {
static void mlx5_ib_register_peer_vport_reps(struct mlx5_core_dev *mdev)
{
struct mlx5_core_dev *peer_mdev = mlx5_lag_get_peer_mdev(mdev);
struct mlx5_core_dev *peer_mdev;
struct mlx5_eswitch *esw;
int i;
if (!peer_mdev)
return;
mlx5_lag_for_each_peer_mdev(mdev, peer_mdev, i) {
esw = peer_mdev->priv.eswitch;
mlx5_eswitch_register_vport_reps(esw, &rep_ops, REP_IB);
}
}
struct net_device *mlx5_ib_get_rep_netdev(struct mlx5_eswitch *esw,


@ -76,6 +76,16 @@ int mlx5_reporter_vnic_diagnose_counters(struct mlx5_core_dev *dev,
if (err)
return err;
err = devlink_fmsg_u64_pair_put(fmsg, "generated_pkt_steering_fail",
VNIC_ENV_GET64(&vnic, generated_pkt_steering_fail));
if (err)
return err;
err = devlink_fmsg_u64_pair_put(fmsg, "handled_pkt_steering_fail",
VNIC_ENV_GET64(&vnic, handled_pkt_steering_fail));
if (err)
return err;
err = devlink_fmsg_obj_nest_end(fmsg);
if (err)
return err;


@ -594,13 +594,6 @@ struct mlx5e_mpw_info {
#define MLX5E_MAX_RX_FRAGS 4
/* a single cache unit is capable to serve one napi call (for non-striding rq)
* or a MPWQE (for striding rq).
*/
#define MLX5E_CACHE_UNIT (MLX5_MPWRQ_MAX_PAGES_PER_WQE > NAPI_POLL_WEIGHT ? \
MLX5_MPWRQ_MAX_PAGES_PER_WQE : NAPI_POLL_WEIGHT)
#define MLX5E_CACHE_SIZE (4 * roundup_pow_of_two(MLX5E_CACHE_UNIT))
struct mlx5e_rq;
typedef void (*mlx5e_fp_handle_rx_cqe)(struct mlx5e_rq*, struct mlx5_cqe64*);
typedef struct sk_buff *


@ -25,8 +25,8 @@ struct mlx5e_tc_act_stats {
static const struct rhashtable_params act_counters_ht_params = {
.head_offset = offsetof(struct mlx5e_tc_act_stats, hash),
.key_offset = 0,
.key_len = offsetof(struct mlx5e_tc_act_stats, counter),
.key_offset = offsetof(struct mlx5e_tc_act_stats, tc_act_cookie),
.key_len = sizeof_field(struct mlx5e_tc_act_stats, tc_act_cookie),
.automatic_shrinking = true,
};
@ -169,14 +169,11 @@ mlx5e_tc_act_stats_fill_stats(struct mlx5e_tc_act_stats_handle *handle,
{
struct rhashtable *ht = &handle->ht;
struct mlx5e_tc_act_stats *item;
struct mlx5e_tc_act_stats key;
u64 pkts, bytes, lastused;
int err = 0;
key.tc_act_cookie = fl_act->cookie;
rcu_read_lock();
item = rhashtable_lookup(ht, &key, act_counters_ht_params);
item = rhashtable_lookup(ht, &fl_act->cookie, act_counters_ht_params);
if (!item) {
rcu_read_unlock();
err = -ENOENT;


@ -207,7 +207,7 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
}
ch_stats->aff_change++;
aff_change = true;
if (budget && work_done == budget)
if (work_done == budget)
work_done--;
}


@ -1601,7 +1601,8 @@ static int mlx5_esw_vports_init(struct mlx5_eswitch *esw)
idx++;
}
if (mlx5_ecpf_vport_exists(dev)) {
if (mlx5_ecpf_vport_exists(dev) ||
mlx5_core_is_ecpf_esw_manager(dev)) {
err = mlx5_esw_vport_alloc(esw, idx, MLX5_VPORT_ECPF);
if (err)
goto err;


@ -779,6 +779,13 @@ static inline int mlx5_eswitch_num_vfs(struct mlx5_eswitch *esw)
return 0;
}
static inline int mlx5_eswitch_get_npeers(struct mlx5_eswitch *esw)
{
if (mlx5_esw_allowed(esw))
return esw->num_peers;
return 0;
}
static inline struct mlx5_flow_table *
mlx5_eswitch_get_slow_fdb(struct mlx5_eswitch *esw)
{
@ -826,6 +833,8 @@ static inline void
mlx5_eswitch_offloads_single_fdb_del_one(struct mlx5_eswitch *master_esw,
struct mlx5_eswitch *slave_esw) {}
static inline int mlx5_eswitch_get_npeers(struct mlx5_eswitch *esw) { return 0; }
static inline int
mlx5_eswitch_reload_reps(struct mlx5_eswitch *esw)
{


@ -2178,6 +2178,7 @@ static int esw_offloads_start(struct mlx5_eswitch *esw,
"Failed setting eswitch to offloads");
esw->mode = MLX5_ESWITCH_LEGACY;
mlx5_rescan_drivers(esw->dev);
return err;
}
if (esw->offloads.inline_mode == MLX5_INLINE_MODE_NONE) {
if (mlx5_eswitch_inline_mode_get(esw,
@ -2187,7 +2188,7 @@ static int esw_offloads_start(struct mlx5_eswitch *esw,
"Inline mode is different between vports");
}
}
return err;
return 0;
}
static void mlx5_esw_offloads_rep_mark_set(struct mlx5_eswitch *esw,


@ -244,16 +244,22 @@ static int mlx5_cmd_update_root_ft(struct mlx5_flow_root_namespace *ns,
ft->type == FS_FT_FDB &&
mlx5_lag_is_shared_fdb(dev) &&
mlx5_lag_is_master(dev)) {
err = mlx5_cmd_set_slave_root_fdb(dev,
mlx5_lag_get_peer_mdev(dev),
!disconnect, (!disconnect) ?
ft->id : 0);
struct mlx5_core_dev *peer_dev;
int i;
mlx5_lag_for_each_peer_mdev(dev, peer_dev, i) {
err = mlx5_cmd_set_slave_root_fdb(dev, peer_dev, !disconnect,
(!disconnect) ? ft->id : 0);
if (err && !disconnect) {
MLX5_SET(set_flow_table_root_in, in, op_mod, 0);
MLX5_SET(set_flow_table_root_in, in, table_id,
ns->root_ft->id);
mlx5_cmd_exec_in(dev, set_flow_table_root, in);
}
if (err)
break;
}
}
return err;


@ -512,8 +512,11 @@ static void mlx5_lag_set_port_sel_mode_offloads(struct mlx5_lag *ldev,
return;
if (MLX5_CAP_PORT_SELECTION(dev0->dev, port_select_flow_table) &&
tracker->tx_type == NETDEV_LAG_TX_TYPE_HASH)
tracker->tx_type == NETDEV_LAG_TX_TYPE_HASH) {
if (ldev->ports > 2)
ldev->buckets = MLX5_LAG_MAX_HASH_BUCKETS;
set_bit(MLX5_LAG_MODE_FLAG_HASH_BASED, flags);
}
}
static int mlx5_lag_set_flags(struct mlx5_lag *ldev, enum mlx5_lag_mode mode,
@ -708,7 +711,7 @@ int mlx5_deactivate_lag(struct mlx5_lag *ldev)
return 0;
}
#define MLX5_LAG_OFFLOADS_SUPPORTED_PORTS 2
#define MLX5_LAG_OFFLOADS_SUPPORTED_PORTS 4
bool mlx5_lag_check_prereq(struct mlx5_lag *ldev)
{
#ifdef CONFIG_MLX5_ESWITCH
@ -734,7 +737,7 @@ bool mlx5_lag_check_prereq(struct mlx5_lag *ldev)
if (mlx5_eswitch_mode(ldev->pf[i].dev) != mode)
return false;
if (mode == MLX5_ESWITCH_OFFLOADS && ldev->ports != MLX5_LAG_OFFLOADS_SUPPORTED_PORTS)
if (mode == MLX5_ESWITCH_OFFLOADS && ldev->ports > MLX5_LAG_OFFLOADS_SUPPORTED_PORTS)
return false;
#else
for (i = 0; i < ldev->ports; i++)
@ -782,7 +785,6 @@ void mlx5_disable_lag(struct mlx5_lag *ldev)
{
bool shared_fdb = test_bit(MLX5_LAG_MODE_FLAG_SHARED_FDB, &ldev->mode_flags);
struct mlx5_core_dev *dev0 = ldev->pf[MLX5_LAG_P1].dev;
struct mlx5_core_dev *dev1 = ldev->pf[MLX5_LAG_P2].dev;
bool roce_lag;
int err;
int i;
@ -807,28 +809,35 @@ void mlx5_disable_lag(struct mlx5_lag *ldev)
if (shared_fdb || roce_lag)
mlx5_lag_add_devices(ldev);
if (shared_fdb) {
if (!(dev0->priv.flags & MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV))
mlx5_eswitch_reload_reps(dev0->priv.eswitch);
if (!(dev1->priv.flags & MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV))
mlx5_eswitch_reload_reps(dev1->priv.eswitch);
}
if (shared_fdb)
for (i = 0; i < ldev->ports; i++)
if (!(ldev->pf[i].dev->priv.flags & MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV))
mlx5_eswitch_reload_reps(ldev->pf[i].dev->priv.eswitch);
}
bool mlx5_shared_fdb_supported(struct mlx5_lag *ldev)
static bool mlx5_shared_fdb_supported(struct mlx5_lag *ldev)
{
struct mlx5_core_dev *dev0 = ldev->pf[MLX5_LAG_P1].dev;
struct mlx5_core_dev *dev1 = ldev->pf[MLX5_LAG_P2].dev;
struct mlx5_core_dev *dev;
int i;
if (is_mdev_switchdev_mode(dev0) &&
is_mdev_switchdev_mode(dev1) &&
mlx5_eswitch_vport_match_metadata_enabled(dev0->priv.eswitch) &&
mlx5_eswitch_vport_match_metadata_enabled(dev1->priv.eswitch) &&
mlx5_devcom_comp_is_ready(dev0->priv.devcom,
MLX5_DEVCOM_ESW_OFFLOADS) &&
MLX5_CAP_GEN(dev1, lag_native_fdb_selection) &&
MLX5_CAP_ESW(dev1, root_ft_on_other_esw) &&
MLX5_CAP_ESW(dev0, esw_shared_ingress_acl))
for (i = MLX5_LAG_P1 + 1; i < ldev->ports; i++) {
dev = ldev->pf[i].dev;
if (is_mdev_switchdev_mode(dev) &&
mlx5_eswitch_vport_match_metadata_enabled(dev->priv.eswitch) &&
MLX5_CAP_GEN(dev, lag_native_fdb_selection) &&
MLX5_CAP_ESW(dev, root_ft_on_other_esw) &&
mlx5_eswitch_get_npeers(dev->priv.eswitch) ==
MLX5_CAP_GEN(dev, num_lag_ports) - 1)
continue;
return false;
}
dev = ldev->pf[MLX5_LAG_P1].dev;
if (is_mdev_switchdev_mode(dev) &&
mlx5_eswitch_vport_match_metadata_enabled(dev->priv.eswitch) &&
mlx5_devcom_comp_is_ready(dev->priv.devcom, MLX5_DEVCOM_ESW_OFFLOADS) &&
MLX5_CAP_ESW(dev, esw_shared_ingress_acl) &&
mlx5_eswitch_get_npeers(dev->priv.eswitch) == MLX5_CAP_GEN(dev, num_lag_ports) - 1)
return true;
return false;
@ -865,7 +874,6 @@ static bool mlx5_lag_should_disable_lag(struct mlx5_lag *ldev, bool do_bond)
static void mlx5_do_bond(struct mlx5_lag *ldev)
{
struct mlx5_core_dev *dev0 = ldev->pf[MLX5_LAG_P1].dev;
struct mlx5_core_dev *dev1 = ldev->pf[MLX5_LAG_P2].dev;
struct lag_tracker tracker = { };
bool do_bond, roce_lag;
int err;
@ -906,20 +914,24 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
for (i = 1; i < ldev->ports; i++)
mlx5_nic_vport_enable_roce(ldev->pf[i].dev);
} else if (shared_fdb) {
int i;
dev0->priv.flags &= ~MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
mlx5_rescan_drivers_locked(dev0);
err = mlx5_eswitch_reload_reps(dev0->priv.eswitch);
if (!err)
err = mlx5_eswitch_reload_reps(dev1->priv.eswitch);
for (i = 0; i < ldev->ports; i++) {
err = mlx5_eswitch_reload_reps(ldev->pf[i].dev->priv.eswitch);
if (err)
break;
}
if (err) {
dev0->priv.flags |= MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
mlx5_rescan_drivers_locked(dev0);
mlx5_deactivate_lag(ldev);
mlx5_lag_add_devices(ldev);
mlx5_eswitch_reload_reps(dev0->priv.eswitch);
mlx5_eswitch_reload_reps(dev1->priv.eswitch);
for (i = 0; i < ldev->ports; i++)
mlx5_eswitch_reload_reps(ldev->pf[i].dev->priv.eswitch);
mlx5_core_err(dev0, "Failed to enable lag\n");
return;
}
@ -1519,26 +1531,37 @@ u8 mlx5_lag_get_num_ports(struct mlx5_core_dev *dev)
}
EXPORT_SYMBOL(mlx5_lag_get_num_ports);
struct mlx5_core_dev *mlx5_lag_get_peer_mdev(struct mlx5_core_dev *dev)
struct mlx5_core_dev *mlx5_lag_get_next_peer_mdev(struct mlx5_core_dev *dev, int *i)
{
struct mlx5_core_dev *peer_dev = NULL;
struct mlx5_lag *ldev;
unsigned long flags;
int idx;
spin_lock_irqsave(&lag_lock, flags);
ldev = mlx5_lag_dev(dev);
if (!ldev)
goto unlock;
peer_dev = ldev->pf[MLX5_LAG_P1].dev == dev ?
ldev->pf[MLX5_LAG_P2].dev :
ldev->pf[MLX5_LAG_P1].dev;
if (*i == ldev->ports)
goto unlock;
for (idx = *i; idx < ldev->ports; idx++)
if (ldev->pf[idx].dev != dev)
break;
if (idx == ldev->ports) {
*i = idx;
goto unlock;
}
*i = idx + 1;
peer_dev = ldev->pf[idx].dev;
unlock:
spin_unlock_irqrestore(&lag_lock, flags);
return peer_dev;
}
EXPORT_SYMBOL(mlx5_lag_get_peer_mdev);
EXPORT_SYMBOL(mlx5_lag_get_next_peer_mdev);
int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
u64 *values,


@ -111,7 +111,6 @@ int mlx5_activate_lag(struct mlx5_lag *ldev,
bool shared_fdb);
int mlx5_lag_dev_get_netdev_idx(struct mlx5_lag *ldev,
struct net_device *ndev);
bool mlx5_shared_fdb_supported(struct mlx5_lag *ldev);
char *mlx5_get_str_port_sel_mode(enum mlx5_lag_mode mode, unsigned long flags);
void mlx5_infer_tx_enabled(struct lag_tracker *tracker, u8 num_ports,


@ -14,6 +14,7 @@ static bool __mlx5_lag_is_multipath(struct mlx5_lag *ldev)
return ldev->mode == MLX5_LAG_MODE_MULTIPATH;
}
#define MLX5_LAG_MULTIPATH_OFFLOADS_SUPPORTED_PORTS 2
static bool mlx5_lag_multipath_check_prereq(struct mlx5_lag *ldev)
{
if (!mlx5_lag_is_ready(ldev))
@ -22,6 +23,9 @@ static bool mlx5_lag_multipath_check_prereq(struct mlx5_lag *ldev)
if (__mlx5_lag_is_active(ldev) && !__mlx5_lag_is_multipath(ldev))
return false;
if (ldev->ports > MLX5_LAG_MULTIPATH_OFFLOADS_SUPPORTED_PORTS)
return false;
return mlx5_esw_multipath_prereq(ldev->pf[MLX5_LAG_P1].dev,
ldev->pf[MLX5_LAG_P2].dev);
}


@ -65,6 +65,7 @@ err_metadata:
return err;
}
#define MLX5_LAG_MPESW_OFFLOADS_SUPPORTED_PORTS 2
static int enable_mpesw(struct mlx5_lag *ldev)
{
struct mlx5_core_dev *dev0 = ldev->pf[MLX5_LAG_P1].dev;
@ -74,6 +75,9 @@ static int enable_mpesw(struct mlx5_lag *ldev)
if (ldev->mode != MLX5_LAG_MODE_NONE)
return -EINVAL;
if (ldev->ports > MLX5_LAG_MPESW_OFFLOADS_SUPPORTED_PORTS)
return -EOPNOTSUPP;
if (mlx5_eswitch_mode(dev0) != MLX5_ESWITCH_OFFLOADS ||
!MLX5_CAP_PORT_SELECTION(dev0, port_select_flow_table) ||
!MLX5_CAP_GEN(dev0, create_lag_when_not_master_up) ||


@ -75,13 +75,14 @@ struct mlx5_devcom *mlx5_devcom_register_device(struct mlx5_core_dev *dev)
if (!mlx5_core_is_pf(dev))
return NULL;
if (MLX5_CAP_GEN(dev, num_lag_ports) != MLX5_DEVCOM_PORTS_SUPPORTED)
if (MLX5_CAP_GEN(dev, num_lag_ports) > MLX5_DEVCOM_PORTS_SUPPORTED)
return NULL;
mlx5_dev_list_lock();
sguid0 = mlx5_query_nic_system_image_guid(dev);
list_for_each_entry(iter, &devcom_list, list) {
struct mlx5_core_dev *tmp_dev = NULL;
/* There is at least one device in iter */
struct mlx5_core_dev *tmp_dev;
idx = -1;
for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++) {


@ -6,7 +6,7 @@
#include <linux/mlx5/driver.h>
#define MLX5_DEVCOM_PORTS_SUPPORTED 2
#define MLX5_DEVCOM_PORTS_SUPPORTED 4
enum mlx5_devcom_components {
MLX5_DEVCOM_ESW_OFFLOADS,


@ -1174,7 +1174,13 @@ int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
u64 *values,
int num_counters,
size_t *offsets);
struct mlx5_core_dev *mlx5_lag_get_peer_mdev(struct mlx5_core_dev *dev);
struct mlx5_core_dev *mlx5_lag_get_next_peer_mdev(struct mlx5_core_dev *dev, int *i);
#define mlx5_lag_for_each_peer_mdev(dev, peer, i) \
for (i = 0, peer = mlx5_lag_get_next_peer_mdev(dev, &i); \
peer; \
peer = mlx5_lag_get_next_peer_mdev(dev, &i))
u8 mlx5_lag_get_num_ports(struct mlx5_core_dev *dev);
struct mlx5_uars_page *mlx5_get_uars_page(struct mlx5_core_dev *mdev);
void mlx5_put_uars_page(struct mlx5_core_dev *mdev, struct mlx5_uars_page *up);


@ -1755,7 +1755,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 reserved_at_328[0x2];
u8 relaxed_ordering_read[0x1];
u8 log_max_pd[0x5];
u8 reserved_at_330[0x9];
u8 reserved_at_330[0x7];
u8 vnic_env_cnt_steering_fail[0x1];
u8 reserved_at_338[0x1];
u8 q_counter_aggregation[0x1];
u8 q_counter_other_vport[0x1];
u8 log_max_xrcd[0x5];
@ -3673,7 +3675,13 @@ struct mlx5_ifc_vnic_diagnostic_statistics_bits {
u8 eth_wqe_too_small[0x20];
u8 reserved_at_220[0xdc0];
u8 reserved_at_220[0xc0];
u8 generated_pkt_steering_fail[0x40];
u8 handled_pkt_steering_fail[0x40];
u8 reserved_at_360[0xc80];
};
struct mlx5_ifc_traffic_counter_bits {