Subsequent patches will need to register different trap groups for
Spectrum-1 and Spectrum-2 onwards.
Enable that by invoking a per-ASIC operation during trap groups
initialization.
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When unsetting policer base, the SPAN code currently uses refcount_dec().
However that function splats when the counter reaches zero, because
reaching zero without actually testing is in general indicative of a
missing cleanup. There is no cleanup to be done here, but nonetheless, use
refcount_dec_and_test() as required.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A later patch will refuse to set the action of certain traps in mlxsw
and also to change the policer binding of certain groups. Pass extack so
that failure could be communicated clearly to user space.
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate nic_info dynamically - n_entries is not constant.
Attach the tunnel offload info only to the uplink representor.
We expect the "main" netdev to be unregistered in switchdev
mode, and there to be only one uplink representor.
Drop the udp_tunnel_drop_rx_info() call, it was not there until
commit b3c2ed21c0 ("net/mlx5e: Fix VXLAN configuration restore after function reload")
so the device doesn't need it, and core should handle reloads and
reset just fine.
v2:
- don't drop the ndos on reprs, and register info on uplink repr.
v4:
- Move netdev tunnel structure handling to en_main.c
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The DR TX state machine supports the following order:
modify header, push vlan and encapsulation.
Instead fs_dr would pass:
push vlan, modify header and encapsulation.
The above caused the rule creation to fail on invalid action
sequence provided error.
Fixes: 6a48faeeca ("net/mlx5: Add direct rule fs_cmd implementation")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently PF and VF representor netdevice carrier is always controlled
by controlling the representor netdevice device state as up/down.
Representor netdevice state change undergoes one or more txq/rxq
destroy/create commands to firmware, skb and its rx buffer allocation,
health reporters creation and more.
Due to this limitation users do not have the ability to just change
the carrier of the non uplink representors without modifying the
device state.
In one use case when the eswitch physical port carrier is down/up,
user needs to update the VF link state to same as physical port
carrier.
Example of updating VF representor carrier state:
$ ip link set enp0s8f0npf0vf0 carrier off
$ ip link set enp0s8f0npf0vf0 carrier on
This enhancement results into VF link state change which is
represented by the VF representor netdevice carrier.
This enables users to modify the representor carrier without modifying
the representor netdevice state.
A simple test is run using [1] to calculate the time difference between
updating carrier vs updating device state (to update just the carrier)
with one VF to simulate 255 VFs.
Time taken to update the carrier using device up/down:
$ time ./calculate.sh dev enp0s8f0npf0vf0
real 0m30.913s
user 0m0.200s
sys 0m11.168s
Time taken to update just the carrier using carrier iproute2 command:
$ time ./calculate.sh carrier enp0s8f0npf0vf0
real 0m2.142s
user 0m0.160s
sys 0m2.021s
Test shows that its better to use carrier on/off user interface to notify
link up/down event to VF compare to device up/down interface, because
carrier user interface delivers the same event 15 times faster.
[1] https://github.com/paravmellanox/myscripts/blob/master/calculate_carrier_time.sh
Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The function invokes bpf_prog_inc(), which increases the reference
count of a bpf_prog object "rq->xdp_prog" if the object isn't NULL.
The refcount leak issues take place in two error handling paths. When
either mlx5_wq_ll_create() or mlx5_wq_cyc_create() fails, the function
simply returns the error code and forgets to drop the reference count
increased earlier, causing a reference count leak of "rq->xdp_prog".
Fix this issue by jumping to the error handling path err_rq_wq_destroy
while either function fails.
Fixes: 422d4c401e ("net/mlx5e: RX, Split WQ objects for different RQ types")
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The flow_source must be specified, even for rule without matching
source vport, because some actions are only allowed in uplink.
Otherwise, rule can't be offloaded and firmware syndrome happens.
Fixes: 6fb0701a9c ("net/mlx5: E-Switch, Add support for offloading rules with no in_port")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The modified flow_context fields in FTE must be indicated in
modify_enable bitmask. Previously, the misc bit in modify_enable is
always set as source vport must be set for each rule. So, when parsing
vxlan/gre/geneve/qinq rules, this bit is not set because those are all
from the same misc fileds that source vport fields are located at, and
we don't need to set the indicator twice.
After adding per vport tables for mirroring, misc bit is not set, then
firmware syndrome happens. To fix it, set the bit wherever misc fileds
are changed. This also makes it unnecessary to check misc fields and set
the misc bit accordingly in metadata matching, so here remove it.
Besides, flow_source must be specified for uplink because firmware
will check it and some actions are only allowed for packets received
from uplink.
Fixes: 96e326878f ("net/mlx5e: Eswitch, Use per vport tables for mirroring")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently the driver restores only IPv4 tunnel headers.
Add support for restoring IPv6 tunnel header.
Fixes: b8ce903709 ("net/mlx5e: Restore tunnel metadata on miss")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Make use of the flex_array_size() helper to calculate the size of a
flexible array member within an enclosing structure.
This helper offers defense-in-depth against potential integer
overflows, while at the same time makes it explicitly clear that
we are dealing witha flexible array member.
Also, remove unnecessary pointer identifier sub_pool.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The lifetime of the Rx listener item ('rxl_item') is managed using RCU,
but is dereferenced outside of RCU read-side critical section, which can
lead to a use-after-free.
Fix this by increasing the scope of the RCU read-side critical section.
Fixes: 93c1edb27f ("mlxsw: Introduce Mellanox switch driver core")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cited commit mistakenly removed the trap group for externally routed
packets (e.g., via the management interface) and grouped locally routed
and externally routed packet traps under the same group, thereby
subjecting them to the same policer.
This can result in problems, for example, when FRR is restarted and
suddenly all transient traffic is trapped to the CPU because of a
default route through the management interface. Locally routed packets
required to re-establish a BGP connection will never reach the CPU and
the routing tables will not be re-populated.
Fix this by using a different trap group for externally routed packets.
Fixes: 8110668ecd ("mlxsw: spectrum_trap: Register layer 3 control traps")
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cited commit added the ability to program link-local prefix routes to
the ASIC so that relevant packets are routed and trapped correctly.
However, host routes were not included in the change and thus not
programmed to the ASIC. This can result in packets being trapped via an
external route trap instead of a local route trap as in IPv4.
Fix this by programming all the link-local routes to the ASIC.
Fixes: 10d3757fcb ("mlxsw: spectrum_router: Allow programming link-local prefix routes")
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow QSFP-DD transceivers temperature thresholds reading for hardware
monitoring and thermal control.
For this type, the thresholds are located in page 02h according to the
"Module and Lane Thresholds" description from Common Management
Interface Specification.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Quad Small Form Factor Pluggable Double Density (QSFP-DD) hardware
specification defines a form factor that supports up to 400 Gbps in
aggregate over an 8x50-Gbps electrical interface. The QSFP-DD supports
both optical and copper interfaces.
Implementation is based on Common Management Interface Specification;
Rev 4.0 May 8, 2019. Table 8-2 "Identifier and Status Summary (Lower
Page)" from this spec defines "Id and Status" fields located at offsets
00h - 02h. Bit 2 at offset 02h ("Flat_mem") specifies QSFP EEPROM memory
mode, which could be "upper memory flat" or "paged". Flat memory mode is
coded "1", and indicates that only page 00h is implemented in EEPROM.
Paged memory is coded "0" and indicates that pages 00h, 01h, 02h, 10h
and 11h are implemented. Pages 10h and 11h are currently not supported
by the driver.
"Flat" memory mode is used for the passive copper transceivers. For this
type only page 00h (256 bytes) is available. "Paged" memory is used for
the optical transceivers. For this type pages 00h (256 bytes), 01h (128
bytes) and 02h (128 bytes) are available. Upper page 01h contains static
advertising field, while upper page 02h contains the module-defined
thresholds and lane-specific monitors.
Extend enumerator 'mlxsw_reg_mcia_eeprom_module_info_id' with additional
field 'MLXSW_REG_MCIA_EEPROM_MODULE_INFO_TYPE_ID'. This field is used to
indicate for QSFP-DD transceiver type which memory mode is to be used.
Expose 256 bytes buffer for QSFP-DD passive copper transceiver and
512 bytes buffer for optical.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When setting the PF interface up/down, notify the firmware to update
uplink state via MODIFY_VPORT_STATE, when E-Switch is enabled.
This behavior will prevent sending traffic out on uplink port when PF is
down, such as sending traffic from a VF interface which is still up.
Currently when calling mlx5e_open/close(), the driver only sends PAOS
command to notify the firmware to set the physical port state to
up/down, however, it is not sufficient. When VF is in "auto" state, it
follows the uplink state, which was not updated on mlx5e_open/close()
before this patch.
When switchdev mode is enabled and uplink representor is first enabled,
set the uplink port state value back to its FW default "AUTO".
Fixes: 63bfd399de ("net/mlx5e: Send PAOS command on interface up/down")
Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In a special configuration, a ConnectX6-Dx pin pps-out might be activated
when driver is loaded. Fix the driver to always read the operational pin
mode when registering it, and advertise it accordingly.
Fixes: ee7f12205a ("net/mlx5e: Implement 1PPS support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5e_rep_is_lag_netdev is used as first check as part of netdev events
handler for bond device of non-uplink representors, this handler can get
any netdevice under the same network namespace of mlx5e netdevice. Current
code treats the netdev as mlx5e netdev and only later on verifies this,
hence causes the following Kasan trace:
[15402.744990] ==================================================================
[15402.746942] BUG: KASAN: slab-out-of-bounds in mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
[15402.749009] Read of size 8 at addr ffff880391f3f6b0 by task ovs-vswitchd/5347
[15402.752065] CPU: 7 PID: 5347 Comm: ovs-vswitchd Kdump: loaded Tainted: G B O --------- -t - 4.18.0-g3dcc204d291d-dirty #1
[15402.755349] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[15402.757600] Call Trace:
[15402.758968] dump_stack+0x71/0xab
[15402.760427] print_address_description+0x6a/0x270
[15402.761969] kasan_report+0x179/0x2d0
[15402.763445] ? mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
[15402.765121] mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
[15402.766782] mlx5e_rep_esw_bond_netevent+0x129/0x620 [mlx5_core]
Fix by deferring the violating access to be post the netdev verify check.
Fixes: 7e51891a23 ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix a bug where driver did not verify Hardware pin capabilities for
PTP functions.
Fixes: ee7f12205a ("net/mlx5e: Implement 1PPS support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
On PTP mlx5_ptp_enable(on=0) flow, driver mistakenly used channel index
as pin index.
After ptp patch marked in fixes tag was introduced, driver can freely
call ptp_find_pin() as part of the .enable() callback.
Fix driver mlx5_ptp_enable(on=0) flow to always use ptp_find_pin(). With
that, Driver will use the correct pin index in mlx5_ptp_enable(on=0) flow.
In addition, when initializing the pins, always set channel to zero. As
all pins can be attached to all channels, let ptp_set_pinfunc() to move
them between the channels.
For stable branches, this fix to be applied only on kernels that includes
both patches in fixes tag. Otherwise, mlx5_ptp_enable(on=0) will be stuck
on pincfg_mux.
Fixes: 62582a7ee7 ("ptp: Avoid deadlocks in the programmable pin code.")
Fixes: ee7f12205a ("net/mlx5e: Implement 1PPS support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The cited commit add initialization of ethtool steering during
representor rx initializations without cleaning it up in representor
rx cleanup, this may cause for stale ethtool flows to remain after
moving back from switchdev mode to legacy mode.
Fixed by calling ethtool steering cleanup during rep rx cleanup.
Fixes: 6783e8b29f ("net/mlx5e: Init ethtool steering for representors")
Signed-off-by: Maor Dickman <maord@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
On failure to attach the netdev, fix the rollback by re-setting the
device's state back to MLX5E_STATE_DESTROYING.
Failing to attach doesn't stop statistics polling via .ndo_get_stats64.
In this case, although the device is not attached, it falsely continues
to query the firmware for counters. Setting the device's state back to
MLX5E_STATE_DESTROYING prevents the firmware counters query.
Fixes: 26e59d8077 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The steering tree is as follow (nic RX as example):
---------
|root_ns|
---------
|
--------------------------------
| | |
---------- ---------- ---------
|p(prio)0| | p1 | | pn |
---------- ---------- ---------
| |
---------------- ---------------
|ns(e.g bypass)| |ns(e.g. lag) |
---------------- ---------------
| | |
---- ---- ----
|p0| |p1| |pn|
---- ---- ----
|
----
|FT|
----
find_next_chained_ft(prio) returns the first flow table in the next
priority. If prio is a parent of a flow table then it returns the first
flow table in the next priority in the same namespace, else if prio
is parent of namespace, then it should return the first flow table
in the next namespace. Currently if the user requests to forward to
next namespace, the code calls to find_next_chained_ft with the prio
of the next namespace and not the prio of the namesapce itself.
Fixes: 9254f8ed15 ("net/mlx5: Add support in forward to namespace")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When eswitch offloads is enabled, TSAR is created before reloading
the interfaces.
However when eswitch offloads mode is disabled, TSAR is disabled before
reloading the interfaces.
To keep the eswitch enable/disable sequence as mirror, destroy TSAR
after reloading the interfaces.
Fixes: 1bd27b11c1 ("net/mlx5: Introduce E-switch QoS management")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When either esw_legacy_enable() or esw_offloads_enable() fails,
code missed to destroy the created TSAR.
Hence, add the missing call to destroy the TSAR.
Fixes: 610090ebce ("net/mlx5: E-switch, Initialize TSAR Qos hardware block before its user vports")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
list_for_each_entry is able to handle an empty list.
The only effect of avoiding the loop is not initializing the
index variable.
Drop list_empty tests in cases where these variables are not
used.
Note that list_for_each_entry is defined in terms of list_first_entry,
which indicates that it should not be used on an empty list. But in
list_for_each_entry, the element obtained by list_first_entry is not
really accessed, only the address of its list_head field is compared
to the address of the list head, so the list_first_entry is safe.
The semantic patch that makes this change is as follows (with another
variant for the no brace case): (http://coccinelle.lip6.fr/)
<smpl>
@@
expression x,e;
iterator name list_for_each_entry;
statement S;
identifier i;
@@
-if (!(list_empty(x))) {
list_for_each_entry(i,x,...) S
- }
... when != i
? i = e
</smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
There is no need to print on each unsuccessful matcher
ip_version combination since it probably will happen when
trying to create all the possible combinations.
On a real failure we have a print in the calling function.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The concept of Relaxed Ordering in the PCI Express environment allows
switches in the path between the Requester and Completer to reorder some
transactions just received before others that were previously enqueued.
In ETH driver, there is no question of write integrity since each memory
segment is written only once per cycle. In addition, the driver doesn't
access the memory shared with the hardware until the corresponding CQE
arrives indicating all PCI transactions are done.
Running TCP single stream over ConnectX-4 LX, ARM CPU on remote-numa has
300% improvement in the bandwidth.
With relaxed ordering turned off: BW:10 [GB/s]
With relaxed ordering turned on: BW:40 [GB/s]
The driver turns relaxed ordering with respect to the firmware
capabilities and the return value from pcie_relaxed_ordering_enabled().
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use the indirect call wrapper API macros for declaration and scope
of the RX post WQEs functions.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of exposing the RQ datapath handlers (from en_rx.c) so that
they are set in the control path (in en_main.c), wrap this logic
in a single function in en_rx.c and expose it alone.
Every profile will now have a pointer to the new mlx5e_rx_handlers
structure, instead of directly pointing to the previously-exposed
RQ handlers.
This significantly improves locality and modularity of the driver,
and allows many functions in en_rx.c to become static.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently PF and VF representors are exposed as virtual device.
They are not linked to its parent PCI device like how uplink
representor is linked.
Due to this, PF and VF representors cannot benefit of the
systemd defined naming scheme. This requires special handling
by the users.
Hence, link the PF and VF representors to their parent PCI device
similar to existing uplink representor netdevice.
Example:
udevadm output before linking to PCI device:
$ udevadm test-builtin net_id /sys/class/net/eth6
Load module index
Network interface NamePolicy= disabled on kernel command line, ignoring.
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Using default interface naming scheme 'v243'.
ID_NET_NAMING_SCHEME=v243
Unload module index
Unloaded link configuration context.
udevadm output after linking to PCI device:
$ udevadm test-builtin net_id /sys/class/net/eth6
Load module index
Network interface NamePolicy= disabled on kernel command line, ignoring.
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Using default interface naming scheme 'v243'.
ID_NET_NAMING_SCHEME=v243
ID_NET_NAME_PATH=enp0s8f0npf0vf0
Unload module index
Unloaded link configuration context.
In past there was little concern over seeing 10,000 lines output
showing up at thread [1] is not applicable as ndo ops for VF
handling is not exposed for all the 100 repesentors for mlx5 devices.
Additionally alternative device naming [2] to overcome shorter device
naming is also part of the latest systemd release v245.
[1] https://marc.info/?l=linux-netdev&m=152657949117904&w=2
[2] https://lwn.net/Articles/814068/
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently steering table and rx group initialization helper
routines works on the total_vports passed as input parameter.
Both eswitch helpers work on the mlx5_eswitch and thereby have access
to esw->total_vports. Hence use it directly instead of passing it
via function input arguments.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Total e-switch vports are already stored in mlx5_eswitch total_vports.
Avoid copy of it in nvports and reuse existing total_vports calculation.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When eswitch is enabled, VFs might not be enabled. Hence, consider
maximum number of VFs.
This further closes the gap between handling VF vports between ECPF and
PF.
Fixes: ea2128fd63 ("net/mlx5: E-switch, Reduce dependency on num_vfs during mode set")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Per page request event, FW request to allocated or release pages for a
single function. Driver maintains FW pages object per function, so there
is no need to hold one global page data-base. Instead, have a page
data-base per function, which will improve performance release flow in all
cases, especially for "release all pages".
As the range of function IDs is large and not sequential, use xarray to
store a per function ID page data-base, where the function ID is the key.
Upon first allocation of a page to a function ID, create the page
data-base per function. This data-base will be released only at pagealloc
mechanism cleanup.
NIC: ConnectX-4 Lx
CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Test case: 32 VFs, measure release pages on one VF as part of FLR
Before: 0.021 Sec
After: 0.014 Sec
The improvement depends on amount of VFs and memory utilization
by them. Time measurements above were taken from idle system.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Now that BPF program/link management is centralized in generic net_device
code, kernel code never queries program id from drivers, so
XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.
This patch removes all the implementations of those commands in kernel, along
the xdp_attachment_query().
This patch was compile-tested on allyesconfig.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
The UDP reuseport conflict was a little bit tricky.
The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.
At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.
This requires moving the reuseport_has_conns() logic into the callers.
While we are here, get rid of inline directives as they do not belong
in foo.c files.
The other changes were cases of more straightforward overlapping
modifications.
Signed-off-by: David S. Miller <davem@davemloft.net>
When mlxsw_core_trap_register fails in mlxsw_emad_init,
destroy_workqueue() shouled be called to destroy mlxsw_core->emad_wq.
Fixes: d965465b60 ("mlxsw: core: Fix possible deadlock")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>