Commit Graph

41005 Commits

Author SHA1 Message Date
Alexander Lobakin
5ce6663158 ice: switch to napi_build_skb()
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx.
ice driver runs Tx completion polling cycle right before the Rx
one and uses napi_consume_skb() to feed the cache with skbuff_heads
of completed entries, so it's never empty and always warm at that
moment. Switch to the napi_build_skb() to relax mm pressure on
heavy Rx.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-28 09:42:33 -08:00
Alexander Lobakin
ef687d61e0 iavf: switch to napi_build_skb()
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx.
iavf driver runs Tx completion polling cycle right before the Rx
one and uses napi_consume_skb() to feed the cache with skbuff_heads
of completed entries, so it's never empty and always warm at that
moment. Switch to the napi_build_skb() to relax mm pressure on
heavy Rx.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-28 09:42:33 -08:00
Alexander Lobakin
6e19cf7d38 i40e: switch to napi_build_skb()
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx.
i40e driver runs Tx completion polling cycle right before the Rx
one and uses napi_consume_skb() to feed the cache with skbuff_heads
of completed entries, so it's never empty and always warm at that
moment. Switch to the napi_build_skb() to relax mm pressure on
heavy Rx.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-28 09:42:33 -08:00
Alexander Lobakin
89a354c03b e1000: switch to napi_build_skb()
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx element.
e1000 driver runs Tx completion polling cycle right before the Rx
one. Now that e1000 uses napi_consume_skb() to put skbuff_heads of
completed entries into the cache, it will never empty and always
warm at that moment. Switch to the napi_build_skb() to relax mm
pressure on heavy Rx and increase throughput.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-28 09:42:25 -08:00
Alexander Lobakin
dcb95f06ea e1000: switch to napi_consume_skb()
In order to take the best from per-cpu NAPI skbuff_head caches and
CPU cycles, let's switch from dev_kfree_skb_any(), which passes skb
back to the mm layer, to napi_consume_skb(), which feeds those
caches on non-zero budget instead (falls back to the former on 0).
Do the replacement in e1000_unmap_and_free_tx_resource(). There are
4 call sites of this function throughout the driver:
 * e1000_clean_tx_ring(). Slowpath, process context, cleans the
   whole Tx ring on ifdown. Use budget of 0 here;
 * e1000_tx_map(). Hotpath, net Tx softirq, unmaps the buffers in
   case of error. Use 0 as well;
 * e1000_clean_tx_irq(). Hotpath, NAPI Tx completion polling cycle.
   As the driver doesn't count completed Tx entries towards the NAPI
   budget, just use the poll budget of 64 to utilize caches.

Apart from being a preparation for switching to napi_build_skb(),
this is useful on its own as well, as napi_consume_skb() flushes
skb caches by batches of 32 instead of one-at-a-time.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-28 09:41:57 -08:00
Aleksander Jan Bajkowski
5be60a9453 net: lantiq_xrx200: fix statistics of received bytes
Received frames have FCS truncated. There is no need
to subtract FCS length from the statistics.

Fixes: fe1a56420c ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-28 12:18:18 +00:00
Christophe JAILLET
1cd5384c88 net: ag71xx: Fix a potential double free in error handling paths
'ndev' is a managed resource allocated with devm_alloc_etherdev(), so there
is no need to call free_netdev() explicitly or there will be a double
free().

Simplify all error handling paths accordingly.

Fixes: d51b6ce441 ("net: ethernet: add ag71xx driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-28 12:16:34 +00:00
Zekun Shen
5f50153288 atlantic: Fix buff_ring OOB in aq_ring_rx_clean
The function obtain the next buffer without boundary check.
We should return with I/O error code.

The bug is found by fuzzing and the crash report is attached.
It is an OOB bug although reported as use-after-free.

[    4.804724] BUG: KASAN: use-after-free in aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.805661] Read of size 4 at addr ffff888034fe93a8 by task ksoftirqd/0/9
[    4.806505]
[    4.806703] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G        W         5.6.0 #34
[    4.809030] Call Trace:
[    4.809343]  dump_stack+0x76/0xa0
[    4.809755]  print_address_description.constprop.0+0x16/0x200
[    4.810455]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.811234]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.813183]  __kasan_report.cold+0x37/0x7c
[    4.813715]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.814393]  kasan_report+0xe/0x20
[    4.814837]  aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.815499]  ? hw_atl_b0_hw_ring_rx_receive+0x9a5/0xb90 [atlantic]
[    4.816290]  aq_vec_poll+0x179/0x5d0 [atlantic]
[    4.816870]  ? _GLOBAL__sub_I_65535_1_aq_pci_func_init+0x20/0x20 [atlantic]
[    4.817746]  ? __next_timer_interrupt+0xba/0xf0
[    4.818322]  net_rx_action+0x363/0xbd0
[    4.818803]  ? call_timer_fn+0x240/0x240
[    4.819302]  ? __switch_to_asm+0x40/0x70
[    4.819809]  ? napi_busy_loop+0x520/0x520
[    4.820324]  __do_softirq+0x18c/0x634
[    4.820797]  ? takeover_tasklets+0x5f0/0x5f0
[    4.821343]  run_ksoftirqd+0x15/0x20
[    4.821804]  smpboot_thread_fn+0x2f1/0x6b0
[    4.822331]  ? smpboot_unregister_percpu_thread+0x160/0x160
[    4.823041]  ? __kthread_parkme+0x80/0x100
[    4.823571]  ? smpboot_unregister_percpu_thread+0x160/0x160
[    4.824301]  kthread+0x2b5/0x3b0
[    4.824723]  ? kthread_create_on_node+0xd0/0xd0
[    4.825304]  ret_from_fork+0x35/0x40

Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 14:49:53 +00:00
Lad Prabhakar
32f52e8e78 net: ethernet: ti: davinci_emac: Use platform_get_irq() to get the interrupt
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.

In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq() for DT users only.

While at it propagate error code in case request_irq() fails instead of
returning -EBUSY.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:22:19 +00:00
Lad Prabhakar
7801302b9a net: xilinx: emaclite: Use platform_get_irq() to get the interrupt
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.

In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq().

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:22:19 +00:00
Lad Prabhakar
6c119fbdb8 net: ethoc: Use platform_get_irq() to get the interrupt
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.

In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq().

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:22:19 +00:00
Lad Prabhakar
441faddaad fsl/fman: Use platform_get_irq() to get the interrupt
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.

In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq(). While doing so return error pointer
from read_dts_node() as platform_get_irq() may return -EPROBE_DEFER.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:22:19 +00:00
Lad Prabhakar
f83b434811 net: pxa168_eth: Use platform_get_irq() to get the interrupt
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.

In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq().

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:22:19 +00:00
Lad Prabhakar
c0032d6e87 ethernet: netsec: Use platform_get_irq() to get the interrupt
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.

In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq().

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:22:19 +00:00
Horatiu Vultur
0c94d657d2 net: lan966x: Fix the vlan used by host ports
The blamed commit changed the vlan used by the host ports to be 4095
instead of 0.
Because of this change the following issues are seen:
- when the port is probed first it was adding an entry in the MAC table
  with the wrong vlan (port->pvid which is default 0) and not HOST_PVID
- when the port is removed from a bridge, it was using the wrong vlan to
  add entries in the MAC table. It was using the old PVID and not the
  HOST_PVID

This patch fixes this two issues by using the HOST_PVID instead of
port->pvid.

Fixes: 6d2c186afa ("net: lan966x: Add vlan support.")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:16:30 +00:00
Jakub Kicinski
720908e5f8 bnxt_en: Use page frag RX buffers for better software GRO performance
If NETIF_F_GRO_HW is disabled, the existing driver code uses kmalloc'ed
data for RX buffers.  This causes inefficient SW GRO performance
because the GRO data is merged using the less efficient frag_list.
Use netdev_alloc_frag() and friends instead so that GRO data can be
merged into skb_shinfo(skb)->frags for better performance.

[Use skb_free_frag() - Vikas Gupta]

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Edwin Peer
b976969bed bnxt_en: convert to xdp_do_flush
The xdp_do_flush_map function has been replaced with the more general
xdp_do_flush().

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Michael Chan
3fcbdbd5d8 bnxt_en: Support CQE coalescing mode in ethtool
Support showing and setting the CQE mode in ethtool.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Michael Chan
df78ea2246 bnxt_en: Support configurable CQE coalescing mode
CQE coalescing mode is the same as the timer reset coalescing mode
on Broadcom devices.  Currently this mode is always enabled if it
is supported by the device.  Restructure the code slightly to support
dynamically changing this mode.

Add a flags field to struct bnxt_coal.  Initially, the CQE flag will
be set for the RX and TX side if the device supports it.  We need to
move bnxt_init_dflt_coal() to set up default coalescing until the
capability is determined.

Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Andy Gospodarek
dc1f5d1ebc bnxt_en: enable interrupt sampling on 5750X for DIM
5750X (P5) chips handle receiving packets on the NQ rather than the main
completion queue so we need to get and set stats from the correct spots
for dynamic interrupt moderation.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Michael Chan
0fb8582ae5 bnxt_en: Log error report for dropped doorbell
Log the unrecognized error report type value as well.

Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Somnath Kotur
5a717f4a8e bnxt_en: Add event handler for PAUSE Storm event
FW has been modified to send a new async event when it detects
a pause storm. Register for this new event and log it upon receipt.

Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Jakub Kicinski
6f6f0ac664 Merge tag 'mlx5-fixes-2021-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5 fixes 2021-12-22

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2021-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'
  net/mlx5e: Delete forward rule for ct or sample action
  net/mlx5e: Fix ICOSQ recovery flow for XSK
  net/mlx5e: Fix interoperability between XSK and ICOSQ recovery flow
  net/mlx5e: Fix skb memory leak when TC classifier action offloads are disabled
  net/mlx5e: Wrap the tx reporter dump callback to extract the sq
  net/mlx5: Fix tc max supported prio for nic mode
  net/mlx5: Fix SF health recovery flow
  net/mlx5: Fix error print in case of IRQ request failed
  net/mlx5: Use first online CPU instead of hard coded CPU
  net/mlx5: DR, Fix querying eswitch manager vport for ECPF
  net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources
====================

Link: https://lore.kernel.org/r/20211223190441.153012-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-23 19:04:33 -08:00
Jakub Kicinski
8b3f913322 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h
  commit 8f905c0e73 ("inet: fully convert sk->sk_rx_dst to RCU rules")
  commit 43f51df417 ("net: move early demux fields close to sk_refcnt")
  https://lore.kernel.org/all/20211222141641.0caa0ab3@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-23 16:09:58 -08:00
Nobuhiro Iwamatsu
391e5975c0 net: stmmac: dwmac-visconti: Fix value of ETHER_CLK_SEL_FREQ_SEL_2P5M
ETHER_CLK_SEL_FREQ_SEL_2P5M is not 0 bit of the register. This is a
value, which is 0. Fix from BIT(0) to 0.

Reported-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>
Fixes: b38dd98ff8 ("net: stmmac: Add Toshiba Visconti SoCs glue driver")
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Link: https://lore.kernel.org/r/20211223073633.101306-1-nobuhiro1.iwamatsu@toshiba.co.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-23 09:58:13 -08:00
Xiaoliang Yang
eccffcf465 net: stmmac: ptp: fix potentially overflowing expression
Convert the u32 variable to type u64 in a context where expression of
type u64 is required to avoid potential overflow.

Fixes: e9e3720002 ("net: stmmac: ptp: update tas basetime after ptp adjust")
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Link: https://lore.kernel.org/r/20211223073928.37371-1-xiaoliang.yang_1@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-23 09:45:55 -08:00
Ong Boon Leong
e48cb313fd net: stmmac: add tc flower filter for EtherType matching
This patch adds basic support for EtherType RX frame steering for
LLDP and PTP using the hardware offload capabilities.

Example steps for setting up RX frame steering for LLDP and PTP:
$ IFDEVNAME=eth0
$ tc qdisc add dev $IFDEVNAME ingress
$ tc qdisc add dev $IFDEVNAME root mqprio num_tc 8 \
     map 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 \
     queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0

For LLDP
$ tc filter add dev $IFDEVNAME parent ffff: protocol 0x88cc \
     flower hw_tc 5
OR
$ tc filter add dev $IFDEVNAME parent ffff: protocol LLDP \
     flower hw_tc 5

For PTP
$ tc filter add dev $IFDEVNAME parent ffff: protocol 0x88f7 \
     flower hw_tc 6

Show tc ingress filter
$ tc filter show dev $IFDEVNAME ingress

v1->v2:
 Thanks to Kurt's and Sebastian's suggestion.
 - change from __be16 to u16 etype
 - change ETHER_TYPE_FULL_MASK to use cpu_to_be16() macro

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-23 11:20:49 +00:00
Horatiu Vultur
2e49761e4f net: lan966x: Add support for multiple bridge flags
This patch series extends the current supported bridge flags with the
following flags: BR_FLOOD, BR_BCAST_FLOOD and BR_LEARNING.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-23 11:19:06 +00:00
Christophe JAILLET
4390c6edc0 net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'
All the error handling paths of 'mlx5e_tc_add_fdb_flow()' end to 'err_out'
where 'flow_flag_set(flow, FAILED);' is called.

All but the new error handling paths added by the commits given in the
Fixes tag below.

Fix these error handling paths and branch to 'err_out'.

Fixes: 166f431ec6 ("net/mlx5e: Add indirect tc offload of ovs internal port")
Fixes: b16eb3c81f ("net/mlx5: Support internal port as decap route device")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
(cherry picked from commit 31108d142f)
2021-12-22 20:38:49 -08:00
Chris Mi
2820110d94 net/mlx5e: Delete forward rule for ct or sample action
When there is ct or sample action, the ct or sample rule will be deleted
and return. But if there is an extra mirror action, the forward rule can't
be deleted because of the return.

Fix it by removing the return.

Fixes: 69e2916ebc ("net/mlx5: CT: Add support for mirroring")
Fixes: f94d6389f6 ("net/mlx5e: TC, Add support to offload sample action")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:49 -08:00
Maxim Mikityanskiy
19c4aba2d4 net/mlx5e: Fix ICOSQ recovery flow for XSK
There are two ICOSQs per channel: one is needed for RX, and the other
for async operations (XSK TX, kTLS offload). Currently, the recovery
flow for both is the same, and async ICOSQ is mistakenly treated like
the regular ICOSQ.

This patch prevents running the regular ICOSQ recovery on async ICOSQ.
The purpose of async ICOSQ is to handle XSK wakeup requests and post
kTLS offload RX parameters, it has nothing to do with RQ and XSKRQ UMRs,
so the regular recovery sequence is not applicable here.

Fixes: be5323c837 ("net/mlx5e: Report and recover from CQE error on ICOSQ")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:48 -08:00
Maxim Mikityanskiy
17958d7cd7 net/mlx5e: Fix interoperability between XSK and ICOSQ recovery flow
Both regular RQ and XSKRQ use the same ICOSQ for UMRs. When doing
recovery for the ICOSQ, don't forget to deactivate XSKRQ.

XSK can be opened and closed while channels are active, so a new mutex
prevents the ICOSQ recovery from running at the same time. The ICOSQ
recovery deactivates and reactivates XSKRQ, so any parallel change in
XSK state would break consistency. As the regular RQ is running, it's
not enough to just flush the recovery work, because it can be
rescheduled.

Fixes: be5323c837 ("net/mlx5e: Report and recover from CQE error on ICOSQ")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:48 -08:00
Gal Pressman
a0cb909644 net/mlx5e: Fix skb memory leak when TC classifier action offloads are disabled
When TC classifier action offloads are disabled (CONFIG_MLX5_CLS_ACT in
Kconfig), the mlx5e_rep_tc_receive() function which is responsible for
passing the skb to the stack (or freeing it) is defined as a nop, and
results in leaking the skb memory. Replace the nop with a call to
napi_gro_receive() to resolve the leak.

Fixes: 28e7606fa8 ("net/mlx5e: Refactor rx handler of represetor device")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:48 -08:00
Amir Tzin
918fc3855a net/mlx5e: Wrap the tx reporter dump callback to extract the sq
Function mlx5e_tx_reporter_dump_sq() casts its void * argument to struct
mlx5e_txqsq *, but in TX-timeout-recovery flow the argument is actually
of type struct mlx5e_tx_timeout_ctx *.

 mlx5_core 0000:08:00.1 enp8s0f1: TX timeout detected
 mlx5_core 0000:08:00.1 enp8s0f1: TX timeout on queue: 1, SQ: 0x11ec, CQ: 0x146d, SQ Cons: 0x0 SQ Prod: 0x1, usecs since last trans: 21565000
 BUG: stack guard page was hit at 0000000093f1a2de (stack is 00000000b66ea0dc..000000004d932dae)
 kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
 CPU: 5 PID: 95 Comm: kworker/u20:1 Tainted: G W OE 5.13.0_mlnx #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
 RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
 [mlx5_core]
 Call Trace:
 mlx5e_tx_reporter_dump+0x43/0x1c0 [mlx5_core]
 devlink_health_do_dump.part.91+0x71/0xd0
 devlink_health_report+0x157/0x1b0
 mlx5e_reporter_tx_timeout+0xb9/0xf0 [mlx5_core]
 ? mlx5e_tx_reporter_err_cqe_recover+0x1d0/0x1d0
 [mlx5_core]
 ? mlx5e_health_queue_dump+0xd0/0xd0 [mlx5_core]
 ? update_load_avg+0x19b/0x550
 ? set_next_entity+0x72/0x80
 ? pick_next_task_fair+0x227/0x340
 ? finish_task_switch+0xa2/0x280
   mlx5e_tx_timeout_work+0x83/0xb0 [mlx5_core]
   process_one_work+0x1de/0x3a0
   worker_thread+0x2d/0x3c0
 ? process_one_work+0x3a0/0x3a0
   kthread+0x115/0x130
 ? kthread_park+0x90/0x90
   ret_from_fork+0x1f/0x30
 --[ end trace 51ccabea504edaff ]---
 RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
 PKRU: 55555554
 Kernel panic - not syncing: Fatal exception
 Kernel Offset: disabled
 end Kernel panic - not syncing: Fatal exception

To fix this bug add a wrapper for mlx5e_tx_reporter_dump_sq() which
extracts the sq from struct mlx5e_tx_timeout_ctx and set it as the
TX-timeout-recovery flow dump callback.

Fixes: 5f29458b77 ("net/mlx5e: Support dump callback in TX reporter")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:48 -08:00
Chris Mi
d671e109bd net/mlx5: Fix tc max supported prio for nic mode
Only prio 1 is supported if firmware doesn't support ignore flow
level for nic mode. The offending commit removed the check wrongly.
Add it back.

Fixes: 9a99c8f125 ("net/mlx5e: E-Switch, Offload all chain 0 priorities when modify header and forward action is not supported")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:47 -08:00
Moshe Shemesh
33de865f7b net/mlx5: Fix SF health recovery flow
SF do not directly control the PCI device. During recovery flow SF
should not be allowed to do pci disable or pci reset, its PF will do it.

It fixes the following kernel trace:
mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:387:(pid 40948): starting health recovery flow
mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called
mlx5_core 0000:03:00.0: wait vital counter value 0xab175 after 1 iterations
mlx5_core.sf mlx5_core.sf.25: firmware version: 24.32.532
mlx5_core.sf mlx5_core.sf.23: mlx5_health_try_recover:387:(pid 40946): starting health recovery flow
mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called
mlx5_core 0000:03:00.0: wait vital counter value 0xab193 after 1 iterations
mlx5_core.sf mlx5_core.sf.23: firmware version: 24.32.532
mlx5_core.sf mlx5_core.sf.25: mlx5_cmd_check:813:(pid 40948): ENABLE_HCA(0x104) op_mod(0x0) failed,
status bad resource state(0x9), syndrome (0x658908)
mlx5_core.sf mlx5_core.sf.25: mlx5_function_setup:1292:(pid 40948): enable hca failed
mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:389:(pid 40948): health recovery failed

Fixes: 1958fc2f07 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:47 -08:00
Shay Drory
aa968f9220 net/mlx5: Fix error print in case of IRQ request failed
In case IRQ layer failed to find or to request irq, the driver is
printing the first cpu of the provided affinity as part of the error
print. Empty affinity is a valid input for the IRQ layer, and it is
an error to call cpumask_first() on empty affinity.

Remove the first cpu print from the error message.

Fixes: c36326d38d ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:47 -08:00
Shay Drory
26a7993c93 net/mlx5: Use first online CPU instead of hard coded CPU
Hard coded CPU (0 in our case) might be offline. Hence, use the first
online CPU instead.

Fixes: f891b7cdbd ("net/mlx5: Enable single IRQ for PCI Function")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:46 -08:00
Yevgeny Kliteynik
624bf42c2e net/mlx5: DR, Fix querying eswitch manager vport for ECPF
On BlueField the E-Switch manager is the ECPF (vport 0xFFFE), but when
querying capabilities of ECPF eswitch manager, need to query vport 0
with other_vport = 0.

Fixes: 9091b821aa ("net/mlx5: DR, Handle eswitch manager and uplink vports separately")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:46 -08:00
Miaoqian Lin
6b8b425858 net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources
The mlx5_get_uars_page() function  returns error pointers.
Using IS_ERR() to check the return value to fix this.

Fixes: 4ec9e7b026 ("net/mlx5: DR, Expose steering domain functionality")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:46 -08:00
Amit Cohen
70ec72d5b6 mlxsw: spectrum_flower: Make vlan_id limitation more specific
Spectrum ASICs do not support matching of VLAN ID at egress.
Currently, mlxsw driver forbids matching of all VLAN related fields at
egress, which is too strict check.

For example, the following filter is not supported by the driver:
$ tc filter add dev swpX egress protocol 802.1q pref 1 handle 101 flower
vlan_ethtype ipv4 src_ip .. dst_ip .. skip_sw action pass
Error: mlxsw_spectrum: vlan_id key is not supported on egress.
We have an error talking to the kernel

The filter above does not match on VLAN ID, but is bounced anyway.

Make the check more specific, forbid only matching of 'vlan_id' at egress.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22 17:14:32 -08:00
Jakub Kicinski
5de24da1b3 Merge tag 'mlx5-updates-2021-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5-updates-2021-12-21

1) From Shay Drory: Devlink user knobs to control device's EQ size

This series provides knobs which will enable users to
minimize memory consumption of mlx5 Functions (PF/VF/SF).
mlx5 exposes two new generic devlink params for EQ size
configuration and uses devlink generic param max_macs.

LINK: https://lore.kernel.org/netdev/20211208141722.13646-1-shayd@nvidia.com/

2) From Tariq and Lama, allocate software channel objects and statistics
  of a mlx5 netdevice private data dynamically upon first demand to save on
  memory.

* tag 'mlx5-updates-2021-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Take packet_merge params directly from the RX res struct
  net/mlx5e: Allocate per-channel stats dynamically at first usage
  net/mlx5e: Use dynamic per-channel allocations in stats
  net/mlx5e: Allow profile-specific limitation on max num of channels
  net/mlx5e: Save memory by using dynamic allocation in netdev priv
  net/mlx5e: Add profile indications for PTP and QOS HTB features
  net/mlx5e: Use bitmap field for profile features
  net/mlx5: Remove the repeated declaration
  net/mlx5: Let user configure max_macs generic param
  devlink: Clarifies max_macs generic devlink param
  net/mlx5: Let user configure event_eq_size param
  devlink: Add new "event_eq_size" generic device param
  net/mlx5: Let user configure io_eq_size param
  devlink: Add new "io_eq_size" generic device param
====================

Link: https://lore.kernel.org/r/20211222031604.14540-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22 17:13:01 -08:00
Colin Ian King
62a3106697 net: broadcom: bcm4908enet: remove redundant variable bytes
The variable bytes is being used to summate slot lengths,
however the value is never used afterwards. The summation
is redundant so remove variable bytes.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20211222003937.727325-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22 14:58:43 -08:00
Jesse Brandeburg
0092db5fac ice: trivial: fix odd indenting
Fix an odd indent where some code was left indented, and causes smatch
to warn:
ice_log_pkg_init() warn: inconsistent indenting

While here, for consistency, add a break after the default case.

This commit has a Fixes: but we caught this while it was only in net-next.

Fixes: 247dd97d71 ("ice: Refactor status flow for DDP load")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20211221230538.2546315-1-jesse.brandeburg@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22 14:57:55 -08:00
Jakub Kicinski
2030eddced Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-12-21

This series contains updates to ice driver only.

Karol modifies the reset flow to correct issues with PTP reset.

Jake extends PTP support for E822 based devices. This includes a few
cleanup patches, that fix some minor issues. In addition, there are some
slight refactors to ease the addition of E822 support, followed by adding
the new hardware implementation ice_ptp_hw.c.

There are a few major differences with E822 support compared to E810
support:

*) The E822 device has a Clock Generation Unit which must be initialized in
order to generate proper clock frequencies on the output that drives the PTP
hardware clock registers

*) The E822 PHY is a bit different and requires a more complex
initialization procedure which must be rerun any time the link configuration
changes.

*) The E822 devices support enhanced timestamp calibration by making use of
a process called Vernier offset measurement. This allows the hardware to
measure phase offset related to the PHY clocks for Serdes and FEC, reducing
the inaccuracy of the timestamp relative to the actual packet transmission
and receipt. Making use of this requires data gathered from the first
transmitted and received packets, and waiting for the PHY to complete the
calibration measurements. This is done as part of a new kthread, ov_work.
Note that to avoid delay in enabling timestamps, we start the PHY in
'bypass' mode which allows timestamps to be captured without the Vernier
calibration measurement. Once the first packets have been sent and received,
we then complete the calibration setup and exit bypass mode and begin using
the more precise timestamps. According to the datasheet, timestamps without
calibration data can be incorrect relative to actual receipt or transmission
by up to 1 clock cycle (~1.25 nanoseconds), while calibrated timestamps
should be correct to within 1/8th of a clock cycle (~0.15 nanoseconds).

*) E822 devices support crosstimestamping via PCIe PTM, which we enable when
available on the platform.

There is a fair amount of logic required to perform PHY and CGU
initialization, which is the vast majority of the new code, but it is fairly
self contained within ice_ptp_hw.c, with the exception of monitoring for
offset validity being handled by a kthread.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: support crosstimestamping on E822 devices if supported
  ice: exit bypass mode once hardware finishes timestamp calibration
  ice: ensure the hardware Clock Generation Unit is configured
  ice: implement basic E822 PTP support
  ice: convert clk_freq capability into time_ref
  ice: introduce ice_ptp_init_phc function
  ice: use 'int err' instead of 'int status' in ice_ptp_hw.c
  ice: PTP: move setting of tstamp_config
  ice: introduce ice_base_incval function
  ice: Fix E810 PTP reset flow
====================

Link: https://lore.kernel.org/r/20211221174845.3063640-1-anthony.l.nguyen@intel.com
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22 14:25:42 -08:00
Jiasheng Jiang
9b8bdd1eb5 sfc: falcon: Check null pointer of rx_queue->page_ring
Because of the possible failure of the kcalloc, it should be better to
set rx_queue->page_ptr_mask to 0 when it happens in order to maintain
the consistency.

Fixes: 5a6681e22c ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20211220140344.978408-1-jiasheng@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22 12:25:18 -08:00
Jiasheng Jiang
bdf1b5c388 sfc: Check null pointer of rx_queue->page_ring
Because of the possible failure of the kcalloc, it should be better to
set rx_queue->page_ptr_mask to 0 when it happens in order to maintain
the consistency.

Fixes: 5a6681e22c ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20211220135603.954944-1-jiasheng@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22 12:23:18 -08:00
David E. Box
a5f8ef0baf net/mlx5e: Use auxiliary_device driver data helpers
Use auxiliary_get_drvdata and auxiliary_set_drvdata helpers.

Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20211221235852.323752-4-david.e.box@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22 13:59:01 +01:00
Jiasheng Jiang
99d7fbb5ce net: ks8851: Check for error irq
Because platform_get_irq() could fail and return error irq.
Therefore, it might be better to check it if order to avoid the use of
error irq.

Fixes: 797047f875 ("net: ks8851: Implement Parallel bus operations")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-22 10:23:50 +00:00
Jiasheng Jiang
cb93b3e11d drivers: net: smc911x: Check for error irq
Because platform_get_irq() could fail and return error irq.
Therefore, it might be better to check it if order to avoid the use of
error irq.

Fixes: ae150435b5 ("smsc: Move the SMC (SMSC) drivers")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-22 10:23:03 +00:00