Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix cgroupv2 from the input path, from Florian Westphal.
2) Fix incorrect return value of nft_parse_register(), from Antoine Tenart.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: nft_parse_register can return a negative value
netfilter: nft_socket: make cgroup match work in input too
====================
Link: https://lore.kernel.org/r/20220412094246.448055-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 6e1acfa387 ("netfilter: nf_tables: validate registers
coming from userspace.") nft_parse_register can return a negative value,
but the function prototype is still returning an unsigned int.
Fixes: 6e1acfa387 ("netfilter: nf_tables: validate registers coming from userspace.")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Horatiu Vultur says:
====================
net: lan966x: lan966x fixes
This contains different fixes for lan966x in different areas like PTP, MAC,
Switchdev and IGMP processing.
====================
Link: https://lore.kernel.org/r/20220409184143.1204786-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently when getting a new MAC is learn, the HW generates an
interrupt. So then the SW will check the new entry and checks if it
arrived on a correct port. If it didn't just generate a warning.
But this could still crash the system. Therefore stop processing that
entry when an issue is seen.
Fixes: 5ccd66e01c ("net: lan966x: add support for interrupts from analyzer")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On lan966x it is not allowed to have foreign interfaces under a bridge
which already contains lan966x ports. So when a port leaves the bridge
it would call switchdev_bridge_port_unoffload which eventually will
notify the other ports that bridge left the vlan group but that is not
true because the bridge is still part of the vlan group.
Therefore when a port leaves the bridge, stop generating replays because
already the HW cleared after itself and the other ports don't need to do
anything else.
Fixes: cf2f60897e ("net: lan966x: Add support to offload the forwarding.")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case an IGMP frame has a vlan tag, then the function
lan966x_hw_offload couldn't figure out that is a IGMP frame. Therefore
the SW thinks that the frame was already forward by the HW which is not
true.
Extend lan966x_hw_offload to pop the vlan tag if are any and then check
for IGMP frames.
Fixes: 47aeea0d57 ("net: lan966x: Implement the callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED ")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The clk_per_cfg register represents the value added to the system clock
for each clock cycle. The issue is that the default value is wrong,
meaning that in case the DUT was a grandmaster then everone in the
network was too slow. In case there was a grandmaster, then there is no
issue because the DUT will configure clk_per_cfg register based on the
master frequency.
Fixes: d096459494 ("net: lan966x: Add support for ptp clocks")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Function sctp_do_peeloff() wrongly initializes daddr of the original
socket instead of the peeled off socket, which makes getpeername()
return zeroes instead of the primary address. Initialize the new socket
instead.
Fixes: d570ee490f ("[SCTP]: Correctly set daddr for IPv6 sockets during peeloff")
Signed-off-by: Petr Malat <oss@malat.biz>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/20220409063611.673193-1-oss@malat.biz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-04-08
Alexander fixes a use after free issue with aRFS for ice driver.
Mateusz reverts a commit that introduced issues related to device
resets for iavf driver.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
Revert "iavf: Fix deadlock occurrence during resetting VF interface"
ice: arfs: fix use-after-free when freeing @rx_cpu_rmap
====================
Link: https://lore.kernel.org/r/20220408163411.2415552-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Karsten Graul says:
====================
net/smc: fixes 2022-04-08
Patch 1 fixes two usages of snprintf() with non null-terminated
string which results into an out-of-bounds read.
Pach 2 fixes a syzbot finding where a pointer check was missed
before the call to dev_name().
Patch 3 fixes a crash when already released memory is used as
a function pointer.
====================
Link: https://lore.kernel.org/r/20220408151035.1044701-1-kgraul@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Child sockets may inherit the af_ops from the parent listen socket.
When the listen socket is released then the af_ops of the child socket
points to released memory.
Solve that by restoring the original af_ops for child sockets which
inherited the parent af_ops. And clear any inherited user_data of the
parent socket.
Fixes: 8270d9c210 ("net/smc: Limit backlog connections")
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
dev_name() was called with dev.parent as argument but without to
NULL-check it before.
Solve this by checking the pointer before the call to dev_name().
Fixes: af5f60c7e3 ("net/smc: allow PCI IDs as ib device names in the pnet table")
Reported-by: syzbot+03e3e228510223dabd34@syzkaller.appspotmail.com
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using snprintf() to convert not null-terminated strings to null
terminated strings may cause out of bounds read in the source string.
Therefore use memcpy() and terminate the target string with a null
afterwards.
Fixes: fa08666255 ("net/smc: add support for user defined EIDs")
Fixes: 3c572145c2 ("net/smc: add generic netlink support for system EID")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 4298388574 ("net: macb: restart tx after tx used bit read")
added support for restarting transmission. Restarting tx does not work
in case controller asserts TXUBR interrupt and TQBP is already at the end
of the tx queue. In that situation, restarting tx will immediately cause
assertion of another TXUBR interrupt. The driver will end up in an infinite
interrupt loop which it cannot break out of.
For cases where TQBP is at the end of the tx queue, instead
only clear TX_USED interrupt. As more data gets pushed to the queue,
transmission will resume.
This issue was observed on a Xilinx Zynq-7000 based board.
During stress test of the network interface,
driver would get stuck on interrupt loop within seconds or minutes
causing CPU to stall.
Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20220407161659.14532-1-tomas.melin@vaisala.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
kongweibin reported a kernel panic in ip6_forward() when input interface
has no in6 dev associated.
The following tc commands were used to reproduce this panic:
tc qdisc del dev vxlan100 root
tc qdisc add dev vxlan100 root netem corrupt 5%
CC: stable@vger.kernel.org
Fixes: ccd27f05ae ("ipv6: fix 'disable_policy' for fwd packets")
Reported-by: kongweibin <kongweibin2@huawei.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both of of_get_parent() and of_parse_phandle() return node pointer with
refcount incremented, use of_node_put() on it to decrease refcount
when done.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
cgroupv2 helper function ignores the already-looked up sk
and uses skb->sk instead.
Just pass sk from the calling function instead; this will
make cgroup matching work for udp and tcp in input even when
edemux did not set skb->sk already.
Fixes: e0bb96db96 ("netfilter: nft_socket: add support for cgroupsv2")
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
A user may set the SO_TXTIME socket option to ensure a packet is send
at a given time. The taprio scheduler has to confirm, that it is allowed
to send a packet at that given time, by a check against the packet time
schedule. The scheduler drop the packet, if the gates are closed at the
given send time.
The check, if SO_TXTIME is set, may fail since sk_flags are part of an
union and the union is used otherwise. This happen, if a socket is not
a full socket, like a request socket for example.
Add a check to verify, if the union is used for sk_flags.
Fixes: 4cfd5779bd ("taprio: Add support for txtime-assist mode")
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using a fixed-link, the altr_tse_pcs driver crashes
due to null-pointer dereference as no phy_device is provided to
tse_pcs_fix_mac_speed function. Fix this by adding a check for
phy_dev before calling the tse_pcs_fix_mac_speed() function.
Also clean up the tse_pcs_fix_mac_speed function a bit. There is
no need to check for splitter_base and sgmii_adapter_base
because the driver will fail if these 2 variables are not
derived from the device tree.
Fixes: fb3bbdb859 ("net: ethernet: Add TSE PCS support to dwmac-socfpga")
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the T1 phy master/slave state is changed, at the end of config_aneg
function genphy_softreset is called. After the reset all the registers
configured during the config_init are restored to default value.
To avoid this, removed the genphy_softreset call.
v1->v2
------
Added the author in cc
Fixes: 8a1b415d70 ("net: phy: added ethtool master-slave configuration support")
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UBSAN warnings are observed on atlantic driver:
[ 294.432996] UBSAN: array-index-out-of-bounds in /build/linux-Qow4fL/linux-5.15.0/drivers/net/ethernet/aquantia/atlantic/aq_nic.c:484:48
[ 294.433695] index 8 is out of range for type 'aq_vec_s *[8]'
The ring is dereferenced right before breaking out the loop, to prevent
that from happening, only use the index in the loop to fix the issue.
BugLink: https://bugs.launchpad.net/bugs/1958770
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20220408022204.16815-1-kai.heng.feng@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The DSA master might not have been probed yet in which case the probe of
the felix switch fails with -EPROBE_DEFER:
[ 4.435305] mscc_felix 0000:00:00.5: Failed to register DSA switch: -517
It is not an error. Use dev_err_probe() to demote this particular error
to a debug message.
Fixes: 5605194877 ("net: dsa: ocelot: add driver for Felix switch family")
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220408101521.281886-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, when inserting a new filter that needs to sit at the head
of chain 0, it will first update the heads pointer on all devices using
the (shared) block, and only then complete the initialization of the new
element so that it has a "next" element.
This can lead to a situation that the chain 0 head is propagated to
another CPU before the "next" initialization is done. When this race
condition is triggered, packets being matched on that CPU will simply
miss all other filters, and will flow through the stack as if there were
no other filters installed. If the system is using OVS + TC, such
packets will get handled by vswitchd via upcall, which results in much
higher latency and reordering. For other applications it may result in
packet drops.
This is reproducible with a tc only setup, but it varies from system to
system. It could be reproduced with a shared block amongst 10 veth
tunnels, and an ingress filter mirroring packets to another veth.
That's because using the last added veth tunnel to the shared block to
do the actual traffic, it makes the race window bigger and easier to
trigger.
The fix is rather simple, to just initialize the next pointer of the new
filter instance (tp) before propagating the head change.
The fixes tag is pointing to the original code though this issue should
only be observed when using it unlocked.
Fixes: 2190d1d094 ("net: sched: introduce helpers to work with filter chains")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/b97d5f4eaffeeb9d058155bcab63347527261abf.1649341369.git.marcelo.leitner@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yi Chen reported an unexpected sctp connection abort, and it occurred when
COOKIE_ECHO is bundled with DATA Fragment by SCTP HW GSO. As the IP header
is included in chunk->head_skb instead of chunk->skb, it failed to check
IP header version in security_sctp_assoc_request().
According to Ondrej, SELinux only looks at IP header (address and IPsec
options) and XFRM state data, and these are all included in head_skb for
SCTP HW GSO packets. So fix it by using head_skb when calling
security_sctp_assoc_request() in processing COOKIE_ECHO.
v1->v2:
- As Ondrej noticed, chunk->head_skb should also be used for
security_sctp_assoc_established() in sctp_sf_do_5_1E_ca().
Fixes: e215dab1c4 ("security: call security_sctp_assoc_request in sctp_sf_do_5_1D_ce")
Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/71becb489e51284edf0c11fc15246f4ed4cef5b6.1649337862.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a driver for an interrupt controller is missing, of_irq_get()
returns -EPROBE_DEFER ad infinitum, causing
fwnode_mdiobus_phy_device_register(), and ultimately, the entire
of_mdiobus_register() call, to fail. In turn, any phy_connect() call
towards a PHY on this MDIO bus will also fail.
This is not what is expected to happen, because the PHY library falls
back to poll mode when of_irq_get() returns a hard error code, and the
MDIO bus, PHY and attached Ethernet controller work fine, albeit
suboptimally, when the PHY library polls for link status. However,
-EPROBE_DEFER has special handling given the assumption that at some
point probe deferral will stop, and the driver for the supplier will
kick in and create the IRQ domain.
Reasons for which the interrupt controller may be missing:
- It is not yet written. This may happen if a more recent DT blob (with
an interrupt-parent for the PHY) is used to boot an old kernel where
the driver didn't exist, and that kernel worked with the
vintage-correct DT blob using poll mode.
- It is compiled out. Behavior is the same as above.
- It is compiled as a module. The kernel will wait for a number of
seconds specified in the "deferred_probe_timeout" boot parameter for
user space to load the required module. The current default is 0,
which times out at the end of initcalls. It is possible that this
might cause regressions unless users adjust this boot parameter.
The proposed solution is to use the driver_deferred_probe_check_state()
helper function provided by the driver core, which gives up after some
-EPROBE_DEFER attempts, taking "deferred_probe_timeout" into consideration.
The return code is changed from -EPROBE_DEFER into -ENODEV or
-ETIMEDOUT, depending on whether the kernel is compiled with support for
modules or not.
Fixes: 66bdede495 ("of_mdio: Fix broken PHY IRQ in case of probe deferral")
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220407165538.4084809-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This change caused a regression with resetting while changing network
namespaces. By clearing the IFF_UP flag, the kernel now thinks it has
fully closed the device.
This reverts commit 0cc318d2e8.
Fixes: 0cc318d2e8 ("iavf: Fix deadlock occurrence during resetting VF interface")
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The CI testing bots triggered the following splat:
[ 718.203054] BUG: KASAN: use-after-free in free_irq_cpu_rmap+0x53/0x80
[ 718.206349] Read of size 4 at addr ffff8881bd127e00 by task sh/20834
[ 718.212852] CPU: 28 PID: 20834 Comm: sh Kdump: loaded Tainted: G S W IOE 5.17.0-rc8_nextqueue-devqueue-02643-g23f3121aca93 #1
[ 718.219695] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 718.223418] Call Trace:
[ 718.227139]
[ 718.230783] dump_stack_lvl+0x33/0x42
[ 718.234431] print_address_description.constprop.9+0x21/0x170
[ 718.238177] ? free_irq_cpu_rmap+0x53/0x80
[ 718.241885] ? free_irq_cpu_rmap+0x53/0x80
[ 718.245539] kasan_report.cold.18+0x7f/0x11b
[ 718.249197] ? free_irq_cpu_rmap+0x53/0x80
[ 718.252852] free_irq_cpu_rmap+0x53/0x80
[ 718.256471] ice_free_cpu_rx_rmap.part.11+0x37/0x50 [ice]
[ 718.260174] ice_remove_arfs+0x5f/0x70 [ice]
[ 718.263810] ice_rebuild_arfs+0x3b/0x70 [ice]
[ 718.267419] ice_rebuild+0x39c/0xb60 [ice]
[ 718.270974] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 718.274472] ? ice_init_phy_user_cfg+0x360/0x360 [ice]
[ 718.278033] ? delay_tsc+0x4a/0xb0
[ 718.281513] ? preempt_count_sub+0x14/0xc0
[ 718.284984] ? delay_tsc+0x8f/0xb0
[ 718.288463] ice_do_reset+0x92/0xf0 [ice]
[ 718.292014] ice_pci_err_resume+0x91/0xf0 [ice]
[ 718.295561] pci_reset_function+0x53/0x80
<...>
[ 718.393035] Allocated by task 690:
[ 718.433497] Freed by task 20834:
[ 718.495688] Last potentially related work creation:
[ 718.568966] The buggy address belongs to the object at ffff8881bd127e00
which belongs to the cache kmalloc-96 of size 96
[ 718.574085] The buggy address is located 0 bytes inside of
96-byte region [ffff8881bd127e00, ffff8881bd127e60)
[ 718.579265] The buggy address belongs to the page:
[ 718.598905] Memory state around the buggy address:
[ 718.601809] ffff8881bd127d00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 718.604796] ffff8881bd127d80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[ 718.607794] >ffff8881bd127e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 718.610811] ^
[ 718.613819] ffff8881bd127e80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
[ 718.617107] ffff8881bd127f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
This is due to that free_irq_cpu_rmap() is always being called
*after* (devm_)free_irq() and thus it tries to work with IRQ descs
already freed. For example, on device reset the driver frees the
rmap right before allocating a new one (the splat above).
Make rmap creation and freeing function symmetrical with
{request,free}_irq() calls i.e. do that on ifup/ifdown instead
of device probe/remove/resume. These operations can be performed
independently from the actual device aRFS configuration.
Also, make sure ice_vsi_free_irq() clears IRQ affinity notifiers
only when aRFS is disabled -- otherwise, CPU rmap sets and clears
its own and they must not be touched manually.
Fixes: 28bf26724f ("ice: Implement aRFS")
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When kmalloc and dst_cache_init failed,
should return ENOMEM rather than ENOBUFS.
Signed-off-by: Hongbin Wang <wh_bin@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bounds checking is unhappy that we try to copy both Ethernet
addresses but pass pointer to the first one. Luckily destination
address is the first field so pass the pointer to the entire header,
whatever.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After feeding a decapsulated packet to a veth device with act_mirred,
skb_headlen() may be 0. But veth_xmit() calls __dev_forward_skb(),
which expects at least ETH_HLEN byte of linear data (as
__dev_forward_skb2() calls eth_type_trans(), which pulls ETH_HLEN bytes
unconditionally).
Use pskb_may_pull() to ensure veth_xmit() respects this constraint.
kernel BUG at include/linux/skbuff.h:2328!
RIP: 0010:eth_type_trans+0xcf/0x140
Call Trace:
<IRQ>
__dev_forward_skb2+0xe3/0x160
veth_xmit+0x6e/0x250 [veth]
dev_hard_start_xmit+0xc7/0x200
__dev_queue_xmit+0x47f/0x520
? skb_ensure_writable+0x85/0xa0
? skb_mpls_pop+0x98/0x1c0
tcf_mirred_act+0x442/0x47e [act_mirred]
tcf_action_exec+0x86/0x140
fl_classify+0x1d8/0x1e0 [cls_flower]
? dma_pte_clear_level+0x129/0x1a0
? dma_pte_clear_level+0x129/0x1a0
? prb_fill_curr_block+0x2f/0xc0
? skb_copy_bits+0x11a/0x220
__tcf_classify+0x58/0x110
tcf_classify_ingress+0x6b/0x140
__netif_receive_skb_core.constprop.0+0x47d/0xfd0
? __iommu_dma_unmap_swiotlb+0x44/0x90
__netif_receive_skb_one_core+0x3d/0xa0
netif_receive_skb+0x116/0x170
be_process_rx+0x22f/0x330 [be2net]
be_poll+0x13c/0x370 [be2net]
__napi_poll+0x2a/0x170
net_rx_action+0x22f/0x2f0
__do_softirq+0xca/0x2a8
__irq_exit_rcu+0xc1/0xe0
common_interrupt+0x83/0xa0
Fixes: e314dbdc1c ("[NET]: Virtual ethernet device driver.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using layer2 or layer2+3 hash, only the 5th byte of the MAC
addresses is used.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A tc flower filter matching TCA_FLOWER_KEY_VLAN_ETH_TYPE is expected to
match the L2 ethertype following the first VLAN header, as confirmed by
linked discussion with the maintainer. However, such rule also matches
packets that have additional second VLAN header, even though filter has
both eth_type and vlan_ethtype set to "ipv4". Looking at the code this
seems to be mostly an artifact of the way flower uses flow dissector.
First, even though looking at the uAPI eth_type and vlan_ethtype appear
like a distinct fields, in flower they are all mapped to the same
key->basic.n_proto. Second, flow dissector skips following VLAN header as
no keys for FLOW_DISSECTOR_KEY_CVLAN are set and eventually assigns the
value of n_proto to last parsed header. With these, such filters ignore any
headers present between first VLAN header and first "non magic"
header (ipv4 in this case) that doesn't result
FLOW_DISSECT_RET_PROTO_AGAIN.
Fix the issue by extending flow dissector VLAN key structure with new
'vlan_eth_type' field that matches first ethertype following previously
parsed VLAN header. Modify flower classifier to set the new
flow_dissector_key_vlan->vlan_eth_type with value obtained from
TCA_FLOWER_KEY_VLAN_ETH_TYPE/TCA_FLOWER_KEY_CVLAN_ETH_TYPE uAPIs.
Link: https://lore.kernel.org/all/Yjhgi48BpTGh6dig@nanopsycho/
Fixes: 9399ae9a6c ("net_sched: flower: Add vlan support")
Fixes: d64efd0926 ("net/sched: flower: Add supprt for matching on QinQ vlan headers")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This refers common bindings, so this is preferred for
unevaluatedProperties instead of additionalProperties.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of "oneOf:" choices, use "allOf:" and "if:" to define clocks,
resets, and their names that can be taken by the compatible string.
The order of clock-names and reset-names doesn't change here.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmJO48cACgkQSfxwEqXe
A667LA//cIZcAx2gi7S0MwpQJFUlVovRHgPYbSWlMaPuTYxzhyLoevG2ubuvfT5/
1QT/uLiJhjKtsbqoIOUKCcihN2RgquOCIBUw1aHdwTTpGA/jfEbutQwr/A8o0u+i
5q8hNlafK6M2d4hAcw89iTNSQ5BSBaBfIfXUGhCJDfk8rISAIWO/Ta0rL6omzQBu
y1RhiwPoLA1hIyWyATy3eaLkAMEHUJllsCpa7n/knx5xb650NJoBAb1zmYtkjqWc
RQMYqJken4EpC4tR9xFVrer8nkfc5H9XfBxmh6YLT7f8LFGHM8TKxMaPHSQyFs6f
bXOG+5WtdPquuIq9aDmLbD2ktj4fS6CWMrz0HDnJ/dLvNAIfPnlY1wbvpyguDfvS
gC7eKvxieQrm/JrQTbB3BglAz+c0fThP8sbe5d63Vu/83TFvmRlIwnAJgaZ6Uj7G
To+pSHHS2l8I0XjXnGhe04ezGXjl+hClodBzNxar92lK00YY/1L7cSFT5pWtQBZP
xddb3E18pu1oef86BVprxHGU17M/Y6KbDN++mPUocUZjQDvNUi3ot4msa5HKJPik
+DQOgJ4niveyCZuLmMJRT+rYHaYhlMOcdYF+8q9esxj0csLok5wfQ0htM4apjNIT
muu9SEQC2v+OQQEZwiqMlnjVWJAZO4C+3m9kaJD57+m6stiz58A=
=cTzo
-----END PGP SIGNATURE-----
Merge tag 'random-5.18-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:
- Another fixup to the fast_init/crng_init split, this time in how much
entropy is being credited, from Jan Varho.
- As discussed, we now opportunistically call try_to_generate_entropy()
in /dev/urandom reads, as a replacement for the reverted commit. I
opted to not do the more invasive wait_for_random_bytes() change at
least for now, preferring to do something smaller and more obvious
for the time being, but maybe that can be revisited as things evolve
later.
- Userspace can use FUSE or userfaultfd or simply move a process to
idle priority in order to make a read from the random device never
complete, which breaks forward secrecy, fixed by overwriting
sensitive bytes early on in the function.
- Jann Horn noticed that /dev/urandom reads were only checking for
pending signals if need_resched() was true, a bug going back to the
genesis commit, now fixed by always checking for signal_pending() and
calling cond_resched(). This explains various noticeable signal
delivery delays I've seen in programs over the years that do long
reads from /dev/urandom.
- In order to be more like other devices (e.g. /dev/zero) and to
mitigate the impact of fixing the above bug, which has been around
forever (users have never really needed to check the return value of
read() for medium-sized reads and so perhaps many didn't), we now
move signal checking to the bottom part of the loop, and do so every
PAGE_SIZE-bytes.
* tag 'random-5.18-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: check for signals every PAGE_SIZE chunk of /dev/[u]random
random: check for signal_pending() outside of need_resched() check
random: do not allow user to keep crng key around on stack
random: opportunistically initialize on /dev/urandom reads
random: do not split fast init input in add_hwgenerator_randomness()
A small set of fixes for 5.18-rc2:
* Fix a compilation warning due to an uninitialized variable in
ata_sff_lost_interrupt(), from me.
* Fix invalid internal command tag handling in the sata_dwc_460ex
driver, from Christian.
* Disable READ LOG DMA EXT with Samsung 840 EVO SSDs as this command
causes the drives to hang, from Christian.
* Change the config option CONFIG_SATA_LPM_POLICY back to its original
name CONFIG_SATA_LPM_MOBILE_POLICY to avoid potential problems with
users losing their configuration (as discussed during the merge
window), from Mario.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYk7YwgAKCRDdoc3SxdoY
dhTNAQDlkD62hT8471dC5NZTpY7CI4b0uDajV5O8KnVKKQ7iNwD/fuMw50kzFK/f
MRMWNFzW8z/gTZAjyE3jiSGLfZvYdAw=
=xH3n
-----END PGP SIGNATURE-----
Merge tag 'ata-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ata fixes from Damien Le Moal:
- Fix a compilation warning due to an uninitialized variable in
ata_sff_lost_interrupt(), from me.
- Fix invalid internal command tag handling in the sata_dwc_460ex
driver, from Christian.
- Disable READ LOG DMA EXT with Samsung 840 EVO SSDs as this command
causes the drives to hang, from Christian.
- Change the config option CONFIG_SATA_LPM_POLICY back to its original
name CONFIG_SATA_LPM_MOBILE_POLICY to avoid potential problems with
users losing their configuration (as discussed during the merge
window), from Mario.
* tag 'ata-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: ahci: Rename CONFIG_SATA_LPM_POLICY configuration item back
ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs
ata: sata_dwc_460ex: Fix crash due to OOB write
ata: libata-sff: Fix compilation warning in ata_sff_lost_interrupt()
When a slip driver is detaching, the slip_close() will act to
cleanup necessary resources and sl->tty is set to NULL in
slip_close(). Meanwhile, the packet we transmit is blocked,
sl_tx_timeout() will be called. Although slip_close() and
sl_tx_timeout() use sl->lock to synchronize, we don`t judge
whether sl->tty equals to NULL in sl_tx_timeout() and the
null pointer dereference bug will happen.
(Thread 1) | (Thread 2)
| slip_close()
| spin_lock_bh(&sl->lock)
| ...
... | sl->tty = NULL //(1)
sl_tx_timeout() | spin_unlock_bh(&sl->lock)
spin_lock(&sl->lock); |
... | ...
tty_chars_in_buffer(sl->tty)|
if (tty->ops->..) //(2) |
... | synchronize_rcu()
We set NULL to sl->tty in position (1) and dereference sl->tty
in position (2).
This patch adds check in sl_tx_timeout(). If sl->tty equals to
NULL, sl_tx_timeout() will goto out.
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20220405132206.55291-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexei Starovoitov says:
====================
pull-request: bpf 2022-04-06
We've added 8 non-merge commits during the last 8 day(s) which contain
a total of 9 files changed, 139 insertions(+), 36 deletions(-).
The main changes are:
1) rethook related fixes, from Jiri and Masami.
2) Fix the case when tracing bpf prog is attached to struct_ops, from Martin.
3) Support dual-stack sockets in bpf_tcp_check_syncookie, from Maxim.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets
bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
bpf: selftests: Test fentry tracing a struct_ops program
bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT
rethook: Fix to use WRITE_ONCE() for rethook:: Handler
selftests/bpf: Fix warning comparing pointer to 0
bpf: Fix sparse warnings in kprobe_multi_resolve_syms
bpftool: Explicit errno handling in skeletons
====================
Link: https://lore.kernel.org/r/20220407031245.73026-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In 1448769c9c ("random: check for signal_pending() outside of
need_resched() check"), Jann pointed out that we previously were only
checking the TIF_NOTIFY_SIGNAL and TIF_SIGPENDING flags if the process
had TIF_NEED_RESCHED set, which meant in practice, super long reads to
/dev/[u]random would delay signal handling by a long time. I tried this
using the below program, and indeed I wasn't able to interrupt a
/dev/urandom read until after several megabytes had been read. The bug
he fixed has always been there, and so code that reads from /dev/urandom
without checking the return value of read() has mostly worked for a long
time, for most sizes, not just for <= 256.
Maybe it makes sense to keep that code working. The reason it was so
small prior, ignoring the fact that it didn't work anyway, was likely
because /dev/random used to block, and that could happen for pretty
large lengths of time while entropy was gathered. But now, it's just a
chacha20 call, which is extremely fast and is just operating on pure
data, without having to wait for some external event. In that sense,
/dev/[u]random is a lot more like /dev/zero.
Taking a page out of /dev/zero's read_zero() function, it always returns
at least one chunk, and then checks for signals after each chunk. Chunk
sizes there are of length PAGE_SIZE. Let's just copy the same thing for
/dev/[u]random, and check for signals and cond_resched() for every
PAGE_SIZE amount of data. This makes the behavior more consistent with
expectations, and should mitigate the impact of Jann's fix for the
age-old signal check bug.
---- test program ----
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <sys/random.h>
static unsigned char x[~0U];
static void handle(int) { }
int main(int argc, char *argv[])
{
pid_t pid = getpid(), child;
signal(SIGUSR1, handle);
if (!(child = fork())) {
for (;;)
kill(pid, SIGUSR1);
}
pause();
printf("interrupted after reading %zd bytes\n", getrandom(x, sizeof(x), 0));
kill(child, SIGTERM);
return 0;
}
Cc: Jann Horn <jannh@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The previous commit fixed support for dual-stack sockets in
bpf_tcp_check_syncookie. This commit adjusts the selftest to verify the
fixed functionality.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Arthur Fabre <afabre@cloudflare.com>
Link: https://lore.kernel.org/bpf/20220406124113.2795730-2-maximmi@nvidia.com
bpf_tcp_gen_syncookie looks at the IP version in the IP header and
validates the address family of the socket. It supports IPv4 packets in
AF_INET6 dual-stack sockets.
On the other hand, bpf_tcp_check_syncookie looks only at the address
family of the socket, ignoring the real IP version in headers, and
validates only the packet size. This implementation has some drawbacks:
1. Packets are not validated properly, allowing a BPF program to trick
bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4
socket.
2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end
up receiving a SYNACK with the cookie, but the following ACK gets
dropped.
This patch fixes these issues by changing the checks in
bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP
version from the header is taken into account, and it is validated
properly with address family.
Fixes: 3990408470 ("bpf: add helper to check for a valid SYN cookie")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Arthur Fabre <afabre@cloudflare.com>
Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com
All remaining skbs should be released when myri10ge_xmit fails to
transmit a packet. Fix it within another skb_list_walk_safe.
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aqc111_rx_fixup() contains several out-of-bounds accesses that can be
triggered by a malicious (or defective) USB device, in particular:
- The metadata array (desc_offset..desc_offset+2*pkt_count) can be out of bounds,
causing OOB reads and (on big-endian systems) OOB endianness flips.
- A packet can overlap the metadata array, causing a later OOB
endianness flip to corrupt data used by a cloned SKB that has already
been handed off into the network stack.
- A packet SKB can be constructed whose tail is far beyond its end,
causing out-of-bounds heap data to be considered part of the SKB's
data.
Found doing variant analysis. Tested it with another driver (ax88179_178a), since
I don't have a aqc111 device to test it, but the code looks very similar.
Signed-off-by: Marcin Kozlowski <marcinguy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qede_build_skb() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.
This results in a kernel panic because the skb to reserve is NULL.
Add a check in case build_skb() failed to allocate and return NULL.
The NULL return is handled correctly in callers to qede_build_skb().
Fixes: 8a8633978b ("qede: Add build_skb() support.")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv6/ip6mr.c:1656:14: warning: unused variable 'do_wrmifwhole'
Move it to the CONFIG_IPV6_PIMSM_V2 scope where its used.
Fixes: 4b340a5a72 ("net: ip6mr: add support for passing full packet on wrong mif")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-04-05
Maciej Fijalkowski says:
We were solving issues around AF_XDP busy poll's not-so-usual scenarios,
such as very big busy poll budgets applied to very small HW rings. This
set carries the things that were found during that work that apply to
net tree.
One thing that was fixed for all in-tree ZC drivers was missing on ice
side all the time - it's about syncing RCU before destroying XDP
resources. Next one fixes the bit that is checked in ice_xsk_wakeup and
third one avoids false setting of DD bits on Tx descriptors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Following the recommendation in Documentation/memory-barriers.txt for
virtual machine guests.
Fixes: 8b6a877c06 ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels")
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20220328154457.100872-1-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>