Yuchung Cheng says:
====================
tcp: RACK loss recovery bug fixes
This patch set has four minor bug fixes in TCP RACK loss recovery.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
RACK skips an ACK unless it advances the most recently delivered
TX timestamp (rack.mstamp). Since RACK also uses the most recent
RTT to decide if a packet is lost, RACK should still run the
loss detection whenever the most recent RTT changes. For example,
an ACK that does not advance the timestamp but triggers a cwnd
undo due to reordering would then use the most recent (higher)
RTT measurement to detect further losses.
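A minimal illustration of the revised trigger condition (simplified, with hypothetical names; not the exact tcp_recovery.c change):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified RACK bookkeeping; the fields only loosely
 * mirror tp->rack in the kernel (most recently delivered TX timestamp
 * and the RTT of that delivery).
 */
struct rack_sketch {
        uint64_t mstamp;
        uint32_t rtt_us;
};

/* Before the fix, loss detection ran only when the delivered timestamp
 * advanced; after the fix it also runs whenever the most recent RTT
 * sample changes (e.g. after a reordering-triggered cwnd undo).
 */
static bool rack_should_run_detection(const struct rack_sketch *rack,
                                      uint64_t delivered_mstamp,
                                      uint32_t rtt_us)
{
        return delivered_mstamp > rack->mstamp || rtt_us != rack->rtt_us;
}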
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RACK should mark a packet lost when remaining wait time is zero.
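In other words, the boundary case must count as lost; a trivial, hypothetical sketch of the condition (not the kernel hunk):

#include <stdbool.h>
#include <stdint.h>

/* A packet whose remaining reordering-window wait time has reached zero
 * is already due, so it must be marked lost: the test is <= 0, not < 0.
 */
static bool rack_wait_expired(int64_t remaining_wait_us)
{
        return remaining_wait_us <= 0;
}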
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the sender detects a spurious retransmission, all packets
marked lost are re-marked as in-flight. However, some may still be
considered lost based on their timestamps in RACK. This patch
forces RACK to re-evaluate, which it may previously have skipped
if the ACK did not advance the RACK timestamp.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RACK does not test the loss recovery state correctly to compute
the reordering window. It assumes that if lost_out is zero then TCP is
not in loss recovery. But lost_out can be zero during recovery, before
calling tcp_rack_detect_loss(): when an ACK acknowledges all packets
previously marked lost but tcp_rack_detect_loss() has not yet
discovered new ones. The fix is to simply test the congestion state
directly.
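A hedged sketch of the idea (simplified stand-ins for the kernel's tcp_ca_state values, not the actual hunk):

#include <stdbool.h>

/* Simplified stand-in for the kernel's TCP_CA_* congestion states. */
enum ca_state_sketch { CA_OPEN, CA_DISORDER, CA_CWR, CA_RECOVERY, CA_LOSS };

/* Decide "in loss recovery" from the congestion state itself rather than
 * from lost_out, which can legitimately be zero in the middle of recovery.
 */
static bool in_loss_recovery(enum ca_state_sketch state)
{
        return state == CA_RECOVERY || state == CA_LOSS;
}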
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After applying 2270bc5da3 ("bnxt_en: Fix netpoll handling") and
903649e718 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."),
we still see the following WARN fire:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160
bnxt_poll+0x0/0xd0 exceeded budget in poll
<snip>
Call Trace:
[<ffffffff814be5cd>] dump_stack+0x4d/0x70
[<ffffffff8107e013>] __warn+0xd3/0xf0
[<ffffffff8107e07f>] warn_slowpath_fmt+0x4f/0x60
[<ffffffff8179519a>] netpoll_poll_dev+0x15a/0x160
[<ffffffff81795f38>] netpoll_send_skb_on_dev+0x168/0x250
[<ffffffff817962fc>] netpoll_send_udp+0x2dc/0x440
[<ffffffff815fa9be>] write_ext_msg+0x20e/0x250
[<ffffffff810c8125>] call_console_drivers.constprop.23+0xa5/0x110
[<ffffffff810c9549>] console_unlock+0x339/0x5b0
[<ffffffff810c9a88>] vprintk_emit+0x2c8/0x450
[<ffffffff810c9d5f>] vprintk_default+0x1f/0x30
[<ffffffff81173df5>] printk+0x48/0x50
[<ffffffffa0197713>] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core]
[<ffffffffa0197b9b>] edac_mc_handle_error+0x42b/0x6e0 [edac_core]
[<ffffffffa01c3a60>] sbridge_mce_output_error+0x410/0x10d0 [sb_edac]
[<ffffffffa01c47cc>] sbridge_check_error+0xac/0x130 [sb_edac]
[<ffffffffa0197f3c>] edac_mc_workq_function+0x3c/0x90 [edac_core]
[<ffffffff81095f8b>] process_one_work+0x19b/0x480
[<ffffffff810967ca>] worker_thread+0x6a/0x520
[<ffffffff8109c7c4>] kthread+0xe4/0x100
[<ffffffff81884c52>] ret_from_fork+0x22/0x40
This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting
in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually
given a non-zero budget.
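A hedged sketch of the accounting rule (hypothetical helper, not the bnxt code): treat a retry-worthy error as work done only when a real budget was given, since netpoll polls with budget == 0 and warns if the driver reports progress.

/* On -ENOMEM/-EIO we still want another poll pass, but with a zero budget
 * we must not claim progress, or netpoll's "exceeded budget" warning fires.
 */
static int account_error_as_work(int rx_pkts, int budget)
{
        return budget ? rx_pkts + 1 : rx_pkts;
}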
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell says:
====================
TCP BBR sampling fixes for loss recovery undo
This patch series has a few minor bug fixes for cases where spurious
loss recoveries can trick BBR estimators into estimating that the
available bandwidth is much lower than the true available bandwidth.
In both cases the fix here is to just reset the estimator upon loss
recovery undo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix BBR so that upon notification of a loss recovery undo BBR resets
long-term bandwidth sampling.
Under high reordering, reordering events can be interpreted as loss.
If the reordering and spurious loss estimates are high enough, this
can cause BBR to spuriously estimate that we are seeing loss rates
high enough to trigger long-term bandwidth estimation. To avoid that
problem, this commit resets long-term bandwidth sampling on loss
recovery undo events.
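A minimal sketch of the idea (names are simplified stand-ins for the tcp_bbr.c state, not the actual hunk):

/* Simplified long-term sampling state; the real field lives in struct bbr. */
struct bbr_lt_sketch {
        int lt_use_bw;          /* currently using the long-term (policer) bw? */
};

static void bbr_reset_lt_bw_sampling_sketch(struct bbr_lt_sketch *bbr)
{
        bbr->lt_use_bw = 0;     /* forget long-term samples; start over */
}

/* When TCP signals that the loss recovery was spurious (undo), restart
 * long-term bandwidth sampling so the spurious losses cannot bias it.
 */
static void bbr_on_loss_recovery_undo(struct bbr_lt_sketch *bbr)
{
        bbr_reset_lt_bw_sampling_sketch(bbr);
}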
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix BBR so that upon notification of a loss recovery undo BBR resets
the full pipe detection (STARTUP exit) state machine.
Under high reordering, reordering events can be interpreted as loss.
If the reordering and spurious loss estimates are high enough, this
could previously cause BBR to spuriously estimate that the pipe is
full.
Since spurious loss recovery means that our overall sending will have
slowed down spuriously, this commit gives a flow more time to probe
robustly for bandwidth and decide the pipe is really full.
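A companion sketch of the STARTUP-exit reset this commit describes (again with stand-in names):

/* Simplified "full pipe" probing state kept by BBR. */
struct bbr_full_pipe_sketch {
        unsigned int full_bw;           /* best recent bw while probing for full pipe */
        unsigned int full_bw_cnt;       /* rounds without significant bw growth */
};

/* On a loss recovery undo the slow-down was spurious, so clear the
 * "bandwidth has plateaued" evidence and re-evaluate STARTUP exit with
 * fresh samples.
 */
static void bbr_reset_full_pipe_detection(struct bbr_full_pipe_sketch *bbr)
{
        bbr->full_bw = 0;
        bbr->full_bw_cnt = 0;
}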
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit records the "full bw reached" decision in a new
full_bw_reached bit. This is a pure refactor that does not change the
current behavior, but enables subsequent fixes and improvements.
In particular, this enables simple and clean fixes because the full_bw
and full_bw_cnt can be unconditionally zeroed without worrying about
forgetting that we estimated we filled the pipe in Startup. And it
enables future improvements because multiple code paths can be used
for estimating that we filled the pipe in Startup; any new code paths
only need to set this bit when they think the pipe is full.
Note that this fix intentionally reduces the width of the full_bw_cnt
counter, since we have never used the most significant bit.
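A hedged sketch of the resulting state layout (simplified from the BBR struct; not the exact fields):

/* Latch the decision in its own bit so full_bw and full_bw_cnt can later
 * be zeroed unconditionally without losing the "pipe is full" verdict.
 */
struct bbr_full_bw_sketch {
        unsigned int full_bw_reached:1; /* latched STARTUP-exit decision */
        unsigned int full_bw_cnt:2;     /* narrowed; the top bit was unused */
        unsigned int full_bw;           /* recent best bandwidth estimate */
};

static int bbr_full_bw_reached(const struct bbr_full_bw_sketch *bbr)
{
        return bbr->full_bw_reached;    /* previously: full_bw_cnt >= threshold */
}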
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bytes_compl and pkts_compl pointers passed to efx_dequeue_buffers
cannot be NULL. Add a paranoid warning to check this condition and fix
the one case where they were NULL.
efx_enqueue_unwind() is called very rarely, during error handling.
Without this fix it would fail with a NULL pointer dereference in
efx_dequeue_buffer, with efx_enqueue_skb in the call stack.
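A hedged, userspace-style sketch of the two changes (hypothetical names, not the sfc code): the dequeue helper now checks its counter pointers, and the unwind path passes throwaway locals instead of NULL.

#include <assert.h>

struct tx_buffer_sketch { unsigned int len; };

static void dequeue_buffer(struct tx_buffer_sketch *buf,
                           unsigned int *pkts_compl, unsigned int *bytes_compl)
{
        assert(pkts_compl && bytes_compl);      /* the new paranoid check */
        (*pkts_compl)++;
        *bytes_compl += buf->len;
}

static void enqueue_unwind(struct tx_buffer_sketch *buf)
{
        unsigned int pkts = 0, bytes = 0;       /* previously passed as NULL */

        dequeue_buffer(buf, &pkts, &bytes);
}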
Fixes: e9117e5099 ("sfc: Firmware-Assisted TSO version 2")
Reported-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Tested-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This controller does not support EEE, but it may connect to a PHY
which supports EEE and advertises EEE by default, while its link
partner also advertises EEE. If this happens, the PHY enters low
power mode when the traffic rate is low and causes packet loss.
This patch disables EEE advertisement by default for any PHY that
gianfar connects to, to prevent the above unwanted outcome.
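A hedged sketch of the approach (assuming it sits in the driver's PHY init path; the helper name is hypothetical): clear the EEE advertisement of whatever PHY we attach to.

#include <linux/ethtool.h>
#include <linux/phy.h>
#include <linux/string.h>

static void gfar_clear_eee_adv(struct phy_device *phydev)
{
        struct ethtool_eee edata;

        memset(&edata, 0, sizeof(edata));       /* advertise no EEE modes */
        phy_ethtool_set_eee(phydev, &edata);
}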
Signed-off-by: Shaohui Xie <Shaohui.Xie@nxp.com>
Tested-by: Yangbo Lu <Yangbo.lu@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark tcp_sock during a SACK reneging event and invalidate rate samples
while marked. Such rate samples may overestimate bw by including packets
that were SACKed before reneging.
< ack 6001 win 10000 sack 7001:38001
< ack 7001 win 0 sack 8001:38001 // Reneg detected
> seq 7001:8001 // RTO, SACK cleared.
< ack 38001 win 10000
In the above example the rate sample taken after the last ack will count
7001-38001 as delivered, while the actual delivery was likely much
smaller, i.e. 7001-8001.
This patch adds a new field, tcp_sock.is_sack_reneg, sets it when we
declare SACK reneging and enter TCP_CA_Loss, and clears it after the
last rate sample is taken before moving back to TCP_CA_Open. It also
invalidates rate samples taken while tcp_sock.is_sack_reneg is set.
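A simplified illustration of the invalidation (field names only loosely follow tcp_rate.c): while is_sack_reneg is set, the generated rate sample is flagged invalid so bandwidth estimators such as BBR ignore it.

struct rate_sample_sketch {
        long delivered;         /* delivered count over the interval, -1 = invalid */
};

static void tcp_rate_gen_sketch(int is_sack_reneg,
                                struct rate_sample_sketch *rs)
{
        if (is_sack_reneg)
                rs->delivered = -1;
}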
Fixes: b9f64820fb ("tcp: track data delivery rate for a TCP connection")
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2017-12-06
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fixing broken uapi for BPF tracing programs for s390 and arm64
architectures due to pt_regs being in-kernel only, and not part
of uapi right now. A wrapper is added that exports pt_regs in
an asm-generic way. For arm64 this maps to existing user_pt_regs
structure and for s390 a user_pt_regs structure exporting the
beginning of pt_regs is added and uapi-exported, thus fixing the
BPF issues seen in perf (and BPF selftests), all from Hendrik.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The qmi_wwan minidriver supports a 'raw-ip' mode where frames are
received without any ethernet header. This causes alignment issues
because the skbs allocated by usbnet are "IP aligned".
Fix by allowing minidrivers to disable the additional alignment
offset. This is implemented using a per-device flag, since the same
minidriver also supports 'ethernet' mode.
Fixes: 32f7adf633 ("net: qmi_wwan: support "raw IP" mode")
Reported-and-tested-by: Jay Foster <jay@systech.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I switched rcv_rtt_est to high resolution timestamps, I forgot
that tp->tcp_mstamp needed to be refreshed in tcp_rcv_space_adjust().
Using an old timestamp leads to autotuning lags.
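The fix is presumably a one-line refresh at the top of the function; a hedged sketch of that shape (not the exact hunk):

#include <net/tcp.h>

void tcp_rcv_space_adjust(struct sock *sk)
{
        struct tcp_sock *tp = tcp_sk(sk);

        tcp_mstamp_refresh(tp); /* refresh the cached timestamp before using it */
        /* ... existing receive-buffer autotuning based on tp->tcp_mstamp ... */
}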
Fixes: 645f4c6f2e ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 28033ae4e0 ("net: netlink: Update attr validation to require
exact length for some types") requires attributes using types NLA_U* and
NLA_S* to have an exact length. This change is exposing bugs in various
userspace commands that are sending attributes with an invalid length
(e.g., attribute has type NLA_U8 and userspace sends NLA_U32). While
the commands are clearly broken and need to be fixed, users are arguing
that the sudden change in enforcement is breaking older commands on
newer kernels for use cases that otherwise "worked".
Relax the validation to print a warning message similar to what is done
for messages containing extra bytes after parsing.
Fixes: 28033ae4e0 ("net: netlink: Update attr validation to require exact length for some types")
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 8d79266bc4 ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
introduced a new exit point in ipxip6_rcv(), but rcu_read_unlock() is
missing there. This patch fixes that.
v1->v2:
instead of calling rcu_read_unlock() in place, jump to the "drop"
section (to prevent an skb leak)
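A hedged sketch of the control flow after the fix (heavily simplified from ipxip6_rcv): the early exit no longer returns while still inside rcu_read_lock(); it takes the drop path, which both unlocks and frees the skb.

#include <linux/rcupdate.h>
#include <linux/skbuff.h>

static int ipxip6_rcv_sketch(struct sk_buff *skb, bool collect_md_failed)
{
        rcu_read_lock();
        if (collect_md_failed)
                goto drop;      /* before the fix: "return 0;" leaked the RCU lock */
        rcu_read_unlock();
        return 0;

drop:
        rcu_read_unlock();
        kfree_skb(skb);
        return 0;
}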
Fixes: 8d79266bc4 ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn says:
====================
mv88e6xxx error path fixes
While trying to bring up a new PHY on a board, I exercised the error
paths a bit and discovered some bugs. The unwind for interrupt
handling deadlocks, and the MDIO code hits a BUG() when a registered
MDIO device is freed without first being unregistered.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The MDIO busses need to be unregistered before they are freed,
otherwise BUG() is called. Add a call to the unregister code if the
registration fails, since we can have multiple busses, of which some
may correctly register before one fails. This requires moving the code
around a little.
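A hedged sketch of the unwind pattern (driver-agnostic, hypothetical helper): if registering bus i fails, unregister the busses that already registered successfully before the caller frees them.

#include <linux/phy.h>

static int register_mdio_busses(struct mii_bus **bus, int count)
{
        int i, err;

        for (i = 0; i < count; i++) {
                err = mdiobus_register(bus[i]);
                if (err)
                        goto unwind;
        }
        return 0;

unwind:
        while (--i >= 0)
                mdiobus_unregister(bus[i]);
        return err;
}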
Fixes: a3c53be55c ("net: dsa: mv88e6xxx: Support multiple MDIO busses")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When removing the interrupt handling code, we should mask the
generation of interrupts. The code however unmasked all
interrupts. This can then cause a new interrupt. We then get into a
deadlock where the interrupt thread is waiting to run, and the code
continues, trying to remove the interrupt handler, which means waiting
for the thread to complete. On a UP machine this deadlocks.
Fix this by really masking the interrupts in the hardware. The same
error is made in the error path when installing the interrupt handling
code.
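An illustrative sketch of the required ordering (struct chip_sketch and chip_write_irq_mask() are hypothetical stand-ins for the g1 mask register access):

#include <linux/interrupt.h>

struct chip_sketch { int irq; };

/* Hypothetical stand-in for the g1 interrupt-mask register write. */
static void chip_write_irq_mask(struct chip_sketch *chip, unsigned int mask) { }

static void teardown_g1_irq(struct chip_sketch *chip)
{
        /* Mask interrupt generation in hardware before free_irq(); the old
         * code unmasked everything here, which could raise a new interrupt
         * and deadlock against free_irq() waiting for the IRQ thread (UP).
         */
        chip_write_irq_mask(chip, 0x0000);
        free_irq(chip->irq, chip);
}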
Fixes: 3460a5770c ("net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If clk_set_rate() fails, we should disable clk before return.
Found by Linux Driver Verification project (linuxtesting.org).
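A hedged, driver-agnostic sketch of the error path: if clk_set_rate() fails after the clock was prepared/enabled, disable it before returning.

#include <linux/clk.h>

static int enable_and_set_clk(struct clk *clk, unsigned long rate)
{
        int ret;

        ret = clk_prepare_enable(clk);
        if (ret)
                return ret;

        ret = clk_set_rate(clk, rate);
        if (ret)
                clk_disable_unprepare(clk);     /* was missing on this path */
        return ret;
}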
Signed-off-by: Branislav Radocaj <branislav@radocaj.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add appropriate calls to clk_disable_unprepare() by jumping to out_mdio
in case orion_mdio_probe() returns -EPROBE_DEFER.
Found by Linux Driver Verification project (linuxtesting.org).
Fixes: 3d604da1e9 ("net: mvmdio: get and enable optional clock")
Signed-off-by: Tobias Jordan <Tobias.Jordan@elektrobit.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.
Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Macvlan devices are similar to vlans and do not update their
own trans_start. In order for ARP monitoring to work for a bond device
when the slaves are macvlans, obtain the macvlan's real device.
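A hedged sketch of the idea (hypothetical helper): for trans_start checks in the ARP monitor, look at the macvlan's lower (real) device, which is what actually transmits and updates trans_start.

#include <linux/if_macvlan.h>
#include <linux/netdevice.h>

static struct net_device *slave_tx_dev(struct net_device *slave_dev)
{
        if (netif_is_macvlan(slave_dev))
                return macvlan_dev_real_dev(slave_dev);
        return slave_dev;
}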
Signed-off-by: Chris Dion <christopher.dion@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using a period after a newline causes bad output.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Offload IP header checksum to NIC.
This fixes a previous patch which disabled checksum offloading
for both IPv4 and IPv6 packets, so L3 checksum offload was also
disabled for IPv4 packets, and the hardware drops these packets
for some reason.
Without this patch, IPv4 TSO appears to be broken: I get ~16kbyte/s
without it and close to 2mbyte/s with it when copying files via scp
from a test box to my home workstation. Looking at tcpdump on the
sender, it looks like the hardware drops IPv4 TSO skbs.
This patch restores performance for me; IPv6 looks good too.
Fixes: fa6d7cb5d7 ("net: thunderx: Fix TCP/UDP checksum offload for IPv6 pkts")
Cc: Sunil Goutham <sgoutham@cavium.com>
Cc: Aleksey Makarov <aleksey.makarov@auriga.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This changes the calling conventions (and simplifies the hell out of
the callers). New rules: once a struct socket has been passed
to sock_alloc_file(), it has been consumed either by the new struct file
or by the sock_release() done inside sock_alloc_file(). Either way,
the caller must not call sock_release() after that point.
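A hedged usage sketch of the new convention (hypothetical caller, not code from this series):

#include <linux/err.h>
#include <linux/file.h>
#include <linux/net.h>

/* Once the socket is handed to sock_alloc_file(), it is owned by the
 * returned file, or it has already been released on error; the caller
 * never calls sock_release() itself.
 */
static int attach_socket_to_fd(struct socket *sock, int flags, int fd)
{
        struct file *file = sock_alloc_file(sock, flags, NULL);

        if (IS_ERR(file))
                return PTR_ERR(file);   /* sock already released for us */

        fd_install(fd, file);
        return 0;
}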
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
simplifies failure exits considerably...
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) it's fput() or sock_release(), not both
2) don't do fd_install() until the last failure exit.
3) not a bug per se, but... don't attach socket to struct file
until it's set up.
Move reserving the descriptor and fd_install() into the caller, and
sanitize the failure exits and calling conventions.
Cc: stable@vger.kernel.org # v4.6+
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever the sock object is in the DCCP_CLOSED state,
dccp_disconnect() must free dccps_hc_tx_ccid and
dccps_hc_rx_ccid and set them to NULL.
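The cleanup presumably looks roughly like this inside dccp_disconnect() (hedged fragment; dp is the dccp_sock, sk the socket):

        /* Release both CCID instances and clear the pointers so a reused
         * socket cannot touch or double-free them.
         */
        ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk);
        ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk);
        dp->dccps_hc_rx_ccid = NULL;
        dp->dccps_hc_tx_ccid = NULL;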
Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Potapenko reported a use of uninitialized memory [1].
This happens when inserting a request socket into the TCP ehash,
in __sk_nulls_add_node_rcu(), since sk_reuseport is not initialized.
The bug was added by commit d894ba18d4 ("soreuseport: fix ordering for
mixed v4/v6 sockets").
Note that d296ba60d8 ("soreuseport: Resolve merge conflict for v4/v6
ordering fix") missed the opportunity to get rid of
hlist_nulls_add_tail_rcu():
Both UDP sockets and TCP/DCCP listeners no longer use
__sk_nulls_add_node_rcu() for their hash insertion.
Since all other sockets have a unique 4-tuple, the reuseport status
has no special meaning, so we can always use hlist_nulls_add_head_rcu()
for them and save a few cycles/instructions.
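The resulting helper then reads roughly like this, as it would sit in include/net/sock.h (hedged sketch):

static inline void __sk_nulls_add_node_rcu(struct sock *sk,
                                           struct hlist_nulls_head *list)
{
        /* no remaining caller needs tail insertion, so always head-insert */
        hlist_nulls_add_head_rcu(&sk->sk_nulls_node, list);
}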
[1]
==================================================================
BUG: KMSAN: use of uninitialized memory in inet_ehash_insert+0xd40/0x1050
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3288
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16
dump_stack+0x185/0x1d0 lib/dump_stack.c:52
kmsan_report+0x13f/0x1c0 mm/kmsan/kmsan.c:1016
__msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:766
__sk_nulls_add_node_rcu ./include/net/sock.h:684
inet_ehash_insert+0xd40/0x1050 net/ipv4/inet_hashtables.c:413
reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:754
inet_csk_reqsk_queue_hash_add+0x1cc/0x300 net/ipv4/inet_connection_sock.c:765
tcp_conn_request+0x31e7/0x36f0 net/ipv4/tcp_input.c:6414
tcp_v4_conn_request+0x16d/0x220 net/ipv4/tcp_ipv4.c:1314
tcp_rcv_state_process+0x42a/0x7210 net/ipv4/tcp_input.c:5917
tcp_v4_do_rcv+0xa6a/0xcd0 net/ipv4/tcp_ipv4.c:1483
tcp_v4_rcv+0x3de0/0x4ab0 net/ipv4/tcp_ipv4.c:1763
ip_local_deliver_finish+0x6bb/0xcb0 net/ipv4/ip_input.c:216
NF_HOOK ./include/linux/netfilter.h:248
ip_local_deliver+0x3fa/0x480 net/ipv4/ip_input.c:257
dst_input ./include/net/dst.h:477
ip_rcv_finish+0x6fb/0x1540 net/ipv4/ip_input.c:397
NF_HOOK ./include/linux/netfilter.h:248
ip_rcv+0x10f6/0x15c0 net/ipv4/ip_input.c:488
__netif_receive_skb_core+0x36f6/0x3f60 net/core/dev.c:4298
__netif_receive_skb net/core/dev.c:4336
netif_receive_skb_internal+0x63c/0x19c0 net/core/dev.c:4497
napi_skb_finish net/core/dev.c:4858
napi_gro_receive+0x629/0xa50 net/core/dev.c:4889
e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4018
e1000_clean_rx_irq+0x1492/0x1d30
drivers/net/ethernet/intel/e1000/e1000_main.c:4474
e1000_clean+0x43aa/0x5970 drivers/net/ethernet/intel/e1000/e1000_main.c:3819
napi_poll net/core/dev.c:5500
net_rx_action+0x73c/0x1820 net/core/dev.c:5566
__do_softirq+0x4b4/0x8dd kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364
irq_exit+0x203/0x240 kernel/softirq.c:405
exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:638
do_IRQ+0x15e/0x1a0 arch/x86/kernel/irq.c:263
common_interrupt+0x86/0x86
Fixes: d894ba18d4 ("soreuseport: fix ordering for mixed v4/v6 sockets")
Fixes: d296ba60d8 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexander Potapenko <glider@google.com>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan says:
====================
net: qualcomm: rmnet: Fix leaks in failure scenarios
Patch 1 fixes a leak in the transmit path where an skb cannot be
transmitted due to insufficient headroom to stamp the map header.
Patch 2 fixes a leak on rmnet_newlink() failure, where the
rmnet endpoint was never freed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If the rmnet device creation fails in the newlink either while
registering with the physical device or after subsequent
operations, the rmnet endpoint information is never freed.
Fixes: ceed73a2cf ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an skb in the transmit path does not have sufficient headroom to add
the map header, the skb is not sent out and is never freed.
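A hedged sketch of the egress-path fix (hypothetical helper; "needed" stands for the MAP header length): drop and free the skb when there is not enough headroom, rather than returning without freeing it.

#include <linux/errno.h>
#include <linux/skbuff.h>

static int map_tx_check_headroom(struct sk_buff *skb, unsigned int needed)
{
        if (skb_headroom(skb) < needed) {
                kfree_skb(skb);         /* was leaked before the fix */
                return -ENOMEM;
        }
        return 0;
}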
Fixes: ceed73a2cf ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 3b4477d2dc ("VSOCK: use TCP
state constants for sk_state") VSOCK has used TCP_* constants for
sk_state.
Commit b4562ca792 ("hv_sock: add locking
in the open/close/release code paths") reintroduced the SS_DISCONNECTING
constant.
This patch replaces the old SS_DISCONNECTING with the new TCP_CLOSING
constant.
CC: Dexuan Cui <decui@microsoft.com>
CC: Cathy Avery <cavery@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the function tipc_accept_from_sock() fails to create an instance of
struct tipc_subscriber, it fails to free the already created instance of
struct tipc_conn before it returns.
We fix that with this commit.
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni says:
====================
net: sh_eth: DMA mapping API fixes
Here are two patches that fix how the sh_eth driver is using the DMA
mapping API: a bogus struct device is used in some places, or a NULL
struct device is used.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Using NULL as argument for the DMA mapping API is bogus, as the DMA
mapping API may use information from the "struct device" to perform
the DMA mapping operation. Therefore, pass the appropriate "struct
device".
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two types of "struct device": the one representing the
physical device on its physical bus (platform, SPI, PCI, etc.), and
the one representing the logical device in its device class (net,
etc.).
The DMA mapping API expects to receive as argument a "struct device"
representing the physical device, as the "struct device" contains
information about the bus that the DMA API needs.
However, the sh_eth driver mistakenly uses the "struct device"
representing the logical device (embedded in "struct net_device")
rather than the "struct device" representing the physical device on
its bus.
This commit fixes that by adjusting all calls to the DMA mapping API.
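A hedged sketch of the kind of change involved (hypothetical wrapper, not the sh_eth hunks): map against the physical (platform) device, whose bus information the DMA API needs, not the class device embedded in struct net_device.

#include <linux/dma-mapping.h>
#include <linux/platform_device.h>

static dma_addr_t map_rx_buffer(struct platform_device *pdev,
                                void *buf, size_t len)
{
        return dma_map_single(&pdev->dev, buf, len, DMA_FROM_DEVICE);
        /* previously: dma_map_single(&ndev->dev, buf, len, DMA_FROM_DEVICE) */
}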
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel says:
====================
RED qdisc fixes
Add some input validation checks to RED qdisc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Check that the qmin and qmax values don't overflow for the given Wlog
value.
Check that qmin <= qmax.
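A hedged sketch of the validation idea (assuming the thresholds are shifted left by Wlog when stored, as red_set_parms() does; not the exact checks):

#include <stdbool.h>
#include <stdint.h>

static bool red_params_ok(uint32_t qth_min, uint32_t qth_max, uint8_t Wlog)
{
        if (qth_min > qth_max)
                return false;
        if (Wlog >= 32)
                return false;
        /* qth_max << Wlog must still fit in 32 bits */
        return qth_max <= (UINT32_MAX >> Wlog);
}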
Fixes: a783474591 ("[PKT_SCHED]: Generic RED layer")
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not allow the delta value to be zero, since it is used as a divisor.
Fixes: 8af2a218de ("sch_red: Adaptative RED AQM")
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to LS1021A RM, the value of PAL can be set so that the start of the
IP header in the receive data buffer is aligned to a 32-bit boundary. Normally,
setting PAL = 2 provides minimal padding to ensure such alignment of the IP
header.
However, when hardware time stamping is enabled, every incoming packet's
8-byte time stamp is inserted into the packet data buffer as
padding/alignment bytes.
So we set the padding to 8+2 here to avoid the resulting flood of
alignment faults:
root@128:~# cat /proc/cpu/alignment
User: 0
System: 17539 (inet_gro_receive+0x114/0x2c0)
Skipped: 0
Half: 0
Word: 0
DWord: 0
Multi: 17539
User faults: 2 (fixup)
The following is also shown when alignment exception reporting is enabled:
CPU: 0 PID: 161 Comm: irq/66-eth1_g0_ Not tainted 4.1.21-rt13-WR8.0.0.0_preempt-rt #16
Hardware name: Freescale LS1021A
[<8001b420>] (unwind_backtrace) from [<8001476c>] (show_stack+0x20/0x24)
[<8001476c>] (show_stack) from [<807cfb48>] (dump_stack+0x94/0xac)
[<807cfb48>] (dump_stack) from [<80025d70>] (do_alignment+0x720/0x958)
[<80025d70>] (do_alignment) from [<80009224>] (do_DataAbort+0x40/0xbc)
[<80009224>] (do_DataAbort) from [<80015398>] (__dabt_svc+0x38/0x60)
Exception stack(0x86ad1cc0 to 0x86ad1d08)
1cc0: f9b3e080 86b3d072 2d78d287 00000000 866816c0 86b3d05e 86e785d0 00000000
1ce0: 00000011 0000000e 80840ab0 86ad1d3c 86ad1d08 86ad1d08 806d7fc0 806d806c
1d00: 40070013 ffffffff
[<80015398>] (__dabt_svc) from [<806d806c>] (inet_gro_receive+0x114/0x2c0)
[<806d806c>] (inet_gro_receive) from [<80660eec>] (dev_gro_receive+0x21c/0x3c0)
[<80660eec>] (dev_gro_receive) from [<8066133c>] (napi_gro_receive+0x44/0x17c)
[<8066133c>] (napi_gro_receive) from [<804f0538>] (gfar_clean_rx_ring+0x39c/0x7d4)
[<804f0538>] (gfar_clean_rx_ring) from [<804f0bf4>] (gfar_poll_rx_sq+0x58/0xe0)
[<804f0bf4>] (gfar_poll_rx_sq) from [<80660b10>] (net_rx_action+0x27c/0x43c)
[<80660b10>] (net_rx_action) from [<80033638>] (do_current_softirqs+0x1e0/0x3dc)
[<80033638>] (do_current_softirqs) from [<800338c4>] (__local_bh_enable+0x90/0xa8)
[<800338c4>] (__local_bh_enable) from [<8008025c>] (irq_forced_thread_fn+0x70/0x84)
[<8008025c>] (irq_forced_thread_fn) from [<800805e8>] (irq_thread+0x16c/0x244)
[<800805e8>] (irq_thread) from [<8004e490>] (kthread+0xe8/0x104)
[<8004e490>] (kthread) from [<8000fda8>] (ret_from_fork+0x14/0x2c)
Signed-off-by: Zumeng Chen <zumeng.chen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit d6f295e9def0; some userspace (in the case
we noticed, wpa_supplicant) relies on the current
error code to determine that a fixed-name interface already
exists.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we swapped the tx_packets, tx_bytes and tx_dropped counters
with the rx_packets, rx_bytes and rx_dropped counters, respectively. This
behaviour is correct and expected for VF representors, but the counters
should not be swapped for physical port MAC representors.
Fixes: eadfa4c3be ("nfp: add stats and xmit helpers for representors")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We had to disable BH _before_ calling __inet_twsk_hashdance() in commit
cfac7f836a ("tcp/dccp: block bh before arming time_wait timer").
This means we can revert 614bdd4d6e ("tcp: must block bh in
__inet_twsk_hashdance()").
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hendrik Brueckner says:
====================
Perf tool bpf selftests revealed a broken uapi for s390 and arm64.
With the BPF_PROG_TYPE_PERF_EVENT program type the bpf_perf_event
structure exports the pt_regs structure for all architectures.
This fails for s390 and arm64 because pt_regs are not part of the
user api and kept in-kernel only. To mitigate the broken uapi,
introduce a wrapper that exports pt_regs in an asm-generic way.
For arm64, export the existing user_pt_regs structure. For s390,
introduce a user_pt_regs structure that exports the beginning of
pt_regs.
Note that user_pt_regs must export from the beginning of pt_regs
as BPF_PROG_TYPE_PERF_EVENT program type is not the only type for
running BPF programs.
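A hedged sketch of what the asm-generic wrapper amounts to (simplified; the exact header name and guards may differ): by default the exported register snapshot type is pt_regs itself, while s390 and arm64 override it with their user_pt_regs via an arch-specific header.

/* include/uapi/asm-generic/bpf_perf_event.h (sketch) */
#ifndef _UAPI__ASM_GENERIC_BPF_PERF_EVENT_H__
#define _UAPI__ASM_GENERIC_BPF_PERF_EVENT_H__

#include <linux/ptrace.h>

typedef struct pt_regs bpf_user_pt_regs_t;

#endif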
Some more background:
For the bpf_perf_event, there is a uapi definition that is
passed to the BPF program. For other "probe" points like
trace points, kprobes, and uprobes, there is no uapi and the
BPF program is always passed pt_regs (which is OK as the BPF
program runs in the kernel context). The perf tool can attach
BPF programs to all of these "probe" points and, optionally,
can create a BPF prologue to access particular arguments
(passed as registers). For this, it uses DWARF/CFI
information to obtain the register and calls a perf-arch
backend function, regs_query_register_offset(). This function
returns the index into (user_)pt_regs for a particular
register. Then, perf creates a BPF prologue that accesses
this register based on the passed structure from the "probe"
point.
Also part of this series are updates to the testing and bpf selftests
to deal with asm specifics. To complete the bpf support in perf, the
regs_query_register_offset() function is added for s390 to support
BPF prologue creation.
Changelog v1 -> v2:
- Correct kbuild test bot issues by including
asm-generic/bpf_perf_event.h for architectures that do not have
their own asm version.
- Added a patch to clean up whitespace and coding style issues in the
s390 asm/ptrace.h (#4/6), as suggested by Alexei.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>