linux/net/ipv4
Neal Cardwell 5dfe9d2739 tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
Testing determined that the recent commit 9e046bb111 ("tcp: clear
tp->retrans_stamp in tcp_rcv_fastopen_synack()") has a race, and does
not always ensure retrans_stamp is 0 after a TFO payload retransmit.

If transmit completion for the SYN+data skb happens after the client
TCP stack receives the SYNACK (which sometimes happens), then
retrans_stamp can erroneously remain non-zero for the lifetime of the
connection, causing a premature ETIMEDOUT later.

Testing and tracing showed that the buggy scenario is the following
somewhat tricky sequence:

+ Client attempts a TFO handshake. tcp_send_syn_data() sends SYN + TFO
  cookie + data in a single packet in the syn_data skb. It hands the
  syn_data skb to tcp_transmit_skb(), which makes a clone. Crucially,
  it then reuses the same original (non-clone) syn_data skb,
  transforming it by advancing the seq by one byte and removing the
  FIN bit, and enques the resulting payload-only skb in the
  sk->tcp_rtx_queue.

+ Client sets retrans_stamp to the start time of the three-way
  handshake.

+ Cookie mismatches or server has TFO disabled, and server only ACKs
  SYN.

+ tcp_ack() sees SYN is acked, tcp_clean_rtx_queue() clears
  retrans_stamp.

+ Since the client SYN was acked but not the payload, the TFO failure
  code path in tcp_rcv_fastopen_synack() tries to retransmit the
  payload skb.  However, in some cases the transmit completion for the
  clone of the syn_data (which had SYN + TFO cookie + data) hasn't
  happened.  In those cases, skb_still_in_host_queue() returns true
  for the retransmitted TFO payload, because the clone of the syn_data
  skb has not had its tx completetion.

+ Because skb_still_in_host_queue() finds skb_fclone_busy() is true,
  it sets the TSQ_THROTTLED bit and the retransmit does not happen in
  the tcp_rcv_fastopen_synack() call chain.

+ The tcp_rcv_fastopen_synack() code next implicitly assumes the
  retransmit process is finished, and sets retrans_stamp to 0 to clear
  it, but this is later overwritten (see below).

+ Later, upon tx completion, tcp_tsq_write() calls
  tcp_xmit_retransmit_queue(), which puts the retransmit in flight and
  sets retrans_stamp to a non-zero value.

+ The client receives an ACK for the retransmitted TFO payload data.

+ Since we're in CA_Open and there are no dupacks/SACKs/DSACKs/ECN to
  make tcp_ack_is_dubious() true and make us call
  tcp_fastretrans_alert() and reach a code path that clears
  retrans_stamp, retrans_stamp stays nonzero.

+ Later, if there is a TLP, RTO, RTO sequence, then the connection
  will suffer an early ETIMEDOUT due to the erroneously ancient
  retrans_stamp.

The fix: this commit refactors the code to have
tcp_rcv_fastopen_synack() retransmit by reusing the relevant parts of
tcp_simple_retransmit() that enter CA_Loss (without changing cwnd) and
call tcp_xmit_retransmit_queue(). We have tcp_simple_retransmit() and
tcp_rcv_fastopen_synack() share code in this way because in both cases
we get a packet indicating non-congestion loss (MTU reduction or TFO
failure) and thus in both cases we want to retransmit as many packets
as cwnd allows, without reducing cwnd. And given that retransmits will
set retrans_stamp to a non-zero value (and may do so in a later
calling context due to TSQ), we also want to enter CA_Loss so that we
track when all retransmitted packets are ACked and clear retrans_stamp
when that happens (to ensure later recurring RTOs are using the
correct retrans_stamp and don't declare ETIMEDOUT prematurely).

Fixes: 9e046bb111 ("tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack()")
Fixes: a7abf3cd76 ("tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Link: https://patch.msgid.link/20240624144323.2371403-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25 17:22:49 -07:00
..
netfilter netfilter: tproxy: bail out if IP has been disabled on the device 2024-05-29 00:37:51 +02:00
af_inet.c net: gro: initialize network_offset in network layer 2024-05-27 16:46:59 -07:00
ah4.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00
arp.c arp: Convert ioctl(SIOCGARP) to RCU. 2024-05-01 18:37:07 -07:00
bpf_tcp_ca.c bpf: tcp: Allow to write tp->snd_cwnd_stamp in bpf_tcp_ca 2024-05-02 16:26:56 -07:00
cipso_ipv4.c cipso: make cipso_v4_skbuff_delattr() fully remove the CIPSO options 2024-06-14 08:19:54 +01:00
datagram.c ipv4: Set the routing scope properly in ip_route_output_ports(). 2024-02-12 17:33:05 -08:00
devinet.c rtnetlink: make the "split" NLM_DONE handling generic 2024-06-05 12:34:54 +01:00
esp4_offload.c xfrm: Support GRO for IPv4 ESP in UDP encapsulation 2023-10-06 07:30:40 +02:00
esp4.c ipsec-next-2024-05-03 2024-05-06 19:14:56 -07:00
fib_frontend.c rtnetlink: make the "split" NLM_DONE handling generic 2024-06-05 12:34:54 +01:00
fib_lookup.h
fib_notifier.c
fib_rules.c fib: remove unnecessary input parameters in fib_default_rule_add 2024-01-03 16:42:48 -08:00
fib_semantics.c net: add two more call_rcu_hurry() 2024-04-25 15:24:23 -07:00
fib_trie.c inet: switch inet_dump_fib() to RCU protection 2024-02-26 11:46:13 +00:00
fou_bpf.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
fou_core.c net: gro: rename skb_gro_header_hard() 2024-03-05 13:30:11 +01:00
fou_nl.c net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
fou_nl.h net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
gre_demux.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
gre_offload.c net: gro: rename skb_gro_header_hard() 2024-03-05 13:30:11 +01:00
icmp.c net/ipv4: add tracepoint for icmp_send 2024-05-08 10:39:26 +01:00
igmp.c ipv4: Set scope explicitly in ip_route_output(). 2024-04-08 13:20:51 +01:00
inet_connection_sock.c Fix race for duplicate reqsk on identical SYN 2024-06-25 11:37:45 +02:00
inet_diag.c inet_diag: skip over empty buckets 2024-01-23 15:13:55 +01:00
inet_fragment.c inet: frags: delay fqdir_free_fn() 2024-04-08 10:59:56 +01:00
inet_hashtables.c tcp: get rid of twsk_unique() 2024-05-09 20:25:55 -07:00
inet_timewait_sock.c tcp/dccp: do not care about families in inet_twsk_purge() 2024-04-01 21:27:58 -07:00
inetpeer.c net: ipv4: Simplify the allocation of slab caches in inet_initpeers 2024-01-31 16:39:42 -08:00
ip_forward.c net: fix IPSTATS_MIB_OUTFORWDATAGRAMS increment after fragment check 2023-10-13 09:58:45 -07:00
ip_fragment.c net: ipv{6,4}: Remove the now superfluous sentinel elements from ctl_table array 2024-05-03 13:29:42 +01:00
ip_gre.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00
ip_input.c inet: introduce dst_rtable() helper 2024-04-30 18:32:38 -07:00
ip_options.c
ip_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-02 12:06:25 -07:00
ip_sockglue.c inet: Add getsockopt support for IP_ROUTER_ALERT and IPV6_ROUTER_ALERT 2024-03-06 12:37:06 +00:00
ip_tunnel_core.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ip_tunnel.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00
ip_vti.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ipcomp.c
ipconfig.c net: ipconfig: move ic_nameservers_fallback into #ifdef block 2023-05-22 11:17:55 +01:00
ipip.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ipmr_base.c
ipmr.c ip_tunnel: use a separate struct to store tunnel params in the kernel 2024-04-01 10:49:28 +01:00
Kconfig net/tcp: Add TCP-AO config and structures 2023-10-27 10:35:44 +01:00
Makefile bpfilter: remove bpfilter 2024-01-04 10:23:10 -08:00
metrics.c ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() 2023-01-23 21:37:25 -08:00
netfilter.c xfrm: pass struct net to xfrm_decode_session wrappers 2023-10-06 08:31:53 +02:00
netlink.c
nexthop.c nexthop: fix uninitialized variable in nla_put_nh_group_stats() 2024-03-22 18:03:29 -07:00
ping.c bpf-next-for-netdev 2023-10-16 21:05:33 -07:00
proc.c net: add <net/proto_memory.h> 2024-04-30 18:46:52 -07:00
protocol.c
raw_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
raw.c ipv4: Fix uninit-value access in __ip_make_skb() 2024-05-02 10:16:35 +02:00
route.c net: fix __dst_negative_advice() race 2024-05-29 17:34:49 -07:00
syncookies.c tcp: annotate data-races around tp->window_clamp 2024-04-05 22:32:37 -07:00
sysctl_net_ipv4.c net: ipv{6,4}: Remove the now superfluous sentinel elements from ctl_table array 2024-05-03 13:29:42 +01:00
tcp_ao.c net/tcp_ao: Don't leak ao_info on error-path 2024-06-19 17:30:19 -07:00
tcp_bbr.c tcp: Add new args for cong_control in tcp_congestion_ops 2024-05-02 16:26:56 -07:00
tcp_bic.c
tcp_bpf.c tcp_bpf: properly release resources on error paths 2023-10-18 18:09:31 -07:00
tcp_cdg.c
tcp_cong.c bpf, net: validate struct_ops when updating value. 2024-03-04 10:03:57 -08:00
tcp_cubic.c bpf: Remove CONFIG_X86 and CONFIG_DYNAMIC_FTRACE guard from the tcp-cc kfuncs 2024-03-28 18:31:40 -07:00
tcp_dctcp.c tcp: Fix shift-out-of-bounds in dctcp_update_alpha(). 2024-05-21 13:34:50 +02:00
tcp_dctcp.h
tcp_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
tcp_fastopen.c inet: move inet->defer_connect to inet->inet_flags 2023-08-16 11:09:18 +01:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO 2024-06-25 17:22:49 -07:00
tcp_ipv4.c tcp: reduce accepted window in NEW_SYN_RECV state 2024-05-27 16:47:23 -07:00
tcp_lp.c tcp: rename tcp_time_stamp() to tcp_time_stamp_ts() 2023-10-23 09:35:01 +01:00
tcp_metrics.c tcp_metrics: use parallel_ops for tcp_metrics_nl_family 2024-04-17 18:31:53 -07:00
tcp_minisocks.c tcp: reduce accepted window in NEW_SYN_RECV state 2024-05-27 16:47:23 -07:00
tcp_nv.c
tcp_offload.c net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment 2024-05-13 14:44:06 -07:00
tcp_output.c tcp: remove 64 KByte limit for initial tp->rcv_wnd value 2024-05-23 12:21:17 +02:00
tcp_plb.c prandom: remove prandom_u32_max() 2022-12-20 03:13:45 +01:00
tcp_rate.c
tcp_recovery.c tcp: fix excessive TLP and RACK timeouts from HZ rounding 2023-10-17 17:25:42 -07:00
tcp_scalable.c
tcp_sigpool.c net/tcp_sigpool: Use kref_get_unless_zero() 2024-01-01 14:42:05 +00:00
tcp_timer.c tcp: use signed arithmetic in tcp_rtx_probe0_timed_out() 2024-06-10 19:50:10 -07:00
tcp_ulp.c net/ulp: use consistent error code when blocking ULP 2023-01-19 09:26:16 -08:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tcp.c tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB 2024-06-05 12:32:46 +01:00
tunnel4.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00
udp_bpf.c bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() 2023-03-03 17:25:15 +01:00
udp_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
udp_impl.h sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
udp_offload.c net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment 2024-05-13 14:44:06 -07:00
udp_tunnel_core.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
udp_tunnel_nic.c udp_tunnel: Use flex array to simplify code 2023-10-03 11:39:34 +02:00
udp_tunnel_stub.c
udp.c ipsec-next-2024-05-03 2024-05-06 19:14:56 -07:00
udplite.c udplite: remove UDPLITE_BIT 2023-09-14 16:16:36 +02:00
xfrm4_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-09 10:01:01 -07:00
xfrm4_output.c
xfrm4_policy.c net: ipv{6,4}: Remove the now superfluous sentinel elements from ctl_table array 2024-05-03 13:29:42 +01:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00