linux/net/ipv4
Shmulik Ladkani b8247f095e net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs
Given:
 - tap0 and vxlan0 are bridged
 - vxlan0 stacked on eth0, eth0 having small mtu (e.g. 1400)

Assume GSO skbs arriving from tap0 having a gso_size as determined by
user-provided virtio_net_hdr (e.g. 1460 corresponding to VM mtu of 1500).

After encapsulation these skbs have skb_gso_network_seglen that exceed
eth0's ip_skb_dst_mtu.

These skbs are accidentally passed to ip_finish_output2 AS IS.
Alas, each final segment (segmented either by validate_xmit_skb or by
hardware UFO) would be larger than eth0 mtu.
As a result, those above-mtu segments get dropped on certain networks.

This behavior is not aligned with the NON-GSO case:
Assume a non-gso 1500-sized IP packet arrives from tap0. After
encapsulation, the vxlan datagram is fragmented normally at the
ip_finish_output-->ip_fragment code path.

The expected behavior for the GSO case would be segmenting the
"gso-oversized" skb first, then fragmenting each segment according to
dst mtu, and finally passing the resulting fragments to ip_finish_output2.

'ip_finish_output_gso' already supports this "Slowpath" behavior,
according to the IPSKB_FRAG_SEGS flag, which is only set during ipv4
forwarding (not set in the bridged case).

In order to support the bridged case, we'll mark skbs arriving from an
ingress interface that get udp-encaspulated as "allowed to be fragmented",
causing their network_seglen to be validated by 'ip_finish_output_gso'
(and fragment if needed).

Note the TUNNEL_DONT_FRAGMENT tun_flag is still honoured (both in the
gso and non-gso cases), which serves users wishing to forbid
fragmentation at the udp tunnel endpoint.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 16:40:22 -07:00
..
netfilter netfilter: Convert FWINV<[foo]> macros and uses to NF_INVF 2016-07-03 10:55:07 +02:00
af_inet.c ipv4: af_inet: make it explicitly non-modular 2016-07-11 22:44:26 -07:00
ah4.c ah4: Fix error return in ah_input(). 2015-08-25 13:38:50 -07:00
arp.c net: rename NET_{ADD|INC}_STATS_BH() 2016-04-27 22:48:24 -04:00
cipso_ipv4.c net: introduce lockdep_is_held and update various places to use it 2016-04-07 16:44:14 -04:00
datagram.c net: Set sk_txhash from a random number 2015-07-29 22:44:04 -07:00
devinet.c ipv4: do not abuse GFP_ATOMIC in inet_netconf_notify_devconf() 2016-07-09 18:12:25 -04:00
esp4.c esp: Fix ESN generation under UDP encapsulation 2016-06-23 11:52:00 -04:00
fib_frontend.c net: vrf: Create FIB tables on link create 2016-05-06 15:51:47 -04:00
fib_lookup.h ipv4: consider TOS in fib_select_default 2015-07-24 22:46:11 -07:00
fib_rules.c net: Add l3mdev rule 2016-06-08 11:36:02 -07:00
fib_semantics.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-15 13:32:48 -04:00
fib_trie.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-02-01 15:56:08 -08:00
fou.c gue: Implement direction IP encapsulation 2016-06-07 23:51:14 -07:00
gre_demux.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-30 05:03:36 -04:00
gre_offload.c gso: Remove arbitrary checks for unsupported GSO 2016-05-20 18:03:15 -04:00
icmp.c net: icmp: rename ICMPMSGIN_INC_STATS_BH() 2016-04-27 22:48:23 -04:00
igmp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-03-08 12:34:12 -05:00
inet_connection_sock.c tcp: two more missing bh disable 2016-05-04 23:47:54 -04:00
inet_diag.c net: diag: Add support to filter on device index 2016-06-28 05:25:04 -04:00
inet_fragment.c net: disable fragment reassembly if high_thresh is zero 2016-06-05 22:56:42 -04:00
inet_hashtables.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-04 00:52:29 -04:00
inet_timewait_sock.c tcp: must block bh in __inet_twsk_hashdance() 2016-05-04 16:55:11 -04:00
inetpeer.c net: Add helper function to compare inetpeer addresses 2015-08-28 13:32:36 -07:00
ip_forward.c net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags 2016-07-19 16:40:22 -07:00
ip_fragment.c net: rename IP_INC_STATS_BH() 2016-04-27 22:48:23 -04:00
ip_gre.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-30 05:03:36 -04:00
ip_input.c net: original ingress device index in PKTINFO 2016-05-11 19:31:40 -04:00
ip_options.c net: ipv4: Convert IP network timestamps to be y2038 safe 2016-03-01 17:18:44 -05:00
ip_output.c net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags 2016-07-19 16:40:22 -07:00
ip_sockglue.c sock: propagate __sock_cmsg_send() error 2016-05-16 13:46:23 -04:00
ip_tunnel_core.c net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs 2016-07-19 16:40:22 -07:00
ip_tunnel.c net: ipv4: Add ability to have GRE ignore DF bit in IPv4 payloads 2016-06-15 21:39:59 -07:00
ip_vti.c vti: Add pmtu handling to vti_xmit. 2016-03-31 08:59:56 +02:00
ipcomp.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
ipconfig.c ipconfig: Protect ic_addrservaddr with IPCONFIG_DYNAMIC. 2016-06-11 20:40:24 -07:00
ipip.c ipip: support MPLS over IPv4 2016-07-09 17:45:56 -04:00
ipmr.c net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags 2016-07-19 16:40:22 -07:00
Kconfig tcp: add NV congestion control 2016-06-10 23:07:49 -07:00
Makefile tcp: add NV congestion control 2016-06-10 23:07:49 -07:00
netfilter.c ipv4: Pass struct net into ip_route_me_harder 2015-09-29 20:21:32 +02:00
ping.c sock: enable timestamping using control messages 2016-04-04 15:50:30 -04:00
proc.c ipv4: Namespaceify ip_default_ttl sysctl knob 2016-02-16 20:42:54 -05:00
protocol.c net: Export inet_offloads and inet6_offloads 2014-09-19 17:15:31 -04:00
raw.c sock: enable timestamping using control messages 2016-04-04 15:50:30 -04:00
route.c net: l3mdev: Allow send on enslaved interface 2016-05-09 22:33:52 -04:00
syncookies.c net: rename NET_{ADD|INC}_STATS_BH() 2016-04-27 22:48:24 -04:00
sysctl_net_ipv4.c ipv4: Fix non-initialized TTL when CONFIG_SYSCTL=n 2016-05-23 14:32:06 -07:00
tcp_bic.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_cdg.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_cong.c tcp: remove tcp_ecn_make_synack() socket argument 2015-09-25 13:00:38 -07:00
tcp_cubic.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_dctcp.c tcp: return sizeof tcp_dctcp_info in dctcp_get_info() 2016-06-14 23:46:30 -07:00
tcp_diag.c net: diag: Support destroying TCP sockets. 2015-12-15 23:26:52 -05:00
tcp_fastopen.c tcp: do not assume TCP code is non preemptible 2016-05-02 17:02:25 -04:00
tcp_highspeed.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_htcp.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_hybla.c tcp: do not slow start when cwnd equals ssthresh 2015-07-09 14:22:52 -07:00
tcp_illinois.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_input.c tcp: add in_flight to tcp_skb_cb 2016-06-10 23:07:49 -07:00
tcp_ipv4.c tcp: md5: use kmalloc() backed scratch areas 2016-07-01 04:02:55 -04:00
tcp_lp.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_metrics.c libnl: nla_put_msecs(): align on a 64-bit area 2016-04-23 20:13:24 -04:00
tcp_minisocks.c tcp: do not assume TCP code is non preemptible 2016-05-02 17:02:25 -04:00
tcp_nv.c tcp: add NV congestion control 2016-06-10 23:07:49 -07:00
tcp_offload.c gso: Remove arbitrary checks for unsupported GSO 2016-05-20 18:03:15 -04:00
tcp_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-30 05:03:36 -04:00
tcp_probe.c net: ipv4: tcp_probe: Replace timespec with timespec64 2016-03-01 17:18:44 -05:00
tcp_recovery.c tcp: do not assume TCP code is non preemptible 2016-05-02 17:02:25 -04:00
tcp_scalable.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_timer.c tcp_timer.c: Add kernel-doc function descriptions 2016-07-15 23:18:14 -07:00
tcp_vegas.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_vegas.h tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_veno.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_westwood.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_yeah.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp.c tcp: md5: use kmalloc() backed scratch areas 2016-07-01 04:02:55 -04:00
tunnel4.c tunnels: correct conditional build of MPLS and IPv6 2016-07-11 13:27:06 -07:00
udp_diag.c udp: no longer use SLAB_DESTROY_BY_RCU 2016-04-04 22:11:19 -04:00
udp_impl.h net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
udp_offload.c gso: Remove arbitrary checks for unsupported GSO 2016-05-20 18:03:15 -04:00
udp_tunnel.c net: Remove deprecated tunnel specific UDP offload functions 2016-06-17 20:23:32 -07:00
udp.c udp reuseport: fix packet of same flow hashed to different socket 2016-06-14 17:23:09 -04:00
udplite.c net: Eliminate no_check from protosw 2014-05-23 16:28:53 -04:00
xfrm4_input.c netfilter: Pass net into okfn 2015-09-17 17:18:37 -07:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c ipv4: hash net ptr into fragmentation bucket selection 2015-03-25 14:07:04 -04:00
xfrm4_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-24 06:54:12 -07:00
xfrm4_policy.c net: xfrm: fix old-style declaration 2016-06-16 22:06:30 -07:00
xfrm4_protocol.c xfrm4: Remove duplicate semicolon 2014-06-30 07:49:47 +02:00
xfrm4_state.c
xfrm4_tunnel.c