linux/include/net
Daniel Borkmann d5256083f6 ipvlan, l3mdev: fix broken l3s mode wrt local routes
While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
I ran into the issue that while l3 mode is working fine, l3s mode
does not have any connectivity to kube-apiserver and hence all pods
end up in Error state as well. The ipvlan master device sits on
top of a bond device and hostns traffic to kube-apiserver (also running
in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
where the latter is the address of the bond0. While in l3 mode, a
curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
works fine from hostns, neither of them do in case of l3s. In the
latter only a curl to https://127.0.0.1:37573 appeared to work where
for local addresses of bond0 I saw kernel suddenly starting to emit
ARP requests to query HW address of bond0 which remained unanswered
and neighbor entries in INCOMPLETE state. These ARP requests only
happen while in l3s.

Debugging this further, I found the issue is that l3s mode is piggy-
backing on l3 master device, and in this case local routes are using
l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
f5a0aab84b ("net: ipv4: dst for local input routes should use l3mdev
if relevant") and 5f02ce24c2 ("net: l3mdev: Allow the l3mdev to be
a loopback"). I found that reverting them back into using the
net->loopback_dev fixed ipvlan l3s connectivity and got everything
working for the CNI.

Now judging from 4fbae7d83c ("ipvlan: Introduce l3s mode") and the
l3mdev paper in [0] the only sole reason why ipvlan l3s is relying
on l3 master device is to get the l3mdev_ip_rcv() receive hook for
setting the dst entry of the input route without adding its own
ipvlan specific hacks into the receive path, however, any l3 domain
semantics beyond just that are breaking l3s operation. Note that
ipvlan also has the ability to dynamically switch its internal
operation from l3 to l3s for all ports via ipvlan_set_port_mode()
at runtime. In any case, l3 vs l3s soley distinguishes itself by
'de-confusing' netfilter through switching skb->dev to ipvlan slave
device late in NF_INET_LOCAL_IN before handing the skb to L4.

Minimal fix taken here is to add a IFF_L3MDEV_RX_HANDLER flag which,
if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
without any additional l3mdev semantics on top. This should also have
minimal impact since dev->priv_flags is already hot in cache. With
this set, l3s mode is working fine and I also get things like
masquerading pod traffic on the ipvlan master properly working.

  [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf

Fixes: f5a0aab84b ("net: ipv4: dst for local input routes should use l3mdev if relevant")
Fixes: 5f02ce24c2 ("net: l3mdev: Allow the l3mdev to be a loopback")
Fixes: 4fbae7d83c ("ipvlan: Introduce l3s mode")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Martynas Pumputis <m@lambda.lt>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 22:13:34 -08:00
..
9p 9p: Add refcount to p9_req_t 2018-09-08 01:39:47 +09:00
bluetooth Bluetooth: Errata Service Release 8, Erratum 3253 2018-10-14 10:25:47 +02:00
caif caif: reduce stack size with KASAN 2018-01-19 14:02:12 -05:00
iucv net/af_iucv: locate IUCV header via skb_network_header() 2018-09-26 09:56:07 -07:00
netfilter netfilter: nft_flow_offload: fix interaction with vrf slave device 2019-01-11 00:55:37 +01:00
netns Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2018-12-20 18:20:26 -08:00
nfc NFC: Fix the number of pipes 2018-09-18 19:55:01 -07:00
phonet proc: introduce proc_create_net{,_data} 2018-05-16 07:24:30 +02:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-09 21:43:31 -08:00
tc_act Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux 2018-07-20 21:17:12 -07:00
6lowpan.h
act_api.h net/sched: Remove egdev mechanism 2018-12-10 15:54:34 -08:00
addrconf.h net/ipv6: Add anycast addresses to a global hashtable 2018-11-02 23:54:56 -07:00
af_ieee802154.h ieee802154: add rx LQI from userspace 2018-07-13 12:18:18 -04:00
af_rxrpc.h Revert "rxrpc: Allow failed client calls to be retried" 2019-01-15 21:33:36 -08:00
af_unix.h net: drop a space before tabs 2018-10-31 12:37:12 -07:00
af_vsock.h vsock: split dwork to avoid reinitializations 2018-08-07 12:39:13 -07:00
ah.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
arp.h ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY 2018-01-15 14:53:43 -05:00
atmclip.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ax25.h ax25: fix possible use-after-free 2019-01-23 11:18:00 -08:00
ax88796.h net-next: ax88796: add interrupt status callback to platform data 2018-04-19 16:11:11 -04:00
bond_3ad.h include/net/bond_3ad: Simplify the code by using the ARRAY_SIZE 2018-08-04 13:23:15 -07:00
bond_alb.h
bond_options.h
bonding.h bonding: avoid possible dead-lock 2018-09-26 20:22:19 -07:00
busy_poll.h net: remove sock_poll_busy_flag 2018-07-30 09:10:25 -07:00
calipso.h
cfg80211-wext.h
cfg80211.h cfg80211: clarify LCI/civic location documentation 2018-12-18 13:15:04 +01:00
cfg802154.h
checksum.h Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
cipso_ipv4.h
cls_cgroup.h
codel_impl.h
codel_qdisc.h
codel.h
compat.h net: remove compat_sys_*() prototypes from net/compat.h 2018-04-02 20:16:17 +02:00
datalink.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dcbevent.h
dcbnl.h net: dcb: Add priority-to-DSCP map getters 2018-07-27 13:17:50 -07:00
devlink.h devlink: Add 'fw_load_policy' generic parameter 2018-12-03 13:55:43 -08:00
dn_dev.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dn_fib.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dn_neigh.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dn_nsp.h net/decnet: Convert timers to use timer_setup() 2017-10-18 12:39:36 +01:00
dn_route.h decnet: Move dn_next into decnet route structure. 2017-11-30 09:54:25 -05:00
dn.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-11-04 09:26:51 +09:00
dsa.h net: dsa: ksz: Rename NET_DSA_TAG_KSZ to _KSZ9477 2018-12-16 14:23:33 -08:00
dsfield.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dst_cache.h net: core: dst_cache_set_ip6: Rename 'addr' parameter to 'saddr' for consistency 2018-03-05 12:52:45 -05:00
dst_metadata.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-11-04 09:26:51 +09:00
dst_ops.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dst.h geneve, vxlan: Don't set exceptions if skb->len < mtu 2018-10-17 21:51:13 -07:00
erspan.h erspan: set bso bit based on mirrored packet's len 2018-05-20 18:31:42 -04:00
esp.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ethoc.h inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
failover.h net: Introduce generic failover module 2018-05-28 22:59:54 -04:00
fib_notifier.h net: Add extack to fib_notifier_info 2017-11-01 11:50:43 +09:00
fib_rules.h net: fib_rules: add extack support 2018-04-23 10:21:24 -04:00
firewire.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
flow_dissector.h net/flow_dissector: correct comments on enum flow_dissector_key_id 2018-11-30 13:21:52 -08:00
flow.h net: reorder flowi_common fields to avoid holes 2018-11-30 17:12:39 -08:00
fou.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fq_impl.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-30 21:09:24 +09:00
fq.h fq: support filtering a given tin 2017-10-11 09:49:34 +02:00
garp.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
gen_stats.h net: align gnet_stats_basic_cpu struct 2018-11-17 21:37:29 -08:00
genetlink.h genetlink: constify genl_err_attr() argument 2018-08-29 19:42:52 -07:00
geneve.h net: add netif_is_geneve() 2018-11-07 23:00:23 -08:00
gre.h net: Add netif_is_gretap()/netif_is_ip6gretap() 2018-12-10 15:53:04 -08:00
gro_cells.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
gtp.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
gue.h fou: fix some member types in guehdr 2017-12-11 14:10:06 -05:00
hwbm.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
icmp.h net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
ieee80211_radiotap.h mac80211: support reporting 0-length PSDU in radiotap 2018-09-05 10:08:25 +02:00
ieee802154_netdev.h
if_inet6.h net/ipv6: Add anycast addresses to a global hashtable 2018-11-02 23:54:56 -07:00
ife.h net: sched: ife: handle malformed tlv length 2018-04-22 21:12:00 -04:00
ila.h
inet6_connection_sock.h
inet6_hashtables.h net: allow binding socket in a VRF when there's an unbound socket 2018-11-07 16:12:38 -08:00
inet_common.h net: use indirect call wrappers at GRO transport layer 2018-12-15 13:23:02 -08:00
inet_connection_sock.h inet/connection_sock: prefer _THIS_IP_ to current_text_addr 2018-08-14 10:04:36 -07:00
inet_ecn.h inet: Refactor INET_ECN_decapsulate() 2018-10-17 17:45:07 -07:00
inet_frag.h ip: add helpers to process in-order fragments faster. 2018-08-11 17:54:18 -07:00
inet_hashtables.h net: dccp: fix kernel crash on module load 2018-12-24 15:27:56 -08:00
inet_sock.h net: ensure unbound stream socket to be chosen when not in a VRF 2018-11-07 16:12:38 -08:00
inet_timewait_sock.h net-tcp: remove useless tw_timeout field 2018-06-05 10:45:24 -04:00
inetpeer.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ip6_checksum.h
ip6_fib.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-10-19 11:03:06 -07:00
ip6_route.h net: Add struct for fib dump filter 2018-10-16 00:13:12 -07:00
ip6_tunnel.h udp: Support for error handlers of tunnels with arbitrary destination port 2018-11-08 17:13:08 -08:00
ip_fib.h net: ipv4: Fix memory leak in network namespace dismantle 2019-01-15 13:33:44 -08:00
ip_tunnels.h ip: validate header length on virtual device xmit 2019-01-01 12:05:02 -08:00
ip_vs.h ipvs: add assured state for conn templates 2018-07-18 11:26:40 +02:00
ip.h ip: factor out protocol delivery helper 2018-11-07 16:23:05 -08:00
ipcomp.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ipconfig.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ipv6_frag.h ipv6: remove dependency of nf_defrag_ipv6 on ipv6 module 2018-07-18 11:26:53 +02:00
ipv6.h ipv6: factor out protocol delivery helper 2018-11-07 16:23:05 -08:00
ipx.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
iw_handler.h net: Spelling s/stucture/structure/ 2018-03-27 09:51:23 +02:00
kcm.h
l3mdev.h ipvlan, l3mdev: fix broken l3s mode wrt local routes 2019-01-30 22:13:34 -08:00
lag.h net: Add lag.h, net_lag_port_dev_txable() 2018-07-11 23:10:19 -07:00
lapb.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
lib80211.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
llc_c_ac.h net: LLC: Convert timers to use timer_setup() 2017-10-25 12:06:25 +09:00
llc_c_ev.h
llc_c_st.h
llc_conn.h llc: delete timers synchronously in llc_sk_free() 2018-04-22 14:55:03 -04:00
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h llc: avoid blocking in llc_sap_close() 2018-09-13 09:04:58 -07:00
lwtunnel.h net: Move ipv4 set_lwt_redirect helper to lwtunnel 2018-02-14 14:43:32 -05:00
mac80211.h mac80211: propagate the support for TWT to the driver 2018-12-18 14:18:49 +01:00
mac802154.h
mip6.h
mld.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mpls_iptunnel.h
mpls.h
mrp.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ncsi.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ndisc.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
neighbour.h neighbour: register rtnl doit handler 2018-12-19 13:37:34 -08:00
net_failover.h net: Introduce net_failover driver 2018-05-28 22:59:54 -04:00
net_namespace.h flow_dissector: implements flow dissector BPF hook 2018-09-14 12:04:33 -07:00
net_ratelimit.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
netevent.h net: ipv4: Notify about changes to ip_forward_update_priority 2018-08-01 09:52:30 -07:00
netlabel.h
netlink.h netlink: replace __NLA_ENSURE implementation 2018-10-12 11:00:22 -07:00
netprio_cgroup.h
netrom.h proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
nexthop.h net: fix rtnh_ok() 2018-04-07 22:32:31 -04:00
nl802154.h
nsh.h openvswitch: enable NSH support 2017-11-08 16:12:33 +09:00
p8022.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_pool.h xdp: introduce xdp_return_frame_rx_napi 2018-05-24 18:36:15 -07:00
ping.h ipv{4,6}/ping: simplify proc file creation 2018-05-16 07:23:35 +02:00
pkt_cls.h net_sched: fold tcf_block_cb_call() into tc_setup_cb_call() 2018-12-14 15:32:19 -08:00
pkt_sched.h net: sched: extend Qdisc with rcu 2018-09-25 20:17:35 -07:00
pptp.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
protocol.h net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
psample.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
psnap.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
raw.h net: fix raw socket lookup device bind matching with VRFs 2018-11-07 16:12:39 -08:00
rawv6.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
red.h net_sched: red: Avoid illegal values 2017-12-05 14:37:13 -05:00
regulatory.h cfg80211: make wmm_rule part of the reg_rule structure 2018-08-28 11:11:47 +02:00
request_sock.h tcp: socket option to set TCP fast open key 2017-10-20 13:21:36 +01:00
rose.h proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
route.h net-ipv4: remove 2 always zero parameters from ipv4_redirect() 2018-09-26 20:30:55 -07:00
rsi_91x.h Bluetooth: btrsi: add new rsi bluetooth driver 2018-03-13 18:37:02 +02:00
rtnetlink.h net: Add extack argument to rtnl_create_link 2018-11-06 15:00:45 -08:00
sch_generic.h net: sched: register callbacks for indirect tc block binds 2018-11-11 09:54:52 -08:00
scm.h pids: Compute task_tgid using signal->leader_pid 2018-07-21 10:43:12 -05:00
secure_seq.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
seg6_hmac.h rhashtable: split rhashtable.h 2018-06-22 13:43:27 +09:00
seg6_local.h bpf: add End.DT6 action to bpf_lwt_seg6_action helper 2018-07-31 09:22:48 +02:00
seg6.h net: seg6.h: remove an unused #include 2018-12-20 16:56:04 -08:00
slhc_vj.h slip: Check if rstate is initialized before uncompressing 2018-04-11 10:33:46 -04:00
smc.h net/smc: add pnetid support for SMC-D and ISM 2018-06-30 20:42:25 +09:00
snmp.h
sock_reuseport.h bpf: Enable BPF_PROG_TYPE_SK_REUSEPORT bpf prog in reuseport selection 2018-08-11 01:58:46 +02:00
sock.h sock: Make sock->sk_stamp thread-safe 2019-01-01 09:47:59 -08:00
Space.h net/mac89x0: Convert to platform_driver 2018-03-01 21:21:36 -05:00
stp.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
strparser.h strparser: Add __strp_unpause and use it in ktls. 2018-06-06 14:07:53 -04:00
switchdev.h net: switchdev: Add extack to switchdev_handle_port_obj_add() callback 2018-12-12 16:34:22 -08:00
tcp_states.h tcp: remove the hardcode in the definition of TCPF Macro 2018-02-21 15:06:05 -05:00
tcp.h tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWAT 2018-12-04 21:21:18 -08:00
timewait_sock.h
tipc.h flow_dissector: do not rely on implicit casts 2018-05-08 00:02:41 -04:00
tls.h net: tls: Save iv in tls_rec for async crypto requests 2019-01-28 23:05:55 -08:00
transp_v6.h ipv6: fold sockcm_cookie into ipcm6_cookie 2018-07-07 10:58:49 +09:00
tso.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tun_proto.h
udp_tunnel.h udp_tunnel: add config option to bind to a device 2018-12-03 14:15:26 -08:00
udp.h net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
udplite.h udplite: fix partial checksum initialization 2018-02-16 15:57:42 -05:00
vsock_addr.h
vxlan.h vxlan: Add vxlan_fdb_clear_offload() 2018-12-07 12:59:08 -08:00
wext.h lift handling of SIOCIW... out of dev_ioctl() 2018-01-24 19:13:45 -05:00
wimax.h
x25.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
x25device.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xdp_sock.h ethtool: don't allow disabling queues with umem installed 2018-10-05 09:31:01 +02:00
xdp.h xdp: export xdp_rxq_info_unreg_mem_model 2018-08-29 12:25:53 -07:00
xfrm.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-20 11:53:36 -08:00