linux/drivers/net
Daniel Borkmann d5256083f6 ipvlan, l3mdev: fix broken l3s mode wrt local routes
While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
I ran into the issue that while l3 mode is working fine, l3s mode
does not have any connectivity to kube-apiserver and hence all pods
end up in Error state as well. The ipvlan master device sits on
top of a bond device and hostns traffic to kube-apiserver (also running
in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
where the latter is the address of the bond0. While in l3 mode, a
curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
works fine from hostns, neither of them does in case of l3s. In the
latter, only a curl to https://127.0.0.1:37573 appeared to work, whereas
for local addresses of bond0 I saw the kernel suddenly starting to emit
ARP requests to query the HW address of bond0, which remained unanswered
and left neighbor entries in INCOMPLETE state. These ARP requests only
happen while in l3s.

Debugging this further, I found the issue is that l3s mode is piggy-
backing on l3 master device, and in this case local routes are using
l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
f5a0aab84b ("net: ipv4: dst for local input routes should use l3mdev
if relevant") and 5f02ce24c2 ("net: l3mdev: Allow the l3mdev to be
a loopback"). I found that reverting them back to using
net->loopback_dev fixed ipvlan l3s connectivity and got everything
working for the CNI.

Now judging from 4fbae7d83c ("ipvlan: Introduce l3s mode") and the
l3mdev paper in [0], the sole reason why ipvlan l3s is relying
on l3 master device is to get the l3mdev_ip_rcv() receive hook for
setting the dst entry of the input route without adding its own
ipvlan-specific hacks into the receive path. However, any l3 domain
semantics beyond just that are breaking l3s operation. Note that
ipvlan also has the ability to dynamically switch its internal
operation from l3 to l3s for all ports via ipvlan_set_port_mode()
at runtime. In any case, l3 vs l3s solely distinguishes itself by
'de-confusing' netfilter through switching skb->dev to ipvlan slave
device late in NF_INET_LOCAL_IN before handing the skb to L4.

The minimal fix taken here is to add an IFF_L3MDEV_RX_HANDLER flag which,
if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
without any additional l3mdev semantics on top. This should also have
minimal impact since dev->priv_flags is already hot in cache. With
this set, l3s mode is working fine and I also get things like
masquerading pod traffic on the ipvlan master properly working.
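
The difference between the two flags can be sketched with a small
userspace model. This is not the kernel code: every model_* name and the
bit values are hypothetical, and the two lookup functions only approximate
what the message describes, namely that the receive hook should fire for a
device carrying either flag, while local input routes should fall back to
the loopback device unless real l3 master/slave semantics are present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only; the real flags live in the kernel headers. */
enum model_priv_flags {
	MODEL_IFF_L3MDEV_MASTER     = 1u << 0,
	MODEL_IFF_L3MDEV_SLAVE      = 1u << 1,
	MODEL_IFF_L3MDEV_RX_HANDLER = 1u << 2,
};

/* Hypothetical stand-in for a net device: name, private flags, upper dev. */
struct model_dev {
	const char *name;
	unsigned int priv_flags;
	struct model_dev *upper;	/* enslaving l3 master, if any */
};

static inline bool model_is_l3_master(const struct model_dev *dev)
{
	return dev->priv_flags & MODEL_IFF_L3MDEV_MASTER;
}

static inline bool model_is_l3_slave(const struct model_dev *dev)
{
	return dev->priv_flags & MODEL_IFF_L3MDEV_SLAVE;
}

static inline bool model_has_l3_rx_handler(const struct model_dev *dev)
{
	return dev->priv_flags & MODEL_IFF_L3MDEV_RX_HANDLER;
}

/* Device whose receive hook would run for a packet on dev, or NULL. */
static inline const struct model_dev *
model_rx_hook_dev(const struct model_dev *dev)
{
	if (model_is_l3_slave(dev))
		return dev->upper;
	if (model_is_l3_master(dev) || model_has_l3_rx_handler(dev))
		return dev;
	return NULL;
}

/* Device a local input route would be bound to: the l3 master if there
 * is one, otherwise loopback. The RX_HANDLER flag is ignored here on
 * purpose, which is the point of the fix. */
static inline const struct model_dev *
model_local_route_dev(const struct model_dev *dev,
		      const struct model_dev *loopback)
{
	if (model_is_l3_master(dev))
		return dev;
	if (model_is_l3_slave(dev))
		return dev->upper;
	return loopback;
}

/* Usage: compare a device flagged as full l3 master with one that only
 * asks for the receive hook. */
static inline void model_demo(void)
{
	struct model_dev lo = { "lo", 0, NULL };
	struct model_dev as_master = { "bond0", MODEL_IFF_L3MDEV_MASTER, NULL };
	struct model_dev rx_only = { "bond0", MODEL_IFF_L3MDEV_RX_HANDLER, NULL };

	/* Full master: hook runs, but local routes leave loopback. */
	assert(model_rx_hook_dev(&as_master) == &as_master);
	assert(model_local_route_dev(&as_master, &lo) == &as_master);

	/* Receive hook only: hook still runs, local routes stay on loopback. */
	assert(model_rx_hook_dev(&rx_only) == &rx_only);
	assert(model_local_route_dev(&rx_only, &lo) == &lo);
}
```

In this model the receive hook is reached in both cases, but only the
full master flag pulls local routes away from the loopback device, which
matches the symptom described above for l3s.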

  [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf

Fixes: f5a0aab84b ("net: ipv4: dst for local input routes should use l3mdev if relevant")
Fixes: 5f02ce24c2 ("net: l3mdev: Allow the l3mdev to be a loopback")
Fixes: 4fbae7d83c ("ipvlan: Introduce l3s mode")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Martynas Pumputis <m@lambda.lt>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 22:13:34 -08:00
..
appletalk drivers/net: appletalk/cops: remove redundant if statement and mask 2018-12-24 14:48:26 -08:00
arcnet
bonding bonding: update nest level on unlink 2019-01-10 16:49:39 -05:00
caif net: caif: call dev_consume_skb_any when skb xmit done 2019-01-29 10:09:28 -08:00
can can: flexcan: fix NULL pointer exception during bringup 2019-01-22 11:35:33 +01:00
dsa net: dsa: mv88e6xxx: Fix serdes irq setup going recursive 2019-01-27 23:19:19 -08:00
ethernet ucc_geth: Reset BQL queue when stopping device 2019-01-30 10:36:23 -08:00
fddi cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
fjes fjes: convert to DEFINE_SHOW_ATTRIBUTE 2018-12-10 12:05:20 -08:00
hamradio net/hamradio/6pack: use mod_timer() to rearm timers 2019-01-02 10:27:01 -08:00
hippi
hyperv hv_netvsc: fix typos in code comments 2019-01-23 13:21:34 -05:00
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-20 11:53:36 -08:00
ipvlan ipvlan, l3mdev: fix broken l3s mode wrt local routes 2019-01-30 22:13:34 -08:00
netdevsim drivers: net: netdevsim: use skb_sec_path helper 2018-12-19 11:21:37 -08:00
phy net: phy: Fixup GPLv2+ SPDX tags based on license text 2019-01-22 20:57:03 -08:00
plip
ppp net: Fix usage of pskb_trim_rcsum 2019-01-18 14:05:14 -08:00
slip
team net: dev: Add extack argument to dev_set_mac_address() 2018-12-13 18:41:38 -08:00
usb net: usb: asix: ax88772_bind return error when hw_reset fail 2019-01-24 22:33:11 -08:00
vmxnet3 cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
wan Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-16 05:13:36 +12:00
wimax
wireless virt_wifi: fix error return code in virt_wifi_newlink() 2019-01-19 09:12:16 +01:00
xen-netback net: xenbus: convert to DEFINE_SHOW_ATTRIBUTE 2018-12-10 12:05:20 -08:00
dummy.c
eql.c
geneve.c geneve: Initialize addr6 with memset 2018-11-17 22:03:06 -08:00
gtp.c
ifb.c
Kconfig net: Fix typo in NET_FAILOVER help text 2019-01-18 14:06:29 -08:00
LICENSE.SRC
loopback.c
macsec.c
macvlan.c macvlan: replace kfree_skb by consume_skb for drop profiles 2019-01-17 22:09:09 -08:00
macvtap.c
Makefile
mdio.c
mii.c
net_failover.c net: core: dev: Add extack argument to dev_open() 2018-12-06 13:26:06 -08:00
netconsole.c
nlmon.c
ntb_netdev.c ntb_netdev: Simplify remove with client device drvdata 2018-10-31 21:20:05 -04:00
rionet.c rapidio/rionet: do not free skb before reading its length 2018-11-28 10:38:48 -08:00
sb1000.c
Space.c
sungem_phy.c
tap.c tap: call skb_probe_transport_header after setting skb->dev 2019-01-01 12:01:02 -08:00
thunderbolt.c
tun.c tun: move the call to tun_set_real_num_queues 2019-01-30 21:40:25 -08:00
veth.c net: Add extack argument to rtnl_create_link 2018-11-06 15:00:45 -08:00
virtio_net.c virtio_net: Differentiate sk_buff and xdp_frame on freeing 2019-01-30 14:02:43 -08:00
vrf.c net: core: dev: Add extack argument to dev_change_flags() 2018-12-06 13:26:07 -08:00
vsockmon.c
vxlan.c vxlan: Correct merge error. 2018-12-20 16:14:22 -08:00
xen-netfront.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-20 11:53:36 -08:00