linux/net
Vlad Yasevich 0d5501c1c8 net: Always untag vlan-tagged traffic on input.
Currently the functionality to untag traffic on input resides
as part of the vlan module and is build only when VLAN support
is enabled in the kernel.  When VLAN is disabled, the function
vlan_untag() turns into a stub and doesn't really untag the
packets.  This seems to create an interesting interaction
between VMs supporting checksum offloading and some network drivers.

There are some drivers that do not allow the user to change
tx-vlan-offload feature of the driver.  These drivers also seem
to assume that any VLAN-tagged traffic they transmit will
have the vlan information in the vlan_tci and not in the vlan
header already in the skb.  When transmitting skbs that already
have tagged data with partial checksum set, the checksum doesn't
appear to be updated correctly by the card thus resulting in a
failure to establish TCP connections.

The following is a packet trace taken on the receiver where a
sender is a VM with a VLAN configued.  The host VM is running on
doest not have VLAN support and the outging interface on the
host is tg3:
10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
(0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
offset 0, flags [DF], proto TCP (6), length 60)
    10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
-> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
4294837885 ecr 0,nop,wscale 7], length 0
10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
(0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
offset 0, flags [DF], proto TCP (6), length 60)
    10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
-> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
4294838888 ecr 0,nop,wscale 7], length 0

This connection finally times out.

I've only access to the TG3 hardware in this configuration thus have
only tested this with TG3 driver.  There are a lot of other drivers
that do not permit user changes to vlan acceleration features, and
I don't know if they all suffere from a similar issue.

The patch attempt to fix this another way.  It moves the vlan header
stipping code out of the vlan module and always builds it into the
kernel network core.  This way, even if vlan is not supported on
a virtualizatoin host, the virtual machines running on top of such
host will still work with VLANs enabled.

CC: Patrick McHardy <kaber@trash.net>
CC: Nithin Nayak Sujir <nsujir@broadcom.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-11 12:16:51 -07:00
..
6lowpan 6lowpan: Allow 6LoWPAN to be modular 2014-08-07 11:44:18 -07:00
9p 9P: remove unnecessary break after return 2014-07-15 16:27:00 -07:00
802 net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
8021q net: Always untag vlan-tagged traffic on input. 2014-08-11 12:16:51 -07:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-07-16 14:09:34 -07:00
atm net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
ax25 net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
batman-adv batman: fix duplicate #include of multicast.h 2014-08-07 16:02:57 -07:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-08-06 09:38:14 -07:00
bridge net: Always untag vlan-tagged traffic on input. 2014-08-11 12:16:51 -07:00
caif caif: remove unnecessary break after goto 2014-07-15 16:27:01 -07:00
can can: add hash based access to single EFF frame filters 2014-05-19 09:38:24 +02:00
ceph KEYS: Ceph: Use user_match() 2014-07-22 21:46:30 +01:00
core net: Always untag vlan-tagged traffic on input. 2014-08-11 12:16:51 -07:00
dcb dcbnl : Fix misleading dcb_app->priority explanation 2014-07-30 17:21:05 -07:00
dccp inet: move ipv6only in sock_common 2014-07-01 23:46:21 -07:00
decnet net: Split sk_no_check into sk_no_check_{rx,tx} 2014-05-23 16:28:53 -04:00
dns_resolver Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2014-08-06 08:06:39 -07:00
dsa net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
ethernet net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
hsr net/hsr: Remove left-over never-true conditional code. 2014-07-11 15:04:40 -07:00
ieee802154 inet: frags: use kmem_cache for inet_frag_queue 2014-08-02 15:31:31 -07:00
ipv4 ipv4: removed redundant conditional 2014-08-08 10:22:22 -07:00
ipv6 Merge branch 'akpm' (patchbomb from Andrew Morton) 2014-08-06 21:14:42 -07:00
ipx net: Split sk_no_check into sk_no_check_{rx,tx} 2014-05-23 16:28:53 -04:00
irda Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-08-06 09:38:14 -07:00
iucv af_iucv: avoid path quiesce of severed path in shutdown() 2014-07-21 20:21:40 -07:00
key af_key: remove unnecessary break after return 2014-07-15 16:27:00 -07:00
l2tp net: use inet6_iif instead of IP6CB()->iif 2014-07-31 22:37:06 -07:00
lapb
llc
mac80211 Merge tag 'master-2014-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2014-07-28 17:36:25 -07:00
mac802154 net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
mpls gre: Call gso_make_checksum 2014-06-04 22:46:38 -07:00
netfilter netfilter: nf_tables: fix error return code 2014-08-08 16:47:29 +02:00
netlabel Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-08-06 09:38:14 -07:00
netlink netlink: reset network header before passing to taps 2014-08-07 16:02:58 -07:00
netrom net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
nfc Merge tag 'master-2014-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2014-08-05 13:18:20 -07:00
openvswitch openvswitch: fix duplicate #include headers 2014-08-07 16:02:57 -07:00
packet packet: remove deprecated syststamp timestamp 2014-07-29 11:39:50 -07:00
phonet net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
rfkill Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-05-22 13:58:36 -04:00
rose net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-08-06 09:38:14 -07:00
sched net: filter: split 'struct sk_filter' into socket and bpf parts 2014-08-02 15:03:58 -07:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-08-05 18:46:26 -07:00
sunrpc sched: Allow wait_on_bit_action() functions to support a timeout 2014-07-16 15:10:41 +02:00
tipc tipc: remove duplicated include from socket.c 2014-07-29 15:51:14 -07:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
vmw_vsock vsock: Make transport the proto owner 2014-05-05 13:13:50 -04:00
wimax
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2014-07-25 10:22:36 -04:00
x25 net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
xfrm list: fix order of arguments for hlist_add_after(_rcu) 2014-08-06 18:01:24 -07:00
compat.c net: sendmsg: fix NULL pointer dereference 2014-07-29 12:20:22 -07:00
Kconfig 6lowpan: introduce new net/6lowpan directory 2014-07-12 01:53:30 +02:00
Makefile 6lowpan: introduce new net/6lowpan directory 2014-07-12 01:53:30 +02:00
nonet.c
socket.c net-timestamp: sock_tx_timestamp() fix 2014-08-06 12:38:07 -07:00
sysctl_net.c