linux/net/core
Eric Dumazet 2bd82484bb xps: fix xps for stacked devices
A typical qdisc setup is the following :

bond0 : bonding device, using HTB hierarchy
eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc

XPS allows to spread packets on specific tx queues, based on the cpu
doing the send.

Problem is that dequeues from bond0 qdisc can happen on random cpus,
due to the fact that qdisc_run() can dequeue a batch of packets.

CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1
CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0

CPUB -> dequeue packet P1 from bond0
        enqueue packet on eth1/eth2
CPUC -> dequeue packet P2 from bond0
        enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0)

get_xps_queue() then might select wrong queue for P1, since current cpu
might be different than CPUA.

P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping),
if CPUC runs a bit faster (or CPUB spins a bit on qdisc lock)

Effect of this bug is TCP reorders, and more generally not optimal
TX queue placement. (A victim bulk flow can be migrated to the wrong TX
queue for a while)

To fix this, we have to record sender cpu number the first time
dev_queue_xmit() is called for one tx skb.

We can union napi_id (used on receive path) and sender_cpu,
granted we clear sender_cpu in skb_scrub_packet() (credit to Willem for
this union idea)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 13:02:54 -08:00
..
datagram.c skb_copy_datagram_iovec() can die 2014-12-09 16:29:11 -05:00
dev_addr_lists.c net: fix spelling for synchronized 2014-11-18 15:26:32 -05:00
dev_ioctl.c dev_ioctl: use sizeof(x) instead of sizeof x 2014-11-18 15:27:32 -05:00
dev.c dev: add per net_device packet type chains 2015-01-29 14:41:39 -08:00
drop_monitor.c net: Replace get_cpu_var through this_cpu_ptr 2014-08-26 13:45:47 -04:00
dst.c dst: no need to take reference on DST_NOCACHE dsts 2014-12-09 16:08:17 -05:00
ethtool.c ethtool: Extend ethtool plugin module eeprom API to phylib 2015-01-06 17:16:55 -05:00
fib_rules.c netlink: make nlmsg_end() and genlmsg_end() void 2015-01-18 01:03:45 -05:00
filter.c net: sock: fix access via invalid file descriptor 2014-12-10 23:34:27 -05:00
flow_dissector.c xps: fix xps for stacked devices 2015-02-04 13:02:54 -08:00
flow.c CPU hotplug notifiers registration fixes for 3.15-rc1 2014-04-07 14:55:46 -07:00
gen_estimator.c net: sched: make bstats per cpu and estimator RCU safe 2014-09-30 01:02:26 -04:00
gen_stats.c net_sched: fix unused variables in __gnet_stats_copy_basic_cpu() 2014-10-07 00:10:49 -04:00
iovec.c fold verify_iovec() into copy_msghdr_from_user() 2014-11-19 16:23:49 -05:00
link_watch.c net/core: include linux/types.h instead of asm/types.h 2014-11-18 15:26:32 -05:00
Makefile dmaengine-3.17 2014-10-07 20:39:25 -04:00
neighbour.c netlink: Fix bugs in nlmsg_end() conversions. 2015-01-18 23:36:08 -05:00
net_namespace.c vxlan: advertise netns of vxlan dev in fdb msg 2015-01-23 17:51:15 -08:00
net-procfs.c rps: selective flow shedding during softnet overflow 2013-05-20 13:48:04 -07:00
net-sysfs.c net-sysfs: expose physical switch id for particular device 2014-12-02 20:01:22 -08:00
net-sysfs.h net: netdev_kobject_init: annotate with __init 2014-01-05 20:27:54 -05:00
net-traces.c
netclassid_cgroup.c cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes 2014-07-15 11:05:09 -04:00
netevent.c
netpoll.c net: rename vlan_tx_* helpers since "tx" is misleading there 2015-01-13 17:51:08 -05:00
netprio_cgroup.c cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes 2014-07-15 11:05:09 -04:00
pktgen.c net: pktgen: Deletion of an unnecessary check before the function call "proc_remove" 2014-11-19 15:20:15 -05:00
ptp_classifier.c net: filter: split 'struct sk_filter' into socket and bpf parts 2014-08-02 15:03:58 -07:00
request_sock.c inet: reduce TLB pressure for listeners 2014-06-25 16:37:24 -07:00
rtnetlink.c bridge: add flags argument to ndo_bridge_setlink and ndo_bridge_dellink 2015-02-01 23:16:33 -08:00
scm.c net: introduce helper macro for_each_cmsghdr 2014-12-10 22:41:55 -05:00
secure_seq.c net: use ktime_get_ns() and ktime_get_real_ns() helpers 2014-08-22 19:57:23 -07:00
skbuff.c xps: fix xps for stacked devices 2015-02-04 13:02:54 -08:00
sock_diag.c net: filter: split 'struct sk_filter' into socket and bpf parts 2014-08-02 15:03:58 -07:00
sock.c net-timestamp: no-payload only sysctl 2015-02-02 18:46:51 -08:00
stream.c net: replace macros net_random and net_srandom with direct calls to prandom 2014-01-14 15:15:25 -08:00
sysctl_net_core.c net-timestamp: no-payload only sysctl 2015-02-02 18:46:51 -08:00
timestamping.c net-timestamp: Make the clone operation stand-alone from phy timestamping 2014-09-05 17:43:45 -07:00
tso.c net: tso: fix unaligned access to crafted TCP header in helper API 2014-10-22 12:52:55 -04:00
utils.c net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited 2014-11-11 14:10:31 -05:00