linux/net
Daniel Borkmann d346a3fae3 packet: introduce PACKET_QDISC_BYPASS socket option
This patch introduces a PACKET_QDISC_BYPASS socket option, that
allows for using a similar xmit() function as in pktgen instead
of taking the dev_queue_xmit() path. This can be very useful when
PF_PACKET applications are required to be used in a similar
scenario as pktgen, but with full, flexible packet payload that
needs to be provided, for example.

On default, nothing changes in behaviour for normal PF_PACKET
TX users, so everything stays as is for applications. New users,
however, can now set PACKET_QDISC_BYPASS if needed to prevent
own packets from i) reentering packet_rcv() and ii) to directly
push the frame to the driver.

In doing so we can increase pps (here 64 byte packets) for
PF_PACKET a bit:

  # CPUs -- QDISC_BYPASS   -- qdisc path -- qdisc path[**]
  1 CPU  ==  1,509,628 pps --  1,208,708 --  1,247,436
  2 CPUs ==  3,198,659 pps --  2,536,012 --  1,605,779
  3 CPUs ==  4,787,992 pps --  3,788,740 --  1,735,610
  4 CPUs ==  6,173,956 pps --  4,907,799 --  1,909,114
  5 CPUs ==  7,495,676 pps --  5,956,499 --  2,014,422
  6 CPUs ==  9,001,496 pps --  7,145,064 --  2,155,261
  7 CPUs == 10,229,776 pps --  8,190,596 --  2,220,619
  8 CPUs == 11,040,732 pps --  9,188,544 --  2,241,879
  9 CPUs == 12,009,076 pps -- 10,275,936 --  2,068,447
 10 CPUs == 11,380,052 pps -- 11,265,337 --  1,578,689
 11 CPUs == 11,672,676 pps -- 11,845,344 --  1,297,412
 [...]
 20 CPUs == 11,363,192 pps -- 11,014,933 --  1,245,081

 [**]: qdisc path with packet_rcv(), how probably most people
       seem to use it (hopefully not anymore if not needed)

The test was done using a modified trafgen, sending a simple
static 64 bytes packet, on all CPUs.  The trick in the fast
"qdisc path" case, is to avoid reentering packet_rcv() by
setting the RAW socket protocol to zero, like:
socket(PF_PACKET, SOCK_RAW, 0);

Tradeoffs are documented as well in this patch, clearly, if
queues are busy, we will drop more packets, tc disciplines are
ignored, and these packets are not visible to taps anymore. For
a pktgen like scenario, we argue that this is acceptable.

The pointer to the xmit function has been placed in packet
socket structure hole between cached_dev and prot_hook that
is hot anyway as we're working on cached_dev in each send path.

Done in joint work together with Jesper Dangaard Brouer.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:23:33 -05:00
..
9p Nothing really exciting: some groundwork for changing virtio endian, and 2013-11-15 13:28:47 +09:00
802 mrp: add periodictimer to allow retries when packets get lost 2013-09-23 16:53:52 -04:00
8021q Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:30:30 +09:00
appletalk net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
atm net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
ax25 net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
batman-adv batman-adv: generalize batman-adv icmp packet handling 2013-10-23 17:03:47 +02:00
bluetooth net/*: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-12-09 20:20:14 -05:00
caif net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
can Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-11-15 16:47:22 -08:00
ceph net: 8021q/bluetooth/bridge/can/ceph: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
core net: dev: move inline skb_needs_linearize helper to header 2013-12-09 20:23:33 -05:00
dcb net/*: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
dccp ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE 2013-11-05 21:52:27 -05:00
decnet netfilter: pass hook ops to hookfn 2013-10-14 11:29:31 +02:00
dns_resolver net/*: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
dsa net: dsa: inherit addr_assign_type along with dev_addr 2013-09-03 20:57:49 -04:00
ethernet ethernet: use likely() for common Ethernet encap 2013-09-30 21:52:53 -07:00
hsr net/hsr: Support iproute print_opt ('ip -details ...') 2013-11-30 12:48:14 -05:00
ieee802154 genetlink: make multicast groups const, prevent abuse 2013-11-19 16:39:06 -05:00
ipv4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-12-09 20:20:14 -05:00
ipv6 ipv6 addrconf: introduce IFA_F_MANAGETEMPADDR to tell kernel to manage temporary addresses 2013-12-06 16:34:43 -05:00
ipx net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
irda net/irda: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
iucv net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
key net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
l2tp inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions 2013-11-23 14:46:23 -08:00
lapb net/lapb: re-send packets on timeout 2013-09-23 16:52:45 -04:00
llc net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
mac80211 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2013-12-06 09:50:45 -05:00
mac802154 6lowpan: set and use mac_len for mac header length 2013-10-30 17:18:46 -04:00
mpls ipip: add GSO/TSO support 2013-10-19 19:36:19 -04:00
netfilter netfilter: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
netlabel netlabel: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
netlink genetlink/pmcraid: use proper genetlink multicast API 2013-11-28 18:26:30 -05:00
netrom net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
nfc net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-11-19 15:50:47 -08:00
packet packet: introduce PACKET_QDISC_BYPASS socket option 2013-12-09 20:23:33 -05:00
phonet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-11-19 15:50:47 -08:00
rds rds: prevent BUG_ON triggered on congestion update to loopback 2013-12-03 11:54:18 -05:00
rfkill net: rfkill: gpio: add ACPI support 2013-10-28 15:05:25 +01:00
rose net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
rxrpc net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-12-09 20:20:14 -05:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-12-09 20:20:14 -05:00
sunrpc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-11-20 14:25:39 -08:00
tipc net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
unix unix: convert printks to pr_<level> 2013-12-06 16:35:58 -05:00
vmw_vsock net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
wimax wimax: remove dead code 2013-11-21 13:09:42 -05:00
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2013-12-06 09:50:45 -05:00
x25 net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
xfrm net: move pskb_put() to core code 2013-11-07 19:28:58 -05:00
compat.c net: clamp ->msg_namelen instead of returning an error 2013-11-29 16:12:52 -05:00
Kconfig kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS cleanly 2013-11-21 16:42:27 -08:00
Makefile net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0) 2013-11-03 23:20:14 -05:00
nonet.c
socket.c Merge branch 'siocghwtstamp' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next 2013-12-05 19:45:14 -05:00
sysctl_net.c net: Update the sysctl permissions handler to test effective uid/gid 2013-10-07 15:57:56 -04:00