linux/net/core
Lukas Wunner 8537f78647 netfilter: Introduce egress hook
Commit e687ad60af ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key") introduced the ability to
classify packets on ingress.

Allow the same on egress.  Position the hook immediately before a packet
is handed to tc and then sent out on an interface, thereby mirroring the
ingress order.  This order allows marking packets in the netfilter
egress hook and subsequently using the mark in tc.  Another benefit of
this order is consistency with a lot of existing documentation which
says that egress tc is performed after netfilter hooks.

Egress hooks already exist for the most common protocols, such as
NF_INET_LOCAL_OUT or NF_ARP_OUT, and those are to be preferred because
they are executed earlier during packet processing.  However for more
exotic protocols, there is currently no provision to apply netfilter on
egress.  A common workaround is to enslave the interface to a bridge and
use ebtables, or to resort to tc.  But when the ingress hook was
introduced, consensus was that users should be given the choice to use
netfilter or tc, whichever tool suits their needs best:
https://lore.kernel.org/netdev/20150430153317.GA3230@salvia/
This hook is also useful for NAT46/NAT64, tunneling and filtering of
locally generated af_packet traffic such as dhclient.

There have also been occasional user requests for a netfilter egress
hook in the past, e.g.:
https://www.spinics.net/lists/netfilter/msg50038.html

Performance measurements with pktgen surprisingly show a speedup rather
than a slowdown with this commit:

* Without this commit:
  Result: OK: 34240933(c34238375+d2558) usec, 100000000 (60byte,0frags)
  2920481pps 1401Mb/sec (1401830880bps) errors: 0

* With this commit:
  Result: OK: 33997299(c33994193+d3106) usec, 100000000 (60byte,0frags)
  2941410pps 1411Mb/sec (1411876800bps) errors: 0

* Without this commit + tc egress:
  Result: OK: 39022386(c39019547+d2839) usec, 100000000 (60byte,0frags)
  2562631pps 1230Mb/sec (1230062880bps) errors: 0

* With this commit + tc egress:
  Result: OK: 37604447(c37601877+d2570) usec, 100000000 (60byte,0frags)
  2659259pps 1276Mb/sec (1276444320bps) errors: 0

* With this commit + nft egress:
  Result: OK: 41436689(c41434088+d2600) usec, 100000000 (60byte,0frags)
  2413320pps 1158Mb/sec (1158393600bps) errors: 0

Tested on a bare-metal Core i7-3615QM, each measurement was performed
three times to verify that the numbers are stable.

Commands to perform a measurement:
modprobe pktgen
echo "add_device lo@3" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i 'lo@3' -n 100000000

Commands for testing tc egress:
tc qdisc add dev lo clsact
tc filter add dev lo egress protocol ip prio 1 u32 match ip dst 4.3.2.1/32

Commands for testing nft egress:
nft add table netdev t
nft add chain netdev t co \{ type filter hook egress device lo priority 0 \; \}
nft add rule netdev t co ip daddr 4.3.2.1/32 drop

All testing was performed on the loopback interface to avoid distorting
measurements by the packet handling in the low-level Ethernet driver.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-18 01:20:15 +01:00
..
bpf_sk_storage.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-02-29 15:53:35 -08:00
datagram.c net: datagram: drop 'destructor' argument from several helpers 2020-02-28 12:12:53 -08:00
datagram.h net/core: Allow the compiler to verify declaration and definition consistency 2019-03-27 13:49:44 -07:00
dev_addr_lists.c net: remove unnecessary variables and callback 2019-10-24 14:53:49 -07:00
dev_ioctl.c net: Introduce peer to peer one step PTP time stamping. 2019-12-25 19:51:34 -08:00
dev.c netfilter: Introduce egress hook 2020-03-18 01:20:15 +01:00
devlink.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
drop_monitor.c net: core: Replace zero-length array with flexible-array member 2020-02-28 12:08:37 -08:00
dst_cache.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
dst.c net: print proper warning on dst underflow 2019-09-26 09:05:56 +02:00
failover.c failover: allow name change on IFF_UP slave interfaces 2019-04-10 22:12:26 -07:00
fib_notifier.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib_rules.c net: fib_rules: Correctly set table field when table number exceeds 8 bits 2020-02-16 18:38:24 -08:00
filter.c bpf: Add bpf_trampoline_ name prefix for DECLARE_BPF_DISPATCHER 2020-03-13 12:49:51 -07:00
flow_dissector.c bpf: Use bpf_prog_run_pin_on_cpu() at simple call sites. 2020-02-24 16:20:09 -08:00
flow_offload.c flow_offload: Add flow_match_ct to get rule ct match 2020-03-12 15:00:39 -07:00
gen_estimator.c net_sched: gen_estimator: extend packet counter to 64bit 2019-11-06 21:51:36 -08:00
gen_stats.c net_sched: add TCA_STATS_PKT64 attribute 2019-11-05 18:20:55 -08:00
gro_cells.c gro_cells: make sure device is up in gro_cells_receive() 2019-03-10 11:07:14 -07:00
hwbm.c net: hwbm: Make the hwbm_pool lock a mutex 2019-06-09 19:40:10 -07:00
link_watch.c net: link_watch: prevent starvation when processing linkwatch wq 2019-07-01 19:02:47 -07:00
lwt_bpf.c net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup 2019-12-04 12:27:13 -08:00
lwtunnel.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
Makefile ethtool: move to its own directory 2019-12-12 17:07:05 -08:00
neighbour.c net: neigh: remove unused NEIGH_SYSCTL_MS_JIFFIES_ENTRY 2020-02-20 10:02:23 -08:00
net_namespace.c netns: Constify exported functions 2020-01-17 13:25:24 +01:00
net-procfs.c net: procfs: use index hashlist instead of name hashlist 2019-10-01 14:47:19 -07:00
net-sysfs.c net-sysfs: add queue_change_owner() 2020-02-26 20:07:26 -08:00
net-sysfs.h net-sysfs: add netdev_change_owner() 2020-02-26 20:07:25 -08:00
net-traces.c page_pool: add tracepoints for page_pool with details need by XDP 2019-06-19 11:23:13 -04:00
netclassid_cgroup.c cgroup, netclassid: periodically release file_lock on classid updating 2020-03-09 18:13:39 -07:00
netevent.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
netpoll.c net: fix skb use after free in netpoll 2019-08-27 20:52:02 -07:00
netprio_cgroup.c netprio: use css ID instead of cgroup ID 2019-11-12 08:18:03 -08:00
page_pool.c net: page_pool: API cleanup and comments 2020-02-20 10:09:25 -08:00
pktgen.c pktgen: Allow on loopback device 2020-03-10 15:44:59 -07:00
ptp_classifier.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295 2019-06-05 17:36:38 +02:00
request_sock.c tcp: add rcu protection around tp->fastopen_rsk 2019-10-13 10:13:08 -07:00
rtnetlink.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-02-21 13:39:34 -08:00
scm.c y2038: socket: remove timespec reference in timestamping 2019-11-15 14:38:29 +01:00
secure_seq.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
skbuff.c net: core: Distribute switch variables for initialization 2020-02-20 10:00:19 -08:00
skmsg.c bpf: Use bpf_prog_run_pin_on_cpu() at simple call sites. 2020-02-24 16:20:09 -08:00
sock_diag.c sock: make cookie generation global instead of per netns 2019-08-09 13:14:46 -07:00
sock_map.c bpf: sockmap: Add UDP support 2020-03-09 22:34:58 +01:00
sock_reuseport.c net: Generate reuseport group ID on group creation 2020-02-21 22:29:45 +01:00
sock.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
stream.c tcp: make sure EPOLLOUT wont be missed 2019-08-19 13:07:43 -07:00
sysctl_net_core.c net, sysctl: Fix compiler warning when only cBPF is present 2019-12-19 17:17:51 +01:00
timestamping.c net: Introduce a new MII time stamping interface. 2019-12-25 19:51:33 -08:00
tso.c net: Use skb accessors in network core 2019-07-22 20:47:56 -07:00
utils.c net: Fix skb->csum update in inet_proto_csum_replace16(). 2020-01-24 20:54:30 +01:00
xdp.c net: page_pool: API cleanup and comments 2020-02-20 10:09:25 -08:00