linux/net/ipv4/netfilter
Xin Long 202f59afd4 netfilter: ipt_CLUSTERIP: do not hold dev
It's a terrible thing to hold dev in iptables target. When the dev is
being removed, unregister_netdevice has to wait for the dev to become
free. dmesg will keep logging the err:

  kernel:unregister_netdevice: waiting for veth0_in to become free. \
  Usage count = 1

until iptables rules with this target are removed manually.

The worse thing is when deleting a netns, a virtual nic will be deleted
instead of reset to init_net in default_device_ops exit/exit_batch. As
it is earlier than to flush the iptables rules in iptable_filter_net_ops
exit, unregister_netdevice will block to wait for the nic to become free.

As unregister_netdevice is actually waiting for iptables rules flushing
while iptables rules have to be flushed after unregister_netdevice. This
'dead lock' will cause unregister_netdevice to block there forever. As
the netns is not available to operate at that moment, iptables rules can
not even be flushed manually either.

The reproducer can be:

  # ip netns add test
  # ip link add veth0_in type veth peer name veth0_out
  # ip link set veth0_in netns test
  # ip netns exec test ip link set lo up
  # ip netns exec test ip link set veth0_in up
  # ip netns exec test iptables -I INPUT -d 1.2.3.4 -i veth0_in -j \
    CLUSTERIP --new --clustermac 89:d4:47:eb:9a:fa --total-nodes 3 \
    --local-node 1 --hashmode sourceip-sourceport
  # ip netns del test

This issue can be triggered by all virtual nics with ipt_CLUSTERIP.

This patch is to fix it by not holding dev in ipt_CLUSTERIP, but saving
the dev->ifindex instead of the dev.

As Pablo Neira Ayuso's suggestion, it will refresh c->ifindex and dev's
mc by registering a netdevice notifier, just as what xt_TEE does. So it
removes the old codes updating dev's mc, and also no need to initialize
c->ifindex with dev->ifindex.

But as one config can be shared by more than one targets, and the netdev
notifier is per config, not per target. It couldn't get e->ip.iniface
in the notifier handler. So e->ip.iniface has to be saved into config.

Note that for backwards compatibility, this patch doesn't remove the
codes checking if the dev exists before creating a config.

v1->v2:
  - As Pablo Neira Ayuso's suggestion, register a netdevice notifier to
    manage c->ifindex and dev's mc.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19 19:06:20 +02:00
..
arp_tables.c netfilter: Remove unnecessary cast on void pointer 2017-04-07 17:29:17 +02:00
arpt_mangle.c
arptable_filter.c netfilter: arp_tables: register table in initns 2016-04-07 11:58:49 +02:00
ip_tables.c netfilter: Remove unnecessary cast on void pointer 2017-04-07 17:29:17 +02:00
ipt_ah.c netfilter: ipv4: whitespace around operators 2015-10-16 19:19:23 +02:00
ipt_CLUSTERIP.c netfilter: ipt_CLUSTERIP: do not hold dev 2017-06-19 19:06:20 +02:00
ipt_ECN.c net: Change pseudohdr argument of inet_proto_csum_replace* to be a bool 2015-08-17 21:33:06 -07:00
ipt_MASQUERADE.c netfilter: nat: add dependencies on conntrack module 2016-12-04 21:16:51 +01:00
ipt_REJECT.c netfilter: x_tables: move hook state into xt_action_param structure 2016-11-03 10:56:21 +01:00
ipt_rpfilter.c netfilter: rpfilter: fix incorrect loopback packet judgment 2017-01-16 14:23:01 +01:00
ipt_SYNPROXY.c netfilter: SYNPROXY: Return NF_STOLEN instead of NF_DROP during handshaking 2017-04-26 09:30:22 +02:00
iptable_filter.c netfilter: xtables: don't hook tables by default 2016-03-02 20:05:24 +01:00
iptable_mangle.c netfilter: x_tables: simplify ip{6}table_mangle_hook() 2016-07-01 16:37:02 +02:00
iptable_nat.c netfilter: xtables: don't hook tables by default 2016-03-02 20:05:24 +01:00
iptable_raw.c netfilter: xtables: don't hook tables by default 2016-03-02 20:05:24 +01:00
iptable_security.c netfilter: xtables: don't hook tables by default 2016-03-02 20:05:24 +01:00
Kconfig netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c 2016-11-01 20:50:31 +01:00
Makefile netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c 2016-11-01 20:50:31 +01:00
nf_conntrack_l3proto_ipv4.c netfilter: don't track fragmented packets 2017-03-08 18:02:12 +01:00
nf_conntrack_proto_icmp.c netfilter: add and use nf_ct_set helper 2017-02-02 14:31:54 +01:00
nf_defrag_ipv4.c skbuff: add and use skb_nfct helper 2017-02-02 14:31:53 +01:00
nf_dup_ipv4.c netfilter: kill the fake untracked conntrack objects 2017-04-15 11:47:57 +02:00
nf_log_arp.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
nf_log_ipv4.c netfilter: allow logging from non-init namespaces 2017-02-02 14:31:58 +01:00
nf_nat_h323.c netfilter: nf_nat_h323: fix crash in nf_ct_unlink_expect_report() 2014-02-05 17:46:05 +01:00
nf_nat_l3proto_ipv4.c netfilter: don't attach a nat extension by default 2017-04-26 09:30:22 +02:00
nf_nat_masquerade_ipv4.c netfilter: conntrack: rename nf_ct_iterate_cleanup 2017-05-29 12:46:08 +02:00
nf_nat_pptp.c netfilter: pptp: attach nat extension when needed 2017-04-26 09:30:22 +02:00
nf_nat_proto_gre.c netfilter: gre: Use consistent GRE and PTTP header structure instead of the ones defined by netfilter 2016-09-07 10:36:52 +02:00
nf_nat_proto_icmp.c net: Change pseudohdr argument of inet_proto_csum_replace* to be a bool 2015-08-17 21:33:06 -07:00
nf_nat_snmp_basic.c netfilter: snmp: avoid stack size warning 2017-05-01 11:43:58 +02:00
nf_reject_ipv4.c sk_buff: remove support for csum_bad in sk_buff 2017-05-19 19:21:29 -04:00
nf_socket_ipv4.c netfilter: remove nf_ct_is_untracked 2017-04-15 11:51:33 +02:00
nf_tables_arp.c netfilter: Add the missed return value check of nft_register_chain_type 2016-09-12 19:54:45 +02:00
nf_tables_ipv4.c netfilter: Add the missed return value check of nft_register_chain_type 2016-09-12 19:54:45 +02:00
nft_chain_nat_ipv4.c netfilter: Pass priv instead of nf_hook_ops to netfilter hooks 2015-09-18 22:00:16 +02:00
nft_chain_route_ipv4.c netfilter: nft_chain_route: re-route before skb is queued to userspace 2016-09-06 18:02:37 +02:00
nft_dup_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-11-15 10:54:36 -05:00
nft_fib_ipv4.c netfilter: Remove exceptional & on function name 2017-04-07 18:24:47 +02:00
nft_masq_ipv4.c netfilter: nf_tables: fix mismatch in big-endian system 2017-03-13 13:30:28 +01:00
nft_redir_ipv4.c netfilter: nf_tables: fix mismatch in big-endian system 2017-03-13 13:30:28 +01:00
nft_reject_ipv4.c netfilter: nf_tables: use hook state from xt_action_param structure 2016-11-03 11:52:34 +01:00