linux/net
Daniel Borkmann c78e1746d3 net: sched: fix call_rcu() race on classifier module unloads
Vijay reported that a loop as simple as ...

  while true; do
    tc qdisc add dev foo root handle 1: prio
    tc filter add dev foo parent 1: u32 match u32 0 0  flowid 1
    tc qdisc del dev foo root
    rmmod cls_u32
  done

... will panic the kernel. Moreover, he bisected the change
apparently introducing it to 78fd1d0ab0 ("netlink: Re-add
locking to netlink_lookup() and seq walker").

The removal of synchronize_net() from the netlink socket
triggering the qdisc to be removed, seems to have uncovered
an RCU resp. module reference count race from the tc API.
Given that RCU conversion was done after e341694e3e ("netlink:
Convert netlink_lookup() to use RCU protected hash table")
which added the synchronize_net() originally, occasion of
hitting the bug was less likely (not impossible though):

When qdiscs that i) support attaching classifiers and,
ii) have at least one of them attached, get deleted, they
invoke tcf_destroy_chain(), and thus call into ->destroy()
handler from a classifier module.

After RCU conversion, all classifier that have an internal
prio list, unlink them and initiate freeing via call_rcu()
deferral.

Meanhile, tcf_destroy() releases already reference to the
tp->ops->owner module before the queued RCU callback handler
has been invoked.

Subsequent rmmod on the classifier module is then not prevented
since all module references are already dropped.

By the time, the kernel invokes the RCU callback handler from
the module, that function address is then invalid.

One way to fix it would be to add an rcu_barrier() to
unregister_tcf_proto_ops() to wait for all pending call_rcu()s
to complete.

synchronize_rcu() is not appropriate as under heavy RCU
callback load, registered call_rcu()s could be deferred
longer than a grace period. In case we don't have any pending
call_rcu()s, the barrier is allowed to return immediately.

Since we came here via unregister_tcf_proto_ops(), there
are no users of a given classifier anymore. Further nested
call_rcu()s pointing into the module space are not being
done anywhere.

Only cls_bpf_delete_prog() may schedule a work item, to
unlock pages eventually, but that is not in the range/context
of cls_bpf anymore.

Fixes: 25d8c0d55f ("net: rcu-ify tcf_proto")
Fixes: 9888faefe1 ("net: sched: cls_basic use RCU")
Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 18:48:18 -04:00
..
6lowpan 6lowpan: nhc: add other known rfc6282 compressions 2015-02-14 23:08:44 +01:00
9p 9p: patches for 4.1 merge window 2015-04-18 17:45:30 -04:00
802 net: Kill dev_rebuild_header 2015-03-02 16:43:41 -05:00
8021q vlan: Correctly propagate promisc|allmulti flags in notifier. 2015-05-14 00:54:32 -04:00
appletalk appletalk: Use eth_<foo>_addr instead of memset 2015-03-03 17:01:37 -05:00
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2015-04-15 09:00:47 -07:00
ax25 ax25: Fix the build when CONFIG_INET is disabled 2015-03-05 13:17:39 -05:00
batman-adv dev: introduce dev_get_iflink() 2015-04-02 14:04:59 -04:00
bluetooth Bluetooth: Fix remote name event return directly. 2015-05-14 10:35:04 +02:00
bridge bridge/nl: remove wrong use of NLM_F_MULTI 2015-04-29 14:59:16 -04:00
caif Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-20 18:51:09 -04:00
can can: introduce new raw socket option to join the given CAN filters 2015-04-01 11:28:22 +02:00
ceph crush: straw2 bucket type with an efficient 64-bit crush_ln() 2015-04-22 18:33:43 +03:00
core rtnl/bond: don't send rtnl msg for unregistered iface 2015-05-17 22:43:07 -04:00
dcb net/dcb: Add IEEE QCN attribute 2015-03-06 21:50:02 -05:00
dccp inet: fix possible panic in reqsk_queue_unlink() 2015-04-24 11:39:15 -04:00
decnet netfilter: Pass socket pointer down through okfn(). 2015-04-07 15:25:55 -04:00
dns_resolver
dsa net: dsa: Fix scope of eeprom-length property 2015-04-29 15:35:04 -04:00
ethernet ethernet: Use eth_<foo>_addr instead of memset 2015-03-03 17:01:38 -05:00
hsr net/hsr: Fix NULL pointer dereference and refcnt bugs when deleting a HSR interface. 2015-03-01 13:40:23 -05:00
ieee802154 ieee802154: trace: fix endian convertion 2015-04-30 18:48:11 +02:00
ipv4 tcp: don't over-send F-RTO probes 2015-05-19 16:36:57 -04:00
ipv6 ipv6: fix ECMP route replacement 2015-05-20 12:02:26 -04:00
ipx net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
irda Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-09 23:38:02 -04:00
iucv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-04-02 16:16:53 -04:00
key xfrm: simplify xfrm_address_t use 2015-03-31 13:58:35 -04:00
l2tp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-04-06 22:34:15 -04:00
lapb
llc net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
mac80211 mac80211: move WEP tailroom size check 2015-05-11 14:51:29 +02:00
mac802154 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2015-05-09 15:51:00 -04:00
mpls mpls: Change reserved label names to be consistent with netbsd 2015-05-09 22:29:50 -04:00
netfilter netfilter: nf_tables: fix bogus warning in nft_data_uninit() 2015-05-15 22:07:30 +02:00
netlabel netlink: implement nla_put_in_addr and nla_put_in6_addr 2015-03-31 13:58:35 -04:00
netlink netlink: Reset portid after netlink_insert failure 2015-05-16 17:08:57 -04:00
netrom net: Kill dev_rebuild_header 2015-03-02 16:43:41 -05:00
nfc nfc: Fix portid type in urelease_work 2015-04-13 16:35:16 -04:00
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2015-04-15 09:00:47 -07:00
packet af_packet / TX_RING not fully non-blocking (w/ MSG_DONTWAIT). 2015-05-10 19:40:08 -04:00
phonet net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
rds net/rds: RDS-TCP: only initiate reconnect attempt on outgoing TCP socket. 2015-05-09 16:03:28 -04:00
rfkill Last round of updates for net-next: 2015-02-04 14:57:45 -08:00
rose net: Kill dev_rebuild_header 2015-03-02 16:43:41 -05:00
rxrpc new helper: msg_data_left() 2015-04-11 15:53:35 -04:00
sched net: sched: fix call_rcu() race on classifier module unloads 2015-05-21 18:48:18 -04:00
sctp sctp: avoid to repeatedly declare external variables 2015-03-25 11:40:16 -04:00
sunrpc svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures 2015-05-04 12:02:40 -04:00
switchdev rename RTNH_F_EXTERNAL to RTNH_F_OFFLOAD 2015-05-14 22:45:39 -04:00
tipc tipc: fix problem with parallel link synchronization mechanism 2015-04-29 15:08:59 -04:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-04-27 14:05:19 -07:00
vmw_vsock net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
wimax
wireless cfg80211: don't allow disabling WEXT if it's required 2015-04-08 09:19:29 +02:00
x25 net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-04-14 15:44:14 -04:00
compat.c net: switch importing msghdr from userland to {compat_,}import_iovec() 2015-04-09 00:02:26 -04:00
Kconfig
Makefile mpls: Refactor how the mpls module is built 2015-03-04 00:26:06 -05:00
socket.c VFS: net/: d_inode() annotations 2015-04-15 15:06:56 -04:00
sysctl_net.c