linux/net
Ying Xue b952b2befb tipc: fix potential deadlock when all links are reset
[   60.988363] ======================================================
[   60.988754] [ INFO: possible circular locking dependency detected ]
[   60.989152] 3.19.0+ #194 Not tainted
[   60.989377] -------------------------------------------------------
[   60.989781] swapper/3/0 is trying to acquire lock:
[   60.990079]  (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
[   60.990743]
[   60.990743] but task is already holding lock:
[   60.991106]  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.991738]
[   60.991738] which lock already depends on the new lock.
[   60.991738]
[   60.992174]
[   60.992174] the existing dependency chain (in reverse order) is:
[   60.992174]
-> #1 (&(&bclink->lock)->rlock){+.-...}:
[   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[   60.992174]        [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.992174]        [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc]
[   60.992174]        [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc]
[   60.992174]        [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc]
[   60.992174]        [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc]
[   60.992174]        [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc]
[   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
[   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
[   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
[   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
[   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
[   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
[   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
[   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
[   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
[   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
[   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
[   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
[   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
[   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
[   60.992174]
-> #0 (&(&n_ptr->lock)->rlock){+.-...}:
[   60.992174]        [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0
[   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[   60.992174]        [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
[   60.992174]        [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc]
[   60.992174]        [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc]
[   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
[   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
[   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
[   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
[   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
[   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
[   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
[   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
[   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
[   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
[   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
[   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
[   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
[   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
[   60.992174]
[   60.992174] other info that might help us debug this:
[   60.992174]
[   60.992174]  Possible unsafe locking scenario:
[   60.992174]
[   60.992174]        CPU0                    CPU1
[   60.992174]        ----                    ----
[   60.992174]   lock(&(&bclink->lock)->rlock);
[   60.992174]                                lock(&(&n_ptr->lock)->rlock);
[   60.992174]                                lock(&(&bclink->lock)->rlock);
[   60.992174]   lock(&(&n_ptr->lock)->rlock);
[   60.992174]
[   60.992174]  *** DEADLOCK ***
[   60.992174]
[   60.992174] 3 locks held by swapper/3/0:
[   60.992174]  #0:  (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980
[   60.992174]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
[   60.992174]  #2:  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.992174]

The correct the sequence of grabbing n_ptr->lock and bclink->lock
should be that the former is first held and the latter is then taken,
which exactly happened on CPU1. But especially when the retransmission
of broadcast link is failed, bclink->lock is first held in
tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure()
called by tipc_link_retransmit() subsequently, which is demonstrated on
CPU0. As a result, deadlock occurs.

If the order of holding the two locks happening on CPU0 is reversed, the
deadlock risk will be relieved. Therefore, the node lock taken in
link_retransmit_failure() originally is moved to tipc_bclink_rcv()
so that it's obtained before bclink lock. But the precondition of
the adjustment of node lock is that responding to bclink reset event
must be moved from tipc_bclink_unlock() to tipc_node_unlock().

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-29 12:40:27 -07:00
..
6lowpan 6lowpan: nhc: add other known rfc6282 compressions 2015-02-14 23:08:44 +01:00
9p Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-20 18:51:09 -04:00
802 net: Kill dev_rebuild_header 2015-03-02 16:43:41 -05:00
8021q net: Fix high overhead of vlan sub-device teardown. 2015-03-18 22:52:56 -04:00
appletalk appletalk: Use eth_<foo>_addr instead of memset 2015-03-03 17:01:37 -05:00
atm atm: Use eth_<foo>_addr instead of memset 2015-03-03 17:01:37 -05:00
ax25 ax25: Fix the build when CONFIG_INET is disabled 2015-03-05 13:17:39 -05:00
batman-adv batman-adv: Fix use of seq_has_overflowed() 2015-02-22 17:00:08 -05:00
bluetooth Bluetooth: Fix potential NULL dereference in SMP channel setup 2015-03-18 08:30:03 +02:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2015-03-23 22:02:46 -04:00
caif Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-20 18:51:09 -04:00
can Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-09 23:38:02 -04:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2015-02-19 14:14:42 -08:00
core filter: introduce SKF_AD_VLAN_TPID BPF extension 2015-03-24 15:25:15 -04:00
dcb net/dcb: Add IEEE QCN attribute 2015-03-06 21:50:02 -05:00
dccp inet: fix double request socket freeing 2015-03-23 21:40:48 -04:00
decnet net: Remove protocol from struct dst_ops 2015-03-09 16:06:10 -04:00
dns_resolver
dsa net: dsa: Handle non-bridge master change 2015-03-25 14:04:57 -04:00
ethernet ethernet: Use eth_<foo>_addr instead of memset 2015-03-03 17:01:38 -05:00
hsr net/hsr: Fix NULL pointer dereference and refcnt bugs when deleting a HSR interface. 2015-03-01 13:40:23 -05:00
ieee802154 ieee802154: don't export static symbol 2015-03-14 17:11:31 +01:00
ipv4 tcp: tcp_syn_flood_action() can be static 2015-03-29 12:17:18 -07:00
ipv6 fib6: install fib6 ops in the last step 2015-03-29 12:12:37 -07:00
ipx net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
irda Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-09 23:38:02 -04:00
iucv net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
key net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
l2tp l2tp: Use eth_<foo>_addr instead of memset 2015-03-03 17:01:38 -05:00
lapb lapb: move EXPORT_SYMBOL after functions. 2014-10-24 15:51:42 -04:00
llc net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-20 18:51:09 -04:00
mac802154 mac802154: fix typo in header guard 2015-03-19 15:32:22 +01:00
mpls mpls: In mpls_egress verify the packet length. 2015-03-12 23:05:04 -04:00
netfilter ipv4: hash net ptr into fragmentation bucket selection 2015-03-25 14:07:04 -04:00
netlabel Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2015-02-11 20:25:11 -08:00
netlink rhashtable: Disable automatic shrinking by default 2015-03-24 17:48:40 -04:00
netrom net: Kill dev_rebuild_header 2015-03-02 16:43:41 -05:00
nfc net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
openvswitch net: Introduce possible_net_t 2015-03-12 14:39:40 -04:00
packet af_packet: pass checksum validation status to the user 2015-03-23 22:01:28 -04:00
phonet net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-20 18:51:09 -04:00
rfkill Last round of updates for net-next: 2015-02-04 14:57:45 -08:00
rose net: Kill dev_rebuild_header 2015-03-02 16:43:41 -05:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-20 18:51:09 -04:00
sched act_bpf: add initial eBPF support for actions 2015-03-20 19:10:44 -04:00
sctp sctp: avoid to repeatedly declare external variables 2015-03-25 11:40:16 -04:00
sunrpc sunrpc: fix braino in ->poll() 2015-03-08 12:53:46 -07:00
switchdev switchdev: fix stp update API to work with layered netdevices 2015-03-23 16:44:56 -04:00
tipc tipc: fix potential deadlock when all links are reset 2015-03-29 12:40:27 -07:00
unix net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
vmw_vsock net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
wimax
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-20 18:51:09 -04:00
x25 net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2015-03-16 16:16:49 -04:00
compat.c net: socket: add support for async operations 2015-03-23 16:41:36 -04:00
Kconfig kconfig: use bool instead of boolean for type definition attributes 2015-01-07 13:08:04 +01:00
Makefile mpls: Refactor how the mpls module is built 2015-03-04 00:26:06 -05:00
socket.c net: socket: add support for async operations 2015-03-23 16:41:36 -04:00
sysctl_net.c