hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()

This patch fixes the race between netvsc_probe() and
rndis_set_subchannel(), which can cause a deadlock.

These are the related 3 paths which show the deadlock:

path :
    Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus]
    Call Trace:
     schedule
     schedule_preempt_disabled
     __mutex_lock
     __device_attach
     bus_probe_device
     device_add
     vmbus_device_register
     vmbus_onoffer
     vmbus_onmessage_work
     process_one_work
     worker_thread
     kthread
     ret_from_fork

path :
    schedule
     schedule_preempt_disabled
     __mutex_lock
     netvsc_probe
     vmbus_probe
     really_probe
     __driver_attach
     bus_for_each_dev
     driver_attach_async
     async_run_entry_fn
     process_one_work
     worker_thread
     kthread
     ret_from_fork

path :
    Workqueue: events netvsc_subchan_work [hv_netvsc]
    Call Trace:
     schedule
     rndis_set_subchannel
     netvsc_subchan_work
     process_one_work
     worker_thread
     kthread
     ret_from_fork

Before path  finishes, path  can start to run, because just before
the "bus_probe_device(dev);" in device_add() in path , there is a line
"object_uevent(&dev->kobj, KOBJ_ADD);", so systemd-udevd can
immediately try to load hv_netvsc and hence path  can start to run.

Next, path  offloads the subchannal's initialization to a workqueue,
i.e. path , so we can end up in a deadlock situation like this:

Path  gets the device lock, and is trying to get the rtnl lock;
Path  gets the rtnl lock and is waiting for all the subchannel messages
to be processed;
Path  is trying to get the device lock, but since  is not releasing
the device lock, path  has to sleep; since the VMBus messages are
processed one by one, this means the sub-channel messages can't be
procedded, so  has to sleep with the rtnl lock held, and finally 
has to sleep... Now all the 3 paths are sleeping and we hit the deadlock.

With the patch, we can make sure  gets both the device lock and the
rtnl lock together, gets its job done, and releases the locks, so 
and  will not be blocked for ever.

Fixes: 8195b1396e ("hv_netvsc: fix deadlock on hotplug")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit is contained in:
Dexuan Cui 2018-08-30 05:42:13 +00:00 committed by David S. Miller
parent 9ad716b95f
commit e04e7a7bbd

View File

@ -2206,6 +2206,16 @@ static int netvsc_probe(struct hv_device *dev,
memcpy(net->dev_addr, device_info.mac_adr, ETH_ALEN);
/* We must get rtnl lock before scheduling nvdev->subchan_work,
* otherwise netvsc_subchan_work() can get rtnl lock first and wait
* all subchannels to show up, but that may not happen because
* netvsc_probe() can't get rtnl lock and as a result vmbus_onoffer()
* -> ... -> device_add() -> ... -> __device_attach() can't get
* the device lock, so all the subchannels can't be processed --
* finally netvsc_subchan_work() hangs for ever.
*/
rtnl_lock();
if (nvdev->num_chn > 1)
schedule_work(&nvdev->subchan_work);
@ -2224,7 +2234,6 @@ static int netvsc_probe(struct hv_device *dev,
else
net->max_mtu = ETH_DATA_LEN;
rtnl_lock();
ret = register_netdevice(net);
if (ret != 0) {
pr_err("Unable to register netdev.\n");