linux/drivers/misc
Nadav Amit ba03a9bbd1 VMCI: Release resource if the work is already queued
Francois reported that VMware balloon gets stuck after a balloon reset,
when the VMCI doorbell is removed. A similar error can occur when the
balloon driver is removed with the following splat:

[ 1088.622000] INFO: task modprobe:3565 blocked for more than 120 seconds.
[ 1088.622035]       Tainted: G        W         5.2.0 #4
[ 1088.622087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1088.622205] modprobe        D    0  3565   1450 0x00000000
[ 1088.622210] Call Trace:
[ 1088.622246]  __schedule+0x2a8/0x690
[ 1088.622248]  schedule+0x2d/0x90
[ 1088.622250]  schedule_timeout+0x1d3/0x2f0
[ 1088.622252]  wait_for_completion+0xba/0x140
[ 1088.622320]  ? wake_up_q+0x80/0x80
[ 1088.622370]  vmci_resource_remove+0xb9/0xc0 [vmw_vmci]
[ 1088.622373]  vmci_doorbell_destroy+0x9e/0xd0 [vmw_vmci]
[ 1088.622379]  vmballoon_vmci_cleanup+0x6e/0xf0 [vmw_balloon]
[ 1088.622381]  vmballoon_exit+0x18/0xcc8 [vmw_balloon]
[ 1088.622394]  __x64_sys_delete_module+0x146/0x280
[ 1088.622408]  do_syscall_64+0x5a/0x130
[ 1088.622410]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1088.622415] RIP: 0033:0x7f54f62791b7
[ 1088.622421] Code: Bad RIP value.
[ 1088.622421] RSP: 002b:00007fff2a949008 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 1088.622426] RAX: ffffffffffffffda RBX: 000055dff8b55d00 RCX: 00007f54f62791b7
[ 1088.622426] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055dff8b55d68
[ 1088.622427] RBP: 000055dff8b55d00 R08: 00007fff2a947fb1 R09: 0000000000000000
[ 1088.622427] R10: 00007f54f62f5cc0 R11: 0000000000000206 R12: 000055dff8b55d68
[ 1088.622428] R13: 0000000000000001 R14: 000055dff8b55d68 R15: 00007fff2a94a3f0

The cause for the bug is that when the "delayed" doorbell is invoked, it
takes a reference on the doorbell entry and schedules work that is
supposed to run the appropriate code and drop the doorbell entry
reference. The code ignores the fact that if the work is already queued,
it will not be scheduled to run one more time. As a result one of the
references would not be dropped. When the code waits for the reference
to get to zero, during balloon reset or module removal, it gets stuck.

Fix it. Drop the reference if schedule_work() indicates that the work is
already queued.

Note that this bug got more apparent (or apparent at all) due to
commit ce664331b2 ("vmw_balloon: VMCI_DOORBELL_SET does not check status").

Fixes: 83e2ec765b ("VMCI: doorbell implementation.")
Reported-by: Francois Rigault <rigault.francois@gmail.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Adit Ranadive <aditr@vmware.com>
Cc: Alexios Zavras <alexios.zavras@intel.com>
Cc: Vishnu DASA <vdasa@vmware.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Link: https://lore.kernel.org/r/20190820202638.49003-1-namit@vmware.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-28 22:57:07 +02:00
..
altera-stapl treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 61 2019-05-24 17:36:45 +02:00
c2port treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
cardreader treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234 2019-06-19 17:09:07 +02:00
cb710 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
cxl Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
echo treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 176 2019-05-30 11:29:19 -07:00
eeprom at24 fixes for v5.3-rc3 2019-08-01 14:05:17 +02:00
genwqe Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
habanalabs habanalabs: fix device IRQ unmasking for BE host 2019-08-12 09:01:10 +03:00
ibmasm Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
lis3lv02d treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 336 2019-06-05 17:37:07 +02:00
lkdtm lkdtm/bugs: fix build error in lkdtm_EXHAUST_STACK 2019-08-28 22:33:44 +02:00
mei mei: me: add Tiger Lake point LP device ID 2019-08-28 22:30:15 +02:00
mic Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
ocxl powerpc updates for 5.3 2019-07-13 16:08:36 -07:00
sgi-gru treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
sgi-xp
ti-st Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
vmw_vmci VMCI: Release resource if the work is already queued 2019-08-28 22:57:07 +02:00
ad525x_dpot-i2c.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 149 2019-05-30 11:25:18 -07:00
ad525x_dpot-spi.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 149 2019-05-30 11:25:18 -07:00
ad525x_dpot.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 149 2019-05-30 11:25:18 -07:00
ad525x_dpot.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 149 2019-05-30 11:25:18 -07:00
apds990x.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 336 2019-06-05 17:37:07 +02:00
apds9802als.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 167 2019-05-30 11:26:39 -07:00
atmel_tclib.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
atmel-ssc.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
bh1770glc.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 336 2019-06-05 17:37:07 +02:00
cs5535-mfgpt.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206 2019-05-30 11:29:53 -07:00
ds1682.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
dummy-irq.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
enclosure.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 176 2019-05-30 11:29:19 -07:00
fastrpc.c
hmc6352.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 167 2019-05-30 11:26:39 -07:00
hpilo.c
hpilo.h
ibmvmc.c
ibmvmc.h
ics932s401.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
ioc4.c
isl29003.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 61 2019-05-24 17:36:45 +02:00
isl29020.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 167 2019-05-30 11:26:39 -07:00
Kconfig misc: xilinx-sdfec: fix dependency and build error 2019-08-15 18:10:25 +02:00
kgdbts.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
lattice-ecp3-config.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
Makefile
pch_phub.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 167 2019-05-30 11:26:39 -07:00
pci_endpoint_test.c dmaengine updates for v5.3-rc1 2019-07-17 09:55:43 -07:00
phantom.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
pti.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174 2019-05-30 11:26:41 -07:00
pvpanic.c
qcom-coincell.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 284 2019-06-05 17:36:37 +02:00
spear13xx_pcie_gadget.c
sram-exec.c
sram.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 1 2019-05-21 11:28:39 +02:00
sram.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
tifm_7xx1.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
tifm_core.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
tsl2550.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 61 2019-05-24 17:36:45 +02:00
vexpress-syscfg.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174 2019-05-30 11:26:41 -07:00
vmw_balloon.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
xilinx_sdfec.c