linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-17 17:41:44 +00:00

Author	SHA1	Message	Date
Farhan Ali	41be3e2618	vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING" vfio_dev_present() which is the condition to wait_event_interruptible_timeout(), will call vfio_group_get_device and try to acquire the mutex group->device_lock. wait_event_interruptible_timeout() will set the state of the current task to TASK_INTERRUPTIBLE, before doing the condition check. This means that we will try to acquire the mutex while already in a sleeping state. The scheduler warns us by giving the following warning: [ 4050.264464] ------------[ cut here ]------------ [ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188 [ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 __might_sleep+0x76/0x90 .... [ 4050.264756] Call Trace: [ 4050.264765] ([<000000000017bbaa>] __might_sleep+0x72/0x90) [ 4050.264774] [<0000000000b97edc>] __mutex_lock+0x44/0x8c0 [ 4050.264782] [<0000000000b9878a>] mutex_lock_nested+0x32/0x40 [ 4050.264793] [<000003ff800d7abe>] vfio_group_get_device+0x36/0xa8 [vfio] [ 4050.264803] [<000003ff800d87c0>] vfio_del_group_dev+0x238/0x378 [vfio] [ 4050.264813] [<000003ff8015f67c>] mdev_remove+0x3c/0x68 [mdev] [ 4050.264825] [<00000000008e01b0>] device_release_driver_internal+0x168/0x268 [ 4050.264834] [<00000000008de692>] bus_remove_device+0x162/0x190 [ 4050.264843] [<00000000008daf42>] device_del+0x1e2/0x368 [ 4050.264851] [<00000000008db12c>] device_unregister+0x64/0x88 [ 4050.264862] [<000003ff8015ed84>] mdev_device_remove+0xec/0x130 [mdev] [ 4050.264872] [<000003ff8015f074>] remove_store+0x6c/0xa8 [mdev] [ 4050.264881] [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8 [ 4050.264890] [<00000000003c1530>] __vfs_write+0x38/0x1a8 [ 4050.264899] [<00000000003c187c>] vfs_write+0xb4/0x198 [ 4050.264908] [<00000000003c1af2>] ksys_write+0x5a/0xb0 [ 4050.264916] [<0000000000b9e270>] system_call+0xdc/0x2d8 [ 4050.264925] 4 locks held by sh/35924: [ 4050.264933] #0: 000000001ef90325 (sb_writers#4){.+.+}, at: vfs_write+0x9e/0x198 [ 4050.264948] #1: 000000005c1ab0b3 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cc/0x1f8 [ 4050.264963] #2: 0000000034831ab8 (kn->count#297){++++}, at: kernfs_remove_self+0x12e/0x150 [ 4050.264979] #3: 00000000e152484f (&dev->mutex){....}, at: device_release_driver_internal+0x5c/0x268 [ 4050.264993] Last Breaking-Event-Address: [ 4050.265002] [<000000000017bbaa>] __might_sleep+0x72/0x90 [ 4050.265010] irq event stamp: 7039 [ 4050.265020] hardirqs last enabled at (7047): [<00000000001cee7a>] console_unlock+0x6d2/0x740 [ 4050.265029] hardirqs last disabled at (7054): [<00000000001ce87e>] console_unlock+0xd6/0x740 [ 4050.265040] softirqs last enabled at (6416): [<0000000000b8fe26>] __udelay+0xb6/0x100 [ 4050.265049] softirqs last disabled at (6415): [<0000000000b8fe06>] __udelay+0x96/0x100 [ 4050.265057] ---[ end trace d04a07d39d99a9f9 ]--- Let's fix this as described in the article https://lwn.net/Articles/628628/. Signed-off-by: Farhan Ali <alifm@linux.ibm.com> [remove now redundant vfio_dev_present()] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-04-23 11:30:46 -06:00
Bjorn Helgaas	a88a7b3eb0	vfio: Use dev_printk() when possible Use dev_printk() when possible to make messages consistent with other device-related messages. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-04-22 11:45:42 -06:00
Chengguang Xu	8bcb64a510	vfio: expand minor range when registering chrdev region The total number of available minor numbers for a single major is MINORMASK + 1, so expand the minor range when registering the chrdev region. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-02-12 13:20:56 -07:00
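The fencepost behind this fix can be sketched in userspace C. alloc_chrdev_region() takes a *count* of minors, and minor numbers run from 0 through MINORMASK inclusive, so the full range holds MINORMASK + 1 entries. The MINORBITS/MINORMASK values are copied from the kernel's include/linux/kdev_t.h for illustration; range_count() and vfio_minor_count() are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>

/* Copied from include/linux/kdev_t.h so this sketch builds in userspace. */
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)

/* Hypothetical helper: number of values in the inclusive range [first, last]. */
static unsigned int range_count(unsigned int first, unsigned int last)
{
    return last - first + 1;
}

/* Minors 0..MINORMASK inclusive. Passing MINORMASK as the count would
 * leave the highest minor number unregistered. */
static unsigned int vfio_minor_count(void)
{
    return range_count(0, MINORMASK);
}
```

With MINORBITS at 20, the count is 1 << 20, one more than MINORMASK itself.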
Yisheng Xie	e77addf018	vfio: use match_string() helper match_string() returns the index of an array for a matching string, which can be used instead of the open-coded variant. Cc: Alex Williamson <alex.williamson@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-06-08 10:24:33 -06:00
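A minimal userspace re-creation of the match_string() contract, to show what the open-coded loop collapses into: return the index of the first entry equal to the string, stop early at a NULL entry, and return -EINVAL when nothing matches. The driver_whitelist array and driver_whitelisted() wrapper are hypothetical illustrations of a caller that only cares whether the result is non-negative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace re-creation of the kernel's match_string() behavior. */
static int match_string(const char * const *array, size_t n, const char *string)
{
    for (size_t i = 0; i < n; i++) {
        if (!array[i])              /* a NULL entry ends the search early */
            break;
        if (!strcmp(array[i], string))
            return (int)i;
    }
    return -EINVAL;
}

/* Hypothetical whitelist; callers only test for a non-negative index. */
static const char * const driver_whitelist[] = { "pci-stub" };

static int driver_whitelisted(const char *drv_name)
{
    return match_string(driver_whitelist,
                        sizeof(driver_whitelist) / sizeof(driver_whitelist[0]),
                        drv_name) >= 0;
}
```

Returning a negative errno rather than a boolean lets the same helper serve callers that need the index (for example, mapping a sysfs string to an enum) and callers that only need a yes/no answer.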
Alex Williamson	dda01f787d	vfio: Simplify capability helper The vfio_info_add_capability() helper requires the caller to pass a capability ID, which it then uses to fill in header fields, assuming hard coded versions. This makes for an awkward and rigid interface. The only thing we want this helper to do is allocate sufficient space in the caps buffer and chain this capability into the list. Reduce it to that simple task. Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-12-20 09:53:54 -07:00
Mark Rutland	6aa7de0591	locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-10-25 11:01:08 +02:00
Alex Williamson	6586b561a9	vfio: Stall vfio_del_group_dev() for container group detach When the user unbinds the last device of a group from a vfio bus driver, the devices within that group should be available for other purposes. We currently have a race that makes this generally, but not always true. The device can be unbound from the vfio bus driver, but remaining IOMMU context of the group attached to the container can result in errors as the next driver configures DMA for the device. Wait for the group to be detached from the IOMMU backend before allowing the bus driver remove callback to complete. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-08-30 14:02:16 -06:00
Eric Auger	d935ad91f0	vfio: fix noiommu vfio_iommu_group_get reference count In vfio_iommu_group_get() we want to increase the reference count of the iommu group. In noiommu case, the group does not exist and is allocated. iommu_group_add_device() increases the group ref count. However we then call iommu_group_put() which decrements it. This leads to a "refcount_t: underflow WARN_ON". Only decrement the ref count in case of iommu_group_add_device failure. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-08-30 14:00:47 -06:00
Alex Williamson	7f56c30bd0	vfio: Remove unnecessary uses of vfio_container.group_lock The original intent of vfio_container.group_lock is to protect vfio_container.group_list, however over time it's become a crutch to prevent changes in container composition any time we call into the iommu driver backend. This introduces problems when we start to have more complex interactions, for example when a user's DMA unmap request triggers a notification to an mdev vendor driver, who responds by attempting to unpin mappings within that request, re-entering the iommu backend. We incorrectly assume that the use of read-locks here allows for this nested locking behavior, but a poorly timed write-lock could in fact trigger a deadlock. The current use of group_lock seems to fall into the trap of locking code, not data. Correct that by removing uses of group_lock that are not directly related to group_list. Note that the vfio type1 iommu backend has its own mutex, vfio_iommu.lock, which it uses to protect itself for each of these interfaces anyway. The group_lock appears to be a redundancy for these interfaces and type1 even goes so far as to release its mutex to allow for exactly the re-entrant code path above. Reported-by: Chuanxiao Dong <chuanxiao.dong@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: stable@vger.kernel.org # v4.10+	2017-07-07 15:37:38 -06:00
Alex Williamson	5d6dee80a1	vfio: New external user group/file match At the point where the kvm-vfio pseudo device wants to release its vfio group reference, we can't always acquire a new reference to make that happen. The group can be in a state where we wouldn't allow a new reference to be added. This new helper function allows a caller to match a file to a group to facilitate this. Given a file and group, report if they match. Thus the caller needs to already have a group reference to match to the file. This allows the deletion of a group without acquiring a new reference. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Cc: stable@vger.kernel.org	2017-06-28 13:50:05 -06:00
Alex Williamson	811642d8d8	vfio: Fix group release deadlock If vfio_iommu_group_notifier() acquires a group reference and that reference becomes the last reference to the group, then vfio_group_put introduces a deadlock code path where we're trying to unregister from the iommu notifier chain from within a callout of that chain. Use a work_struct to release this reference asynchronously. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Cc: stable@vger.kernel.org	2017-06-28 13:49:38 -06:00
Dan Carpenter	7b3a10df1d	vfio: Use ERR_CAST() instead of open coding it It's a small cleanup to use ERR_CAST() here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-06-13 09:24:21 -06:00
Alex Williamson	65b1adebfe	vfio: Rework group release notifier warning The intent of the original warning is to make sure that the mdev vendor driver has removed any group notifiers at the point where the group is closed by the user. Theoretically this would be through an orderly shutdown where any devices are released prior to the group release. We can't always count on an orderly shutdown; the user can close the group before the notifier can be removed or the user task might be killed. We'd like to add this sanity test when the group is idle and the only references are from the devices within the group themselves, but we don't have a good way to do that. Instead check both when the group itself is removed and when the group is opened. A bit later than we'd prefer, but better than the current over-aggressive approach. Fixes: `ccd46dbae7` ("vfio: support notifier chain in vfio_group") Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: <stable@vger.kernel.org> # v4.10 Cc: Jike Song <jike.song@intel.com>	2017-03-21 13:19:09 -06:00
Changbin Du	d9d84780f1	vfio: fix a typo in comment of function vfio_pin_pages Correct the description that 'unpinned' -> 'pinned'. Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-02-22 11:40:15 -07:00
Alex Williamson	0ca582fd04	vfio: Replace module request with softdep Rather than doing a module request from within the init function, add a soft dependency on the available IOMMU backend drivers. This makes the dependency visible to userspace when picking modules for the ram disk. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-02-09 12:13:53 -07:00
Jike Song	ccd46dbae7	vfio: support notifier chain in vfio_group Beyond vfio_iommu events, users might also be interested in vfio_group events. For example, if a vfio_group is used along with Qemu/KVM, whenever kvm pointer is set to/cleared from the vfio_group, users could be notified. Currently only VFIO_GROUP_NOTIFY_SET_KVM supported. Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Jike Song <jike.song@intel.com> [aw: remove use of new typedef] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-12-01 10:40:05 -07:00
Jike Song	22195cbd34	vfio: vfio_register_notifier: classify iommu notifier Currently vfio_register_notifier assumes that there is only one notifier chain, which is in vfio_iommu. However, the user might also be interested in events other than vfio_iommu, for example, vfio_group. Refactor vfio_{un}register_notifier implementation to make it feasible. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Jike Song <jike.song@intel.com> [aw: merge with commit 816ca69ea9c7 ("vfio: Fix handling of error returned by 'vfio_group_get_from_dev()'"), remove typedef] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-12-01 09:38:47 -07:00
Christophe JAILLET	d256459fae	vfio: Fix handling of error returned by 'vfio_group_get_from_dev()' 'vfio_group_get_from_dev()' seems to return only NULL on error, not an error pointer. Fixes: `2169037dc3` ("vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops") Fixes: `c086de818d` ("vfio iommu: Add blocking notifier to notify DMA_UNMAP") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-12-01 08:45:49 -07:00
Eric Auger	5ba6de98c7	vfio: fix vfio_info_cap_add/shift Capability header next field is an offset relative to the start of the INFO buffer. tmp->next is assigned the proper value but iterations implemented in vfio_info_cap_add and vfio_info_cap_shift use next as an offset between the headers. When coping with multiple capabilities this leads to an Oops. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-21 11:51:53 -07:00
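To see why each next field must be an offset from the start of the buffer rather than from the current header, here is a userspace sketch of a capability chain. The header layout mirrors struct vfio_info_cap_header from the VFIO uapi (id, version, next); chain_add() and chain_shift() are hypothetical names that only approximate what vfio_info_cap_add() and vfio_info_cap_shift() do. Capabilities are built in a standalone buffer, then chain_shift() rebases every link by the size of the info struct that will precede the chain when it is copied out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same field layout as struct vfio_info_cap_header in the VFIO uapi. */
struct cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;   /* offset of the next header; 0 terminates the chain */
};

struct cap_chain {
    unsigned char *buf;
    size_t size;
};

/* memcpy avoids misaligned access when reading headers out of buf. */
static struct cap_header load_hdr(const struct cap_chain *c, size_t off)
{
    struct cap_header h;
    memcpy(&h, c->buf + off, sizeof(h));
    return h;
}

static void store_hdr(struct cap_chain *c, size_t off, struct cap_header h)
{
    memcpy(c->buf + off, &h, sizeof(h));
}

/* Append a zeroed capability of 'size' bytes; returns its offset or -1. */
static long chain_add(struct cap_chain *c, size_t size, uint16_t id, uint16_t version)
{
    if (size < sizeof(struct cap_header))
        return -1;
    unsigned char *nb = realloc(c->buf, c->size + size);
    if (!nb)
        return -1;
    c->buf = nb;
    size_t off = c->size;
    memset(c->buf + off, 0, size);
    struct cap_header h = { .id = id, .version = version, .next = 0 };
    store_hdr(c, off, h);
    if (off != 0) {
        /* Walk to the tail: every 'next' is relative to the start of buf. */
        size_t tail = 0;
        struct cap_header t = load_hdr(c, tail);
        while (t.next) {
            tail = t.next;
            t = load_hdr(c, tail);
        }
        t.next = (uint32_t)off;
        store_hdr(c, tail, t);
    }
    c->size += size;
    return (long)off;
}

/* Add 'offset' to every non-terminal link, e.g. sizeof(info) when the
 * chain will be placed right after an info struct. */
static void chain_shift(struct cap_chain *c, size_t offset)
{
    if (c->size == 0)
        return;
    size_t cur = 0;
    for (;;) {
        struct cap_header h = load_hdr(c, cur);
        if (!h.next)
            break;
        size_t following = h.next;   /* still relative to buf */
        h.next += (uint32_t)offset;
        store_hdr(c, cur, h);
        cur = following;
    }
}
```

Walking the chain by adding next to the current header's address works only for the first hop; with three or more capabilities it lands in the wrong place, which is the class of bug the 5ba6de98c7 message describes.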
Kirti Wankhede	c747f08aea	vfio: Introduce vfio_set_irqs_validate_and_prepare() Vendor drivers using the mediated device framework would use the same mechanism to validate and prepare IRQs. Introduce this function to reduce code replication in multiple drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:33:20 -07:00
Kirti Wankhede	b3c0a866f1	vfio: Introduce common function to add capabilities Vendor drivers using the mediated device framework should use vfio_info_add_capability() to add capabilities. Introduced this function to reduce code duplication in vendor drivers. vfio_info_cap_shift() manipulates a data buffer to add an offset to each element in a chain. This data buffer is documented in a uapi header. Export the vfio_info_cap_shift symbol so it is available to all drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:33:20 -07:00
Kirti Wankhede	c086de818d	vfio iommu: Add blocking notifier to notify DMA_UNMAP Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers about DMA_UNMAP. Exported two APIs vfio_register_notifier() and vfio_unregister_notifier(). Notifier should be registered, if external user wants to use vfio_pin_pages()/vfio_unpin_pages() APIs to pin/unpin pages. Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate mappings. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:33:07 -07:00
Kirti Wankhede	2169037dc3	vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops Added APIs for pinning and unpinning a set of pages. These call back into the backend iommu module to actually pin and unpin pages. Added two new callback functions to struct vfio_iommu_driver_ops. A backend IOMMU module that supports pinning and unpinning pages for mdev devices should provide these functions. Renamed static functions in vfio_type1_iommu.c to resolve conflicts. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:24:58 -07:00
Kirti Wankhede	32f55d835b	vfio: Common function to increment container_users This change rearranges functions to have a common function to increment container_users. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:24:55 -07:00
Kirti Wankhede	7ed3ea8a71	vfio: Rearrange functions to get vfio_group from dev This patch rearranges functions to get vfio_group from device Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:24:52 -07:00
Ilya Lesokhin	d370c917b9	vfio: fix possible use after free of vfio group The vfio group should be released after the vfio_group_try_dissolve_container call. The code should not rely on someone else to hold a reference on the group. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-14 14:28:16 -06:00
Alex Williamson	d7a8d5ed87	vfio: Add capability chain helpers Allow sub-modules to easily reallocate a buffer for managing capability chains for info ioctls. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-22 16:10:08 -07:00
Alex Williamson	7c435b46c2	vfio: If an IOMMU backend fails, keep looking Consider an IOMMU to be an API rather than an implementation; we might have multiple implementations supporting the same API, so try another if one fails. The expectation here is that we'll really only have one implementation per device type. For instance the existing type1 driver works with any PCI device where the IOMMU API is available. A vGPU vendor may have a virtual PCI device which provides DMA isolation and mapping through other mechanisms, but can re-use userspaces that make use of the type1 VFIO IOMMU API. This allows that to work. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-22 16:10:08 -07:00
Alex Williamson	16ab8a5cbe	vfio/noiommu: Don't use iommu_present() to track fake groups Using iommu_present() to determine whether an IOMMU group is real or fake has some problems. First, apparently Power systems don't register an IOMMU on the device bus, so the groups and containers get marked as noiommu and then won't bind to their actual IOMMU driver. Second, I expect we'll run into the same issue as we try to support vGPUs through vfio, since they're likely to emulate this behavior of creating an IOMMU group on a virtual device and then providing a vfio IOMMU backend tailored to the sort of isolation they provide, which won't necessarily be fully compatible with the IOMMU API. The solution here is to use the existing iommudata interface to IOMMU groups, which allows us to easily identify the fake groups we've created for noiommu purposes. The iommudata we set is purely arbitrary since we're only comparing the address, so we use the address of the noiommu switch itself. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <sshukla@mvista.com> Fixes: `03a76b60f8` ("vfio: Include No-IOMMU mode") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-01-27 11:22:25 -07:00
Alex Williamson	03a76b60f8	vfio: Include No-IOMMU mode There is really no way to safely give a user full access to a DMA capable device without an IOMMU to protect the host system. There is also no way to provide DMA translation, for use cases such as device assignment to virtual machines. However, there are still those users that want userspace drivers even under those conditions. The UIO driver exists for this use case, but does not provide the degree of device access and programming that VFIO has. In an effort to avoid code duplication, this introduces a No-IOMMU mode for VFIO. This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling the "enable_unsafe_noiommu_mode" option on the vfio driver. This should make it very clear that this mode is not safe. Additionally, CAP_SYS_RAWIO privileges are necessary to work with groups and containers using this mode. Groups making use of this support are named /dev/vfio/noiommu-$GROUP and can only make use of the special VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically binding a device without a native IOMMU group to a VFIO bus driver will taint the kernel and should therefore not be considered supported. This patch includes no-iommu support for the vfio-pci bus driver only. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2015-12-21 15:28:11 -07:00
Alex Williamson	ae5515d663	Revert: "vfio: Include No-IOMMU mode" Revert commit `033291eccb` ("vfio: Include No-IOMMU mode") due to lack of a user. This was originally intended to fill a need for the DPDK driver, but uptake has been slow so rather than support an unproven kernel interface revert it and revisit when userspace catches up. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-12-04 08:38:42 -07:00
Dan Carpenter	049af1060b	vfio: fix a warning message The first argument to the WARN() macro has to be a condition. I'm sort of disappointed that this code doesn't generate a compiler warning. I guess -Wformat-extra-args doesn't work in the kernel. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-21 06:55:58 -07:00
Alex Williamson	033291eccb	vfio: Include No-IOMMU mode There is really no way to safely give a user full access to a DMA capable device without an IOMMU to protect the host system. There is also no way to provide DMA translation, for use cases such as device assignment to virtual machines. However, there are still those users that want userspace drivers even under those conditions. The UIO driver exists for this use case, but does not provide the degree of device access and programming that VFIO has. In an effort to avoid code duplication, this introduces a No-IOMMU mode for VFIO. This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling the "enable_unsafe_noiommu_mode" option on the vfio driver. This should make it very clear that this mode is not safe. Additionally, CAP_SYS_RAWIO privileges are necessary to work with groups and containers using this mode. Groups making use of this support are named /dev/vfio/noiommu-$GROUP and can only make use of the special VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically binding a device without a native IOMMU group to a VFIO bus driver will taint the kernel and should therefore not be considered supported. This patch includes no-iommu support for the vfio-pci bus driver only. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2015-11-04 09:56:16 -07:00
Joerg Roedel	e324fc82ea	vfio: Fix bug in vfio_device_get_from_name() The vfio_device_get_from_name() function might return a non-NULL pointer, when called with a device name that is not found in the list. This causes undefined behavior, in my case calling an invalid function pointer later on: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle kernel paging request at ffff8800cb3ddc08 [...] Call Trace: [<ffffffffa03bd733>] ? vfio_group_fops_unl_ioctl+0x253/0x410 [vfio] [<ffffffff811efc4d>] do_vfs_ioctl+0x2cd/0x4c0 [<ffffffff811f9657>] ? __fget+0x77/0xb0 [<ffffffff811efeb9>] SyS_ioctl+0x79/0x90 [<ffffffff81001bb0>] ? syscall_return_slowpath+0x50/0x130 [<ffffffff8167f776>] entry_SYSCALL_64_fastpath+0x16/0x75 Fix the issue by returning NULL when there is no device with the requested name in the list. Cc: stable@vger.kernel.org # v4.2+ Fixes: `4bc94d5dc9` ("vfio: Fix lockdep issue") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-04 09:27:39 -07:00
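The root cause is a classic pitfall with kernel-style circular lists: when list_for_each_entry() runs to completion without a break, its cursor is left pointing at the container of the list head, not NULL, so returning the cursor hands back a bogus pointer. The userspace sketch below re-creates a minimal <linux/list.h>-style list and a lookup that returns NULL on a miss. struct device_entry and find_device() are hypothetical stand-ins for struct vfio_device and vfio_device_get_from_name(); only matched nodes are converted with container_of, so the head is never treated as an entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Hypothetical stand-in for struct vfio_device. */
struct device_entry {
    const char *name;
    struct list_head group_next;
};

/* Returns the matching entry, or NULL when the walk wraps back to head. */
static struct device_entry *find_device(struct list_head *head, const char *name)
{
    for (struct list_head *pos = head->next; pos != head; pos = pos->next) {
        struct device_entry *it = container_of(pos, struct device_entry, group_next);
        if (!strcmp(it->name, name))
            return it;
    }
    return NULL;
}
```

Keeping the result in a variable that is only assigned on a match (or returning from inside the loop, as here) is the general fix; the iterator variable itself is never a valid "not found" value for a circular list.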
Alex Williamson	5f096b14d4	vfio: Whitelist PCI bridges When determining whether a group is viable, we already allow devices bound to pcieport. Generalize this to include any PCI bridge device. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-27 14:53:04 -06:00
Alex Williamson	4bc94d5dc9	vfio: Fix lockdep issue When we open a device file descriptor, we currently have the following: vfio_group_get_device_fd() mutex_lock(&group->device_lock); open() ... if (ret) release() If we hit that error case, we call the backend driver release path, which for vfio-pci looks like this: vfio_pci_release() vfio_pci_disable() vfio_pci_try_bus_reset() vfio_pci_get_devs() vfio_device_get_from_dev() vfio_group_get_device() mutex_lock(&group->device_lock); Whoops, we've stumbled back onto group.device_lock and created a deadlock. There's a low likelihood of ever seeing this play out, but obviously it needs to be fixed. To do that we can use a reference to the vfio_device for vfio_group_get_device_fd() rather than holding the lock. There was a loop in this function, theoretically allowing multiple devices with the same name, but in practice we don't expect such a thing to happen and the code is already aborting from the loop with break on any sort of error rather than continuing and only parsing the first match anyway, so the loop was effectively unused already. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Fixes: `20f300175a` ("vfio/pci: Fix racy vfio_device_get_from_dev() call") Reported-by: Joerg Roedel <joro@8bytes.org> Tested-by: Joerg Roedel <jroedel@suse.de>	2015-07-24 15:14:04 -06:00
Alex Williamson	20f300175a	vfio/pci: Fix racy vfio_device_get_from_dev() call Testing the driver for a PCI device is racy; it can be all but complete in the release path and still report the driver as ours. Therefore we can't trust drvdata to be valid. This race can sometimes be seen when one port of a multifunction device is being unbound from the vfio-pci driver while another function is being released by the user and attempting a bus reset. The device in the remove path is found as a dependent device for the bus reset of the release path device, the driver is still set to vfio-pci, but the drvdata has already been cleared, resulting in a null pointer dereference. To resolve this, fix vfio_device_get_from_dev() to not take the dev_get_drvdata() shortcut and instead traverse through the iommu_group, vfio_group, vfio_device path to get a reference we can trust. Once we have that reference, we know the device isn't in transition and we can test to make sure the driver is still what we expect, so that we don't interfere with devices we don't own. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-09 10:08:57 -06:00
Alex Williamson	db7d4d7f40	vfio: Fix runaway interruptible timeout Commit `13060b64b8` ("vfio: Add and use device request op for vfio bus drivers") incorrectly makes use of an interruptible timeout. When interrupted, the signal remains pending resulting in subsequent timeouts occurring instantly. This makes the loop spin at a much higher rate than intended. Instead of making this completely non-interruptible, we can change this into a sort of interruptible-once behavior and use the "once" to log debug information. The driver API doesn't allow us to abort and return an error code. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Fixes: `13060b64b8` Cc: stable@vger.kernel.org # v4.0	2015-05-01 16:31:41 -06:00
Alex Williamson	71be3423a6	vfio: Split virqfd into a separate module for vfio bus drivers An unintended consequence of commit `42ac9bd18d` ("vfio: initialize the virqfd workqueue in VFIO generic code") is that the vfio module is renamed to vfio_core so that it can include both vfio and virqfd. That's a user visible change that may break module loading scripts and it imposes eventfd support as a dependency on the core vfio code, which it's really not. virqfd is intended to be provided as a service to vfio bus drivers, so instead of wrapping it into vfio.ko, we can make it a stand-alone module toggled by vfio bus drivers. This has the additional benefit of removing initialization and exit from the core vfio code. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-17 08:33:38 -06:00
Zhen Lei	2f51bf4be9	vfio: put off the allocation of "minor" in vfio_create_group The next code fragment "list_for_each_entry" does not depend on "minor". With this patch, the freeing of "minor" inside "list_for_each_entry" can be dropped, and there is no functional change. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:56 -06:00
Antonios Motakis	42ac9bd18d	vfio: initialize the virqfd workqueue in VFIO generic code Now we have finally completely decoupled virqfd from VFIO_PCI. We can initialize it from the VFIO generic code, in order to safely use it from multiple independent VFIO bus drivers. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:54 -06:00
Alex Williamson	13060b64b8	vfio: Add and use device request op for vfio bus drivers When a request is made to unbind a device from a vfio bus driver, we need to wait for the device to become unused, i.e. for userspace to release the device. However, we have a long standing TODO in the code to do something proactive to make that happen. To enable this, we add a request callback on the vfio bus driver struct, which is intended to signal the user through the vfio device interface to release the device. Instead of passively waiting for the device to become unused, we can now pester the user to give it up. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-10 12:37:47 -07:00
Alex Williamson	4a68810dbb	vfio: Tie IOMMU group reference to vfio group Move the iommu_group reference from the device to the vfio_group. This ensures that the iommu_group persists as long as the vfio_group remains. This can be important if all of the devices from an iommu_group are removed, but we still have an outstanding vfio_group reference; we can still walk the empty list of devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-06 15:05:06 -07:00
Alex Williamson	60720a0fc6	vfio: Add device tracking during unbind There's a small window between the vfio bus driver calling vfio_del_group_dev() and the device being completely unbound where the vfio group appears to be non-viable. This creates a race for users like QEMU/KVM where the kvm-vfio module tries to get an external reference to the group in order to match and release an existing reference, while the device is potentially being removed from the vfio bus driver. If the group is momentarily non-viable, kvm-vfio may not be able to release the group reference until VM shutdown, making the group unusable until that point. Bridge the gap between device removal from the group and completion of the driver unbind by tracking it in a list. The device is added to the list before the bus driver reference is released and removed using the existing unbind notifier. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-06 15:05:06 -07:00
Jean Delvare	8283b4919e	driver core: dev_set_drvdata can no longer fail So there is no point in checking its return value, which will soon disappear. Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-05-27 13:40:51 -07:00
Alex Williamson	88d7ab8949	vfio: Add external user check extension interface This lets us check extensions, particularly VFIO_DMA_CC_IOMMU using the external user interface, allowing KVM to probe IOMMU coherency. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-02-26 11:38:39 -07:00
Alex Williamson	d10999016f	vfio: Convert control interface to misc driver This change allows us to support module auto loading using devname support in userspace tools. With this, /dev/vfio/vfio will always be present and opening it will cause the vfio module to load. This should avoid needing to configure the system to statically load vfio in order to get libvirt to correctly detect support for it. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-12-19 10:17:13 -07:00
Alex Williamson	5d042fbdbb	vfio: Add O_CLOEXEC flag to vfio device fd Add the default O_CLOEXEC flag for device file descriptors. This is generally considered a safer option as it allows the user a race free option to decide whether file descriptors are inherited across exec, with the default avoiding file descriptor leaks. Reported-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-08-22 10:33:41 -06:00
Yann Droneaud	a5d550703d	vfio: use get_unused_fd_flags(0) instead of get_unused_fd() Macro get_unused_fd() is used to allocate a file descriptor with default flags. Those default flags (0) can be "unsafe": O_CLOEXEC must be used by default to not leak file descriptor across exec(). Instead of macro get_unused_fd(), functions anon_inode_getfd() or get_unused_fd_flags() should be used with flags given by userspace. If not possible, flags should be set to O_CLOEXEC to provide userspace with a default safe behavior. In a further patch, get_unused_fd() will be removed so that new code starts using anon_inode_getfd() or get_unused_fd_flags() with correct flags. This patch replaces calls to get_unused_fd() with equivalent call to get_unused_fd_flags(0) to preserve current behavior for existing code. The hard coded flag value (0) should be reviewed on a per-subsystem basis, and, if possible, set to O_CLOEXEC. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://lkml.kernel.org/r/cover.1376327678.git.ydroneaud@opteya.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-08-22 10:20:05 -06:00
Alexey Kardashevskiy	6cdd978213	vfio: add external user support VFIO is designed to be used via ioctls on file descriptors returned by VFIO. However in some situations support for an external user is required. The first user is KVM on PPC64 (SPAPR TCE protocol) which is going to use the existing VFIO groups for exclusive access in real/virtual mode on a host to avoid passing map/unmap requests to the user space which would make things pretty slow. The protocol includes: 1. do normal VFIO init operation: - opening a new container; - attaching group(s) to it; - setting an IOMMU driver for a container. When IOMMU is set for a container, all groups in it are considered ready to use by an external user. 2. User space passes a group fd to an external user. The external user calls vfio_group_get_external_user() to verify that: - the group is initialized; - IOMMU is set for it. If both checks passed, vfio_group_get_external_user() increments the container user counter to prevent the VFIO group from disposal before KVM exits. 3. The external user calls vfio_external_user_iommu_id() to know an IOMMU ID. PPC64 KVM uses it to link logical bus number (LIOBN) with IOMMU ID. 4. When the external KVM finishes, it calls vfio_group_put_external_user() to release the VFIO group. This call decrements the container user counter. Everything gets released. The "vfio: Limit group opens" patch is also required for the consistency. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-08-05 10:52:36 -06:00

73 Commits