linux

Author	SHA1	Message	Date
Kirti Wankhede	c747f08aea	vfio: Introduce vfio_set_irqs_validate_and_prepare() Vendor driver using mediated device framework would use same mechnism to validate and prepare IRQs. Introducing this function to reduce code replication in multiple drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:33:20 -07:00
Kirti Wankhede	c535d34569	vfio_pci: Update vfio_pci to use vfio_info_add_capability() Update msix_sparse_mmap_cap() to use vfio_info_add_capability() Update region type capability to use vfio_info_add_capability() Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:33:20 -07:00
Kirti Wankhede	b3c0a866f1	vfio: Introduce common function to add capabilities Vendor driver using mediated device framework should use vfio_info_add_capability() to add capabilities. Introduced this function to reduce code duplication in vendor drivers. vfio_info_cap_shift() manipulated a data buffer to add an offset to each element in a chain. This data buffer is documented in a uapi header. Changing vfio_info_cap_shift symbol to be available to all drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:33:20 -07:00
Kirti Wankhede	c086de818d	vfio iommu: Add blocking notifier to notify DMA_UNMAP Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers about DMA_UNMAP. Exported two APIs vfio_register_notifier() and vfio_unregister_notifier(). Notifier should be registered, if external user wants to use vfio_pin_pages()/vfio_unpin_pages() APIs to pin/unpin pages. Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate mappings. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:33:07 -07:00
Kirti Wankhede	a54eb55045	vfio iommu type1: Add support for mediated devices VFIO IOMMU drivers are designed for the devices which are IOMMU capable. Mediated device only uses IOMMU APIs, the underlying hardware can be managed by an IOMMU domain. Aim of this change is: - To use most of the code of TYPE1 IOMMU driver for mediated devices - To support direct assigned device and mediated device in single module This change adds pin and unpin support for mediated device to TYPE1 IOMMU backend module. More details: - Domain for external user is tracked separately in vfio_iommu structure. It is allocated when group for first mdev device is attached. - Pages pinned for external domain are tracked in each vfio_dma structure for that iova range. - Page tracking rb-tree in vfio_dma keeps <iova, pfn, ref_count>. Key of rb-tree is iova, but it actually aims to track pfns. - On external pin request for an iova, page is pinned once, if iova is already pinned and tracked, ref_count is incremented. - External unpin request unpins pages only when ref_count is 0. - Pinned pages list is used to find pfn from iova and then unpin it. WARN_ON is added if there are entires in pfn_list while detaching the group and releasing the domain. - Page accounting is updated to account in its address space where the pages are pinned/unpinned, i.e dma->task - Accouting for mdev device is only done if there is no iommu capable domain in the container. When there is a direct device assigned to the container and that domain is iommu capable, all pages are already pinned during DMA_MAP. - Page accouting is updated on hot plug and unplug mdev device and pass through device. Tested by assigning below combinations of devices to a single VM: - GPU pass through only - vGPU device only - One GPU pass through and one vGPU device - Linux VM hot plug and unplug vGPU device while GPU pass through device exist - Linux VM hot plug and unplug GPU pass through device while vGPU device exist Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:25:11 -07:00
Kirti Wankhede	8f0d5bb95f	vfio iommu type1: Add task structure to vfio_dma Add task structure to vfio_dma structure. Task structure is used for: - During DMA_UNMAP, same task who mapped it or other task who shares same address space is allowed to unmap, otherwise unmap fails. QEMU maps few iova ranges initially, then fork threads and from the child thread calls DMA_UNMAP on previously mapped iova. Since child shares same address space, DMA_UNMAP is successful. - Avoid accessing struct mm while process is exiting by acquiring reference of task's mm during page accounting. - It is also used to get task mlock capability and rlimit for mlock. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:25:09 -07:00
Kirti Wankhede	7896c998f0	vfio iommu type1: Add find_iommu_group() function Add find_iommu_group() Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:25:06 -07:00
Kirti Wankhede	ea85cf353e	vfio iommu type1: Update argument of vaddr_get_pfn() Update arguments of vaddr_get_pfn() to take struct mm_struct *mm as input argument. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:25:03 -07:00
Kirti Wankhede	3624a2486c	vfio iommu type1: Update arguments of vfio_lock_acct Added task structure as input argument to vfio_lock_acct() function. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:25:01 -07:00
Kirti Wankhede	2169037dc3	vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops Added APIs for pining and unpining set of pages. These call back into backend iommu module to actually pin and unpin pages. Added two new callback functions to struct vfio_iommu_driver_ops. Backend IOMMU module that supports pining and unpinning pages for mdev devices should provide these functions. Renamed static functions in vfio_type1_iommu.c to resolve conflicts Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:24:58 -07:00
Kirti Wankhede	32f55d835b	vfio: Common function to increment container_users This change rearrange functions to have common function to increment container_users Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:24:55 -07:00
Kirti Wankhede	7ed3ea8a71	vfio: Rearrange functions to get vfio_group from dev This patch rearranges functions to get vfio_group from device Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:24:52 -07:00
Kirti Wankhede	fa3da00cb8	vfio: VFIO based driver for Mediated devices vfio_mdev driver registers with mdev core driver. mdev core driver creates mediated device and calls probe routine of vfio_mdev driver for each device. Probe routine of vfio_mdev driver adds mediated device to VFIO core module This driver forms a shim layer that pass through VFIO devices operations to vendor driver for mediated devices. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:24:50 -07:00
Kirti Wankhede	7b96953bc6	vfio: Mediated device Core driver Design for Mediated Device Driver: Main purpose of this driver is to provide a common interface for mediated device management that can be used by different drivers of different devices. This module provides a generic interface to create the device, add it to mediated bus, add device to IOMMU group and then add it to vfio group. Below is the high Level block diagram, with Nvidia, Intel and IBM devices as example, since these are the devices which are going to actively use this module as of now. +---------------+ \| \| \| +-----------+ \| mdev_register_driver() +--------------+ \| \| \| +<------------------------+ __init() \| \| \| mdev \| \| \| \| \| \| bus \| +------------------------>+ \|<-> VFIO user \| \| driver \| \| probe()/remove() \| vfio_mdev.ko \| APIs \| \| \| \| \| \| \| +-----------+ \| +--------------+ \| \| \| MDEV CORE \| \| MODULE \| \| mdev.ko \| \| +-----------+ \| mdev_register_device() +--------------+ \| \| \| +<------------------------+ \| \| \| \| \| \| nvidia.ko \|<-> physical \| \| \| +------------------------>+ \| device \| \| \| \| callback +--------------+ \| \| Physical \| \| \| \| device \| \| mdev_register_device() +--------------+ \| \| interface \| \|<------------------------+ \| \| \| \| \| \| i915.ko \|<-> physical \| \| \| +------------------------>+ \| device \| \| \| \| callback +--------------+ \| \| \| \| \| \| \| \| mdev_register_device() +--------------+ \| \| \| +<------------------------+ \| \| \| \| \| \| ccw_device.ko\|<-> physical \| \| \| +------------------------>+ \| device \| \| \| \| callback +--------------+ \| +-----------+ \| +---------------+ Core driver provides two types of registration interfaces: 1. Registration interface for mediated bus driver: /** * struct mdev_driver - Mediated device's driver * @name: driver name * @probe: called when new device created * @remove:called when device removed * @driver:device driver structure * */ struct mdev_driver { const char name; int (probe) (struct device dev); void (remove) (struct device dev); struct device_driver driver; }; Mediated bus driver for mdev device should use this interface to register and unregister with core driver respectively: int mdev_register_driver(struct mdev_driver drv, struct module owner); void mdev_unregister_driver(struct mdev_driver drv); Mediated bus driver is responsible to add/delete mediated devices to/from VFIO group when devices are bound and unbound to the driver. 2. Physical device driver interface This interface provides vendor driver the set APIs to manage physical device related work in its driver. APIs are : dev_attr_groups: attributes of the parent device. * mdev_attr_groups: attributes of the mediated device. * supported_type_groups: attributes to define supported type. This is mandatory field. * create: to allocate basic resources in vendor driver for a mediated device. This is mandatory to be provided by vendor driver. * remove: to free resources in vendor driver when mediated device is destroyed. This is mandatory to be provided by vendor driver. * open: open callback of mediated device * release: release callback of mediated device * read : read emulation callback. * write: write emulation callback. * ioctl: ioctl callback. * mmap: mmap emulation callback. Drivers should use these interfaces to register and unregister device to mdev core driver respectively: extern int mdev_register_device(struct device dev, const struct parent_ops ops); extern void mdev_unregister_device(struct device *dev); There are no locks to serialize above callbacks in mdev driver and vfio_mdev driver. If required, vendor driver can have locks to serialize above APIs in their driver. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-11-17 08:24:48 -07:00
Vlad Tsyrklevich	05692d7005	vfio/pci: Fix integer overflows, bitmask check The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize user-supplied integers, potentially allowing memory corruption. This patch adds appropriate integer overflow checks, checks the range bounds for VFIO_IRQ_SET_DATA_NONE, and also verifies that only single element in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set. VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in vfio_pci_set_irqs_ioctl(). Furthermore, a kzalloc is changed to a kcalloc because the use of a kzalloc with an integer multiplication allowed an integer overflow condition to be reached without this patch. kcalloc checks for overflow and should prevent a similar occurrence. Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-26 13:49:29 -06:00
Christoph Hellwig	61771468e0	vfio_pci: use pci_alloc_irq_vectors Simplify the interrupt setup by using the new PCI layer helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-09-29 13:36:38 -06:00
Alex Williamson	c93a97ee05	vfio-pci: Disable INTx after MSI/X teardown The MSI/X shutdown path can gratuitously enable INTx, which is not something we want to happen if we're dealing with broken INTx device. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-09-26 13:52:19 -06:00
Alex Williamson	ddf9dc0eb5	vfio-pci: Virtualize PCIe & AF FLR We use a BAR restore trick to try to detect when a user has performed a device reset, possibly through FLR or other backdoors, to put things back into a working state. This is important for backdoor resets, but we can actually just virtualize the "front door" resets provided via PCIe and AF FLR. Set these bits as virtualized + writable, allowing the default write to set them in vconfig, then we can simply check the bit, perform an FLR of our own, and clear the bit. We don't actually have the granularity in PCI to specify the type of reset we want to do, but generally devices don't implement both PCIe and AF FLR and we'll favor these over other types of reset, so we should generally lineup. We do test whether the device provides the requested FLR type to stay consistent with hardware capabilities though. This seems to fix several instance of devices getting into bad states with userspace drivers, like dpdk, running inside a VM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Greg Rose <grose@lightfleet.com>	2016-09-26 13:52:16 -06:00
Baoyou Xie	2e06285655	vfio: platform: mark symbols static where possible We get a few warnings when building kernel with W=1: drivers/vfio/platform/vfio_platform_common.c:76:5: warning: no previous prototype for 'vfio_platform_acpi_call_reset' [-Wmissing-prototypes] drivers/vfio/platform/vfio_platform_common.c:98:6: warning: no previous prototype for 'vfio_platform_acpi_has_reset' [-Wmissing-prototypes] drivers/vfio/platform/vfio_platform_common.c:640:5: warning: no previous prototype for 'vfio_platform_of_probe' [-Wmissing-prototypes] drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:59:5: warning: no previous prototype for 'vfio_platform_amdxgbe_reset' [-Wmissing-prototypes] drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c:60:5: warning: no previous prototype for 'vfio_platform_calxedaxgmac_reset' [-Wmissing-prototypes] .... In fact, these functions are only used in the file in which they are declared and don't need a declaration, but can be made static. so this patch marks these functions with 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-09-13 16:11:37 -06:00
Wei Jiangang	8138dabbab	vfio/pci: Fix typos in comments Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-08-29 12:39:09 -06:00
Alex Williamson	c8952a7075	vfio/pci: Fix NULL pointer oops in error interrupt setup handling There are multiple cases in vfio_pci_set_ctx_trigger_single() where we assume we can safely read from our data pointer without actually checking whether the user has passed any data via the count field. VFIO_IRQ_SET_DATA_NONE in particular is entirely broken since we attempt to pull an int32_t file descriptor out before even checking the data type. The other data types assume the data pointer contains one element of their type as well. In part this is good news because we were previously restricted from doing much sanitization of parameters because it was missed in the past and we didn't want to break existing users. Clearly DATA_NONE is completely broken, so it must not have any users and we can fix it up completely. For DATA_BOOL and DATA_EVENTFD, we'll just protect ourselves, returning error when count is zero since we previously would have oopsed. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reported-by: Chris Thompson <the_cartographer@hotmail.com> Cc: stable@vger.kernel.org Reviewed-by: Eric Auger <eric.auger@redhat.com>	2016-08-08 16:16:23 -06:00
Sinan Kaya	0991bbdbf5	vfio: platform: check reset call return code during release Release call is ignoring the return code from reset call and can potentially continue even though reset call failed. If reset_required module parameter is set, this patch is going to validate the return code and will cause stack dump with WARN_ON and warn the user of failure. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:54:45 -06:00
Sinan Kaya	e99442323f	vfio: platform: check reset call return code during open Open call is ignoring the return code from reset call and can potentially continue even though reset call failed. If reset_required module parameter is set, this patch is going to validate the return code and will abort open if reset fails. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:54:45 -06:00
Sinan Kaya	b5add544d6	vfio, platform: make reset driver a requirement by default The code was allowing platform devices to be used without a supporting VFIO reset driver. The hardware can be left in some inconsistent state after a guest machine abort. The reset driver will put the hardware back to safe state and disable interrupts before returning the control back to the host machine. Adding a new reset_required kernel module option to platform VFIO drivers. The default value is true for the DT and ACPI based drivers. The reset requirement value for AMBA drivers is set to false and is unchangeable to maintain the existing functionality. New requirements are: 1. A reset function needs to be implemented by the corresponding driver via DT/ACPI. 2. The reset function needs to be discovered via DT/ACPI. The probe of the driver will fail if any of the above conditions are not satisfied. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:46 -06:00
Sinan Kaya	d30daa33ec	vfio: platform: call _RST method when using ACPI The device tree code checks for the presence of a reset driver and calls the of_reset function pointer by looking up the reset driver as a module. ACPI defines _RST method to perform device level reset. After the _RST method is executed, the OS can resume using the device. _RST method is expected to stop DMA transfers and IRQs. This patch introduces two functions as vfio_platform_acpi_has_reset and vfio_platform_acpi_call_reset. The has reset method is used to declare reset capability via the ioctl flag VFIO_DEVICE_FLAGS_RESET. The call reset function is used to execute the _RST ACPI method. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:44 -06:00
Sinan Kaya	5afec27474	vfio: platform: add extra debug info argument to call reset Getting ready to bring out extra debug information to the caller so that more verbose information can be printed when an error is observed. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:43 -06:00
Sinan Kaya	a12a9368e1	vfio: platform: add support for ACPI probe The code is using the compatible DT string to associate a reset driver with the actual device itself. The compatible string does not exist on ACPI based systems. HID is the unique identifier for a device driver instead. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:41 -06:00
Sinan Kaya	dc5542fb11	vfio: platform: determine reset capability Creating a new function to determine if this driver supports reset function or not. This is an attempt to abstract device tree calls from the rest of the code. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:40 -06:00
Sinan Kaya	f084aa7495	vfio: platform: move reset call to a common function The reset call sequence seems to replicate itself multiple times across the file. Grouping them together for maintenance reasons. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:38 -06:00
Sinan Kaya	7aef80cf31	vfio: platform: rename reset function Renaming the reset function to of_reset as it is only used by the device tree based platforms. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:36 -06:00
Ilya Lesokhin	d370c917b9	vfio: fix possible use after free of vfio group The vfio group should be released after the vfio_group_try_dissolve_container call. The code should not rely on someone else to hold a reference on the group. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-14 14:28:16 -06:00
Yongji Xie	05f0c03fba	vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive Current vfio-pci implementation disallows to mmap sub-page(size < PAGE_SIZE) MMIO BARs because these BARs' mmio page may be shared with other BARs. This will cause some performance issues when we passthrough a PCI device with this kind of BARs. Guest will be not able to handle the mmio accesses to the BARs which leads to mmio emulations in host. However, not all sub-page BARs will share page with other BARs. We should allow to mmap the sub-page MMIO BARs which we can make sure will not share page with other BARs. This patch adds support for this case. And we try to add a dummy resource to reserve the remainder of the page which hot-add device's BAR might be assigned into. But it's not necessary to handle the case when the BAR is not page aligned. Because we can't expect the BAR will be assigned into the same location in a page in guest when we passthrough the BAR. And it's hard to access this BAR in userspace because we have no way to get the BAR's location in a page. Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-08 10:06:04 -06:00
Peng Fan	9698cbf0be	vfio: platform: support No-IOMMU mode The vfio No-IOMMU mode was supported by this 'commit `03a76b60f8` ("vfio: Include No-IOMMU mode")', but it only support vfio-pci. Using vfio_iommu_group_get/put, but not iommu_group_get/put, the platform devices can be exposed to userspace with CONFIG_VFIO_NOIOMMU and the "enable_unsafe_noiommu_mode" option enabled. From 'commit `03a76b60f8` ("vfio: Include No-IOMMU mode")', "This should make it very clear that this mode is not safe. Additionally, CAP_SYS_RAWIO privileges are necessary to work with groups and containers using this mode. Groups making use of this support are named /dev/vfio/noiommu-$GROUP and can only make use of the special VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically binding a device without a native IOMMU group to a VFIO bus driver will taint the kernel and should therefore not be considered supported." Signed-off-by: Peng Fan <van.freenix@gmail.com> Cc: Eric Auger <eric.auger@linaro.org> Cc: Baptiste Reynal <b.reynal@virtualopensystems.com> Cc: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-06-23 09:37:17 -06:00
Alex Williamson	ce7585f3c4	vfio/pci: Allow VPD short read The size of the VPD area is not necessarily 4-byte aligned, so a pci_vpd_read() might return less than 4 bytes. Zero our buffer and accept anything other than an error. Intel X710 NICs exercise this. Fixes: `4e1a635552` ("vfio/pci: Use kernel VPD access functions") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-05-31 21:25:52 -06:00
Alex Williamson	089f1c6b2d	vfio/type1: Fix build warning This function cannot actually be called with npage = 0, so in practice this doesn't return an uninitialized value. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-05-30 07:58:10 -06:00
Alex Williamson	956b56a984	vfio/pci: Fix ordering of eventfd vs virqfd shutdown Both the INTx and MSI/X disable paths do an eventfd_ctx_put() for the trigger eventfd before calling vfio_virqfd_disable() any potential mask and unmask eventfds. This opens a use-after-free race where an inopportune irqfd can reference the freed signalling eventfd. Reorder to avoid this possibility. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-05-30 07:50:10 -06:00
Linus Torvalds	48dd7cefa0	VFIO updates for v4.7-rc1 - Hide INTx on certain known broken devices (Alex Williamson) - Additional backdoor reset detection (Alex Williamson) - Remove unused iommudata reference (Alexey Kardashevskiy) - Use cfg_size to avoid probing extended config space (Alexey Kardashevskiy) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJXRc8hAAoJECObm247sIsigWAP/R+q+UnOHXd7cvSKpI33p2IP ur09xwV9GktQViWz4tclFdfk8cScq86UX8I5e/jkeFVCXT7e5FU7FhKqJi7dPbp1 Yh3g5fGUkk6a0B3gebIq3qY02aLwJF17WZg0R/KHkQT0B/yQuTQqdF37Xh9rGv0S c7sgQn1X5nmO7jVDYYKO3SwEgmZWAcBz4Ht2XpqCdSrmhsse9OxlnmOkieM5BNUz rQhnziaeJE/Ya+y/A74XicbGforyThvNyJs/anJnPEE89773SGVQy/Jdlz4Lwji7 XImPuj4AT9duTgX9HaD38xIpFOsAKDfZ6sClsICkIvhbs232UXuiMxcPszDA97c0 7MxSJcLVr9fKB6zatq2JWhGDp7C6ylUxapEI9PFCV6gE5OYZRM7KD+hm32oVnuPv rSzPoqnm0Sudu8SO6n46QUZAifp+mX9MNhqzkXGR/YlBHhB1L3/QcczyekY0eBbj vJ2htebrg0qNQn4G9n4ygMm19r53ew/Q+pO2y7y4TdOr+gZNW1Wj7uOezMpvDOOB hiy+HkJ24MCTfAGNgjpjjCot/o608+QT5H+y8SR7vT1IK4shOSE4rYfvo6jlzRQp 9FRTolGhYqrtih+zB5R7eLghtvlDp4lN0gDCSgWGHM7e2rMLxaomzoF1VNQ8eKJZ iUJZ8jE2QrdGDpKssRuF =fGnT -----END PGP SIGNATURE----- Merge tag 'vfio-v4.7-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Hide INTx on certain known broken devices (Alex Williamson) - Additional backdoor reset detection (Alex Williamson) - Remove unused iommudata reference (Alexey Kardashevskiy) - Use cfg_size to avoid probing extended config space (Alexey Kardashevskiy) * tag 'vfio-v4.7-rc1' of git://github.com/awilliam/linux-vfio: vfio_pci: Test for extended capabilities if config space > 256 bytes vfio_iommu_spapr_tce: Remove unneeded iommu_group_get_iommudata vfio/pci: Add test for BAR restore vfio/pci: Hide broken INTx support from user	2016-05-25 09:47:26 -07:00
Linus Torvalds	c04a588029	powerpc updates for 4.7 Highlights: - Support for Power ISA 3.0 (Power9) Radix Tree MMU from Aneesh Kumar K.V - Live patching support for ppc64le (also merged via livepatching.git) Various cleanups & minor fixes from: - Aaro Koskinen, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Chris Smart, Daniel Axtens, Frederic Barrat, Gavin Shan, Ian Munsie, Lennart Sorensen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael Ellerman, Oliver O'Halloran, Paul Gortmaker, Paul Mackerras, Rashmica Gupta, Russell Currey, Suraj Jitindar Singh, Thiago Jung Bauermann, Valentin Rothberg, Vipin K Parashar. General: - Update LMB associativity index during DLPAR add/remove from Nathan Fontenot - Fix branching to OOL handlers in relocatable kernel from Hari Bathini - Add support for userspace Power9 copy/paste from Chris Smart - Always use STRICT_MM_TYPECHECKS from Michael Ellerman - Add mask of possible MMU features from Michael Ellerman PCI: - Enable pass through of NVLink to guests from Alexey Kardashevskiy - Cleanups in preparation for powernv PCI hotplug from Gavin Shan - Don't report error in eeh_pe_reset_and_recover() from Gavin Shan - Restore initial state in eeh_pe_reset_and_recover() from Gavin Shan - Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" from Guilherme G. Piccoli - Remove the dependency on EEH struct in DDW mechanism from Guilherme G. Piccoli selftests: - Test cp_abort during context switch from Chris Smart - Add several tests for transactional memory support from Rashmica Gupta perf: - Add support for sampling interrupt register state from Anju T - Add support for unwinding perf-stackdump from Chandan Kumar cxl: - Configure the PSL for two CAPI ports on POWER8NVL from Philippe Bergheaud - Allow initialization on timebase sync failures from Frederic Barrat - Increase timeout for detection of AFU mmio hang from Frederic Barrat - Handle num_of_processes larger than can fit in the SPA from Ian Munsie - Ensure PSL interrupt is configured for contexts with no AFU IRQs from Ian Munsie - Add kernel API to allow a context to operate with relocate disabled from Ian Munsie - Check periodically the coherent platform function's state from Christophe Lombard Freescale: - Updates from Scott: "Contains 86xx fixes, minor device tree fixes, an erratum workaround, and a kconfig dependency fix." -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXPsGzAAoJEFHr6jzI4aWAVoAP/iKdrDe0eYHlVAE9SqnbsiZs lgDxdsC8P3fsmP1G9o/HkKhC82zHl/La8Ztz8dtqa+LkSzbfliWP1ztJsI7GsBFo tyCKzWnX9Rwvd3meHu/o/SQ29TNLm/PbPyyRqpj5QPbJ8XCXkAXR7ZZZqjvcMsJW /AgIr7Cgf53tl9oZzzl/c7CnNHhMq+NBdA71vhWtUx+T97wfJEGyKW6HhZyHDbEU iAki7fu77ZpEqC/Fh9swf0dCGBJ+a132NoMVo0AdV7EQLznUYlQpQEqa+1PyHZOP /ArOzf2mDg6m3PfCo1eiB07v8PnVZ3llEUbVAJNg3GUxbE4SHrqq/kwm0iElm3p/ DvFxerCwdX9vmskJX4wDs+pSZRabXYj9XVMptsgFzA4joWrqqb7mBHqaort88YcY YSljEt1bHyXmiJ+dBya40qARsWUkCVN7ZgEzdxckq0KI3w7g2tqpqIbO2lClWT6t B3GpqQ4jp34+d1M14FB91fIGK7tMvOhSInE0Mv9+tPvRsepXqiiU/SwdAtRlr3m2 zs/K+4FYcVjJ3Rmpgc+tI38PbZxHe212I35YN6L1LP+4ZfAtzz0NyKdooTIBtkbO 19pX4WbBjKq8zK+YutrySncBIrbnI6VjW51vtRhgVKZliPFO/6zKagyU6FbxM+E5 udQES+t3F/9gvtxgxtDe =YvyQ -----END PGP SIGNATURE----- Merge tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - Support for Power ISA 3.0 (Power9) Radix Tree MMU from Aneesh Kumar K.V - Live patching support for ppc64le (also merged via livepatching.git) Various cleanups & minor fixes from: - Aaro Koskinen, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Chris Smart, Daniel Axtens, Frederic Barrat, Gavin Shan, Ian Munsie, Lennart Sorensen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael Ellerman, Oliver O'Halloran, Paul Gortmaker, Paul Mackerras, Rashmica Gupta, Russell Currey, Suraj Jitindar Singh, Thiago Jung Bauermann, Valentin Rothberg, Vipin K Parashar. General: - Update LMB associativity index during DLPAR add/remove from Nathan Fontenot - Fix branching to OOL handlers in relocatable kernel from Hari Bathini - Add support for userspace Power9 copy/paste from Chris Smart - Always use STRICT_MM_TYPECHECKS from Michael Ellerman - Add mask of possible MMU features from Michael Ellerman PCI: - Enable pass through of NVLink to guests from Alexey Kardashevskiy - Cleanups in preparation for powernv PCI hotplug from Gavin Shan - Don't report error in eeh_pe_reset_and_recover() from Gavin Shan - Restore initial state in eeh_pe_reset_and_recover() from Gavin Shan - Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" from Guilherme G Piccoli - Remove the dependency on EEH struct in DDW mechanism from Guilherme G Piccoli selftests: - Test cp_abort during context switch from Chris Smart - Add several tests for transactional memory support from Rashmica Gupta perf: - Add support for sampling interrupt register state from Anju T - Add support for unwinding perf-stackdump from Chandan Kumar cxl: - Configure the PSL for two CAPI ports on POWER8NVL from Philippe Bergheaud - Allow initialization on timebase sync failures from Frederic Barrat - Increase timeout for detection of AFU mmio hang from Frederic Barrat - Handle num_of_processes larger than can fit in the SPA from Ian Munsie - Ensure PSL interrupt is configured for contexts with no AFU IRQs from Ian Munsie - Add kernel API to allow a context to operate with relocate disabled from Ian Munsie - Check periodically the coherent platform function's state from Christophe Lombard Freescale: - Updates from Scott: "Contains 86xx fixes, minor device tree fixes, an erratum workaround, and a kconfig dependency fix." * tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (192 commits) powerpc/86xx: Fix PCI interrupt map definition powerpc/86xx: Move pci1 definition to the include file powerpc/fsl: Fix build of the dtb embedded kernel images powerpc/fsl: Fix rcpm compatible string powerpc/fsl: Remove FSL_SOC dependency from FSL_LBC powerpc/fsl-pci: Add a workaround for PCI 5 errata powerpc/fsl: Fix SPI compatible on t208xrdb and t1040rdb powerpc/powernv/npu: Add PE to PHB's list powerpc/powernv: Fix insufficient memory allocation powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" powerpc/eeh: Drop unnecessary label in eeh_pe_change_owner() powerpc/eeh: Ignore handlers in eeh_pe_reset_and_recover() powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover() powerpc/eeh: Don't report error in eeh_pe_reset_and_recover() Revert "powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()" powerpc/powernv/npu: Enable NVLink pass through powerpc/powernv/npu: Rework TCE Kill handling powerpc/powernv/npu: Add set/unset window helpers powerpc/powernv/ioda2: Export debug helper pe_level_printk() ...	2016-05-20 10:12:41 -07:00
Alexey Kardashevskiy	f705528094	vfio_pci: Test for extended capabilities if config space > 256 bytes PCI-Express spec says that reading 4 bytes at offset 100h should return zero if there is no extended capability so VFIO reads this dword to know if there are extended capabilities. However it is not always possible to access the extended space so generic PCI code in pci_cfg_space_size_ext() checks if pci_read_config_dword() can read beyond 100h and if the check fails, it sets the config space size to 100h. VFIO does its own extended capabilities check by reading at offset 100h which may produce 0xffffffff which VFIO treats as the extended config space presense and calls vfio_ecap_init() which fails to parse capabilities (which is expected) but right before the exit, it writes zero at offset 100h which is beyond the buffer allocated for vdev->vconfig (which is 256 bytes) which leads to random memory corruption. This makes VFIO only check for the extended capabilities if the discovered config size is more than 256 bytes. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-05-19 15:04:40 -06:00
Alexey Kardashevskiy	54de285beb	vfio/spapr: Relax the IOMMU compatibility check We are going to have multiple different types of PHB on the same system with POWER8 + NVLink and PHBs will have different IOMMU ops. However we only really care about one callback - create_table - so we can relax the compatibility check here. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:27 +10:00
Robin Murphy	d16e0faab9	iommu: Allow selecting page sizes per domain Many IOMMUs support multiple page table formats, meaning that any given domain may only support a subset of the hardware page sizes presented in iommu_ops->pgsize_bitmap. There are also certain use-cases where the creator of a domain may want to control which page sizes are used, for example to force the use of hugepage mappings to reduce pagetable walk depth. To this end, add a per-domain pgsize_bitmap to represent the subset of page sizes actually in use, to make it possible for domains with different requirements to coexist. Signed-off-by: Will Deacon <will.deacon@arm.com> [rm: hijacked and rebased original patch with new commit message] Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2016-05-09 15:33:29 +02:00
Alexey Kardashevskiy	5ed4aba126	vfio_iommu_spapr_tce: Remove unneeded iommu_group_get_iommudata This removes iommu_group_get_iommudata() as the result is never used. As this is a minor cleanup, no change in behavior is expected. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-04-28 11:12:41 -06:00
Alex Williamson	dc92810997	vfio/pci: Add test for BAR restore If a device is reset without the memory or i/o bits enabled in the command register we may not detect it, potentially leaving the device without valid BAR programming. Add an additional test to check the BARs on each write to the command register. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-04-28 11:12:33 -06:00
Alex Williamson	450744051d	vfio/pci: Hide broken INTx support from user INTx masking has two components, the first is that we need the ability to prevent the device from continuing to assert INTx. This is provided via the DisINTx bit in the command register and is the only thing we can really probe for when testing if INTx masking is supported. The second component is that the device needs to indicate if INTx is asserted via the interrupt status bit in the device status register. With these two features we can generically determine if one of the devices we own is asserting INTx, signal the user, and mask the interrupt while the user services the device. Generally if one or both of these components is broken we resort to APIC level interrupt masking, which requires an exclusive interrupt since we have no way to determine the source of the interrupt in a shared configuration. This often makes it difficult or impossible to configure the system for userspace use of the device, for an interrupt mode that the user may not need. One possible configuration of broken INTx masking is that the DisINTx support is fully functional, but the interrupt status bit never signals interrupt assertion. In this case we do have the ability to prevent the device from asserting INTx, but lack the ability to identify the interrupt source. For this case we can simply pretend that the device lacks INTx support entirely, keeping DisINTx set on the physical device, virtualizing this bit for the user, and virtualizing the interrupt pin register to indicate no INTx support. We already support virtualization of the DisINTx bit and already virtualize the interrupt pin for platforms without INTx support. By tying these components together, setting DisINTx on open and reset, and identifying devices broken in this particular way, we can provide support for them w/o the handicap of APIC level INTx masking. Intel i40e (XL710/X710) 10/20/40GbE NICs have been identified as being broken in this specific way. We leave the vfio-pci.nointxmask option as a mechanism to bypass this support, enabling INTx on the device with all the requirements of APIC level masking. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: John Ronciak <john.ronciak@intel.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>	2016-04-28 11:12:27 -06:00
Linus Torvalds	45cb5230f8	VFIO updates for v4.6-rc1 Various enablers for assignment of Intel graphics devices and future support of vGPU devices (Alex Williamson). This includes - Handling the vfio type1 interface as an API rather than a specific implementation, allowing multiple type1 providers. - Capability chains, similar to PCI device capabilities, that allow extending ioctls. Extensions here include device specific regions and sparse mmap descriptions. The former is used to expose non-PCI regions for IGD, including the OpRegion (particularly the Video BIOS Table), and read only PCI config access to the host and LPC bridge as drivers often depend on identifying those devices. Sparse mmaps here are used to describe the MSIx vector table, which vfio has always protected from mmap, but never had an API to explicitly define that protection. In future vGPU support this is expected to allow the description of PCI BARs that may mix direct access and emulated access within a single region. - The ability to expose the shadow ROM as an option ROM as IGD use cases may rely on the ROM even though the physical device does not make use of a PCI option ROM BAR. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJW6aT1AAoJECObm247sIsiiP4P/1xf7Z08/2QWVFQzex9CLcZk +/iJlyb/fTpPVQE+NTKPz3Qh5h6ZhSd/57s85IUqq0T6tgVPkoGx8kkyCjBaw2y1 yMezXZlQqJdZqGzQNI4OiHWvO+/vGxYKjQMfUnMlDM6dJgz4lGncGFoSouFPa3Vp mB12hGxrlk1cfIdb+C1KbfZcEdS0WhtigQtz8flBKgOfO+hYWmUO+CClJBhVw8Z4 RNcWNAxFfLuwUPVsPb6uOLG2g65SC2vmQ9k0Tnknf1znV3PFFVjITf0aM6uChLNP S3SgqtPX+6yOFyCuSEs8UKhhmCbeQmAyKgt5BpxV3Rw3OMP4PsVAehr82vQmSj6g 2o96pR2s8MDPBr8eG7gdRe4DQe3PonpLkpDfaghcpYqhkGEqNVeW5/GjiOzGQqD3 xMshzxJ1Iz7DOHkQRUVqOfupDB0TusJmTVKwvXe6yIYL9pjkUS/sbN9U563HYSES JTV68TMj0VKfKwD3XKYXvGH3km1sL4i5NMlAUrsDtsMkGlXEswoGbj82Mjc8+jUo BvWQTJb+kouJQ88VhsO2abg1UrO9E6u82iHFHy9fEObxE8KH7pvROlS93ihMT1Wv WQNuUcltdpHMRVX0BDknaPs3YtC3/TGgm3RcU5SZPbv/ys1471ZmJxMlAAKcfITr SuvkMTYElF5b1pigv46c =/lJn -----END PGP SIGNATURE----- Merge tag 'vfio-v4.6-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: "Various enablers for assignment of Intel graphics devices and future support of vGPU devices (Alex Williamson). This includes - Handling the vfio type1 interface as an API rather than a specific implementation, allowing multiple type1 providers. - Capability chains, similar to PCI device capabilities, that allow extending ioctls. Extensions here include device specific regions and sparse mmap descriptions. The former is used to expose non-PCI regions for IGD, including the OpRegion (particularly the Video BIOS Table), and read only PCI config access to the host and LPC bridge as drivers often depend on identifying those devices. Sparse mmaps here are used to describe the MSIx vector table, which vfio has always protected from mmap, but never had an API to explicitly define that protection. In future vGPU support this is expected to allow the description of PCI BARs that may mix direct access and emulated access within a single region. - The ability to expose the shadow ROM as an option ROM as IGD use cases may rely on the ROM even though the physical device does not make use of a PCI option ROM BAR" * tag 'vfio-v4.6-rc1' of git://github.com/awilliam/linux-vfio: vfio/pci: return -EFAULT if copy_to_user fails vfio/pci: Expose shadow ROM as PCI option ROM vfio/pci: Intel IGD host and LCP bridge config space access vfio/pci: Intel IGD OpRegion support vfio/pci: Enable virtual register in PCI config space vfio/pci: Add infrastructure for additional device specific regions vfio: Define device specific region type capability vfio/pci: Include sparse mmap capability for MSI-X table regions vfio: Define sparse mmap capability for regions vfio: Add capability chain helpers vfio: Define capability chains vfio: If an IOMMU backend fails, keep looking vfio/pci: Fix unsigned comparison overflow	2016-03-17 13:05:09 -07:00
Michael S. Tsirkin	8160c4e455	vfio: fix ioctl error handling Calling return copy_to_user(...) in an ioctl will not do the right thing if there's a pagefault: copy_to_user returns the number of bytes not copied in this case. Fix up vfio to do return copy_to_user(...)) ? -EFAULT : 0; everywhere. Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-28 07:38:52 -07:00
Dan Carpenter	c4aec31013	vfio/pci: return -EFAULT if copy_to_user fails The copy_to_user() function returns the number of bytes that were not copied but we want to return -EFAULT on error here. Fixes: `188ad9d6cb` ('vfio/pci: Include sparse mmap capability for MSI-X table regions') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-25 21:48:42 -07:00
Alex Williamson	a13b645917	vfio/pci: Expose shadow ROM as PCI option ROM Integrated graphics may have their ROM shadowed at 0xc0000 rather than implement a PCI option ROM. Make this ROM appear to the user using the ROM BAR. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-22 16:10:09 -07:00
Alex Williamson	f572a960a1	vfio/pci: Intel IGD host and LCP bridge config space access Provide read-only access to PCI config space of the PCI host bridge and LPC bridge through device specific regions. This may be used to configure a VM with matching register contents to satisfy driver requirements. Providing this through the vfio file descriptor removes an additional userspace requirement for access through pci-sysfs and removes the CAP_SYS_ADMIN requirement that doesn't appear to apply to the specific devices we're accessing. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-22 16:10:09 -07:00
Alex Williamson	5846ff54e8	vfio/pci: Intel IGD OpRegion support This is the first consumer of vfio device specific resource support, providing read-only access to the OpRegion for Intel graphics devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-22 16:10:09 -07:00
Alex Williamson	345d710491	vfio/pci: Enable virtual register in PCI config space Typically config space for a device is mapped out into capability specific handlers and unassigned space. The latter allows direct read/write access to config space. Sometimes we know about registers living in this void space and would like an easy way to virtualize them, similar to how BAR registers are managed. To do this, create one more pseudo (fake) PCI capability to be handled as purely virtual space. Reads and writes are serviced entirely from virtual config space. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-22 16:10:09 -07:00
Alex Williamson	28541d41c9	vfio/pci: Add infrastructure for additional device specific regions Add support for additional regions with indexes started after the already defined fixed regions. Device specific code can register these regions with the new vfio_pci_register_dev_region() function. The ops structure per region currently only includes read/write access and a release function, allowing automatic cleanup when the device is closed. mmap support is only missing here because it's not needed by the first user queued for this support. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-22 16:10:09 -07:00
Alex Williamson	188ad9d6cb	vfio/pci: Include sparse mmap capability for MSI-X table regions vfio-pci has never allowed the user to directly mmap the MSI-X vector table, but we've always relied on implicit knowledge of the user that they cannot do this. Now that we have capability chains that we can expose in the region info ioctl and a sparse mmap capability that represents the sub-areas within the region that can be mmap'd, we can make the mmap constraints more explicit. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-22 16:10:09 -07:00
Alex Williamson	d7a8d5ed87	vfio: Add capability chain helpers Allow sub-modules to easily reallocate a buffer for managing capability chains for info ioctls. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-22 16:10:08 -07:00
Alex Williamson	7c435b46c2	vfio: If an IOMMU backend fails, keep looking Consider an IOMMU to be an API rather than an implementation, we might have multiple implementations supporting the same API, so try another if one fails. The expectation here is that we'll really only have one implementation per device type. For instance the existing type1 driver works with any PCI device where the IOMMU API is available. A vGPU vendor may have a virtual PCI device which provides DMA isolation and mapping through other mechanisms, but can re-use userspaces that make use of the type1 VFIO IOMMU API. This allows that to work. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-22 16:10:08 -07:00
Alex Williamson	b95d9305e8	vfio/pci: Fix unsigned comparison overflow Signed versus unsigned comparisons are implicitly cast to unsigned, which result in a couple possible overflows. For instance (start + count) might overflow and wrap, getting through our validation test. Also when unwinding setup, -1 being compared as unsigned doesn't produce the intended stop condition. Fix both of these and also fix vfio_msi_set_vector_signal() to validate parameters before using the vector index, though none of the callers should pass bad indexes anymore. Reported-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-22 16:03:54 -07:00
Alex Williamson	16ab8a5cbe	vfio/noiommu: Don't use iommu_present() to track fake groups Using iommu_present() to determine whether an IOMMU group is real or fake has some problems. First, apparently Power systems don't register an IOMMU on the device bus, so the groups and containers get marked as noiommu and then won't bind to their actual IOMMU driver. Second, I expect we'll run into the same issue as we try to support vGPUs through vfio, since they're likely to emulate this behavior of creating an IOMMU group on a virtual device and then providing a vfio IOMMU backend tailored to the sort of isolation they provide, which won't necessarily be fully compatible with the IOMMU API. The solution here is to use the existing iommudata interface to IOMMU groups, which allows us to easily identify the fake groups we've created for noiommu purposes. The iommudata we set is purely arbitrary since we're only comparing the address, so we use the address of the noiommu switch itself. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <sshukla@mvista.com> Fixes: `03a76b60f8` ("vfio: Include No-IOMMU mode") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-01-27 11:22:25 -07:00
Pierre Morel	d4f50ee2f5	vfio/iommu_type1: make use of info.flags The flags entry is there to tell the user that some optional information is available. Since we report the iova_pgsizes signal it to the user by setting the flags to VFIO_IOMMU_INFO_PGSIZES. Signed-off-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-01-04 12:55:44 -07:00
Alex Williamson	03a76b60f8	vfio: Include No-IOMMU mode There is really no way to safely give a user full access to a DMA capable device without an IOMMU to protect the host system. There is also no way to provide DMA translation, for use cases such as device assignment to virtual machines. However, there are still those users that want userspace drivers even under those conditions. The UIO driver exists for this use case, but does not provide the degree of device access and programming that VFIO has. In an effort to avoid code duplication, this introduces a No-IOMMU mode for VFIO. This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling the "enable_unsafe_noiommu_mode" option on the vfio driver. This should make it very clear that this mode is not safe. Additionally, CAP_SYS_RAWIO privileges are necessary to work with groups and containers using this mode. Groups making use of this support are named /dev/vfio/noiommu-$GROUP and can only make use of the special VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically binding a device without a native IOMMU group to a VFIO bus driver will taint the kernel and should therefore not be considered supported. This patch includes no-iommu support for the vfio-pci bus driver only. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2015-12-21 15:28:11 -07:00
Dan Carpenter	967628827f	VFIO: platform: reset: fix a warning message condition This loop ends with count set to -1 and not zero so the warning message isn't printed when it should be. I've fixed this by change the postop to a preop. Fixes: `0990822c98` ('VFIO: platform: reset: AMD xgbe reset module') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-12-21 15:28:11 -07:00
Alex Williamson	ae5515d663	Revert: "vfio: Include No-IOMMU mode" Revert commit `033291eccb` ("vfio: Include No-IOMMU mode") due to lack of a user. This was originally intended to fill a need for the DPDK driver, but uptake has been slow so rather than support an unproven kernel interface revert it and revisit when userspace catches up. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-12-04 08:38:42 -07:00
Dan Carpenter	049af1060b	vfio: fix a warning message The first argument to the WARN() macro has to be a condition. I'm sort of disappointed that this code doesn't generate a compiler warning. I guess -Wformat-extra-args doesn't work in the kernel. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-21 06:55:58 -07:00
Kees Cook	7200be7c81	vfio: platform: remove needless stack usage request_module already takes format strings, so no need to duplicate the effort. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-20 09:00:10 -07:00
Julia Lawall	7d10f4e079	vfio-pci: constify pci_error_handlers structures This pci_error_handlers structure is never modified, like all the other pci_error_handlers structures, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-19 16:53:07 -07:00
Krzysztof Kozlowski	7fe9414223	vfio: Drop owner assignment from platform_driver platform_driver does not need to set an owner because platform_driver_register() will set it. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-19 15:52:08 -07:00
Linus Torvalds	934f98d7e8	VFIO updates for v4.4-rc1 - Use kernel interfaces for VPD emulation (Alex Williamson) - Platform fix for releasing IRQs (Eric Auger) - Type1 IOMMU always advertises PAGE_SIZE support when smaller mapping sizes are available (Eric Auger) - Platform fixes for incorrectly using copies of structures rather than pointers to structures (James Morse) - Rework platform reset modules, fix leak, and add AMD xgbe reset module (Eric Auger) - Fix vfio_device_get_from_name() return value (Joerg Roedel) - No-IOMMU interface (Alex Williamson) - Fix potential out of bounds array access in PCI config handling (Dan Carpenter) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWRg8+AAoJECObm247sIsitkwP+wc6cRzBpeGxiufZl9Ci7JV9 G0qNHBm8tAYDwm1uJATcyZC303ad2B3gaYSO6msTFUXTg3d9ZDtUOWhoTNPs/px1 5dF38DREzqYHpyC9HT2Qj3i9G+ejvgg+SoyxBOTOIw/dmq3tGZhUz1Sj+wuvQTwr XXBKsWltZ+8wwXmQXmOWI4L3m7Xhs8NwAl5iLJ3UiltpW9zZzuPtoKnQCfMYUcmh hIJg52t0WPSLyn47UvecUQcqxaO+QYELa7UN84fnQAihk+ewMpzg5blTebayFdu3 f2WC3ivxbamebw74LaRfQFjx4mT+DI0aXYtraC600PVe7gdVXB66QMNNpPhBwAy5 wpfeFpTKU5gC+LHmrIMUS2/A4sdNfUBw44CS8+Lm2D6bQAblPv/C5xQV1rz9HADv f4/D3Y0TUKSYArewtBHTC0mnXdkZwetttBoy6/zQBl8vkelhoJ3GPcVa8FEZCIuT 2MSS17I3ftJ1enfynicF+Wstn/H5lWcuRBdg5wTLHIuhFn6MiEVxfIuSEx9JfjIb NGZO7y5JiJ0b5QRCG0tFznwceU/cql/3oRqOGXqaf1cQ1Ag3JOIAUzxknFoJQUj1 XYe+Im1eMaugjj39J3+m5EYNKT3nh/bBLD/V3iWYpgoZtQrmQQm5nu0JsQo88/JR je0BuJioCuPlO/Wj/KYw =bI62 -----END PGP SIGNATURE----- Merge tag 'vfio-v4.4-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Use kernel interfaces for VPD emulation (Alex Williamson) - Platform fix for releasing IRQs (Eric Auger) - Type1 IOMMU always advertises PAGE_SIZE support when smaller mapping sizes are available (Eric Auger) - Platform fixes for incorrectly using copies of structures rather than pointers to structures (James Morse) - Rework platform reset modules, fix leak, and add AMD xgbe reset module (Eric Auger) - Fix vfio_device_get_from_name() return value (Joerg Roedel) - No-IOMMU interface (Alex Williamson) - Fix potential out of bounds array access in PCI config handling (Dan Carpenter) * tag 'vfio-v4.4-rc1' of git://github.com/awilliam/linux-vfio: vfio/pci: make an array larger vfio: Include No-IOMMU mode vfio: Fix bug in vfio_device_get_from_name() VFIO: platform: reset: AMD xgbe reset module vfio: platform: reset: calxedaxgmac: fix ioaddr leak vfio: platform: add dev_info on device reset vfio: platform: use list of registered reset function vfio: platform: add compat in vfio_platform_device vfio: platform: reset: calxedaxgmac: add reset function registration vfio: platform: introduce module_vfio_reset_handler macro vfio: platform: add capability to register a reset function vfio: platform: introduce vfio-platform-base module vfio/platform: store mapped memory in region, instead of an on-stack copy vfio/type1: handle case where IOMMU does not support PAGE_SIZE size VFIO: platform: clear IRQ_NOAUTOEN when de-assigning the IRQ vfio/pci: Use kernel VPD access functions vfio: Whitelist PCI bridges	2015-11-13 17:05:32 -08:00
Dan Carpenter	222e684ca7	vfio/pci: make an array larger Smatch complains about a possible out of bounds error: drivers/vfio/pci/vfio_pci_config.c:1241 vfio_cap_init() error: buffer overflow 'pci_cap_length' 20 <= 20 The problem is that pci_cap_length[] was defined as large enough to hold "PCI_CAP_ID_AF + 1" elements. The code in vfio_cap_init() assumes it has PCI_CAP_ID_MAX + 1 elements. Originally, PCI_CAP_ID_AF and PCI_CAP_ID_MAX were the same but then we introduced PCI_CAP_ID_EA in commit `f80b0ba959` ("PCI: Add Enhanced Allocation register entries") so now the array is too small. Let's fix this by making the array size PCI_CAP_ID_MAX + 1. And let's make a similar change to pci_ext_cap_length[] for consistency. Also both these arrays can be made const. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-09 08:59:11 -07:00
Alex Williamson	033291eccb	vfio: Include No-IOMMU mode There is really no way to safely give a user full access to a DMA capable device without an IOMMU to protect the host system. There is also no way to provide DMA translation, for use cases such as device assignment to virtual machines. However, there are still those users that want userspace drivers even under those conditions. The UIO driver exists for this use case, but does not provide the degree of device access and programming that VFIO has. In an effort to avoid code duplication, this introduces a No-IOMMU mode for VFIO. This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling the "enable_unsafe_noiommu_mode" option on the vfio driver. This should make it very clear that this mode is not safe. Additionally, CAP_SYS_RAWIO privileges are necessary to work with groups and containers using this mode. Groups making use of this support are named /dev/vfio/noiommu-$GROUP and can only make use of the special VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically binding a device without a native IOMMU group to a VFIO bus driver will taint the kernel and should therefore not be considered supported. This patch includes no-iommu support for the vfio-pci bus driver only. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2015-11-04 09:56:16 -07:00
Joerg Roedel	e324fc82ea	vfio: Fix bug in vfio_device_get_from_name() The vfio_device_get_from_name() function might return a non-NULL pointer, when called with a device name that is not found in the list. This causes undefined behavior, in my case calling an invalid function pointer later on: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle kernel paging request at ffff8800cb3ddc08 [...] Call Trace: [<ffffffffa03bd733>] ? vfio_group_fops_unl_ioctl+0x253/0x410 [vfio] [<ffffffff811efc4d>] do_vfs_ioctl+0x2cd/0x4c0 [<ffffffff811f9657>] ? __fget+0x77/0xb0 [<ffffffff811efeb9>] SyS_ioctl+0x79/0x90 [<ffffffff81001bb0>] ? syscall_return_slowpath+0x50/0x130 [<ffffffff8167f776>] entry_SYSCALL_64_fastpath+0x16/0x75 Fix the issue by returning NULL when there is no device with the requested name in the list. Cc: stable@vger.kernel.org # v4.2+ Fixes: `4bc94d5dc9` ("vfio: Fix lockdep issue") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-04 09:27:39 -07:00
Eric Auger	0990822c98	VFIO: platform: reset: AMD xgbe reset module This patch introduces a module that registers and implements a low-level reset function for the AMD XGBE device. it performs the following actions: - reset the PHY - disable auto-negotiation - disable & clear auto-negotiation IRQ - soft-reset the MAC Those tiny pieces of code are inherited from the native xgbe driver. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-03 12:55:21 -07:00
Eric Auger	daac3bbedb	vfio: platform: reset: calxedaxgmac: fix ioaddr leak In the current code the vfio_platform_region is copied on the stack. As a consequence the ioaddr address is not iounmapped in the vfio platform driver (vfio_platform_regions_cleanup). The patch uses the pointer to the region instead. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-03 12:55:06 -07:00
Eric Auger	705e60bae3	vfio: platform: add dev_info on device reset It might be helpful for the end-user to check the device reset function was found by the vfio platform reset framework. Lets store a pointer to the struct device in vfio_platform_device and trace when the reset function is called or not found. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-03 12:55:05 -07:00
Eric Auger	e9e0506ee6	vfio: platform: use list of registered reset function Remove the static lookup table and use the dynamic list of registered reset functions instead. Also load the reset module through its alias. The reset struct module pointer is stored in vfio_platform_device. We also remove the useless struct device pointer parameter in vfio_platform_get_reset. This patch fixes the issue related to the usage of __symbol_get, which besides from being moot, prevented compilation with CONFIG_MODULES disabled. Also usage of MODULE_ALIAS makes possible to add a new reset module without needing to update the framework. This was suggested by Arnd. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reported-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-03 12:55:03 -07:00
Eric Auger	0628c4dfd3	vfio: platform: add compat in vfio_platform_device Let's retrieve the compatibility string on probe and store it in the vfio_platform_device struct Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-03 12:55:02 -07:00
Eric Auger	680742f644	vfio: platform: reset: calxedaxgmac: add reset function registration This patch adds the reset function registration/unregistration. This is handled through the module_vfio_reset_handler macro. This latter also defines a MODULE_ALIAS which simplifies the load from vfio-platform. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-03 12:55:00 -07:00
Eric Auger	588646529f	vfio: platform: introduce module_vfio_reset_handler macro The module_vfio_reset_handler macro - define a module alias - implement module init/exit function which respectively registers and unregisters the reset function. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-03 12:54:59 -07:00
Eric Auger	e086497d31	vfio: platform: add capability to register a reset function In preparation for subsequent changes in reset function lookup, lets introduce a dynamic list of reset combos (compat string, reset module, reset function). The list can be populated/voided with vfio_platform_register/unregister_reset. Those are not yet used in this patch. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-03 12:54:57 -07:00
Eric Auger	32a2d71c4e	vfio: platform: introduce vfio-platform-base module To prepare for vfio platform reset rework let's build vfio_platform_common.c and vfio_platform_irq.c in a separate module from vfio-platform and vfio-amba. This makes possible to have separate module inits and works around a race between platform driver init and vfio reset module init: that way we make sure symbols exported by base are available when vfio-platform driver gets probed. The open/release being implemented in the base module, the ref count is applied to the parent module instead. Signed-off-by: Eric Auger <eric.auger@linaro.org> Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-03 12:54:56 -07:00
James Morse	1b4bb2eaa9	vfio/platform: store mapped memory in region, instead of an on-stack copy vfio_platform_{read,write}_mmio() call ioremap_nocache() to map a region of io memory, which they store in struct vfio_platform_region to be eventually re-used, or unmapped by vfio_platform_regions_cleanup(). These functions receive a copy of their struct vfio_platform_region argument on the stack - so these mapped areas are always allocated, and always leaked. Pass this argument as a pointer instead. Fixes: `6e3f264560` "vfio/platform: read and write support for the device fd" Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Tested-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-03 12:54:00 -07:00
Eric Auger	4644321fd3	vfio/type1: handle case where IOMMU does not support PAGE_SIZE size Current vfio_pgsize_bitmap code hides the supported IOMMU page sizes smaller than PAGE_SIZE. As a result, in case the IOMMU does not support PAGE_SIZE page, the alignment check on map/unmap is done with larger page sizes, if any. This can fail although mapping could be done with pages smaller than PAGE_SIZE. This patch modifies vfio_pgsize_bitmap implementation so that, in case the IOMMU supports page sizes smaller than PAGE_SIZE we pretend PAGE_SIZE is supported and hide sub-PAGE_SIZE sizes. That way the user will be able to map/unmap buffers whose size/ start address is aligned with PAGE_SIZE. Pinning code uses that granularity while iommu driver can use the sub-PAGE_SIZE size to map the buffer. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-03 12:53:53 -07:00
Eric Auger	1276ece32c	VFIO: platform: clear IRQ_NOAUTOEN when de-assigning the IRQ The vfio platform driver currently sets the IRQ_NOAUTOEN before doing the request_irq to properly handle the user masking. However it does not clear it when de-assigning the IRQ. This brings issues when loading the native driver again which may not explicitly enable the IRQ. This problem was observed with xgbe driver. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-27 15:02:53 -06:00
Alex Williamson	4e1a635552	vfio/pci: Use kernel VPD access functions The PCI VPD capability operates on a set of window registers in PCI config space. Writing to the address register triggers either a read or write, depending on the setting of the PCI_VPD_ADDR_F bit within the address register. The data register provides either the source for writes or the target for reads. This model is susceptible to being broken by concurrent access, for which the kernel has adopted a set of access functions to serialize these registers. Additionally, commits like `932c435cab` ("PCI: Add dev_flags bit to access VPD through function 0") and `7aa6ca4d39` ("PCI: Add VPD function 0 quirk for Intel Ethernet devices") indicate that VPD registers can be shared between functions on multifunction devices creating dependencies between otherwise independent devices. Fortunately it's quite easy to emulate the VPD registers, simply storing copies of the address and data registers in memory and triggering a VPD read or write on writes to the address register. This allows vfio users to avoid seeing spurious register changes from accesses on other devices and enables the use of shared quirks in the host kernel. We can theoretically still race with access through sysfs, but the window of opportunity is much smaller. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Mark Rustad <mark.d.rustad@intel.com>	2015-10-27 14:53:05 -06:00
Alex Williamson	5f096b14d4	vfio: Whitelist PCI bridges When determining whether a group is viable, we already allow devices bound to pcieport. Generalize this to include any PCI bridge device. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-27 14:53:04 -06:00
Feng Wu	6d7425f109	vfio: Register/unregister irq_bypass_producer This patch adds the registration/unregistration of an irq_bypass_producer for MSI/MSIx on vfio pci devices. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:50 +02:00
Alex Williamson	4bc94d5dc9	vfio: Fix lockdep issue When we open a device file descriptor, we currently have the following: vfio_group_get_device_fd() mutex_lock(&group->device_lock); open() ... if (ret) release() If we hit that error case, we call the backend driver release path, which for vfio-pci looks like this: vfio_pci_release() vfio_pci_disable() vfio_pci_try_bus_reset() vfio_pci_get_devs() vfio_device_get_from_dev() vfio_group_get_device() mutex_lock(&group->device_lock); Whoops, we've stumbled back onto group.device_lock and created a deadlock. There's a low likelihood of ever seeing this play out, but obviously it needs to be fixed. To do that we can use a reference to the vfio_device for vfio_group_get_device_fd() rather than holding the lock. There was a loop in this function, theoretically allowing multiple devices with the same name, but in practice we don't expect such a thing to happen and the code is already aborting from the loop with break on any sort of error rather than continuing and only parsing the first match anyway, so the loop was effectively unused already. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Fixes: `20f300175a` ("vfio/pci: Fix racy vfio_device_get_from_dev() call") Reported-by: Joerg Roedel <joro@8bytes.org> Tested-by: Joerg Roedel <jroedel@suse.de>	2015-07-24 15:14:04 -06:00
Linus Torvalds	b779157dd3	VFIO updates for v4.2 - Fix race with device reference versus driver release (Alex Williamson) - Add reset hooks and Calxeda xgmac reset for vfio-platform (Eric Auger) - Enable vfio-platform for ARM64 (Eric Auger) - Tag Baptiste Reynal as vfio-platform sub-maintainer (Alex Williamson) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVjaiUAAoJECObm247sIsiRJQP/RiqcYg9uDGbBt8Gi+Qi2BmG KrVqG/ty18CZgyis01GZa5O8bL2ViQFAwfpnIJ6l5v4IQAYUUPlfy6Gy9ZgotQir U5ot4ABT9KGqJyjkDixvfUmg5M1gj1euO/xC+KVMpnxN0tC1wbymtAahzd3/RXyq 5cef/cran5bwnh2RtAN/rFkbjZBuqppVZJJV7uyOd69HXhktrZw5skQssL/61+vT znww9H6WT/Qk60haQrcD+SVqAaPv6Tx8bu1LVZ1Zc/rWVxJ2HisOp5dTxGeRnaSu PtMfOndgtYwe5fBaYyNjWluJLpBzY7b5UvHtsZ52hPXvlgkKSMrfCJxM4BQQSekW txmeV6LpQsm+Nq0+sIOFph8h/LzZaebvMsc5YnLamixGb/mAxbbr31yGjokTuS4F ndKUR+9/mFrdcqeBT4UD1Yt45YHl7bRZ9AWKJIC1TstrTW+1rEZ49r71PLea6y4f SJJ+eCvhYr/eebvrsfePqzeo1X+S8HNTgCXrSlqZdohS21qtnsh4g/CaOxI2UVt8 zaZosxNQLzpNK/Z8s7MZpCMenWZsq2frcckROeEgNKdPntwkbVT4MG3LBrWvtuBU HGAKfWCGAjfIIN8ZBUSFtIqua/kXCcec99CcYYQOtv16UbmhadAPoUlMur3HI9T9 S0vWtIKTsRa8ymyrGBDp =/gWA -----END PGP SIGNATURE----- Merge tag 'vfio-v4.2-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - fix race with device reference versus driver release (Alex Williamson) - add reset hooks and Calxeda xgmac reset for vfio-platform (Eric Auger) - enable vfio-platform for ARM64 (Eric Auger) - tag Baptiste Reynal as vfio-platform sub-maintainer (Alex Williamson) * tag 'vfio-v4.2-rc1' of git://github.com/awilliam/linux-vfio: MAINTAINERS: Add vfio-platform sub-maintainer VFIO: platform: enable ARM64 build VFIO: platform: Calxeda xgmac reset module VFIO: platform: populate the reset function on probe VFIO: platform: add reset callback VFIO: platform: add reset struct and lookup table vfio/pci: Fix racy vfio_device_get_from_dev() call	2015-06-28 12:32:13 -07:00
Linus Torvalds	08d183e3c1	powerpc updates for 4.2 - Disable the 32-bit vdso when building LE, so we can build with a 64-bit only toolchain. - EEH fixes from Gavin & Richard. - Enable the sys_kcmp syscall from Laurent. - Sysfs control for fastsleep workaround from Shreyas. - Expose OPAL events as an irq chip by Alistair. - MSI ops moved to pci_controller_ops by Daniel. - Fix for kernel to userspace backtraces for perf from Anton. - Merge pseries and pseries_le defconfigs from Cyril. - CXL in-kernel API from Mikey. - OPAL prd driver from Jeremy. - Fix for DSCR handling & tests from Anshuman. - Powernv flash mtd driver from Cyril. - Dynamic DMA Window support on powernv from Alexey. - LLVM clang fixes & workarounds from Anton. - Reworked version of the patch to abort syscalls when transactional. - Fix the swap encoding to support 4TB, from Aneesh. - Various fixes as usual. - Freescale updates from Scott: Highlights include more 8xx optimizations, an e6500 hugetlb optimization, QMan device tree nodes, t1024/t1023 support, and various fixes and cleanup. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJViSZqAAoJEFHr6jzI4aWAA7kQAKq3+pejfo2rY7alpKJyeVao vlaIEaDNOTh+ctcmu3MFF9Jy6fai8gNZziRXU5JRmE5RW4GVBN4KZiqXRbkVjdBK uG9sCX7Y58VRsS2vnGBYLsamfTMgjaXeDvgunQHVLiechJnrDr0RHEK90F3LSi73 Axp6l8XIG63a3zFZmkhzANMCme2lm5+MWmGlSjUUNi5F+viQUgJc5iiO8xrVUgM5 RpNlV2NJSqFiU+gMQWJ226V85UIniouq4j+qtyUcu8/m9BberyolXVU0GPlPFdsx r/Qh9uCJyZaUdSB5hzomQZj50IsSz6J6nEuJTeGRoVZOmeI8Dnc2xU9fxQF5fC8H lUJw10WPoNOggQZTeSUKn7wTXw3i4p3KsWNUczaW68VJdhqZUVaSp0+I6mnDSqzs 9iGC+VffLYNa1OHq7mGRFrgDdLBCHes31aZ3CxlQsmyNpAPCwMzsD4TUfVnvOG6E oJOeaQ4mZM9PvqxEYJfoIL+vgRxmQ8sdIBtNY4in+C7J6eFnZNFO9xmPnJZuVU31 PGtx60kjFCOVMXvqn34WkRNbgqGWI91IK0KcRwFO2LXVio1uY77TWL52kNK2IMsp Az+VDDvqnT3+BoV1yz0P6SrXAkwTpvFk2y+IdmEiUUN7zZFL5ZSA2epej9AzHTAK WID2bc5yVtIL6p6x5ICH =d9Wh -----END PGP SIGNATURE----- Merge tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: - disable the 32-bit vdso when building LE, so we can build with a 64-bit only toolchain. - EEH fixes from Gavin & Richard. - enable the sys_kcmp syscall from Laurent. - sysfs control for fastsleep workaround from Shreyas. - expose OPAL events as an irq chip by Alistair. - MSI ops moved to pci_controller_ops by Daniel. - fix for kernel to userspace backtraces for perf from Anton. - merge pseries and pseries_le defconfigs from Cyril. - CXL in-kernel API from Mikey. - OPAL prd driver from Jeremy. - fix for DSCR handling & tests from Anshuman. - Powernv flash mtd driver from Cyril. - dynamic DMA Window support on powernv from Alexey. - LLVM clang fixes & workarounds from Anton. - reworked version of the patch to abort syscalls when transactional. - fix the swap encoding to support 4TB, from Aneesh. - various fixes as usual. - Freescale updates from Scott: Highlights include more 8xx optimizations, an e6500 hugetlb optimization, QMan device tree nodes, t1024/t1023 support, and various fixes and cleanup. * tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (180 commits) cxl: Fix typo in debug print cxl: Add CXL_KERNEL_API config option powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma() powerpc/mm: Change the swap encoding in pte. powerpc/mm: PTE_RPN_MAX is not used, remove the same powerpc/tm: Abort syscalls in active transactions powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off powerpc/include: Add opal-prd to installed uapi headers powerpc/powernv: fix construction of opal PRD messages powerpc/powernv: Increase opal-irqchip initcall priority powerpc: Make doorbell check preemption safe powerpc/powernv: pnv_init_idle_states() should only run on powernv macintosh/nvram: Remove as unused powerpc: Don't use gcc specific options on clang powerpc: Don't use -mno-strict-align on clang powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it powerpc: Only use -mabi=altivec if toolchain supports it powerpc: Fix duplicate const clang warning in user access code vfio: powerpc/spapr: Support Dynamic DMA windows vfio: powerpc/spapr: Register memory and define IOMMU v2 ...	2015-06-24 08:46:32 -07:00
Eric Auger	e6bcd47f8f	VFIO: platform: enable ARM64 build This patch enables building VFIO platform and derivatives on ARM64. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Tested-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-22 09:35:47 -06:00
Eric Auger	713cc334a6	VFIO: platform: Calxeda xgmac reset module This patch introduces a module that registers and implements a basic reset function for the Calxeda xgmac device. This latter basically disables interrupts and stops DMA transfers. The reset function code is inherited from the native calxeda xgmac driver. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Tested-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-22 09:35:38 -06:00
Eric Auger	3eeb0d510c	VFIO: platform: populate the reset function on probe The reset function lookup happens on vfio-platform probe. The reset module load is requested and a reference to the function symbol is hold. The reference is released on vfio-platform remove. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Tested-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-22 09:35:30 -06:00
Eric Auger	813ae66008	VFIO: platform: add reset callback A new reset callback is introduced. If this callback is populated, the reset is invoked on device first open/last close or upon userspace ioctl. The modality is exposed on VFIO_DEVICE_GET_INFO. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Tested-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-22 09:35:22 -06:00
Eric Auger	9f85d8f9fa	VFIO: platform: add reset struct and lookup table This patch introduces the vfio_platform_reset_combo struct that stores all the information useful to handle the reset modality: compat string, name of the reset function, name of the module that implements the reset function. A lookup table of such structures is added, currently void. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Tested-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-17 15:39:09 -06:00
Alexey Kardashevskiy	e633bc86a9	vfio: powerpc/spapr: Support Dynamic DMA windows This adds create/remove window ioctls to create and remove DMA windows. sPAPR defines a Dynamic DMA windows capability which allows para-virtualized guests to create additional DMA windows on a PCI bus. The existing linux kernels use this new window to map the entire guest memory and switch to the direct DMA operations saving time on map/unmap requests which would normally happen in a big amounts. This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows. Up to 2 windows are supported now by the hardware and by this driver. This changes VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return additional information such as a number of supported windows and maximum number levels of TCE tables. DDW is added as a capability, not as a SPAPR TCE IOMMU v2 unique feature as we still want to support v2 on platforms which cannot do DDW for the sake of TCE acceleration in KVM (coming soon). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:55 +10:00
Alexey Kardashevskiy	2157e7b82f	vfio: powerpc/spapr: Register memory and define IOMMU v2 The existing implementation accounts the whole DMA window in the locked_vm counter. This is going to be worse with multiple containers and huge DMA windows. Also, real-time accounting would requite additional tracking of accounted pages due to the page size difference - IOMMU uses 4K pages and system uses 4K or 64K pages. Another issue is that actual pages pinning/unpinning happens on every DMA map/unmap request. This does not affect the performance much now as we spend way too much time now on switching context between guest/userspace/host but this will start to matter when we add in-kernel DMA map/unmap acceleration. This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU. New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces 2 new ioctls to register/unregister DMA memory - VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY - which receive user space address and size of a memory region which needs to be pinned/unpinned and counted in locked_vm. New IOMMU splits physical pages pinning and TCE table update into 2 different operations. It requires: 1) guest pages to be registered first 2) consequent map/unmap requests to work only with pre-registered memory. For the default single window case this means that the entire guest (instead of 2GB) needs to be pinned before using VFIO. When a huge DMA window is added, no additional pinning will be required, otherwise it would be guest RAM + 2GB. The new memory registration ioctls are not supported by VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration will require memory to be preregistered in order to work. The accounting is done per the user process. This advertises v2 SPAPR TCE IOMMU and restricts what the userspace can do with v1 or v2 IOMMUs. In order to support memory pre-registration, we need a way to track the use of every registered memory region and only allow unregistration if a region is not in use anymore. So we need a way to tell from what region the just cleared TCE was from. This adds a userspace view of the TCE table into iommu_table struct. It contains userspace address, one per TCE entry. The table is only allocated when the ownership over an IOMMU group is taken which means it is only used from outside of the powernv code (such as VFIO). As v2 IOMMU supports IODA2 and pre-IODA2 IOMMUs (which do not support DDW API), this creates a default DMA window for IODA2 for consistency. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:55 +10:00
Alexey Kardashevskiy	46d3e1e162	vfio: powerpc/spapr: powerpc/powernv/ioda2: Use DMA windows API in ownership control Before the IOMMU user (VFIO) would take control over the IOMMU table belonging to a specific IOMMU group. This approach did not allow sharing tables between IOMMU groups attached to the same container. This introduces a new IOMMU ownership flavour when the user can not just control the existing IOMMU table but remove/create tables on demand. If an IOMMU implements take/release_ownership() callbacks, this lets the user have full control over the IOMMU group. When the ownership is taken, the platform code removes all the windows so the caller must create them. Before returning the ownership back to the platform code, VFIO unprograms and removes all the tables it created. This changes IODA2's onwership handler to remove the existing table rather than manipulating with the existing one. From now on, iommu_take_ownership() and iommu_release_ownership() are only called from the vfio_iommu_spapr_tce driver. Old-style ownership is still supported allowing VFIO to run on older P5IOC2 and IODA IO controllers. No change in userspace-visible behaviour is expected. Since it recreates TCE tables on each ownership change, related kernel traces will appear more often. This adds a pnv_pci_ioda2_setup_default_config() which is called when PE is being configured at boot time and when the ownership is passed from VFIO to the platform code. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:54 +10:00
Alexey Kardashevskiy	4793d65d1a	vfio: powerpc/spapr: powerpc/powernv/ioda: Define and implement DMA windows API This extends iommu_table_group_ops by a set of callbacks to support dynamic DMA windows management. create_table() creates a TCE table with specific parameters. it receives iommu_table_group to know nodeid in order to allocate TCE table memory closer to the PHB. The exact format of allocated multi-level table might be also specific to the PHB model (not the case now though). This callback calculated the DMA window offset on a PCI bus from @num and stores it in a just created table. set_window() sets the window at specified TVT index + @num on PHB. unset_window() unsets the window from specified TVT. This adds a free() callback to iommu_table_ops to free the memory (potentially a tree of tables) allocated for the TCE table. create_table() and free() are supposed to be called once per VFIO container and set_window()/unset_window() are supposed to be called for every group in a container. This adds IOMMU capabilities to iommu_table_group such as default 32bit window parameters and others. This makes use of new values in vfio_iommu_spapr_tce. IODA1/P5IOC2 do not support DDW so they do not advertise pagemasks to the userspace. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:52 +10:00
Alexey Kardashevskiy	05c6cfb9dc	powerpc/iommu/powernv: Release replaced TCE At the moment writing new TCE value to the IOMMU table fails with EBUSY if there is a valid entry already. However PAPR specification allows the guest to write new TCE value without clearing it first. Another problem this patch is addressing is the use of pool locks for external IOMMU users such as VFIO. The pool locks are to protect DMA page allocator rather than entries and since the host kernel does not control what pages are in use, there is no point in pool locks and exchange()+put_page(oldtce) is sufficient to avoid possible races. This adds an exchange() callback to iommu_table_ops which does the same thing as set() plus it returns replaced TCE and DMA direction so the caller can release the pages afterwards. The exchange() receives a physical address unlike set() which receives linear mapping address; and returns a physical address as the clear() does. This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement for a platform to have exchange() implemented in order to support VFIO. This replaces iommu_tce_build() and iommu_clear_tce() with a single iommu_tce_xchg(). This makes sure that TCE permission bits are not set in TCE passed to IOMMU API as those are to be calculated by platform code from DMA direction. This moves SetPageDirty() to the IOMMU code to make it work for both VFIO ioctl interface in in-kernel TCE acceleration (when it becomes available later). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:49 +10:00
Alexey Kardashevskiy	f87a88642e	vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control This adds tce_iommu_take_ownership() and tce_iommu_release_ownership which call in a loop iommu_take_ownership()/iommu_release_ownership() for every table on the group. As there is just one now, no change in behaviour is expected. At the moment the iommu_table struct has a set_bypass() which enables/ disables DMA bypass on IODA2 PHB. This is exposed to POWERPC IOMMU code which calls this callback when external IOMMU users such as VFIO are about to get over a PHB. The set_bypass() callback is not really an iommu_table function but IOMMU/PE function. This introduces a iommu_table_group_ops struct and adds take_ownership()/release_ownership() callbacks to it which are called when an external user takes/releases control over the IOMMU. This replaces set_bypass() with ownership callbacks as it is not necessarily just bypass enabling, it can be something else/more so let's give it more generic name. The callbacks is implemented for IODA2 only. Other platforms (P5IOC2, IODA1) will use the old iommu_take_ownership/iommu_release_ownership API. The following patches will replace iommu_take_ownership/ iommu_release_ownership calls in IODA2 with full IOMMU table release/ create. As we here and touching bypass control, this removes pnv_pci_ioda2_setup_bypass_pe() as it does not do much more compared to pnv_pci_ioda2_set_bypass. This moves tce_bypass_base initialization to pnv_pci_ioda2_setup_dma_pe. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:47 +10:00
Alexey Kardashevskiy	0eaf4defc7	powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group So far one TCE table could only be used by one IOMMU group. However IODA2 hardware allows programming the same TCE table address to multiple PE allowing sharing tables. This replaces a single pointer to a group in a iommu_table struct with a linked list of groups which provides the way of invalidating TCE cache for every PE when an actual TCE table is updated. This adds pnv_pci_link_table_and_group() and pnv_pci_unlink_table_and_group() helpers to manage the list. However without VFIO, it is still going to be a single IOMMU group per iommu_table. This changes iommu_add_device() to add a device to a first group from the group list of a table as it is only called from the platform init code or PCI bus notifier and at these moments there is only one group per table. This does not change TCE invalidation code to loop through all attached groups in order to simplify this patch and because it is not really needed in most cases. IODA2 is fixed in a later patch. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:15 +10:00
Alexey Kardashevskiy	b348aa6529	powerpc/spapr: vfio: Replace iommu_table with iommu_table_group Modern IBM POWERPC systems support multiple (currently two) TCE tables per IOMMU group (a.k.a. PE). This adds a iommu_table_group container for TCE tables. Right now just one table is supported. This defines iommu_table_group struct which stores pointers to iommu_group and iommu_table(s). This replaces iommu_table with iommu_table_group where iommu_table was used to identify a group: - iommu_register_group(); - iommudata of generic iommu_group; This removes @data from iommu_table as it_table_group provides same access to pnv_ioda_pe. For IODA, instead of embedding iommu_table, the new iommu_table_group keeps pointers to those. The iommu_table structs are allocated dynamically. For P5IOC2, both iommu_table_group and iommu_table are embedded into PE struct. As there is no EEH and SRIOV support for P5IOC2, iommu_free_table() should not be called on iommu_table struct pointers so we can keep it embedded in pnv_phb::p5ioc2. For pSeries, this replaces multiple calls of kzalloc_node() with a new iommu_pseries_alloc_group() helper and stores the table group struct pointer into the pci_dn struct. For release, a iommu_table_free_group() helper is added. This moves iommu_table struct allocation from SR-IOV code to the generic DMA initialization code in pnv_pci_ioda_setup_dma_pe and pnv_pci_ioda2_setup_dma_pe as this is where DMA is actually initialized. This change is here because those lines had to be changed anyway. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:57 +10:00
Alexey Kardashevskiy	22af48596e	vfio: powerpc/spapr: Rework groups attaching This is to make extended ownership and multiple groups support patches simpler for review. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:56 +10:00
Alexey Kardashevskiy	649354b75d	vfio: powerpc/spapr: Moving pinning/unpinning to helpers This is a pretty mechanical patch to make next patches simpler. New tce_iommu_unuse_page() helper does put_page() now but it might skip that after the memory registering patch applied. As we are here, this removes unnecessary checks for a value returned by pfn_to_page() as it cannot possibly return NULL. This moves tce_iommu_disable() later to let tce_iommu_clear() know if the container has been enabled because if it has not been, then put_page() must not be called on TCEs from the TCE table. This situation is not yet possible but it will after KVM acceleration patchset is applied. This changes code to work with physical addresses rather than linear mapping addresses for better code readability. Following patches will add an xchg() callback for an IOMMU table which will accept/return physical addresses (unlike current tce_build()) which will eliminate redundant conversions. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:56 +10:00
Alexey Kardashevskiy	3c56e822f8	vfio: powerpc/spapr: Disable DMA mappings on disabled container At the moment DMA map/unmap requests are handled irrespective to the container's state. This allows the user space to pin memory which it might not be allowed to pin. This adds checks to MAP/UNMAP that the container is enabled, otherwise -EPERM is returned. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:56 +10:00
Alexey Kardashevskiy	2d270df8f7	vfio: powerpc/spapr: Move locked_vm accounting to helpers There moves locked pages accounting to helpers. Later they will be reused for Dynamic DMA windows (DDW). This reworks debug messages to show the current value and the limit. This stores the locked pages number in the container so when unlocking the iommu table pointer won't be needed. This does not have an effect now but it will with the multiple tables per container as then we will allow attaching/detaching groups on fly and we may end up having a container with no group attached but with the counter incremented. While we are here, update the comment explaining why RLIMIT_MEMLOCK might be required to be bigger than the guest RAM. This also prints pid of the current process in pr_warn/pr_debug. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:55 +10:00
Alexey Kardashevskiy	00663d4ee0	vfio: powerpc/spapr: Use it_page_size This makes use of the it_page_size from the iommu_table struct as page size can differ. This replaces missing IOMMU_PAGE_SHIFT macro in commented debug code as recently introduced IOMMU_PAGE_XXX macros do not include IOMMU_PAGE_SHIFT. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:55 +10:00
Alexey Kardashevskiy	e432bc7e15	vfio: powerpc/spapr: Check that IOMMU page is fully contained by system page This checks that the TCE table page size is not bigger that the size of a page we just pinned and going to put its physical address to the table. Otherwise the hardware gets unwanted access to physical memory between the end of the actual page and the end of the aligned up TCE page. Since compound_order() and compound_head() work correctly on non-huge pages, there is no need for additional check whether the page is huge. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:55 +10:00
Alexey Kardashevskiy	9b14a1ff86	vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driver This moves page pinning (get_user_pages_fast()/put_page()) code out of the platform IOMMU code and puts it to VFIO IOMMU driver where it belongs to as the platform code does not deal with page pinning. This makes iommu_take_ownership()/iommu_release_ownership() deal with the IOMMU table bitmap only. This removes page unpinning from iommu_take_ownership() as the actual TCE table might contain garbage and doing put_page() on it is undefined behaviour. Besides the last part, the rest of the patch is mechanical. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:55 +10:00
Alex Williamson	20f300175a	vfio/pci: Fix racy vfio_device_get_from_dev() call Testing the driver for a PCI device is racy, it can be all but complete in the release path and still report the driver as ours. Therefore we can't trust drvdata to be valid. This race can sometimes be seen when one port of a multifunction device is being unbound from the vfio-pci driver while another function is being released by the user and attempting a bus reset. The device in the remove path is found as a dependent device for the bus reset of the release path device, the driver is still set to vfio-pci, but the drvdata has already been cleared, resulting in a null pointer dereference. To resolve this, fix vfio_device_get_from_dev() to not take the dev_get_drvdata() shortcut and instead traverse through the iommu_group, vfio_group, vfio_device path to get a reference we can trust. Once we have that reference, we know the device isn't in transition and we can test to make sure the driver is still what we expect, so that we don't interfere with devices we don't own. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-09 10:08:57 -06:00
Will Deacon	8a0a01bff8	drivers/vfio: Allow type-1 IOMMU instantiation on top of an ARM SMMUv3 The ARM SMMUv3 driver is compatible with the notion of a type-1 IOMMU in VFIO. This patch allows VFIO_IOMMU_TYPE1 to be selected if ARM_SMMU_V3=y. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-05-29 11:12:40 +02:00
Gavin Shan	68cbbc3a9d	drivers/vfio: Support EEH error injection The patch adds one more EEH sub-command (VFIO_EEH_PE_INJECT_ERR) to inject the specified EEH error, which is represented by (struct vfio_eeh_pe_err), to the indicated PE for testing purpose. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-05-12 20:33:35 +10:00
Alex Williamson	db7d4d7f40	vfio: Fix runaway interruptible timeout Commit `13060b64b8` ("vfio: Add and use device request op for vfio bus drivers") incorrectly makes use of an interruptible timeout. When interrupted, the signal remains pending resulting in subsequent timeouts occurring instantly. This makes the loop spin at a much higher rate than intended. Instead of making this completely non-interruptible, we can change this into a sort of interruptible-once behavior and use the "once" to log debug information. The driver API doesn't allow us to abort and return an error code. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Fixes: `13060b64b8` Cc: stable@vger.kernel.org # v4.0	2015-05-01 16:31:41 -06:00
Alex Williamson	5f55d2ae69	vfio-pci: Log device requests more verbosely Log some clues indicating whether the user is receiving device request interfaces or not listening. This can help indicate why a driver unbind is blocked or explain why QEMU automatically unplugged a device from the VM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-05-01 14:00:53 -06:00
Alex Williamson	5a0ff17741	vfio-pci: Fix use after free Reported by 0-day test infrastructure. Fixes: `ecaa1f6a01` ("vfio-pci: Add VGA arbiter client") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-08 08:11:51 -06:00
Alex Williamson	6eb7018705	vfio-pci: Move idle devices to D3hot power state We can save some power by putting devices that are bound to vfio-pci but not in use by the user in the D3hot power state. Devices get woken into D0 when opened by the user. Resets return the device to D0, so we need to re-apply the low power state after a bus reset. It's tempting to try to use D3cold, but we have no reason to inhibit hotplug of idle devices and we might get into a loop of having the device disappear before we have a chance to try to use it. A new module parameter allows this feature to be disabled if there are devices that misbehave as a result of this change. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-07 11:14:46 -06:00
Alex Williamson	561d72ddbb	vfio-pci: Remove warning if try-reset fails As indicated in the comment, this is not entirely uncommon and causes user concern for no reason. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-07 11:14:44 -06:00
Alex Williamson	80c7e8cc2a	vfio-pci: Allow PCI IDs to be specified as module options This copies the same support from pci-stub for exactly the same purpose, enabling a set of PCI IDs to be automatically added to the driver's dynamic ID table at module load time. The code here is pretty simple and both vfio-pci and pci-stub are fairly unique in being meta drivers, capable of attaching to any device, so there's no attempt made to generalize the code into pci-core. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-07 11:14:43 -06:00
Alex Williamson	ecaa1f6a01	vfio-pci: Add VGA arbiter client If VFIO VGA access is disabled for the user, either by CONFIG option or module parameter, we can often opt-out of VGA arbitration. We can do this when PCI bridge control of VGA routing is possible. This means that we must have a parent bridge and there must only be a single VGA device below that bridge. Fortunately this is the typical case for discrete GPUs. Doing this allows us to minimize the impact of additional GPUs, in terms of VGA arbitration, when they are only used via vfio-pci for non-VGA applications. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-07 11:14:41 -06:00
Alex Williamson	88c0dead9f	vfio-pci: Add module option to disable VGA region access Add a module option so that we don't require a CONFIG change and kernel rebuild to disable VGA support. Not only can VGA support be troublesome in itself, but by disabling it we can reduce the impact to host devices by doing a VGA arbitration opt-out. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-07 11:14:40 -06:00
Alex Williamson	71be3423a6	vfio: Split virqfd into a separate module for vfio bus drivers An unintended consequence of commit `42ac9bd18d` ("vfio: initialize the virqfd workqueue in VFIO generic code") is that the vfio module is renamed to vfio_core so that it can include both vfio and virqfd. That's a user visible change that may break module loading scritps and it imposes eventfd support as a dependency on the core vfio code, which it's really not. virqfd is intended to be provided as a service to vfio bus drivers, so instead of wrapping it into vfio.ko, we can make it a stand-alone module toggled by vfio bus drivers. This has the additional benefit of removing initialization and exit from the core vfio code. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-17 08:33:38 -06:00
kbuild test robot	66fdc052d7	vfio: virqfd_lock can be static Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-17 08:20:31 -06:00
Zhen Lei	2f51bf4be9	vfio: put off the allocation of "minor" in vfio_create_group The next code fragment "list_for_each_entry" is not depend on "minor". With this patch, the free of "minor" in "list_for_each_entry" can be reduced, and there is no functional change. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:56 -06:00
Antonios Motakis	a7fa7c77cf	vfio/platform: implement IRQ masking/unmasking via an eventfd With this patch the VFIO user will be able to set an eventfd that can be used in order to mask and unmask IRQs of platform devices. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:55 -06:00
Antonios Motakis	42ac9bd18d	vfio: initialize the virqfd workqueue in VFIO generic code Now we have finally completely decoupled virqfd from VFIO_PCI. We can initialize it from the VFIO generic code, in order to safely use it from multiple independent VFIO bus drivers. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:54 -06:00
Antonios Motakis	7e992d6927	vfio: move eventfd support code for VFIO_PCI to a separate file The virqfd functionality that is used by VFIO_PCI to implement interrupt masking and unmasking via an eventfd, is generic enough and can be reused by another driver. Move it to a separate file in order to allow the code to be shared. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:54 -06:00
Antonios Motakis	09bbcb8810	vfio: pass an opaque pointer on virqfd initialization VFIO_PCI passes the VFIO device structure *vdev via eventfd to the handler that implements masking/unmasking of IRQs via an eventfd. We can replace it in the virqfd infrastructure with an opaque type so we can make use of the mechanism from other VFIO bus drivers. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:53 -06:00
Antonios Motakis	9269c393e7	vfio: add local lock for virqfd instead of depending on VFIO PCI The Virqfd code needs to keep accesses to any struct *virqfd safe, but this comes into play only when creating or destroying eventfds, so sharing the same spinlock with the VFIO bus driver is not necessary. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:52 -06:00
Antonios Motakis	bb78e9eaab	vfio: virqfd: rename vfio_pci_virqfd_init and vfio_pci_virqfd_exit The functions vfio_pci_virqfd_init and vfio_pci_virqfd_exit are not really PCI specific, since we plan to reuse the virqfd code with more VFIO drivers in addition to VFIO_PCI. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> [Baptiste Reynal: Move rename vfio_pci_virqfd_init and vfio_pci_virqfd_exit from "vfio: add a vfio_ prefix to virqfd_enable and virqfd_disable and export"] Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:52 -06:00
Antonios Motakis	bdc5e1021b	vfio: add a vfio_ prefix to virqfd_enable and virqfd_disable and export We want to reuse virqfd functionality in multiple VFIO drivers; before moving these functions to core VFIO, add the vfio_ prefix to the virqfd_enable and virqfd_disable functions, and export them so they can be used from other modules. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:51 -06:00
Antonios Motakis	06211b40ce	vfio/platform: support for level sensitive interrupts Level sensitive interrupts are exposed as maskable and automasked interrupts and are masked and disabled automatically when they fire. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> [Baptiste Reynal: Move masked interrupt initialization from "vfio/platform: trigger an interrupt via eventfd"] Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:50 -06:00
Antonios Motakis	57f972e2b3	vfio/platform: trigger an interrupt via eventfd This patch allows to set an eventfd for a platform device's interrupt, and also to trigger the interrupt eventfd from userspace for testing. Level sensitive interrupts are marked as maskable and are handled in a later patch. Edge triggered interrupts are not advertised as maskable and are implemented here using a simple and efficient IRQ handler. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> [Baptiste Reynal: fix masked interrupt initialization] Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:50 -06:00
Antonios Motakis	9a36321c8d	vfio/platform: initial interrupts support code This patch is a skeleton for the VFIO_DEVICE_SET_IRQS IOCTL, around which most IRQ functionality is implemented in VFIO. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:49 -06:00
Antonios Motakis	682704c41e	vfio/platform: return IRQ info Return information for the interrupts exposed by the device. This patch extends VFIO_DEVICE_GET_INFO with the number of IRQs and enables VFIO_DEVICE_GET_IRQ_INFO. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:48 -06:00
Antonios Motakis	fad4d5b1f0	vfio/platform: support MMAP of MMIO regions Allow to memory map the MMIO regions of the device so userspace can directly access them. PIO regions are not being handled at this point. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:48 -06:00
Antonios Motakis	6e3f264560	vfio/platform: read and write support for the device fd VFIO returns a file descriptor which we can use to manipulate the memory regions of the device. Usually, the user will mmap memory regions that are addressable on page boundaries, however for memory regions where this is not the case we cannot provide mmap functionality due to security concerns. For this reason we also allow to use read and write functions to the file descriptor pointing to the memory regions. We implement this functionality only for MMIO regions of platform devices; PIO regions are not being handled at this point. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:47 -06:00
Antonios Motakis	e8909e67ca	vfio/platform: return info for device memory mapped IO regions This patch enables the IOCTLs VFIO_DEVICE_GET_REGION_INFO ioctl call, which allows the user to learn about the available MMIO resources of a device. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:46 -06:00
Antonios Motakis	2e8567bbb5	vfio/platform: return info for bound device A VFIO userspace driver will start by opening the VFIO device that corresponds to an IOMMU group, and will use the ioctl interface to get the basic device info, such as number of memory regions and interrupts, and their properties. This patch enables the VFIO_DEVICE_GET_INFO ioctl call. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> [Baptiste Reynal: added include in vfio_platform_common.c] Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:46 -06:00
Antonios Motakis	b13329adc2	vfio: amba: add the VFIO for AMBA devices module to Kconfig Enable building the VFIO AMBA driver. VFIO_AMBA depends on VFIO_PLATFORM, since it is sharing a portion of the code, and it is essentially implemented as a platform device whose resources are discovered via AMBA specific APIs in the kernel. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:45 -06:00
Antonios Motakis	36fe431f28	vfio: amba: VFIO support for AMBA devices Add support for discovering AMBA devices with VFIO and handle them similarly to Linux platform devices. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:44 -06:00
Antonios Motakis	5316153239	vfio: platform: add the VFIO PLATFORM module to Kconfig Enable building the VFIO PLATFORM driver that allows to use Linux platform devices with VFIO. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:44 -06:00
Antonios Motakis	9df85aaa43	vfio: platform: probe to devices on the platform bus Driver to bind to Linux platform devices, and callbacks to discover their resources to be used by the main VFIO PLATFORM code. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:43 -06:00
Antonios Motakis	de49fc0d99	vfio/platform: initial skeleton of VFIO support for platform devices This patch forms the common skeleton code for platform devices support with VFIO. This will include the core functionality of VFIO_PLATFORM, however binding to the device and discovering the device resources will be done with the help of a separate file where any Linux platform bus specific code will reside. This will allow us to implement support for also discovering AMBA devices and their resources, but still reuse a large part of the VFIO_PLATFORM implementation. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> [Baptiste Reynal: added includes in vfio_platform_private.h] Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:42 -06:00
Alexey Kardashevskiy	ec76f40070	vfio-pci: Add missing break to enable VFIO_PCI_ERR_IRQ_INDEX This adds a missing break statement to VFIO_DEVICE_SET_IRQS handler without which vfio_pci_set_err_trigger() would never be called. While we are here, add another "break" to VFIO_PCI_REQ_IRQ_INDEX case so if we add more indexes later, we won't miss it. Fixes: `6140a8f562` ("vfio-pci: Add device request interface") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-12 09:51:38 -06:00
Alex Williamson	6140a8f562	vfio-pci: Add device request interface Userspace can opt to receive a device request notification, indicating that the device should be released. This is setup the same way as the error IRQ and also supports eventfd signaling. Future support may forcefully remove the device from the user if the request is ignored. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-10 12:38:14 -07:00
Alex Williamson	cac80d6e38	vfio-pci: Generalize setup of simple eventfds We want another single vector IRQ index to support signaling of the device request to userspace. Generalize the error reporting IRQ index to avoid code duplication. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-10 12:37:57 -07:00
Alex Williamson	13060b64b8	vfio: Add and use device request op for vfio bus drivers When a request is made to unbind a device from a vfio bus driver, we need to wait for the device to become unused, ie. for userspace to release the device. However, we have a long standing TODO in the code to do something proactive to make that happen. To enable this, we add a request callback on the vfio bus driver struct, which is intended to signal the user through the vfio device interface to release the device. Instead of passively waiting for the device to become unused, we can now pester the user to give it up. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-10 12:37:47 -07:00
Alex Williamson	4a68810dbb	vfio: Tie IOMMU group reference to vfio group Move the iommu_group reference from the device to the vfio_group. This ensures that the iommu_group persists as long as the vfio_group remains. This can be important if all of the device from an iommu_group are removed, but we still have an outstanding vfio_group reference; we can still walk the empty list of devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-06 15:05:06 -07:00
Alex Williamson	60720a0fc6	vfio: Add device tracking during unbind There's a small window between the vfio bus driver calling vfio_del_group_dev() and the device being completely unbound where the vfio group appears to be non-viable. This creates a race for users like QEMU/KVM where the kvm-vfio module tries to get an external reference to the group in order to match and release an existing reference, while the device is potentially being removed from the vfio bus driver. If the group is momentarily non-viable, kvm-vfio may not be able to release the group reference until VM shutdown, making the group unusable until that point. Bridge the gap between device removal from the group and completion of the driver unbind by tracking it in a list. The device is added to the list before the bus driver reference is released and removed using the existing unbind notifier. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-06 15:05:06 -07:00
Alex Williamson	c5e6688752	vfio/type1: Add conditional rescheduling IOMMU operations can be expensive and it's not very difficult for a user to give us a lot of work to do for a map or unmap operation. Killing a large VM will vfio assigned devices can result in soft lockups and IOMMU tracing shows that we can easily spend 80% of our time with need-resched set. A sprinkling of conf_resched() calls after map and unmap calls has a very tiny affect on performance while resulting in traces with <1% of calls overflowing into needs- resched. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-06 14:19:12 -07:00
Alex Williamson	babbf17609	vfio/type1: Chunk contiguous reserved/invalid page mappings We currently map invalid and reserved pages, such as often occur from mapping MMIO regions of a VM through the IOMMU, using single pages. There's really no reason we can't instead follow the methodology we use for normal pages and find the largest possible physically contiguous chunk for mapping. The only difference is that we don't do locked memory accounting for these since they're not back by RAM. In most applications this will be a very minor improvement, but when graphics and GPGPU devices are in play, MMIO BARs become non-trivial. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-06 10:59:16 -07:00
Alex Williamson	6fe1010d6d	vfio/type1: DMA unmap chunking When unmapping DMA entries we try to rely on the IOMMU API behavior that allows the IOMMU to unmap a larger area than requested, up to the size of the original mapping. This works great when the IOMMU supports superpages and they're in use. Otherwise, each PAGE_SIZE increment is unmapped separately, resulting in poor performance. Instead we can use the IOVA-to-physical-address translation provided by the IOMMU API and unmap using the largest contiguous physical memory chunk available, which is also how vfio/type1 would have mapped the region. For a synthetic 1TB guest VM mapping and shutdown test on Intel VT-d (2M IOMMU pagesize support), this achieves about a 30% overall improvement mapping standard 4K pages, regardless of IOMMU superpage enabling, and about a 40% improvement mapping 2M hugetlbfs pages when IOMMU superpages are not available. Hugetlbfs with IOMMU superpages enabled is effectively unchanged. Unfortunately the same algorithm does not work well on IOMMUs with fine-grained superpages, like AMD-Vi, costing about 25% extra since the IOMMU will automatically unmap any power-of-two contiguous mapping we've provided it. We add a routine and a domain flag to detect this feature, leaving AMD-Vi unaffected by this unmap optimization. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-06 10:58:56 -07:00

1 2 3 4 5 ...

358 Commits