VFIO updates for v6.1-rc1

- Prune private items from vfio_pci_core.h to a new internal header,
    fix missed function rename, and refactor vfio-pci interrupt defines.
    (Jason Gunthorpe)
 
  - Create consistent naming and handling of ioctls with a function per
    ioctl for vfio-pci and vfio group handling, use proper type args
    where available. (Jason Gunthorpe)
 
  - Implement a set of low power device feature ioctls allowing userspace
    to make use of power states such as D3cold where supported.
    (Abhishek Sahu)
 
  - Remove device counter on vfio groups, which had restricted the page
    pinning interface to singleton groups to account for limitations in
    the type1 IOMMU backend.  Document usage as limited to emulated IOMMU
    devices, ie. traditional mdev devices where this restriction is
    consistent.  (Jason Gunthorpe)
 
  - Correct function prefix in hisi_acc driver incurred during previous
    refactoring. (Shameer Kolothum)
 
  - Correct typo and remove redundant warning triggers in vfio-fsl driver.
    (Christophe JAILLET)
 
  - Introduce device level DMA dirty tracking uAPI and implementation in
    the mlx5 variant driver (Yishai Hadas & Joao Martins)
 
  - Move much of the vfio_device life cycle management into vfio core,
    simplifying and avoiding duplication across drivers.  This also
    facilitates adding a struct device to vfio_device which begins the
    introduction of device rather than group level user support and fills
    a gap allowing userspace identify devices as vfio capable without
    implicit knowledge of the driver. (Kevin Tian & Yi Liu)
 
  - Split vfio container handling to a separate file, creating a more
    well defined API between the core and container code, masking IOMMU
    backend implementation from the core, allowing for an easier future
    transition to an iommufd based implementation of the same.
    (Jason Gunthorpe)
 
  - Attempt to resolve race accessing the iommu_group for a device
    between vfio releasing DMA ownership and removal of the device from
    the IOMMU driver.  Follow-up with support to allow vfio_group to
    exist with NULL iommu_group pointer to support existing userspace
    use cases of holding the group file open.  (Jason Gunthorpe)
 
  - Fix error code and hi/lo register manipulation issues in the hisi_acc
    variant driver, along with various code cleanups. (Longfang Liu)
 
  - Fix a prior regression in GVT-g group teardown, resulting in
    unreleased resources. (Jason Gunthorpe)
 
  - A significant cleanup and simplification of the mdev interface,
    consolidating much of the open coded per driver sysfs interface
    support into the mdev core. (Christoph Hellwig)
 
  - Simplification of tracking and locking around vfio_groups that
    fall out from previous refactoring. (Jason Gunthorpe)
 
  - Replace trivial open coded f_ops tests with new helper.
    (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmNGz2AbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsiatYQAI+7bFjVsTKwCnWUhp/A
 WnFmLpnh/OsBIYiXRbXGZBgIO4iPmMyFkxqjnv6e8H1WnKhLbuPy/xCaAvPrtI8b
 YKCpzdrDnfrPfB4+0cyGLJx15Jqd3sOZy097kl2lQJTscELTjJxTl0uB/Fbf/s38
 t1K2nIhBm+sGK3rTf3JjY4Jc7vDbwX7HQt6rUVEbd3NoyLJV1T/HdeSgwSMdyiED
 WwkRZ0z/vU0hEDk5wk1ZyltkiUzdCSws3C8T0J39xRObPLHR1vYgKO8aeZhfQb4p
 luD1fzGRMt3JinSXCPPm5HfADXq2Rozx7Y7a454fvCa7lpX4MNAgaQdfIzI64lZj
 cMgSYAIskVq4vxCkO4bKec4FYrzJoxBMJwiXZvOZ4mF5SL4UIDwerMqQTA3fvtQ+
 puS6x+/DF9XXHrEewEX7teg6QYPQueneSS+fWeFpMGzDXSjdQB6qV+rMWS297t+4
 1KyITxkOxcZQ4+j1OLPGtxsRLKtWApawoNTpRMlaD+hSExxHLbUmKexOLXzuAoVP
 nhbjud+jzEbpCnwps24Og/iEBdRYJcl2KwEeSRPI856YRDrNa9jPtiDlsAtKZOK2
 gJnOixSss6R+wgVVYIyMDZ8tsvO+UDQruvqQ2kFku1FOlO86pvwD6UUVuTVosdNc
 fktw6Dx90N3fdb/o8jjAjssx
 =Z8+P
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.1-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Prune private items from vfio_pci_core.h to a new internal header,
   fix missed function rename, and refactor vfio-pci interrupt defines
   (Jason Gunthorpe)

 - Create consistent naming and handling of ioctls with a function per
   ioctl for vfio-pci and vfio group handling, use proper type args
   where available (Jason Gunthorpe)

 - Implement a set of low power device feature ioctls allowing userspace
   to make use of power states such as D3cold where supported (Abhishek
   Sahu)

 - Remove device counter on vfio groups, which had restricted the page
   pinning interface to singleton groups to account for limitations in
   the type1 IOMMU backend. Document usage as limited to emulated IOMMU
   devices, ie. traditional mdev devices where this restriction is
   consistent (Jason Gunthorpe)

 - Correct function prefix in hisi_acc driver incurred during previous
   refactoring (Shameer Kolothum)

 - Correct typo and remove redundant warning triggers in vfio-fsl driver
   (Christophe JAILLET)

 - Introduce device level DMA dirty tracking uAPI and implementation in
   the mlx5 variant driver (Yishai Hadas & Joao Martins)

 - Move much of the vfio_device life cycle management into vfio core,
   simplifying and avoiding duplication across drivers. This also
   facilitates adding a struct device to vfio_device which begins the
   introduction of device rather than group level user support and fills
   a gap allowing userspace identify devices as vfio capable without
   implicit knowledge of the driver (Kevin Tian & Yi Liu)

 - Split vfio container handling to a separate file, creating a more
   well defined API between the core and container code, masking IOMMU
   backend implementation from the core, allowing for an easier future
   transition to an iommufd based implementation of the same (Jason
   Gunthorpe)

 - Attempt to resolve race accessing the iommu_group for a device
   between vfio releasing DMA ownership and removal of the device from
   the IOMMU driver. Follow-up with support to allow vfio_group to exist
   with NULL iommu_group pointer to support existing userspace use cases
   of holding the group file open (Jason Gunthorpe)

 - Fix error code and hi/lo register manipulation issues in the hisi_acc
   variant driver, along with various code cleanups (Longfang Liu)

 - Fix a prior regression in GVT-g group teardown, resulting in
   unreleased resources (Jason Gunthorpe)

 - A significant cleanup and simplification of the mdev interface,
   consolidating much of the open coded per driver sysfs interface
   support into the mdev core (Christoph Hellwig)

 - Simplification of tracking and locking around vfio_groups that fall
   out from previous refactoring (Jason Gunthorpe)

 - Replace trivial open coded f_ops tests with new helper (Alex
   Williamson)

* tag 'vfio-v6.1-rc1' of https://github.com/awilliam/linux-vfio: (77 commits)
  vfio: More vfio_file_is_group() use cases
  vfio: Make the group FD disassociate from the iommu_group
  vfio: Hold a reference to the iommu_group in kvm for SPAPR
  vfio: Add vfio_file_is_group()
  vfio: Change vfio_group->group_rwsem to a mutex
  vfio: Remove the vfio_group->users and users_comp
  vfio/mdev: add mdev available instance checking to the core
  vfio/mdev: consolidate all the description sysfs into the core code
  vfio/mdev: consolidate all the available_instance sysfs into the core code
  vfio/mdev: consolidate all the name sysfs into the core code
  vfio/mdev: consolidate all the device_api sysfs into the core code
  vfio/mdev: remove mtype_get_parent_dev
  vfio/mdev: remove mdev_parent_dev
  vfio/mdev: unexport mdev_bus_type
  vfio/mdev: remove mdev_from_dev
  vfio/mdev: simplify mdev_type handling
  vfio/mdev: embedd struct mdev_parent in the parent data structure
  vfio/mdev: make mdev.h standalone includable
  drm/i915/gvt: simplify vgpu configuration management
  drm/i915/gvt: fix a memory leak in intel_gvt_init_vgpu_types
  ...
This commit is contained in:
Linus Torvalds 2022-10-12 14:46:48 -07:00
commit d3cf405133
51 changed files with 4940 additions and 2773 deletions

View File

@ -0,0 +1,8 @@
What: /sys/.../<device>/vfio-dev/vfioX/
Date: September 2022
Contact: Yi Liu <yi.l.liu@intel.com>
Description:
This directory is created when the device is bound to a
vfio driver. The layout under this directory matches what
exists for a standard 'struct device'. 'X' is a unique
index marking this device in vfio.

View File

@ -58,19 +58,19 @@ devices as examples, as these devices are the first devices to use this module::
| MDEV CORE |
| MODULE |
| mdev.ko |
| +-----------+ | mdev_register_device() +--------------+
| +-----------+ | mdev_register_parent() +--------------+
| | | +<------------------------+ |
| | | | | nvidia.ko |<-> physical
| | | +------------------------>+ | device
| | | | callbacks +--------------+
| | Physical | |
| | device | | mdev_register_device() +--------------+
| | device | | mdev_register_parent() +--------------+
| | interface | |<------------------------+ |
| | | | | i915.ko |<-> physical
| | | +------------------------>+ | device
| | | | callbacks +--------------+
| | | |
| | | | mdev_register_device() +--------------+
| | | | mdev_register_parent() +--------------+
| | | +<------------------------+ |
| | | | | ccw_device.ko|<-> physical
| | | +------------------------>+ | device
@ -103,7 +103,8 @@ structure to represent a mediated device's driver::
struct mdev_driver {
int (*probe) (struct mdev_device *dev);
void (*remove) (struct mdev_device *dev);
struct attribute_group **supported_type_groups;
unsigned int (*get_available)(struct mdev_type *mtype);
ssize_t (*show_description)(struct mdev_type *mtype, char *buf);
struct device_driver driver;
};
@ -125,7 +126,7 @@ vfio_device_ops.
When a driver wants to add the GUID creation sysfs to an existing device it has
probe'd to then it should call::
int mdev_register_device(struct device *dev,
int mdev_register_parent(struct mdev_parent *parent, struct device *dev,
struct mdev_driver *mdev_driver);
This will provide the 'mdev_supported_types/XX/create' files which can then be
@ -134,7 +135,7 @@ attached to the specified driver.
When the driver needs to remove itself it calls::
void mdev_unregister_device(struct device *dev);
void mdev_unregister_parent(struct mdev_parent *parent);
Which will unbind and destroy all the created mdevs and remove the sysfs files.
@ -200,17 +201,14 @@ Directories and files under the sysfs for Each Physical Device
sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name);
(or using mdev_parent_dev(mdev) to arrive at the parent device outside
of the core mdev code)
* device_api
This attribute should show which device API is being created, for example,
This attribute shows which device API is being created, for example,
"vfio-pci" for a PCI device.
* available_instances
This attribute should show the number of devices of type <type-id> that can be
This attribute shows the number of devices of type <type-id> that can be
created.
* [device]
@ -220,11 +218,11 @@ Directories and files under the sysfs for Each Physical Device
* name
This attribute should show human readable name. This is optional attribute.
This attribute shows a human readable name.
* description
This attribute should show brief features/description of the type. This is
This attribute can show brief features/description of the type. This is an
optional attribute.
Directories and Files Under the sysfs for Each mdev Device

View File

@ -297,7 +297,7 @@ of the VFIO AP mediated device driver::
| MDEV CORE |
| MODULE |
| mdev.ko |
| +---------+ | mdev_register_device() +--------------+
| +---------+ | mdev_register_parent() +--------------+
| |Physical | +<-----------------------+ |
| | device | | | vfio_ap.ko |<-> matrix
| |interface| +----------------------->+ | device

View File

@ -156,7 +156,7 @@ Below is a high Level block diagram::
| MDEV CORE |
| MODULE |
| mdev.ko |
| +---------+ | mdev_register_device() +--------------+
| +---------+ | mdev_register_parent() +--------------+
| |Physical | +<-----------------------+ |
| | device | | | vfio_ccw.ko |<-> subchannel
| |interface| +----------------------->+ | device

View File

@ -21558,6 +21558,7 @@ R: Cornelia Huck <cohuck@redhat.com>
L: kvm@vger.kernel.org
S: Maintained
T: git git://github.com/awilliam/linux-vfio.git
F: Documentation/ABI/testing/sysfs-devices-vfio-dev
F: Documentation/driver-api/vfio.rst
F: drivers/vfio/
F: include/linux/vfio.h

View File

@ -240,13 +240,13 @@ static void free_resource(struct intel_vgpu *vgpu)
}
static int alloc_resource(struct intel_vgpu *vgpu,
struct intel_vgpu_creation_params *param)
const struct intel_vgpu_config *conf)
{
struct intel_gvt *gvt = vgpu->gvt;
unsigned long request, avail, max, taken;
const char *item;
if (!param->low_gm_sz || !param->high_gm_sz || !param->fence_sz) {
if (!conf->low_mm || !conf->high_mm || !conf->fence) {
gvt_vgpu_err("Invalid vGPU creation params\n");
return -EINVAL;
}
@ -255,7 +255,7 @@ static int alloc_resource(struct intel_vgpu *vgpu,
max = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE;
taken = gvt->gm.vgpu_allocated_low_gm_size;
avail = max - taken;
request = MB_TO_BYTES(param->low_gm_sz);
request = conf->low_mm;
if (request > avail)
goto no_enough_resource;
@ -266,7 +266,7 @@ static int alloc_resource(struct intel_vgpu *vgpu,
max = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE;
taken = gvt->gm.vgpu_allocated_high_gm_size;
avail = max - taken;
request = MB_TO_BYTES(param->high_gm_sz);
request = conf->high_mm;
if (request > avail)
goto no_enough_resource;
@ -277,16 +277,16 @@ static int alloc_resource(struct intel_vgpu *vgpu,
max = gvt_fence_sz(gvt) - HOST_FENCE;
taken = gvt->fence.vgpu_allocated_fence_num;
avail = max - taken;
request = param->fence_sz;
request = conf->fence;
if (request > avail)
goto no_enough_resource;
vgpu_fence_sz(vgpu) = request;
gvt->gm.vgpu_allocated_low_gm_size += MB_TO_BYTES(param->low_gm_sz);
gvt->gm.vgpu_allocated_high_gm_size += MB_TO_BYTES(param->high_gm_sz);
gvt->fence.vgpu_allocated_fence_num += param->fence_sz;
gvt->gm.vgpu_allocated_low_gm_size += conf->low_mm;
gvt->gm.vgpu_allocated_high_gm_size += conf->high_mm;
gvt->fence.vgpu_allocated_fence_num += conf->fence;
return 0;
no_enough_resource:
@ -340,11 +340,11 @@ void intel_vgpu_reset_resource(struct intel_vgpu *vgpu)
*
*/
int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu,
struct intel_vgpu_creation_params *param)
const struct intel_vgpu_config *conf)
{
int ret;
ret = alloc_resource(vgpu, param);
ret = alloc_resource(vgpu, conf);
if (ret)
return ret;

View File

@ -36,6 +36,7 @@
#include <uapi/linux/pci_regs.h>
#include <linux/kvm_host.h>
#include <linux/vfio.h>
#include <linux/mdev.h>
#include "i915_drv.h"
#include "intel_gvt.h"
@ -172,6 +173,7 @@ struct intel_vgpu_submission {
#define KVMGT_DEBUGFS_FILENAME "kvmgt_nr_cache_entries"
struct intel_vgpu {
struct vfio_device vfio_device;
struct intel_gvt *gvt;
struct mutex vgpu_lock;
int id;
@ -211,7 +213,6 @@ struct intel_vgpu {
u32 scan_nonprivbb;
struct vfio_device vfio_device;
struct vfio_region *region;
int num_regions;
struct eventfd_ctx *intx_trigger;
@ -294,15 +295,25 @@ struct intel_gvt_firmware {
bool firmware_loaded;
};
#define NR_MAX_INTEL_VGPU_TYPES 20
struct intel_vgpu_type {
char name[16];
unsigned int avail_instance;
unsigned int low_gm_size;
unsigned int high_gm_size;
struct intel_vgpu_config {
unsigned int low_mm;
unsigned int high_mm;
unsigned int fence;
/*
* A vGPU with a weight of 8 will get twice as much GPU as a vGPU with
* a weight of 4 on a contended host, different vGPU type has different
* weight set. Legal weights range from 1 to 16.
*/
unsigned int weight;
enum intel_vgpu_edid resolution;
enum intel_vgpu_edid edid;
const char *name;
};
struct intel_vgpu_type {
struct mdev_type type;
char name[16];
const struct intel_vgpu_config *conf;
};
struct intel_gvt {
@ -326,6 +337,8 @@ struct intel_gvt {
struct intel_gvt_workload_scheduler scheduler;
struct notifier_block shadow_ctx_notifier_block[I915_NUM_ENGINES];
DECLARE_HASHTABLE(cmd_table, GVT_CMD_HASH_BITS);
struct mdev_parent parent;
struct mdev_type **mdev_types;
struct intel_vgpu_type *types;
unsigned int num_types;
struct intel_vgpu *idle_vgpu;
@ -436,19 +449,8 @@ int intel_gvt_load_firmware(struct intel_gvt *gvt);
/* ring context size i.e. the first 0x50 dwords*/
#define RING_CTX_SIZE 320
struct intel_vgpu_creation_params {
__u64 low_gm_sz; /* in MB */
__u64 high_gm_sz; /* in MB */
__u64 fence_sz;
__u64 resolution;
__s32 primary;
__u64 vgpu_id;
__u32 weight;
};
int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu,
struct intel_vgpu_creation_params *param);
const struct intel_vgpu_config *conf);
void intel_vgpu_reset_resource(struct intel_vgpu *vgpu);
void intel_vgpu_free_resource(struct intel_vgpu *vgpu);
void intel_vgpu_write_fence(struct intel_vgpu *vgpu,
@ -494,8 +496,8 @@ void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt);
struct intel_vgpu *intel_gvt_create_idle_vgpu(struct intel_gvt *gvt);
void intel_gvt_destroy_idle_vgpu(struct intel_vgpu *vgpu);
struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
struct intel_vgpu_type *type);
int intel_gvt_create_vgpu(struct intel_vgpu *vgpu,
const struct intel_vgpu_config *conf);
void intel_gvt_destroy_vgpu(struct intel_vgpu *vgpu);
void intel_gvt_release_vgpu(struct intel_vgpu *vgpu);
void intel_gvt_reset_vgpu_locked(struct intel_vgpu *vgpu, bool dmlr,

View File

@ -34,7 +34,6 @@
*/
#include <linux/init.h>
#include <linux/device.h>
#include <linux/mm.h>
#include <linux/kthread.h>
#include <linux/sched/mm.h>
@ -43,7 +42,6 @@
#include <linux/rbtree.h>
#include <linux/spinlock.h>
#include <linux/eventfd.h>
#include <linux/uuid.h>
#include <linux/mdev.h>
#include <linux/debugfs.h>
@ -115,117 +113,18 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
struct kvm_memory_slot *slot,
struct kvm_page_track_notifier_node *node);
static ssize_t available_instances_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr,
char *buf)
static ssize_t intel_vgpu_show_description(struct mdev_type *mtype, char *buf)
{
struct intel_vgpu_type *type;
unsigned int num = 0;
struct intel_gvt *gvt = kdev_to_i915(mtype_get_parent_dev(mtype))->gvt;
type = &gvt->types[mtype_get_type_group_id(mtype)];
if (!type)
num = 0;
else
num = type->avail_instance;
return sprintf(buf, "%u\n", num);
}
static ssize_t device_api_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING);
}
static ssize_t description_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
struct intel_vgpu_type *type;
struct intel_gvt *gvt = kdev_to_i915(mtype_get_parent_dev(mtype))->gvt;
type = &gvt->types[mtype_get_type_group_id(mtype)];
if (!type)
return 0;
struct intel_vgpu_type *type =
container_of(mtype, struct intel_vgpu_type, type);
return sprintf(buf, "low_gm_size: %dMB\nhigh_gm_size: %dMB\n"
"fence: %d\nresolution: %s\n"
"weight: %d\n",
BYTES_TO_MB(type->low_gm_size),
BYTES_TO_MB(type->high_gm_size),
type->fence, vgpu_edid_str(type->resolution),
type->weight);
}
static ssize_t name_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
struct intel_vgpu_type *type;
struct intel_gvt *gvt = kdev_to_i915(mtype_get_parent_dev(mtype))->gvt;
type = &gvt->types[mtype_get_type_group_id(mtype)];
if (!type)
return 0;
return sprintf(buf, "%s\n", type->name);
}
static MDEV_TYPE_ATTR_RO(available_instances);
static MDEV_TYPE_ATTR_RO(device_api);
static MDEV_TYPE_ATTR_RO(description);
static MDEV_TYPE_ATTR_RO(name);
static struct attribute *gvt_type_attrs[] = {
&mdev_type_attr_available_instances.attr,
&mdev_type_attr_device_api.attr,
&mdev_type_attr_description.attr,
&mdev_type_attr_name.attr,
NULL,
};
static struct attribute_group *gvt_vgpu_type_groups[] = {
[0 ... NR_MAX_INTEL_VGPU_TYPES - 1] = NULL,
};
static int intel_gvt_init_vgpu_type_groups(struct intel_gvt *gvt)
{
int i, j;
struct intel_vgpu_type *type;
struct attribute_group *group;
for (i = 0; i < gvt->num_types; i++) {
type = &gvt->types[i];
group = kzalloc(sizeof(struct attribute_group), GFP_KERNEL);
if (!group)
goto unwind;
group->name = type->name;
group->attrs = gvt_type_attrs;
gvt_vgpu_type_groups[i] = group;
}
return 0;
unwind:
for (j = 0; j < i; j++) {
group = gvt_vgpu_type_groups[j];
kfree(group);
}
return -ENOMEM;
}
static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
{
int i;
struct attribute_group *group;
for (i = 0; i < gvt->num_types; i++) {
group = gvt_vgpu_type_groups[i];
gvt_vgpu_type_groups[i] = NULL;
kfree(group);
}
BYTES_TO_MB(type->conf->low_mm),
BYTES_TO_MB(type->conf->high_mm),
type->conf->fence, vgpu_edid_str(type->conf->edid),
type->conf->weight);
}
static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
@ -1546,7 +1445,28 @@ static const struct attribute_group *intel_vgpu_groups[] = {
NULL,
};
static int intel_vgpu_init_dev(struct vfio_device *vfio_dev)
{
struct mdev_device *mdev = to_mdev_device(vfio_dev->dev);
struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);
struct intel_vgpu_type *type =
container_of(mdev->type, struct intel_vgpu_type, type);
vgpu->gvt = kdev_to_i915(mdev->type->parent->dev)->gvt;
return intel_gvt_create_vgpu(vgpu, type->conf);
}
static void intel_vgpu_release_dev(struct vfio_device *vfio_dev)
{
struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);
intel_gvt_destroy_vgpu(vgpu);
vfio_free_device(vfio_dev);
}
static const struct vfio_device_ops intel_vgpu_dev_ops = {
.init = intel_vgpu_init_dev,
.release = intel_vgpu_release_dev,
.open_device = intel_vgpu_open_device,
.close_device = intel_vgpu_close_device,
.read = intel_vgpu_read,
@ -1558,35 +1478,28 @@ static const struct vfio_device_ops intel_vgpu_dev_ops = {
static int intel_vgpu_probe(struct mdev_device *mdev)
{
struct device *pdev = mdev_parent_dev(mdev);
struct intel_gvt *gvt = kdev_to_i915(pdev)->gvt;
struct intel_vgpu_type *type;
struct intel_vgpu *vgpu;
int ret;
type = &gvt->types[mdev_get_type_group_id(mdev)];
if (!type)
return -EINVAL;
vgpu = intel_gvt_create_vgpu(gvt, type);
vgpu = vfio_alloc_device(intel_vgpu, vfio_device, &mdev->dev,
&intel_vgpu_dev_ops);
if (IS_ERR(vgpu)) {
gvt_err("failed to create intel vgpu: %ld\n", PTR_ERR(vgpu));
return PTR_ERR(vgpu);
}
vfio_init_group_dev(&vgpu->vfio_device, &mdev->dev,
&intel_vgpu_dev_ops);
dev_set_drvdata(&mdev->dev, vgpu);
ret = vfio_register_emulated_iommu_dev(&vgpu->vfio_device);
if (ret) {
intel_gvt_destroy_vgpu(vgpu);
return ret;
}
if (ret)
goto out_put_vdev;
gvt_dbg_core("intel_vgpu_create succeeded for mdev: %s\n",
dev_name(mdev_dev(mdev)));
return 0;
out_put_vdev:
vfio_put_device(&vgpu->vfio_device);
return ret;
}
static void intel_vgpu_remove(struct mdev_device *mdev)
@ -1595,10 +1508,34 @@ static void intel_vgpu_remove(struct mdev_device *mdev)
if (WARN_ON_ONCE(vgpu->attached))
return;
intel_gvt_destroy_vgpu(vgpu);
vfio_unregister_group_dev(&vgpu->vfio_device);
vfio_put_device(&vgpu->vfio_device);
}
static unsigned int intel_vgpu_get_available(struct mdev_type *mtype)
{
struct intel_vgpu_type *type =
container_of(mtype, struct intel_vgpu_type, type);
struct intel_gvt *gvt = kdev_to_i915(mtype->parent->dev)->gvt;
unsigned int low_gm_avail, high_gm_avail, fence_avail;
mutex_lock(&gvt->lock);
low_gm_avail = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE -
gvt->gm.vgpu_allocated_low_gm_size;
high_gm_avail = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE -
gvt->gm.vgpu_allocated_high_gm_size;
fence_avail = gvt_fence_sz(gvt) - HOST_FENCE -
gvt->fence.vgpu_allocated_fence_num;
mutex_unlock(&gvt->lock);
return min3(low_gm_avail / type->conf->low_mm,
high_gm_avail / type->conf->high_mm,
fence_avail / type->conf->fence);
}
static struct mdev_driver intel_vgpu_mdev_driver = {
.device_api = VFIO_DEVICE_API_PCI_STRING,
.driver = {
.name = "intel_vgpu_mdev",
.owner = THIS_MODULE,
@ -1606,7 +1543,8 @@ static struct mdev_driver intel_vgpu_mdev_driver = {
},
.probe = intel_vgpu_probe,
.remove = intel_vgpu_remove,
.supported_type_groups = gvt_vgpu_type_groups,
.get_available = intel_vgpu_get_available,
.show_description = intel_vgpu_show_description,
};
int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
@ -1904,8 +1842,7 @@ static void intel_gvt_clean_device(struct drm_i915_private *i915)
if (drm_WARN_ON(&i915->drm, !gvt))
return;
mdev_unregister_device(i915->drm.dev);
intel_gvt_cleanup_vgpu_type_groups(gvt);
mdev_unregister_parent(&gvt->parent);
intel_gvt_destroy_idle_vgpu(gvt->idle_vgpu);
intel_gvt_clean_vgpu_types(gvt);
@ -2005,19 +1942,15 @@ static int intel_gvt_init_device(struct drm_i915_private *i915)
intel_gvt_debugfs_init(gvt);
ret = intel_gvt_init_vgpu_type_groups(gvt);
ret = mdev_register_parent(&gvt->parent, i915->drm.dev,
&intel_vgpu_mdev_driver,
gvt->mdev_types, gvt->num_types);
if (ret)
goto out_destroy_idle_vgpu;
ret = mdev_register_device(i915->drm.dev, &intel_vgpu_mdev_driver);
if (ret)
goto out_cleanup_vgpu_type_groups;
gvt_dbg_core("gvt device initialization is done\n");
return 0;
out_cleanup_vgpu_type_groups:
intel_gvt_cleanup_vgpu_type_groups(gvt);
out_destroy_idle_vgpu:
intel_gvt_destroy_idle_vgpu(gvt->idle_vgpu);
intel_gvt_debugfs_clean(gvt);

View File

@ -73,24 +73,21 @@ void populate_pvinfo_page(struct intel_vgpu *vgpu)
drm_WARN_ON(&i915->drm, sizeof(struct vgt_if) != VGT_PVINFO_SIZE);
}
/*
* vGPU type name is defined as GVTg_Vx_y which contains the physical GPU
* generation type (e.g V4 as BDW server, V5 as SKL server).
*
* Depening on the physical SKU resource, we might see vGPU types like
* GVTg_V4_8, GVTg_V4_4, GVTg_V4_2, etc. We can create different types of
* vGPU on same physical GPU depending on available resource. Each vGPU
* type will have a different number of avail_instance to indicate how
* many vGPU instance can be created for this type.
*/
#define VGPU_MAX_WEIGHT 16
#define VGPU_WEIGHT(vgpu_num) \
(VGPU_MAX_WEIGHT / (vgpu_num))
static const struct {
unsigned int low_mm;
unsigned int high_mm;
unsigned int fence;
/* A vGPU with a weight of 8 will get twice as much GPU as a vGPU
* with a weight of 4 on a contended host, different vGPU type has
* different weight set. Legal weights range from 1 to 16.
*/
unsigned int weight;
enum intel_vgpu_edid edid;
const char *name;
} vgpu_types[] = {
/* Fixed vGPU type table */
static const struct intel_vgpu_config intel_vgpu_configs[] = {
{ MB_TO_BYTES(64), MB_TO_BYTES(384), 4, VGPU_WEIGHT(8), GVT_EDID_1024_768, "8" },
{ MB_TO_BYTES(128), MB_TO_BYTES(512), 4, VGPU_WEIGHT(4), GVT_EDID_1920_1200, "4" },
{ MB_TO_BYTES(256), MB_TO_BYTES(1024), 4, VGPU_WEIGHT(2), GVT_EDID_1920_1200, "2" },
@ -106,104 +103,60 @@ static const struct {
*/
int intel_gvt_init_vgpu_types(struct intel_gvt *gvt)
{
unsigned int num_types;
unsigned int i, low_avail, high_avail;
unsigned int min_low;
/* vGPU type name is defined as GVTg_Vx_y which contains
* physical GPU generation type (e.g V4 as BDW server, V5 as
* SKL server).
*
* Depend on physical SKU resource, might see vGPU types like
* GVTg_V4_8, GVTg_V4_4, GVTg_V4_2, etc. We can create
* different types of vGPU on same physical GPU depending on
* available resource. Each vGPU type will have "avail_instance"
* to indicate how many vGPU instance can be created for this
* type.
*
*/
low_avail = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE;
high_avail = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE;
num_types = ARRAY_SIZE(vgpu_types);
unsigned int low_avail = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE;
unsigned int high_avail = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE;
unsigned int num_types = ARRAY_SIZE(intel_vgpu_configs);
unsigned int i;
gvt->types = kcalloc(num_types, sizeof(struct intel_vgpu_type),
GFP_KERNEL);
if (!gvt->types)
return -ENOMEM;
min_low = MB_TO_BYTES(32);
gvt->mdev_types = kcalloc(num_types, sizeof(*gvt->mdev_types),
GFP_KERNEL);
if (!gvt->mdev_types)
goto out_free_types;
for (i = 0; i < num_types; ++i) {
if (low_avail / vgpu_types[i].low_mm == 0)
const struct intel_vgpu_config *conf = &intel_vgpu_configs[i];
if (low_avail / conf->low_mm == 0)
break;
if (conf->weight < 1 || conf->weight > VGPU_MAX_WEIGHT)
goto out_free_mdev_types;
gvt->types[i].low_gm_size = vgpu_types[i].low_mm;
gvt->types[i].high_gm_size = vgpu_types[i].high_mm;
gvt->types[i].fence = vgpu_types[i].fence;
if (vgpu_types[i].weight < 1 ||
vgpu_types[i].weight > VGPU_MAX_WEIGHT)
return -EINVAL;
gvt->types[i].weight = vgpu_types[i].weight;
gvt->types[i].resolution = vgpu_types[i].edid;
gvt->types[i].avail_instance = min(low_avail / vgpu_types[i].low_mm,
high_avail / vgpu_types[i].high_mm);
if (GRAPHICS_VER(gvt->gt->i915) == 8)
sprintf(gvt->types[i].name, "GVTg_V4_%s",
vgpu_types[i].name);
else if (GRAPHICS_VER(gvt->gt->i915) == 9)
sprintf(gvt->types[i].name, "GVTg_V5_%s",
vgpu_types[i].name);
sprintf(gvt->types[i].name, "GVTg_V%u_%s",
GRAPHICS_VER(gvt->gt->i915) == 8 ? 4 : 5, conf->name);
gvt->types[i].conf = conf;
gvt_dbg_core("type[%d]: %s avail %u low %u high %u fence %u weight %u res %s\n",
i, gvt->types[i].name,
gvt->types[i].avail_instance,
gvt->types[i].low_gm_size,
gvt->types[i].high_gm_size, gvt->types[i].fence,
gvt->types[i].weight,
vgpu_edid_str(gvt->types[i].resolution));
min(low_avail / conf->low_mm,
high_avail / conf->high_mm),
conf->low_mm, conf->high_mm, conf->fence,
conf->weight, vgpu_edid_str(conf->edid));
gvt->mdev_types[i] = &gvt->types[i].type;
gvt->mdev_types[i]->sysfs_name = gvt->types[i].name;
}
gvt->num_types = i;
return 0;
out_free_mdev_types:
kfree(gvt->mdev_types);
out_free_types:
kfree(gvt->types);
return -EINVAL;
}
void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt)
{
kfree(gvt->mdev_types);
kfree(gvt->types);
}
static void intel_gvt_update_vgpu_types(struct intel_gvt *gvt)
{
int i;
unsigned int low_gm_avail, high_gm_avail, fence_avail;
unsigned int low_gm_min, high_gm_min, fence_min;
/* Need to depend on maxium hw resource size but keep on
* static config for now.
*/
low_gm_avail = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE -
gvt->gm.vgpu_allocated_low_gm_size;
high_gm_avail = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE -
gvt->gm.vgpu_allocated_high_gm_size;
fence_avail = gvt_fence_sz(gvt) - HOST_FENCE -
gvt->fence.vgpu_allocated_fence_num;
for (i = 0; i < gvt->num_types; i++) {
low_gm_min = low_gm_avail / gvt->types[i].low_gm_size;
high_gm_min = high_gm_avail / gvt->types[i].high_gm_size;
fence_min = fence_avail / gvt->types[i].fence;
gvt->types[i].avail_instance = min(min(low_gm_min, high_gm_min),
fence_min);
gvt_dbg_core("update type[%d]: %s avail %u low %u high %u fence %u\n",
i, gvt->types[i].name,
gvt->types[i].avail_instance, gvt->types[i].low_gm_size,
gvt->types[i].high_gm_size, gvt->types[i].fence);
}
}
/**
* intel_gvt_active_vgpu - activate a virtual GPU
* @vgpu: virtual GPU
@ -298,12 +251,6 @@ void intel_gvt_destroy_vgpu(struct intel_vgpu *vgpu)
intel_vgpu_clean_mmio(vgpu);
intel_vgpu_dmabuf_cleanup(vgpu);
mutex_unlock(&vgpu->vgpu_lock);
mutex_lock(&gvt->lock);
intel_gvt_update_vgpu_types(gvt);
mutex_unlock(&gvt->lock);
vfree(vgpu);
}
#define IDLE_VGPU_IDR 0
@ -363,42 +310,38 @@ void intel_gvt_destroy_idle_vgpu(struct intel_vgpu *vgpu)
vfree(vgpu);
}
static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
struct intel_vgpu_creation_params *param)
int intel_gvt_create_vgpu(struct intel_vgpu *vgpu,
const struct intel_vgpu_config *conf)
{
struct intel_gvt *gvt = vgpu->gvt;
struct drm_i915_private *dev_priv = gvt->gt->i915;
struct intel_vgpu *vgpu;
int ret;
gvt_dbg_core("low %llu MB high %llu MB fence %llu\n",
param->low_gm_sz, param->high_gm_sz,
param->fence_sz);
vgpu = vzalloc(sizeof(*vgpu));
if (!vgpu)
return ERR_PTR(-ENOMEM);
gvt_dbg_core("low %u MB high %u MB fence %u\n",
BYTES_TO_MB(conf->low_mm), BYTES_TO_MB(conf->high_mm),
conf->fence);
mutex_lock(&gvt->lock);
ret = idr_alloc(&gvt->vgpu_idr, vgpu, IDLE_VGPU_IDR + 1, GVT_MAX_VGPU,
GFP_KERNEL);
if (ret < 0)
goto out_free_vgpu;
goto out_unlock;;
vgpu->id = ret;
vgpu->gvt = gvt;
vgpu->sched_ctl.weight = param->weight;
vgpu->sched_ctl.weight = conf->weight;
mutex_init(&vgpu->vgpu_lock);
mutex_init(&vgpu->dmabuf_lock);
INIT_LIST_HEAD(&vgpu->dmabuf_obj_list_head);
INIT_RADIX_TREE(&vgpu->page_track_tree, GFP_KERNEL);
idr_init_base(&vgpu->object_idr, 1);
intel_vgpu_init_cfg_space(vgpu, param->primary);
intel_vgpu_init_cfg_space(vgpu, 1);
vgpu->d3_entered = false;
ret = intel_vgpu_init_mmio(vgpu);
if (ret)
goto out_clean_idr;
ret = intel_vgpu_alloc_resource(vgpu, param);
ret = intel_vgpu_alloc_resource(vgpu, conf);
if (ret)
goto out_clean_vgpu_mmio;
@ -412,7 +355,7 @@ static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
if (ret)
goto out_clean_gtt;
ret = intel_vgpu_init_display(vgpu, param->resolution);
ret = intel_vgpu_init_display(vgpu, conf->edid);
if (ret)
goto out_clean_opregion;
@ -437,7 +380,9 @@ static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
if (ret)
goto out_clean_sched_policy;
return vgpu;
intel_gvt_update_reg_whitelist(vgpu);
mutex_unlock(&gvt->lock);
return 0;
out_clean_sched_policy:
intel_vgpu_clean_sched_policy(vgpu);
@ -455,48 +400,9 @@ out_clean_vgpu_mmio:
intel_vgpu_clean_mmio(vgpu);
out_clean_idr:
idr_remove(&gvt->vgpu_idr, vgpu->id);
out_free_vgpu:
vfree(vgpu);
return ERR_PTR(ret);
}
/**
* intel_gvt_create_vgpu - create a virtual GPU
* @gvt: GVT device
* @type: type of the vGPU to create
*
* This function is called when user wants to create a virtual GPU.
*
* Returns:
* pointer to intel_vgpu, error pointer if failed.
*/
struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
struct intel_vgpu_type *type)
{
struct intel_vgpu_creation_params param;
struct intel_vgpu *vgpu;
param.primary = 1;
param.low_gm_sz = type->low_gm_size;
param.high_gm_sz = type->high_gm_size;
param.fence_sz = type->fence;
param.weight = type->weight;
param.resolution = type->resolution;
/* XXX current param based on MB */
param.low_gm_sz = BYTES_TO_MB(param.low_gm_sz);
param.high_gm_sz = BYTES_TO_MB(param.high_gm_sz);
mutex_lock(&gvt->lock);
vgpu = __intel_gvt_create_vgpu(gvt, &param);
if (!IS_ERR(vgpu)) {
/* calculate left instance change for types */
intel_gvt_update_vgpu_types(gvt);
intel_gvt_update_reg_whitelist(vgpu);
}
out_unlock:
mutex_unlock(&gvt->lock);
return vgpu;
return ret;
}
/**

View File

@ -12,7 +12,6 @@
#include <linux/module.h>
#include <linux/init.h>
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/mdev.h>
@ -142,7 +141,6 @@ static struct vfio_ccw_private *vfio_ccw_alloc_private(struct subchannel *sch)
INIT_LIST_HEAD(&private->crw);
INIT_WORK(&private->io_work, vfio_ccw_sch_io_todo);
INIT_WORK(&private->crw_work, vfio_ccw_crw_todo);
atomic_set(&private->avail, 1);
private->cp.guest_cp = kcalloc(CCWCHAIN_LEN_MAX, sizeof(struct ccw1),
GFP_KERNEL);
@ -203,7 +201,6 @@ static void vfio_ccw_free_private(struct vfio_ccw_private *private)
mutex_destroy(&private->io_mutex);
kfree(private);
}
static int vfio_ccw_sch_probe(struct subchannel *sch)
{
struct pmcw *pmcw = &sch->schib.pmcw;
@ -222,7 +219,12 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
dev_set_drvdata(&sch->dev, private);
ret = mdev_register_device(&sch->dev, &vfio_ccw_mdev_driver);
private->mdev_type.sysfs_name = "io";
private->mdev_type.pretty_name = "I/O subchannel (Non-QDIO)";
private->mdev_types[0] = &private->mdev_type;
ret = mdev_register_parent(&private->parent, &sch->dev,
&vfio_ccw_mdev_driver,
private->mdev_types, 1);
if (ret)
goto out_free;
@ -241,7 +243,7 @@ static void vfio_ccw_sch_remove(struct subchannel *sch)
{
struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
mdev_unregister_device(&sch->dev);
mdev_unregister_parent(&private->parent);
dev_set_drvdata(&sch->dev, NULL);

View File

@ -11,7 +11,6 @@
*/
#include <linux/vfio.h>
#include <linux/mdev.h>
#include <linux/nospec.h>
#include <linux/slab.h>
@ -45,47 +44,14 @@ static void vfio_ccw_dma_unmap(struct vfio_device *vdev, u64 iova, u64 length)
vfio_ccw_mdev_reset(private);
}
static ssize_t name_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "I/O subchannel (Non-QDIO)\n");
}
static MDEV_TYPE_ATTR_RO(name);
static ssize_t device_api_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "%s\n", VFIO_DEVICE_API_CCW_STRING);
}
static MDEV_TYPE_ATTR_RO(device_api);
static ssize_t available_instances_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr,
char *buf)
static int vfio_ccw_mdev_init_dev(struct vfio_device *vdev)
{
struct vfio_ccw_private *private =
dev_get_drvdata(mtype_get_parent_dev(mtype));
container_of(vdev, struct vfio_ccw_private, vdev);
return sprintf(buf, "%d\n", atomic_read(&private->avail));
init_completion(&private->release_comp);
return 0;
}
static MDEV_TYPE_ATTR_RO(available_instances);
static struct attribute *mdev_types_attrs[] = {
&mdev_type_attr_name.attr,
&mdev_type_attr_device_api.attr,
&mdev_type_attr_available_instances.attr,
NULL,
};
static struct attribute_group mdev_type_group = {
.name = "io",
.attrs = mdev_types_attrs,
};
static struct attribute_group *mdev_type_groups[] = {
&mdev_type_group,
NULL,
};
static int vfio_ccw_mdev_probe(struct mdev_device *mdev)
{
@ -95,12 +61,9 @@ static int vfio_ccw_mdev_probe(struct mdev_device *mdev)
if (private->state == VFIO_CCW_STATE_NOT_OPER)
return -ENODEV;
if (atomic_dec_if_positive(&private->avail) < 0)
return -EPERM;
memset(&private->vdev, 0, sizeof(private->vdev));
vfio_init_group_dev(&private->vdev, &mdev->dev,
&vfio_ccw_dev_ops);
ret = vfio_init_device(&private->vdev, &mdev->dev, &vfio_ccw_dev_ops);
if (ret)
return ret;
VFIO_CCW_MSG_EVENT(2, "sch %x.%x.%04x: create\n",
private->sch->schid.cssid,
@ -109,16 +72,32 @@ static int vfio_ccw_mdev_probe(struct mdev_device *mdev)
ret = vfio_register_emulated_iommu_dev(&private->vdev);
if (ret)
goto err_atomic;
goto err_put_vdev;
dev_set_drvdata(&mdev->dev, private);
return 0;
err_atomic:
vfio_uninit_group_dev(&private->vdev);
atomic_inc(&private->avail);
err_put_vdev:
vfio_put_device(&private->vdev);
return ret;
}
static void vfio_ccw_mdev_release_dev(struct vfio_device *vdev)
{
struct vfio_ccw_private *private =
container_of(vdev, struct vfio_ccw_private, vdev);
/*
* We cannot free vfio_ccw_private here because it includes
* parent info which must be free'ed by css driver.
*
* Use a workaround by memset'ing the core device part and
* then notifying the remove path that all active references
* to this device have been released.
*/
memset(vdev, 0, sizeof(*vdev));
complete(&private->release_comp);
}
static void vfio_ccw_mdev_remove(struct mdev_device *mdev)
{
struct vfio_ccw_private *private = dev_get_drvdata(mdev->dev.parent);
@ -130,8 +109,16 @@ static void vfio_ccw_mdev_remove(struct mdev_device *mdev)
vfio_unregister_group_dev(&private->vdev);
vfio_uninit_group_dev(&private->vdev);
atomic_inc(&private->avail);
vfio_put_device(&private->vdev);
/*
* Wait for all active references on mdev are released so it
* is safe to defer kfree() to a later point.
*
* TODO: the clean fix is to split parent/mdev info from ccw
* private structure so each can be managed in its own life
* cycle.
*/
wait_for_completion(&private->release_comp);
}
static int vfio_ccw_mdev_open_device(struct vfio_device *vdev)
@ -592,6 +579,8 @@ static void vfio_ccw_mdev_request(struct vfio_device *vdev, unsigned int count)
}
static const struct vfio_device_ops vfio_ccw_dev_ops = {
.init = vfio_ccw_mdev_init_dev,
.release = vfio_ccw_mdev_release_dev,
.open_device = vfio_ccw_mdev_open_device,
.close_device = vfio_ccw_mdev_close_device,
.read = vfio_ccw_mdev_read,
@ -602,6 +591,8 @@ static const struct vfio_device_ops vfio_ccw_dev_ops = {
};
struct mdev_driver vfio_ccw_mdev_driver = {
.device_api = VFIO_DEVICE_API_CCW_STRING,
.max_instances = 1,
.driver = {
.name = "vfio_ccw_mdev",
.owner = THIS_MODULE,
@ -609,5 +600,4 @@ struct mdev_driver vfio_ccw_mdev_driver = {
},
.probe = vfio_ccw_mdev_probe,
.remove = vfio_ccw_mdev_remove,
.supported_type_groups = mdev_type_groups,
};

View File

@ -18,6 +18,7 @@
#include <linux/workqueue.h>
#include <linux/vfio_ccw.h>
#include <linux/vfio.h>
#include <linux/mdev.h>
#include <asm/crw.h>
#include <asm/debug.h>
@ -72,7 +73,6 @@ struct vfio_ccw_crw {
* @sch: pointer to the subchannel
* @state: internal state of the device
* @completion: synchronization helper of the I/O completion
* @avail: available for creating a mediated device
* @io_region: MMIO region to input/output I/O arguments/results
* @io_mutex: protect against concurrent update of I/O regions
* @region: additional regions for other subchannel operations
@ -88,13 +88,14 @@ struct vfio_ccw_crw {
* @req_trigger: eventfd ctx for signaling userspace to return device
* @io_work: work for deferral process of I/O handling
* @crw_work: work for deferral process of CRW handling
* @release_comp: synchronization helper for vfio device release
* @parent: parent data structures for mdevs created
*/
struct vfio_ccw_private {
struct vfio_device vdev;
struct subchannel *sch;
int state;
struct completion *completion;
atomic_t avail;
struct ccw_io_region *io_region;
struct mutex io_mutex;
struct vfio_ccw_region *region;
@ -113,6 +114,12 @@ struct vfio_ccw_private {
struct eventfd_ctx *req_trigger;
struct work_struct io_work;
struct work_struct crw_work;
struct completion release_comp;
struct mdev_parent parent;
struct mdev_type mdev_type;
struct mdev_type *mdev_types[1];
} __aligned(8);
int vfio_ccw_sch_quiesce(struct subchannel *sch);

View File

@ -684,42 +684,41 @@ static bool vfio_ap_mdev_filter_matrix(unsigned long *apm, unsigned long *aqm,
AP_DOMAINS);
}
static int vfio_ap_mdev_probe(struct mdev_device *mdev)
static int vfio_ap_mdev_init_dev(struct vfio_device *vdev)
{
struct ap_matrix_mdev *matrix_mdev;
int ret;
struct ap_matrix_mdev *matrix_mdev =
container_of(vdev, struct ap_matrix_mdev, vdev);
if ((atomic_dec_if_positive(&matrix_dev->available_instances) < 0))
return -EPERM;
matrix_mdev = kzalloc(sizeof(*matrix_mdev), GFP_KERNEL);
if (!matrix_mdev) {
ret = -ENOMEM;
goto err_dec_available;
}
vfio_init_group_dev(&matrix_mdev->vdev, &mdev->dev,
&vfio_ap_matrix_dev_ops);
matrix_mdev->mdev = mdev;
matrix_mdev->mdev = to_mdev_device(vdev->dev);
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
matrix_mdev->pqap_hook = handle_pqap;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
hash_init(matrix_mdev->qtable.queues);
return 0;
}
static int vfio_ap_mdev_probe(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev;
int ret;
matrix_mdev = vfio_alloc_device(ap_matrix_mdev, vdev, &mdev->dev,
&vfio_ap_matrix_dev_ops);
if (IS_ERR(matrix_mdev))
return PTR_ERR(matrix_mdev);
ret = vfio_register_emulated_iommu_dev(&matrix_mdev->vdev);
if (ret)
goto err_list;
goto err_put_vdev;
dev_set_drvdata(&mdev->dev, matrix_mdev);
mutex_lock(&matrix_dev->mdevs_lock);
list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
mutex_unlock(&matrix_dev->mdevs_lock);
return 0;
err_list:
vfio_uninit_group_dev(&matrix_mdev->vdev);
kfree(matrix_mdev);
err_dec_available:
atomic_inc(&matrix_dev->available_instances);
err_put_vdev:
vfio_put_device(&matrix_mdev->vdev);
return ret;
}
@ -766,6 +765,11 @@ static void vfio_ap_mdev_unlink_fr_queues(struct ap_matrix_mdev *matrix_mdev)
}
}
static void vfio_ap_mdev_release_dev(struct vfio_device *vdev)
{
vfio_free_device(vdev);
}
static void vfio_ap_mdev_remove(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(&mdev->dev);
@ -779,54 +783,9 @@ static void vfio_ap_mdev_remove(struct mdev_device *mdev)
list_del(&matrix_mdev->node);
mutex_unlock(&matrix_dev->mdevs_lock);
mutex_unlock(&matrix_dev->guests_lock);
vfio_uninit_group_dev(&matrix_mdev->vdev);
kfree(matrix_mdev);
atomic_inc(&matrix_dev->available_instances);
vfio_put_device(&matrix_mdev->vdev);
}
static ssize_t name_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
}
static MDEV_TYPE_ATTR_RO(name);
static ssize_t available_instances_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr,
char *buf)
{
return sprintf(buf, "%d\n",
atomic_read(&matrix_dev->available_instances));
}
static MDEV_TYPE_ATTR_RO(available_instances);
static ssize_t device_api_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "%s\n", VFIO_DEVICE_API_AP_STRING);
}
static MDEV_TYPE_ATTR_RO(device_api);
static struct attribute *vfio_ap_mdev_type_attrs[] = {
&mdev_type_attr_name.attr,
&mdev_type_attr_device_api.attr,
&mdev_type_attr_available_instances.attr,
NULL,
};
static struct attribute_group vfio_ap_mdev_hwvirt_type_group = {
.name = VFIO_AP_MDEV_TYPE_HWVIRT,
.attrs = vfio_ap_mdev_type_attrs,
};
static struct attribute_group *vfio_ap_mdev_type_groups[] = {
&vfio_ap_mdev_hwvirt_type_group,
NULL,
};
#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
"already assigned to %s"
@ -1824,6 +1783,8 @@ static const struct attribute_group vfio_queue_attr_group = {
};
static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
.init = vfio_ap_mdev_init_dev,
.release = vfio_ap_mdev_release_dev,
.open_device = vfio_ap_mdev_open_device,
.close_device = vfio_ap_mdev_close_device,
.ioctl = vfio_ap_mdev_ioctl,
@ -1831,6 +1792,8 @@ static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
};
static struct mdev_driver vfio_ap_matrix_driver = {
.device_api = VFIO_DEVICE_API_AP_STRING,
.max_instances = MAX_ZDEV_ENTRIES_EXT,
.driver = {
.name = "vfio_ap_mdev",
.owner = THIS_MODULE,
@ -1839,20 +1802,22 @@ static struct mdev_driver vfio_ap_matrix_driver = {
},
.probe = vfio_ap_mdev_probe,
.remove = vfio_ap_mdev_remove,
.supported_type_groups = vfio_ap_mdev_type_groups,
};
int vfio_ap_mdev_register(void)
{
int ret;
atomic_set(&matrix_dev->available_instances, MAX_ZDEV_ENTRIES_EXT);
ret = mdev_register_driver(&vfio_ap_matrix_driver);
if (ret)
return ret;
ret = mdev_register_device(&matrix_dev->device, &vfio_ap_matrix_driver);
matrix_dev->mdev_type.sysfs_name = VFIO_AP_MDEV_TYPE_HWVIRT;
matrix_dev->mdev_type.pretty_name = VFIO_AP_MDEV_NAME_HWVIRT;
matrix_dev->mdev_types[0] = &matrix_dev->mdev_type;
ret = mdev_register_parent(&matrix_dev->parent, &matrix_dev->device,
&vfio_ap_matrix_driver,
matrix_dev->mdev_types, 1);
if (ret)
goto err_driver;
return 0;
@ -1864,7 +1829,7 @@ err_driver:
void vfio_ap_mdev_unregister(void)
{
mdev_unregister_device(&matrix_dev->device);
mdev_unregister_parent(&matrix_dev->parent);
mdev_unregister_driver(&vfio_ap_matrix_driver);
}

View File

@ -13,7 +13,6 @@
#define _VFIO_AP_PRIVATE_H_
#include <linux/types.h>
#include <linux/device.h>
#include <linux/mdev.h>
#include <linux/delay.h>
#include <linux/mutex.h>
@ -30,7 +29,6 @@
* struct ap_matrix_dev - Contains the data for the matrix device.
*
* @device: generic device structure associated with the AP matrix device
* @available_instances: number of mediated matrix devices that can be created
* @info: the struct containing the output from the PQAP(QCI) instruction
* @mdev_list: the list of mediated matrix devices created
* @mdevs_lock: mutex for locking the AP matrix device. This lock will be
@ -47,12 +45,14 @@
*/
struct ap_matrix_dev {
struct device device;
atomic_t available_instances;
struct ap_config_info info;
struct list_head mdev_list;
struct mutex mdevs_lock; /* serializes access to each ap_matrix_mdev */
struct ap_driver *vfio_ap_drv;
struct mutex guests_lock; /* serializes access to each KVM guest */
struct mdev_parent parent;
struct mdev_type mdev_type;
struct mdev_type *mdev_types[];
};
extern struct ap_matrix_dev *matrix_dev;

View File

@ -3,6 +3,7 @@ menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
select IOMMU_API
select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
select INTERVAL_TREE
help
VFIO provides a framework for secure userspace device drivers.
See Documentation/driver-api/vfio.rst for more details.

View File

@ -1,9 +1,12 @@
# SPDX-License-Identifier: GPL-2.0
vfio_virqfd-y := virqfd.o
vfio-y += vfio_main.o
obj-$(CONFIG_VFIO) += vfio.o
vfio-y += vfio_main.o \
iova_bitmap.o \
container.o
obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o

680
drivers/vfio/container.c Normal file
View File

@ -0,0 +1,680 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (C) 2012 Red Hat, Inc. All rights reserved.
*
* VFIO container (/dev/vfio/vfio)
*/
#include <linux/file.h>
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/capability.h>
#include <linux/iommu.h>
#include <linux/miscdevice.h>
#include <linux/vfio.h>
#include <uapi/linux/vfio.h>
#include "vfio.h"
struct vfio_container {
struct kref kref;
struct list_head group_list;
struct rw_semaphore group_lock;
struct vfio_iommu_driver *iommu_driver;
void *iommu_data;
bool noiommu;
};
static struct vfio {
struct list_head iommu_drivers_list;
struct mutex iommu_drivers_lock;
} vfio;
#ifdef CONFIG_VFIO_NOIOMMU
bool vfio_noiommu __read_mostly;
module_param_named(enable_unsafe_noiommu_mode,
vfio_noiommu, bool, S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode. This mode provides no device isolation, no DMA translation, no host kernel protection, cannot be used for device assignment to virtual machines, requires RAWIO permissions, and will taint the kernel. If you do not know what this is for, step away. (default: false)");
#endif
static void *vfio_noiommu_open(unsigned long arg)
{
if (arg != VFIO_NOIOMMU_IOMMU)
return ERR_PTR(-EINVAL);
if (!capable(CAP_SYS_RAWIO))
return ERR_PTR(-EPERM);
return NULL;
}
static void vfio_noiommu_release(void *iommu_data)
{
}
static long vfio_noiommu_ioctl(void *iommu_data,
unsigned int cmd, unsigned long arg)
{
if (cmd == VFIO_CHECK_EXTENSION)
return vfio_noiommu && (arg == VFIO_NOIOMMU_IOMMU) ? 1 : 0;
return -ENOTTY;
}
static int vfio_noiommu_attach_group(void *iommu_data,
struct iommu_group *iommu_group, enum vfio_group_type type)
{
return 0;
}
static void vfio_noiommu_detach_group(void *iommu_data,
struct iommu_group *iommu_group)
{
}
static const struct vfio_iommu_driver_ops vfio_noiommu_ops = {
.name = "vfio-noiommu",
.owner = THIS_MODULE,
.open = vfio_noiommu_open,
.release = vfio_noiommu_release,
.ioctl = vfio_noiommu_ioctl,
.attach_group = vfio_noiommu_attach_group,
.detach_group = vfio_noiommu_detach_group,
};
/*
* Only noiommu containers can use vfio-noiommu and noiommu containers can only
* use vfio-noiommu.
*/
static bool vfio_iommu_driver_allowed(struct vfio_container *container,
const struct vfio_iommu_driver *driver)
{
if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU))
return true;
return container->noiommu == (driver->ops == &vfio_noiommu_ops);
}
/*
* IOMMU driver registration
*/
int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops)
{
struct vfio_iommu_driver *driver, *tmp;
if (WARN_ON(!ops->register_device != !ops->unregister_device))
return -EINVAL;
driver = kzalloc(sizeof(*driver), GFP_KERNEL);
if (!driver)
return -ENOMEM;
driver->ops = ops;
mutex_lock(&vfio.iommu_drivers_lock);
/* Check for duplicates */
list_for_each_entry(tmp, &vfio.iommu_drivers_list, vfio_next) {
if (tmp->ops == ops) {
mutex_unlock(&vfio.iommu_drivers_lock);
kfree(driver);
return -EINVAL;
}
}
list_add(&driver->vfio_next, &vfio.iommu_drivers_list);
mutex_unlock(&vfio.iommu_drivers_lock);
return 0;
}
EXPORT_SYMBOL_GPL(vfio_register_iommu_driver);
void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops)
{
struct vfio_iommu_driver *driver;
mutex_lock(&vfio.iommu_drivers_lock);
list_for_each_entry(driver, &vfio.iommu_drivers_list, vfio_next) {
if (driver->ops == ops) {
list_del(&driver->vfio_next);
mutex_unlock(&vfio.iommu_drivers_lock);
kfree(driver);
return;
}
}
mutex_unlock(&vfio.iommu_drivers_lock);
}
EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
/*
* Container objects - containers are created when /dev/vfio/vfio is
* opened, but their lifecycle extends until the last user is done, so
* it's freed via kref. Must support container/group/device being
* closed in any order.
*/
static void vfio_container_release(struct kref *kref)
{
struct vfio_container *container;
container = container_of(kref, struct vfio_container, kref);
kfree(container);
}
static void vfio_container_get(struct vfio_container *container)
{
kref_get(&container->kref);
}
static void vfio_container_put(struct vfio_container *container)
{
kref_put(&container->kref, vfio_container_release);
}
void vfio_device_container_register(struct vfio_device *device)
{
struct vfio_iommu_driver *iommu_driver =
device->group->container->iommu_driver;
if (iommu_driver && iommu_driver->ops->register_device)
iommu_driver->ops->register_device(
device->group->container->iommu_data, device);
}
void vfio_device_container_unregister(struct vfio_device *device)
{
struct vfio_iommu_driver *iommu_driver =
device->group->container->iommu_driver;
if (iommu_driver && iommu_driver->ops->unregister_device)
iommu_driver->ops->unregister_device(
device->group->container->iommu_data, device);
}
long vfio_container_ioctl_check_extension(struct vfio_container *container,
unsigned long arg)
{
struct vfio_iommu_driver *driver;
long ret = 0;
down_read(&container->group_lock);
driver = container->iommu_driver;
switch (arg) {
/* No base extensions yet */
default:
/*
* If no driver is set, poll all registered drivers for
* extensions and return the first positive result. If
* a driver is already set, further queries will be passed
* only to that driver.
*/
if (!driver) {
mutex_lock(&vfio.iommu_drivers_lock);
list_for_each_entry(driver, &vfio.iommu_drivers_list,
vfio_next) {
if (!list_empty(&container->group_list) &&
!vfio_iommu_driver_allowed(container,
driver))
continue;
if (!try_module_get(driver->ops->owner))
continue;
ret = driver->ops->ioctl(NULL,
VFIO_CHECK_EXTENSION,
arg);
module_put(driver->ops->owner);
if (ret > 0)
break;
}
mutex_unlock(&vfio.iommu_drivers_lock);
} else
ret = driver->ops->ioctl(container->iommu_data,
VFIO_CHECK_EXTENSION, arg);
}
up_read(&container->group_lock);
return ret;
}
/* hold write lock on container->group_lock */
static int __vfio_container_attach_groups(struct vfio_container *container,
struct vfio_iommu_driver *driver,
void *data)
{
struct vfio_group *group;
int ret = -ENODEV;
list_for_each_entry(group, &container->group_list, container_next) {
ret = driver->ops->attach_group(data, group->iommu_group,
group->type);
if (ret)
goto unwind;
}
return ret;
unwind:
list_for_each_entry_continue_reverse(group, &container->group_list,
container_next) {
driver->ops->detach_group(data, group->iommu_group);
}
return ret;
}
static long vfio_ioctl_set_iommu(struct vfio_container *container,
unsigned long arg)
{
struct vfio_iommu_driver *driver;
long ret = -ENODEV;
down_write(&container->group_lock);
/*
* The container is designed to be an unprivileged interface while
* the group can be assigned to specific users. Therefore, only by
* adding a group to a container does the user get the privilege of
* enabling the iommu, which may allocate finite resources. There
* is no unset_iommu, but by removing all the groups from a container,
* the container is deprivileged and returns to an unset state.
*/
if (list_empty(&container->group_list) || container->iommu_driver) {
up_write(&container->group_lock);
return -EINVAL;
}
mutex_lock(&vfio.iommu_drivers_lock);
list_for_each_entry(driver, &vfio.iommu_drivers_list, vfio_next) {
void *data;
if (!vfio_iommu_driver_allowed(container, driver))
continue;
if (!try_module_get(driver->ops->owner))
continue;
/*
* The arg magic for SET_IOMMU is the same as CHECK_EXTENSION,
* so test which iommu driver reported support for this
* extension and call open on them. We also pass them the
* magic, allowing a single driver to support multiple
* interfaces if they'd like.
*/
if (driver->ops->ioctl(NULL, VFIO_CHECK_EXTENSION, arg) <= 0) {
module_put(driver->ops->owner);
continue;
}
data = driver->ops->open(arg);
if (IS_ERR(data)) {
ret = PTR_ERR(data);
module_put(driver->ops->owner);
continue;
}
ret = __vfio_container_attach_groups(container, driver, data);
if (ret) {
driver->ops->release(data);
module_put(driver->ops->owner);
continue;
}
container->iommu_driver = driver;
container->iommu_data = data;
break;
}
mutex_unlock(&vfio.iommu_drivers_lock);
up_write(&container->group_lock);
return ret;
}
static long vfio_fops_unl_ioctl(struct file *filep,
unsigned int cmd, unsigned long arg)
{
struct vfio_container *container = filep->private_data;
struct vfio_iommu_driver *driver;
void *data;
long ret = -EINVAL;
if (!container)
return ret;
switch (cmd) {
case VFIO_GET_API_VERSION:
ret = VFIO_API_VERSION;
break;
case VFIO_CHECK_EXTENSION:
ret = vfio_container_ioctl_check_extension(container, arg);
break;
case VFIO_SET_IOMMU:
ret = vfio_ioctl_set_iommu(container, arg);
break;
default:
driver = container->iommu_driver;
data = container->iommu_data;
if (driver) /* passthrough all unrecognized ioctls */
ret = driver->ops->ioctl(data, cmd, arg);
}
return ret;
}
static int vfio_fops_open(struct inode *inode, struct file *filep)
{
struct vfio_container *container;
container = kzalloc(sizeof(*container), GFP_KERNEL);
if (!container)
return -ENOMEM;
INIT_LIST_HEAD(&container->group_list);
init_rwsem(&container->group_lock);
kref_init(&container->kref);
filep->private_data = container;
return 0;
}
static int vfio_fops_release(struct inode *inode, struct file *filep)
{
struct vfio_container *container = filep->private_data;
struct vfio_iommu_driver *driver = container->iommu_driver;
if (driver && driver->ops->notify)
driver->ops->notify(container->iommu_data,
VFIO_IOMMU_CONTAINER_CLOSE);
filep->private_data = NULL;
vfio_container_put(container);
return 0;
}
static const struct file_operations vfio_fops = {
.owner = THIS_MODULE,
.open = vfio_fops_open,
.release = vfio_fops_release,
.unlocked_ioctl = vfio_fops_unl_ioctl,
.compat_ioctl = compat_ptr_ioctl,
};
struct vfio_container *vfio_container_from_file(struct file *file)
{
struct vfio_container *container;
/* Sanity check, is this really our fd? */
if (file->f_op != &vfio_fops)
return NULL;
container = file->private_data;
WARN_ON(!container); /* fget ensures we don't race vfio_release */
return container;
}
static struct miscdevice vfio_dev = {
.minor = VFIO_MINOR,
.name = "vfio",
.fops = &vfio_fops,
.nodename = "vfio/vfio",
.mode = S_IRUGO | S_IWUGO,
};
int vfio_container_attach_group(struct vfio_container *container,
struct vfio_group *group)
{
struct vfio_iommu_driver *driver;
int ret = 0;
lockdep_assert_held(&group->group_lock);
if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
return -EPERM;
down_write(&container->group_lock);
/* Real groups and fake groups cannot mix */
if (!list_empty(&container->group_list) &&
container->noiommu != (group->type == VFIO_NO_IOMMU)) {
ret = -EPERM;
goto out_unlock_container;
}
if (group->type == VFIO_IOMMU) {
ret = iommu_group_claim_dma_owner(group->iommu_group, group);
if (ret)
goto out_unlock_container;
}
driver = container->iommu_driver;
if (driver) {
ret = driver->ops->attach_group(container->iommu_data,
group->iommu_group,
group->type);
if (ret) {
if (group->type == VFIO_IOMMU)
iommu_group_release_dma_owner(
group->iommu_group);
goto out_unlock_container;
}
}
group->container = container;
group->container_users = 1;
container->noiommu = (group->type == VFIO_NO_IOMMU);
list_add(&group->container_next, &container->group_list);
/* Get a reference on the container and mark a user within the group */
vfio_container_get(container);
out_unlock_container:
up_write(&container->group_lock);
return ret;
}
void vfio_group_detach_container(struct vfio_group *group)
{
struct vfio_container *container = group->container;
struct vfio_iommu_driver *driver;
lockdep_assert_held(&group->group_lock);
WARN_ON(group->container_users != 1);
down_write(&container->group_lock);
driver = container->iommu_driver;
if (driver)
driver->ops->detach_group(container->iommu_data,
group->iommu_group);
if (group->type == VFIO_IOMMU)
iommu_group_release_dma_owner(group->iommu_group);
group->container = NULL;
group->container_users = 0;
list_del(&group->container_next);
/* Detaching the last group deprivileges a container, remove iommu */
if (driver && list_empty(&container->group_list)) {
driver->ops->release(container->iommu_data);
module_put(driver->ops->owner);
container->iommu_driver = NULL;
container->iommu_data = NULL;
}
up_write(&container->group_lock);
vfio_container_put(container);
}
int vfio_device_assign_container(struct vfio_device *device)
{
struct vfio_group *group = device->group;
lockdep_assert_held(&group->group_lock);
if (!group->container || !group->container->iommu_driver ||
WARN_ON(!group->container_users))
return -EINVAL;
if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
return -EPERM;
get_file(group->opened_file);
group->container_users++;
return 0;
}
void vfio_device_unassign_container(struct vfio_device *device)
{
mutex_lock(&device->group->group_lock);
WARN_ON(device->group->container_users <= 1);
device->group->container_users--;
fput(device->group->opened_file);
mutex_unlock(&device->group->group_lock);
}
/*
* Pin contiguous user pages and return their associated host pages for local
* domain only.
* @device [in] : device
* @iova [in] : starting IOVA of user pages to be pinned.
* @npage [in] : count of pages to be pinned. This count should not
* be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
* @prot [in] : protection flags
* @pages[out] : array of host pages
* Return error or number of pages pinned.
*
* A driver may only call this function if the vfio_device was created
* by vfio_register_emulated_iommu_dev().
*/
int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
int npage, int prot, struct page **pages)
{
struct vfio_container *container;
struct vfio_group *group = device->group;
struct vfio_iommu_driver *driver;
int ret;
if (!pages || !npage || !vfio_assert_device_open(device))
return -EINVAL;
if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
return -E2BIG;
/* group->container cannot change while a vfio device is open */
container = group->container;
driver = container->iommu_driver;
if (likely(driver && driver->ops->pin_pages))
ret = driver->ops->pin_pages(container->iommu_data,
group->iommu_group, iova,
npage, prot, pages);
else
ret = -ENOTTY;
return ret;
}
EXPORT_SYMBOL(vfio_pin_pages);
/*
* Unpin contiguous host pages for local domain only.
* @device [in] : device
* @iova [in] : starting address of user pages to be unpinned.
* @npage [in] : count of pages to be unpinned. This count should not
* be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
*/
void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
{
struct vfio_container *container;
struct vfio_iommu_driver *driver;
if (WARN_ON(npage <= 0 || npage > VFIO_PIN_PAGES_MAX_ENTRIES))
return;
if (WARN_ON(!vfio_assert_device_open(device)))
return;
/* group->container cannot change while a vfio device is open */
container = device->group->container;
driver = container->iommu_driver;
driver->ops->unpin_pages(container->iommu_data, iova, npage);
}
EXPORT_SYMBOL(vfio_unpin_pages);
/*
* This interface allows the CPUs to perform some sort of virtual DMA on
* behalf of the device.
*
* CPUs read/write from/into a range of IOVAs pointing to user space memory
* into/from a kernel buffer.
*
* As the read/write of user space memory is conducted via the CPUs and is
* not a real device DMA, it is not necessary to pin the user space memory.
*
* @device [in] : VFIO device
* @iova [in] : base IOVA of a user space buffer
* @data [in] : pointer to kernel buffer
* @len [in] : kernel buffer length
* @write : indicate read or write
* Return error code on failure or 0 on success.
*/
int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data,
size_t len, bool write)
{
struct vfio_container *container;
struct vfio_iommu_driver *driver;
int ret = 0;
if (!data || len <= 0 || !vfio_assert_device_open(device))
return -EINVAL;
/* group->container cannot change while a vfio device is open */
container = device->group->container;
driver = container->iommu_driver;
if (likely(driver && driver->ops->dma_rw))
ret = driver->ops->dma_rw(container->iommu_data,
iova, data, len, write);
else
ret = -ENOTTY;
return ret;
}
EXPORT_SYMBOL(vfio_dma_rw);
int __init vfio_container_init(void)
{
int ret;
mutex_init(&vfio.iommu_drivers_lock);
INIT_LIST_HEAD(&vfio.iommu_drivers_list);
ret = misc_register(&vfio_dev);
if (ret) {
pr_err("vfio: misc device register failed\n");
return ret;
}
if (IS_ENABLED(CONFIG_VFIO_NOIOMMU)) {
ret = vfio_register_iommu_driver(&vfio_noiommu_ops);
if (ret)
goto err_misc;
}
return 0;
err_misc:
misc_deregister(&vfio_dev);
return ret;
}
void vfio_container_cleanup(void)
{
if (IS_ENABLED(CONFIG_VFIO_NOIOMMU))
vfio_unregister_iommu_driver(&vfio_noiommu_ops);
misc_deregister(&vfio_dev);
mutex_destroy(&vfio.iommu_drivers_lock);
}

View File

@ -108,9 +108,9 @@ static void vfio_fsl_mc_close_device(struct vfio_device *core_vdev)
/* reset the device before cleaning up the interrupts */
ret = vfio_fsl_mc_reset_device(vdev);
if (WARN_ON(ret))
if (ret)
dev_warn(&mc_cont->dev,
"VFIO_FLS_MC: reset device has failed (%d)\n", ret);
"VFIO_FSL_MC: reset device has failed (%d)\n", ret);
vfio_fsl_mc_irqs_cleanup(vdev);
@ -418,16 +418,7 @@ static int vfio_fsl_mc_mmap(struct vfio_device *core_vdev,
return vfio_fsl_mc_mmap_mmio(vdev->regions[index], vma);
}
static const struct vfio_device_ops vfio_fsl_mc_ops = {
.name = "vfio-fsl-mc",
.open_device = vfio_fsl_mc_open_device,
.close_device = vfio_fsl_mc_close_device,
.ioctl = vfio_fsl_mc_ioctl,
.read = vfio_fsl_mc_read,
.write = vfio_fsl_mc_write,
.mmap = vfio_fsl_mc_mmap,
};
static const struct vfio_device_ops vfio_fsl_mc_ops;
static int vfio_fsl_mc_bus_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
@ -518,35 +509,43 @@ static void vfio_fsl_uninit_device(struct vfio_fsl_mc_device *vdev)
bus_unregister_notifier(&fsl_mc_bus_type, &vdev->nb);
}
static int vfio_fsl_mc_init_dev(struct vfio_device *core_vdev)
{
struct vfio_fsl_mc_device *vdev =
container_of(core_vdev, struct vfio_fsl_mc_device, vdev);
struct fsl_mc_device *mc_dev = to_fsl_mc_device(core_vdev->dev);
int ret;
vdev->mc_dev = mc_dev;
mutex_init(&vdev->igate);
if (is_fsl_mc_bus_dprc(mc_dev))
ret = vfio_assign_device_set(core_vdev, &mc_dev->dev);
else
ret = vfio_assign_device_set(core_vdev, mc_dev->dev.parent);
if (ret)
return ret;
/* device_set is released by vfio core if @init fails */
return vfio_fsl_mc_init_device(vdev);
}
static int vfio_fsl_mc_probe(struct fsl_mc_device *mc_dev)
{
struct vfio_fsl_mc_device *vdev;
struct device *dev = &mc_dev->dev;
int ret;
vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev)
return -ENOMEM;
vfio_init_group_dev(&vdev->vdev, dev, &vfio_fsl_mc_ops);
vdev->mc_dev = mc_dev;
mutex_init(&vdev->igate);
if (is_fsl_mc_bus_dprc(mc_dev))
ret = vfio_assign_device_set(&vdev->vdev, &mc_dev->dev);
else
ret = vfio_assign_device_set(&vdev->vdev, mc_dev->dev.parent);
if (ret)
goto out_uninit;
ret = vfio_fsl_mc_init_device(vdev);
if (ret)
goto out_uninit;
vdev = vfio_alloc_device(vfio_fsl_mc_device, vdev, dev,
&vfio_fsl_mc_ops);
if (IS_ERR(vdev))
return PTR_ERR(vdev);
ret = vfio_register_group_dev(&vdev->vdev);
if (ret) {
dev_err(dev, "VFIO_FSL_MC: Failed to add to vfio group\n");
goto out_device;
goto out_put_vdev;
}
ret = vfio_fsl_mc_scan_container(mc_dev);
@ -557,30 +556,44 @@ static int vfio_fsl_mc_probe(struct fsl_mc_device *mc_dev)
out_group_dev:
vfio_unregister_group_dev(&vdev->vdev);
out_device:
vfio_fsl_uninit_device(vdev);
out_uninit:
vfio_uninit_group_dev(&vdev->vdev);
kfree(vdev);
out_put_vdev:
vfio_put_device(&vdev->vdev);
return ret;
}
static void vfio_fsl_mc_release_dev(struct vfio_device *core_vdev)
{
struct vfio_fsl_mc_device *vdev =
container_of(core_vdev, struct vfio_fsl_mc_device, vdev);
vfio_fsl_uninit_device(vdev);
mutex_destroy(&vdev->igate);
vfio_free_device(core_vdev);
}
static int vfio_fsl_mc_remove(struct fsl_mc_device *mc_dev)
{
struct device *dev = &mc_dev->dev;
struct vfio_fsl_mc_device *vdev = dev_get_drvdata(dev);
vfio_unregister_group_dev(&vdev->vdev);
mutex_destroy(&vdev->igate);
dprc_remove_devices(mc_dev, NULL, 0);
vfio_fsl_uninit_device(vdev);
vfio_uninit_group_dev(&vdev->vdev);
kfree(vdev);
vfio_put_device(&vdev->vdev);
return 0;
}
static const struct vfio_device_ops vfio_fsl_mc_ops = {
.name = "vfio-fsl-mc",
.init = vfio_fsl_mc_init_dev,
.release = vfio_fsl_mc_release_dev,
.open_device = vfio_fsl_mc_open_device,
.close_device = vfio_fsl_mc_close_device,
.ioctl = vfio_fsl_mc_ioctl,
.read = vfio_fsl_mc_read,
.write = vfio_fsl_mc_write,
.mmap = vfio_fsl_mc_mmap,
};
static struct fsl_mc_driver vfio_fsl_mc_driver = {
.probe = vfio_fsl_mc_probe,
.remove = vfio_fsl_mc_remove,

422
drivers/vfio/iova_bitmap.c Normal file
View File

@ -0,0 +1,422 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2022, Oracle and/or its affiliates.
* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved
*/
#include <linux/iova_bitmap.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#define BITS_PER_PAGE (PAGE_SIZE * BITS_PER_BYTE)
/*
* struct iova_bitmap_map - A bitmap representing an IOVA range
*
* Main data structure for tracking mapped user pages of bitmap data.
*
* For example, for something recording dirty IOVAs, it will be provided a
* struct iova_bitmap structure, as a general structure for iterating the
* total IOVA range. The struct iova_bitmap_map, though, represents the
* subset of said IOVA space that is pinned by its parent structure (struct
* iova_bitmap).
*
* The user does not need to exact location of the bits in the bitmap.
* From user perspective the only API available is iova_bitmap_set() which
* records the IOVA *range* in the bitmap by setting the corresponding
* bits.
*
* The bitmap is an array of u64 whereas each bit represents an IOVA of
* range of (1 << pgshift). Thus formula for the bitmap data to be set is:
*
* data[(iova / page_size) / 64] & (1ULL << (iova % 64))
*/
struct iova_bitmap_map {
/* base IOVA representing bit 0 of the first page */
unsigned long iova;
/* page size order that each bit granules to */
unsigned long pgshift;
/* page offset of the first user page pinned */
unsigned long pgoff;
/* number of pages pinned */
unsigned long npages;
/* pinned pages representing the bitmap data */
struct page **pages;
};
/*
* struct iova_bitmap - The IOVA bitmap object
*
* Main data structure for iterating over the bitmap data.
*
* Abstracts the pinning work and iterates in IOVA ranges.
* It uses a windowing scheme and pins the bitmap in relatively
* big ranges e.g.
*
* The bitmap object uses one base page to store all the pinned pages
* pointers related to the bitmap. For sizeof(struct page*) == 8 it stores
* 512 struct page pointers which, if the base page size is 4K, it means
* 2M of bitmap data is pinned at a time. If the iova_bitmap page size is
* also 4K then the range window to iterate is 64G.
*
* For example iterating on a total IOVA range of 4G..128G, it will walk
* through this set of ranges:
*
* 4G - 68G-1 (64G)
* 68G - 128G-1 (64G)
*
* An example of the APIs on how to use/iterate over the IOVA bitmap:
*
* bitmap = iova_bitmap_alloc(iova, length, page_size, data);
* if (IS_ERR(bitmap))
* return PTR_ERR(bitmap);
*
* ret = iova_bitmap_for_each(bitmap, arg, dirty_reporter_fn);
*
* iova_bitmap_free(bitmap);
*
* Each iteration of the @dirty_reporter_fn is called with a unique @iova
* and @length argument, indicating the current range available through the
* iova_bitmap. The @dirty_reporter_fn uses iova_bitmap_set() to mark dirty
* areas (@iova_length) within that provided range, as following:
*
* iova_bitmap_set(bitmap, iova, iova_length);
*
* The internals of the object uses an index @mapped_base_index that indexes
* which u64 word of the bitmap is mapped, up to @mapped_total_index.
* Those keep being incremented until @mapped_total_index is reached while
* mapping up to PAGE_SIZE / sizeof(struct page*) maximum of pages.
*
* The IOVA bitmap is usually located on what tracks DMA mapped ranges or
* some form of IOVA range tracking that co-relates to the user passed
* bitmap.
*/
struct iova_bitmap {
/* IOVA range representing the currently mapped bitmap data */
struct iova_bitmap_map mapped;
/* userspace address of the bitmap */
u64 __user *bitmap;
/* u64 index that @mapped points to */
unsigned long mapped_base_index;
/* how many u64 can we walk in total */
unsigned long mapped_total_index;
/* base IOVA of the whole bitmap */
unsigned long iova;
/* length of the IOVA range for the whole bitmap */
size_t length;
};
/*
* Converts a relative IOVA to a bitmap index.
* This function provides the index into the u64 array (bitmap::bitmap)
* for a given IOVA offset.
* Relative IOVA means relative to the bitmap::mapped base IOVA
* (stored in mapped::iova). All computations in this file are done using
* relative IOVAs and thus avoid an extra subtraction against mapped::iova.
* The user API iova_bitmap_set() always uses a regular absolute IOVAs.
*/
static unsigned long iova_bitmap_offset_to_index(struct iova_bitmap *bitmap,
unsigned long iova)
{
unsigned long pgsize = 1 << bitmap->mapped.pgshift;
return iova / (BITS_PER_TYPE(*bitmap->bitmap) * pgsize);
}
/*
* Converts a bitmap index to a *relative* IOVA.
*/
static unsigned long iova_bitmap_index_to_offset(struct iova_bitmap *bitmap,
unsigned long index)
{
unsigned long pgshift = bitmap->mapped.pgshift;
return (index * BITS_PER_TYPE(*bitmap->bitmap)) << pgshift;
}
/*
* Returns the base IOVA of the mapped range.
*/
static unsigned long iova_bitmap_mapped_iova(struct iova_bitmap *bitmap)
{
unsigned long skip = bitmap->mapped_base_index;
return bitmap->iova + iova_bitmap_index_to_offset(bitmap, skip);
}
/*
* Pins the bitmap user pages for the current range window.
* This is internal to IOVA bitmap and called when advancing the
* index (@mapped_base_index) or allocating the bitmap.
*/
static int iova_bitmap_get(struct iova_bitmap *bitmap)
{
struct iova_bitmap_map *mapped = &bitmap->mapped;
unsigned long npages;
u64 __user *addr;
long ret;
/*
* @mapped_base_index is the index of the currently mapped u64 words
* that we have access. Anything before @mapped_base_index is not
* mapped. The range @mapped_base_index .. @mapped_total_index-1 is
* mapped but capped at a maximum number of pages.
*/
npages = DIV_ROUND_UP((bitmap->mapped_total_index -
bitmap->mapped_base_index) *
sizeof(*bitmap->bitmap), PAGE_SIZE);
/*
* We always cap at max number of 'struct page' a base page can fit.
* This is, for example, on x86 means 2M of bitmap data max.
*/
npages = min(npages, PAGE_SIZE / sizeof(struct page *));
/*
* Bitmap address to be pinned is calculated via pointer arithmetic
* with bitmap u64 word index.
*/
addr = bitmap->bitmap + bitmap->mapped_base_index;
ret = pin_user_pages_fast((unsigned long)addr, npages,
FOLL_WRITE, mapped->pages);
if (ret <= 0)
return -EFAULT;
mapped->npages = (unsigned long)ret;
/* Base IOVA where @pages point to i.e. bit 0 of the first page */
mapped->iova = iova_bitmap_mapped_iova(bitmap);
/*
* offset of the page where pinned pages bit 0 is located.
* This handles the case where the bitmap is not PAGE_SIZE
* aligned.
*/
mapped->pgoff = offset_in_page(addr);
return 0;
}
/*
* Unpins the bitmap user pages and clears @npages
* (un)pinning is abstracted from API user and it's done when advancing
* the index or freeing the bitmap.
*/
static void iova_bitmap_put(struct iova_bitmap *bitmap)
{
struct iova_bitmap_map *mapped = &bitmap->mapped;
if (mapped->npages) {
unpin_user_pages(mapped->pages, mapped->npages);
mapped->npages = 0;
}
}
/**
* iova_bitmap_alloc() - Allocates an IOVA bitmap object
* @iova: Start address of the IOVA range
* @length: Length of the IOVA range
* @page_size: Page size of the IOVA bitmap. It defines what each bit
* granularity represents
* @data: Userspace address of the bitmap
*
* Allocates an IOVA object and initializes all its fields including the
* first user pages of @data.
*
* Return: A pointer to a newly allocated struct iova_bitmap
* or ERR_PTR() on error.
*/
struct iova_bitmap *iova_bitmap_alloc(unsigned long iova, size_t length,
unsigned long page_size, u64 __user *data)
{
struct iova_bitmap_map *mapped;
struct iova_bitmap *bitmap;
int rc;
bitmap = kzalloc(sizeof(*bitmap), GFP_KERNEL);
if (!bitmap)
return ERR_PTR(-ENOMEM);
mapped = &bitmap->mapped;
mapped->pgshift = __ffs(page_size);
bitmap->bitmap = data;
bitmap->mapped_total_index =
iova_bitmap_offset_to_index(bitmap, length - 1) + 1;
bitmap->iova = iova;
bitmap->length = length;
mapped->iova = iova;
mapped->pages = (struct page **)__get_free_page(GFP_KERNEL);
if (!mapped->pages) {
rc = -ENOMEM;
goto err;
}
rc = iova_bitmap_get(bitmap);
if (rc)
goto err;
return bitmap;
err:
iova_bitmap_free(bitmap);
return ERR_PTR(rc);
}
/**
* iova_bitmap_free() - Frees an IOVA bitmap object
* @bitmap: IOVA bitmap to free
*
* It unpins and releases pages array memory and clears any leftover
* state.
*/
void iova_bitmap_free(struct iova_bitmap *bitmap)
{
struct iova_bitmap_map *mapped = &bitmap->mapped;
iova_bitmap_put(bitmap);
if (mapped->pages) {
free_page((unsigned long)mapped->pages);
mapped->pages = NULL;
}
kfree(bitmap);
}
/*
* Returns the remaining bitmap indexes from mapped_total_index to process for
* the currently pinned bitmap pages.
*/
static unsigned long iova_bitmap_mapped_remaining(struct iova_bitmap *bitmap)
{
unsigned long remaining;
remaining = bitmap->mapped_total_index - bitmap->mapped_base_index;
remaining = min_t(unsigned long, remaining,
(bitmap->mapped.npages << PAGE_SHIFT) / sizeof(*bitmap->bitmap));
return remaining;
}
/*
* Returns the length of the mapped IOVA range.
*/
static unsigned long iova_bitmap_mapped_length(struct iova_bitmap *bitmap)
{
unsigned long max_iova = bitmap->iova + bitmap->length - 1;
unsigned long iova = iova_bitmap_mapped_iova(bitmap);
unsigned long remaining;
/*
* iova_bitmap_mapped_remaining() returns a number of indexes which
* when converted to IOVA gives us a max length that the bitmap
* pinned data can cover. Afterwards, that is capped to
* only cover the IOVA range in @bitmap::iova .. @bitmap::length.
*/
remaining = iova_bitmap_index_to_offset(bitmap,
iova_bitmap_mapped_remaining(bitmap));
if (iova + remaining - 1 > max_iova)
remaining -= ((iova + remaining - 1) - max_iova);
return remaining;
}
/*
* Returns true if there's not more data to iterate.
*/
static bool iova_bitmap_done(struct iova_bitmap *bitmap)
{
return bitmap->mapped_base_index >= bitmap->mapped_total_index;
}
/*
* Advances to the next range, releases the current pinned
* pages and pins the next set of bitmap pages.
* Returns 0 on success or otherwise errno.
*/
static int iova_bitmap_advance(struct iova_bitmap *bitmap)
{
unsigned long iova = iova_bitmap_mapped_length(bitmap) - 1;
unsigned long count = iova_bitmap_offset_to_index(bitmap, iova) + 1;
bitmap->mapped_base_index += count;
iova_bitmap_put(bitmap);
if (iova_bitmap_done(bitmap))
return 0;
/* When advancing the index we pin the next set of bitmap pages */
return iova_bitmap_get(bitmap);
}
/**
* iova_bitmap_for_each() - Iterates over the bitmap
* @bitmap: IOVA bitmap to iterate
* @opaque: Additional argument to pass to the callback
* @fn: Function that gets called for each IOVA range
*
* Helper function to iterate over bitmap data representing a portion of IOVA
* space. It hides the complexity of iterating bitmaps and translating the
* mapped bitmap user pages into IOVA ranges to process.
*
* Return: 0 on success, and an error on failure either upon
* iteration or when the callback returns an error.
*/
int iova_bitmap_for_each(struct iova_bitmap *bitmap, void *opaque,
iova_bitmap_fn_t fn)
{
int ret = 0;
for (; !iova_bitmap_done(bitmap) && !ret;
ret = iova_bitmap_advance(bitmap)) {
ret = fn(bitmap, iova_bitmap_mapped_iova(bitmap),
iova_bitmap_mapped_length(bitmap), opaque);
if (ret)
break;
}
return ret;
}
/**
* iova_bitmap_set() - Records an IOVA range in bitmap
* @bitmap: IOVA bitmap
* @iova: IOVA to start
* @length: IOVA range length
*
* Set the bits corresponding to the range [iova .. iova+length-1] in
* the user bitmap.
*
* Return: The number of bits set.
*/
void iova_bitmap_set(struct iova_bitmap *bitmap,
unsigned long iova, size_t length)
{
struct iova_bitmap_map *mapped = &bitmap->mapped;
unsigned long offset = (iova - mapped->iova) >> mapped->pgshift;
unsigned long nbits = max_t(unsigned long, 1, length >> mapped->pgshift);
unsigned long page_idx = offset / BITS_PER_PAGE;
unsigned long page_offset = mapped->pgoff;
void *kaddr;
offset = offset % BITS_PER_PAGE;
do {
unsigned long size = min(BITS_PER_PAGE - offset, nbits);
kaddr = kmap_local_page(mapped->pages[page_idx]);
bitmap_set(kaddr + page_offset, offset, size);
kunmap_local(kaddr);
page_offset = offset = 0;
nbits -= size;
page_idx++;
} while (nbits > 0);
}
EXPORT_SYMBOL_GPL(iova_bitmap_set);

View File

@ -8,9 +8,7 @@
*/
#include <linux/module.h>
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/uuid.h>
#include <linux/sysfs.h>
#include <linux/mdev.h>
@ -20,71 +18,11 @@
#define DRIVER_AUTHOR "NVIDIA Corporation"
#define DRIVER_DESC "Mediated device Core Driver"
static LIST_HEAD(parent_list);
static DEFINE_MUTEX(parent_list_lock);
static struct class_compat *mdev_bus_compat_class;
static LIST_HEAD(mdev_list);
static DEFINE_MUTEX(mdev_list_lock);
struct device *mdev_parent_dev(struct mdev_device *mdev)
{
return mdev->type->parent->dev;
}
EXPORT_SYMBOL(mdev_parent_dev);
/*
* Return the index in supported_type_groups that this mdev_device was created
* from.
*/
unsigned int mdev_get_type_group_id(struct mdev_device *mdev)
{
return mdev->type->type_group_id;
}
EXPORT_SYMBOL(mdev_get_type_group_id);
/*
* Used in mdev_type_attribute sysfs functions to return the index in the
* supported_type_groups that the sysfs is called from.
*/
unsigned int mtype_get_type_group_id(struct mdev_type *mtype)
{
return mtype->type_group_id;
}
EXPORT_SYMBOL(mtype_get_type_group_id);
/*
* Used in mdev_type_attribute sysfs functions to return the parent struct
* device
*/
struct device *mtype_get_parent_dev(struct mdev_type *mtype)
{
return mtype->parent->dev;
}
EXPORT_SYMBOL(mtype_get_parent_dev);
/* Should be called holding parent_list_lock */
static struct mdev_parent *__find_parent_device(struct device *dev)
{
struct mdev_parent *parent;
list_for_each_entry(parent, &parent_list, next) {
if (parent->dev == dev)
return parent;
}
return NULL;
}
void mdev_release_parent(struct kref *kref)
{
struct mdev_parent *parent = container_of(kref, struct mdev_parent,
ref);
struct device *dev = parent->dev;
kfree(parent);
put_device(dev);
}
/* Caller must hold parent unreg_sem read or write lock */
static void mdev_device_remove_common(struct mdev_device *mdev)
{
@ -99,145 +37,96 @@ static void mdev_device_remove_common(struct mdev_device *mdev)
static int mdev_device_remove_cb(struct device *dev, void *data)
{
struct mdev_device *mdev = mdev_from_dev(dev);
if (mdev)
mdev_device_remove_common(mdev);
if (dev->bus == &mdev_bus_type)
mdev_device_remove_common(to_mdev_device(dev));
return 0;
}
/*
* mdev_register_device : Register a device
* mdev_register_parent: Register a device as parent for mdevs
* @parent: parent structure registered
* @dev: device structure representing parent device.
* @mdev_driver: Device driver to bind to the newly created mdev
* @types: Array of supported mdev types
* @nr_types: Number of entries in @types
*
* Registers the @parent stucture as a parent for mdev types and thus mdev
* devices. The caller needs to hold a reference on @dev that must not be
* released until after the call to mdev_unregister_parent().
*
* Add device to list of registered parent devices.
* Returns a negative value on error, otherwise 0.
*/
int mdev_register_device(struct device *dev, struct mdev_driver *mdev_driver)
int mdev_register_parent(struct mdev_parent *parent, struct device *dev,
struct mdev_driver *mdev_driver, struct mdev_type **types,
unsigned int nr_types)
{
int ret;
struct mdev_parent *parent;
char *env_string = "MDEV_STATE=registered";
char *envp[] = { env_string, NULL };
int ret;
/* check for mandatory ops */
if (!mdev_driver->supported_type_groups)
return -EINVAL;
dev = get_device(dev);
if (!dev)
return -EINVAL;
mutex_lock(&parent_list_lock);
/* Check for duplicate */
parent = __find_parent_device(dev);
if (parent) {
parent = NULL;
ret = -EEXIST;
goto add_dev_err;
}
parent = kzalloc(sizeof(*parent), GFP_KERNEL);
if (!parent) {
ret = -ENOMEM;
goto add_dev_err;
}
kref_init(&parent->ref);
memset(parent, 0, sizeof(*parent));
init_rwsem(&parent->unreg_sem);
parent->dev = dev;
parent->mdev_driver = mdev_driver;
parent->types = types;
parent->nr_types = nr_types;
atomic_set(&parent->available_instances, mdev_driver->max_instances);
if (!mdev_bus_compat_class) {
mdev_bus_compat_class = class_compat_register("mdev_bus");
if (!mdev_bus_compat_class) {
ret = -ENOMEM;
goto add_dev_err;
}
if (!mdev_bus_compat_class)
return -ENOMEM;
}
ret = parent_create_sysfs_files(parent);
if (ret)
goto add_dev_err;
return ret;
ret = class_compat_create_link(mdev_bus_compat_class, dev, NULL);
if (ret)
dev_warn(dev, "Failed to create compatibility class link\n");
list_add(&parent->next, &parent_list);
mutex_unlock(&parent_list_lock);
dev_info(dev, "MDEV: Registered\n");
kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
return 0;
add_dev_err:
mutex_unlock(&parent_list_lock);
if (parent)
mdev_put_parent(parent);
else
put_device(dev);
return ret;
}
EXPORT_SYMBOL(mdev_register_device);
EXPORT_SYMBOL(mdev_register_parent);
/*
* mdev_unregister_device : Unregister a parent device
* @dev: device structure representing parent device.
*
* Remove device from list of registered parent devices. Give a chance to free
* existing mediated devices for given device.
* mdev_unregister_parent : Unregister a parent device
* @parent: parent structure to unregister
*/
void mdev_unregister_device(struct device *dev)
void mdev_unregister_parent(struct mdev_parent *parent)
{
struct mdev_parent *parent;
char *env_string = "MDEV_STATE=unregistered";
char *envp[] = { env_string, NULL };
mutex_lock(&parent_list_lock);
parent = __find_parent_device(dev);
if (!parent) {
mutex_unlock(&parent_list_lock);
return;
}
dev_info(dev, "MDEV: Unregistering\n");
list_del(&parent->next);
mutex_unlock(&parent_list_lock);
dev_info(parent->dev, "MDEV: Unregistering\n");
down_write(&parent->unreg_sem);
class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
device_for_each_child(dev, NULL, mdev_device_remove_cb);
class_compat_remove_link(mdev_bus_compat_class, parent->dev, NULL);
device_for_each_child(parent->dev, NULL, mdev_device_remove_cb);
parent_remove_sysfs_files(parent);
up_write(&parent->unreg_sem);
mdev_put_parent(parent);
/* We still have the caller's reference to use for the uevent */
kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
kobject_uevent_env(&parent->dev->kobj, KOBJ_CHANGE, envp);
}
EXPORT_SYMBOL(mdev_unregister_device);
EXPORT_SYMBOL(mdev_unregister_parent);
static void mdev_device_release(struct device *dev)
{
struct mdev_device *mdev = to_mdev_device(dev);
/* Pairs with the get in mdev_device_create() */
kobject_put(&mdev->type->kobj);
struct mdev_parent *parent = mdev->type->parent;
mutex_lock(&mdev_list_lock);
list_del(&mdev->next);
if (!parent->mdev_driver->get_available)
atomic_inc(&parent->available_instances);
mutex_unlock(&mdev_list_lock);
/* Pairs with the get in mdev_device_create() */
kobject_put(&mdev->type->kobj);
dev_dbg(&mdev->dev, "MDEV: destroying\n");
kfree(mdev);
}
@ -259,6 +148,18 @@ int mdev_device_create(struct mdev_type *type, const guid_t *uuid)
}
}
if (!drv->get_available) {
/*
* Note: that non-atomic read and dec is fine here because
* all modifications are under mdev_list_lock.
*/
if (!atomic_read(&parent->available_instances)) {
mutex_unlock(&mdev_list_lock);
return -EUSERS;
}
atomic_dec(&parent->available_instances);
}
mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
if (!mdev) {
mutex_unlock(&mdev_list_lock);

View File

@ -7,7 +7,6 @@
* Kirti Wankhede <kwankhede@nvidia.com>
*/
#include <linux/device.h>
#include <linux/iommu.h>
#include <linux/mdev.h>
@ -47,7 +46,6 @@ struct bus_type mdev_bus_type = {
.remove = mdev_remove,
.match = mdev_match,
};
EXPORT_SYMBOL_GPL(mdev_bus_type);
/**
* mdev_register_driver - register a new MDEV driver
@ -57,10 +55,11 @@ EXPORT_SYMBOL_GPL(mdev_bus_type);
**/
int mdev_register_driver(struct mdev_driver *drv)
{
if (!drv->device_api)
return -EINVAL;
/* initialize common driver fields */
drv->driver.bus = &mdev_bus_type;
/* register with core */
return driver_register(&drv->driver);
}
EXPORT_SYMBOL(mdev_register_driver);

View File

@ -13,25 +13,7 @@
int mdev_bus_register(void);
void mdev_bus_unregister(void);
struct mdev_parent {
struct device *dev;
struct mdev_driver *mdev_driver;
struct kref ref;
struct list_head next;
struct kset *mdev_types_kset;
struct list_head type_list;
/* Synchronize device creation/removal with parent unregistration */
struct rw_semaphore unreg_sem;
};
struct mdev_type {
struct kobject kobj;
struct kobject *devices_kobj;
struct mdev_parent *parent;
struct list_head next;
unsigned int type_group_id;
};
extern struct bus_type mdev_bus_type;
extern const struct attribute_group *mdev_device_groups[];
#define to_mdev_type_attr(_attr) \
@ -48,16 +30,4 @@ void mdev_remove_sysfs_files(struct mdev_device *mdev);
int mdev_device_create(struct mdev_type *kobj, const guid_t *uuid);
int mdev_device_remove(struct mdev_device *dev);
void mdev_release_parent(struct kref *kref);
static inline void mdev_get_parent(struct mdev_parent *parent)
{
kref_get(&parent->ref);
}
static inline void mdev_put_parent(struct mdev_parent *parent)
{
kref_put(&parent->ref, mdev_release_parent);
}
#endif /* MDEV_PRIVATE_H */

View File

@ -9,14 +9,24 @@
#include <linux/sysfs.h>
#include <linux/ctype.h>
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/uuid.h>
#include <linux/mdev.h>
#include "mdev_private.h"
/* Static functions */
struct mdev_type_attribute {
struct attribute attr;
ssize_t (*show)(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf);
ssize_t (*store)(struct mdev_type *mtype,
struct mdev_type_attribute *attr, const char *buf,
size_t count);
};
#define MDEV_TYPE_ATTR_RO(_name) \
struct mdev_type_attribute mdev_type_attr_##_name = __ATTR_RO(_name)
#define MDEV_TYPE_ATTR_WO(_name) \
struct mdev_type_attribute mdev_type_attr_##_name = __ATTR_WO(_name)
static ssize_t mdev_type_attr_show(struct kobject *kobj,
struct attribute *__attr, char *buf)
@ -74,152 +84,156 @@ static ssize_t create_store(struct mdev_type *mtype,
return count;
}
static MDEV_TYPE_ATTR_WO(create);
static ssize_t device_api_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sysfs_emit(buf, "%s\n", mtype->parent->mdev_driver->device_api);
}
static MDEV_TYPE_ATTR_RO(device_api);
static ssize_t name_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "%s\n",
mtype->pretty_name ? mtype->pretty_name : mtype->sysfs_name);
}
static MDEV_TYPE_ATTR_RO(name);
static ssize_t available_instances_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr,
char *buf)
{
struct mdev_driver *drv = mtype->parent->mdev_driver;
if (drv->get_available)
return sysfs_emit(buf, "%u\n", drv->get_available(mtype));
return sysfs_emit(buf, "%u\n",
atomic_read(&mtype->parent->available_instances));
}
static MDEV_TYPE_ATTR_RO(available_instances);
static ssize_t description_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr,
char *buf)
{
return mtype->parent->mdev_driver->show_description(mtype, buf);
}
static MDEV_TYPE_ATTR_RO(description);
static struct attribute *mdev_types_core_attrs[] = {
&mdev_type_attr_create.attr,
&mdev_type_attr_device_api.attr,
&mdev_type_attr_name.attr,
&mdev_type_attr_available_instances.attr,
&mdev_type_attr_description.attr,
NULL,
};
static umode_t mdev_types_core_is_visible(struct kobject *kobj,
struct attribute *attr, int n)
{
if (attr == &mdev_type_attr_description.attr &&
!to_mdev_type(kobj)->parent->mdev_driver->show_description)
return 0;
return attr->mode;
}
static struct attribute_group mdev_type_core_group = {
.attrs = mdev_types_core_attrs,
.is_visible = mdev_types_core_is_visible,
};
static const struct attribute_group *mdev_type_groups[] = {
&mdev_type_core_group,
NULL,
};
static void mdev_type_release(struct kobject *kobj)
{
struct mdev_type *type = to_mdev_type(kobj);
pr_debug("Releasing group %s\n", kobj->name);
/* Pairs with the get in add_mdev_supported_type() */
mdev_put_parent(type->parent);
kfree(type);
put_device(type->parent->dev);
}
static struct kobj_type mdev_type_ktype = {
.sysfs_ops = &mdev_type_sysfs_ops,
.release = mdev_type_release,
.default_groups = mdev_type_groups,
};
static struct mdev_type *add_mdev_supported_type(struct mdev_parent *parent,
unsigned int type_group_id)
static int mdev_type_add(struct mdev_parent *parent, struct mdev_type *type)
{
struct mdev_type *type;
struct attribute_group *group =
parent->mdev_driver->supported_type_groups[type_group_id];
int ret;
if (!group->name) {
pr_err("%s: Type name empty!\n", __func__);
return ERR_PTR(-EINVAL);
}
type = kzalloc(sizeof(*type), GFP_KERNEL);
if (!type)
return ERR_PTR(-ENOMEM);
type->kobj.kset = parent->mdev_types_kset;
type->parent = parent;
/* Pairs with the put in mdev_type_release() */
mdev_get_parent(parent);
type->type_group_id = type_group_id;
get_device(parent->dev);
ret = kobject_init_and_add(&type->kobj, &mdev_type_ktype, NULL,
"%s-%s", dev_driver_string(parent->dev),
group->name);
type->sysfs_name);
if (ret) {
kobject_put(&type->kobj);
return ERR_PTR(ret);
return ret;
}
ret = sysfs_create_file(&type->kobj, &mdev_type_attr_create.attr);
if (ret)
goto attr_create_failed;
type->devices_kobj = kobject_create_and_add("devices", &type->kobj);
if (!type->devices_kobj) {
ret = -ENOMEM;
goto attr_devices_failed;
}
ret = sysfs_create_files(&type->kobj,
(const struct attribute **)group->attrs);
if (ret) {
ret = -ENOMEM;
goto attrs_failed;
}
return type;
attrs_failed:
kobject_put(type->devices_kobj);
attr_devices_failed:
sysfs_remove_file(&type->kobj, &mdev_type_attr_create.attr);
attr_create_failed:
kobject_del(&type->kobj);
kobject_put(&type->kobj);
return ERR_PTR(ret);
}
static void remove_mdev_supported_type(struct mdev_type *type)
{
struct attribute_group *group =
type->parent->mdev_driver->supported_type_groups[type->type_group_id];
sysfs_remove_files(&type->kobj,
(const struct attribute **)group->attrs);
kobject_put(type->devices_kobj);
sysfs_remove_file(&type->kobj, &mdev_type_attr_create.attr);
kobject_del(&type->kobj);
kobject_put(&type->kobj);
}
static int add_mdev_supported_type_groups(struct mdev_parent *parent)
{
int i;
for (i = 0; parent->mdev_driver->supported_type_groups[i]; i++) {
struct mdev_type *type;
type = add_mdev_supported_type(parent, i);
if (IS_ERR(type)) {
struct mdev_type *ltype, *tmp;
list_for_each_entry_safe(ltype, tmp, &parent->type_list,
next) {
list_del(&ltype->next);
remove_mdev_supported_type(ltype);
}
return PTR_ERR(type);
}
list_add(&type->next, &parent->type_list);
}
return 0;
attr_devices_failed:
kobject_del(&type->kobj);
kobject_put(&type->kobj);
return ret;
}
static void mdev_type_remove(struct mdev_type *type)
{
kobject_put(type->devices_kobj);
kobject_del(&type->kobj);
kobject_put(&type->kobj);
}
/* mdev sysfs functions */
void parent_remove_sysfs_files(struct mdev_parent *parent)
{
struct mdev_type *type, *tmp;
list_for_each_entry_safe(type, tmp, &parent->type_list, next) {
list_del(&type->next);
remove_mdev_supported_type(type);
}
int i;
for (i = 0; i < parent->nr_types; i++)
mdev_type_remove(parent->types[i]);
kset_unregister(parent->mdev_types_kset);
}
int parent_create_sysfs_files(struct mdev_parent *parent)
{
int ret;
int ret, i;
parent->mdev_types_kset = kset_create_and_add("mdev_supported_types",
NULL, &parent->dev->kobj);
if (!parent->mdev_types_kset)
return -ENOMEM;
INIT_LIST_HEAD(&parent->type_list);
ret = add_mdev_supported_type_groups(parent);
for (i = 0; i < parent->nr_types; i++) {
ret = mdev_type_add(parent, parent->types[i]);
if (ret)
goto create_err;
goto out_err;
}
return 0;
create_err:
kset_unregister(parent->mdev_types_kset);
return ret;
out_err:
while (--i >= 0)
mdev_type_remove(parent->types[i]);
return 0;
}
static ssize_t remove_store(struct device *dev, struct device_attribute *attr,

View File

@ -16,7 +16,7 @@
#include "hisi_acc_vfio_pci.h"
/* return 0 on VM acc device ready, -ETIMEDOUT hardware timeout */
/* Return 0 on VM acc device ready, -ETIMEDOUT hardware timeout */
static int qm_wait_dev_not_ready(struct hisi_qm *qm)
{
u32 val;
@ -189,7 +189,7 @@ static int qm_set_regs(struct hisi_qm *qm, struct acc_vf_data *vf_data)
struct device *dev = &qm->pdev->dev;
int ret;
/* check VF state */
/* Check VF state */
if (unlikely(hisi_qm_wait_mb_ready(qm))) {
dev_err(&qm->pdev->dev, "QM device is not ready to write\n");
return -EBUSY;
@ -337,16 +337,7 @@ static int vf_qm_cache_wb(struct hisi_qm *qm)
return 0;
}
static struct hisi_acc_vf_core_device *hssi_acc_drvdata(struct pci_dev *pdev)
{
struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev);
return container_of(core_device, struct hisi_acc_vf_core_device,
core_device);
}
static void vf_qm_fun_reset(struct hisi_acc_vf_core_device *hisi_acc_vdev,
struct hisi_qm *qm)
static void vf_qm_fun_reset(struct hisi_qm *qm)
{
int i;
@ -382,7 +373,7 @@ static int vf_qm_check_match(struct hisi_acc_vf_core_device *hisi_acc_vdev,
return -EINVAL;
}
/* vf qp num check */
/* VF qp num check */
ret = qm_get_vft(vf_qm, &vf_qm->qp_base);
if (ret <= 0) {
dev_err(dev, "failed to get vft qp nums\n");
@ -396,7 +387,7 @@ static int vf_qm_check_match(struct hisi_acc_vf_core_device *hisi_acc_vdev,
vf_qm->qp_num = ret;
/* vf isolation state check */
/* VF isolation state check */
ret = qm_read_regs(pf_qm, QM_QUE_ISO_CFG_V, &que_iso_state, 1);
if (ret) {
dev_err(dev, "failed to read QM_QUE_ISO_CFG_V\n");
@ -405,7 +396,7 @@ static int vf_qm_check_match(struct hisi_acc_vf_core_device *hisi_acc_vdev,
if (vf_data->que_iso_cfg != que_iso_state) {
dev_err(dev, "failed to match isolation state\n");
return ret;
return -EINVAL;
}
ret = qm_write_regs(vf_qm, QM_VF_STATE, &vf_data->vf_qm_state, 1);
@ -427,10 +418,10 @@ static int vf_qm_get_match_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
int ret;
vf_data->acc_magic = ACC_DEV_MAGIC;
/* save device id */
/* Save device id */
vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
/* vf qp num save from PF */
/* VF qp num save from PF */
ret = pf_qm_get_qp_num(pf_qm, vf_id, &vf_data->qp_base);
if (ret <= 0) {
dev_err(dev, "failed to get vft qp nums!\n");
@ -474,19 +465,19 @@ static int vf_qm_load_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
ret = qm_set_regs(qm, vf_data);
if (ret) {
dev_err(dev, "Set VF regs failed\n");
dev_err(dev, "set VF regs failed\n");
return ret;
}
ret = hisi_qm_mb(qm, QM_MB_CMD_SQC_BT, qm->sqc_dma, 0, 0);
if (ret) {
dev_err(dev, "Set sqc failed\n");
dev_err(dev, "set sqc failed\n");
return ret;
}
ret = hisi_qm_mb(qm, QM_MB_CMD_CQC_BT, qm->cqc_dma, 0, 0);
if (ret) {
dev_err(dev, "Set cqc failed\n");
dev_err(dev, "set cqc failed\n");
return ret;
}
@ -528,12 +519,12 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
return -EINVAL;
/* Every reg is 32 bit, the dma address is 64 bit. */
vf_data->eqe_dma = vf_data->qm_eqc_dw[2];
vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
vf_data->eqe_dma |= vf_data->qm_eqc_dw[1];
vf_data->aeqe_dma = vf_data->qm_aeqc_dw[2];
vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[1];
vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
/* Through SQC_BT/CQC_BT to get sqc and cqc address */
ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
@ -552,6 +543,14 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
return 0;
}
static struct hisi_acc_vf_core_device *hisi_acc_drvdata(struct pci_dev *pdev)
{
struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev);
return container_of(core_device, struct hisi_acc_vf_core_device,
core_device);
}
/* Check the PF's RAS state and Function INT state */
static int
hisi_acc_check_int_state(struct hisi_acc_vf_core_device *hisi_acc_vdev)
@ -662,7 +661,10 @@ static void hisi_acc_vf_start_device(struct hisi_acc_vf_core_device *hisi_acc_vd
if (hisi_acc_vdev->vf_qm_state != QM_READY)
return;
vf_qm_fun_reset(hisi_acc_vdev, vf_qm);
/* Make sure the device is enabled */
qm_dev_cmd_init(vf_qm);
vf_qm_fun_reset(vf_qm);
}
static int hisi_acc_vf_load_state(struct hisi_acc_vf_core_device *hisi_acc_vdev)
@ -970,7 +972,7 @@ hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,
static void hisi_acc_vf_pci_aer_reset_done(struct pci_dev *pdev)
{
struct hisi_acc_vf_core_device *hisi_acc_vdev = hssi_acc_drvdata(pdev);
struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_drvdata(pdev);
if (hisi_acc_vdev->core_device.vdev.migration_flags !=
VFIO_MIGRATION_STOP_COPY)
@ -1213,8 +1215,28 @@ static const struct vfio_migration_ops hisi_acc_vfio_pci_migrn_state_ops = {
.migration_get_state = hisi_acc_vfio_pci_get_device_state,
};
static int hisi_acc_vfio_pci_migrn_init_dev(struct vfio_device *core_vdev)
{
struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(core_vdev,
struct hisi_acc_vf_core_device, core_device.vdev);
struct pci_dev *pdev = to_pci_dev(core_vdev->dev);
struct hisi_qm *pf_qm = hisi_acc_get_pf_qm(pdev);
hisi_acc_vdev->vf_id = pci_iov_vf_id(pdev) + 1;
hisi_acc_vdev->pf_qm = pf_qm;
hisi_acc_vdev->vf_dev = pdev;
mutex_init(&hisi_acc_vdev->state_mutex);
core_vdev->migration_flags = VFIO_MIGRATION_STOP_COPY;
core_vdev->mig_ops = &hisi_acc_vfio_pci_migrn_state_ops;
return vfio_pci_core_init_dev(core_vdev);
}
static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
.name = "hisi-acc-vfio-pci-migration",
.init = hisi_acc_vfio_pci_migrn_init_dev,
.release = vfio_pci_core_release_dev,
.open_device = hisi_acc_vfio_pci_open_device,
.close_device = hisi_acc_vfio_pci_close_device,
.ioctl = hisi_acc_vfio_pci_ioctl,
@ -1228,6 +1250,8 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
.name = "hisi-acc-vfio-pci",
.init = vfio_pci_core_init_dev,
.release = vfio_pci_core_release_dev,
.open_device = hisi_acc_vfio_pci_open_device,
.close_device = vfio_pci_core_close_device,
.ioctl = vfio_pci_core_ioctl,
@ -1239,73 +1263,45 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
.match = vfio_pci_core_match,
};
static int
hisi_acc_vfio_pci_migrn_init(struct hisi_acc_vf_core_device *hisi_acc_vdev,
struct pci_dev *pdev, struct hisi_qm *pf_qm)
{
int vf_id;
vf_id = pci_iov_vf_id(pdev);
if (vf_id < 0)
return vf_id;
hisi_acc_vdev->vf_id = vf_id + 1;
hisi_acc_vdev->core_device.vdev.migration_flags =
VFIO_MIGRATION_STOP_COPY;
hisi_acc_vdev->pf_qm = pf_qm;
hisi_acc_vdev->vf_dev = pdev;
mutex_init(&hisi_acc_vdev->state_mutex);
return 0;
}
static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct hisi_acc_vf_core_device *hisi_acc_vdev;
const struct vfio_device_ops *ops = &hisi_acc_vfio_pci_ops;
struct hisi_qm *pf_qm;
int vf_id;
int ret;
hisi_acc_vdev = kzalloc(sizeof(*hisi_acc_vdev), GFP_KERNEL);
if (!hisi_acc_vdev)
return -ENOMEM;
pf_qm = hisi_acc_get_pf_qm(pdev);
if (pf_qm && pf_qm->ver >= QM_HW_V3) {
ret = hisi_acc_vfio_pci_migrn_init(hisi_acc_vdev, pdev, pf_qm);
if (!ret) {
vfio_pci_core_init_device(&hisi_acc_vdev->core_device, pdev,
&hisi_acc_vfio_pci_migrn_ops);
hisi_acc_vdev->core_device.vdev.mig_ops =
&hisi_acc_vfio_pci_migrn_state_ops;
} else {
vf_id = pci_iov_vf_id(pdev);
if (vf_id >= 0)
ops = &hisi_acc_vfio_pci_migrn_ops;
else
pci_warn(pdev, "migration support failed, continue with generic interface\n");
vfio_pci_core_init_device(&hisi_acc_vdev->core_device, pdev,
&hisi_acc_vfio_pci_ops);
}
} else {
vfio_pci_core_init_device(&hisi_acc_vdev->core_device, pdev,
&hisi_acc_vfio_pci_ops);
}
hisi_acc_vdev = vfio_alloc_device(hisi_acc_vf_core_device,
core_device.vdev, &pdev->dev, ops);
if (IS_ERR(hisi_acc_vdev))
return PTR_ERR(hisi_acc_vdev);
dev_set_drvdata(&pdev->dev, &hisi_acc_vdev->core_device);
ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device);
if (ret)
goto out_free;
goto out_put_vdev;
return 0;
out_free:
vfio_pci_core_uninit_device(&hisi_acc_vdev->core_device);
kfree(hisi_acc_vdev);
out_put_vdev:
vfio_put_device(&hisi_acc_vdev->core_device.vdev);
return ret;
}
static void hisi_acc_vfio_pci_remove(struct pci_dev *pdev)
{
struct hisi_acc_vf_core_device *hisi_acc_vdev = hssi_acc_drvdata(pdev);
struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_drvdata(pdev);
vfio_pci_core_unregister_device(&hisi_acc_vdev->core_device);
vfio_pci_core_uninit_device(&hisi_acc_vdev->core_device);
kfree(hisi_acc_vdev);
vfio_put_device(&hisi_acc_vdev->core_device.vdev);
}
static const struct pci_device_id hisi_acc_vfio_pci_table[] = {

View File

@ -16,7 +16,6 @@
#define SEC_CORE_INT_STATUS 0x301008
#define HPRE_HAC_INT_STATUS 0x301800
#define HZIP_CORE_INT_STATUS 0x3010AC
#define QM_QUE_ISO_CFG 0x301154
#define QM_VFT_CFG_RDY 0x10006c
#define QM_VFT_CFG_OP_WR 0x100058
@ -80,7 +79,7 @@ struct acc_vf_data {
/* QM reserved 5 regs */
u32 qm_rsv_regs[5];
u32 padding;
/* qm memory init information */
/* QM memory init information */
u64 eqe_dma;
u64 aeqe_dma;
u64 sqc_dma;
@ -99,7 +98,7 @@ struct hisi_acc_vf_migration_file {
struct hisi_acc_vf_core_device {
struct vfio_pci_core_device core_device;
u8 deferred_reset:1;
/* for migration state */
/* For migration state */
struct mutex state_mutex;
enum vfio_device_mig_state mig_state;
struct pci_dev *pf_dev;
@ -108,7 +107,7 @@ struct hisi_acc_vf_core_device {
struct hisi_qm vf_qm;
u32 vf_qm_state;
int vf_id;
/* for reset handler */
/* For reset handler */
spinlock_t reset_lock;
struct hisi_acc_vf_migration_file *resuming_migf;
struct hisi_acc_vf_migration_file *saving_migf;

File diff suppressed because it is too large Load Diff

View File

@ -9,6 +9,8 @@
#include <linux/kernel.h>
#include <linux/vfio_pci_core.h>
#include <linux/mlx5/driver.h>
#include <linux/mlx5/cq.h>
#include <linux/mlx5/qp.h>
struct mlx5vf_async_data {
struct mlx5_async_work cb_work;
@ -39,6 +41,56 @@ struct mlx5_vf_migration_file {
struct mlx5vf_async_data async_data;
};
struct mlx5_vhca_cq_buf {
struct mlx5_frag_buf_ctrl fbc;
struct mlx5_frag_buf frag_buf;
int cqe_size;
int nent;
};
struct mlx5_vhca_cq {
struct mlx5_vhca_cq_buf buf;
struct mlx5_db db;
struct mlx5_core_cq mcq;
size_t ncqe;
};
struct mlx5_vhca_recv_buf {
u32 npages;
struct page **page_list;
dma_addr_t *dma_addrs;
u32 next_rq_offset;
u32 mkey;
};
struct mlx5_vhca_qp {
struct mlx5_frag_buf buf;
struct mlx5_db db;
struct mlx5_vhca_recv_buf recv_buf;
u32 tracked_page_size;
u32 max_msg_size;
u32 qpn;
struct {
unsigned int pc;
unsigned int cc;
unsigned int wqe_cnt;
__be32 *db;
struct mlx5_frag_buf_ctrl fbc;
} rq;
};
struct mlx5_vhca_page_tracker {
u32 id;
u32 pdn;
u8 is_err:1;
struct mlx5_uars_page *uar;
struct mlx5_vhca_cq cq;
struct mlx5_vhca_qp *host_qp;
struct mlx5_vhca_qp *fw_qp;
struct mlx5_nb nb;
int status;
};
struct mlx5vf_pci_core_device {
struct vfio_pci_core_device core_device;
int vf_id;
@ -46,6 +98,8 @@ struct mlx5vf_pci_core_device {
u8 migrate_cap:1;
u8 deferred_reset:1;
u8 mdev_detach:1;
u8 log_active:1;
struct completion tracker_comp;
/* protect migration state */
struct mutex state_mutex;
enum vfio_device_mig_state mig_state;
@ -53,6 +107,7 @@ struct mlx5vf_pci_core_device {
spinlock_t reset_lock;
struct mlx5_vf_migration_file *resuming_migf;
struct mlx5_vf_migration_file *saving_migf;
struct mlx5_vhca_page_tracker tracker;
struct workqueue_struct *cb_wq;
struct notifier_block nb;
struct mlx5_core_dev *mdev;
@ -63,7 +118,8 @@ int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
size_t *state_size);
void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
const struct vfio_migration_ops *mig_ops);
const struct vfio_migration_ops *mig_ops,
const struct vfio_log_ops *log_ops);
void mlx5vf_cmd_remove_migratable(struct mlx5vf_pci_core_device *mvdev);
void mlx5vf_cmd_close_migratable(struct mlx5vf_pci_core_device *mvdev);
int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
@ -73,4 +129,9 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev);
void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work);
int mlx5vf_start_page_tracker(struct vfio_device *vdev,
struct rb_root_cached *ranges, u32 nnodes, u64 *page_size);
int mlx5vf_stop_page_tracker(struct vfio_device *vdev);
int mlx5vf_tracker_read_and_clear(struct vfio_device *vdev, unsigned long iova,
unsigned long length, struct iova_bitmap *dirty);
#endif /* MLX5_VFIO_CMD_H */

View File

@ -579,8 +579,41 @@ static const struct vfio_migration_ops mlx5vf_pci_mig_ops = {
.migration_get_state = mlx5vf_pci_get_device_state,
};
static const struct vfio_log_ops mlx5vf_pci_log_ops = {
.log_start = mlx5vf_start_page_tracker,
.log_stop = mlx5vf_stop_page_tracker,
.log_read_and_clear = mlx5vf_tracker_read_and_clear,
};
static int mlx5vf_pci_init_dev(struct vfio_device *core_vdev)
{
struct mlx5vf_pci_core_device *mvdev = container_of(core_vdev,
struct mlx5vf_pci_core_device, core_device.vdev);
int ret;
ret = vfio_pci_core_init_dev(core_vdev);
if (ret)
return ret;
mlx5vf_cmd_set_migratable(mvdev, &mlx5vf_pci_mig_ops,
&mlx5vf_pci_log_ops);
return 0;
}
static void mlx5vf_pci_release_dev(struct vfio_device *core_vdev)
{
struct mlx5vf_pci_core_device *mvdev = container_of(core_vdev,
struct mlx5vf_pci_core_device, core_device.vdev);
mlx5vf_cmd_remove_migratable(mvdev);
vfio_pci_core_release_dev(core_vdev);
}
static const struct vfio_device_ops mlx5vf_pci_ops = {
.name = "mlx5-vfio-pci",
.init = mlx5vf_pci_init_dev,
.release = mlx5vf_pci_release_dev,
.open_device = mlx5vf_pci_open_device,
.close_device = mlx5vf_pci_close_device,
.ioctl = vfio_pci_core_ioctl,
@ -598,21 +631,19 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
struct mlx5vf_pci_core_device *mvdev;
int ret;
mvdev = kzalloc(sizeof(*mvdev), GFP_KERNEL);
if (!mvdev)
return -ENOMEM;
vfio_pci_core_init_device(&mvdev->core_device, pdev, &mlx5vf_pci_ops);
mlx5vf_cmd_set_migratable(mvdev, &mlx5vf_pci_mig_ops);
mvdev = vfio_alloc_device(mlx5vf_pci_core_device, core_device.vdev,
&pdev->dev, &mlx5vf_pci_ops);
if (IS_ERR(mvdev))
return PTR_ERR(mvdev);
dev_set_drvdata(&pdev->dev, &mvdev->core_device);
ret = vfio_pci_core_register_device(&mvdev->core_device);
if (ret)
goto out_free;
goto out_put_vdev;
return 0;
out_free:
mlx5vf_cmd_remove_migratable(mvdev);
vfio_pci_core_uninit_device(&mvdev->core_device);
kfree(mvdev);
out_put_vdev:
vfio_put_device(&mvdev->core_device.vdev);
return ret;
}
@ -621,9 +652,7 @@ static void mlx5vf_pci_remove(struct pci_dev *pdev)
struct mlx5vf_pci_core_device *mvdev = mlx5vf_drvdata(pdev);
vfio_pci_core_unregister_device(&mvdev->core_device);
mlx5vf_cmd_remove_migratable(mvdev);
vfio_pci_core_uninit_device(&mvdev->core_device);
kfree(mvdev);
vfio_put_device(&mvdev->core_device.vdev);
}
static const struct pci_device_id mlx5vf_pci_table[] = {

View File

@ -25,7 +25,7 @@
#include <linux/types.h>
#include <linux/uaccess.h>
#include <linux/vfio_pci_core.h>
#include "vfio_pci_priv.h"
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
#define DRIVER_DESC "VFIO PCI - User Level meta-driver"
@ -127,6 +127,8 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev)
static const struct vfio_device_ops vfio_pci_ops = {
.name = "vfio-pci",
.init = vfio_pci_core_init_dev,
.release = vfio_pci_core_release_dev,
.open_device = vfio_pci_open_device,
.close_device = vfio_pci_core_close_device,
.ioctl = vfio_pci_core_ioctl,
@ -146,20 +148,19 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (vfio_pci_is_denylisted(pdev))
return -EINVAL;
vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev)
return -ENOMEM;
vfio_pci_core_init_device(vdev, pdev, &vfio_pci_ops);
vdev = vfio_alloc_device(vfio_pci_core_device, vdev, &pdev->dev,
&vfio_pci_ops);
if (IS_ERR(vdev))
return PTR_ERR(vdev);
dev_set_drvdata(&pdev->dev, vdev);
ret = vfio_pci_core_register_device(vdev);
if (ret)
goto out_free;
goto out_put_vdev;
return 0;
out_free:
vfio_pci_core_uninit_device(vdev);
kfree(vdev);
out_put_vdev:
vfio_put_device(&vdev->vdev);
return ret;
}
@ -168,8 +169,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);
vfio_pci_core_unregister_device(vdev);
vfio_pci_core_uninit_device(vdev);
kfree(vdev);
vfio_put_device(&vdev->vdev);
}
static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)

View File

@ -26,7 +26,7 @@
#include <linux/vfio.h>
#include <linux/slab.h>
#include <linux/vfio_pci_core.h>
#include "vfio_pci_priv.h"
/* Fake capability ID for standard config space */
#define PCI_CAP_ID_BASIC 0
@ -1166,7 +1166,7 @@ static int vfio_msi_config_write(struct vfio_pci_core_device *vdev, int pos,
flags = le16_to_cpu(*pflags);
/* MSI is enabled via ioctl */
if (!is_msi(vdev))
if (vdev->irq_type != VFIO_PCI_MSI_IRQ_INDEX)
flags &= ~PCI_MSI_FLAGS_ENABLE;
/* Check queue size */

View File

@ -28,7 +28,7 @@
#include <linux/nospec.h>
#include <linux/sched/mm.h>
#include <linux/vfio_pci_core.h>
#include "vfio_pci_priv.h"
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
#define DRIVER_DESC "core driver for VFIO based PCI devices"
@ -41,6 +41,23 @@ static bool disable_idle_d3;
static DEFINE_MUTEX(vfio_pci_sriov_pfs_mutex);
static LIST_HEAD(vfio_pci_sriov_pfs);
struct vfio_pci_dummy_resource {
struct resource resource;
int index;
struct list_head res_next;
};
struct vfio_pci_vf_token {
struct mutex lock;
uuid_t uuid;
int users;
};
struct vfio_pci_mmap_vma {
struct vm_area_struct *vma;
struct list_head vma_next;
};
static inline bool vfio_vga_disabled(void)
{
#ifdef CONFIG_VFIO_PCI_VGA
@ -260,16 +277,189 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
return ret;
}
static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev,
struct eventfd_ctx *efdctx)
{
/*
* The vdev power related flags are protected with 'memory_lock'
* semaphore.
*/
vfio_pci_zap_and_down_write_memory_lock(vdev);
if (vdev->pm_runtime_engaged) {
up_write(&vdev->memory_lock);
return -EINVAL;
}
vdev->pm_runtime_engaged = true;
vdev->pm_wake_eventfd_ctx = efdctx;
pm_runtime_put_noidle(&vdev->pdev->dev);
up_write(&vdev->memory_lock);
return 0;
}
static int vfio_pci_core_pm_entry(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
{
struct vfio_pci_core_device *vdev =
container_of(device, struct vfio_pci_core_device, vdev);
int ret;
ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
if (ret != 1)
return ret;
/*
* Inside vfio_pci_runtime_pm_entry(), only the runtime PM usage count
* will be decremented. The pm_runtime_put() will be invoked again
* while returning from the ioctl and then the device can go into
* runtime suspended state.
*/
return vfio_pci_runtime_pm_entry(vdev, NULL);
}
static int vfio_pci_core_pm_entry_with_wakeup(
struct vfio_device *device, u32 flags,
struct vfio_device_low_power_entry_with_wakeup __user *arg,
size_t argsz)
{
struct vfio_pci_core_device *vdev =
container_of(device, struct vfio_pci_core_device, vdev);
struct vfio_device_low_power_entry_with_wakeup entry;
struct eventfd_ctx *efdctx;
int ret;
ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
sizeof(entry));
if (ret != 1)
return ret;
if (copy_from_user(&entry, arg, sizeof(entry)))
return -EFAULT;
if (entry.wakeup_eventfd < 0)
return -EINVAL;
efdctx = eventfd_ctx_fdget(entry.wakeup_eventfd);
if (IS_ERR(efdctx))
return PTR_ERR(efdctx);
ret = vfio_pci_runtime_pm_entry(vdev, efdctx);
if (ret)
eventfd_ctx_put(efdctx);
return ret;
}
static void __vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
{
if (vdev->pm_runtime_engaged) {
vdev->pm_runtime_engaged = false;
pm_runtime_get_noresume(&vdev->pdev->dev);
if (vdev->pm_wake_eventfd_ctx) {
eventfd_ctx_put(vdev->pm_wake_eventfd_ctx);
vdev->pm_wake_eventfd_ctx = NULL;
}
}
}
static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
{
/*
* The vdev power related flags are protected with 'memory_lock'
* semaphore.
*/
down_write(&vdev->memory_lock);
__vfio_pci_runtime_pm_exit(vdev);
up_write(&vdev->memory_lock);
}
static int vfio_pci_core_pm_exit(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
{
struct vfio_pci_core_device *vdev =
container_of(device, struct vfio_pci_core_device, vdev);
int ret;
ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
if (ret != 1)
return ret;
/*
* The device is always in the active state here due to pm wrappers
* around ioctls. If the device had entered a low power state and
* pm_wake_eventfd_ctx is valid, vfio_pci_core_runtime_resume() has
* already signaled the eventfd and exited low power mode itself.
* pm_runtime_engaged protects the redundant call here.
*/
vfio_pci_runtime_pm_exit(vdev);
return 0;
}
#ifdef CONFIG_PM
static int vfio_pci_core_runtime_suspend(struct device *dev)
{
struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
down_write(&vdev->memory_lock);
/*
* The user can move the device into D3hot state before invoking
* power management IOCTL. Move the device into D0 state here and then
* the pci-driver core runtime PM suspend function will move the device
* into the low power state. Also, for the devices which have
* NoSoftRst-, it will help in restoring the original state
* (saved locally in 'vdev->pm_save').
*/
vfio_pci_set_power_state(vdev, PCI_D0);
up_write(&vdev->memory_lock);
/*
* If INTx is enabled, then mask INTx before going into the runtime
* suspended state and unmask the same in the runtime resume.
* If INTx has already been masked by the user, then
* vfio_pci_intx_mask() will return false and in that case, INTx
* should not be unmasked in the runtime resume.
*/
vdev->pm_intx_masked = ((vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX) &&
vfio_pci_intx_mask(vdev));
return 0;
}
static int vfio_pci_core_runtime_resume(struct device *dev)
{
struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
/*
* Resume with a pm_wake_eventfd_ctx signals the eventfd and exit
* low power mode.
*/
down_write(&vdev->memory_lock);
if (vdev->pm_wake_eventfd_ctx) {
eventfd_signal(vdev->pm_wake_eventfd_ctx, 1);
__vfio_pci_runtime_pm_exit(vdev);
}
up_write(&vdev->memory_lock);
if (vdev->pm_intx_masked)
vfio_pci_intx_unmask(vdev);
return 0;
}
#endif /* CONFIG_PM */
/*
* The dev_pm_ops needs to be provided to make pci-driver runtime PM working,
* so use structure without any callbacks.
*
* The pci-driver core runtime PM routines always save the device state
* before going into suspended state. If the device is going into low power
* state with only with runtime PM ops, then no explicit handling is needed
* for the devices which have NoSoftRst-.
*/
static const struct dev_pm_ops vfio_pci_core_pm_ops = { };
static const struct dev_pm_ops vfio_pci_core_pm_ops = {
SET_RUNTIME_PM_OPS(vfio_pci_core_runtime_suspend,
vfio_pci_core_runtime_resume,
NULL)
};
int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
{
@ -371,6 +561,18 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
/*
* This function can be invoked while the power state is non-D0.
* This non-D0 power state can be with or without runtime PM.
* vfio_pci_runtime_pm_exit() will internally increment the usage
* count corresponding to pm_runtime_put() called during low power
* feature entry and then pm_runtime_resume() will wake up the device,
* if the device has already gone into the suspended state. Otherwise,
* the vfio_pci_set_power_state() will change the device power state
* to D0.
*/
vfio_pci_runtime_pm_exit(vdev);
pm_runtime_resume(&pdev->dev);
/*
* This function calls __pci_reset_function_locked() which internally
* can use pci_pm_reset() for the function reset. pci_pm_reset() will
* fail if the power state is non-D0. Also, for the devices which
@ -645,7 +847,7 @@ static int msix_mmappable_cap(struct vfio_pci_core_device *vdev,
return vfio_info_add_capability(caps, &header, sizeof(header));
}
int vfio_pci_register_dev_region(struct vfio_pci_core_device *vdev,
int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
unsigned int type, unsigned int subtype,
const struct vfio_pci_regops *ops,
size_t size, u32 flags, void *data)
@ -670,27 +872,21 @@ int vfio_pci_register_dev_region(struct vfio_pci_core_device *vdev,
return 0;
}
EXPORT_SYMBOL_GPL(vfio_pci_register_dev_region);
EXPORT_SYMBOL_GPL(vfio_pci_core_register_dev_region);
long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
unsigned long arg)
static int vfio_pci_ioctl_get_info(struct vfio_pci_core_device *vdev,
struct vfio_device_info __user *arg)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
unsigned long minsz;
if (cmd == VFIO_DEVICE_GET_INFO) {
unsigned long minsz = offsetofend(struct vfio_device_info, num_irqs);
struct vfio_device_info info;
struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
unsigned long capsz;
int ret;
minsz = offsetofend(struct vfio_device_info, num_irqs);
/* For backward compatibility, cannot require this */
capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
if (copy_from_user(&info, (void __user *)arg, minsz))
if (copy_from_user(&info, arg, minsz))
return -EFAULT;
if (info.argsz < minsz)
@ -711,7 +907,8 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
ret = vfio_pci_info_zdev_add_caps(vdev, &caps);
if (ret && ret != -ENODEV) {
pci_warn(vdev->pdev, "Failed to setup zPCI info capabilities\n");
pci_warn(vdev->pdev,
"Failed to setup zPCI info capabilities\n");
return ret;
}
@ -721,30 +918,29 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
info.argsz = sizeof(info) + caps.size;
} else {
vfio_info_cap_shift(&caps, sizeof(info));
if (copy_to_user((void __user *)arg +
sizeof(info), caps.buf,
caps.size)) {
if (copy_to_user(arg + 1, caps.buf, caps.size)) {
kfree(caps.buf);
return -EFAULT;
}
info.cap_offset = sizeof(info);
info.cap_offset = sizeof(*arg);
}
kfree(caps.buf);
}
return copy_to_user((void __user *)arg, &info, minsz) ?
-EFAULT : 0;
return copy_to_user(arg, &info, minsz) ? -EFAULT : 0;
}
} else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
static int vfio_pci_ioctl_get_region_info(struct vfio_pci_core_device *vdev,
struct vfio_region_info __user *arg)
{
unsigned long minsz = offsetofend(struct vfio_region_info, offset);
struct pci_dev *pdev = vdev->pdev;
struct vfio_region_info info;
struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
int i, ret;
minsz = offsetofend(struct vfio_region_info, offset);
if (copy_from_user(&info, (void __user *)arg, minsz))
if (copy_from_user(&info, arg, minsz))
return -EFAULT;
if (info.argsz < minsz)
@ -777,8 +973,7 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
}
break;
case VFIO_PCI_ROM_REGION_INDEX:
{
case VFIO_PCI_ROM_REGION_INDEX: {
void __iomem *io;
size_t size;
u16 cmd;
@ -798,8 +993,8 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
}
/*
* Is it really there? Enable memory decode for
* implicit access in pci_map_rom().
* Is it really there? Enable memory decode for implicit access
* in pci_map_rom().
*/
cmd = vfio_pci_memory_lock_and_enable(vdev);
io = pci_map_rom(pdev, &size);
@ -823,18 +1018,16 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
VFIO_REGION_INFO_FLAG_WRITE;
break;
default:
{
default: {
struct vfio_region_info_cap_type cap_type = {
.header.id = VFIO_REGION_INFO_CAP_TYPE,
.header.version = 1 };
.header.version = 1
};
if (info.index >=
VFIO_PCI_NUM_REGIONS + vdev->num_regions)
if (info.index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
return -EINVAL;
info.index = array_index_nospec(info.index,
VFIO_PCI_NUM_REGIONS +
vdev->num_regions);
info.index = array_index_nospec(
info.index, VFIO_PCI_NUM_REGIONS + vdev->num_regions);
i = info.index - VFIO_PCI_NUM_REGIONS;
@ -851,8 +1044,8 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
return ret;
if (vdev->region[i].ops->add_capability) {
ret = vdev->region[i].ops->add_capability(vdev,
&vdev->region[i], &caps);
ret = vdev->region[i].ops->add_capability(
vdev, &vdev->region[i], &caps);
if (ret)
return ret;
}
@ -866,27 +1059,26 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
info.cap_offset = 0;
} else {
vfio_info_cap_shift(&caps, sizeof(info));
if (copy_to_user((void __user *)arg +
sizeof(info), caps.buf,
caps.size)) {
if (copy_to_user(arg + 1, caps.buf, caps.size)) {
kfree(caps.buf);
return -EFAULT;
}
info.cap_offset = sizeof(info);
info.cap_offset = sizeof(*arg);
}
kfree(caps.buf);
}
return copy_to_user((void __user *)arg, &info, minsz) ?
-EFAULT : 0;
return copy_to_user(arg, &info, minsz) ? -EFAULT : 0;
}
} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
struct vfio_irq_info __user *arg)
{
unsigned long minsz = offsetofend(struct vfio_irq_info, count);
struct vfio_irq_info info;
minsz = offsetofend(struct vfio_irq_info, count);
if (copy_from_user(&info, (void __user *)arg, minsz))
if (copy_from_user(&info, arg, minsz))
return -EFAULT;
if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
@ -909,50 +1101,53 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
info.count = vfio_pci_get_irq_count(vdev, info.index);
if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
info.flags |= (VFIO_IRQ_INFO_MASKABLE |
VFIO_IRQ_INFO_AUTOMASKED);
info.flags |=
(VFIO_IRQ_INFO_MASKABLE | VFIO_IRQ_INFO_AUTOMASKED);
else
info.flags |= VFIO_IRQ_INFO_NORESIZE;
return copy_to_user((void __user *)arg, &info, minsz) ?
-EFAULT : 0;
return copy_to_user(arg, &info, minsz) ? -EFAULT : 0;
}
} else if (cmd == VFIO_DEVICE_SET_IRQS) {
static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
struct vfio_irq_set __user *arg)
{
unsigned long minsz = offsetofend(struct vfio_irq_set, count);
struct vfio_irq_set hdr;
u8 *data = NULL;
int max, ret = 0;
size_t data_size = 0;
minsz = offsetofend(struct vfio_irq_set, count);
if (copy_from_user(&hdr, (void __user *)arg, minsz))
if (copy_from_user(&hdr, arg, minsz))
return -EFAULT;
max = vfio_pci_get_irq_count(vdev, hdr.index);
ret = vfio_set_irqs_validate_and_prepare(&hdr, max,
VFIO_PCI_NUM_IRQS, &data_size);
ret = vfio_set_irqs_validate_and_prepare(&hdr, max, VFIO_PCI_NUM_IRQS,
&data_size);
if (ret)
return ret;
if (data_size) {
data = memdup_user((void __user *)(arg + minsz),
data_size);
data = memdup_user(&arg->data, data_size);
if (IS_ERR(data))
return PTR_ERR(data);
}
mutex_lock(&vdev->igate);
ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index,
hdr.start, hdr.count, data);
ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index, hdr.start,
hdr.count, data);
mutex_unlock(&vdev->igate);
kfree(data);
return ret;
}
} else if (cmd == VFIO_DEVICE_RESET) {
static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
void __user *arg)
{
int ret;
if (!vdev->reset_works)
@ -961,13 +1156,12 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
vfio_pci_zap_and_down_write_memory_lock(vdev);
/*
* This function can be invoked while the power state is non-D0.
* If pci_try_reset_function() has been called while the power
* state is non-D0, then pci_try_reset_function() will
* internally set the power state to D0 without vfio driver
* involvement. For the devices which have NoSoftRst-, the
* reset function can cause the PCI config space reset without
* restoring the original state (saved locally in
* This function can be invoked while the power state is non-D0. If
* pci_try_reset_function() has been called while the power state is
* non-D0, then pci_try_reset_function() will internally set the power
* state to D0 without vfio driver involvement. For the devices which
* have NoSoftRst-, the reset function can cause the PCI config space
* reset without restoring the original state (saved locally in
* 'vdev->pm_save').
*/
vfio_pci_set_power_state(vdev, PCI_D0);
@ -976,17 +1170,21 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
up_write(&vdev->memory_lock);
return ret;
}
} else if (cmd == VFIO_DEVICE_GET_PCI_HOT_RESET_INFO) {
static int vfio_pci_ioctl_get_pci_hot_reset_info(
struct vfio_pci_core_device *vdev,
struct vfio_pci_hot_reset_info __user *arg)
{
unsigned long minsz =
offsetofend(struct vfio_pci_hot_reset_info, count);
struct vfio_pci_hot_reset_info hdr;
struct vfio_pci_fill_info fill = { 0 };
struct vfio_pci_dependent_device *devices = NULL;
bool slot = false;
int ret = 0;
minsz = offsetofend(struct vfio_pci_hot_reset_info, count);
if (copy_from_user(&hdr, (void __user *)arg, minsz))
if (copy_from_user(&hdr, arg, minsz))
return -EFAULT;
if (hdr.argsz < minsz)
@ -1001,8 +1199,7 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
return -ENODEV;
/* How many devices are affected? */
ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
vfio_pci_count_devs,
ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
&fill.max, slot);
if (ret)
return ret;
@ -1010,8 +1207,8 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
WARN_ON(!fill.max); /* Should always be at least one */
/*
* If there's enough space, fill it now, otherwise return
* -ENOSPC and the number of devices affected.
* If there's enough space, fill it now, otherwise return -ENOSPC and
* the number of devices affected.
*/
if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) {
ret = -ENOSPC;
@ -1025,32 +1222,35 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
fill.devices = devices;
ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
vfio_pci_fill_devs,
ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
&fill, slot);
/*
* If a device was removed between counting and filling,
* we may come up short of fill.max. If a device was
* added, we'll have a return of -EAGAIN above.
* If a device was removed between counting and filling, we may come up
* short of fill.max. If a device was added, we'll have a return of
* -EAGAIN above.
*/
if (!ret)
hdr.count = fill.cur;
reset_info_exit:
if (copy_to_user((void __user *)arg, &hdr, minsz))
if (copy_to_user(arg, &hdr, minsz))
ret = -EFAULT;
if (!ret) {
if (copy_to_user((void __user *)(arg + minsz), devices,
if (copy_to_user(&arg->devices, devices,
hdr.count * sizeof(*devices)))
ret = -EFAULT;
}
kfree(devices);
return ret;
}
} else if (cmd == VFIO_DEVICE_PCI_HOT_RESET) {
static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
struct vfio_pci_hot_reset __user *arg)
{
unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
struct vfio_pci_hot_reset hdr;
int32_t *group_fds;
struct file **files;
@ -1058,9 +1258,7 @@ reset_info_exit:
bool slot = false;
int file_idx, count = 0, ret = 0;
minsz = offsetofend(struct vfio_pci_hot_reset, count);
if (copy_from_user(&hdr, (void __user *)arg, minsz))
if (copy_from_user(&hdr, arg, minsz))
return -EFAULT;
if (hdr.argsz < minsz || hdr.flags)
@ -1073,13 +1271,11 @@ reset_info_exit:
return -ENODEV;
/*
* We can't let userspace give us an arbitrarily large
* buffer to copy, so verify how many we think there
* could be. Note groups can have multiple devices so
* one group per device is the max.
* We can't let userspace give us an arbitrarily large buffer to copy,
* so verify how many we think there could be. Note groups can have
* multiple devices so one group per device is the max.
*/
ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
vfio_pci_count_devs,
ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
&count, slot);
if (ret)
return ret;
@ -1096,7 +1292,7 @@ reset_info_exit:
return -ENOMEM;
}
if (copy_from_user(group_fds, (void __user *)(arg + minsz),
if (copy_from_user(group_fds, arg->group_fds,
hdr.count * sizeof(*group_fds))) {
kfree(group_fds);
kfree(files);
@ -1104,9 +1300,9 @@ reset_info_exit:
}
/*
* For each group_fd, get the group through the vfio external
* user interface and store the group and iommu ID. This
* ensures the group is held across the reset.
* For each group_fd, get the group through the vfio external user
* interface and store the group and iommu ID. This ensures the group
* is held across the reset.
*/
for (file_idx = 0; file_idx < hdr.count; file_idx++) {
struct file *file = fget(group_fds[file_idx]);
@ -1117,7 +1313,7 @@ reset_info_exit:
}
/* Ensure the FD is a vfio group FD.*/
if (!vfio_file_iommu_group(file)) {
if (!vfio_file_is_group(file)) {
fput(file);
ret = -EINVAL;
break;
@ -1143,13 +1339,16 @@ hot_reset_release:
kfree(files);
return ret;
} else if (cmd == VFIO_DEVICE_IOEVENTFD) {
}
static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
struct vfio_device_ioeventfd __user *arg)
{
unsigned long minsz = offsetofend(struct vfio_device_ioeventfd, fd);
struct vfio_device_ioeventfd ioeventfd;
int count;
minsz = offsetofend(struct vfio_device_ioeventfd, fd);
if (copy_from_user(&ioeventfd, (void __user *)arg, minsz))
if (copy_from_user(&ioeventfd, arg, minsz))
return -EFAULT;
if (ioeventfd.argsz < minsz)
@ -1163,15 +1362,42 @@ hot_reset_release:
if (hweight8(count) != 1 || ioeventfd.fd < -1)
return -EINVAL;
return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
ioeventfd.data, count, ioeventfd.fd);
}
return vfio_pci_ioeventfd(vdev, ioeventfd.offset, ioeventfd.data, count,
ioeventfd.fd);
}
long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
unsigned long arg)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
void __user *uarg = (void __user *)arg;
switch (cmd) {
case VFIO_DEVICE_GET_INFO:
return vfio_pci_ioctl_get_info(vdev, uarg);
case VFIO_DEVICE_GET_IRQ_INFO:
return vfio_pci_ioctl_get_irq_info(vdev, uarg);
case VFIO_DEVICE_GET_PCI_HOT_RESET_INFO:
return vfio_pci_ioctl_get_pci_hot_reset_info(vdev, uarg);
case VFIO_DEVICE_GET_REGION_INFO:
return vfio_pci_ioctl_get_region_info(vdev, uarg);
case VFIO_DEVICE_IOEVENTFD:
return vfio_pci_ioctl_ioeventfd(vdev, uarg);
case VFIO_DEVICE_PCI_HOT_RESET:
return vfio_pci_ioctl_pci_hot_reset(vdev, uarg);
case VFIO_DEVICE_RESET:
return vfio_pci_ioctl_reset(vdev, uarg);
case VFIO_DEVICE_SET_IRQS:
return vfio_pci_ioctl_set_irqs(vdev, uarg);
default:
return -ENOTTY;
}
}
EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl);
static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
uuid_t __user *arg, size_t argsz)
{
struct vfio_pci_core_device *vdev =
container_of(device, struct vfio_pci_core_device, vdev);
@ -1202,6 +1428,13 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
{
switch (flags & VFIO_DEVICE_FEATURE_MASK) {
case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY:
return vfio_pci_core_pm_entry(device, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP:
return vfio_pci_core_pm_entry_with_wakeup(device, flags,
arg, argsz);
case VFIO_DEVICE_FEATURE_LOW_POWER_EXIT:
return vfio_pci_core_pm_exit(device, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
return vfio_pci_core_feature_token(device, flags, arg, argsz);
default:
@ -1214,31 +1447,47 @@ static ssize_t vfio_pci_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite)
{
unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
int ret;
if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
return -EINVAL;
ret = pm_runtime_resume_and_get(&vdev->pdev->dev);
if (ret) {
pci_info_ratelimited(vdev->pdev, "runtime resume failed %d\n",
ret);
return -EIO;
}
switch (index) {
case VFIO_PCI_CONFIG_REGION_INDEX:
return vfio_pci_config_rw(vdev, buf, count, ppos, iswrite);
ret = vfio_pci_config_rw(vdev, buf, count, ppos, iswrite);
break;
case VFIO_PCI_ROM_REGION_INDEX:
if (iswrite)
return -EINVAL;
return vfio_pci_bar_rw(vdev, buf, count, ppos, false);
ret = -EINVAL;
else
ret = vfio_pci_bar_rw(vdev, buf, count, ppos, false);
break;
case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
return vfio_pci_bar_rw(vdev, buf, count, ppos, iswrite);
ret = vfio_pci_bar_rw(vdev, buf, count, ppos, iswrite);
break;
case VFIO_PCI_VGA_REGION_INDEX:
return vfio_pci_vga_rw(vdev, buf, count, ppos, iswrite);
ret = vfio_pci_vga_rw(vdev, buf, count, ppos, iswrite);
break;
default:
index -= VFIO_PCI_NUM_REGIONS;
return vdev->region[index].ops->rw(vdev, buf,
ret = vdev->region[index].ops->rw(vdev, buf,
count, ppos, iswrite);
break;
}
return -EINVAL;
pm_runtime_put(&vdev->pdev->dev);
return ret;
}
ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
@ -1433,7 +1682,11 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
mutex_lock(&vdev->vma_lock);
down_read(&vdev->memory_lock);
if (!__vfio_pci_memory_enabled(vdev)) {
/*
* Memory region cannot be accessed if the low power feature is engaged
* or memory access is disabled.
*/
if (vdev->pm_runtime_engaged || !__vfio_pci_memory_enabled(vdev)) {
ret = VM_FAULT_SIGBUS;
goto up_out;
}
@ -1825,12 +2078,12 @@ static void vfio_pci_vga_uninit(struct vfio_pci_core_device *vdev)
VGA_RSRC_LEGACY_MEM);
}
void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev,
struct pci_dev *pdev,
const struct vfio_device_ops *vfio_pci_ops)
int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
{
vfio_init_group_dev(&vdev->vdev, &pdev->dev, vfio_pci_ops);
vdev->pdev = pdev;
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
vdev->pdev = to_pci_dev(core_vdev->dev);
vdev->irq_type = VFIO_PCI_NUM_IRQS;
mutex_init(&vdev->igate);
spin_lock_init(&vdev->irqlock);
@ -1841,19 +2094,24 @@ void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev,
INIT_LIST_HEAD(&vdev->vma_list);
INIT_LIST_HEAD(&vdev->sriov_pfs_item);
init_rwsem(&vdev->memory_lock);
}
EXPORT_SYMBOL_GPL(vfio_pci_core_init_device);
void vfio_pci_core_uninit_device(struct vfio_pci_core_device *vdev)
return 0;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_init_dev);
void vfio_pci_core_release_dev(struct vfio_device *core_vdev)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
mutex_destroy(&vdev->igate);
mutex_destroy(&vdev->ioeventfds_lock);
mutex_destroy(&vdev->vma_lock);
vfio_uninit_group_dev(&vdev->vdev);
kfree(vdev->region);
kfree(vdev->pm_save);
vfio_free_device(core_vdev);
}
EXPORT_SYMBOL_GPL(vfio_pci_core_uninit_device);
EXPORT_SYMBOL_GPL(vfio_pci_core_release_dev);
int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
{
@ -1875,6 +2133,11 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
return -EINVAL;
}
if (vdev->vdev.log_ops && !(vdev->vdev.log_ops->log_start &&
vdev->vdev.log_ops->log_stop &&
vdev->vdev.log_ops->log_read_and_clear))
return -EINVAL;
/*
* Prevent binding to PFs with VFs enabled, the VFs might be in use
* by the host or other users. We cannot capture the VFs if they
@ -2148,6 +2411,15 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
goto err_unlock;
}
/*
* Some of the devices in the dev_set can be in the runtime suspended
* state. Increment the usage count for all the devices in the dev_set
* before reset and decrement the same after reset.
*/
ret = vfio_pci_dev_set_pm_runtime_get(dev_set);
if (ret)
goto err_unlock;
list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
/*
* Test whether all the affected devices are contained by the
@ -2203,6 +2475,9 @@ err_undo:
else
mutex_unlock(&cur->vma_lock);
}
list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
pm_runtime_put(&cur->pdev->dev);
err_unlock:
mutex_unlock(&dev_set->lock);
return ret;

View File

@ -15,7 +15,7 @@
#include <linux/uaccess.h>
#include <linux/vfio.h>
#include <linux/vfio_pci_core.h>
#include "vfio_pci_priv.h"
#define OPREGION_SIGNATURE "IntelGraphicsMem"
#define OPREGION_SIZE (8 * 1024)
@ -257,7 +257,7 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_core_device *vdev)
}
}
ret = vfio_pci_register_dev_region(vdev,
ret = vfio_pci_core_register_dev_region(vdev,
PCI_VENDOR_ID_INTEL | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &vfio_pci_igd_regops,
size, VFIO_REGION_INFO_FLAG_READ, opregionvbt);
@ -402,7 +402,7 @@ static int vfio_pci_igd_cfg_init(struct vfio_pci_core_device *vdev)
return -EINVAL;
}
ret = vfio_pci_register_dev_region(vdev,
ret = vfio_pci_core_register_dev_region(vdev,
PCI_VENDOR_ID_INTEL | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG,
&vfio_pci_igd_cfg_regops, host_bridge->cfg_size,
@ -422,7 +422,7 @@ static int vfio_pci_igd_cfg_init(struct vfio_pci_core_device *vdev)
return -EINVAL;
}
ret = vfio_pci_register_dev_region(vdev,
ret = vfio_pci_core_register_dev_region(vdev,
PCI_VENDOR_ID_INTEL | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG,
&vfio_pci_igd_cfg_regops, lpc_bridge->cfg_size,

View File

@ -20,7 +20,33 @@
#include <linux/wait.h>
#include <linux/slab.h>
#include <linux/vfio_pci_core.h>
#include "vfio_pci_priv.h"
struct vfio_pci_irq_ctx {
struct eventfd_ctx *trigger;
struct virqfd *unmask;
struct virqfd *mask;
char *name;
bool masked;
struct irq_bypass_producer producer;
};
static bool irq_is(struct vfio_pci_core_device *vdev, int type)
{
return vdev->irq_type == type;
}
static bool is_intx(struct vfio_pci_core_device *vdev)
{
return vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX;
}
static bool is_irq_none(struct vfio_pci_core_device *vdev)
{
return !(vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX ||
vdev->irq_type == VFIO_PCI_MSI_IRQ_INDEX ||
vdev->irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
}
/*
* INTx
@ -33,10 +59,12 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
eventfd_signal(vdev->ctx[0].trigger, 1);
}
void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
/* Returns true if the INTx vfio_pci_irq_ctx.masked value is changed. */
bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
unsigned long flags;
bool masked_changed = false;
spin_lock_irqsave(&vdev->irqlock, flags);
@ -60,9 +88,11 @@ void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
disable_irq_nosync(pdev->irq);
vdev->ctx[0].masked = true;
masked_changed = true;
}
spin_unlock_irqrestore(&vdev->irqlock, flags);
return masked_changed;
}
/*

View File

@ -0,0 +1,104 @@
/* SPDX-License-Identifier: GPL-2.0-only */
#ifndef VFIO_PCI_PRIV_H
#define VFIO_PCI_PRIV_H
#include <linux/vfio_pci_core.h>
/* Special capability IDs predefined access */
#define PCI_CAP_ID_INVALID 0xFF /* default raw access */
#define PCI_CAP_ID_INVALID_VIRT 0xFE /* default virt access */
/* Cap maximum number of ioeventfds per device (arbitrary) */
#define VFIO_PCI_IOEVENTFD_MAX 1000
struct vfio_pci_ioeventfd {
struct list_head next;
struct vfio_pci_core_device *vdev;
struct virqfd *virqfd;
void __iomem *addr;
uint64_t data;
loff_t pos;
int bar;
int count;
bool test_mem;
};
bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev);
int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
unsigned index, unsigned start, unsigned count,
void *data);
ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
#ifdef CONFIG_VFIO_PCI_VGA
ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
#else
static inline ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev,
char __user *buf, size_t count,
loff_t *ppos, bool iswrite)
{
return -EINVAL;
}
#endif
int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
uint64_t data, int count, int fd);
int vfio_pci_init_perm_bits(void);
void vfio_pci_uninit_perm_bits(void);
int vfio_config_init(struct vfio_pci_core_device *vdev);
void vfio_config_free(struct vfio_pci_core_device *vdev);
int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev,
pci_power_t state);
bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev);
void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *vdev);
u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev);
void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev,
u16 cmd);
#ifdef CONFIG_VFIO_PCI_IGD
int vfio_pci_igd_init(struct vfio_pci_core_device *vdev);
#else
static inline int vfio_pci_igd_init(struct vfio_pci_core_device *vdev)
{
return -ENODEV;
}
#endif
#ifdef CONFIG_VFIO_PCI_ZDEV_KVM
int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps);
int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
#else
static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps)
{
return -ENODEV;
}
static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
{
return 0;
}
static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
{}
#endif
static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
{
return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
}
#endif

View File

@ -17,7 +17,7 @@
#include <linux/vfio.h>
#include <linux/vgaarb.h>
#include <linux/vfio_pci_core.h>
#include "vfio_pci_priv.h"
#ifdef __LITTLE_ENDIAN
#define vfio_ioread64 ioread64
@ -412,7 +412,7 @@ static void vfio_pci_ioeventfd_thread(void *opaque, void *unused)
vfio_pci_ioeventfd_do_write(ioeventfd, ioeventfd->test_mem);
}
long vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
uint64_t data, int count, int fd)
{
struct pci_dev *pdev = vdev->pdev;

View File

@ -15,7 +15,7 @@
#include <asm/pci_clp.h>
#include <asm/pci_io.h>
#include <linux/vfio_pci_core.h>
#include "vfio_pci_priv.h"
/*
* Add the Base PCI Function information to the device info region.

View File

@ -7,6 +7,7 @@
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/vfio.h>
#include <linux/pm_runtime.h>
#include <linux/amba/bus.h>
#include "vfio_platform_private.h"
@ -40,20 +41,16 @@ static int get_amba_irq(struct vfio_platform_device *vdev, int i)
return ret ? ret : -ENXIO;
}
static int vfio_amba_probe(struct amba_device *adev, const struct amba_id *id)
static int vfio_amba_init_dev(struct vfio_device *core_vdev)
{
struct vfio_platform_device *vdev;
struct vfio_platform_device *vdev =
container_of(core_vdev, struct vfio_platform_device, vdev);
struct amba_device *adev = to_amba_device(core_vdev->dev);
int ret;
vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev)
return -ENOMEM;
vdev->name = kasprintf(GFP_KERNEL, "vfio-amba-%08x", adev->periphid);
if (!vdev->name) {
kfree(vdev);
if (!vdev->name)
return -ENOMEM;
}
vdev->opaque = (void *) adev;
vdev->flags = VFIO_DEVICE_FLAGS_AMBA;
@ -61,26 +58,67 @@ static int vfio_amba_probe(struct amba_device *adev, const struct amba_id *id)
vdev->get_irq = get_amba_irq;
vdev->reset_required = false;
ret = vfio_platform_probe_common(vdev, &adev->dev);
if (ret) {
ret = vfio_platform_init_common(vdev);
if (ret)
kfree(vdev->name);
kfree(vdev);
return ret;
}
}
static const struct vfio_device_ops vfio_amba_ops;
static int vfio_amba_probe(struct amba_device *adev, const struct amba_id *id)
{
struct vfio_platform_device *vdev;
int ret;
vdev = vfio_alloc_device(vfio_platform_device, vdev, &adev->dev,
&vfio_amba_ops);
if (IS_ERR(vdev))
return PTR_ERR(vdev);
ret = vfio_register_group_dev(&vdev->vdev);
if (ret)
goto out_put_vdev;
pm_runtime_enable(&adev->dev);
dev_set_drvdata(&adev->dev, vdev);
return 0;
out_put_vdev:
vfio_put_device(&vdev->vdev);
return ret;
}
static void vfio_amba_release_dev(struct vfio_device *core_vdev)
{
struct vfio_platform_device *vdev =
container_of(core_vdev, struct vfio_platform_device, vdev);
vfio_platform_release_common(vdev);
kfree(vdev->name);
vfio_free_device(core_vdev);
}
static void vfio_amba_remove(struct amba_device *adev)
{
struct vfio_platform_device *vdev = dev_get_drvdata(&adev->dev);
vfio_platform_remove_common(vdev);
kfree(vdev->name);
kfree(vdev);
vfio_unregister_group_dev(&vdev->vdev);
pm_runtime_disable(vdev->device);
vfio_put_device(&vdev->vdev);
}
static const struct vfio_device_ops vfio_amba_ops = {
.name = "vfio-amba",
.init = vfio_amba_init_dev,
.release = vfio_amba_release_dev,
.open_device = vfio_platform_open_device,
.close_device = vfio_platform_close_device,
.ioctl = vfio_platform_ioctl,
.read = vfio_platform_read,
.write = vfio_platform_write,
.mmap = vfio_platform_mmap,
};
static const struct amba_id pl330_ids[] = {
{ 0, 0 },
};

View File

@ -7,6 +7,7 @@
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/vfio.h>
#include <linux/pm_runtime.h>
#include <linux/platform_device.h>
#include "vfio_platform_private.h"
@ -36,14 +37,11 @@ static int get_platform_irq(struct vfio_platform_device *vdev, int i)
return platform_get_irq_optional(pdev, i);
}
static int vfio_platform_probe(struct platform_device *pdev)
static int vfio_platform_init_dev(struct vfio_device *core_vdev)
{
struct vfio_platform_device *vdev;
int ret;
vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev)
return -ENOMEM;
struct vfio_platform_device *vdev =
container_of(core_vdev, struct vfio_platform_device, vdev);
struct platform_device *pdev = to_platform_device(core_vdev->dev);
vdev->opaque = (void *) pdev;
vdev->name = pdev->name;
@ -52,24 +50,64 @@ static int vfio_platform_probe(struct platform_device *pdev)
vdev->get_irq = get_platform_irq;
vdev->reset_required = reset_required;
ret = vfio_platform_probe_common(vdev, &pdev->dev);
if (ret) {
kfree(vdev);
return ret;
}
return vfio_platform_init_common(vdev);
}
static const struct vfio_device_ops vfio_platform_ops;
static int vfio_platform_probe(struct platform_device *pdev)
{
struct vfio_platform_device *vdev;
int ret;
vdev = vfio_alloc_device(vfio_platform_device, vdev, &pdev->dev,
&vfio_platform_ops);
if (IS_ERR(vdev))
return PTR_ERR(vdev);
ret = vfio_register_group_dev(&vdev->vdev);
if (ret)
goto out_put_vdev;
pm_runtime_enable(&pdev->dev);
dev_set_drvdata(&pdev->dev, vdev);
return 0;
out_put_vdev:
vfio_put_device(&vdev->vdev);
return ret;
}
static void vfio_platform_release_dev(struct vfio_device *core_vdev)
{
struct vfio_platform_device *vdev =
container_of(core_vdev, struct vfio_platform_device, vdev);
vfio_platform_release_common(vdev);
vfio_free_device(core_vdev);
}
static int vfio_platform_remove(struct platform_device *pdev)
{
struct vfio_platform_device *vdev = dev_get_drvdata(&pdev->dev);
vfio_platform_remove_common(vdev);
kfree(vdev);
vfio_unregister_group_dev(&vdev->vdev);
pm_runtime_disable(vdev->device);
vfio_put_device(&vdev->vdev);
return 0;
}
static const struct vfio_device_ops vfio_platform_ops = {
.name = "vfio-platform",
.init = vfio_platform_init_dev,
.release = vfio_platform_release_dev,
.open_device = vfio_platform_open_device,
.close_device = vfio_platform_close_device,
.ioctl = vfio_platform_ioctl,
.read = vfio_platform_read,
.write = vfio_platform_write,
.mmap = vfio_platform_mmap,
};
static struct platform_driver vfio_platform_driver = {
.probe = vfio_platform_probe,
.remove = vfio_platform_remove,

View File

@ -218,7 +218,7 @@ static int vfio_platform_call_reset(struct vfio_platform_device *vdev,
return -EINVAL;
}
static void vfio_platform_close_device(struct vfio_device *core_vdev)
void vfio_platform_close_device(struct vfio_device *core_vdev)
{
struct vfio_platform_device *vdev =
container_of(core_vdev, struct vfio_platform_device, vdev);
@ -236,8 +236,9 @@ static void vfio_platform_close_device(struct vfio_device *core_vdev)
vfio_platform_regions_cleanup(vdev);
vfio_platform_irq_cleanup(vdev);
}
EXPORT_SYMBOL_GPL(vfio_platform_close_device);
static int vfio_platform_open_device(struct vfio_device *core_vdev)
int vfio_platform_open_device(struct vfio_device *core_vdev)
{
struct vfio_platform_device *vdev =
container_of(core_vdev, struct vfio_platform_device, vdev);
@ -273,8 +274,9 @@ err_irq:
vfio_platform_regions_cleanup(vdev);
return ret;
}
EXPORT_SYMBOL_GPL(vfio_platform_open_device);
static long vfio_platform_ioctl(struct vfio_device *core_vdev,
long vfio_platform_ioctl(struct vfio_device *core_vdev,
unsigned int cmd, unsigned long arg)
{
struct vfio_platform_device *vdev =
@ -382,6 +384,7 @@ static long vfio_platform_ioctl(struct vfio_device *core_vdev,
return -ENOTTY;
}
EXPORT_SYMBOL_GPL(vfio_platform_ioctl);
static ssize_t vfio_platform_read_mmio(struct vfio_platform_region *reg,
char __user *buf, size_t count,
@ -438,7 +441,7 @@ err:
return -EFAULT;
}
static ssize_t vfio_platform_read(struct vfio_device *core_vdev,
ssize_t vfio_platform_read(struct vfio_device *core_vdev,
char __user *buf, size_t count, loff_t *ppos)
{
struct vfio_platform_device *vdev =
@ -460,6 +463,7 @@ static ssize_t vfio_platform_read(struct vfio_device *core_vdev,
return -EINVAL;
}
EXPORT_SYMBOL_GPL(vfio_platform_read);
static ssize_t vfio_platform_write_mmio(struct vfio_platform_region *reg,
const char __user *buf, size_t count,
@ -515,7 +519,7 @@ err:
return -EFAULT;
}
static ssize_t vfio_platform_write(struct vfio_device *core_vdev, const char __user *buf,
ssize_t vfio_platform_write(struct vfio_device *core_vdev, const char __user *buf,
size_t count, loff_t *ppos)
{
struct vfio_platform_device *vdev =
@ -537,6 +541,7 @@ static ssize_t vfio_platform_write(struct vfio_device *core_vdev, const char __u
return -EINVAL;
}
EXPORT_SYMBOL_GPL(vfio_platform_write);
static int vfio_platform_mmap_mmio(struct vfio_platform_region region,
struct vm_area_struct *vma)
@ -558,7 +563,7 @@ static int vfio_platform_mmap_mmio(struct vfio_platform_region region,
req_len, vma->vm_page_prot);
}
static int vfio_platform_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma)
int vfio_platform_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma)
{
struct vfio_platform_device *vdev =
container_of(core_vdev, struct vfio_platform_device, vdev);
@ -598,16 +603,7 @@ static int vfio_platform_mmap(struct vfio_device *core_vdev, struct vm_area_stru
return -EINVAL;
}
static const struct vfio_device_ops vfio_platform_ops = {
.name = "vfio-platform",
.open_device = vfio_platform_open_device,
.close_device = vfio_platform_close_device,
.ioctl = vfio_platform_ioctl,
.read = vfio_platform_read,
.write = vfio_platform_write,
.mmap = vfio_platform_mmap,
};
EXPORT_SYMBOL_GPL(vfio_platform_mmap);
static int vfio_platform_of_probe(struct vfio_platform_device *vdev,
struct device *dev)
@ -639,55 +635,34 @@ static int vfio_platform_of_probe(struct vfio_platform_device *vdev,
* If the firmware is ACPI type, then acpi_disabled is 0. All other checks are
* valid checks. We cannot claim that this system is DT.
*/
int vfio_platform_probe_common(struct vfio_platform_device *vdev,
struct device *dev)
int vfio_platform_init_common(struct vfio_platform_device *vdev)
{
int ret;
vfio_init_group_dev(&vdev->vdev, dev, &vfio_platform_ops);
struct device *dev = vdev->vdev.dev;
ret = vfio_platform_acpi_probe(vdev, dev);
if (ret)
ret = vfio_platform_of_probe(vdev, dev);
if (ret)
goto out_uninit;
return ret;
vdev->device = dev;
ret = vfio_platform_get_reset(vdev);
if (ret && vdev->reset_required) {
dev_err(dev, "No reset function found for device %s\n",
vdev->name);
goto out_uninit;
}
ret = vfio_register_group_dev(&vdev->vdev);
if (ret)
goto put_reset;
mutex_init(&vdev->igate);
pm_runtime_enable(dev);
return 0;
put_reset:
vfio_platform_put_reset(vdev);
out_uninit:
vfio_uninit_group_dev(&vdev->vdev);
ret = vfio_platform_get_reset(vdev);
if (ret && vdev->reset_required)
dev_err(dev, "No reset function found for device %s\n",
vdev->name);
return ret;
}
EXPORT_SYMBOL_GPL(vfio_platform_probe_common);
EXPORT_SYMBOL_GPL(vfio_platform_init_common);
void vfio_platform_remove_common(struct vfio_platform_device *vdev)
void vfio_platform_release_common(struct vfio_platform_device *vdev)
{
vfio_unregister_group_dev(&vdev->vdev);
pm_runtime_disable(vdev->device);
vfio_platform_put_reset(vdev);
vfio_uninit_group_dev(&vdev->vdev);
}
EXPORT_SYMBOL_GPL(vfio_platform_remove_common);
EXPORT_SYMBOL_GPL(vfio_platform_release_common);
void __vfio_platform_register_reset(struct vfio_platform_reset_node *node)
{

View File

@ -78,9 +78,21 @@ struct vfio_platform_reset_node {
vfio_platform_reset_fn_t of_reset;
};
int vfio_platform_probe_common(struct vfio_platform_device *vdev,
struct device *dev);
void vfio_platform_remove_common(struct vfio_platform_device *vdev);
int vfio_platform_init_common(struct vfio_platform_device *vdev);
void vfio_platform_release_common(struct vfio_platform_device *vdev);
int vfio_platform_open_device(struct vfio_device *core_vdev);
void vfio_platform_close_device(struct vfio_device *core_vdev);
long vfio_platform_ioctl(struct vfio_device *core_vdev,
unsigned int cmd, unsigned long arg);
ssize_t vfio_platform_read(struct vfio_device *core_vdev,
char __user *buf, size_t count,
loff_t *ppos);
ssize_t vfio_platform_write(struct vfio_device *core_vdev,
const char __user *buf,
size_t count, loff_t *ppos);
int vfio_platform_mmap(struct vfio_device *core_vdev,
struct vm_area_struct *vma);
int vfio_platform_irq_init(struct vfio_platform_device *vdev);
void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev);

View File

@ -3,6 +3,16 @@
* Copyright (C) 2012 Red Hat, Inc. All rights reserved.
* Author: Alex Williamson <alex.williamson@redhat.com>
*/
#ifndef __VFIO_VFIO_H__
#define __VFIO_VFIO_H__
#include <linux/device.h>
#include <linux/cdev.h>
#include <linux/module.h>
struct iommu_group;
struct vfio_device;
struct vfio_container;
enum vfio_group_type {
/*
@ -28,6 +38,30 @@ enum vfio_group_type {
VFIO_NO_IOMMU,
};
struct vfio_group {
struct device dev;
struct cdev cdev;
/*
* When drivers is non-zero a driver is attached to the struct device
* that provided the iommu_group and thus the iommu_group is a valid
* pointer. When drivers is 0 the driver is being detached. Once users
* reaches 0 then the iommu_group is invalid.
*/
refcount_t drivers;
unsigned int container_users;
struct iommu_group *iommu_group;
struct vfio_container *container;
struct list_head device_list;
struct mutex device_lock;
struct list_head vfio_next;
struct list_head container_next;
enum vfio_group_type type;
struct mutex group_lock;
struct kvm *kvm;
struct file *opened_file;
struct blocking_notifier_head notifier;
};
/* events for the backend driver notify callback */
enum vfio_iommu_notify_type {
VFIO_IOMMU_CONTAINER_CLOSE = 0,
@ -67,5 +101,33 @@ struct vfio_iommu_driver_ops {
enum vfio_iommu_notify_type event);
};
struct vfio_iommu_driver {
const struct vfio_iommu_driver_ops *ops;
struct list_head vfio_next;
};
int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops);
bool vfio_assert_device_open(struct vfio_device *device);
struct vfio_container *vfio_container_from_file(struct file *filep);
int vfio_device_assign_container(struct vfio_device *device);
void vfio_device_unassign_container(struct vfio_device *device);
int vfio_container_attach_group(struct vfio_container *container,
struct vfio_group *group);
void vfio_group_detach_container(struct vfio_group *group);
void vfio_device_container_register(struct vfio_device *device);
void vfio_device_container_unregister(struct vfio_device *device);
long vfio_container_ioctl_check_extension(struct vfio_container *container,
unsigned long arg);
int __init vfio_container_init(void);
void vfio_container_cleanup(void);
#ifdef CONFIG_VFIO_NOIOMMU
extern bool vfio_noiommu __read_mostly;
#else
enum { vfio_noiommu = false };
#endif
#endif

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,26 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright (c) 2022, Oracle and/or its affiliates.
* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved
*/
#ifndef _IOVA_BITMAP_H_
#define _IOVA_BITMAP_H_
#include <linux/types.h>
struct iova_bitmap;
typedef int (*iova_bitmap_fn_t)(struct iova_bitmap *bitmap,
unsigned long iova, size_t length,
void *opaque);
struct iova_bitmap *iova_bitmap_alloc(unsigned long iova, size_t length,
unsigned long page_size,
u64 __user *data);
void iova_bitmap_free(struct iova_bitmap *bitmap);
int iova_bitmap_for_each(struct iova_bitmap *bitmap, void *opaque,
iova_bitmap_fn_t fn);
void iova_bitmap_set(struct iova_bitmap *bitmap,
unsigned long iova, size_t length);
#endif

View File

@ -10,6 +10,9 @@
#ifndef MDEV_H
#define MDEV_H
#include <linux/device.h>
#include <linux/uuid.h>
struct mdev_type;
struct mdev_device {
@ -20,67 +23,67 @@ struct mdev_device {
bool active;
};
struct mdev_type {
/* set by the driver before calling mdev_register parent: */
const char *sysfs_name;
const char *pretty_name;
/* set by the core, can be used drivers */
struct mdev_parent *parent;
/* internal only */
struct kobject kobj;
struct kobject *devices_kobj;
};
/* embedded into the struct device that the mdev devices hang off */
struct mdev_parent {
struct device *dev;
struct mdev_driver *mdev_driver;
struct kset *mdev_types_kset;
/* Synchronize device creation/removal with parent unregistration */
struct rw_semaphore unreg_sem;
struct mdev_type **types;
unsigned int nr_types;
atomic_t available_instances;
};
static inline struct mdev_device *to_mdev_device(struct device *dev)
{
return container_of(dev, struct mdev_device, dev);
}
unsigned int mdev_get_type_group_id(struct mdev_device *mdev);
unsigned int mtype_get_type_group_id(struct mdev_type *mtype);
struct device *mtype_get_parent_dev(struct mdev_type *mtype);
/* interface for exporting mdev supported type attributes */
struct mdev_type_attribute {
struct attribute attr;
ssize_t (*show)(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf);
ssize_t (*store)(struct mdev_type *mtype,
struct mdev_type_attribute *attr, const char *buf,
size_t count);
};
#define MDEV_TYPE_ATTR(_name, _mode, _show, _store) \
struct mdev_type_attribute mdev_type_attr_##_name = \
__ATTR(_name, _mode, _show, _store)
#define MDEV_TYPE_ATTR_RW(_name) \
struct mdev_type_attribute mdev_type_attr_##_name = __ATTR_RW(_name)
#define MDEV_TYPE_ATTR_RO(_name) \
struct mdev_type_attribute mdev_type_attr_##_name = __ATTR_RO(_name)
#define MDEV_TYPE_ATTR_WO(_name) \
struct mdev_type_attribute mdev_type_attr_##_name = __ATTR_WO(_name)
/**
* struct mdev_driver - Mediated device driver
* @device_api: string to return for the device_api sysfs
* @max_instances: maximum number of instances supported (optional)
* @probe: called when new device created
* @remove: called when device removed
* @supported_type_groups: Attributes to define supported types. It is mandatory
* to provide supported types.
* @get_available: Return the max number of instances that can be created
* @show_description: Print a description of the mtype
* @driver: device driver structure
*
**/
struct mdev_driver {
const char *device_api;
unsigned int max_instances;
int (*probe)(struct mdev_device *dev);
void (*remove)(struct mdev_device *dev);
struct attribute_group **supported_type_groups;
unsigned int (*get_available)(struct mdev_type *mtype);
ssize_t (*show_description)(struct mdev_type *mtype, char *buf);
struct device_driver driver;
};
extern struct bus_type mdev_bus_type;
int mdev_register_device(struct device *dev, struct mdev_driver *mdev_driver);
void mdev_unregister_device(struct device *dev);
int mdev_register_parent(struct mdev_parent *parent, struct device *dev,
struct mdev_driver *mdev_driver, struct mdev_type **types,
unsigned int nr_types);
void mdev_unregister_parent(struct mdev_parent *parent);
int mdev_register_driver(struct mdev_driver *drv);
void mdev_unregister_driver(struct mdev_driver *drv);
struct device *mdev_parent_dev(struct mdev_device *mdev);
static inline struct device *mdev_dev(struct mdev_device *mdev)
{
return &mdev->dev;
}
static inline struct mdev_device *mdev_from_dev(struct device *dev)
{
return dev->bus == &mdev_bus_type ? to_mdev_device(dev) : NULL;
}
#endif /* MDEV_H */

View File

@ -14,6 +14,7 @@
#include <linux/workqueue.h>
#include <linux/poll.h>
#include <uapi/linux/vfio.h>
#include <linux/iova_bitmap.h>
struct kvm;
@ -33,10 +34,11 @@ struct vfio_device {
struct device *dev;
const struct vfio_device_ops *ops;
/*
* mig_ops is a static property of the vfio_device which must be set
* prior to registering the vfio_device.
* mig_ops/log_ops is a static property of the vfio_device which must
* be set prior to registering the vfio_device.
*/
const struct vfio_migration_ops *mig_ops;
const struct vfio_log_ops *log_ops;
struct vfio_group *group;
struct vfio_device_set *dev_set;
struct list_head dev_set_list;
@ -45,7 +47,9 @@ struct vfio_device {
struct kvm *kvm;
/* Members below here are private, not for driver use */
refcount_t refcount;
unsigned int index;
struct device device; /* device.kref covers object life circle */
refcount_t refcount; /* user count on registered device*/
unsigned int open_count;
struct completion comp;
struct list_head group_next;
@ -55,6 +59,8 @@ struct vfio_device {
/**
* struct vfio_device_ops - VFIO bus driver device callbacks
*
* @init: initialize private fields in device structure
* @release: Reclaim private fields in device structure
* @open_device: Called when the first file descriptor is opened for this device
* @close_device: Opposite of open_device
* @read: Perform read(2) on device file descriptor
@ -72,6 +78,8 @@ struct vfio_device {
*/
struct vfio_device_ops {
char *name;
int (*init)(struct vfio_device *vdev);
void (*release)(struct vfio_device *vdev);
int (*open_device)(struct vfio_device *vdev);
void (*close_device)(struct vfio_device *vdev);
ssize_t (*read)(struct vfio_device *vdev, char __user *buf,
@ -108,6 +116,28 @@ struct vfio_migration_ops {
enum vfio_device_mig_state *curr_state);
};
/**
* @log_start: Optional callback to ask the device start DMA logging.
* @log_stop: Optional callback to ask the device stop DMA logging.
* @log_read_and_clear: Optional callback to ask the device read
* and clear the dirty DMAs in some given range.
*
* The vfio core implementation of the DEVICE_FEATURE_DMA_LOGGING_ set
* of features does not track logging state relative to the device,
* therefore the device implementation of vfio_log_ops must handle
* arbitrary user requests. This includes rejecting subsequent calls
* to log_start without an intervening log_stop, as well as graceful
* handling of log_stop and log_read_and_clear from invalid states.
*/
struct vfio_log_ops {
int (*log_start)(struct vfio_device *device,
struct rb_root_cached *ranges, u32 nnodes, u64 *page_size);
int (*log_stop)(struct vfio_device *device);
int (*log_read_and_clear)(struct vfio_device *device,
unsigned long iova, unsigned long length,
struct iova_bitmap *dirty);
};
/**
* vfio_check_feature - Validate user input for the VFIO_DEVICE_FEATURE ioctl
* @flags: Arg from the device_feature op
@ -137,9 +167,23 @@ static inline int vfio_check_feature(u32 flags, size_t argsz, u32 supported_ops,
return 1;
}
void vfio_init_group_dev(struct vfio_device *device, struct device *dev,
struct vfio_device *_vfio_alloc_device(size_t size, struct device *dev,
const struct vfio_device_ops *ops);
void vfio_uninit_group_dev(struct vfio_device *device);
#define vfio_alloc_device(dev_struct, member, dev, ops) \
container_of(_vfio_alloc_device(sizeof(struct dev_struct) + \
BUILD_BUG_ON_ZERO(offsetof( \
struct dev_struct, member)), \
dev, ops), \
struct dev_struct, member)
int vfio_init_device(struct vfio_device *device, struct device *dev,
const struct vfio_device_ops *ops);
void vfio_free_device(struct vfio_device *device);
static inline void vfio_put_device(struct vfio_device *device)
{
put_device(&device->device);
}
int vfio_register_group_dev(struct vfio_device *device);
int vfio_register_emulated_iommu_dev(struct vfio_device *device);
void vfio_unregister_group_dev(struct vfio_device *device);
@ -155,6 +199,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
* External user API
*/
struct iommu_group *vfio_file_iommu_group(struct file *file);
bool vfio_file_is_group(struct file *file);
bool vfio_file_enforced_coherent(struct file *file);
void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
bool vfio_file_has_dev(struct file *file, struct vfio_device *device);

View File

@ -20,39 +20,10 @@
#define VFIO_PCI_CORE_H
#define VFIO_PCI_OFFSET_SHIFT 40
#define VFIO_PCI_OFFSET_TO_INDEX(off) (off >> VFIO_PCI_OFFSET_SHIFT)
#define VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
#define VFIO_PCI_OFFSET_MASK (((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)
/* Special capability IDs predefined access */
#define PCI_CAP_ID_INVALID 0xFF /* default raw access */
#define PCI_CAP_ID_INVALID_VIRT 0xFE /* default virt access */
/* Cap maximum number of ioeventfds per device (arbitrary) */
#define VFIO_PCI_IOEVENTFD_MAX 1000
struct vfio_pci_ioeventfd {
struct list_head next;
struct vfio_pci_core_device *vdev;
struct virqfd *virqfd;
void __iomem *addr;
uint64_t data;
loff_t pos;
int bar;
int count;
bool test_mem;
};
struct vfio_pci_irq_ctx {
struct eventfd_ctx *trigger;
struct virqfd *unmask;
struct virqfd *mask;
char *name;
bool masked;
struct irq_bypass_producer producer;
};
struct vfio_pci_core_device;
struct vfio_pci_region;
@ -78,23 +49,6 @@ struct vfio_pci_region {
u32 flags;
};
struct vfio_pci_dummy_resource {
struct resource resource;
int index;
struct list_head res_next;
};
struct vfio_pci_vf_token {
struct mutex lock;
uuid_t uuid;
int users;
};
struct vfio_pci_mmap_vma {
struct vm_area_struct *vma;
struct list_head vma_next;
};
struct vfio_pci_core_device {
struct vfio_device vdev;
struct pci_dev *pdev;
@ -124,11 +78,14 @@ struct vfio_pci_core_device {
bool needs_reset;
bool nointx;
bool needs_pm_restore;
bool pm_intx_masked;
bool pm_runtime_engaged;
struct pci_saved_state *pci_saved_state;
struct pci_saved_state *pm_save;
int ioeventfds_nr;
struct eventfd_ctx *err_trigger;
struct eventfd_ctx *req_trigger;
struct eventfd_ctx *pm_wake_eventfd_ctx;
struct list_head dummy_resources_list;
struct mutex ioeventfds_lock;
struct list_head ioeventfds_list;
@ -141,100 +98,17 @@ struct vfio_pci_core_device {
struct rw_semaphore memory_lock;
};
#define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
#define is_msi(vdev) (vdev->irq_type == VFIO_PCI_MSI_IRQ_INDEX)
#define is_msix(vdev) (vdev->irq_type == VFIO_PCI_MSIX_IRQ_INDEX)
#define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev)))
#define irq_is(vdev, type) (vdev->irq_type == type)
void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev);
int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev,
uint32_t flags, unsigned index,
unsigned start, unsigned count, void *data);
ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev,
char __user *buf, size_t count,
loff_t *ppos, bool iswrite);
ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
#ifdef CONFIG_VFIO_PCI_VGA
ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
#else
static inline ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev,
char __user *buf, size_t count,
loff_t *ppos, bool iswrite)
{
return -EINVAL;
}
#endif
long vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
uint64_t data, int count, int fd);
int vfio_pci_init_perm_bits(void);
void vfio_pci_uninit_perm_bits(void);
int vfio_config_init(struct vfio_pci_core_device *vdev);
void vfio_config_free(struct vfio_pci_core_device *vdev);
int vfio_pci_register_dev_region(struct vfio_pci_core_device *vdev,
/* Will be exported for vfio pci drivers usage */
int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
unsigned int type, unsigned int subtype,
const struct vfio_pci_regops *ops,
size_t size, u32 flags, void *data);
int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev,
pci_power_t state);
bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev);
void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *vdev);
u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev);
void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev,
u16 cmd);
#ifdef CONFIG_VFIO_PCI_IGD
int vfio_pci_igd_init(struct vfio_pci_core_device *vdev);
#else
static inline int vfio_pci_igd_init(struct vfio_pci_core_device *vdev)
{
return -ENODEV;
}
#endif
#ifdef CONFIG_VFIO_PCI_ZDEV_KVM
int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps);
int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
#else
static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps)
{
return -ENODEV;
}
static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
{
return 0;
}
static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
{}
#endif
/* Will be exported for vfio pci drivers usage */
void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
bool is_disable_idle_d3);
void vfio_pci_core_close_device(struct vfio_device *core_vdev);
void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev,
struct pci_dev *pdev,
const struct vfio_device_ops *vfio_pci_ops);
int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev);
void vfio_pci_core_uninit_device(struct vfio_pci_core_device *vdev);
void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev);
extern const struct pci_error_handlers vfio_pci_core_err_handlers;
int vfio_pci_core_sriov_configure(struct vfio_pci_core_device *vdev,
@ -256,9 +130,4 @@ void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
pci_channel_state_t state);
static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
{
return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
}
#endif /* VFIO_PCI_CORE_H */

View File

@ -986,6 +986,148 @@ enum vfio_device_mig_state {
VFIO_DEVICE_STATE_RUNNING_P2P = 5,
};
/*
* Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power
* state with the platform-based power management. Device use of lower power
* states depends on factors managed by the runtime power management core,
* including system level support and coordinating support among dependent
* devices. Enabling device low power entry does not guarantee lower power
* usage by the device, nor is a mechanism provided through this feature to
* know the current power state of the device. If any device access happens
* (either from the host or through the vfio uAPI) when the device is in the
* low power state, then the host will move the device out of the low power
* state as necessary prior to the access. Once the access is completed, the
* device may re-enter the low power state. For single shot low power support
* with wake-up notification, see
* VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP below. Access to mmap'd
* device regions is disabled on LOW_POWER_ENTRY and may only be resumed after
* calling LOW_POWER_EXIT.
*/
#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3
/*
* This device feature has the same behavior as
* VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY with the exception that the user
* provides an eventfd for wake-up notification. When the device moves out of
* the low power state for the wake-up, the host will not allow the device to
* re-enter a low power state without a subsequent user call to one of the low
* power entry device feature IOCTLs. Access to mmap'd device regions is
* disabled on LOW_POWER_ENTRY_WITH_WAKEUP and may only be resumed after the
* low power exit. The low power exit can happen either through LOW_POWER_EXIT
* or through any other access (where the wake-up notification has been
* generated). The access to mmap'd device regions will not trigger low power
* exit.
*
* The notification through the provided eventfd will be generated only when
* the device has entered and is resumed from a low power state after
* calling this device feature IOCTL. A device that has not entered low power
* state, as managed through the runtime power management core, will not
* generate a notification through the provided eventfd on access. Calling the
* LOW_POWER_EXIT feature is optional in the case where notification has been
* signaled on the provided eventfd that a resume from low power has occurred.
*/
struct vfio_device_low_power_entry_with_wakeup {
__s32 wakeup_eventfd;
__u32 reserved;
};
#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4
/*
* Upon VFIO_DEVICE_FEATURE_SET, disallow use of device low power states as
* previously enabled via VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY or
* VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device features.
* This device feature IOCTL may itself generate a wakeup eventfd notification
* in the latter case if the device had previously entered a low power state.
*/
#define VFIO_DEVICE_FEATURE_LOW_POWER_EXIT 5
/*
* Upon VFIO_DEVICE_FEATURE_SET start/stop device DMA logging.
* VFIO_DEVICE_FEATURE_PROBE can be used to detect if the device supports
* DMA logging.
*
* DMA logging allows a device to internally record what DMAs the device is
* initiating and report them back to userspace. It is part of the VFIO
* migration infrastructure that allows implementing dirty page tracking
* during the pre copy phase of live migration. Only DMA WRITEs are logged,
* and this API is not connected to VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE.
*
* When DMA logging is started a range of IOVAs to monitor is provided and the
* device can optimize its logging to cover only the IOVA range given. Each
* DMA that the device initiates inside the range will be logged by the device
* for later retrieval.
*
* page_size is an input that hints what tracking granularity the device
* should try to achieve. If the device cannot do the hinted page size then
* it's the driver choice which page size to pick based on its support.
* On output the device will return the page size it selected.
*
* ranges is a pointer to an array of
* struct vfio_device_feature_dma_logging_range.
*
* The core kernel code guarantees to support by minimum num_ranges that fit
* into a single kernel page. User space can try higher values but should give
* up if the above can't be achieved as of some driver limitations.
*
* A single call to start device DMA logging can be issued and a matching stop
* should follow at the end. Another start is not allowed in the meantime.
*/
struct vfio_device_feature_dma_logging_control {
__aligned_u64 page_size;
__u32 num_ranges;
__u32 __reserved;
__aligned_u64 ranges;
};
struct vfio_device_feature_dma_logging_range {
__aligned_u64 iova;
__aligned_u64 length;
};
#define VFIO_DEVICE_FEATURE_DMA_LOGGING_START 6
/*
* Upon VFIO_DEVICE_FEATURE_SET stop device DMA logging that was started
* by VFIO_DEVICE_FEATURE_DMA_LOGGING_START
*/
#define VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP 7
/*
* Upon VFIO_DEVICE_FEATURE_GET read back and clear the device DMA log
*
* Query the device's DMA log for written pages within the given IOVA range.
* During querying the log is cleared for the IOVA range.
*
* bitmap is a pointer to an array of u64s that will hold the output bitmap
* with 1 bit reporting a page_size unit of IOVA. The mapping of IOVA to bits
* is given by:
* bitmap[(addr - iova)/page_size] & (1ULL << (addr % 64))
*
* The input page_size can be any power of two value and does not have to
* match the value given to VFIO_DEVICE_FEATURE_DMA_LOGGING_START. The driver
* will format its internal logging to match the reporting page size, possibly
* by replicating bits if the internal page size is lower than requested.
*
* The LOGGING_REPORT will only set bits in the bitmap and never clear or
* perform any initialization of the user provided bitmap.
*
* If any error is returned userspace should assume that the dirty log is
* corrupted. Error recovery is to consider all memory dirty and try to
* restart the dirty tracking, or to abort/restart the whole migration.
*
* If DMA logging is not enabled, an error will be returned.
*
*/
struct vfio_device_feature_dma_logging_report {
__aligned_u64 iova;
__aligned_u64 length;
__aligned_u64 page_size;
__aligned_u64 bitmap;
};
#define VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT 8
/* -------- API for Type1 VFIO IOMMU -------- */
/**

View File

@ -21,7 +21,6 @@
*/
#include <linux/init.h>
#include <linux/module.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
@ -100,35 +99,44 @@ MODULE_PARM_DESC(mem, "megabytes available to " MBOCHS_NAME " devices");
#define MBOCHS_TYPE_2 "medium"
#define MBOCHS_TYPE_3 "large"
static const struct mbochs_type {
const char *name;
static struct mbochs_type {
struct mdev_type type;
u32 mbytes;
u32 max_x;
u32 max_y;
} mbochs_types[] = {
{
.name = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_1,
.type.sysfs_name = MBOCHS_TYPE_1,
.type.pretty_name = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_1,
.mbytes = 4,
.max_x = 800,
.max_y = 600,
}, {
.name = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_2,
.type.sysfs_name = MBOCHS_TYPE_2,
.type.pretty_name = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_2,
.mbytes = 16,
.max_x = 1920,
.max_y = 1440,
}, {
.name = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_3,
.type.sysfs_name = MBOCHS_TYPE_3,
.type.pretty_name = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_3,
.mbytes = 64,
.max_x = 0,
.max_y = 0,
},
};
static struct mdev_type *mbochs_mdev_types[] = {
&mbochs_types[0].type,
&mbochs_types[1].type,
&mbochs_types[2].type,
};
static dev_t mbochs_devt;
static struct class *mbochs_class;
static struct cdev mbochs_cdev;
static struct device mbochs_dev;
static struct mdev_parent mbochs_parent;
static atomic_t mbochs_avail_mbytes;
static const struct vfio_device_ops mbochs_dev_ops;
@ -505,13 +513,14 @@ static int mbochs_reset(struct mdev_state *mdev_state)
return 0;
}
static int mbochs_probe(struct mdev_device *mdev)
static int mbochs_init_dev(struct vfio_device *vdev)
{
struct mdev_state *mdev_state =
container_of(vdev, struct mdev_state, vdev);
struct mdev_device *mdev = to_mdev_device(vdev->dev);
struct mbochs_type *type =
container_of(mdev->type, struct mbochs_type, type);
int avail_mbytes = atomic_read(&mbochs_avail_mbytes);
const struct mbochs_type *type =
&mbochs_types[mdev_get_type_group_id(mdev)];
struct device *dev = mdev_dev(mdev);
struct mdev_state *mdev_state;
int ret = -ENOMEM;
do {
@ -520,14 +529,9 @@ static int mbochs_probe(struct mdev_device *mdev)
} while (!atomic_try_cmpxchg(&mbochs_avail_mbytes, &avail_mbytes,
avail_mbytes - type->mbytes));
mdev_state = kzalloc(sizeof(struct mdev_state), GFP_KERNEL);
if (mdev_state == NULL)
goto err_avail;
vfio_init_group_dev(&mdev_state->vdev, &mdev->dev, &mbochs_dev_ops);
mdev_state->vconfig = kzalloc(MBOCHS_CONFIG_SPACE_SIZE, GFP_KERNEL);
if (mdev_state->vconfig == NULL)
goto err_mem;
if (!mdev_state->vconfig)
goto err_avail;
mdev_state->memsize = type->mbytes * 1024 * 1024;
mdev_state->pagecount = mdev_state->memsize >> PAGE_SHIFT;
@ -535,10 +539,7 @@ static int mbochs_probe(struct mdev_device *mdev)
sizeof(struct page *),
GFP_KERNEL);
if (!mdev_state->pages)
goto err_mem;
dev_info(dev, "%s: %s, %d MB, %ld pages\n", __func__,
type->name, type->mbytes, mdev_state->pagecount);
goto err_vconfig;
mutex_init(&mdev_state->ops_lock);
mdev_state->mdev = mdev;
@ -553,31 +554,55 @@ static int mbochs_probe(struct mdev_device *mdev)
mbochs_create_config_space(mdev_state);
mbochs_reset(mdev_state);
ret = vfio_register_emulated_iommu_dev(&mdev_state->vdev);
if (ret)
goto err_mem;
dev_set_drvdata(&mdev->dev, mdev_state);
dev_info(vdev->dev, "%s: %s, %d MB, %ld pages\n", __func__,
type->type.pretty_name, type->mbytes, mdev_state->pagecount);
return 0;
err_mem:
vfio_uninit_group_dev(&mdev_state->vdev);
kfree(mdev_state->pages);
err_vconfig:
kfree(mdev_state->vconfig);
kfree(mdev_state);
err_avail:
atomic_add(type->mbytes, &mbochs_avail_mbytes);
return ret;
}
static int mbochs_probe(struct mdev_device *mdev)
{
struct mdev_state *mdev_state;
int ret = -ENOMEM;
mdev_state = vfio_alloc_device(mdev_state, vdev, &mdev->dev,
&mbochs_dev_ops);
if (IS_ERR(mdev_state))
return PTR_ERR(mdev_state);
ret = vfio_register_emulated_iommu_dev(&mdev_state->vdev);
if (ret)
goto err_put_vdev;
dev_set_drvdata(&mdev->dev, mdev_state);
return 0;
err_put_vdev:
vfio_put_device(&mdev_state->vdev);
return ret;
}
static void mbochs_release_dev(struct vfio_device *vdev)
{
struct mdev_state *mdev_state =
container_of(vdev, struct mdev_state, vdev);
atomic_add(mdev_state->type->mbytes, &mbochs_avail_mbytes);
kfree(mdev_state->pages);
kfree(mdev_state->vconfig);
vfio_free_device(vdev);
}
static void mbochs_remove(struct mdev_device *mdev)
{
struct mdev_state *mdev_state = dev_get_drvdata(&mdev->dev);
vfio_unregister_group_dev(&mdev_state->vdev);
vfio_uninit_group_dev(&mdev_state->vdev);
atomic_add(mdev_state->type->mbytes, &mbochs_avail_mbytes);
kfree(mdev_state->pages);
kfree(mdev_state->vconfig);
kfree(mdev_state);
vfio_put_device(&mdev_state->vdev);
}
static ssize_t mbochs_read(struct vfio_device *vdev, char __user *buf,
@ -1325,78 +1350,27 @@ static const struct attribute_group *mdev_dev_groups[] = {
NULL,
};
static ssize_t name_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
static ssize_t mbochs_show_description(struct mdev_type *mtype, char *buf)
{
const struct mbochs_type *type =
&mbochs_types[mtype_get_type_group_id(mtype)];
return sprintf(buf, "%s\n", type->name);
}
static MDEV_TYPE_ATTR_RO(name);
static ssize_t description_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
const struct mbochs_type *type =
&mbochs_types[mtype_get_type_group_id(mtype)];
struct mbochs_type *type =
container_of(mtype, struct mbochs_type, type);
return sprintf(buf, "virtual display, %d MB video memory\n",
type ? type->mbytes : 0);
}
static MDEV_TYPE_ATTR_RO(description);
static ssize_t available_instances_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr,
char *buf)
static unsigned int mbochs_get_available(struct mdev_type *mtype)
{
const struct mbochs_type *type =
&mbochs_types[mtype_get_type_group_id(mtype)];
int count = atomic_read(&mbochs_avail_mbytes) / type->mbytes;
struct mbochs_type *type =
container_of(mtype, struct mbochs_type, type);
return sprintf(buf, "%d\n", count);
return atomic_read(&mbochs_avail_mbytes) / type->mbytes;
}
static MDEV_TYPE_ATTR_RO(available_instances);
static ssize_t device_api_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING);
}
static MDEV_TYPE_ATTR_RO(device_api);
static struct attribute *mdev_types_attrs[] = {
&mdev_type_attr_name.attr,
&mdev_type_attr_description.attr,
&mdev_type_attr_device_api.attr,
&mdev_type_attr_available_instances.attr,
NULL,
};
static struct attribute_group mdev_type_group1 = {
.name = MBOCHS_TYPE_1,
.attrs = mdev_types_attrs,
};
static struct attribute_group mdev_type_group2 = {
.name = MBOCHS_TYPE_2,
.attrs = mdev_types_attrs,
};
static struct attribute_group mdev_type_group3 = {
.name = MBOCHS_TYPE_3,
.attrs = mdev_types_attrs,
};
static struct attribute_group *mdev_type_groups[] = {
&mdev_type_group1,
&mdev_type_group2,
&mdev_type_group3,
NULL,
};
static const struct vfio_device_ops mbochs_dev_ops = {
.close_device = mbochs_close_device,
.init = mbochs_init_dev,
.release = mbochs_release_dev,
.read = mbochs_read,
.write = mbochs_write,
.ioctl = mbochs_ioctl,
@ -1404,6 +1378,7 @@ static const struct vfio_device_ops mbochs_dev_ops = {
};
static struct mdev_driver mbochs_driver = {
.device_api = VFIO_DEVICE_API_PCI_STRING,
.driver = {
.name = "mbochs",
.owner = THIS_MODULE,
@ -1412,7 +1387,8 @@ static struct mdev_driver mbochs_driver = {
},
.probe = mbochs_probe,
.remove = mbochs_remove,
.supported_type_groups = mdev_type_groups,
.get_available = mbochs_get_available,
.show_description = mbochs_show_description,
};
static const struct file_operations vd_fops = {
@ -1457,7 +1433,9 @@ static int __init mbochs_dev_init(void)
if (ret)
goto err_class;
ret = mdev_register_device(&mbochs_dev, &mbochs_driver);
ret = mdev_register_parent(&mbochs_parent, &mbochs_dev, &mbochs_driver,
mbochs_mdev_types,
ARRAY_SIZE(mbochs_mdev_types));
if (ret)
goto err_device;
@ -1478,7 +1456,7 @@ err_cdev:
static void __exit mbochs_dev_exit(void)
{
mbochs_dev.bus = NULL;
mdev_unregister_device(&mbochs_dev);
mdev_unregister_parent(&mbochs_parent);
device_unregister(&mbochs_dev);
mdev_unregister_driver(&mbochs_driver);

View File

@ -17,7 +17,6 @@
*/
#include <linux/init.h>
#include <linux/module.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
@ -43,36 +42,34 @@
MODULE_LICENSE("GPL v2");
static int max_devices = 4;
module_param_named(count, max_devices, int, 0444);
MODULE_PARM_DESC(count, "number of " MDPY_NAME " devices");
#define MDPY_TYPE_1 "vga"
#define MDPY_TYPE_2 "xga"
#define MDPY_TYPE_3 "hd"
static const struct mdpy_type {
const char *name;
static struct mdpy_type {
struct mdev_type type;
u32 format;
u32 bytepp;
u32 width;
u32 height;
} mdpy_types[] = {
{
.name = MDPY_CLASS_NAME "-" MDPY_TYPE_1,
.type.sysfs_name = MDPY_TYPE_1,
.type.pretty_name = MDPY_CLASS_NAME "-" MDPY_TYPE_1,
.format = DRM_FORMAT_XRGB8888,
.bytepp = 4,
.width = 640,
.height = 480,
}, {
.name = MDPY_CLASS_NAME "-" MDPY_TYPE_2,
.type.sysfs_name = MDPY_TYPE_2,
.type.pretty_name = MDPY_CLASS_NAME "-" MDPY_TYPE_2,
.format = DRM_FORMAT_XRGB8888,
.bytepp = 4,
.width = 1024,
.height = 768,
}, {
.name = MDPY_CLASS_NAME "-" MDPY_TYPE_3,
.type.sysfs_name = MDPY_TYPE_3,
.type.pretty_name = MDPY_CLASS_NAME "-" MDPY_TYPE_3,
.format = DRM_FORMAT_XRGB8888,
.bytepp = 4,
.width = 1920,
@ -80,11 +77,17 @@ static const struct mdpy_type {
},
};
static struct mdev_type *mdpy_mdev_types[] = {
&mdpy_types[0].type,
&mdpy_types[1].type,
&mdpy_types[2].type,
};
static dev_t mdpy_devt;
static struct class *mdpy_class;
static struct cdev mdpy_cdev;
static struct device mdpy_dev;
static u32 mdpy_count;
static struct mdev_parent mdpy_parent;
static const struct vfio_device_ops mdpy_dev_ops;
/* State of each mdev device */
@ -216,38 +219,25 @@ static int mdpy_reset(struct mdev_state *mdev_state)
return 0;
}
static int mdpy_probe(struct mdev_device *mdev)
static int mdpy_init_dev(struct vfio_device *vdev)
{
struct mdev_state *mdev_state =
container_of(vdev, struct mdev_state, vdev);
struct mdev_device *mdev = to_mdev_device(vdev->dev);
const struct mdpy_type *type =
&mdpy_types[mdev_get_type_group_id(mdev)];
struct device *dev = mdev_dev(mdev);
struct mdev_state *mdev_state;
container_of(mdev->type, struct mdpy_type, type);
u32 fbsize;
int ret;
if (mdpy_count >= max_devices)
return -ENOMEM;
mdev_state = kzalloc(sizeof(struct mdev_state), GFP_KERNEL);
if (mdev_state == NULL)
return -ENOMEM;
vfio_init_group_dev(&mdev_state->vdev, &mdev->dev, &mdpy_dev_ops);
int ret = -ENOMEM;
mdev_state->vconfig = kzalloc(MDPY_CONFIG_SPACE_SIZE, GFP_KERNEL);
if (mdev_state->vconfig == NULL) {
ret = -ENOMEM;
goto err_state;
}
if (!mdev_state->vconfig)
return ret;
fbsize = roundup_pow_of_two(type->width * type->height * type->bytepp);
mdev_state->memblk = vmalloc_user(fbsize);
if (!mdev_state->memblk) {
ret = -ENOMEM;
goto err_vconfig;
}
dev_info(dev, "%s: %s (%dx%d)\n", __func__, type->name, type->width,
type->height);
if (!mdev_state->memblk)
goto out_vconfig;
mutex_init(&mdev_state->ops_lock);
mdev_state->mdev = mdev;
@ -256,23 +246,46 @@ static int mdpy_probe(struct mdev_device *mdev)
mdpy_create_config_space(mdev_state);
mdpy_reset(mdev_state);
mdpy_count++;
dev_info(vdev->dev, "%s: %s (%dx%d)\n", __func__, type->type.pretty_name,
type->width, type->height);
return 0;
out_vconfig:
kfree(mdev_state->vconfig);
return ret;
}
static int mdpy_probe(struct mdev_device *mdev)
{
struct mdev_state *mdev_state;
int ret;
mdev_state = vfio_alloc_device(mdev_state, vdev, &mdev->dev,
&mdpy_dev_ops);
if (IS_ERR(mdev_state))
return PTR_ERR(mdev_state);
ret = vfio_register_emulated_iommu_dev(&mdev_state->vdev);
if (ret)
goto err_mem;
goto err_put_vdev;
dev_set_drvdata(&mdev->dev, mdev_state);
return 0;
err_mem:
vfree(mdev_state->memblk);
err_vconfig:
kfree(mdev_state->vconfig);
err_state:
vfio_uninit_group_dev(&mdev_state->vdev);
kfree(mdev_state);
err_put_vdev:
vfio_put_device(&mdev_state->vdev);
return ret;
}
static void mdpy_release_dev(struct vfio_device *vdev)
{
struct mdev_state *mdev_state =
container_of(vdev, struct mdev_state, vdev);
vfree(mdev_state->memblk);
kfree(mdev_state->vconfig);
vfio_free_device(vdev);
}
static void mdpy_remove(struct mdev_device *mdev)
{
struct mdev_state *mdev_state = dev_get_drvdata(&mdev->dev);
@ -280,12 +293,7 @@ static void mdpy_remove(struct mdev_device *mdev)
dev_info(&mdev->dev, "%s\n", __func__);
vfio_unregister_group_dev(&mdev_state->vdev);
vfree(mdev_state->memblk);
kfree(mdev_state->vconfig);
vfio_uninit_group_dev(&mdev_state->vdev);
kfree(mdev_state);
mdpy_count--;
vfio_put_device(&mdev_state->vdev);
}
static ssize_t mdpy_read(struct vfio_device *vdev, char __user *buf,
@ -641,73 +649,17 @@ static const struct attribute_group *mdev_dev_groups[] = {
NULL,
};
static ssize_t name_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
static ssize_t mdpy_show_description(struct mdev_type *mtype, char *buf)
{
const struct mdpy_type *type =
&mdpy_types[mtype_get_type_group_id(mtype)];
return sprintf(buf, "%s\n", type->name);
}
static MDEV_TYPE_ATTR_RO(name);
static ssize_t description_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
const struct mdpy_type *type =
&mdpy_types[mtype_get_type_group_id(mtype)];
struct mdpy_type *type = container_of(mtype, struct mdpy_type, type);
return sprintf(buf, "virtual display, %dx%d framebuffer\n",
type->width, type->height);
}
static MDEV_TYPE_ATTR_RO(description);
static ssize_t available_instances_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr,
char *buf)
{
return sprintf(buf, "%d\n", max_devices - mdpy_count);
}
static MDEV_TYPE_ATTR_RO(available_instances);
static ssize_t device_api_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING);
}
static MDEV_TYPE_ATTR_RO(device_api);
static struct attribute *mdev_types_attrs[] = {
&mdev_type_attr_name.attr,
&mdev_type_attr_description.attr,
&mdev_type_attr_device_api.attr,
&mdev_type_attr_available_instances.attr,
NULL,
};
static struct attribute_group mdev_type_group1 = {
.name = MDPY_TYPE_1,
.attrs = mdev_types_attrs,
};
static struct attribute_group mdev_type_group2 = {
.name = MDPY_TYPE_2,
.attrs = mdev_types_attrs,
};
static struct attribute_group mdev_type_group3 = {
.name = MDPY_TYPE_3,
.attrs = mdev_types_attrs,
};
static struct attribute_group *mdev_type_groups[] = {
&mdev_type_group1,
&mdev_type_group2,
&mdev_type_group3,
NULL,
};
static const struct vfio_device_ops mdpy_dev_ops = {
.init = mdpy_init_dev,
.release = mdpy_release_dev,
.read = mdpy_read,
.write = mdpy_write,
.ioctl = mdpy_ioctl,
@ -715,6 +667,8 @@ static const struct vfio_device_ops mdpy_dev_ops = {
};
static struct mdev_driver mdpy_driver = {
.device_api = VFIO_DEVICE_API_PCI_STRING,
.max_instances = 4,
.driver = {
.name = "mdpy",
.owner = THIS_MODULE,
@ -723,7 +677,7 @@ static struct mdev_driver mdpy_driver = {
},
.probe = mdpy_probe,
.remove = mdpy_remove,
.supported_type_groups = mdev_type_groups,
.show_description = mdpy_show_description,
};
static const struct file_operations vd_fops = {
@ -766,7 +720,9 @@ static int __init mdpy_dev_init(void)
if (ret)
goto err_class;
ret = mdev_register_device(&mdpy_dev, &mdpy_driver);
ret = mdev_register_parent(&mdpy_parent, &mdpy_dev, &mdpy_driver,
mdpy_mdev_types,
ARRAY_SIZE(mdpy_mdev_types));
if (ret)
goto err_device;
@ -787,7 +743,7 @@ err_cdev:
static void __exit mdpy_dev_exit(void)
{
mdpy_dev.bus = NULL;
mdev_unregister_device(&mdpy_dev);
mdev_unregister_parent(&mdpy_parent);
device_unregister(&mdpy_dev);
mdev_unregister_driver(&mdpy_driver);
@ -797,5 +753,8 @@ static void __exit mdpy_dev_exit(void)
mdpy_class = NULL;
}
module_param_named(count, mdpy_driver.max_instances, int, 0444);
MODULE_PARM_DESC(count, "number of " MDPY_NAME " devices");
module_init(mdpy_dev_init)
module_exit(mdpy_dev_exit)

View File

@ -12,7 +12,6 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/poll.h>
@ -20,7 +19,6 @@
#include <linux/cdev.h>
#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/uuid.h>
#include <linux/vfio.h>
#include <linux/iommu.h>
#include <linux/sysfs.h>
@ -74,6 +72,7 @@ static struct mtty_dev {
struct cdev vd_cdev;
struct idr vd_idr;
struct device dev;
struct mdev_parent parent;
} mtty_dev;
struct mdev_region_info {
@ -144,6 +143,21 @@ struct mdev_state {
int nr_ports;
};
static struct mtty_type {
struct mdev_type type;
int nr_ports;
} mtty_types[2] = {
{ .nr_ports = 1, .type.sysfs_name = "1",
.type.pretty_name = "Single port serial" },
{ .nr_ports = 2, .type.sysfs_name = "2",
.type.pretty_name = "Dual port serial" },
};
static struct mdev_type *mtty_mdev_types[] = {
&mtty_types[0].type,
&mtty_types[1].type,
};
static atomic_t mdev_avail_ports = ATOMIC_INIT(MAX_MTTYS);
static const struct file_operations vd_fops = {
@ -703,71 +717,82 @@ accessfailed:
return ret;
}
static int mtty_probe(struct mdev_device *mdev)
static int mtty_init_dev(struct vfio_device *vdev)
{
struct mdev_state *mdev_state;
int nr_ports = mdev_get_type_group_id(mdev) + 1;
struct mdev_state *mdev_state =
container_of(vdev, struct mdev_state, vdev);
struct mdev_device *mdev = to_mdev_device(vdev->dev);
struct mtty_type *type =
container_of(mdev->type, struct mtty_type, type);
int avail_ports = atomic_read(&mdev_avail_ports);
int ret;
do {
if (avail_ports < nr_ports)
if (avail_ports < type->nr_ports)
return -ENOSPC;
} while (!atomic_try_cmpxchg(&mdev_avail_ports,
&avail_ports, avail_ports - nr_ports));
&avail_ports,
avail_ports - type->nr_ports));
mdev_state = kzalloc(sizeof(struct mdev_state), GFP_KERNEL);
if (mdev_state == NULL) {
ret = -ENOMEM;
goto err_nr_ports;
}
vfio_init_group_dev(&mdev_state->vdev, &mdev->dev, &mtty_dev_ops);
mdev_state->nr_ports = nr_ports;
mdev_state->nr_ports = type->nr_ports;
mdev_state->irq_index = -1;
mdev_state->s[0].max_fifo_size = MAX_FIFO_SIZE;
mdev_state->s[1].max_fifo_size = MAX_FIFO_SIZE;
mutex_init(&mdev_state->rxtx_lock);
mdev_state->vconfig = kzalloc(MTTY_CONFIG_SPACE_SIZE, GFP_KERNEL);
if (mdev_state->vconfig == NULL) {
mdev_state->vconfig = kzalloc(MTTY_CONFIG_SPACE_SIZE, GFP_KERNEL);
if (!mdev_state->vconfig) {
ret = -ENOMEM;
goto err_state;
goto err_nr_ports;
}
mutex_init(&mdev_state->ops_lock);
mdev_state->mdev = mdev;
mtty_create_config_space(mdev_state);
return 0;
err_nr_ports:
atomic_add(type->nr_ports, &mdev_avail_ports);
return ret;
}
static int mtty_probe(struct mdev_device *mdev)
{
struct mdev_state *mdev_state;
int ret;
mdev_state = vfio_alloc_device(mdev_state, vdev, &mdev->dev,
&mtty_dev_ops);
if (IS_ERR(mdev_state))
return PTR_ERR(mdev_state);
ret = vfio_register_emulated_iommu_dev(&mdev_state->vdev);
if (ret)
goto err_vconfig;
goto err_put_vdev;
dev_set_drvdata(&mdev->dev, mdev_state);
return 0;
err_vconfig:
kfree(mdev_state->vconfig);
err_state:
vfio_uninit_group_dev(&mdev_state->vdev);
kfree(mdev_state);
err_nr_ports:
atomic_add(nr_ports, &mdev_avail_ports);
err_put_vdev:
vfio_put_device(&mdev_state->vdev);
return ret;
}
static void mtty_release_dev(struct vfio_device *vdev)
{
struct mdev_state *mdev_state =
container_of(vdev, struct mdev_state, vdev);
atomic_add(mdev_state->nr_ports, &mdev_avail_ports);
kfree(mdev_state->vconfig);
vfio_free_device(vdev);
}
static void mtty_remove(struct mdev_device *mdev)
{
struct mdev_state *mdev_state = dev_get_drvdata(&mdev->dev);
int nr_ports = mdev_state->nr_ports;
vfio_unregister_group_dev(&mdev_state->vdev);
kfree(mdev_state->vconfig);
vfio_uninit_group_dev(&mdev_state->vdev);
kfree(mdev_state);
atomic_add(nr_ports, &mdev_avail_ports);
vfio_put_device(&mdev_state->vdev);
}
static int mtty_reset(struct mdev_state *mdev_state)
@ -1231,68 +1256,24 @@ static const struct attribute_group *mdev_dev_groups[] = {
NULL,
};
static ssize_t name_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
static unsigned int mtty_get_available(struct mdev_type *mtype)
{
static const char *name_str[2] = { "Single port serial",
"Dual port serial" };
struct mtty_type *type = container_of(mtype, struct mtty_type, type);
return sysfs_emit(buf, "%s\n",
name_str[mtype_get_type_group_id(mtype)]);
return atomic_read(&mdev_avail_ports) / type->nr_ports;
}
static MDEV_TYPE_ATTR_RO(name);
static ssize_t available_instances_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr,
char *buf)
{
unsigned int ports = mtype_get_type_group_id(mtype) + 1;
return sprintf(buf, "%d\n", atomic_read(&mdev_avail_ports) / ports);
}
static MDEV_TYPE_ATTR_RO(available_instances);
static ssize_t device_api_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING);
}
static MDEV_TYPE_ATTR_RO(device_api);
static struct attribute *mdev_types_attrs[] = {
&mdev_type_attr_name.attr,
&mdev_type_attr_device_api.attr,
&mdev_type_attr_available_instances.attr,
NULL,
};
static struct attribute_group mdev_type_group1 = {
.name = "1",
.attrs = mdev_types_attrs,
};
static struct attribute_group mdev_type_group2 = {
.name = "2",
.attrs = mdev_types_attrs,
};
static struct attribute_group *mdev_type_groups[] = {
&mdev_type_group1,
&mdev_type_group2,
NULL,
};
static const struct vfio_device_ops mtty_dev_ops = {
.name = "vfio-mtty",
.init = mtty_init_dev,
.release = mtty_release_dev,
.read = mtty_read,
.write = mtty_write,
.ioctl = mtty_ioctl,
};
static struct mdev_driver mtty_driver = {
.device_api = VFIO_DEVICE_API_PCI_STRING,
.driver = {
.name = "mtty",
.owner = THIS_MODULE,
@ -1301,7 +1282,7 @@ static struct mdev_driver mtty_driver = {
},
.probe = mtty_probe,
.remove = mtty_remove,
.supported_type_groups = mdev_type_groups,
.get_available = mtty_get_available,
};
static void mtty_device_release(struct device *dev)
@ -1352,7 +1333,9 @@ static int __init mtty_dev_init(void)
if (ret)
goto err_class;
ret = mdev_register_device(&mtty_dev.dev, &mtty_driver);
ret = mdev_register_parent(&mtty_dev.parent, &mtty_dev.dev,
&mtty_driver, mtty_mdev_types,
ARRAY_SIZE(mtty_mdev_types));
if (ret)
goto err_device;
return 0;
@ -1372,7 +1355,7 @@ err_cdev:
static void __exit mtty_dev_exit(void)
{
mtty_dev.dev.bus = NULL;
mdev_unregister_device(&mtty_dev.dev);
mdev_unregister_parent(&mtty_dev.parent);
device_unregister(&mtty_dev.dev);
idr_destroy(&mtty_dev.vd_idr);

View File

@ -24,6 +24,9 @@
struct kvm_vfio_group {
struct list_head node;
struct file *file;
#ifdef CONFIG_SPAPR_TCE_IOMMU
struct iommu_group *iommu_group;
#endif
};
struct kvm_vfio {
@ -61,6 +64,23 @@ static bool kvm_vfio_file_enforced_coherent(struct file *file)
return ret;
}
static bool kvm_vfio_file_is_group(struct file *file)
{
bool (*fn)(struct file *file);
bool ret;
fn = symbol_get(vfio_file_is_group);
if (!fn)
return false;
ret = fn(file);
symbol_put(vfio_file_is_group);
return ret;
}
#ifdef CONFIG_SPAPR_TCE_IOMMU
static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
{
struct iommu_group *(*fn)(struct file *file);
@ -77,16 +97,15 @@ static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
return ret;
}
#ifdef CONFIG_SPAPR_TCE_IOMMU
static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm,
struct kvm_vfio_group *kvg)
{
struct iommu_group *grp = kvm_vfio_file_iommu_group(kvg->file);
if (WARN_ON_ONCE(!grp))
if (WARN_ON_ONCE(!kvg->iommu_group))
return;
kvm_spapr_tce_release_iommu_group(kvm, grp);
kvm_spapr_tce_release_iommu_group(kvm, kvg->iommu_group);
iommu_group_put(kvg->iommu_group);
kvg->iommu_group = NULL;
}
#endif
@ -136,7 +155,7 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
return -EBADF;
/* Ensure the FD is a vfio group FD.*/
if (!kvm_vfio_file_iommu_group(filp)) {
if (!kvm_vfio_file_is_group(filp)) {
ret = -EINVAL;
goto err_fput;
}
@ -236,19 +255,19 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
mutex_lock(&kv->lock);
list_for_each_entry(kvg, &kv->group_list, node) {
struct iommu_group *grp;
if (kvg->file != f.file)
continue;
grp = kvm_vfio_file_iommu_group(kvg->file);
if (WARN_ON_ONCE(!grp)) {
if (!kvg->iommu_group) {
kvg->iommu_group = kvm_vfio_file_iommu_group(kvg->file);
if (WARN_ON_ONCE(!kvg->iommu_group)) {
ret = -EIO;
goto err_fdput;
}
}
ret = kvm_spapr_tce_attach_iommu_group(dev->kvm, param.tablefd,
grp);
kvg->iommu_group);
break;
}