mirror of
https://github.com/torvalds/linux.git
synced 2024-12-31 23:31:29 +00:00
b62767e7ba
Add devlink rate objects section at devlink port documentation. Add devlink rate support info at netdevsim devlink documentation. Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
235 lines
10 KiB
ReStructuredText
235 lines
10 KiB
ReStructuredText
.. SPDX-License-Identifier: GPL-2.0
|
|
|
|
.. _devlink_port:
|
|
|
|
============
|
|
Devlink Port
|
|
============
|
|
|
|
``devlink-port`` is a port that exists on the device. It has a logically
|
|
separate ingress/egress point of the device. A devlink port can be any one
|
|
of many flavours. A devlink port flavour along with port attributes
|
|
describe what a port represents.
|
|
|
|
A device driver that intends to publish a devlink port sets the
|
|
devlink port attributes and registers the devlink port.
|
|
|
|
Devlink port flavours are described below.
|
|
|
|
.. list-table:: List of devlink port flavours
|
|
:widths: 33 90
|
|
|
|
* - Flavour
|
|
- Description
|
|
* - ``DEVLINK_PORT_FLAVOUR_PHYSICAL``
|
|
- Any kind of physical port. This can be an eswitch physical port or any
|
|
other physical port on the device.
|
|
* - ``DEVLINK_PORT_FLAVOUR_DSA``
|
|
- This indicates a DSA interconnect port.
|
|
* - ``DEVLINK_PORT_FLAVOUR_CPU``
|
|
- This indicates a CPU port applicable only to DSA.
|
|
* - ``DEVLINK_PORT_FLAVOUR_PCI_PF``
|
|
- This indicates an eswitch port representing a port of PCI
|
|
physical function (PF).
|
|
* - ``DEVLINK_PORT_FLAVOUR_PCI_VF``
|
|
- This indicates an eswitch port representing a port of PCI
|
|
virtual function (VF).
|
|
* - ``DEVLINK_PORT_FLAVOUR_PCI_SF``
|
|
- This indicates an eswitch port representing a port of PCI
|
|
subfunction (SF).
|
|
* - ``DEVLINK_PORT_FLAVOUR_VIRTUAL``
|
|
- This indicates a virtual port for the PCI virtual function.
|
|
|
|
Devlink port can have a different type based on the link layer described below.
|
|
|
|
.. list-table:: List of devlink port types
|
|
:widths: 23 90
|
|
|
|
* - Type
|
|
- Description
|
|
* - ``DEVLINK_PORT_TYPE_ETH``
|
|
- Driver should set this port type when a link layer of the port is
|
|
Ethernet.
|
|
* - ``DEVLINK_PORT_TYPE_IB``
|
|
- Driver should set this port type when a link layer of the port is
|
|
InfiniBand.
|
|
* - ``DEVLINK_PORT_TYPE_AUTO``
|
|
- This type is indicated by the user when driver should detect the port
|
|
type automatically.
|
|
|
|
PCI controllers
|
|
---------------
|
|
In most cases a PCI device has only one controller. A controller consists of
|
|
potentially multiple physical, virtual functions and subfunctions. A function
|
|
consists of one or more ports. This port is represented by the devlink eswitch
|
|
port.
|
|
|
|
A PCI device connected to multiple CPUs or multiple PCI root complexes or a
|
|
SmartNIC, however, may have multiple controllers. For a device with multiple
|
|
controllers, each controller is distinguished by a unique controller number.
|
|
An eswitch is on the PCI device which supports ports of multiple controllers.
|
|
|
|
An example view of a system with two controllers::
|
|
|
|
---------------------------------------------------------
|
|
| |
|
|
| --------- --------- ------- ------- |
|
|
----------- | | vf(s) | | sf(s) | |vf(s)| |sf(s)| |
|
|
| server | | ------- ----/---- ---/----- ------- ---/--- ---/--- |
|
|
| pci rc |=== | pf0 |______/________/ | pf1 |___/_______/ |
|
|
| connect | | ------- ------- |
|
|
----------- | | controller_num=1 (no eswitch) |
|
|
------|--------------------------------------------------
|
|
(internal wire)
|
|
|
|
|
---------------------------------------------------------
|
|
| devlink eswitch ports and reps |
|
|
| ----------------------------------------------------- |
|
|
| |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
|
|
| |pf0 | pf0vfN | pf0sfN | pf1 | pf1vfN |pf1sfN | |
|
|
| ----------------------------------------------------- |
|
|
| |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
|
|
| |pf0 | pf0vfN | pf0sfN | pf1 | pf1vfN |pf1sfN | |
|
|
| ----------------------------------------------------- |
|
|
| |
|
|
| |
|
|
----------- | --------- --------- ------- ------- |
|
|
| smartNIC| | | vf(s) | | sf(s) | |vf(s)| |sf(s)| |
|
|
| pci rc |==| ------- ----/---- ---/----- ------- ---/--- ---/--- |
|
|
| connect | | | pf0 |______/________/ | pf1 |___/_______/ |
|
|
----------- | ------- ------- |
|
|
| |
|
|
| local controller_num=0 (eswitch) |
|
|
---------------------------------------------------------
|
|
|
|
In the above example, the external controller (identified by controller number = 1)
|
|
doesn't have the eswitch. Local controller (identified by controller number = 0)
|
|
has the eswitch. The Devlink instance on the local controller has eswitch
|
|
devlink ports for both the controllers.
|
|
|
|
Function configuration
|
|
======================
|
|
|
|
A user can configure the function attribute before enumerating the PCI
|
|
function. Usually it means, user should configure function attribute
|
|
before a bus specific device for the function is created. However, when
|
|
SRIOV is enabled, virtual function devices are created on the PCI bus.
|
|
Hence, function attribute should be configured before binding virtual
|
|
function device to the driver. For subfunctions, this means user should
|
|
configure port function attribute before activating the port function.
|
|
|
|
A user may set the hardware address of the function using
|
|
'devlink port function set hw_addr' command. For Ethernet port function
|
|
this means a MAC address.
|
|
|
|
Subfunction
|
|
============
|
|
|
|
Subfunction is a lightweight function that has a parent PCI function on which
|
|
it is deployed. Subfunction is created and deployed in unit of 1. Unlike
|
|
SRIOV VFs, a subfunction doesn't require its own PCI virtual function.
|
|
A subfunction communicates with the hardware through the parent PCI function.
|
|
|
|
To use a subfunction, 3 steps setup sequence is followed.
|
|
(1) create - create a subfunction;
|
|
(2) configure - configure subfunction attributes;
|
|
(3) deploy - deploy the subfunction;
|
|
|
|
Subfunction management is done using devlink port user interface.
|
|
User performs setup on the subfunction management device.
|
|
|
|
(1) Create
|
|
----------
|
|
A subfunction is created using a devlink port interface. A user adds the
|
|
subfunction by adding a devlink port of subfunction flavour. The devlink
|
|
kernel code calls down to subfunction management driver (devlink ops) and asks
|
|
it to create a subfunction devlink port. Driver then instantiates the
|
|
subfunction port and any associated objects such as health reporters and
|
|
representor netdevice.
|
|
|
|
(2) Configure
|
|
-------------
|
|
A subfunction devlink port is created but it is not active yet. That means the
|
|
entities are created on devlink side, the e-switch port representor is created,
|
|
but the subfunction device itself is not created. A user might use e-switch port
|
|
representor to do settings, putting it into bridge, adding TC rules, etc. A user
|
|
might as well configure the hardware address (such as MAC address) of the
|
|
subfunction while subfunction is inactive.
|
|
|
|
(3) Deploy
|
|
----------
|
|
Once a subfunction is configured, user must activate it to use it. Upon
|
|
activation, subfunction management driver asks the subfunction management
|
|
device to instantiate the subfunction device on particular PCI function.
|
|
A subfunction device is created on the :ref:`Documentation/driver-api/auxiliary_bus.rst <auxiliary_bus>`.
|
|
At this point a matching subfunction driver binds to the subfunction's auxiliary device.
|
|
|
|
Rate object management
|
|
======================
|
|
|
|
Devlink provides API to manage tx rates of single devlink port or a group.
|
|
This is done through rate objects, which can be one of the two types:
|
|
|
|
``leaf``
|
|
Represents a single devlink port; created/destroyed by the driver. Since leaf
|
|
have 1to1 mapping to its devlink port, in user space it is referred as
|
|
``pci/<bus_addr>/<port_index>``;
|
|
|
|
``node``
|
|
Represents a group of rate objects (leafs and/or nodes); created/deleted by
|
|
request from the userspace; initially empty (no rate objects added). In
|
|
userspace it is referred as ``pci/<bus_addr>/<node_name>``, where
|
|
``node_name`` can be any identifier, except decimal number, to avoid
|
|
collisions with leafs.
|
|
|
|
API allows to configure following rate object's parameters:
|
|
|
|
``tx_share``
|
|
Minimum TX rate value shared among all other rate objects, or rate objects
|
|
that parts of the parent group, if it is a part of the same group.
|
|
|
|
``tx_max``
|
|
Maximum TX rate value.
|
|
|
|
``parent``
|
|
Parent node name. Parent node rate limits are considered as additional limits
|
|
to all node children limits. ``tx_max`` is an upper limit for children.
|
|
``tx_share`` is a total bandwidth distributed among children.
|
|
|
|
Driver implementations are allowed to support both or either rate object types
|
|
and setting methods of their parameters.
|
|
|
|
Terms and Definitions
|
|
=====================
|
|
|
|
.. list-table:: Terms and Definitions
|
|
:widths: 22 90
|
|
|
|
* - Term
|
|
- Definitions
|
|
* - ``PCI device``
|
|
- A physical PCI device having one or more PCI buses consists of one or
|
|
more PCI controllers.
|
|
* - ``PCI controller``
|
|
- A controller consists of potentially multiple physical functions,
|
|
virtual functions and subfunctions.
|
|
* - ``Port function``
|
|
- An object to manage the function of a port.
|
|
* - ``Subfunction``
|
|
- A lightweight function that has parent PCI function on which it is
|
|
deployed.
|
|
* - ``Subfunction device``
|
|
- A bus device of the subfunction, usually on a auxiliary bus.
|
|
* - ``Subfunction driver``
|
|
- A device driver for the subfunction auxiliary device.
|
|
* - ``Subfunction management device``
|
|
- A PCI physical function that supports subfunction management.
|
|
* - ``Subfunction management driver``
|
|
- A device driver for PCI physical function that supports
|
|
subfunction management using devlink port interface.
|
|
* - ``Subfunction host driver``
|
|
- A device driver for PCI physical function that hosts subfunction
|
|
devices. In most cases it is same as subfunction management driver. When
|
|
subfunction is used on external controller, subfunction management and
|
|
host drivers are different.
|