Introduce NETIF_F_GRO_HW feature flag for NICs that support hardware GRO. With this flag, we can now independently turn on or off hardware GRO when GRO is on. Previously, drivers were using NETIF_F_GRO to control hardware GRO and so it cannot be independently turned on or off without affecting GRO.

Hardware GRO (just like GRO) guarantees that packets can be re-segmented by TSO/GSO to reconstruct the original packet stream. Logically, GRO_HW should depend on GRO since it is a subset, but we will let individual drivers enforce this dependency as they see fit.

Since NETIF_F_GRO is not propagated between upper and lower devices, NETIF_F_GRO_HW should follow suit since it is a subset of GRO. In other words, a lower device can independently have GRO/GRO_HW enabled or disabled and no feature propagation is required. This will preserve the current GRO behavior. This can be changed later if we decide to propagate GRO/GRO_HW/RXCSUM from upper to lower devices.

Cc: Ariel Elior <Ariel.Elior@cavium.com>
Cc: everest-linux-l2@cavium.com
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Netdev features mess and how to get out from it alive
=====================================================

Author:
	Michał Mirosław <mirq-linux@rere.qmqm.pl>


Part I: Feature sets
======================

Long gone are the days when a network card would just take and give packets
verbatim.  Today's devices add multiple features and bugs (read: offloads)
that relieve an OS of various tasks like generating and checking checksums,
splitting packets, and classifying them.  Those capabilities and their state
are commonly referred to as netdev features in the Linux kernel world.

There are currently three sets of features relevant to the driver, and
one used internally by the network core:

 1. netdev->hw_features set contains features whose state may possibly
    be changed (enabled or disabled) for a particular device by user's
    request.  This set should be initialized in ndo_init callback and not
    changed later.

 2. netdev->features set contains features which are currently enabled
    for a device.  This should be changed only by network core or in
    error paths of ndo_set_features callback.

 3. netdev->vlan_features set contains features whose state is inherited
    by child VLAN devices (limits netdev->features set).  This is currently
    used for all VLAN devices whether tags are stripped or inserted in
    hardware or software.

 4. netdev->wanted_features set contains feature set requested by user.
    This set is filtered by ndo_fix_features callback whenever it or
    some device-specific conditions change.  This set is internal to
    networking core and should not be referenced in drivers.

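The relationship between the four sets can be sketched in a small userspace
model.  This is only an illustration, not kernel code: struct model_netdev,
the MODEL_F_* bit values and both helper functions are hypothetical names
made up for this sketch (the real bits live in
include/linux/netdev_features.h), and the masking shown is an assumption
about how a user request is limited to hw_features.

```c
#include <stdint.h>

/* Hypothetical feature bits for this userspace model only. */
typedef uint64_t model_features_t;
#define MODEL_F_SG      (1ULL << 0)
#define MODEL_F_RXCSUM  (1ULL << 1)
#define MODEL_F_TSO     (1ULL << 2)
#define MODEL_F_HIGHDMA (1ULL << 3)

/* Simplified stand-in for the feature fields of struct net_device. */
struct model_netdev {
	model_features_t hw_features;     /* 1. changeable on user request */
	model_features_t features;        /* 2. currently enabled */
	model_features_t vlan_features;   /* 3. inheritable by VLAN children */
	model_features_t wanted_features; /* 4. requested by user (core only) */
};

/*
 * Record a user request.  Only bits present in hw_features may be
 * toggled; anything else in the request is silently ignored here.
 */
void model_user_request(struct model_netdev *dev,
			model_features_t mask, model_features_t values)
{
	mask &= dev->hw_features;
	dev->wanted_features = (dev->wanted_features & ~mask) | (values & mask);
}

/* Features a child VLAN device may inherit: set 3 limits set 2. */
model_features_t model_vlan_child_features(const struct model_netdev *dev)
{
	return dev->features & dev->vlan_features;
}
```

In this model a request to enable a bit that is absent from hw_features
leaves wanted_features untouched, which mirrors the statement that only
hw_features can be changed by the user.
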
Part II: Controlling enabled features
=======================================

When the current feature set (netdev->features) is to be changed, a new set
is calculated and filtered by calling the ndo_fix_features callback
and netdev_fix_features().  If the resulting set differs from the current
set, it is passed to the ndo_set_features callback and (if the callback
returns success) replaces the value stored in netdev->features.
A NETDEV_FEAT_CHANGE notification is issued after that whenever the current
set might have changed.

The following events trigger recalculation:
 1. device's registration, after ndo_init returned success
 2. user requested changes in features state
 3. netdev_update_features() is called

ndo_*_features callbacks are called with rtnl_lock held.  Missing callbacks
are treated as always returning success.

A driver that wants to trigger recalculation must do so by calling
netdev_update_features() while holding rtnl_lock.  This should not be done
from ndo_*_features callbacks.  netdev->features should not be modified by
the driver except by means of the ndo_fix_features callback.

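The recalculation described in this part can be sketched as a userspace
model.  It is a simplified illustration under stated assumptions, not the
core's implementation: struct model_netdev, struct model_ops and
model_update_features() are hypothetical names, and the core's own
netdev_fix_features() pass is only marked by a comment.

```c
#include <stdint.h>

typedef uint64_t model_features_t;

struct model_netdev;

/* Hypothetical callback table standing in for the ndo_*_features hooks. */
struct model_ops {
	model_features_t (*fix_features)(struct model_netdev *dev,
					 model_features_t features);
	int (*set_features)(struct model_netdev *dev,
			    model_features_t features);
};

struct model_netdev {
	model_features_t hw_features;
	model_features_t features;
	model_features_t wanted_features;
	const struct model_ops *ops;
};

/* Returns 1 if a change was applied, 0 if nothing changed, -1 on error. */
int model_update_features(struct model_netdev *dev)
{
	/*
	 * Start from the user's wishes; bits outside hw_features keep
	 * their current state.
	 */
	model_features_t features =
		(dev->features & ~dev->hw_features) | dev->wanted_features;
	int err = 0;

	/* Driver-specific filtering (stateless). */
	if (dev->ops && dev->ops->fix_features)
		features = dev->ops->fix_features(dev, features);
	/* The core's netdev_fix_features() limits would be applied here. */

	if (features == dev->features)
		return 0;

	/* A missing callback is treated as always returning success. */
	if (dev->ops && dev->ops->set_features)
		err = dev->ops->set_features(dev, features);
	if (err < 0)
		return -1;

	/* Zero means success; >0 is a silent error and the set is kept. */
	if (err == 0)
		dev->features = features;
	return 1;
}
```

A caller in this model would issue the change notification whenever the
return value is non-zero, since the current set might have changed.
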
Part III: Implementation hints
================================

 * ndo_fix_features:

All dependencies between features should be resolved here.  The resulting
set can be reduced further by networking core imposed limitations (as coded
in netdev_fix_features()).  For this reason it is safer to disable a feature
when its dependencies are not met instead of forcing the dependency on.

This callback should not modify hardware nor driver state (should be
stateless).  It can be called multiple times between successive
ndo_set_features calls.

Callback must not alter features contained in NETIF_F_SOFT_FEATURES or
NETIF_F_NEVER_CHANGE sets.  The exception is NETIF_F_VLAN_CHALLENGED but
care must be taken as the change won't affect already configured VLANs.

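A minimal sketch of such a stateless filter, written as a userspace model.
The MODEL_F_* bit values and model_fix_features() are hypothetical names for
this illustration; the dependency used (hardware GRO needs RXCSUM) is the one
stated in the rx-gro-hw description of this document.

```c
#include <stdint.h>

typedef uint64_t model_features_t;

/* Hypothetical bit positions, not the kernel's values. */
#define MODEL_F_RXCSUM (1ULL << 0)
#define MODEL_F_GRO_HW (1ULL << 1)
#define MODEL_F_GRO    (1ULL << 2) /* software feature: left untouched */

/*
 * Pure function of its input: no hardware access, no driver state.
 * When a dependency is not met, the dependent feature is dropped
 * instead of the dependency being forced on.
 */
model_features_t model_fix_features(model_features_t features)
{
	if (!(features & MODEL_F_RXCSUM))
		features &= ~MODEL_F_GRO_HW;
	return features;
}
```

Because the function has no side effects, calling it several times in a row
on its own output gives the same result, which fits the requirement that it
may be called multiple times between ndo_set_features calls.
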
 * ndo_set_features:

Hardware should be reconfigured to match passed feature set.  The set
should not be altered unless some error condition happens that can't
be reliably detected in ndo_fix_features.  In this case, the callback
should update netdev->features to match resulting hardware state.
Errors returned are not (and cannot be) propagated anywhere except dmesg.
(Note: successful return is zero, >0 means silent error.)

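The error path described above can be sketched as a userspace model.  All
names here (struct model_netdev, model_hw_apply(), model_set_features(), the
rxall_broken fault flag and the MODEL_F_* bits) are hypothetical and exist
only for this illustration; a real driver would program device registers
instead.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t model_features_t;

/* Hypothetical bit positions, not the kernel's values. */
#define MODEL_F_RXCSUM (1ULL << 0)
#define MODEL_F_RXALL  (1ULL << 1)

struct model_netdev {
	model_features_t features; /* currently enabled set */
	int rxall_broken;          /* hypothetical fault injection */
};

/* Pretend hardware write: fails only for rx-all on a "broken" device. */
int model_hw_apply(struct model_netdev *dev, model_features_t bit, int on)
{
	(void)on;
	if (bit == MODEL_F_RXALL && dev->rxall_broken)
		return -EIO;
	return 0;
}

/*
 * Reconfigure "hardware" to match the requested set.  On success the
 * set is not touched (the caller stores it).  On failure, features is
 * updated to what the hardware actually ended up with.
 */
int model_set_features(struct model_netdev *dev, model_features_t features)
{
	static const model_features_t bits[] = { MODEL_F_RXCSUM, MODEL_F_RXALL };
	model_features_t changed = dev->features ^ features;
	model_features_t applied = dev->features;
	size_t i;
	int err;

	for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		if (!(changed & bits[i]))
			continue;
		err = model_hw_apply(dev, bits[i], !!(features & bits[i]));
		if (err) {
			dev->features = applied;
			return err;
		}
		applied ^= bits[i];
	}
	return 0;
}
```

Only the bits that differ from the current set are touched, so a partial
failure leaves a well-defined state to report back.
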
Part IV: Features
===================

For the current list of features, see include/linux/netdev_features.h.
This section describes the semantics of some of them.

 * Transmit checksumming

For complete description, see comments near the top of include/linux/skbuff.h.

Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
It means that the device can fill in a TCP/UDP-like checksum anywhere in the
packet, whatever headers there might be.

 * Transmit TCP segmentation offload

NETIF_F_TSO_ECN means that hardware can properly split packets with CWR bit
set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).

 * Transmit DMA from high memory

On platforms where this is relevant, NETIF_F_HIGHDMA signals that
ndo_start_xmit can handle skbs with frags in high memory.

 * Transmit scatter-gather

Those features say that ndo_start_xmit can handle fragmented skbs:
NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
chained skbs (skb->next/prev list).

 * Software features

Features contained in NETIF_F_SOFT_FEATURES are features of the networking
stack.  Drivers should not change behaviour based on them.

 * LLTX driver (deprecated for hardware drivers)

NETIF_F_LLTX is meant to be used by drivers that don't need locking at all,
e.g. software tunnels.

This is also used in a few legacy drivers that implement their
own locking; don't use it for new (hardware) drivers.

 * netns-local device

NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between
network namespaces (e.g. loopback).

Don't use it in drivers.

 * VLAN challenged

NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
headers.  Some drivers set this because the cards can't handle the bigger MTU.
[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
VLANs.  This may not be useful, though.]

 * rx-fcs

This requests that the NIC append the Ethernet Frame Checksum (FCS)
to the end of the skb data.  This allows sniffers and other tools to
read the CRC recorded by the NIC on receipt of the packet.

 * rx-all

This requests that the NIC receive all possible frames, including errored
frames (such as bad FCS, etc).  This can be helpful when sniffing a link with
bad packets on it.  Some NICs may receive more packets if also put into normal
PROMISC mode.

 * rx-gro-hw

This requests that the NIC enable Hardware GRO (generic receive offload).
Hardware GRO is basically the exact reverse of TSO, and is generally
stricter than Hardware LRO.  A packet stream merged by Hardware GRO must
be re-segmentable by GSO or TSO back to the exact original packet stream.
Hardware GRO is dependent on RXCSUM since every packet successfully merged
by hardware must also have the checksum verified by hardware.

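The re-segmentation guarantee can be illustrated with a userspace model that
looks at payload lengths only.  This is a sketch under loud assumptions:
struct model_merged, model_gro_merge() and model_gso_segment() are
hypothetical names, and a real GRO engine also has to compare headers and
flags, which this model ignores.  The merge rule assumed here is that every
segment except the last has the same size and the last is not larger.

```c
#include <stddef.h>

/* What the model keeps about a merged packet. */
struct model_merged {
	size_t total_len; /* sum of merged payload lengths */
	size_t gso_size;  /* size of every segment but the last */
};

/*
 * Merge n payload lengths only if they can be split again exactly.
 * Returns 1 and fills *out on success, 0 if the stream must be
 * delivered unmerged.
 */
int model_gro_merge(const size_t *lens, size_t n, struct model_merged *out)
{
	size_t i, total = 0;

	if (n == 0 || lens[0] == 0)
		return 0;
	for (i = 0; i < n; i++) {
		/* All but the last segment must match the first one. */
		if (i + 1 < n && lens[i] != lens[0])
			return 0;
		/* The last one may be shorter, but not empty or longer. */
		if (lens[i] == 0 || lens[i] > lens[0])
			return 0;
		total += lens[i];
	}
	out->total_len = total;
	out->gso_size = lens[0];
	return 1;
}

/* GSO/TSO-style split; returns the number of segment lengths written. */
size_t model_gso_segment(const struct model_merged *m, size_t *lens, size_t max)
{
	size_t left = m->total_len, n = 0;

	while (left > 0 && n < max) {
		size_t seg = left < m->gso_size ? left : m->gso_size;

		lens[n++] = seg;
		left -= seg;
	}
	return n;
}
```

Under this rule, splitting a merged packet by gso_size reproduces the input
lengths, while a stream with a short segment in the middle is refused
rather than merged into something that could not be split back.
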