net: Add documentation for netdev features handling
v2: incorporated suggestions from Randy Dunlap Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
This commit is contained in:
parent
4c5102f94c
commit
e5b1de1f5e
1 changed files with 154 additions and 0 deletions
154
Documentation/networking/netdev-features.txt
Normal file
154
Documentation/networking/netdev-features.txt
Normal file
|
@ -0,0 +1,154 @@
|
|||
Netdev features mess and how to get out from it alive
|
||||
=====================================================
|
||||
|
||||
Author:
|
||||
Michał Mirosław <mirq-linux@rere.qmqm.pl>
|
||||
|
||||
|
||||
|
||||
Part I: Feature sets
|
||||
======================
|
||||
|
||||
Long gone are the days when a network card would just take and give packets
|
||||
verbatim. Today's devices add multiple features and bugs (read: offloads)
|
||||
that relieve an OS of various tasks like generating and checking checksums,
|
||||
splitting packets, classifying them. Those capabilities and their state
|
||||
are commonly referred to as netdev features in Linux kernel world.
|
||||
|
||||
There are currently three sets of features relevant to the driver, and
|
||||
one used internally by network core:
|
||||
|
||||
1. netdev->hw_features set contains features whose state may possibly
|
||||
be changed (enabled or disabled) for a particular device by user's
|
||||
request. This set should be initialized in ndo_init callback and not
|
||||
changed later.
|
||||
|
||||
2. netdev->features set contains features which are currently enabled
|
||||
for a device. This should be changed only by network core or in
|
||||
error paths of ndo_set_features callback.
|
||||
|
||||
3. netdev->vlan_features set contains features whose state is inherited
|
||||
by child VLAN devices (limits netdev->features set). This is currently
|
||||
used for all VLAN devices whether tags are stripped or inserted in
|
||||
hardware or software.
|
||||
|
||||
4. netdev->wanted_features set contains feature set requested by user.
|
||||
This set is filtered by ndo_fix_features callback whenever it or
|
||||
some device-specific conditions change. This set is internal to
|
||||
networking core and should not be referenced in drivers.
|
||||
|
||||
|
||||
|
||||
Part II: Controlling enabled features
|
||||
=======================================
|
||||
|
||||
When current feature set (netdev->features) is to be changed, new set
|
||||
is calculated and filtered by calling ndo_fix_features callback
|
||||
and netdev_fix_features(). If the resulting set differs from current
|
||||
set, it is passed to ndo_set_features callback and (if the callback
|
||||
returns success) replaces value stored in netdev->features.
|
||||
NETDEV_FEAT_CHANGE notification is issued after that whenever current
|
||||
set might have changed.
|
||||
|
||||
The following events trigger recalculation:
|
||||
1. device's registration, after ndo_init returned success
|
||||
2. user requested changes in features state
|
||||
3. netdev_update_features() is called
|
||||
|
||||
ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks
|
||||
are treated as always returning success.
|
||||
|
||||
A driver that wants to trigger recalculation must do so by calling
|
||||
netdev_update_features() while holding rtnl_lock. This should not be done
|
||||
from ndo_*_features callbacks. netdev->features should not be modified by
|
||||
driver except by means of ndo_fix_features callback.
|
||||
|
||||
|
||||
|
||||
Part III: Implementation hints
|
||||
================================
|
||||
|
||||
* ndo_fix_features:
|
||||
|
||||
All dependencies between features should be resolved here. The resulting
|
||||
set can be reduced further by networking core imposed limitations (as coded
|
||||
in netdev_fix_features()). For this reason it is safer to disable a feature
|
||||
when its dependencies are not met instead of forcing the dependency on.
|
||||
|
||||
This callback should not modify hardware nor driver state (should be
|
||||
stateless). It can be called multiple times between successive
|
||||
ndo_set_features calls.
|
||||
|
||||
Callback must not alter features contained in NETIF_F_SOFT_FEATURES or
|
||||
NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED but
|
||||
care must be taken as the change won't affect already configured VLANs.
|
||||
|
||||
* ndo_set_features:
|
||||
|
||||
Hardware should be reconfigured to match passed feature set. The set
|
||||
should not be altered unless some error condition happens that can't
|
||||
be reliably detected in ndo_fix_features. In this case, the callback
|
||||
should update netdev->features to match resulting hardware state.
|
||||
Errors returned are not (and cannot be) propagated anywhere except dmesg.
|
||||
(Note: successful return is zero, >0 means silent error.)
|
||||
|
||||
|
||||
|
||||
Part IV: Features
|
||||
===================
|
||||
|
||||
For current list of features, see include/linux/netdev_features.h.
|
||||
This section describes semantics of some of them.
|
||||
|
||||
* Transmit checksumming
|
||||
|
||||
For complete description, see comments near the top of include/linux/skbuff.h.
|
||||
|
||||
Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
|
||||
It means that device can fill TCP/UDP-like checksum anywhere in the packets
|
||||
whatever headers there might be.
|
||||
|
||||
* Transmit TCP segmentation offload
|
||||
|
||||
NETIF_F_TSO_ECN means that hardware can properly split packets with CWR bit
|
||||
set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).
|
||||
|
||||
* Transmit DMA from high memory
|
||||
|
||||
On platforms where this is relevant, NETIF_F_HIGHDMA signals that
|
||||
ndo_start_xmit can handle skbs with frags in high memory.
|
||||
|
||||
* Transmit scatter-gather
|
||||
|
||||
Those features say that ndo_start_xmit can handle fragmented skbs:
|
||||
NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
|
||||
chained skbs (skb->next/prev list).
|
||||
|
||||
* Software features
|
||||
|
||||
Features contained in NETIF_F_SOFT_FEATURES are features of networking
|
||||
stack. Driver should not change behaviour based on them.
|
||||
|
||||
* LLTX driver (deprecated for hardware drivers)
|
||||
|
||||
NETIF_F_LLTX should be set in drivers that implement their own locking in
|
||||
transmit path or don't need locking at all (e.g. software tunnels).
|
||||
In ndo_start_xmit, it is recommended to use a try_lock and return
|
||||
NETDEV_TX_LOCKED when the spin lock fails. The locking should also properly
|
||||
protect against other callbacks (the rules you need to find out).
|
||||
|
||||
Don't use it for new drivers.
|
||||
|
||||
* netns-local device
|
||||
|
||||
NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between
|
||||
network namespaces (e.g. loopback).
|
||||
|
||||
Don't use it in drivers.
|
||||
|
||||
* VLAN challenged
|
||||
|
||||
NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
|
||||
headers. Some drivers set this because the cards can't handle the bigger MTU.
|
||||
[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
|
||||
VLANs. This may be not useful, though.]
|
Loading…
Reference in a new issue