linux/Documentation/mm/page_frags.rst
Jakub Kicinski 6e9b01909a net: remove gfp_mask from napi_alloc_skb()
__napi_alloc_skb() is napi_alloc_skb() with the added flexibility
of choosing gfp_mask. This is a NAPI function, so GFP_ATOMIC is
implied. The only practical choice the caller has is whether to
set __GFP_NOWARN. But that's a false choice, too, allocation failures
in atomic context will happen, and printing warnings in logs,
effectively for a packet drop, is both too much and very likely
non-actionable.

This leads me to a conclusion that most uses of napi_alloc_skb()
are simply misguided, and should use __GFP_NOWARN in the first
place. We also have a "standard" way of reporting allocation
failures via the queue stat API (qstats::rx-alloc-fail).

The direct motivation for this patch is that one of the drivers
used at Meta calls napi_alloc_skb() (so prior to this patch without
__GFP_NOWARN), and the resulting OOM warning is the top networking
warning in our fleet.

Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240327040213.3153864-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:30:40 -07:00

44 lines
2.2 KiB
ReStructuredText

==============
Page fragments
==============
A page fragment is an arbitrary-length arbitrary-offset area of memory
which resides within a 0 or higher order compound page. Multiple
fragments within that page are individually refcounted, in the page's
reference counter.
The page_frag functions, page_frag_alloc and page_frag_free, provide a
simple allocation framework for page fragments. This is used by the
network stack and network device drivers to provide a backing region of
memory for use as either an sk_buff->head, or to be used in the "frags"
portion of skb_shared_info.
In order to make use of the page fragment APIs a backing page fragment
cache is needed. This provides a central point for the fragment allocation
and tracks allows multiple calls to make use of a cached page. The
advantage to doing this is that multiple calls to get_page can be avoided
which can be expensive at allocation time. However due to the nature of
this caching it is required that any calls to the cache be protected by
either a per-cpu limitation, or a per-cpu limitation and forcing interrupts
to be disabled when executing the fragment allocation.
The network stack uses two separate caches per CPU to handle fragment
allocation. The netdev_alloc_cache is used by callers making use of the
netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is
used by callers of the __napi_alloc_frag and napi_alloc_skb calls. The
main difference between these two calls is the context in which they may be
called. The "netdev" prefixed functions are usable in any context as these
functions will disable interrupts, while the "napi" prefixed functions are
only usable within the softirq context.
Many network device drivers use a similar methodology for allocating page
fragments, but the page fragments are cached at the ring or descriptor
level. In order to enable these cases it is necessary to provide a generic
way of tearing down a page cache. For this reason __page_frag_cache_drain
was implemented. It allows for freeing multiple references from a single
page via a single call. The advantage to doing this is that it allows for
cleaning up the multiple references that were added to a page in order to
avoid calling get_page per allocation.
Alexander Duyck, Nov 29, 2016.