linux/Documentation/networking
David Ahern 58956317c8 neighbor: Improve garbage collection
The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago, gc kicks in
   and walks the entire hash table, evicting *all* entries not in PERMANENT
   or REACHABLE state and not marked as externally learned. There is no
   discriminator on when the neigh entry was created or if it just moved
   from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

        -----|---------x|----------|-----
            t-5         t         t+5

   If that happens, those entries are evicted during gc, causing unnecessary
   thrashing of neighbor entries and of userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* entries that are not PERMANENT, REACHABLE or externally
   learned when gc_thresh2 is exceeded is overkill and contributes to
   thrashing, especially during startup.

This patch addresses these problems as follows:

1. Use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added at the tail of the list and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.
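
The five points above can be illustrated with a small userspace sketch.
It is not the kernel code: struct entry, struct table, table_init,
table_add and forced_gc are hypothetical, simplified stand-ins, time is
counted in plain seconds rather than jiffies, and reference counting,
locking and NUD states other than PERMANENT are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GC_MIN_AGE 5 /* seconds; stands in for the 5 second window */

/* Hypothetical, simplified neighbor entry (not the kernel's struct). */
struct entry {
	long updated;            /* time of last update, in seconds */
	bool permanent;          /* PERMANENT entries never join the gc list */
	struct entry *prev, *next;
};

struct table {
	struct entry head;       /* sentinel of the circular gc list */
	int gc_entries;          /* the counter compared to the thresholds */
	int gc_thresh2;
};

static void table_init(struct table *t, int gc_thresh2)
{
	t->head.prev = t->head.next = &t->head;
	t->gc_entries = 0;
	t->gc_thresh2 = gc_thresh2;
}

/* New entries go to the tail, so the oldest ones sit at the front. */
static void table_add(struct table *t, struct entry *n)
{
	assert(n != NULL);
	n->prev = n->next = NULL;
	if (n->permanent)
		return;          /* managed by userspace, never collected */
	n->prev = t->head.prev;
	n->next = &t->head;
	t->head.prev->next = n;
	t->head.prev = n;
	t->gc_entries++;
}

/*
 * Walk the gc list from the front. Evict only entries last updated more
 * than 5 seconds ago, and stop once gc_entries drops below gc_thresh2.
 * Returns the number of entries evicted.
 */
static int forced_gc(struct table *t, long now)
{
	int evicted = 0;
	struct entry *n = t->head.next;

	while (n != &t->head && t->gc_entries >= t->gc_thresh2) {
		struct entry *next = n->next;

		if (now - n->updated > GC_MIN_AGE) {
			n->prev->next = n->next;
			n->next->prev = n->prev;
			n->prev = n->next = NULL;
			t->gc_entries--;
			evicted++;
		}
		n = next;
	}
	return evicted;
}
```

In this sketch a PERMANENT entry costs nothing during gc because it is
never linked into the list, and an entry updated within the last 5
seconds survives a forced gc even when the table is over gc_thresh2.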

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 16:03:10 -08:00
..
caif
device_drivers net: documentation: build a directory structure for drivers 2018-12-05 11:30:06 -08:00
dsa Documentation: net: dsa: Cut set_addr() documentation 2017-11-30 10:10:16 -05:00
mac80211_hwsim mac80211_hwsim: suggest nl80211 instead of wext driver in documentation 2016-10-17 11:38:01 +02:00
6lowpan.txt docs: networking: fix minor typos in various documentation files 2018-06-04 17:21:28 -04:00
6pack.txt
af_xdp.rst bpf: typo fix in Documentation/networking/af_xdp.rst 2018-10-05 00:11:24 +02:00
alias.rst docs: networking: Convert alias.txt to rst 2018-07-18 15:28:27 -07:00
altera_tse.txt Documentation: networking: fix spelling mistakes 2016-04-28 14:21:13 -04:00
arcnet-hardware.txt
arcnet.txt
atm.txt
ax25.txt
batman-adv.rst batman-adv: Add SPDX license identifier to batman-adv.rst 2017-12-15 17:22:46 +01:00
baycom.txt
bonding.txt bonding: Fix a typo in bonding.txt 2018-07-16 13:32:12 -07:00
bridge.rst docs: networking: Convert bridge.txt to rst 2018-07-18 15:28:27 -07:00
can_ucan_protocol.rst can: ucan: add driver for Theobroma Systems UCAN devices 2018-07-27 10:40:16 +02:00
can.rst docs: can.rst: fix a footnote reference 2018-06-15 12:48:59 -03:00
cdc_mbim.txt Documentation: fix usb related doc refs 2017-10-12 11:15:48 -06:00
checksum-offloads.txt Documentation: fix networking related doc refs. 2017-10-12 11:21:05 -06:00
conf.py docs-rst: convert networking book to ReST 2017-05-16 08:44:13 -03:00
cops.txt
cxacru-cf.py
cxacru.txt
dccp.txt
dctcp.txt
decnet.txt
defza.txt FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapter 2018-10-15 21:46:06 -07:00
devlink-params-bnxt.txt devlink: Add Documentation/networking/devlink-params-bnxt.txt 2018-10-04 13:49:43 -07:00
devlink-params.txt devlink: Add 'fw_load_policy' generic parameter 2018-12-03 13:55:43 -08:00
dns_resolver.txt doc: ReSTify keys-request-key.txt 2017-05-18 10:33:51 -06:00
driver.txt
eql.txt
failover.rst net: Introduce generic failover module 2018-05-28 22:59:54 -04:00
fib_trie.txt
filter.txt bpf, doc: Document Jump X addressing mode 2018-10-08 10:20:56 +02:00
fore200e.txt
framerelay.txt
gen_stats.txt net: sched: do not acquire qdisc spinlock in qdisc/class stats dump 2016-06-07 16:37:14 -07:00
generic_netlink.txt
generic-hdlc.txt
gtp.txt docs: networking: fix minor typos in various documentation files 2018-06-04 17:21:28 -04:00
hinic.txt net-next/hinic: Initialize hw interface 2017-08-22 10:48:52 -07:00
ieee802154.txt doc: linux-wpan: Fulfill the description of missed 802.15.4 APIs 2017-11-29 16:49:40 +01:00
ila.txt docs: networking: fix minor typos in various documentation files 2018-06-04 17:21:28 -04:00
index.rst documentation of some IP/ICMP snmp counters 2018-11-11 09:59:02 -08:00
ip_dynaddr.txt
ip-sysctl.txt neighbor: Improve garbage collection 2018-12-07 16:03:10 -08:00
ipddp.txt
iphase.txt
ipsec.txt docs: networking: fix minor typos in various documentation files 2018-06-04 17:21:28 -04:00
ipv6.txt
ipvlan.txt docs: networking: fix minor typos in various documentation files 2018-06-04 17:21:28 -04:00
ipvs-sysctl.txt ipvs: Document sysctl pmtu_disc 2017-03-16 13:33:39 +01:00
kapi.rst sfp: add documentation for kernel APIs 2017-12-05 11:16:19 -05:00
kcm.txt docs: networking: fix minor typos in various documentation files 2018-06-04 17:21:28 -04:00
l2tp.txt net: l2tp: deprecate PPPOL2TP_MSG_* in favour of L2TP_MSG_* 2016-12-10 23:29:11 -05:00
lapb-module.txt
ltpc.txt
mac80211-auth-assoc-deauth.txt
mac80211-injection.txt mac80211: document only injected *_RADIOTAP_* flags 2016-04-05 10:48:57 +02:00
mpls-sysctl.txt mpls: allow TTL propagation from IP packets to be configured 2017-03-13 15:29:22 -07:00
msg_zerocopy.rst sock: remove zerocopy sockopt restriction on closed tcp state 2018-03-14 12:51:28 -04:00
multiqueue.txt
net_dim.txt Documentation/networking: Add net DIM documentation 2018-03-22 14:50:44 -04:00
net_failover.rst docs: networking: Fix failover build warnings 2018-07-16 11:23:54 -07:00
netconsole.txt docs: fix locations of several documents that got moved 2016-10-24 08:12:35 -02:00
netdev-FAQ.rst docs: net: Convert netdev-FAQ to restructured text 2018-07-26 21:27:54 -07:00
netdev-features.txt docs-networking: fix typo in define 2018-11-21 10:30:30 -08:00
netdevices.txt net: remove NETDEV_TX_LOCKED support 2016-04-26 15:53:05 -04:00
netfilter-sysctl.txt netfilter: allow logging from non-init namespaces 2017-02-02 14:31:58 +01:00
netif-msg.txt
nf_conntrack-sysctl.txt docs: networking: fix minor typos in various documentation files 2018-06-04 17:21:28 -04:00
nf_flowtable.txt netfilter: add flowtable documentation 2018-03-30 11:04:41 +02:00
nfc.txt
openvswitch.txt
operstates.txt
packet_mmap.txt doc: remove out of date links and info from packet mmap 2018-03-16 10:48:52 -04:00
phonet.txt
phy.txt net: phy: Delete unused function phy_ethtool_gset 2017-06-06 15:12:28 -04:00
pktgen.txt Documentation/pktgen: Clearify how-to use pktgen samples 2018-01-24 15:03:36 -05:00
PLIP.txt
ppp_generic.txt ppp: remove the PPPIOCDETACH ioctl 2018-05-24 22:55:07 -04:00
proc_net_tcp.txt
radiotap-headers.txt
ray_cs.txt
rds.txt Documentation: RDS: Document Multipath RDS (mprds) 2016-07-15 11:36:58 -07:00
regulatory.txt cfg80211: reg: remove support for built-in regdb 2017-10-11 13:18:51 +02:00
rxrpc.txt rxrpc: Fix life check 2018-11-15 11:35:40 -08:00
scaling.txt Documentation: Add explanation for XPS using Rx-queue(s) map 2018-07-02 09:06:24 +09:00
sctp.txt
secid.txt
seg6-sysctl.txt ipv6: sr: add documentation file for per-interface sysctls 2016-11-09 20:40:06 -05:00
segmentation-offloads.txt net: use skb_is_gso_sctp() instead of open-coding 2018-03-09 11:41:47 -05:00
skfp.txt
snmp_counter.rst add documents for snmp counters 2018-11-27 15:39:37 -08:00
strparser.txt strparser: Corrected typo in documentation. 2018-06-24 16:40:20 +09:00
switchdev.txt Documentation: networking: fix ASCII art in switchdev.txt 2017-09-18 16:38:46 -07:00
tc-actions-env-rules.txt
tcp-thin.txt
team.txt
timestamping.txt net: Fix minor code bug in timestamping.txt 2017-07-11 13:34:54 -07:00
tls.txt tls: Add receive path documentation 2018-03-23 12:25:54 -04:00
tproxy.txt netfilter: doc: Add nf_tables part in tproxy.txt 2018-08-16 19:37:07 +02:00
tuntap.txt
udplite.txt
vrf.txt net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs 2018-11-07 16:12:38 -08:00
vxlan.txt documentation: bring vxlan documentation more up-to-date 2015-08-12 16:46:30 -07:00
x25-iface.txt
x25.txt
xfrm_device.txt xfrm: allow driver to quietly refuse offload 2018-08-29 08:04:44 +02:00
xfrm_proc.txt xfrm: update the stats documentation 2017-12-22 06:45:48 +01:00
xfrm_sync.txt Documentation: networking: fix spelling mistakes 2016-04-28 14:21:13 -04:00
xfrm_sysctl.txt
z8530book.rst docs-rst: convert scsi DocBook to ReST 2017-05-16 08:44:15 -03:00
z8530drv.txt