linux/net
Eric Dumazet 1ddbcb005c net: fix rtable leak in net/ipv4/route.c
Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
Quoted here because its a perfect one :

begin_of_quotation
 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
 patch has at least one critical flaw, and another problem.

 rt_intern_hash calculates rthi pointer, which is later used for new entry
 insertion. The same loop calculates cand pointer which is used to clean the
 list. If the pointers are the same, rtable leak occurs, as first the cand is
 removed then the new entry is appended to it.

 This leak leads to unregister_netdevice problem (usage count > 0).

 Another problem of the patch is that it tries to insert the entries in certain
 order, to facilitate counting of entries distinct by all but QoS parameters.
 Unfortunately, referencing an existing rtable entry moves it to list beginning,
 to speed up further lookups, so the carefully built order is destroyed.

 For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
 it will also destroy the ordering.
end_of_quotation

Problematic commit is 1080d709fb
(net: implement emergency route cache rebulds when gc_elasticity is exceeded)

Trying to keep dst_entries ordered is too complex and breaks the fact that
order should depend on the frequency of use for garbage collection.

A possible fix is to make rt_intern_hash() simpler, and only makes
rt_check_expire() a litle bit smarter, being able to cope with an arbitrary
entries order. The added loop is running on cache hot data, while cpu
is prefetching next object, so should be unnoticied.

Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20 17:18:02 -07:00
..
9p net/9p: handle correctly interrupted 9P requests 2009-04-05 16:54:53 -05:00
802 tr: fix leakage of device in net/802/tr.c 2009-04-11 01:43:17 -07:00
8021q vlan: update vlan carrier state for admin up/down 2009-04-25 18:03:35 -07:00
appletalk proc 2/2: remove struct proc_dir_entry::owner 2009-03-31 01:14:44 +04:00
atm Subject: [PATCH] br2684: restore net_dev initialization 2009-05-02 13:49:36 -07:00
ax25 ax25: proc uid file misses header 2009-04-20 02:14:59 -07:00
bluetooth Bluetooth: Don't trigger disconnect timeout for security mode 3 pairing 2009-05-09 18:09:52 -07:00
bridge bridge: fix initial packet flood if !STP 2009-05-17 21:12:55 -07:00
can can: Network Drop Monitor: Make use of consume_skb() in af_can.c 2009-04-17 01:38:46 -07:00
core net: fix skb_seq_read returning wrong offset/length for page frag data 2009-05-18 21:43:27 -07:00
dcb DCB: fix kfree(skb) 2009-01-04 17:29:21 -08:00
dccp dccp: Do not let initial option overhead shrink the MPS 2009-03-02 03:07:23 -08:00
decnet net/*: use linux/kernel.h swap() 2009-03-21 13:36:17 -07:00
dsa dsa: add switch chip cascading support 2009-03-21 19:06:54 -07:00
econet net: convert usage of packet_type to read_mostly 2009-03-10 05:22:43 -07:00
ethernet eth: Declare an optimized compare_ether_addr_64bits() function 2008-11-23 23:24:32 -08:00
ipv4 net: fix rtable leak in net/ipv4/route.c 2009-05-20 17:18:02 -07:00
ipv6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-05-05 12:00:53 -07:00
ipx ipx: use constant for strings and desciptor 2009-03-21 19:06:51 -07:00
irda proc tty: switch ircomm to ->proc_fops 2009-04-01 08:59:10 -07:00
iucv af_iucv: Fix race when queuing incoming iucv messages 2009-04-21 23:43:15 -07:00
key af_key: remove some pointless conditionals before kfree_skb() 2009-02-26 23:07:32 -08:00
lapb
llc proc 2/2: remove struct proc_dir_entry::owner 2009-03-31 01:14:44 +04:00
mac80211 mac80211: avoid NULL ptr deref when finding max_rates in PID and minstrel 2009-05-11 15:07:01 -04:00
netfilter ipvs: Fix IPv4 FWMARK virtual services 2009-05-08 14:54:47 -07:00
netlabel netlabel: Always remove the correct address selector 2009-04-22 00:46:09 -07:00
netlink Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2009-03-26 15:23:24 -07:00
netrom net/netrom: Fix socket locking 2009-04-22 00:49:51 -07:00
packet packet: avoid warnings when high-order page allocation fails 2009-04-15 03:39:52 -07:00
phonet trivial: fix typos/grammar errors in Kconfig texts 2009-03-30 15:22:01 +02:00
rds FRV: Fix the section attribute on UP DECLARE_PER_CPU() 2009-04-21 19:39:59 -07:00
rfkill net/rfkill/rfkill.c: fix unused rfkill_led_trigger() warning 2009-01-04 17:11:24 -08:00
rose Revert "rose: zero length frame filtering in af_rose.c" 2009-04-14 20:28:00 -07:00
rxrpc RxRPC: Fix a potential NULL dereference 2009-02-06 21:50:52 -08:00
sched sch_teql: should not dereference skb after ndo_start_xmit() 2009-05-18 15:12:31 -07:00
sctp proc 2/2: remove struct proc_dir_entry::owner 2009-03-31 01:14:44 +04:00
sunrpc Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux 2009-05-12 17:11:56 -07:00
tipc tipc: fix non-const printf format arguments 2009-03-18 19:11:29 -07:00
unix New helper - current_umask() 2009-03-31 23:00:26 -04:00
wanrouter wanrouter: fix sparse warnings: context imbalance 2009-02-26 23:13:36 -08:00
wimax wimax: oops: wimax_dev_add() is the only one that can initialize the state 2009-05-06 13:48:37 -07:00
wireless cfg80211: fix comment on regulatory hint processing 2009-05-04 16:22:14 -04:00
x25 af_rose/x25: Sanity check the maximum user frame size 2009-03-27 00:28:21 -07:00
xfrm xfrm: wrong hash value for temporary SA 2009-04-27 02:58:59 -07:00
compat.c net: socket infrastructure for SO_TIMESTAMPING 2009-02-15 22:43:35 -08:00
Kconfig net: remove stale reference to fastroute from Kconfig help text 2009-05-07 16:31:01 -07:00
Makefile RDS: Kconfig and Makefile 2009-02-26 23:43:35 -08:00
nonet.c
socket.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2009-04-06 18:05:43 -07:00
sysctl_net.c net: sysctl_net - use net_eq to compare nets 2009-03-16 16:23:30 +01:00
TUNABLE