Commit Graph

26380 Commits

Author SHA1 Message Date
Akinobu Mita
e76e4320a2 batman-adv: rename random32() to prandom_u32()
Use a more suitable function name, one that implies the use of a
pseudo-random number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Cc: Antonio Quartulli <ordex@autistici.org>
Cc: b.a.t.m.a.n@lists.open-mesh.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-19 21:18:12 +08:00
Marek Lindner
88a32c9a8a batman-adv: kernel doc for types.h
Thanks to Sven Eckelmann and Simon Wunderlich for their support.

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-19 21:18:11 +08:00
Marek Lindner
712bbfe46b batman-adv: rename batadv_claim struct to make clear it is used by bla
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-19 21:18:11 +08:00
Marek Lindner
bae9877471 batman-adv: rename batadv_backbone_gw struct to make clear it is used by bla
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-19 21:18:11 +08:00
Marek Lindner
28500f07ab batman-adv: rename batadv_recvlist_node struct to make clear it is used by vis
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-19 21:18:10 +08:00
Marek Lindner
015b4ae4a3 batman-adv: rename batadv_if_list_entry struct to make clear it is used by vis
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-19 21:18:10 +08:00
Marek Lindner
2006fea820 batman-adv: group tt type definitions together
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-19 21:18:10 +08:00
Marek Lindner
0abf5d8117 batman-adv: mark debug_log struct as bat_priv only struct
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-19 21:18:10 +08:00
Marek Lindner
b6d0ab7ca3 batman-adv: align kernel doc properly
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-19 21:18:09 +08:00
Antonio Quartulli
7241444209 batman-adv: a delayed_work has to be initialised once
A delayed_work struct does not need to be initialized
every time before being enqueued. Therefore the
INIT_DELAYED_WORK() macro should be used during the
initialization process only.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-19 21:18:09 +08:00
Hannes Frederic Sowa
1ad759d847 ipv6: remove unneeded check to pskb_may_pull in ipip6_rcv
This is already checked by the caller (tunnel64_rcv) and brings ipip6_rcv
in line with ipip_rcv.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18 14:43:51 -05:00
YOSHIFUJI Hideaki / 吉藤英明
115b0aa6b4 ndisc: Check NS message length before access.
Check message length before accessing "target" field,
as we do for other types.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18 14:41:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明
12fd84f438 ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers.
Because of rt->n removal, we do not need the neigh argument any more.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18 14:41:13 -05:00
Alan Ott
ee21c7e0d1 6lowpan: Handle uncompressed IPv6 packets over 6LoWPAN
Handle the reception of uncompressed packets (dispatch type = IPv6).

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18 14:18:30 -05:00
Alan Ott
0c446212c4 6lowpan: Refactor packet delivery into a function
Refactor the handing of the skb's to the individual lowpan devices into a
function.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18 14:18:30 -05:00
YOSHIFUJI Hideaki / 吉藤英明
887c95cc1d ipv6: Complete neighbour entry removal from dst_entry.
CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明
6fd6ce2056 ipv6: Do not depend on rt->n in ip6_finish_output2().
If neigh is not found, create a new one.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明
707be1ff3d ipv6: Do not depend on rt->n in ip6_dst_lookup_tail().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明
2152caea71 ipv6: Do not depend on rt->n in rt6_probe().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明
145a36217a ipv6: Do not depend on rt->n in rt6_check_neigh().
CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明
c440f1609b ipv6: Do not depend on rt->n in ip6_pol_route().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明
dd0cbf29b1 ipv6 route: Dump gateway based on RTF_GATEWAY flag and rt->rt6i_gateway.
Do not depend on rt->n.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明
8e022ee63f ndisc: Remove tbl argument for __ipv6_neigh_lookup().
We can refer to nd_tbl directly.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明
7ff74a596b ndisc: Update neigh->updated with write lock.
neigh->nud_state and neigh->updated are under protection of
neigh->lock.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:18 -05:00
Jesper Dangaard Brouer
c2a936600f net: increase fragment memory usage limits
Increase the amount of memory usage limits for incomplete
IP fragments.

Arguing for new thresh high/low values:

 High threshold = 4 MBytes
 Low  threshold = 3 MBytes

The fragmentation memory accounting code tries to account for the
real memory usage by measuring both the size of the frag queue struct
(inet_frag_queue (ipv4:ipq/ipv6:frag_queue)) and the SKB's truesize.

We want to be able to handle/hold-on-to enough fragments, to ensure
good performance, without causing incomplete fragments to hurt
scalability, by causing the number of inet_frag_queue to grow too much
(resulting longer searches for frag queues).

For IPv4, how much memory does the largest frag consume?

Maximum size fragment is 64K, which is approx 44 fragments with
MTU(1500) sized packets. Sizeof(struct ipq) is 200.  A 1500 byte
packet results in a truesize of 2944 (not 2048 as I first assumed)

  (44*2944)+200 = 129736 bytes

The current default high thresh of 262144 bytes, is obviously
problematic, as only two 64K fragments can fit in the queue at the
same time.

How many 64K fragments can we fit into 4 MBytes:

  4*2^20/((44*2944)+200) = 32.33 fragments in queues

An attacker could send a separate, distinct fake fragment packet per
queue, causing us to allocate one inet_frag_queue per packet, and thus
attacking the hash table and its lists.

How many frag queues do we need to store and, given the current hash
size of 64, what is the average list length?

Using one MTU sized fragment per inet_frag_queue, each consuming
(2944+200) 3144 bytes.

  4*2^20/(2944+200) = 1334 frag queues -> 21 avg list length

An attacker could send small fragments; the smallest packet I could send
resulted in a truesize of 896 bytes (I'm a little surprised by this).

  4*2^20/(896+200)  = 3827 frag queues -> 59 avg list length
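
The arithmetic above can be reproduced with a small userspace sketch. The constants are the figures quoted in this message; the function and constant names are hypothetical and not taken from the kernel. Integer division truncates, so a result can differ by one from the rounded figures in the text (for example 3826 rather than 3827).

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the commit message; names are illustrative only. */
enum {
	IPQ_SIZE     = 200,	/* sizeof(struct ipq) as reported */
	MTU_TRUESIZE = 2944,	/* truesize of a 1500-byte packet */
	MIN_TRUESIZE = 896	/* smallest truesize observed */
};

/* Memory held by one frag queue carrying nfrags skbs of a given truesize. */
size_t frag_queue_bytes(size_t nfrags, size_t truesize)
{
	return nfrags * truesize + IPQ_SIZE;
}

/* Number of such queues that fit under a threshold (truncating division). */
size_t queues_under_thresh(size_t thresh, size_t nfrags, size_t truesize)
{
	size_t per_queue = frag_queue_bytes(nfrags, truesize);

	assert(per_queue > 0);
	return thresh / per_queue;
}
```

Calling `queues_under_thresh(262144, 44, MTU_TRUESIZE)` models the old default threshold, and passing `4u << 20` models the proposed 4 MByte limit.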

When increasing these numbers, we also need to follow up with
improvements that help scalability.  Simply increasing the hash size
is not enough, as the current implementation does not have
per-hash-bucket locking.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 14:29:53 -05:00
Cong Wang
7a9885b93b xfrm: use separated locks to protect pointers of struct xfrm_state_afinfo
afinfo->type_map and afinfo->mode_map deserve separated locks,
they are different things.

We should just take RCU read lock to protect afinfo itself,
but not for the inner pointers.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-01-17 10:03:57 +01:00
Vincent Bernat
d59577b6ff sk-filter: Add ability to lock a socket filter program
While a privileged program can open a raw socket, attach some
restrictive filter and drop its privileges (or send the socket to an
unprivileged program through some Unix socket), the filter can still
be removed or modified by the unprivileged program. This commit adds a
socket option to lock the filter (SO_LOCK_FILTER) preventing any
modification of a socket filter program.

This is similar to the OpenBSD BIOCLOCK ioctl on bpf sockets, except even
root is not allowed to change/drop the filter.

The state of the lock can be read with getsockopt(). No error is
triggered if the state is not changed. -EPERM is returned when a user
tries to remove the lock or to change/remove the filter while the lock
is active. The check is done directly in sk_attach_filter() and
sk_detach_filter(), so it is not limited to the setsockopt() syscall.

Signed-off-by: Vincent Bernat <bernat@luffy.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 03:21:25 -05:00
Cong Wang
5bd30d3987 netpoll: fix a missing dev refcounting
__dev_get_by_name() doesn't refcount the network device,
so we have to do this ourselves. Noticed by Eric.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-16 23:33:06 -05:00
Cong Wang
f92d318023 netpoll: fix a rtnl lock assertion failure
v4: hold rtnl lock for the whole netpoll_setup()
v3: remove the comment
v2: use RCU read lock

This patch fixes the following warning:

[   72.013864] RTNL: assertion failed at net/core/dev.c (4955)
[   72.017758] Pid: 668, comm: netpoll-prep-v6 Not tainted 3.8.0-rc1+ #474
[   72.019582] Call Trace:
[   72.020295]  [<ffffffff8176653d>] netdev_master_upper_dev_get+0x35/0x58
[   72.022545]  [<ffffffff81784edd>] netpoll_setup+0x61/0x340
[   72.024846]  [<ffffffff815d837e>] store_enabled+0x82/0xc3
[   72.027466]  [<ffffffff815d7e51>] netconsole_target_attr_store+0x35/0x37
[   72.029348]  [<ffffffff811c3479>] configfs_write_file+0xe2/0x10c
[   72.030959]  [<ffffffff8115d239>] vfs_write+0xaf/0xf6
[   72.032359]  [<ffffffff81978a05>] ? sysret_check+0x22/0x5d
[   72.033824]  [<ffffffff8115d453>] sys_write+0x5c/0x84
[   72.035328]  [<ffffffff819789d9>] system_call_fastpath+0x16/0x1b

In case of other races, hold rtnl lock for the entire netpoll_setup() function.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-16 15:26:03 -05:00
Cong Wang
85168c0036 xfrm: replace rwlock on xfrm_km_list with rcu
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-01-16 11:03:34 +01:00
Cong Wang
44abdc3047 xfrm: replace rwlock on xfrm_state_afinfo with rcu
Similar to commit 418a99ac6a
(Replace rwlock on xfrm_policy_afinfo with rcu), the rwlock
on xfrm_state_afinfo can be replaced by RCU too.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-01-16 11:03:34 +01:00
Eric Dumazet
757b8b1d2b net_sched: fix qdisc_pkt_len_init()
commit 1def9238d4 (net_sched: more precise pkt_len computation)
does a wrong computation of mac + network headers length, as it includes
the padding before the frame.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-16 00:41:19 -05:00
David S. Miller
4b87f92259 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	Documentation/networking/ip-sysctl.txt
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c

Both conflicts were simply overlapping context.

A build fix for qlcnic is in here too, simply removing the added
devinit annotations which no longer exist.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-15 15:05:59 -05:00
David S. Miller
47fb3a26e2 Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains netfilter fixes for 3.8-rc3,
they are:

* fix possible BUG_ON if several netns are in use and the nf_conntrack
  module is removed, initial patch from Gao feng, final patch from myself.

* fix unset return value if conntrack zones are disabled at
  compile-time, reported by Borislav Petkov, fix from myself.

* fix display error message via dmesg for arp_tables, from Jan Engelhardt.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 18:26:41 -05:00
Eric Dumazet
cce894bb82 tcp: fix a panic on UP machines in reqsk_fastopen_remove
spin_is_locked() on a !SMP build is kind of useless.

BUG_ON(!spin_is_locked(xx)) is guaranteed to crash.

Just remove this check in reqsk_fastopen_remove() as
the callers do hold the socket lock.
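
Why the check always fires can be shown with a simplified userspace model. It assumes a uniprocessor build without spinlock debugging, where the lock carries no state and the "is locked" query reports 0. All names below are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a UP spinlock: there is no state to inspect. */
struct up_spinlock {
	char unused;
};

/* Taking the lock is a no-op in this model. */
void up_spin_lock(struct up_spinlock *lock)
{
	assert(lock != NULL);
}

/* With no state to look at, the query always answers "not locked". */
bool up_spin_is_locked(struct up_spinlock *lock)
{
	assert(lock != NULL);
	return false;
}

/* Condition that a BUG_ON(!spin_is_locked(lock)) style check evaluates. */
bool would_bug(struct up_spinlock *lock)
{
	return !up_spin_is_locked(lock);
}
```

In this model would_bug() is true even right after up_spin_lock(), which mirrors the panic described above.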

Reported-by: Ketan Kulkarni <ketkulka@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Dave Taht <dave.taht@gmail.com>
Acked-by: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 18:10:05 -05:00
Florian Fainelli
f9a8f83b04 net: phy: remove flags argument from phy_{attach, connect, connect_direct}
The flags argument of the phy_{attach,connect,connect_direct} functions
is then used to assign a struct phy_device dev_flags with its value.
All callers but the tg3 driver pass the flag 0, which results in the
underlying PHY drivers in drivers/net/phy/ not being able to actually
use any of the flags they would set in dev_flags. This patch gets rid of
the flags argument, and passes phydev->dev_flags to the internal PHY
library call phy_attach_direct() such that drivers which actually modify
a phy device dev_flags get the value preserved for use by the underlying
phy driver.

Acked-by: Kosta Zertsekel <konszert@marvell.com>
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 15:11:50 -05:00
Benjamin LaHaise
c1b52739e4 pkt_sched: namespace aware act_mirred
Eric Dumazet pointed out that act_mirred needs to find the current net_ns,
and struct net pointer is not provided in the call chain.  His original
patch made use of current->nsproxy->net_ns to find the network namespace,
but this fails to work correctly for userspace code that makes use of
netlink sockets in different network namespaces.  Instead, pass the
"struct net *" down along the call chain to where it is needed.

This version removes the ifb changes as Eric has submitted that patch
separately, but is otherwise identical to the previous version.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 15:09:36 -05:00
YOSHIFUJI Hideaki / 吉藤英明
6059283378 ipv6 netevent: Remove old_neigh from netevent_redirect.
The only user is cxgb3 driver.

old_neigh is used to check device change, but it must not happen
on redirect.  In this sense, we can remove old_neigh argument.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 15:04:59 -05:00
Linus Torvalds
6843cc0e0f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix regression allowing IP_TTL setting of zero, fix from Cong Wang.

 2) Fix leak regressions in tuntap, from Jason Wang.

 3) be2net driver always returns IRQ_HANDLED in INTx handler, fix from
    Sathya Perla.

 4) qlge doesn't really support NETIF_F_TSO6, don't set that flag.  Fix
    from Amerigo Wang.

 5) Add 802.11ad Atheros wil6210 driver, from Vladimir Kondratiev.

 6) Fix MTU calculations in mac80211 layer, from T Krishna Chaitanya.

 7) Station info layer of mac80211 needs to use del_timer_sync(), from
    Johannes Berg.

 8) tcp_read_sock() can loop forever, because we don't immediately stop
    when recv_actor() returns zero.  Fix from Eric Dumazet.

 9) Fix WARN_ON() in tcp_cleanup_rbuf().  We have to use sk_eat_skb() in
    tcp_recv_skb() to handle the case where a large GRO packet is split
    up while it is in use by a splice() operation.  Fix also from Eric
    Dumazet.

10) addrconf_get_prefix_route() in ipv6 tests flags incorrectly, it
    does:

        if (X && (p->flags & Y) != 0)

    when it really meant to go:

        if (X && (p->flags & X) != 0)

    fix from Romain Kuntz.

11) Fix lost Kconfig dependency for bfin_mac driver hardware
    timestamping.  From Lars-Peter Clausen.

12) Fix regression in handling of RST without ACK in TCP, from Eric
    Dumazet.
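
The flag test in item 10 can be illustrated with a small sketch. It assumes X is a mask of flags that must not be set and Y is a mask of flags that must be set; the function and parameter names are hypothetical and not copied from addrconf_get_prefix_route().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy form: the exclusion mask only acts as a switch, and the
 * wrong mask (the required flags) is tested against the entry. */
bool rejected_buggy(uint32_t entry_flags, uint32_t flags, uint32_t noflags)
{
	return noflags != 0 && (entry_flags & flags) != 0;
}

/* Intended form: reject only if an excluded flag is actually set. */
bool rejected_fixed(uint32_t entry_flags, uint32_t flags, uint32_t noflags)
{
	(void)flags;	/* not part of the exclusion test */
	return noflags != 0 && (entry_flags & noflags) != 0;
}
```

With an entry that has the required flag and none of the excluded ones, the buggy form rejects it while the intended form accepts it.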

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits)
  be2net: fix unconditionally returning IRQ_HANDLED in INTx
  tuntap: fix leaking reference count
  tuntap: forbid calling TUNSETIFF when detached
  tuntap: switch to use rtnl_dereference()
  net, wireless: overwrite default_ethtool_ops
  qlge: remove NETIF_F_TSO6 flag
  tcp: accept RST without ACK flag
  net: ethernet: xilinx: Do not use NO_IRQ in axienet
  net: ethernet: xilinx: Do not use axienet on PPC
  bnx2x: Allow management traffic after boot from SAN
  bnx2x: Fix fastpath structures when memory allocation fails
  bfin_mac: Restore hardware time-stamping dependency on BF518
  tun: avoid owner checks on IFF_ATTACH_QUEUE
  bnx2x: move debugging code before the return
  tuntap: refuse to re-attach to different tun_struct
  ipv6: use addrconf_get_prefix_route for prefix route lookup [v2]
  ipv6: fix the noflags test in addrconf_get_prefix_route
  tcp: fix splice() and tcp collapsing interaction
  tcp: splice: fix an infinite loop in tcp_read_sock()
  net: prevent setting ttl=0 via IP_TTL
  ...
2013-01-14 08:27:10 -08:00
David S. Miller
a4412a681a Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- use per_cpu_add when possible
- prevent the TT component from adding multicast addresses as "mesh clients"
- some debug output improvements
- proper lockdeps class initializations
- new style fixes (space before/after brackets)
- other minor fixes and refactoring

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:36:56 -05:00
YOSHIFUJI Hideaki / 吉藤英明
dd3332bfcb ipv6: Store Router Alert option in IP6CB directly.
Router Alert option is very small and we can store the value
itself in the skb.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明
2b464f61f0 ipv6 xfrm: Use ipv6_addr_hash() in xfrm6_tunnel_spi_hash_byaddr().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明
c08977bb2b ipv6 route: Use ipv6_addr_hash() in rt6_info_hash_nhsfn().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明
daad151263 ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().
Move generalized version of ipv6_is_mld() to header,
and use it from ip6_mc_input().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明
e7219858ac ipv6: Use ipv6_get_dsfield() instead of ipv6_tclass().
Commit 7a3198a8 ("ipv6: helper function to get tclass") introduced
ipv6_tclass(), but similar function is already available as
ipv6_get_dsfield().

We might be able to call ipv6_tclass() from ipv6_get_dsfield(),
but it is confusing to have two versions.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明
6502ca527f ipv6: Introduce ip6_flowinfo() to extract flowinfo (tclass + flowlabel).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明
3e4e4c1f2d ipv6: Introduce ip6_flow_hdr() to fill version, tclass and flowlabel.
This is not only for readability but also for optimization.
What we do here is to build the 32bit word at the beginning of the ipv6
header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and
we do not need to read the tclass portion of the target buffer.
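
As a rough illustration of the 32-bit word in question: the first word of an IPv6 header holds 4 bits of version, 8 bits of traffic class and 20 bits of flow label. The sketch below builds it in host byte order, whereas the kernel helpers work on network byte order values; the names ip6_flow_word() and ip6_flowinfo_bits() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build the first IPv6 header word (host byte order):
 * version 6 in the top 4 bits, then tclass, then the flow label. */
uint32_t ip6_flow_word(uint8_t tclass, uint32_t flowlabel)
{
	assert(flowlabel <= 0xFFFFFu);	/* flow label is 20 bits wide */
	return 0x60000000u | ((uint32_t)tclass << 20) | flowlabel;
}

/* Extract flowinfo (tclass + flowlabel): everything except the version. */
uint32_t ip6_flowinfo_bits(uint32_t word)
{
	return word & 0x0FFFFFFFu;
}
```

Because the whole word is written in one store, the previous contents of the header never have to be read, which is the optimization the message refers to.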

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:13 -05:00
Jan Engelhardt
5b76c4948f netfilter: x_tables: print correct hook names for ARP
arptables 0.0.4 (released on 10th Jan 2013) supports calling the
CLASSIFY target, but on adding a rule to the wrong chain, the
diagnostic is as follows:

	# arptables -A INPUT -j CLASSIFY --set-class 0:0
	arptables: Invalid argument
	# dmesg | tail -n1
	x_tables: arp_tables: CLASSIFY target: used from hooks
	PREROUTING, but only usable from INPUT/FORWARD

This is incorrect, since xt_CLASSIFY.c does specify
(1 << NF_ARP_OUT) | (1 << NF_ARP_FORWARD).

This patch corrects the x_tables diagnostic message to print the
proper hook names for the NFPROTO_ARP case.

Affects all kernels down to and including v2.6.31.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-13 12:54:12 +01:00
Pablo Neira Ayuso
1e47ee8367 netfilter: nf_conntrack: fix BUG_ON while removing nf_conntrack with netns
canqun zhang reported that we're hitting BUG_ON in the
nf_conntrack_destroy path when calling kfree_skb while
rmmod'ing the nf_conntrack module.

Currently, the nf_ct_destroy hook is being set to NULL in the
destroy path of conntrack.init_net. However, this is a problem
since init_net may be destroyed before any other existing netns
(we cannot assume any specific ordering while releasing existing
netns according to what I read in recent emails).

Thanks to Gao feng for initial patch to address this issue.

Reported-by: canqun zhang <canqunzhang@gmail.com>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-12 14:12:36 +01:00
Marek Lindner
0c430d0d7b batman-adv: unbloat batadv_priv if debug is not enabled
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-12 20:58:23 +10:00
Marek Lindner
9338026107 batman-adv: remove unused variable from orig_node struct
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-12 20:58:23 +10:00
Antonio Quartulli
c0275e243c batman-adv: fix typo in debug message
In bat_iv_ogm.c, a debug message should print "tq" instead of "td".

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12 20:58:23 +10:00
Antonio Quartulli
467b5fe697 batman-adv: use the const qualifier in hash functions
The data argument in each hash function should carry the
"const" qualifier as it is never modified.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12 20:58:22 +10:00
Antonio Quartulli
fa706554d6 batman-adv: don't compile the BLA switch if not requested
When the Bridge Loop Avoidance component is not compiled in, its boolean switch
should not be compiled either. This patch surrounds the switch with a proper
ifdef.

This behaviour was introduced by 9fd6b0615b5499b270d39a92b8790e206cf75833
("batman-adv: add bridge loop avoidance compile option")

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12 20:58:22 +10:00
Antonio Quartulli
3f87c4239f batman-adv: remove useless NULL check
debugfs_remove_recursive() checks whether its argument is not null
on its own, therefore it is possible to remove the external check.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12 20:58:22 +10:00
Antonio Quartulli
46d160ef88 batman-adv: remove useless blank lines before and after brackets
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12 20:58:21 +10:00
Antonio Quartulli
dec05074b1 batman-adv: Initialize lockdep class keys for hashes
Different hashes have the same class key because they get
initialised with the same one. For this reason lockdep can create
false warnings when they are used recursively.

Re-initialise the key for each hash after the invocation to hash_new()
to avoid this problem.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Tested-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12 20:58:21 +10:00
Antonio Quartulli
8425ec6aea batman-adv: remove useless assignment in tt_local_add()
The flag field of the tt_local_entry->common structure in
tt_local_add() is first assigned NO_FLAGS and then TT_CLIENT_NEW so
nullifying the first operation. For this reason it is safe to remove
the first assignment.

This was introduced by ("batman-adv: keep local table consistency for
further TT_RESPONSE")

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12 20:58:20 +10:00
Antonio Quartulli
39a3299158 batman-adv: unify and properly print hex values
Values are printed in hexadecimal format in several points in the
code, but they are not printed using the same format string.

This patch unifies the format used for such numbers so that they
look the same everywhere.

Given the fact that all the variables printed as hexadecimal are 16
bit long, this is the chosen printing format: %#.4x
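
What the chosen format produces can be checked with a userspace sketch using the C library's snprintf(); the kernel has its own formatting code, which may differ in corner cases. The helper name format_hex16() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 16-bit value with "%#.4x": the '#' flag adds the 0x prefix
 * to a nonzero value and the precision pads to at least four digits.
 * Returns the number of characters snprintf() reports. */
int format_hex16(char *buf, size_t len, uint16_t value)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%#.4x", (unsigned int)value);
}
```

For example, the value 0x2a is rendered as 0x002a, so short and long values line up in the debug output.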

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12 20:58:20 +10:00
Antonio Quartulli
f9d8a53784 batman-adv: print the CRC together with the translation tables
To simplify debugging operations, it is better to print the related
CRC together with the translation table (local CRC for the local
table and global CRC for each entry in the global table)

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12 20:58:20 +10:00
Antonio Quartulli
85766a8200 batman-adv: improve local translation table output
This patch adds a nice header to the local translation table and
the last_seen time for each local entry

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12 20:58:20 +10:00
Antonio Quartulli
7cf4d520fd batman-adv: reduce local TT entry timeout to 10 minutes
The current timeout is set to one hour. However a client connected to the mesh
network will always generate traffic. In the worst case it will send ARP
requests every 4 or 5 minutes. On the other hand having a long timeout means
storing dead entries for one hour and it leads to very big trans-tables
containing useless clients.

This patch reduces the timeout to 10 minutes

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12 20:58:19 +10:00
Linus Lüssing
02233e0c75 batman-adv: Do not add multicast MAC addresses to translation table
The current translation table mechanism is not suitable for multicast
addresses and we are currently flooding such frames anyway.

Therefore this patch prevents multicast MAC addresses being added to the
translation table.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-12 20:58:19 +10:00
Shan Wei
569174433d batman-adv: use per_cpu_add helper
this_cpu_add is an atomic operation
and is faster than a per_cpu_ptr operation.

Signed-off-by: Shan Wei <davidshan@tencent.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-12 20:58:19 +10:00
Eric Dumazet
18aafc622a net: splice: fix __splice_segment()
commit 9ca1b22d6d (net: splice: avoid high order page splitting)
forgot that skb->head could need a copy into several page frags.

This could be the case for loopback traffic mostly.

Also remove now useless skb argument from linear_to_page()
and __splice_segment() prototypes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11 16:48:08 -08:00
Rami Rosen
28a28283f8 ipv4: fib: fix a comment.
In fib_frontend.c, there is a confusing comment; NETLINK_CB(skb).portid does not
refer to a pid of sending process, but rather to a netlink portid.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11 15:58:08 -08:00
Stanislaw Gruszka
d07d7507bf net, wireless: overwrite default_ethtool_ops
Since:

commit 2c60db0370
Author: Eric Dumazet <edumazet@google.com>
Date:   Sun Sep 16 09:17:26 2012 +0000

    net: provide a default dev->ethtool_ops

wireless core does not correctly assign ethtool_ops.

After the alloc_netdev*() call, some cfg80211 drivers provide their own
ethtool_ops, but some do not. For those, the wireless core provides the generic
cfg80211_ethtool_ops, which is assigned in the NETDEV_REGISTER notify call:

        if (!dev->ethtool_ops)
                dev->ethtool_ops = &cfg80211_ethtool_ops;

But after Eric's commit, dev->ethtool_ops is no longer NULL (on cfg80211
drivers without custom ethtool_ops), but points to &default_ethtool_ops.

In order to fix the problem, provide a function which will overwrite
default_ethtool_ops, and use it in the wireless core.
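
The idea can be sketched in userspace C as follows. The struct definitions are simplified mocks, and `overwrite_default_ops` is a hypothetical name chosen for illustration; the helper actually added by the patch is not named in this message. The point is that the comparison is made against the address of the default ops rather than against NULL.

```c
/* Simplified mock types; the real kernel structures are much larger. */
struct ethtool_ops { int id; };
struct net_device { const struct ethtool_ops *ethtool_ops; };

static const struct ethtool_ops default_ethtool_ops = { 0 };
static const struct ethtool_ops cfg80211_ethtool_ops = { 1 };

/* Hypothetical helper: replace the ops only if the device still points
 * at the core default, so driver-provided ops are left untouched. */
static void overwrite_default_ops(struct net_device *dev,
				  const struct ethtool_ops *ops)
{
	if (dev->ethtool_ops == &default_ethtool_ops)
		dev->ethtool_ops = ops;
}
```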

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11 15:55:48 -08:00
Alexander Duyck
87696f9234 net: Export __netdev_pick_tx so that it can be used in modules
When testing with FCoE enabled we discovered that I had not exported
__netdev_pick_tx.  As a result ixgbe doesn't build with the RFC patches
applied because ixgbe_select_queue was calling the function.  This change
corrects that build issue by correctly exporting __netdev_pick_tx so it
can be used by modules.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11 15:47:27 -08:00
Linus Torvalds
93ccb3910a NFS client bugfix for Linux 3.8
- Fix a socket lock leak in net/sunrpc/xprt.c

Merge tag 'nfs-for-3.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfix from Trond Myklebust:

- Fix a socket lock leak in net/sunrpc/xprt.c

* tag 'nfs-for-3.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Ensure we release the socket write lock if the rpc_task exits early
2013-01-11 12:09:04 -08:00
Eric Dumazet
7b514a886b tcp: accept RST without ACK flag
commit c3ae62af8e (tcp: should drop incoming frames without ACK flag
set) added a regression on the handling of RST messages.

RST should be allowed to come even without ACK bit set. We validate
the RST by checking the exact sequence, as requested by RFC 793 and
5961 3.2, in tcp_validate_incoming()
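
A rough sketch of the sequence check that RFC 5961 section 3.2 describes, written as standalone C. This is not the body of tcp_validate_incoming(); the function name `classify_rst` and the enum are hypothetical, and the window test is simplified to unsigned modular arithmetic.

```c
#include <stdint.h>

enum rst_action { RST_DROP, RST_CHALLENGE_ACK, RST_ACCEPT };

/* Hypothetical classifier for an incoming RST segment:
 *  - exact match with RCV.NXT: accept the reset
 *  - inside the receive window but not exact: answer with a challenge ACK
 *  - outside the window: silently drop
 * Note that the ACK flag plays no part in this decision. */
static enum rst_action classify_rst(uint32_t seq, uint32_t rcv_nxt,
				    uint32_t rcv_wnd)
{
	if (seq == rcv_nxt)
		return RST_ACCEPT;
	/* Unsigned subtraction handles sequence-number wraparound. */
	if ((uint32_t)(seq - rcv_nxt) < rcv_wnd)
		return RST_CHALLENGE_ACK;
	return RST_DROP;
}
```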

Reported-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10 22:49:30 -08:00
Alexander Duyck
024e9679a2 net: Add support for XPS without sysfs being defined
This patch makes it so that we can support transmit packet steering without
sysfs needing to be enabled.  The reason for making this change is to make
it so that a driver can make use of the XPS even while the sysfs portion of
the interface is not present.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10 22:47:04 -08:00
Alexander Duyck
01c5f864e6 net: Rewrite netif_set_xps_queues to address several issues
This change is meant to address several issues I found within the
netif_set_xps_queues function.

If the allocation of one of the maps to be assigned to new_dev_maps failed
we could end up with the device map in an inconsistent state since we had
already worked through a number of CPUs and removed or added the queue.  To
address that I split the process into several steps.  The first of which is
just the allocation of updated maps for CPUs that will need larger maps to
store the queue.  By doing this we can fail gracefully without actually
altering the contents of the current device map.

The second issue I found was the fact that we were always allocating a new
device map even if we were not adding any queues.  I have updated the code
so that we only allocate a new device map if we are adding queues,
otherwise if we are not adding any queues to CPUs we just skip to the
removal process.

The last change I made was to reuse the code from remove_xps_queue to remove
the queue from the CPU.  By making this change we can be consistent in how
we go about adding and removing the queues from the CPUs.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10 22:47:04 -08:00
Alexander Duyck
10cdc3f3cd net: Rewrite netif_reset_xps_queue to allow for better code reuse
This patch does a minor refactor on netif_reset_xps_queue to address a few
items I noticed.

First is the fact that we are doing removal of queues in both
netif_reset_xps_queue and netif_set_xps_queue.  Since there is no need to
have the code in two places I am pushing it out into a separate function
and will come back in another patch and reuse the code in
netif_set_xps_queue.

The second item this change addresses is the fact that the Tx queues were
not getting their numa_node value cleared as a part of the XPS queue reset.
This patch resolves that by resetting the numa_node value if the dev_maps
value is set.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10 22:47:04 -08:00
Alexander Duyck
537c00de1c net: Add functions netif_reset_xps_queue and netif_set_xps_queue
This patch adds two functions, netif_reset_xps_queue and
netif_set_xps_queue.  The main idea behind these two functions is to
provide a mechanism through which drivers can update their defaults in
regards to XPS.

Currently no such mechanism exists and as a result we cannot use XPS for
things such as ATR which would require a basic configuration to start in
which the Tx queues are mapped to CPUs via a 1:1 mapping.  With this change
I am making it possible for drivers such as ixgbe to be able to use the XPS
feature by controlling the default configuration.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10 22:47:03 -08:00
Alexander Duyck
416186fbf8 net: Split core bits of netdev_pick_tx into __netdev_pick_tx
This change splits the core bits of netdev_pick_tx into a separate function.
The main idea behind this is to make this code accessible to select queue
functions when they decide to process the standard path instead of their
own custom path in their select queue routine.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10 22:47:03 -08:00
Eric Dumazet
1def9238d4 net_sched: more precise pkt_len computation
One long standing problem with TSO/GSO/GRO packets is that skb->len
doesn't represent a precise amount of bytes on wire.

Headers are only accounted for the first segment.
For TCP, that's typically 66 bytes missing per 1448-byte segment,
an error of 4.5 % for normal MSS value.

As a consequence:

1) TBF/CBQ/HTB/NETEM/... can send more bytes than the assigned limits.
2) Device stats are slightly underestimated as well.

Fix this by taking account of headers in qdisc_skb_cb(skb)->pkt_len
computation.

Packet schedulers should use qdisc pkt_len instead of skb->len for their
bandwidth limitations, and TSO-enabled device drivers could use pkt_len
if their statistics are not hardware assisted, and if they don't scratch
the first word of skb->cb[].

Both egress and ingress paths work, thanks to commit fda55eca5a
(net: introduce skb_transport_header_was_set()) : If GRO built
a GSO packet, it also set the transport header for us.
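
The arithmetic behind the correction can be sketched as below. This is an illustration only, not the kernel code: `wire_len` and its parameters are hypothetical names, and the header length is passed in rather than derived from the skb.

```c
#include <stdint.h>

/* Hypothetical helper: a GSO packet of gso_segs segments carries its
 * headers once in skb_len, but every segment after the first repeats
 * them on the wire. */
static uint32_t wire_len(uint32_t skb_len, uint32_t hdr_len,
			 uint32_t gso_segs)
{
	if (gso_segs <= 1)
		return skb_len;
	return skb_len + (gso_segs - 1) * hdr_len;
}
```

For ten 1448-byte segments with 66 bytes of headers, skb_len is 66 + 10 * 1448 = 14546, while the bytes on the wire are 10 * 1514 = 15140; the difference of 9 * 66 bytes is the underestimate described above.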

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Paolo Valente <paolo.valente@unimore.it>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10 14:58:13 -08:00
Randy Dunlap
7144bca681 nfs: fix sunrpc/clnt.c kernel-doc warnings
Fix new kernel-doc warnings in clnt.c:

  Warning(net/sunrpc/clnt.c:561): No description found for parameter 'flavor'
  Warning(net/sunrpc/clnt.c:561): Excess function parameter 'auth' description in 'rpc_clone_client_set_auth'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-10 14:35:23 -08:00
Romain Kuntz
21caa6622b ipv6: use addrconf_get_prefix_route for prefix route lookup [v2]
Replace ip6_route_lookup() with addrconf_get_prefix_route() when
looking up a prefix route. This ensures that the connected prefix
is looked up in the main table, and avoids the selection of other
matching routes located in different tables as well as blackhole
or prohibited entries.

In addition, this fixes an Oops introduced by commit 64c6d08e (ipv6:
del unreachable route when an addr is deleted on lo), that would occur
when a blackhole or prohibited entry is selected by ip6_route_lookup().
Such entries have a NULL rt6i_table argument, which is accessed by
__ip6_del_rt() when trying to lock rt6i_table->tb6_lock.

The function addrconf_is_prefix_route() is not used anymore and is
removed.

[v2] Minor indentation cleanup and log updates.

Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10 14:22:54 -08:00
Romain Kuntz
85da53bf1c ipv6: fix the noflags test in addrconf_get_prefix_route
The tests on the flags in addrconf_get_prefix_route() do not make
much sense: the 'noflags' parameter contains the set of flags that
must not match with the route flags, so the test must be done
against 'noflags', and not against 'flags'.
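
A minimal sketch of the intended semantics, assuming plain bitmask flags. `route_flags_match` is a hypothetical standalone function, not the kernel routine: a route qualifies only if it has every bit in 'flags' and none of the bits in 'noflags'.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate mirroring the corrected test. */
static bool route_flags_match(uint32_t rt_flags, uint32_t flags,
			      uint32_t noflags)
{
	if ((rt_flags & flags) != flags)
		return false;	/* a required flag is missing */
	if ((rt_flags & noflags) != 0)
		return false;	/* a forbidden flag is present */
	return true;
}
```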

Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10 14:13:33 -08:00
Eric Dumazet
f26845b43c tcp: fix splice() and tcp collapsing interaction
Under unusual circumstances, TCP collapse can split a big GRO TCP packet
while it's being used in a splice(socket->pipe) operation.

skb_splice_bits() releases the socket lock before calling
splice_to_pipe().

[ 1081.353685] WARNING: at net/ipv4/tcp.c:1330 tcp_cleanup_rbuf+0x4d/0xfc()
[ 1081.371956] Hardware name: System x3690 X5 -[7148Z68]-
[ 1081.391820] cleanup rbuf bug: copied AD3BCF1 seq AD370AF rcvnxt AD3CF13

To fix this problem, we must eat skbs in tcp_recv_skb().

Remove the inline keyword from tcp_recv_skb() definition since
it has three call sites.

Reported-by: Christian Becker <c.becker@traviangames.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10 14:09:57 -08:00
Eric Dumazet
ff905b1e4a tcp: splice: fix an infinite loop in tcp_read_sock()
commit 02275a2ee7 (tcp: don't abort splice() after small transfers)
added a regression.

[   83.843570] INFO: rcu_sched self-detected stall on CPU
[   83.844575] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 0, t=21002 jiffies, g=4457, c=4456, q=13132)
[   83.844582] Task dump for CPU 6:
[   83.844584] netperf         R  running task        0  8966   8952 0x0000000c
[   83.844587]  0000000000000000 0000000000000006 0000000000006c6c 0000000000000000
[   83.844589]  000000000000006c 0000000000000096 ffffffff819ce2bc ffffffffffffff10
[   83.844592]  ffffffff81088679 0000000000000010 0000000000000246 ffff880c4b9ddcd8
[   83.844594] Call Trace:
[   83.844596]  [<ffffffff81088679>] ? vprintk_emit+0x1c9/0x4c0
[   83.844601]  [<ffffffff815ad449>] ? schedule+0x29/0x70
[   83.844606]  [<ffffffff81537bd2>] ? tcp_splice_data_recv+0x42/0x50
[   83.844610]  [<ffffffff8153beaa>] ? tcp_read_sock+0xda/0x260
[   83.844613]  [<ffffffff81537b90>] ? tcp_prequeue_process+0xb0/0xb0
[   83.844615]  [<ffffffff8153c0f0>] ? tcp_splice_read+0xc0/0x250
[   83.844618]  [<ffffffff814dc0c2>] ? sock_splice_read+0x22/0x30
[   83.844622]  [<ffffffff811b820b>] ? do_splice_to+0x7b/0xa0
[   83.844627]  [<ffffffff811ba4bc>] ? sys_splice+0x59c/0x5d0
[   83.844630]  [<ffffffff8119745b>] ? putname+0x2b/0x40
[   83.844633]  [<ffffffff8118bcb4>] ? do_sys_open+0x174/0x1e0
[   83.844636]  [<ffffffff815b6202>] ? system_call_fastpath+0x16/0x1b

If recv_actor() returns 0, we should stop immediately,
because looping won't give a chance to drain the pipe.
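
The failure mode can be modelled in a few lines of userspace C. Everything here is a simplified stand-in: `read_until_full`, `pipe_actor` and `struct pipe_ctx` are hypothetical names, and the "pipe" is just a byte counter. Without the early exit, an actor that returns 0 would leave `pending` unchanged and the loop would spin forever.

```c
#include <stddef.h>

typedef int (*recv_actor_t)(void *ctx, size_t len);

/* Hypothetical consumer with limited room, standing in for a pipe. */
struct pipe_ctx { size_t room; };

static int pipe_actor(void *ctx, size_t len)
{
	struct pipe_ctx *p = ctx;
	size_t n = len < p->room ? len : p->room;

	p->room -= n;
	return (int)n;	/* 0 once the pipe is full */
}

/* Hypothetical read loop: stop as soon as the actor consumes nothing,
 * since retrying cannot make progress until the pipe is drained. */
static size_t read_until_full(size_t pending, recv_actor_t actor, void *ctx)
{
	size_t copied = 0;

	while (pending > 0) {
		int used = actor(ctx, pending);

		if (used <= 0)
			break;
		copied += (size_t)used;
		pending -= (size_t)used;
	}
	return copied;
}
```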

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10 14:07:19 -08:00
Linus Torvalds
7be72c3954 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 patches from Martin Schwidefsky:
 "Add the finit_module system call, fix the irq statistics in
  /proc/stat, fix a s390dbf lockdep problem, a patch revert for a
  problem that is not 100% understood yet, and a few patches to
  fix warnings."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/pci: define read*_relaxed functions
  s390/topology: export cpu_topology
  s390/pm: export pm_power_off
  s390/pci: define isa_dma_bridge_buggy
  s390/3215: partially revert tty close handling fix
  s390/irq: count cpu restart events
  s390/irq: remove split irq fields from /proc/stat
  s390/irq: enable irq sum accounting for /proc/stat again
  s390/syscalls: wire up finit_module syscall
  s390/pci: remove dead code
  s390/smp: fix section mismatch for smp_add_present_cpu()
  s390/debug: Fix s390dbf lockdep problem in debug_(un)register_view()
2013-01-10 08:20:15 -08:00
Pablo Neira Ayuso
4610476d89 netfilter: xt_CT: fix unset return value if conntrack zone are disabled
net/netfilter/xt_CT.c: In function ‘xt_ct_tg_check_v1’:
net/netfilter/xt_CT.c:250:6: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_CT.c: In function ‘xt_ct_tg_check_v0’:
net/netfilter/xt_CT.c:112:6: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized]

Reported-by: Borislav Petkov <bp@alien8.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-10 13:11:00 +01:00
YOSHIFUJI Hideaki / 吉藤英明
6c40d100ce ipv6: Use container_of macro instead of magic number to get ipv6 header.
In ipv6_recv_error(), addr_offset points to daddr field of the ip header.
To get the ipv6 header, use the container_of() macro instead of subtracting a magic
number (24).
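
As an illustration of why 24 was the magic number, here is a userspace sketch with mock types. The struct names carry a `_mock` suffix because they are simplified stand-ins for the kernel definitions, and this `container_of` omits the type checking that the kernel macro performs.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified container_of: step back from a member to its parent. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct in6_addr_mock { uint8_t s6_addr[16]; };

/* Mock of the fixed 40-byte IPv6 header layout. */
struct ipv6hdr_mock {
	uint32_t vtc_flow;	/* version, traffic class, flow label */
	uint16_t payload_len;
	uint8_t  nexthdr;
	uint8_t  hop_limit;
	struct in6_addr_mock saddr;
	struct in6_addr_mock daddr;	/* offset 4 + 2 + 1 + 1 + 16 = 24 */
};

/* Hypothetical helper: recover the header from a pointer to daddr. */
static struct ipv6hdr_mock *hdr_from_daddr(struct in6_addr_mock *daddr)
{
	return container_of(daddr, struct ipv6hdr_mock, daddr);
}
```

The offset is now computed by the compiler from the structure layout, so it stays correct if the code is ever pointed at a different member.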

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09 23:59:53 -08:00
YOSHIFUJI Hideaki / 吉藤英明
b4fff5f8bf unix: Use FIELD_SIZEOF() in af_unix_init().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09 23:38:24 -08:00
YOSHIFUJI Hideaki / 吉藤英明
ce6654cfc1 rxrpc: Use FIELD_SIZEOF() in af_rxrpc_init().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09 23:38:24 -08:00
YOSHIFUJI Hideaki / 吉藤英明
3523b29bd2 openvswitch: Use FIELD_SIZEOF() in dp_init().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09 23:38:24 -08:00
YOSHIFUJI Hideaki / 吉藤英明
fab2574591 netlink: Use FIELD_SIZEOF() in netlink_proto_init().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09 23:38:23 -08:00
YOSHIFUJI Hideaki / 吉藤英明
ba96bcbcd2 ipv6: Use FIELD_SIZEOF() in inet6_init().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09 23:38:23 -08:00
YOSHIFUJI Hideaki / 吉藤英明
95c7e0e4d4 ipv4: Use FIELD_SIZEOF() in inet_init().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09 23:38:23 -08:00
John W. Linville
a9b8a894ad Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2013-01-09 11:01:37 -05:00
Jiri Pirko
948b337e62 net: init perm_addr in register_netdevice()
Benefit from the fact that dev->addr_assign_type is set to NET_ADDR_PERM
in case the device has permanent address.

This also fixes the problem that many drivers do not set perm_addr at
all.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-08 18:00:47 -08:00
Cong Wang
c9be4a5c49 net: prevent setting ttl=0 via IP_TTL
A regression is introduced by the following commit:

	commit 4d52cfbef6
	Author: Eric Dumazet <eric.dumazet@gmail.com>
	Date:   Tue Jun 2 00:42:16 2009 -0700

	    net: ipv4/ip_sockglue.c cleanups

	    Pure cleanups

but it is not a pure cleanup...

	-               if (val != -1 && (val < 1 || val>255))
	+               if (val != -1 && (val < 0 || val > 255))

Since there is no reason provided to allow ttl=0, change it back.
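
The restored range check can be expressed as a small standalone predicate. `ip_ttl_valid` is a hypothetical name used only for this sketch; it is the logical negation of the rejected-value condition shown in the first diff line above, where -1 is the one value exempt from the 1..255 range.

```c
#include <stdbool.h>

/* Hypothetical predicate: -1 is exempt from the range check;
 * otherwise the TTL must lie in 1..255, so 0 is rejected. */
static bool ip_ttl_valid(int val)
{
	return val == -1 || (val >= 1 && val <= 255);
}
```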

Reported-by: nitin padalia <padalia.nitin@gmail.com>
Cc: nitin padalia <padalia.nitin@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-08 17:57:10 -08:00
Cong Wang
b3d936f3ea netpoll: add IPv6 support
Currently, netpoll only supports IPv4. This patch adds IPv6
support to netpoll so that we can run netconsole over IPv6 network.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-08 17:56:10 -08:00
Cong Wang
acb3e04119 ipv6: move csum_ipv6_magic() and udp6_csum_init() into static library
As suggested by David, udp6_csum_init() is too big to be inlined,
move it to ipv6 static library, net/ipv6/ip6_checksum.c.

And the generic csum_ipv6_magic() too.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-08 17:56:10 -08:00
Cong Wang
b7394d2429 netpoll: prepare for ipv6
This patch adjusts some structs and functions to prepare
for supporting IPv6.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-08 17:56:09 -08:00
Eric Dumazet
fda55eca5a net: introduce skb_transport_header_was_set()
We have skb_mac_header_was_set() helper to tell if mac_header
was set on a skb. We would like the same for transport_header.

__netif_receive_skb() doesn't reset the transport header if already
set by GRO layer.

Note that network stacks usually reset the transport header anyway,
after pulling the network header, so this change only allows
a followup patch to have more precise qdisc pkt_len computation
for GSO packets at ingress side.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-08 17:51:54 -08:00
Trond Myklebust
87ed50036b SUNRPC: Ensure we release the socket write lock if the rpc_task exits early
If the rpc_task exits while holding the socket write lock before it has
allocated an rpc slot, then the usual mechanism for releasing the write
lock in xprt_release() is defeated.

The problem occurs if the call to xprt_lock_write() initially fails, so
that the rpc_task is put on the xprt->sending wait queue. If the task
exits after being assigned the lock by __xprt_lock_write_func, but
before it has retried the call to xprt_lock_and_alloc_slot(), then
it calls xprt_release() while holding the write lock, but will
immediately exit due to the test for task->tk_rqstp != NULL.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]
2013-01-08 14:30:43 -05:00
Linus Torvalds
5c33d9b248 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) New sysctl ndisc_notify needs some documentation, from Hanns
    Frederic Sowa.

 2) Netfilter REJECT target doesn't set transport header of SKB
    correctly, from Mukund Jampala.

 3) Forcedeth driver needs to check for DMA mapping failures, from Larry
    Finger.

 4) brcmsmac driver can't use usleep_range while holding locks, use
    udelay instead.  From Niels Ole Salscheider.

 5) Fix unregister of netlink bridge multicast database handlers, from
    Vlad Yasevich and Rami Rosen.

 6) Fix checksum calculations in netfilter's ipv6 network prefix
    translation module.

 7) Fix high order page allocation failures in netfilter xt_recent, from
    Eric Dumazet.

 8) mac802154 needs to use netif_rx_ni() instead of netif_rx() because
    mac802154_process_data() can execute in process rather than
    interrupt context.  From Alexander Aring.

 9) Fix splice handling of MSG_SENDPAGE_NOTLAST, otherwise we elide one
    tcp_push() too many.  From Eric Dumazet and Willy Tarreau.

10) Fix skb->truesize tracking in XEN netfront driver, from Ian
    Campbell.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
  xen/netfront: improve truesize tracking
  ipv4: fix NULL checking in devinet_ioctl()
  tcp: fix MSG_SENDPAGE_NOTLAST logic
  net/ipv4/ipconfig: really display the BOOTP/DHCP server's address.
  ip-sysctl: fix spelling errors
  mac802154: fix NOHZ local_softirq_pending 08 warning
  ipv6: document ndisc_notify in networking/ip-sysctl.txt
  ath9k: Fix Kconfig for ATH9K_HTC
  netfilter: xt_recent: avoid high order page allocations
  netfilter: fix missing dependencies for the NOTRACK target
  netfilter: ip6t_NPT: fix IPv6 NTP checksum calculation
  bridge: add empty br_mdb_init() and br_mdb_uninit() definitions.
  vxlan: allow live mac address change
  bridge: Correctly unregister MDB rtnetlink handlers
  brcmfmac: fix parsing rsn ie for ap mode.
  brcmsmac: add copyright information for Canonical
  rtlwifi: rtl8723ae: Fix warning for unchecked pci_map_single() call
  rtlwifi: rtl8192se: Fix warning for unchecked pci_map_single() call
  rtlwifi: rtl8192de: Fix warning for unchecked pci_map_single() call
  rtlwifi: rtl8192ce: Fix warning for unchecked pci_map_single() call
  ...
2013-01-08 07:31:49 -08:00
Heiko Carstens
420f42ecf4 s390/irq: remove split irq fields from /proc/stat
Now that irq sum accounting for /proc/stat's "intr" line works again we
have the oddity that the sum field (first field) contains only the sum
of the second (external irqs) and third field (I/O interrupts).
The reason for that is that these two fields are already sums of all other
fields. So if we summed up everything we would count every interrupt
twice.
This has been broken since the split interrupt accounting was merged two years
ago: 052ff461c8 "[S390] irq: have detailed
statistics for interrupt types".
To fix this remove the split interrupt fields from /proc/stat's "intr"
line again and only have them in /proc/interrupts.

This restores the old behaviour, seems to be the only sane fix and mimics
a behaviour from other architectures where /proc/interrupts also contains
more than /proc/stat's "intr" line does.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-01-08 10:57:07 +01:00