Commit Graph

31249 Commits

Author SHA1 Message Date
Jiri Pirko
df7dbcbbaf rtnetlink: put "BOND" into nl attribute names which are related to bonding
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 21:57:05 -08:00
viresh kumar
f618002b0b net/neighbour: queue work on power efficient wq
Workqueue used in neighbour layer have no real dependency of scheduling these on
the cpu which scheduled them.

On a idle system, it is observed that an idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which the
scheduler believes to be the most appropriate one.

This patch replaces normal workqueues with power efficient versions. This
doesn't change existing behavior of code unless CONFIG_WQ_POWER_EFFICIENT is
enabled.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 21:57:05 -08:00
viresh kumar
906e073f3e net/ipv4: queue work on power efficient wq
Workqueue used in ipv4 layer have no real dependency of scheduling these on the
cpu which scheduled them.

On a idle system, it is observed that an idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which the
scheduler believes to be the most appropriate one.

This patch replaces normal workqueues with power efficient versions. This
doesn't change existing behavior of code unless CONFIG_WQ_POWER_EFFICIENT is
enabled.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 21:57:05 -08:00
FX Le Bail
7c90cc2d40 ipv6: enable anycast addresses as source addresses for datagrams
This change allows to consider an anycast address valid as source address
when given via an IPV6_PKTINFO or IPV6_2292PKTINFO ancillary data item.
So, when sending a datagram with ancillary data, the unicast and anycast
addresses are handled in the same way.

- Adds ipv6_chk_acast_addr_src() to check if an anycast address is link-local
  on given interface or is global.
- Uses it in ip6_datagram_send_ctl().

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 21:57:05 -08:00
Toshiaki Makita
bdf4351bbc bridge: Remove unnecessary vlan_put_tag in br_handle_vlan
br_handle_vlan() pushes HW accelerated vlan tag into skbuff when outgoing
port is the bridge device.
This is unnecessary because __netif_receive_skb_core() can handle skbs
with HW accelerated vlan tag. In current implementation,
__netif_receive_skb_core() needs to extract the vlan tag embedded in skb
data. This could cause low network performance especially when receiving
frames at a high frame rate on the bridge device.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 21:29:27 -08:00
Christoph Paasch
00ca9c5b2b tcp: metrics: Fix rcu-race when deleting multiple entries
In bbf852b96e I introduced the tmlist, which allows to delete
multiple entries from the cache that match a specified destination if no
source-IP is specified.

However, as the cache is an RCU-list, we should not create this tmlist, as
it will change the tcpm_next pointer of the element that will be deleted
and so a thread iterating over the cache's entries while holding the
RCU-lock might get "redirected" to this tmlist.

This patch fixes this, by reverting back to the old behavior prior to
bbf852b96e, which means that we simply change the tcpm_next
pointer of the previous element (pp) to jump over the one we are
deleting.
The difference is that we call kfree_rcu() directly on the cache entry,
which allows us to delete multiple entries from the list.

Fixes: bbf852b96e (tcp: metrics: Delete all entries matching a certain destination)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 21:26:16 -08:00
Harry Mason
29824310ce sch_htb: let skb->priority refer to non-leaf class
If the class in skb->priority is not a leaf, apply filters from the
selected class, not the qdisc. This lets netfilter or user space
partially classify the packet.

Signed-off-by: Harry Mason <harry.mason@smoothwall.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 17:39:48 -08:00
Neil Horman
2d36097d26 af_packet: Add Queue mapping mode to af_packet fanout operation
This patch adds a queue mapping mode to the fanout operation of af_packet
sockets.  This allows user space af_packet users to better filter on flows
ingressing and egressing via a specific hardware queue, and avoids the potential
packet reordering that can occur when FANOUT_CPU is being used and irq affinity
varies.

Tested successfully by myself.  applies to net-next

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 17:35:50 -08:00
Hannes Frederic Sowa
809fa972fd reciprocal_divide: update/correction of the algorithm
Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.

This has been fixed by Eric Dumazet in commit aee636c480
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using Granlund and Montgomery method, so that also
future use is safe and without any non-obvious side-effects.
Known problems with the old implementation were that division by 1
always returned 0 and some off-by-ones when the dividend and divisor
where very large. This seemed to not be problematic with its
current users, as far as we can tell. Eric Dumazet checked for
the slab usage, we cannot surely say so in the case of flex_array.
Still, in order to fix that, we propose an extension from the
original implementation from commit 6a2d7a955d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5], Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d is in
u32 universe:

1) Initialization:

  int l = ceil(log_2 d)
  uword m' = floor((1<<32)*((1<<l)-d)/d)+1
  int sh_1 = min(l,1)
  int sh_2 = max(l-1,0)

2) For q = n/d, all uword:

  uword t = (n*m')>>32
  q = (t+((n-t)>>sh_1))>>sh_2

The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.

Joint work with Daniel Borkmann.

  [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
  [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
  [3] https://gmplib.org/~tege/division-paper.pdf
  [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
  [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
  [6] http://www.agner.org/optimize/asmlib.zip

Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross <jesse@nicira.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 23:17:20 -08:00
Daniel Borkmann
89770b0a69 net: introduce reciprocal_scale helper and convert users
As David Laight suggests, we shouldn't necessarily call this
reciprocal_divide() when users didn't requested a reciprocal_value();
lets keep the basic idea and call it reciprocal_scale(). More
background information on this topic can be found in [1].

Joint work with Hannes Frederic Sowa.

  [1] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html

Suggested-by: David Laight <david.laight@aculab.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 23:17:20 -08:00
Daniel Borkmann
f337db64af random32: add prandom_u32_max and convert open coded users
Many functions have open coded a function that returns a random
number in range [0,N-1]. Under the assumption that we have a PRNG
such as taus113 with being well distributed in [0, ~0U] space,
we can implement such a function as uword t = (n*m')>>32, where
m' is a random number obtained from PRNG, n the right open interval
border and t our resulting random number, with n,m',t in u32 universe.

Lets go with Joe and simply call it prandom_u32_max(), although
technically we have an right open interval endpoint, but that we
have documented. Other users can further be migrated to the new
prandom_u32_max() function later on; for now, we need to make sure
to migrate reciprocal_divide() users for the reciprocal_divide()
follow-up fixup since their function signatures are going to change.

Joint work with Hannes Frederic Sowa.

Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 23:17:20 -08:00
David S. Miller
14e481445d net: Missing change from the ether_addr_copy() fixups.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 22:54:01 -08:00
David S. Miller
9be68c1ae0 net: Fix some fallout from the etner_addr_copy() changes.
net/appletalk/aarp.c: In function ‘__aarp_send_query’:
net/appletalk/aarp.c:137:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration]
 ...
net/atm/lec.c: In function ‘send_to_lecd’:
net/atm/lec.c:524:3: warning: passing argument 1 of ‘ether_addr_copy’ from incompatible pointer type [enabled by default]
In file included from net/atm/lec.c:17:0:
include/linux/etherdevice.h:227:20: note: expected ‘u8 *’ but argument is of type ‘unsigned char (*)[6]’
 ...
net/caif/caif_usb.c: In function ‘cfusbl_create’:
net/caif/caif_usb.c:108:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration]

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:57:26 -08:00
wangweidong
5bc1d1b4a2 sctp: remove macros sctp_bh_[un]lock_sock
Redefined bh_[un]lock_sock to sctp_bh[un]lock_sock for user
space friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:41:36 -08:00
wangweidong
048ed4b626 sctp: remove macros sctp_{lock|release}_sock
Redefined {lock|release}_sock to sctp_{lock|release}_sock for user space friendly
code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:41:36 -08:00
wangweidong
387602dfdc sctp: remove macros sctp_write_[un]_lock
Redefined write_[un]lock to sctp_write_[un]lock for user space
friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:40:41 -08:00
wangweidong
3c8e43ba9f sctp: remove macros sctp_spin_[un]lock
Redefined spin_[un]lock to sctp_spin_[un]lock for user space friendly
code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:40:41 -08:00
wangweidong
79b91130a2 sctp: remove macros sctp_local_bh_{disable|enable}
Redefined local_bh_{disable|enable} to sctp_local_bh_{disable|enable}
for user space friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:40:40 -08:00
Joe Perches
d08f161a10 dsa: Use ether_addr_copy
Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:13:05 -08:00
Joe Perches
9ea08b1299 pktgen: Use ether_addr_copy
Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:13:05 -08:00
Joe Perches
c62326abac netpoll: Use ether_addr_copy
Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:13:05 -08:00
Joe Perches
34b2cff4ee caif_usb: Use ether_addr_copy
Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:13:04 -08:00
Joe Perches
116e853f7f atm: Use ether_addr_copy
Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:13:04 -08:00
Joe Perches
90ccb6aa40 appletalk: Use ether_addr_copy
Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Convert struct aarp_entry.hwaddr[6] to hwaddr[ETH_ALEN].

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:13:04 -08:00
Joe Perches
07fc67befd 8021q: Use ether_addr_copy
Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:13:04 -08:00
Or Gerlitz
e27a2f8395 net: Export gro_find_by_type helpers
Export the gro_find_receive/complete_by_type helpers to they can be invoked
by the gro callbacks of encapsulation protocols such as vxlan.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:05:04 -08:00
Or Gerlitz
b582ef0990 net: Add GRO support for UDP encapsulating protocols
Add GRO handlers for protocols that do UDP encapsulation, with the intent of
being able to coalesce packets which encapsulate packets belonging to
the same TCP session.

For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.

The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.

Use a mark on the skb GRO CB data to disallow (flush) running the udp gro receive
code twice on a packet. This solves the problem of udp encapsulated packets whose
inner VM packet is udp and happen to carry a port which has registered offloads.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:05:04 -08:00
Dan Carpenter
08d4d217df rxrpc: out of bound read in debug code
Smatch complains because we are using an untrusted index into the
rxrpc_acks[] array.  It's just a read and it's only in the debug code,
but it's simple enough to add a check and fix it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 17:02:52 -08:00
Yegor Yefremov
2fa053a0a2 8021q: update description
Replace deprecated 'vconfig' tool with 'ip' from 'iproute2'. Add
some beautifications like replacing 'ethernet' with 'Ethernet' and
removing unneeded spaces.

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 17:01:25 -08:00
Hannes Frederic Sowa
82b276cd2b ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts
Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow
connecting and binding to them. sendmsg logic does already check msg->name
for this but must trust already connected sockets which could be set up
for connection to ipv4 address family.

Per-socket flag ipv6only is of no use here, as it is under users control
by setsockopt.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:59:19 -08:00
FX Le Bail
446fab5933 ipv6: enable anycast addresses as source addresses in ICMPv6 error messages
- Uses ipv6_anycast_destination() in icmp6_send().

Suggested-by: Bill Fink <billfink@mindspring.com>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:53:23 -08:00
Peter Pan(潘卫平)
4d83e17730 tcp: delete redundant calls of tcp_mtup_init()
As tcp_rcv_state_process() has already calls tcp_mtup_init() for non-fastopen
sock, we can delete the redundant calls of tcp_mtup_init() in
tcp_{v4,v6}_syn_recv_sock().

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:52:31 -08:00
Daniel Borkmann
f0d4eb29d1 packet: fix a couple of cppcheck warnings
Doesn't bring much, but also doesn't hurt us to fix 'em:

1) In tpacket_rcv() flush dcache page we can restirct the scope
   for start and end and remove one layer of indent.

2) In tpacket_destruct_skb() we can restirct the scope for ph.

3) In alloc_one_pg_vec_page() we can remove the NULL assignment
   and change spacing a bit.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:51:42 -08:00
Duan Jiong
967680e027 ipv4: remove the useless argument from ip_tunnel_hash()
Since commit c544193214("GRE: Refactor GRE tunneling code")
introduced function ip_tunnel_hash(), the argument itn is no
longer in use, so remove it.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:48:18 -08:00
Sabrina Dubroca
f14fe8a848 net: remove unnecessary initializations in net_dev_init
softnet_data is already set to 0, no need to use memset or initialize
specific fields to 0 or NULL afterwards.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:44:45 -08:00
WANG Cong
6e6a50c254 net_sched: act: export tcf_hash_search() instead of tcf_hash_lookup()
So that we will not expose struct tcf_common to modules.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 14:43:16 -08:00
WANG Cong
c779f7af99 net_sched: act: fetch hinfo from a->ops->hinfo
Every action ops has a pointer to hash info, so we don't need to
hard-code it in each module.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 14:43:16 -08:00
Weilong Chen
82ef3d5d5f net: fix "queues" uevent between network namespaces
When I create a new namespace with 'ip netns add net0', or add/remove
new links in a namespace with 'ip link add/delete type veth', rx/tx
queues events can be got in all namespaces. That is because rx/tx queue
ktypes do not have namespace support, and their kobj parents are setted to
NULL. This patch is to fix it.

Reported-by: Libo Chen <chenlibo@huawei.com>
Signed-off-by: Libo Chen <chenlibo@huawei.com>
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 20:02:02 -08:00
WANG Cong
671314a5ab net_sched: act: remove capab from struct tc_action_ops
It is not actually implemented.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:58:07 -08:00
Jason Wang
9d08dd3d32 net: document accel_priv parameter for __dev_queue_xmit()
To silent "make htmldocs" warning.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:55:51 -08:00
Hannes Frederic Sowa
602582ca7a ipv6: optimize link local address search
ipv6_link_dev_addr sorts newly added addresses by scope in
ifp->addr_list. Smaller scope addresses are added to the tail of the
list. Use this fact to iterate in reverse over addr_list and break out
as soon as a higher scoped one showes up, so we can spare some cycles
on machines with lot's of addresses.

The ordering of the addresses is not relevant and we are more likely to
get the eui64 generated address with this change anyway.

Suggested-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:55:50 -08:00
Hannes Frederic Sowa
4b261c75a9 ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams
We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.

This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.

ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.

I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.

Reported-by: Gert Doering <gert@space.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:53:18 -08:00
Yang Yingliang
a6e2fe17eb sch_netem: replace magic numbers with enumerate
Replace some magic numbers which describe states of 4-state model
loss generator with enumerate.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:17:34 -08:00
Florent Fourcot
6444f72b4b ipv6: add flowlabel_consistency sysctl
With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
flow label unicity. This patch introduces a new sysctl to protect the old
behaviour, enable by default.

Changelog of V3:
 * rename ip6_flowlabel_consistency to flowlabel_consistency
 * use net_info_ratelimited()
 * checkpatch cleanups

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
Florent Fourcot
46e5f40176 ipv6: add a flag to get the flow label used remotly
This information is already available via IPV6_FLOWINFO
of IPV6_2292PKTOPTIONS, and them a filtering to get the flow label
information. But it is probably logical and easier for users to add this
here, and to control both sent/received flow label values with the
IPV6_FLOWLABEL_MGR option.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
Florent Fourcot
df3687ffc6 ipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GET
With this option, the socket will reply with the flow label value read
on received packets.

The goal is to have a connection with the same flow label in both
direction of the communication.

Changelog of V4:
 * Do not erase the flow label on the listening socket. Use pktopts to
 store the received value

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
Eric Dumazet
3acfa1e73c ipv4: be friend with drop monitor
Replace some dev_kfree_skb() with kfree_skb() calls when
we drop one skb, this might help bug tracking.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 23:08:02 -08:00
Steffen Hurrle
342dfc306f net: add build-time checks for msg->msg_name size
This is a follow-up patch to f3d3342602 ("net: rework recvmsg
handler msg_name and msg_namelen logic").

DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.

Signed-off-by: Steffen Hurrle <steffen@hurrle.net>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 23:04:16 -08:00
Michal Sekletar
ea02f9411d net: introduce SO_BPF_EXTENSIONS
For user space packet capturing libraries such as libpcap, there's
currently only one way to check which BPF extensions are supported
by the kernel, that is, commit aa1113d9f8 ("net: filter: return
-EINVAL if BPF_S_ANC* operation is not supported"). For querying all
extensions at once this might be rather inconvenient.

Therefore, this patch introduces a new option which can be used as
an argument for getsockopt(), and allows one to obtain information
about which BPF extensions are supported by the current kernel.

As David Miller suggests, we do not need to define any bits right
now and status quo can just return 0 in order to state that this
versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
additions to BPF extensions need to add their bits to the
bpf_tell_extensions() function, as documented in the comment.

Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Cc: David Miller <davem@davemloft.net>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 19:08:58 -08:00
David S. Miller
4180442058 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
	net/ipv4/tcp_metrics.c

Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.

Minor overlapping changes in bnx2x driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 00:55:41 -08:00