Commit Graph

31286 Commits

Author SHA1 Message Date
Linus Torvalds
f075e0f699 Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "The bulk of changes are cleanups and preparations for the upcoming
  kernfs conversion.

   - cgroup_event mechanism which is and will be used only by memcg is
     moved to memcg.

   - pidlist handling is updated so that it can be served by seq_file.

     Also, the list is not sorted if sane_behavior.  cgroup
     documentation explicitly states that the file is not sorted but it
     has been for quite some time.

   - All cgroup file handling now happens on top of seq_file.  This is
     to prepare for kernfs conversion.  In addition, all operations are
     restructured so that they map 1-1 to kernfs operations.

   - Other cleanups and low-pri fixes"

* 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (40 commits)
  cgroup: trivial style updates
  cgroup: remove stray references to css_id
  doc: cgroups: Fix typo in doc/cgroups
  cgroup: fix fail path in cgroup_load_subsys()
  cgroup: fix missing unlock on error in cgroup_load_subsys()
  cgroup: remove for_each_root_subsys()
  cgroup: implement for_each_css()
  cgroup: factor out cgroup_subsys_state creation into create_css()
  cgroup: combine css handling loops in cgroup_create()
  cgroup: reorder operations in cgroup_create()
  cgroup: make for_each_subsys() useable under cgroup_root_mutex
  cgroup: css iterations and css_from_dir() are safe under cgroup_mutex
  cgroup: unify pidlist and other file handling
  cgroup: replace cftype->read_seq_string() with cftype->seq_show()
  cgroup: attach cgroup_open_file to all cgroup files
  cgroup: generalize cgroup_pidlist_open_file
  cgroup: unify read path so that seq_file is always used
  cgroup: unify cgroup_write_X64() and cgroup_write_string()
  cgroup: remove cftype->read(), ->read_map() and ->write()
  hugetlb_cgroup: convert away from cftype->read()
  ...
2014-01-21 17:51:34 -08:00
Dan Carpenter
08d4d217df rxrpc: out of bound read in debug code
Smatch complains because we are using an untrusted index into the
rxrpc_acks[] array.  It's just a read and it's only in the debug code,
but it's simple enough to add a check and fix it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 17:02:52 -08:00
Yegor Yefremov
2fa053a0a2 8021q: update description
Replace deprecated 'vconfig' tool with 'ip' from 'iproute2'. Add
some beautifications like replacing 'ethernet' with 'Ethernet' and
removing unneeded spaces.

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 17:01:25 -08:00
Hannes Frederic Sowa
82b276cd2b ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts
Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow
connecting and binding to them. sendmsg logic does already check msg->name
for this but must trust already connected sockets which could be set up
for connection to ipv4 address family.

Per-socket flag ipv6only is of no use here, as it is under users control
by setsockopt.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:59:19 -08:00
FX Le Bail
446fab5933 ipv6: enable anycast addresses as source addresses in ICMPv6 error messages
- Uses ipv6_anycast_destination() in icmp6_send().

Suggested-by: Bill Fink <billfink@mindspring.com>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:53:23 -08:00
Peter Pan(潘卫平)
4d83e17730 tcp: delete redundant calls of tcp_mtup_init()
As tcp_rcv_state_process() has already calls tcp_mtup_init() for non-fastopen
sock, we can delete the redundant calls of tcp_mtup_init() in
tcp_{v4,v6}_syn_recv_sock().

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:52:31 -08:00
Daniel Borkmann
f0d4eb29d1 packet: fix a couple of cppcheck warnings
Doesn't bring much, but also doesn't hurt us to fix 'em:

1) In tpacket_rcv() flush dcache page we can restirct the scope
   for start and end and remove one layer of indent.

2) In tpacket_destruct_skb() we can restirct the scope for ph.

3) In alloc_one_pg_vec_page() we can remove the NULL assignment
   and change spacing a bit.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:51:42 -08:00
Duan Jiong
967680e027 ipv4: remove the useless argument from ip_tunnel_hash()
Since commit c544193214("GRE: Refactor GRE tunneling code")
introduced function ip_tunnel_hash(), the argument itn is no
longer in use, so remove it.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:48:18 -08:00
Sabrina Dubroca
f14fe8a848 net: remove unnecessary initializations in net_dev_init
softnet_data is already set to 0, no need to use memset or initialize
specific fields to 0 or NULL afterwards.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:44:45 -08:00
WANG Cong
6e6a50c254 net_sched: act: export tcf_hash_search() instead of tcf_hash_lookup()
So that we will not expose struct tcf_common to modules.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 14:43:16 -08:00
WANG Cong
c779f7af99 net_sched: act: fetch hinfo from a->ops->hinfo
Every action ops has a pointer to hash info, so we don't need to
hard-code it in each module.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 14:43:16 -08:00
Linus Torvalds
8e50966072 GPIO tree bulk changes for v3.14
A big set this merge window, as we have much going on in
 this subsystem. Major changes this time:
 
 - Some core improvements and cleanups to the new GPIO
   descriptor API. This seems to be working now so we can
   start the exodus to this API, moving gradually away from
   the global GPIO numberspace.
 
 - Incremental improvements to the ACPI GPIO core, and move
   the few GPIO ACPI clients we have to the GPIO descriptor
   API right *now* before we go any further. We actually
   managed to contain this *before* we started to litter
   the kernel with yet another hackish global numberspace for
   the ACPI GPIOs, which is a big win.
 
 - The RFkill GPIO driver and all platforms using it have
   been migrated to use the GPIO descriptors rather than
   fixed number assignments. Tegra machine has been migrated
   as part of this.
 
 - New drivers for MOXA ART, Xtensa GPIO32 and SMSC SCH311x.
   Those should be really good examples of how I expect a
   nice GPIO driver to look these days.
 
 - Do away with custom GPIO implementations on a major
   part of the ARM machines: ks8695, lpc32xx, mv78xx0.
   Make a first step towards the same in the horribly
   convoluted Samsung S3C include forest. We expect to
   continue to clean this up as we move forward.
 
 - Flag GPIO lines used for IRQ on adnp, bcm-kona, em,
   intel-mid and lynxpoint.
   This makes the GPIOlib core aware that a certain GPIO line
   is used for IRQs and can then enforce some semantics such
   as disallowing a GPIO line marked as in use for IRQ to be
   switched to output mode.
 
 - Drop all use of irq_set_chip_and_handler_name().
   The name provided in these cases were just unhelpful
   tags like "mux" or "demux".
 
 - Extend the MCP23s08 driver to handle interrupts.
 
 - Minor incremental improvements for rcar, lynxpoint, em
   74x164 and msm drivers.
 
 - Some non-urgent bug fixes here and there, duplicate
   #includes and that usual kind of cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJS3i/MAAoJEEEQszewGV1zVB8P/Rjzgx8To0gQPn49M4u/A1Mk
 mAzpUoKa05ILTKBm/bpWPYZPpg9PDqUxOYPsIDEAkc70BKMPTXxrYiE+LSfIzwaJ
 a8IRwOzNL7Iwc+zPNS/GrmRJyxymb4lmMD/fypk/YaumZ6j4Hbo+9R8Zct9gbZ5Q
 ZbKtz6kLhbkbNCc71bVMgk6yacSBx1ak8Xpd12HlW85NgOCoBj7/DI1Lb61x1ImY
 NYpSpmtfGGTkQLtBl5dTLefZOvL1dKSct9TMOsA2jzNqf3zA1YA6XOxPGHK/qtjq
 3s9cN1sIVF/g7sm1+qoKXe0OTQrXHT7SX8BH9/tb3MiKO8ItactlQUJlYNR3WFSN
 zm1PNe5zWr+GWzV0iUrqoMN4XX8nThiFDOxZpOwBTZcUD6qtDFIZp41M3qxwFTbJ
 hCtSQ8gUO1Ce+xtOQYYOwEkRS7FZa1Z+p/lendTFuGDh6DcXy97SrKkTktM4Q98B
 LhqrwUzCdES0ecNDi2+P5y4Fc7M0cMMn9SnFvbSBObLB89TF9uzMIn8jUBCZMvrM
 eAeZlRBYk8F+6F12higaWqZyiBKIEubXo/Z8T0L2KEDm/z/ddJvhQgBKvWlf3rqi
 RToD446rda+RhFBnxLZ3mTui5nZ2WyKTOqhVqeBuriJhE/cTUaQHUBUrbOwx20kE
 Xb9mQ2n3GRk2157n1CLY
 =lW2i
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v3.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO tree bulk changes from Linus Walleij:
 "A big set this merge window, as we have much going on in this
  subsystem.  The changes to other subsystems (notably a slew of ARM
  machines as I am doing away with their custom APIs) have all been
  ACKed to the extent possible.

  Major changes this time:

   - Some core improvements and cleanups to the new GPIO descriptor API.
     This seems to be working now so we can start the exodus to this
     API, moving gradually away from the global GPIO numberspace.

   - Incremental improvements to the ACPI GPIO core, and move the few
     GPIO ACPI clients we have to the GPIO descriptor API right *now*
     before we go any further.  We actually managed to contain this
     *before* we started to litter the kernel with yet another hackish
     global numberspace for the ACPI GPIOs, which is a big win.

   - The RFkill GPIO driver and all platforms using it have been
     migrated to use the GPIO descriptors rather than fixed number
     assignments.  Tegra machine has been migrated as part of this.

   - New drivers for MOXA ART, Xtensa GPIO32 and SMSC SCH311x.  Those
     should be really good examples of how I expect a nice GPIO driver
     to look these days.

   - Do away with custom GPIO implementations on a major part of the ARM
     machines: ks8695, lpc32xx, mv78xx0.  Make a first step towards the
     same in the horribly convoluted Samsung S3C include forest.  We
     expect to continue to clean this up as we move forward.

   - Flag GPIO lines used for IRQ on adnp, bcm-kona, em, intel-mid and
     lynxpoint.

     This makes the GPIOlib core aware that a certain GPIO line is used
     for IRQs and can then enforce some semantics such as disallowing a
     GPIO line marked as in use for IRQ to be switched to output mode.

   - Drop all use of irq_set_chip_and_handler_name().  The name provided
     in these cases were just unhelpful tags like "mux" or "demux".

   - Extend the MCP23s08 driver to handle interrupts.

   - Minor incremental improvements for rcar, lynxpoint, em 74x164 and
     msm drivers.

   - Some non-urgent bug fixes here and there, duplicate #includes and
     that usual kind of cleanups"

Fix up broken Kconfig file manually to make this all compile.

* tag 'gpio-v3.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (71 commits)
  gpio: mcp23s08: fix casting caused build warning
  gpio: mcp23s08: depend on OF_GPIO
  gpio: mcp23s08: Add irq functionality for i2c chips
  ARM: S5P[v210|c100|64x0]: Fix build error
  gpio: pxa: clamp gpio get value to [0,1]
  ARM: s3c24xx: explicit dependency on <plat/gpio-cfg.h>
  ARM: S3C[24|64]xx: move includes back under <mach/> scope
  Documentation / ACPI: update to GPIO descriptor API
  gpio / ACPI: get rid of acpi_gpio.h
  gpio / ACPI: register to ACPI events automatically
  mmc: sdhci-acpi: convert to use GPIO descriptor API
  ARM: s3c24xx: fix build error
  gpio: f7188x: set can_sleep attribute
  gpio: samsung: Update documentation
  gpio: samsung: Remove hardware.h inclusion
  gpio: xtensa: depend on HAVE_XTENSA_GPIO32
  gpio: clps711x: Enable driver compilation with COMPILE_TEST
  gpio: clps711x: Use of_match_ptr()
  net: rfkill: gpio: convert to descriptor-based GPIO interface
  leds: s3c24xx: Fix build failure
  ...
2014-01-21 10:09:12 -08:00
Linus Torvalds
a0fa1dd3cd Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:

 - Add the initial implementation of SCHED_DEADLINE support: a real-time
   scheduling policy where tasks that meet their deadlines and
   periodically execute their instances in less than their runtime quota
   see real-time scheduling and won't miss any of their deadlines.
   Tasks that go over their quota get delayed (Available to privileged
   users for now)

 - Clean up and fix preempt_enable_no_resched() abuse all around the
   tree

 - Do sched_clock() performance optimizations on x86 and elsewhere

 - Fix and improve auto-NUMA balancing

 - Fix and clean up the idle loop

 - Apply various cleanups and fixes

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  sched: Fix __sched_setscheduler() nice test
  sched: Move SCHED_RESET_ON_FORK into attr::sched_flags
  sched: Fix up attr::sched_priority warning
  sched: Fix up scheduler syscall LTP fails
  sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls
  sched/core: Fix htmldocs warnings
  sched/deadline: No need to check p if dl_se is valid
  sched/deadline: Remove unused variables
  sched/deadline: Fix sparse static warnings
  m68k: Fix build warning in mac_via.h
  sched, thermal: Clean up preempt_enable_no_resched() abuse
  sched, net: Fixup busy_loop_us_clock()
  sched, net: Clean up preempt_enable_no_resched() abuse
  sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
  sched/preempt, locking: Rework local_bh_{dis,en}able()
  sched/clock, x86: Avoid a runtime condition in native_sched_clock()
  sched/clock: Fix up clear_sched_clock_stable()
  sched/clock, x86: Use a static_key for sched_clock_stable
  sched/clock: Remove local_irq_disable() from the clocks
  sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
  ...
2014-01-20 10:42:08 -08:00
Weilong Chen
82ef3d5d5f net: fix "queues" uevent between network namespaces
When I create a new namespace with 'ip netns add net0', or add/remove
new links in a namespace with 'ip link add/delete type veth', rx/tx
queues events can be got in all namespaces. That is because rx/tx queue
ktypes do not have namespace support, and their kobj parents are setted to
NULL. This patch is to fix it.

Reported-by: Libo Chen <chenlibo@huawei.com>
Signed-off-by: Libo Chen <chenlibo@huawei.com>
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 20:02:02 -08:00
WANG Cong
671314a5ab net_sched: act: remove capab from struct tc_action_ops
It is not actually implemented.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:58:07 -08:00
Jason Wang
9d08dd3d32 net: document accel_priv parameter for __dev_queue_xmit()
To silent "make htmldocs" warning.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:55:51 -08:00
Hannes Frederic Sowa
602582ca7a ipv6: optimize link local address search
ipv6_link_dev_addr sorts newly added addresses by scope in
ifp->addr_list. Smaller scope addresses are added to the tail of the
list. Use this fact to iterate in reverse over addr_list and break out
as soon as a higher scoped one showes up, so we can spare some cycles
on machines with lot's of addresses.

The ordering of the addresses is not relevant and we are more likely to
get the eui64 generated address with this change anyway.

Suggested-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:55:50 -08:00
Hannes Frederic Sowa
4b261c75a9 ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams
We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.

This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.

ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.

I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.

Reported-by: Gert Doering <gert@space.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:53:18 -08:00
Yang Yingliang
a6e2fe17eb sch_netem: replace magic numbers with enumerate
Replace some magic numbers which describe states of 4-state model
loss generator with enumerate.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:17:34 -08:00
Florent Fourcot
6444f72b4b ipv6: add flowlabel_consistency sysctl
With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
flow label unicity. This patch introduces a new sysctl to protect the old
behaviour, enable by default.

Changelog of V3:
 * rename ip6_flowlabel_consistency to flowlabel_consistency
 * use net_info_ratelimited()
 * checkpatch cleanups

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
Florent Fourcot
46e5f40176 ipv6: add a flag to get the flow label used remotly
This information is already available via IPV6_FLOWINFO
of IPV6_2292PKTOPTIONS, and them a filtering to get the flow label
information. But it is probably logical and easier for users to add this
here, and to control both sent/received flow label values with the
IPV6_FLOWLABEL_MGR option.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
Florent Fourcot
df3687ffc6 ipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GET
With this option, the socket will reply with the flow label value read
on received packets.

The goal is to have a connection with the same flow label in both
direction of the communication.

Changelog of V4:
 * Do not erase the flow label on the listening socket. Use pktopts to
 store the received value

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
Eric Dumazet
3acfa1e73c ipv4: be friend with drop monitor
Replace some dev_kfree_skb() with kfree_skb() calls when
we drop one skb, this might help bug tracking.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 23:08:02 -08:00
Steffen Hurrle
342dfc306f net: add build-time checks for msg->msg_name size
This is a follow-up patch to f3d3342602 ("net: rework recvmsg
handler msg_name and msg_namelen logic").

DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.

Signed-off-by: Steffen Hurrle <steffen@hurrle.net>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 23:04:16 -08:00
Michal Sekletar
ea02f9411d net: introduce SO_BPF_EXTENSIONS
For user space packet capturing libraries such as libpcap, there's
currently only one way to check which BPF extensions are supported
by the kernel, that is, commit aa1113d9f8 ("net: filter: return
-EINVAL if BPF_S_ANC* operation is not supported"). For querying all
extensions at once this might be rather inconvenient.

Therefore, this patch introduces a new option which can be used as
an argument for getsockopt(), and allows one to obtain information
about which BPF extensions are supported by the current kernel.

As David Miller suggests, we do not need to define any bits right
now and status quo can just return 0 in order to state that this
versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
additions to BPF extensions need to add their bits to the
bpf_tell_extensions() function, as documented in the comment.

Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Cc: David Miller <davem@davemloft.net>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 19:08:58 -08:00
David S. Miller
4180442058 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
	net/ipv4/tcp_metrics.c

Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.

Minor overlapping changes in bnx2x driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 00:55:41 -08:00
Stephen Warren
c91514972b Bluetooth: remove direct compilation of 6lowpan_iphc.c
It's now built as a separate utility module, and enabling BT selects
that module in Kconfig. This fixes:

net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): multiple definition of `__ksymtab_lowpan_process_data'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): first defined here
net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): multiple definition of `__ksymtab_lowpan_header_compress'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): first defined here
net/ieee802154/built-in.o: In function `lowpan_header_compress':
net/ieee802154/6lowpan_iphc.c:606: multiple definition of `lowpan_header_compress'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:606: first defined here
net/ieee802154/built-in.o: In function `lowpan_process_data':
net/ieee802154/6lowpan_iphc.c:344: multiple definition of `lowpan_process_data'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:344: first defined here
make[1]: *** [net/built-in.o] Error 1

(this change probably simply wasn't "git add"d to a53d34c346)

Fixes: a53d34c346 ("net: move 6lowpan compression code to separate module")
Fixes: 18722c2470 ("Bluetooth: Enable 6LoWPAN support for BT LE devices")
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 19:13:49 -08:00
sfeldma@cumulusnetworks.com
1d3ee88ae0 bonding: add netlink attributes to slave link dev
If link is IFF_SLAVE, extend link dev netlink attributes to include
slave attributes with new IFLA_SLAVE nest.  Add netlink notification
(RTM_NEWLINK) when slave status changes from backup to active, or
visa-versa.

Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE
attributes.  Currently only used by bonding driver, but could be
used by other aggregating devices with slaves.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:51:58 -08:00
Eric Dumazet
6c7e7610ff ipv4: fix a dst leak in tunnels
This patch :

1) Remove a dst leak if DST_NOCACHE was set on dst
   Fix this by holding a reference only if dst really cached.

2) Remove a lockdep warning in __tunnel_dst_set()
    This was reported by Cong Wang.

3) Remove usage of a spinlock where xchg() is enough

4) Remove some spurious inline keywords.
   Let compiler decide for us.

Fixes: 7d442fab0a ("ipv4: Cache dst in tunnels")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:36:39 -08:00
Flavio Leitner
6a7cc41872 ipv6: send Change Status Report after DAD is completed
The RFC 3810 defines two type of messages for multicast
listeners. The "Current State Report" message, as the name
implies, refreshes the *current* state to the querier.
Since the querier sends Query messages periodically, there
is no need to retransmit the report.

On the other hand, any change should be reported immediately
using "State Change Report" messages. Since it's an event
triggered by a change and that it can be affected by packet
loss, the rfc states it should be retransmitted [RobVar] times
to make sure routers will receive timely.

Currently, we are sending "Current State Reports" after
DAD is completed.  Before that, we send messages using
unspecified address (::) which should be silently discarded
by routers.

This patch changes to send "State Change Report" messages
after DAD is completed fixing the behavior to be RFC compliant
and also to pass TAHI IPv6 testsuite.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:12:29 -08:00
Hannes Frederic Sowa
11ffff752c ipv6: simplify detection of first operational link-local address on interface
In commit 1ec047eb47 ("ipv6: introduce per-interface counter for
dad-completed ipv6 addresses") I build the detection of the first
operational link-local address much to complex. Additionally this code
now has a race condition.

Replace it with a much simpler variant, which just scans the address
list when duplicate address detection completes, to check if this is
the first valid link local address and send RS and MLD reports then.

Fixes: 1ec047eb47 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses")
Reported-by: Jiri Pirko <jiri@resnulli.us>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:10:01 -08:00
Christoph Paasch
77f99ad16a tcp: metrics: Avoid duplicate entries with the same destination-IP
Because the tcp-metrics is an RCU-list, it may be that two
soft-interrupts are inside __tcp_get_metrics() for the same
destination-IP at the same time. If this destination-IP is not yet part of
the tcp-metrics, both soft-interrupts will end up in tcpm_new and create
a new entry for this IP.
So, we will have two tcp-metrics with the same destination-IP in the list.

This patch checks twice __tcp_get_metrics(). First without holding the
lock, then while holding the lock. The second one is there to confirm
that the entry has not been added by another soft-irq while waiting for
the spin-lock.

Fixes: 51c5d0c4b1 (tcp: Maintain dynamic metrics in local cache.)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:05:34 -08:00
Florent Fourcot
1d13a96c74 ipv6: tcp: fix flowlabel value in ACK messages send from TIME_WAIT
This patch is following the commit b903d324be (ipv6: tcp: fix TCLASS
value in ACK messages sent from TIME_WAIT).

For the same reason than tclass, we have to store the flow label in the
inet_timewait_sock to provide consistency of flow label on the last ACK.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 17:56:33 -08:00
Gerald Schaefer
c196403b79 net: rds: fix per-cpu helper usage
commit ae4b46e9d "net: rds: use this_cpu_* per-cpu helper" broke per-cpu
handling for rds. chpfirst is the result of __this_cpu_read(), so it is
an absolute pointer and not __percpu. Therefore, __this_cpu_write()
should not operate on chpfirst, but rather on cache->percpu->first, just
like __this_cpu_read() did before.

Cc: <stable@vger.kernel.org> # 3.8+
Signed-off-byd Gerald Schaefer <gerald.schaefer@de.ibm.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 17:52:22 -08:00
John W. Linville
7916a07557 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-01-17 14:43:17 -05:00
Michael Dalton
a953be53ce net-sysfs: add support for device-specific rx queue sysfs attributes
Extend existing support for netdevice receive queue sysfs attributes to
permit a device-specific attribute group. Initial use case for this
support will be to allow the virtio-net device to export per-receive
queue mergeable receive buffer size.

Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 23:46:06 -08:00
Michael Dalton
097b4f19e5 net: allow > 0 order atomic page alloc in skb_page_frag_refill
skb_page_frag_refill currently permits only order-0 page allocs
unless GFP_WAIT is used. Change skb_page_frag_refill to attempt
higher-order page allocations whether or not GFP_WAIT is used. If
memory cannot be allocated, the allocator will fall back to
successively smaller page allocs (down to order-0 page allocs).

This change brings skb_page_frag_refill in line with the existing
page allocation strategy employed by netdev_alloc_frag, which attempts
higher-order page allocations whether or not GFP_WAIT is set, falling
back to successively lower-order page allocations on failure. Part
of migration of virtio-net to per-receive queue page frag allocators.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 23:46:06 -08:00
Wei Yongjun
722e47d792 net_sched: fix error return code in fw_change_attrs()
The error code was not set if change indev fail, so the error
condition wasn't reflected in the return value. Fix to return a
negative error code from this error handling case instead of 0.

Fixes: 2519a602c2 ('net_sched: optimize tcf_match_indev()')
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 19:12:03 -08:00
Ying Xue
9bbb4ecc68 tipc: standardize recvmsg routine
Standardize the behaviour of waiting for events in TIPC recvmsg()
so that all variables of socket or port structures are protected
within socket lock, allowing the process of calling recvmsg() to
be woken up at appropriate time.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 19:10:34 -08:00
Ying Xue
391a6dd1da tipc: standardize sendmsg routine of connected socket
Standardize the behaviour of waiting for events in TIPC send_packet()
so that all variables of socket or port structures are protected within
socket lock, allowing the process of calling sendmsg() to be woken up
at appropriate time.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 19:10:34 -08:00
Ying Xue
3f40504f7e tipc: standardize sendmsg routine of connectionless socket
Comparing the behaviour of how to wait for events in TIPC sendmsg()
with other stacks, the TIPC implementation might be perceived as
different, and sometimes even incorrect. For instance, sk_sleep()
and tport->congested variables associated with socket are exposed
without socket lock protection while wait_event_interruptible_timeout()
accesses them. So standardizing it with similar implementation
in other stacks can help us correct these errors which the process
of calling sendmsg() cannot be woken up event if an expected event
arrive at socket or improperly woken up although the wake condition
doesn't match.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 19:10:34 -08:00
Ying Xue
6398e23cdb tipc: standardize accept routine
Comparing the behaviour of how to wait for events in TIPC accept()
with other stacks, the TIPC implementation might be perceived as
different, and sometimes even incorrect. As sk_sleep() and
sk->sk_receive_queue variables associated with socket are not
protected by socket lock, the process of calling accept() may be
woken up improperly or sometimes cannot be woken up at all. After
standardizing it with inet_csk_wait_for_connect routine, we can
get benefits including: avoiding 'thundering herd' phenomenon,
adding a timeout mechanism for accept(), coping with a pending
signal, and having sk_sleep() and sk->sk_receive_queue being
always protected within socket lock scope and so on.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 19:10:34 -08:00
Ying Xue
78eb3a5379 tipc: standardize connect routine
Comparing the behaviour of how to wait for events in TIPC connect()
with other stacks, the TIPC implementation might be perceived as
different, and sometimes even incorrect. For instance, as both
sock->state and sk_sleep() are directly fed to
wait_event_interruptible_timeout() as its arguments, and socket lock
has to be released before we call wait_event_interruptible_timeout(),
the two variables associated with socket are exposed out of socket
lock protection, thereby probably getting stale values so that the
process of calling connect() cannot be woken up exactly even if
correct event arrives or it is woken up improperly even if the wake
condition is not satisfied in practice. Therefore, standardizing its
behaviour with sk_stream_wait_connect routine can avoid these risks.

Additionally the implementation of connect routine is simplified as a
whole, allowing it to return correct values in all different cases.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 19:10:34 -08:00
wangweidong
abfce3ef58 sctp: remove the unnecessary assignment
When go the right path, the status is 0, no need to assign it again.
So just remove the assignment.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:31:49 -08:00
WANG Cong
6c80563c2f net_sched: act: pick a different type for act_xt
In tcf_register_action() we check either ->type or ->kind to see if
there is an existing action registered, but ipt action registers two
actions with same type but different kinds. They should have different
types too.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:24:11 -08:00
David S. Miller
7dff08bbda Included change:
- properly format already existing kerneldoc
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJS1xdYAAoJEEKTMo6mOh1VNv0P/R73udErvBaHTUIAVSJ3pH8/
 rQ673H+/ZDi6zJQNlNe5UHLu5/IrK/ARCz5Ldl71lNpNHjRNGloSmjIUsdOZY01q
 D8ZR9YdWxpVcTCNSKxGnDrhVyShM3bN70h2KuYWv01/IsTGJKvkkhnG0SXqJqKYm
 XBRrEyVnyn52mQM7BhWSkCrX+wSb8OyEtie4Gdn8LjKJ0i2LYeQZ7vnXSR4aORXt
 pqYe7O+cwWZoj7Vpqea8Ye7srt6BEATEGwuezcQYH+FvaIEepORHWsqqZm6z/o91
 wyjjoPcFVyKoEUASdrkC0Hq9eqoaRBVo3yZ/AWlFJ06O/eVhTNKIBN6HykmPJ+vW
 9QESaHjZ8Gackz6ehU86h7mu6Mtrbfk73BZfG23SVL6cZltXazE539v0Pv8Ge+P/
 2NKdDjRp1vCHViRLNCwX20VmZFg7bHVDHgYV0gBH/VgZAihhRhSQk9LEEhk1fbij
 UQjBMtRnWOoxNHi311Khsxqm+0X82/mBN2mw5VdV5S9ZnBXbbCNRAuxKlZWiZNGV
 MJ2oPUrzs4RIlhKuCc1Ckjr2n7xtbU/j05ASZ4iJgbDkulkGEniDBL+K1bOFIPJl
 Kb0T4F/0ucz2IcbeDijG8Xe+5XZmaFNbDQ0ooBiHrOAEVM9vqsiQ76P5uetWPnWr
 Gop6D4HE4fCGjWcONp1u
 =1KKs
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included change:
- properly format already existing kerneldoc

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:22:58 -08:00
WANG Cong
fb1d598d48 net_sched: act: use tcf_hash_release() in net/sched/act_police.c
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:21:55 -08:00
David S. Miller
8c12ec7411 Included change:
- properly compute the batman-adv header overhead. Such
   result is later used to initialize the hard_header_len
   member of the soft-interface netdev object
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJS1xUQAAoJEEKTMo6mOh1VStwP/RsjepUZEgjIXs/SrKC0Pkdk
 UCnznIyn8b7sLIQokimNESDxADbTphyIBjMZC2XI87udVIRHD3v1NlHWRUzWYvqq
 KvYqxuQTfbRnHPwzWiuOzQb9tGJxF+cG4CX8E4ydKqm8LmrNgMX+hL3gq50t5FHU
 jtFOiTcqjrZTPcXsdtOyoH9eYDv+9TTqgeetCF4iOZ7G+W/3d4h/EGAaWiCmSjvP
 4vdxRFokCdr0OkU/a790SbtHNMWcjCxFCtRSoK9cZFrRjCE+ersOg1g0vWA100Vi
 O0yRkZWqODnTu12skhcA2aojFENiT9CFFvZENMpfUxJKnSortFebpKXo+vyW19gC
 H6uYR1/KLk00pFOzuKu27hQBvARiKwwkuubeSNthas9jadDSO5ZgZGrDeE4U3GNZ
 KIkHunf2HVS2zpylRdnb9A9B/mkO+iW5y6rVtDdXTkrzyhmqTpJPeJqi8j1S10Zl
 oyoJIpS+wN7Bslf3fo0f8Pag7cXT28GD4/iPmCkJDpc1GGz7A/HJF/Vpx06L8BWB
 pl9TFkxZutBOpCjsFCJ+DvRnmgjolGVqyyyDhizeBVrKMvDW/oJ1PHR6Tx5S3S35
 QG3ZNmWIaiZ73bJxzJAY5KxNoehGbPzITEBdTlpo2AMuP1+XFadDF+lquqmeAJSQ
 TAI0Paz0VP/RRXxhksKZ
 =opzN
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included change:
- properly compute the batman-adv header overhead. Such
  result is later used to initialize the hard_header_len
  member of the soft-interface netdev object

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:16:43 -08:00
Veaceslav Falico
1d486bfb66 net: add NETDEV_PRECHANGEMTU to notify before mtu change happens
Currently, if a device changes its mtu, first the change happens (invloving
all the side effects), and after that the NETDEV_CHANGEMTU is sent so that
other devices can catch up with the new mtu. However, if they return
NOTIFY_BAD, then the change is reverted and error returned.

This is a really long and costy operation (sometimes). To fix this, add
NETDEV_PRECHANGEMTU notification which is called prior to any change
actually happening, and if any callee returns NOTIFY_BAD - the change is
aborted. This way we're skipping all the playing with apply/revert the mtu.

CC: "David S. Miller" <davem@davemloft.net>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Eric Dumazet <edumazet@google.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:15:41 -08:00
Tom Herbert
0b4cec8c2e net: Check skb->rxhash in gro_receive
When initializing a gro_list for a packet, first check the rxhash of
the incoming skb against that of the skb's in the list. This should be
a very strong inidicator of whether the flow is going to be matched,
and potentially allows a lot of other checks to be short circuited.
Use skb_hash_raw so that we don't force the hash to be calculated.

Tested by running netperf 200 TCP_STREAMs between two machines with
GRO, HW rxhash, and 1G. Saw no performance degration, slight reduction
of time in dev_gro_receive.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 16:22:54 -08:00
Daniel Borkmann
b013840810 packet: use percpu mmap tx frame pending refcount
In PF_PACKET's packet mmap(), we can avoid using one atomic_inc()
and one atomic_dec() call in skb destructor and use a percpu
reference count instead in order to determine if packets are
still pending to be sent out. Micro-benchmark with [1] that has
been slightly modified (that is, protcol = 0 in socket(2) and
bind(2)), example on a rather crappy testing machine; I expect
it to scale and have even better results on bigger machines:

./packet_mm_tx -s7000 -m7200 -z700000 em1, avg over 2500 runs:

With patch:    4,022,015 cyc
Without patch: 4,812,994 cyc

time ./packet_mm_tx -s64 -c10000000 em1 > /dev/null, stable:

With patch:
  real         1m32.241s
  user         0m0.287s
  sys          1m29.316s

Without patch:
  real         1m38.386s
  user         0m0.265s
  sys          1m35.572s

In function tpacket_snd(), it is okay to use packet_read_pending()
since in fast-path we short-circuit the condition already with
ph != NULL, since we have next frames to process. In case we have
MSG_DONTWAIT, we also do not execute this path as need_wait is
false here anyway, and in case of _no_ MSG_DONTWAIT flag, it is
okay to call a packet_read_pending(), because when we ever reach
that path, we're done processing outgoing frames anyway and only
look if there are skbs still outstanding to be orphaned. We can
stay lockless in this percpu counter since it's acceptable when we
reach this path for the sum to be imprecise first, but we'll level
out at 0 after all pending frames have reached the skb destructor
eventually through tx reclaim. When people pin a tx process to
particular CPUs, we expect overflows to happen in the reference
counter as on one CPU we expect heavy increase; and distributed
through ksoftirqd on all CPUs a decrease, for example. As
David Laight points out, since the C language doesn't define the
result of signed int overflow (i.e. rather than wrap, it is
allowed to saturate as a possible outcome), we have to use
unsigned int as reference count. The sum over all CPUs when tx
is complete will result in 0 again.

The BUG_ON() in tpacket_destruct_skb() we can remove as well. It
can _only_ be set from inside tpacket_snd() path and we made sure
to increase tx_ring.pending in any case before we called po->xmit(skb).
So testing for tx_ring.pending == 0 is not too useful. Instead, it
would rather have been useful to test if lower layers didn't orphan
the skb so that we're missing ring slots being put back to
TP_STATUS_AVAILABLE. But such a bug will be caught in user space
already as we end up realizing that we do not have any
TP_STATUS_AVAILABLE slots left anymore. Therefore, we're all set.

Btw, in case of RX_RING path, we do not make use of the pending
member, therefore we also don't need to use up any percpu memory
here. Also note that __alloc_percpu() already returns a zero-filled
percpu area, so initialization is done already.

  [1] http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 16:17:12 -08:00
Daniel Borkmann
87a2fd286a packet: don't unconditionally schedule() in case of MSG_DONTWAIT
In tpacket_snd(), when we've discovered a first frame that is
not in status TP_STATUS_SEND_REQUEST, and return a NULL buffer,
we exit the send routine in case of MSG_DONTWAIT, since we've
finished traversing the mmaped send ring buffer and don't care
about pending frames.

While doing so, we still unconditionally call an expensive
schedule() in the packet_current_frame() "error" path, which
is unnecessary in this case since it's enough to just quit
the function.

Also, in case MSG_DONTWAIT is not set, we should rather test
for need_resched() first and do schedule() only if necessary
since meanwhile pending frames could already have finished
processing and called skb destructor.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 16:17:11 -08:00
Daniel Borkmann
902fefb82e packet: improve socket create/bind latency in some cases
Most people acquire PF_PACKET sockets with a protocol argument in
the socket call, e.g. libpcap does so with htons(ETH_P_ALL) for
all its sockets. Most likely, at some point in time a subsequent
bind() call will follow, e.g. in libpcap with ...

  memset(&sll, 0, sizeof(sll));
  sll.sll_family          = AF_PACKET;
  sll.sll_ifindex         = ifindex;
  sll.sll_protocol        = htons(ETH_P_ALL);

... as arguments. What happens in the kernel is that already
in socket() syscall, we install a proto hook via register_prot_hook()
if our protocol argument is != 0. Yet, in bind() we're almost
doing the same work by doing a unregister_prot_hook() with an
expensive synchronize_net() call in case during socket() the proto
was != 0, plus follow-up register_prot_hook() with a bound device
to it this time, in order to limit traffic we get.

In the case when the protocol and user supplied device index (== 0)
does not change from socket() to bind(), we can spare us doing
the same work twice. Similarly for re-binding to the same device
and protocol. For these scenarios, we can decrease create/bind
latency from ~7447us (sock-bind-2 case) to ~89us (sock-bind-1 case)
with this patch.

Alternatively, for the first case, if people care, they should
simply create their sockets with proto == 0 argument and define
the protocol during bind() as this saves a call to synchronize_net()
as well (sock-bind-3 case).

In all other cases, we're tied to user space behaviour we must not
change, also since a bind() is not strictly required. Thus, we need
the synchronize_net() to make sure no asynchronous packet processing
paths still refer to the previous elements of po->prot_hook.

In case of mmap()ed sockets, the workflow that includes bind() is
socket() -> setsockopt(<ring>) -> bind(). In that case, a pair of
{__unregister, register}_prot_hook is being called from setsockopt()
in order to install the new protocol receive handler. Thus, when
we call bind and can skip a re-hook, we have already previously
installed the new handler. For fanout, this is handled different
entirely, so we should be good.

Timings on an i7-3520M machine:

  * sock-bind-1:   89 us
  * sock-bind-2: 7447 us
  * sock-bind-3:   75 us

sock-bind-1:
  socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP)) = 3
  bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=all(0),
           pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0

sock-bind-2:
  socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP)) = 3
  bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=lo(1),
           pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0

sock-bind-3:
  socket(PF_PACKET, SOCK_RAW, 0) = 3
  bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=lo(1),
           pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 16:17:11 -08:00
Paul Gortmaker
cf17228372 net/ipv4: don't use module_init in non-modular gre_offload
Recent commit 438e38fadc
("gre_offload: statically build GRE offloading support") added
new module_init/module_exit calls to the gre_offload.c file.

The file is obj-y and can't be anything other than built-in.
Currently it can never be built modular, so using module_init
as an alias for __initcall can be somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.  We also make the inclusion explicit.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of device_initcall
directly in this change means that the runtime impact is
zero -- it will remain at level 6 in initcall ordering.

As for the module_exit, rather than replace it with __exitcall,
we simply remove it, since it appears only UML does anything
with those, and even for UML, there is no relevant cleanup
to be done here.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 16:08:27 -08:00
Eric Dumazet
0864c15883 net: eth_type_trans() should use skb_header_pointer()
eth_type_trans() can read uninitialized memory as drivers
do not necessarily pull more than 14 bytes in skb->head before
calling it.

As David suggested, we can use skb_header_pointer() to
fix this without breaking some drivers that might not expect
eth_type_trans() pulling 2 additional bytes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 15:30:31 -08:00
David S. Miller
5ff1dd2416 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says:

====================
This small batch contains several Netfilter fixes for your net-next
tree, more specifically:

* Fix compilation warning in nft_ct in NF_CONNTRACK_MARK is not set,
  from Kristian Evensen.

* Add dependency to IPV6 for NF_TABLES_INET. This one has been reported
  by the several robots that are testing .config combinations, from Paul
  Gortmaker.

* Fix default base chain policy setting in nf_tables, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 11:44:54 -08:00
Jiri Pirko
89740ca74f neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions.
When ndo_neigh_setup is called, the bitfield used by NEIGH_VAR_SET is
not initialized yet. This might cause confusion for the people who use
NEIGH_VAR_SET in ndo_neigh_setup. So rather introduce NEIGH_VAR_INIT for
usage in ndo_neigh_setup.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 11:31:58 -08:00
Eric Dumazet
aee636c480 bpf: do not use reciprocal divide
At first Jakub Zawadzki noticed that some divisions by reciprocal_divide
were not correct. (off by one in some cases)
http://www.wireshark.org/~darkjames/reciprocal-buggy.c

He could also show this with BPF:
http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c

The reciprocal divide in linux kernel is not generic enough,
lets remove its use in BPF, as it is not worth the pain with
current cpus.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Daniel Borkmann <dxchgb@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 17:02:08 -08:00
Thomas Haller
5b84efecb7 ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE
Refactor the deletion/update of prefix routes when removing an
address. Now also consider IFA_F_NOPREFIXROUTE and if there is an address
present with this flag, to not cleanup the route. Instead, assume
that userspace is taking care of this route.

Also perform the same cleanup, when userspace changes an existing address
to add NOPREFIXROUTE (to an address that didn't have this flag). This is
done because when the address was added, a prefix route was created for it.
Since the user now wants to handle this route by himself, we cleanup this
route.

This cleanup of the route is not totally robust. There is no guarantee,
that the route we are about to delete was really the one added by the
kernel. This behavior does not change by the patch, and in practice it
should work just fine.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 17:00:40 -08:00
Thomas Haller
761aac737e ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes
When adding/modifying an IPv6 address, the userspace application needs
a way to suppress adding a prefix route. This is for example relevant
together with IFA_F_MANAGERTEMPADDR, where userspace creates autoconf
generated addresses, but depending on on-link, no route for the
prefix should be added.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 17:00:40 -08:00
David S. Miller
6631c5cea8 Revert "batman-adv: drop dependency against CRC16"
This reverts commit 12afc36e38.

The dependency is actually still necessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 16:54:14 -08:00
wangweidong
0ea5e4df7b sctp: create helper function to enable|disable sackdelay
add sctp_spp_sackdelay_{enable|disable} helper function for
avoiding code duplication.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 16:47:48 -08:00
Li RongQing
d76ed22b22 ipv6: move IPV6_TCLASS_SHIFT into ipv6.h and define a helper
Two places defined IPV6_TCLASS_SHIFT, so we should move it into ipv6.h,
and use this macro as possible. And define ip6_tclass helper to return
tclass

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 15:53:18 -08:00
Dmitry Eremin-Solenikov
a53d34c346 net: move 6lowpan compression code to separate module
IEEE 802.15.4 and Bluetooth networking stacks share 6lowpan compression
code. Instead of introducing Makefile/Kconfig hacks, build this code as
a separate module referenced from both ieee802154 and bluetooth modules.

This fixes the following build error observed in some kernel
configurations:

net/built-in.o: In function `header_create': 6lowpan.c:(.text+0x166149): undefined reference to `lowpan_header_compress'
net/built-in.o: In function `bt_6lowpan_recv': (.text+0x166b3c): undefined reference to `lowpan_process_data'

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Dmitry Eremin-Solenikov <dmitry_eremin@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 15:36:38 -08:00
Veaceslav Falico
5bb025fae5 net: rename sysfs symlinks on device name change
Currently, we don't rename the upper/lower_ifc symlinks in
/sys/class/net/*/ , which might result stale/duplicate links/names.

Fix this by adding netdev_adjacent_rename_links(dev, oldname) which renames
all the upper/lower interface's links to dev from the upper/lower_oldname
to the new name.

We don't need a rollback because only we control these symlinks and if we
fail to rename them - sysfs will anyway complain.

Reported-by: Ding Tianhong <dingtianhong@huawei.com>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 15:16:20 -08:00
Veaceslav Falico
3ee3270756 net: add sysfs helpers for netdev_adjacent logic
They clean up the code a bit and can be used further.

CC: Ding Tianhong <dingtianhong@huawei.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 15:16:19 -08:00
Simon Wunderlich
1b371d1307 batman-adv: use consistent kerneldoc style
Reported-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-16 00:16:00 +01:00
Marek Lindner
1df0cbd509 batman-adv: fix batman-adv header overhead calculation
Batman-adv prepends a full ethernet header in addition to its own
header. This has to be reflected in the MTU calculation, especially
since the value is used to set dev->hard_header_len.

Introduced by 411d6ed93a
("batman-adv: consider network coding overhead when calculating required mtu")

Reported-by: cmsv <cmsv@wirelesspt.net>
Reported-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-15 23:54:20 +01:00
Jiri Pirko
3977458c9c neigh: split lines for NEIGH_VAR_SET so they are not too long
introduced by:
commit 1f9248e560
"neigh: convert parms to an array"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 14:46:00 -08:00
Kristian Evensen
847c8e2959 netfilter: nft_ct: fix compilation warning if NF_CONNTRACK_MARK is not set
net/netfilter/nft_ct.c: In function 'nft_ct_set_eval':
net/netfilter/nft_ct.c:136:6: warning: unused variable 'value' [-Wunused-variable]

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-15 11:00:14 +01:00
Eric Dumazet
b53c733600 tcp: do not export tcp_gso_segment() and tcp_gro_receive()
tcp_gso_segment() and tcp_gro_receive() no longer need to be
exported. IPv4 and IPv6 offloads are statically linked.

Note that tcp_gro_complete() is still used by bnx2x, unfortunately.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:53:48 -08:00
Ying Xue
7f2b8562c2 net: nl80211: __dev_get_by_index instead of dev_get_by_index to find interface
As __cfg80211_rdev_from_attrs(), nl80211_dump_wiphy_parse() and
nl80211_set_wiphy() are all under rtnl_lock protection,
__dev_get_by_index() instead of dev_get_by_index() should be used
to find interface handler in them allowing us to avoid to change
interface reference counter.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:50:47 -08:00
Ying Xue
5af28de353 can: use __dev_get_by_index instead of dev_get_by_index to find interface
As cgw_create_job() is always under rtnl_lock protection,
__dev_get_by_index() instead of dev_get_by_index() should be used to
find interface handler in it having us avoid to change interface
reference counter.

Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:50:47 -08:00
Ying Xue
a74e942694 caif: __dev_get_by_index instead of dev_get_by_index to find interface
The following call chains indicate that chnl_net_open() is under
rtnl_lock protection as __dev_open() is protected by rtnl_lock.
So if __dev_get_by_index() instead of dev_get_by_index() is used
to find interface handler in it, this would help us avoid to change
interface reference counter.

__dev_open()
  chnl_net_open()

Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:50:47 -08:00
Ying Xue
16b77695ed batman-adv: use __dev_get_by_index instead of dev_get_by_index to find interface
The following call chains indicate that batadv_is_on_batman_iface()
is always under rtnl_lock protection as call_netdevice_notifier()
is protected by rtnl_lock. So if __dev_get_by_index() rather than
dev_get_by_index() is used to find interface handler in it, this
would help us avoid to change interface reference counter.

call_netdevice_notifier()
  batadv_hard_if_event()
    batadv_hardif_add_interface()
      batadv_is_valid_iface()
        batadv_is_on_batman_iface()

Cc: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:50:47 -08:00
Ying Xue
d4c5fba2f6 decnet: use __dev_get_by_index instead of dev_get_by_index to find interface
The following call chain we can identify that dn_cache_getroute() is
protected under rtnl_lock. So if we use __dev_get_by_index() instead
of dev_get_by_index() to find interface handlers in it, this would help
us avoid to change interface reference counter.

rtnetlink_rcv()
  rtnl_lock()
    netlink_rcv_skb()
      dn_cache_getroute()
  rtnl_unlock()

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:50:46 -08:00
Ying Xue
d9ac62be57 dcb: use __dev_get_by_name instead of dev_get_by_name to find interface
The following call chain indicates that dcb_doit() is protected
under rtnl_lock. So if we use __dev_get_by_name() instead of
dev_get_by_name() to find interface handlers in it, this would
help us avoid to change interface reference counter.

rtnetlink_rcv()
  rtnl_lock()
  netlink_rcv_skb()
    dcb_doit()
  rtnl_unlock()

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:50:46 -08:00
FX Le Bail
ec35b61ea5 IPv6: move the anycast_src_echo_reply sysctl to netns_sysctl_ipv6
This change move anycast_src_echo_reply sysctl with other ipv6 sysctls.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:18:22 -08:00
Dan Carpenter
0e864b21e5 sctp: remove a redundant NULL check
It confuses Smatch when we check "sinit" for NULL and then non-NULL and
that causes a false positive warning later.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:18:22 -08:00
stephen hemminger
963a185539 tipc: spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:18:22 -08:00
stephen hemminger
db9c7c3943 ipv6: addrconf spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:18:22 -08:00
Hannes Frederic Sowa
95f4a45de1 net: avoid reference counter overflows on fib_rules in multicast forwarding
Bob Falken reported that after 4G packets, multicast forwarding stopped
working. This was because of a rule reference counter overflow which
freed the rule as soon as the overflow happend.

This patch solves this by adding the FIB_LOOKUP_NOREF flag to
fib_rules_lookup calls. This is safe even from non-rcu locked sections
as in this case the flag only implies not taking a reference to the rule,
which we don't need at all.

Rules only hold references to the namespace, which are guaranteed to be
available during the call of the non-rcu protected function reg_vif_xmit
because of the interface reference which itself holds a reference to
the net namespace.

Fixes: f0ad0860d0 ("ipv4: ipmr: support multiple tables")
Fixes: d1db275dd3 ("ipv6: ip6mr: support multiple tables")
Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 17:37:25 -08:00
Geert Uytterhoeven
88bfe6ea00 net: Spelling s/transmition/transmission/
Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 17:11:26 -08:00
Christian Engelmayer
267d29a69c ieee802154: Fix memory leak in ieee802154_add_iface()
Fix a memory leak in the ieee802154_add_iface() error handling path.
Detected by Coverity: CID 710490.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:40:56 -08:00
Aruna-Hewapathirane
63862b5bef net: replace macros net_random and net_srandom with direct calls to prandom
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:15:25 -08:00
Hannes Frederic Sowa
825edac4e7 ipv6: copy traffic class from ping request to reply
Suggested-by: Simon Schneider <simon-schneider@gmx.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:08:52 -08:00
WANG Cong
72c1d3bdd5 ipv4: register igmp_notifier even when !CONFIG_PROC_FS
We still need this notifier even when we don't config
PROC_FS.

It should be rare to have a kernel without PROC_FS,
so just for completeness.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:03:33 -08:00
Ben Hutchings
ae78dbfa40 net: Add trace events for all receive entry points, exposing more skb fields
The existing net/netif_rx and net/netif_receive_skb trace events
provide little information about the skb, nor do they indicate how it
entered the stack.

Add trace events at entry of each of the exported functions, including
most fields that are likely to be interesting for debugging driver
datapath behaviour.  Split netif_rx() and netif_receive_skb() so that
internal calls are not traced.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 14:46:02 -08:00
Ben Hutchings
d87d04a785 net: Add net_dev_start_xmit trace event, exposing more skb fields
The existing net/net_dev_xmit trace event provides little information
about the skb that has been passed to the driver, and it is not
simple to add more since the skb may already have been freed at
the point the event is emitted.

Add a separate trace event before the skb is passed to the driver,
including most fields that are likely to be interesting for debugging
driver datapath behaviour.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 14:46:02 -08:00
Ben Hutchings
20567661a1 net: Fix indentation in dev_hard_start_xmit()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 14:45:42 -08:00
David S. Miller
0a379e21c5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-01-14 14:42:42 -08:00
Paul Durrant
ed1f50c3a7 net: add skb_checksum_setup
This patch adds a function to set up the partial checksum offset for IP
packets (and optionally re-calculate the pseudo-header checksum) into the
core network code.
The implementation was previously private and duplicated between xen-netback
and xen-netfront, however it is not xen-specific and is potentially useful
to any network driver.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 14:24:19 -08:00
Paul Gortmaker
419331d8ff netfilter: Add dependency on IPV6 for NF_TABLES_INET
Commit 1d49144c0a ("netfilter: nf_tables: add "inet" table for
IPv4/IPv6") allows creation of non-IPV6 enabled .config files that
will fail to configure/link as follows:

warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES)
warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES)
warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES)
net/built-in.o: In function `nft_reject_eval':
nft_reject.c:(.text+0x651e8): undefined reference to `nf_ip6_checksum'
nft_reject.c:(.text+0x65270): undefined reference to `ip6_route_output'
nft_reject.c:(.text+0x656c4): undefined reference to `ip6_dst_hoplimit'
make: *** [vmlinux] Error 1

Since the feature is to allow for a mixed IPV4 and IPV6 table, it
seems sensible to make it depend on IPV6.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-14 16:33:00 +01:00
WANG Cong
b86f81cca9 bridge: move br_net_exit() to br.c
And it can become static.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:42:39 -08:00
David S. Miller
aef2b45fe4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Conflicts:
	net/xfrm/xfrm_policy.c

Steffen Klassert says:

====================
This pull request has a merge conflict between commits be7928d20b
("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and
da7c224b1b ("net: xfrm: xfrm_policy: silence compiler warning") from
the net-next tree and commit 2f3ea9a95c ("xfrm: checkpatch erros with
inline keyword position") from the ipsec-next tree.

The version from net-next can be used, like it is done in linux-next.

1) Checkpatch cleanups, from Weilong Chen.

2) Fix lockdep complaints when pktgen is used with IPsec,
   from Fan Du.

3) Update pktgen to allow any combination of IPsec transport/tunnel mode
   and AH/ESP/IPcomp type, from Fan Du.

4) Make pktgen_dst_metrics static, Fengguang Wu.

5) Compile fix for pktgen when CONFIG_XFRM is not set,
   from Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:14:25 -08:00
Neal Cardwell
70315d22d3 inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets
Fix inet_diag_dump_icsk() to reflect the fact that both TCP_TIME_WAIT
and TCP_FIN_WAIT2 connections are represented by inet_timewait_sock
(not just TIME_WAIT), and for such sockets the tw_substate field holds
the real state, which can be either TCP_TIME_WAIT or TCP_FIN_WAIT2.

This brings the inet_diag state-matching code in line with the field
it uses to populate idiag_state. This is also analogous to the info
exported in /proc/net/tcp, where get_tcp4_sock() exports sk->sk_state
and get_timewait4_sock() exports tw->tw_substate.

Before fixing this, (a) neither "ss -nemoi" nor "ss -nemoi state
fin-wait-2" would return a socket in TCP_FIN_WAIT2; and (b) "ss -nemoi
state time-wait" would also return sockets in state TCP_FIN_WAIT2.

This is an old bug that predates 05dbc7b ("tcp/dccp: remove twchain").

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 22:35:46 -08:00
David S. Miller
853dc21bfe Included changes:
- drop dependency against CRC16
 - move to new release version
 - add size check at compile time for packet structs
 - update copyright years in every file
 - implement new bonding/interface alternation feature
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJS0p53AAoJEEKTMo6mOh1VZ7sP/RDgh5PMXC5l1LNTG9gtsV5Z
 zzqs+kEfeTCivcMtONXhBhuli7wlW3eZehscB/Hn6VOKf40Ktwb0tNjrJZ+OMEHC
 hwR7i56ucNOKA5L1cswWwprIY3tV3dK+tI/Y7Oc0d/HAkhU2j3wHWdzCdUMa2yBp
 PurZZrRFXrqcKtIKP+AK1DkkQ2TyUZVNB6dZDf1AifMxhcfFf6Vxg1JGj6AKgvKF
 zf9q8SC5x33qGfvx4VML1b0JNChAwt01PecY/Eo154W3eOWHNXh9UxQEQBRoChbJ
 A/hCoo/LWGFeyqZwEeWBIjR+nGi/5zbCg310FGTgjWZaJ+BBD/mjKAjDhtszX2sI
 Lf0TJ7ytfCn/qij0Xmqg3R5RnmstJpb+weZ7gMqk63o9I06RtvV6/x58tPB+7+Y1
 AJ5egjL0yUypn7LtFUL3z1S7Np6m+FC9KuH47Yc1FyXR8tSwDWYszdlAloqPeYVI
 CDMw73T+6vsWr2UPBnXqudy0BtG3XT8LFXAUC9GMkz0k8GY8RZK7MeUun9Sax+ra
 c7MC+yZYDhwKgNSiEw3J86n0ozsIGVCheAQGuenmdicQdirJa8KImj1CZCoGqWBk
 kWsyNhw68SVLZFSfCiKPjJjbWijpVfQKM/TSB30Cmj2L1hyWeEtBn0McHig6srdX
 hf1Xh4tFh1yGwMYdCcAq
 =BKos
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- drop dependency against CRC16
- move to new release version
- add size check at compile time for packet structs
- update copyright years in every file
- implement new bonding/interface alternation feature

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 21:50:27 -08:00
Eric Paris
4440e85481 audit: convert all sessionid declaration to unsigned int
Right now the sessionid value in the kernel is a combination of u32,
int, and unsigned int.  Just use unsigned int throughout.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13 22:31:46 -05:00
Veaceslav Falico
2315dc91a5 net: make dev_set_mtu() honor notification return code
Currently, after changing the MTU for a device, dev_set_mtu() calls
NETDEV_CHANGEMTU notification, however doesn't verify it's return code -
which can be NOTIFY_BAD - i.e. some of the net notifier blocks refused this
change, and continues nevertheless.

To fix this, verify the return code, and if it's an error - then revert the
MTU to the original one, notify again and pass the error code.

CC: Jiri Pirko <jiri@resnulli.us>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 15:19:26 -08:00
stephen hemminger
6daaf0de2f sctp: make sctp_addto_chunk_fixed local
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 14:42:30 -08:00