Commit Graph

12147 Commits

Author SHA1 Message Date
Eric Dumazet
24ea591d22 net: sched: extend percpu stats helpers
qdisc_bstats_update_cpu() and other helpers were added to support
percpu stats for qdisc.

We want to add percpu stats for tc action, so this patch add common
helpers.

qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update()
qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 13:50:41 -07:00
David Miller
1830fcea5b net: Kill sock->sk_protinfo
No more users, so it can now be removed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-28 16:55:44 -07:00
David Miller
3200392b88 ax25: Stop using sock->sk_protinfo.
Just make a ax25_sock structure that provides the ax25_cb pointer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-28 16:55:44 -07:00
Linus Torvalds
e0456717e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add TX fast path in mac80211, from Johannes Berg.

 2) Add TSO/GRO support to ibmveth, from Thomas Falcon

 3) Move away from cached routes in ipv6, just like ipv4, from Martin
    KaFai Lau.

 4) Lots of new rhashtable tests, from Thomas Graf.

 5) Run ingress qdisc lockless, from Alexei Starovoitov.

 6) Allow servers to fetch TCP packet headers for SYN packets of new
    connections, for fingerprinting.  From Eric Dumazet.

 7) Add mode parameter to pktgen, for testing receive.  From Alexei
    Starovoitov.

 8) Cache access optimizations via simplifications of build_skb(), from
    Alexander Duyck.

 9) Move page frag allocator under mm/, also from Alexander.

10) Add xmit_more support to hv_netvsc, from KY Srinivasan.

11) Add a counter guard in case we try to perform endless reclassify
    loops in the packet scheduler.

12) Extern flow dissector to be programmable and use it in new "Flower"
    classifier.  From Jiri Pirko.

13) AF_PACKET fanout rollover fixes, performance improvements, and new
    statistics.  From Willem de Bruijn.

14) Add netdev driver for GENEVE tunnels, from John W Linville.

15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.

16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.

17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
    Borkmann.

18) Add tail call support to BPF, from Alexei Starovoitov.

19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.

20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.

21) Favor even port numbers for allocation to connect() requests, and
    odd port numbers for bind(0), in an effort to help avoid
    ip_local_port_range exhaustion.  From Eric Dumazet.

22) Add Cavium ThunderX driver, from Sunil Goutham.

23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
    from Alexei Starovoitov.

24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.

25) Double TCP Small Queues default to 256K to accomodate situations
    like the XEN driver and wireless aggregation.  From Wei Liu.

26) Add more entropy inputs to flow dissector, from Tom Herbert.

27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
    Jonassen.

28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.

29) Track and act upon link status of ipv4 route nexthops, from Andy
    Gospodarek.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
  bridge: vlan: flush the dynamically learned entries on port vlan delete
  bridge: multicast: add a comment to br_port_state_selection about blocking state
  net: inet_diag: export IPV6_V6ONLY sockopt
  stmmac: troubleshoot unexpected bits in des0 & des1
  net: ipv4 sysctl option to ignore routes when nexthop link is down
  net: track link-status of ipv4 nexthops
  net: switchdev: ignore unsupported bridge flags
  net: Cavium: Fix MAC address setting in shutdown state
  drivers: net: xgene: fix for ACPI support without ACPI
  ip: report the original address of ICMP messages
  net/mlx5e: Prefetch skb data on RX
  net/mlx5e: Pop cq outside mlx5e_get_cqe
  net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
  net/mlx5e: Remove extra spaces
  net/mlx5e: Avoid TX CQE generation if more xmit packets expected
  net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
  net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
  net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
  net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
  net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
  ...
2015-06-24 16:49:49 -07:00
David S. Miller
3a07bd6fea Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx4/main.c
	net/packet/af_packet.c

Both conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 02:58:51 -07:00
Andy Gospodarek
0eeb075fad net: ipv4 sysctl option to ignore routes when nexthop link is down
This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.

net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, will report to userspace that a route is
dead and will no longer resolve to this nexthop when performing a fib
lookup.  This will signal to userspace that the route will not be
selected.  The signalling of a RTNH_F_DEAD is only passed to userspace
if the sysctl is enabled and link is down.  This was done as without it
the netlink listeners would have no idea whether or not a nexthop would
be selected.   The kernel only sets RTNH_F_DEAD internally if the
interface has IFF_UP cleared.

With the new sysctl set, the following behavior can be observed
(interface p8p1 is link-down):

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
    cache
local 80.0.0.1 dev lo  src 80.0.0.1
    cache <local>
80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
    cache

While the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down.  Now interface p8p1 is linked-up:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
    cache
local 80.0.0.1 dev lo  src 80.0.0.1
    cache <local>
80.0.0.2 dev p8p1  src 80.0.0.1
    cache

and the output changes to what one would expect.

If the sysctl is not set, the following output would be expected when
p8p1 is down:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2

Since the dead flag does not appear, there should be no expectation that
the kernel would skip using this route due to link being down.

v2: Split kernel changes into 2 patches, this actually makes a
behavioral change if the sysctl is set.  Also took suggestion from Alex
to simplify code by only checking sysctl during fib lookup and
suggestion from Scott to add a per-interface sysctl.

v3: Code clean-ups to make it more readable and efficient as well as a
reverse path check fix.

v4: Drop binary sysctl

v5: Whitespace fixups from Dave

v6: Style changes from Dave and checkpatch suggestions

v7: One more checkpatch fixup

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 02:15:54 -07:00
Andy Gospodarek
8a3d03166f net: track link-status of ipv4 nexthops
Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
reachable via an interface where carrier is off.  No action is taken,
but additional flags are passed to userspace to indicate carrier status.

This also includes a cleanup to fib_disable_ip to more clearly indicate
what event made the function call to replace the more cryptic force
option previously used.

v2: Split out kernel functionality into 2 patches, this patch simply
sets and clears new nexthop flag RTNH_F_LINKDOWN.

v3: Cleanups suggested by Alex as well as a bug noticed in
fib_sync_down_dev and fib_sync_up when multipath was not enabled.

v5: Whitespace and variable declaration fixups suggested by Dave.

v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them
all.

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 02:15:54 -07:00
Scott Feldman
3e3a78b495 switchdev: rename vlan vid_start to vid_begin
Use vid_begin/end to be consistent with BRIDGE_VLAN_INFO_RANGE_BEGIN/END.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-23 06:56:18 -07:00
Eric W. Biederman
8405a8fff3 netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook
Add code to nf_unregister_hook to flush the nf_queue when a hook is
unregistered.  This guarantees that the pointer that the nf_queue code
retains into the nf_hook list will remain valid while a packet is
queued.

I tested what would happen if we do not flush queued packets and was
trivially able to obtain the oops below.  All that was required was
to stop the nf_queue listening process, to delete all of the nf_tables,
and to awaken the nf_queue listening process.

> BUG: unable to handle kernel paging request at 0000000100000001
> IP: [<0000000100000001>] 0x100000001
> PGD b9c35067 PUD 0
> Oops: 0010 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 519 Comm: lt-nfqnl_test Not tainted
> task: ffff8800b9c8c050 ti: ffff8800ba9d8000 task.ti: ffff8800ba9d8000
> RIP: 0010:[<0000000100000001>]  [<0000000100000001>] 0x100000001
> RSP: 0018:ffff8800ba9dba40  EFLAGS: 00010a16
> RAX: ffff8800bab48a00 RBX: ffff8800ba9dba90 RCX: ffff8800ba9dba90
> RDX: ffff8800b9c10128 RSI: ffff8800ba940900 RDI: ffff8800bab48a00
> RBP: ffff8800b9c10128 R08: ffffffff82976660 R09: ffff8800ba9dbb28
> R10: dead000000100100 R11: dead000000200200 R12: ffff8800ba940900
> R13: ffffffff8313fd50 R14: ffff8800b9c95200 R15: 0000000000000000
> FS:  00007fb91fc34700(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000100000001 CR3: 00000000babfb000 CR4: 00000000000007f0
> Stack:
>  ffffffff8206ab0f ffffffff82982240 ffff8800bab48a00 ffff8800b9c100a8
>  ffff8800b9c10100 0000000000000001 ffff8800ba940900 ffff8800b9c10128
>  ffffffff8206bd65 ffff8800bfb0d5e0 ffff8800bab48a00 0000000000014dc0
> Call Trace:
>  [<ffffffff8206ab0f>] ? nf_iterate+0x4f/0xa0
>  [<ffffffff8206bd65>] ? nf_reinject+0x125/0x190
>  [<ffffffff8206dee5>] ? nfqnl_recv_verdict+0x255/0x360
>  [<ffffffff81386290>] ? nla_parse+0x80/0xf0
>  [<ffffffff8206c42c>] ? nfnetlink_rcv_msg+0x13c/0x240
>  [<ffffffff811b2fec>] ? __memcg_kmem_get_cache+0x4c/0x150
>  [<ffffffff8206c2f0>] ? nfnl_lock+0x20/0x20
>  [<ffffffff82068159>] ? netlink_rcv_skb+0xa9/0xc0
>  [<ffffffff820677bf>] ? netlink_unicast+0x12f/0x1c0
>  [<ffffffff82067ade>] ? netlink_sendmsg+0x28e/0x650
>  [<ffffffff81fdd814>] ? sock_sendmsg+0x44/0x50
>  [<ffffffff81fde07b>] ? ___sys_sendmsg+0x2ab/0x2c0
>  [<ffffffff810e8f73>] ? __wake_up+0x43/0x70
>  [<ffffffff8141a134>] ? tty_write+0x1c4/0x2a0
>  [<ffffffff81fde9f4>] ? __sys_sendmsg+0x44/0x80
>  [<ffffffff823ff8d7>] ? system_call_fastpath+0x12/0x6a
> Code:  Bad RIP value.
> RIP  [<0000000100000001>] 0x100000001
>  RSP <ffff8800ba9dba40>
> CR2: 0000000100000001
> ---[ end trace 08eb65d42362793f ]---

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-23 06:23:23 -07:00
David S. Miller
bfdc8dbdf8 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-06-18

Here's the final bluetooth-next pull request for 4.2.

 - Cleanups & fixes to 802.15.4 code and related drivers
 - Fix btusb driver memory leak
 - New USB IDs for Atheros controllers
 - Support for BCM4324B3 UART based Broadcom controller
 - Fix for Bluetooth encryption key size handling
 - Broadcom controller initialization fixes
 - Support for Intel controller DDC parameters
 - Support for multiple Bluetooth LE advertising instances
 - Fix for HCI user channel cleanup path

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-23 03:17:47 -07:00
Linus Torvalds
44d21c3f3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 4.2:

  API:

   - Convert RNG interface to new style.

   - New AEAD interface with one SG list for AD and plain/cipher text.
     All external AEAD users have been converted.

   - New asymmetric key interface (akcipher).

  Algorithms:

   - Chacha20, Poly1305 and RFC7539 support.

   - New RSA implementation.

   - Jitter RNG.

   - DRBG is now seeded with both /dev/random and Jitter RNG.  If kernel
     pool isn't ready then DRBG will be reseeded when it is.

   - DRBG is now the default crypto API RNG, replacing krng.

   - 842 compression (previously part of powerpc nx driver).

  Drivers:

   - Accelerated SHA-512 for arm64.

   - New Marvell CESA driver that supports DMA and more algorithms.

   - Updated powerpc nx 842 support.

   - Added support for SEC1 hardware to talitos"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (292 commits)
  crypto: marvell/cesa - remove COMPILE_TEST dependency
  crypto: algif_aead - Temporarily disable all AEAD algorithms
  crypto: af_alg - Forbid the use internal algorithms
  crypto: echainiv - Only hold RNG during initialisation
  crypto: seqiv - Add compatibility support without RNG
  crypto: eseqiv - Offer normal cipher functionality without RNG
  crypto: chainiv - Offer normal cipher functionality without RNG
  crypto: user - Add CRYPTO_MSG_DELRNG
  crypto: user - Move cryptouser.h to uapi
  crypto: rng - Do not free default RNG when it becomes unused
  crypto: skcipher - Allow givencrypt to be NULL
  crypto: sahara - propagate the error on clk_disable_unprepare() failure
  crypto: rsa - fix invalid select for AKCIPHER
  crypto: picoxcell - Update to the current clk API
  crypto: nx - Check for bogus firmware properties
  crypto: marvell/cesa - add DT bindings documentation
  crypto: marvell/cesa - add support for Kirkwood and Dove SoCs
  crypto: marvell/cesa - add support for Orion SoCs
  crypto: marvell/cesa - add allhwsupport module parameter
  crypto: marvell/cesa - add support for all armada SoCs
  ...
2015-06-22 21:04:48 -07:00
Zhaowei Yuan
638579f00a net: Update out-of-date comment
Struct inet_proto no longer exists, so update the
comment which is out of date.

Signed-off-by: Zhaowei Yuan <zhaowei.yuan@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-21 10:00:07 -07:00
Herbert Xu
c0b59fafe3 Merge branch 'mvebu/drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Merge the mvebu/drivers branch of the arm-soc tree which contains
just a single patch bfa1ce5f38 ("bus:
mvebu-mbus: add mv_mbus_dram_info_nooverlap()") that happens to be
a prerequisite of the new marvell/cesa crypto driver.
2015-06-19 22:07:07 +08:00
Pablo Neira Ayuso
a263653ed7 netfilter: don't pull include/linux/netfilter.h from netns headers
This pulls the full hook netfilter definitions from all those that include
net_namespace.h.

Instead let's just include the bare minimum required in the new
linux/netfilter_defs.h file, and use it from the netfilter netns header files.

I also needed to include in.h and in6.h from linux/netfilter.h otherwise we hit
this compilation error:

In file included from include/linux/netfilter_defs.h:4:0,
                 from include/net/netns/netfilter.h:4,
                 from include/net/net_namespace.h:22,
                 from include/linux/netdevice.h:43,
                 from net/netfilter/nfnetlink_queue_core.c:23:
include/uapi/linux/netfilter.h:76:17: error: field ‘in’ has incomplete type struct in_addr in;

And also explicit include linux/netfilter.h in several spots.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2015-06-18 21:14:31 +02:00
Pablo Neira Ayuso
10c04a8e71 netfilter: use forward declaration instead of including linux/proc_fs.h
We don't need to pull the full definitions in that file, a simple forward
declaration is enough.

Moreover, include linux/procfs.h from nf_synproxy_core, otherwise this hits a
compilation error due to missing declarations, ie.

net/netfilter/nf_synproxy_core.c: In function ‘synproxy_proc_init’:
net/netfilter/nf_synproxy_core.c:326:2: error: implicit declaration of function ‘proc_create’ [-Werror=implicit-function-declaration]
  if (!proc_create("synproxy", S_IRUGO, net->proc_net_stat,
  ^

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2015-06-18 21:14:30 +02:00
Pablo Neira Ayuso
04c52dec14 net: include missing headers in net/net_namespace.h
Include linux/idr.h and linux/skbuff.h since they are required by objects that
are declared in the net structure.

 struct net {
	...
	struct idr		netns_ids;
	...
	struct sk_buff_head	wext_nlevents;
	...

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2015-06-18 21:14:29 +02:00
Pablo Neira Ayuso
230ac490f7 netfilter: bridge: split ipv6 code into separated file
Resolve compilation breakage when CONFIG_IPV6 is not set by moving the IPv6
code into a separated br_netfilter_ipv6.c file.

Fixes: efb6de9b4b ("netfilter: bridge: forward IPv6 fragmented packets")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-18 21:14:21 +02:00
Florian Grandel
db25be6657 Bluetooth: hci_core: increase max adv inst
Now that all preconditions are present for actual multi-advertising, the
number of allowed advertising instances can be larger than one. This
patch increases the number of allowed advertising instances to 5.

Signed-off-by: Florian Grandel <fgrandel@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-18 18:11:53 +02:00
Florian Grandel
d4c5af8f71 Bluetooth: hci_core: remove obsolete adv_instance
Now that the obsolete adv_instance is no longer being referenced
anywhere in the code it can be removed without breaking the build.

Signed-off-by: Florian Grandel <fgrandel@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-18 18:11:53 +02:00
Florian Grandel
fffd38bca5 Bluetooth: mgmt/hci_core: multi-adv for add_advertising*()
The add_advertising() and add_advertising_complete() functions reference
the now obsolete hdev->adv_instance struct. Both methods are being
refactored to access the dynamic advertising instance list instead.

This patch also introduces all logic necessary to actually deal with
multiple instance advertising. Notably the mgmt_adv_inst_expired() and
schedule_adv_inst() method are being referenced to schedule instances in
a round robin fashion.

This patch also introduces a "pending" flag into the adv_info struct.
This is necessary to identify and remove recently added advertising
instances when the HCI commands return with an error status code.
Otherwise new advertising instances could be leaked without properly
informing userspace about their existence.

Signed-off-by: Florian Grandel <fgrandel@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-18 18:11:52 +02:00
Florian Grandel
5d900e4601 Bluetooth: hci_core/mgmt: move adv timeout to hdev
Currently the delayed work managing advertising duration and timeout is
part of the advertising instance structure. This is not correct as only
a single instance can be advertised at any given time. To implement
round robin advertising a single delayed work structure is needed.

To fix this the delayed work structure is being moved to the hci_dev
structure. The instance specific variable is renamed to "remaining_time"
to make it clear that this is the remaining lifetime of the instance and
not the current advertising timeout.

Signed-off-by: Florian Grandel <fgrandel@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-18 18:11:51 +02:00
Florian Grandel
d2609b345e Bluetooth: hci_core/mgmt: Introduce multi-adv list
The current hci dev structure only supports a single advertising
instance. To support multi-instance advertising it is necessary to
introduce a linked list of advertising instances so that multiple
advertising instances can be dynamically added and/or removed.

In a first step, the existing adv_instance member of the hci_dev
struct is supplemented by a linked list of advertising instances.
This patch introduces the list and supporting list management
infrastructure. The list is not being used yet.

Signed-off-by: Florian Grandel <fgrandel@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-18 18:11:51 +02:00
Craig Gallek
eb4cb00852 sock_diag: define destruction multicast groups
These groups will contain socket-destruction events for
AF_INET/AF_INET6, IPPROTO_TCP/IPPROTO_UDP.

Near the end of socket destruction, a check for listeners is
performed.  In the presence of a listener, rather than completely
cleanup the socket, a unit of work will be added to a private
work queue which will first broadcast information about the socket
and then finish the cleanup operation.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 19:49:22 -07:00
David S. Miller
023033b1ec NFC 4.2 pull request
This is the NFC pull request for 4.2.
 
 - NCI drivers can now define their own handlers for processing
   proprietary NCI responses and notifications.
 
 - NFC vendors can use a dedicated netlink API to send their own
   proprietary commands, like e.g. all commands needed to implement
   vendor specific manufacturing tools.
 
 - A new generic NCI over UART driver against which any NCI chipset
   running on top of a serial interface can register.
 
 - The st21nfcb driver is renamed to st-nci as it can and will support
   most of ST Microelectronics NCI chipsets.
 
 - The st21nfcb driver can put its CLF in hibernate mode and save
   significant amount of power.
 
 - A few st21nfcb minor fixes.
 
 - The NXP NCI driver now supports ACPI enumeration.
 
 - The Marvell NCI driver now supports both USB and serial
   physical interfaces.
 
 - The Marvell NCI drivers also supports NCI frames being muxed
   over HCI. This is a setting that can be defined by a DT property.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVe2k1AAoJEIqAPN1PVmxKt88P/11r4VVq57Kh4BHKTa6CAs5m
 FBMmQdGlpU+O8VUjQLl7Y+GWoVTmDOUm/uKd6xWIvgBl4/X+UKwhNVldpsWXvw1t
 cTnn0BykvvfA4FOQJBqgGTC38oC04REr4uTK3+NnjE6sBpQ86Ljfk7xarMKdDpKI
 TKchY4sIOHIHXHVPVjW4fyEVF6pUduTenH73zWPkyKZgOQaSbgR75j12WMEI20kw
 POykX9t6UcPDzcJ+doktUgfxhHB1YlML7Z5xNrIkbk4kyAj140Ds9mzEEBllyivc
 xX1Cy6h3S+vrnx/CnNa85nA/y7cH0oK8NVQwmYicqKT/W7wASF7JZYLT6A7E81c1
 zLGHWVEK0awS/KM+bLI3ixQVNWFDqYPM8R36Ag0NotUZJPKiHRQkyBU3VSS8XT2f
 HRBlYgNYSKfR1m9LCPtm+sq6psP7cnvRAfzDrZuWtyqctriZKqCuOXbVAHJtb/3s
 gHxMkfyTL1zRW7u/xGid15JiicNAJd2aQmkHzZJCnIgUyTrnvhSNX6sRjBZ6iQCf
 z3+aTIaNEApOZ2qtfOrnSTtW0wjOBdpUK3hlSX5ozQ+V6dspoN0ld1JWURQPqkzu
 jWwCONO29/9yuoc/qTbWPGL4avAqyCFEzhAk/Ux19jIqoXNzHVF9f4HICc1aRLTf
 TKpcrQPYB61fU07CV9uT
 =+IG2
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz says:

====================
NFC 4.2 pull request

This is the NFC pull request for 4.2.

- NCI drivers can now define their own handlers for processing
  proprietary NCI responses and notifications.

- NFC vendors can use a dedicated netlink API to send their own
  proprietary commands, like e.g. all commands needed to implement
  vendor specific manufacturing tools.

- A new generic NCI over UART driver against which any NCI chipset
  running on top of a serial interface can register.

- The st21nfcb driver is renamed to st-nci as it can and will support
  most of ST Microelectronics NCI chipsets.

- The st21nfcb driver can put its CLF in hibernate mode and save
  significant amount of power.

- A few st21nfcb minor fixes.

- The NXP NCI driver now supports ACPI enumeration.

- The Marvell NCI driver now supports both USB and serial
  physical interfaces.

- The Marvell NCI drivers also supports NCI frames being muxed
  over HCI. This is a setting that can be defined by a DT property.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:44:19 -07:00
Pablo Neira Ayuso
835b803377 netfilter: nf_tables_netdev: unregister hooks on net_device removal
In case the net_device is gone, we have to unregister the hooks and put back
the reference on the net_device object. Once it comes back, register them
again. This also covers the device rename case.

This patch also adds a new flag to indicate that the basechain is disabled, so
their hooks are not registered. This flag is used by the netdev family to
handle the case where the net_device object is gone. Currently this flag is not
exposed to userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-15 23:02:35 +02:00
Pablo Neira Ayuso
2cbce139fc netfilter: nf_tables: attach net_device to basechain
The device is part of the hook configuration, so instead of a global
configuration per table, set it to each of the basechain that we create.

This patch reworks ebddf1a8d7 ("netfilter: nf_tables: allow to bind table to
net_device").

Note that this adds a dev_name field in the nft_base_chain structure which is
required the netdev notification subscription that follows up in a patch to
handle gone net_devices.

Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-15 23:02:31 +02:00
Marcelo Ricardo Leitner
2d45a02d01 sctp: fix ASCONF list handling
->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.

Also, the call to inet_sk_copy_descendant() was backuping
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.

This commit thus fixes the list handling by using ->addr_wq_lock
spinlock to protect the list. A special handling is done upon socket
creation and destruction for that. Error handlig on sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
will be take on sctp_close_sock(), before locking the socket, so we
don't do it in inverse order compared to sctp_addr_wq_timeout_handler().

Instead of taking the lock on sctp_sock_migrate() for copying and
restoring the list values, it's preferred to avoid rewritting it by
implementing sctp_copy_descendant().

Issue was found with a test application that kept flipping sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.

Fixes: 9f7d653b67 ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen <jiji@redhat.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-14 12:55:49 -07:00
Alexander Aring
70f36507d0 mac802154: fix flags BIT definitions order
This patch fixes commits ("mac802154: cleanup ieee802154 hardware flags")
bcbfd2078d and ("mac802154: cleanup address
filtering flags") 6b70a43c7e by starting
the flags definitions at BIT(0) which is same like the previous behaviour.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-12 11:43:58 +02:00
Varka Bhadram
fe89e69050 mac802154: cleanup llsec param flags
This patch changes the setting of llsec param flag with the BIT macro.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-12 11:42:29 +02:00
Johan Hedberg
821f376668 Bluetooth: Read encryption key size for BR/EDR connections
Since Bluetooth 3.0 there's a HCI command available for reading the
encryption key size of an BR/EDR connection. This information is
essential e.g. for generating an LTK using SMP over BR/EDR, so store
it as part of struct hci_conn.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-12 11:38:45 +02:00
Eric Dumazet
f69ad292cf tcp: fill shinfo->gso_size at last moment
In commit cd7d8498c9 ("tcp: change tcp_skb_pcount() location") we stored
gso_segs in a temporary cache hot location.

This patch does the same for gso_size.

This allows to save 2 cache line misses in tcp xmit path for
the last packet that is considered but not sent because of
various conditions (cwnd, tso defer, receiver window, TSQ...)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 16:33:11 -07:00
Vincent Cuissard
9961127d4b NFC: nci: add generic uart support
Some NFC controller supports UART as host interface.
As with SPI, a lot of code can be shared between vendor
drivers. This patch add the generic support of UART and
provides some extension API for vendor specific needs.

This code is strongly inspired by the Bluetooth HCI ldisc
implementation. NCI UART vendor drivers will have to register
themselves to this layer via nci_uart_register.

Underlying tty will have to be configured from user land
thanks to an ioctl.

Signed-off-by: Vincent Cuissard <cuissard@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-11 23:37:37 +02:00
David S. Miller
1edaa7e8a7 For this round we mostly have fixes:
* mesh fixes from Alexis Green and Chun-Yeow Yeoh,
  * a documentation fix from Jakub Kicinski,
  * a missing channel release (from Michal Kazior),
  * a fix for a signal strength reporting bug (from Sara Sharon),
  * handle deauth while associating (myself),
  * don't report mangled TX SKB back to userspace for status (myself),
  * handle aggregation session timeouts properly in fast-xmit (myself)
 
 However, there are also a few cleanups and one big change that
 affects all drivers (and that required me to pull in your tree)
 to change the mac80211 HW flags to use an unsigned long bitmap
 so that we can extend them more easily - we're running out of
 flags even with a cleanup to remove the two unused ones.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJVeEQ1AAoJEDBSmw7B7bqrebcP/3v7I2ZXAeHag2W4hdD4YH6W
 tuKfs3JKW3GDh84l2AJs2JBpFxR6Tk0Z7zGKrPLzkBTkkJkSLgKuUKR0+YQU6PYH
 VfZ2NkdIHEqouLgMWxGGlp6suqp2yYD9tiIUroICXZ6aFm5trQuZgzv5ePI+lhmX
 cWUYCawE2tcpVdg0NJsFExeCJhw81e/Bet1LCGHo0asWNpIK7phMdltzD7e4tgQS
 4q475FCIkWxbxKgJRrRkz8J7grsjK1wf2W3acOxKMaoVBeqJVW5BWDrTgo0aDPts
 qQ8n8t1s9o/jKQIvaz3RyjkQgX8T4vCMqkouLF4jJOThKIsUSi3Fvm9oKcMg4YhA
 Ju5QWfbCBFhpLZeBzWzKyePTnDru1XDFFVdIATLONKTVg1modzFAs3j5gb4Z3Wtg
 VYLoLWWpRtHKd9pzfZMhyWq64Xb8C+qlyQHr4r4QRm9ADz0Jq+OCh0rTFt+/bncM
 CHxnf0VS9hEOFk0+TxFqi2yXOnv2uMgcN+jnGkEs4QuLfv9ML1Eb23ZjDoHxd1uq
 1Yd4R8IDEY/KU6UJMwksz+gV/ekoB32eAhw56pxehgAMuZL4OgNvmeAQHx7Jq9it
 0/OfAK2BSNH8odqYQbpg89C8keqSInMwUhFyRhyMJAWSKiPRHypsDBWxMKGJIssI
 3mB4d/go+RP1AvZnazeF
 =2wTw
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
For this round we mostly have fixes:
 * mesh fixes from Alexis Green and Chun-Yeow Yeoh,
 * a documentation fix from Jakub Kicinski,
 * a missing channel release (from Michal Kazior),
 * a fix for a signal strength reporting bug (from Sara Sharon),
 * handle deauth while associating (myself),
 * don't report mangled TX SKB back to userspace for status (myself),
 * handle aggregation session timeouts properly in fast-xmit (myself)

However, there are also a few cleanups and one big change that
affects all drivers (and that required me to pull in your tree)
to change the mac80211 HW flags to use an unsigned long bitmap
so that we can extend them more easily - we're running out of
flags even with a cleanup to remove the two unused ones.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 22:49:49 -07:00
Stephen Smalley
37a9a8df8c net/unix: support SCM_SECURITY for stream sockets
SCM_SECURITY was originally only implemented for datagram sockets,
not for stream sockets.  However, SCM_CREDENTIALS is supported on
Unix stream sockets.  For consistency, implement Unix stream support
for SCM_SECURITY as well.  Also clean up the existing code and get
rid of the superfluous UNIXSID macro.

Motivated by https://bugzilla.redhat.com/show_bug.cgi?id=1224211,
where systemd was using SCM_CREDENTIALS and assumed wrongly that
SCM_SECURITY was also supported on Unix stream sockets.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 22:49:20 -07:00
Johannes Berg
30686bf7f5 mac80211: convert HW flags to unsigned long bitmap
As we're running out of hardware capability flags pretty quickly,
convert them to use the regular test_bit() style unsigned long
bitmaps.

This introduces a number of helper functions/macros to set and to
test the bits, along with new debugfs code.

The occurrences of an explicit __clear_bit() are intentional, the
drivers were never supposed to change their supported bits on the
fly. We should investigate changing this to be a per-frame flag.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-10 16:05:36 +02:00
Johannes Berg
206c59d1d7 Merge remote-tracking branch 'net-next/master' into mac80211-next
Merge back net-next to get wireless driver changes (from Kalle)
to be able to create the API change across all trees properly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-10 12:45:09 +02:00
Christoffer Holmstedt
d446278c40 nl802154: fix misspelled enum
Signed-off-by: Christoffer Holmstedt <christoffer@christofferholmstedt.se>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-10 12:24:33 +02:00
Jakub Kicinski
c2d3955ba3 mac80211: remove obsolete sentence from documentation
FIF_PROMISC_IN_BSS was removed in commit df1404650c
("mac80211: remove support for IFF_PROMISC").

Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-09 21:48:20 +02:00
Arron Wang
ff50e8afc5 Bluetooth: Move SCO support under BT_BREDR config option
SCO/eSCO link is supported by BR/EDR controller, it is
suitable to move them under BT_BREDR config option

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-09 13:41:36 +02:00
Arron Wang
9b4c33364e Bluetooth: Make l2cap_recv_acldata() and sco_recv_scodata() return void
The return value of l2cap_recv_acldata() and sco_recv_scodata()
are not used, then change it to return void

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-09 13:41:36 +02:00
Johan Hedberg
8b76ce34c4 Bluetooth: Fix encryption key size handling for LTKs
The encryption key size for LTKs is supposed to be applied only at the
moment of encryption. When generating a Link Key (using LE SC) from
the LTK the full non-shortened value should be used. This patch
modifies the code to always keep the full value around and only apply
the key size when passing the value to HCI.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-09 09:09:06 +02:00
Samuel Ortiz
8115dd5905 NFC: Introduce vendor commands structures
Together with inline routines to associate a vendor commands
array with an NFC device.

Vendor commands allow vendors to implement their very specific
operations from driver code instead of adding new stack ops
for non NFC generic commands.
Vendors need to select their own unique IDs and use that as a
namespace for defining sub commands.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-09 01:20:19 +02:00
Christophe Ricard
759afb8d28 NFC: nci: Add nci_prop_cmd allowing to send proprietary nci cmd
Handle allowing to send proprietary nci commands anywhere in the nci
state machine.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-09 00:34:21 +02:00
Christophe Ricard
c39daeee50 NFC: nci: Add nci init ops for early device initialization
Some device may need to execute some proprietary commands
in order to "wake-up"; Before the nci state initialization.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-09 00:34:21 +02:00
Samuel Ortiz
b6355e972a NFC: nci: Handle proprietary response and notifications
Allow for drivers to explicitly define handlers for each
proprietary notifications and responses they expect to support.

Reviewed-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-09 00:34:20 +02:00
Eric Dumazet
b80c0e7858 tcp: get_cookie_sock() consolidation
IPv4 and IPv6 share same implementation of get_cookie_sock(),
and there is no point inlining it.

We add tcp_ prefix to the common helper name and export it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 15:19:52 -07:00
Alexander Aring
623c1234a2 mac802154: change pan_coord type to bool
To indicate if it's a coordinator or not a bool is enough. There should
no more values available which represent some other state.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-07 09:13:32 +02:00
Alexander Aring
a0825b03ae mac802154: add missing structure comments
This patch add missing comments to internal mac802154 structures.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-07 09:13:32 +02:00
Alexander Aring
af69a34548 mac802154: rearrange attribute in ieee802154_hw
This patch removes the priv attribute in ieee802154_hw to the right
section which is commented by attributes which needs to be filled by
driver layer.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-07 09:13:32 +02:00
Alexander Aring
5661d431c6 mac802154: remove unused hw_filt attribute
This patch removed an attribute from ieee802154_hw structure which is
never used inside kernel. Address information are stored inside wpan_dev
nowadays.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-07 09:13:32 +02:00
Alexander Aring
bcbfd2078d mac802154: cleanup ieee802154 hardware flags
This patch changes the ieee802154 hardware flags to enums and setting the
flag values with the BIT macro. Additional this patch changes the
commenting style for matching usual kernel style.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-07 09:13:32 +02:00
Alexander Aring
f265be3d12 mac802154: remove aack hw flag
This patch removes the hardware auto acknowdledge flag which indicates
that the transceiver supports this handling. This flag is never
evaluated inside mac802154 and all transceivers should support this
handling by default per hardware.

Suggested-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Varka Bhadram <varkabhadram@gmail.com>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-07 09:13:32 +02:00
Alexander Aring
6b70a43c7e mac802154: cleanup address filtering flags
This patch changes the address filtering flags to enums and setting the
flag values with the BIT macro. Additional this patch changes the
commenting style for matching usual kernel style.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-07 09:13:32 +02:00
Alexander Aring
ed65963ba0 mac802154: remove unneeded vif struct
This patch removes the virtual interface structure from sub if data
struct, because it isn't used anywhere. This structure could be useful
for give per interface information at softmac driver layer. Nevertheless
there exist no use case currently and it contains the interface type
information currently. This information is also stored inside wpan dev
which is now used to check on the wpan dev interface type.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Varka Bhadram <varkabhadram@gmail.com>
Acked-by: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-07 09:13:32 +02:00
Eric Dumazet
90c337da15 inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations
When an application needs to force a source IP on an active TCP socket
it has to use bind(IP, port=x).

As most applications do not want to deal with already used ports, x is
often set to 0, meaning the kernel is in charge to find an available
port.
But kernel does not know yet if this socket is going to be a listener or
be connected.
It has very limited choices (no full knowledge of final 4-tuple for a
connect())

With limited ephemeral port range (about 32K ports), it is very easy to
fill the space.

This patch adds a new SOL_IP socket option, asking kernel to ignore
the 0 port provided by application in bind(IP, port=0) and only
remember the given IP address.

The port will be automatically chosen at connect() time, in a way
that allows sharing a source port as long as the 4-tuples are unique.

This new feature is available for both IPv4 and IPv6 (Thanks Neal)

Tested:

Wrote a test program and checked its behavior on IPv4 and IPv6.

strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
connect().
Also getsockname() show that the port is still 0 right after bind()
but properly allocated after connect().

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0

IPv6 test :

socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0

I was able to bind()/connect() a million concurrent IPv4 sockets,
instead of ~32000 before patch.

lpaa23:~# ulimit -n 1000010
lpaa23:~# ./bind --connect --num-flows=1000000 &
1000000 sockets

lpaa23:~# grep TCP /proc/net/sockstat
TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66

Check that a given source port is indeed used by many different
connections :

lpaa23:~# ss -t src :40000 | head -10
State      Recv-Q Send-Q   Local Address:Port          Peer Address:Port
ESTAB      0      0           127.0.0.2:40000         127.0.202.33:44983
ESTAB      0      0           127.0.0.2:40000         127.2.27.240:44983
ESTAB      0      0           127.0.0.2:40000           127.2.98.5:44983
ESTAB      0      0           127.0.0.2:40000        127.0.124.196:44983
ESTAB      0      0           127.0.0.2:40000         127.2.139.38:44983
ESTAB      0      0           127.0.0.2:40000          127.1.59.80:44983
ESTAB      0      0           127.0.0.2:40000          127.3.6.228:44983
ESTAB      0      0           127.0.0.2:40000          127.0.38.53:44983
ESTAB      0      0           127.0.0.2:40000         127.1.197.10:44983

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-06 23:57:12 -07:00
Tom Herbert
b3baa0fbd0 mpls: Add MPLS entropy label in flow_keys
In flow dissector if an MPLS header contains an entropy label this is
saved in the new keyid field of flow_keys. The entropy label is
then represented in the flow hash function input.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert
1fdd512c92 net: Add GRE keyid in flow_keys
In flow dissector if a GRE header contains a keyid this is saved in the
new keyid field of flow_keys. The GRE keyid is then represented
in the flow hash function input.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert
87ee9e52ff net: Add IPv6 flow label to flow_keys
In flow_dissector set the flow label in flow_keys for IPv6. This also
removes the shortcircuiting of flow dissection when a non-zero label
is present, the flow label can be considered to provide additional
entropy for a hash.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert
d34af823ff net: Add VLAN ID to flow_keys
In flow_dissector set vlan_id in flow_keys when VLAN is found.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert
45b47fd00c net: Get rid of IPv6 hash addresses flow keys
We don't need to return the IPv6 address hash as part of flow keys.
In general, using the IPv6 address hash is risky in a hash value
since the underlying use of xor provides no entropy. If someone
really needs the hash value they can get it from the full IPv6
addresses in flow keys (e.g. from flow_get_u32_src).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert
9f24908901 net: Add keys for TIPC address
Add a new flow key for TIPC addresses.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert
c3f8324188 net: Add full IPv6 addresses to flow_keys
This patch adds full IPv6 addresses into flow_keys and uses them as
input to the flow hash function. The implementation supports either
IPv4 or IPv6 addresses in a union, and selector is used to determine
how may words to input to jhash2.

We also add flow_get_u32_dst and flow_get_u32_src functions which are
used to get a u32 representation of the source and destination
addresses. For IPv6, ipv6_addr_hash is called. These functions retain
getting the legacy values of src and dst in flow_keys.

With this patch, Ethertype and IP protocol are now included in the
flow hash input.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:30 -07:00
Tom Herbert
42aecaa9bb net: Get skb hash over flow_keys structure
This patch changes flow hashing to use jhash2 over the flow_keys
structure instead just doing jhash_3words over src, dst, and ports.
This method will allow us take more input into the hashing function
so that we can include full IPv6 addresses, VLAN, flow labels etc.
without needing to resort to xor'ing which makes for a poor hash.

Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:30 -07:00
Varka Bhadram
133be0264f nl802154: export supported commands
This patch will export the supported commands by the devices
to the userspace. This will be useful to check if HardMAC
drivers can support a specific command or not.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-04 12:27:15 +02:00
Johannes Berg
c526a46767 mac80211: rename single hw-scan flag to follow naming convention
The naming convention is to always have the flags prefixed with
IEEE80211_HW_ so they're 'namespaced', make this flag follow it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-02 20:32:00 +02:00
Johannes Berg
ea1b2b45f5 mac80211: remove short slot/short preamble incapable flags
There are no drivers setting IEEE80211_HW_2GHZ_SHORT_SLOT_INCAPABLE
or IEEE80211_HW_2GHZ_SHORT_PREAMBLE_INCAPABLE, so any code using the
two flags is dead; it's also exceedingly unlikely that any new driver
could ever need to set these flags.

The wcn36xx code is almost certainly broken, but this preserves the
previous behaviour.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-02 20:28:58 +02:00
Johannes Berg
3b79af973c mac80211: stop using pointers as userspace cookies
Even if the pointers are really only accessible to root and used
pretty much only by wpa_supplicant, this is still not great; even
for debugging it'd be easier to have something that's easier to
read and guaranteed to never get reused.

With the recent change to make mac80211 create an ack_skb for the
mgmt-tx path this becomes possible, only the client probe method
needs to also allocate an ack_skb, and we can store the cookie in
that skb.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-02 13:07:59 +02:00
Johannes Berg
db388a567f mac80211: move TX PN to public part of key struct
For drivers supporting TSO or similar features, but that still have
PN assignment in software, there's a need to have some memory to
store the current PN value. As mac80211 already stores this and it's
somewhat complicated to add a per-driver area to the key struct (due
to the dynamic sizing thereof) it makes sense to just move the TX PN
to the keyconf, i.e. the public part of the key struct.

As TKIP is more complicated and we won't able to offload it in this
way right now (fast-xmit is skipped for TKIP unless the HW does it
all, and our hardware needs MMIC calculation in software) I've not
moved that for now - it's possible but requires exposing a lot of
the internal TKIP state.

As an bonus side effect, we can remove a lot of code by assuming the
keyseq struct has a certain layout - with BUILD_BUG_ON to verify it.

This might also improve performance, since now TX and RX no longer
share a cacheline.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-02 11:16:35 +02:00
David S. Miller
dda922c831 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/amd-xgbe-phy.c
	drivers/net/wireless/iwlwifi/Kconfig
	include/net/mac80211.h

iwlwifi/Kconfig and mac80211.h were both trivial overlapping
changes.

The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
the bug fix that happened on the 'net' side is already integrated
into the rest of the amd-xgbe driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 22:51:30 -07:00
Neal Cardwell
9f950415e4 tcp: fix child sockets to use system default congestion control if not set
Linux 3.17 and earlier are explicitly engineered so that if the app
doesn't specifically request a CC module on a listener before the SYN
arrives, then the child gets the system default CC when the connection
is established. See tcp_init_congestion_control() in 3.17 or earlier,
which says "if no choice made yet assign the current value set as
default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
created") altered these semantics, so that children got their parent
listener's congestion control even if the system default had changed
after the listener was created.

This commit returns to those original semantics from 3.17 and earlier,
since they are the original semantics from 2007 in 4d4d3d1e8 ("[TCP]:
Congestion control initialization."), and some Linux congestion
control workflows depend on that.

In summary, if a listener socket specifically sets TCP_CONGESTION to
"x", or the route locks the CC module to "x", then the child gets
"x". Otherwise the child gets current system default from
net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
earlier, and this commit restores that.

Fixes: 55d8694fa8 ("net: tcp: assign tcp cong_ops when tcp sk is created")
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31 21:49:14 -07:00
David S. Miller
d803731462 As we get closer to the merge window, here are a few
more things for -next:
  * disconnect TDLS stations on CSA to avoid issues
  * fix a memory leak introduced in a recent commit
  * switch rfkill and cfg80211 to PM ops
  * in an unlikely scenario, prevent a bookkeeping
    value to get corrupted leading to dropped packets
  * fix a crash in VLAN assignment
  * switch rfkill-gpio to more modern gpiod API
  * send disconnected event to userspace with proper
    local/remote indication
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJVaEyAAAoJEDBSmw7B7bqrPiIQAKOrX4g2UNtyoTWJzA7YRu+g
 GEUu/CE4LQKCodCpBiEhFlhQo2WzXsHoLj5+Nr56aFAZx19VZjXWVC5JS785wYn5
 r8hpOVWUUA3MVnXeL/+yz4chm0wTYN9pSpElZ4FHlUI0OkCMh2rPCTvdrbSKoGzV
 MN8NEO0jVE89AgOMF8gHk5YKpJ6B4QibZuUuZpgkqdwIi5udaCcrPFFrUg/NfRpA
 nTauP6blFUPOUV0sxbhS78uC3rqGQuYsnvab/QeGc9PDKk5ukrXzFdgRCVZq8224
 Ge0JcPzwzWldk892oEJoc2OfGkg5HOil9HtC+S2ehBGuK0yEXOBIkO1ZgudTH1kC
 0rLOPWVKRzTWE+sq+gWK/OjfaA7Dl6HFYYHRQ2dhm1XkqtAw8SwGQMDSIPJYWr4O
 jp4gYpwKVjnMmsEAg7FdKWyIiTgLyI07VnIciORXDyefddYMuofXI2pJkfzUeFeH
 HjCVYm2NYXDty6uneP4RC1nUbNc53FKJ5O9fW3BPMyVXD4pTjam50p9H6N7OcDN3
 k3dEevWiVgvBjZPVc3HI8RaCzS/Ww1ym+MYgV97QkMfgiuE2VkiFwK+zhWn9axbc
 eutkzFEdDcIACCZ74hIWqMJjsMnZm9E11Uq7tifAE0bi1Wpku1xPAnxMPnI+0eiF
 Dgo2bmlQ/d1dHr3N3FC0
 =KmwY
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
As we get closer to the merge window, here are a few
more things for -next:
 * disconnect TDLS stations on CSA to avoid issues
 * fix a memory leak introduced in a recent commit
 * switch rfkill and cfg80211 to PM ops
 * in an unlikely scenario, prevent a bookkeeping
   value to get corrupted leading to dropped packets
 * fix a crash in VLAN assignment
 * switch rfkill-gpio to more modern gpiod API
 * send disconnected event to userspace with proper
   local/remote indication
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31 17:34:26 -07:00
David S. Miller
583d3f5af2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next, they are:

1) default CONFIG_NETFILTER_INGRESS to y for easier compile-testing of all
   options.

2) Allow to bind a table to net_device. This introduces the internal
   NFT_AF_NEEDS_DEV flag to perform a mandatory check for this binding.
   This is required by the next patch.

3) Add the 'netdev' table family, this new table allows you to create ingress
   filter basechains. This provides access to the existing nf_tables features
   from ingress.

4) Kill unused argument from compat_find_calc_{match,target} in ip_tables
   and ip6_tables, from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31 00:02:30 -07:00
David S. Miller
50dce303d0 This just has a single docbook build fix. In my confusion
I'd already sent the same fix for -next, but Ben Hutchings
 noted it's necessary in 4.1.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJVZwzBAAoJEDBSmw7B7bqrH4YQAMZazIdDKxZNtprjT1WdB+qJ
 v82zJJUXLzNf7XPYACmwVQpyJq+uG9GHnvIr1sAU5JNtkHgSJQdRrhhICty0Y5dS
 U6dqb/5wvE3oQfPZNYYtEbtUP18j9z34nr18vWhJ3yra1ZWHjk1JJylv8Wy+TVdR
 iHQH/VVs8ACcytBQa7my5642ISWUABtOwOGJ0+cWltxbBUGpzGojXskJwclQ47TW
 bRqS6vXX27RCtfIKwq5BCdryM5uAyWIDwefRbsl1FEE6ergLUYTRQaaNV93kNCN2
 gvitY0xlK2nYHUOodNL03bifg6nnafBkellUTszi4Xfi6dP8WjEtXvpHb2ZHOMpV
 fiiYYtN3Uc6/XiW3UUufMGKmRUu2Fhl/4tIoubcmhaurbPXT1yMkoe0V6ZidKZvU
 N0sLeuCNnR0T9w2Ml23qUSQhHILT6VoaeW1SqTXBAAdlOcqTXuchU6RbOFRgqJyF
 C7VvKE+8+5Rjeu2orgj7bwOQN9cZh8MC1kVhTjjs/jm21CUKRt7pUSqBxmN0WihE
 lkfEqohIwti6v/bHOWpB+RnhWKeV1YX4G0ZMIwmn8tP0807hd9b9ov2F8CBK4e+g
 ylb4Y45WyzuR4cYGY5kndvV0xl8k7XYiFsUj1QnylFPuhaTtuZ5BtlgDq7CKMDTm
 AgkH8YAhLnWvzpTbeDPT
 =gYtq
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2015-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
This just has a single docbook build fix. In my confusion
I'd already sent the same fix for -next, but Ben Hutchings
noted it's necessary in 4.1.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 23:37:46 -07:00
David S. Miller
9d52bf0a23 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-05-28

Here's a set of patches intended for 4.2. The majority of the changes
are on the 802.15.4 side of things rather than Bluetooth related:

 - All sorts of cleanups & fixes to ieee802154 and related drivers
 - Rework of tx power support in ieee802154 and its drivers
 - Support for setting ieee802154 tx power through nl802154
 - New IDs for the btusb driver
 - Various cleanups & smaller fixes to btusb
 - New btrtl driver for Realtec devices
 - Fix suspend/resume for Realtek devices

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 23:26:45 -07:00
Jonathan Corbet
3a7af58faa mac80211: Fix mac80211.h docbook comments
A couple of enums in mac80211.h became structures recently, but the
comments didn't follow suit, leading to errors like:

  Error(.//include/net/mac80211.h:367): Cannot parse enum!
  Documentation/DocBook/Makefile:93: recipe for target 'Documentation/DocBook/80211.xml' failed
  make[1]: *** [Documentation/DocBook/80211.xml] Error 1
  Makefile:1361: recipe for target 'mandocs' failed
  make: *** [mandocs] Error 2

Fix the comments comments accordingly.  Added a couple of other small
comment fixes while I was there to silence other recently-added docbook
warnings.

Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-28 14:37:43 +02:00
Herbert Xu
69b0137f61 ipsec: Add IV generator information to xfrm_state
This patch adds IV generator information to xfrm_state.  This
is currently obtained from our own list of algorithm descriptions.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-28 11:23:20 +08:00
Herbert Xu
165ecc6373 xfrm: Add IV generator information to xfrm_algo_desc
This patch adds IV generator information for each AEAD and block
cipher to xfrm_algo_desc.  This will be used to access the new
AEAD interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-28 11:23:19 +08:00
Eric Dumazet
ed2dfd9009 tcp/dccp: warn user for preferred ip_local_port_range
After commit 07f4c90062 ("tcp/dccp: try to not exhaust
ip_local_port_range in connect()") it is advised to have an even number
of ports described in /proc/sys/net/ipv4/ip_local_port_range

This means start/end values should have a different parity.

Let's warn sysadmins of this, so that they can update their settings
if they want to.

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-27 14:35:36 -04:00
Jason Gunthorpe
9302d7bb0c sctp: Fix mangled IPv4 addresses on a IPv6 listening socket
sctp_v4_map_v6 was subtly writing and reading from members
of a union in a way the clobbered data it needed to read before
it read it.

Zeroing the v6 flowinfo overwrites the v4 sin_addr with 0, meaning
that every place that calls sctp_v4_map_v6 gets ::ffff:0.0.0.0 as the
result.

Reorder things to guarantee correct behaviour no matter what the
union layout is.

This impacts user space clients that open an IPv6 SCTP socket and
receive IPv4 connections. Prior to 299ee user space would see a
sockaddr with AF_INET and a correct address, after 299ee the sockaddr
is AF_INET6, but the address is wrong.

Fixes: 299ee123e1 (sctp: Fixup v4mapped behaviour to comply with Sock API)
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-27 14:15:26 -04:00
Alexander Aring
b69644c1c7 nl802154: add support to set cca ed level
This patch adds support for setting the current cca ed level value over
nl802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-27 19:29:42 +02:00
Florian Westphal
d6b915e29f ip_fragment: don't forward defragmented DF packet
We currently always send fragments without DF bit set.

Thus, given following setup:

mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280
   A           R1              R2         B

Where R1 and R2 run linux with netfilter defragmentation/conntrack
enabled, then if Host A sent a fragmented packet _with_ DF set to B, R1
will respond with icmp too big error if one of these fragments exceeded
1400 bytes.

However, if R1 receives fragment sizes 1200 and 100, it would
forward the reassembled packet without refragmenting, i.e.
R2 will send an icmp error in response to a packet that was never sent,
citing mtu that the original sender never exceeded.

The other minor issue is that a refragmentation on R1 will conceal the
MTU of R2-B since refragmentation does not set DF bit on the fragments.

This modifies ip_fragment so that we track largest fragment size seen
both for DF and non-DF packets, and set frag_max_size to the largest
value.

If the DF fragment size is larger or equal to the non-df one, we will
consider the packet a path mtu probe:
We set DF bit on the reassembled skb and also tag it with a new IPCB flag
to force refragmentation even if skb fits outdev mtu.

We will also set DF bit on each fragment in this case.

Joint work with Hannes Frederic Sowa.

Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-27 13:03:31 -04:00
Varka Bhadram
0f999b09f5 ieee802154: add set transmit power support
This patch adds transmission power setting support for IEEE-802.15.4
devices via nl802154.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-27 13:29:25 +02:00
Eric Dumazet
095dc8e0c3 tcp: fix/cleanup inet_ehash_locks_alloc()
If tcp ehash table is constrained to a very small number of buckets
(eg boot parameter thash_entries=128), then we can crash if spinlock
array has more entries.

While we are at it, un-inline inet_ehash_locks_alloc() and make
following changes :

- Budget 2 cache lines per cpu worth of 'spinlocks'
- Try to kmalloc() the array to avoid extra TLB pressure.
  (Most servers at Google allocate 8192 bytes for this hash table)
- Get rid of various #ifdef

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-26 19:48:46 -04:00
Lennert Buytenhek
d0997b44c9 ieee802154: Remove ieee802154_reduced_mlme_ops references.
As there doesn't seem to be a definition of it or any users of it.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-26 20:26:09 +02:00
Pablo Neira Ayuso
ed6c4136f1 netfilter: nf_tables: add netdev table to filter from ingress
This allows us to create netdev tables that contain ingress chains. Use
skb_header_pointer() as we may see shared sk_buffs at this stage.

This change provides access to the existing nf_tables features from the ingress
hook.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-26 18:41:23 +02:00
Pablo Neira Ayuso
ebddf1a8d7 netfilter: nf_tables: allow to bind table to net_device
This patch adds the internal NFT_AF_NEEDS_DEV flag to indicate that you must
attach this table to a net_device.

This change is required by the follow up patch that introduces the new netdev
table.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-26 18:41:17 +02:00
Johannes Berg
80279fb7ba cfg80211: properly send NL80211_ATTR_DISCONNECTED_BY_AP in disconnect
When we disconnect from the AP, drivers call cfg80211_disconnect().
This doesn't know whether the disconnection was initiated locally
or by the AP though, which can cause problems with the supplicant,
for example with WPS. This issue obviously doesn't show up with any
mac80211 based driver since mac80211 doesn't call this function.

Fix this by requiring drivers to indicate whether the disconnect is
locally generated or not. I've tried to update the drivers, but may
not have gotten the values correct, and some drivers may currently
not be able to report correct values. In case of doubt I left it at
false, which is the current behaviour.

For libertas, make adjustments as indicated by Dan Williams.

Reported-by: Matthieu Mauger <matthieux.mauger@intel.com>
Tested-by: Matthieu Mauger <matthieux.mauger@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-26 15:21:27 +02:00
Eric Dumazet
7f1598678d ipv6: ipv6_select_ident() returns a __be32
ipv6_select_ident() returns a 32bit value in network order.

Fixes: 286c2349f6 ("ipv6: Clean up ipv6_select_ident() and ip6_fragment()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 20:27:11 -04:00
Martin KaFai Lau
d52d3997f8 ipv6: Create percpu rt6_info
After the patch
'ipv6: Only create RTF_CACHE routes after encountering pmtu exception',
we need to compensate the performance hit (bouncing dst->__refcnt).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:35 -04:00
Martin KaFai Lau
8d0b94afdc ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister
This patch keeps track of the DST_NOCACHE routes in a list and replaces its
dev with loopback during the iface down/unregister event.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:34 -04:00
Martin KaFai Lau
3da59bd945 ipv6: Create RTF_CACHE clone when FLOWI_FLAG_KNOWN_NH is set
This patch always creates RTF_CACHE clone with DST_NOCACHE
when FLOWI_FLAG_KNOWN_NH is set so that the rt6i_dst is set to
the fl6->daddr.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Julian Anastasov <ja@ssi.bg>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:34 -04:00
Martin KaFai Lau
b197df4f0f ipv6: Add rt6_get_cookie() function
Instead of doing the rt6->rt6i_node check whenever we need
to get the route's cookie.  Refactor it into rt6_get_cookie().
It is a prep work to handle FLOWI_FLAG_KNOWN_NH and also
percpu rt6_info later.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:34 -04:00
Martin KaFai Lau
45e4fd2668 ipv6: Only create RTF_CACHE routes after encountering pmtu exception
This patch creates a RTF_CACHE routes only after encountering a pmtu
exception.

After ip6_rt_update_pmtu() has inserted the RTF_CACHE route to the fib6
tree, the rt->rt6i_node->fn_sernum is bumped which will fail the
ip6_dst_check() and trigger a relookup.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:33 -04:00
Martin KaFai Lau
2647a9b070 ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST
When creating a RTF_CACHE route, RTF_ANYCAST is set based on rt6i_dst.
Also, rt6i_gateway is always set to the nexthop while the nexthop
could be a gateway or the rt6i_dst.addr.

After removing the rt6i_dst and rt6i_src dependency in the last patch,
we also need to stop the caller from depending on rt6i_gateway and
RTF_ANYCAST.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:33 -04:00
Martin KaFai Lau
fd0273d793 ipv6: Remove external dependency on rt6i_dst and rt6i_src
This patch removes the assumptions that the returned rt is always
a RTF_CACHE entry with the rt6i_dst and rt6i_src containing the
destination and source address.  The dst and src can be recovered from
the calling site.

We may consider to rename (rt6i_dst, rt6i_src) to
(rt6i_key_dst, rt6i_key_src) later.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:32 -04:00
Martin KaFai Lau
286c2349f6 ipv6: Clean up ipv6_select_ident() and ip6_fragment()
This patch changes the ipv6_select_ident() signature to return a
fragment id instead of taking a whole frag_hdr as a param to
only set the frag_hdr->identification.

It also cleans up ip6_fragment() to obtain the fragment id at the
beginning instead of using multiple "if" later to check fragment id
has been generated or not.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:32 -04:00
Alexander Aring
c947f7e1e3 mac802154: remove mib lock
This patch removes the mib lock. The new locking mechanism is to protect
the mib values with the rtnl lock. Note that this isn't always necessary
if we have an interface up the most mib values are readonly (e.g.
address settings). With this behaviour we can remove locking in
hotpath like frame parsing completely. It depends on context if we need
to hold the rtnl lock or not, this makes the callbacks of
ieee802154_mlme_ops unnecessary because these callbacks hols always the
locks.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-23 17:57:08 +02:00
Alexander Aring
344f8c119d mac802154: use atomic ops for sequence incrementation
This patch will use atomic operations for sequence number incrementation
while MAC header generation. Upper layers like af_802154 or 6LoWPAN
could call this function in a parallel context while generating 802.15.4
MAC header before queuing into wpan interfaces transmit queue.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-23 17:57:08 +02:00
Alexander Aring
4a3a8c0c3a mac802154: remove pib lock
This patch removes the pib lock which is now replaced by rtnl lock. The
new interface already use the rtnl lock only. Nevertheless this patch
will fix issues while using new and old interface at the same time.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-23 17:57:08 +02:00
David S. Miller
36583eb54d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/cadence/macb.c
	drivers/net/phy/phy.c
	include/linux/skbuff.h
	net/ipv4/tcp.c
	net/switchdev/switchdev.c

Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD}
renaming overlapping with net-next changes of various
sorts.

phy.c was a case of two changes, one adding a local
variable to a function whilst the second was removing
one.

tcp.c overlapped a deadlock fix with the addition of new tcp_info
statistic values.

macb.c involved the addition of two zyncq device entries.

skbuff.h involved adding back ipv4_daddr to nf_bridge_info
whilst net-next changes put two other existing members of
that struct into a union.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-23 01:22:35 -04:00
Eric Dumazet
f5af1f57a2 inet_hashinfo: remove bsocket counter
We no longer need bsocket atomic counter, as inet_csk_get_port()
calls bind_conflict() regardless of its value, after commit
2b05ad33e1 ("tcp: bind() fix autoselection to share ports")

This patch removes overhead of maintaining this counter and
double inet_csk_get_port() calls under pressure.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: Flavio Leitner <fbl@redhat.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 18:55:32 -04:00
Eric Dumazet
eb9344781a tcp: add a force_schedule argument to sk_stream_alloc_skb()
In commit 8e4d980ac2 ("tcp: fix behavior for epoll edge trigger")
we fixed a possible hang of TCP sockets under memory pressure,
by allowing sk_stream_alloc_skb() to use sk_forced_mem_schedule()
if no packet is in socket write queue.

It turns out there are other cases where we want to force memory
schedule :

tcp_fragment() & tso_fragment() need to split a big TSO packet into
two smaller ones. If we block here because of TCP memory pressure,
we can effectively block TCP socket from sending new data.
If no further ACK is coming, this hang would be definitive, and socket
has no chance to effectively reduce its memory usage.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 16:56:40 -04:00
Johannes Berg
f8bdbb5847 mac80211: add missing drv_priv description for TXQ struct
The kernel-doc description for the drv_priv member of
struct ieee80211_txq was missing, leading to errors.
Add a suitable description to fix that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-20 15:04:53 +02:00
Andy Zhou
06b2c61c92 ip: remove unused function prototype
ip_do_nat() function was removed prior to kernel 3.4. Remove the
unnecessary function prototype as well.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 16:54:36 -04:00
Daniel Borkmann
492135557d tcp: add rfc3168, section 6.1.1.1. fallback
This work as a follow-up of commit f7b3bec6f5 ("net: allow setting ecn
via routing table") and adds RFC3168 section 6.1.1.1. fallback for outgoing
ECN connections. In other words, this work adds a retry with a non-ECN
setup SYN packet, as suggested from the RFC on the first timeout:

  [...] A host that receives no reply to an ECN-setup SYN within the
  normal SYN retransmission timeout interval MAY resend the SYN and
  any subsequent SYN retransmissions with CWR and ECE cleared. [...]

Schematic client-side view when assuming the server is in tcp_ecn=2 mode,
that is, Linux default since 2009 via commit 255cac91c3 ("tcp: extend
ECN sysctl to allow server-side only ECN"):

 1) Normal ECN-capable path:

    SYN ECE CWR ----->
                <----- SYN ACK ECE
            ACK ----->

 2) Path with broken middlebox, when client has fallback:

    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
            SYN ----->
                <----- SYN ACK
            ACK ----->

In case we would not have the fallback implemented, the middlebox drop
point would basically end up as:

    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)

In any case, it's rather a smaller percentage of sites where there would
occur such additional setup latency: it was found in end of 2014 that ~56%
of IPv4 and 65% of IPv6 servers of Alexa 1 million list would negotiate
ECN (aka tcp_ecn=2 default), 0.42% of these webservers will fail to connect
when trying to negotiate with ECN (tcp_ecn=1) due to timeouts, which the
fallback would mitigate with a slight latency trade-off. Recent related
paper on this topic:

  Brian Trammell, Mirja Kühlewind, Damiano Boppart, Iain Learmonth,
  Gorry Fairhurst, and Richard Scheffenegger:
    "Enabling Internet-Wide Deployment of Explicit Congestion Notification."
    Proc. PAM 2015, New York.
  http://ecn.ethz.ch/ecn-pam15.pdf

Thus, when net.ipv4.tcp_ecn=1 is being set, the patch will perform RFC3168,
section 6.1.1.1. fallback on timeout. For users explicitly not wanting this
which can be in DC use case, we add a net.ipv4.tcp_ecn_fallback knob that
allows for disabling the fallback.

tp->ecn_flags are not being cleared in tcp_ecn_clear_syn() on output, but
rather we let tcp_ecn_rcv_synack() take that over on input path in case a
SYN ACK ECE was delayed. Thus a spurious SYN retransmission will not prevent
ECN being negotiated eventually in that case.

Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf
Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
Signed-off-by: Brian Trammell <trammell@tik.ee.ethz.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Dave That <dave.taht@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 16:53:37 -04:00
Eric Dumazet
b5d721d761 inet: properly align icsk_ca_priv
tcp_illinois and upcoming tcp_cdg require 64bit alignment of
icsk_ca_priv

x86 does not care, but other architectures might.

Fixes: 05cbc0db03 ("ipv4: Create probe timer for tcp PMTU as per RFC4821")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Fan Du <fan.du@intel.com>
Acked-by: Fan Du <fan.du@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 11:08:00 -04:00
Alexander Aring
0e66545701 nl802154: add support for dump phy capabilities
This patch add support to nl802154 to dump all phy capabilities which is
inside the wpan_phy_supported struct. Also we introduce a new method to
dumping supported channels. The new method will offer a easier interface
and has lesser netlink traffic.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-19 11:44:44 +02:00
Alexander Aring
65318680c9 ieee802154: add iftypes capability
This patch adds capability flags for supported interface types.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-19 11:44:42 +02:00
Alexander Aring
edea8f7c75 cfg802154: introduce wpan phy flags
This patch introduce a flag property for the wpan phy structure.
The current flag settings in ieee802154_hw are accessable in mac802154
layer only which is okay for flags which indicates MAC handling which
are done by phy. For real PHY layer settings like cca mode, transmit
power, cca energy detection level.

The difference between these flags are that the MAC handling flags are
only handled in mac802154/HardMac layer e.g. on an interface up. The phy
settings are direct netlink calls from nl802154 into the driver layer
and the nl802154 need to have a chance to check if the driver supports
this handling before sending to the next layer.

We also check now on PHY flags while dumping and setting pib attributes.
In comparing with MIB attributes the 802.15.4 gives us an default value
which we assume when a transceiver implement less functionality. In case
of MIB settings the nl802154 layer doesn't need to check on the
ieee802154_hw flags then.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-19 11:44:42 +02:00
Alexander Aring
791021bf13 mac802154: check for really changes
This patch adds check if the value is really changed inside pib/mib.
If a transceiver do support only one value for e.g. max_be then this
will also handle that the driver layer doesn't need to care about
handling to set one value only.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-19 11:44:42 +02:00
Alexander Aring
fea3318d20 ieee802154: add several phy supported handling
This patch adds support for phy supported handling for all other already
existing handling 802.15.4 functionality. We assume now a fully 802.15.4
complaint transceiver at phy allocation. If a transceiver can support
802.15.4 default values only, then the values should be overwirtten by
values the transceiver supports. If the transceiver doesn't set the
according hardware flags, we assume the 802.15.4 defaults now which
cannot be changed.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Suggested-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-19 11:44:42 +02:00
Alexander Aring
72f655e44d ieee802154: introduce wpan_phy_supported
This patch introduce the wpan_phy_supported struct for wpan_phy. There
is currently no way to check if a transceiver can handle IEEE 802.15.4
complaint values. With this struct we can check before if the
transceiver supports these values before sending to driver layer.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Suggested-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Acked-by: Varka Bhadram <varkabhadram@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-19 11:44:42 +02:00
Alexander Aring
32b23550ad ieee802154: change cca ed level to mbm
This patch change the handling of cca energy detection level from dbm to
mbm. This prepares to handle floating point cca energy detection levels
values. The old netlink 802.15.4 will convert the dbm value to mbm for
handling backward compatibility.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-19 11:44:42 +02:00
Alexander Aring
e2eb173aaa ieee802154: change transmit power to mbm
This patch change the handling of transmit power level from dbm to mbm.
This prepares to handle floating point transmit power levels values. The
old netlink 802.15.4 will convert the dbm value to mbm for handling
backward compatibility.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-19 11:44:41 +02:00
Alexander Aring
1a19cb680b ieee802154: change transmit power to s32
This patch change the transmit power from s8 to s32. This prepares to store a
mbm value instead dbm inside the transmit power variable. The old
interface keep the a s8 dbm value, which should be backward compatibility
when assign s8 to s32.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-19 11:44:41 +02:00
Andy Zhou
49d16b23cd bridge_netfilter: No ICMP packet on IPv4 fragmentation error
When bridge netfilter re-fragments an IP packet for output, all
packets that can not be re-fragmented to their original input size
should be silently discarded.

However, current bridge netfilter output path generates an ICMP packet
with 'size exceeded MTU' message for such packets, this is a bug.

This patch refactors the ip_fragment() API to allow two separate
use cases. The bridge netfilter user case will not
send ICMP, the routing output will, as before.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 00:15:39 -04:00
Andy Zhou
5cf4228082 ipv4: introduce frag_expire_skip_icmp()
Improve readability of skip ICMP for de-fragmentation expiration logic.
This change will also make the logic easier to maintain when the
following patches in this series are applied.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 00:15:26 -04:00
WANG Cong
de133464c9 netns: make nsid_lock per net
The spinlock is used to protect netns_ids which is per net,
so there is no need to use a global spinlock.

Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-17 23:41:11 -04:00
Samudrala, Sridhar
45d4122ca7 switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.
- introduce port fdb obj and generic switchdev_port_fdb_add/del/dump()
- use switchdev_port_fdb_add/del/dump in rocker/team/bonding ndo ops.
- add support for fdb obj in switchdev_port_obj_add/del/dump()
- switch rocker to implement fdb ops via switchdev_ops

v3: updated to sync with named union changes.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-17 22:49:09 -04:00
Eric Dumazet
b8da51ebb1 tcp: introduce tcp_under_memory_pressure()
Introduce an optimized version of sk_under_memory_pressure()
for TCP. Our intent is to use it in fast paths.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-17 22:45:48 -04:00
Eric Dumazet
a6c5ea4ccf tcp: rename sk_forced_wmem_schedule() to sk_forced_mem_schedule()
We plan to use sk_forced_wmem_schedule() in input path as well,
so make it non static and rename it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-17 22:45:48 -04:00
Eric Dumazet
1a24e04e4b net: fix sk_mem_reclaim_partial()
sk_mem_reclaim_partial() goal is to ensure each socket has
one SK_MEM_QUANTUM forward allocation. This is needed both for
performance and better handling of memory pressure situations in
follow up patches.

SK_MEM_QUANTUM is currently a page, but might be reduced to 4096 bytes
as some arches have 64KB pages.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-17 22:45:48 -04:00
Eric Dumazet
d53a2aa3a1 net: fix sparse error in csum_replace4()
make C=2 CF=-D__CHECK_ENDIAN__ net/ipv4/netfilter/nf_nat_l3proto_ipv4.o
  CHECK   net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
include/net/checksum.h:125:64: warning: incorrect type in argument 2 (different base types)
include/net/checksum.h:125:64:    expected restricted __wsum [usertype] addend
include/net/checksum.h:125:64:    got restricted __be32 [usertype] from
include/net/checksum.h:125:71: warning: incorrect type in argument 2 (different base types)
include/net/checksum.h:125:71:    expected restricted __wsum [usertype] addend
include/net/checksum.h:125:71:    got restricted __be32 [usertype] to
include/net/checksum.h:125:64: warning: incorrect type in argument 2 (different base types)
include/net/checksum.h:125:64:    expected restricted __wsum [usertype] addend
include/net/checksum.h:125:64:    got restricted __be32 [usertype] from
include/net/checksum.h:125:71: warning: incorrect type in argument 2 (different base types)
include/net/checksum.h:125:71:    expected restricted __wsum [usertype] addend
include/net/checksum.h:125:71:    got restricted __be32 [usertype] to

Fixes: 4565af0d40 ("net: optimise csum_replace4()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-17 13:08:29 -04:00
Eric Dumazet
264ea103a7 tcp: syncookies: extend validity range
Now we allow storing more request socks per listener, we might
hit syncookie mode less often and hit following bug in our stack :

When we send a burst of syncookies, then exit this mode,
tcp_synq_no_recent_overflow() can return false if the ACK packets coming
from clients are coming three seconds after the end of syncookie
episode.

This is a way too strong requirement and conflicts with rest of
syncookie code which allows ACK to be aged up to 2 minutes.

Perfectly valid ACK packets are dropped just because clients might be
in a crowded wifi environment or on another planet.

So let's fix this, and also change tcp_synq_overflow() to not
dirty a cache line for every syncookie we send, as we are under attack.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14 22:32:17 -04:00
John W. Linville
35d32e8fe4 geneve: move definition of geneve_hdr() to geneve.h
This is a static inline with identical definitions in multiple places...

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 15:59:13 -04:00
Jiri Pirko
59346afe7a flow_dissector: change port array into src, dst tuple
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 15:19:47 -04:00
Jiri Pirko
67a900cc04 flow_dissector: introduce support for Ethernet addresses
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 15:19:47 -04:00
Jiri Pirko
b924933cbb flow_dissector: introduce support for ipv6 addressses
So far, only hashes made out of ipv6 addresses could be dissected. This
patch introduces support for dissection of full ipv6 addresses.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 15:19:47 -04:00
Jiri Pirko
c3f8eaeb6e flow_dissector: add missing header includes
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 15:19:47 -04:00
Jiri Pirko
06635a35d1 flow_dissect: use programable dissector in skb_flow_dissect and friends
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 15:19:47 -04:00
Jiri Pirko
fbff949e3b flow_dissector: introduce programable flow_dissector
Introduce dissector infrastructure which allows user to specify which
parts of skb he wants to dissect.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 15:19:47 -04:00
Jiri Pirko
9c684b5083 net: move __skb_get_hash function declaration to flow_dissector.h
Since the definition of the function is in flow_dissector.c, it makes
sense to have the declaration in flow_dissector.h

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 15:19:46 -04:00
Jiri Pirko
10b89ee43e net: move *skb_get_poff declarations into correct header
Since these functions are defined in flow_dissector.c, move header
declarations from skbuff.h into flow_dissector.h

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 15:19:45 -04:00
Jiri Pirko
b0a31431b4 flow_dissector: remove unused function flow_get_hlen declaration
commit 56193d1bce ("net: Add function for parsing the header length out
of linear ethernet frames") added this function declaration but it is
defined nowhere.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 15:19:45 -04:00
Jiri Pirko
1bd758eb1c net: change name of flow_dissector header to match the .c file name
add couple of empty lines on the way.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 15:19:45 -04:00
Florian Westphal
e578d9c025 net: sched: use counter to break reclassify loops
Seems all we want here is to avoid endless 'goto reclassify' loop.
tc_classify_compat even resets this counter when something other
than TC_ACT_RECLASSIFY is returned, so this skb-counter doesn't
break hypothetical loops induced by something other than perpetual
TC_ACT_RECLASSIFY return values.

skb_act_clone is now identical to skb_clone, so just use that.

Tested with following (bogus) filter:
tc filter add dev eth0 parent ffff: \
 protocol ip u32 match u32 0 0 police rate 10Kbit burst \
 64000 mtu 1500 action reclassify

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 15:08:14 -04:00
David S. Miller
b04096ff33 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Four minor merge conflicts:

1) qca_spi.c renamed the local variable used for the SPI device
   from spi_device to spi, meanwhile the spi_set_drvdata() call
   got moved further up in the probe function.

2) Two changes were both adding new members to codel params
   structure, and thus we had overlapping changes to the
   initializer function.

3) 'net' was making a fix to sk_release_kernel() which is
   completely removed in 'net-next'.

4) In net_namespace.c, the rtnl_net_fill() call for GET operations
   had the command value fixed, meanwhile 'net-next' adjusted the
   argument signature a bit.

This also matches example merge resolutions posted by Stephen
Rothwell over the past two days.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 14:31:43 -04:00
Scott Feldman
42275bd8fc switchdev: don't use anonymous union on switchdev attr/obj structs
Older gcc versions (e.g.  gcc version 4.4.6) don't like anonymous unions
which was causing build issues on the newly added switchdev attr/obj
structs.  Fix this by using named union on structs.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 14:20:59 -04:00
Scott Feldman
5eb764edee switchdev: align comment with other comments in block
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 12:26:27 -04:00
Ying Xue
9449c3cd90 net: make skb_dst_pop routine static
As xfrm_output_one() is the only caller of skb_dst_pop(), we should
make skb_dst_pop() localized.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 23:19:49 -04:00
Scott Feldman
58c2cb16b1 switchdev: convert fib_ipv4_add/del over to switchdev_port_obj_add/del
The IPv4 FIB ops convert nicely to the switchdev objs and we're left with
only four switchdev ops: port get/set and port add/del.  Other objs will
follow, such as FDB.  So go ahead and convert IPv4 FIB over to switchdev
obj for consistency, anticipating more objs to come.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:55 -04:00
Scott Feldman
8793d0a664 switchdev: add new switchdev_port_bridge_getlink
Like bridge_setlink, add switchdev wrapper to handle bridge_getlink and
call into port driver to get port attrs.  For now, only BR_LEARNING and
BR_LEARNING_SYNC are returned.  To add more, we'll probably want to break
away from ndo_dflt_bridge_getlink() and build the netlink skb directly in
the switchdev code.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:55 -04:00
Scott Feldman
87a5dae59e switchdev: remove unused switchdev_port_bridge_dellink
Now we can remove old wrappers for dellink.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:55 -04:00
Scott Feldman
5c34e02214 switchdev: add new switchdev_port_bridge_dellink
Same change as setlink.  Provide the wrapper op for SELF ndo_bridge_dellink
and call into the switchdev driver to delete afspec VLANs.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:55 -04:00
Scott Feldman
e71f220b34 switchdev: remove old switchdev_port_bridge_setlink
New attr-based bridge_setlink can recurse lower devs and recover on err, so
remove old wrapper (including ndo_dflt_switchdev_port_bridge_setlink).

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:54 -04:00
Scott Feldman
6004c86718 switchdev: add bridge port flags attr
rocker: use switchdev get/set attr for bridge port flags

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:54 -04:00
Scott Feldman
6fc3016da7 switchdev: add port vlan obj
VLAN obj has flags (PVID and untagged) as well as start and end vid ranges.
The switchdev driver can optimize programing the device using the ranges.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:53 -04:00
Scott Feldman
491d0f1533 switchdev: introduce switchdev add/del obj ops
Like switchdev attr get/set, add new switchdev obj add/del.  switchdev objs
will be things like VLANs or FIB entries, so add/del fits better for
objects than get/set used for attributes.

Use same two-phase prepare-commit transaction model as in attr set.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:53 -04:00
Scott Feldman
3563606258 switchdev: convert STP update to switchdev attr set
STP update is just a settable port attribute, so convert
switchdev_port_stp_update to an attr set.

For DSA, the prepare phase is skipped and STP updates are only done in the
commit phase.  This is because currently the DSA drivers don't need to
allocate any memory for STP updates and the STP update will not fail to HW
(unless something horrible goes wrong on the MDIO bus, in which case the
prepare phase wouldn't have been able to predict anyway).

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:53 -04:00
Scott Feldman
f8e20a9f87 switchdev: convert parent_id_get to switchdev attr get
Switch ID is just a gettable port attribute.  Convert switchdev op
switchdev_parent_id_get to a switchdev attr.

Note: for sysfs and netlink interfaces, SWITCHDEV_ATTR_PORT_PARENT_ID is
called with SWITCHDEV_F_NO_RECUSE to limit switch ID user-visiblity to only
port netdevs.  So when a port is stacked under bond/bridge, the user can
only query switch id via the switch ports, but not via the upper devices

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:53 -04:00
Scott Feldman
3094333d90 switchdev: introduce get/set attrs ops
Add two new swdev ops for get/set switch port attributes.  Most swdev
interactions on a port are gets or sets on port attributes, so rather than
adding ops for each attribute, let's define clean get/set ops for all
attributes, and then we can have clear, consistent rules on how attributes
propagate on stacked devs.

Add the basic algorithms for get/set attr ops.  Use the same recusive algo
to walk lower devs we've used for STP updates, for example.  For get,
compare attr value for each lower dev and only return success if attr
values match across all lower devs.  For sets, set the same attr value for
all lower devs.  We'll use a two-phase prepare-commit transaction model for
sets.  In the first phase, the driver(s) are asked if attr set is OK.  If
all OK, the commit attr set in second phase.  A driver would NACK the
prepare phase if it can't set the attr due to lack of resources or support,
within it's control.  RTNL lock must be held across both phases because
we'll recurse all lower devs first in prepare phase, and then recurse all
lower devs again in commit phase.  If any lower dev fails the prepare
phase, we need to abort the transaction for all lower devs.

If lower dev recusion isn't desired, allow a flag SWITCHDEV_F_NO_RECURSE to
indicate get/set only work on port (lowest) device.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:53 -04:00
Jiri Pirko
9d47c0a2d9 switchdev: s/swdev_/switchdev_/
Turned out that "switchdev" sticks. So just unify all related terms to use
this prefix.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:53 -04:00
Jiri Pirko
ebb9a03a59 switchdev: s/netdev_switch_/switchdev_/ and s/NETDEV_SWITCH_/SWITCHDEV_/
Turned out that "switchdev" sticks. So just unify all related terms to use
this prefix.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:52 -04:00
Eric Dumazet
b396cca6fa net: sched: deprecate enqueue_root()
Only left enqueue_root() user is netem, and it looks not necessary :

qdisc_skb_cb(skb)->pkt_len is preserved after one skb_clone()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 14:17:32 -04:00
Mahesh Bandewar
d22a5fc0c3 bonding: Implement user key part of port_key in an AD system.
The port key has three components - user-key, speed-part, and duplex-part.
The LSBit is for the duplex-part, next 5 bits are for the speed while the
remaining 10 bits are the user defined key bits. Get these 10 bits
from the user-space (through the SysFs interface) and use it to form the
admin port-key. Allowed range for the user-key is 0 - 1023 (10 bits). If
it is not provided then use zero for the user-key-bits (default).

It can set using following example code -

   # modprobe bonding mode=4
   # usr_port_key=$(( RANDOM & 0x3FF ))
   # echo $usr_port_key > /sys/class/net/bond0/bonding/ad_user_port_key
   # echo +eth1 > /sys/class/net/bond0/bonding/slaves
   ...
   # ip link set bond0 up

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
[jt: * fixed up style issues reported by checkpatch
     * fixed up context from change in ad_actor_sys_prio patch]
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:59:32 -04:00
Mahesh Bandewar
7451495755 bonding: Allow userspace to set actors' macaddr in an AD-system.
In an AD system, the communication between actor and partner is the
business between these two entities. In the current setup anyone on the
same L2 can "guess" the LACPDU contents and then possibly send the
spoofed LACPDUs and trick the partner causing connectivity issues for
the AD system. This patch allows to use a random mac-address obscuring
it's identity making it harder for someone in the L2 is do the same thing.

This patch allows user-space to choose the mac-address for the AD-system.
This mac-address can not be NULL or a Multicast. If the mac-address is set
from user-space; kernel will honor it and will not overwrite it. In the
absence (value from user space); the logic will default to using the
masters' mac as the mac-address for the AD-system.

It can be set using example code below -

   # modprobe bonding mode=4
   # sys_mac_addr=$(printf '%02x:%02x:%02x:%02x:%02x:%02x' \
                    $(( (RANDOM & 0xFE) | 0x02 )) \
                    $(( RANDOM & 0xFF )) \
                    $(( RANDOM & 0xFF )) \
                    $(( RANDOM & 0xFF )) \
                    $(( RANDOM & 0xFF )) \
                    $(( RANDOM & 0xFF )))
   # echo $sys_mac_addr > /sys/class/net/bond0/bonding/ad_actor_system
   # echo +eth1 > /sys/class/net/bond0/bonding/slaves
   ...
   # ip link set bond0 up

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
[jt: fixed up style issues reported by checkpatch]
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:59:32 -04:00
Mahesh Bandewar
6791e4661c bonding: Allow userspace to set actors' system_priority in AD system
This patch allows user to randomize the system-priority in an ad-system.
The allowed range is 1 - 0xFFFF while default value is 0xFFFF. If user
does not specify this value, the system defaults to 0xFFFF, which is
what it was before this patch.

Following example code could set the value -
    # modprobe bonding mode=4
    # sys_prio=$(( 1 + RANDOM + RANDOM ))
    # echo $sys_prio > /sys/class/net/bond0/bonding/ad_actor_sys_prio
    # echo +eth1 > /sys/class/net/bond0/bonding/slaves
    ...
    # ip link set bond0 up

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
[jt: * fixed up style issues reported by checkpatch
     * changed how the default value is set in bond_check_params(), this
       makes the default consistent between what gets set for a new bond
       and what the default is claimed to be in the bonding options.]
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:59:31 -04:00
Eric W. Biederman
affb9792f1 net: kill sk_change_net and sk_release_kernel
These functions are no longer needed and no longer used kill them.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:50:18 -04:00
Eric W. Biederman
26abe14379 net: Modify sk_alloc to not reference count the netns of kernel sockets.
Now that sk_alloc knows when a kernel socket is being allocated modify
it to not reference count the network namespace of kernel sockets.

Keep track of if a socket needs reference counting by adding a flag to
struct sock called sk_net_refcnt.

Update all of the callers of sock_create_kern to stop using
sk_change_net and sk_release_kernel as those hacks are no longer
needed, to avoid reference counting a kernel socket.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:50:18 -04:00
Eric W. Biederman
11aa9c28b4 net: Pass kern from net_proto_family.create to sk_alloc
In preparation for changing how struct net is refcounted
on kernel sockets pass the knowledge that we are creating
a kernel socket from sock_create_kern through to sk_alloc.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:50:17 -04:00
Eric Dumazet
80ba92fa1a codel: add ce_threshold attribute
For DCTCP or similar ECN based deployments on fabrics with shallow
buffers, hosts are responsible for a good part of the buffering.

This patch adds an optional ce_threshold to codel & fq_codel qdiscs,
so that DCTCP can have feedback from queuing in the host.

A DCTCP enabled egress port simply have a queue occupancy threshold
above which ECT packets get CE mark.

In codel language this translates to a sojourn time, so that one doesn't
have to worry about bytes or bandwidth but delays.

This makes the host an active participant in the health of the whole
network.

This also helps experimenting DCTCP in a setup without DCTCP compliant
fabric.

On following example, ce_threshold is set to 1ms, and we can see from
'ldelay xxx us' that TCP is not trying to go around the 5ms codel
target.

Queue has more capacity to absorb inelastic bursts (say from UDP
traffic), as queues are maintained to an optimal level.

lpaa23:~# ./tc -s -d qd sh dev eth1
qdisc mq 1: dev eth1 root
 Sent 87910654696 bytes 58065331 pkt (dropped 0, overlimits 0 requeues 42961)
 backlog 3108242b 364p requeues 42961
qdisc codel 8063: dev eth1 parent 1:1 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
 Sent 7363778701 bytes 4863809 pkt (dropped 0, overlimits 0 requeues 5503)
 rate 2348Mbit 193919pps backlog 255866b 46p requeues 5503
  count 0 lastcount 0 ldelay 1.0ms drop_next 0us
  maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 72384
qdisc codel 8064: dev eth1 parent 1:2 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
 Sent 7636486190 bytes 5043942 pkt (dropped 0, overlimits 0 requeues 5186)
 rate 2319Mbit 191538pps backlog 207418b 64p requeues 5186
  count 0 lastcount 0 ldelay 694us drop_next 0us
  maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 69873
qdisc codel 8065: dev eth1 parent 1:3 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
 Sent 11569360142 bytes 7641602 pkt (dropped 0, overlimits 0 requeues 5554)
 rate 3041Mbit 251096pps backlog 210446b 59p requeues 5554
  count 0 lastcount 0 ldelay 889us drop_next 0us
  maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 37780
...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-10 19:50:20 -04:00
Nicolas Dichtel
59324cf35a netlink: allow to listen "all" netns
More accurately, listen all netns that have a nsid assigned into the netns
where the netlink socket is opened.
For this purpose, a netlink socket option is added:
NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this
socket will receive netlink notifications from all netns that have a nsid
assigned into the netns where the socket has been opened. The nsid is sent
to userland via an anscillary data.

With this patch, a daemon needs only one socket to listen many netns. This
is useful when the number of netns is high.

Because 0 is a valid value for a nsid, the field nsid_is_set indicates if
the field nsid is valid or not. skb->cb is initialized to 0 on skb
allocation, thus we are sure that we will never send a nsid 0 by error to
the userland.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 22:15:31 -04:00
Nicolas Dichtel
7a0877d4b4 netns: rename peernet2id() to peernet2id_alloc()
In a following commit, a new function will be introduced to only lookup for
a nsid (no allocation if the nsid doesn't exist). To avoid confusion, the
existing function is renamed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 22:15:30 -04:00
David S. Miller
0e00a0f73f Lots of updates for net-next for this cycle. As usual, we have
a lot of small fixes and cleanups, the bigger items are:
  * proper mac80211 rate control locking, to fix some random crashes
    (this required changing other locking as well)
  * mac80211 "fast-xmit", a mechanism to reduce, in most cases, the
    amount of code we execute while going from ndo_start_xmit() to
    the driver
  * this also clears the way for properly supporting S/G and checksum
    and segmentation offloads
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJVSh65AAoJEDBSmw7B7bqrudwP/0iXyNQhF0mLTENrx+rdsDZS
 qQhB/8wejJaOJb89Re7M+bhwri7Q6S5BM/G24vhMc01dxmqNMcdKfEV3+nlmc5C+
 KeEgTI9aZiCnUt4WAd54Zwbkc9o+1kBtaFuaWDvOdQHUf0WDwEIQxjnV4+SZujV9
 xl1TV5yV35hRQgrDE8ZSbtOYRmhSVoi0MEgwqAjzdN2fEPyWVeqwYULDtpOopjL2
 UHQgv0E2fYVRWennHyQQ88tWBQg+EsRaG1U1/rYHhNBmAJ+f9AOxKi7ErzxYfkbM
 961B+3E++pM+zUeqw6+jaMKqT5jeCCM5ugCNSG4NrIvfxDIDgecAFV9Fs2islnI4
 8xd3GqyA5iqaitAWIUsaYaQfaAcwSIlpSinfQW9EUm2wuCkPyZboFP+GRd2K7sQn
 FnRJSJ9PkGPdWwdDE3gunLHBHtbDS0z+R8VegIeS0qT8LamkqICiNQSyPlsTeluW
 ig2kwHsDdj3k11wyelhfp/RdtsOch/brKpLSjdzPXC1BzIWhQLwmsPh9qZ83vSB9
 qbLsdnM/IPQXocWB6fOhmwaGsLeRalxs2yQFM0zdJCwpaU9dzKsJrxepAXVuq31p
 r0fygWTp8GVevHXzfS7fRya8xjsTRrSs6n2kH7ErOfiep13HQypAjbyLswNe4kW/
 D6x8pVC3AhdGkl/9CW4m
 =oUlh
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Lots of updates for net-next for this cycle. As usual, we have
a lot of small fixes and cleanups, the bigger items are:
 * proper mac80211 rate control locking, to fix some random crashes
   (this required changing other locking as well)
 * mac80211 "fast-xmit", a mechanism to reduce, in most cases, the
   amount of code we execute while going from ndo_start_xmit() to
   the driver
 * this also clears the way for properly supporting S/G and checksum
   and segmentation offloads
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 17:27:25 -04:00
Eric Dumazet
e520af48c7 tcp: add TCPWinProbe and TCPKeepAlive SNMP counters
Diagnosing problems related to Window Probes has been hard because
we lack a counter.

TCPWinProbe counts the number of ACK packets a sender has to send
at regular intervals to make sure a reverse ACK packet opening back
a window had not been lost.

TCPKeepAlive counts the number of ACK packets sent to keep TCP
flows alive (SO_KEEPALIVE)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 16:42:32 -04:00
Eric Dumazet
21c8fe9915 tcp: adjust window probe timers to safer values
With the advent of small rto timers in datacenter TCP,
(ip route ... rto_min x), the following can happen :

1) Qdisc is full, transmit fails.

   TCP sets a timer based on icsk_rto to retry the transmit, without
   exponential backoff.
   With low icsk_rto, and lot of sockets, all cpus are servicing timer
   interrupts like crazy.
   Intent of the code was to retry with a timer between 200 (TCP_RTO_MIN)
   and 500ms (TCP_RESOURCE_PROBE_INTERVAL)

2) Receivers can send zero windows if they don't drain their receive queue.

   TCP sends zero window probes, based on icsk_rto current value, with
   exponential backoff.
   With /proc/sys/net/ipv4/tcp_retries2 being 15 (or even smaller in
   some cases), sender can abort in less than one or two minutes !
   If receiver stops the sender, it obviously doesn't care of very tight
   rto. Probability of dropping the ACK reopening the window is not
   worth the risk.

Lets change the base timer to be at least 200ms (TCP_RTO_MIN) for these
events (but not normal RTO based retransmits)

A followup patch adds a new SNMP counter, as it would have helped a lot
diagnosing this issue.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 16:42:32 -04:00
Arik Nemtsov
06f207fc54 cfg80211: change GO_CONCURRENT to IR_CONCURRENT for STA
The GO_CONCURRENT regulatory definition can be extended to station
interfaces requesting to IR as part of TDLS off-channel operations.
Rename the GO_CONCURRENT flag to IR_CONCURRENT and allow the added
use-case.

Change internal users of GO_CONCURRENT to use the new definition.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-06 15:50:02 +02:00
Johannes Berg
a31cf1c69e mac80211: extend get_key() to return PN for all ciphers
For ciphers not supported by mac80211, the function currently
doesn't return any PN data. Fix this by extending the driver's
get_key_seq() a little more to allow moving arbitrary PN data.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-06 13:30:00 +02:00
Johannes Berg
9352c19f63 mac80211: extend get_tkip_seq to all keys
Extend the function to read the TKIP IV32/IV16 to read the IV/PN for
all ciphers in order to allow drivers with full hardware crypto to
properly support this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-06 13:29:59 +02:00
Eric Dumazet
cd8ae85299 tcp: provide SYN headers for passive connections
This patch allows a server application to get the TCP SYN headers for
its passive connections.  This is useful if the server is doing
fingerprinting of clients based on SYN packet contents.

Two socket options are added: TCP_SAVE_SYN and TCP_SAVED_SYN.

The first is used on a socket to enable saving the SYN headers
for child connections. This can be set before or after the listen()
call.

The latter is used to retrieve the SYN headers for passive connections,
if the parent listener has enabled TCP_SAVE_SYN.

TCP_SAVED_SYN is read once, it frees the saved SYN headers.

The data returned in TCP_SAVED_SYN are network (IPv4/IPv6) and TCP
headers.

Original patch was written by Tom Herbert, I changed it to not hold
a full skb (and associated dst and conntracking reference).

We have used such patch for about 3 years at Google.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05 16:02:34 -04:00
Johannes Berg
f5c4ae0799 mac80211: make LED trigger names const
This is just a code cleanup, make the LED trigger names const
as they're not expected to be modified by drivers.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-05 14:21:55 +02:00
David S. Miller
b7ba7b469a We have only a few fixes right now:
* a fix for an issue with hash collision handling in the
    rhashtable conversion
  * a merge issue - rhashtable removed default shrinking
    just before mac80211 was converted, so enable it now
  * remove an invalid WARN that can trigger with legitimate
    userspace behaviour
  * add a struct member missing from kernel-doc that caused
    a lot of warnings
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJVR1FIAAoJEDBSmw7B7bqrbkQP/RPleNMLHNOLRoJva6vNVers
 hiyLwG+lpV9yHCUX1I7TsvyTHddjW3ANNC1LvlKyUiNzBBsr/Yg/Qe1RVGkRTK7E
 ZPTU7xutk2nSLjvbVLjTQtIAynKZM6W9vM3X16WtsZ9+25z4e8jN8BjhzP99Vs7X
 IkwL7Qe2q1O/ta+/ByVR4dFWtGyJDxRjXkFs4cQ/VdL/fJalKxIa9YJrmUwIBMOU
 UIuJ1lh7kkNFGky+4Y3dBA/2L+O/iS64dxe+ygLMJGf9xdGGHwyuM6oe6OKEKPi1
 pF0NIWgHRGOUEJNyUKZTOwCyk1PeOr9aFFDHN4GJyE463KzZqVYdA4ajI6udwL2h
 I/AjsBnM0zQcoBD8HUeJK2ZgndPxOrMZGVjs0/M3K9lFocYBALoLkt5YPD1PXOZs
 9EQ8mhej3/xbwyEaaAp+HJPOr/wLZ//juxsPP8HSR1BDm8cR40rd9rETEGWnmrxt
 5vpmb2vNpKekw+pKf2vwuEH6BwH02nuz4YPudTswiEzObEQAuGx+rpopQVI9bdgE
 V/r3WY0cl+bDMrcSpoZ7v6FDoHHA/zPq9aAAV0JsMyYApY0nMdqwShBJq5cTSoK9
 rK5I6oIbqChskpoWULUVbbjb1fLsNVI4C89KjxW8xziTEY1d2fEeeJ2tR0lYBX/5
 /KOygSze3Qi0MjrdE9D2
 =sf/A
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2015-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
We have only a few fixes right now:
 * a fix for an issue with hash collision handling in the
   rhashtable conversion
 * a merge issue - rhashtable removed default shrinking
   just before mac80211 was converted, so enable it now
 * remove an invalid WARN that can trigger with legitimate
   userspace behaviour
 * add a struct member missing from kernel-doc that caused
   a lot of warnings
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 16:00:55 -04:00
David S. Miller
73e84313ee Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-05-04

Here's the first bluetooth-next pull request for 4.2:

 - Various fixes for at86rf230 driver
 - ieee802154: trace events support for rdev->ops
 - HCI UART driver refactoring
 - New Realtek IDs added to btusb driver
 - Off-by-one fix for rtl8723b in btusb driver
 - Refactoring of btbcm driver for both UART & USB use

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 15:36:07 -04:00
Linus Lüssing
9afd85c9e4 net: Export IGMP/MLD message validation code
With this patch, the IGMP and MLD message validation functions are moved
from the bridge code to IPv4/IPv6 multicast files. Some small
refactoring was done to enhance readibility and to iron out some
differences in behaviour between the IGMP and MLD parsing code (e.g. the
skb-cloning of MLD messages is now only done if necessary, just like the
IGMP part always did).

Finally, these IGMP and MLD message validation functions are exported so
that not only the bridge can use it but batman-adv later, too.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 14:49:23 -04:00
Randy Dunlap
ff419b3f95 mac80211: fix 90 kernel-doc warnings
Eliminate 90 of these warnings:

Warning(..//include/net/mac80211.h:1682): No description found for parameter 'drv_priv[0]'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-04 12:56:13 +02:00
Tom Herbert
2f59e1ebaa net: Add flow_keys digest
Some users of flow keys (well just sch_choke now) need to pass
flow_keys in skbuff cb, and use them for exact comparisons of flows
so that skb->hash is not sufficient. In order to increase size of
the flow_keys structure, we introduce another structure for
the purpose of passing flow keys in skbuff cb. We limit this structure
to sixteen bytes, and we will technically treat this as a digest of
flow_keys struct hence its name flow_keys_digest. In the first
incaranation we just copy the flow_keys structure up to 16 bytes--
this is the same information previously passed in the cb. In the
future, we'll adapt this for larger flow_keys and could use something
like SHA-1 over the whole flow_keys to improve the quality of the
digest.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 00:09:09 -04:00
Eric Dumazet
a5d2809040 codel: fix maxpacket/mtu confusion
Under presence of TSO/GSO/GRO packets, codel at low rates can be quite
useless. In following example, not a single packet was ever dropped,
while average delay in codel queue is ~100 ms !

qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
 Sent 134376498 bytes 88797 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 13626b 3p requeues 0
  count 0 lastcount 0 ldelay 96.9ms drop_next 0us
  maxpacket 9084 ecn_mark 0 drop_overlimit 0

This comes from a confusion of what should be the minimal backlog. It is
pretty clear it is not 64KB or whatever max GSO packet ever reached the
qdisc.

codel intent was to use MTU of the device.

After the fix, we finally drop some packets, and rtt/cwnd of my single
TCP flow are meeting our expectations.

qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
 Sent 102798497 bytes 67912 pkt (dropped 1365, overlimits 0 requeues 0)
 backlog 6056b 3p requeues 0
  count 1 lastcount 1 ldelay 36.3ms drop_next 0us
  maxpacket 10598 ecn_mark 0 drop_overlimit 0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Van Jacobson <vanj@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03 22:17:40 -04:00
Tom Herbert
82a584b7cd ipv6: Flow label state ranges
This patch divides the IPv6 flow label space into two ranges:
0-7ffff is reserved for flow label manager, 80000-fffff will be
used for creating auto flow labels (per RFC6438). This only affects how
labels are set on transmit, it does not affect receive. This range split
can be disbaled by systcl.

Background:

IPv6 flow labels have been an unmitigated disappointment thus far
in the lifetime of IPv6. Support in HW devices to use them for ECMP
is lacking, and OSes don't turn them on by default. If we had these
we could get much better hashing in IPv6 networks without resorting
to DPI, possibly eliminating some of the motivations to to define new
encaps in UDP just for getting ECMP.

Unfortunately, the initial specfications of IPv6 did not clarify
how they are to be used. There has always been a vague concept that
these can be used for ECMP, flow hashing, etc. and we do now have a
good standard how to this in RFC6438. The problem is that flow labels
can be either stateful or stateless (as in RFC6438), and we are
presented with the possibility that a stateless label may collide
with a stateful one.  Attempts to split the flow label space were
rejected in IETF. When we added support in Linux for RFC6438, we
could not turn on flow labels by default due to this conflict.

This patch splits the flow label space and should give us
a path to enabling auto flow labels by default for all IPv6 packets.
This is an API change so we need to consider compatibility with
existing deployment. The stateful range is chosen to be the lower
values in hopes that most uses would have chosen small numbers.

Once we resolve the stateless/stateful issue, we can proceed to
look at enabling RFC6438 flow labels by default (starting with
scaled testing).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03 21:58:01 -04:00
Florian Westphal
4749c3ef85 net: sched: remove TC_MUNGED bits
Not used.

pedit sets TC_MUNGED when packet content was altered, but all the core
does is unset MUNGED again and then set OK2MUNGE.

And the latter isn't tested anywhere. So lets remove both
TC_MUNGED and TC_OK2MUNGE.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-02 22:25:17 -04:00
Martin KaFai Lau
afc4eef80c ipv6: Remove DST_METRICS_FORCE_OVERWRITE and _rt6i_peer
_rt6i_peer is no longer needed after the last patch,
'ipv6: Stop rt6_info from using inet_peer's metrics'.

DST_METRICS_FORCE_OVERWRITE is added by
commit e5fd387ad5 ("ipv6: do not overwrite inetpeer metrics prematurely").
Since inetpeer is no longer used for metrics, this bit is also not needed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-01 20:57:06 -04:00
Martin KaFai Lau
4b32b5ad31 ipv6: Stop rt6_info from using inet_peer's metrics
inet_peer is indexed by the dst address alone.  However, the fib6 tree
could have multiple routing entries (rt6_info) for the same dst. For
example,
1. A /128 dst via multiple gateways.
2. A RTF_CACHE route cloned from a /128 route.

In the above cases, all of them will share the same metrics and
step on each other.

This patch will steer away from inet_peer's metrics and use
dst_cow_metrics_generic() for everything.

Change Highlights:
1. Remove rt6_cow_metrics() which currently acquires metrics from
   inet_peer for DST_HOST route (i.e. /128 route).
2. Add rt6i_pmtu to take care of the pmtu update to avoid creating a
   full size metrics just to override the RTAX_MTU.
3. After (2), the RTF_CACHE route can also share the metrics with its
   dst.from route, by:
   dst_init_metrics(&cache_rt->dst, dst_metrics_ptr(cache_rt->dst.from), true);
4. Stop creating RTF_CACHE route by cloning another RTF_CACHE route.  Instead,
   directly clone from rt->dst.

   [ Currently, cloning from another RTF_CACHE is only possible during
     rt6_do_redirect().  Also, the old clone is removed from the tree
     immediately after the new clone is added. ]

   In case of cloning from an older redirect RTF_CACHE, it should work as
   before.

   In case of cloning from an older pmtu RTF_CACHE, this patch will forget
   the pmtu and re-learn it (if there is any) from the redirected route.

The _rt6i_peer and DST_METRICS_FORCE_OVERWRITE will be removed
in the next cleanup patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-01 20:57:06 -04:00
Varka Bhadram
5b4a103904 cfg802154: pass name_assign_type to rdev_add_virtual_intf()
This code is based on commit 6bab2e19c5
("cfg80211: pass name_assign_type to rdev_add_virtual_intf()")

This will expose in sysfs whether the ifname of a IEEE-802.15.4
device is set by userspace or generated by the kernel.
We are using two types of name_assign_types
 o NET_NAME_ENUM: Default interface name provided by kernel
 o NET_NAME_USER: Interface name provided by user.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-30 18:48:10 +02:00
Varka Bhadram
42fb23e2f5 mac802154: add description to mac802154 APIs
This patch adds the proper description to the mac802154 core APIs.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-30 18:48:09 +02:00
Eric Dumazet
64f40ff5bb tcp: prepare CC get_info() access from getsockopt()
We would like that optional info provided by Congestion Control
modules using netlink can also be read using getsockopt()

This patch changes get_info() to put this information in a buffer,
instead of skb, like tcp_get_info(), so that following patch
can reuse this common infrastructure.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29 17:10:38 -04:00
Eric Dumazet
0df48c26d8 tcp: add tcpi_bytes_acked to tcp_info
This patch tracks total number of bytes acked for a TCP socket.
This is the sum of all changes done to tp->snd_una, and allows
for precise tracking of delivered data.

RFC4898 named this : tcpEStatsAppHCThruOctetsAcked

This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->bytes_acked was placed near tp->snd_una for
best data locality and minimal performance impact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29 17:10:37 -04:00
Matan Barak
73b5a6f2a7 net/bonding: Make DRV macros private
The bonding modules currently defines four macros with
general names that pollute the global namespace:
DRV_VERSION
DRV_RELDATE
DRV_NAME
DRV_DESCRIPTION

Fixing that by defining a private bonding_priv.h
header files which includes those defines.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-26 22:59:53 -04:00
Eric Dumazet
b357a364c5 inet: fix possible panic in reqsk_queue_unlink()
[ 3897.923145] BUG: unable to handle kernel NULL pointer dereference at
 0000000000000080
[ 3897.931025] IP: [<ffffffffa9f27686>] reqsk_timer_handler+0x1a6/0x243

There is a race when reqsk_timer_handler() and tcp_check_req() call
inet_csk_reqsk_queue_unlink() on the same req at the same time.

Before commit fa76ce7328 ("inet: get rid of central tcp/dccp listener
timer"), listener spinlock was held and race could not happen.

To solve this bug, we change reqsk_queue_unlink() to not assume req
must be found, and we return a status, to conditionally release a
refcount on the request sock.

This also means tcp_check_req() in non fastopen case might or not
consume req refcount, so tcp_v6_hnd_req() & tcp_v4_hnd_req() have
to properly handle this.

(Same remark for dccp_check_req() and its callers)

inet_csk_reqsk_queue_drop() is now too big to be inlined, as it is
called 4 times in tcp and 3 times in dccp.

Fixes: fa76ce7328 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-24 11:39:15 -04:00
Emmanuel Grumbach
b497de63ad mac80211: notify the driver on reordering buffer timeout
When frames time out in the reordering buffer, it is a
good indication that something went wrong and the driver
may want to know about that to take action or trigger
debug flows.
It is pointless to notify the driver about each frame that
is released. Notify each time the timer fires.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-24 12:25:01 +02:00
Emmanuel Grumbach
6382246e89 mac80211: notify the driver upon BAR Rx
When we receive a BAR, this typically means that our peer
doesn't hear our Block-Acks or that we can't hear its
frames. Either way, it is a good indication that the link
is in a bad condition. This is why it can serve as a probe
to the driver.
Use the event_callback callback for this.
Since more events with the same data will be added in the
feature, the structure that describes the data attached to
the event is called in a generic name: ieee80211_ba_event.

This also means that from now on, the event_callback can't
sleep.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-24 12:25:00 +02:00
Johannes Berg
df1404650c mac80211: remove support for IFF_PROMISC
This support is essentially useless as typically networks are encrypted,
frames will be filtered by hardware, and rate scaling will be done with
the intended recipient in mind. For real monitoring of the network, the
monitor mode support should be used instead.

Removing it removes a lot of corner cases.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-24 11:14:13 +02:00
Johannes Berg
680a0daba7 mac80211: allow drivers to support S/G
If drivers want to support S/G (really just gather DMA on TX) then
we can now easily support this on the fast-xmit path since it just
needs to write to the ethernet header (and already has a check for
that being possible.)

However, disallow this on the regular TX path (which has to handle
fragmentation, software crypto, etc.) by calling skb_linearize().

Also allow the related HIGHDMA since that's not interesting to the
code in mac80211 at all anyway.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-22 11:25:30 +02:00
Johannes Berg
17c18bf880 mac80211: add TX fastpath
In order to speed up mac80211's TX path, add the "fast-xmit" cache
that will cache the data frame 802.11 header and other data to be
able to build the frame more quickly. This cache is rebuilt when
external triggers imply changes, but a lot of the checks done per
packet today are simplified away to the check for the cache.

There's also a more detailed description in the code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-22 10:02:25 +02:00
Jonathan Corbet
a839e463e8 mac80211: Fix mac80211.h docbook comments
A couple of enums in mac80211.h became structures recently, but the
comments didn't follow suit, leading to errors like:

  Error(.//include/net/mac80211.h:367): Cannot parse enum!
  Documentation/DocBook/Makefile:93: recipe for target 'Documentation/DocBook/80211.xml' failed
  make[1]: *** [Documentation/DocBook/80211.xml] Error 1
  Makefile:1361: recipe for target 'mandocs' failed
  make: *** [mandocs] Error 2

Fix the comments comments accordingly.  Added a couple of other small
comment fixes while I was there to silence other recently-added docbook
warnings.

Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-20 13:05:30 +02:00
Linus Torvalds
388f997620 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix verifier memory corruption and other bugs in BPF layer, from
    Alexei Starovoitov.

 2) Add a conservative fix for doing BPF properly in the BPF classifier
    of the packet scheduler on ingress.  Also from Alexei.

 3) The SKB scrubber should not clear out the packet MARK and security
    label, from Herbert Xu.

 4) Fix oops on rmmod in stmmac driver, from Bryan O'Donoghue.

 5) Pause handling is not correct in the stmmac driver because it
    doesn't take into consideration the RX and TX fifo sizes.  From
    Vince Bridgers.

 6) Failure path missing unlock in FOU driver, from Wang Cong.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  net: dsa: use DEVICE_ATTR_RW to declare temp1_max
  netns: remove BUG_ONs from net_generic()
  IB/ipoib: Fix ndo_get_iflink
  sfc: Fix memcpy() with const destination compiler warning.
  altera tse: Fix network-delays and -retransmissions after high throughput.
  net: remove unused 'dev' argument from netif_needs_gso()
  act_mirred: Fix bogus header when redirecting from VLAN
  inet_diag: fix access to tcp cc information
  tcp: tcp_get_info() should fetch socket fields once
  net: dsa: mv88e6xxx: Add missing initialization in mv88e6xxx_set_port_state()
  skbuff: Do not scrub skb mark within the same name space
  Revert "net: Reset secmark when scrubbing packet"
  bpf: fix two bugs in verification logic when accessing 'ctx' pointer
  bpf: fix bpf helpers to use skb->mac_header relative offsets
  stmmac: Configure Flow Control to work correctly based on rxfifo size
  stmmac: Enable unicast pause frame detect in GMAC Register 6
  stmmac: Read tx-fifo-depth and rx-fifo-depth from the devicetree
  stmmac: Add defines and documentation for enabling flow control
  stmmac: Add properties for transmit and receive fifo sizes
  stmmac: fix oops on rmmod after assigning ip addr
  ...
2015-04-17 16:31:08 -04:00
Denys Vlasenko
2591ffd308 netns: remove BUG_ONs from net_generic()
This inline has ~500 callsites.

On 04/14/2015 08:37 PM, David Miller wrote:
> That BUG_ON() was added 7 years ago, and I don't remember it ever
> triggering or helping us diagnose something, so just remove it and
> keep the function inlined.

On x86 allyesconfig build:

    text     data      bss       dec     hex filename
82447071 22255384 20627456 125329911 77861f7 vmlinux4
82441375 22255384 20627456 125324215 7784bb7 vmlinux5prime

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: David S. Miller <davem@davemloft.net>
CC: Jan Engelhardt <jengelh@medozas.de>
CC: Jiri Pirko <jpirko@redhat.com>
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-17 15:21:48 -04:00
Eric Dumazet
521f1cf1db inet_diag: fix access to tcp cc information
Two different problems are fixed here :

1) inet_sk_diag_fill() might be called without socket lock held.
   icsk->icsk_ca_ops can change under us and module be unloaded.
   -> Access to freed memory.
   Fix this using rcu_read_lock() to prevent module unload.

2) Some TCP Congestion Control modules provide information
   but again this is not safe against icsk->icsk_ca_ops
   change and nla_put() errors were ignored. Some sockets
   could not get the additional info if skb was almost full.

Fix this by returning a status from get_info() handlers and
using rcu protection as well.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-17 13:28:31 -04:00
Linus Torvalds
fa927894bb Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second vfs update from Al Viro:
 "Now that net-next went in...  Here's the next big chunk - killing
  ->aio_read() and ->aio_write().

  There'll be one more pile today (direct_IO changes and
  generic_write_checks() cleanups/fixes), but I'd prefer to keep that
  one separate"

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
  ->aio_read and ->aio_write removed
  pcm: another weird API abuse
  infinibad: weird APIs switched to ->write_iter()
  kill do_sync_read/do_sync_write
  fuse: use iov_iter_get_pages() for non-splice path
  fuse: switch to ->read_iter/->write_iter
  switch drivers/char/mem.c to ->read_iter/->write_iter
  make new_sync_{read,write}() static
  coredump: accept any write method
  switch /dev/loop to vfs_iter_write()
  serial2002: switch to __vfs_read/__vfs_write
  ashmem: use __vfs_read()
  export __vfs_read()
  autofs: switch to __vfs_write()
  new helper: __vfs_write()
  switch hugetlbfs to ->read_iter()
  coda: switch to ->read_iter/->write_iter
  ncpfs: switch to ->read_iter/->write_iter
  net/9p: remove (now-)unused helpers
  p9_client_attach(): set fid->uid correctly
  ...
2015-04-15 13:22:56 -07:00
David S. Miller
bae97d8410 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

A final pull request, I know it's very late but this time I think it's worth a
bit of rush.

The following patchset contains Netfilter/nf_tables updates for net-next, more
specifically concatenation support and dynamic stateful expression
instantiation.

This also comes with a couple of small patches. One to fix the ebtables.h
userspace header and another to get rid of an obsolete example file in tree
that describes a nf_tables expression.

This time, I decided to paste the original descriptions. This will result in a
rather large commit description, but I think these bytes to keep.

Patrick McHardy says:

====================
netfilter: nf_tables: concatenation support

The following patches add support for concatenations, which allow multi
dimensional exact matches in O(1).

The basic idea is to split the data registers, currently consisting of
4 registers of 16 bytes each, into smaller units, 16 registers of 4
bytes each, and making sure each register store always leaves the
full 32 bit in a well defined state, meaning smaller stores will
zero the remaining bits.

Based on that, we can load multiple adjacent registers with different
values, thereby building a concatenated bigger value, and use that
value for set lookups.

Sets are changed to use variable sized extensions for their key and
data values, removing the fixed limit of 16 bytes while saving memory
if less space is needed.

As a side effect, these patches will allow some nice optimizations in
the future, like using jhash2 in nft_hash, removing the masking in
nft_cmp_fast, optimized data comparison using 32 bit word size etc.
These are not done so far however.

The patches are split up as follows:

 * the first five patches add length validation to register loads and
   stores to make sure we stay within bounds and prepare the validation
   functions for the new addressing mode

 * the next patches prepare for changing to 32 bit addressing by
   introducing a struct nft_regs, which holds the verdict register as
   well as the data registers. The verdict members are moved to a new
   struct nft_verdict to allow to pull struct nft_data out of the stack.

 * the next patches contain preparatory conversions of expressions and
   sets to use 32 bit addressing

 * the next patch introduces so far unused register conversion helpers
   for parsing and dumping register numbers over netlink

 * following is the real conversion to 32 bit addressing, consisting of
   replacing struct nft_data in struct nft_regs by an array of u32s and
   actually translating and validating the new register numbers.

 * the final two patches add support for variable sized data items and
   variable sized keys / data in set elements

The patches have been verified to work correctly with nft binaries using
both old and new addressing.
====================

Patrick McHardy says:

====================
netfilter: nf_tables: dynamic stateful expression instantiation

The following patches are the grand finale of my nf_tables set work,
using all the building blocks put in place by the previous patches
to support something like iptables hashlimit, but a lot more powerful.

Sets are extended to allow attaching expressions to set elements.
The dynset expression dynamically instantiates these expressions
based on a template when creating new set elements and evaluates
them for all new or updated set members.

In combination with concatenations this effectively creates state
tables for arbitrary combinations of keys, using the existing
expression types to maintain that state. Regular set GC takes care
of purging expired states.

We currently support two different stateful expressions, counter
and limit. Using limit as a template we can express the functionality
of hashlimit, but completely unrestricted in the combination of keys.
Using counter we can perform accounting for arbitrary flows.

The following examples from patch 5/5 show some possibilities.
Userspace syntax is still WIP, especially the listing of state
tables will most likely be seperated from normal set listings
and use a more structured format:

1. Limit the rate of new SSH connections per host, similar to iptables
   hashlimit:

        flow ip saddr timeout 60s \
        limit 10/second \
        accept

2. Account network traffic between each set of /24 networks:

        flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \
        counter

3. Account traffic to each host per user:

        flow skuid . ip daddr \
        counter

4. Account traffic for each combination of source address and TCP flags:

        flow ip saddr . tcp flags \
        counter

The resulting set content after a Xmas-scan look like this:

{
        192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040,
        192.168.122.1 . ack : counter packets 74 bytes 3848,
        192.168.122.1 . psh | ack : counter packets 35 bytes 3144
}

In the future the "expressions attached to elements" will be extended
to also support user created non-stateful expressions to allow to
efficiently select beween a set of parameter sets, f.i. a set of log
statements with different prefixes based on the interface, which currently
require one rule each. This will most likely have to wait until the next
kernel version though.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 18:51:19 -04:00
David S. Miller
6e8a9d9148 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Al Viro says:

====================
netdev-related stuff in vfs.git

There are several commits sitting in vfs.git that probably ought to go in
via net-next.git.  First of all, there's merge with vfs.git#iocb - that's
Christoph's aio rework, which has triggered conflicts with the ->sendmsg()
and ->recvmsg() patches a while ago.  It's not so much Christoph's stuff
that ought to be in net-next, as (pretty simple) conflict resolution on merge.
The next chunk is switch to {compat_,}import_iovec/import_single_range - new
safer primitives for initializing iov_iter.  The primitives themselves come
from vfs/git#iov_iter (and they are used quite a lot in vfs part of queue),
conversion of net/socket.c syscalls belongs in net-next, IMO.  Next there's
afs and rxrpc stuff from dhowells.  And then there's sanitizing kernel_sendmsg
et.al.  + missing inlined helper for "how much data is left in msg->msg_iter" -
this stuff is used in e.g.  cifs stuff, but it belongs in net-next.

That pile is pullable from
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-davem

I'll post the individual patches in there in followups; could you take a look
and tell if everything in there is OK with you?
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13 18:18:05 -04:00
Eric Dumazet
789f558cfb tcp/dccp: get rid of central timewait timer
Using a timer wheel for timewait sockets was nice ~15 years ago when
memory was expensive and machines had a single processor.

This does not scale, code is ugly and source of huge latencies
(Typically 30 ms have been seen, cpus spinning on death_lock spinlock.)

We can afford to use an extra 64 bytes per timewait sock and spread
timewait load to all cpus to have better behavior.

Tested:

On following test, /proc/sys/net/ipv4/tcp_tw_recycle is set to 1
on the target (lpaa24)

Before patch :

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
419594

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
437171

While test is running, we can observe 25 or even 33 ms latencies.

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms
rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms
rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2

After patch :

About 90% increase of throughput :

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
810442

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
800992

And latencies are kept to minimal values during this load, even
if network utilization is 90% higher :

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms
rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-13 16:40:05 -04:00
Patrick McHardy
151d799a61 netfilter: nf_tables: mark stateful expressions
Add a flag to mark stateful expressions.

This is used for dynamic expression instanstiation to limit the usable
expressions. Strictly speaking only the dynset expression can not be
used in order to avoid recursion, but since dynamically instantiating
non-stateful expressions will simply create an identical copy, which
behaves no differently than the original, this limits to expressions
where it actually makes sense to dynamically instantiate them.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 20:12:31 +02:00
Patrick McHardy
f25ad2e907 netfilter: nf_tables: prepare for expressions associated to set elements
Preparation to attach expressions to set elements: add a set extension
type to hold an expression and dump the expression information with the
set element.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 20:12:31 +02:00
Patrick McHardy
0b2d8a7b63 netfilter: nf_tables: add helper functions for expression handling
Add helper functions for initializing, cloning, dumping and destroying
a single expression that is not part of a rule.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 20:12:31 +02:00
Patrick McHardy
7d7402642e netfilter: nf_tables: variable sized set element keys / data
This patch changes sets to support variable sized set element keys / data
up to 64 bytes each by using variable sized set extensions. This allows
to use concatenations with bigger data items suchs as IPv6 addresses.

As a side effect, small keys/data now don't require the full 16 bytes
of struct nft_data anymore but just the space they need.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:31 +02:00
Patrick McHardy
d0a11fc3dc netfilter: nf_tables: support variable sized data in nft_data_init()
Add a size argument to nft_data_init() and pass in the available space.
This will be used by the following patches to support variable sized
set element data.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:30 +02:00
Patrick McHardy
49499c3e6e netfilter: nf_tables: switch registers to 32 bit addressing
Switch the nf_tables registers from 128 bit addressing to 32 bit
addressing to support so called concatenations, where multiple values
can be concatenated over multiple registers for O(1) exact matches of
multiple dimensions using sets.

The old register values are mapped to areas of 128 bits for compatibility.
When dumping register numbers, values are expressed using the old values
if they refer to the beginning of a 128 bit area for compatibility.

To support concatenations, register loads of less than a full 32 bit
value need to be padded. This mainly affects the payload and exthdr
expressions, which both unconditionally zero the last word before
copying the data.

Userspace fully passes the testsuite using both old and new register
addressing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:29 +02:00
Patrick McHardy
b1c96ed37c netfilter: nf_tables: add register parsing/dumping helpers
Add helper functions to parse and dump register values in netlink attributes.
These helpers will later be changed to take care of translation between the
old 128 bit and the new 32 bit register numbers.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:28 +02:00
Patrick McHardy
8cd8937ac0 netfilter: nf_tables: convert sets to u32 data pointers
Simple conversion to use u32 pointers to the beginning of the data
area to keep follow up patches smaller.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:27 +02:00
Patrick McHardy
e562d860d7 netfilter: nf_tables: kill nft_data_cmp()
Only needlessly complicates things due to requiring specific argument
types. Use memcmp directly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:26 +02:00
Patrick McHardy
1ca2e1702c netfilter: nf_tables: use struct nft_verdict within struct nft_data
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:24 +02:00
Patrick McHardy
a55e22e92f netfilter: nf_tables: get rid of NFT_REG_VERDICT usage
Replace the array of registers passed to expressions by a struct nft_regs,
containing the verdict as a seperate member, which aliases to the
NFT_REG_VERDICT register.

This is needed to seperate the verdict from the data registers completely,
so their size can be changed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:07 +02:00
Patrick McHardy
d07db9884a netfilter: nf_tables: introduce nft_validate_register_load()
Change nft_validate_input_register() to not only validate the input
register number, but also the length of the load, and rename it to
nft_validate_register_load() to reflect that change.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 16:25:50 +02:00
Patrick McHardy
27e6d2017a netfilter: nf_tables: kill nft_validate_output_register()
All users of nft_validate_register_store() first invoke
nft_validate_output_register(). There is in fact no use for using it
on its own, so simplify the code by folding the functionality into
nft_validate_register_store() and kill it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 16:25:50 +02:00
Patrick McHardy
1ec10212f9 netfilter: nf_tables: rename nft_validate_data_load()
The existing name is ambiguous, data is loaded as well when we read from
a register. Rename to nft_validate_register_store() for clarity and
consistency with the upcoming patch to introduce its counterpart,
nft_validate_register_load().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 16:25:49 +02:00
Patrick McHardy
45d9bcda21 netfilter: nf_tables: validate len in nft_validate_data_load()
For values spanning multiple registers, we need to validate that enough
space is available from the destination register onwards. Add a len
argument to nft_validate_data_load() and consolidate the existing length
validations in preparation of that.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 16:25:49 +02:00
David S. Miller
4e78eb0dbf There isn't much left, but we have
* new mac80211 internal software queue to allow drivers to have
    shorter hardware queues and pull on-demand
  * use rhashtable for mac80211 station table
  * minstrel rate control debug improvements and some refactoring
  * fix noisy message about TX power reduction
  * fix continuous message printing and activity if CRDA doesn't respond
  * fix VHT-related capabilities with "iw connect" or "iwconfig ..."
  * fix Kconfig for cfg80211 wireless extensions compatibility
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJVJ7CPAAoJEDBSmw7B7bqr8+IQAKCAbUyd6PFRT5tcz9kW5GCW
 /ibb+n1e14yWKgNEe1gddUGKG/L3HGCBXNkCYzR2M8mlL7dLPqspaBcGHS4dx8F4
 D0AuikqvtXIxfAXi0zmU2uo7rH7u2X34R2LtS8AlKByD+jmFvxMiPPvxNFgzJu/7
 63UQm73p2pnu/KdXLW1OQEcpZtZJ9+N/uBiq9zbVdX3A8T84ME0oMyy+EAQqCZdK
 CcsTXHCnAgmmXWJlu1JRdopr1bd38mSGB70eXduFtPqDdmtQRnoaCQ9e+tJDA4j4
 svEw0yDmsc4WG1EKLKKCRd3uFOZsng+lcXrHfpm5wlSPpCOItfQ9BzT3x1u6Y5JU
 Z1WMOMkkEce+95U7/RLoXwC/2RS3XelUXTde4cGIRMvO5drOrU58P0gdn3J+yKbv
 6v+2GGKy/39tdXUOxIl3EZT/huIl+h1UNO8C2hyaEwdXK+X1zl31/u6kk1Ns18Wr
 YPEJixxHx0zR8jaZgDC7OlWLuqn4Ay+Ls9yCyIesdHzKpizJKqn83PntYnpJmxoA
 9hlIyRDWnqH44KxzB85ni1C2Qudec3mcCWIWV7M+UoSC1Cgs/LxDzH7kRejR2ZIl
 vRhg5pqyr53L0h2lq5DO4Cj4UzbXb7YioKJRxjyKloNOlRrCZtK/VEsHbdsKEcIp
 d/wHj1AyFZeQfuhk8Qqr
 =mtuo
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
There isn't much left, but we have
 * new mac80211 internal software queue to allow drivers to have
   shorter hardware queues and pull on-demand
 * use rhashtable for mac80211 station table
 * minstrel rate control debug improvements and some refactoring
 * fix noisy message about TX power reduction
 * fix continuous message printing and activity if CRDA doesn't respond
 * fix VHT-related capabilities with "iw connect" or "iwconfig ..."
 * fix Kconfig for cfg80211 wireless extensions compatibility
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-12 20:43:46 -04:00
Al Viro
e1200fe68f 9p: switch p9_client_read() to passing struct iov_iter *
... and make it loop

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:28:27 -04:00
Al Viro
070b3656cf 9p: switch p9_client_write() to passing it struct iov_iter *
... and make it loop until it's done

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:28:25 -04:00
Al Viro
4f3b35c157 net/9p: switch the guts of p9_client_{read,write}() to iov_iter
... and have get_user_pages_fast() mapping fewer pages than requested
to generate a short read/write.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:28:25 -04:00
Thomas Graf
78ebb0d00b rtnetlink: Mark name argument of rtnl_create_link() const
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-10 12:42:40 -07:00
David S. Miller
c3d0dac693 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-04-09

We've had enough new patches during the past week (especially from
Marcel) that it'd be good to still get these queued for 4.1.

The majority of the changes are from Marcel with lots of cleanup &
refactoring patches for the HCI UART driver. Marcel also split out some
Broadcom & Intel vendor specific functionality into two new btintel &
btbcm modules.

In addition to the HCI driver changes there's the completion of our
local OOB data interface for pairing, added support for requesting
remote LE features when connecting, as well as a couple of minor fixes
for mac802154.

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-09 18:31:50 -04:00
David S. Miller
ca69d7102f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next tree.
They are:

* nf_tables set timeout infrastructure from Patrick Mchardy.

1) Add support for set timeout support.

2) Add support for set element timeouts using the new set extension
   infrastructure.

4) Add garbage collection helper functions to get rid of stale elements.
   Elements are accumulated in a batch that are asynchronously released
   via RCU when the batch is full.

5) Add garbage collection synchronization helpers. This introduces a new
   element busy bit to address concurrent access from the netlink API and the
   garbage collector.

5) Add timeout support for the nft_hash set implementation. The garbage
   collector peridically checks for stale elements from the workqueue.

* iptables/nftables cgroup fixes:

6) Ignore non full-socket objects from the input path, otherwise cgroup
   match may crash, from Daniel Borkmann.

7) Fix cgroup in nf_tables.

8) Save some cycles from xt_socket by skipping packet header parsing when
   skb->sk is already set because of early demux. Also from Daniel.

* br_netfilter updates from Florian Westphal.

9) Save frag_max_size and restore it from the forward path too.

10) Use a per-cpu area to restore the original source MAC address when traffic
    is DNAT'ed.

11) Add helper functions to access physical devices.

12) Use these new physdev helper function from xt_physdev.

13) Add another nf_bridge_info_get() helper function to fetch the br_netfilter
    state information.

14) Annotate original layer 2 protocol number in nf_bridge info, instead of
    using kludgy flags.

15) Also annotate the pkttype mangling when the packet travels back and forth
    from the IP to the bridge layer, instead of using a flag.

* More nf_tables set enhancement from Patrick:

16) Fix possible usage of set variant that doesn't support timeouts.

17) Avoid spurious "set is full" errors from Netlink API when there are pending
    stale elements scheduled to be released.

18) Restrict loop checks to set maps.

19) Add support for dynamic set updates from the packet path.

20) Add support to store optional user data (eg. comments) per set element.

BTW, I have also pulled net-next into nf-next to anticipate the conflict
resolution between your okfn() signature changes and Florian's br_netfilter
updates.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-09 14:46:04 -04:00
Varka Bhadram
23310f6f3a mac802154: fix transmission power datatype
Netlink attribute for the power is s8. But for the driver level
operations we are collection power level value into integer.
It has to be change to s8 from int.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-09 19:56:15 +02:00
Varka Bhadram
94910d419a mac802154: fix typo for device
Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-09 09:19:33 +02:00
Marcel Holtmann
0fe29fd1cd Bluetooth: Read LE remote features during connection establishment
When establishing a Bluetooth LE connection, read the remote used
features mask to determine which features are supported. This was
not really needed with Bluetooth 4.0, but since Bluetooth 4.1 and
also 4.2 have introduced new optional features, this becomes more
important.

This works the same as with BR/EDR where the connection enters the
BT_CONFIG stage and hci_connect_cfm call is delayed until the remote
features have been retrieved. Only after successfully receiving the
remote features, the connection enters the BT_CONNECTED state.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-04-09 08:36:54 +03:00
Al Viro
da18428498 net: switch importing msghdr from userland to {compat_,}import_iovec()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-09 00:02:26 -04:00
Al Viro
237dae8890 Merge branch 'iocb' into for-davem
trivial conflict in net/socket.c and non-trivial one in crypto -
that one had evaded aio_complete() removal.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-09 00:01:38 -04:00
David S. Miller
82740b9ac2 NFC: 4.1 pull request
This is the NFC pull request for 4.1.
 
 This is a shorter one than usual, as the Intel Field Peak NFC
 driver could not make it in time.
 
 We have:
 
 - A new driver for NXP NCI based chipsets, like e.g. the NPC100 or
   the PN7150. It currently only supports an i2c physical layer, but
   could easily be extended to work on top of e.g. SPI.
   This driver also includes support for user space triggered firmware
   updates.
 
 - A few minor st21nfc[ab] fixes, cleanups, and comments improvements.
 
 - A pn533 error return fix.
 
 - A few NFC related logs formatting cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVJV81AAoJEIqAPN1PVmxKM3sP/R6fKXMxtVxXsiPzBMk+SpBI
 onNXbCx27rp1lHTFznE16JAaaSfcQEwSQmYx7xa6KXqRYVlDfRfC+5R+rPGrCjfH
 kV/eLBNnYpmPw0ViVT7dsWK0b4me0k5pr9ki9mze3YxvuMbA5vdv0tvuRFz5IRF/
 hl+WI5pntGuRtnIyBKIauAMylylUYVvCBGhmHnveiX0Dp8noJLBSV0wvdzujm51S
 +Uio/jHlUEV2lotrQBOfWNtEkwonXSwzZWSzimBCyEGLAwTx4lGXHmQftOtPd3zE
 sOT7Gw77ZCsulHoHiJyC0KpDSDS0NYVrtTI5BiuGgAivGi0YZw2XD3CYRIdy0m2Z
 lMoodYdiCgsxmrU6I6Rd/7DQBxD0Vhc+CGAyk41f7AwU4Fwq105kpSLupTtU+NzT
 ZpdvSXeirU8sxt+3WDOgv8esyYGZVVD8GuBbofMZQZ0vLq+FcDpYAW4w3LKvvi6X
 C+WN8f7SCI0kqpd4leyl6EG3SoQKFyPWobu0IlV520R9b76iBcyqooTIpvVa4Yk7
 az6fKhi9gK/T6FHW78y6fnkczd47JKrC924m+g6P/hhTD5zQ956ferp0uFTjkPtF
 8kNgRT7OIRi1JO783cnQx1uA61axC3GX1HFzsD9paVkTwRNtJ3eFsxtcYt7Nfv8D
 WGNWRQp5LmBsD/SqFBfk
 =MzOJ
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz says:

====================
NFC: 4.1 pull request

This is the NFC pull request for 4.1.

This is a shorter one than usual, as the Intel Field Peak NFC
driver could not make it in time.

We have:

- A new driver for NXP NCI based chipsets, like e.g. the NPC100 or
  the PN7150. It currently only supports an i2c physical layer, but
  could easily be extended to work on top of e.g. SPI.
  This driver also includes support for user space triggered firmware
  updates.

- A few minor st21nfc[ab] fixes, cleanups, and comments improvements.

- A pn533 error return fix.

- A few NFC related logs formatting cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-08 17:12:50 -04:00
Pablo Neira Ayuso
aadd51aa71 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Resolve conflicts between 5888b93 ("Merge branch 'nf-hook-compress'") and
Florian Westphal br_netfilter works.

Conflicts:
        net/bridge/br_netfilter.c

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-08 18:30:21 +02:00
Patrick McHardy
68e942e88a netfilter: nf_tables: support optional userdata for set elements
Add an userdata set extension and allow the user to attach arbitrary
data to set elements. This is intended to hold TLV encoded data like
comments or DNS annotations that have no meaning to the kernel.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-08 16:58:27 +02:00
Patrick McHardy
22fe54d5fe netfilter: nf_tables: add support for dynamic set updates
Add a new "dynset" expression for dynamic set updates.

A new set op ->update() is added which, for non existant elements,
invokes an initialization callback and inserts the new element.
For both new or existing elements the extenstion pointer is returned
to the caller to optionally perform timer updates or other actions.

Element removal is not supported so far, however that seems to be a
rather exotic need and can be added later on.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-08 16:58:27 +02:00
Patrick McHardy
11113e190b netfilter: nf_tables: support different set binding types
Currently a set binding is assumed to be related to a lookup and, in
case of maps, a data load.

In order to use bindings for set updates, the loop detection checks
must be restricted to map operations only. Add a flags member to the
binding struct to hold the set "action" flags such as NFT_SET_MAP,
and perform loop detection based on these.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-08 16:58:27 +02:00
Patrick McHardy
3dd0673ac3 netfilter: nf_tables: prepare set element accounting for async updates
Use atomic operations for the element count to avoid races with async
updates.

To properly handle the transactional semantics during netlink updates,
deleted but not yet committed elements are accounted for seperately and
are treated as being already removed. This means for the duration of
a netlink transaction, the limit might be exceeded by the amount of
elements deleted. Set implementations must be prepared to handle this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-08 16:58:27 +02:00
Sheng Yong
8bc0034cf6 net: remove extra newlines
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07 22:24:37 -04:00
Daniel Lee
2646c831c0 tcp: RFC7413 option support for Fast Open client
Fast Open has been using an experimental option with a magic number
(RFC6994). This patch makes the client by default use the RFC7413
option (34) to get and send Fast Open cookies.  This patch makes
the client solicit cookies from a given server first with the
RFC7413 option. If that fails to elicit a cookie, then it tries
the RFC6994 experimental option. If that also fails, it uses the
RFC7413 option on all subsequent connect attempts.  If the server
returns a Fast Open cookie then the client caches the form of the
option that successfully elicited a cookie, and uses that form on
later connects when it presents that cookie.

The idea is to gradually obsolete the use of experimental options as
the servers and clients upgrade, while keeping the interoperability
meanwhile.

Signed-off-by: Daniel Lee <Longinus00@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07 18:36:39 -04:00
Daniel Lee
7f9b838b71 tcp: RFC7413 option support for Fast Open server
Fast Open has been using the experimental option with a magic number
(RFC6994) to request and grant Fast Open cookies. This patch enables
the server to support the official IANA option 34 in RFC7413 in
addition.

The change has passed all existing Fast Open tests with both
old and new options at Google.

Signed-off-by: Daniel Lee <Longinus00@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07 18:36:39 -04:00
Johan Hedberg
d619ffcfbd Bluetooth: Update SSP OOB data EIR definitions
Since Bluetooth 4.1 there are two additional values for SSP OOB data,
namely C-256 and R-256. This patch updates the EIR definitions to take
into account both the 192 and 256 bit variants of C and R.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-07 23:11:37 +02:00
David Miller
79b16aadea udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb().
That was we can make sure the output path of ipv4/ipv6 operate on
the UDP socket rather than whatever random thing happens to be in
skb->sk.

Based upon a patch by Jiri Pirko.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2015-04-07 15:29:08 -04:00
David Miller
7026b1ddb6 netfilter: Pass socket pointer down through okfn().
On the output paths in particular, we have to sometimes deal with two
socket contexts.  First, and usually skb->sk, is the local socket that
generated the frame.

And second, is potentially the socket used to control a tunneling
socket, such as one the encapsulates using UDP.

We do not want to disassociate skb->sk when encapsulating in order
to fix this, because that would break socket memory accounting.

The most extreme case where this can cause huge problems is an
AF_PACKET socket transmitting over a vxlan device.  We hit code
paths doing checks that assume they are dealing with an ipv4
socket, but are actually operating upon the AF_PACKET one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07 15:25:55 -04:00
Marcel Holtmann
2d7cc19eeb Bluetooth: Remove hci_recv_stream_fragment function
The hci_recv_stream_fragment function should have never been introduced
in the first place. The Bluetooth core does not need to know anything
about the HCI transport protocol.

With all transport protocol specific detailed moved back into the
drivers where they belong (mainly generic USB and UART drivers), this
function can now be removed.

This reduces the size of hci_dev structure and also removes an exported
symbol from the Bluetooth core module.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-04-07 18:47:10 +02:00
Marcel Holtmann
5c7d2dd285 Bluetooth: Make data pointer of hci_recv_stream_fragment const
The data pointer provided to hci_recv_stream_fragment function should
have been marked const. The function has no business in modifying the
original data. So fix this now.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-04-07 18:47:09 +02:00
David S. Miller
7abccdba25 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-04-04

Here's what's probably the last bluetooth-next pull request for 4.1:

 - Fixes for LE advertising data & advertising parameters
 - Fix for race condition with HCI_RESET flag
 - New BNEPGETSUPPFEAT ioctl, needed for certification
 - New HCI request callback type to get the resulting skb
 - Cleanups to use BIT() macro wherever possible
 - Consolidate Broadcom device entries in the btusb HCI driver
 - Check for valid flags in CMTP, HIDP & BNEP
 - Disallow local privacy & OOB data combo to prevent a potential race
 - Expose SMP & ECDH selftest results through debugfs
 - Expose current Device ID info through debugfs

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07 11:47:52 -04:00
Johannes Berg
29464ccc78 cfg80211: move IE split utilities here from mac80211
As the next patch will require the IE splitting utility functions
in cfg80211, move them there from mac80211.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-07 13:56:41 +02:00
David S. Miller
c85d6975ef Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx4/cmd.c
	net/core/fib_rules.c
	net/ipv4/fib_frontend.c

The fib_rules.c and fib_frontend.c conflicts were locking adjustments
in 'net' overlapping addition and removal of code in 'net-next'.

The mlx4 conflict was a bug fix in 'net' happening in the same
place a constant was being replaced with a more suitable macro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06 22:34:15 -04:00
hannes@stressinduktion.org
f60e5990d9 ipv6: protect skb->sk accesses from recursive dereference inside the stack
We should not consult skb->sk for output decisions in xmit recursion
levels > 0 in the stack. Otherwise local socket settings could influence
the result of e.g. tunnel encapsulation process.

ipv6 does not conform with this in three places:

1) ip6_fragment: we do consult ipv6_npinfo for frag_size

2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
   loop the packet back to the local socket

3) ip6_skb_dst_mtu could query the settings from the user socket and
   force a wrong MTU

Furthermore:
In sk_mc_loop we could potentially land in WARN_ON(1) if we use a
PF_PACKET socket ontop of an IPv6-backed vxlan device.

Reuse xmit_recursion as we are currently only interested in protecting
tunnel devices.

Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06 16:12:49 -04:00
Christophe Ricard
1f74f323e2 nfc: nci: Add comment to explain NCI_HCI_MAX_PIPES
According to specification etsi 102 622 chapter 4.4 pipes
identifier is 7 bits long giving a 127 possible pipes value.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-04-06 00:19:05 +02:00
Christophe Ricard
0fc4a1291a nfc: Reduce nfc_evt_transaction params length to 0
According to etsi 102 622 chapter 11.2.2.4 EVT_TRANSACTION,
the nfc_evt_transaction parameters can be 0 up to 255 byte long.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-04-06 00:18:29 +02:00
Christophe Ricard
0b040964a0 nfc: hci: Add comment to explain NFC_HCI_MAX_PIPES
According to specification etsi 102 622 chapter 4.4 pipes identifier
is 7 bits long giving a 127 possible pipes value.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-04-06 00:18:20 +02:00
David S. Miller
073bfd5686 netfilter: Pass nf_hook_state through nft_set_pktinfo*().
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04 12:54:27 -04:00
David S. Miller
8fe22382d1 netfilter: Pass nf_hook_state through nf_nat_ipv6_{in,out,fn,local_fn}().
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04 12:48:08 -04:00
David S. Miller
d7cf4081ed netfilter: Pass nf_hook_state through nf_nat_ipv4_{in,out,fn,local_fn}().
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04 12:45:19 -04:00
David S. Miller
1d1de89b9a netfilter: Use nf_hook_state in nf_queue_entry.
That way we don't have to reinstantiate another nf_hook_state
on the stack of the nf_reinject() path.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04 12:25:22 -04:00
Nicolas Dichtel
1e99584b91 ipip,gre,vti,sit: implement ndo_get_iflink
Don't use dev->iflink anymore.

CC: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 14:05:00 -04:00
Nicolas Dichtel
ecf2c06a88 ip6tnl,gre6,vti6: implement ndo_get_iflink
Don't use dev->iflink anymore.

CC: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 14:04:59 -04:00
Johan Hedberg
1b9441f8ec Bluetooth: Convert local OOB data reading to use HCI request
Now that there's a HCI request API available where the callback receives
the resulting skb, we can convert the local OOB data reading to use this
new API. This patch does the necessary update in mgmt.c (which also
requires moving the callback higher up since it's now a static function)
and removes the custom calls from hci_event.c that are no-longer
necessary.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-02 16:09:29 +02:00
Johan Hedberg
abe66a4d03 Bluetooth: Remove unused hci_req_pending() function
The hci_req_pending() function has no users anymore, so simply remove
it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-02 16:09:28 +02:00
Johan Hedberg
f7d9e97592 Bluetooth: Remove unneeded recv_event variable
Now that the synchronous HCI requests use the new API and a new private
variable the recv_evt member of hci_dev is no-longer needed. This patch
removes it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-02 16:09:27 +02:00
Johan Hedberg
f60cb30579 Bluetooth: Convert hci_req_sync family of function to new request API
Now that there's an API in place that allows passing the resulting skb
to the request callback we can conveniently convert the hci_req_sync and
related functions to use it. Since we still need to get the skb from the
async callback into the sleeping _sync() function the patch adds another
req_skb variable to hci_dev where the sync request state is tracked.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-02 16:09:27 +02:00
Johan Hedberg
e621448749 Bluetooth: Add second hci_request callback option for full skb
This patch adds a second possible callback for HCI requests where the
callback will receive the full skb of the last successfully completed
HCI command. This API is useful for cases where we want to use a request
to read some data and the existing hci_event.c handlers do not store it
e.g. in the hci_dev struct.

The reason the patch is a bit bigger than just adding the new API is
because the hci_req_cmd_complete() functions required some refactoring
to enable it: now hci_req_cmd_complete() is simply used to request the
callback pointers if any, and the actual calling of them happens from a
single place at the end of hci_event_packet(). The reason for this is
that we need to pass the original skb (without any skb_pull, etc
modifications done to it) and it's simplest to keep track of it within
the hci_event_packet() function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-02 16:09:27 +02:00
Felix Fietkau
ba8c3d6f16 mac80211: add an intermediate software queue implementation
This allows drivers to request per-vif and per-sta-tid queues from which
they can pull frames. This makes it easier to keep the hardware queues
short, and to improve fairness between clients and vifs.

The task of scheduling packet transmission is left up to the driver -
queueing is controlled by mac80211. Drivers can only dequeue packets by
calling ieee80211_tx_dequeue. This makes it possible to add active queue
management later without changing drivers using this code.

This can also be used as a starting point to implement A-MSDU
aggregation in a way that does not add artificially induced latency.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[resolved minor context conflict, minor changes, endian annotations]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-01 20:44:34 +02:00
Patrick McHardy
9d0982927e netfilter: nft_hash: add support for timeouts
Add support for element timeouts to nft_hash. The lookup and walking
functions are changed to ignore timed out elements, a periodic garbage
collection task cleans out expired entries.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-01 11:17:49 +02:00
Patrick McHardy
6908665826 netfilter: nf_tables: add GC synchronization helpers
GC is expected to happen asynchrously to the netlink interface. In the
netlink path, both insertion and removal of elements consist of two
steps, insertion followed by activation or deactivation followed by
removal, during which the element must not be freed by GC.

The synchronization helpers use an unused bit in the genmask field to
atomically mark an element as "busy", meaning it is either currently
being handled through the netlink API or by GC.

Elements being processed by GC will never survive, netlink will simply
ignore them. Elements being currently processed through netlink will be
skipped by GC and reprocessed during the next run.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-01 11:17:29 +02:00
Patrick McHardy
cfed7e1b1f netfilter: nf_tables: add set garbage collection helpers
Add helpers for GC batch destruction: since element destruction needs
a RCU grace period for all set implementations, add some helper functions
for asynchronous batch destruction. Elements are collected in a batch
structure, which is asynchronously released using RCU once its full.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-01 11:17:29 +02:00
Patrick McHardy
c3e1b005ed netfilter: nf_tables: add set element timeout support
Add API support for set element timeouts. Elements can have a individual
timeout value specified, overriding the sets' default.

Two new extension types are used for timeouts - the timeout value and
the expiration time. The timeout value only exists if it differs from
the default value.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-01 11:17:28 +02:00
Patrick McHardy
761da2935d netfilter: nf_tables: add set timeout API support
Add set timeout support to the netlink API. Sets with timeout support
enabled can have a default timeout value and garbage collection interval
specified.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-01 11:17:28 +02:00
David S. Miller
7b6249bba9 Lots of updates for net-next; along with the usual flurry
of small fixes, cleanups and internal features we have:
  * VHT support for TDLS and IBSS (conditional on drivers though)
  * first TX performance improvements (the biggest will come later)
  * many suspend/resume (race) fixes
  * name_assign_type support from Tom Gundersen
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJVGWlNAAoJEDBSmw7B7bqr5s8P/R9+Q6y4Ixice9dDOJYynl/d
 dMEUEfCBWUyDaQD1bNQIED2mc0sM5+Ax8OVIVx9fdrLGPxaISBqDJKE1USoTNZzm
 q+U3dM4Q45SfQSsgaY4FtxTlPWPtUKsNMXY/CxLR+IqVeA3+30rX+hv1f3SAqBj0
 68IwW/uUIHb71IZ+hz2Mwudt4JVW8KRg9VlZ0UY6EEvC4m5QD2YkwQQo/hJ2WF+/
 wAJbku02L/Vy4J7E6hNcKYWXokht4fVYphjl/1ZDd/+8L8SUv9mC88n1Jzxa428p
 1PmbtwzbpOrtTcC2BCPDA04IyfMc7k9DlLKw/h2KLPbHZXheD9eVmo/Am5vz+uH6
 926f+FK339SzoJnZ5wBBDiZ8W8TLYNc8ImxtcxjnrtGfr1CKiuh23P1CWyOlKJCO
 BYFJqkCOqWOHYnN0embaj7JqM/LmQI5ZoBZHZhD2KQXIXpTsjjIMPfJvip5D+tsV
 +iXIlQwdeK6rbjxdonBgn7n57XIeSVMAYeyDCbzIShfibjHbUZPn+RsZCtv8RWv8
 EaZu8PerU5ZDKwdX940+lWrtf/TMDJBYQpAIBRuiZK4DTNWCt3BrDlvb1FXGgA+X
 vQJnr32vjJ/pLDxNLHQlkKWC4I/CYtG47OgcJN9AQXrig1zApd+C29zy3aqch3ea
 wxV9dFfheTqZFjtZfSsH
 =O/cf
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Lots of updates for net-next; along with the usual flurry
of small fixes, cleanups and internal features we have:
 * VHT support for TDLS and IBSS (conditional on drivers though)
 * first TX performance improvements (the biggest will come later)
 * many suspend/resume (race) fixes
 * name_assign_type support from Tom Gundersen
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:39:04 -04:00
Jiri Benc
67b61f6c13 netlink: implement nla_get_in_addr and nla_get_in6_addr
Those are counterparts to nla_put_in_addr and nla_put_in6_addr.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:58:35 -04:00
Jiri Benc
930345ea63 netlink: implement nla_put_in_addr and nla_put_in6_addr
IP addresses are often stored in netlink attributes. Add generic functions
to do that.

For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:58:35 -04:00
Jiri Benc
15e318bdc6 xfrm: simplify xfrm_address_t use
In many places, the a6 field is typecasted to struct in6_addr. As the
fields are in union anyway, just add in6_addr type to the union and
get rid of the typecasting.

Modifying the uapi header is okay, the union has still the same size.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:58:35 -04:00
Jiri Benc
8f55db4860 tcp: simplify inetpeer_addr_base use
In many places, the a6 field is typecasted to struct in6_addr. As the
fields are in union anyway, just add in6_addr type to the union and get rid
of the typecasting.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:58:35 -04:00
David S. Miller
89a3f3c55b Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-03-27

Here's another set of Bluetooth & 802.15.4 patches for 4.1:

 - New API to control LE advertising data (i.e. peripheral role)
 - mac802154 & at86rf230 cleanups
 - Support for toggling quirks from debugfs (useful for testing)
 - Memory leak fix for LE scanning
 - Extra version info reading support for Broadcom controllers

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 12:45:58 -04:00
Johan Hedberg
db6e3e8d01 Bluetooth: Refactor HCI request variables into own struct
In order to shrink the size of bt_skb_cb, this patch moves the HCI
request related variables into their own req_ctrl struct. Additionall
the L2CAP and HCI request structs are placed inside the same union since
they will never be used at the same time for the same skb.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-30 23:20:53 +02:00
Johan Hedberg
a4368ff3ed Bluetooth: Refactor L2CAP variables into l2cap_ctrl
We're getting very close to the maximum possible size of bt_skb_cb. To
prepare to shrink the struct with the help of a union this patch moves
all L2CAP related variables into the l2cap_ctrl struct. To later add
other 'ctrl' structs the L2CAP one is renamed simple 'l2cap' instead
of 'control'.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-30 23:20:53 +02:00
Johannes Berg
527871d720 mac80211: make sta.wme indicate whether QoS is used
Indicating just the peer's capability is fairly pointless
if the local device doesn't support it. Make the variable
track both combined, and remove the 'local support' check
in the TX path.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-30 10:47:16 +02:00
Tom Gundersen
6bab2e19c5 cfg80211: pass name_assign_type to rdev_add_virtual_intf()
This will expose in /sys whether the ifname of a device is set by
userspace or generated by the kernel. The latter kind (wlanX, etc)
is not deterministic, so userspace needs to rename these devices
to names that are guaranteed to stay the same between reboots. The
former, however should never be renamed, so userspace needs to be
able to reliably tell the difference.

Similar functionality was introduced for the rtnetlink core in
commit 5517750f05 ("net: rtnetlink - make create_link take name_assign_type")

Signed-off-by: Tom Gundersen <teg@jklm.no>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Brett Rudley <brudley@broadcom.com>
Cc: Arend van Spriel <arend@broadcom.com>
Cc: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Cc: Hante Meuleman <meuleman@broadcom.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
[reformat changelog to fit 72 cols]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-30 10:36:17 +02:00
Arik Nemtsov
a38700dd48 cfg/mac80211: add regulatory classes IE during TDLS setup
Seems Broadcom TDLS peers (Nexus 5, Xperia Z3) refuse to allow TDLS
connection when channel-switching is supported but the regulatory
classes IE is missing from the setup request.
Add a chandef to reg-class translation function to cfg80211 and use it
to add the required IE during setup. For now add only the current
regulatory class as supported - it is enough to resolve the
compatibility issue.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-30 10:26:36 +02:00
Emmanuel Grumbach
a90faa9d64 mac80211: notify the driver about deauth
This can allow the driver to take action based on the reason
of the deauth.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-30 10:17:12 +02:00
Emmanuel Grumbach
d0d1a12f9c mac80211: notify the driver about association status
This can allow the driver to take action based on the
success / failure of the association.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-30 10:17:11 +02:00
Emmanuel Grumbach
a9409093d2 mac80211: notify the driver about authentication status
This can allow the driver to take action based on the
success / failure of the authentication.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-30 10:17:10 +02:00
Emmanuel Grumbach
a818292952 mac80211: convert rssi_callback() to event_callback()
We will be able to add more events, such as MLME events and
others. The low level driver may be interested in knowing
about these events to dump firmware data upon failures, or
to change parameters in case connection attempts fail etc...

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-30 10:17:09 +02:00
Guenter Roeck
339d82626d net: dsa: Add basic framework to support ndo_fdb functions
Provide callbacks for ndo_fdb_add, ndo_fdb_del, and ndo_fdb_dump.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-29 13:23:54 -07:00
David S. Miller
4ef295e047 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next tree.
Basically, nf_tables updates to add the set extension infrastructure and finish
the transaction for sets from Patrick McHardy. More specifically, they are:

1) Move netns to basechain and use recently added possible_net_t, from
   Patrick McHardy.

2) Use LOGLEVEL_<FOO> from nf_log infrastructure, from Joe Perches.

3) Restore nf_log_trace that was accidentally removed during conflict
   resolution.

4) nft_queue does not depend on NETFILTER_XTABLES, starting from here
   all patches from Patrick McHardy.

5) Use raw_smp_processor_id() in nft_meta.

Then, several patches to prepare ground for the new set extension
infrastructure:

6) Pass object length to the hash callback in rhashtable as needed by
   the new set extension infrastructure.

7) Cleanup patch to restore struct nft_hash as wrapper for struct
   rhashtable

8) Another small source code readability cleanup for nft_hash.

9) Convert nft_hash to rhashtable callbacks.

And finally...

10) Add the new set extension infrastructure.

11) Convert the nft_hash and nft_rbtree sets to use it.

12) Batch set element release to avoid several RCU grace period in a row
    and add new function nft_set_elem_destroy() to consolidate set element
    release.

13) Return the set extension data area from nft_lookup.

14) Refactor existing transaction code to add some helper functions
    and document it.

15) Complete the set transaction support, using similar approach to what we
    already use, to activate/deactivate elements in an atomic fashion.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-29 12:43:43 -07:00
Eric Dumazet
41d25fe092 tcp: tcp_syn_flood_action() can be static
After commit 1fb6f159fd ("tcp: add tcp_conn_request"),
tcp_syn_flood_action() is no longer used from IPv6.

We can make it static, by moving it above tcp_conn_request()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-29 12:17:18 -07:00
Patrick McHardy
cc02e457bb netfilter: nf_tables: implement set transaction support
Set elements are the last object type not supporting transaction support.
Implement similar to the existing rule transactions:

The global transaction counter keeps track of two generations, current
and next. Each element contains a bitmask specifying in which generations
it is inactive.

New elements start out as inactive in the current generation and active
in the next. On commit, the previous next generation becomes the current
generation and the element becomes active. The bitmask is then cleared
to indicate that the element is active in all future generations. If the
transaction is aborted, the element is removed from the set before it
becomes active.

When removing an element, it gets marked as inactive in the next generation.
On commit the next generation becomes active and the therefor the element
inactive. It is then taken out of then set and released. On abort, the
element is marked as active for the next generation again.

Lookups ignore elements not active in the current generation.

The current set types (hash/rbtree) both use a field in the extension area
to store the generation mask. This (currently) does not require any
additional memory since we have some free space in there.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-26 11:09:35 +01:00
Patrick McHardy
ea4bd995b0 netfilter: nf_tables: add transaction helper functions
Add some helper functions for building the genmask as preparation for
set transactions.

Also add a little documentation how this stuff actually works.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-26 11:09:35 +01:00
Patrick McHardy
b2832dd662 netfilter: nf_tables: return set extensions from ->lookup()
Return the extension area from the ->lookup() function to allow to
consolidate common actions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-26 11:09:34 +01:00
Patrick McHardy
61edafbb47 netfilter: nf_tables: consolide set element destruction
With the conversion to set extensions, it is now possible to consolidate
the different set element destruction functions.

The set implementations' ->remove() functions are changed to only take
the element out of their internal data structures. Elements will be freed
in a batched fashion after the global transaction's completion RCU grace
period.

This reduces the amount of grace periods required for nft_hash from N
to zero additional ones, additionally this guarantees that the set
elements' extensions of all implementations can be used under RCU
protection.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-26 11:09:34 +01:00
Clément Perrochaud
25af01ed18 NFC: nci: Add firmware download support
A simple forward for firmware download (i.e. sending a new firmware
to the NFC adapter) from the NFC subsystem to the drivers.

This feature is required to update the firmware of NXP-NCI NFC
controllers but can be used by any NCI driver.

This feature has been present in the HCI subsystem since 9a695d.

Signed-off-by: Clément Perrochaud <clement.perrochaud@effinnov.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-03-26 10:56:20 +01:00
Arman Uguray
4950999621 Bluetooth: Add macros for advertising instance flags
This patch adds macro definitions for possible advertising instance
flags that can be passed to the "Add Advertising" command.

Signed-off-by: Arman Uguray <armansito@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-26 03:30:28 +01:00
Christoph Hellwig
e2e40f2c1e fs: move struct kiocb to fs.h
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-25 20:28:11 -04:00
Hannes Frederic Sowa
5a352dd0a3 ipv6: hash net ptr into fragmentation bucket selection
As namespaces are sometimes used with overlapping ip address ranges,
we should also use the namespace as input to the hash to select the ip
fragmentation counter bucket.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 14:07:04 -04:00
Hannes Frederic Sowa
b6a7719aed ipv4: hash net ptr into fragmentation bucket selection
As namespaces are sometimes used with overlapping ip address ranges,
we should also use the namespace as input to the hash to select the ip
fragmentation counter bucket.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 14:07:04 -04:00
Patrick McHardy
fe2811ebeb netfilter: nf_tables: convert hash and rbtree to set extensions
The set implementations' private struct will only contain the elements
needed to maintain the search structure, all other elements are moved
to the set extensions.

Element allocation and initialization is performed centrally by
nf_tables_api instead of by the different set implementations'
->insert() functions. A new "elemsize" member in the set ops specifies
the amount of memory to reserve for internal usage. Destruction
will also be moved out of the set implementations by a following patch.

Except for element allocation, the patch is a simple conversion to
using data from the extension area.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 17:18:35 +01:00
Patrick McHardy
3ac4c07a24 netfilter: nf_tables: add set extensions
Add simple set extension infrastructure for maintaining variable sized
and optional per element data.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 17:18:34 +01:00
Ying Xue
7e3ea6d5c4 sctp: avoid to repeatedly declare external variables
Move the declaration for external variables to sctp.h file avoiding
to repeatedly declare them with extern keyword.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 11:40:16 -04:00
Patrick McHardy
5ebb335dcb netfilter: nf_tables: move struct net pointer to base chain
The network namespace is only needed for base chains to get at the
gencursor. Also convert to possible_net_t.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 12:09:38 +01:00
Eric Dumazet
fd3a154a00 tcp: md5: get rid of tcp_v[46]_reqsk_md5_lookup()
With request socks convergence, we no longer need
different lookup methods. A request socket can
use generic lookup function.

Add const qualifier to 2nd tcp_v[46]_md5_lookup() parameter.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 21:16:30 -04:00
Eric Dumazet
39f8e58e53 tcp: md5: remove request sock argument of calc_md5_hash()
Since request and established sockets now have same base,
there is no need to pass two pointers to tcp_v4_md5_hash_skb()
or tcp_v6_md5_hash_skb()

Also add a const qualifier to their struct tcp_md5sig_key argument.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 21:16:30 -04:00
David S. Miller
d5c1d8c567 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/netfilter/nf_tables_core.c

The nf_tables_core.c conflict was resolved using a conflict resolution
from Stephen Rothwell as a guide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:22:43 -04:00
Hannes Frederic Sowa
1855b7c3e8 ipv6: introduce idgen_delay and idgen_retries knobs
This is specified by RFC 7217.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:12:09 -04:00
Hannes Frederic Sowa
5f40ef77ad ipv6: do retries on stable privacy addresses
If a DAD conflict is detected, we want to retry privacy stable address
generation up to idgen_retries (= 3) times with a delay of idgen_delay
(= 1 second). Add the logic to addrconf_dad_failure.

By design, we don't clean up dad failed permanent addresses.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:12:09 -04:00
Hannes Frederic Sowa
8e8e676d0b ipv6: collapse state_lock and lock
Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:12:09 -04:00
Arman Uguray
912098a630 Bluetooth: Add support for adv instance timeout
This patch implements support for the timeout parameter of the
Add Advertising command.

Signed-off-by: Arman Uguray <armansito@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-24 01:53:47 +01:00
Arman Uguray
203fea0178 Bluetooth: Add data structure for advertising instance
This patch introduces a new data structure to represent advertising
instances that were added using the "Add Advertising" mgmt command.
Initially an hci_dev structure will support only one of these instances
at a time, so the current instance is simply stored as a direct member
of hci_dev.

Signed-off-by: Arman Uguray <armansito@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-24 01:53:46 +01:00
Arman Uguray
4453b00653 Bluetooth: Introduce HCI_ADVERTISING_INSTANCE setting and add AD flags
This patch introduces the HCI_ADVERTISING_INSTANCE setting, which is set
when an at least one advertising instance has been added using the
"Add Advertising" mgmt command. This patch also adds a macro definition
for the EIR_APPEARANCE field type.

Signed-off-by: Arman Uguray <armansito@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-24 01:53:45 +01:00
Arman Uguray
841a6664f2 Bluetooth: Add definitions for Add/Remove Advertising API
This patch adds definitions for the Add Advertising and Remove
Advertising MGMT commands and events.

Signed-off-by: Arman Uguray <armansito@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-24 01:53:45 +01:00
Eric Dumazet
26e3736090 ipv4: tcp: handle ICMP messages on TCP_NEW_SYN_RECV request sockets
tcp_v4_err() can restrict lookups to ehash table, and not to listeners.

Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.

New tcp_req_err() helper is exported so that we can use it in IPv6
in following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 16:52:26 -04:00
Eric Dumazet
b282705336 net: convert syn_wait_lock to a spinlock
This is a low hanging fruit, as we'll get rid of syn_wait_lock eventually.

We hold syn_wait_lock for such small sections, that it makes no sense to use
a read/write lock. A spin lock is simply faster.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 16:52:26 -04:00
Eric Dumazet
42cb80a235 inet: remove sk_listener parameter from syn_ack_timeout()
It is not needed, and req->sk_listener points to the listener anyway.
request_sock argument can be const.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 16:52:25 -04:00
David S. Miller
c0e41fa76c Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Fix missing initialization of tuple structure in nfnetlink_cthelper
   to avoid mismatches when looking up to attach userspace helpers to
   flows, from Ian Wilson.

2) Fix potential crash in nft_hash when we hit -EAGAIN in
   nft_hash_walk(), from Herbert Xu.

3) We don't need to indicate the hook information to update the
   basechain default policy in nf_tables.

4) Restore tracing over nfnetlink_log due to recent rework to
   accomodate logging infrastructure into nf_tables.

5) Fix wrong IP6T_INV_PROTO check in xt_TPROXY.

6) Set IP6T_F_PROTO flag in nft_compat so we can use SYNPROXY6 and
   REJECT6 from xt over nftables.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-22 16:57:07 -04:00
YOSHIFUJI Hideaki/吉藤英明
8da86466b8 net: neighbour: Add mcast_resolicit to configure the number of multicast resolicitations in PROBE state.
We send unicast neighbor (ARP or NDP) solicitations ucast_probes
times in PROBE state.  Zhu Yanjun reported that some implementation
does not reply against them and the entry will become FAILED, which
is undesirable.

We had been dealt with such nodes by sending multicast probes mcast_
solicit times after unicast probes in PROBE state.  In 2003, I made
a change not to send them to improve compatibility with IPv6 NDP.

Let's introduce per-protocol per-interface sysctl knob "mcast_
reprobe" to configure the number of multicast (re)solicitation for
reconfirmation in PROBE state.  The default is 0, since we have
been doing so for 10+ years.

Reported-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
CC: Ulf Samuelsson <ulf.samuelsson@ericsson.com>
Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 21:47:40 -04:00
Scott Feldman
13bb8e2eb3 switchdev: kernel-doc cleanup on swithdev ops
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 21:36:53 -04:00
Daniel Borkmann
a8cb5f556b act_bpf: add initial eBPF support for actions
This work extends the "classic" BPF programmable tc action by extending
its scope also to native eBPF code!

Together with commit e2e9b6541d ("cls_bpf: add initial eBPF support
for programmable classifiers") this adds the facility to implement fully
flexible classifier and actions for tc that can be implemented in a C
subset in user space, "safely" loaded into the kernel, and being run in
native speed when JITed.

Also, since eBPF maps can be shared between eBPF programs, it offers the
possibility that cls_bpf and act_bpf can share data 1) between themselves
and 2) between user space applications. That means that, f.e. customized
runtime statistics can be collected in user space, but also more importantly
classifier and action behaviour could be altered based on map input from
the user space application.

For the remaining details on the workflow and integration, see the cls_bpf
commit e2e9b6541d. Preliminary iproute2 part can be found under [1].

  [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf-act

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 19:10:44 -04:00
David S. Miller
0fa74a4be4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be_main.c
	net/core/sysctl_net_core.c
	net/ipv4/inet_diag.c

The be_main.c conflict resolution was really tricky.  The conflict
hunks generated by GIT were very unhelpful, to say the least.  It
split functions in half and moved them around, when the real actual
conflict only existed solely inside of one function, that being
be_map_pci_bars().

So instead, to resolve this, I checked out be_main.c from the top
of net-next, then I applied the be_main.c changes from 'net' since
the last time I merged.  And this worked beautifully.

The inet_diag.c and sysctl_net_core.c conflicts were simple
overlapping changes, and were easily to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 18:51:09 -04:00
Eric Dumazet
becb74f0ac net: increase sk_[max_]ack_backlog
sk_ack_backlog & sk_max_ack_backlog were 16bit fields, meaning
listen() backlog was limited to 65535.

It is time to increase the width to allow much bigger backlog,
if admins change /proc/sys/net/core/somaxconn &
/proc/sys/net/ipv4/tcp_max_syn_backlog default values.

Tested:

echo 5000000 >/proc/sys/net/core/somaxconn
echo 5000000 >/proc/sys/net/ipv4/tcp_max_syn_backlog

Ran a SYNFLOOD test against a listener using listen(fd, 5000000)

myhost~# grep request_sock_TCP /proc/slabinfo
request_sock_TCP  4185642 4411940    304   13    1 : tunables   54   27    8 : slabdata 339380 339380      0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 12:40:25 -04:00
Eric Dumazet
fa76ce7328 inet: get rid of central tcp/dccp listener timer
One of the major issue for TCP is the SYNACK rtx handling,
done by inet_csk_reqsk_queue_prune(), fired by the keepalive
timer of a TCP_LISTEN socket.

This function runs for awful long times, with socket lock held,
meaning that other cpus needing this lock have to spin for hundred of ms.

SYNACK are sent in huge bursts, likely to cause severe drops anyway.

This model was OK 15 years ago when memory was very tight.

We now can afford to have a timer per request sock.

Timer invocations no longer need to lock the listener,
and can be run from all cpus in parallel.

With following patch increasing somaxconn width to 32 bits,
I tested a listener with more than 4 million active request sockets,
and a steady SYNFLOOD of ~200,000 SYN per second.
Host was sending ~830,000 SYNACK per second.

This is ~100 times more what we could achieve before this patch.

Later, we will get rid of the listener hash and use ehash instead.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 12:40:25 -04:00
Eric Dumazet
52452c5425 inet: drop prev pointer handling in request sock
When request sock are put in ehash table, the whole notion
of having a previous request to update dl_next is pointless.

Also, following patch will get rid of big purge timer,
so we want to delete a request sock without holding listener lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 12:40:25 -04:00
David S. Miller
970282d0e1 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-03-19

This wont the last 4.1 bluetooth-next pull request, but we've piled up
enough patches in less than a week that I wanted to save you from a
single huge "last-minute" pull somewhere closer to the merge window.

The main changes are:

 - Simultaneous LE & BR/EDR discovery support for HW that can do it
 - Complete LE OOB pairing support
 - More fine-grained mgmt-command access control (normal user can now do
   harmless read-only operations).
 - Added RF power amplifier support in cc2520 ieee802154 driver
 - Some cleanups/fixes in ieee802154 code

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19 15:18:04 -04:00
Pablo Neira Ayuso
4017a7ee69 netfilter: restore rule tracing via nfnetlink_log
Since fab4085 ("netfilter: log: nf_log_packet() as real unified
interface"), the loginfo structure that is passed to nf_log_packet() is
used to explicitly indicate the logger type you want to use.

This is a problem for people tracing rules through nfnetlink_log since
packets are always routed to the NF_LOG_TYPE logger after the
aforementioned patch.

We can fix this by removing the trace loginfo structures, but that still
changes the log level from 4 to 5 for tracing messages and there may be
someone relying on this outthere. So let's just introduce a new
nf_log_trace() function that restores the former behaviour.

Reported-by: Markus Kötter <koetter@rrzn.uni-hannover.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-19 11:14:48 +01:00
Marcelo Ricardo Leitner
54ff9ef36b ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop}
in favor of their inner __ ones, which doesn't grab rtnl.

As these functions need to operate on a locked socket, we can't be
grabbing rtnl by then. It's too late and doing so causes reversed
locking.

So this patch:
- move rtnl handling to callers instead while already fixing some
  reversed locking situations, like on vxlan and ipvs code.
- renames __ ones to not have the __ mark:
  __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
  __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 22:05:09 -04:00
Eric Dumazet
08d2cc3b26 inet: request sock should init IPv6/IPv4 addresses
In order to be able to use sk_ehashfn() for request socks,
we need to initialize their IPv6/IPv4 addresses.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 22:00:35 -04:00
Eric Dumazet
b4d6444ea3 inet: get rid of last __inet_hash_connect() argument
We now always call __inet_hash_nolisten(), no need to pass it
as an argument.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 22:00:35 -04:00
Eric Dumazet
77a6a471bc ipv6: get rid of __inet6_hash()
We can now use inet_hash() and __inet_hash() instead of private
functions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 22:00:35 -04:00
Eric Dumazet
d1e559d0b1 inet: add IPv6 support to sk_ehashfn()
Intent is to converge IPv4 & IPv6 inet_hash functions to
factorize code.

IPv4 sockets initialize sk_rcv_saddr and sk_v6_daddr
in this patch, thanks to new sk_daddr_set() and sk_rcv_saddr_set()
helpers.

__inet6_hash can now use sk_ehashfn() instead of a private
inet6_sk_ehashfn() and will simply use __inet_hash() in a
following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 22:00:34 -04:00
Eric Dumazet
5b441f76f1 net: introduce sk_ehashfn() helper
Goal is to unify IPv4/IPv6 inet_hash handling, and use common helpers
for all kind of sockets (full sockets, timewait and request sockets)

inet_sk_ehashfn() becomes sk_ehashfn() but still only copes with IPv4

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 22:00:34 -04:00
Eric Dumazet
6eada0110c netns: constify net_hash_mix() and various callers
const qualifiers ease code review by making clear
which objects are not written in a function.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 22:00:34 -04:00
Eric Dumazet
0470c8ca1d inet: fix request sock refcounting
While testing last patch series, I found req sock refcounting was wrong.

We must set skc_refcnt to 1 for all request socks added in hashes,
but also on request sockets created by FastOpen or syncookies.

It is tricky because we need to defer this initialization so that
future RCU lookups do not try to take a refcount on a not yet
fully initialized request socket.

Also get rid of ireq_refcnt alias.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 13854e5a60 ("inet: add proper refcounting to request sock")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 22:02:29 -04:00
Eric Dumazet
4e9a578e5b inet: add rsk_listener field to struct request_sock
Once we'll be able to lookup request sockets in ehash table,
we'll need to get access to listener which created this request.

This avoid doing a lookup to find the listener, which benefits
for a more solid SO_REUSEPORT, and is needed once we no
longer queue request sock into a listener private queue.

Note that 'struct tcp_request_sock'->listener could be reduced
to a single bit, as TFO listener should match req->rsk_listener.
TFO will no longer need to hold a reference on the listener.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 22:01:56 -04:00
Eric Dumazet
e49bb337d7 inet: uninline inet_reqsk_alloc()
inet_reqsk_alloc() is becoming fat and should not be inlined.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 22:01:56 -04:00
Eric Dumazet
407640de21 inet: add sk_listener argument to inet_reqsk_alloc()
listener socket can be used to set net pointer, and will
be later used to hold a reference on listener.

Add a const qualifier to first argument (struct request_sock_ops *),
and factorize all write_pnet(&ireq->ireq_net, sock_net(sk));

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 22:01:55 -04:00
Eric Dumazet
7970ddc8f9 tcp: uninline tcp_oow_rate_limited()
tcp_oow_rate_limited() is hardly used in fast path, there is
no point inlining it.

Signed-of-by: Eric Dumazet <edumazet@google.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:18:00 -04:00
Eric Dumazet
1bfc4438a7 tcp: move tcp_openreq_init() to tcp_input.c
This big helper is called once from tcp_conn_request(), there is no
point having it in an include. Compiler will inline it anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:18:00 -04:00
Eric Dumazet
adc17d6a6c inet: move ir_mark to fill a hole
On 64bit arches, we can save 8 bytes in inet_request_sock
by moving ir_mark to fill a hole.

While we are at it, inet_request_mark() can get a const qualifier
for listener socket.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:17:59 -04:00
Johan Hedberg
fa4335d71a Bluetooth: Move generic mgmt command dispatcher to hci_sock.c
The mgmt.c file should be reserved purely for HCI_CHANNEL_CONTROL. The
mgmt_control() function in it is already completely generic and has a
single user in hci_sock.c. This patch moves the function there and
renames it a bit more appropriately to hci_mgmt_cmd() (as it's a command
dispatcher).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-17 18:03:08 +01:00
Johan Hedberg
88b94ce925 Bluetooth: Add hdev_init callback for HCI channels
In order to make the mgmt command handling more generic we can't have a
direct call to mgmt_init_hdev() from mgmt_control(). This patch adds a
new callback to struct hci_mgmt_chan. And sets it to point to the
mgmt_init_hdev() function for the HCI_CHANNEL_CONTROL instance.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-17 18:03:08 +01:00
Johan Hedberg
d0f172b14a Bluetooth: Add helper to get HCI channel of a socket
We'll need to have access to which HCI channel a socket is bound to, in
order to manage pending mgmt commands in clean way. This patch adds a
helper for the purpose.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-17 18:03:07 +01:00
Jakub Pawlowski
695c4cb619 Bluetooth: Introduce HCI_QUIRK_SIMULTANEOUS_DISCOVERY
Some controllers allow both LE scan and BR/EDR inquiry to run at
the same time, while others allow only one, LE SCAN or BR/EDR
inquiry at given time.

Since this is specific to each controller, add a new quirk setting
that allows drivers to tell the core wether given controller can
do both LE scan and BR/EDR inquiry at same time.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-17 18:30:58 +02:00
Marcel Holtmann
72000df2c0 Bluetooth: Add support for Local OOB Extended Data Update events
When a different user requests a new set of local out-of-band data, then
inform all previous users that the data has been updated. To limit the
scope of users, the updates are limited to previous users. If a user has
never requested out-of-band data, it will also not see the update.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-17 08:16:48 +02:00
David S. Miller
ca00942a81 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2015-03-16

1) Fix the network header offset in _decode_session6
   when multiple IPv6 extension headers are present.
   From Hajime Tazaki.

2) Fix an interfamily tunnel crash. We set outer mode
   protocol too early and may dispatch to the wrong
   address family. Move the setting of the outer mode
   protocol behind the last accessing of the inner mode
   to fix the crash.

3) Most callers of xfrm_lookup() expect that dst_orig
   is released on error. But xfrm_lookup_route() may
   need dst_orig to handle certain error cases. So
   introduce a flag that tells what should be done in
   case of error. From Huaibin Wang.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 16:16:49 -04:00
Eric Dumazet
13854e5a60 inet: add proper refcounting to request sock
reqsk_put() is the generic function that should be used
to release a refcount (and automatically call reqsk_free())

reqsk_free() might be called if refcount is known to be 0
or undefined.

refcnt is set to one in inet_csk_reqsk_queue_add()

As request socks are not yet in global ehash table,
I added temporary debugging checks in reqsk_put() and reqsk_free()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 15:55:29 -04:00
Eric Dumazet
1d0ab25387 net: add sk_fullsock() helper
We have many places where we want to check if a socket is
not a timewait or request socket. Use a helper to avoid
hard coding this.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 15:55:28 -04:00
Marcel Holtmann
f709bfcf6a Bluetooth: Add constants for LE SC Confirmation and Random values
The LE Secure Connections Confirmation Value and LE Secure Connections
Random Value contants are required for the out-of-band data and so
just define them.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-16 10:31:20 +02:00
Emmanuel Grumbach
dc5a1ad7bd mac80211: allow to get wireless_dev structure from ieee80211_vif
This will allow mac80211 drivers to call cfg80211 APIs with
the right handle.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-16 09:30:30 +01:00
Marcel Holtmann
aefedc1a4c Bluetooth: Remove unneeded HCI_CONN_REMOTE_OOB connection flag
The HCI_CONN_REMOTE_OOB connection flag is used to indicate if the
pairing initiator has provided out-of-band data. However since that
value is no longer used in any decision making, just remove it.

It is actually unclear what purpose the OOB data present field from
the HCI IO Capability Response event serves in the first place. If
either side provided out-of-band data, then that data will be used
for pairing.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-16 06:53:43 +02:00
Scott Feldman
4170604fee switchdev: add swdev ops
As discussed at netconf, introduce swdev_ops as first step to move switchdev
ops from ndo to swdev.  This will keep switchdev from cluttering up ndo ops
space.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 00:14:42 -04:00
Marcel Holtmann
4f0f155cea Bluetooth: Add simple version of Read Local OOB Extended Data command
This adds support for the simplest possible version of Read Local OOB
Extended Data management command. It includes all mandatory fields,
but none of the actual pairing related ones.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 10:05:29 +02:00
Marcel Holtmann
1471aae0d0 Bluetooth: Add defines for LE Bluetooth Device Address and LE Role
The OOB data requires to include LE Bluetooth Device Address and LE Role
and so add the type constants for these fields.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 10:05:28 +02:00
Marcel Holtmann
d3d5305bfd Bluetooth: Add simple version of Read Advertising Features command
This adds support for the simplest possible version of Read Advertising
Features management command. It allows basic testing of the interface.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 10:03:41 +02:00
Marcel Holtmann
a958452aa4 Bluetooth: Use BIT(n) macro instead of manually encoding (1 << n)
The flags for the management command table used manual encoding of
bits in the form of (1 << n). It is however preferred to use BIT(n)
macro instead.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 10:00:15 +02:00
Marcel Holtmann
f6b7712eb6 Bluetooth: Send global configuration updates to all management users
Changes to the global configuration updates like settings, class of
device, name etc. can be received by every user. They are allowed to
read them in the first place so provide the updates via events as
well. Otherwise untrusted users start polling for updates and that
is not a desired behavior.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:59:39 +02:00
Marcel Holtmann
c927a10487 Bluetooth: Add support for trust verification of management commands
Check the required trust level of each management command with the trust
level of the management socket. If it does not match up, then return the
newly introduced permission denied error.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:58:56 +02:00
Marcel Holtmann
c91041dc4e Bluetooth: Add support for untrusted access to management commands
Some management commands are safe to be accessed from any user without
special permissions. First step for allowing access to any of these
commands from untrusted application is to mark them accordingly.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:57:35 +02:00
Marcel Holtmann
c85be545ea Bluetooth: Add hci_sock_test_flag helper function
The management interface will need access to the socket flags and so
provide a helper function for checking them.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:57:31 +02:00
Marcel Holtmann
c08b1a1dba Bluetooth: Consolidate socket channel sending function back into one
With the introduction of trusted socket flag for control and monitor
channels, it is now possible to use a single function for sending
packets to these sockets. And with that consolidate the handling.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:56:41 +02:00
Marcel Holtmann
50ebc055fa Bluetooth: Introduce trusted flag for management control sockets
Providing a global trusted flag for management control sockets provides
an easy way for identifying sockets and imposing restriction on it. For
now all management sockets are trusted since they require CAP_NET_ADMIN.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:56:00 +02:00
Marcel Holtmann
96f1474af0 Bluetooth: Add support for extended index management command
The Read Extended Contoller Index List command can be used for
retrieving the complete list of local available controllers. This
included configured, unconfigured and also AMP controllers.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:55:51 +02:00
Marcel Holtmann
ced85549c3 Bluetooth: Add support for extended index management events
This introduces support for using Extended Index Added and Extended
Index Removed events. These events contain the controller type and
also the hardware bus information from the driver.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:53:08 +02:00
Marcel Holtmann
f920733885 Bluetooth: Use special function to send filter management index events
For sending Index Added, Index Removed, Unconfigured Index Added and
Unconfigured Index Removed managment events the new helper functions
allows taking into account if these events are enabled for a certain
management socket or not.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:47:51 +02:00
Marcel Holtmann
17711c6291 Bluetooth: Provide hci_send_to_flagged_channel helper function
The hci_send_to_flagged_channel helper function can be used to send
packets to all channels that have a certain HCI socket flag set.

This is especially useful for managment events that are limited to
sockets that have first enabled certain functionality. This allows
for filtering of events without confusing existing users.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:46:41 +02:00
Marcel Holtmann
6befc6445f Bluetooth: Add flags field and setting function for HCI sockets
To filter out certain actions for certain HCI sockets introcuce a flags
field that allows to configure specific settings on individual sockets.

Since the hci_pinfo structure is private in hci_sock.c, provide helper
functions for setting and clearing a given flag.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:45:39 +02:00
David S. Miller
5f1764ddfe Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
Here's another set of Bluetooth & ieee802154 patches intended for 4.1:

 - Added support for QCA ROME chipset family in the btusb driver
 - at86rf230 driver fixes & cleanups
 - ieee802154 cleanups
 - Refactoring of Bluetooth mgmt API to allow new users
 - New setting for static Bluetooth address exposed to user space
 - Refactoring of hci_dev flags to remove limit of 32
 - Remove unnecessary fast-connectable setting usage restrictions
 - Fix behavior to be consistent when trying to pair already paired device
 - Service discovery corner-case fixes

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-14 14:29:45 -04:00
Marcel Holtmann
b7cb93e528 Bluetooth: Merge hdev->dbg_flags fields into hdev->dev_flags
With the extension of hdev->dev_flags utilizing a bitmap now, the space
is no longer restricted. Merge the hdev->dbg_flags into hdev->dev_flags
to save space on 64-bit architectures. On 32-bit architectures no size
reduction happens.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 19:28:36 +02:00
Alexey Kodanev
40fb70f3aa vxlan: fix wrong usage of VXLAN_VID_MASK
commit dfd8645ea1 wrongly assumes that VXLAN_VDI_MASK includes
eight lower order reserved bits of VNI field that are using for remote
checksum offload.

Right now, when VNI number greater then 0xffff, vxlan_udp_encap_recv()
will always return with 'bad_flag' error, reducing the usable vni range
from 0..16777215 to 0..65535. Also, it doesn't really check whether RCO
bits processed or not.

Fix it by adding new VNI mask which has all 32 bits of VNI field:
24 bits for id and 8 bits for other usage.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-13 13:08:07 -04:00
Marcel Holtmann
eacb44dff9 Bluetooth: Use DECLARE_BITMAP for hdev->dev_flags field
The hdev->dev_flags field has outgrown itself on 32-bit systems. So
instead of hacking around it, switch to using DECLARE_BITMAP.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 18:35:45 +02:00
Marcel Holtmann
238be788fc Bluetooth: Introduce hci_dev_test_and_set_flag helper macro
Instead of manually coding test_and_set_bit on hdev->dev_flags all the
time, use hci_dev_test_and_set_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:33 +02:00
Marcel Holtmann
a69d892726 Bluetooth: Introduce hci_dev_test_and_clear_flag helper macro
Instead of manually coding test_and_clear_bit on hdev->dev_flags all the
time, use hci_dev_test_and_clear_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:32 +02:00
Marcel Holtmann
516018a9c0 Bluetooth: Introduce hci_dev_test_and_change_flag helper macro
Instead of manually coding test_and_change_bit on hdev->dev_flags all the
time, use hci_dev_test_and_change_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:31 +02:00
Marcel Holtmann
ce05d603af Bluetooth: Introduce hci_dev_change_flag helper macro
Instead of manually coding change_bit on hdev->dev_flags all the time,
use hci_dev_change_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:29 +02:00
Marcel Holtmann
a358dc11d8 Bluetooth: Introduce hci_dev_clear_flag helper macro
Instead of manually coding clear_bit on hdev->dev_flags all the time,
use hci_dev_clear_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:27 +02:00
Marcel Holtmann
a1536da255 Bluetooth: Introduce hci_dev_set_flag helper macro
Instead of manually coding set_bit on hdev->dev_flags all the time,
use hci_dev_set_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:26 +02:00
Marcel Holtmann
d7a5a11d7f Bluetooth: Introduce hci_dev_test_flag helper macro
Instead of manually coding test_bit on hdev->dev_flags all the time,
use hci_dev_test_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:25 +02:00
Marcel Holtmann
cc91cb042c Bluetooth: Add support connectable advertising setting
The patch adds a second advertising setting that allows switching of the
controller into connectable mode independent of the global connectable
setting.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:07:54 +02:00
Eric W. Biederman
098a697b49 tcp_metrics: Use a single hash table for all network namespaces.
Now that all of the operations are safe on a single hash table
accross network namespaces, allocate a single global hash table
and update the code to use it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-13 01:57:07 -04:00
Eric Dumazet
3f66b083a5 inet: introduce ireq_family
Before inserting request socks into general hash table,
fill their socket family.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet
41b822c59e inet: prepare sock_edemux() & sock_gen_put() for new SYN_RECV state
sock_edemux() & sock_gen_put() should be ready to cope with request socks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet
1e2e01172f inet: add rsk_refcnt/ireq_refcnt to request socks
When request socks will be in ehash, they'll need to be refcounted.

This patch adds rsk_refcnt/ireq_refcnt macros, and adds
reqsk_put() function, but nothing yet use them.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet
d34ac51b76 inet: add ireq_state field to inet_request_sock
We need to identify request sock when they'll be visible in
global ehash table.

ireq_state is an alias to req.__req_common.skc_state.

Its value is set to TCP_NEW_SYN_RECV

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:12 -04:00
Eric Dumazet
10feb428a5 inet: add TCP_NEW_SYN_RECV state
TCP_SYN_RECV state is currently used by fast open sockets.

Initial TCP requests (the pseudo sockets created when a SYN is received)
are not yet associated to a state. They are attached to their parent,
and the parent is in TCP_LISTEN state.

This commit adds TCP_NEW_SYN_RECV state, so that we can convert
TCP stack to a different schem gradually.

This state is not exported to user space.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:12 -04:00
Eric Dumazet
bd337c581b ipv6: add missing ireq_net & ir_cookie initializations
I forgot to update dccp_v6_conn_request() & cookie_v6_check().
They both need to set ireq->ireq_net and ireq->ir_cookie

Lets clear ireq->ir_cookie in inet_reqsk_alloc()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 33cf7c90fe ("net: add real socket cookies")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:12 -04:00
Eric W. Biederman
0c5c9fb551 net: Introduce possible_net_t
Having to say
> #ifdef CONFIG_NET_NS
> 	struct net *net;
> #endif

in structures is a little bit wordy and a little bit error prone.

Instead it is possible to say:
> typedef struct {
> #ifdef CONFIG_NET_NS
>       struct net *net;
> #endif
> } possible_net_t;

And then in a header say:

> 	possible_net_t net;

Which is cleaner and easier to use and easier to test, as the
possible_net_t is always there no matter what the compile options.

Further this allows read_pnet and write_pnet to be functions in all
cases which is better at catching typos.

This change adds possible_net_t, updates the definitions of read_pnet
and write_pnet, updates optional struct net * variables that
write_pnet uses on to have the type possible_net_t, and finally fixes
up the b0rked users of read_pnet and write_pnet.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:39:40 -04:00
Eric W. Biederman
efd7ef1c19 net: Kill hold_net release_net
hold_net and release_net were an idea that turned out to be useless.
The code has been disabled since 2008.  Kill the code it is long past due.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:39:40 -04:00
Simon Horman
d299ce149c vxlan: Correct path typo in comment
Flags are used in the return path rather than the return patch.

Fixes: af33c1adae ("vxlan: Eliminate dependency on UDP socket in transmit path")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:05:38 -04:00
Eric Dumazet
33cf7c90fe net: add real socket cookies
A long standing problem in netlink socket dumps is the use
of kernel socket addresses as cookies.

1) It is a security concern.

2) Sockets can be reused quite quickly, so there is
   no guarantee a cookie is used once and identify
   a flow.

3) request sock, establish sock, and timewait socks
   for a given flow have different cookies.

Part of our effort to bring better TCP statistics requires
to switch to a different allocator.

In this patch, I chose to use a per network namespace 64bit generator,
and to use it only in the case a socket needs to be dumped to netlink.
(This might be refined later if needed)

Note that I tried to carry cookies from request sock, to establish sock,
then timewait sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Salo <salo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 21:55:28 -04:00
Alexander Duyck
0ddcf43d5d ipv4: FIB Local/MAIN table collapse
This patch is meant to collapse local and main into one by converting
tb_data from an array to a pointer.  Doing this allows us to point the
local table into the main while maintaining the same variables in the
table.

As such the tb_data was converted from an array to a pointer, and a new
array called data is added in order to still provide an object for tb_data
to point to.

In order to track the origin of the fib aliases a tb_id value was added in
a hole that existed on 64b systems.  Using this we can also reverse the
merge in the event that custom FIB rules are enabled.

With this patch I am seeing an improvement of 20ns to 30ns for routing
lookups as long as custom rules are not enabled, with custom rules enabled
we fall back to split tables and the original behavior.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:22:14 -04:00
Johan Hedberg
55e76b3898 Bluetooth: Add 'Already Paired' error for Pair Device command
To make the behavior predictable when attempting to pair with a device
for which we already have a Link Key or Long Term Key, this patch adds a
new 'Already Paired' error which gets sent in such a scenario.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-10 21:42:05 +01:00
Johan Hedberg
406ef2a67b Bluetooth: Make Fast Connectable available while powered off
To maximize the usability of the Fast Connectable feature we should make
it possible to set (or unset) it at any given moment. This means
removing the dependency on the 'connectable' setting as well as the
'powered' setting. The former makes also sense since page scan may get
enabled through add_device even if 'connectable' is false. To keep the
setting available over power cycles its flag also needs to be removed
from the flags that are cleared upon HCI_Reset.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-10 19:37:02 +01:00
David S. Miller
515fb5c317 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter fixes for net-next

The following batch contains a couple of fixes to address some fallout
from the previous pull request, they are:

1) Address link problems in the bridge code after e5de75b. Fix it by
   using rcu hook to address to avoid ifdef pollution and hard
   dependency between bridge and br_netfilter.

2) Address sparse warnings in the netfilter reject code, patch from
   Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 12:48:47 -04:00
Florian Westphal
a03a8dbe20 netfilter: fix sparse warnings in reject handling
make C=1 CF=-D__CHECK_ENDIAN__ shows following:

net/bridge/netfilter/nft_reject_bridge.c:65:50: warning: incorrect type in argument 3 (different base types)
net/bridge/netfilter/nft_reject_bridge.c:65:50:    expected restricted __be16 [usertype] protocol [..]
net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: cast from restricted __be16
net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: incorrect type in argument 1 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:121:50: warning: incorrect type in argument 3 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:168:52: warning: incorrect type in argument 3 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:233:52: warning: incorrect type in argument 3 (different base types) [..]

Caused by two (harmless) errors:
1. htons() instead of ntohs()
2. __be16 for protocol in nf_reject_ipXhdr_put API, use u8 instead.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-10 15:01:32 +01:00
Scott Feldman
f8f2147150 switchdev: add netlink flags to IPv4 FIB add op
Pass in the netlink flags (NLM_F_*) into switchdev driver for IPv4 FIB add op
to allow driver to 1) optimize hardware updates, 2) handle ip route prepend
and append commands correctly.

Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:56:52 -04:00
Florian Fainelli
769a020289 net: dsa: utilize of_find_net_device_by_node
Using of_find_device_by_node() restricts the search to platform_device that
match the specified device_node pointer. This is not even remotely true for
network devices backed by a pci_device for instance.

of_find_net_device_by_node() allows us to do a more thorough lookup to find the
struct net_device corresponding to a particular device_node pointer.

For symetry with the non-OF code path, we hold the net_device pointer in
dsa_probe() just like what dev_to_net_dev() does when we call this
function.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:50:21 -04:00
David S. Miller
3cef5c5b0b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/cadence/macb.c

Overlapping changes in macb driver, mostly fixes and cleanups
in 'net' overlapping with the integration of at91_ether into
macb in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:38:02 -04:00
Eric W. Biederman
ddb3b6033c net: Remove protocol from struct dst_ops
After my change to neigh_hh_init to obtain the protocol from the
neigh_table there are no more users of protocol in struct dst_ops.
Remove the protocol field from dst_ops and all of it's initializers.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 16:06:10 -04:00
David S. Miller
5428aef811 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree. Basically, improvements for the packet rejection infrastructure,
deprecation of CLUSTERIP, cleanups for nf_tables and some untangling for
br_netfilter. More specifically they are:

1) Send packet to reset flow if checksum is valid, from Florian Westphal.

2) Fix nf_tables reject bridge from the input chain, also from Florian.

3) Deprecate the CLUSTERIP target, the cluster match supersedes it in
   functionality and it's known to have problems.

4) A couple of cleanups for nf_tables rule tracing infrastructure, from
   Patrick McHardy.

5) Another cleanup to place transaction declarations at the bottom of
   nf_tables.h, also from Patrick.

6) Consolidate Kconfig dependencies wrt. NF_TABLES.

7) Limit table names to 32 bytes in nf_tables.

8) mac header copying in bridge netfilter is already required when
   calling ip_fragment(), from Florian Westphal.

9) move nf_bridge_update_protocol() to br_netfilter.c, also from
   Florian.

10) Small refactor in br_netfilter in the transmission path, again from
    Florian.

11) Move br_nf_pre_routing_finish_bridge_slow() to br_netfilter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 15:58:21 -04:00
Cong Wang
1e052be69d net_sched: destroy proto tp when all filters are gone
Kernel automatically creates a tp for each
(kind, protocol, priority) tuple, which has handle 0,
when we add a new filter, but it still is left there
after we remove our own, unless we don't specify the
handle (literally means all the filters under
the tuple). For example this one is left:

  # tc filter show dev eth0
  filter parent 8001: protocol arp pref 49152 basic

The user-space is hard to clean up these for kernel
because filters like u32 are organized in a complex way.
So kernel is responsible to remove it after all filters
are gone.  Each type of filter has its own way to
store the filters, so each type has to provide its
way to check if all filters are gone.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 15:35:55 -04:00
Eric W. Biederman
b79bda3d38 neigh: Use neigh table index for neigh_packet_xmit
Remove a little bit of unnecessary work when transmitting a packet with
neigh_packet_xmit.  Use the neighbour table index not the address family
as a parameter.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-08 19:30:06 -04:00
Shani Michaeli
c93682477b net/dcb: Add IEEE QCN attribute
As specified in 802.1Qau spec. Add this optional attribute to the
DCB netlink layer. To allow for application to use the new attribute,
NIC drivers should implement and register the  callbacks ieee_getqcn,
ieee_setqcn and ieee_getqcnstats.

The QCN attribute holds a set of parameters for management, and
a set of statistics to provide informative data on Congestion-Control
defined by this spec.

Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 21:50:02 -05:00
Willem de Bruijn
89650ad004 fib: make netdev_switch_fib_ipv4_abort in header file static inline
When building without CONFIG_NET_SWITCHDEV,
netdev_switch_fib_ipv4_abort is defined in the header file. It must
be static inline to avoid build failure at link time.

Fixes: 8e05fd7166 ("fib: hook IPv4 fib for hardware offload")

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 15:44:35 -05:00
Fan Du
05cbc0db03 ipv4: Create probe timer for tcp PMTU as per RFC4821
As per RFC4821 7.3.  Selecting Probe Size, a probe timer should
be armed once probing has converged. Once this timer expired,
probing again to take advantage of any path PMTU change. The
recommended probing interval is 10 minutes per RFC1981. Probing
interval could be sysctled by sysctl_tcp_probe_interval.

Eric Dumazet suggested to implement pseudo timer based on 32bits
jiffies tcp_time_stamp instead of using classic timer for such
rare event.

Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 14:57:42 -05:00
Fan Du
6b58e0a5f3 ipv4: Use binary search to choose tcp PMTU probe_size
Current probe_size is chosen by doubling mss_cache,
the probing process will end shortly with a sub-optimal
mss size, and the link mtu will not be taken full
advantage of, in return, this will make user to tweak
tcp_base_mss with care.

Use binary search to choose probe_size in a fine
granularity manner, an optimal mss will be found
to boost performance as its maxmium.

In addition, introduce a sysctl_tcp_probe_threshold
to control when probing will stop in respect to
the width of search range.

Test env:
Docker instance with vxlan encapuslation(82599EB)
iperf -c 10.0.0.24  -t 60

before this patch:
1.26 Gbits/sec

After this patch: increase 26%
1.59 Gbits/sec

Signed-off-by: Fan Du <fan.du@intel.com>
Acked-by: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 14:57:41 -05:00
Fan Du
dcd8fb8533 ipv4: Raise tcp PMTU probe mss base size
Quotes from RFC4821 7.2.  Selecting Initial Values

   It is RECOMMENDED that search_low be initially set to an MTU size
   that is likely to work over a very wide range of environments.  Given
   today's technologies, a value of 1024 bytes is probably safe enough.
   The initial value for search_low SHOULD be configurable.

Moreover, set a small value will introduce extra time for the search
to converge. So set the initial probe base mss size to 1024 Bytes.

Signed-off-by: Fan Du <fan.du@intel.com>
Acked-by: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 14:57:41 -05:00
Eric W. Biederman
aaa4e70404 DECnet: Only use neigh_ops for adding the link layer header
Other users users of the neighbour table use neigh->output as the method
to decided when and which link-layer header to place on a packet.
DECnet has been using neigh->output to decide which DECnet headers to
place on a packet depending which neighbour the packet is destined for.

The DECnet usage isn't totally wrong but it can run into problems if the
neighbour output function is run for a second time as the teql driver
and the bridge netfilter code can do.

Therefore to avoid pathologic problems later down the line and make the
neighbour code easier to understand by refactoring the decnet output
code to only use a neighbour method to add a link layer header to a
packet.

This is done by moving the neigbhour operations lookup from
dn_to_neigh_output to dn_neigh_output_packet.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 14:54:22 -05:00
Johan Hedberg
b9a245fb12 Bluetooth: Move all mgmt command quirks to handler table
In order to completely generalize the mgmt command handling we need to
move away command-specific information from mgmt_control() into the
actual command table. This patch adds a new 'flags' field to the handler
entries which can now contain the following command specific
information:

 - Command takes variable length parameters
 - Command doesn't target any specific HCI device
 - Command can be sent when the HCI device is unconfigured

After this the mgmt_control() function is completely generic and can
potentially be reused by new HCI channels.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06 20:15:21 +01:00
Johan Hedberg
6d785aa345 Bluetooth: Convert mgmt to use HCI chan registration API
This patch converts the existing mgmt code to use the newly introduced
generic API for registering HCI channels with mgmt-like semantics.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06 20:15:21 +01:00
Johan Hedberg
801c1e8da5 Bluetooth: Add mgmt HCI channel registration API
This patch adds an API for registering HCI channels with mgmt-like
semantics. For now the only user will be HCI_CHANNEL_CONTROL, but e.g.
6lowpan is intended to use this as well in the future.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06 20:15:21 +01:00
Marcel Holtmann
93690c227a Bluetooth: Introduce controller setting information for static address
Currently it is not possible to determine if the static address is used
by the controller. It is also not possible to determine if using a
static on a dual-mode controller with disabled BR/EDR is possible or
not.

To address this issue, introduce a new setting called static-address. If
support for this setting is signaled that means that the kernel supports
using static addresses. And if used on dual-mode controllers with BR/EDR
disabled it means that a configured static address can be used.

In addition utilize the same setting for the list of current active
settings that indicates if a static address is configured and if that
address will be actually used.

With this in mind the existing Set Static Address management command
has been extended to return the current settings. That way the caller
of that command can easily determine if the programmed address will
be used or if extra steps are required.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-06 20:43:07 +02:00
Scott Feldman
8e05fd7166 fib: hook IPv4 fib for hardware offload
Call into the switchdev driver any time an IPv4 fib entry is
added/modified/deleted from the kernel's FIB.  The switchdev driver may or
may not install the route to the offload device.  In the case where the
driver tries to install the route and something goes wrong (device's routing
table is full, etc), then all of the offloaded routes will be flushed from the
device, route forwarding falls back to the kernel, and no more routes are
offloading.

We can refine this logic later.  For now, use the simplist model of offloading
routes up to the point of failure, and then on failure, undo everything and
mark IPv4 offloading disabled.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 00:24:58 -05:00
Scott Feldman
448b128a14 ipv4: add net bool fib_offload_disabled
If something goes wrong with IPv4 FIB offload, mark entire net offload
disabled.  This is brute force policy to basically shut down IPv4 FIB offload
permanently if there is a problem offloading any route to an external device.
We can refine the policy in the future, to handle failures on a per-device or
per-route basis, but for now, this policy is per-net.

What we're trying to avoid is an inconsistent split between the kernel's FIB
and the offload device's FIB.  We don't want the device to fwd a pkt
inconsitent with what the kernel would do.  An example of a split is if device
has 10.0.0.0/16 and kernel has 10.0.0.0/16 and 10.0.0.0/24, the device wouldn't
see the longest prefix 10.0.0.0/24 and potentially forward pkts incorrectly.

Limited capacity or limited capability are two ways a route may fail to install
to the offload device.  We'll not differentiate between failures at this time,
and treat any failure as fatal and mark the net as fib_offload_disabled.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 00:24:58 -05:00
Scott Feldman
104616e74e switchdev: don't support custom ip rules, for now
Keep switchdev FIB offload model simple for now and don't allow custom ip
rules.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 00:24:58 -05:00
Scott Feldman
5e8d90497d switchdev: add IPv4 fib ndo ops wrappers
Add IPv4 fib ndo wrapper funcs and stub them out for now.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 00:24:58 -05:00
Florian Fainelli
5929903103 net: dsa: let switches specify their tagging protocol
In order to support the new DSA device driver model, a dsa_switch should
be able to advertise the type of tagging protocol supported by the
underlying switch device. This also removes constraints on how tagging
can be stacked to each other.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 00:18:20 -05:00
Pablo Neira Ayuso
1cae565e8b netfilter: nf_tables: limit maximum table name length to 32 bytes
Set the same as we use for chain names, it should be enough.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:21 +01:00
Patrick McHardy
1a1e1a1219 netfilter: nf_tables: cleanup nf_tables.h
The transaction related definitions are squeezed in between the rule
and expression definitions, which are closely related and should be
next to each other. The transaction definitions actually don't belong
into that file at all since it defines the global objects and API and
transactions are internal to nf_tables_api, but for now simply move
them to a seperate section.

Similar, the chain types are in between a set of registration functions,
they belong to the chain section.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:13 +01:00
Pablo Neira Ayuso
43270b1bc5 netfilter: ipt_CLUSTERIP: deprecate it in favour of xt_cluster
xt_cluster supersedes ipt_CLUSTERIP since it can be also used in
gateway configurations (not only from the backend side).

ipt_CLUSTER is also known to leak the netdev that it uses on
device removal, which requires a rather large fix to workaround
the problem: http://patchwork.ozlabs.org/patch/358629/

So let's deprecate this so we can probably kill code this in the
future.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:05 +01:00
Jakub Pawlowski
82f8b651a9 Bluetooth: fix service discovery behaviour for empty uuids filter
This patch fixes service discovery behaviour, when provided uuid filter
is empty and HCI_QUIRK_STRICT_DUPLICATE_FILTER is set. Before this
patch, empty uuid filter was unable to trigger scan restart, and that
caused inconsistent behaviour in applications.

Example: two DBus clients call BlueZ, one to find all devices with
service abcd, second to find all devices with rssi smaller than -90.
Sum of those filters, that is passed to mgmt_service_scan is empty
filter, with no rssi or uuids set.
That caused kernel not to restart scan when quirk was set.
That was inconsistent with what happen when there's only one of those
two filters set (scan is restarted and reports devices).

To fix that, new variable hdev->discovery.result_filtering was
introduced. It can indicate that filtered scan is running, no matter
what uuid or rssi filter is set.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-05 09:50:50 +02:00
Alexander Duyck
a7e5353123 fib_trie: Make fib_table rcu safe
The fib_table was wrapped in several places with an
rcu_read_lock/rcu_read_unlock however after looking over the code I found
several spots where the tables were being accessed as just standard
pointers without any protections.  This change fixes that so that all of
the proper protections are in place when accessing the table to take RCU
replacement or removal of the table into account.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 23:35:18 -05:00
Patrick McHardy
86f1ec3231 netfilter: nf_tables: fix userdata length overflow
The NFT_USERDATA_MAXLEN is defined to 256, however we only have a u8
to store its size. Introduce a struct nft_userdata which contains a
length field and indicate its presence using a single bit in the rule.

The length field of struct nft_userdata is also a u8, however we don't
store zero sized data, so the actual length is udata->len + 1.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04 18:46:06 +01:00
SenthilKumar Jegadeesan
64a8cef41a mac80211: provide station PMF configuration to driver
Some device drivers offload part of aggregation including AddBA/DelBA
negotiations to firmware. In such scenario, the PMF configuration of
the station needs to be provided to driver to enable encryption of
AddBA/DelBA action frames.

Signed-off-by: SenthilKumar Jegadeesan <sjegadee@qti.qualcomm.com>
[fix commit log, documentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-04 10:34:12 +01:00
Arik Nemtsov
3384d757d4 mac80211: allow iterating inactive interfaces
Sometimes the driver might want to modify private data in interfaces
that are down. One possible use-case is cleaning up interface state
after HW recovery. Some interfaces that were up before the recovery took
place might be down now, but they might still be "dirty".

Introduce a new iterate_interfaces() API and a new ACTIVE iterator flag.
This way the internal implementation of the both active and inactive
APIs remains the same.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-04 10:34:12 +01:00
Eric W. Biederman
7720c01f3f mpls: Add a sysctl to control the size of the mpls label table
This sysctl gives two benefits.  By defaulting the table size to 0
mpls even when compiled in and enabled defaults to not forwarding
any packets.  This prevents unpleasant surprises for users.

The other benefit is that as mpls labels are allocated locally a dense
table a small dense label table may be used which saves memory and
is extremely simple and efficient to implement.

This sysctl allows userspace to choose the restrictions on the label
table size userspace applications need to cope with.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:26:06 -05:00
Eric W. Biederman
0189197f44 mpls: Basic routing support
This change adds a new Kconfig option MPLS_ROUTING.

The core of this change is the code to look at an mpls packet received
from another machine.  Look that packet up in a routing table and
forward the packet on.

Support of MPLS over ATM is not considered or attempted here.  This
implemntation follows RFC3032 and implements the MPLS shim header that
can pass over essentially any network.

What RFC3021 refers to as the as the Incoming Label Map (ILM) I call
net->mpls.platform_label[].  What RFC3031 refers to as the Next Label
Hop Forwarding Entry (NHLFE) I call mpls_route.  Though calling it the
label fordwarding information base (lfib) might also be valid.

Further the implemntation forwards packets as described in RFC3032.
There is no need and given the original motivation for MPLS a strong
discincentive to have a flexible label forwarding path.  In essence
the logic is the topmost label is read, looked up, removed, and
replaced by 0 or more new lables and the sent out the specified
interface to it's next hop.

Quite a few optional features are not implemented here.  Among them
are generation of ICMP errors when the TTL is exceeded or the packet
is larger than the next hop MTU (those conditions are detected and the
packets are dropped instead of generating an icmp error).  The traffic
class field is always set to 0.  The implementation focuses on IP over
MPLS and does not handle egress of other kinds of protocols.

Instead of implementing coordination with the neighbour table and
sorting out how to input next hops in a different address family (for
which there is value).  I was lazy and implemented a next hop mac
address instead.  The code is simpler and there are flavor of MPLS
such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is
appropriate so a next hop by mac address would need to be implemented
at some point.

Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.

Decoding the mpls header must be done by first byeswapping a 32bit bit
endian word into the local cpu endian and then bit shifting to extract
the pieces.  There is no C bit-field that can represent a wire format
mpls header on a little endian machine as the low bits of the 20bit
label wind up in the wrong half of third byte.  Therefore internally
everything is deal with in cpu native byte order except when writing
to and reading from a packet.

For management simplicity if a label is configured to forward out
an interface that is down the packet is dropped early.  Similarly
if an network interface is removed rt_dev is updated to NULL
(so no reference is preserved) and any packets for that label
are dropped.  Keeping the label entries in the kernel allows
the kernel label table to function as the definitive source
of which labels are allocated and which are not.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:26:06 -05:00
Eric W. Biederman
4fd3d7d9e8 neigh: Add helper function neigh_xmit
For MPLS I am building the code so that either the neighbour mac
address can be specified or we can have a next hop in ipv4 or ipv6.

The kind of next hop we have is indicated by the neighbour table
pointer.  A neighbour table pointer of NULL is a link layer address.
A non-NULL neighbour table pointer indicates which neighbour table and
thus which address family the next hop address is in that we need to
look up.

The code either sends a packet directly or looks up the appropriate
neighbour table entry and sends the packet.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:23:23 -05:00
Eric W. Biederman
60395a20ff neigh: Factor out ___neigh_lookup_noref
While looking at the mpls code I found myself writing yet another
version of neigh_lookup_noref.  We currently have __ipv4_lookup_noref
and __ipv6_lookup_noref.

So to make my work a little easier and to make it a smidge easier to
verify/maintain the mpls code in the future I stopped and wrote
___neigh_lookup_noref.  Then I rewote __ipv4_lookup_noref and
__ipv6_lookup_noref in terms of this new function.  I tested my new
version by verifying that the same code is generated in
ip_finish_output2 and ip6_finish_output2 where these functions are
inlined.

To get to ___neigh_lookup_noref I added a new neighbour cache table
function key_eq.  So that the static size of the key would be
available.

I also added __neigh_lookup_noref for people who want to to lookup
a neighbour table entry quickly but don't know which neibhgour table
they are going to look up.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:23:23 -05:00
David S. Miller
71a83a6db6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/rocker/rocker.c

The rocker commit was two overlapping changes, one to rename
the ->vport member to ->pport, and another making the bitmask
expression use '1ULL' instead of plain '1'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 21:16:48 -05:00
Eric W. Biederman
1d5da757da ax25: Stop using magic neighbour cache operations.
Before the ax25 stack calls dev_queue_xmit it always calls
ax25_type_trans which sets skb->protocol to ETH_P_AX25.

Which means that by looking at the protocol type it is possible to
detect IP packets that have not been munged by the ax25 stack in
ndo_start_xmit and call a function to munge them.

Rename ax25_neigh_xmit to ax25_ip_xmit and tweak the return type and
value to be appropriate for an ndo_start_xmit function.

Update all of the ax25 devices to test the protocol type for ETH_P_IP
and return ax25_ip_xmit as the first thing they do.  This preserves
the existing semantics of IP packet processing, but the timing will be
a little different as the IP packets now pass through the qdisc layer
before reaching the ax25 ip packet processing.

Remove the now unnecessary ax25 neighbour table operations.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 14:44:41 -05:00
Alexander Bondar
2ecc3905e6 mac80211: Update beacon's timing and DTIM count on every beacon
Beacon's timestamp, device system time associated with this beacon and
DTIM count parameters are not updated in the associated vif context
if the latest beacon's content is identical to the previously received.
It make sense to update these changing parameters on every beacon so the
driver can get most updated values. This may be necessary, for example,
to avoid either beacons' drift effect or device time stamp overrun.
IMPORTANT: Three sync_* parameters - sync_ts, sync_device_ts and
sync_dtim_count would possibly be out of sync by the time the driver will
use them. The synchronized view is currently guaranteed only in certain
callbacks.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-03 15:56:06 +01:00
Ahmad Kholaif
6c09e791b2 cfg80211: Allow NL80211_ATTR_IFINDEX to be added to vendor events
This modifies cfg80211_vendor_event_alloc() with an additional argument
struct wireless_dev *wdev. __cfg80211_alloc_event_skb() is modified to
take in *wdev argument, if wdev != NULL, both the NL80211_ATTR_IFINDEX
and wdev identifier are added to the vendor event.

These changes make it easier for drivers to add ifindex indication in
vendor events cleanly.

This also updates all existing users of cfg80211_vendor_event_alloc()
and __cfg80211_alloc_event_skb() in the kernel tree.

Signed-off-by: Ahmad Kholaif <akholaif@qca.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-03 15:56:05 +01:00
Dedy Lansky
6eb1813764 cfg80211: add bss_type and privacy arguments in cfg80211_get_bss()
802.11ad adds new a network type (PBSS) and changes the capability
field interpretation for the DMG (60G) band.
The same 2 bits that were interpreted as "ESS" and "IBSS" before are
re-used as a 2-bit field with 3 valid values (and 1 reserved). Valid
values are: "IBSS", "PBSS" (new) and "AP".

In order to get the BSS struct for the new PBSS networks, change the
cfg80211_get_bss() function to take a new enum ieee80211_bss_type
argument with the valid network types, as "capa_mask" and "capa_val"
no longer work correctly (the search must be band-aware now.)

The remaining bits in "capa_mask" and "capa_val" are used only for
privacy matching so replace those two with a privacy enum as well.

Signed-off-by: Dedy Lansky <dlansky@codeaurora.org>
[rewrite commit log, tiny fixes]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-03 15:56:01 +01:00
Florian Westphal
ee586bbc28 netfilter: reject: don't send icmp error if csum is invalid
tcp resets are never emitted if the packet that triggers the
reject/reset has an invalid checksum.

For icmp error responses there was no such check.
It allows to distinguish icmp response generated via

iptables -I INPUT -p udp --dport 42 -j REJECT

and those emitted by network stack (won't respond if csum is invalid,
REJECT does).

Arguably its possible to avoid this by using conntrack and only
using REJECT with -m conntrack NEW/RELATED.

However, this doesn't work when connection tracking is not in use
or when using nf_conntrack_checksum=0.

Furthermore, sending errors in response to invalid csums doesn't make
much sense so just add similar test as in nf_send_reset.

Validate csum if needed and only send the response if it is ok.

Reference: http://bugzilla.redhat.com/show_bug.cgi?id=1169829
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-03 02:10:35 +01:00
Eric W. Biederman
bdf53c5849 neigh: Don't require dst in neigh_hh_init
- Add protocol to neigh_tbl so that dst->ops->protocol is not needed
- Acquire the device from neigh->dev

This results in a neigh_hh_init that will cache the samve values
regardless of the packets flowing through it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 16:43:41 -05:00
Eric W. Biederman
59b2af26b9 arp: Kill arp_find
There are no more callers so kill this function.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 16:43:41 -05:00
Eric W. Biederman
def6775369 neigh: Move neigh_compat_output into ax25_ip.c
The only caller is now is ax25_neigh_construct so move
neigh_compat_output into ax25_ip.c make it static and rename it
ax25_neigh_output.

Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 16:43:40 -05:00
Eric W. Biederman
3b6a94bed0 ax25: Refactor to use private neighbour operations.
AX25 already has it's own private arp cache operations to isolate
it's abuse of dev_rebuild_header to transmit packets.  Add a function
ax25_neigh_construct that will allow all of the ax25 devices to
force using these operations, so that the generic arp code does
not need to.

Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 16:43:40 -05:00
Eric W. Biederman
46d4e47abe ax25: Make ax25_header and ax25_rebuild_header static
The only user is in ax25_ip.c so stop exporting these functions.

Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 16:43:40 -05:00
David S. Miller
77f0379fa8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

A small batch with accumulated updates in nf-next, mostly IPVS updates,
they are:

1) Add 64-bits stats counters to IPVS, from Julian Anastasov.

2) Move NETFILTER_XT_MATCH_ADDRTYPE out of NETFILTER_ADVANCED as docker
seem to require this, from Anton Blanchard.

3) Use boolean instead of numeric value in set_match_v*(), from
coccinelle via Fengguang Wu.

4) Allows rescheduling of new connections in IPVS when port reuse is
detected, from Marcelo Ricardo Leitner.

5) Add missing bits to support arptables extensions from nft_compat,
from Arturo Borrero.

Patrick is preparing a large batch to enhance the set infrastructure,
named expressions among other things, that should follow up soon after
this batch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 14:55:05 -05:00
David S. Miller
70c836a4d1 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-03-02

Here's the first bluetooth-next pull request targeting the 4.1 kernel:

 - ieee802154/6lowpan cleanups
 - SCO routing to host interface support for the btmrvl driver
 - AMP code cleanups
 - Fixes to AMP HCI init sequence
 - Refactoring of the HCI callback mechanism
 - Added shutdown routine for Intel controllers in the btusb driver
 - New config option to enable/disable Bluetooth debugfs information
 - Fix for early data reception on L2CAP fixed channels

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 14:47:12 -05:00
Ying Xue
1b78414047 net: Remove iocb argument from sendmsg and recvmsg
After TIPC doesn't depend on iocb argument in its internal
implementations of sendmsg() and recvmsg() hooks defined in proto
structure, no any user is using iocb argument in them at all now.
Then we can drop the redundant iocb argument completely from kinds of
implementations of both sendmsg() and recvmsg() in the entire
networking stack.

Cc: Christoph Hellwig <hch@lst.de>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 13:06:31 -05:00
Eyal Birger
744d5a3e9f net: move skb->dropcount to skb->cb[]
Commit 977750076d ("af_packet: add interframe drop cmsg (v6)")
unionized skb->mark and skb->dropcount in order to allow recording
of the socket drop count while maintaining struct sk_buff size.

skb->dropcount was introduced since there was no available room
in skb->cb[] in packet sockets. However, its introduction led to
the inability to export skb->mark, or any other aliased field to
userspace if so desired.

Moving the dropcount metric to skb->cb[] eliminates this problem
at the expense of 4 bytes less in skb->cb[] for protocol families
using it.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:30 -05:00
Eyal Birger
3bc3b96f3b net: add common accessor for setting dropcount on packets
As part of an effort to move skb->dropcount to skb->cb[], use
a common function in order to set dropcount in struct sk_buff.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:30 -05:00
Eyal Birger
b4772ef879 net: use common macro for assering skb->cb[] available size in protocol families
As part of an effort to move skb->dropcount to skb->cb[] use a common
macro in protocol families using skb->cb[] for ancillary data to
validate available room in skb->cb[].

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:30 -05:00
Eyal Birger
6368c23577 net: bluetooth: compact struct bt_skb_cb by converting boolean fields to bit fields
Convert boolean fields incoming and req_start to bit fields and move
force_active in order save space in bt_skb_cb in an effort to use
a portion of skb->cb[] for storing skb->dropcount.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:29 -05:00
Eyal Birger
49a6fe0557 net: bluetooth: compact struct bt_skb_cb by inlining struct hci_req_ctrl
struct hci_req_ctrl is never used outside of struct bt_skb_cb;
Inlining it frees 8 bytes on a 64 bit system in skb->cb[] allowing
the addition of more ancillary data.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:29 -05:00
Johannes Berg
36ef906ee8 wext: add checked wrappers for adding events/points to streams
These checked wrappers are necessary for the next patch, which
will use them to avoid sending out partial scan results.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-02-28 21:31:12 +01:00
Alexander Duyck
56315f9e6e fib_trie: Convert fib_alias to hlist from list
There isn't any advantage to having it as a list and by making it an hlist
we make the fib_alias more compatible with the list_info in terms of the
type of list used.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 16:37:06 -05:00
Madhu Challa
93a714d6b5 multicast: Extend ip address command to enable multicast group join/leave on
Joining multicast group on ethernet level via "ip maddr" command would
not work if we have an Ethernet switch that does igmp snooping since
the switch would not replicate multicast packets on ports that did not
have IGMP reports for the multicast addresses.

Linux vxlan interfaces created via "ip link add vxlan" have the group option
that enables then to do the required join.

By extending ip address command with option "autojoin" we can get similar
functionality for openvswitch vxlan interfaces as well as other tunneling
mechanisms that need to receive multicast traffic. The kernel code is
structured similar to how the vxlan driver does a group join / leave.

example:
ip address add 224.1.1.10/24 dev eth5 autojoin
ip address del 224.1.1.10/24 dev eth5

Signed-off-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 16:25:25 -05:00
Madhu Challa
46a4dee074 igmp v6: add __ipv6_sock_mc_join and __ipv6_sock_mc_drop
Based on the igmp v4 changes from Eric Dumazet.
959d10f6bbf6("igmp: add __ip_mc_{join|leave}_group()")

These changes are needed to perform igmp v6 join/leave while
RTNL is held.

Make ipv6_sock_mc_join and ipv6_sock_mc_drop wrappers around
__ipv6_sock_mc_join and  __ipv6_sock_mc_drop to avoid
proliferation of work queues.

Signed-off-by: Madhu Challa <challa@noironetworks.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 16:25:24 -05:00
Tom Herbert
723b8e460d udp: In udp_flow_src_port use random hash value if skb_get_hash fails
In the unlikely event that skb_get_hash is unable to deduce a hash
in udp_flow_src_port we use a consistent random value instead.
This is specified in GRE/UDP draft section 3.2.1:
https://tools.ietf.org/html/draft-ietf-tsvwg-gre-in-udp-encap-04

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 16:00:01 -05:00
Johan Hedberg
4cd3928a8b Bluetooth: Update New CSRK event to match latest specification
The 'master' parameter of the New CSRK event was recently renamed to
'type', with the old values kept for backwards compatibility as
unauthenticated local/remote keys. This patch updates the code to take
into account the two new (authenticated) values and ensures they get
used based on the security level of the connection that the respective
keys get distributed over.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-27 18:25:48 +01:00
Guenter Roeck
d79d210736 net: dsa: Introduce dsa_is_port_initialized
To avoid race conditions when using the ds->ports[] array,
we need to check if the accessed port has been initialized.
Introduce and use helper function dsa_is_port_initialized
for that purpose and use it where needed.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-25 17:57:48 -05:00
Florian Fainelli
b73adef677 net: dsa: integrate with SWITCHDEV for HW bridging
In order to support bridging offloads in DSA switch drivers, select
NET_SWITCHDEV to get access to the port_stp_update and parent_get_id
NDOs that we are required to implement.

To facilitate the integratation at the DSA driver level, we implement 3
types of operations:

- port_join_bridge
- port_leave_bridge
- port_stp_update

DSA will resolve which switch ports that are currently bridge port
members as some Switch hardware/drivers need to know about that to limit
the register programming to just the relevant registers (especially for
slow MDIO buses).

We also take care of setting the correct STP state when slave network
devices are brought up/down while being bridge members.

Finally, when a port is leaving the bridge, we make sure we set in
BR_STATE_FORWARDING state, otherwise the bridge layer would leave it
disabled as a result of having left the bridge.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-25 17:03:38 -05:00
Marcelo Ricardo Leitner
d752c36457 ipvs: allow rescheduling of new connections when port reuse is detected
Currently, when TCP/SCTP port reusing happens, IPVS will find the old
entry and use it for the new one, behaving like a forced persistence.
But if you consider a cluster with a heavy load of small connections,
such reuse will happen often and may lead to a not optimal load
balancing and might prevent a new node from getting a fair load.

This patch introduces a new sysctl, conn_reuse_mode, that allows
controlling how to proceed when port reuse is detected. The default
value will allow rescheduling of new connections only if the old entry
was in TIME_WAIT state for TCP or CLOSED for SCTP.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-02-25 13:46:35 +09:00
Mahesh Bandewar
14c9551a32 bonding: Implement port churn-machine (AD standard 43.4.17).
The Churn Detection machines detect the situation where a port is operable,
but the Actor and Partner have not attached the link to an Aggregator and
brought the link into operation within a bound time period. Under normal
operation of the LACP, agreement between Actor and Partner should be reached
very rapidly. Continued failure to reach agreement can be symptomatic of
device failure.

Actor-churn-detection state-machine
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>

===================================

BEGIN=True + PortEnable=False
           |
           v
 +------------------------+   ActorPort.Sync=True  +------------------+
 |   ACTOR_CHURN_MONITOR  | ---------------------> |  NO_ACTOR_CHURN  |
 |========================|                        |==================|
 |    ActorChurn=False    |  ActorPort.Sync=False  | ActorChurn=False |
 | ActorChurn.Timer=Start | <--------------------- |                  |
 +------------------------+                        +------------------+
           |                                                ^
           |                                                |
  ActorChurn.Timer=Expired                                  |
           |                                       ActorPort.Sync=True
           |                                                |
           |                +-----------------+             |
           |                |   ACTOR_CHURN   |             |
           |                |=================|             |
           +--------------> | ActorChurn=True | ------------+
                            |                 |
                            +-----------------+

Similar for the Partner-churn-detection.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-24 16:05:48 -05:00
Dan Carpenter
278f7b4fff caif: fix a signedness bug in cfpkt_iterate()
The cfpkt_iterate() function can return -EPROTO on error, but the
function is a u16 so the negative value gets truncated to a positive
unsigned short.  This causes a static checker warning.

The only caller which might care is cffrml_receive(), when it's checking
the frame checksum.  I modified cffrml_receive() so that it never says
-EPROTO is a valid checksum.

Also this isn't ever going to be inlined so I removed the "inline".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20 17:35:14 -05:00
Johan Hedberg
7129069e84 Bluetooth: Rename hci_send_to_control to hci_send_to_channel
The hci_send_to_control() can be made more general purpose with a small
change of passing the desired HCI channel as a parameter to it. This
allows using it for the monitor channel as well as e.g. 6lowpan in the
future.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-20 18:20:17 +01:00
Johan Hedberg
3a6d576be9 Bluetooth: Convert disconn_cfm to be triggered through hci_cb
This patch moves all the disconn_cfm callbacks to be based on the hci_cb
list. This means making l2cap_disconn_cfm private to l2cap_core.c and
sco_conn_cb private to sco.c respectively. Since the hci_conn type
filtering isn't done any more on the wrapper level the callbacks
themselves need to check that they were passed a relevant type of
connection.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-19 08:44:29 +01:00
Johan Hedberg
539c496d88 Bluetooth: Convert connect_cfm to be triggered through hci_cb
This patch moves all the connect_cfm callbacks to be based on the hci_cb
list. This means making l2cap_connect_cfm private to l2cap_core.c and
sco_connect_cb private to sco.c respectively. Since the hci_conn type
filtering isn't done any more on the wrapper level the callbacks
themselves need to check that they were passed a relevant type of
connection.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-19 08:44:29 +01:00
Johan Hedberg
354fe804ed Bluetooth: Convert L2CAP security callback to use hci_cb
There's no reason to have the custom hci_proto_auth/encrypt_cfm helpers
when the hci_cb list works equally well. This patch adds L2CAP to the
hci_cb list and makes l2cap_security_cfm a private function of
l2cap_core.c.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-19 08:44:28 +01:00
Johan Hedberg
fba7ecf09b Bluetooth: Convert hci_cb_list_lock to a mutex
We'll soon need to be able to sleep inside the loops that iterate the
hci_cb list, so neither a spinlock, rwlock or rcu are usable. This patch
changes the lock to a mutex which permits sleeping while holding the
lock.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-19 08:44:28 +01:00
Linus Torvalds
f5af19d10d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:

 1) Missing netlink attribute validation in nft_lookup, from Patrick
    McHardy.

 2) Restrict ipv6 partial checksum handling to UDP, since that's the
    only case it works for.  From Vlad Yasevich.

 3) Clear out silly device table sentinal macros used by SSB and BCMA
    drivers.  From Joe Perches.

 4) Make sure the remote checksum code never creates a situation where
    the remote checksum is applied yet the tunneling metadata describing
    the remote checksum transformation is still present.  Otherwise an
    external entity might see this and apply the checksum again.  From
    Tom Herbert.

 5) Use msecs_to_jiffies() where applicable, from Nicholas Mc Guire.

 6) Don't explicitly initialize timer struct fields, use setup_timer()
    and mod_timer() instead.  From Vaishali Thakkar.

 7) Don't invoke tg3_halt() without the tp->lock held, from Jun'ichi
    Nomura.

 8) Missing __percpu annotation in ipvlan driver, from Eric Dumazet.

 9) Don't potentially perform skb_get() on shared skbs, also from Eric
    Dumazet.

10) Fix COW'ing of metrics for non-DST_HOST routes in ipv6, from Martin
    KaFai Lau.

11) Fix merge resolution error between the iov_iter changes in vhost and
    some bug fixes that occurred at the same time.  From Jason Wang.

12) If rtnl_configure_link() fails we have to perform a call to
    ->dellink() before unregistering the device.  From WANG Cong.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (39 commits)
  net: dsa: Set valid phy interface type
  rtnetlink: call ->dellink on failure when ->newlink exists
  com20020-pci: add support for eae single card
  vhost_net: fix wrong iter offset when setting number of buffers
  net: spelling fixes
  net/core: Fix warning while make xmldocs caused by dev.c
  net: phy: micrel: disable NAND-tree for KSZ8021, KSZ8031, KSZ8051, KSZ8081
  ipv6: fix ipv6_cow_metrics for non DST_HOST case
  openvswitch: Fix key serialization.
  r8152: restore hw settings
  hso: fix rx parsing logic when skb allocation fails
  tcp: make sure skb is not shared before using skb_get()
  bridge: netfilter: Move sysctl-specific error code inside #ifdef
  ipv6: fix possible deadlock in ip6_fl_purge / ip6_fl_gc
  ipvlan: add a missing __percpu pcpu_stats
  tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one()
  bgmac: fix device initialization on Northstar SoCs (condition typo)
  qlcnic: Delete existing multicast MAC list before adding new
  net/mlx5_core: Fix configuration of log_uar_page_sz
  sunvnet: don't change gso data on clones
  ...
2015-02-17 17:41:19 -08:00
Tedd Ho-Jeong An
a44fecbd52 Bluetooth: Add shutdown callback before closing the device
This callback allows a vendor to send the vendor specific commands
before cloing the hci interface.

Signed-off-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-15 00:37:52 +01:00
Alexander Aring
b976796950 ieee802154: cleanup ieee802154_le64_to_be64
This patch cleanups the ieee802154_le64_to_be64 function. This patch
removes an unnecessary temporary variable.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-14 05:19:58 +01:00
Alexander Aring
ba5bf17e83 ieee802154: cleanup ieee802154_be64_to_le64
This patch cleanups the ieee802154_be64_to_le64 function. This patch
removes an unnecessary temporary variable.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-14 05:19:58 +01:00
Vladimir Davydov
f48b80a5e2 memcg: cleanup static keys decrement
Move memcg_socket_limit_enabled decrement to tcp_destroy_cgroup (called
from memcg_destroy_kmem -> mem_cgroup_sockets_destroy) and zap a bunch of
wrapper functions.

Although this patch moves static keys decrement from __mem_cgroup_free to
mem_cgroup_css_free, it does not introduce any functional changes, because
the keys are incremented on setting the limit (tcp or kmem), which can
only happen after successful mem_cgroup_css_online.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:10 -08:00
huaibin Wang
ac37e2515c xfrm: release dst_orig in case of error in xfrm_lookup()
dst_orig should be released on error. Function like __xfrm_route_forward()
expects that behavior.
Since a recent commit, xfrm_lookup() may also be called by xfrm_lookup_route(),
which expects the opposite.
Let's introduce a new flag (XFRM_LOOKUP_KEEP_DST_REF) to tell what should be
done in case of error.

Fixes: f92ee61982d("xfrm: Generate blackhole routes only from route lookup functions")
Signed-off-by: huaibin Wang <huaibin.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-02-12 07:10:56 +01:00
Linus Torvalds
8cc748aa76 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer updates from James Morris:
 "Highlights:

   - Smack adds secmark support for Netfilter
   - /proc/keys is now mandatory if CONFIG_KEYS=y
   - TPM gets its own device class
   - Added TPM 2.0 support
   - Smack file hook rework (all Smack users should review this!)"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (64 commits)
  cipso: don't use IPCB() to locate the CIPSO IP option
  SELinux: fix error code in policydb_init()
  selinux: add security in-core xattr support for pstore and debugfs
  selinux: quiet the filesystem labeling behavior message
  selinux: Remove unused function avc_sidcmp()
  ima: /proc/keys is now mandatory
  Smack: Repair netfilter dependency
  X.509: silence asn1 compiler debug output
  X.509: shut up about included cert for silent build
  KEYS: Make /proc/keys unconditional if CONFIG_KEYS=y
  MAINTAINERS: email update
  tpm/tpm_tis: Add missing ifdef CONFIG_ACPI for pnp_acpi_device
  smack: fix possible use after frees in task_security() callers
  smack: Add missing logging in bidirectional UDS connect check
  Smack: secmark support for netfilter
  Smack: Rework file hooks
  tpm: fix format string error in tpm-chip.c
  char/tpm/tpm_crb: fix build error
  smack: Fix a bidirectional UDS connect check typo
  smack: introduce a special case for tmpfs in smack_d_instantiate()
  ...
2015-02-11 20:25:11 -08:00
Tom Herbert
0ace2ca89c vxlan: Use checksum partial with remote checksum offload
Change remote checksum handling to set checksum partial as default
behavior. Added an iflink parameter to configure not using
checksum partial (calling csum_partial to update checksum).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 15:12:13 -08:00
Tom Herbert
26c4f7da3e net: Fix remcsum in GRO path to not change packet
Remote checksum offload processing is currently the same for both
the GRO and non-GRO path. When the remote checksum offload option
is encountered, the checksum field referred to is modified in
the packet. So in the GRO case, the packet is modified in the
GRO path and then the operation is skipped when the packet goes
through the normal path based on skb->remcsum_offload. There is
a problem in that the packet may be modified in the GRO path, but
then forwarded off host still containing the remote checksum option.
A remote host will again perform RCO but now the checksum verification
will fail since GRO RCO already modified the checksum.

To fix this, we ensure that GRO restores a packet to it's original
state before returning. In this model, when GRO processes a remote
checksum option it still changes the checksum per the algorithm
but on return from lower layer processing the checksum is restored
to its original value.

In this patch we add define gro_remcsum structure which is passed
to skb_gro_remcsum_process to save offset and delta for the checksum
being changed. After lower layer processing, skb_gro_remcsum_cleanup
is called to restore the checksum before returning from GRO.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 15:12:09 -08:00
Paul Moore
04f81f0154 cipso: don't use IPCB() to locate the CIPSO IP option
Using the IPCB() macro to get the IPv4 options is convenient, but
unfortunately NetLabel often needs to examine the CIPSO option outside
of the scope of the IP layer in the stack.  While historically IPCB()
worked above the IP layer, due to the inclusion of the inet_skb_param
struct at the head of the {tcp,udp}_skb_cb structs, recent commit
971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
reordered the tcp_skb_cb struct and invalidated this IPCB() trick.

This patch fixes the problem by creating a new function,
cipso_v4_optptr(), which locates the CIPSO option inside the IP header
without calling IPCB().  Unfortunately, this isn't as fast as a simple
lookup so some additional tweaks were made to limit the use of this
new function.

Cc: <stable@vger.kernel.org> # 3.18
Reported-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
2015-02-11 14:46:37 -05:00
Fan Du
b0f9ca53cb ipv4: Namespecify TCP PMTU mechanism
Packetization Layer Path MTU Discovery works separately beside
Path MTU Discovery at IP level, different net namespace has
various requirements on which one to chose, e.g., a virutalized
container instance would require TCP PMTU to probe an usable
effective mtu for underlying tunnel, while the host would
employ classical ICMP based PMTU to function.

Hence making TCP PMTU mechanism per net namespace to decouple
two functionality. Furthermore the probe base MSS should also
be configured separately for each namespace.

Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09 18:45:00 -08:00
David S. Miller
2573beec56 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09 14:35:57 -08:00
Vlad Yasevich
8381eacf5c ipv6: Make __ipv6_select_ident static
Make __ipv6_select_ident() static as it isn't used outside
the file.

Fixes: 0508c07f5e (ipv6: Select fragment id during UFO segmentation if not set.)
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09 14:21:03 -08:00
Moni Shoua
92e584fe44 net/bonding: Fix potential bad memory access during bonding events
When queuing work to send the NETDEV_BONDING_INFO netdev event, it's
possible that when the work is executed, the pointer to the slave
becomes invalid. This can happen if between queuing the event and the
execution of the work, the net-device was un-ensvaled and re-enslaved.

Fix that by queuing a work with the data of the slave instead of the
slave structure.

Fixes: 69e6113343 ('net/bonding: Notify state change on slaves')
Reported-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09 14:03:53 -08:00
Julian Anastasov
cd67cd5eb2 ipvs: use 64-bit rates in stats
IPVS stats are limited to 2^(32-10) conns/s and packets/s,
2^(32-5) bytes/s. It is time to use 64 bits:

* Change all conn/packet kernel counters to 64-bit and update
them in u64_stats_update_{begin,end} section

* In kernel use struct ip_vs_kstats instead of the user-space
struct ip_vs_stats_user and use new func ip_vs_export_stats_user
to export it to sockopt users to preserve compatibility with
32-bit values

* Rename cpu counters "ustats" to "cnt"

* To netlink users provide additionally 64-bit stats:
IPVS_SVC_ATTR_STATS64 and IPVS_DEST_ATTR_STATS64. Old stats
remain for old binaries.

* We can use ip_vs_copy_stats in ip_vs_stats_percpu_show

Thanks to Chris Caputo for providing initial patch for ip_vs_est.c

Signed-off-by: Chris Caputo <ccaputo@alt.net>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-02-09 16:59:03 +09:00
Eric Dumazet
567e4b7973 net: rfs: add hash collision detection
Receive Flow Steering is a nice solution but suffers from
hash collisions when a mix of connected and unconnected traffic
is received on the host, when flow hash table is populated.

Also, clearing flow in inet_release() makes RFS not very good
for short lived flows, as many packets can follow close().
(FIN , ACK packets, ...)

This patch extends the information stored into global hash table
to not only include cpu number, but upper part of the hash value.

I use a 32bit value, and dynamically split it in two parts.

For host with less than 64 possible cpus, this gives 6 bits for the
cpu number, and 26 (32-6) bits for the upper part of the hash.

Since hash bucket selection use low order bits of the hash, we have
a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big
enough.

If the hash found in flow table does not match, we fallback to RPS (if
it is enabled for the rxqueue).

This means that a packet for an non connected flow can avoid the
IPI through a unrelated/victim CPU.

This also means we no longer have to clear the table at socket
close time, and this helps short lived flows performance.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 16:53:57 -08:00
Neal Cardwell
a9b2c06dbe tcp: mitigate ACK loops for connections as tcp_request_sock
In the SYN_RECV state, where the TCP connection is represented by
tcp_request_sock, we now rate-limit SYNACKs in response to a client's
retransmitted SYNs: we do not send a SYNACK in response to client SYN
if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms)
since we last sent a SYNACK in response to a client's retransmitted
SYN.

This allows the vast majority of legitimate client connections to
proceed unimpeded, even for the most aggressive platforms, iOS and
MacOS, which actually retransmit SYNs 1-second intervals for several
times in a row. They use SYN RTO timeouts following the progression:
1,1,1,1,1,2,4,8,16,32.

Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 01:03:12 -08:00
Neal Cardwell
032ee42369 tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks
Helpers for mitigating ACK loops by rate-limiting dupacks sent in
response to incoming out-of-window packets.

This patch includes:

- rate-limiting logic
- sysctl to control how often we allow dupacks to out-of-window packets
- SNMP counter for cases where we rate-limited our dupack sending

The rate-limiting logic in this patch decides to not send dupacks in
response to out-of-window segments if (a) they are SYNs or pure ACKs
and (b) the remote endpoint is sending them faster than the configured
rate limit.

We rate-limit our responses rather than blocking them entirely or
resetting the connection, because legitimate connections can rely on
dupacks in response to some out-of-window segments. For example, zero
window probes are typically sent with a sequence number that is below
the current window, and ZWPs thus expect to thus elicit a dupack in
response.

We allow dupacks in response to TCP segments with data, because these
may be spurious retransmissions for which the remote endpoint wants to
receive DSACKs. This is safe because segments with data can't
realistically be part of ACK loops, which by their nature consist of
each side sending pure/data-less ACKs to each other.

The dupack interval is controlled by a new sysctl knob,
tcp_invalid_ratelimit, given in milliseconds, in case an administrator
needs to dial this upward in the face of a high-rate DoS attack. The
name and units are chosen to be analogous to the existing analogous
knob for ICMP, icmp_ratelimit.

The default value for tcp_invalid_ratelimit is 500ms, which allows at
most one such dupack per 500ms. This is chosen to be 2x faster than
the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule
2.4). We allow the extra 2x factor because network delay variations
can cause packets sent at 1 second intervals to be compressed and
arrive much closer.

Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 01:03:12 -08:00
David S. Miller
3c09e92fb6 NFC: 3.20 second pull request
This is the second NFC pull request for 3.20.
 
 It brings:
 
 - NCI NFCEE (NFC Execution Environment, typically an embedded or
   external secure element) discovery and enabling/disabling support.
   In order to communicate with an NFCEE, we also added NCI's logical
   connections support to the NCI stack.
 
 - HCI over NCI protocol support. Some secure elements only understand
   HCI and thus we need to send them HCI frames when they're part of
   an NCI chipset.
 
 - NFC_EVT_TRANSACTION userspace API addition. Whenever an application
   running on a secure element needs to notify its host counterpart,
   we send an NFC_EVENT_SE_TRANSACTION event to userspace through the
   NFC netlink socket.
 
 - Secure element and HCI transaction event support for the st21nfcb
   chipset.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU06ozAAoJEIqAPN1PVmxKdtkP/2CeQ0+xJnJWgyh4xjkVxfPb
 wPYrz3cHeHdnVX36mnIdRoQZRjxh/Ef/vgK8RsXohcldzHA9D8GjxRpj0GrkVGQ9
 xoDkiHsFDdsZHeezSrlnXNnObvZrkeMsmnUDJxfwLtcX1nzFKzBgW2GaDiNjUazC
 r5Ip4YIfoCaVLq5MSqYRqJ6uhplXd/ePdtPoo54L/E81y4ptQi8pUsQBy4K3GrFn
 FbXoemcymhi9AVBGl+OlppZ93t6bdOV1D5tcOwpytAbfa3jp2oUnJPZay7EoWruR
 aj8l5v3P+SSphAJwaGSbny0L83tTS7sq+N4QG+VxxdMkw2YjpmxgN3AGguNBtPGG
 SUThpZV6QhrkAOnwzP2BJnh68WSU1eJrGvk8zUWW0SanA8k2jaHO/VzXCA3/ZkC4
 IFcXl4Jr6R3iUxDFgS/6MB5E9EGedzFTacsWoHVyZPtaDAQp4WVdAIRQ4QKgJCLw
 1yRmCeVunTsrs6O0aenU1xHinPlnx+tkuiNh4H64APQYTz54WwXH1/S04OL1SkqI
 mTzUvFgWqJvSIOuU19g0HBw+xZ/9vvX8LLkm9kSTbe7tcmFH5CI7ZCN/hNDHvMSd
 sjNYOyag/SAHJ01tTKcSPAgWO49bHWPodI04zP0lK0gMLjKvRknVMdTgIDfu49HN
 2da9E23iBmNNgG79iVHN
 =8caD
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

NFC: 3.20 second pull request

This is the second NFC pull request for 3.20.

It brings:

- NCI NFCEE (NFC Execution Environment, typically an embedded or
  external secure element) discovery and enabling/disabling support.
  In order to communicate with an NFCEE, we also added NCI's logical
  connections support to the NCI stack.

- HCI over NCI protocol support. Some secure elements only understand
  HCI and thus we need to send them HCI frames when they're part of
  an NCI chipset.

- NFC_EVT_TRANSACTION userspace API addition. Whenever an application
  running on a secure element needs to notify its host counterpart,
  we send an NFC_EVENT_SE_TRANSACTION event to userspace through the
  NFC netlink socket.

- Secure element and HCI transaction event support for the st21nfcb
  chipset.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:22:25 -08:00
Erik Kline
c58da4c659 net: ipv6: allow explicitly choosing optimistic addresses
RFC 4429 ("Optimistic DAD") states that optimistic addresses
should be treated as deprecated addresses.  From section 2.1:

   Unless noted otherwise, components of the IPv6 protocol stack
   should treat addresses in the Optimistic state equivalently to
   those in the Deprecated state, indicating that the address is
   available for use but should not be used if another suitable
   address is available.

Optimistic addresses are indeed avoided when other addresses are
available (i.e. at source address selection time), but they have
not heretofore been available for things like explicit bind() and
sendmsg() with struct in6_pktinfo, etc.

This change makes optimistic addresses treated more like
deprecated addresses than tentative ones.

Signed-off-by: Erik Kline <ek@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 15:37:41 -08:00
David S. Miller
6e03f896b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/vxlan.c
	drivers/vhost/net.c
	include/linux/if_vlan.h
	net/core/dev.c

The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.

In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.

In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.

In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 14:33:28 -08:00
Eric Dumazet
677651462c ipv6: fix sparse errors in ip6_make_flowlabel()
include/net/ipv6.h:713:22: warning: incorrect type in assignment (different base types)
include/net/ipv6.h:713:22:    expected restricted __be32 [usertype] hash
include/net/ipv6.h:713:22:    got unsigned int
include/net/ipv6.h:719:25: warning: restricted __be32 degrades to integer
include/net/ipv6.h:719:22: warning: invalid assignment: ^=
include/net/ipv6.h:719:22:    left side has type restricted __be32
include/net/ipv6.h:719:22:    right side has type unsigned int

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 00:42:28 -08:00
Eric Dumazet
f4575d3534 flow_keys: n_proto type should be __be16
(struct flow_keys)->n_proto is in network order, use
proper type for this.

Fixes following sparse errors :

net/core/flow_dissector.c:139:39: warning: incorrect type in assignment (different base types)
net/core/flow_dissector.c:139:39:    expected unsigned short [unsigned] [usertype] n_proto
net/core/flow_dissector.c:139:39:    got restricted __be16 [assigned] [usertype] proto
net/core/flow_dissector.c:237:23: warning: incorrect type in assignment (different base types)
net/core/flow_dissector.c:237:23:    expected unsigned short [unsigned] [usertype] n_proto
net/core/flow_dissector.c:237:23:    got restricted __be16 [assigned] [usertype] proto

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: e0f31d8498 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 00:40:22 -08:00
David S. Miller
f2683b743f Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
More iov_iter work from Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 20:46:55 -08:00
Eric Dumazet
9878196578 tcp: do not pace pure ack packets
When we added pacing to TCP, we decided to let sch_fq take care
of actual pacing.

All TCP had to do was to compute sk->pacing_rate using simple formula:

sk->pacing_rate = 2 * cwnd * mss / rtt

It works well for senders (bulk flows), but not very well for receivers
or even RPC :

cwnd on the receiver can be less than 10, rtt can be around 100ms, so we
can end up pacing ACK packets, slowing down the sender.

Really, only the sender should pace, according to its own logic.

Instead of adding a new bit in skb, or call yet another flow
dissection, we tweak skb->truesize to a small value (2), and
we instruct sch_fq to use new helper and not pace pure ack.

Note this also helps TCP small queue, as ack packets present
in qdisc/NIC do not prevent sending a data packet (RPC workload)

This helps to reduce tx completion overhead, ack packets can use regular
sock_wfree() instead of tcp_wfree() which is a bit more expensive.

This has no impact in the case packets are sent to loopback interface,
as we do not coalesce ack packets (were we would detect skb->truesize
lie)

In case netem (with a delay) is used, skb_orphan_partial() also sets
skb->truesize to 1.

This patch is a combination of two patches we used for about one year at
Google.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 20:36:31 -08:00
Moni Shoua
69e6113343 net/bonding: Notify state change on slaves
Use notifier chain to dispatch an event upon a change in slave state.
Event is dispatched with slave specific info.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:14:24 -08:00
Moni Shoua
69a2338e05 net/bonding: Move slave state changes to a helper function
Move slave state changes to a helper function, this is a pre-step for adding
functionality of dispatching an event when this helper is called.

This commit doesn't add new functionality.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:14:24 -08:00
David S. Miller
940288b6a5 Last round of updates for net-next:
* revert a patch that caused a regression with mesh userspace (Bob)
  * fix a number of suspend/resume related races
    (from Emmanuel, Luca and myself - we'll look at backporting later)
  * add software implementations for new ciphers (Jouni)
  * add a new ACPI ID for Broadcom's rfkill (Mika)
  * allow using netns FD for wireless (Vadim)
  * some other cleanups (various)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJU0NEaAAoJEDBSmw7B7bqr2DAP/3nbk6WB2Lz4Vwi8fh9C4y6X
 4ZzN2v7NHaimC+Wpxg3wP2wCdX2VG3ZWwF3yLm6qgVHGnE35RFLxURen1eqNeW77
 Yf5gvZ066nCGEQ8l8J6YK9vrLX4qp5c4lyE00bbxpZTA4Qq71SgTg+rmGC1be8uX
 vacfaLwfDWffuOOnjBAPfanj7f4AQaUEY2uN1WkBFC7iEeOtPcWVkHAFVIeGjJfQ
 vfgQJcwOjgWjYbwZdQQi7Aj+k1Yzda4pg1yEWn3CkZ8zyeCYGs1gk2ovoBgPgXBP
 yT9ypIdFsN242VJvy7nkFnCKA8mhKyltMQ1Xjs0Q9lAxWdaq9U+iqt5cUn4jxVIG
 T9Vi3PCbx/nVOqcfR81dBTQ3uDU5AyosPsmh2YTxi5lpRBrsjNY2FtaKE0sm2Om3
 wRiSPOdPrXBeEnU0KssI9e6euXgS4JQV78Naq85OWZDd2yZ1fT5U2fi8y4drRGlz
 rucbbobdVhQch5L4FStPz1uW5pNuJrhekXeZIE8MruTNg2A2oBAK3ApO7hxn68sE
 RnbAnkxVLwgedC9042JF5eiS1PDIU46w4e782j/+XskVKMEakqd23iJycx3tmgHZ
 cxDi/qKZ2RCE74YsT61o/9ErSVPvCfNPL3+CVov918jidQQg2WHfKm/jFlIxHIxk
 4wBP7p2VGlENMgw/R8GJ
 =wOaR
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Last round of updates for net-next:
 * revert a patch that caused a regression with mesh userspace (Bob)
 * fix a number of suspend/resume related races
   (from Emmanuel, Luca and myself - we'll look at backporting later)
 * add software implementations for new ciphers (Jouni)
 * add a new ACPI ID for Broadcom's rfkill (Mika)
 * allow using netns FD for wireless (Vadim)
 * some other cleanups (various)

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 14:57:45 -08:00
David S. Miller
45e826fd57 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-02-03

Here's what's likely the last bluetooth-next pull request for 3.20.
Notable changes include:

 - xHCI workaround + a new id for the ath3k driver
 - Several new ids for the btusb driver
 - Support for new Intel Bluetooth controllers
 - Minor cleanups to ieee802154 code
 - Nested sleep warning fix in socket accept() code path
 - Fixes for Out of Band pairing handling
 - Support for LE scan restarting for HCI_QUIRK_STRICT_DUPLICATE_FILTER
 - Improvements to data we expose through debugfs
 - Proper handling of Hardware Error HCI events

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 13:56:37 -08:00
Christophe Ricard
15d4a8da0e NFC: nci: Move logical connection structure allocation
conn_info is currently allocated only after nfcee_discovery_ntf
which is not generic enough for logical connection other than
NFCEE. The corresponding conn_info is now created in
nci_core_conn_create_rsp().

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-04 09:14:09 +01:00
Christophe Ricard
3ba5c8466b NFC: nci: Change credits field to credits_cnt
For consistency sake change nci_core_conn_create_rsp structure
credits field to credits_cnt.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-04 09:13:15 +01:00
Christophe Ricard
b16ae7160a NFC: nci: Support all destinations type when creating a connection
The current implementation limits nci_core_conn_create_req()
to only manage NCI_DESTINATION_NFCEE.
Add new parameters to nci_core_conn_create() to support all
destination types described in the NCI specification.
Because there are some parameters with variable size dynamic
buffer allocation is needed.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-04 09:10:50 +01:00
Christophe Ricard
12bdf27d46 NFC: nci: Add reference to the RF logical connection
The NCI_STATIC_RF_CONN_ID logical connection is the most used
connection. Keeping it directly accessible in the nci_dev
structure will simplify and optimize the access.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-04 09:09:53 +01:00
Vlad Yasevich
0508c07f5e ipv6: Select fragment id during UFO segmentation if not set.
If the IPv6 fragment id has not been set and we perform
fragmentation due to UFO, select a new fragment id.
We now consider a fragment id of 0 as unset and if id selection
process returns 0 (after all the pertrubations), we set it to
0x80000000, thus giving us ample space not to create collisions
with the next packet we may have to fragment.

When doing UFO integrity checking, we also select the
fragment id if it has not be set yet.   This is stored into
the skb_shinfo() thus allowing UFO to function correclty.

This patch also removes duplicate fragment id generation code
and moves ipv6_select_ident() into the header as it may be
used during GSO.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-03 23:06:43 -08:00
Al Viro
21226abb4e net: switch memcpy_fromiovec()/memcpy_fromiovecend() users to copy_from_iter()
That takes care of the majority of ->sendmsg() instances - most of them
via memcpy_to_msg() or assorted getfrag() callbacks.  One place where we
still keep memcpy_fromiovecend() is tipc - there we potentially read the
same data over and over; separate patch, that...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:15 -05:00
Al Viro
57be5bdad7 ip: convert tcp_sendmsg() to iov_iter primitives
patch is actually smaller than it seems to be - most of it is unindenting
the inner loop body in tcp_sendmsg() itself...

the bit in tcp_input.c is going to get reverted very soon - that's what
memcpy_from_msg() will become, but not in this commit; let's keep it
reasonably contained...

There's one potentially subtle change here: in case of short copy from
userland, mainline tcp_send_syn_data() discards the skb it has allocated
and falls back to normal path, where we'll send as much as possible after
rereading the same data again.  This patch trims SYN+data skb instead -
that way we don't need to copy from the same place twice.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:14 -05:00
Al Viro
cacdc7d2f9 ip: stash a pointer to msghdr in struct ping_fakehdr
... instead of storing its ->mgs_iter.iov there

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:14 -05:00
David S. Miller
3ae55826ae Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:

1) Validate hooks for nf_tables NAT expressions, otherwise users can
   crash the kernel when using them from the wrong hook. We already
   got one user trapped on this when configuring masquerading.

2) Fix a BUG splat in nf_tables with CONFIG_DEBUG_PREEMPT=y. Reported
   by Andreas Schultz.

3) Avoid unnecessary reroute of traffic in the local input path
   in IPVS that triggers a crash in in xfrm. Reported by Florian
   Wiessner and fixes by Julian Anastasov.

4) Fix memory and module refcount leak from the error path of
   nf_tables_newchain().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:30:53 -08:00
Vlad Yasevich
6422398c2a ipv6: introduce ipv6_make_skb
This commit is very similar to
commit 1c32c5ad6f
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue Mar 1 02:36:47 2011 +0000

    inet: Add ip_make_skb and ip_finish_skb

It adds IPv6 version of the helpers ip6_make_skb and ip6_finish_skb.

The job of ip6_make_skb is to collect messages into an ipv6 packet
and poplulate ipv6 eader.  The job of ip6_finish_skb is to transmit
the generated skb.  Together they replicated the job of
ip6_push_pending_frames() while also provide the capability to be
called independently.  This will be needed to add lockless UDP sendmsg
support.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:28:04 -08:00
Willem de Bruijn
b245be1f4d net-timestamp: no-payload only sysctl
Tx timestamps are looped onto the error queue on top of an skb. This
mechanism leaks packet headers to processes unless the no-payload
options SOF_TIMESTAMPING_OPT_TSONLY is set.

Add a sysctl that optionally drops looped timestamp with data. This
only affects processes without CAP_NET_RAW.

The policy is checked when timestamps are generated in the stack.
It is possible for timestamps with data to be reported after the
sysctl is set, if these were queued internally earlier.

No vulnerability is immediately known that exploits knowledge
gleaned from packet headers, but it may still be preferable to allow
administrators to lock down this path at the cost of possible
breakage of legacy applications.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes
  (v1 -> v2)
  - test socket CAP_NET_RAW instead of capable(CAP_NET_RAW)
  (rfc -> v1)
  - document the sysctl in Documentation/sysctl/net.txt
  - fix access control race: read .._OPT_TSONLY only once,
        use same value for permission check and skb generation.
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 18:46:51 -08:00
Christophe Ricard
a41bb8448e NFC: nci: Add RF NFCEE action notification support
The NFCC sends an NCI_OP_RF_NFCEE_ACTION_NTF notification
to the host (DH) to let it know that for example an RF
transaction with a payment reader is done.
For now the notification handler is empty.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:41 +01:00
Christophe Ricard
447b27c4f2 NFC: Forward NFC_EVT_TRANSACTION to user space
NFC_EVT_TRANSACTION is sent through netlink in order for a
specific application running on a secure element to notify
userspace of an event. Typically the secure element application
counterpart on the host could interpret that event and act
upon it.

Forwarded information contains:
- SE host generating the event
- Application IDentifier doing the operation
- Applications parameters

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:40 +01:00
Christophe Ricard
11f54f2286 NFC: nci: Add HCI over NCI protocol support
According to the NCI specification, one can use HCI over NCI
to talk with specific NFCEE. The HCI network is viewed as one
logical NFCEE.
This is needed to support secure element running HCI only
firmwares embedded on an NCI capable chipset, like e.g. the
st21nfcb.
There is some duplication between this piece of code and the
HCI core code, but the latter would need to be abstracted even
more to be able to use NCI as a logical transport for HCP packets.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:40 +01:00
Christophe Ricard
736bb95774 NFC: nci: Support logical connections management
In order to communicate with an NFCEE, we need to open a logical
connection to it, by sending the NCI_OP_CORE_CONN_CREATE_CMD
command to the NFCC. It's left up to the drivers to decide when
to close an already opened logical connection.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:39 +01:00
Christophe Ricard
f7f793f313 NFC: nci: Add NFCEE enabling and disabling support
NFCEEs can be enabled or disabled by sending the
NCI_OP_NFCEE_MODE_SET_CMD command to the NFCC. This patch
provides an API for drivers to enable and disable e.g. their
NCI discoveredd secure elements.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:39 +01:00
Christophe Ricard
af9c8aa67d NFC: nci: Add NFCEE discover support
NFCEEs (NFC Execution Environment) have to be explicitly
discovered by sending the NCI_OP_NFCEE_DISCOVER_CMD
command. The NFCC will respond to this command by telling
us how many NFCEEs are connected to it. Then the NFCC sends
a notification command for each and every NFCEE connected.
Here we implement support for sending
NCI_OP_NFCEE_DISCOVER_CMD command, receiving the response
and the potential notifications.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:38 +01:00
Christophe Ricard
8277f6937a NFC: nci: Add NCI NFCEE constants
Add NFCEE NCI constant for:
- NFCEE Interface/Protocols
- Destination type
- Destination-specific parameters type
- NFCEE Discovery Action

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:38 +01:00
Christophe Ricard
4aeee6871e NFC: nci: Add dynamic logical connections support
The current NCI core only support the RF static connection.
For other NFC features such as Secure Element communication, we
may need to create logical connections to the NFCEE (Execution
Environment.

In order to track each logical connection ID dynamically, we add a
linked list of connection info pointers to the nci_dev structure.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:31 +01:00
Johan Hedberg
66f096f791 Bluetooth: Remove mgmt_rp_read_local_oob_ext_data struct
This extended return parameters struct conflicts with the new Read Local
OOB Extended Data command definition. To avoid the conflict simply
rename the old "extended" version to the normal one and update the code
appropriately to take into account the two possible response PDU sizes.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-02 18:27:56 +01:00
Jakub Pawlowski
4b0e0ceddf Bluetooth: Add restarting to service discovery
When using LE_SCAN_FILTER_DUP_ENABLE, some controllers would send
advertising report from each LE device only once. That means that we
don't get any updates on RSSI value, and makes Service Discovery very
slow. This patch adds restarting scan when in Service Discovery, and
device with filtered uuid is found, but it's not in RSSI range to send
event yet. This way if device moves into range, we will quickly get RSSI
update.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-02 08:52:34 +01:00
Jakub Pawlowski
2d28cfe7aa Bluetooth: Add le_scan_restart work for LE scan restarting
Currently there is no way to restart le scan, and it's needed in
service scan method. The way it work: it disable, and then enable le
scan on controller.

During the restart, we must remember when the scan was started, and
it's duration, to later re-schedule the le_scan_disable work, that was
stopped during the stop scan phase.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-02 08:52:33 +01:00
Roopa Prabhu
8a44dbb202 swdevice: add new apis to set and del bridge port attributes
This patch adds two new api's netdev_switch_port_bridge_setlink
and netdev_switch_port_bridge_dellink to offload bridge port attributes
to switch port

(The names of the apis look odd with 'switch_port_bridge',
but am more inclined to change the prefix of the api to something else.
Will take any suggestions).

The api's look at the NETIF_F_HW_SWITCH_OFFLOAD feature flag to
pass bridge port attributes to the port device.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-01 23:16:34 -08:00
Eric Dumazet
bdbbb8527b ipv4: tcp: get rid of ugly unicast_sock
In commit be9f4a44e7 ("ipv4: tcp: remove per net tcp_sock")
I tried to address contention on a socket lock, but the solution
I chose was horrible :

commit 3a7c384ffd ("ipv4: tcp: unicast_sock should not land outside
of TCP stack") addressed a selinux regression.

commit 0980e56e50 ("ipv4: tcp: set unicast_sock uc_ttl to -1")
took care of another regression.

commit b5ec8eeac4 ("ipv4: fix ip_send_skb()") fixed another regression.

commit 811230cd85 ("tcp: ipv4: initialize unicast_sock sk_pacing_rate")
was another shot in the dark.

Really, just use a proper socket per cpu, and remove the skb_orphan()
call, to re-enable flow control.

This solves a serious problem with FQ packet scheduler when used in
hostile environments, as we do not want to allocate a flow structure
for every RST packet sent in response to a spoofed packet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-01 23:06:19 -08:00
Eric Dumazet
0d32ef8cef net: sched: fix panic in rate estimators
Doing the following commands on a non idle network device
panics the box instantly, because cpu_bstats gets overwritten
by stats.

tc qdisc add dev eth0 root <your_favorite_qdisc>
... some traffic (one packet is enough) ...
tc qdisc replace dev eth0 root est 1sec 4sec <your_favorite_qdisc>

[  325.355596] BUG: unable to handle kernel paging request at ffff8841dc5a074c
[  325.362609] IP: [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90
[  325.369158] PGD 1fa7067 PUD 0
[  325.372254] Oops: 0000 [#1] SMP
[  325.375514] Modules linked in: ...
[  325.398346] CPU: 13 PID: 14313 Comm: tc Not tainted 3.19.0-smp-DEV #1163
[  325.412042] task: ffff8800793ab5d0 ti: ffff881ff2fa4000 task.ti: ffff881ff2fa4000
[  325.419518] RIP: 0010:[<ffffffff81541c9e>]  [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90
[  325.428506] RSP: 0018:ffff881ff2fa7928  EFLAGS: 00010286
[  325.433824] RAX: 000000000000000c RBX: ffff881ff2fa796c RCX: 000000000000000c
[  325.440988] RDX: ffff8841dc5a0744 RSI: 0000000000000060 RDI: 0000000000000060
[  325.448120] RBP: ffff881ff2fa7948 R08: ffffffff81cd4f80 R09: 0000000000000000
[  325.455268] R10: ffff883ff223e400 R11: 0000000000000000 R12: 000000015cba0744
[  325.462405] R13: ffffffff81cd4f80 R14: ffff883ff223e460 R15: ffff883feea0722c
[  325.469536] FS:  00007f2ee30fa700(0000) GS:ffff88407fa20000(0000) knlGS:0000000000000000
[  325.477630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  325.483380] CR2: ffff8841dc5a074c CR3: 0000003feeae9000 CR4: 00000000001407e0
[  325.490510] Stack:
[  325.492524]  ffff883feea0722c ffff883fef719dc0 ffff883feea0722c ffff883ff223e4a0
[  325.499990]  ffff881ff2fa79a8 ffffffff815424ee ffff883ff223e49c 000000015cba0744
[  325.507460]  00000000f2fa7978 0000000000000000 ffff881ff2fa79a8 ffff883ff223e4a0
[  325.514956] Call Trace:
[  325.517412]  [<ffffffff815424ee>] gen_new_estimator+0x8e/0x230
[  325.523250]  [<ffffffff815427aa>] gen_replace_estimator+0x4a/0x60
[  325.529349]  [<ffffffff815718ab>] tc_modify_qdisc+0x52b/0x590
[  325.535117]  [<ffffffff8155edd0>] rtnetlink_rcv_msg+0xa0/0x240
[  325.540963]  [<ffffffff8155ed30>] ? __rtnl_unlock+0x20/0x20
[  325.546532]  [<ffffffff8157f811>] netlink_rcv_skb+0xb1/0xc0
[  325.552145]  [<ffffffff8155b355>] rtnetlink_rcv+0x25/0x40
[  325.557558]  [<ffffffff8157f0d8>] netlink_unicast+0x168/0x220
[  325.563317]  [<ffffffff8157f47c>] netlink_sendmsg+0x2ec/0x3e0

Lets play safe and not use an union : percpu 'pointers' are mostly read
anyway, and we have typically few qdiscs per host.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Fixes: 22e0f8b932 ("net: sched: make bstats per cpu and estimator RCU safe")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-31 17:49:37 -08:00
Eric Dumazet
349c9e3c73 ipv4: icmp: use percpu allocation
Get rid of nr_cpu_ids and use modern percpu allocation.

Note that the sockets themselves are not yet allocated
using NUMA affinity.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-31 17:48:18 -08:00
Marcel Holtmann
f7697b1602 Bluetooth: Store OOB data present value for each set of remote OOB data
Instead of doing complex calculation every time the OOB data is used,
just calculate the OOB data present value and store it with the OOB
data raw values.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-31 09:59:45 +02:00
Christoph Hellwig
7cc0566268 net: remove sock_iocb
The sock_iocb structure is allocate on stack for each read/write-like
operation on sockets, and contains various fields of which only the
embedded msghdr and sometimes a pointer to the scm_cookie is ever used.
Get rid of the sock_iocb and put a msghdr directly on the stack and pass
the scm_cookie explicitly to netlink_mmap_sendmsg.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 23:15:07 -08:00
Jesse Gross
b8693877ae openvswitch: Add support for checksums on UDP tunnels.
Currently, it isn't possible to request checksums on the outer UDP
header of tunnels - the TUNNEL_CSUM flag is ignored. This adds
support for requesting that UDP checksums be computed on transmit
and properly reported if they are present on receive.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 23:04:15 -08:00
David S. Miller
b8f8be3f04 NFC: 3.20 first pull request
This is the first NFC pull request for 3.20.
 
 With this one we have:
 
 - Secure element support for the ST Micro st21nfca driver. This depends
   on a few HCI internal changes in order for example to support more
   than one secure element per controller.
 
 - ACPI support for NXP's pn544 HCI driver. This controller is found on
   many x86 SoCs and is typically enumerated on the ACPI bus there.
 
 - A few st21nfca and st21nfcb fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUyXy7AAoJEIqAPN1PVmxKOY4P/1WJtEZfzBOFMlh9qeA8YESv
 cS1xbZuAVeEJ3r/sgDc87iA/4MMNZDzfhDHQlQJs/pJWAXE3Am1/dZGWHJC6pbk/
 roTlbEh5OvQU8cRIdAvOcEgrBIWk+E30Mkd2OtOMpWyhbgChN7hKz0KUs7znVHBJ
 G3YCcZOPr7K2ra78UlYzApvGqoxiVaiEWQyj6rzx2HbVzxzICL6A5m9cRcZuGxYR
 sK3Y/DpKJKGwH3p1kkLIOqHy3nGhq2LttVSXF/f5xkOB8teERSkt4i8aJBiSb9ym
 a++iAWp2syH3sGh2rsb3G4KRYCYq2J9mJD7oBB3G/6UoNwyeSgW3GdxgH/yxt6C9
 KfzHm9T8jfvQIBlf4lGeboV8zu+ysQehJFNKtxdQDlvwqPiR14clT5JFejocr9+N
 SegCbF+LJaZW8boOZVvhxTxASpEZ0RTFwUIkKKxtXOpK4ha0s1gEAtuZUpTYmRtJ
 wGqds8wszkE0wOdZJi7dsCGpp5JNI0LZZCsuKa6Ko3E7LkTAor8Bbmob0RR51ClD
 srtoz6kJgoozAMRLMISXBimk0geOp38iGs26GPSpRHwSV/1loN6+abaOMYlGbDCq
 +oVpqMmJsS/z5US5+wEkWU+O4y9TS0O74d/TxxcwUM+CLNyKI+Ms1rMOyYi0domo
 2xa7MuDCrmztQbs6ROKT
 =SNWp
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

NFC: 3.20 first pull request

This is the first NFC pull request for 3.20.

With this one we have:

- Secure element support for the ST Micro st21nfca driver. This depends
  on a few HCI internal changes in order for example to support more
  than one secure element per controller.

- ACPI support for NXP's pn544 HCI driver. This controller is found on
  many x86 SoCs and is typically enumerated on the ACPI bus there.

- A few st21nfca and st21nfcb fixes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:49:55 -08:00
Neal Cardwell
e73ebb0881 tcp: stretch ACK fixes prep
LRO, GRO, delayed ACKs, and middleboxes can cause "stretch ACKs" that
cover more than the RFC-specified maximum of 2 packets. These stretch
ACKs can cause serious performance shortfalls in common congestion
control algorithms that were designed and tuned years ago with
receiver hosts that were not using LRO or GRO, and were instead
politely ACKing every other packet.

This patch series fixes Reno and CUBIC to handle stretch ACKs.

This patch prepares for the upcoming stretch ACK bug fix patches. It
adds an "acked" parameter to tcp_cong_avoid_ai() to allow for future
fixes to tcp_cong_avoid_ai() to correctly handle stretch ACKs, and
changes all congestion control algorithms to pass in 1 for the ACKed
count. It also changes tcp_slow_start() to return the number of packet
ACK "credits" that were not processed in slow start mode, and can be
processed by the congestion control module in additive increase mode.

In future patches we will fix tcp_cong_avoid_ai() to handle stretch
ACKs, and fix Reno and CUBIC handling of stretch ACKs in slow start
and additive increase mode.

Reported-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:18:37 -08:00
Marcel Holtmann
c7741d16a5 Bluetooth: Perform a power cycle when receiving hardware error event
When receiving a HCI Hardware Error event, the controller should be
assumed to be non-functional until issuing a HCI Reset command.

The Bluetooth hardware errors are vendor specific and so add a
new hdev->hw_error callback that drivers can provide to run extra
code to handle the hardware error.

After completing the vendor specific error handling perform a full
reset of the Bluetooth stack by closing and re-opening the transport.

Based-on-patch-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-28 21:26:24 +01:00
Jonathan Toppins
303691042d bonding: cleanup and remove dead code
fix sparse warning about non-static function

drivers/net/bonding/bond_main.c:3737:5: warning: symbol
'bond_3ad_xor_xmit' was not declared. Should it be static?

Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:09:04 -08:00
Jonathan Toppins
2477bc9a3d bonding: update bond carrier state when min_links option changes
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:09:03 -08:00
David S. Miller
95f873f2ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	arch/arm/boot/dts/imx6sx-sdb.dts
	net/sched/cls_bpf.c

Two simple sets of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 16:59:56 -08:00
Christophe Ricard
8409e4283c NFC: hci: Add cmd_received handler
When a command is received, it is sometime needed to let the CLF driver do
some additional operations. (ex: count remaining pipe notification...)

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-28 00:03:34 +01:00
Christophe Ricard
af77522320 NFC: hci: Change nfc_hci_send_response gate parameter to pipe
As there can be several pipes connected to the same gate, we need
to know which pipe ID to use when sending an HCI response. A gate
ID is not enough.

Instead of changing the nfc_hci_send_response() API to something
not aligned with the rest of the HCI API, we call nfc_hci_hcp_message_tx
directly.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-27 23:55:20 +01:00
Christophe Ricard
118278f20a NFC: hci: Add pipes table to reference them with a tuple {gate, host}
In order to keep host source information on specific hci event (such as
evt_connectivity or evt_transaction) and because 2 pipes can be connected
to the same gate, it is necessary to add a table referencing every pipe
with a {gate, host} tuple.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-27 23:39:32 +01:00
Christophe Ricard
fda7a49cb9 NFC: hci: Change event_received handler gate parameter to pipe
Several pipes may point to the same CLF gate, so getting the gate ID
as an input is not enough.
For example dual secure element may have 2 pipes (1 for uicc and
1 for eSE) pointing to the connectivity gate.

As resolving gate and host IDs can be done from a pipe, we now pass
the pipe ID to the event received handler.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-27 23:39:23 +01:00
Jouni Malinen
8ade538bf3 mac80111: Add BIP-GMAC-128 and BIP-GMAC-256 ciphers
This allows mac80211 to configure BIP-GMAC-128 and BIP-GMAC-256 to the
driver and also use software-implementation within mac80211 when the
driver does not support this with hardware accelaration.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-27 11:10:13 +01:00
Jouni Malinen
00b9cfa3ff mac80111: Add GCMP and GCMP-256 ciphers
This allows mac80211 to configure GCMP and GCMP-256 to the driver and
also use software-implementation within mac80211 when the driver does
not support this with hardware accelaration.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
[remove a spurious newline]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-27 11:06:09 +01:00
Hannes Frederic Sowa
df4d92549f ipv4: try to cache dst_entries which would cause a redirect
Not caching dst_entries which cause redirects could be exploited by hosts
on the same subnet, causing a severe DoS attack. This effect aggravated
since commit f886497212 ("ipv4: fix dst race in sk_dst_get()").

Lookups causing redirects will be allocated with DST_NOCACHE set which
will force dst_release to free them via RCU.  Unfortunately waiting for
RCU grace period just takes too long, we can end up with >1M dst_entries
waiting to be released and the system will run OOM. rcuos threads cannot
catch up under high softirq load.

Attaching the flag to emit a redirect later on to the specific skb allows
us to cache those dst_entries thus reducing the pressure on allocation
and deallocation.

This issue was discovered by Marcelo Leitner.

Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 17:28:27 -08:00
Joe Stringer
7b1883cefc genetlink: Add genlmsg_parse() helper function.
The first user will be the next patch.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 15:45:50 -08:00
Tom Herbert
af33c1adae vxlan: Eliminate dependency on UDP socket in transmit path
In the vxlan transmit path there is no need to reference the socket
for a tunnel which is needed for the receive side. We do, however,
need the vxlan_dev flags. This patch eliminate references
to the socket in the transmit path, and changes VXLAN_F_UNSHAREABLE
to be VXLAN_F_RCV_FLAGS. This mask is used to store the flags
applicable to receive (GBP, CSUM6_RX, and REMCSUM_RX) in the
vxlan_sock flags.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-24 23:15:40 -08:00
Tom Herbert
d998f8efa4 udp: Do not require sock in udp_tunnel_xmit_skb
The UDP tunnel transmit functions udp_tunnel_xmit_skb and
udp_tunnel6_xmit_skb include a socket argument. The socket being
passed to the functions (from VXLAN) is a UDP created for receive
side. The only thing that the socket is used for in the transmit
functions is to get the setting for checksum (enabled or zero).
This patch removes the argument and and adds a nocheck argument
for checksum setting. This eliminates the unnecessary dependency
on a UDP socket for UDP tunnel transmit.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-24 23:15:40 -08:00
Johan Hedberg
a1443f5a27 Bluetooth: Convert Set SC to use HCI Request
This patch converts the Set Secure Connection HCI handling to use a HCI
request instead of using a hard-coded callback in hci_event.c. This e.g.
ensures that we don't clear the flags incorrectly if something goes
wrong with the power up process (not related to a mgmt Set SC command).

The code can also be simplified a bit since only one pending Set SC
command is allowed, i.e. mgmt_pending_foreach usage is not needed.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-23 19:07:03 +01:00
Luciano Coelho
9c74893441 nl80211: add an attribute to allow delaying the first scheduled scan cycle
The userspace may want to delay the the first scheduled scan or
net-detect cycle.  Add an optional attribute to the scheduled scan
configuration to pass the delay to be (optionally) used by the driver.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
[add the attribute to the policy to validate it]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 10:30:47 +01:00
Lorenzo Bianconi
db82d8a966 mac80211: enable TPC through mac80211 stack
Control per packet Transmit Power Control (TPC) in lower drivers
according to TX power settings configured by the user. In particular TPC is
enabled if value passed in enum nl80211_tx_power_setting is
NL80211_TX_POWER_LIMITED (allow using less than specified from userspace),
whereas TPC is disabled if nl80211_tx_power_setting is set to
NL80211_TX_POWER_FIXED (use value configured from userspace)

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 10:28:51 +01:00
Johannes Berg
fa7e1fbcb5 mac80211: allow drivers to control software crypto
Some drivers unfortunately cannot support software crypto, but
mac80211 currently assumes that they do.

This has the issue that if the hardware enabling fails for some
reason, the software fallback is used, which won't work. This
clearly isn't desirable, the error should be reported and the
key setting refused.

Support this in mac80211 by allowing drivers to set a new HW
flag IEEE80211_HW_SW_CRYPTO_CONTROL, in which case mac80211 will
only allow software fallback if the set_key() method returns 1.
The driver will also need to advertise supported cipher suites
so that mac80211 doesn't advertise any (future) software ciphers
that the driver can't actually do.

While at it, to make it easier to support this, refactor the
ieee80211_init_cipher_suites() code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-22 22:01:01 +01:00
David S. Miller
0c49087462 Some further updates for net-next:
* fix network-manager which was broken by the previous changes
  * fix delete-station events, which were broken by me making the
    genlmsg_end() mistake
  * fix a timer left running during suspend in some race conditions
    that would cause an annoying (but harmless) warning
  * (less important, but in the tree already) remove 80+80 MHz rate
    reporting since the spec doesn't distinguish it from 160 MHz;
    as the bitrate they're both 160 MHz bandwidth
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUvUZlAAoJEDBSmw7B7bqrfNMQAJT5jjOSjmwW8Zdvujvda/qt
 bFpYa9t0NsN3izzMpjPSrCwPrHN5qE86ZA8TcZrIzejPH4rpltjaXB6JNHZardVo
 deCUWU9xotoPELoE0Xex9mHPEkYYvOaht/P8A/88qP1S2PykMmj9fqNeijyUvwuo
 Jlsh0wKe4Jq6bCmdxvy/bde84ceAQcuh2TKNov1S0tB38tRY9qSu2n6ZGpoMNcEe
 CWuW+23jL1uAvt6Ljk2fTKdLR8iyXykfM0UMX2/4R2PMnJXK/dQqV/eeXTjpxtMk
 UV4aIMcSS19D7HxICKOXOdZLdMMuXaFUnUMlGLBtXZz3n9lZoP7RtVIHoib8VBXZ
 tY7xQrK6YNvwZ4SZZPuv/yr6YWP2+vBM2FUfXjzD+or6uYsej203a5q0RsOY+3Tp
 6Yklr+mQNlrOtpMsHMSgrBUUZsAK1I95ALMVVqaq1hgf1cDvRIDHlOo4A7bjwNFw
 eA3L1o4O1/E/IGp4v6+2Iukc9rIwm11sNr/wuD8dDkZTradUPH1iY6J5sxJNb2Nq
 YdpCnQ/lNXj650+z9/G2omSA6DTTTOtXJPxKR+FOHZVKDpZYtF6TxKb0S79fINps
 6ZlWIna5bUiXF1b6xad1x+vtyjNMgTvkg6mR+WQnvF57Ri8hucbtpv5wpA5bhYUQ
 Fbz9VZF2nfMeIbXfTaWi
 =Bvmr
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-01-19' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Some further updates for net-next:
 * fix network-manager which was broken by the previous changes
 * fix delete-station events, which were broken by me making the
   genlmsg_end() mistake
 * fix a timer left running during suspend in some race conditions
   that would cause an annoying (but harmless) warning
 * (less important, but in the tree already) remove 80+80 MHz rate
   reporting since the spec doesn't distinguish it from 160 MHz;
   as the bitrate they're both 160 MHz bandwidth

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 16:22:19 -05:00
Felix Fietkau
22a5dc0e5e net: sched: Introduce connmark action
This tc action allows you to retrieve the connection tracking mark
This action has been used heavily by openwrt for a few years now.

There are known limitations currently:

doesn't work for initial packets, since we only query the ct table.
  Fine given use case is for returning packets

no implicit defrag.
  frags should be rare so fix later..

won't work for more complex tasks, e.g. lookup of other extensions
  since we have no means to store results

we still have a 2nd lookup later on via normal conntrack path.
This shouldn't break anything though since skb->nfct isn't altered.

V2:
remove unnecessary braces (Jiri)
change the action identifier to 14 (Jiri)
Fix some stylistic issues caught by checkpatch
V3:
Move module params to bottom (Cong)
Get rid of tcf_hashinfo_init and friends and conform to newer API (Cong)

Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 16:02:06 -05:00
Nicolas Dichtel
1728d4fabd tunnels: advertise link netns via netlink
Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 14:32:03 -05:00
Nicolas Dichtel
d37512a277 rtnl: add link netns id to interface messages
This patch adds a new attribute (IFLA_LINK_NETNSID) which contains the 'link'
netns id when this netns is different from the netns where the interface
stands (for example for x-net interfaces like ip tunnels).
With this attribute, it's possible to interpret correctly all advertised
information (like IFLA_LINK, etc.).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 14:21:26 -05:00
Nicolas Dichtel
0c7aecd4bd netns: add rtnl cmd to add and get peer netns ids
With this patch, a user can define an id for a peer netns by providing a FD or a
PID. These ids are local to the netns where it is added (ie valid only into this
netns).

The main function (ie the one exported to other module), peernet2id(), allows to
get the id of a peer netns. If no id has been assigned by the user, this
function allocates one.

These ids will be used in netlink messages to point to a peer netns, for example
in case of a x-netns interface.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 14:21:18 -05:00
Pablo Neira Ayuso
75e8d06d43 netfilter: nf_tables: validate hooks in NAT expressions
The user can crash the kernel if it uses any of the existing NAT
expressions from the wrong hook, so add some code to validate this
when loading the rule.

This patch introduces nft_chain_validate_hooks() which is based on
an existing function in the bridge version of the reject expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-19 14:52:39 +01:00
Jiri Pirko
27c0013285 switchdev: fix typo in inline function definition
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 12:24:11 -05:00
Martin KaFai Lau
88340160f3 ip_tunnel: Create percpu gro_cell
In the ipip tunnel, the skb->queue_mapping is lost in ipip_rcv().
All skb will be queued to the same cell->napi_skbs.  The
gro_cell_poll is pinned to one core under load.  In production traffic,
we also see severe rx_dropped in the tunl iface and it is probably due to
this limit: skb_queue_len(&cell->napi_skbs) > netdev_max_backlog.

This patch is trying to alloc_percpu(struct gro_cell) and schedule
gro_cell_poll to process the skb in the same core.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 01:56:32 -05:00
Johannes Berg
053c095a82 netlink: make nlmsg_end() and genlmsg_end() void
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

  if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

  return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

  if (my_function(...))
    /* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

-	return nlmsg_end(...);
+	nlmsg_end(...);
+	return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with <= 0 in dump functionality, but that could just
be changed to < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for <0 or <=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 01:03:45 -05:00
David S. Miller
e445dd5f67 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-01-16

Here are some more bluetooth & ieee802154 patches intended for 3.20:

 - Refactoring & cleanups of ieee802154 & 6lowpan code
 - Various fixes to the btmrvl driver
 - Fixes for Bluetooth Low Energy Privacy feature handling
 - Added build-time sanity checks for sockaddr sizes
 - Fixes for Security Manager registration on LE-only controllers
 - Refactoring of broken inquiry mode handling to a generic quirk

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 00:25:30 -05:00
Jiri Pirko
3aeb66176f net: replace br_fdb_external_learn_* calls with switchdev notifier events
This patch benefits from newly introduced switchdev notifier and uses it
to propagate fdb learn events from rocker driver to bridge. That avoids
direct function calls and possible use by other listeners (ovs).

Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 00:23:57 -05:00
Jiri Pirko
03bf0c2812 switchdev: introduce switchdev notifier
This patch introduces new notifier for purposes of exposing events which happen
on switch driver side. The consumers of the event messages are mainly involved
masters, namely bridge and ovs.

Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 00:23:57 -05:00
Jiri Pirko
d23b8ad8ab tc: add BPF based action
This action provides a possibility to exec custom BPF code.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-17 23:51:10 -05:00
Johannes Berg
ee1c244219 genetlink: synchronize socket closing and family removal
In addition to the problem Jeff Layton reported, I looked at the code
and reproduced the same warning by subscribing and removing the genl
family with a socket still open. This is a fairly tricky race which
originates in the fact that generic netlink allows the family to go
away while sockets are still open - unlike regular netlink which has
a module refcount for every open socket so in general this cannot be
triggered.

Trying to resolve this issue by the obvious locking isn't possible as
it will result in deadlocks between unregistration and group unbind
notification (which incidentally lockdep doesn't find due to the home
grown locking in the netlink table.)

To really resolve this, introduce a "closing socket" reference counter
(for generic netlink only, as it's the only affected family) in the
core netlink code and use that in generic netlink to wait for all the
sockets that are being closed at the same time as a generic netlink
family is removed.

This fixes the race that when a socket is closed, it will should call
the unbind, but if the family is removed at the same time the unbind
will not find it, leading to the warning. The real problem though is
that in this case the unbind could actually find a new family that is
registered to have a multicast group with the same ID, and call its
mcast_unbind() leading to confusing.

Also remove the warning since it would still trigger, but is now no
longer a problem.

This also moves the code in af_netlink.c to before unreferencing the
module to avoid having the same problem in the normal non-genl case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 17:04:25 -05:00
Johannes Berg
f555f3d76a genetlink: document parallel_ops
The kernel-doc for the parallel_ops family struct member is
missing, add it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 17:04:24 -05:00
Rickard Strandqvist
0026b6551b Bluetooth: Remove unused function
Remove the function hci_conn_change_link_key() that is not used anywhere.

This was partially found by using a static code analysis program called
cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-16 13:06:38 +02:00
David S. Miller
27f097177d Here's a big pile of changes for this round.
We have
  * a lot of regulatory code changes to deal with the
    way newer Intel devices handle this
  * a change to drop packets while disconnecting from
    an AP instead of trying to wait for them
  * a new attempt at improving the tailroom accounting
    to not kick in too much for performance reasons
  * improvements in wireless link statistics
  * many other small improvements and small fixes that
    didn't seem necessary for 3.19 (e.g. in hwsim which
    is testing only code)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUt7WEAAoJEDBSmw7B7bqrVBoP/2EViE62HMmXdqG1SZWz8q9o
 Iigq8STC/sT2WCx1pYm+tKuVW4LD2O3mCriGNP8A3RwzDZ6H7sKJYb1gV6QCPV6f
 4+yT5VSAB3D3lHmp/bbyNsmKCBQ5uS4LVgDrokrkbGpacDu94PYS5Wv9t3x6PBVB
 5Xjky6g6A/pSuxTIstSO9k5xkzNjaB1TxvVRz/gJrGcFQVkDFSlVbuTHUVxs8p+p
 k6mwY/2WYijZkswWZVQTJLQlF9vRI7PYkKs5m8gz4pjNU48oFJoyu4IP3Z1Xj/Sm
 zgT1C9rgp0Du74HYO2niGAvLWgKajAZuW5hIacDndUPjYQQBLgGs/bCJGSntM+x9
 XoOdPixdFPT/58ijyYZlmHc8rxPOd2kHsVbwGplp8f195S4VO04D+ejfOaoAUFwX
 v/kMvO3XIFmEH1jjkDAC3OTcRMYVMuENyWl7pFzxHIzPeRiEpQUd9iSdM4yol0F2
 ZyWvKud4U75Sh+aCiDIIBETtdfCRFe12hgKs4COYbI/UYkGPTPrNei/uisopdubT
 JC+7pZOYdSgoX12yVi6ds6DmKE/ZpIQyhIK4wTWgVoszbnfdb9Mw7mJEThwNRjeK
 JJPsbuty7u8HWjXzEqHLoTV3BFv1cgRSJc5Wt0zfME+LzD7iQpEpv+QBAguwwChD
 Osn55Z3FnKEmBdGcOIje
 =vaEW
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-01-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Here's a big pile of changes for this round.

We have
 * a lot of regulatory code changes to deal with the
   way newer Intel devices handle this
 * a change to drop packets while disconnecting from
   an AP instead of trying to wait for them
 * a new attempt at improving the tailroom accounting
   to not kick in too much for performance reasons
 * improvements in wireless link statistics
 * many other small improvements and small fixes that
   didn't seem necessary for 3.19 (e.g. in hwsim which
   is testing only code)

Conflicts:
	drivers/staging/rtl8723au/os_dep/ioctl_cfg80211.c

Minor overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 19:16:56 -05:00
Eric Dumazet
5055c371bf ipv4: per cpu uncached list
RAW sockets with hdrinc suffer from contention on rt_uncached_lock
spinlock.

One solution is to use percpu lists, since most routes are destroyed
by the cpu that created them.

It is unclear why we even have to put these routes in uncached_list,
as all outgoing packets should be freed when a device is dismantled.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: caacf05e5a ("ipv4: Properly purge netdev references on uncached routes.")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 18:26:16 -05:00
Johannes Berg
b51f3beecf cfg80211: change bandwidth reporting to explicit field
For some reason, we made the bandwidth separate flags, which
is rather confusing - a single rate cannot have different
bandwidths at the same time.

Change this to no longer be flags but use a separate field
for the bandwidth ('bw') instead.

While at it, add support for 5 and 10 MHz rates - these are
reported as regular legacy rates with their real bitrate,
but tagged as 5/10 now to make it easier to distinguish them.

In the nl80211 API, the flags are preserved, but the code
now can also clearly only set a single one of the flags.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-15 22:41:32 +01:00
Johannes Berg
97d910d0aa cfg80211: remove 80+80 MHz rate reporting
These rates are treated the same as 160 MHz in the spec, so
it makes no sense to distinguish them. As no driver uses them
yet, this is also not a problem, just remove them.

In the userspace API the field remains reserved to preserve
API and ABI.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-15 16:05:21 +01:00
Johannes Berg
f89903d53f mac80211: remove 80+80 MHz rate reporting
These rates are treated the same as 160 MHz in the spec,
so it makes no sense to distinguish them. As no driver
uses them yet, this is also not a problem, just remove
them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-15 16:02:46 +01:00
David S. Miller
4e7a84b1a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
netfilter updates for net-next

The following patchset contains netfilter updates for net-next, just a
bunch of cleanups and small enhancement to selectively flush conntracks
in ctnetlink, more specifically the patches are:

1) Rise default number of buckets in conntrack from 16384 to 65536 in
   systems with >= 4GBytes, patch from Marcelo Leitner.

2) Small refactor to save one level on indentation in xt_osf, from
   Joe Perches.

3) Remove unnecessary sizeof(char) in nf_log, from Fabian Frederick.

4) Another small cleanup to remove redundant variable in nfnetlink,
   from Duan Jiong.

5) Fix compilation warning in nfnetlink_cthelper on parisc, from
   Chen Gang.

6) Fix wrong format in debugging for ctseqadj, from Gao feng.

7) Selective conntrack flushing through the mark for ctnetlink, patch
   from Kristian Evensen.

8) Remove nf_ct_conntrack_flush_report() exported symbol now that is
   not required anymore after the selective flushing patch, again from
   Kristian.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 01:50:25 -05:00
Thomas Graf
1dd144cf5b openvswitch: Support VXLAN Group Policy extension
Introduces support for the group policy extension to the VXLAN virtual
port. The extension is disabled by default and only enabled if the user
has provided the respective configuration.

  ovs-vsctl add-port br0 vxlan0 -- \
     set Interface vxlan0 type=vxlan options:exts=gbp

The configuration interface to enable the extension is based on a new
attribute OVS_VXLAN_EXT_GBP nested inside OVS_TUNNEL_ATTR_EXTENSION
which can carry additional extensions as needed in the future.

The group policy metadata is stored as binary blob (struct ovs_vxlan_opts)
internally just like Geneve options but transported as nested Netlink
attributes to user space.

Renames the existing TUNNEL_OPTIONS_PRESENT to TUNNEL_GENEVE_OPT with the
binary value kept intact, a new flag TUNNEL_VXLAN_OPT is introduced.

The attributes OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS and existing
OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS are implemented mutually exclusive.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 01:11:41 -05:00
Thomas Graf
ac5132d1a0 vxlan: Only bind to sockets with compatible flags enabled
A VXLAN net_device looking for an appropriate socket may only consider
a socket which has a matching set of flags/extensions enabled. If
incompatible flags are enabled, return a conflict to have the caller
create a distinct socket with distinct port.

The OVS VXLAN port is kept unaware of extensions at this point.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 01:11:41 -05:00
Thomas Graf
3511494ce2 vxlan: Group Policy extension
Implements supports for the Group Policy VXLAN extension [0] to provide
a lightweight and simple security label mechanism across network peers
based on VXLAN. The security context and associated metadata is mapped
to/from skb->mark. This allows further mapping to a SELinux context
using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
tc, etc.

The group membership is defined by the lower 16 bits of skb->mark, the
upper 16 bits are used for flags.

SELinux allows to manage label to secure local resources. However,
distributed applications require ACLs to implemented across hosts. This
is typically achieved by matching on L2-L4 fields to identify the
original sending host and process on the receiver. On top of that,
netlabel and specifically CIPSO [1] allow to map security contexts to
universal labels.  However, netlabel and CIPSO are relatively complex.
This patch provides a lightweight alternative for overlay network
environments with a trusted underlay. No additional control protocol
is required.

           Host 1:                       Host 2:

      Group A        Group B        Group B     Group A
      +-----+   +-------------+    +-------+   +-----+
      | lxc |   | SELinux CTX |    | httpd |   | VM  |
      +--+--+   +--+----------+    +---+---+   +--+--+
	  \---+---/                     \----+---/
	      |                              |
	  +---+---+                      +---+---+
	  | vxlan |                      | vxlan |
	  +---+---+                      +---+---+
	      +------------------------------+

Backwards compatibility:
A VXLAN-GBP socket can receive standard VXLAN frames and will assign
the default group 0x0000 to such frames. A Linux VXLAN socket will
drop VXLAN-GBP  frames. The extension is therefore disabled by default
and needs to be specifically enabled:

   ip link add [...] type vxlan [...] gbp

In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
must run on a separate port number.

Examples:
 iptables:
  host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
  host2# iptables -I INPUT -m mark --mark 0x200 -j DROP

 OVS:
  # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
  # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'

[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
[1] http://lwn.net/Articles/204905/

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 01:11:41 -05:00
Tom Herbert
dfd8645ea1 vxlan: Remote checksum offload
Add support for remote checksum offload in VXLAN. This uses a
reserved bit to indicate that RCO is being done, and uses the low order
reserved eight bits of the VNI to hold the start and offset values in a
compressed manner.

Start is encoded in the low order seven bits of VNI. This is start >> 1
so that the checksum start offset is 0-254 using even values only.
Checksum offset (transport checksum field) is indicated in the high
order bit in the low order byte of the VNI. If the bit is set, the
checksum field is for UDP (so offset = start + 6), else checksum
field is for TCP (so offset = start + 16). Only TCP and UDP are
supported in this implementation.

Remote checksum offload for VXLAN is described in:

https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4).

With UDP checksums and Remote Checksum Offload
  IPv4
      Client
        11.84% CPU utilization
      Server
        12.96% CPU utilization
      9197 Mbps
  IPv6
      Client
        12.46% CPU utilization
      Server
        14.48% CPU utilization
      8963 Mbps

With UDP checksums, no remote checksum offload
  IPv4
      Client
        15.67% CPU utilization
      Server
        14.83% CPU utilization
      9094 Mbps
  IPv6
      Client
        16.21% CPU utilization
      Server
        14.32% CPU utilization
      9058 Mbps

No UDP checksums
  IPv4
      Client
        15.03% CPU utilization
      Server
        23.09% CPU utilization
      9089 Mbps
  IPv6
      Client
        16.18% CPU utilization
      Server
        26.57% CPU utilization
       8954 Mbps

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 15:20:04 -05:00
Arik Nemtsov
2c3e861c94 cfg80211: introduce sync regdom set API for self-managed
A self-managed device will sometimes need to set its regdomain synchronously.
Notably it should be set before usermode has a chance to query it. Expose
a new API to accomplish this which requires the RTNL.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-14 09:43:44 +01:00
Jiri Pirko
df8a39defa net: rename vlan_tx_* helpers since "tx" is misleading there
The same macros are used for rx as well. So rename it.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:51:08 -05:00
Jiri Pirko
d8b9605d26 net: sched: fix skb->protocol use in case of accelerated vlan path
tc code implicitly considers skb->protocol even in case of accelerated
vlan paths and expects vlan protocol type here. However, on rx path,
if the vlan header was already stripped, skb->protocol contains value
of next header. Similar situation is on tx path.

So for skbs that use skb->vlan_tci for tagging, use skb->vlan_proto instead.

Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:51:08 -05:00
Tom Herbert
3bf3947526 vxlan: Improve support for header flags
This patch cleans up the header flags of VXLAN in anticipation of
defining some new ones:

- Move header related definitions from vxlan.c to vxlan.h
- Change VXLAN_FLAGS to be VXLAN_HF_VNI (only currently defined flag)
- Move check for unknown flags to after we find vxlan_sock, this
  assumes that some flags may be processed based on tunnel
  configuration
- Add a comment about why the stack treating unknown set flags as an
  error instead of ignoring them

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:05:01 -05:00
Marcel Holtmann
039d4e410c Bluetooth: Add missing response structure for HCI Delete Stored Link Key
This patch adds this missing structure for processing the result of the
HCI Delete Stored Link Key command.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 21:55:40 +02:00
Marcel Holtmann
c2f0f97927 Bluetooth: Handle command complete event for HCI Read Stored Link Keys
When the HCI Read Stored Link Keys command completes it gives useful
information of the current stored keys and maximum keys a controller
can actually store. So process this event and store these information
in hci_dev structure.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 21:54:16 +02:00
Marcel Holtmann
cb9627806c Bluetooth: Add defintions for HCI Read Stored Link Key command
This patch adds the missing commmand structure and command complete
structure for the HCI Read Store Link Key command.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 21:53:54 +02:00
Marcel Holtmann
1904a853fa Bluetooth: Add opcode parameter to hci_req_complete_t callback
When hci_req_run() calls its provided complete function and one of the
HCI commands in the sequence fails, then provide the opcode of failing
command. In case of success HCI_OP_NOP is provided since all commands
completed.

This patch fixes the prototype of hci_req_complete_t and all its users.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 11:16:31 +02:00
Johannes Berg
6de39808cf nl80211: support per-TID station statistics
The base for the current statistics is pretty mixed up, support
exporting RX/TX statistics for MSDUs per TID. This (currently)
covers received MSDUs, transmitted MSDUs and retries/failures
thereof.

Doing it per TID for MSDUs makes more sense than say only per AC
because it's symmetric - we could export per-AC statistics for all
frames (which AC we used for transmission can be determined also
for management frames) but per TID is better and usually data
frames are really the ones we care about. Also, on RX we can't
determine the AC - but we do know the TID for any QoS MPDU we
received.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:28:18 +01:00
Johannes Berg
8d791361a4 nl80211: clarify packet statistics descriptions
The current statistics we keep aren't very clear, some are on
MPDUs and some on MSDUs/MMPDUs. Clarify the descriptions based
on the counters mac80211 keeps.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:28:16 +01:00
Johannes Berg
a76b1942a1 cfg80211: add nl80211 beacon-only statistics
Add these two values:
 * BEACON_RX: number of beacons received from this peer
 * BEACON_SIGNAL_AVG: signal strength average for beacons only

These can then be used for Android Lollipop's statistics request.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:28:13 +01:00
Johannes Berg
319090bf6c cfg80211: remove enum station_info_flags
This is really just duplicating the list of information that's
already available in the nl80211 attribute, so remove the list.
Two small changes are needed:
 * remove STATION_INFO_ASSOC_REQ_IES complete, but the length
   (assoc_req_ies_len) can be used instead
 * add NL80211_STA_INFO_RX_DROP_MISC which exists internally
   but not in nl80211 yet

This gets rid of the duplicate maintenance of the two lists.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:28:10 +01:00
Johannes Berg
2b9a7e1bac mac80211: allow drivers to provide most station statistics
In many cases, drivers can filter things like beacons that will
skew statistics reported by mac80211. To get correct statistics
in these cases, call drivers to obtain statistics and let them
override all values, filling values from mac80211 if the driver
didn't provide them. Not all of them make sense for the driver
to fill, so some are still always done by mac80211.

Note that this doesn't currently allow a driver to say "I know
this value is wrong, don't report it at all", or to sum it up
with a mac80211 value (as could be useful for "dropped misc"),
that can be added if it turns out to be needed.

This also gets rid of the get_rssi() method as is can now be
implemented using sta_statistics().

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:28:06 +01:00
Johannes Berg
cf5ead822d cfg80211: allow including station info in delete event
When a station is removed, its statistics may be interesting to
userspace, for example for further aggregation of statistics of
all stations that ever connected to an AP.

Introduce a new cfg80211_del_sta_sinfo() function (and make the
cfg80211_del_sta() a static inline calling it) to allow passing
a struct station_info along with this, and send the data in the
nl80211 event message.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:28:00 +01:00
Johannes Berg
052536abfa cfg80211: add scan time to survey data
Add the time spent scanning to the survey data so it can be
reported by drivers that collect such information.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:27:58 +01:00
Johannes Berg
11f78ac32b cfg80211: allow survey data to return global data
Not all devices are able to report survey data (particularly
time spent for various operations) per channel. As all these
statistics already exist in survey data, allow such devices
to report them (if userspace requested it)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:27:54 +01:00
Johannes Berg
4ed20bebf5 cfg80211: remove "channel" from survey names
All of the survey data is (currently) per channel anyway,
so having the word "channel" in the name does nothing. In
the next patch I'll introduce global data to the survey,
where the word "channel" is actually confusing.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:27:52 +01:00
Kristian Evensen
ae406bd057 netfilter: conntrack: Remove nf_ct_conntrack_flush_report
The only user of nf_ct_conntrack_flush_report() was ctnetlink_del_conntrack().
After adding support for flushing connections with a given mark, this function
is no longer called.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-08 12:16:58 +01:00
Ido Yariv
db12847ca8 mac80211: Re-fix accounting of the tailroom-needed counter
When hw acceleration is enabled, the GENERATE_IV or PUT_IV_SPACE flags
only require headroom space. Therefore, the tailroom-needed counter can
safely be decremented for most drivers.

The older incarnation of this patch (ca34e3b5) assumed that the above
holds true for all drivers. As reported by Christopher Chavez and
researched by Christian Lamparter and Larry Finger, this isn't a valid
assumption for p54 and cw1200.

Drivers that still require tailroom for ICV/MIC even when HW encryption
is enabled can use IEEE80211_KEY_FLAG_RESERVE_TAILROOM to indicate it.

Signed-off-by: Ido Yariv <idox.yariv@intel.com>
Cc: Christopher Chavez <chrischavez@gmx.us>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Solomon Peachy <pizza@shaftnet.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-07 14:39:32 +01:00
Johannes Berg
3a4b0c948d Merge branch 'mac80211' into mac80211-next
Merge mac80211.git to get some changes that would otherwise
cause conflicts with new changes coming here.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-07 14:39:16 +01:00
David S. Miller
44d84d7272 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-01-06 22:29:20 -05:00
David S. Miller
15ecf7a063 Here's just a single fix - a revert of a patch that broke the
p54 and cw2100 drivers (arguably due to bad assumptions there.)
 Since this affects kernels since 3.17, I decided to revert for
 now and we'll revisit this optimisation properly for -next.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUq9EvAAoJEDBSmw7B7bqr7mMQAJRbdepVVqK5IFH1BH0NC7vi
 LkBE2lp8ZAz1Crg+OAQNUdlZUHtGyfYoXSfzezmrMG51i5xjHyOYQQikW7aJ2SQ0
 XsjJJ5TcqKe83NwoakXUrMpE7KmCt/LnbjKNXDsZIvLlUkqa7ksXaS7btK195aXy
 WlVrmUE+BqT9a16VjFLZ6wRjI43+3bGxhtFL+g1eXw6nZ4a2o4EbIXdc9SN+/bT4
 tAhWJfdAQqQc34jhesWGbMIvkXWhzy2R6Js+9gMIBNsmlAiYbFa4QZ/9tI3nBI/O
 yHSiDc7JnPNjkkC+3wTJxMl7mEd6fEKnAS1ryZ5L4XhPrQpV39iZuWSPvPGw6LLW
 kB6+wXkIyQdCSoyrQZxY75ibqOUKYYxhhkSYfMePXRKTYY6MlHYqiH8wPWFpPoqO
 iumLqx8/CtRW1q1t2EBAG6rZLRF8HqmfqtB+ptT0DWcAP8E81q8BImPoPFr+P9S2
 XfuuSw97xKCcilOcYJ0uYSBe4XNNhy1dtC/zJ8cA9nV4WNkofALga5Z/t8ARhDsM
 wvP1D2uIX3U9My17bXq+Xn/fSSS7yhpLZjEHj/JNRvpDCWGf/tQl6A3ydMy//Oqe
 lRSKfmiAGysqhXnmK12+YhfO+4ioTz8dA88tHs1AO8qasfQwx45eRsUPemWeExiL
 9Lntb0U6MhYvgiTdWqt6
 =CnbJ
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2015-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Here's just a single fix - a revert of a patch that broke the
p54 and cw2100 drivers (arguably due to bad assumptions there.)
Since this affects kernels since 3.17, I decided to revert for
now and we'll revisit this optimisation properly for -next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-06 13:29:27 -05:00
Gautam Kumar Shukla
d75bb06b61 cfg80211: add extensible feature flag attribute
With the wiphy::features flag being used up this patch adds a
new field wiphy::ext_features. Considering extensibility this
new field is declared as a byte array. This extensible flag is
exposed to user-space by NL80211_ATTR_EXT_FEATURES.

Cc: Avinash Patil <patila@marvell.com>
Signed-off-by: Gautam (Gautam Kumar) Shukla <gautams@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-06 12:10:24 +01:00
Daniel Borkmann
81164413ad net: tcp: add per route congestion control
This work adds the possibility to define a per route/destination
congestion control algorithm. Generally, this opens up the possibility
for a machine with different links to enforce specific congestion
control algorithms with optimal strategies for each of them based
on their network characteristics, even transparently for a single
application listening on all links.

For our specific use case, this additionally facilitates deployment
of DCTCP, for example, applications can easily serve internal
traffic/dsts in DCTCP and external one with CUBIC. Other scenarios
would also allow for utilizing e.g. long living, low priority
background flows for certain destinations/routes while still being
able for normal traffic to utilize the default congestion control
algorithm. We also thought about a per netns setting (where different
defaults are possible), but given its actually a link specific
property, we argue that a per route/destination setting is the most
natural and flexible.

The administrator can utilize this through ip-route(8) by appending
"congctl [lock] <name>", where <name> denotes the name of a
congestion control algorithm and the optional lock parameter allows
to enforce the given algorithm so that applications in user space
would not be allowed to overwrite that algorithm for that destination.

The dst metric lookups are being done when a dst entry is already
available in order to avoid a costly lookup and still before the
algorithms are being initialized, thus overhead is very low when the
feature is not being used. While the client side would need to drop
the current reference on the module, on server side this can actually
even be avoided as we just got a flat-copied socket clone.

Joint work with Florian Westphal.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Daniel Borkmann
ea69763999 net: tcp: add RTAX_CC_ALGO fib handling
This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
control metric to be set up and dumped back to user space.

While the internal representation of RTAX_CC_ALGO is handled as a u32
key, we avoided to expose this implementation detail to user space, thus
instead, we chose the netlink attribute that is being exchanged between
user space to be the actual congestion control algorithm name, similarly
as in the setsockopt(2) API in order to allow for maximum flexibility,
even for 3rd party modules.

It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as
it should have been stored in RTAX_FEATURES instead, we first thought
about reusing it for the congestion control key, but it brings more
complications and/or confusion than worth it.

Joint work with Florian Westphal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Daniel Borkmann
c5c6a8ab45 net: tcp: add key management to congestion control
This patch adds necessary infrastructure to the congestion control
framework for later per route congestion control support.

For a per route congestion control possibility, our aim is to store
a unique u32 key identifier into dst metrics, which can then be
mapped into a tcp_congestion_ops struct. We argue that having a
RTAX key entry is the most simple, generic and easy way to manage,
and also keeps the memory footprint of dst entries lower on 64 bit
than with storing a pointer directly, for example. Having a unique
key id also allows for decoupling actual TCP congestion control
module management from the FIB layer, i.e. we don't have to care
about expensive module refcounting inside the FIB at this point.

We first thought of using an IDR store for the realization, which
takes over dynamic assignment of unused key space and also performs
the key to pointer mapping in RCU. While doing so, we stumbled upon
the issue that due to the nature of dynamic key distribution, it
just so happens, arguably in very rare occasions, that excessive
module loads and unloads can lead to a possible reuse of previously
used key space. Thus, previously stale keys in the dst metric are
now being reassigned to a different congestion control algorithm,
which might lead to unexpected behaviour. One way to resolve this
would have been to walk FIBs on the actually rare occasion of a
module unload and reset the metric keys for each FIB in each netns,
but that's just very costly.

Therefore, we argue a better solution is to reuse the unique
congestion control algorithm name member and map that into u32 key
space through jhash. For that, we split the flags attribute (as it
currently uses 2 bits only anyway) into two u32 attributes, flags
and key, so that we can keep the cacheline boundary of 2 cachelines
on x86_64 and cache the precalculated key at registration time for
the fast path. On average we might expect 2 - 4 modules being loaded
worst case perhaps 15, so a key collision possibility is extremely
low, and guaranteed collision-free on LE/BE for all in-tree modules.
Overall this results in much simpler code, and all without the
overhead of an IDR. Due to the deterministic nature, modules can
now be unloaded, the congestion control algorithm for a specific
but unloaded key will fall back to the default one, and on module
reload time it will switch back to the expected algorithm
transparently.

Joint work with Florian Westphal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Florian Westphal
e715b6d3a5 net: fib6: convert cfg metric to u32 outside of table write lock
Do the nla validation earlier, outside the write lock.

This is needed by followup patch which needs to be able to call
request_module (which can sleep) if needed.

Joint work with Daniel Borkmann.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Tom Herbert
ad6f939ab1 ip: Add offset parameter to ip_cmsg_recv
Add ip_cmsg_recv_offset function which takes an offset argument
that indicates the starting offset in skb where data is being received
from. This will be useful in the case of UDP and provided checksum
to user space.

ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of
zero.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:44:46 -05:00
Tom Herbert
5961de9f19 ip: Add offset parameter to ip_cmsg_recv
Add ip_cmsg_recv_offset function which takes an offset argument
that indicates the starting offset in skb where data is being received
from. This will be useful in the case of UDP and provided checksum
to user space.

ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of
zero.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:44:46 -05:00
Tom Herbert
c44d13d6f3 ip: IP cmsg cleanup
Move the IP_CMSG_* constants from ip_sockglue.c to inet_sock.h so that
they can be referenced in other source files.

Restructure ip_cmsg_recv to not go through flags using shift, check
for flags by 'and'. This eliminates both the shift and a conditional
per flag check.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:44:46 -05:00
Tom Herbert
224d019c4f ip: Move checksum convert defines to inet
Move convert_csum from udp_sock to inet_sock. This allows the
possibility that we can use convert checksum for different types
of sockets and also allows convert checksum to be enabled from
inet layer (what we'll want to do when enabling IP_CHECKSUM cmsg).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:44:46 -05:00
Thomas Graf
149118d893 netlink: Warn on unordered or illegal nla_nest_cancel() or nlmsg_cancel()
Calling nla_nest_cancel() in a different order as the nesting was
built up can lead to negative offsets being calculated which
results in skb_trim() being called with an underflowed unsigned
int. Warn if mark < skb->data as it's definitely a bug.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:38:22 -05:00
Johannes Berg
1e359a5de8 Revert "mac80211: Fix accounting of the tailroom-needed counter"
This reverts commit ca34e3b5c8.

It turns out that the p54 and cw2100 drivers assume that there's
tailroom even when they don't say they really need it. However,
there's currently no way for them to explicitly say they do need
it, so for now revert this.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=90331.

Cc: stable@vger.kernel.org
Fixes: ca34e3b5c8 ("mac80211: Fix accounting of the tailroom-needed counter")
Reported-by: Christopher Chavez <chrischavez@gmx.us>
Bisected-by: Larry Finger <Larry.Finger@lwfinger.net>
Debugged-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-05 10:33:46 +01:00
Jesse Gross
df5dba8e52 geneve: Remove socket hash table.
The hash table for open Geneve ports is used only on creation and
deletion time. It is not performance critical and is not likely to
grow to a large number of items. Therefore, this can be changed
to use a simple linked list.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:21:33 -05:00
Jesse Gross
829a3ada9c geneve: Simplify locking.
The existing Geneve locking scheme was pulled over directly from
VXLAN. However, VXLAN has a number of built in mechanisms which make
the locking more complex and are unlikely to be necessary with Geneve.
This simplifies the locking to use a basic scheme of a mutex
when doing updates plus RCU on receive.

In addition to making the code easier to read, this also avoids the
possibility of a race when creating or destroying sockets since
UDP sockets and the list of Geneve sockets are protected by different
locks. After this change, the entire operation is atomic.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:21:33 -05:00
Jesse Gross
61f3cade76 geneve: Remove workqueue.
The work queue is used only to free the UDP socket upon destruction.
This is not necessary with Geneve and generally makes the code more
difficult to reason about. It also introduces nondeterministic
behavior such as when a socket is rapidly deleted and recreated, which
could fail as the the deletion happens asynchronously.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:21:33 -05:00
Marcel Holtmann
043ec9bf7b Bluetooth: Introduce HCI_QUIRK_FIXUP_INQUIRY_MODE option
The HCI_QUIRK_FIXUP_INQUIRY_MODE option allows to force Inquiry Result
with RSSI setting on controllers that do not indicate support for it,
but where it is known to be fully functional.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-03 22:31:09 +02:00
Marcel Holtmann
05b3c3e790 Bluetooth: Remove no longer needed force_sc_support debugfs option
The force_sc_support debugfs option was introduced to easily work with
pre-production Bluetooth 4.1 silicon. This option is no longer needed
since controllers supporting BR/EDR Secure Connections feature are now
available.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02 22:22:04 +01:00
Marcel Holtmann
91389af67c Bluetooth: Remove broken force_lesc_support debugfs option
The force_lesc_support debugfs option never really worked. It has a race
condition between creating the debugfs entry and registering the L2CAP
fixed channel for BR/EDR SMP support.

Also this has been replaced with a working force_bredr_smp debugfs
switch that developers can use now.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02 22:22:03 +01:00
Marcel Holtmann
300acfdec9 Bluetooth: Introduce force_bredr_smp debugfs option for testing
Testing cross-transport pairing that starts on BR/EDR is only valid when
using a controller with BR/EDR Secure Connections. Devices will indicate
this by providing BR/EDR SMP fixed channel over L2CAP. To allow testing
of this feature on Bluetooth 4.0 controller or controllers without the
BR/EDR Secure Connections features, introduce a force_bredr_smp debugfs
option that allows faking the required AES connection.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02 22:22:03 +01:00
David S. Miller
6c032edc8a Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg say:

====================
pull request: bluetooth-next 2014-12-31

Here's the first batch of bluetooth patches for 3.20.

 - Cleanups & fixes to ieee802154  drivers
 - Fix synchronization of mgmt commands with respective HCI commands
 - Add self-tests for LE pairing crypto functionality
 - Remove 'BlueFritz!' specific handling from core using a new quirk flag
 - Public address configuration support for ath3012
 - Refactor debugfs support into a dedicated file
 - Initial support for LE Data Length Extension feature from Bluetooth 4.2

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02 15:58:21 -05:00
Alexander Duyck
345e9b5426 fib_trie: Push rcu_read_lock/unlock to callers
This change is to start cleaning up some of the rcu_read_lock/unlock
handling.  I realized while reviewing the code there are several spots that
I don't believe are being handled correctly or are masking warnings by
locally calling rcu_read_lock/unlock instead of calling them at the correct
level.

A common example is a call to fib_get_table followed by fib_table_lookup.
The rcu_read_lock/unlock ought to wrap both but there are several spots where
they were not wrapped.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Johannes Berg
023e2cfa36 netlink/genetlink: pass network namespace to bind/unbind
Netlink families can exist in multiple namespaces, and for the most
part multicast subscriptions are per network namespace. Thus it only
makes sense to have bind/unbind notifications per network namespace.

To achieve this, pass the network namespace of a given client socket
to the bind/unbind functions.

Also do this in generic netlink, and there also make sure that any
bind for multicast groups that only exist in init_net is rejected.
This isn't really a problem if it is accepted since a client in a
different namespace will never receive any notifications from such
a group, but it can confuse the family if not rejected (it's also
possible to silently (without telling the family) accept it, but it
would also have to be ignored on unbind so families that take any
kind of action on bind/unbind won't do unnecessary work for invalid
clients like that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-27 03:07:50 -05:00
Johannes Berg
c380d9a7af genetlink: pass multicast bind/unbind to families
In order to make the newly fixed multicast bind/unbind
functionality in generic netlink, pass them down to the
appropriate family.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-27 02:20:23 -05:00
Johannes Berg
f8403a2e47 genetlink: pass only network namespace to genl_has_listeners()
There's no point to force the caller to know about the internal
genl_sock to use inside struct net, just have them pass the network
namespace. This doesn't really change code generation since it's
an inline, but makes the caller less magic - there's never any
reason to pass another socket.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-27 02:20:23 -05:00
Jesse Gross
5f35227ea3 net: Generalize ndo_gso_check to ndo_features_check
GSO isn't the only offload feature with restrictions that
potentially can't be expressed with the current features mechanism.
Checksum is another although it's a general issue that could in
theory apply to anything. Even if it may be possible to
implement these restrictions in other ways, it can result in
duplicate code or inefficient per-packet behavior.

This generalizes ndo_gso_check so that drivers can remove any
features that don't make sense for a given packet, similar to
netif_skb_features(). It also converts existing driver
restrictions to the new format, completing the work that was
done to support tunnel protocols since the issues apply to
checksums as well.

By actually removing features from the set that are used to do
offloading, it solves another problem with the existing
interface. In these cases, GSO would run with the original set
of features and not do anything because it appears that
segmentation is not required.

CC: Tom Herbert <therbert@google.com>
CC: Joe Stringer <joestringer@nicira.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by:  Tom Herbert <therbert@google.com>
Fixes: 04ffcb255f ("net: Add ndo_gso_check")
Tested-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-26 17:20:56 -05:00
Nicolas Dichtel
ef8f342b43 neigh: remove next ptr from struct neigh_table
After commit
d7480fd3b1 ("neigh: remove dynamic neigh table registration support"),
this field is not used anymore.

CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-26 17:07:08 -05:00
Marcel Holtmann
711ffa78f4 Bluetooth: Introduce HCI_QUIRK_BROKEN_LOCAL_COMMANDS constant
Some controllers advertise support for Bluetooth 1.2 specification,
but they do not support the HCI Read Local Supported Commands command.

If that is the case, then the driver can quirk the behavior and force
the core to skip this command. This will allow removing vendor specific
checks out of the core.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-26 20:16:10 +02:00
Marcel Holtmann
72e4a6bd02 Bluetooth: Remove duplicate constant for RFCOMM PSM
The RFCOMM_PSM constant is actually a duplicate. So remove it and
use the L2CAP_PSM_RFCOMM constant instead.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-20 19:55:04 +02:00
Marcel Holtmann
23b9ceb74f Bluetooth: Create debugfs directory for each connection handle
For every internal representation of a Bluetooth connection which is
identified by hci_conn, create a debugfs directory with the handle
number as directory name.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-20 19:54:24 +02:00
Marcel Holtmann
a8e1bfaa55 Bluetooth: Store default and maximum LE data length settings
When the controller supports the LE Data Length Extension feature, the
default and maximum data length are read and now stored.

For backwards compatibility all values are initialized to the data
length values from Bluetooth 4.1 and earlier specifications.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-20 17:52:21 +02:00
Marcel Holtmann
94a3bd02a6 Bluetooth: Add structures for LE Data Length Extension feature
This patch adds the structures for HCI commands and events of the
LE Data Length Extension feature from Bluetooth 4.2 specification.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-20 17:51:41 +02:00
Johan Hedberg
5a154e6f71 Bluetooth: Fix Add Device to wait for HCI before sending cmd_complete
This patch updates the Add Device mgmt command handler to use a
hci_request to wait for HCI command completion before notifying user
space of the mgmt command completion. To do this we need to add an extra
hci_request parameter to the hci_conn_params_set function. Since this
function has no other users besides mgmt.c it's moved there as a static
function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-19 22:06:37 +01:00
Johan Hedberg
2cf22218b0 Bluetooth: Add hci_request support for hci_update_background_scan
Many places using hci_update_background_scan() try to synchronize
whatever they're doing with the help of hci_request callbacks. However,
since the hci_update_background_scan() function hasn't so far accepted a
hci_request pointer any commands triggered by it have been left out by
the synchronization. This patch modifies the API in a similar way as was
done for hci_update_page_scan, i.e. there's a variant that takes a
hci_request and another one that takes a hci_dev.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-19 22:06:37 +01:00
Johan Hedberg
0857dd3bed Bluetooth: Split hci_request helpers to hci_request.[ch]
None of the hci_request related things in net/bluetooth/hci_core.h are
needed anywhere outside of the core bluetooth module. This patch creates
a new net/bluetooth/hci_request.c file with its corresponding h-file and
moves the functionality there from hci_core.c and hci_core.h.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-19 13:04:42 +01:00
Johan Hedberg
1d2dc5b7b3 Bluetooth: Split hci_update_page_scan into two functions
To keep the parameter list and its semantics clear it makes sense to
split the hci_update_page_scan function into two separate functions: one
taking a hci_dev and another taking a hci_request. The one taking a
hci_dev constructs its own hci_request and then calls the other
function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-19 12:52:18 +01:00
Jukka Rissanen
cab9e3a055 Bluetooth: 6lowpan: Add IPSP PSM value
The Internet Protocol Support Profile a.k.a BT 6LoWPAN specification
is ready so PSM value for it is now known.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-19 11:12:05 +01:00
Alexander Aring
ba2a9506a7 nl802154: introduce support for cca settings
This patch adds support for setting cca parameters via nl802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-19 00:19:23 +01:00
Alexander Aring
7fe9a3882b ieee802154: rework cca setting
The current cca setting handle is a driver specific call. We need to
introduce some 802.15.4 specific layer and mapping 802.15.4 cca modes to
driver specific ones inside the 802.15.4 driver. This patch will add
such 802.15.4 layer and mapping the cca settings to driver specific ones.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-19 00:19:23 +01:00
Alexander Aring
b40d6376ff nl802154: introduce cca mode enums
This patch adds enums for 802.15.4 specific CCA settings.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-19 00:19:23 +01:00
Jukka Rissanen
93a1e86ce1 nl80211: Stop scheduled scan if netlink client disappears
An attribute NL80211_ATTR_SOCKET_OWNER can be set by the scan initiator.
If present, the attribute will cause the scan to be stopped if the client
dies.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-12-18 14:38:44 +01:00
Jukka Rissanen
31a60ed1e9 nl80211: Convert sched_scan_req pointer to RCU pointer
Because of possible races when accessing sched_scan_req pointer in
rdev, the sched_scan_req is converted to RCU pointer.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-12-18 14:38:09 +01:00
Jonathan Doron
b0d7aa5959 cfg80211: allow wiphy specific regdomain management
Add a new regulatory flag that allows a driver to manage regdomain
changes/updates for its own wiphy.
A self-managed wiphys only employs regulatory information obtained from
the FW and driver and does not use other cfg80211 sources like
beacon-hints, country-code IEs and hints from other devices on the same
system. Conversely, a self-managed wiphy does not share its regulatory
hints with other devices in the system. If a system contains several
devices, one or more of which are self-managed, there might be
contradictory regulatory settings between them. Usage of flag is
generally discouraged. Only use it if the FW/driver is incompatible
with non-locally originated hints.

A new API lets the driver send a complete regdomain, to be applied on
its wiphy only.

After a wiphy-specific regdomain change takes place, usermode will get
a new type of change notification. The regulatory core also takes care
enforce regulatory restrictions, in case some interfaces are on
forbidden channels.

Signed-off-by: Jonathan Doron <jonathanx.doron@intel.com>
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-12-17 11:49:55 +01:00
Linus Torvalds
603ba7e41b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile #2 from Al Viro:
 "Next pile (and there'll be one or two more).

  The large piece in this one is getting rid of /proc/*/ns/* weirdness;
  among other things, it allows to (finally) make nameidata completely
  opaque outside of fs/namei.c, making for easier further cleanups in
  there"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  coda_venus_readdir(): use file_inode()
  fs/namei.c: fold link_path_walk() call into path_init()
  path_init(): don't bother with LOOKUP_PARENT in argument
  fs/namei.c: new helper (path_cleanup())
  path_init(): store the "base" pointer to file in nameidata itself
  make default ->i_fop have ->open() fail with ENXIO
  make nameidata completely opaque outside of fs/namei.c
  kill proc_ns completely
  take the targets of /proc/*/ns/* symlinks to separate fs
  bury struct proc_ns in fs/proc
  copy address of proc_ns_ops into ns_common
  new helpers: ns_alloc_inum/ns_free_inum
  make proc_ns_operations work with struct ns_common * instead of void *
  switch the rest of proc_ns_operations to working with &...->ns
  netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
  make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
  common object embedded into various struct ....ns
2014-12-16 15:53:03 -08:00
Johannes Berg
848955ccf0 mac80211: move U-APSD enablement to vif flags
In order to let drivers have more dynamic U-APSD support,
move the enablement flag to the virtual interface driver
flags. This lets drivers not only set it up differently
for different interfaces, but also enable/disable on the
fly if needed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-12-15 12:34:45 +01:00
Linus Torvalds
e3aa91a7cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 - The crypto API is now documented :)
 - Disallow arbitrary module loading through crypto API.
 - Allow get request with empty driver name through crypto_user.
 - Allow speed testing of arbitrary hash functions.
 - Add caam support for ctr(aes), gcm(aes) and their derivatives.
 - nx now supports concurrent hashing properly.
 - Add sahara support for SHA1/256.
 - Add ARM64 version of CRC32.
 - Misc fixes.

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits)
  crypto: tcrypt - Allow speed testing of arbitrary hash functions
  crypto: af_alg - add user space interface for AEAD
  crypto: qat - fix problem with coalescing enable logic
  crypto: sahara - add support for SHA1/256
  crypto: sahara - replace tasklets with kthread
  crypto: sahara - add support for i.MX53
  crypto: sahara - fix spinlock initialization
  crypto: arm - replace memset by memzero_explicit
  crypto: powerpc - replace memset by memzero_explicit
  crypto: sha - replace memset by memzero_explicit
  crypto: sparc - replace memset by memzero_explicit
  crypto: algif_skcipher - initialize upon init request
  crypto: algif_skcipher - removed unneeded code
  crypto: algif_skcipher - Fixed blocking recvmsg
  crypto: drbg - use memzero_explicit() for clearing sensitive data
  crypto: drbg - use MODULE_ALIAS_CRYPTO
  crypto: include crypto- module prefix in template
  crypto: user - add MODULE_ALIAS
  crypto: sha-mb - remove a bogus NULL check
  crytpo: qat - Fix 64 bytes requests
  ...
2014-12-13 13:33:26 -08:00
Sujith Manoharan
5cf16616e1 mac80211: Fix accounting of multicast frames
Since multicast frames are marked as no-ack, using
IEEE80211_TX_STAT_ACK to check if they have been
successfully transmitted by the driver is incorrect
since a driver can choose to ignore transmission status
for no-ack frames. This results in incorrect accounting
for such frames.

To fix this issue, this patch introduces a new flag
that can be used by drivers to indicate error-free
transmission of no-ack frames.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
[add a note about not setting the flag for non-no-ack frames]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-12-12 13:48:43 +01:00
Sujith Manoharan
6b127c71fb mac80211: Move IEEE80211_TX_CTL_PS_RESPONSE
Move IEEE80211_TX_CTL_PS_RESPONSE to info->control.flags since
this is used only in the TX path (by ath9k). This frees up
a bit which can be used for other purposes.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-12-12 13:48:26 +01:00
Linus Torvalds
70e71ca0af Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

 2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers.  Thanks to Al Viro
    and Herbert Xu.

 3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

 4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

 5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

 6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

 7) Add a variable of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

 8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

 9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets.  From Alexei
    Starovoitov.

10) Support TSO/LSO in sunvnet driver, from David L Stevens.

11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

12) Remote checksum offload, from Tom Herbert.

13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

14) Add MPLS support to openvswitch, from Simon Horman.

15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet.  This tries to resolve the conflicting goals between the
    desired handling of bulk vs.  RPC-like traffic.

17) Allow userspace to ask for the CPU upon what a packet was
    received/steered, via SO_INCOMING_CPU.  From Eric Dumazet.

18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

21) Add VLAN packet scheduler action, from Jiri Pirko.

22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
  Fix race condition between vxlan_sock_add and vxlan_sock_release
  net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
  net/mlx4: Add support for A0 steering
  net/mlx4: Refactor QUERY_PORT
  net/mlx4_core: Add explicit error message when rule doesn't meet configuration
  net/mlx4: Add A0 hybrid steering
  net/mlx4: Add mlx4_bitmap zone allocator
  net/mlx4: Add a check if there are too many reserved QPs
  net/mlx4: Change QP allocation scheme
  net/mlx4_core: Use tasklet for user-space CQ completion events
  net/mlx4_core: Mask out host side virtualization features for guests
  net/mlx4_en: Set csum level for encapsulated packets
  be2net: Export tunnel offloads only when a VxLAN tunnel is created
  gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
  cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
  net: fec: only enable mdio interrupt before phy device link up
  net: fec: clear all interrupt events to support i.MX6SX
  net: fec: reset fep link status in suspend function
  net: sock: fix access via invalid file descriptor
  net: introduce helper macro for_each_cmsghdr
  ...
2014-12-11 14:27:06 -08:00
Linus Torvalds
b6da0076ba Merge branch 'akpm' (patchbomb from Andrew)
Merge first patchbomb from Andrew Morton:
 - a few minor cifs fixes
 - dma-debug upadtes
 - ocfs2
 - slab
 - about half of MM
 - procfs
 - kernel/exit.c
 - panic.c tweaks
 - printk upates
 - lib/ updates
 - checkpatch updates
 - fs/binfmt updates
 - the drivers/rtc tree
 - nilfs
 - kmod fixes
 - more kernel/exit.c
 - various other misc tweaks and fixes

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
  exit: pidns: fix/update the comments in zap_pid_ns_processes()
  exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
  exit: exit_notify: re-use "dead" list to autoreap current
  exit: reparent: call forget_original_parent() under tasklist_lock
  exit: reparent: avoid find_new_reaper() if no children
  exit: reparent: introduce find_alive_thread()
  exit: reparent: introduce find_child_reaper()
  exit: reparent: document the ->has_child_subreaper checks
  exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper()
  exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting
  exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting
  exit: proc: don't try to flush /proc/tgid/task/tgid
  exit: release_task: fix the comment about group leader accounting
  exit: wait: drop tasklist_lock before psig->c* accounting
  exit: wait: don't use zombie->real_parent
  exit: wait: cleanup the ptrace_reparented() checks
  usermodehelper: kill the kmod_thread_locker logic
  usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper()
  fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp
  nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
  ...
2014-12-10 18:34:42 -08:00
Al Viro
707c5960f1 Merge branch 'nsfs' into for-next 2014-12-10 21:31:59 -05:00
Johannes Weiner
3e32cb2e0a mm: memcontrol: lockless page counters
Memory is internally accounted in bytes, using spinlock-protected 64-bit
counters, even though the smallest accounting delta is a page.  The
counter interface is also convoluted and does too many things.

Introduce a new lockless word-sized page counter API, then change all
memory accounting over to it.  The translation from and to bytes then only
happens when interfacing with userspace.

The removed locking overhead is noticable when scaling beyond the per-cpu
charge caches - on a 4-socket machine with 144-threads, the following test
shows the performance differences of 288 memcgs concurrently running a
page fault benchmark:

vanilla:

   18631648.500498      task-clock (msec)         #  140.643 CPUs utilized            ( +-  0.33% )
         1,380,638      context-switches          #    0.074 K/sec                    ( +-  0.75% )
            24,390      cpu-migrations            #    0.001 K/sec                    ( +-  8.44% )
     1,843,305,768      page-faults               #    0.099 M/sec                    ( +-  0.00% )
50,134,994,088,218      cycles                    #    2.691 GHz                      ( +-  0.33% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
 8,049,712,224,651      instructions              #    0.16  insns per cycle          ( +-  0.04% )
 1,586,970,584,979      branches                  #   85.176 M/sec                    ( +-  0.05% )
     1,724,989,949      branch-misses             #    0.11% of all branches          ( +-  0.48% )

     132.474343877 seconds time elapsed                                          ( +-  0.21% )

lockless:

   12195979.037525      task-clock (msec)         #  133.480 CPUs utilized            ( +-  0.18% )
           832,850      context-switches          #    0.068 K/sec                    ( +-  0.54% )
            15,624      cpu-migrations            #    0.001 K/sec                    ( +- 10.17% )
     1,843,304,774      page-faults               #    0.151 M/sec                    ( +-  0.00% )
32,811,216,801,141      cycles                    #    2.690 GHz                      ( +-  0.18% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
 9,999,265,091,727      instructions              #    0.30  insns per cycle          ( +-  0.10% )
 2,076,759,325,203      branches                  #  170.282 M/sec                    ( +-  0.12% )
     1,656,917,214      branch-misses             #    0.08% of all branches          ( +-  0.55% )

      91.369330729 seconds time elapsed                                          ( +-  0.45% )

On top of improved scalability, this also gets rid of the icky long long
types in the very heart of memcg, which is great for 32 bit and also makes
the code a lot more readable.

Notable differences between the old and new API:

- res_counter_charge() and res_counter_charge_nofail() become
  page_counter_try_charge() and page_counter_charge() resp. to match
  the more common kernel naming scheme of try_do()/do()

- res_counter_uncharge_until() is only ever used to cancel a local
  counter and never to uncharge bigger segments of a hierarchy, so
  it's replaced by the simpler page_counter_cancel()

- res_counter_set_limit() is replaced by page_counter_limit(), which
  expects its callers to serialize against themselves

- res_counter_memparse_write_strategy() is replaced by
  page_counter_limit(), which rounds down to the nearest page size -
  rather than up.  This is more reasonable for explicitely requested
  hard upper limits.

- to keep charging light-weight, page_counter_try_charge() charges
  speculatively, only to roll back if the result exceeds the limit.
  Because of this, a failing bigger charge can temporarily lock out
  smaller charges that would otherwise succeed.  The error is bounded
  to the difference between the smallest and the biggest possible
  charge size, so for memcg, this means that a failing THP charge can
  send base page charges into reclaim upto 2MB (4MB) before the limit
  would have been reached.  This should be acceptable.

[akpm@linux-foundation.org: add includes for WARN_ON_ONCE and memparse]
[akpm@linux-foundation.org: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:04 -08:00
Linus Torvalds
cbfe0de303 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS changes from Al Viro:
 "First pile out of several (there _definitely_ will be more).  Stuff in
  this one:

   - unification of d_splice_alias()/d_materialize_unique()

   - iov_iter rewrite

   - killing a bunch of ->f_path.dentry users (and f_dentry macro).

     Getting that completed will make life much simpler for
     unionmount/overlayfs, since then we'll be able to limit the places
     sensitive to file _dentry_ to reasonably few.  Which allows to have
     file_inode(file) pointing to inode in a covered layer, with dentry
     pointing to (negative) dentry in union one.

     Still not complete, but much closer now.

   - crapectomy in lustre (dead code removal, mostly)

   - "let's make seq_printf return nothing" preparations

   - assorted cleanups and fixes

  There _definitely_ will be more piles"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  copy_from_iter_nocache()
  new helper: iov_iter_kvec()
  csum_and_copy_..._iter()
  iov_iter.c: handle ITER_KVEC directly
  iov_iter.c: convert copy_to_iter() to iterate_and_advance
  iov_iter.c: convert copy_from_iter() to iterate_and_advance
  iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
  iov_iter.c: convert iov_iter_zero() to iterate_and_advance
  iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
  iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
  iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
  iov_iter.c: iterate_and_advance
  iov_iter.c: macros for iterating over iov_iter
  kill f_dentry macro
  dcache: fix kmemcheck warning in switch_names
  new helper: audit_file()
  nfsd_vfs_write(): use file_inode()
  ncpfs: use file_inode()
  kill f_dentry uses
  lockd: get rid of ->f_path.dentry->d_sb
  ...
2014-12-10 16:10:49 -08:00
David S. Miller
22f10923dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/amd/xgbe/xgbe-desc.c
	drivers/net/ethernet/renesas/sh_eth.c

Overlapping changes in both conflict cases.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:48:20 -05:00
Joe Perches
785c20a08b irda: Convert function pointer arrays and uses to const
Making things const is a good thing.

(x86-64 defconfig with all irda)
$ size net/irda/built-in.o*
   text	   data	    bss	    dec	    hex	filename
 109276	   1868	    244	 111388	  1b31c	net/irda/built-in.o.new
 108828	   2316	    244	 111388	  1b31c	net/irda/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:33:16 -05:00
Joe Perches
22bbf5f3e4 llc: Make llc_sap_action_t function pointer arrays const
It's better when function pointer arrays aren't modifiable.

Net change:

$ size net/llc/built-in.o.*
   text	   data	    bss	    dec	    hex	filename
  61193	  12758	   1344	  75295	  1261f	net/llc/built-in.o.new
  47113	  27030	   1344	  75487	  126df	net/llc/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:21:24 -05:00
Joe Perches
9b37306935 llc: Make llc_conn_ev_qfyr_t function pointer arrays const
It's better when function pointer arrays aren't modifiable.

Net change from original:

$ size net/llc/built-in.o.*
   text	   data	    bss	    dec	    hex	filename
  61065	  12886	   1344	  75295	  1261f	net/llc/built-in.o.new
  47113	  27030	   1344	  75487	  126df	net/llc/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:21:24 -05:00
Joe Perches
14b7d95fd2 llc: Make function pointer arrays const
It's better when function pointer arrays aren't modifiable.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:21:24 -05:00
David S. Miller
6e5f59aacb Merge branch 'for-davem-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
More iov_iter work for the networking from Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 13:17:23 -05:00
Linus Torvalds
86c6a2fddf Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.

     This instruments might_sleep() checks to catch places that nest
     blocking primitives - such as mutex usage in a wait loop.  Such
     bugs can result in hard to debug races/hangs.

     Another category of invalid nesting that this facility will detect
     is the calling of blocking functions from within schedule() ->
     sched_submit_work() -> blk_schedule_flush_plug().

     There's some potential for false positives (if secondary blocking
     primitives themselves are not ready yet for this facility), but the
     kernel will warn once about such bugs per bootup, so the warning
     isn't much of a nuisance.

     This feature comes with a number of fixes, for problems uncovered
     with it, so no messages are expected normally.

   - Another round of sched/numa optimizations and refinements, for
     CONFIG_NUMA_BALANCING=y.

   - Another round of sched/dl fixes and refinements.

  Plus various smaller fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched: Add missing rcu protection to wake_up_all_idle_cpus
  sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
  sched/numa: Init numa balancing fields of init_task
  sched/deadline: Remove unnecessary definitions in cpudeadline.h
  sched/cpupri: Remove unnecessary definitions in cpupri.h
  sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
  sched/fair: Fix stale overloaded status in the busiest group finding logic
  sched: Move p->nr_cpus_allowed check to select_task_rq()
  sched/completion: Document when to use wait_for_completion_io_*()
  sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
  sched/fair: Kill task_struct::numa_entry and numa_group::task_list
  sched: Refactor task_struct to use numa_faults instead of numa_* pointers
  sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
  sched/deadline: Reschedule from switched_from_dl() after a successful pull
  sched/deadline: Push task away if the deadline is equal to curr during wakeup
  sched/deadline: Add deadline rq status print
  sched/deadline: Fix artificial overrun introduced by yield_task_dl()
  sched/rt: Clean up check_preempt_equal_prio()
  sched/core: Use dl_bw_of() under rcu_read_lock_sched()
  sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
  ...
2014-12-09 21:21:34 -08:00
David S. Miller
b5f185f33d Merge tag 'master-2014-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-12-08

Please pull this last batch of pending wireless updates for the 3.19 tree...

For the wireless bits, Johannes says:

"This time I have Felix's no-status rate control work, which will allow
drivers to work better with rate control even if they don't have perfect
status reporting. In addition to this, a small hwsim fix from Patrik,
one of the regulatory patches from Arik, and a number of cleanups and
fixes I did myself.

Of note is a patch where I disable CFG80211_WEXT so that compatibility
is no longer selectable - this is intended as a wake-up call for anyone
who's still using it, and is still easily worked around (it's a one-line
patch) before we fully remove the code as well in the future."

For the Bluetooth bits, Johan says:

"Here's one more bluetooth-next pull request for 3.19:

 - Minor cleanups for ieee802154 & mac802154
 - Fix for the kernel warning with !TASK_RUNNING reported by Kirill A.
   Shutemov
 - Support for another ath3k device
 - Fix for tracking link key based security level
 - Device tree bindings for btmrvl + a state update fix
 - Fix for wrong ACL flags on LE links"

And...

"In addition to the previous one this contains two more cleanups to
mac802154 as well as support for some new HCI features from the
Bluetooth 4.2 specification.

From the original request:

'Here's what should be the last bluetooth-next pull request for 3.19.
It's rather large but the majority of it is the Low Energy Secure
Connections feature that's part of the Bluetooth 4.2 specification. The
specification went public only this week so we couldn't publish the
corresponding code before that. The code itself can nevertheless be
considered fairly mature as it's been in development for over 6 months
and gone through several interoperability test events.

Besides LE SC the pull request contains an important fix for command
complete events for mgmt sockets which also fixes some leaks of hci_conn
objects when powering off or unplugging Bluetooth adapters.

A smaller feature that's part of the pull request is service discovery
support. This is like normal device discovery except that devices not
matching specific UUIDs or strong enough RSSI are filtered out.

Other changes that the pull request contains are firmware dump support
to the btmrvl driver, firmware download support for Broadcom BCM20702A0
variants, as well as some coding style cleanups in 6lowpan &
ieee802154/mac802154 code.'"

For the NFC bits, Samuel says:

"With this one we get:

- NFC digital improvements for DEP support: Chaining, NACK and ATN
  support added.

- NCI improvements: Support for p2p target, SE IO operand addition,
  SE operands extensions to support proprietary implementations, and
  a few fixes.

- NFC HCI improvements: OPEN_PIPE and NOTIFY_ALL_CLEARED support,
  and SE IO operand addition.

- A bunch of minor improvements and fixes for STMicro st21nfcb and
  st21nfca"

For the iwlwifi bits, Emmanuel says:

"Major works are CSA and TDLS. On top of that I have a new
firmware API for scan and a few rate control improvements.
Johannes find a few tricks to improve our CPU utilization
and adds support for a new spin of 7265 called 7265D.
Along with this a few random things that don't stand out."

And...

"I deprecate here -8.ucode since -9 has been published long ago.
Along with that I have a new activity, we have now better
a infrastructure for firmware debugging. This will allow to
have configurable probes insides the firmware.
Luca continues his work on NetDetect, this feature is now
complete. All the rest is minor fixes here and there."

For the Atheros bits, Kalle says:

"Only ath10k changes this time and no major changes. Most visible are:

o new debugfs interface for runtime firmware debugging (Yanbo)

o fix shared WEP (Sujith)

o don't rebuild whenever kernel version changes (Johannes)

o lots of refactoring to make it easier to add new hw support (Michal)

There's also smaller fixes and improvements with no point of listing
here."

In addition, there are a few last minute updates to ath5k,
ath9k, brcmfmac, brcmsmac, mwifiex, rt2x00, rtlwifi, and wil6210.
Also included is a pull of the wireless tree to pick-up the fixes
originally included in "pull request: wireless 2014-12-03"...

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 18:12:03 -05:00
Al Viro
17836394e5 first fruits - kill l2cap ->memcpy_fromiovec()
Just use copy_from_iter().  That's what this method is trying to do
in all cases, in a very convoluted fashion.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-09 16:29:10 -05:00
Al Viro
c0371da604 put iov_iter into msghdr
Note that the code _using_ ->msg_iter at that point will be very
unhappy with anything other than unshifted iovec-backed iov_iter.
We still need to convert users to proper primitives.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-09 16:29:03 -05:00
Al Viro
56c39fb67c switch l2cap ->memcpy_fromiovec() to msghdr
it'll die soon enough - now that kvec-backed iov_iter works regardless
of set_fs(), both instances will become copy_from_iter() as soon as
we introduce ->msg_iter...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-09 16:28:23 -05:00
Al Viro
f69e6d131f ip_generic_getfrag, udplite_getfrag: switch to passing msghdr
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-09 16:28:22 -05:00
Jiri Pirko
57d743a3de net: sched: cls: remove unused op put from tcf_proto_ops
It is never called and implementations are void. So just remove it.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 14:49:02 -05:00
David S. Miller
6db70e3e1d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2014-12-03

1) Fix a set but not used warning. From Fabian Frederick.

2) Currently we make sequence number values available to userspace
   only if we use ESN. Make the sequence number values also available
   for non ESN states. From Zhi Ding.

3) Remove socket policy hashing. We don't need it because socket
   policies are always looked up via a linked list. From Herbert Xu.

4) After removing socket policy hashing, we can use __xfrm_policy_link
   in xfrm_policy_insert. From Herbert Xu.

5) Add a lookup method for vti6 tunnels with wildcard endpoints.
   I forgot this when I initially implemented vti6.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 21:30:21 -05:00
Alexander Duyck
a5a519b271 fib_trie: Fix /proc/net/fib_trie when CONFIG_IP_MULTIPLE_TABLES is not defined
In recent testing I had disabled CONFIG_IP_MULTIPLE_TABLES and as a result
when I ran "cat /proc/net/fib_trie" the main trie was displayed multiple
times.  I found that the problem line of code was in the function
fib_trie_seq_next.  Specifically the line below caused the indexes to go in
the opposite direction of our traversal:

	h = tb->tb_id & (FIB_TABLE_HASHSZ - 1);

This issue was that the RT tables are defined such that RT_TABLE_LOCAL is ID
255, while it is located at TABLE_LOCAL_INDEX of 0, and RT_TABLE_MAIN is 254
with a TABLE_MAIN_INDEX of 1.  This means that the above line will return 1
for the local table and 0 for main.  The result is that fib_trie_seq_next
will return NULL at the end of the local table, fib_trie_seq_start will
return the start of the main table, and then fib_trie_seq_next will loop on
main forever as h will always return 0.

The fix for this is to reverse the ordering of the two tables.  It has the
advantage of making it so that the tables now print in the same order
regardless of if multiple tables are enabled or not.  In order to make the
definition consistent with the multiple tables case I simply masked the to
RT_TABLE_XXX values by (FIB_TABLE_HASHSZ - 1).  This way the two table
layouts should always stay consistent.

Fixes: 93456b6 ("[IPV4]: Unify access to the routing tables")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 21:14:32 -05:00
Al Viro
ba00410b81 Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
David S. Miller
244ebd9f8f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains netfilter updates for net-next. Basically,
enhancements for xt_recent, skip zeroing of timer in conntrack, fix
linking problem with recent redirect support for nf_tables, ipset
updates and a couple of cleanups. More specifically, they are:

1) Rise maximum number per IP address to be remembered in xt_recent
   while retaining backward compatibility, from Florian Westphal.

2) Skip zeroing timer area in nf_conn objects, also from Florian.

3) Inspect IPv4 and IPv6 traffic from the bridge to allow filtering using
   using meta l4proto and transport layer header, from Alvaro Neira.

4) Fix linking problems in the new redirect support when CONFIG_IPV6=n
   and IP6_NF_IPTABLES=n.

And ipset updates from Jozsef Kadlecsik:

5) Support updating element extensions when the set is full (fixes
   netfilter bugzilla id 880).

6) Fix set match with 32-bits userspace / 64-bits kernel.

7) Indicate explicitly when /0 networks are supported in ipset.

8) Simplify cidr handling for hash:*net* types.

9) Allocate the proper size of memory when /0 networks are supported.

10) Explicitly add padding elements to hash:net,net and hash:net,port,
    because the elements must be u32 sized for the used hash function.

Jozsef is also cooking ipset RCU conversion which should land soon if
they reach the merge window in time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 20:56:46 -08:00
John W. Linville
f700076a9d Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-12-05 14:12:24 -05:00
Marcel Holtmann
4b71bba45c Bluetooth: Enabled LE Direct Advertising Report event if supported
When the controller supports the Extended Scanner Filter Policies, it
supports the LE Direct Advertising Report event. However by default
that event is blocked by the LE event mask. It is required to enable
it during controller setup.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-05 18:15:33 +02:00
Marcel Holtmann
32c9d43fa5 Bluetooth: Add definitions for LE Direct Advertising Report event
This patch adds the event id and data structures for the LE Direct
Advertising Report event.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-05 18:15:18 +02:00
Marcel Holtmann
2331fd46f5 Bluetooth: Move LE advertising report defines to the right location
All Bluetooth commands and events are ordered by its opcode or event
id, but for some reason this one now stands out. So move it to its
correct spot in the list.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-05 18:15:02 +02:00
Marcel Holtmann
da25cf6a98 Bluetooth: Report invalid RSSI for service discovery and background scan
When using Start Service Discovery and when background scanning is used
to report devices, the RSSI is reported or the value 127 is provided in
case RSSI in unavailable.

For Start Discovery the value 0 is reported to keep backwards
compatibility with the existing users.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-05 14:14:28 +02:00
Marcel Holtmann
0256325ed6 Bluetooth: Add helper function for clearing the discovery filter
The discovery filter allocates memory for its UUID list. So use
a helper function to free it and reset it to default states.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-05 13:12:58 +02:00
Jakub Pawlowski
37eab042be Bluetooth: Add extra discovery fields for storing filter information
With the upcoming addition of support for Start Service Discovery, the
discovery handling needs to filter on RSSI and UUID values. For that
they need to be stored in the discovery handling. This patch adds the
appropiate fields and also make sure they are reset when discovery
has been stopped.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-05 12:37:33 +02:00
Jakub Pawlowski
7e61df5423 Bluetooth: Add definitions for MGMT_OP_START_SERVICE_DISCOVERY
This patch adds the opcode and structure for Start Service Discovery
operation.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-05 12:37:32 +02:00
Marcel Holtmann
a89612aac3 Bluetooth: Add HCI_RSSI_INVALID for unknown RSSI value
The Bluetooth core specification defines the value 127 as invalid for
RSSI values. So instead of hard coding it, lets add a constant for it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-05 12:37:30 +02:00
Al Viro
435d5f4bb2 common object embedded into various struct ....ns
for now - just move corresponding ->proc_inum instances over there

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-04 14:31:00 -05:00
John W. Linville
de51f1649a Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg <johannes@sipsolutions.net> says:

"This time I have Felix's no-status rate control work, which will allow
drivers to work better with rate control even if they don't have perfect
status reporting. In addition to this, a small hwsim fix from Patrik,
one of the regulatory patches from Arik, and a number of cleanups and
fixes I did myself.

Of note is a patch where I disable CFG80211_WEXT so that compatibility
is no longer selectable - this is intended as a wake-up call for anyone
who's still using it, and is still easily worked around (it's a one-line
patch) before we fully remove the code as well in the future."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-12-04 11:29:10 -05:00
John W. Linville
04bb7ecf88 NFC: 3.19 pull request
This is the NFC pull request for 3.19.
 
 With this one we get:
 
 - NFC digital improvements for DEP support: Chaining, NACK and ATN
   support added.
 
 - NCI improvements: Support for p2p target, SE IO operand addition,
   SE operands extensions to support proprietary implementations, and
   a few fixes.
 
 - NFC HCI improvements: OPEN_PIPE and NOTIFY_ALL_CLEARED support,
   and SE IO operand addition.
 
 - A bunch of minor improvements and fixes for STMicro st21nfcb and
   st21nfca
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUfjvWAAoJEIqAPN1PVmxK46MP/1QFb093QmAK3AfMS7XBZdwg
 vgJkZIyFmTeRlDTonRJpv1MXZhPL+JHYK0wGANts1Ba04nZIgK9jzdcqjvd1niD1
 Hk9ggHpfxsIl35FfRVcWAUpyb71Ykug2V396gJH2iiHvt2jHr8IsRWOWCVaItWMO
 ZpVpV58xkROEu115wjKdrmKijL0xf7/F70RlReV4tOLGip3E/bP7EhutF2s5D8Lt
 A5hfauQS3lUJIhlJ1IT59XtdnBSIS5D8OuNYjuJuFkPfIcXAHacEUBrta3KahQPy
 kFGYYjj+V0vZA9JQG/j6vomP4AI0pP4DPkwWqgmioaPshkOOktwf1l21IKQrC7aw
 h7EFEG6q5AxnL5JFbm4y+b5IEp8ALeVi5rv1xdRZ7/AW/n6FWGrkBHatsJceQgX+
 wFhZ52USGHQUzWoJ6SC0O82U831/dgxWm5M42CV5/Bl9d69XX7Om9MWytdjQ94yB
 ERoSJcivc+Q30bAiZmvb44ghv3qMGNArlBVVlcN7+3rbz4XOWLhduqCJaUs12fgY
 lyGdNHxVVmEeEY9ByOjr6RD1pYpSDgEQSYvn5pbKadX+DqpXHQuGC4JHtbBkzKn0
 Se9c442HRBjts0VprJ7wjLgbf5VfqFjtgsfqubD/mcTn61B4/t9zyGgV1VuI7UgL
 QxK0m0rB1nOxpdWFesl1
 =WHHp
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz <sameo@linux.intel.com> says:

"NFC: 3.19 pull request

This is the NFC pull request for 3.19.

With this one we get:

- NFC digital improvements for DEP support: Chaining, NACK and ATN
  support added.

- NCI improvements: Support for p2p target, SE IO operand addition,
  SE operands extensions to support proprietary implementations, and
  a few fixes.

- NFC HCI improvements: OPEN_PIPE and NOTIFY_ALL_CLEARED support,
  and SE IO operand addition.

- A bunch of minor improvements and fixes for STMicro st21nfcb and
  st21nfca"

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-12-04 11:27:40 -05:00
Johan Hedberg
6928a9245f Bluetooth: Store address type with OOB data
To be able to support OOB data for LE pairing we need to store the
address type of the remote device. This patch extends the relevant
functions and data types with a bdaddr_type variable.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:21 +01:00
Johan Hedberg
81328d5cca Bluetooth: Unify remote OOB data functions
There's no need to duplicate code for the 192 vs 192+256 variants of the
OOB data functions. This is also helpful to pave the way to support LE
SC OOB data where only 256 bit data is provided.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:20 +01:00
Johan Hedberg
ef8efe4bf8 Bluetooth: Add skeleton for BR/EDR SMP channel
This patch adds the very basic code for creating and destroying SMP
L2CAP channels for BR/EDR connections.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:20 +01:00
Johan Hedberg
858cdc78be Bluetooth: Add debugfs switch for forcing SMP over BR/EDR
To make it possible to use LE SC functionality over BR/EDR with pre-4.1
controllers (that do not support BR/EDR SC links) it's useful to be able
to force LE SC operations even over a traditional SSP protected link.
This patch adds a debugfs switch to force a special debug flag which is
used to skip the checks for BR/EDR SC support.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:20 +01:00
Johan Hedberg
fe8bc5ac67 Bluetooth: Add hci_conn flag for new link key generation
For LE Secure Connections we want to trigger cross transport key
generation only if a new link key was actually created during the BR/EDR
connection. This patch adds a new flag to track this information.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:20 +01:00
Johan Hedberg
f3a73d97b3 Bluetooth: Rename hci_find_ltk_by_addr to hci_find_ltk
Now that hci_find_ltk_by_addr is the only LTK lookup function there's no
need to keep the long name anymore. This patch shortens the function
name to simply hci_find_ltk.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:16 +01:00
Johan Hedberg
0ac3dbf999 Bluetooth: Remove unused hci_find_ltk function
Now that LTKs are always looked up based on bdaddr (with EDiv/Rand
checks done after a successful lookup) the hci_find_ltk function is not
needed anymore. This patch removes the function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:16 +01:00
Johan Hedberg
710f11c08e Bluetooth: Use custom macro for testing BR/EDR SC enabled
Since the HCI_SC_ENABLED flag will also be used for controllers without
BR/EDR Secure Connections support whenever we need to check specifically
for SC for BR/EDR we also need to check that the controller actually
supports it. This patch adds a convenience macro for check all the
necessary conditions and converts the places in the code that need it to
use it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:15 +01:00
Johan Hedberg
23fb8de376 Bluetooth: Add mgmt support for LE Secure Connections LTK types
We need a dedicated LTK type for LTK resulting from a Secure Connections
based SMP pairing. This patch adds a new define for it and ensures that
both the New LTK event as well as the Load LTKs command supports it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:15 +01:00
Scott Feldman
38dcf357ae bridge: call netdev_sw_port_stp_update when bridge port STP status changes
To notify switch driver of change in STP state of bridge port, add new
.ndo op and provide switchdev wrapper func to call ndo op. Use it in bridge
code then.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:22 -08:00
Jiri Pirko
007f790c82 net: introduce generic switch devices support
The goal of this is to provide a possibility to support various switch
chips. Drivers should implement relevant ndos to do so. Now there is
only one ndo defined:
- for getting physical switch id is in place.

Note that user can use random port netdevice to access the switch.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:20 -08:00
Christophe Ricard
b3a55b9c5d NFC: hci: Add specific hci macro to not create a pipe
Some pipe are only created by other host (different than the
Terminal Host).
The pipe values will for example be notified by
NFC_HCI_ADM_NOTIFY_PIPE_CREATED.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 22:48:13 +01:00
Johan Hedberg
0bd49fc75a Bluetooth: Track both local and remote L2CAP fixed channel mask
To pave the way for future fixed channels to be added easily we should
track both the local and remote mask on a per-L2CAP connection (struct
l2cap_conn) basis. So far the code has used a global variable in a racy
way which anyway needs fixing.

This patch renames the existing conn->fixed_chan_mask that tracked
the remote mask to conn->remote_fixed_chan and adds a new variable
conn->local_fixed_chan to track the local mask. Since the HS support
info is now available in the local mask we can remove the
conn->hs_enabled variable.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-02 09:26:50 +01:00
Christophe Ricard
a688bf55c5 NFC: nci: Add se_io NCI operand
se_io allows to send apdu over the CLF to the embedded Secure Element.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 02:01:21 +01:00
Christophe Ricard
e9ef9431a3 NFC: nci: Update nci_disable_se to run proprietary commands to disable a secure element
Some NFC controller using NCI protocols may need a proprietary commands
flow to disable a secure element

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 02:01:21 +01:00
Christophe Ricard
93bca2bfa4 NFC: nci: Update nci_enable_se to run proprietary commands to enable a secure element
Some NFC controller using NCI protocols may need a proprietary commands
flow to enable a secure element

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 02:01:21 +01:00
Christophe Ricard
ba4db551bb NFC: nci: Update nci_discover_se to run proprietary commands to discover all available secure element
Some NFC controller using NCI protocols may need a proprietary commands
flow to discover all available secure element

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 02:01:21 +01:00
Christophe Ricard
9b8d32b7ac NFC: hci: Add se_io HCI operand
se_io allows to send apdu over the CLF to the embedded Secure Element.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 01:49:58 +01:00
John W. Linville
992066c8d3 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-12-01 15:49:58 -05:00
David S. Miller
60b7379dc5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-29 20:47:48 -08:00
Felix Fietkau
f027c2aca0 mac80211: add ieee80211_tx_status_noskb
This can be used by drivers that cannot reliably map tx status
information onto specific skbs.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 15:01:51 +01:00
Felix Fietkau
f684565e0a mac80211: add tx_status_noskb to rate_control_ops
This op works like .tx_status, except it does not need access to the
skb. This will be used by drivers that cannot match tx status
information to specific packets.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 15:01:50 +01:00
Arik Nemtsov
ad932f046f cfg80211: leave invalid channels on regdomain change
When the regulatory settings change, some channels might become invalid.
Disconnect interfaces acting on these channels, after giving userspace
code a grace period to leave them.

This mode is currently opt-in, and not all interface operating modes are
supported for regulatory-enforcement checks. A wiphy that wishes to use
the new enforcement code must specify an appropriate regulatory flag,
and all its supported interface modes must be supported by the checking
code.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@suse.com>
[fix some indentation, typos]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 14:33:41 +01:00
Julien Lefrique
529ee06682 NFC: NCI: Configure ATR_RES general bytes
The Target responds to the ATR_REQ with the ATR_RES. Configure the General
Bytes in ATR_RES with the first three octets equal to the NFC Forum LLCP
magic number, followed by some LLC Parameters TLVs described in section
4.5 of [LLCP].

Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 14:07:51 +01:00
Julien Lefrique
a99903ec45 NFC: NCI: Handle Target mode activation
Changes:

 * Extract the Listen mode activation parameters from RF_INTF_ACTIVATED_NTF.

 * Store the General Bytes of ATR_REQ.

 * Signal that Target mode is activated in case of an activation in NFC-DEP.

 * Update the NCI state accordingly.

 * Use the various constants defined in nfc.h.

 * Fix the ATR_REQ and ATR_RES maximum size. As per NCI 1.0 and NCI 1.1, the
   Activation Parameters for both Poll and Listen mode contain all the bytes of
   ATR_REQ/ATR_RES starting and including Byte 3 as defined in [DIGITAL].
   In [DIGITAL], the maximum size of ATR_REQ/ATR_RES is 64 bytes and they are
   numbered starting from Byte 1.

Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 14:07:51 +01:00
Julien Lefrique
90d78c1396 NFC: NCI: Enable NFC-DEP in Listen A and Listen F
Send LA_SEL_INFO and LF_PROTOCOL_TYPE with NFC-DEP protocol enabled.
Configure 212 Kbit/s and 412 Kbit/s bit rates for Listen F.

Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 14:07:51 +01:00
Mark A. Greer
384ab1d174 NFC: digital: Add NFC-DEP Initiator-side ATN Support
When an NFC-DEP Initiator times out when waiting for
a DEP_RES from the Target, its supposed to send an
ATN to the Target.  The Target should respond to the
ATN with a similar ATN PDU and the Initiator can then
resend the last non-ATN PDU that it sent.  No more
than 'N(retry,atn)' are to be send where
2 <= 'N(retry,atn)' <= 5.  If the Initiator had just
sent a NACK PDU when the timeout occurred, it is to
continue sending NACKs until 'N(retry,nack)' NACKs
have been send.  This is described in section
14.12.5.6 of the NFC-DEP Digital Protocol Spec.

The digital layer's NFC-DEP code doesn't implement
this so add that support.

The value chosen for 'N(retry,atn)' is 2.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:39:55 +01:00
Mark A. Greer
49dbb14e30 NFC: digital: Add NFC-DEP Target-side NACK Support
When an NFC-DEP Target receives a NACK PDU with
a PNI equal to 1 less than the current PNI, it
is supposed to re-send the last PDU.  This is
implied in section 14.12.5.4 of the NFC Digital
Protocol Spec.

The digital layer's NFC-DEP code doesn't implement
Target-side NACK handing so add it.  The last PDU
that was sent is saved in the 'nfc_digital_dev'
structure's 'saved_skb' member.  The skb will have
an additional reference taken to ensure that the skb
isn't freed when the driver performs a kfree_skb()
on the skb.  The length of the skb/PDU is also saved
so the length can be restored when re-sending the PDU
in the skb (the driver will perform an skb_pull() so
an skb_push() needs to be done to restore the skb's
data pointer/length).

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:39:47 +01:00
Mark A. Greer
a80509c76b NFC: digital: Add NFC-DEP Initiator-side NACK Support
When an NFC-DEP Initiator receives a frame with
an incorrect CRC or with a parity error, and the
frame is at least 4 bytes long, its supposed to
send a NACK to the Target.  The Initiator can
send up to 'N(retry,nack)' consecutive NACKs
where 2 <= 'N(retry,nack)' <= 5.  When the limit
is exceeded, a PROTOCOL EXCEPTION is raised.
Any other type of transmission error is to be
ignored and the Initiator should continue
waiting for a new frame.  This is described
in section 14.12.5.4 of the NFC Digital Protocol
Spec.

The digital layer's NFC-DEP code doesn't implement
any of this so add it.  This support diverges from
the spec in two significant ways:

a) NACKs will be sent for ANY error reported by the
   driver except a timeout.  This is done because
   there is currently no way for the digital layer
   to distinguish a CRC or parity error from any
   other type of error reported by the driver.

b) All other errors will cause a PROTOCOL EXCEPTION
   even frames with CRC errors that are less than 4
   bytes.

The value chosen for 'N(retry,nack)' is 2.

Targets do not send NACK PDUs.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:39:33 +01:00
Mark A. Greer
3bd2a5bcc6 NFC: digital: Add NFC-DEP Send Chaining Support
When the NFC-DEP code is given a packet to send
that is larger than the peer's maximum payload,
its supposed to set the 'MI' bit in the 'I' PDU's
Protocol Frame Byte (PFB).  Setting this bit
indicates that NFC-DEP chaining is to occur.

When NFC-DEP chaining is progress, sender 'I' PDUs
are acknowledged with 'ACK' PDUs until the last 'I'
PDU in the chain (which has the 'MI' bit cleared)
is responded to with a normal 'I' PDU.  This can
occur while in Initiator mode or in Target mode.

Sender NFC-DEP chaining is currently not implemented
in the digital layer so add that support.  Unfortunately,
since sending a frame may require writing the CRC to the
end of the data, the relevant data part of the original
skb must be copied for each intermediate frame.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:39:10 +01:00
Mark A. Greer
b08147cbc4 NFC: digital: Implement NFC-DEP max payload lengths
The maximum payload for NFC-DEP exchanges (i.e., the
number of bytes between SoD and EoD) is negotiated
using the ATR_REQ, ATR_RES, and PSL_REQ commands.
The valid maximum lengths are 64, 128, 192, and 254
bytes.

Currently, NFC-DEP code assumes that both sides are
always using 254 byte maximums and ignores attempts
by the peer to change it.  Instead, implement the
negotiation code, enforce the local maximum when
receiving data from the peer, and don't send payloads
that exceed the remote's maximum.  The default local
maximum is 254 bytes.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:38:59 +01:00
Mark A. Greer
05afedcb89 NFC: digital: Add Target-mode NFC-DEP DID Support
When in Target mode, the Initiator specifies whether
subsequent DEP_REQ and DEP_RES frames will include
a DID byte by the value passed in the ATR_REQ.  If
the DID value in the ATR_REQ is '0' then no DID
byte will be included.  If the DID value is between
'1' and '14' then a DID byte containing the same
value must be included in subsequent DEP_REQ and
DEP_RES frames.  Any other DID value is invalid.
This is specified in sections 14.8.1.2 and 14.8.2.2
of the NFC Digital Protocol Spec.

Checking the DID value (if it should be there at all),
is not currently supported by the digital layer's
NFC-DEP code.  Add this support by remembering the
DID value in the ATR_REQ, checking the DID value of
received DEP_REQ frames (if it should be there at all),
and including the remembered DID value in DEP_RES
frames when appropriate.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:38:24 +01:00
Pablo Neira Ayuso
b59eaf9e28 netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one module
This resolves linking problems with CONFIG_IPV6=n:

net/built-in.o: In function `redirect_tg6':
xt_REDIRECT.c:(.text+0x6d021): undefined reference to `nf_nat_redirect_ipv6'

Reported-by: Andreas Ruprecht <rupran@einserver.de>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 13:08:42 +01:00
Alvaro Neira
68b0faa87d netfilter: nf_tables_bridge: export nft_reject_ip*hdr_validate functions
This patch exports the functions nft_reject_iphdr_validate and
nft_reject_ip6hdr_validate to use it in follow up patches.
These functions check if the IPv4/IPv6 header is correct.

Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 12:58:05 +01:00
Florian Westphal
c41884ce05 netfilter: conntrack: avoid zeroing timer
add a __nfct_init_offset annotation member to struct nf_conn to make
it clear which members are covered by the memset when the conntrack
is allocated.

This avoids zeroing timer_list and ct_net; both are already inited
explicitly.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 12:41:06 +01:00
Willem de Bruijn
f4713a3dfa net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks
TCP timestamping introduced MSG_ERRQUEUE handling for TCP sockets.
If the socket is of family AF_INET6, call ipv6_recv_error instead
of ip_recv_error.

This change is more complex than a single branch due to the loadable
ipv6 module. It reuses a pre-existing indirect function call from
ping. The ping code is safe to call, because it is part of the core
ipv6 module and always present when AF_INET6 sockets are active.

Fixes: 4ed2d765 (net-timestamp: TCP timestamping)
Signed-off-by: Willem de Bruijn <willemb@google.com>

----

It may also be worthwhile to add WARN_ON_ONCE(sk->family == AF_INET6)
to ip_recv_error.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 15:45:04 -05:00
Johannes Berg
98f0334263 cfg80211: clean up beacon loss CQM event
Having it as a sub-event for RSSI thresholds is very ugly,
but luckily no userspace actually uses the events yet.

Move the event to its own function call internally and to
its own event attribute in nl80211.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-26 20:56:42 +01:00
Tom Herbert
7c967b224a net: Add remcsum_adjust as common function for remote checksum offload
This function does the work to update a checksum field as part of
remote checksum offload.

remcsum_adjust does the following:

1) Subtract out the calculated checksum from the beginning of the
   packet (ptr arg) to the start offset.
2) Adjust the checksum field indicated by offset based on the modified
   checksum value from above step.
3) Return the difference in the old checksum field value and the
   new one. The caller will use this to update skb->csum and NAPI csum.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:25:43 -05:00
Johannes Berg
5b97f49d65 cfg80211: refactor the various CQM event sending code
Much of the code can be shared by moving it into helper functions
for the CQM event sending.

Also move the code closer together, even in the header file.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-26 12:47:38 +01:00
David S. Miller
d3fc6b3fdd Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
More work from Al Viro to move away from modifying iovecs
by using iov_iter instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-25 20:02:51 -05:00
Daniel Borkmann
79e886599e crypto: algif - add and use sock_kzfree_s() instead of memzero_explicit()
Commit e1bd95bf7c ("crypto: algif - zeroize IV buffer") and
2a6af25bef ("crypto: algif - zeroize message digest buffer")
added memzero_explicit() calls on buffers that are later on
passed back to sock_kfree_s().

This is a discussed follow-up that, instead, extends the sock
API and adds sock_kzfree_s(), which internally uses kzfree()
instead of kfree() for passing the buffers back to slab.

Having sock_kzfree_s() allows to keep the changes more minimal
by just having a drop-in replacement instead of adding
memzero_explicit() calls everywhere before sock_kfree_s().

In kzfree(), the compiler is not allowed to optimize the memset()
away and thus there's no need for memzero_explicit(). Both,
sock_kfree_s() and sock_kzfree_s() are wrappers for
__sock_kfree_s() and call into kfree() resp. kzfree(); here,
__sock_kfree_s() needs to be explicitly inlined as we want the
compiler to optimize the call and condition away and thus it
produces e.g. on x86_64 the _same_ assembler output for
sock_kfree_s() before and after, and thus also allows for
avoiding code duplication.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-25 22:50:39 +08:00
Jiri Pirko
08dcf9fd19 tc_vlan: fix type of tcfv_push_vid
Should be u16. So fix it to kill the sparse warning.

Fixes: c7e2b9689e "sched: introduce vlan action"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24 16:12:03 -05:00
David S. Miller
958d03b016 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
netfilter/ipvs updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, this includes the NAT redirection support for nf_tables, the
cgroup support for nft meta and conntrack zone support for the connlimit
match. Coming after those, a bunch of sparse warning fixes, missing
netns bits and cleanups. More specifically, they are:

1) Prepare IPv4 and IPv6 NAT redirect code to use it from nf_tables,
   patches from Arturo Borrero.

2) Introduce the nf_tables redir expression, from Arturo Borrero.

3) Remove an unnecessary assignment in ip_vs_xmit/__ip_vs_get_out_rt().
   Patch from Alex Gartrell.

4) Add nft_log_dereference() macro to the nf_log infrastructure, patch
   from Marcelo Leitner.

5) Add some extra validation when registering logger families, also
   from Marcelo.

6) Some spelling cleanups from stephen hemminger.

7) Fix sparse warning in nf_logger_find_get().

8) Add cgroup support to nf_tables meta, patch from Ana Rey.

9) A Kconfig fix for the new redir expression and fix sparse warnings in
   the new redir expression.

10) Fix several sparse warnings in the netfilter tree, from
    Florian Westphal.

11) Reduce verbosity when OOM in nfnetlink_log. User can basically do
    nothing when this situation occurs.

12) Add conntrack zone support to xt_connlimit, again from Florian.

13) Add netnamespace support to the h323 conntrack helper, contributed
    by Vasily Averin.

14) Remove unnecessary nul-pointer checks before free_percpu() and
    module_put(), from Markus Elfring.

15) Use pr_fmt in nfnetlink_log, again patch from Marcelo Leitner.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24 16:00:58 -05:00
Al Viro
0f7db23a07 vmci_transport: switch ->enqeue_dgram, ->enqueue_stream and ->dequeue_stream to msghdr
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:42 -05:00
Al Viro
e0eb093e79 switch sctp_user_addto_chunk() and sctp_datamsg_from_user() to passing iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:40 -05:00
Al Viro
e169371823 switch ipxrtr_route_packet() from iovec to msghdr
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 04:28:49 -05:00
Al Viro
6ce8e9ce59 new helper: memcpy_from_msg()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 04:28:48 -05:00
David S. Miller
1459143386 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ieee802154/fakehard.c

A bug fix went into 'net' for ieee802154/fakehard.c, which is removed
in 'net-next'.

Add build fix into the merge from Stephen Rothwell in openvswitch, the
logging macros take a new initial 'log' argument, a new call was added
in 'net' so when we merge that in here we have to explicitly add the
new 'log' arg to it else the build fails.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 22:28:24 -05:00
David S. Miller
53b15ef3c2 Merge tag 'master-2014-11-20' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-11-21

Please pull this batch of updates intended for the 3.19 stream...

For the mac80211 bits, Johannes says:

"It has been a while since my last pull request, so we accumulated
another relatively large set of changes:
 * TDLS off-channel support set from Arik/Liad, with some support
   patches I did
 * custom regulatory fixes from Arik
 * minstrel VHT fix (and a small optimisation) from Felix
 * add back radiotap vendor namespace support (myself)
 * random MAC address scanning for cfg80211/mac80211/hwsim (myself)
 * CSA improvements (Luca)
 * WoWLAN Net Detect (wake on network found) support (Luca)
 * and lots of other smaller changes from many people"

For the Bluetooth bits, Johan says:

"Here's another set of patches for 3.19. Most of it is again fixes and
cleanups to ieee802154 related code from Alexander Aring. We've also got
better handling of hardware error events along with a proper API for HCI
drivers to notify the HCI core of such situations. There's also a minor
fix for mgmt events as well as a sparse warning fix. The code for
sending HCI commands synchronously also gets a fix where we might loose
the completion event in the case of very fast HW (particularly easily
reproducible with an emulated HCI device)."

And...

"Here's another bluetooth-next pull request for 3.19. We've got:

 - Various fixes, cleanups and improvements to ieee802154/mac802154
 - Support for a Broadcom BCM20702A1 variant
 - Lots of lockdep fixes
 - Fixed handling of LE CoC errors that should trigger SMP"

For the Atheros bits, Kalle says:

"One ath6kl patch and rest for ath10k, but nothing really major which
stands out. Most notable:

o fix resume (Bartosz)

o firmware restart is now faster and more reliable (Michal)

o it's now possible to test hardware restart functionality without
  crashing the firmware using hw-restart parameter with
  simulate_fw_crash debugfs file (Michal)"

On top of that...both ath9k and mwifiex get their usual level of
updates.  Of note is the ath9k spectral scan work from Oleksij Rempel.

I also pulled from the wireless tree in order to avoid some merge issues.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 16:39:45 -05:00
Jiri Pirko
c7e2b9689e sched: introduce vlan action
This tc action allows to work with vlan tagged skbs. Two supported
sub-actions are header pop and header push.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:20:18 -05:00
John W. Linville
9a638ddfb0 It has been a while since my last pull request, so we accumulated
another relatively large set of changes:
  * TDLS off-channel support set from Arik/Liad, with some support
    patches I did
  * custom regulatory fixes from Arik
  * minstrel VHT fix (and a small optimisation) from Felix
  * add back radiotap vendor namespace support (myself)
  * random MAC address scanning for cfg80211/mac80211/hwsim (myself)
  * CSA improvements (Luca)
  * WoWLAN Net Detect (wake on network found) support (Luca)
  * and lots of other smaller changes from many people
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUbghkAAoJEDBSmw7B7bqroosP/RABYXUMua+k3Ccq7T4eU4jV
 AEO2p2gt5nHBzEl1NCJtdUTJkZ6ftT7ehvAkV2zboB0FBoAoBTbZ8YDtcBBiWaY6
 wJ5TYuOl6LFo7csAxWxpCzUxW3M+iq26itpyW9Zt9WWxP4QLSNPFyXEXV3SEh45n
 HcDW9A0SE+mgdaTQ2LEMBJ5XWxG/Ic7i9Xn6Py3o4x7NsTB4EqFNOD0WXcPCq7M0
 H8xlsIYIBYoGNMsV/2Nu7CEgcSXfDLqWcs9uPHQMCvWPjx/vIoEyOgTwJlE9bQHh
 2tloc1LBP6XKQ6g2bJ/pBaQVnZGugcOJhD6KUq3ckNm9qIP1ZtRmJslH4V6pUSCS
 eGl3TcAKSSE4BWIa7AaETWXeoH4X68dO7PM7pnflQRCQLzCJRbDWCdqjBst/AxBT
 6hvAFAvExEcWBkNVSTJ2egRk/C9cDFKRaCWQ1h4wX9yvh+8efe1D0DDWLW9a9qv1
 LsoGJE72BZdXn2CaQEME+CjTd3fWmn6u729d/c863cq2kspCSOof0QD0X9uWhBUx
 BvqtgbQjGZzAvHFcjBd7yRd5hz0aDfLyBL59bq2IBzaU1QmyekNPqzSMSD+5ZlCp
 uxEeE5AY2+pcNZV1KRtkvgAByfUgAVd0FHZcVb8SIM6QZ3IhqiOuzxuXtxv6hrYP
 V/76B+ath4Sv1IPF56ex
 =q4e6
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-john-2014-11-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg <johannes@sipsolutions.net> says:

"It has been a while since my last pull request, so we accumulated
another relatively large set of changes:
 * TDLS off-channel support set from Arik/Liad, with some support
   patches I did
 * custom regulatory fixes from Arik
 * minstrel VHT fix (and a small optimisation) from Felix
 * add back radiotap vendor namespace support (myself)
 * random MAC address scanning for cfg80211/mac80211/hwsim (myself)
 * CSA improvements (Luca)
 * WoWLAN Net Detect (wake on network found) support (Luca)
 * and lots of other smaller changes from many people"

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-20 16:09:30 -05:00
Al Viro
232365f660 bury skb_copy_to_page()
no callers since 3.0

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 16:24:35 -05:00
Al Viro
08adb7dabd fold verify_iovec() into copy_msghdr_from_user()
... and do the same on the compat side of things.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 16:23:49 -05:00
Johannes Berg
f815e2b3c0 mac80211: notify drivers on sta rate table changes
This allows drivers with a firmware or chip-based rate lookup table to
use the most recent default rate selection without having to get it from
per-packet data or explicit ieee80211_get_tx_rate calls

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 19:15:50 +01:00
Johannes Berg
a344d6778a mac80211: allow drivers to support NL80211_SCAN_FLAG_RANDOM_ADDR
Allow drivers to support NL80211_SCAN_FLAG_RANDOM_ADDR with software
based scanning and generate a random MAC address for them for every
scan request with the flag.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:46:09 +01:00
Johannes Berg
ad2b26abc1 cfg80211: allow drivers to support random MAC addresses for scan
Add the necessary feature flags and a scan flag to support using
random MAC addresses for scan while unassociated.

The configuration for this supports an arbitrary MAC address
value and mask, so that any kind of configuration (e.g. fixed
OUI or full 46-bit random) can be requested. Full 46-bit random
is the default when no other configuration is passed.

Also add a small helper function to use the addr/mask correctly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:52 +01:00
Luciano Coelho
8cd4d4563e cfg80211: add wowlan net-detect support
Add a new WoWLAN API to enable net-detect as a wake up trigger.
Net-detect allows the device to scan in the background while the
host is asleep to wake up the host system when a matching network
is found.

Reuse the scheduled scan attributes to specify how the scan is
performed while suspended and the matches that will trigger a
wake event.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:45 +01:00
Liad Kaufman
b6da911b3c mac80211: synchronously reserve TID per station
In TDLS (e.g., TDLS off-channel) there is a requirement for
some drivers to supply an unused TID between the AP and the
device to the FW, to allow sending PTI requests and to allow
the FW to aggregate on a specific TID for better throughput.

To ensure that the allocated TID is indeed unused, this patch
introduces an API for blocking the driver from TXing on that
TID.

Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:36 +01:00
Arik Nemtsov
8a4d32f30d mac80211: add TDLS channel-switch Rx flow
When receiving a TDLS channel switch request or response, parse the frame
and call a new tdls_recv_channel_switch op in the low level driver with
the parsed data.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:26 +01:00
Arik Nemtsov
a7a6bdd067 mac80211: introduce TDLS channel switch ops
Implement the cfg80211 TDLS channel switch ops and introduce new mac80211
ones for low-level drivers.
Verify low-level driver support for the new ops when using the relevant
wiphy feature bit. Also verify the peer supports channel switching before
passing the command down.

Add a new STA flag to track the off-channel state with the TDLS peer and
make sure to cancel the channel-switch if the peer STA is unexpectedly
removed.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:21 +01:00
Arik Nemtsov
1057d35ede cfg80211: introduce TDLS channel switch commands
Introduce commands to initiate and cancel TDLS channel-switching. Once
TDLS channel-switching is started, the lower level driver is responsible
for continually initiating channel-switch operations and returning to
the base (AP) channel to listen for beacons from time to time.

Upon cancellation of the channel-switch all communication between the
relevant TDLS peers will continue on the base channel.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:12 +01:00
Johan Hedberg
0378b59770 Bluetooth: Convert link keys list to use RCU
This patch converts the hdev->link_keys list to be protected through
RCU, thereby eliminating the need to hold the hdev lock while accessing
the list.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-19 16:19:47 +01:00
Joe Stringer
11bf7828a5 vxlan: Inline vxlan_gso_check().
Suggested-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 15:38:44 -05:00
Rick Jones
e3e3217029 icmp: Remove some spurious dropped packet profile hits from the ICMP path
If icmp_rcv() has successfully processed the incoming ICMP datagram, we
should use consume_skb() rather than kfree_skb() because a hit on the likes
of perf -e skb:kfree_skb is not called-for.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 15:28:28 -05:00
John W. Linville
f48ecb19bc Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg <johan.hedberg@gmail.com> says:

"Here's another bluetooth-next pull request for 3.19. We've got:

 - Various fixes, cleanups and improvements to ieee802154/mac802154
 - Support for a Broadcom BCM20702A1 variant
 - Lots of lockdep fixes
 - Fixed handling of LE CoC errors that should trigger SMP"

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-18 15:13:26 -05:00
Johannes Berg
760a52e80f Merge remote-tracking branch 'wireless-next/master' into mac80211-next
This brings in some mwifiex changes that further patches will
need to work on top to not cause merge conflicts.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-18 09:32:44 +01:00
Johan Hedberg
38da170306 Bluetooth: Use shorter "rand" name for "randomizer"
The common short form of "randomizer" is "rand" in many places
(including the Bluetooth specification). The shorter version also makes
for easier to read code with less forced line breaks. This patch renames
all occurences of "randomizer" to "rand" in the Bluetooth subsystem
code.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-18 01:53:15 +01:00
Alexander Aring
ee7b9053bd ieee802154: fix byteorder for short address and panid
This patch changes the byteorder handling for short and panid handling.
We now except to get little endian in nl802154 for these attributes.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:17 +01:00
Alexander Aring
cb41c8dd01 ieee802154: rename and move WPAN_NUM_ defines
This patch moves the 802.15.4 constraints WPAN_NUM_ defines into
"net/ieee802154.h" which should contain all necessary 802.15.4 related
information. Also rename these defines to a common name which is
IEEE802154_MAX_CHANNEL and IEEE802154_MAX_PAGE.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:17 +01:00
Alexander Aring
b821ecd4c8 ieee802154: add del interface command
This patch adds support for deleting a wpan interface via nl802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:17 +01:00
Alexander Aring
0e57547eb7 ieee802154: setting extended address while iface add
This patch adds support for setting an extended address while
registration a new interface. If ieee802154_is_valid_extended_addr
getting as parameter and invalid extended address then the perm address
is fallback. This is useful to make some default handling while for
example default registration of a wpan interface while phy registration.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:16 +01:00
Alexander Aring
f3ea5e4423 ieee802154: add new interface command
This patch adds a new nl802154 command for adding a new interface
according to a wpan phy via nl802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:16 +01:00
David S. Miller
f1227c5c1b Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter updates for your net tree,
they are:

1) Fix missing initialization of the range structure (allocated in the
   stack) in nft_masq_{ipv4, ipv6}_eval, from Daniel Borkmann.

2) Make sure the data we receive from userspace contains the req_version
   structure, otherwise return an error incomplete on truncated input.
   From Dan Carpenter.

3) Fix handling og skb->sk which may cause incorrect handling
   of connections from a local process. Via Simon Horman, patch from
   Calvin Owens.

4) Fix wrong netns in nft_compat when setting target and match params
   structure.

5) Relax chain type validation in nft_compat that was recently included,
   this broke the matches that need to be run from the route chain type.
   Now iptables-test.py automated regression tests report success again
   and we avoid the only possible problematic case, which is the use of
   nat targets out of nat chain type.

6) Use match->table to validate the tablename, instead of the match->name.
   Again patch for nft_compat.

7) Restore the synchronous release of objects from the commit and abort
   path in nf_tables. This is causing two major problems: splats when using
   nft_compat, given that matches and targets may sleep and call_rcu is
   invoked from softirq context. Moreover Patrick reported possible event
   notification reordering when rules refer to anonymous sets.

8) Fix race condition in between packets that are being confirmed by
   conntrack and the ctnetlink flush operation. This happens since the
   removal of the central spinlock. Thanks to Jesper D. Brouer to looking
   into this.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-16 14:23:56 -05:00
Ingo Molnar
e9ac5f0fa8 Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying more changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:50:25 +01:00
Johan Hedberg
adae20cb2d Bluetooth: Convert IRK list to RCU
This patch set converts the hdev->identity_resolving_keys list to use
RCU to eliminate the need to use hci_dev_lock/unlock.

An additional change that must be done is to remove use of
CRYPTO_ALG_ASYNC for the hdev-specific AES crypto context. The reason is
that this context is used for matching RPAs and the loop that does the
matching is under the RCU read lock, i.e. is an atomic section which
cannot sleep.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-15 01:53:27 +01:00
Johan Hedberg
970d0f1b28 Bluetooth: Convert LTK list to RCU
This patch set converts the hdev->long_term_keys list to use RCU to
eliminate the need to use hci_dev_lock/unlock.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-15 01:53:27 +01:00
Joe Stringer
23e62de33d net: Add vxlan_gso_check() helper
Most NICs that report NETIF_F_GSO_UDP_TUNNEL support VXLAN, and not
other UDP-based encapsulation protocols where the format and size of the
header differs. This patch implements a generic ndo_gso_check() for
VXLAN which will only advertise GSO support when the skb looks like it
contains VXLAN (or no UDP tunnelling at all).

Implementation shamelessly stolen from Tom Herbert:
http://thread.gmane.org/gmane.linux.network/332428/focus=333111

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-14 17:12:48 -05:00
David S. Miller
076ce44825 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/chelsio/cxgb4vf/sge.c
	drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c

sge.c was overlapping two changes, one to use the new
__dev_alloc_page() in net-next, and one to use s->fl_pg_order in net.

ixgbe_phy.c was a set of overlapping whitespace changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-14 01:01:12 -05:00
Herbert Xu
53c2e285f9 xfrm: Do not hash socket policies
Back in 2003 when I added policy expiration, I half-heartedly
did a clean-up and renamed xfrm_sk_policy_link/xfrm_sk_policy_unlink
to __xfrm_policy_link/__xfrm_policy_unlink, because the latter
could be reused for all policies.  I never actually got around
to using __xfrm_policy_link for non-socket policies.

Later on hashing was added to all xfrm policies, including socket
policies.  In fact, we don't need hashing on socket policies at
all since they're always looked up via a linked list.

This patch restores xfrm_sk_policy_link/xfrm_sk_policy_unlink
as wrappers around __xfrm_policy_link/__xfrm_policy_unlink so
that it's obvious we're dealing with socket policies.

This patch also removes hashing from __xfrm_policy_link as for
now it's only used by socket policies which do not need to be
hashed.  Ironically this will in fact allow us to use this helper
for non-socket policies which I shall do later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-11-13 11:25:03 +01:00
Johan Hedberg
abe84903a8 Bluetooth: Use proper nesting annotation for l2cap_chan lock
By default lockdep considers all L2CAP channels equal. This would mean
that we get warnings if a channel is locked when another one's lock is
tried to be acquired in the same thread. This kind of inter-channel
locking dependencies exist in the form of parent-child channels as well
as any channel wishing to elevate the security by requesting procedures
on the SMP channel.

To eliminate the chance for these lockdep warnings we introduce a
nesting level for each channel and use that when acquiring the channel
lock. For now there exists the earlier mentioned three identified
categories: SMP, "normal" channels and parent channels (i.e. those in
BT_LISTEN state). The nesting level is defined as atomic_t since we need
access to it before the lock is actually acquired.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-13 07:49:09 +01:00
Alexander Aring
61f2dcba9a mac802154: add interframe spacing time handling
This patch adds a new interframe spacing time handling into mac802154
layer. Interframe spacing time is a time period between each transmit.
This patch adds a high resolution timer into mac802154 and starts on
xmit complete with corresponding interframe spacing expire time if
ifs_handling is true. We make it variable because it depends if
interframe spacing time is handled by transceiver or mac802154. At the
timer complete function we wake the netdev queue again. This avoids
new frame transmit in range of interframe spacing time.

For synced driver we add no handling of interframe spacing time. This
is currently a lack of support in all synced xmit drivers. I suppose
it's working because the latency of workqueue which is needed to call
spi_sync.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-13 04:51:58 +01:00
Tom Herbert
a8c5f90fb5 ip_tunnel: Ops registration for secondary encap (fou, gue)
Instead of calling fou and gue functions directly from ip_tunnel
use ops for these that were previously registered. This patch adds the
logic to add and remove encapsulation operations for ip_tunnel,
and modified fou (and gue) to register with ip_tunnels.

This patch also addresses a circular dependency between ip_tunnel
and fou that was causing link errors when CONFIG_NET_IP_TUNNEL=y
and CONFIG_NET_FOU=m. References to fou an gue have been removed from
ip_tunnel.c

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12 15:01:35 -05:00
Joe Perches
955a9d202f irda: Convert IRDA_DEBUG to pr_debug
Use the normal kernel debugging mechanism which also
enables dynamic_debug at the same time.

Other miscellanea:

o Remove sysctl for irda_debug
o Remove function tracing like uses (use ftrace instead)
o Coalesce formats
o Realign arguments
o Remove unnecessary OOM messages

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12 13:56:41 -05:00
Pablo Neira Ayuso
b326dd37b9 netfilter: nf_tables: restore synchronous object release from commit/abort
The existing xtables matches and targets, when used from nft_compat, may
sleep from the destroy path, ie. when removing rules. Since the objects
are released via call_rcu from softirq context, this results in lockdep
splats and possible lockups that may be hard to reproduce.

Patrick also indicated that delayed object release via call_rcu can
cause us problems in the ordering of event notifications when anonymous
sets are in place.

So, this patch restores the synchronous object release from the commit
and abort paths. This includes a call to synchronize_rcu() to make sure
that no packets are walking on the objects that are going to be
released. This is slowier though, but it's simple and it resolves the
aforementioned problems.

This is a partial revert of c7c32e7 ("netfilter: nf_tables: defer all
object release via rcu") that was introduced in 3.16 to speed up
interaction with userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:06:24 +01:00
Alexander Aring
c8937a1d11 ieee820154: add lbt setting support
This patch adds support for setting listen before transmit mode via
nl802154 framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:43 +01:00
Alexander Aring
17a3a46bfb ieee820154: add max frame retries setting support
This patch add support for setting mac frame retries setting via
nl802154 framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:42 +01:00
Alexander Aring
a01ba7652c ieee820154: add max csma backoffs setting support
This patch add support for max csma backoffs setting via nl802154
framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:41 +01:00
Alexander Aring
656a999e87 ieee820154: add backoff exponent setting support
This patch adds support for setting backoff exponents via nl802154
framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:40 +01:00
Alexander Aring
9830c62a0b ieee820154: add short_addr setting support
This patch adds support for setting short address via nl802154 framework.
Also added a comment because a 0xffff seems to be valid address that we
don't have a short address. This is a valid setting but we need
more checks in upper layers to don't allow this address as source address.
Also the current netlink interface doesn't allow to set the short_addr
to 0xffff. Same for the 0xfffe short address which describes a not
allocated short address.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:40 +01:00
Alexander Aring
702bf37128 ieee820154: add pan_id setting support
This patch adds support for setting pan_id via nl802154 framework.
Adding a comment because setting 0xffff as pan_id seems to be valid
setting. The pan_id 0xffff as source pan is invalid. I am not sure now
about this setting but for the current netlink interface this is an
invalid setting, so we do the same now. Maybe we need to change that
when we have coordinator support and association support.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:39 +01:00
Alexander Aring
ab0bd56172 ieee820154: add channel set support
This patch adds page and channel setting support to nl802154 framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:38 +01:00
Alexander Aring
9d30a8cf98 ieee802154: cleanup cfg802154 intendation
This is patch is cleanup to have a similar indentation like cfg80211
implementation.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:37 +01:00
Alexander Aring
6322d50d87 mac802154: add wpan_phy priv id
This patch adds an unique id for an wpan_phy. This behaviour is mostly
grabbed from wireless stack. This is needed for upcomming patches which
identify the wpan netdev while NETDEV_CHANGENAME in netdev notify function.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:36 +01:00
Joe Perches
6c91023dc3 irda: Remove IRDA_<TYPE> logging macros
And use the more common mechanisms directly.

Other miscellanea:

o Coalesce formats
o Add missing newlines
o Realign arguments
o Remove unnecessary OOM message logging as
  there's a generic stack dump already on OOM.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 18:11:00 -05:00
John W. Linville
6164c20228 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-11-11 16:30:48 -05:00
Joe Perches
d65c4e4e0a irda: Simplify IRDA logging macros
These are the same as net_<level>_ratelimited, so
use the more common style in the macro definition.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 16:21:59 -05:00
WANG Cong
d7480fd3b1 neigh: remove dynamic neigh table registration support
Currently there are only three neigh tables in the whole kernel:
arp table, ndisc table and decnet neigh table. What's more,
we don't support registering multiple tables per family.
Therefore we can just make these tables statically built-in.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 15:23:54 -05:00
Joe Perches
ba7a46f16d net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited
Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.

All messages are still ratelimited.

Some KERN_<LEVEL> uses are changed to KERN_DEBUG.

This may have some negative impact on messages that were
emitted at KERN_INFO that are not not enabled at all unless
DEBUG is defined or dynamic_debug is enabled.  Even so,
these messages are now _not_ emitted by default.

This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings".  For backward compatibility,
the sysctl is not removed, but it has no function.  The extern
declaration of net_msg_warn is removed from sock.h and made
static in net/core/sysctl_net_core.c

Miscellanea:

o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 14:10:31 -05:00
Eric Dumazet
2c8c56e15d net: introduce SO_INCOMING_CPU
Alternative to RPS/RFS is to use hardware support for multiple
queues.

Then split a set of million of sockets into worker threads, each
one using epoll() to manage events on its own socket pool.

Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know after accept() or connect() on which queue/cpu a socket is managed.

We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering on socket structure which cpu delivered last packet
is enough to solve the problem.

After accept(), connect(), or even file descriptor passing around
processes, applications can use :

 int cpu;
 socklen_t len = sizeof(cpu);

 getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);

And use this information to put the socket into the right silo
for optimal performance, as all networking stack should run
on the appropriate cpu, without need to send IPI (RPS/RFS).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:00:06 -05:00
Jesse Gross
cfdf1e1ba5 udptunnel: Add SKB_GSO_UDP_TUNNEL during gro_complete.
When doing GRO processing for UDP tunnels, we never add
SKB_GSO_UDP_TUNNEL to gso_type - only the type of the inner protocol
is added (such as SKB_GSO_TCPV4). The result is that if the packet is
later resegmented we will do GSO but not treat it as a tunnel. This
results in UDP fragmentation of the outer header instead of (i.e.) TCP
segmentation of the inner header as was originally on the wire.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 15:09:45 -05:00
David S. Miller
b92172661e Merge tag 'master-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-11-07

Please pull this batch of updates intended for the 3.19 stream!

For the mac80211 bits, Johannes says:

"This relatively large batch of changes is comprised of the following:
 * large mac80211-hwsim changes from Ben, Jukka and a bit myself
 * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
   University in Prague and Volkswagen Group Research
 * minstrel VHT work from Karl
 * more CSA work from Luca
 * WMM admission control support in mac80211 (myself)
 * various smaller fixes, spelling corrections, and minor API additions"

For the Bluetooth bits, Johan says:

"Here's the first bluetooth-next pull request for 3.19. The vast majority
of patches are for ieee802154 from Alexander Aring with various fixes
and cleanups. There are also several LE/SMP fixes as well as improved
support for handling LE devices that have lost their pairing information
(the patches from Alfonso). Jukka provides a couple of stability fixes
for 6lowpan and Szymon conformance fixes for RFCOMM. For the HCI drivers
we have one new USB ID for an Acer controller as well as a reset
handling fix for H5."

For the Atheros bits, Kalle says:

"Major changes are:

o ethtool support (Ben)

o print dev string prefix with debug hex buffers dump (Michal)

o debugfs file to read calibration data from the firmware verification
  purposes (me)

o fix fw_stats debugfs file, now results are more reliable (Michal)

o firmware crash counters via debugfs (Ben&me)

o various tracing points to debug firmware (Rajkumar)

o make it possible to provide firmware calibration data via a file (me)

And we have quite a lot of smaller fixes and clean up."

For the iwlwifi bits, Emmanuel says:

"The big new thing here is netdetect which allows the
firmware to wake up the platform when a specific network
is detected. Along with that I have fixes for d3 operation.
The usual amount of rate scaling stuff - we now support STBC.
The other commit that stands out is Johannes's work on
devcoredump. He basically starts to use the standard
infrastructure he built."

Along with that are the usual sort of updates and such for ath9k,
brcmfmac, wil6210, and a handful of other bits here and there...

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 14:34:59 -05:00
David S. Miller
1ef8019be8 net: Move bonding headers under include/net
This ways drivers like cxgb4 don't need to do ugly relative includes.

Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 13:27:49 -05:00
Johannes Berg
1f7bba79af mac80211: add back support for radiotap vendor namespace data
Radiotap vendor namespace data might still be useful, but we
reverted it because it used too much space in the RX status.
Put it back, but address the space problem by using a single
bit only and putting everything else into the skb->data.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-10 10:30:43 +01:00
Luciano Coelho
f8d7552e94 cfg80211: add channel switch started notification
Add a new NL80211_CH_SWITCH_STARTED_NOTIFY message that can be sent to
the userspace when a channel switch process has started.  This allows
userspace to take action, for instance, by requesting other interfaces
to switch channel as necessary.

This patch introduces a function that allows the drivers to send this
notification.  It should be used when the driver starts processing a
channel switch initiated by a remote device (eg. when a STA receives a
CSA from the AP) and when it successfully starts a userspace-triggered
channel switch (eg. when hostapd triggers a channel swith in the AP).

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-10 10:20:14 +01:00
Alexander Aring
3ae75e02c3 ieee802154: add new nl802154 header
This patch adds the new userspace header for nl802154. We don't place
this header in include/uapi now. This header could be modified in the
next time.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:28 +01:00
Alexander Aring
fcf39e6e88 ieee802154: add wpan_dev_list
This patch adds a wpan_dev_list list into cfg802154_registered_device
struct. Also adding new wpan_dev into this list while
cfg802154_netdev_notifier_call. This behaviour is mostly grab from
wireless core.c implementation and is needed for preparing nl802154
framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:28 +01:00
Alexander Aring
190ac1ca33 ieee802154: add iftype to wpan_dev
This patch adds an iftype argument to the wpan_dev. This is needed to
get the interface type from netdev ieee802154_ptr. The subif data struct
can only accessible in mac802154 branch.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:28 +01:00
Alexander Aring
f601379fa1 ieee802154: rename wpan_phy_alloc
This patch renames the wpan_phy_alloc function to wpan_phy_new. This
naming convention is like wireless and "wiphy_new" function.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:28 +01:00
Alexander Aring
863e88f255 mac802154: move mac pib attributes into wpan_dev
This patch moves all mac pib attributes into the wpan_dev struct.
Furthermore we can easier access these attributes over the netdev
802154_ptr pointer. Currently this is only possible over a complicated
callback structure in mac802154 because subif data structure is
accessable inside mac802154 only.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:27 +01:00
David S. Miller
4e84b496fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-06 22:01:18 -05:00
David S. Miller
6b798d70d0 Merge branch 'net_next_ovs' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch
Pravin B Shelar says:

====================
Open vSwitch

First two patches are related to OVS MPLS support. Rest of patches
are mostly refactoring and minor improvements to openvswitch.

v1-v2:
 - Fix conflicts due to "gue: Remote checksum offload"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 16:33:13 -05:00
Joe Perches
926c512685 sock.h: Remove unused NETDEBUG macro
It's unused now, just delete it.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 15:11:11 -05:00
Ryo Munakata
5816c3dafb net/9p: remove a comment about pref member which doesn't exist
Signed-off-by: Ryo Munakata <ryomnktml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 14:59:19 -05:00
Steffen Klassert
d50051407f ipv6: Allow sending packets through tunnels with wildcard endpoints
Currently we need the IP6_TNL_F_CAP_XMIT capabiltiy to transmit
packets through an ipv6 tunnel. This capability is set when the
tunnel gets configured, based on the tunnel endpoint addresses.

On tunnels with wildcard tunnel endpoints, we need to do the
capabiltiy checking on a per packet basis like it is done in
the receive path.

This patch extends ip6_tnl_xmit_ctl() to take local and remote
addresses as parameters to allow for per packet capabiltiy
checking.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 14:19:19 -05:00
Simon Horman
25cd9ba0ab openvswitch: Add basic MPLS support to kernel
Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.

Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.

Cc: Ravi K <rkerur@gmail.com>
Cc: Leo Alterman <lalterman@nicira.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-05 23:52:33 -08:00
WANG Cong
25de4668d0 ipv6: move INET6_MATCH() to include/net/inet6_hashtables.h
It is only used in net/ipv6/inet6_hashtables.c.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:59:04 -05:00
Tom Herbert
b17f709a24 gue: TX support for using remote checksum offload option
Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure
remote checksum offload on an IP tunnel. Add logic in gue_build_header
to insert remote checksum offload option.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert
c1aa8347e7 gue: Protocol constants for remote checksum offload
Define a private flag for remote checksun offload as well as a length
for the option.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert
5024c33ac3 gue: Add infrastructure for flags and options
Add functions and basic definitions for processing standard flags,
private flags, and control messages. This includes definitions
to compute length of optional fields corresponding to a set of flags.
Flag validation is in validate_gue_flags function. This checks for
unknown flags, and that length of optional fields is <= length
in guehdr hlen.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert
63487babf0 net: Move fou_build_header into fou.c and refactor
Move fou_build_header out of ip_tunnel.c and into fou.c splitting
it up into fou_build_header, gue_build_header, and fou_build_udp.
This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap
to call fou_build_header or gue_build_header based on the tunnel
encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen
functions which are called by ip_encap_hlen. New net/fou.h has
prototypes and defines for this.

Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels
can use FOU/GUE and fou module is also selected.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:02 -05:00
Alexander Aring
dee56d1477 mac802154: add support for perm_extended_addr
This patch adding support for a perm extended address. This is useful
when a device supports an eeprom with a programmed static extended address.
If a device doesn't support such eeprom or serial registers then the
driver should generate a random extended address.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:05 +01:00
Alexander Aring
705cbbbe9c mac802154: cleanup ieee802154_netdev_to_extended_addr
This patch cleanups the ieee802154_be64_to_le64 to have a similar
function like ieee802154_le64_to_be64 only with switched source and
destionation types.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:05 +01:00
Alexander Aring
239a84a9a0 mac802154: add ieee802154_le64_to_be64
This patch adds a new function to convert a le64 to a be64. This is
useful to translate an extended address to a netdev dev_addr.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:05 +01:00
Alexander Aring
7c118c1a86 mac802154: add ieee802154_vif struct
This patch adds an ieee802154_vif similar like the ieee80211_vif which
holds the interface type and maybe further more attributes like the
ieee80211_vif structure.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:04 +01:00
Alexander Aring
bd28a11f25 ieee802154: remove mlme get_phy callback
This patch removes the get_phy callback from mlme ops structure. Instead
we doing a dereference via ieee802154_ptr dev pointer. For backwards
compatibility we need to run get_device after dereference wpan_phy via
ieee802154_ptr.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:04 +01:00
Alexander Aring
d5ae67bacd ieee802154: rework interface registration
This patch meld mac802154_netdev_register into ieee802154_if_add
function. Also we have now only one alloc_netdev call with one interface
setup routine "ieee802154_if_setup" instead two different one for each
interface type. This patch checks via runtime the interface type and do
different handling now. Additional we add the wpan_dev struct in
ieee802154_sub_if_data and set the new ieee802154_ptr while netdev
registration. This behaviour is very similar the mac80211 netdev
registration functionality.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:04 +01:00
Alexander Aring
9f3295b9ea ieee802154: remove nl802154 unused functions
The include/net/nl802154.h file contains a lot of prototypes which are
not used inside of ieee802154 subsystem. This patch removes this file
and make the only one used prototype "ieee802154_nl_start_confirm" as
static declaration in ieee802154/nl-mac.c

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:03 +01:00
Alexander Aring
53f9ee61b4 ieee802154: rework wpan_phy index assignment
This patch reworks the wpan_phy index incrementation. It's now similar
like wireless wiphy index incrementation. We move the wpan_phy index
attribute inside of cfg802154_registered_device and use atomic
operations instead locking mechanism via wpan_phy_mutex.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:03 +01:00
Joe Perches
824f1fbee7 netfilter: Convert print_tuple functions to return void
Since adding a new function to seq_file (seq_has_overflowed())
there isn't any value for functions called from seq_show to
return anything.   Remove the int returns of the various
print_tuple/<foo>_print_tuple functions.

Link: http://lkml.kernel.org/p/f2e8cf8df433a197daa62cbaf124c900c708edc7.1412031505.git.joe@perches.com

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05 14:10:33 -05:00
Steven Rostedt (Red Hat)
37246a5837 netfilter: Remove return values for print_conntrack callbacks
The seq_printf() and friends are having their return values removed.
The print_conntrack() returns the result of seq_printf(), which is
meaningless when seq_printf() returns void. Might as well remove the
return values of print_conntrack() as well.

Link: http://lkml.kernel.org/r/20141029220107.465008329@goodmis.org
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05 14:09:47 -05:00
John W. Linville
bf515fb11a This relatively large batch of changes is comprised of the
following:
  * large mac80211-hwsim changes from Ben, Jukka and a bit myself
  * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
    University in Prague and Volkswagen Group Research
  * minstrel VHT work from Karl
  * more CSA work from Luca
  * WMM admission control support in mac80211 (myself)
  * various smaller fixes, spelling corrections, and minor API additions
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUWMs/AAoJEDBSmw7B7bqrmBQQAIbfAe7wH1WifRtOnhw3zWQQ
 K36+Edf3HlQ+EIkSs63QousRj2e7pGDOyhzMWLaqsmeTLteUtlGbr7qwiJO1QZdf
 Ml2V5O2s+b8hUIClDBVQF2L6+GGUmRUdQqvDDhkN1guoxD/Nk8cNtsRkSdiXWJWy
 R48NzvYDflBhc8uqPtR8jDb10eM3c00YP9HB+w9hYAfizD+FRue7UNp4MQIqwp9V
 HdKRT6L2n/6QA+Mzse0rMDes5qI7nIUNgj+hjqgJSnhITPMgGR5j/pitnVHrr81M
 ngOipBFG3svsQrwZh8nM4Llp0cM4Gs+GlgCieu9+TJpr2sY00Z3kYcp0pxtDoSxz
 Wblqz9n/bnW9mrkEfl12XqwwT5vguchwHoZ9cXhejDxSawWXoTRx20uW4ahO8ArA
 kWwwjTBVsQ5WMCtOBiqggzNKghwCc2ILmcZnjGdg9aNXcWsmQ4vyeCfG2QxBz/UB
 Grv/f9NSy6mzKQ34yv+lyR7rFZ8XcT03EVAnZSYz8X0ZZGxwtFupRp1RrBh1KPtD
 TJoe6Q71FfHKYRJ2xgygYkQFo+r9d0BKBeerq+Vu2hBeaqyi4aUwSj7d1sUaaq6N
 tL8fmAUqFjVOOUFeH1g07Xke5QD+yrEC7sJKkeRMfcRGB+dEa+2m3I5p4WDz9bWM
 AEvFSsYr/I9KI4d1huXD
 =6GIj
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-john-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg <johannes@sipsolutions.net> says:

"This relatively large batch of changes is comprised of the
following:
 * large mac80211-hwsim changes from Ben, Jukka and a bit myself
 * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
   University in Prague and Volkswagen Group Research
 * minstrel VHT work from Karl
 * more CSA work from Luca
 * WMM admission control support in mac80211 (myself)
 * various smaller fixes, spelling corrections, and minor API additions"

Conflicts:
	drivers/net/wireless/ath/wil6210/cfg80211.c

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-04 16:18:12 -05:00
Florian Westphal
f7b3bec6f5 net: allow setting ecn via routing table
This patch allows to set ECN on a per-route basis in case the sysctl
tcp_ecn is not set to 1. In other words, when ECN is set for specific
routes, it provides a tcp_ecn=1 behaviour for that route while the rest
of the stack acts according to the global settings.

One can use 'ip route change dev $dev $net features ecn' to toggle this.

Having a more fine-grained per-route setting can be beneficial for various
reasons, for example, 1) within data centers, or 2) local ISPs may deploy
ECN support for their own video/streaming services [1], etc.

There was a recent measurement study/paper [2] which scanned the Alexa's
publicly available top million websites list from a vantage point in US,
Europe and Asia:

Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely
blamed to commit 255cac91c3 ("tcp: extend ECN sysctl to allow server-side
only ECN") ;)); the break in connectivity on-path was found is about
1 in 10,000 cases. Timeouts rather than receiving back RSTs were much
more common in the negotiation phase (and mostly seen in the Alexa
middle band, ranks around 50k-150k): from 12-thousand hosts on which
there _may_ be ECN-linked connection failures, only 79 failed with RST
when _not_ failing with RST when ECN is not requested.

It's unclear though, how much equipment in the wild actually marks CE
when buffers start to fill up.

We thought about a fallback to non-ECN for retransmitted SYNs as another
global option (which could perhaps one day be made default), but as Eric
points out, there's much more work needed to detect broken middleboxes.

Two examples Eric mentioned are buggy firewalls that accept only a single
SYN per flow, and middleboxes that successfully let an ECN flow establish,
but later mark CE for all packets (so cwnd converges to 1).

 [1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
 [2] http://ecn.ethz.ch/

Joint work with Daniel Borkmann.

Reference: http://thread.gmane.org/gmane.linux.network/335797
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 16:06:09 -05:00
Florian Westphal
f1673381b1 syncookies: split cookie_check_timestamp() into two functions
The function cookie_check_timestamp(), both called from IPv4/6 context,
is being used to decode the echoed timestamp from the SYN/ACK into TCP
options used for follow-up communication with the peer.

We can remove ECN handling from that function, split it into a separate
one, and simply rename the original function into cookie_decode_options().
cookie_decode_options() just fills in tcp_option struct based on the
echoed timestamp received from the peer. Anything that fails in this
function will actually discard the request socket.

While this is the natural place for decoding options such as ECN which
commit 172d69e63c ("syncookies: add support for ECN") added, we argue
that in particular for ECN handling, it can be checked at a later point
in time as the request sock would actually not need to be dropped from
this, but just ECN support turned off.

Therefore, we split this functionality into cookie_ecn_ok(), which tells
us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled.

This prepares for per-route ECN support: just looking at the tcp_ecn sysctl
won't be enough anymore at that point; if the timestamp indicates ECN
and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric.

This would mean adding a route lookup to cookie_check_timestamp(), which
we definitely want to avoid. As we already do a route lookup at a later
point in cookie_{v4,v6}_check(), we can simply make use of that as well
for the new cookie_ecn_ok() function w/o any additional cost.

Joint work with Daniel Borkmann.

Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 16:06:09 -05:00
Eliad Peller
cf2c92d840 mac80211: replace restart_complete() with reconfig_complete()
Drivers might want to know also when mac80211 has
completed reconfiguring after resume (e.g. in order
to know when frames can be passed to mac80211).

Rename restart_complete() to a more-generic reconfig_complete(),
and add a new enum to indicate the reconfiguration type.

Update the current users with the new prototype.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 13:49:00 +01:00
Rostislav Lisovy
239281f803 mac80211: 802.11p OCB mode support
This patch adds 802.11p OCB (Outside the Context of a BSS) mode
support.

When communicating in OCB mode a mandatory wildcard BSSID
(48 '1' bits) is used.

The EDCA parameters handling function was changed to support
802.11p specific values.

The insertion of a newly discovered STAs is done in the similar way
as in the IBSS mode -- through the deferred insertion.

The OCB mode uses a periodic 'housekeeping task' for expiration of
disconnected STAs (in the similar manner as in the MESH mode).

New Kconfig option for verbose OCB debugging outputs is added.

Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 13:18:21 +01:00
Rostislav Lisovy
6e0bd6c35b cfg80211: 802.11p OCB mode handling
This patch adds new iface type (NL80211_IFTYPE_OCB) representing
the OCB (Outside the Context of a BSS) mode.
When establishing a connection to the network a cfg80211_join_ocb
function is called (particular nl80211_command is added as well).
A mandatory parameters during the ocb_join operation are 'center
frequency' and 'channel width (5/10 MHz)'.

Changes done in mac80211 are minimal possible required to avoid
many warnings (warning: enumeration value 'NL80211_IFTYPE_OCB'
not handled in switch) during compilation. Full functionality
(where needed) is added in the following patch.

Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 13:18:17 +01:00
Felix Fietkau
5b3dc42b1b mac80211: add support for driver tx power reporting
The configured tx power is often limited by hardware capabilities,
channel settings, antenna configuration, etc.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[fix tracing compilation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 10:15:09 +01:00
Marcel Holtmann
845472e8d5 Bluetooth: Add hci_conn_lookup_type() helper function
Some drivers require knowledge of what connection handle is assigned
to what connection link type (ACL or SCO/eSCO). Instead of having each
driver implement connection tracking, provide a simple helper function
for lookup of the link type.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-03 10:13:43 +02:00
Marcel Holtmann
43d6bc46e2 Bluetooth: Introduce HCI_QUIRK_STRICT_DUPLICATE_FILTER
Some vendors decide to use a strict duplicate filter policy that only
filters on Bluetooth device addresses. This means that when the RSSI
changes, these devices are not reported again. During discovery it is
useful to actually get the RSSI updates.

Since this is specific to each controller, add a new quirk setting
that allows drivers to tell the core what kind of filtering policy
the controller uses.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-03 10:13:36 +02:00
Alexander Aring
f753f7eeb4 mac802154: fix byteorder issues
Running make C=2 occurs these warnings:

cast from restricted __be64
incorrect type in argument 1 (different base types)
expected unsigned long long[unsigned] [usertype] val
got restricted __be64 [usertype]<noident>
cast from restricted __be64
cast to restricted __le64

This patch fix these warnings by forcing to __le64 type and using swabp64
instead swab64.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 21:52:02 +01:00
Marcel Holtmann
75e0569f7f Bluetooth: Add hci_reset_dev() for driver triggerd stack reset
Some Bluetooth drivers require to reset the upper stack. To avoid having
all drivers send HCI Hardware Error events, provide a generic function
to wrap the reset functionality.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-02 10:03:45 +02:00
Marcel Holtmann
24dfa34371 Bluetooth: Print error message for HCI_Hardware_Error event
When the HCI_Hardware_Error event is send by the controller or
injected by the driver, then at least print an error message.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-02 09:59:42 +02:00
Alexander Aring
ab24f50f2a mac802154: add helper for converting dev_addr
This patch adds a helper for converting the dev_addr attribute in
netdevice to __le64 type. The dev_addr attribute is a char pointer
and contains the extended address in big endian byte order.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:07 +01:00
Alexander Aring
4a9a816a4f cfg802154: convert deprecated iface add and del
This patch removes the wpan_phy callbacks for add and del an interface
on a phy. Instead we introduce deprecated cfg802154 callbacks for this.
Furthermore we introduce a new netlink interface nl802154 which use
different callbacks. The deprecated function is to have a backwards
compatibility with the current netlink interface.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:06 +01:00
Alexander Aring
a5dd1d72d8 cfg802154: introduce cfg802154_registered_device
This patch introduce the cfg802154_registered_device struct. Like
cfg80211_registered_device in wireless this should contain similar
functionality for cfg802154. This patch should not change any behaviour.
We just adds cfg802154_registered_device as container for wpan_phy struct.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:06 +01:00
David S. Miller
55b42b5ca2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/marvell.c

Simple overlapping changes in drivers/net/phy/marvell.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-01 14:53:27 -04:00
David S. Miller
e3a88f9c4f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter/ipvs fixes for net

The following patchset contains fixes for netfilter/ipvs. This round of
fixes is larger than usual at this stage, specifically because of the
nf_tables bridge reject fixes that I would like to see in 3.18. The
patches are:

1) Fix a null-pointer dereference that may occur when logging
   errors. This problem was introduced by 4a4739d56b ("ipvs: Pull
   out crosses_local_route_boundary logic") in v3.17-rc5.

2) Update hook mask in nft_reject_bridge so we can also filter out
   packets from there. This fixes 36d2af5 ("netfilter: nf_tables: allow
   to filter from prerouting and postrouting"), which needs this chunk
   to work.

3) Two patches to refactor common code to forge the IPv4 and IPv6
   reject packets from the bridge. These are required by the nf_tables
   reject bridge fix.

4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject
   packets from the bridge. The idea is to forge the reject packets and
   inject them to the original port via br_deliver() which is now
   exported for that purpose.

5) Restrict nft_reject_bridge to bridge prerouting and input hooks.
   the original skbuff may cloned after prerouting when the bridge stack
   needs to flood it to several bridge ports, it is too late to reject
   the traffic.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31 12:29:42 -04:00
Pablo Neira Ayuso
8bfcdf6671 netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:

* nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_ip6hdr_put(): to build the IPv6 header.
* nf_reject_ip6_tcphdr_put(): to build the TCP header.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:49:57 +01:00
Pablo Neira Ayuso
052b9498ee netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:

* nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_iphdr_put(): to build the IPv4 header.
* nf_reject_ip_tcphdr_put(): to build the TCP header.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:49:05 +01:00
Ben Hutchings
5188cd44c5 drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets
UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway.  Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 20:01:18 -04:00
Guenter Roeck
3d762a0f0a net: dsa: Add support for reading switch registers with ethtool
Add support for reading switch registers with 'ethtool -d'.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 14:54:11 -04:00
Guenter Roeck
6793abb4e8 net: dsa: Add support for switch EEPROM access
On some chips it is possible to access the switch eeprom.
Add infrastructure support for it.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 14:54:11 -04:00
Guenter Roeck
51579c3f1a net: dsa: Add support for reporting switch chip temperatures
Some switches provide chip temperature data.
Add support for reporting it through the hwmon subsystem.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 14:54:11 -04:00
Alexander Aring
ec718f3db9 mac802154: rx: add software checksum filtering check
This patch adds a new hardware flag which indicate that the transceiver
doesn't support check for bad checksum via hardware. Also add a handling of
this while receive.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:45 +01:00
Alexander Aring
90386a7e3b mac802154: separate omit tx/rx flags
This patch splits the IEEE802154_HW_OMIT_CKSUM hardware flag into
IEEE802154_HW_TX_OMIT_CKSUM and IEEE802154_HW_RX_OMIT_CKSUM. This is
useful to deliver the received crc from the driver layer to the monitor
interface. At the moment we can't do that without change the xmit
handling.

The received checksum should be visible in monitor mode only.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:45 +01:00
Alexander Aring
94b792220c mac802154: add support for promiscuous mode
This patch adds a new driver operation to bring the transceiver into
promiscuous mode.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:45 +01:00
Alexander Aring
c8fc84ed60 mac802154: add hardware address filter flag
Overdue introduction for address filtering hardware flag. Furthermore we
will check and set address filtering on interface up. This patch
prepares that we can check if an transceiver supports address filtering
option. Currently all mainline driver supports hardware address filtering.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Cc: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:44 +01:00
Alexander Aring
ab79be3eeb mac802154: add IEEE802154_HW_ARET hw flag
This patch adds a new IEEE802154_HW_ARET hardware flag for indicating
that the transceiver supports ARET handling. Also remove the
IEEE802154_HW_FRAME_RETRIES from IEEE802154_HW_CSMA flag. Frame retries
handling is part of ARET.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:44 +01:00
Alexander Aring
90a6161df5 mac802154: remove tab after define
This patch removes tabs after define in hardware flags declarations.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:44 +01:00
Nicolas Dichtel
75fbfd3323 neigh: optimize neigh_parms_release()
In neigh_parms_release() we loop over all entries to find the entry given in
argument and being able to remove it from the list. By using a double linked
list, we can avoid this loop.

Here are some numbers with 30 000 dummy interfaces configured:

Before the patch:
$ time rmmod dummy
real	2m0.118s
user	0m0.000s
sys	1m50.048s

After the patch:
$ time rmmod dummy
real	1m9.970s
user	0m0.000s
sys	0m47.976s

Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 16:11:50 -04:00
Eric Dumazet
dca145ffaa tcp: allow for bigger reordering level
While testing upcoming Yaogong patch (converting out of order queue
into an RB tree), I hit the max reordering level of linux TCP stack.

Reordering level was limited to 127 for no good reason, and some
network setups [1] can easily reach this limit and get limited
throughput.

Allow a new max limit of 300, and add a sysctl to allow admins to even
allow bigger (or lower) values if needed.

[1] Aggregation of links, per packet load balancing, fabrics not doing
 deep packet inspections, alternative TCP congestion modules...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 15:05:15 -04:00
Alexander Aring
a543c5989d mac802154: remove driver ops in wpan-phy
This patch removes the driver ops callbacks inside of wpan_phy struct.
It was used to check if a phy supports this driver ops call. We do this
now via hardware flags.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:07 +01:00
Alexander Aring
e37d2ec82a mac802154: ops: declare channel and page as u8
The range of channel and page fits into an unsigned byte range. This
patch changes the set_channel parameter definitions for channel and
page to u8.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:07 +01:00
Alexander Aring
1630186100 mac802154: declare struct ieee802154_ops as const
The ieee802154_ops structure should be never changed during runtime.
This patch declare this structure as const to avoid a runtime change.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:07 +01:00
Simon Horman
941d8ebcf7 datapath: Rename last_action() as nla_is_last() and move to netlink.h
The original motivation for this change was to allow the helper to be used
in files other than actions.c as part of work on an odp select group
action.

It was as pointed out by Thomas Graf that this helper would be best off
living in netlink.h. Furthermore, I think that the generic nature of this
helper means it is best off in netlink.h regardless of if it is used more
than one .c file or not. Thus, I would like it considered independent of
the work on an odp select group action.

Cc: Thomas Graf <tgraf@suug.ch>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:07:29 -04:00
Peter Zijlstra
26cabd3125 sched, net: Clean up sk_wait_event() vs. might_sleep()
WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 __might_sleep+0x58/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff81070e10>] prepare_to_wait+0x50 /0xa0

 [<ffffffff8105bc38>] __might_sleep+0x58/0x90
 [<ffffffff8148c671>] lock_sock_nested+0x31/0xb0
 [<ffffffff81498aaa>] sk_stream_wait_memory+0x18a/0x2d0

Which is a false positive because sk_wait_event() will already have
TASK_RUNNING at that point if it would've gone through
schedule_timeout().

So annotate with sched_annotate_sleep(); which goes away on !DEBUG builds.

Reported-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140924082242.524407432@infradead.org
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: netdev@vger.kernel.org
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:56:37 +01:00
Arturo Borrero
e9105f1bea netfilter: nf_tables: add new expression nft_redir
This new expression provides NAT in the redirect flavour, which is to
redirect packets to local machine.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:49:39 +01:00
Arturo Borrero
9de920eddb netfilter: refactor NAT redirect IPv6 code to use it from nf_tables
This patch refactors the IPv6 code so it can be usable both from xt and
nf_tables.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:48:10 +01:00
Arturo Borrero
8b13eddfdf netfilter: refactor NAT redirect IPv4 to use it from nf_tables
This patch refactors the IPv4 code so it can be usable both from xt and
nf_tables.

A similar patch follows-up to handle IPv6.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:47:06 +01:00
Fabian Frederick
5e96d788d9 ipx: move extern sysctl_ipx_pprop_broadcasting to header file
include ipx.h from sysctl_net_ipx.c

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27 16:03:53 -04:00
Alexander Aring
c5c47e67bc mac802154: rx: use tasklet instead workqueue
Tasklets have much less overhead than workqueues. This patch also
removes the heap allocation for the worker on receiving path.
Like mac80211 we should prefer use a tasklet here instead a workqueue to
getting fast out of interrupt context when ieee802154_rx_irqsafe is
called by driver. Like wireless inside the tasklet context we should
call netif_receive_skb instead netif_rx_ni anymore.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:40 +01:00
Martin Townsend
01141234f2 ieee802154: 6lowpan: rename process_data and lowpan_process_data
As we have decouple decompression from data delivery we can now rename all
occurences of process_data in receive path.

Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 15:51:16 +01:00
Martin Townsend
f8b361768e 6lowpan: remove skb_deliver from IPHC
Separating skb delivery from decompression ensures that we can support further
decompression schemes and removes the mixed return value of error codes with
NET_RX_FOO.

Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 15:51:15 +01:00
Ben Greear
e8f479b112 cfg80211: support configuring vif mac addr on create
This is useful when creating virtual interfaces.
Keeps udev from mucking with things it shouldn't, since
the default MAC is never seen by udev when specified on
the cmd-line during creation.

Signed-off-by: Ben Greear <greearb@candelatech.com>
[check for feature flag in nl80211 to force drivers to set it]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:33 +01:00
Ben Greear
e27513fbd0 mac80211: support creating wiphy w/out creating wlanX
This will be helpful when using the mac80211_hwsim
wiphys and automated testing.  Let user create the
vifs as needed, and named as expected.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:32 +01:00
Ben Greear
ad28757eef mac80211: allow creating wiphy devices with suggested name
Support creating wiphy devices with an optional name.
This will be used by hwsim to have better automated control
over virtual radio creation/deletion.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:31 +01:00
Ben Greear
1998d90ad4 cfg80211: support creating wiphy with suggested name
Kernel will attempt to use the name if it is supplied,
but if name cannot be used for some reason, the default
phyX name will be used instead.

Signed-off-by: Ben Greear <greearb@candelatech.com>
[while at it, use wiphy_name() instead of dev_name(),
 fix format string issue reported by Kees Cook]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:18 +01:00
Alexander Aring
1e7283a271 mac802154: tx: add comment at sync xmit callback
This patch adds a warning that xmit_sync callback is deprecated and
should be removed soon. The 802.15.4 subsystem will not accept synced
drivers anymore.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:05 +01:00
Alexander Aring
ed0a5dce0c mac802154: tx: add support for xmit_async callback
This patch renames the existsing xmit callback to xmit_sync and
introduces an asynchronous xmit_async function. If ieee802154_ops
doesn't provide the xmit_async callback, then we have a fallback to
the xmit_sync callback.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:04 +01:00
Alexander Aring
c208510351 mac802154: add netdev qeue helpers
This patch adds a new file net/mac802154/util.c which contains utility
functions for drivers, etc. This file contains functions to start and
stop queues for all virtual interfaces, this is useful for asynchronous
handling by driver level.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:03 +01:00
Alexander Aring
dc67c6b30f mac802154: tx: remove xmit channel context switch
This patch removes the channel hopping feature before xmit. There are
several issues to provide a real channel hopping (timing requirements,
etc...).

We don't have any known kernelspace protocol which really use this
feature. And I don't know an real user of this feature.
We simply drop this feature now.

This patch removes also the hold of pib lock which isn't needed by any
real driver xmit callback implementation.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:03 +01:00
Alexander Aring
c6f635faf3 mac802154: remove ieee802154_addr from driver_ops
This driver_ops callback function is never used by any driver.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:55:39 +02:00
Alexander Aring
5a50439775 ieee802154: rename ieee802154_dev to ieee802154_hw
The identical struct of the wireless stack implementation is named
ieee80211_hw. This is useful to name the variable hw instead of get
confusing with netdev dev variable.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:55:37 +02:00
Alexander Aring
4ca24aca55 ieee802154: move ieee802154 header
This patch moves the ieee802154 header into include/linux instead
include/net. Similar like wireless which have the ieee80211 header
inside of include/linux.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:39:57 +02:00
Alexander Aring
5ad60d3699 ieee802154: move wpan-phy.h to cfg802154.h
The wpan-phy header contains the wpan_phy struct information. Later this
header will be have similar function like cfg80211 header. The cfg80211
header contains the wiphy struct which is identically the wpan_phy
struct inside 802.15.4 subsystem.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:39:56 +02:00
Alexander Aring
e72740d057 ieee802154: wpan-phy: use blank line after function
This patch adds a blank line after function declaration.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 08:07:30 +02:00
Alexander Aring
4896167d97 ieee802154: wpan-phy: change to __aligned(size)
This patch fix a checkpatch warning that __aligned(size) is preferred
over __attribute__((aligned(size))).

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 08:07:30 +02:00
Alexander Aring
57205c14ca mac802154: fix typo IEEE802515 to IEEE802154
This patch fixs a typo in address filter defines from IEEE802515 to
IEEE802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 08:07:30 +02:00
Alexander Aring
b3020f0a35 ieee802154: mac802154: remove FSF address
This patch removes the FSF address in files which belongs to ieee802154
and mac802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 08:07:30 +02:00
Alfonso Acosta
89cbb0638e Bluetooth: Defer connection-parameter removal when unpairing
Systematically removing the LE connection parameters and autoconnect
action is inconvenient for rebonding without disconnecting from
userland (i.e. unpairing followed by repairing without
disconnecting). The parameters will be lost after unparing and
userland needs to take care of book-keeping them and re-adding them.

This patch allows userland to forget about parameter management when
rebonding without disconnecting. It defers clearing the connection
parameters when unparing without disconnecting, giving a chance of
keeping the parameters if a repairing happens before the connection is
closed.

Signed-off-by: Alfonso Acosta <fons@spotify.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-25 07:56:24 +02:00
Alfonso Acosta
fd45ada910 Bluetooth: Include ADV_IND report in Device Connected event
There are scenarios when autoconnecting to a device after the
reception of an ADV_IND report (action 0x02), in which userland
might want to examine the report's contents.

For instance, the Service Data might have changed and it would be
useful to know ahead of time before starting any GATT procedures.
Also, the ADV_IND may contain Manufacturer Specific data which would
be lost if not propagated to userland. In fact, this patch results
from the need to rebond with a device lacking persistent storage which
notifies about losing its LTK in ADV_IND reports.

This patch appends the ADV_IND report which triggered the
autoconnection to the EIR Data in the Device Connected event.

Signed-off-by: Alfonso Acosta <fons@spotify.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-25 07:56:24 +02:00
Alfonso Acosta
48ec92fa4f Bluetooth: Refactor arguments of mgmt_device_connected
The values of a lot of the mgmt_device_connected() parameters come
straight from a hci_conn object. We can simplify the function by passing
the full hci_conn pointer to it.

Signed-off-by: Alfonso Acosta <fons@spotify.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-25 07:56:23 +02:00
Sébastien Barré
16704b129b Removed unused function sctp_addr_is_valid()
sctp_addr_is_valid() only appeared in its definition.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24 00:37:21 -04:00
Kenjiro Nakayama
105970f608 net: Remove trailing whitespace in tcp.h icmp.c syncookies.c
Remove trailing whitespace in tcp.h icmp.c syncookies.c

Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24 00:13:10 -04:00
Arik Nemtsov
0fc1e0495f mac80211: expose API allowing station iteration
Allow drivers to iterate all stations currently uploaded to them.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-23 20:40:03 +02:00
Arik Nemtsov
8b94148cfe mac80211: expose TDLS-initiator value to low level driver
Some drivers need to know which station is the TDLS link initiator.
Expose this value via the mac80211 ieee80211_sta structure.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-23 20:40:02 +02:00
Andrei Otcheretianski
a7f3a76828 mac80211: export IE splitting function
Export ieee80211_ie_split function, so it can be reused by
drivers which need to insert additional elements.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-23 20:39:43 +02:00
Johannes Berg
02219b3abc mac80211: add WMM admission control support
Use the currently existing APIs between mac80211 and the low
level driver to implement WMM admission control.

The low level driver needs to report the media time used by
each transmitted packet in ieee80211_tx_status. Based on that
information, mac80211 will modify the QoS parameters of the
admission controlled Access Category when the limit is
reached. Once the original QoS parameters can be restored,
mac80211 will do so.

One issue with this approach is that management frames will
also erroneously be downgraded, but the upside is that the
implementation is simple. In the future, it can be extended
to driver- or device-based implementations that are better.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-22 10:42:09 +02:00
Johannes Berg
723e73acd1 cfg80211: make WMM TSPEC support flag an nl80211 feature flag
During the review of the corresponding wpa_supplicant patches we
noticed that the only way for it to detect that this functionality
is supported currently is to check for the command support. This
can be misleading though, as the command was also designed to, in
the future, support pure 802.11 TSPECs.

Expose the WMM-TSPEC feature flag to nl80211 so later we can also
expose an 802.11-TSPEC feature flag (if needed) to differentiate
the two cases.

Note: this change isn't needed in 3.18 as there's no driver there
yet that supports the functionality at all.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-22 10:41:49 +02:00
David S. Miller
ce8ec48967 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter fixes for net

The following patchset contains netfilter fixes for your net tree,
they are:

1) Fix missing MODULE_LICENSE() in the new nf_reject_ipv{4,6} modules.

2) Restrict nat and masq expressions to the nat chain type. Otherwise,
   users may crash their kernel if they attach a nat/masq rule to a non
   nat chain.

3) Fix hook validation in nft_compat when non-base chains are used.
   Basically, initialize hook_mask to zero.

4) Make sure you use match/targets in nft_compat from the right chain
   type. The existing validation relies on the table name which can be
   avoided by

5) Better netlink attribute validation in nft_nat. This expression has
   to reject the configuration when no address and proto configurations
   are specified.

6) Interpret NFTA_NAT_REG_*_MAX if only if NFTA_NAT_REG_*_MIN is set.
   Yet another sanity check to reject incorrect configurations from
   userspace.

7) Conditional NAT attribute dumping depending on the existing
   configuration.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-20 11:57:47 -04:00
Jouni Malinen
988568669d cfg80211: Specify frame and reason code for NL80211_CMD_DEL_STATION
The optional NL80211_ATTR_MGMT_SUBTYPE and NL80211_ATTR_REASON_CODE
attributes can now be included in NL80211_CMD_DEL_STATION to indicate to
the driver which frame (Deauthentication/Disassociation) and reason code
in that frame should be used to indicate removal to the specific
station. This is used by drivers that implement AP SME and generate
those frames internally.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-20 16:39:23 +02:00
Jouni Malinen
89c771e5a6 cfg80211: Convert del_station() callback to use a param struct
This makes it easier to add new parameters for the del_station calls
without having to modify all drivers that use this.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-20 16:24:21 +02:00
Linus Torvalds
e25b492741 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A quick batch of bug fixes:

  1) Fix build with IPV6 disabled, from Eric Dumazet.

  2) Several more cases of caching SKB data pointers across calls to
     pskb_may_pull(), thus referencing potentially free'd memory.  From
     Li RongQing.

  3) DSA phy code tests operation presence improperly, instead of going:

        if (x->ops->foo)
                r = x->ops->foo(args);

     it was going:

        if (x->ops->foo(args))
                r = x->ops->foo(args);

   Fix from Andew Lunn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  Net: DSA: Fix checking for get_phy_flags function
  ipv6: fix a potential use after free in sit.c
  ipv6: fix a potential use after free in ip6_offload.c
  ipv4: fix a potential use after free in gre_offload.c
  tcp: fix build error if IPv6 is not enabled
2014-10-19 11:41:57 -07:00
Eric Dumazet
815afe1785 tcp: fix build error if IPv6 is not enabled
$ make M=net/ipv4
  CC      net/ipv4/route.o
In file included from net/ipv4/route.c:102:0:
include/net/tcp.h: In function ‘tcp_v6_iif’:
include/net/tcp.h:738:32: error: ‘union <anonymous>’ has no member named ‘h6’
  return TCP_SKB_CB(skb)->header.h6.iif;

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 870c315138 ("ipv6: introduce tcp_v6_iif()")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-18 13:01:36 -04:00
Linus Torvalds
2e923b0251 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Include fixes for netrom and dsa (Fabian Frederick and Florian
    Fainelli)

 2) Fix FIXED_PHY support in stmmac, from Giuseppe CAVALLARO.

 3) Several SKB use after free fixes (vxlan, openvswitch, vxlan,
    ip_tunnel, fou), from Li ROngQing.

 4) fec driver PTP support fixes from Luwei Zhou and Nimrod Andy.

 5) Use after free in virtio_net, from Michael S Tsirkin.

 6) Fix flow mask handling for megaflows in openvswitch, from Pravin B
    Shelar.

 7) ISDN gigaset and capi bug fixes from Tilman Schmidt.

 8) Fix route leak in ip_send_unicast_reply(), from Vasily Averin.

 9) Fix two eBPF JIT bugs on x86, from Alexei Starovoitov.

10) TCP_SKB_CB() reorganization caused a few regressions, fixed by Cong
    Wang and Eric Dumazet.

11) Don't overwrite end of SKB when parsing malformed sctp ASCONF
    chunks, from Daniel Borkmann.

12) Don't call sock_kfree_s() with NULL pointers, this function also has
    the side effect of adjusting the socket memory usage.  From Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
  bna: fix skb->truesize underestimation
  net: dsa: add includes for ethtool and phy_fixed definitions
  openvswitch: Set flow-key members.
  netrom: use linux/uaccess.h
  dsa: Fix conversion from host device to mii bus
  tipc: fix bug in bundled buffer reception
  ipv6: introduce tcp_v6_iif()
  sfc: add support for skb->xmit_more
  r8152: return -EBUSY for runtime suspend
  ipv4: fix a potential use after free in fou.c
  ipv4: fix a potential use after free in ip_tunnel_core.c
  hyperv: Add handling of IP header with option field in netvsc_set_hash()
  openvswitch: Create right mask with disabled megaflows
  vxlan: fix a free after use
  openvswitch: fix a use after free
  ipv4: dst_entry leak in ip_send_unicast_reply()
  ipv4: clean up cookie_v4_check()
  ipv4: share tcp_v4_save_options() with cookie_v4_check()
  ipv4: call __ip_options_echo() in cookie_v4_check()
  atm: simplify lanai.c by using module_pci_driver
  ...
2014-10-18 09:31:37 -07:00
Florian Fainelli
a28205437b net: dsa: add includes for ethtool and phy_fixed definitions
net/dsa/slave.c uses functions and structures declared in phy_fixed.h
but does not explicitely include it, while dsa.h needs structure
declarations for 'struct ethtool_wolinfo' and 'struct ethtool_eee', fix
those by including the correct header files.

Fixes: ec9436baed ("net: dsa: allow drivers to do link adjustment")
Fixes: ce31b31c68 ("net: dsa: allow updating fixed PHY link information")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 23:54:46 -04:00
Eric Dumazet
870c315138 ipv6: introduce tcp_v6_iif()
Commit 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line
misses") added a regression for SO_BINDTODEVICE on IPv6.

This is because we still use inet6_iif() which expects that IP6 control
block is still at the beginning of skb->cb[]

This patch adds tcp_v6_iif() helper and uses it where necessary.

Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif
parameter to it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 23:48:07 -04:00
Cong Wang
461b74c391 ipv4: clean up cookie_v4_check()
We can retrieve opt from skb, no need to pass it as a parameter.
And opt should always be non-NULL, no need to check.

Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 12:02:57 -04:00
Cong Wang
e25f866fbc ipv4: share tcp_v4_save_options() with cookie_v4_check()
cookie_v4_check() allocates ip_options_rcu in the same way
with tcp_v4_save_options(), we can just make it a helper function.

Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 12:02:57 -04:00
Nicolas Dichtel
2c6ba4b15b netlink: fix description of portid
Avoid confusion between pid and portid.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-16 14:52:35 -04:00
Linus Torvalds
0429fbc0bd Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
 "Way back, before the current percpu allocator was implemented, static
  and dynamic percpu memory areas were allocated and handled separately
  and had their own accessors.  The distinction has been gone for many
  years now; however, the now duplicate two sets of accessors remained
  with the pointer based ones - this_cpu_*() - evolving various other
  operations over time.  During the process, we also accumulated other
  inconsistent operations.

  This pull request contains Christoph's patches to clean up the
  duplicate accessor situation.  __get_cpu_var() uses are replaced with
  with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

  Unfortunately, the former sometimes is tricky thanks to C being a bit
  messy with the distinction between lvalues and pointers, which led to
  a rather ugly solution for cpumask_var_t involving the introduction of
  this_cpu_cpumask_var_ptr().

  This converts most of the uses but not all.  Christoph will follow up
  with the remaining conversions in this merge window and hopefully
  remove the obsolete accessors"

* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
  irqchip: Properly fetch the per cpu offset
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
  ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
  Revert "powerpc: Replace __get_cpu_var uses"
  percpu: Remove __this_cpu_ptr
  clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
  sparc: Replace __get_cpu_var uses
  avr32: Replace __get_cpu_var with __this_cpu_write
  blackfin: Replace __get_cpu_var uses
  tile: Use this_cpu_ptr() for hardware counters
  tile: Replace __get_cpu_var uses
  powerpc: Replace __get_cpu_var uses
  alpha: Replace __get_cpu_var
  ia64: Replace __get_cpu_var uses
  s390: cio driver &__get_cpu_var replacements
  s390: Replace __get_cpu_var uses
  mips: Replace __get_cpu_var uses
  MIPS: Replace __get_cpu_var uses in FPU emulator.
  arm: Replace __this_cpu_ptr with raw_cpu_ptr
  ...
2014-10-15 07:48:18 +02:00
Li RongQing
02ea80741a ipv6: remove aca_lock spinlock from struct ifacaddr6
no user uses this lock.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 13:15:15 -04:00
Daniel Borkmann
b69040d8e3 net: sctp: fix panic on duplicate ASCONF chunks
When receiving a e.g. semi-good formed connection scan in the
form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---------------- ASCONF_a; ASCONF_b ----------------->

... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!

The problem is that good-formed ASCONF chunks that we reply with
ASCONF_ACK chunks are cached per serial. Thus, when we receive a
same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
not need to process them again on the server side (that was the
idea, also proposed in the RFC). Instead, we know it was cached
and we just resend the cached chunk instead. So far, so good.

Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():

While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit 2e3216cd54
changed the precedence, so that as long as we get bundled, incoming
chunks we try possible bundling on outgoing queue as well. Before
this commit, we would just flush the output queue.

Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk->list pointers and requeue the same ASCONF_ACK chunk
another time. Since we process ASCONF_b, it's correctly marked
with end_of_packet and we enforce an uncork, and thus flush, thus
crashing the kernel.

Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 12:46:22 -04:00
Daniel Borkmann
9de7922bc7 net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
Commit 6f4c618ddb ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
special crafted ASCONF chunk, even up to pre 2.6.12 kernels:

skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
 end:0x440 dev:<NULL>
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
 <IRQ>
 [<ffffffff8144fb1c>] skb_put+0x5c/0x70
 [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
 [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
 [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
 [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
 [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
 [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff81497078>] ip_local_deliver+0x98/0xa0
 [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
 [<ffffffff81496ac5>] ip_rcv+0x275/0x350
 [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
 [<ffffffff81460588>] netif_receive_skb+0x58/0x60

This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ------------------ ASCONF; UNKNOWN ------------------>

... where ASCONF chunk of length 280 contains 2 parameters ...

  1) Add IP address parameter (param length: 16)
  2) Add/del IP address parameter (param length: 255)

... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.

The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.

In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.

When walking to the next parameter this time, we proceed
with ...

  length = ntohs(asconf_param->param_hdr.length);
  asconf_param = (void *)asconf_param + length;

... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.

Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.

Joint work with Vlad Yasevich.

Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 12:46:22 -04:00
Andy Shevchenko
5df1415aee lib80211: remove unused print_ssid()
In kernel we have %*pE specifier to print an escaped buffer.  All users
now switched to that approach.

This fixes a bug as well.  The current implementation wrongly prints
octal numbers: only two first digits are used in case when 3 are
required and the rest of the string ends up cut off.

Additionally by default the \f, \v, \a, and \e are escaped to their
alphabetic representation.  It's safe to do since it is currently used
for messaging only.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "John W . Linville" <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:27 +02:00
Pablo Neira Ayuso
7210e4e38f netfilter: nf_tables: restrict nat/masq expressions to nat chain type
This adds the missing validation code to avoid the use of nat/masq from
non-nat chains. The validation assumes two possible configuration
scenarios:

1) Use of nat from base chain that is not of nat type. Reject this
   configuration from the nft_*_init() path of the expression.

2) Use of nat from non-base chain. In this case, we have to wait until
   the non-base chain is referenced by at least one base chain via
   jump/goto. This is resolved from the nft_*_validate() path which is
   called from nf_tables_check_loops().

The user gets an -EOPNOTSUPP in both cases.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-13 20:42:00 +02:00
Linus Torvalds
ca321885b0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "This set fixes a bunch of fallout from the changes that went in during
  this merge window, particularly:

   - Fix fsl_pq_mdio (Claudiu Manoil) and fm10k (Pranith Kumar) build
     failures.

   - Several networking drivers do atomic_set() on page counts where
     that's not exactly legal.  From Eric Dumazet.

   - Make __skb_flow_get_ports() work cleanly with unaligned data, from
     Alexander Duyck.

   - Fix some kernel-doc buglets in rfkill and netlabel, from Fabian
     Frederick.

   - Unbalanced enable_irq_wake usage in bcmgenet and systemport
     drivers, from Florian Fainelli.

   - pxa168_eth needs to depend on HAS_DMA, from Geert Uytterhoeven.

   - Multi-dequeue in the qdisc layer severely bypasses the fairness
     limits the previous code used to enforce, reintroduce in a way that
     at the same time doesn't compromise bulk dequeue opportunities.
     From Jesper Dangaard Brouer.

   - macvlan receive path unnecessarily hops through a softirq by using
     netif_rx() instead of netif_receive_skb().  From Jason Baron"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (51 commits)
  net: systemport: avoid unbalanced enable_irq_wake calls
  net: bcmgenet: avoid unbalanced enable_irq_wake calls
  net: bcmgenet: fix off-by-one in incrementing read pointer
  net: fix races in page->_count manipulation
  mlx4: fix race accessing page->_count
  ixgbe: fix race accessing page->_count
  igb: fix race accessing page->_count
  fm10k: fix race accessing page->_count
  net/phy: micrel: Add clock support for KSZ8021/KSZ8031
  flow-dissector: Fix alignment issue in __skb_flow_get_ports
  net: filter: fix the comments
  Documentation: replace __sk_run_filter with __bpf_prog_run
  macvlan: optimize the receive path
  macvlan: pass 'bool' type to macvlan_count_rx()
  drivers: net: xgene: Add 10GbE ethtool support
  drivers: net: xgene: Add 10GbE support
  drivers: net: xgene: Preparing for adding 10GbE support
  dtb: Add 10GbE node to APM X-Gene SoC device tree
  Documentation: dts: Update section header for APM X-Gene
  MAINTAINERS: Update APM X-Gene section
  ...
2014-10-11 21:19:00 -04:00
David S. Miller
7b6fa1eef6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter fixes for net-next

This batch contains two fixes for what you have in your net-next,
they are:

1) Remove nf_send_reset6() from header file. This function now resides
   in the nf_reject_ipv6 module. Reported by Eric Dumazet.

2) Fix wrong NFT_REJECT_ICMPX_MAX definition and adjust code to fix
   errors reported by Dan Carpenter's static analysis tools.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10 15:01:09 -04:00
Linus Torvalds
c798360cd1 Merge branch 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu updates from Tejun Heo:
 "A lot of activities on percpu front.  Notable changes are...

   - percpu allocator now can take @gfp.  If @gfp doesn't contain
     GFP_KERNEL, it tries to allocate from what's already available to
     the allocator and a work item tries to keep the reserve around
     certain level so that these atomic allocations usually succeed.

     This will replace the ad-hoc percpu memory pool used by
     blk-throttle and also be used by the planned blkcg support for
     writeback IOs.

     Please note that I noticed a bug in how @gfp is interpreted while
     preparing this pull request and applied the fix 6ae833c7fe
     ("percpu: fix how @gfp is interpreted by the percpu allocator")
     just now.

   - percpu_ref now uses longs for percpu and global counters instead of
     ints.  It leads to more sparse packing of the percpu counters on
     64bit machines but the overhead should be negligible and this
     allows using percpu_ref for refcnting pages and in-memory objects
     directly.

   - The switching between percpu and single counter modes of a
     percpu_ref is made independent of putting the base ref and a
     percpu_ref can now optionally be initialized in single or killed
     mode.  This allows avoiding percpu shutdown latency for cases where
     the refcounted objects may be synchronously created and destroyed
     in rapid succession with only a fraction of them reaching fully
     operational status (SCSI probing does this when combined with
     blk-mq support).  It's also planned to be used to implement forced
     single mode to detect underflow more timely for debugging.

  There's a separate branch percpu/for-3.18-consistent-ops which cleans
  up the duplicate percpu accessors.  That branch causes a number of
  conflicts with s390 and other trees.  I'll send a separate pull
  request w/ resolutions once other branches are merged"

* 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
  percpu: fix how @gfp is interpreted by the percpu allocator
  blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
  percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
  percpu_ref: add PERCPU_REF_INIT_* flags
  percpu_ref: decouple switching to percpu mode and reinit
  percpu_ref: decouple switching to atomic mode and killing
  percpu_ref: add PCPU_REF_DEAD
  percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
  percpu_ref: replace pcpu_ prefix with percpu_
  percpu_ref: minor code and comment updates
  percpu_ref: relocate percpu_ref_reinit()
  Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
  Revert "percpu: free percpu allocation info for uniprocessor system"
  percpu-refcount: make percpu_ref based on longs instead of ints
  percpu-refcount: improve WARN messages
  percpu: fix locking regression in the failure path of pcpu_alloc()
  percpu-refcount: add @gfp to percpu_ref_init()
  proportions: add @gfp to init functions
  percpu_counter: add @gfp to percpu_counter_init()
  percpu_counter: make percpu_counters_lock irq-safe
  ...
2014-10-10 07:26:02 -04:00
Luciano Coelho
0f791eb47f mac80211: allow channel switch with multiple channel contexts
Channel switch with multiple channel contexts should now work fine.
Remove check that disallows switches when multiple contexts are in
use.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:09 +02:00
Luciano Coelho
f1d65583bc mac80211: add post_channel_switch driver operation
As a counterpart to the pre_channel_switch operation, add a
post_channel_switch operation.  This allows the drivers to go back to
a normal configuration after the channel switch is completed.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:09 +02:00
Luciano Coelho
6d027bcc8a mac80211: add pre_channel_switch driver operation
Some drivers may need to prepare for a channel switch also when it is
initiated from the remote side (eg. station, P2P client).  To make
this possible, add a generic callback that can be called for all
interface types.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:08 +02:00
Luciano Coelho
2ba45384e5 mac80211: add device_timestamp to the ieee80211_channel_switch struct
Some devices may need the device timestamp in order to synchronize the
channel switch.  To pass this value back to the driver, add it to the
channel switch structure and copy the device_timestamp value received
in the rx info structure into it.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:08 +02:00
Henning Rogge
66be7d2bcd cfg80211: add ops to query mesh proxy path table
Add two new cfg80211 operations for querying a table with proxied mesh
paths.

Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:19:07 +02:00
Linus Torvalds
35a9ad8af0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Most notable changes in here:

   1) By far the biggest accomplishment, thanks to a large range of
      contributors, is the addition of multi-send for transmit.  This is
      the result of discussions back in Chicago, and the hard work of
      several individuals.

      Now, when the ->ndo_start_xmit() method of a driver sees
      skb->xmit_more as true, it can choose to defer the doorbell
      telling the driver to start processing the new TX queue entires.

      skb->xmit_more means that the generic networking is guaranteed to
      call the driver immediately with another SKB to send.

      There is logic added to the qdisc layer to dequeue multiple
      packets at a time, and the handling mis-predicted offloads in
      software is now done with no locks held.

      Finally, pktgen is extended to have a "burst" parameter that can
      be used to test a multi-send implementation.

      Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
      virtio_net

      Adding support is almost trivial, so export more drivers to
      support this optimization soon.

      I want to thank, in no particular or implied order, Jesper
      Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
      Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
      David Tat, Hannes Frederic Sowa, and Rusty Russell.

   2) PTP and timestamping support in bnx2x, from Michal Kalderon.

   3) Allow adjusting the rx_copybreak threshold for a driver via
      ethtool, and add rx_copybreak support to enic driver.  From
      Govindarajulu Varadarajan.

   4) Significant enhancements to the generic PHY layer and the bcm7xxx
      driver in particular (EEE support, auto power down, etc.) from
      Florian Fainelli.

   5) Allow raw buffers to be used for flow dissection, allowing drivers
      to determine the optimal "linear pull" size for devices that DMA
      into pools of pages.  The objective is to get exactly the
      necessary amount of headers into the linear SKB area pre-pulled,
      but no more.  The new interface drivers use is eth_get_headlen().
      From WANG Cong, with driver conversions (several had their own
      by-hand duplicated implementations) by Alexander Duyck and Eric
      Dumazet.

   6) Support checksumming more smoothly and efficiently for
      encapsulations, and add "foo over UDP" facility.  From Tom
      Herbert.

   7) Add Broadcom SF2 switch driver to DSA layer, from Florian
      Fainelli.

   8) eBPF now can load programs via a system call and has an extensive
      testsuite.  Alexei Starovoitov and Daniel Borkmann.

   9) Major overhaul of the packet scheduler to use RCU in several major
      areas such as the classifiers and rate estimators.  From John
      Fastabend.

  10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
      Duyck.

  11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
      Dumazet.

  12) Add Datacenter TCP congestion control algorithm support, From
      Florian Westphal.

  13) Reorganize sk_buff so that __copy_skb_header() is significantly
      faster.  From Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
  netlabel: directly return netlbl_unlabel_genl_init()
  net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
  net: description of dma_cookie cause make xmldocs warning
  cxgb4: clean up a type issue
  cxgb4: potential shift wrapping bug
  i40e: skb->xmit_more support
  net: fs_enet: Add NAPI TX
  net: fs_enet: Remove non NAPI RX
  r8169:add support for RTL8168EP
  net_sched: copy exts->type in tcf_exts_change()
  wimax: convert printk to pr_foo()
  af_unix: remove 0 assignment on static
  ipv6: Do not warn for informational ICMP messages, regardless of type.
  Update Intel Ethernet Driver maintainers list
  bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
  tipc: fix bug in multicast congestion handling
  net: better IFF_XMIT_DST_RELEASE support
  net/mlx4_en: remove NETDEV_TX_BUSY
  3c59x: fix bad split of cpu_to_le32(pci_map_single())
  net: bcmgenet: fix Tx ring priority programming
  ...
2014-10-08 21:40:54 -04:00
David S. Miller
64b1f00a08 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-10-08 16:22:22 -04:00
Linus Torvalds
d0cd84817c dmaengine-3.17
1/ Step down as dmaengine maintainer see commit 08223d80df "dmaengine
    maintainer update"
 
 2/ Removal of net_dma, as it has been marked 'broken' since 3.13 (commit
    7787380336 "net_dma: mark broken"), without reports of performance
    regression.
 
 3/ Miscellaneous fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUKDLKAAoJEB7SkWpmfYgC7wwP/iNHqRjf1suMUTBIF3P6Hgbe
 VCUwh0IkuujMPDG46WRn6cYzarRxVPLoGaLHLPszgjI6pmGPVv19wqeDOlUxtcmr
 0iQWEWv/zqseaAIW+4gj/WYCyMgKil49EUBJKCZCfNmIaad+e0pr8f0uE5yOkHPM
 tqWoZERu9A4dlXGr1TjeOZVzdnPrCt92MrLDN6ZZ6tMuJaEc5PauaLxKTeGy5fYj
 UB+k1xJQzECbsYfpB+uCVYl5/qPO1rNyuBYS8THCsW+JYmrbbfH2kkF2lo2FaUpO
 8Yd50FtzXHKWwAt7BzfIwU2M7x0wRmryrC/xsQi6M+WmVeHYvvHUIpzaA66xRZ5x
 fCy3Fu8sEnmnmboAbh2v2c5uTycqRl2xPzbpLAuxglloXIxzi3ckp6ESF/Z4SldH
 oxIoEievN7lah3vKgvlHZYcWDzrYr8EKf/EzFe9RqDBQDKtzDzre1H9Uivr387Vm
 uFUcGHYG/GXuX47C7EUsMtaSW2UEoR2ytw/HR6CKFPTVXwAzEO6kA9vg0EqL0iIq
 2wVLgavlZuwegmaUBgnr+bgVZMvVN7OU7fAIRVe5xNO6itrPKvheSlQthmRiiq9C
 uzOu4PS6PexqzHUNPCcJpCsj+lawmCSrE0bxtPzTA/CQInVgWs219V9+W5Gn/0YA
 EARN9k6ueX9PZPQrPQLm
 =BBBv
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine

Pull dmaengine updates from Dan Williams:
 "Even though this has fixes marked for -stable, given the size and the
  needed conflict resolutions this is 3.18-rc1/merge-window material.

  These patches have been languishing in my tree for a long while.  The
  fact that I do not have the time to do proper/prompt maintenance of
  this tree is a primary factor in the decision to step down as
  dmaengine maintainer.  That and the fact that the bulk of drivers/dma/
  activity is going through Vinod these days.

  The net_dma removal has not been in -next.  It has developed simple
  conflicts against mainline and net-next (for-3.18).

  Continuing thanks to Vinod for staying on top of drivers/dma/.

  Summary:

   1/ Step down as dmaengine maintainer see commit 08223d80df
      "dmaengine maintainer update"

   2/ Removal of net_dma, as it has been marked 'broken' since 3.13
      (commit 7787380336 "net_dma: mark broken"), without reports of
      performance regression.

   3/ Miscellaneous fixes"

* tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
  net: make tcp_cleanup_rbuf private
  net_dma: revert 'copied_early'
  net_dma: simple removal
  dmaengine maintainer update
  dmatest: prevent memory leakage on error path in thread
  ioat: Use time_before_jiffies()
  dmaengine: fix xor sources continuation
  dma: mv_xor: Rename __mv_xor_slot_cleanup() to mv_xor_slot_cleanup()
  dma: mv_xor: Remove all callers of mv_xor_slot_cleanup()
  dma: mv_xor: Remove unneeded mv_xor_clean_completed_slots() call
  ioat: Use pci_enable_msix_exact() instead of pci_enable_msix()
  drivers: dma: Include appropriate header file in dca.c
  drivers: dma: Mark functions as static in dma_v3.c
  dma: mv_xor: Add DMA API error checks
  ioat/dca: Use dev_is_pci() to check whether it is pci device
2014-10-07 20:39:25 -04:00
Pablo Neira Ayuso
91c1a09b33 netfilter: kill nf_send_reset6() from include/net/netfilter/ipv6/nf_reject.h
nf_send_reset6() now resides in net/ipv6/netfilter/nf_reject_ipv6.c

Fixes: c8d7b98 ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Eric Dumazet <edumazet@google.com>
2014-10-07 19:58:07 +02:00
Andy Zhou
7c5df8fa19 openvswitch: fix a compilation error when CONFIG_INET is not setW!
Fix a openvswitch compilation error when CONFIG_INET is not set:

=====================================================
   In file included from include/net/geneve.h:4:0,
                       from net/openvswitch/flow_netlink.c:45:
		          include/net/udp_tunnel.h: In function 'udp_tunnel_handle_offloads':
			  >> include/net/udp_tunnel.h💯2: error: implicit declaration of function 'iptunnel_handle_offloads' [-Werror=implicit-function-declaration]
			  >>      return iptunnel_handle_offloads(skb, udp_csum, type);
			  >>           ^
			  >>           >> include/net/udp_tunnel.h💯2: warning: return makes pointer from integer without a cast
			  >>           >>    cc1: some warnings being treated as errors

=====================================================

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:10:49 -04:00
Hannes Frederic Sowa
812918c464 ipv6: make fib6 serial number per namespace
Try to reduce number of possible fn_sernum mutation by constraining them
to their namespace.

Also remove rt_genid which I forgot to remove in 705f1c869d ("ipv6:
remove rt6i_genid").

Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:02:30 -04:00
Hannes Frederic Sowa
42b1870646 ipv6: make rt_sernum atomic and serial number fields ordinary ints
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:02:30 -04:00
Hannes Frederic Sowa
94b2cfe02b ipv6: minor fib6 cleanups like type safety, bool conversion, inline removal
Also renamed struct fib6_walker_t to fib6_walker and enum fib_walk_state_t
to fib6_walk_state as recommended by Cong Wang.

Cc: Cong Wang <cwang@twopensource.com>
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:02:30 -04:00
John Fastabend
82a470f111 net: sched: remove tcf_proto from ematch calls
This removes the tcf_proto argument from the ematch code paths that
only need it to reference the net namespace. This allows simplifying
qdisc code paths especially when we need to tear down the ematch
from an RCU callback. In this case we can not guarentee that the
tcf_proto structure is still valid.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 18:02:32 -04:00
Eric Dumazet
f2600cf02b net: sched: avoid costly atomic operation in fq_dequeue()
Standard qdisc API to setup a timer implies an atomic operation on every
packet dequeue : qdisc_unthrottled()

It turns out this is not really needed for FQ, as FQ has no concept of
global qdisc throttling, being a qdisc handling many different flows,
some of them can be throttled, while others are not.

Fix is straightforward : add a 'bool throttle' to
qdisc_watchdog_schedule_ns(), and remove calls to qdisc_unthrottled()
in sch_fq.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:55:10 -04:00
Jesse Gross
f579668406 openvswitch: Add support for Geneve tunneling.
The Openvswitch implementation is completely agnostic to the options
that are in use and can handle newly defined options without
further work. It does this by simply matching on a byte array
of options and allowing userspace to setup flows on this array.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Singed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:32:21 -04:00
Andy Zhou
0b5e8b8eea net: Add Geneve tunneling protocol driver
This adds a device level support for Geneve -- Generic Network
Virtualization Encapsulation. The protocol is documented at
http://tools.ietf.org/html/draft-gross-geneve-01

Only protocol layer Geneve support is provided by this driver.
Openvswitch can be used for configuring, set up and tear down
functional Geneve tunnels.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:32:20 -04:00
Vlad Yasevich
bdf6fa52f0 sctp: handle association restarts when the socket is closed.
Currently association restarts do not take into consideration the
state of the socket.  When a restart happens, the current assocation
simply transitions into established state.  This creates a condition
where a remote system, through a the restart procedure, may create a
local association that is no way reachable by user.  The conditions
to trigger this are as follows:
  1) Remote does not acknoledge some data causing data to remain
     outstanding.
  2) Local application calls close() on the socket.  Since data
     is still outstanding, the association is placed in SHUTDOWN_PENDING
     state.  However, the socket is closed.
  3) The remote tries to create a new association, triggering a restart
     on the local system.  The association moves from SHUTDOWN_PENDING
     to ESTABLISHED.  At this point, it is no longer reachable by
     any socket on the local system.

This patch addresses the above situation by moving the newly ESTABLISHED
association into SHUTDOWN-SENT state and bundling a SHUTDOWN after
the COOKIE-ACK chunk.  This way, the restarted associate immidiately
enters the shutdown procedure and forces the termination of the
unreachable association.

Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:21:45 -04:00
David S. Miller
a4b4a2b7f9 Merge tag 'master-2014-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-10-03

Please pull tihs batch of updates intended for the 3.18 stream!

For the iwlwifi bits, Emmanuel says:

"I have here a few things that depend on the latest mac80211's changes:
RRM, TPC, Quiet Period etc...  Eyal keeps improving our rate control
and we have a new device ID. This last patch should probably have
gone to wireless.git, but at that stage, I preferred to send it to
-next and CC stable."

For (most of) the Atheros bits, Kalle says:

"The only new feature is testmode support from me. Ben added a new method
to crash the firmware with an assert for debug purposes. As usual, we
have lots of smaller fixes from Michal. Matteo fixed a Kconfig
dependency with debugfs. I fixed some warnings recently added to
checkpatch."

For the NFC bits, Samuel says:

"We've had major updates for TI and ST Microelectronics drivers, and a
few NCI related changes.

For TI's trf7970a driver:

- Target mode support for trf7970a
- Suspend/resume support for trf7970a
- DT properties additions to handle different quirks
- A bunch of fixes for smartphone IOP related issues

For ST Microelectronics' ST21NFCA and ST21NFCB drivers:

- ISO15693 support for st21nfcb
- checkpatch and sparse related warning fixes
- Code cleanups and a few minor fixes

Finally, Marvell added ISO15693 support to the NCI stack, together with a
couple of NCI fixes."

For the Bluetooth bits, Johan says:

"This 3.18 pull request replaces the one I did on Monday ("bluetooth-next
2014-09-22", which hasn't been pulled yet). The additions since the last
request are:

 - SCO connection fix for devices not supporting eSCO
 - Cleanups regarding the SCO establishment logic
 - Remove unnecessary return value from logging functions
 - Header compression fix for 6lowpan
 - Cleanups to the ieee802154/mrf24j40 driver

Here's a copy from previous request that this one replaces:

'
Here are some more patches for 3.18. They include various fixes to the
btusb HCI driver, a fix for LE SMP, as well as adding Jukka to the
MAINTAINERS file for generic 6LoWPAN (as requested by Alexander Aring).

I've held on to this pull request a bit since we were waiting for a SCO
related fix to get sorted out first. However, since the merge window is
getting closer I decided not to wait for it. If we do get the fix sorted
out there'll probably be a second small pull request later this week.
'"

And,

"Unless 3.17 gets delayed this will probably be our last -next pull request for
3.18. We've got:

  - New Marvell hardware supportr
  - Multicast support for 6lowpan
  - Several of 6lowpan fixes & cleanups
  - Fix for a (false-positive) lockdep warning in L2CAP
  - Minor btusb cleanup"

On top of all that comes the usual sort of updates to ath5k, ath9k,
ath10k, brcmfmac, mwifiex, and wil6210.  This time around there are
also a number of rtlwifi updates to enable some new hardware and
to reconcile the in-kernel drivers with some newer releases of the
Realtek vendor drivers.  Also of note is some device tree work for
the bcma bus.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05 21:34:39 -04:00
David S. Miller
61b37d2f54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains another batch with Netfilter/IPVS updates
for net-next, they are:

1) Add abstracted ICMP codes to the nf_tables reject expression. We
   introduce four reasons to reject using ICMP that overlap in IPv4
   and IPv6 from the semantic point of view. This should simplify the
   maintainance of dual stack rule-sets through the inet table.

2) Move nf_send_reset() functions from header files to per-family
   nf_reject modules, suggested by Patrick McHardy.

3) We have to use IS_ENABLED(CONFIG_BRIDGE_NETFILTER) everywhere in the
   code now that br_netfilter can be modularized. Convert remaining spots
   in the network stack code.

4) Use rcu_barrier() in the nf_tables module removal path to ensure that
   we don't leave object that are still pending to be released via
   call_rcu (that may likely result in a crash).

5) Remove incomplete arch 32/64 compat from nft_compat. The original (bad)
   idea was to probe the word size based on the xtables match/target info
   size, but this assumption is wrong when you have to dump the information
   back to userspace.

6) Allow to filter from prerouting and postrouting in the nf_tables bridge.
   In order to emulate the ebtables NAT chains (which are actually simple
   filter chains with no special semantics), we have support filtering from
   this hooks too.

7) Add explicit module dependency between xt_physdev and br_netfilter.
   This provides a way to detect if the user needs br_netfilter from
   the configuration path. This should reduce the breakage of the
   br_netfilter modularization.

8) Cleanup coding style in ip_vs.h, from Simon Horman.

9) Fix crash in the recently added nf_tables masq expression. We have
   to register/unregister the notifiers to clean up the conntrack table
   entries from the module init/exit path, not from the rule addition /
   deletion path. From Arturo Borrero.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05 21:32:37 -04:00
Sébastien Barré
dd3619f2ed Removed unused inet6 address state
the inet6 state INET6_IFADDR_STATE_UP only appeared in its definition.

Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-04 20:37:17 -04:00
Tom Herbert
37dd024779 gue: Receive side for Generic UDP Encapsulation
This patch adds support receiving for GUE packets in the fou module. The
fou module now supports direct foo-over-udp (no encapsulation header)
and GUE. To support this a type parameter is added to the fou netlink
parameters.

For a GUE socket we define gue_udp_recv, gue_gro_receive, and
gue_gro_complete to handle the specifics of the GUE protocol. Most
of the code to manage and configure sockets is common with the fou.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 16:53:33 -07:00
Eric Dumazet
55a93b3ea7 qdisc: validate skb without holding lock
Validation of skb can be pretty expensive :

GSO segmentation and/or checksum computations.

We can do this without holding qdisc lock, so that other cpus
can queue additional packets.

Trick is that requeued packets were already validated, so we carry
a boolean so that sch_direct_xmit() can validate a fresh skb list,
or directly use an old one.

Tested on 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads
host.

Turning TSO on or off had no effect on throughput, only few more cpu
cycles. Lock contention on qdisc lock disappeared.

Same if disabling TX checksum offload.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:36:11 -07:00
Jesper Dangaard Brouer
5772e9a346 qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE
Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
sending/processing an entire skb list.

This patch implements qdisc bulk dequeue, by allowing multiple packets
to be dequeued in dequeue_skb().

The optimization principle for this is two fold, (1) to amortize
locking cost and (2) avoid expensive tailptr update for notifying HW.
 (1) Several packets are dequeued while holding the qdisc root_lock,
amortizing locking cost over several packet.  The dequeued SKB list is
processed under the TXQ lock in dev_hard_start_xmit(), thus also
amortizing the cost of the TXQ lock.
 (2) Further more, dev_hard_start_xmit() will utilize the skb->xmit_more
API to delay HW tailptr update, which also reduces the cost per
packet.

One restriction of the new API is that every SKB must belong to the
same TXQ.  This patch takes the easy way out, by restricting bulk
dequeue to qdisc's with the TCQ_F_ONETXQUEUE flag, that specifies the
qdisc only have attached a single TXQ.

Some detail about the flow; dev_hard_start_xmit() will process the skb
list, and transmit packets individually towards the driver (see
xmit_one()).  In case the driver stops midway in the list, the
remaining skb list is returned by dev_hard_start_xmit().  In
sch_direct_xmit() this returned list is requeued by dev_requeue_skb().

To avoid overshooting the HW limits, which results in requeuing, the
patch limits the amount of bytes dequeued, based on the drivers BQL
limits.  In-effect bulking will only happen for BQL enabled drivers.

Small amounts for extra HoL blocking (2x MTU/0.24ms) were
measured at 100Mbit/s, with bulking 8 packets, but the
oscillating nature of the measurement indicate something, like
sched latency might be causing this effect. More comparisons
show, that this oscillation goes away occationally. Thus, we
disregard this artifact completely and remove any "magic" bulking
limit.

For now, as a conservative approach, stop bulking when seeing TSO and
segmented GSO packets.  They already benefit from bulking on their own.
A followup patch add this, to allow easier bisect-ability for finding
regressions.

Jointed work with Hannes, Daniel and Florian.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:37:06 -07:00
David S. Miller
739e4a758e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	net/netfilter/nfnetlink.c

Both r8152 and nfnetlink conflicts were simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-02 11:25:43 -07:00
Simon Horman
07dcc686fa ipvs: Clean up comment style in ip_vs.h
* Consistently use the multi-line comment style for networking code:

  /* This
   * That
   * The other thing
   */

* Use single-line comment style for comments with only one line of text.

* In general follow the leading '*' of each line of a comment with a
  single space and then text.

* Add missing line break between functions, remove double line break,
  align comments to previous lines whenever possible.

Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:58 +02:00
Pablo Neira Ayuso
4b7fd5d97e netfilter: explicit module dependency between br_netfilter and physdev
You can use physdev to match the physical interface enslaved to the
bridge device. This information is stored in skb->nf_bridge and it is
set up by br_netfilter. So, this is only available when iptables is
used from the bridge netfilter path.

Since 34666d4 ("netfilter: bridge: move br_netfilter out of the core"),
the br_netfilter code is modular. To reduce the impact of this change,
we can autoload the br_netfilter if the physdev match is used since
we assume that the users need br_netfilter in place.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:57 +02:00
Pablo Neira Ayuso
c8d7b98bec netfilter: move nf_send_resetX() code to nf_reject_ipvX modules
Move nf_send_reset() and nf_send_reset6() to nf_reject_ipv4 and
nf_reject_ipv6 respectively. This code is shared by x_tables and
nf_tables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:49 +02:00
Pablo Neira Ayuso
51b0a5d8c2 netfilter: nft_reject: introduce icmp code abstraction for inet and bridge
This patch introduces the NFT_REJECT_ICMPX_UNREACH type which provides
an abstraction to the ICMP and ICMPv6 codes that you can use from the
inet and bridge tables, they are:

* NFT_REJECT_ICMPX_NO_ROUTE: no route to host - network unreachable
* NFT_REJECT_ICMPX_PORT_UNREACH: port unreachable
* NFT_REJECT_ICMPX_HOST_UNREACH: host unreachable
* NFT_REJECT_ICMPX_ADMIN_PROHIBITED: administratevely prohibited

You can still use the specific codes when restricting the rule to match
the corresponding layer 3 protocol.

I decided to not overload the existing NFT_REJECT_ICMP_UNREACH to have
different semantics depending on the table family and to allow the user
to specify ICMP family specific codes if they restrict it to the
corresponding family.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:29:57 +02:00
WANG Cong
a0efb80ce3 net_sched: avoid calling tcf_unbind_filter() in call_rcu callback
This fixes the following crash:

[   63.976822] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   63.980094] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 3.17.0-rc6+ #648
[   63.980094] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   63.980094] task: ffff880117dea690 ti: ffff880117dfc000 task.ti: ffff880117dfc000
[   63.980094] RIP: 0010:[<ffffffff817e6d07>]  [<ffffffff817e6d07>] u32_destroy_key+0x27/0x6d
[   63.980094] RSP: 0018:ffff880117dffcc0  EFLAGS: 00010202
[   63.980094] RAX: ffff880117dea690 RBX: ffff8800d02e0820 RCX: 0000000000000000
[   63.980094] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b
[   63.980094] RBP: ffff880117dffcd0 R08: 0000000000000000 R09: 0000000000000000
[   63.980094] R10: 00006c0900006ba8 R11: 00006ba100006b9d R12: 0000000000000001
[   63.980094] R13: ffff8800d02e0898 R14: ffffffff817e6d4d R15: ffff880117387a30
[   63.980094] FS:  0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
[   63.980094] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   63.980094] CR2: 00007f07e6732fed CR3: 000000011665b000 CR4: 00000000000006e0
[   63.980094] Stack:
[   63.980094]  ffff88011a9cd300 ffffffff82051ac0 ffff880117dffce0 ffffffff817e6d68
[   63.980094]  ffff880117dffd70 ffffffff810cb4c7 ffffffff810cb3cd ffff880117dfffd8
[   63.980094]  ffff880117dea690 ffff880117dea690 ffff880117dfffd8 000000000000000a
[   63.980094] Call Trace:
[   63.980094]  [<ffffffff817e6d68>] u32_delete_key_freepf_rcu+0x1b/0x1d
[   63.980094]  [<ffffffff810cb4c7>] rcu_process_callbacks+0x3bb/0x691
[   63.980094]  [<ffffffff810cb3cd>] ? rcu_process_callbacks+0x2c1/0x691
[   63.980094]  [<ffffffff817e6d4d>] ? u32_destroy_key+0x6d/0x6d
[   63.980094]  [<ffffffff810780a4>] __do_softirq+0x142/0x323
[   63.980094]  [<ffffffff810782a8>] run_ksoftirqd+0x23/0x53
[   63.980094]  [<ffffffff81092126>] smpboot_thread_fn+0x203/0x221
[   63.980094]  [<ffffffff81091f23>] ? smpboot_unpark_thread+0x33/0x33
[   63.980094]  [<ffffffff8108e44d>] kthread+0xc9/0xd1
[   63.980094]  [<ffffffff819e00ea>] ? do_wait_for_common+0xf8/0x125
[   63.980094]  [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61
[   63.980094]  [<ffffffff819e43ec>] ret_from_fork+0x7c/0xb0
[   63.980094]  [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61

tp could be freed in call_rcu callback too, the order is not guaranteed.

John Fastabend says:

====================
Its worth noting why this is safe. Any running schedulers will either
read the valid class field or it will be zeroed.

All schedulers today when the class is 0 do a lookup using the
same call used by the tcf_exts_bind(). So even if we have a running
classifier hit the null class pointer it will do a lookup and get
to the same result. This is particularly fragile at the moment because
the only way to verify this is to audit the schedulers call sites.
====================

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 22:00:42 -04:00
Tom Herbert
8bce6d7d0d udp: Generalize skb_udp_segment
skb_udp_segment is the function called from udp4_ufo_fragment to
segment a UDP tunnel packet. This function currently assumes
segmentation is transparent Ethernet bridging (i.e. VXLAN
encapsulation). This patch generalizes the function to
operate on either Ethertype or IP protocol.

The inner_protocol field must be set to the protocol of the inner
header. This can now be either an Ethertype or an IP protocol
(in a union). A new flag in the skbuff indicates which type is
effective. skb_set_inner_protocol and skb_set_inner_ipproto
helper functions were added to set the inner_protocol. These
functions are called from the point where the tunnel encapsulation
is occuring.

When skb_udp_tunnel_segment is called, the function to segment the
inner packet is selected based on the inner IP or Ethertype. In the
case of an IP protocol encapsulation, the function is derived from
inet[6]_offloads. In the case of Ethertype, skb->protocol is
set to the inner_protocol and skb_mac_gso_segment is called. (GRE
currently does this, but it might be possible to lookup the protocol
in offload_base and call the appropriate segmenation function
directly).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 21:35:51 -04:00
Li RongQing
a12a601ed1 tcp: Change tcp_slow_start function to return void
No caller uses the return value, so make this function return void.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 17:09:16 -04:00
Hannes Frederic Sowa
705f1c869d ipv6: remove rt6i_genid
Eric Dumazet noticed that all no-nonexthop or no-gateway routes which
are already marked DST_HOST (e.g. input routes routes) will always be
invalidated during sk_dst_check. Thus per-socket dst caching absolutely
had no effect and early demuxing had no effect.

Thus this patch removes rt6i_genid: fn_sernum already gets modified during
add operations, so we only must ensure we mutate fn_sernum during ipv6
address remove operations. This is a fairly cost extensive operations,
but address removal should not happen that often. Also our mtu update
functions do the same and we heard no complains so far. xfrm policy
changes also cause a call into fib6_flush_trees. Also plug a hole in
rt6_info (no cacheline changes).

I verified via tracing that this change has effect.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 14:00:48 -04:00
John Fastabend
b0ab6f9275 net: sched: enable per cpu qstats
After previous patches to simplify qstats the qstats can be
made per cpu with a packed union in Qdisc struct.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 01:02:26 -04:00
John Fastabend
6401585366 net: sched: restrict use of qstats qlen
This removes the use of qstats->qlen variable from the classifiers
and makes it an explicit argument to gnet_stats_copy_queue().

The qlen represents the qdisc queue length and is packed into
the qstats at the last moment before passnig to user space. By
handling it explicitely we avoid, in the percpu stats case, having
to figure out which per_cpu variable to put it in.

It would probably be best to remove it from qstats completely
but qstats is a user space ABI and can't be broken. A future
patch could make an internal only qstats structure that would
avoid having to allocate an additional u32 variable on the
Qdisc struct. This would make the qstats struct 128bits instead
of 128+32.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 01:02:26 -04:00
John Fastabend
25331d6ce4 net: sched: implement qstat helper routines
This adds helpers to manipulate qstats logic and replaces locations
that touch the counters directly. This simplifies future patches
to push qstats onto per cpu counters.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 01:02:26 -04:00
John Fastabend
22e0f8b932 net: sched: make bstats per cpu and estimator RCU safe
In order to run qdisc's without locking statistics and estimators
need to be handled correctly.

To resolve bstats make the statistics per cpu. And because this is
only needed for qdiscs that are running without locks which is not
the case for most qdiscs in the near future only create percpu
stats when qdiscs set the TCQ_F_CPUSTATS flag.

Next because estimators use the bstats to calculate packets per
second and bytes per second the estimator code paths are updated
to use the per cpu statistics.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 01:02:26 -04:00
David S. Miller
852248449c Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
pull request: netfilter/ipvs updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next,
most relevantly they are:

1) Four patches to make the new nf_tables masquerading support
   independent of the x_tables infrastructure. This also resolves a
   compilation breakage if the masquerade target is disabled but the
   nf_tables masq expression is enabled.

2) ipset updates via Jozsef Kadlecsik. This includes the addition of the
   skbinfo extension that allows you to store packet metainformation in the
   elements. This can be used to fetch and restore this to the packets through
   the iptables SET target, patches from Anton Danilov.

3) Add the hash:mac set type to ipset, from Jozsef Kadlecsick.

4) Add simple weighted fail-over scheduler via Simon Horman. This provides
   a fail-over IPVS scheduler (unlike existing load balancing schedulers).
   Connections are directed to the appropriate server based solely on
   highest weight value and server availability, patch from Kenny Mathis.

5) Support IPv6 real servers in IPv4 virtual-services and vice versa.
   Simon Horman informs that the motivation for this is to allow more
   flexibility in the choice of IP version offered by both virtual-servers
   and real-servers as they no longer need to match: An IPv4 connection
   from an end-user may be forwarded to a real-server using IPv6 and
   vice versa. No ip_vs_sync support yet though. Patches from Alex Gartrell
   and Julian Anastasov.

6) Add global generation ID to the nf_tables ruleset. When dumping from
   several different object lists, we need a way to identify that an update
   has ocurred so userspace knows that it needs to refresh its lists. This
   also includes a new command to obtain the 32-bits generation ID. The
   less significant 16-bits of this ID is also exposed through res_id field
   in the nfnetlink header to quickly detect the interference and retry when
   there is no risk of ID wraparound.

7) Move br_netfilter out of the bridge core. The br_netfilter code is
   built in the bridge core by default. This causes problems of different
   kind to people that don't want this: Jesper reported performance drop due
   to the inconditional hook registration and I remember to have read complains
   on netdev from people regarding the unexpected behaviour of our bridging
   stack when br_netfilter is enabled (fragmentation handling, layer 3 and
   upper inspection). People that still need this should easily undo the
   damage by modprobing the new br_netfilter module.

8) Dump the set policy nf_tables that allows set parameterization. So
   userspace can keep user-defined preferences when saving the ruleset.
   From Arturo Borrero.

9) Use __seq_open_private() helper function to reduce boiler plate code
   in x_tables, From Rob Jones.

10) Safer default behaviour in case that you forget to load the protocol
   tracker. Daniel Borkmann and Florian Westphal detected that if your
   ruleset is stateful, you allow traffic to at least one single SCTP port
   and the SCTP protocol tracker is not loaded, then any SCTP traffic may
   be pass through unfiltered. After this patch, the connection tracking
   classifies SCTP/DCCP/UDPlite/GRE packets as invalid if your kernel has
   been compiled with support for these modules.
====================

Trivially resolved conflict in include/linux/skbuff.h, Eric moved some
netfilter skbuff members around, and the netfilter tree adjusted the
ifdef guards for the bridging info pointer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 14:46:53 -04:00
Florian Westphal
d82bd12298 tcp: move TCP_ECN_create_request out of header
After Octavian Purdilas tcp ipv4/ipv6 unification work this helper only
has a single callsite.

While at it, convert name to lowercase, suggested by Stephen.

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 14:41:22 -04:00
Arturo Borrero
9363dc4b59 netfilter: nf_tables: store and dump set policy
We want to know in which cases the user explicitly sets the policy
options. In that case, we also want to dump back the info.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-29 11:28:03 +02:00
Florian Westphal
9890092e46 net: tcp: more detailed ACK events and events for CE marked packets
DataCenter TCP (DCTCP) determines cwnd growth based on ECN information
and ACK properties, e.g. ACK that updates window is treated differently
than DUPACK.

Also DCTCP needs information whether ACK was delayed ACK. Furthermore,
DCTCP also implements a CE state machine that keeps track of CE markings
of incoming packets.

Therefore, extend the congestion control framework to provide these
event types, so that DCTCP can be properly implemented as a normal
congestion algorithm module outside of the core stack.

Joint work with Daniel Borkmann and Glenn Judd.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Florian Westphal
7354c8c389 net: tcp: split ack slow/fast events from cwnd_event
The congestion control ops "cwnd_event" currently supports
CA_EVENT_FAST_ACK and CA_EVENT_SLOW_ACK events (among others).
Both FAST and SLOW_ACK are only used by Westwood congestion
control algorithm.

This removes both flags from cwnd_event and adds a new
in_ack_event callback for this. The goal is to be able to
provide more detailed information about ACKs, such as whether
ECE flag was set, or whether the ACK resulted in a window
update.

It is required for DataCenter TCP (DCTCP) congestion control
algorithm as it makes a different choice depending on ECE being
set or not.

Joint work with Daniel Borkmann and Glenn Judd.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Daniel Borkmann
30e502a34b net: tcp: add flag for ca to indicate that ECN is required
This patch adds a flag to TCP congestion algorithms that allows
for requesting to mark IPv4/IPv6 sockets with transport as ECN
capable, that is, ECT(0), when required by a congestion algorithm.

It is currently used and needed in DataCenter TCP (DCTCP), as it
requires both peers to assert ECT on all IP packets sent - it
uses ECN feedback (i.e. CE, Congestion Encountered information)
from switches inside the data center to derive feedback to the
end hosts.

Therefore, simply add a new flag to icsk_ca_ops. Note that DCTCP's
algorithm/behaviour slightly diverges from RFC3168, therefore this
is only (!) enabled iff the assigned congestion control ops module
has requested this. By that, we can tightly couple this logic really
only to the provided congestion control ops.

Joint work with Florian Westphal and Glenn Judd.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Florian Westphal
55d8694fa8 net: tcp: assign tcp cong_ops when tcp sk is created
Split assignment and initialization from one into two functions.

This is required by followup patches that add Datacenter TCP
(DCTCP) congestion control algorithm - we need to be able to
determine if the connection is moderated by DCTCP before the
3WHS has finished.

As we walk the available congestion control list during the
assignment, we are always guaranteed to have Reno present as
it's fixed compiled-in. Therefore, since we're doing the
early assignment, we don't have a real use for the Reno alias
tcp_init_congestion_ops anymore and can thus remove it.

Actual usage of the congestion control operations are being
made after the 3WHS has finished, in some cases however we
can access get_info() via diag if implemented, therefore we
need to zero out the private area for those modules.

Joint work with Daniel Borkmann and Glenn Judd.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
WANG Cong
18d0264f63 net_sched: remove the first parameter from tcf_exts_destroy()
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:29:01 -04:00
David S. Miller
f5c7e1a47a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2014-09-25

1) Remove useless hash_resize_mutex in xfrm_hash_resize().
   This mutex is used only there, but xfrm_hash_resize()
   can't be called concurrently at all. From Ying Xue.

2) Extend policy hashing to prefixed policies based on
   prefix lenght thresholds. From Christophe Gouault.

3) Make the policy hash table thresholds configurable
   via netlink. From Christophe Gouault.

4) Remove the maximum authentication length for AH.
   This was needed to limit stack usage. We switched
   already to allocate space, so no need to keep the
   limit. From Herbert Xu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:19:15 -04:00
Florian Fainelli
7905288f09 net: dsa: allow switches driver to implement get/set EEE
Allow switches driver to query and enable/disable EEE on a per-port
basis by implementing the ethtool_{get,set}_eee settings and delegating
these operations to the switch driver.

set_eee() will need to coordinate with the PHY driver to make sure that
EEE is enabled, the link-partner supports it and the auto-negotiation
result is satisfactory.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:14:09 -04:00
Florian Fainelli
b2f2af21e3 net: dsa: allow enabling and disable switch ports
Whenever a per-port network device is used/unused, invoke the switch
driver port_enable/port_disable callbacks to allow saving as much power
as possible by disabling unused parts of the switch (RX/TX logic, memory
arrays, PHYs...). We supply a PHY device argument to make sure the
switch driver can act on the PHY device if needed (like putting/taking
the PHY out of deep low power mode).

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:14:08 -04:00
Eric Dumazet
cd7d8498c9 tcp: change tcp_skb_pcount() location
Our goal is to access no more than one cache line access per skb in
a write or receive queue when doing the various walks.

After recent TCP_SKB_CB() reorganizations, it is almost done.

Last part is tcp_skb_pcount() which currently uses
skb_shinfo(skb)->gso_segs, which is a terrible choice, because it needs
3 cache lines in current kernel (skb->head, skb->end, and
shinfo->gso_segs are all in 3 different cache lines, far from skb->cb)

This very simple patch reuses space currently taken by tcp_tw_isn
only in input path, as tcp_skb_pcount is only needed for skb stored in
write queue.

This considerably speeds up tcp_ack(), granted we avoid shinfo->tx_flags
to get SKBTX_ACK_TSTAMP, which seems possible.

This also speeds up all sack processing in general.

This speeds up tcp_sendmsg() because it no longer has to access/dirty
shinfo.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:36:48 -04:00
Eric Dumazet
971f10eca1 tcp: better TCP_SKB_CB layout to reduce cache line misses
TCP maintains lists of skb in write queue, and in receive queues
(in order and out of order queues)

Scanning these lists both in input and output path usually requires
access to skb->next, TCP_SKB_CB(skb)->seq, and TCP_SKB_CB(skb)->end_seq

These fields are currently in two different cache lines, meaning we
waste lot of memory bandwidth when these queues are big and flows
have either packet drops or packet reorders.

We can move TCP_SKB_CB(skb)->header at the end of TCP_SKB_CB, because
this header is not used in fast path. This allows TCP to search much faster
in the skb lists.

Even with regular flows, we save one cache line miss in fast path.

Thanks to Christoph Paasch for noticing we need to cleanup
skb->cb[] (IPCB/IP6CB) before entering IP stack in tx path,
and that I forgot IPCB use in tcp_v4_hnd_req() and tcp_v4_save_options().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:35:43 -04:00
Eric Dumazet
a224772db8 ipv6: add a struct inet6_skb_parm param to ipv6_opt_accepted()
ipv6_opt_accepted() assumes IP6CB(skb) holds the struct inet6_skb_parm
that it needs. Lets not assume this, as TCP stack might use a different
place.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:35:43 -04:00
Eric Dumazet
24a2d43d88 ipv4: rename ip_options_echo to __ip_options_echo()
ip_options_echo() assumes struct ip_options is provided in &IPCB(skb)->opt
Lets break this assumption, but provide a helper to not change all call points.

ip_send_unicast_reply() gets a new struct ip_options pointer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:35:42 -04:00
Dan Williams
3f33407856 net: make tcp_cleanup_rbuf private
net_dma was the only external user so this can become local to tcp.c
again.

Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2014-09-28 07:22:21 -07:00
Dan Williams
7bced39751 net_dma: simple removal
Per commit "77873803363c net_dma: mark broken" net_dma is no longer used
and there is no plan to fix it.

This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards.
Reverting the remainder of the net_dma induced changes is deferred to
subsequent patches.

Marked for stable due to Roman's report of a memory leak in
dma_pin_iovec_pages():

    https://lkml.org/lkml/2014/9/3/177

Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: David Whipple <whipple@securedatainnovations.ch>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: <stable@vger.kernel.org>
Reported-by: Roman Gushchin <klamm@yandex-team.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2014-09-28 07:05:16 -07:00
LEROY Christophe
4565af0d40 net: optimise csum_replace4()
csum_partial() is a generic function which is not optimised for small fixed
length calculations, and its use requires to store "from" and "to" values in
memory while we already have them available in registers. This also has impact,
especially on RISC processors. In the same spirit as the change done by
Eric Dumazet on csum_replace2(), this patch rewrites inet_proto_csum_replace4()
taking into account RFC1624.

I spotted during a NATted tcp transfert that csum_partial() is one of top 5
consuming functions (around 8%), and the second user of csum_partial() is
inet_proto_csum_replace4().

I have proposed the same modification to inet_proto_csum_replace4() in another
patch.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 16:14:16 -04:00
David S. Miller
57219dc7bf Merge tag 'master-2014-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-09-22

Please pull this batch of updates intended for the 3.18 stream...

For the mac80211 bits, Johannes says:

"This time, I have some rate minstrel improvements, support for a very
small feature from CCX that Steinar reverse-engineered, dynamic ACK
timeout support, a number of changes for TDLS, early support for radio
resource measurement and many fixes. Also, I'm changing a number of
places to clear key memory when it's freed and Intel claims copyright
for code they developed."

For the bluetooth bits, Johan says:

"Here are some more patches intended for 3.18. Most of them are cleanups
or fixes for SMP. The only exception is a fix for BR/EDR L2CAP fixed
channels which should now work better together with the L2CAP
information request procedure."

For the iwlwifi bits, Emmanuel says:

"I fix here dvm which was broken by my last pull request. Arik
continues to work on TDLS and Luca solved a few issues in CT-Kill. Eyal
keeps digging into rate scaling code, more to come soon. Besides this,
nothing really special here."

Beyond that, there are the usual big batches of updates to ath9k, b43,
mwifiex, and wil6210 as well as a handful of other bits here and there.
Also, rtlwifi gets some btcoexist attention from Larry.

Please let me know if there are problems!
====================

Had to adjust the wil6210 code to comply with Joe Perches's recent
change in net-next to make the netdev_*() routines return void instead
of 'int'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 15:39:24 -04:00
John W. Linville
30d3c071a6 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-09-26 13:38:51 -04:00
John W. Linville
330bd4ec9d NFC: 3.18 pull request
This is the NFC pull request for 3.18.
 
 We've had major updates for TI and ST Microelectronics drivers:
 
 For TI's trf7970a driver:
 
 - Target mode support for trf7970a
 - Suspend/resume support for trf7970a
 - DT properties additions to handle different quirks
 - A bunch of fixes for smartphone IOP related issues
 
 For ST Microelectronics' ST21NFCA and ST21NFCB drivers:
 
 - ISO15693 support for st21nfcb
 - checkpatch and sparse related warning fixes
 - Code cleanups and a few minor fixes
 
 Finally, Marvell add ISO15693 support to the NCI stack, together with a
 couple of NCI fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUIg4oAAoJEIqAPN1PVmxKg+kP/RPrgH6LA1tFubIwKR2+sGQ7
 g/W2J3AE8QASZkErkpXRNCt2D/SPWEKBY/qscwz+BtcWg76taIIaGTvVUNtxSaW3
 gS4hG6V1UlANWv3KFfaOKmzjEOO/SPNtkFAyI0cTOaGyUqG4o9BgBZpn1rYO16MD
 ZkSC39MpjMoXB9BbsfQngoUEoWc3tZNMmRzk4IVTwE/wXuQvZxmFQXEAiZ+pnYle
 NQfugaGMz0526rLG3QnrpkUakFb81iQwtONpbx6i8KW/Klkc6TN/ek6J9ecU8t5z
 tdHOViZWRmA1VwMGBHwpq8F2o/ATH6GeivTgqrQjcjGNhCUUT1Ulzve2UxGEMWi6
 ncjKY/GxUrYaMMtRvLv+/knrfbWtd+EnWOav07jgNrrA0tvgBNQvEKKHPoWykDVN
 QKpxu3YoNxrsR/LJMS+Zjj0IIM1Y+9DTOkLXzxJ5Hvht8rOl5heYGh2DICOpWsbQ
 ejrQicJOJvN5vqu+Sgcqq4msyTEdbs2LfRDrW1VC9A6ILI+KzYg2laTFGMnhZ5qn
 TgsYIDdONS2iGUulFHylGHI7ANtUg/mhklLUccY1HQYyiAM1NQUtzq1tAz6yLoIH
 l8iIiyzJSBWW57nWhyrULEbzHgPE+bHIjO4T+UUOxMgquYa4V11S1uP0OfWfZogR
 xS24GlobS2oXHMQqh0fA
 =d/C+
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz <sameo@linux.intel.com> says:

"NFC: 3.18 pull request

This is the NFC pull request for 3.18.

We've had major updates for TI and ST Microelectronics drivers:

For TI's trf7970a driver:

- Target mode support for trf7970a
- Suspend/resume support for trf7970a
- DT properties additions to handle different quirks
- A bunch of fixes for smartphone IOP related issues

For ST Microelectronics' ST21NFCA and ST21NFCB drivers:

- ISO15693 support for st21nfcb
- checkpatch and sparse related warning fixes
- Code cleanups and a few minor fixes

Finally, Marvell add ISO15693 support to the NCI stack, together with a
couple of NCI fixes."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-09-26 13:37:02 -04:00
Pablo Neira Ayuso
34666d467c netfilter: bridge: move br_netfilter out of the core
Jesper reported that br_netfilter always registers the hooks since
this is part of the bridge core. This harms performance for people that
don't need this.

This patch modularizes br_netfilter so it can be rmmod'ed, thus,
the hooks can be unregistered. I think the bridge netfilter should have
been a separated module since the beginning, Patrick agreed on that.

Note that this is breaking compatibility for users that expect that
bridge netfilter is going to be available after explicitly 'modprobe
bridge' or via automatic load through brctl.

However, the damage can be easily undone by modprobing br_netfilter.
The bridge core also spots a message to provide a clue to people that
didn't notice that this has been deprecated.

On top of that, the plan is that nftables will not rely on this software
layer, but integrate the connection tracking into the bridge layer to
enable stateful filtering and NAT, which is was bridge netfilter users
seem to require.

This patch still keeps the fake_dst_ops in the bridge core, since this
is required by when the bridge port is initialized. So we can safely
modprobe/rmmod br_netfilter anytime.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Florian Westphal <fw@strlen.de>
2014-09-26 18:42:31 +02:00
Tejun Heo
d06efebf0c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18
This is to receive 0a30288da1 ("blk-mq, percpu_ref: implement a
kludge for SCSI blk-mq stall during probe") which implements
__percpu_ref_kill_expedited() to work around SCSI blk-mq stall.  The
commit reverted and patches to implement proper fix will be added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
2014-09-24 13:00:21 -04:00
Johan Hedberg
d41c15cf95 Bluetooth: Fix reason code used for rejecting SCO connections
The core specification defines valid values for the
HCI_Reject_Synchronous_Connection_Request command to be 0x0D-0x0F. So
far the code has been using HCI_ERROR_REMOTE_USER_TERM (0x13) which is
not a valid value and is therefore being rejected by some controllers:

 > HCI Event: Connect Request (0x04) plen 10
	bdaddr 40:6F:2A:6A:E5:E0 class 0x000000 type eSCO
 < HCI Command: Reject Synchronous Connection (0x01|0x002a) plen 7
	bdaddr 40:6F:2A:6A:E5:E0 reason 0x13
	Reason: Remote User Terminated Connection
 > HCI Event: Command Status (0x0f) plen 4
	Reject Synchronous Connection (0x01|0x002a) status 0x12 ncmd 1
	Error: Invalid HCI Command Parameters

This patch introduces a new define for a value from the valid range
(0x0d == Connection Rejected Due To Limited Resources) and uses it
instead for rejecting incoming connections.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-24 14:03:32 +02:00
Joe Perches
2b0bf6c85a Bluetooth: Convert bt_<level> logging functions to return void
No caller or macro uses the return value so make all
the functions return void.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-24 09:40:08 +02:00
Christophe Ricard
9e87f9a9c4 NFC: nci: Add support for proprietary RF Protocols
In NFC Forum NCI specification, some RF Protocol values are
reserved for proprietary use (from 0x80 to 0xfe).
Some CLF vendor may need to use one value within this range
for specific technology.
Furthermore, some CLF may not becompliant with NFC Froum NCI
specification 2.0 and therefore will not support RF Protocol
value 0x06 for PROTOCOL_T5T as mention in a draft specification
and in a recent push.

Adding get_rf_protocol handle to the nci_ops structure will
help to set the correct technology to target.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-09-24 02:02:24 +02:00
Eric Dumazet
4cdf507d54 icmp: add a global rate limitation
Current ICMP rate limiting uses inetpeer cache, which is an RBL tree
protected by a lock, meaning that hosts can be stuck hard if all cpus
want to check ICMP limits.

When say a DNS or NTP server process is restarted, inetpeer tree grows
quick and machine comes to its knees.

iptables can not help because the bottleneck happens before ICMP
messages are even cooked and sent.

This patch adds a new global limitation, using a token bucket filter,
controlled by two new sysctl :

icmp_msgs_per_sec - INTEGER
    Limit maximal number of ICMP packets sent per second from this host.
    Only messages whose type matches icmp_ratemask are
    controlled by this limit.
    Default: 1000

icmp_msgs_burst - INTEGER
    icmp_msgs_per_sec controls number of ICMP packets sent per second,
    while icmp_msgs_burst controls the burst size of these packets.
    Default: 50

Note that if we really want to send millions of ICMP messages per
second, we might extend idea and infra added in commit 04ca6973f7
("ip: make IP identifiers less predictable") :
add a token bucket in the ip_idents hash and no longer rely on inetpeer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-23 12:47:38 -04:00
David S. Miller
1f6d80358d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	arch/mips/net/bpf_jit.c
	drivers/net/can/flexcan.c

Both the flexcan and MIPS bpf_jit conflicts were cases of simple
overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-23 12:09:27 -04:00
David S. Miller
84de67b298 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2014-09-22

We generate a blackhole or queueing route if a packet
matches an IPsec policy but a state can't be resolved.
Here we assume that dst_output() is called to kill
these packets. Unfortunately this assumption is not
true in all cases, so it is possible that these packets
leave the system without the necessary transformations.

This pull request contains two patches to fix this issue:

1) Fix for blackhole routed packets.

2) Fix for queue routed packets.

Both patches are serious stable candidates.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 16:41:41 -04:00
Eric Dumazet
fcdd1cf4dd tcp: avoid possible arithmetic overflows
icsk_rto is a 32bit field, and icsk_backoff can reach 15 by default,
or more if some sysctl (eg tcp_retries2) are changed.

Better use 64bit to perform icsk_rto << icsk_backoff operations

As Joe Perches suggested, add a helper for this.

Yuchung spotted the tcp_v4_err() case.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 16:27:10 -04:00
Daniel Borkmann
35f7aa5309 ipv6: mld: answer mldv2 queries with mldv1 reports in mldv1 fallback
RFC2710 (MLDv1), section 3.7. says:

  The length of a received MLD message is computed by taking the
  IPv6 Payload Length value and subtracting the length of any IPv6
  extension headers present between the IPv6 header and the MLD
  message. If that length is greater than 24 octets, that indicates
  that there are other fields present *beyond* the fields described
  above, perhaps belonging to a *future backwards-compatible* version
  of MLD. An implementation of the version of MLD specified in this
  document *MUST NOT* send an MLD message longer than 24 octets and
  MUST ignore anything past the first 24 octets of a received MLD
  message.

RFC3810 (MLDv2), section 8.2.1. states for *listeners* regarding
presence of MLDv1 routers:

  In order to be compatible with MLDv1 routers, MLDv2 hosts MUST
  operate in version 1 compatibility mode. [...] When Host
  Compatibility Mode is MLDv2, a host acts using the MLDv2 protocol
  on that interface. When Host Compatibility Mode is MLDv1, a host
  acts in MLDv1 compatibility mode, using *only* the MLDv1 protocol,
  on that interface. [...]

While section 8.3.1. specifies *router* behaviour regarding presence
of MLDv1 routers:

  MLDv2 routers may be placed on a network where there is at least
  one MLDv1 router. The following requirements apply:

  If an MLDv1 router is present on the link, the Querier MUST use
  the *lowest* version of MLD present on the network. This must be
  administratively assured. Routers that desire to be compatible
  with MLDv1 MUST have a configuration option to act in MLDv1 mode;
  if an MLDv1 router is present on the link, the system administrator
  must explicitly configure all MLDv2 routers to act in MLDv1 mode.
  When in MLDv1 mode, the Querier MUST send periodic General Queries
  truncated at the Multicast Address field (i.e., 24 bytes long),
  and SHOULD also warn about receiving an MLDv2 Query (such warnings
  must be rate-limited). The Querier MUST also fill in the Maximum
  Response Delay in the Maximum Response Code field, i.e., the
  exponential algorithm described in section 5.1.3. is not used. [...]

That means that we should not get queries from different versions of
MLD. When there's a MLDv1 router present, MLDv2 enforces truncation
and MRC == MRD (both fields are overlapping within the 24 octet range).

Section 8.3.2. specifies behaviour in the presence of MLDv1 multicast
address *listeners*:

  MLDv2 routers may be placed on a network where there are hosts
  that have not yet been upgraded to MLDv2. In order to be compatible
  with MLDv1 hosts, MLDv2 routers MUST operate in version 1 compatibility
  mode. MLDv2 routers keep a compatibility mode per multicast address
  record. The compatibility mode of a multicast address is determined
  from the Multicast Address Compatibility Mode variable, which can be
  in one of the two following states: MLDv1 or MLDv2.

  The Multicast Address Compatibility Mode of a multicast address
  record is set to MLDv1 whenever an MLDv1 Multicast Listener Report is
  *received* for that multicast address. At the same time, the Older
  Version Host Present timer for the multicast address is set to Older
  Version Host Present Timeout seconds. The timer is re-set whenever a
  new MLDv1 Report is received for that multicast address. If the Older
  Version Host Present timer expires, the router switches back to
  Multicast Address Compatibility Mode of MLDv2 for that multicast
  address. [...]

That means, what can happen is the following scenario, that hosts can
act in MLDv1 compatibility mode when they previously have received an
MLDv1 query (or, simply operate in MLDv1 mode-only); and at the same
time, an MLDv2 router could start up and transmits MLDv2 startup query
messages while being unaware of the current operational mode.

Given RFC2710, section 3.7 we would need to answer to that with an MLDv1
listener report, so that the router according to RFC3810, section 8.3.2.
would receive that and internally switch to MLDv1 compatibility as well.

Right now, I believe since the initial implementation of MLDv2, Linux
hosts would just silently drop such MLDv2 queries instead of replying
with an MLDv1 listener report, which would prevent a MLDv2 router going
into fallback mode (until it receives other MLDv1 queries).

Since the mapping of MRC to MRD in exactly such cases can make use of
the exponential algorithm from 5.1.3, we cannot [strictly speaking] be
aware in MLDv1 of the encoding in MRC, it seems also not mentioned by
the RFC. Since encodings are the same up to 32767, assume in such a
situation this value as a hard upper limit we would clamp. We have asked
one of the RFC authors on that regard, and he mentioned that there seem
not to be any implementations that make use of that exponential algorithm
on startup messages. In any case, this patch fixes this MLD
interoperability issue.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 16:23:15 -04:00
Florian Fainelli
19e57c4e6d net: dsa: add {get, set}_wol callbacks to slave devices
Allow switch drivers to implement per-port Wake-on-LAN getter and
setters.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 14:41:23 -04:00
Florian Fainelli
2446254915 net: dsa: allow switch drivers to implement suspend/resume hooks
Add an abstraction layer to suspend/resume switch devices, doing the
following split:

- suspend/resume the slave network devices and their corresponding PHY
  devices
- suspend/resume the switch hardware using switch driver callbacks

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 14:41:23 -04:00
Eric Dumazet
2571178626 net: sched: shrink struct qdisc_skb_cb to 28 bytes
We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
or increasing skb->cb[] size.

Commit e0f31d8498 ("flow_keys: Record IP layer protocol in
skb_flow_dissect()") broke IPoIB.

Only current offender is sch_choke, and this one do not need an
absolutely precise flow key.

If we store 17 bytes of flow key, its more than enough. (Its the actual
size of flow_keys if it was a packed structure, but we might add new
fields at the end of it later)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: e0f31d8498 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 14:21:47 -04:00
Nicolas Dichtel
0d566379c5 genetlink: add function genl_has_listeners()
This function is the counterpart of the function netlink_has_listeners().

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:28:26 -04:00
Sabrina Dubroca
54003f119c net: fix sparse warnings in SNMP_UPD_PO_STATS(_BH)
ptr used to be a non __percpu pointer (result of a this_cpu_ptr
assignment, 7d720c3e4f ("percpu: add __percpu sparse annotations to
net")). Since d25398df59 ("net: avoid reloads in SNMP_UPD_PO_STATS"),
that's no longer the case, SNMP_UPD_PO_STATS uses this_cpu_add and ptr
is now __percpu.

Silence sparse warnings by preserving the original type and
annotation, and remove the out-of-date comment.

warning: incorrect type in initializer (different address spaces)
   expected unsigned long long *ptr
   got unsigned long long [noderef] <asn:3>*<noident>
warning: incorrect type in initializer (different address spaces)
   expected void const [noderef] <asn:3>*__vpp_verify
   got unsigned long long *<noident>
warning: incorrect type in initializer (different address spaces)
   expected void const [noderef] <asn:3>*__vpp_verify
   got unsigned long long *<noident>

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:22:31 -04:00
Tom Herbert
5632848653 net: Changes to ip_tunnel to support foo-over-udp encapsulation
This patch changes IP tunnel to support (secondary) encapsulation,
Foo-over-UDP. Changes include:

1) Adding tun_hlen as the tunnel header length, encap_hlen as the
   encapsulation header length, and hlen becomes the grand total
   of these.
2) Added common netlink define to support FOU encapsulation.
3) Routines to perform FOU encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:32 -04:00
Florian Fainelli
6819563e64 net: dsa: allow switch drivers to specify phy_device::dev_flags
Some switch drivers (e.g: bcm_sf2) may have to communicate specific
workarounds or flags towards the PHY device driver. Allow switches
driver to be delegated that task by introducing a get_phy_flags()
callback which will do just that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:27:07 -04:00
Andy Zhou
6a93cc9052 udp-tunnel: Add a few more UDP tunnel APIs
Added a few more UDP tunnel APIs that can be shared by UDP based
tunnel protocol implementation. The main ones are highlighted below.

setup_udp_tunnel_sock() configures UDP listener socket for
receiving UDP encapsulated packets.

udp_tunnel_xmit_skb() and upd_tunnel6_xmit_skb() transmit skb
using UDP encapsulation.

udp_tunnel_sock_release() closes the UDP tunnel listener socket.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
Andy Zhou
fd384412e1 udp_tunnel: Seperate ipv6 functions into its own file.
Add ip6_udp_tunnel.c for ipv6 UDP tunnel functions to avoid ifdefs
in udp_tunnel.c

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
Herbert Xu
689f1c9de2 ipsec: Remove obsolete MAX_AH_AUTH_LEN
While tracking down the MAX_AH_AUTH_LEN crash in an old kernel
I thought that this limit was rather arbitrary and we should
just get rid of it.

In fact it seems that we've already done all the work needed
to remove it apart from actually removing it.  This limit was
there in order to limit stack usage.  Since we've already
switched over to allocating scratch space using kmalloc, there
is no longer any need to limit the authentication length.

This patch kills all references to it, including the BUG_ONs
that led me here.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-18 10:54:36 +02:00
Marcel Holtmann
0097db06f5 Bluetooth: Remove exported hci_recv_fragment function
The hci_recv_fragment function is no longer used by any driver and thus
do not export it. In fact it is not even needed by the core and it can
be removed altogether.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-09-17 10:23:03 +03:00
Steffen Klassert
b8c203b2d2 xfrm: Generate queueing routes only from route lookup functions
Currently we genarate a queueing route if we have matching policies
but can not resolve the states and the sysctl xfrm_larval_drop is
disabled. Here we assume that dst_output() is called to kill the
queued packets. Unfortunately this assumption is not true in all
cases, so it is possible that these packets leave the system unwanted.

We fix this by generating queueing routes only from the
route lookup functions, here we can guarantee a call to
dst_output() afterwards.

Fixes: a0073fe18e ("xfrm: Add a state resolution packet queue")
Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-16 10:08:49 +02:00
Steffen Klassert
f92ee61982 xfrm: Generate blackhole routes only from route lookup functions
Currently we genarate a blackhole route route whenever we have
matching policies but can not resolve the states. Here we assume
that dst_output() is called to kill the balckholed packets.
Unfortunately this assumption is not true in all cases, so
it is possible that these packets leave the system unwanted.

We fix this by generating blackhole routes only from the
route lookup functions, here we can guarantee a call to
dst_output() afterwards.

Fixes: 2774c131b1 ("xfrm: Handle blackhole route creation via afinfo.")
Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-16 10:08:40 +02:00
Alex Gartrell
391f503d69 ipvs: prevent mixing heterogeneous pools and synchronization
The synchronization protocol is not compatible with heterogeneous pools, so
we need to verify that we're not turning both on at the same time.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:35 +09:00
Alex Gartrell
ba38528aae ipvs: Supply destination address family to ip_vs_conn_new
The assumption that dest af is equal to service af is now unreliable, so we
must specify it manually so as not to copy just the first 4 bytes of a v6
address or doing an illegal read of 16 butes on a v6 address.

We "lie" in two places: for synchronization (which we will explicitly
disallow from happening when we have heterogeneous pools) and for black
hole addresses where there's no real dest.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:34 +09:00
Alex Gartrell
655eef103d ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest}
We need to remove the assumption that virtual address family is the same as
real address family in order to support heterogeneous services (that is,
services with v4 vips and v6 backends or the opposite).

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:33 +09:00
Alex Gartrell
6cff339bbd ipvs: Add destination address family to netlink interface
This is necessary to support heterogeneous pools.  For example, if you have
an ipv6 addressed network, you'll want to be able to forward ipv4 traffic
into it.

This patch enforces that destination address family is the same as service
family, as none of the forwarding mechanisms support anything else.

For the old setsockopt mechanism, we simply set the dest address family to
AF_INET as we do with the service.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:33 +09:00
Alexander Duyck
b4d2394d01 dsa: Replace mii_bus with a generic host device
This change makes it so that instead of passing and storing a mii_bus we
instead pass and store a host_dev.  From there we can test to determine the
exact type of device, and can verify it is the correct device for our switch.

So for example it would be possible to pass a device pointer from a pci_dev
and instead of checking for a PHY ID we could check for a vendor and/or device
ID.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:24:20 -04:00
Alexander Duyck
5075314e4e dsa: Split ops up, and avoid assigning tag_protocol and receive separately
This change addresses several issues.

First, it was possible to set tag_protocol without setting the ops pointer.
To correct that I have reordered things so that rcv is now populated before
we set tag_protocol.

Second, it didn't make much sense to keep setting the device ops each time a
new slave was registered.  So by moving the receive portion out into root
switch initialization that issue should be addressed.

Third, I wanted to avoid sending tags if the rcv pointer was not registered
so I changed the tag check to verify if the rcv function pointer is set on
the root tree.  If it is then we start sending DSA tagged frames.

Finally I split the device ops pointer in the structures into two spots.  I
placed the rcv function pointer in the root switch since this makes it
easiest to access from there, and I placed the xmit function pointer in the
slave for the same reason.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:24:20 -04:00
John W. Linville
6bd2bd27ba This time, I have some rate minstrel improvements, support for a very
small feature from CCX that Steinar reverse-engineered, dynamic ACK
 timeout support, a number of changes for TDLS, early support for radio
 resource measurement and many fixes. Also, I'm changing a number of
 places to clear key memory when it's freed and Intel claims copyright
 for code they developed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUEpv0AAoJEDBSmw7B7bqr6CMP/2CXvWr/98AY2Flt74KDNyaE
 vmJBVCsu+eT0G9FL6YxbVU5+rvInGDHd9qTHkU4ljd+uXwnG8XAT+WHFlhBjzm+V
 juXPWblbSdMzwpWDfq7Kbk134b9ALTEUqekhqSFvhPA5h0Dq0/8lDK9CFyfwKWbN
 07PwUv0VUUEHKVqQoVSNJu9Szi5NvZvDcN7Jwg1Cpnv0sUOeH7J2Kz1OUT4RaEhI
 c/UJjCQV4ssXaEkTDIxciQ62HrglZanMqyx4a9LGbrxLdw1KJ19CNmSkwB5mQuZg
 LhV05Y0Gv4tkRC8sCo7HF7cqgjBfjTNiEjZYfbExW0QFOMKIgKmmjYIEezVdbrk7
 gFIyhTRE595UtztUJV0dcitoOlybbRf3OdEwAIJD6fc0vhoe/rSjUIyS7/CZisMT
 9zg33JvtK3eYPSJS1jy4lk2yZ5alhLoPMQTNmsEuyOGcU3sH9vTGMjONPffOlcH9
 nzj7aUS2Qvwn3H+4CIaZbZhySpa0B9zkGL3oxeaEBmLJbFMTo5ua2FNGhubC2O+O
 BwNULDBEMwsGHKMUCWCLmQwACWdVdNxYYWtXbWfxdmC/CJoXgdLCJIUfoa1aOf2A
 DyCqUFvG/n8ObHVy+P3RU6poQFj0M/yclJAMHRW6x2qzNvAkDb0G6TVeIlgN5dG8
 jLoZPL5OH0wb0BPVNEH8
 =OPIp
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-john-2014-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg <johannes@sipsolutions.net> says:

"This time, I have some rate minstrel improvements, support for a very
small feature from CCX that Steinar reverse-engineered, dynamic ACK
timeout support, a number of changes for TDLS, early support for radio
resource measurement and many fixes. Also, I'm changing a number of
places to clear key memory when it's freed and Intel claims copyright
for code they developed."

Conflicts:
	net/mac80211/iface.c

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-09-15 14:51:23 -04:00
Marcel Holtmann
43e73e4e2a Bluetooth: Provide HCI command opcode information to driver
The Bluetooth core already does processing of the HCI command header
and puts it together before sending it to the driver. It is not really
efficient for the driver to look at the HCI command header again in
case it has to make certain decisions about certain commands. To make
this easier, just provide the opcode as part of the SKB control buffer
information. The extra information about the opcode is optional and
only provided for HCI commands.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-09-15 07:15:45 +03:00
Florian Fainelli
ac7a04c33d net: dsa: change tag_protocol to an enum
Now that we introduced an additional multiplexing/demultiplexing layer
with commit 3e8a72d1da ("net: dsa: reduce number of protocol hooks")
that lives within the DSA code, we no longer need to have a given switch
driver tag_protocol be an actual ethertype value, instead, we can
replace it with an enum: dsa_tag_protocol.

Do this replacement in the drivers, which allows us to get rid of the
cpu_to_be16()/htons() dance, and remove ETH_P_BRCMTAG since we do not
need it anymore.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:04:35 -04:00
WANG Cong
013b4d9038 ipv6: clean up ipv6_dev_ac_inc()
Make it accept inet6_dev, and rename it to __ipv6_dev_ac_inc()
to reflect this change.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
John Fastabend
25d8c0d55f net: rcu-ify tcf_proto
rcu'ify tcf_proto this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.

This patch prepares the core net_sched infrastracture for running
the classifier/action chains without holding the qdisc lock however
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fall out.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:25 -04:00
John Fastabend
46e5da40ae net: qdisc: use rcu prefix and silence sparse warnings
Add __rcu notation to qdisc handling by doing this we can make
smatch output more legible. And anyways some of the cases should
be using rcu_dereference() see qdisc_all_tx_empty(),
qdisc_tx_chainging(), and so on.

Also *wake_queue() API is commonly called from driver timer routines
without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
around netif_wake_subqueue and netif_tx_wake_queue.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:25 -04:00
Sabrina Dubroca
381f4dca48 ipv6: clean up anycast when an interface is destroyed
If we try to rmmod the driver for an interface while sockets with
setsockopt(JOIN_ANYCAST) are alive, some refcounts aren't cleaned up
and we get stuck on:

  unregister_netdevice: waiting for ens3 to become free. Usage count = 1

If we LEAVE_ANYCAST/close everything before rmmod'ing, there is no
problem.

We need to perform a cleanup similar to the one for multicast in
addrconf_ifdown(how == 1).

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 17:33:06 -04:00
Pablo Neira Ayuso
67981fefb2 netfilter: fix compilation of masquerading without IP_NF_TARGET_MASQUERADE
CONFIG_NF_NAT_MASQUERADE_IPV6=m
 # CONFIG_IP6_NF_TARGET_MASQUERADE is not set

results in:

net/ipv6/netfilter/nf_nat_masquerade_ipv6.c: In function ‘nf_nat_masquerade_ipv6’:
net/ipv6/netfilter/nf_nat_masquerade_ipv6.c:41:14: error: ‘struct nf_conn_nat’ has no member named ‘masq_index’
  nfct_nat(ct)->masq_index = out->ifindex;
              ^
net/ipv6/netfilter/nf_nat_masquerade_ipv6.c: In function ‘device_cmp’:
net/ipv6/netfilter/nf_nat_masquerade_ipv6.c:61:12: error: ‘const struct nf_conn_nat’ has no member named ‘masq_index’
  return nat->masq_index == (int)(long)ifindex;
            ^
net/ipv6/netfilter/nf_nat_masquerade_ipv6.c:62:1: warning: control
reaches end of non-void function [-Wreturn-type]
 }
 ^
make[3]: *** [net/ipv6/netfilter/nf_nat_masquerade_ipv6.o] Error 1

Fix this by using the new NF_NAT_MASQUERADE_IPV4 and _IPV6 symbols
in include/net/netfilter/nf_nat.h.

Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-11 17:02:45 +02:00
Eliad Peller
0d8614b4b9 mac80211: replace SMPS hw flags with wiphy feature bits
Use the new static_smps / dynamic_smps feature bits
instead of mac80211-internal hw flags.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 13:37:02 +02:00
Eliad Peller
18998c381b cfg80211: allow requesting SMPS mode on ap start
Add feature bits to indicate device support for
static-smps and dynamic-smps modes.

Add a new NL80211_ATTR_SMPS_MODE attribue to allow
configuring the smps mode to be used by the ap
(e.g. configuring to ap to dynamic smps mode will
reduce power consumption while having minor effect
on throughput)

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 13:37:02 +02:00
Eliad Peller
b0b6aa2c8e cfg80211/mac80211: add wmm info to assoc event
Userspace might need to know what queues are configured
for uapsd (e.g. for setting proper default values in tspecs).

Add this bitmap to the association event (inside wmm
nested attribute)

Add additional parameter to cfg80211_rx_assoc_resp,
and update its callers.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:24:39 +02:00
Johannes Berg
960d01acf6 cfg80211: add WMM traffic stream API
Add nl80211 and driver API to validate, add and delete traffic
streams with appropriate settings.

The API calls for userspace doing the action frame handshake
with the peer, and then allows only to set up the parameters
in the driver. To avoid setting up a session only to tear it
down again, the validate API is provided, but the real usage
later can still fail so userspace must be prepared for that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:21:18 +02:00
Johannes Berg
78f686cae0 cfg80211: don't put kek/kck/replay counter on the stack
There's no need to put the values on the stack, just pass a
pointer to the data in the nl80211 message. This reduces stack
usage and avoids potential issues with putting sensitive data
on the stack.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:07:34 +02:00
David S. Miller
0aac383353 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
nf-next pull request

The following patchset contains Netfilter/IPVS updates for your
net-next tree. Regarding nf_tables, most updates focus on consolidating
the NAT infrastructure and adding support for masquerading. More
specifically, they are:

1) use __u8 instead of u_int8_t in arptables header, from
   Mike Frysinger.

2) Add support to match by skb->pkttype to the meta expression, from
   Ana Rey.

3) Add support to match by cpu to the meta expression, also from
   Ana Rey.

4) A smatch warning about IPSET_ATTR_MARKMASK validation, patch from
   Vytas Dauksa.

5) Fix netnet and netportnet hash types the range support for IPv4,
   from Sergey Popovich.

6) Fix missing-field-initializer warnings resolved, from Mark Rustad.

7) Dan Carperter reported possible integer overflows in ipset, from
   Jozsef Kadlecsick.

8) Filter out accounting objects in nfacct by type, so you can
   selectively reset quotas, from Alexey Perevalov.

9) Move specific NAT IPv4 functions to the core so x_tables and
   nf_tables can share the same NAT IPv4 engine.

10) Use the new NAT IPv4 functions from nft_chain_nat_ipv4.

11) Move specific NAT IPv6 functions to the core so x_tables and
    nf_tables can share the same NAT IPv4 engine.

12) Use the new NAT IPv6 functions from nft_chain_nat_ipv6.

13) Refactor code to add nft_delrule(), which can be reused in the
    enhancement of the NFT_MSG_DELTABLE to remove a table and its
    content, from Arturo Borrero.

14) Add a helper function to unregister chain hooks, from
    Arturo Borrero.

15) A cleanup to rename to nft_delrule_by_chain for consistency with
    the new nft_*() functions, also from Arturo.

16) Add support to match devgroup to the meta expression, from Ana Rey.

17) Reduce stack usage for IPVS socket option, from Julian Anastasov.

18) Remove unnecessary textsearch state initialization in xt_string,
    from Bojan Prtvar.

19) Add several helper functions to nf_tables, more work to prepare
    the enhancement of NFT_MSG_DELTABLE, again from Arturo Borrero.

20) Enhance NFT_MSG_DELTABLE to delete a table and its content, from
    Arturo Borrero.

21) Support NAT flags in the nat expression to indicate the flavour,
    eg. random fully, from Arturo.

22) Add missing audit code to ebtables when replacing tables, from
    Nicolas Dichtel.

23) Generalize the IPv4 masquerading code to allow its re-use from
    nf_tables, from Arturo.

24) Generalize the IPv6 masquerading code, also from Arturo.

25) Add the new masq expression to support IPv4/IPv6 masquerading
    from nf_tables, also from Arturo.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 12:46:32 -07:00
Willem de Bruijn
67cc0d4077 net-timestamp: optimize sock_tx_timestamp default path
Few packets have timestamping enabled. Exit sock_tx_timestamp quickly
in this common case.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 17:34:41 -07:00
Vincent Bernat
49a601589c net/ipv4: bind ip_nonlocal_bind to current netns
net.ipv4.ip_nonlocal_bind sysctl was global to all network
namespaces. This patch allows to set a different value for each
network namespace.

Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 11:27:09 -07:00
Arturo Borrero
9ba1f726be netfilter: nf_tables: add new nft_masq expression
The nft_masq expression is intended to perform NAT in the masquerade flavour.

We decided to have the masquerade functionality in a separated expression other
than nft_nat.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:30 +02:00
Arturo Borrero
be6b635cd6 netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables
Let's refactor the code so we can reach the masquerade functionality
from outside the xt context (ie. nftables).

The patch includes the addition of an atomic counter to the masquerade
notifier: the stuff to be done by the notifier is the same for xt and
nftables. Therefore, only one notification handler is needed.

This factorization only involves IPv6; a similar patch exists to
handle IPv4.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:29 +02:00
Arturo Borrero
8dd33cc93e netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables
Let's refactor the code so we can reach the masquerade functionality
from outside the xt context (ie. nftables).

The patch includes the addition of an atomic counter to the masquerade
notifier: the stuff to be done by the notifier is the same for xt and
nftables. Therefore, only one notification handler is needed.

This factorization only involves IPv4; a similar patch follows to
handle IPv6.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:29 +02:00
Pablo Neira Ayuso
2a5538e9aa netfilter: nat: move specific NAT IPv6 to core
Move the specific NAT IPv6 core functions that are called from the
hooks from ip6table_nat.c to nf_nat_l3proto_ipv6.c. This prepares the
ground to allow iptables and nft to use the same NAT engine code that
comes in a follow up patch.

This also renames nf_nat_ipv6_fn to nft_nat_ipv6_fn in
net/ipv6/netfilter/nft_chain_nat_ipv6.c to avoid a compilation breakage.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:30:00 +02:00
Johan Hedberg
e1e930f591 Bluetooth: Fix mgmt pairing failure when authentication fails
Whether through HCI with BR/EDR or SMP with LE when authentication fails
we should also notify any pending Pair Device mgmt command. This patch
updates the mgmt_auth_failed function to take the actual hci_conn object
and makes sure that any pending pairing command is notified and cleaned
up appropriately.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-09 03:12:15 +02:00
David S. Miller
5b4c314575 Merge tag 'master-2014-09-08' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-09-08

Please pull this batch of updates intended for the 3.18 stream...

For the mac80211 bits, Johannes says:

"Not that much content this time. Some RCU cleanups, crypto
performance improvements, and various patches all over,
rather than listing them one might as well look into the
git log instead."

For the Bluetooth bits, Gustavo says:

"The changes consists of:

        - Coding style fixes to HCI drivers
        - Corrupted ack value fix for the H5 HCI driver
        - A couple of Enhanced L2CAP fixes
        - Conversion of SMP code to use common L2CAP channel API
        - Page scan optimizations when using the kernel-side whitelist
        - Various mac802154 and and ieee802154 6lowpan cleanups
        - One new Atheros USB ID"

For the iwlwifi bits, Emmanuel says:

"We have a new big thing coming up which is called Dynamic Queue
Allocation (or DQA).  This is a completely new way to work with the
Tx queues and it requires major refactoring.  This is being done by
Johannes and Avri.  Besides this, Johannes disables U-APSD by default
because of APs that would disable A-MPDU if the association supports
U-ASPD.  Luca contributed to the power area which he was cleaning
up on the way while working on CSA.  A few more random things here
and there."

For the Atheros bits, Kalle says:

"For ath6kl we had two small fixes and a new SDIO device id.

For ath10k the bigger changes are:

 * support for new firmware version 10.2 (Michal)

 * spectral scan support (Simon, Sven & Mathias)

 * export a firmware crash dump file (Ben & me)

 * cleaning up of pci.c (Michal)

 * print pci id in all messages, which causes most of the churn (Michal)"

Beyond that, we have the usual collection of various updates to ath9k,
b43, mwifiex, and wil6210, as well as a few other bits here and there.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-08 16:43:58 -07:00
Willem de Bruijn
a7f26b7e1e inet: remove dead inetpeer sequence code
inetpeer sequence numbers are no longer incremented, so no need to
check and flush the tree. The function that increments the sequence
number was already dead code and removed in in "ipv4: remove unused
function" (068a6e18). Remove the code that checks for a change, too.

Verifying that v4_seq and v6_seq are never incremented and thus that
flush_check compares bp->flush_seq to 0 is trivial.

The second part of the change removes flush_check completely even
though bp->flush_seq is exactly !0 once, at initialization. This
change is correct because the time this branch is true is when
bp->root == peer_avl_empty_rcu, in which the branch and
inetpeer_invalidate_tree are a NOOP.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-08 16:42:42 -07:00
Johan Hedberg
fc75cc8684 Bluetooth: Fix locking of the SMP context
Before the move the l2cap_chan the SMP context (smp_chan) didn't have
any kind of proper locking. The best there existed was the
HCI_CONN_LE_SMP_PEND flag which was used to enable mutual exclusion for
potential multiple creators of the SMP context.

Now that SMP has been converted to use the l2cap_chan infrastructure and
since the SMP context is directly mapped to a corresponding l2cap_chan
we get the SMP context locking essentially for free through the
l2cap_chan lock. For all callbacks that l2cap_core.c makes for each
channel implementation (smp.c in the case of SMP) the l2cap_chan lock is
held through l2cap_chan_lock(chan).

Since the calls from l2cap_core.c to smp.c are covered the only missing
piece to have the locking implemented properly is to ensure that the
lock is held for any other call path that may access the SMP context.
This means user responses through mgmt.c, requests to elevate the
security of a connection through hci_conn.c, as well as any deferred
work through workqueues.

This patch adds the necessary locking to all these other code paths that
try to access the SMP context. Since mutual exclusion for the l2cap_chan
access is now covered from all directions the patch also removes
unnecessary HCI_CONN_LE_SMP_PEND flag (once we've acquired the chan lock
we can simply check whether chan->smp is set to know if there's an SMP
context).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-08 19:07:56 +02:00
Johan Hedberg
f3d82d0c8e Bluetooth: Move identity address update behind a workqueue
The identity address update of all channels for an l2cap_conn needs to
take the lock for each channel, i.e. it's safest to do this by a
separate workqueue callback.

Previously this was partially solved by moving the entire SMP key
distribution behind a workqueue. However, if we want SMP context locking
to be correct and safe we should always use the l2cap_chan lock when
accessing it, meaning even smp_distribute_keys needs to take that lock
which would once again create a dead lock when updating the identity
address.

The simplest way to solve this is to have l2cap_conn manage the deferred
work which is what this patch does. A subsequent patch will remove the
now unnecessary SMP key distribution work struct.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-08 19:07:55 +02:00
Johan Hedberg
e3b679d56c Bluetooth: Update hci_disconnect() to return an error value
We'll soon use hci_disconnect() from places that are interested to know
whether the hci_send_cmd() really succeeded or not. This patch updates
hci_disconnect() to pass on any error returned from hci_send_cmd().

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-08 19:07:55 +02:00
Johan Hedberg
b04afa0c28 Bluetooth: Remove unused l2cap_conn_shutdown API
Now that there are no more users of the l2cap_conn_shutdown API (since
smp.c switched to using hci_disconnect) we can simply remove it along
with all of it's l2cap_conn variables.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-08 19:07:54 +02:00
Johan Hedberg
f94b665dcf Bluetooth: Ignore incoming data after initiating disconnection
When hci_chan_del is called the disconnection routines get scheduled
through a workqueue. If there's any incoming ACL data before the
routines get executed there's a chance that a new hci_chan is created
and the disconnection never happens. This patch adds a new hci_conn flag
to indicate that we're in the process of driving the connection down. We
set the flag in hci_chan_del and check for it in hci_chan_create so that
no new channels are created for the same connection.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-08 19:07:53 +02:00
Johan Hedberg
eb78d7e53d Bluetooth: Use zero timeout for immediate scheduling
There's no point in passing a "small" timeout to queue_delayed_work() to
try to get the callback faster scheduled. Passing 0 is perfectly valid
and will cause a shortcut to a direct queue_work().

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-08 19:07:53 +02:00
Johan Hedberg
51bb8457dd Bluetooth: Improve *_get() functions to return the object type
It's natural to have *_get() functions that increment the reference
count of an object to return the object type itself. This way it's
simple to make a copy of the object pointer and increase the reference
count in a single step. This patch updates two such get() functions,
namely hci_conn_get() and l2cap_conn_get(), and updates the users to
take advantage of the new API.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-08 19:07:52 +02:00
Johan Hedberg
c16900cf28 Bluetooth: Fix hci_conn reference counting for fixed channels
Now that SMP has been converted to use fixed channels we've got a bit of
a problem with the hci_conn reference counting. So far the L2CAP code
has kept a reference for each L2CAP channel that was notified of the
connection. With SMP however this would mean that the connection is
never dropped even though there are no other users of it. Furthermore,
SMP already does its own hci_conn reference counting internally,
starting from a security or pairing request and ending with the key
distribution.

This patch makes L2CAP fixed channels default to the L2CAP core not
keeping a hci_conn reference for them. A new FLAG_HOLD_HCI_CONN flag is
added so that L2CAP users can declare an exception to this rule and hold
a reference even for their fixed channels. One such exception is the
L2CAP socket layer which does want a reference for each socket (e.g. an
ATT socket which uses a fixed channel).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-08 19:07:52 +02:00
John W. Linville
61a3d4f9d5 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2014-09-08 11:14:56 -04:00
David S. Miller
eb84d6b604 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-07 21:41:53 -07:00
Tejun Heo
908c7f1949 percpu_counter: add @gfp to percpu_counter_init()
Percpu allocator now supports allocation mask.  Add @gfp to
percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
with percpu_counters too.

We could have left percpu_counter_init() alone and added
percpu_counter_init_gfp(); however, the number of users isn't that
high and introducing _gfp variants to all percpu data structures would
be quite ugly, so let's just do the conversion.  This is the one with
the most users.  Other percpu data structures are a lot easier to
convert.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: x86@kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
2014-09-08 09:51:29 +09:00
David S. Miller
45ce829dd0 Merge tag 'master-2014-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-09-05

Please pull this batch of fixes intended for the 3.17 stream...

For the mac80211 bits, Johannes says:

"Here are a few fixes for mac80211. One has been discussed for a while
and adds a terminating NUL-byte to the alpha2 sent to userspace, which
shouldn't be necessary but since many places treat it as a string we
couldn't move to just sending two bytes.

In addition to that, we have two VLAN fixes from Felix, a mesh fix, a
fix for the recently introduced RX aggregation offload, a revert for
a broken patch (that luckily didn't really cause any harm) and a small
fix for alignment in debugfs."

For the iwlwifi bits, Emmanuel says:

"I revert a patch that disabled CTS to self in dvm because users
reported issues. The revert is CCed to stable since the offending
patch was sent to stable too. I also bump the firmware API versions
since a new firmware is coming up. On top of that, Marcel fixes a
bug I introduced while fixing a bug in our Kconfig file."

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-07 16:11:10 -07:00
Eric Dumazet
7faee5c0d5 tcp: remove TCP_SKB_CB(skb)->when
After commit 740b0f1841 ("tcp: switch rtt estimations to usec resolution"),
we no longer need to maintain timestamps in two different fields.

TCP_SKB_CB(skb)->when can be removed, as same information sits in skb_mstamp.stamp_jiffies

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05 17:49:33 -07:00
Eric Dumazet
04317dafd1 tcp: introduce TCP_SKB_CB(skb)->tcp_tw_isn
TCP_SKB_CB(skb)->when has different meaning in output and input paths.

In output path, it contains a timestamp.
In input path, it contains an ISN, chosen by tcp_timewait_state_process()

Lets add a different name to ease code comprehension.

Note that 'when' field will disappear in following patch,
as skb_mstamp already contains timestamp, the anonymous
union will promptly disappear as well.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05 17:49:33 -07:00
Alexander Duyck
56193d1bce net: Add function for parsing the header length out of linear ethernet frames
This patch updates some of the flow_dissector api so that it can be used to
parse the length of ethernet buffers stored in fragments.  Most of the
changes needed were to __skb_get_poff as it needed to be updated to support
sending a linear buffer instead of a skb.

I have split __skb_get_poff into two functions, the first is skb_get_poff
and it retains the functionality of the original __skb_get_poff.  The other
function is __skb_get_poff which now works much like __skb_flow_dissect in
relation to skb_flow_dissect in that it provides the same functionality but
works with just a data buffer and hlen instead of needing an skb.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05 17:47:02 -07:00
Alexander Duyck
82eabd9eb2 net: merge cases where sock_efree and sock_edemux are the same function
Since sock_efree and sock_demux are essentially the same code for non-TCP
sockets and the case where CONFIG_INET is not defined we can combine the
code or replace the call to sock_edemux in several spots.  As a result we
can avoid a bit of unnecessary code or code duplication.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05 17:43:45 -07:00
Alexander Duyck
62bccb8cdb net-timestamp: Make the clone operation stand-alone from phy timestamping
The phy timestamping takes a different path than the regular timestamping
does in that it will create a clone first so that the packets needing to be
timestamped can be placed in a queue, or the context block could be used.

In order to support these use cases I am pulling the core of the code out
so it can be used in other drivers beyond just phy devices.

In addition I have added a destructor named sock_efree which is meant to
provide a simple way for dropping the reference to skb exceptions that
aren't part of either the receive or send windows for the socket, and I
have removed some duplication in spots where this destructor could be used
in place of sock_edemux.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05 17:43:45 -07:00
Eric Dumazet
d546c62154 ipv4: harden fnhe_hashfun()
Lets make this hash function a bit secure, as ICMP attacks are still
in the wild.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05 17:40:33 -07:00
Masanari Iida
e793c0f70e net: treewide: Fix typo found in DocBook/networking.xml
This patch fix spelling typo found in DocBook/networking.xml.
It is because the neworking.xml is generated from comments
in the source, I have to fix typo in comments within the source.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05 17:35:28 -07:00
Eric Dumazet
caa415270c ipv4: fix a race in update_or_create_fnhe()
nh_exceptions is effectively used under rcu, but lacks proper
barriers. Between kzalloc() and setting of nh->nh_exceptions(),
we need a proper memory barrier.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 4895c771c7 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05 17:15:50 -07:00
Willem de Bruijn
c199105d15 net-timestamp: only report sw timestamp if reporting bit is set
The timestamping API has separate bits for generating and reporting
timestamps. A software timestamp should only be reported for a packet
when the packet has the relevant generation flag (SKBTX_..) set
and the socket has reporting bit SOF_TIMESTAMPING_SOFTWARE set.

The second check was accidentally removed. Reinstitute the original
behavior.

Tested:
  Without this patch, Documentation/networking/txtimestamp reports
  timestamps regardless of whether SOF_TIMESTAMPING_SOFTWARE is set.
  After the patch, it only reports them when the flag is set.

Fixes: f24b9be595 ("net-timestamp: extend SCM_TIMESTAMPING ancillary data struct")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05 15:02:43 -07:00
Lorenzo Bianconi
a4bcaf5556 mac80211: extend set_coverage_class signature
Extend mac80211 set_coverage_class API in order to enable ACK timeout
estimation algorithm (dynack) passing coverage class equals to -1
to lower drivers. Synchronize set_coverage_class routine signature with
mac80211 function pointer for p54, ath9k, ath9k_htc and ath5k drivers.

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-05 13:54:07 +02:00
Lorenzo Bianconi
3057dbfdab cfg80211: enable dynack through nl80211
Enable ACK timeout estimation algorithm (dynack) using mac80211
set_coverage_class API. Dynack is activated passing coverage class equals to -1
to lower drivers and it is automatically disabled setting valid value for
coverage class.
Define NL80211_ATTR_WIPHY_DYN_ACK flag attribute to enable dynack from
userspace. In order to activate dynack NL80211_FEATURE_ACKTO_ESTIMATION feature
flag must be set by lower drivers to indicate dynack capability.

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-05 13:54:03 +02:00
Assaf Krauss
bab5ab7d2a nl80211: Add flag attribute for RRM connections
Add a flag attribute to use in associations, for tagging the target
connection as supporting RRM. It is the responsibility of upper
layers to set this flag only if both the underlying device, and the
target network indeed support RRM.
To be used in ASSOCIATE and CONNECT commands.

Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-05 13:52:08 +02:00
Johannes Berg
2740f0cf8e cfg80211: add Intel Mobile Communications copyright
Our legal structure changed at some point (see wikipedia), but
we forgot to immediately switch over to the new copyright
notice.

For files that we have modified in the time since the change,
add the proper copyright notice now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-05 13:52:06 +02:00
Johannes Berg
d98ad83ee8 mac80211: add Intel Mobile Communications copyright
Our legal structure changed at some point (see wikipedia), but
we forgot to immediately switch over to the new copyright
notice.

For files that we have modified in the time since the change,
add the proper copyright notice now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-05 13:52:06 +02:00
Hannes Frederic Sowa
2f711939d2 ipv6: add sysctl_mld_qrv to configure query robustness variable
This patch adds a new sysctl_mld_qrv knob to configure the mldv1/v2 query
robustness variable. It specifies how many retransmit of unsolicited mld
retransmit should happen. Admins might want to tune this on lossy links.

Also reset mld state on interface down/up, so we pick up new sysctl
settings during interface up event.

IPv6 certification requests this knob to be available.

I didn't make this knob netns specific, as it is mostly a setting in a
physical environment and should be per host.

Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-04 22:26:14 -07:00
John W. Linville
ef4ead3f29 Not that much content this time. Some RCU cleanups, crypto
performance improvements, and various patches all over,
 rather than listing them one might as well look into the
 git log instead.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUAIx4AAoJEDBSmw7B7bqrUYcP/3t4qdFxm0bd4j2AEkl3mPwB
 Qu7obTicOTfBRoJNEgS+8AU2u3PfztU6+ErZs4ETLUuqaZwXisqmwBiMo86+Wtdf
 gx9KonwEW051g7YmB0+6EMwuy04MGzTEk8VavQwqM4g9LIPJ4Buo/kj7MNJ51m11
 XyRmJqZJnKKeiiQ4eC0gPf8e44qiQqaDuYZ0r1UDnNRg2KrbAHlGTBKYI3VRl2u4
 xRpPGVnHwT0qkWb1Zw9fk0VfPr9m1ETthzcZvnhk6uMnJ28D+1B1FjZR1GJU6BW7
 Zx2FbevbZTjDoNT1GQpLGMXBuW0lsZFetXVFiJCr/StaPBtHmtdu28fuNVm8yJYz
 euDlEgrE8F4npdec2F5R2zh7Ue2U7eMEL2uxxjciNSJOipHgx5EXH12Y/5QtrChy
 4OHPbNHgpmqFB7TmkvHDgP/0A7XdyqKVc+NtIV+eECIwE4tHcJ6A+bQ+ZCoRV2Vw
 zmsNuNeNeDW7NEAw9veRXissLZMy/EjUnsOrnW29BpO/yG+2YjqpyQ6JQpcXeCPD
 WQgl2FHpk6ap3jpVjxminxw2HkDnQ0oTKusGLcezalhUlWMo7VYNN59aLzcphxX5
 Fotp/8v1sbDTF46uc/QJ38N5TqflwWeFpxvGkdNGuAT4llP03NaXV0ORBecFmMW2
 esb+PLwlByCDeVFu53q+
 =Qth6
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-john-2014-08-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg <johannes@sipsolutions.net> says:

"Not that much content this time. Some RCU cleanups, crypto
performance improvements, and various patches all over,
rather than listing them one might as well look into the
git log instead."

Signed-off-by: John W. Linville <linville@tuxdriver.com>

Conflicts:
	drivers/net/wireless/ath/wil6210/wmi.c
2014-09-04 13:41:33 -04:00
John W. Linville
190355cc06 Here are a few fixes for mac80211. One has been discussed for a while
and adds a terminating NUL-byte to the alpha2 sent to userspace, which
 shouldn't be necessary but since many places treat it as a string we
 couldn't move to just sending two bytes.
 
 In addition to that, we have two VLAN fixes from Felix, a mesh fix, a
 fix for the recently introduced RX aggregation offload, a revert for
 a broken patch (that luckily didn't really cause any harm) and a small
 fix for alignment in debugfs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUAFrmAAoJEDBSmw7B7bqr79AP/1yGi9lkv/wWUs5y0AhUSen9
 850MU26BBlyAAFSz11xqgaEeRmeBeqhR3K7w/M02TX0CHxBzMqMZfyE//tq0UJaI
 ZwZmtyQmdMiOSNKignTIIx7OHTioq0wrGKb6O2UvKoJfTlB9t01jCC4jmCTF5Vos
 6ReF7NaZEbxW6XDOsClNTAtIa1c6n1RQ5VbDIEL5Vfvqv8LbcobduF8WcYl80eIQ
 +EvIHtUm/Luxg6DblibgEVtwYOtNpvRz4pofdw3xoSHAnF+zhXbUr0dUjpkBNA7o
 vWboCBl14Qn1M7pOJZ0+TBzFmquAr6CDbDvArVCH01Swh27EUDQUcHQAggGpT71w
 DFgWHOYP0UCB6Y4U0GjBehy8PeuytqJLBSceKVud7DDqd8fY+Lq3MMyicIk0aw3o
 IIDLWrujkCBXsdfuxQETmYxHU05WHSuYOCTgGSqbq3QPTWm8pBGWTdbk+1t/0FyH
 cGLJOWs/jCrtHdzDj6TH+kL8NmvwB7sC9MT45qG0ilevmPW25yrnTJPEMEvFBqvZ
 lnaqiX6D1kGNZd09CxgSIhxrQi+N0Yg+UlLa4IUtOIqnQussOC3xH2U5qTufdpa1
 Gi9aCkBGVKQiObPWucf2QB4t1sZ18rxBhrAelZhQPLTKrnsuLhpcVBlU+L6ScCAk
 FVni4HZH2IGtDQ577k10
 =G/pB
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-john-2014-08-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg <johannes@sipsolutions.net> says:

"Here are a few fixes for mac80211. One has been discussed for a while
and adds a terminating NUL-byte to the alpha2 sent to userspace, which
shouldn't be necessary but since many places treat it as a string we
couldn't move to just sending two bytes.

In addition to that, we have two VLAN fixes from Felix, a mesh fix, a
fix for the recently introduced RX aggregation offload, a revert for
a broken patch (that luckily didn't really cause any harm) and a small
fix for alignment in debugfs."

Signed-off-by: John W. Linville <linville@redhat.com>
2014-09-04 13:08:24 -04:00
Pablo Neira Ayuso
30766f4c2d netfilter: nat: move specific NAT IPv4 to core
Move the specific NAT IPv4 core functions that are called from the
hooks from iptable_nat.c to nf_nat_l3proto_ipv4.c. This prepares the
ground to allow iptables and nft to use the same NAT engine code that
comes in a follow up patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-02 17:14:10 +02:00
Christophe Gouault
880a6fab8f xfrm: configure policy hash table thresholds by netlink
Enable to specify local and remote prefix length thresholds for the
policy hash table via a netlink XFRM_MSG_NEWSPDINFO message.

prefix length thresholds are specified by XFRMA_SPD_IPV4_HTHRESH and
XFRMA_SPD_IPV6_HTHRESH optional attributes (struct xfrmu_spdhthresh).

example:

    struct xfrmu_spdhthresh thresh4 = {
        .lbits = 0;
        .rbits = 24;
    };
    struct xfrmu_spdhthresh thresh6 = {
        .lbits = 0;
        .rbits = 56;
    };
    struct nlmsghdr *hdr;
    struct nl_msg *msg;

    msg = nlmsg_alloc();
    hdr = nlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, XFRMA_SPD_IPV4_HTHRESH, sizeof(__u32), NLM_F_REQUEST);
    nla_put(msg, XFRMA_SPD_IPV4_HTHRESH, sizeof(thresh4), &thresh4);
    nla_put(msg, XFRMA_SPD_IPV6_HTHRESH, sizeof(thresh6), &thresh6);
    nla_send_auto(sk, msg);

The numbers are the policy selector minimum prefix lengths to put a
policy in the hash table.

- lbits is the local threshold (source address for out policies,
  destination address for in and fwd policies).

- rbits is the remote threshold (destination address for out
  policies, source address for in and fwd policies).

The default values are:

XFRMA_SPD_IPV4_HTHRESH: 32 32
XFRMA_SPD_IPV6_HTHRESH: 128 128

Dynamic re-building of the SPD is performed when the thresholds values
are changed.

The current thresholds can be read via a XFRM_MSG_GETSPDINFO request:
the kernel replies to XFRM_MSG_GETSPDINFO requests by an
XFRM_MSG_NEWSPDINFO message, with both attributes
XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-02 13:37:56 +02:00
Christophe Gouault
b58555f176 xfrm: hash prefixed policies based on preflen thresholds
The idea is an extension of the current policy hashing.

Today only non-prefixed policies are stored in a hash table. This
patch relaxes the constraints, and hashes policies whose prefix
lengths are greater or equal to a configurable threshold.

Each hash table (one per direction) maintains its own set of IPv4 and
IPv6 thresholds (dbits4, sbits4, dbits6, sbits6), by default (32, 32,
128, 128).

Example, if the output hash table is configured with values (16, 24,
56, 64):

ip xfrm policy add dir out src 10.22.0.0/20 dst 10.24.1.0/24 ... => hashed
ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.1.1/32 ... => hashed
ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.0.0/16 ... => unhashed

ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/60 dst 3ffe:304:124:2401::/64 ...    => hashed
ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2401::2/128 ...  => hashed
ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2400::/56 ...    => unhashed

The high order bits of the addresses (up to the threshold) are used to
compute the hash key.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-02 13:29:44 +02:00
Willem de Bruijn
364a9e9324 sock: deduplicate errqueue dequeue
sk->sk_error_queue is dequeued in four locations. All share the
exact same logic. Deduplicate.

Also collapse the two critical sections for dequeue (at the top of
the recv handler) and signal (at the bottom).

This moves signal generation for the next packet forward, which should
be harmless.

It also changes the behavior if the recv handler exits early with an
error. Previously, a signal for follow-up packets on the errqueue
would then not be scheduled. The new behavior, to always signal, is
arguably a bug fix.

For rxrpc, the change causes the same function to be called repeatedly
for each queued packet (because the recv handler == sk_error_report).
It is likely that all packets will fail for the same reason (e.g.,
memory exhaustion).

This code runs without sk_lock held, so it is not safe to trust that
sk->sk_err is immutable inbetween releasing q->lock and the subsequent
test. Introduce int err just to avoid this potential race.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01 21:49:08 -07:00
David S. Miller
377655fe6b Merge tag 'master-2014-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-08-28

Please pull this batch of fixes intended for the 3.17 stream.

For the Bluetooth/6LowPAN/802.15.4 bits, Johan says:

'It contains a connection reference counting fix for LE where a
connection might stay up even though it should get disconnected.

The other 802.15.4 6LoWPAN related patches were sent to the bluetooth
tree by Alexander Aring and described as follows by him:

"
these patches contains patches for the bluetooth branch.

This series includes memory leak fixes and an errno value fix.
Also there are two patches for sending and receiving 1280 6LoWPAN
packets, which makes the IEEE 802.15.4 6LoWPAN stack more RFC
compliant.
"'

Along with that...

Alexey Khoroshilov fixes a use-after-free bug on at76c50x-usb.

Hauke Mehrtens adds a PCI ID to bcma.

Himangi Saraogi fixes a silly "A || A" test in rtlwifi.

Larry Finger adds a device ID to rtl8192cu.

Maks Naumov fixes a strncmp argument in ath9k.

Álvaro Fernández Rojas adds a PCI ID to ssb.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01 18:08:11 -07:00
Vincent Cuissard
cfdbeeafdb NFC: NCI: Add support of ISO15693
Update nci.h to respect latest NCI specification proposal
(stop using proprietary opcodes). Handle ISO15693 parameters
in NCI_RF_ACTIVATED_NTF handler.

Signed-off-by: Vincent Cuissard <cuissard@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-09-01 14:40:31 +02:00
Daniel Borkmann
38ab1fa981 net: sctp: fix ABI mismatch through sctp_assoc_to_state helper
Since SCTP day 1, that is, 19b55a2af145 ("Initial commit") from lksctp
tree, the official <netinet/sctp.h> header carries a copy of enum
sctp_sstat_state that looks like (compared to the current in-kernel
enumeration):

  User definition:                     Kernel definition:

  enum sctp_sstat_state {              typedef enum {
    SCTP_EMPTY             = 0,          <removed>
    SCTP_CLOSED            = 1,          SCTP_STATE_CLOSED            = 0,
    SCTP_COOKIE_WAIT       = 2,          SCTP_STATE_COOKIE_WAIT       = 1,
    SCTP_COOKIE_ECHOED     = 3,          SCTP_STATE_COOKIE_ECHOED     = 2,
    SCTP_ESTABLISHED       = 4,          SCTP_STATE_ESTABLISHED       = 3,
    SCTP_SHUTDOWN_PENDING  = 5,          SCTP_STATE_SHUTDOWN_PENDING  = 4,
    SCTP_SHUTDOWN_SENT     = 6,          SCTP_STATE_SHUTDOWN_SENT     = 5,
    SCTP_SHUTDOWN_RECEIVED = 7,          SCTP_STATE_SHUTDOWN_RECEIVED = 6,
    SCTP_SHUTDOWN_ACK_SENT = 8,          SCTP_STATE_SHUTDOWN_ACK_SENT = 7,
  };                                   } sctp_state_t;

This header was later on also placed into the uapi, so that user space
programs can compile without having <netinet/sctp.h>, but the shipped
with <linux/sctp.h> instead.

While RFC6458 under 8.2.1.Association Status (SCTP_STATUS) says that
sstat_state can range from SCTP_CLOSED to SCTP_SHUTDOWN_ACK_SENT, we
nevertheless have a what it appears to be dummy SCTP_EMPTY state from
the very early days.

While it seems to do just nothing, commit 0b8f9e25b0 ("sctp: remove
completely unsed EMPTY state") did the right thing and removed this dead
code. That however, causes an off-by-one when the user asks the SCTP
stack via SCTP_STATUS API and checks for the current socket state thus
yielding possibly undefined behaviour in applications as they expect
the kernel to tell the right thing.

The enumeration had to be changed however as based on the current socket
state, we access a function pointer lookup-table through this. Therefore,
I think the best way to deal with this is just to add a helper function
sctp_assoc_to_state() to encapsulate the off-by-one quirk.

Reported-by: Tristan Su <sooqing@gmail.com>
Fixes: 0b8f9e25b0 ("sctp: remove completely unsed EMPTY state")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-29 20:31:08 -07:00
Florian Fainelli
5037d532b8 net: dsa: add Broadcom tag RX/TX handler
Add support for the 4-bytes Broadcom tag that built-in switches such as
the Starfighter 2 might insert when receiving packets, or that we need
to insert while targetting specific switch ports. We use a fake local
EtherType value for this 4-bytes switch tag: ETH_P_BRCMTAG to make sure
we can assign DSA-specific network operations within the DSA drivers.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:40 -07:00
Florian Fainelli
ce31b31c68 net: dsa: allow updating fixed PHY link information
Allow switch drivers to hook a PHY link update callback to perform
port-specific link work.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:40 -07:00
Florian Fainelli
ec9436baed net: dsa: allow drivers to do link adjustment
Whenever libphy determines that the link status of a given PHY/port has
changed, allow to call into the switch driver link adjustment callback
so proper actions can be taken care of by the switch driver upon link
notification.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:40 -07:00
Florian Fainelli
5aed85cec2 net: dsa: allow switches to work without tagging
In case switch port tagging is disabled (voluntarily, or the switch just
does not support it), allow us to continue using the defined set of
dsa_device_ops in net/dsa/slave.c.

We introduce dsa_protocol_is_tagged() to check whether we need to
override skb->protocol and go through the DSA-specifif packet_type
function, or if we just go on and receive the SKB through the normal
path.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:40 -07:00
Florian Fainelli
0d8bcdd383 net: dsa: allow for more complex PHY setups
Modify the DSA slave interface to be bound to an arbitray PHY, not just
the ones that are available as child PHY devices of the switch MDIO bus.

This allows us for instance to have external PHYs connected to a
separate MDIO bus, but yet also connected to a given switch port.

Under certain configurations, the physical port mask might not be a 1:1
mapping to the MII PHYs mask. This is the case, if e.g: Port 1 of the
switch is used and connects to a PHY at a MDIO address different than 1.

Introduce a phys_mii_mask variable which allows driver to implement and
divert their own MDIO read/writes operations for a subset of the MDIO
PHY addresses.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:40 -07:00
Florian Fainelli
bd47497a01 net: dsa: retain a per-port device_node pointer
We will later use the per-port device_node pointer to fetch a bunch of
port-specific properties.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:40 -07:00
Florian Fainelli
fa981d9af8 net: dsa: provide a switch device device tree node pointer
We might need to fetch additional resources from the device tree node
pointer, such as register ranges or other properties. Keep a device_node
pointer around for this.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:39 -07:00
Florian Fainelli
3e8a72d1da net: dsa: reduce number of protocol hooks
DSA is currently registering one packet_type function per EtherType it
needs to intercept in the receive path of a DSA-enabled Ethernet device.
Right now we have three of them: trailer, DSA and eDSA, and there might
be more in the future, this will not scale to the addition of new
protocols.

This patch proceeds with adding a new layer of abstraction and two new
functions:

dsa_switch_rcv() which will dispatch into the tag-protocol specific
receive function implemented by net/dsa/tag_*.c

dsa_slave_xmit() which will dispatch into the tag-protocol specific
transmit function implemented by net/dsa/tag_*.c

When we do create the per-port slave network devices, we iterate over
the switch protocol to assign the DSA-specific receive and transmit
operations.

A new fake ethertype value is used: ETH_P_XDSA to illustrate the fact
that this is no longer going to look like ETH_P_DSA or ETH_P_TRAILER
like it used to be.

This allows us to greatly simplify the check in eth_type_trans() and
always override the skb->protocol with ETH_P_XDSA for Ethernet switches
tagged protocol, while also reducing the number repetitive slave
netdevice_ops assignments.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:39 -07:00
Christoph Lameter
903ceff7ca net: Replace get_cpu_var through this_cpu_ptr
Replace uses of get_cpu_var for address calculation through this_cpu_ptr.

Cc: netdev@vger.kernel.org
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26 13:45:47 -04:00
Johannes Berg
5bc8c1f2b0 cfg80211: allow passing frame type to cfg80211_inform_bss()
When using the cfg80211_inform_bss[_width]() functions drivers
cannot currently indicate whether the data was received in a
beacon or probe response. Fix that by passing a new enum that
indicates such (or unknown).

For good measure, use it in ath6kl.

Acked-by: Kalle Valo <kvalo@qca.qualcomm.com> [ath6kl]
Acked-by: Arend van Spriel <arend@broadcom.com> [brcmfmac]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-08-26 11:16:02 +02:00
Johannes Berg
0e227084ae cfg80211: clarify BSS probe response vs. beacon data
There are a few possible cases of where BSS data came from:
 1) only a beacon has been received
 2) only a probe response has been received
 3) the driver didn't report what it received (this happens when
    using cfg80211_inform_bss[_width]())
 4) both probe response and beacon data has been received

Unfortunately, in the userspace API, a few things weren't there:
 a) there was no way to differentiate cases 1) and 4) above
    without comparing the data of the IEs
 b) the TSF was always from the last frame, instead of being
    exposed for beacon/probe response separately like IEs

Fix this by
   i) exporting a new flag attribute that indicates whether or
      not probe response data has been received - this addresses (a)
  ii) exporting a BEACON_TSF attribute that holds the beacon's TSF
      if a beacon has been received
 iii) not exporting the beacon attributes in case (3) above as that
      would just lead userspace into thinking the data actually came
      from a beacon when that isn't clear

To implement this, track inside the IEs struct whether or not it
(definitely) came from a beacon.

Reported-by: William Seto
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-08-26 11:16:01 +02:00
Ido Yariv
c70f59a2a0 mac80211: don't resize skbs needlessly
Header-less cloned skbs with sufficient headroom need not be cloned
unless the tailroom is going to be modified.

Fix ieee80211_skb_resize so it would only resize cloned skbs if either
the header isn't released or the tailroom is going to be modified.

Some drivers might have assumed that skbs are never cloned, so add a HW
flag that explicitly permits cloned TX skbs. Drivers which do not modify
TX skbs should set this flag to avoid copying skbs.

Signed-off-by: Ido Yariv <idox.yariv@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-08-26 11:16:00 +02:00
Ido Yariv
ca34e3b5c8 mac80211: Fix accounting of the tailroom-needed counter
When hw acceleration is enabled, the GENERATE_IV or PUT_IV_SPACE flags
will only require headroom space. Consequently, the tailroom-needed
counter can safely be decremented.

Signed-off-by: Ido Yariv <idox.yariv@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-08-26 11:15:59 +02:00
Vladimir Kondratiev
970fdfa89b cfg80211: remove @gfp parameter from cfg80211_rx_mgmt()
In the cfg80211_rx_mgmt(), parameter @gfp was used for the memory allocation.
But, memory get allocated under spin_lock_bh(), this implies atomic context.
So, one can't use GFP_KERNEL, only variants with no __GFP_WAIT. Actually, in all
occurrences GFP_ATOMIC is used (wil6210 use GFP_KERNEL by mistake),
and it should be this way or warning triggered in the memory allocation code.

Remove @gfp parameter as no actual choice exist, and use hard coded
GFP_ATOMIC for memory allocation.

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-08-26 11:15:58 +02:00
WANG Cong
453a940ea7 net: make skb an optional parameter for__skb_flow_dissect()
Fixes: commit 690e36e726 (net: Allow raw buffers to be passed into the flow dissector)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25 17:21:26 -07:00
John W. Linville
07bc788424 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-08-25 15:58:02 -04:00
John W. Linville
0fdcaa5948 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2014-08-25 15:35:20 -04:00
Tom Herbert
57c67ff4bd udp: additional GRO support
Implement GRO for UDPv6. Add UDP checksum verification in gro_receive
for both UDP4 and UDP6 calling skb_gro_checksum_validate_zero_check.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 18:09:24 -07:00
Tom Herbert
1933a7852c net: add gro_compute_pseudo functions
Add inet_gro_compute_pseudo and ip6_gro_compute_pseudo. These are
the logical equivalents of inet_compute_pseudo and ip6_compute_pseudo
for GRO path. The IP header is taken from skb_gro_network_header.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 18:09:23 -07:00
David S. Miller
690e36e726 net: Allow raw buffers to be passed into the flow dissector.
Drivers, and perhaps other entities we have not yet considered,
sometimes want to know how deep the protocol headers go before
deciding how large of an SKB to allocate and how much of the packet to
place into the linear SKB area.

For example, consider a driver which has a device which DMAs into
pools of pages and then tells the driver where the data went in the
DMA descriptor(s).  The driver can then build an SKB and reference
most of the data via SKB fragments (which are page/offset/length
triplets).

However at least some of the front of the packet should be placed into
the linear SKB area, which comes before the fragments, so that packet
processing can get at the headers efficiently.  The first thing each
protocol layer is going to do is a "pskb_may_pull()" so we might as
well aggregate as much of this as possible while we're building the
SKB in the driver.

Part of supporting this is that we don't have an SKB yet, so we want
to be able to let the flow dissector operate on a raw buffer in order
to compute the offset of the end of the headers.

So now we have a __skb_flow_dissect() which takes an explicit data
pointer and length.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 12:13:41 -07:00
Eric Dumazet
d2de875c6d net: use ktime_get_ns() and ktime_get_real_ns() helpers
ktime_get_ns() replaces ktime_to_ns(ktime_get())

ktime_get_real_ns() replaces ktime_to_ns(ktime_get_real())

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 19:57:23 -07:00
Johan Hedberg
f161dd4122 Bluetooth: Fix hci_conn reference counting for auto-connections
Recently the LE passive scanning and auto-connections feature was
introduced. It uses the hci_connect_le() API which returns a hci_conn
along with a reference count to that object. All previous users would
tie this returned reference to some existing object, such as an L2CAP
channel, and there'd be no leaked references this way. For
auto-connections however the reference was returned but not stored
anywhere, leaving established connections with one higher reference
count than they should have.

Instead of playing special tricks with hci_conn_hold/drop this patch
associates the returned reference from hci_connect_le() with the object
that in practice does own this reference, i.e. the hci_conn_params
struct that caused us to initiate a connection in the first place. Once
the connection is established or fails to establish this reference is
removed appropriately.

One extra thing needed is to call hci_pend_le_actions_clear() before
calling hci_conn_hash_flush() so that the reference is cleared before
the hci_conn objects are fully removed.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-08-20 21:57:39 +03:00
Martin Townsend
6697dabe27 ieee802154: 6lowpan: ensure MTU of 1280 for 6lowpan
This patch drops the userspace accessable sysfs entry for the maximum
datagram size of a 6LoWPAN fragment packet.

A fragment should not have a datagram size value greater than 1280 byte.
Instead of make this value configurable, we accept 1280 datagram size
fragment packets only.

Signed-off-by: Martin Townsend <martin.townsend@xsilon.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-08-19 19:17:42 +02:00
Johannes Berg
a74a8c846f mac80211: don't duplicate station QoS capability data
We currently track the QoS capability twice: for all peer stations
in the WLAN_STA_WME flag, and for any clients associated to an AP
interface separately for drivers in the sta->sta.wme field.

Remove the WLAN_STA_WME flag and track the capability only in the
driver-visible field, getting rid of the limitation that the field
is only valid in AP mode.

Reviewed-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-08-15 14:38:08 +02:00
Eliad Peller
a5fe8e7695 regulatory: add NUL to alpha2
alpha2 is defined as 2-chars array, but is used in multiple
places as string (e.g. with nla_put_string calls), which
might leak kernel data.

Solve it by simply adding an extra char for the NULL
terminator, making such operations safe.

Cc: stable@vger.kernel.org
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-08-15 13:51:40 +02:00
Hannes Frederic Sowa
a26552afe8 tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic
tcp_tw_recycle heavily relies on tcp timestamps to build a per-host
ordering of incoming connections and teardowns without the need to
hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for
the last measured RTO. To do so, we keep the last seen timestamp in a
per-host indexed data structure and verify if the incoming timestamp
in a connection request is strictly greater than the saved one during
last connection teardown. Thus we can verify later on that no old data
packets will be accepted by the new connection.

During moving a socket to time-wait state we already verify if timestamps
where seen on a connection. Only if that was the case we let the
time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN
will be used. But we don't verify this on incoming SYN packets. If a
connection teardown was less than TCP_PAWS_MSL seconds in the past we
cannot guarantee to not accept data packets from an old connection if
no timestamps are present. We should drop this SYN packet. This patch
closes this loophole.

Please note, this patch does not make tcp_tw_recycle in any way more
usable but only adds another safety check:
Sporadic drops of SYN packets because of reordering in the network or
in the socket backlog queues can happen. Users behing NAT trying to
connect to a tcp_tw_recycle enabled server can get caught in blackholes
and their connection requests may regullary get dropped because hosts
behind an address translator don't have synchronized tcp timestamp clocks.
tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled.

In general, use of tcp_tw_recycle is disadvised.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14 14:38:54 -07:00
Neal Cardwell
4fab907195 tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()
Make sure we use the correct address-family-specific function for
handling MTU reductions from within tcp_release_cb().

Previously AF_INET6 sockets were incorrectly always using the IPv6
code path when sometimes they were handling IPv4 traffic and thus had
an IPv4 dst.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Willem de Bruijn <willemb@google.com>
Fixes: 563d34d057 ("tcp: dont drop MTU reduction indications")
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14 14:38:54 -07:00
Andrey Vagin
9d186cac7f tcp: don't use timestamp from repaired skb-s to calculate RTT (v2)
We don't know right timestamp for repaired skb-s. Wrong RTT estimations
isn't good, because some congestion modules heavily depends on it.

This patch adds the TCPCB_REPAIRED flag, which is included in
TCPCB_RETRANS.

Thanks to Eric for the advice how to fix this issue.

This patch fixes the warning:
[  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
[  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
[  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
[  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
[  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
[  879.569982] Call Trace:
[  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
[  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
[  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
[  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
[  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
[  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
[  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
[  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
[  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
[  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
[  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
[  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
[  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
[  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---

v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit,
    where it's set for other skb-s.

Fixes: 431a91242d ("tcp: timestamp SYN+DATA messages")
Fixes: 740b0f1841 ("tcp: switch rtt estimations to usec resolution")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14 14:38:54 -07:00
Johan Hedberg
276d807317 Bluetooth: Remove unused l2cap_conn->security_timer
Now that there are no-longer any users for l2cap_conn->security_timer we
can go ahead and simply remove it. The patch makes initialization of the
conn->info_timer unconditional since it's better not to leave any
l2cap_conn data structures uninitialized no matter what the underlying
transport.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-08-14 08:49:24 +02:00
Johan Hedberg
dec5b49235 Bluetooth: Add public l2cap_conn_shutdown() API to request disconnection
Since we no-longer do special handling of SMP within l2cap_core.c we
don't have any code for calling l2cap_conn_del() when smp.c doesn't like
the data it gets. At the same time we cannot simply export
l2cap_conn_del() since it will try to lock the channels it calls into
whereas we already hold the lock in the smp.c l2cap_chan callbacks (i.e.
it'd lead to a deadlock).

This patch adds a new l2cap_conn_shutdown() API which is very similar to
l2cap_conn_del() except that it defers the call to l2cap_conn_del()
through a workqueue, thereby making it safe to use it from an L2CAP
channel callback.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-08-14 08:49:21 +02:00
Johan Hedberg
5d88cc73dd Bluetooth: Convert SMP to use l2cap_chan infrastructure
Now that we have all the necessary pieces in place we can fully convert
SMP to use the L2CAP channel infrastructure. This patch adds the
necessary callbacks and removes the now unneeded conn->smp_chan pointer.

One notable behavioral change in this patch comes from the following
code snippet:

-       case L2CAP_CID_SMP:
-               if (smp_sig_channel(conn, skb))
-                       l2cap_conn_del(conn->hcon, EACCES);

This piece of code was essentially forcing a disconnection if garbage
SMP data was received. The l2cap_conn_del() function is private to
l2cap_conn.c so we don't have access to it anymore when using the L2CAP
channel callbacks. Therefore, the behavior of the new code is simply to
return errors in the recv() callback (which is simply the old
smp_sig_channel()), but no disconnection will occur.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-08-14 08:49:19 +02:00
Johan Hedberg
defce9e836 Bluetooth: Make AES crypto context private to SMP
Now that we have per-adapter SMP data thanks to the root SMP L2CAP
channel we can take advantage of it and attach the AES crypto context
(only used for SMP) to it. This means that the smp_irk_matches() and
smp_generate_rpa() function can be converted to internally handle the
AES context.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-08-14 08:49:19 +02:00
Johan Hedberg
70db83c4bc Bluetooth: Add SMP L2CAP channel skeleton
This patch creates the initial SMP L2CAP channels and a skeleton for
their callbacks. There is one per-adapter channel created upon adapter
registration, and then one channel per-connection created through the
new_connection callback. The channels are registered with the reserved
CID 0x1f for now in order to not conflict with existing SMP
functionality. Once everything is in place the value can be changed to
what it should be, i.e. L2CAP_CID_SMP.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-08-14 08:49:18 +02:00
Johan Hedberg
f193844c51 Bluetooth: Add more L2CAP convenience callbacks
In preparation for converting SMP to use l2cap_chan it's useful to add a
few more callback helpers so that smp.c won't need to define all of its
own.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-08-14 08:49:18 +02:00
Johan Hedberg
d52deb1748 Bluetooth: Resume BT_CONNECTED state after LE security elevation
The LE ATT socket uses a special trick where it temporarily sets
BT_CONFIG state for the duration of a security level elevation. In order
to not require special hacks for going back to BT_CONNECTED state in the
l2cap_core.c code the most reasonable place to resume the state is the
resume callback. This patch adds a new flag to track the pending
security level change and ensures that the state is set back to
BT_CONNECTED in the resume callback in case the flag is set.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-08-14 08:49:12 +02:00
Johan Hedberg
432df05eb1 Bluetooth: Create unified helper function for updating page scan
Similar to our hci_update_background_scan() function we can simplify a
lot of code by creating a unified helper function for doing page scan
updates. This patch adds such a function to hci_core.c and updates all
the relevant places to use it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-08-14 08:49:09 +02:00
Johan Hedberg
84c61d92bb Bluetooth: Add convenience function to check for pending power off
There are several situations where we're interested in knowing whether
we're currently in the process of powering off an adapter. This patch
adds a convenience function for the purpose and makes it public since
we'll soon need to access it from hci_event.c as well.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-08-14 08:49:08 +02:00
Linus Torvalds
96784de59f Merge branch 'stable-3.17' of git://git.infradead.org/users/pcmoore/selinux
Pull SElinux fixes from Paul Moore:
 "Two small patches to fix a couple of build warnings in SELinux and
  NetLabel.  The patches are obvious enough that I don't think any
  additional explanation is necessary, but it basically boils down to
  the usual: I was stupid, and these patches fix some of the stupid.

  Both patches were posted earlier this week to the SELinux list, and
  that is where they sat as I didn't think there were noteworthy enough
  to go upstream at this point in time, but DaveM would rather see them
  upstream now so who am I to argue.  As the patches are both very
  small"

* 'stable-3.17' of git://git.infradead.org/users/pcmoore/selinux:
  selinux: remove unused variabled in the netport, netnode, and netif caches
  netlabel: fix the netlbl_catmap_setlong() dummy function
2014-08-09 15:09:52 -07:00
Paul Moore
bc7e6edbbc netlabel: fix the netlbl_catmap_setlong() dummy function
When I added the netlbl_catmap_setlong() function I mistakenly forgot
to mark the associated dummy function as an inline.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-08-07 20:55:21 -04:00
Eric Dumazet
140c55d4b5 net-timestamp: sock_tx_timestamp() fix
sock_tx_timestamp() should not ignore initial *tx_flags value, as TCP
stack can store SKBTX_SHARED_FRAG in it.

Also first argument (struct sock *) can be const.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 4ed2d765df ("net-timestamp: TCP timestamping")
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-06 12:38:07 -07:00
Linus Torvalds
ae045e2455 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Steady transitioning of the BPF instructure to a generic spot so
      all kernel subsystems can make use of it, from Alexei Starovoitov.

   2) SFC driver supports busy polling, from Alexandre Rames.

   3) Take advantage of hash table in UDP multicast delivery, from David
      Held.

   4) Lighten locking, in particular by getting rid of the LRU lists, in
      inet frag handling.  From Florian Westphal.

   5) Add support for various RFC6458 control messages in SCTP, from
      Geir Ola Vaagland.

   6) Allow to filter bridge forwarding database dumps by device, from
      Jamal Hadi Salim.

   7) virtio-net also now supports busy polling, from Jason Wang.

   8) Some low level optimization tweaks in pktgen from Jesper Dangaard
      Brouer.

   9) Add support for ipv6 address generation modes, so that userland
      can have some input into the process.  From Jiri Pirko.

  10) Consolidate common TCP connection request code in ipv4 and ipv6,
      from Octavian Purdila.

  11) New ARP packet logger in netfilter, from Pablo Neira Ayuso.

  12) Generic resizable RCU hash table, with intial users in netlink and
      nftables.  From Thomas Graf.

  13) Maintain a name assignment type so that userspace can see where a
      network device name came from (enumerated by kernel, assigned
      explicitly by userspace, etc.) From Tom Gundersen.

  14) Automatic flow label generation on transmit in ipv6, from Tom
      Herbert.

  15) New packet timestamping facilities from Willem de Bruijn, meant to
      assist in measuring latencies going into/out-of the packet
      scheduler, latency from TCP data transmission to ACK, etc"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits)
  cxgb4 : Disable recursive mailbox commands when enabling vi
  net: reduce USB network driver config options.
  tg3: Modify tg3_tso_bug() to handle multiple TX rings
  amd-xgbe: Perform phy connect/disconnect at dev open/stop
  amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask
  net: sun4i-emac: fix memory leak on bad packet
  sctp: fix possible seqlock seadlock in sctp_packet_transmit()
  Revert "net: phy: Set the driver when registering an MDIO bus device"
  cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine
  team: Simplify return path of team_newlink
  bridge: Update outdated comment on promiscuous mode
  net-timestamp: ACK timestamp for bytestreams
  net-timestamp: TCP timestamping
  net-timestamp: SCHED timestamp on entering packet scheduler
  net-timestamp: add key to disambiguate concurrent datagrams
  net-timestamp: move timestamp flags out of sk_flags
  net-timestamp: extend SCM_TIMESTAMPING ancillary data struct
  cxgb4i : Move stray CPL definitions to cxgb4 driver
  tcp: reduce spurious retransmits due to transient SACK reneging
  qlcnic: Initialize dcbnl_ops before register_netdev
  ...
2014-08-06 09:38:14 -07:00
Linus Torvalds
bb2cbf5e93 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "In this release:

   - PKCS#7 parser for the key management subsystem from David Howells
   - appoint Kees Cook as seccomp maintainer
   - bugfixes and general maintenance across the subsystem"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (94 commits)
  X.509: Need to export x509_request_asymmetric_key()
  netlabel: shorter names for the NetLabel catmap funcs/structs
  netlabel: fix the catmap walking functions
  netlabel: fix the horribly broken catmap functions
  netlabel: fix a problem when setting bits below the previously lowest bit
  PKCS#7: X.509 certificate issuer and subject are mandatory fields in the ASN.1
  tpm: simplify code by using %*phN specifier
  tpm: Provide a generic means to override the chip returned timeouts
  tpm: missing tpm_chip_put in tpm_get_random()
  tpm: Properly clean sysfs entries in error path
  tpm: Add missing tpm_do_selftest to ST33 I2C driver
  PKCS#7: Use x509_request_asymmetric_key()
  Revert "selinux: fix the default socket labeling in sock_graft()"
  X.509: x509_request_asymmetric_keys() doesn't need string length arguments
  PKCS#7: fix sparse non static symbol warning
  KEYS: revert encrypted key change
  ima: add support for measuring and appraising firmware
  firmware_class: perform new LSM checks
  security: introduce kernel_fw_from_file hook
  PKCS#7: Missing inclusion of linux/err.h
  ...
2014-08-06 08:06:39 -07:00
David S. Miller
d247b6ab3c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/Makefile
	net/ipv6/sysctl_net_ipv6.c

Two ipv6_table_template[] additions overlap, so the index
of the ipv6_table[x] assignments needed to be adjusted.

In the drivers/net/Makefile case, we've gotten rid of the
garbage whereby we had to list every single USB networking
driver in the top-level Makefile, there is just one
"USB_NETWORKING" that guards everything.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 18:46:26 -07:00
Willem de Bruijn
09c2d251b7 net-timestamp: add key to disambiguate concurrent datagrams
Datagrams timestamped on transmission can coexist in the kernel stack
and be reordered in packet scheduling. When reading looped datagrams
from the socket error queue it is not always possible to unique
correlate looped data with original send() call (for application
level retransmits). Even if possible, it may be expensive and complex,
requiring packet inspection.

Introduce a data-independent ID mechanism to associate timestamps with
send calls. Pass an ID alongside the timestamp in field ee_data of
sock_extended_err.

The ID is a simple 32 bit unsigned int that is associated with the
socket and incremented on each send() call for which software tx
timestamp generation is enabled.

The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to
avoid changing ee_data for existing applications that expect it 0.
The counter is reset each time the flag is reenabled. Reenabling
does not change the ID of already submitted data. It is possible
to receive out of order IDs if the timestamp stream is not quiesced
first.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 16:35:54 -07:00
Willem de Bruijn
b9f40e21ef net-timestamp: move timestamp flags out of sk_flags
sk_flags is reaching its limit. New timestamping options will not fit.
Move all of them into a new field sk->sk_tsflags.

Added benefit is that this removes boilerplate code to convert between
SOF_TIMESTAMPING_.. and SOCK_TIMESTAMPING_.. in getsockopt/setsockopt.

SOCK_TIMESTAMPING_RX_SOFTWARE is also used to toggle the receive
timestamp logic (netstamp_needed). That can be simplified and this
last key removed, but will leave that for a separate patch.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

The u16 in sock can be moved into a 16-bit hole below sk_gso_max_segs,
though that scatters tstamp fields throughout the struct.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 16:35:54 -07:00
Willem de Bruijn
f24b9be595 net-timestamp: extend SCM_TIMESTAMPING ancillary data struct
Applications that request kernel tx timestamps with SO_TIMESTAMPING
read timestamps as recvmsg() ancillary data. The response is defined
implicitly as timespec[3].

1) define struct scm_timestamping explicitly and

2) add support for new tstamp types. On tx, scm_timestamping always
   accompanies a sock_extended_err. Define previously unused field
   ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to
   define the existing behavior.

The reception path is not modified. On rx, no struct similar to
sock_extended_err is passed along with SCM_TIMESTAMPING.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 16:35:53 -07:00
Neal Cardwell
5ae344c949 tcp: reduce spurious retransmits due to transient SACK reneging
This commit reduces spurious retransmits due to apparent SACK reneging
by only reacting to SACK reneging that persists for a short delay.

When a sequence space hole at snd_una is filled, some TCP receivers
send a series of ACKs as they apparently scan their out-of-order queue
and cumulatively ACK all the packets that have now been consecutiveyly
received. This is essentially misbehavior B in "Misbehaviors in TCP
SACK generation" ACM SIGCOMM Computer Communication Review, April
2011, so we suspect that this is from several common OSes (Windows
2000, Windows Server 2003, Windows XP). However, this issue has also
been seen in other cases, e.g. the netdev thread "TCP being hoodwinked
into spurious retransmissions by lack of timestamps?" from March 2014,
where the receiver was thought to be a BSD box.

Since snd_una would temporarily be adjacent to a previously SACKed
range in these scenarios, this receiver behavior triggered the Linux
SACK reneging code path in the sender. This led the sender to clear
the SACK scoreboard, enter CA_Loss, and spuriously retransmit
(potentially) every packet from the entire write queue at line rate
just a few milliseconds before the ACK for each packet arrives at the
sender.

To avoid such situations, now when a sender sees apparent reneging it
does not yet retransmit, but rather adjusts the RTO timer to give the
receiver a little time (max(RTT/2, 10ms)) to send us some more ACKs
that will restore sanity to the SACK scoreboard. If the reneging
persists until this RTO then, as before, we clear the SACK scoreboard
and enter CA_Loss.

A 10ms delay tolerates a receiver sending such a stream of ACKs at
56Kbit/sec. And to allow for receivers with slower or more congested
paths, we wait for at least RTT/2.

We validated the resulting max(RTT/2, 10ms) delay formula with a mix
of North American and South American Google web server traffic, and
found that for ACKs displaying transient reneging:

 (1) 90% of inter-ACK delays were less than 10ms
 (2) 99% of inter-ACK delays were less than RTT/2

In tests on Google web servers this commit reduced reneging events by
75%-90% (as measured by the TcpExtTCPSACKReneging counter), without
any measurable impact on latency for user HTTP and SPDY requests.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 16:29:33 -07:00
David S. Miller
aef4f5b6db Merge tag 'master-2014-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
Conflicts:
	net/6lowpan/iphc.c

Minor conflicts in iphc.c were changes overlapping with some
style cleanups.

John W. Linville says:

====================
Please pull this last(?) batch of wireless change intended for the
3.17 stream...

For the NFC bits, Samuel says:

"This is a rather quiet one, we have:

- A new driver from ST Microelectronics for their NCI ST21NFCB,
  including device tree  support.

- p2p support for the ST21NFCA driver

- A few fixes an enhancements for the NFC digital laye"

For the Atheros bits, Kalle says:

"Michal and Janusz did some important RX aggregation fixes, basically we
were missing RX reordering altogether. The 10.1 firmware doesn't support
Ad-Hoc mode and Michal fixed ath10k so that it doesn't advertise Ad-Hoc
support with that firmware. Also he implemented a workaround for a KVM
issue."

For the Bluetooth bits, Gustavo and Johan say:

"To quote Gustavo from his previous request:

'Some last minute fixes for -next. We have a fix for a use after free in
RFCOMM, another fix to an issue with ADV_DIRECT_IND and one for ADV_IND with
auto-connection handling.  Last, we added support for reading the codec and
MWS setting for controllers that support these features.'

Additionally there are fixes to LE scanning, an update to conform to the 4.1
core specification as well as fixes for tracking the page scan state. All
of these fixes are important for 3.17."

And,

"We've got:

- 6lowpan fixes/cleanups
- A couple crash fixes, one for the Marvell HCI driver and another in LE SMP.
- Fix for an incorrect connected state check
- Fix for the bondable requirement during pairing (an issue which had
  crept in because of using "pairable" when in fact the actual meaning
  was "bondable" (these have different meanings in Bluetooth)"

Along with those are some late-breaking hardware support patches in
brcmfmac and b43 as well as a stray ath9k patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 13:18:20 -07:00
Nikolay Aleksandrov
d4ad4d22e7 inet: frags: use kmem_cache for inet_frag_queue
Use kmem_cache to allocate/free inet_frag_queue objects since they're
all the same size per inet_frags user and are alloced/freed in high volumes
thus making it a perfect case for kmem_cache.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02 15:31:31 -07:00
Nikolay Aleksandrov
1ab1934ed8 inet: frags: enum the flag definitions and add descriptions
Move the flags to an enum definion, swap FIRST_IN/LAST_IN to be in increasing
order and add comments explaining each flag and the inet_frag_queue struct
members.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02 15:31:31 -07:00
Nikolay Aleksandrov
06aa8b8a03 inet: frags: rename last_in to flags
The last_in field has been used to store various flags different from
first/last frag in so give it a more descriptive name: flags.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02 15:31:31 -07:00
James Morris
103ae675b1 Merge branch 'next' of git://git.infradead.org/users/pcmoore/selinux into next 2014-08-02 22:58:02 +10:00
Paul Moore
4fbe63d1c7 netlabel: shorter names for the NetLabel catmap funcs/structs
Historically the NetLabel LSM secattr catmap functions and data
structures have had very long names which makes a mess of the NetLabel
code and anyone who uses NetLabel.  This patch renames the catmap
functions and structures from "*_secattr_catmap_*" to just "*_catmap_*"
which improves things greatly.

There are no substantial code or logic changes in this patch.

Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
2014-08-01 11:17:37 -04:00
Paul Moore
4b8feff251 netlabel: fix the horribly broken catmap functions
The NetLabel secattr catmap functions, and the SELinux import/export
glue routines, were broken in many horrible ways and the SELinux glue
code fiddled with the NetLabel catmap structures in ways that we
probably shouldn't allow.  At some point this "worked", but that was
likely due to a bit of dumb luck and sub-par testing (both inflicted
by yours truly).  This patch corrects these problems by basically
gutting the code in favor of something less obtuse and restoring the
NetLabel abstractions in the SELinux catmap glue code.

Everything is working now, and if it decides to break itself in the
future this code will be much easier to debug than the code it
replaces.

One noteworthy side effect of the changes is that it is no longer
necessary to allocate a NetLabel catmap before calling one of the
NetLabel APIs to set a bit in the catmap.  NetLabel will automatically
allocate the catmap nodes when needed, resulting in less allocations
when the lowest bit is greater than 255 and less code in the LSMs.

Cc: stable@vger.kernel.org
Reported-by: Christian Evans <frodox@zoho.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
2014-08-01 11:17:17 -04:00
Paul Moore
41c3bd2039 netlabel: fix a problem when setting bits below the previously lowest bit
The NetLabel category (catmap) functions have a problem in that they
assume categories will be set in an increasing manner, e.g. the next
category set will always be larger than the last.  Unfortunately, this
is not a valid assumption and could result in problems when attempting
to set categories less than the startbit in the lowest catmap node.
In some cases kernel panics and other nasties can result.

This patch corrects the problem by checking for this and allocating a
new catmap node instance and placing it at the front of the list.

Cc: stable@vger.kernel.org
Reported-by: Christian Evans <frodox@zoho.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
2014-08-01 11:17:03 -04:00
Jason Gunthorpe
299ee123e1 sctp: Fixup v4mapped behaviour to comply with Sock API
The SCTP socket extensions API document describes the v4mapping option as
follows:

8.1.15.  Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)

   This socket option is a Boolean flag which turns on or off the
   mapping of IPv4 addresses.  If this option is turned on, then IPv4
   addresses will be mapped to V6 representation.  If this option is
   turned off, then no mapping will be done of V4 addresses and a user
   will receive both PF_INET6 and PF_INET type addresses on the socket.
   See [RFC3542] for more details on mapped V6 addresses.

This description isn't really in line with what the code does though.

Introduce addr_to_user (renamed addr_v4map), which should be called
before any sockaddr is passed back to user space. The new function
places the sockaddr into the correct format depending on the
SCTP_I_WANT_MAPPED_V4_ADDR option.

Audit all places that touched v4mapped and either sanely construct
a v4 or v6 address then call addr_to_user, or drop the
unnecessary v4mapped check entirely.

Audit all places that call addr_to_user and verify they are on a sycall
return path.

Add a custom getname that formats the address properly.

Several bugs are addressed:
 - SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for
   addresses to user space
 - The addr_len returned from recvmsg was not correct when
   returning AF_INET on a v6 socket
 - flowlabel and scope_id were not zerod when promoting
   a v4 to v6
 - Some syscalls like bind and connect behaved differently
   depending on v4mapped

Tested bind, getpeername, getsockname, connect, and recvmsg for proper
behaviour in v4mapped = 1 and 0 cases.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31 21:49:06 -07:00
David S. Miller
a173e550c2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains netfilter updates for net-next, they are:

1) Add the reject expression for the nf_tables bridge family, this
   allows us to send explicit reject (TCP RST / ICMP dest unrech) to
   the packets matching a rule.

2) Simplify and consolidate the nf_tables set dumping logic. This uses
   netlink control->data to filter out depending on the request.

3) Perform garbage collection in xt_hashlimit using a workqueue instead
   of a timer, which is problematic when many entries are in place in
   the tables, from Eric Dumazet.

4) Remove leftover code from the removed ulog target support, from
   Paul Bolle.

5) Dump unmodified flags in the netfilter packet accounting when resetting
   counters, so userspace knows that a counter was in overquota situation,
   from Alexey Perevalov.

6) Fix wrong usage of the bitwise functions in nfnetlink_acct, also from
   Alexey.

7) Fix a crash when adding new set element with an empty NFTA_SET_ELEM_LIST
   attribute.

This patchset also includes a couple of cleanups for xt_LED from
Duan Jiong and for nf_conntrack_ipv4 (using coccinelle) from
Himangi Saraogi.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31 14:09:14 -07:00
Dmitry Popov
95cb574598 ip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip"
Ipv4 tunnels created with "local any remote $ip" didn't work properly since
7d442fab0 (ipv4: Cache dst in tunnels). 99% of packets sent via those tunnels
had src addr = 0.0.0.0. That was because only dst_entry was cached, although
fl4.saddr has to be cached too. Every time ip_tunnel_xmit used cached dst_entry
(tunnel_rtable_get returned non-NULL), fl4.saddr was initialized with
tnl_params->saddr (= 0 in our case), and wasn't changed until iptunnel_xmit().

This patch adds saddr to ip_tunnel->dst_cache, fixing this issue.

Reported-by: Sergey Popov <pinkbyte@gentoo.org>
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30 15:18:58 -07:00
David S. Miller
f139c74a8d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30 13:25:49 -07:00
Johan Hedberg
b2939475eb Bluetooth: Rename pairable mgmt setting to bondable
This setting maps to the HCI_BONDABLE flag which tracks whether we're
bondable or not. Therefore, rename the mgmt setting and respective
command accordingly.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-30 19:28:41 +02:00
Johan Hedberg
b6ae8457ac Bluetooth: Rename HCI_PAIRABLE to HCI_BONDABLE
The HCI_PAIRABLE flag isn't actually controlling whether we're pairable
but whether we're bondable. Therefore, rename it accordingly.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-30 19:28:41 +02:00
Varka Bhadram
233351bd66 6lowpan: remove unused function
This patch removes the unused function.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-30 19:28:41 +02:00
Varka Bhadram
267ca9fefc 6lowpan: remove unused macros
This patch removes the unused macros.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-30 19:28:41 +02:00
Alexander Aring
004942445d 6lowpan: remove unused LOWPAN_FRAG_SIZE define
This define is unused since commit
96cb3eb7a1 ("6lowpan: fix fragmentation on
sending side"). It is a worst case scenario for payload calculation.
Since commit 96cb3eb7a1 we calculation the
payload to use the optimal size.

This define is also necessary for ieee802154 6lowpan only and the file
include/net/6lowpan.h should contain generic 6lowpan things only.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-30 19:28:40 +02:00
Alexander Aring
556a5bfc03 6lowpan: iphc: use ipv6 api to check address scope
This patch removes the own implementation to check of link-layer,
broadcast and any address type and use the IPv6 api for that.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-30 19:28:39 +02:00
WANG Cong
20e61da7ff ipv4: fail early when creating netdev named all or default
We create a proc dir for each network device, this will cause
conflicts when the devices have name "all" or "default".

Rather than emitting an ugly kernel warning, we could just
fail earlier by checking the device name.

Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-29 11:43:50 -07:00
Willem de Bruijn
4d276eb6a4 net: remove deprecated syststamp timestamp
The SO_TIMESTAMPING API defines three types of timestamps: software,
hardware in raw format (hwtstamp) and hardware converted to system
format (syststamp). The last has been deprecated in favor of combining
hwtstamp with a PTP clock driver. There are no active users in the
kernel.

The option was device driver dependent. If set, but without hardware
support, the correct behavior is to return zero in the relevant field
in the SCM_TIMESTAMPING ancillary message. Without device drivers
implementing the option, this field is effectively always zero.

Remove the internal plumbing to dissuage new drivers from implementing
the feature. Keep the SOF_TIMESTAMPING_SYS_HARDWARE flag, however, to
avoid breaking existing applications that request the timestamp.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-29 11:39:50 -07:00
John W. Linville
a1ae52c203 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-07-29 10:32:36 -04:00
John W. Linville
ec87652694 NFC: 3.17 pull request
This is the NFC pull request for 3.17.
 This is a rather quiet one, we have:
 
 - A new driver from ST Microelectronics for their NCI ST21NFCB,
   including device tree  support.
 
 - p2p support for the ST21NFCA driver
 
 - A few fixes an enhancements for the NFC digital layer
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJT0CJhAAoJEIqAPN1PVmxKtnAP/jOu1cP/EsegdByqmEumjSGQ
 mBv7wC+/r91wV6Hny8Ij8uxyN+/QfxF75nwPjTrPSApo7mHOlF8FeY0/EktxJazo
 c/NIAntNwREIbpkc68CIQNrYr9YFkKEM+7lCV1ImALUb/CPfiH7Fx7vGdhkCKqkc
 B8etkLyeJtl9EqSZM7GI2YrEbPzPEWLk2ydVQ9BccxvN0I8rc29DnD+DNR5sL6Wa
 7bEmZF5J/GnVErvnS8uHDGgerpzlFAj7MONrsxADZCHPie0F87T3wXAGwKlaBY6R
 nOde0YCS749JxN1c2AdmgIadwo5XqjeSbjQ1g8L8HJTWY8Tl9Vw8GoQFN9qhHkSM
 gkOya77n0R8SZWoJzo3BFBMpsncVG2cJIsnwpRBIWMUzg76mGe7Fzl21KCXD8xyy
 xvy8Ar3QKPDTu6uvLNEPk9+cfl/JxQAoLNL30eeZGBDBSg/g2ptNiBYNLXvVoEtU
 B/xyTdmA1SXQnOKGKNxjFCo+WZDXSoTrWeml/uvBhprVAj6YS3K/imc9EiL4zcD7
 72iNGZbIZRfw91x7VbbQ5Nb8PEyYsLef8ztUFM9HgliNgIDMHaQHoXYwmD4264uO
 RGTETdHYQb0ltX3HsBiIgTNF+Y1sJUVP3TyEhjSk58TpCy/S6v9YjcT9RYmNpuPk
 YftdxfapAKV0EJhcQc1r
 =wR0y
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz <sameo@linux.intel.com> says:

"NFC: 3.17 pull request

This is the NFC pull request for 3.17.
This is a rather quiet one, we have:

- A new driver from ST Microelectronics for their NCI ST21NFCB,
  including device tree  support.

- p2p support for the ST21NFCA driver

- A few fixes an enhancements for the NFC digital layer"

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-07-29 10:31:20 -04:00
Eric Dumazet
04ca6973f7 ip: make IP identifiers less predictable
In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
Jedidiah describe ways exploiting linux IP identifier generation to
infer whether two machines are exchanging packets.

With commit 73f156a6e8 ("inetpeer: get rid of ip_id_count"), we
changed IP id generation, but this does not really prevent this
side-channel technique.

This patch adds a random amount of perturbation so that IP identifiers
for a given destination [1] are no longer monotonically increasing after
an idle period.

Note that prandom_u32_max(1) returns 0, so if generator is used at most
once per jiffy, this patch inserts no hole in the ID suite and do not
increase collision probability.

This is jiffies based, so in the worst case (HZ=1000), the id can
rollover after ~65 seconds of idle time, which should be fine.

We also change the hash used in __ip_select_ident() to not only hash
on daddr, but also saddr and protocol, so that ICMP probes can not be
used to infer information for other protocols.

For IPv6, adds saddr into the hash as well, but not nexthdr.

If I ping the patched target, we can see ID are now hard to predict.

21:57:11.008086 IP (...)
    A > target: ICMP echo request, seq 1, length 64
21:57:11.010752 IP (... id 2081 ...)
    target > A: ICMP echo reply, seq 1, length 64

21:57:12.013133 IP (...)
    A > target: ICMP echo request, seq 2, length 64
21:57:12.015737 IP (... id 3039 ...)
    target > A: ICMP echo reply, seq 2, length 64

21:57:13.016580 IP (...)
    A > target: ICMP echo request, seq 3, length 64
21:57:13.019251 IP (... id 3437 ...)
    target > A: ICMP echo reply, seq 3, length 64

[1] TCP sessions uses a per flow ID generator not changed by this patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu>
Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-28 18:46:34 -07:00
David S. Miller
3fd0202a0d Merge tag 'master-2014-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-07-25

Please pull this batch of updates intended for the 3.17 stream!

For the mac80211 bits, Johannes says:

"We have a lot of TDLS patches, among them a fix that should make hwsim
tests happy again. The rest, this time, is mostly small fixes."

For the Bluetooth bits, Gustavo says:

"Some more patches for 3.17. The most important change here is the move of
the 6lowpan code to net/6lowpan. It has been agreed with Davem that this
change will go through the bluetooth tree. The rest are mostly clean up and
fixes."

and,

"Here follows some more patches for 3.17. These are mostly fixes to what
we've sent to you before for next merge window."

For the iwlwifi bits, Emmanuel says:

"I have the usual amount of BT Coex stuff. Arik continues to work
on TDLS and Ariej contributes a few things for HS2.0. I added a few
more things to the firmware debugging infrastructure. Eran fixes a
small bug - pretty normal content."

And for the Atheros bits, Kalle says:

"For ath6kl me and Jessica added support for ar6004 hw3.0, our latest
version of ar6004.

For ath10k Janusz added a printout so that it's easier to check what
ath10k kconfig options are enabled. He also added a debugfs file to
configure maximum amsdu and ampdu values. Also we had few fixes as
usual."

On top of that is the usual large batch of various driver updates --
brcmfmac, mwifiex, the TI drivers, and wil6210 all get some action.
Rafał has also been very busy with b43 and related updates.

Also, I pulled the wireless tree into this in order to resolve a
merge conflict...

P.S.  The change to fs/compat_ioctl.c reflects a name change in a
Bluetooth header file...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-28 17:36:25 -07:00
Mark Rustad
d87de1f3e9 netlink: Fix shadow warning on jiffies
Change formal parameter name to not shadow the global jiffies.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-28 17:20:43 -07:00
Florian Westphal
ab1c724f63 inet: frag: use seqlock for hash rebuild
rehash is rare operation, don't force readers to take
the read-side rwlock.

Instead, we only have to detect the (rare) case where
the secret was altered while we are trying to insert
a new inetfrag queue into the table.

If it was changed, drop the bucket lock and recompute
the hash to get the 'new' chain bucket that we have to
insert into.

Joint work with Nikolay Aleksandrov.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27 22:34:36 -07:00
Florian Westphal
e3a57d18b0 inet: frag: remove periodic secret rebuild timer
merge functionality into the eviction workqueue.

Instead of rebuilding every n seconds, take advantage of the upper
hash chain length limit.

If we hit it, mark table for rebuild and schedule workqueue.
To prevent frequent rebuilds when we're completely overloaded,
don't rebuild more than once every 5 seconds.

ipfrag_secret_interval sysctl is now obsolete and has been marked as
deprecated, it still can be changed so scripts won't be broken but it
won't have any effect. A comment is left above each unused secret_timer
variable to avoid confusion.

Joint work with Nikolay Aleksandrov.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27 22:34:36 -07:00
Florian Westphal
3fd588eb90 inet: frag: remove lru list
no longer used.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27 22:34:36 -07:00
Florian Westphal
434d305405 inet: frag: don't account number of fragment queues
The 'nqueues' counter is protected by the lru list lock,
once thats removed this needs to be converted to atomic
counter.  Given this isn't used for anything except for
reporting it to userspace via /proc, just remove it.

We still report the memory currently used by fragment
reassembly queues.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27 22:34:36 -07:00
Florian Westphal
b13d3cbfb8 inet: frag: move eviction of queues to work queue
When the high_thresh limit is reached we try to toss the 'oldest'
incomplete fragment queues until memory limits are below the low_thresh
value.  This happens in softirq/packet processing context.

This has two drawbacks:

1) processors might evict a queue that was about to be completed
by another cpu, because they will compete wrt. resource usage and
resource reclaim.

2) LRU list maintenance is expensive.

But when constantly overloaded, even the 'least recently used' element is
recent, so removing 'lru' queue first is not 'fairer' than removing any
other fragment queue.

This moves eviction out of the fast path:

When the low threshold is reached, a work queue is scheduled
which then iterates over the table and removes the queues that exceed
the memory limits of the namespace. It sets a new flag called
INET_FRAG_EVICTED on the evicted queues so the proper counters will get
incremented when the queue is forcefully expired.

When the high threshold is reached, no more fragment queues are
created until we're below the limit again.

The LRU list is now unused and will be removed in a followup patch.

Joint work with Nikolay Aleksandrov.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27 22:34:35 -07:00
Florian Westphal
86e93e470c inet: frag: move evictor calls into frag_find function
First step to move eviction handling into a work queue.

We lose two spots that accounted evicted fragments in MIB counters.

Accounting will be restored since the upcoming work-queue evictor
invokes the frag queue timer callbacks instead.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27 22:34:35 -07:00
Florian Westphal
36c7778218 inet: frag: constify match, hashfn and constructor arguments
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27 22:34:35 -07:00
Georg Lukas
628531c9e9 Bluetooth: Provide defaults for LE advertising interval
Store the default values for minimum and maximum advertising interval
with all the other controller defaults. These vaules are sent to the
adapter whenever advertising is (re)enabled.

Signed-off-by: Georg Lukas <georg@op-co.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-26 19:05:09 +02:00
Paul Bolle
d4da843e6f netfilter: kill remnants of ulog targets
The ulog targets were recently killed. A few references to the Kconfig
macros CONFIG_IP_NF_TARGET_ULOG and CONFIG_BRIDGE_EBT_ULOG were left
untouched. Kill these too.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-07-25 14:55:44 +02:00
Marcel Holtmann
4b9e7e7516 Bluetooth: Fix issue with ADV_IND reports and auto-connection handling
When adding remote devices to the kernel using the Add Device management
command, these devices are explicitly allowed to connect. This kind of
incoming connections are possible even when the controller itself is
not connectable.

For BR/EDR this distinction is pretty simple since there is only one
type of incoming connections. With LE this is not that simple anymore
since there are ADV_IND and ADV_DIRECT_IND advertising events.

The ADV_DIRECT_IND advertising events are send for incoming (slave
initiated) connections only. And this is the only thing the kernel
should allow when adding devices using action 0x01. This meaning
of incoming connections is coming from BR/EDR and needs to be
mapped to LE the same way.

Supporting the auto-connection of devices using ADV_IND advertising
events is an important feature as well. However it does not map to
incoming connections. So introduce a new action 0x02 that allows
the kernel to connect to devices using ADV_DIRECT_IND and in addition
ADV_IND advertising reports.

This difference is represented by the new HCI_AUTO_CONN_DIRECT value
for only connecting to ADV_DIRECT_IND. For connection to ADV_IND and
ADV_DIRECT_IND the old value HCI_AUTO_CONN_ALWAYS is used.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-24 00:37:23 +03:00
Sorin Dumitru
274f482d33 sock: remove skb argument from sk_rcvqueues_full
It hasn't been used since commit 0fd7bac(net: relax rcvbuf limits).

Signed-off-by: Sorin Dumitru <sorin@returnze.ro>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-23 13:23:06 -07:00
Marcel Holtmann
f4fe73ed56 Bluetooth: Get MWS transport configuration of the controller
If the Bluetooth controller supports Get MWS Transport Layer
Configuration command, then issue it during initialization.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-23 20:34:08 +03:00
Marcel Holtmann
109e319193 Bluetooth: Read list of local codecs supported by the controller
If the Bluetooth controller supports Read Local Supported Codecs
command, then issue it during initialization so that the list of
codecs is known.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-23 20:34:06 +03:00
Mark A. Greer
bf30a67c94 NFC: digital: Add 'tg_listen_md' and 'tg_get_rf_tech' driver hooks
The digital layer of the NFC subsystem currently
supports a 'tg_listen_mdaa' driver hook that supports
devices that can do mode detection and automatic
anticollision.  However, there are some devices that
can do mode detection but not automatic anitcollision
so add the 'tg_listen_md' hook to support those devices.

In order for the digital layer to get the RF technology
detected by the device from the driver, add the
'tg_get_rf_tech' hook.  It is only valid to call this
hook immediately after a successful call to 'tg_listen_md'.

CC: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-07-23 01:17:31 +02:00
Mark A. Greer
f63bac94bf NFC: digital: Remove extra blank line
Remove extra blank line that was inadvertently
added by a recent commit.

CC: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-07-23 01:17:31 +02:00
Christophe Ricard
95f7687b20 NFC: hci: Add stop_poll HCI operand.
stop_poll allows to stop CLF reader polling. Some other operations might be
necessary for some CLF to stop polling. For example in card mode.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-07-23 01:04:31 +02:00
David Laight
526cbef778 net: sctp: Rename SCTP_XMIT_NAGLE_DELAY to SCTP_XMIT_DELAY
MSG_MORE and 'corking' a socket would require that the transmit of
a data chunk be delayed.
Rename the return value to be less specific.

Signed-off-by: David Laight <david.laight@aculab.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22 13:32:11 -07:00
John W. Linville
bd6fb31fd3 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-07-22 13:50:23 -04:00
John W. Linville
a006827a15 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-07-22 13:49:34 -04:00
David S. Miller
8fd90bb889 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/infiniband/hw/cxgb4/device.c

The cxgb4 conflict was simply overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22 00:44:59 -07:00
Michal Kazior
08cf42e843 mac80211: add support for Rx reordering offloading
Some drivers may be performing most of Tx/Rx
aggregation on their own (e.g. in firmware)
including AddBa/DelBa negotiations but may
otherwise require Rx reordering assistance.

The patch exports 2 new functions for establishing
Rx aggregation sessions in assumption device
driver has taken care of the necessary
negotiations.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
[fix endian bug]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-07-21 17:42:07 +02:00
David S. Miller
a8138f42d4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains updates for your net-next tree,
they are:

1) Use kvfree() helper function from x_tables, from Eric Dumazet.

2) Remove extra timer from the conntrack ecache extension, use a
   workqueue instead to redeliver lost events to userspace instead,
   from Florian Westphal.

3) Removal of the ulog targets for ebtables and iptables. The nflog
   infrastructure superseded this almost 9 years ago, time to get rid
   of this code.

4) Replace the list of loggers by an array now that we can only have
   two possible non-overlapping logger flavours, ie. kernel ring buffer
   and netlink logging.

5) Move Eric Dumazet's log buffer code to nf_log to reuse it from
   all of the supported per-family loggers.

6) Consolidate nf_log_packet() as an unified interface for packet logging.
   After this patch, if the struct nf_loginfo is available, it explicitly
   selects the logger that is used.

7) Move ip and ip6 logging code from xt_LOG to the corresponding
   per-family loggers. Thus, x_tables and nf_tables share the same code
   for packet logging.

8) Add generic ARP packet logger, which is used by nf_tables. The
   format aims to be consistent with the output of xt_LOG.

9) Add generic bridge packet logger. Again, this is used by nf_tables
   and it routes the packets to the real family loggers. As a result,
   we get consistent logging format for the bridge family. The ebt_log
   logging code has been intentionally left in place not to break
   backward compatibility since the logging output differs from xt_LOG.

10) Update nft_log to explicitly request the required family logger when
    needed.

11) Finish nft_log so it supports arp, ip, ip6, bridge and inet families.
    Allowing selection between netlink and kernel buffer ring logging.

12) Several fixes coming after the netfilter core logging changes spotted
    by robots.

13) Use IS_ENABLED() macros whenever possible in the netfilter tree,
    from Duan Jiong.

14) Removal of a couple of unnecessary branch before kfree, from Fabian
    Frederick.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20 21:01:43 -07:00
Mark A. Greer
55537c7e7d NFC: digital: Add digital framing calls when in target mode
Add new "NFC_DIGITAL_FRAMING_*" calls to the digital
layer so the driver can make the necessary adjustments
when performing anticollision while in target mode.

The driver must ensure that the effect of these calls
happens after the following response has been sent but
before reception of the next request begins.

Acked-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-07-21 00:45:21 +02:00
Anish Bhatt
c2659479f7 Update setapp/getapp prototypes in dcbnl_rtnl_ops to return int instead of u8
v2: fixed issue with checking return of dcbnl_rtnl_ops->getapp()

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-17 16:02:29 -07:00
Johan Hedberg
977f8fce02 Bluetooth: Introduce a flag to track who really initiates authentication
Even though our side requests authentication, the original action that
caused it may be remotely triggered, such as an incoming L2CAP or RFCOMM
connect request. To track this information introduce a new hci_conn flag
called HCI_CONN_AUTH_INITIATOR.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-17 14:39:40 +02:00
Johan Hedberg
e7cafc4525 Bluetooth: Pass initiator/acceptor information to hci_conn_security()
We're interested in whether an authentication request is because of a
remote or local action. So far hci_conn_security() has been used both
for incoming and outgoing actions (e.g. RFCOMM or L2CAP connect
requests) so without some modifications it cannot know which peer is
responsible for requesting authentication.

This patch adds a new "bool initiator" parameter to hci_conn_security()
to indicate which side is responsible for the request and updates the
current users to pass this information correspondingly.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-17 14:39:39 +02:00
Jeff Layton
1373a7739e net: clean up some sparse endianness warnings in ipv6.h
sparse is throwing warnings when building sunrpc modules due to some
endianness shenanigans in ipv6.h. Specifically:

  CHECK   net/sunrpc/addr.c
include/net/ipv6.h:573:17: warning: restricted __be64 degrades to integer
include/net/ipv6.h:577:34: warning: restricted __be32 degrades to integer
include/net/ipv6.h:573:17: warning: restricted __be64 degrades to integer
include/net/ipv6.h:577:34: warning: restricted __be32 degrades to integer

Sprinkle some endianness fixups to silence them. These should all get
fixed up at compile time, so I don't think this will add any extra work
to be done at runtime.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 23:31:38 -07:00
David Held
2dc41cff75 udp: Use hash2 for long hash1 chains in __udp*_lib_mcast_deliver.
Many multicast sources can have the same port which can result in a very
large list when hashing by port only. Hash by address and port instead
if this is the case. This makes multicast more similar to unicast.

On a 24-core machine receiving from 500 multicast sockets on the same
port, before this patch 80% of system CPU was used up by spin locking
and only ~25% of packets were successfully delivered.

With this patch, all packets are delivered and kernel overhead is ~8%
system CPU on spinlocks.

Signed-off-by: David Held <drheld@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 23:29:52 -07:00
David S. Miller
38a4dfcf80 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/nf_tables fixes

The following patchset contains nf_tables fixes, they are:

1) Fix wrong transaction handling when the table flags are not
   modified.

2) Fix missing rcu read_lock section in the netlink dump path, which
   is not protected by the nfnl_lock.

3) Set NLM_F_DUMP_INTR in the netlink dump path to indicate
   interferences with updates.

4) Fix 64 bits chain counters when they are retrieved from a 32 bits
   arch, from Eric Dumazet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 15:27:16 -07:00
Geir Ola Vaagland
2347c80ff1 net: sctp: implement rfc6458, 5.3.6. SCTP_NXTINFO cmsg support
This patch implements section 5.3.6. of RFC6458, that is, support
for 'SCTP Next Receive Information Structure' (SCTP_NXTINFO) which
is placed into ancillary data cmsghdr structure for each recvmsg()
call, if this information is already available when delivering the
current message.

This option can be enabled/disabled via setsockopt(2) on SOL_SCTP
level by setting an int value with 1/0 for SCTP_RECVNXTINFO in
user space applications as per RFC6458, section 8.1.30.

The sctp_nxtinfo structure is defined as per RFC as below ...

  struct sctp_nxtinfo {
    uint16_t nxt_sid;
    uint16_t nxt_flags;
    uint32_t nxt_ppid;
    uint32_t nxt_length;
    sctp_assoc_t nxt_assoc_id;
  };

... and provided under cmsg_level IPPROTO_SCTP, cmsg_type
SCTP_NXTINFO, while cmsg_data[] contains struct sctp_nxtinfo.

Joint work with Daniel Borkmann.

Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 14:40:03 -07:00
Geir Ola Vaagland
0d3a421d28 net: sctp: implement rfc6458, 5.3.5. SCTP_RCVINFO cmsg support
This patch implements section 5.3.5. of RFC6458, that is, support
for 'SCTP Receive Information Structure' (SCTP_RCVINFO) which is
placed into ancillary data cmsghdr structure for each recvmsg()
call.

This option can be enabled/disabled via setsockopt(2) on SOL_SCTP
level by setting an int value with 1/0 for SCTP_RECVRCVINFO in user
space applications as per RFC6458, section 8.1.29.

The sctp_rcvinfo structure is defined as per RFC as below ...

  struct sctp_rcvinfo {
    uint16_t rcv_sid;
    uint16_t rcv_ssn;
    uint16_t rcv_flags;
    <-- 2 bytes hole  -->
    uint32_t rcv_ppid;
    uint32_t rcv_tsn;
    uint32_t rcv_cumtsn;
    uint32_t rcv_context;
    sctp_assoc_t rcv_assoc_id;
  };

... and provided under cmsg_level IPPROTO_SCTP, cmsg_type
SCTP_RCVINFO, while cmsg_data[] contains struct sctp_rcvinfo.
An sctp_rcvinfo item always corresponds to the data in msg_iov.

Joint work with Daniel Borkmann.

Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 14:40:03 -07:00
Geir Ola Vaagland
63b949382c net: sctp: implement rfc6458, 5.3.4. SCTP_SNDINFO cmsg support
This patch implements section 5.3.4. of RFC6458, that is, support
for 'SCTP Send Information Structure' (SCTP_SNDINFO) which can be
placed into ancillary data cmsghdr structure for sendmsg() calls.

The sctp_sndinfo structure is defined as per RFC as below ...

  struct sctp_sndinfo {
    uint16_t snd_sid;
    uint16_t snd_flags;
    uint32_t snd_ppid;
    uint32_t snd_context;
    sctp_assoc_t snd_assoc_id;
  };

... and supplied under cmsg_level IPPROTO_SCTP, cmsg_type
SCTP_SNDINFO, while cmsg_data[] contains struct sctp_sndinfo.
An sctp_sndinfo item always corresponds to the data in msg_iov.

Joint work with Daniel Borkmann.

Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 14:40:03 -07:00
David S. Miller
1a98c69af1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 14:09:34 -07:00
Johan Hedberg
f8218dc660 Bluetooth: Track number of LE slave connections
Most (probably all) controllers can only deal with a single slave LE
connection at a time. This patch adds a counter for such connections so
that the number can be quickly looked up without iterating the
connections list.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-16 11:58:03 +02:00
Johan Hedberg
a5c4e309b9 Bluetooth: Add a role parameter to hci_conn_add()
We need to be able to track slave vs master LE connections in
hci_conn_hash, and to be able to do that we need to know the role of the
connection by the time hci_conn_add_has() is called. This means in
practice the hci_conn_add() call that creates the hci_conn_object.

This patch adds a new role parameter to hci_conn_add() function to give
the object its initial role value, and updates the callers to pass the
appropriate role to it. Since the function now takes care of
initializing both conn->role and conn->out values we can remove some
other unnecessary assignments.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-16 11:58:03 +02:00
Johan Hedberg
e804d25d4a Bluetooth: Use explicit role instead of a bool in function parameters
To make the code more understandable it makes sense to use the new HCI
defines for connection role instead of a "bool master" parameter. This
makes it immediately clear when looking at the function calls what the
last parameter is describing.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-16 11:04:23 +02:00
Johan Hedberg
40bef302f6 Bluetooth: Convert HCI_CONN_MASTER flag to a conn->role variable
Having a dedicated u8 role variable in the hci_conn struct greatly
simplifies tracking of the role, since this is the native way that it's
represented on the HCI level.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-16 11:04:23 +02:00
Johan Hedberg
ba165a90b5 Bluetooth: Add proper defines for HCI connection role
All HCI commands and events, including LE ones, use 0x00 for master role
and 0x01 for slave role. It makes therefore sense to add generic defines
for these instead of the current LE_CONN_ROLE_MASTER. Having clean
defines will also make it possible to provide simpler internal APIs.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-16 11:04:23 +02:00
Christoph Paasch
5ee2c941b5 tcp: Remove unnecessary arg from tcp_enter_cwr and tcp_init_cwnd_reduction
Since Yuchung's 9b44190dc1 (tcp: refactor F-RTO), tcp_enter_cwr is always
called with set_ssthresh = 1. Thus, we can remove this argument from
tcp_enter_cwr. Further, as we remove this one, tcp_init_cwnd_reduction
is then always called with set_ssthresh = true, and so we can get rid of
this argument as well.

Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15 16:19:36 -07:00
Tom Gundersen
5517750f05 net: rtnetlink - make create_link take name_assign_type
This passes down NET_NAME_USER (or NET_NAME_ENUM) to alloc_netdev(),
for any device created over rtnetlink.

v9: restore reverse-christmas-tree order of local variables

Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15 16:13:07 -07:00
Tom Herbert
8024e02879 udp: Add udp_sock_create for UDP tunnels to open listener socket
Added udp_tunnel.c which can contain some common functions for UDP
tunnels. The first function in this is udp_sock_create which is used
to open the listener port for a UDP tunnel.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 16:12:15 -07:00
Mathias Krause
9ecf07a1d8 neigh: sysctl - simplify address calculation of gc_* variables
The code in neigh_sysctl_register() relies on a specific layout of
struct neigh_table, namely that the 'gc_*' variables are directly
following the 'parms' member in a specific order. The code, though,
expresses this in the most ugly way.

Get rid of the ugly casts and use the 'tbl' pointer to get a handle to
the table. This way we can refer to the 'gc_*' variables directly.

Similarly seen in the grsecurity patch, written by Brad Spengler.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 14:32:51 -07:00
Eric Dumazet
ce355e209f netfilter: nf_tables: 64bit stats need some extra synchronization
Use generic u64_stats_sync infrastructure to get proper 64bit stats,
even on 32bit arches, at no extra cost for 64bit arches.

Without this fix, 32bit arches can have some wrong counters at the time
the carry is propagated into upper word.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-07-14 12:00:17 +02:00
Pablo Neira Ayuso
38e029f14a netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale
An updater may interfer with the dumping of any of the object lists.
Fix this by using a per-net generation counter and use the
nl_dump_check_consistent() interface so the NLM_F_DUMP_INTR flag is set
to notify userspace that it has to restart the dump since an updater
has interfered.

This patch also replaces the existing consistency checking code in the
rule dumping path since it is broken. Basically, the value that the
dump callback returns is not propagated to userspace via
netlink_dump_start().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-07-14 12:00:16 +02:00
David S. Miller
66568b3925 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please pull this batch of updates intended for the 3.17 stream...

This is primarily a Bluetooth pull.  Gustavo says:

"A lot of patches to 3.17. The bulk of changes here are for LE support.
The 6loWPAN over Bluetooth now has it own module, we also have support for
background auto-connection and passive scanning, Bluetooth device address
provisioning, support for reading Bluetooth clock values and LE connection
parameters plus many many fixes."

The balance is just a pull of the wireless.git tree, to avoid some
pending merge problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-13 22:42:17 -07:00
Marcel Holtmann
5a54e7c85b Bluetooth: Convert L2CAP ident spinlock into a mutex
The spinlock protecting the L2CAP ident number can be converted into
a mutex since the whole processing is run in a workqueue. So instead
of using a spinlock, just use a mutex here.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-13 22:32:45 +03:00
Marcel Holtmann
0da71f1bf9 Bluetooth: Enable LE encryption events only when supported
The support for LE encryption is optional. When encryption is not
supported then also do not enable the encryption related events.

This moves the event mask setting to the third initialization
stage to ensure that the LE features are available.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-13 08:49:34 +03:00
Jiri Pirko
bc91b0f07a ipv6: addrconf: implement address generation modes
This patch introduces a possibility for userspace to set various (so far
two) modes of generating addresses. This is useful for example for
NetworkManager because it can set the mode to NONE and take care of link
local addresses itself. That allow it to have the interface up,
monitoring carrier but still don't have any addresses on it.

One more use-case by Dan Williams:
<quote>
WWAN devices often have their LL address provided by the firmware of the
device, which sometimes refuses to respond to incorrect LL addresses
when doing DHCPv6 or IPv6 ND.  The kernel cannot generate the correct LL
address for two reasons:

1) WWAN pseudo-ethernet interfaces often construct a fake MAC address,
or read a meaningless MAC address from the firmware.  Thus the EUI64 and
the IPv6LL address the kernel assigns will be wrong.  The real LL
address is often retrieved from the firmware with AT or proprietary
commands.

2) WWAN PPP interfaces receive their LL address from IPV6CP, not from
kernel assignments.  Only after IPV6CP has completed do we know the LL
address of the PPP interface and its peer.  But the kernel has already
assigned an incorrect LL address to the interface.

So being able to suppress the kernel LL address generation and assign
the one retrieved from the firmware is less complicated and more robust.
</quote>

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-11 15:05:45 -07:00
John W. Linville
95d01a669b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-07-11 15:52:57 -04:00
Marcel Holtmann
068d69e5bb Bluetooth: Move SCO timeout constants into net/bluetooth/sco.c
There is no external user of the SCO timeout constants and thus
move them into net/bluetooth/sco.c where they are actuallu used.

In addition just remove SCO_CONN_IDLE_TIMEOUT since it is unused.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-11 13:58:58 +03:00
Marcel Holtmann
6190ae7a18 Bluetooth: Remove unused SCO_DEFAULT_FLUSH_TO constant
The SCO_DEFAULT_FLUSH_TO constant has been defined, but it is not
used anywhere and so just remove it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-11 13:58:55 +03:00
Marcel Holtmann
fc8f525a6f Bluetooth: Move struct sco_conn into net/bluetooth/sco.c
There exists no external user of struct sco_conn and thus move
it into the one place that is actually using it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-11 13:58:53 +03:00
Marcel Holtmann
2a0dccb3df Bluetooth: Move struct sco_pinfo into net/bluetooth/sco.c
There exists no external user of struct sco_pinfo and sco_pi and
thus move it into the one place that is actually using it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-11 13:58:50 +03:00
Marcel Holtmann
a6801ca985 Bluetooth: Update the list of L2CAP fixed channels
The list of L2CAP fixed channels increased with newer versions of the
specification. This just updates the constants for it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-11 13:57:50 +03:00
Marcel Holtmann
899de76566 Bluetooth: Move HCI request internals to net/bluetooth/hci_core.c
The internals of the HCI request framework should not be leaking to
its users. Move them all into net/bluetooth/hci_core.c and provide
a simple hci_req_pending helper function for the one user outside
the framework.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-11 13:56:35 +03:00
Marcel Holtmann
863def58fe Bluetooth: Move struct hci_pinfo into net/bluetooth/hci_sock.c
There exists no external user of struct hci_pinfo and hci_pi and thus
move it into the one place that is actually using it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-11 13:55:14 +03:00
Marcel Holtmann
3ad254f7f6 Bluetooth: Move struct hci_sec_filter next to its user
There is only single location using struct hci_sec_filter and with
that there is no point in putting this declaration into a global
header file. So move it right next to its user and make the code
a lot more simpler.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-11 13:54:17 +03:00
Marcel Holtmann
f49daa8190 Bluetooth: Move HCI socket definitions into its own header file
All the HCI sockets and ioctl based definitions have been in a global
header file that also includes all the HCI protocol structures. To
make this a bit cleaner, move them into its own file.

This also adjusts fs/compat_ioctl.c to only include this new file
and not all the protocol structures that are not needed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-11 13:53:04 +03:00
Johan Hedberg
13a779e422 Bluetooth: Remove unneeded mgmt_write_scan_failed function
The Set Connectable/Discoverable mgmt handlers use a hci_request with a
proper callback to handle the HCI command sending. It makes therefore
little sense to have this extra function to be called from hci_event.c
for command failures.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-10 11:14:20 +02:00
Johan Hedberg
bc6d2d0418 Bluetooth: Remove unneeded mgmt_discoverable function
Since the HCISETSCAN ioctl is the only non-mgmt user we care about for
setting the right discoverable state we can simply do the necessary
updates in the ioctl handler function instead. This then allows the
removal of the mgmt_discoverable function and should simplify that state
handling considerably.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-10 11:14:20 +02:00
Johan Hedberg
031547d868 Bluetooth: Remove unneeded mgmt_connectable function
The mgmt_connectable function has been used to ensure that the right
actions to HCI_CONNECTABLE are taken when the HCI_Write_Scan_Enable
command is triggered by something else than mgmt. The only other user
that we really care about is the HCISETSCAN ioctl code, so we can
actually more simply perform the needed changes there instead.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-10 11:14:20 +02:00
Johan Hedberg
91a668b056 Bluetooth: Fix setting HCI_CONNECTABLE from ioctl code
When the white list is in use the code would not update the
HCI_CONNECTABLE flag if it gets changed through the ioctl code (e.g.
hciconfig hci0 pscan). Since the flag is important for properly
accepting incoming connections add code to fix it up if necessary and
emit a New Settings mgmt event.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-09 12:30:18 +02:00
Johan Hedberg
6659358efe Bluetooth: Introduce a whitelist for BR/EDR devices
This patch extends the Add/Remove device commands by letting user space
pass BR/EDR addresses to them. The resulting entries get stored in a new
hdev->whitelist list. The idea is that we can now selectively accept
connections from devices in the list even though HCI_CONNECTABLE is not
set (the actual implementation of this is coming in a subsequent patch).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-09 12:25:27 +02:00
Johan Hedberg
dcc36c16c2 Bluetooth: Unify helpers for bdaddr_list manipulations
We already have several lists with struct bdaddr_list entries, and there
will be more in the future. Since the operations for adding, removing,
looking up and clearing entries in these lists are exactly the same it
doesn't make sense to define new functions for every single list. This
patch unifies the functions by passing the list_head to them instead of
a hci_dev pointer.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-09 12:25:26 +02:00
Marcel Holtmann
cd7ca0ec5e Bluetooth: Fix enabling Authenticated Payload Timeout Expired event
The Authenticated Payload Timeout Expired event is valid for
controllers with BR/EDR Secure Connections support, but also for
LE only controllers supporting LE Ping feature. When either of them
is available enable this event. Previous it was not enabled when
the controller was only supporting LE operation.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-09 11:19:15 +03:00
Florian Fainelli
a37934fc0d net: provide stubs for ip6_set_txhash and ip6_make_flowlabel
Commit cb1ce2ef38 ("ipv6: Implement automatic flow label generation
on transmit") introduced ip6_make_flowlabel, while commit b73c3d0e4f
("net: Save TX flow hash in sock and set in skbuf on xmit") introduced
ip6_set_txhash.

ip6_set_tx_hash() uses sk_v6_daddr which references
__sk_common.skc_v6_daddr from struct sock_common, which is gated with
IS_ENABLED(CONFIG_IPV6).

ip6_make_flowlabel() uses the ipv6 member from struct net which is
also gated with IS_ENABLED(CONFIG_IPV6).

When CONFIG_IPV6 is disabled, we will hit a build failure that looks
like this when the compiler attempts inlining these functions:

  CC [M]  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.o
In file included from include/net/inet_sock.h:27:0,
                 from include/net/ip.h:30,
                 from drivers/net/ethernet/broadcom/cnic.c:37:
include/net/ipv6.h: In function 'ip6_set_txhash':
include/net/sock.h:327:33: error: 'struct sock_common' has no member named 'skc_v6_daddr'
 #define sk_v6_daddr  __sk_common.skc_v6_daddr
                                 ^
include/net/ipv6.h:696:49: note: in expansion of macro 'sk_v6_daddr'
  keys.dst = (__force __be32)ipv6_addr_hash(&sk->sk_v6_daddr);
                                                 ^
In file included from include/net/inetpeer.h:15:0,
                 from include/net/route.h:28,
                 from include/net/ip.h:31,
                 from drivers/net/ethernet/broadcom/cnic.c:37:
include/net/ipv6.h: In function 'ip6_make_flowlabel':
include/net/ipv6.h:706:37: error: 'struct net' has no member named 'ipv6'
  if (!flowlabel && (autolabel || net->ipv6.sysctl.auto_flowlabels)) {
                                     ^

Fixes: cb1ce2ef38 ("ipv6: Implement automatic flow label generation on transmit")
Fixes: b73c3d0e4f ("net: Save TX flow hash in sock and set in skbuf on xmit")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 20:43:35 -07:00
David Laight
d1a3fe26e9 net: sctp: Use pointers (not array indexes) to access sctp_cmd_seq_t.cmds[].
Using pointers into sctp_cmd_seq_t.cmds[] lets the compiler generate much
better code.
Use the last entry first to optimise the overflow check.

Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 14:39:00 -07:00
David Laight
b9420e1c87 net: sctp: Optimise the way 'sctp_arg_t' values are initialised.
Even if memset() is inlined (as on x86) using it to zero the union
generates a memory word write of zero, followed by a write of the
smaller field, and then a read of the word.
As well as being a lot of instructions the sequence is unlikely to
be optimised by the store-load forward hardware so will be slow.

Instead allocate a field of the union that is the same size as the
entire union and write a zero value to it. The compiler will then
generate the required value in a register.

Zeroing the union shouldn't be necessary, but this patch series isn't
intended to have a behavioural change.

Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 14:39:00 -07:00
David Laight
be1f4f48ce net: sctp: Inline the functions from command.c
sctp_init_cmd_seq() and sctp_next_cmd() are only called from one place.
The call sequence for sctp_add_cmd_sf() is likely to be longer than
the inlined code.
With sctp_add_cmd_sf() inlined the compiler can optimise repeated calls.

Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 14:38:48 -07:00
David S. Miller
72948cdcbb Merge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-07-03

Please pull this first batch of wireless updates intended for the
3.17 stream...

For the mac80211 bits, Johannes says:

"The biggest thing here is probably Arik's TDLS rework, beyond that we
have smaller improvements and features like David's scanning IE thing,
Luca's queue work, some CSA work, etc. Also your PID rate control
removal, of course."

For the iwlwifi bits, Emmanuel says:

"I have here a whole bunch of various things. Andy contributes
better debug prints for dvm specific flows and a module parameter to
completely disable power save for dvm. Andrei is sharing the premises
of his work on CSA - more to come. Eran and Liad keep on working
on the new devices. I have the regular amount of BT Coex stuff and
I continue to work on the firmware error report system adding more
debug capabilities. More to come on that subject too."

On top of that, there are some cleanups to the new rsi driver, some
continuing improvements to the rtl818x drivers, and the usual bundles
of updates to ath9k, b43, mwifiex, wil6210, and a few other bits here
and there.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 14:20:31 -07:00
Johan Hedberg
c93bd15033 Bluetooth: Remove unnecessary mgmt_advertising function
Since the real advertising state is now tracked with its own flag we can
simply set/unset the HCI_ADVERTISING flag in the
set_advertising_complete function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-08 14:22:06 +02:00
Johan Hedberg
66c417c1ee Bluetooth: Add flag to track the real advertising state
Having a single HCI_ADVERTISING flag is problematic since it tries to
track both the real advertising state and the corresponding mgmt
setting. To make the logic simpler and more reliable add a new flag that
only tracks the actual advertising state that has been written to the
controller.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-08 14:22:05 +02:00
Alexander Aring
640985ec2f mac802154: at86rf230: add hw flags and merge ops
This patch adds new mac802154 hw flags for transmit power, csma and
listen before transmit (lbt). These flags indicates that the transceiver
supports these features. If the flags are set and the driver doesn't
implement the necessary functions, then ieee802154_register_device
returns -ENOSYS "Function not implemented".

This patch merges also all at86rf230 operations into one operations structure
and set the right hw flags for the at86rf230 transceivers.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 21:29:24 -07:00
Tom Herbert
cb1ce2ef38 ipv6: Implement automatic flow label generation on transmit
Automatically generate flow labels for IPv6 packets on transmit.
The flow label is computed based on skb_get_hash. The flow label will
only automatically be set when it is zero otherwise (i.e. flow label
manager hasn't set one). This supports the transmit side functionality
of RFC 6438.

Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
functionality per socket.

By default, auto flowlabels are disabled to avoid possible conflicts
with flow label manager, however if this feature proves useful we
may want to enable it by default.

It should also be noted that FreeBSD has already implemented automatic
flow labels (including the sysctl and socket option). In FreeBSD,
automatic flow labels default to enabled.

Performance impact:

Running super_netperf with 200 flows for TCP_RR and UDP_RR for
IPv6. Note that in UDP case, __skb_get_hash will be called for
every packet with explains slight regression. In the TCP case
the hash is saved in the socket so there is no regression.

Automatic flow labels disabled:

  TCP_RR:
    86.53% CPU utilization
    127/195/322 90/95/99% latencies
    1.40498e+06 tps

  UDP_RR:
    90.70% CPU utilization
    118/168/243 90/95/99% latencies
    1.50309e+06 tps

Automatic flow labels enabled:

  TCP_RR:
    85.90% CPU utilization
    128/199/337 90/95/99% latencies
    1.40051e+06

  UDP_RR
    92.61% CPU utilization
    115/164/236 90/95/99% latencies
    1.4687e+06

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 21:14:21 -07:00
Tom Herbert
535fb8d006 vxlan: Call udp_flow_src_port
In vxlan and OVS vport-vxlan call common function to get source port
for a UDP tunnel. Removed vxlan_src_port since the functionality is
now in udp_flow_src_port.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 21:14:21 -07:00
Tom Herbert
b8f1a55639 udp: Add function to make source port for UDP tunnels
This patch adds udp_flow_src_port function which is intended to be
a common function that UDP tunnel implementations call to set the source
port. The source port is chosen so that a hash over the outer headers
(IP addresses and UDP ports) acts as suitable hash for the flow of the
encapsulated packet. In this manner, UDP encapsulation works with RSS
and ECMP based wrt the inner flow.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 21:14:21 -07:00
Tom Herbert
b73c3d0e4f net: Save TX flow hash in sock and set in skbuf on xmit
For a connected socket we can precompute the flow hash for setting
in skb->hash on output. This is a performance advantage over
calculating the skb->hash for every packet on the connection. The
computation is done using the common hash algorithm to be consistent
with computations done for packets of the connection in other states
where thers is no socket (e.g. time-wait, syn-recv, syn-cookies).

This patch adds sk_txhash to the sock structure. inet_set_txhash and
ip6_set_txhash functions are added which are called from points in
TCP and UDP where socket moves to established state.

skb_set_hash_from_sk is a function which sets skb->hash from the
sock txhash value. This is called in UDP and TCP transmit path when
transmitting within the context of a socket.

Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
interface (in this case skb_get_hash called on every TX packet to
create a UDP source port).

Before fix:

  95.02% CPU utilization
  154/256/505 90/95/99% latencies
  1.13042e+06 tps

  Time in functions:
    0.28% skb_flow_dissect
    0.21% __skb_get_hash

After fix:

  94.95% CPU utilization
  156/254/485 90/95/99% latencies
  1.15447e+06

  Neither __skb_get_hash nor skb_flow_dissect appear in perf

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 21:14:21 -07:00
Tom Herbert
5ed20a68cd flow_dissector: Abstract out hash computation
Move the hash computation located in __skb_get_hash to be a separate
function which takes flow_keys as input. This will allow flow hash
computation in other contexts where we only have addresses and ports.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 21:14:20 -07:00
Neal Cardwell
86c6a2c75a tcp: switch snt_synack back to measuring transmit time of first SYNACK
Always store in snt_synack the time at which the server received the
first client SYN and attempted to send the first SYNACK.

Recent commit aa27fc501 ("tcp: tcp_v[46]_conn_request: fix snt_synack
initialization") resolved an inconsistency between IPv4 and IPv6 in
the initialization of snt_synack. This commit brings back the idea
from 843f4a55e (tcp: use tcp_v4_send_synack on first SYN-ACK), which
was going for the original behavior of snt_synack from the commit
where it was added in 9ad7c049f0 ("tcp: RFC2988bis + taking RTT
sample from 3WHS for the passive open side") in v3.1.

In addition to being simpler (and probably a tiny bit faster),
unconditionally storing the time of the first SYNACK attempt has been
useful because it allows calculating a performance metric quantifying
how long it took to establish a passive TCP connection.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Jerry Chu <hkchu@google.com>
Acked-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 19:26:37 -07:00
Johan Hedberg
cdd6275e51 Bluetooth: Pass desired connection role to hci_connect_le()
If we have both LE scanning and advertising simultaneously enabled we
need a way to tell hci_connect_le() in which role to initiate a
connection. This patch adds a new parameter to the function to give it
the necessary information. For auto-connect and mgmt_pair_device we
always use master role, whereas for L2CAP users (in practice sockets) we
use slave role whenever HCI_ADVERTISING is set and master role
otherwise.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-07 15:18:08 +02:00
Johan Hedberg
d93375a82d Bluetooth: Remove auth_type parameter from hci_connect_le()
The auth_type value which gets assigned to hci_conn->auth_type is
something that's only used for BR/EDR connections and is of no value for
LE connections. It makes therefore little sense to pass it to the
hci_connect_le() function. This patch removes the parameter from the
function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-07 15:18:07 +02:00
Johan Hedberg
09ae260ba4 Bluetooth: Use lower timeout for LE auto-connections
When we establish connections as a consequence of receiving an
advertising report it makes no sense to wait the normal 20 second LE
connection timeout. This patch modifies the hci_connect_le function to
take an extra timeout value and uses a lower 2 second timeout for the
auto-connection case. This timeout is intentionally chosen to be just a
bit higher than the 1.28 second timeout that High Duty Cycle Advertising
uses.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-06 14:46:15 +03:00
Marcel Holtmann
9713c17b08 Bluetooth: Add support for changing the public device address
This adds support for changing the public device address. This feature
is required by controllers that do not provide a public address and
have HCI_QUIRK_INVALID_BDADDR set.

Even if a controller has a public device address, this is useful when
an embedded system wants to use its own value. As long as the driver
provides the set_bdaddr callback, this allows changing the device
address before powering on the controller.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-06 13:42:20 +03:00
Marcel Holtmann
d603b76b0c Bluetooth: Run controller setup after external configuration
When the external configuration triggers the switch to a configured
controller, it means the setup needs to be run. Controllers that start
out unconfigured have only run limited set of HCI commands. This is
not enough for complete operation and thus run the setup procedure
before announcing the new controller index.

This introduces HCI_CONFIG flag as companion to HCI_SETUP flag. The
HCI_SETUP flag is only used once for the initial setup procedure. And
during that procedure hdev->setup driver callback is called. With the
new HCI_CONFIG the switch from unconfigured to configured state is
triggering the same setup procedure just without hdev->setup. This
is required since bringing a controller back to unconfigured state
from configured state is possible.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-06 13:41:51 +03:00
Johan Hedberg
19de0825cd Bluetooth: Fix sending Device Removed when clearing all parameters
When calling Device Remove with BDADDR_ANY we should in a similar way
emit Device Removed events as we do when removing a single device. Since
we have to iterate the list and call device_removed() the dedicated
hci_conn_params_clear_enabled() is not really useful anymore. This patch
removes the helper function and does the event emission and list item
removal in a single loop.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-06 12:32:26 +02:00
Marcel Holtmann
e30d3f5fef Bluetooth: Store Bluetooth address from controller setup
During the setup phase of a controller, the Bluetooth address will be
read and to have that original address available for later use, store
it as setup address.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-05 15:48:28 +03:00
Marcel Holtmann
f4537c04d3 Bluetooth: Add support for New Configuration Options management event
When one or more of the missing configuration options change, then send
this even to all the other management interface clients.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-04 21:12:00 +03:00
Marcel Holtmann
dbece37a32 Bluetooth: Add support for Set External Configuration management command
The Set External Configuration management command allows for switching
between configured and unconfigured start if HCI_QURIK_EXTERNAL_CONFIG
is set by the transport driver.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-04 21:10:30 +03:00
Marcel Holtmann
eb1904f49d Bluetooth: Add quirk for external configuration requirement
When a controller requires external configuration, then setting this
quirk will allow indicating this.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-04 21:08:15 +03:00
Marcel Holtmann
89bc22d23f Bluetooth: Add quirk for invalid controller address setting
When a Bluetooth controller does not have a valid public Bluetooth
address, then allow the driver to indicate this. If the quirk is
set, the Bluetooth core will switch to unconfigured state first
and will allow userspace to configure the address before starting
the full initialization of the controller.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-04 18:09:32 +03:00
Marcel Holtmann
118305b50a Bluetooth: Document the existing device quirks
The current existing device quirks are not documented. So instead of
spreading bits and pieces somewhere in the code, add proper comments
on where these quirks can be used and what behavior they change.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-04 17:30:48 +03:00
Marcel Holtmann
46ebeb26cd Bluetooth: Fix constant for public address configuration
The public address configuration option is value 0x02 since the generic
external configuration is value 0x01. So adjust this accordingly and
also add the value 0x01 to the list.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-04 14:59:09 +03:00
Johan Hedberg
501f882741 Bluetooth: Make hci_pend_le_conn_lookup more general purposed
In some circumstances we need to look up entries in pend_le_conns and in
other in pend_le_reports. This patch converts the existing lookup
function for pend_le_conns to something that can be used for both lists.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-04 11:58:10 +02:00
Johan Hedberg
d9b3ad7df1 Bluetooth: Remove unused hci_pend_le_conn_add function
Since there are no more users of this function we can simply go ahead
and remove it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-04 11:58:09 +02:00
Johan Hedberg
d7347f3cc2 Bluetooth: Fix clearing and restarting all LE actions on power cycle
When powering off (hci_dev_do_close) we should clear both the
pend_le_reports and pend_le_conns types of entries. When powering on
respectively we should populate both lists. This patch converts the
hci_pend_le_conns_clear() function into hci_pend_le_actions_clear()
(which can now be static) and converts the restart_le_auto_conns()
function into restart_le_actions().

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-04 11:58:09 +02:00
Johan Hedberg
ae44e5d19e Bluetooth: Remove unused hci_pend_le_conn_del() function
Now that there are no-longer any users of the hci_pend_le_conn_del()
function we can simply go ahead and remove it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-04 11:58:09 +02:00
Johan Hedberg
66f8455aea Bluetooth: Convert pend_le_reports into a list
To simplify manipulation and lookup of hci_conn_params entries of the
type HCI_AUTO_CONN_REPORT it makes sense to store them in their own
list. The new action list_head in hci_conn_params is used for this
purpose.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-04 11:58:08 +02:00
Johan Hedberg
93450c7544 Bluetooth: Convert pend_le_conn list to a generic action list
In preparation to store also HCI_AUTO_CONN_REPORT entries in a list it
makes sense to convert the existing pend_le_conn list head of
hci_conn_params into a more generically named "action". This makes sense
because a parameter entry will never participate in more than one action
list.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-04 11:58:08 +02:00
Marcel Holtmann
9fc3bfb681 Bluetooth: Add support for controller configuration info command
The Read Controller Configuration Information command allows retrieving
details about possible configurations option. The supported options are
returned and also the missing options (if any).

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-04 08:50:19 +03:00
Johan Hedberg
912b42ef05 Bluetooth: Use hci_conn_params in pend_le_conns
Since the connection parameters are always a basis for adding entries to
hdev->pend_le_conns (so far of type bdaddr_list) it's simpler and more
efficient to have the parameters themselves be the entries in the
pend_le_conns list. We do this by adding another list_head to the
hci_conn_params struct.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 18:45:08 +02:00
Johan Hedberg
851efca838 Bluetooth: Track number of added devices with HCI_AUTO_CONN_REPORT
To be able to make the right choice of whether to start passive scanning
or to send out a mgmt_device_found event we need to know if there are
any devices in the le_conn_params list with the auto_connect value set
to HCI_AUTO_CONN_REPORT. This patch adds a counter for this kind of
devices.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:58 +02:00
Marcel Holtmann
73d1df2a7a Bluetooth: Add support for Read Unconfigured Index List command
This command allows to get the list of currently known controller that
are in unconfigured state.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:58 +02:00
Marcel Holtmann
edd3896bc4 Bluetooth: Add support for Unconfigured Index Removed events
When a controller in an unconfigured state gets removed, then send
Unconfigured Index Removed events.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:58 +02:00
Marcel Holtmann
0602a8adc3 Bluetooth: Add support for Unconfigured Index Added events
When a controller is in unconfigured state it is currently hidden
from the management interface. This change now announces the new
controller with an Unconfigured Index Added event and allows clients
to easily detect the controller.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:58 +02:00
Marcel Holtmann
4a964404c0 Bluetooth: Introduce unconfigured controller state
With the new unconfigured controller state it is possible to provide a
fully functional HCI transport, but disable the higher level operations
that would normally happen. This way userspace can try to configure the
controller before releases the unconfigured state.

The internal state is represented by HCI_UNCONFIGURED. This replaces the
HCI_QUIRK_RAW_DEVICE quirk as internal state representation. This is now
a real state and drivers can use the quirk to actually trigger this
state. In the future this will allow a more fine grained switching from
unconfigured state to configured state for controller inititialization.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:58 +02:00
Johan Hedberg
c46245b3ef Bluetooth: Make is_identity_address a global function
There are more places that can take advantage of is_identity_address()
besides hci_core.c. This patch moves the function to hci_core.h and
gives it the appropriate hci_ prefix.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:57 +02:00
Johan Hedberg
f4869e2adb Bluetooth: Pass store hint to mgmt_new_conn_param
The calling functions of mgmt_new_conn_param have more information about
the parameters, such as whether the kernel is tracking them or not. It
makes therefore sense to have them pass an initial store_hint value to
the mgmt_new_conn_param function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:57 +02:00
Johan Hedberg
7d6ca6939c Bluetooth: Make hci_le_conn_update return the store hint
The caller of hci_le_conn_update is directly interested in knowing what
the best value is for the store_hint parameter of the corresponding
mgmt event. Since hci_le_conn_update knows whether there were stored
parameters that were updated or not we can have it return an initial
store_hint value to the caller.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:57 +02:00
Johan Hedberg
a26f3dcff2 Bluetooth: Add Load Connection Parameters command
This patch implements the new Load Connection Parameters mgmt command
that's intended to load the desired connection parameters for LE
devices.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:57 +02:00
Johan Hedberg
a3451d279f Bluetooth: Add new auto_conn value matching mgmt action 0x00
The 0x00 action value of mgmt means "scan and report" but do not
connect. This is different from HCI_AUTO_CONN_DISABLED so we need a new
value for it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:57 +02:00
Johan Hedberg
55af49a8fe Bluetooth: Add specific connection parameter clear functions
In some circumstances we'll need to either clear only the enabled
parameters or only the disabled ones. This patch adds convenience
functions for this purpose.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:56 +02:00
Johan Hedberg
373110c5d3 Bluetooth: Rename hci_conn_params_clear to hci_conn_params_clear_all
We'll soon have specific clear functions for clearing enabled or
disabled entries, so rename the function that removes everything to
clear_all().

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:56 +02:00
Marcel Holtmann
24c457e270 Bluetooth: Add support for hdev->set_bdaddr callback handling
Some embedded controllers allow the programming of a public address
and this adds vendor support for supporting OEM confguration of such
addresses.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:55 +02:00
Andre Guedes
ffb5a827d5 Bluetooth: Introduce "New Connection Parameter" Event
This patch introduces a new Mgmt event called "New Connection Parameter".
This event indicates to userspace the connection parameters values the
remote device requested.

The user may store these values and load them into kernel. This way, next
time a connection is established to that device, the kernel will use those
parameters values instead of the default ones.

This event is sent when the remote device requests new connection
parameters through connection parameter update procedure. This event is
not sent for slave connections.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:55 +02:00
Andre Guedes
662bc2e63d Bluetooth: Enable new LE meta event
The Bluetooth 4.1 introduces a new LE meta event called "LE Remote
Connection Parameter Request" event. In order to the controller
sends this event to host, we should enable it during controller
initialization.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:55 +02:00
Andre Guedes
8e75b46a4f Bluetooth: Connection Parameter Update Procedure
This patch adds support for LE Connection Parameters Request Link
Layer control procedure introduced in Core spec 4.1. This procedure
allows a Peripheral or Central to update the Link Layer connection
parameters of an established connection.

Regarding the acceptance of connection parameters, the LL procedure
follows the same approach of L2CAP procedure (see l2cap_conn_param_
update_req function). We accept any connection parameters values as
long as they are within the valid range.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:54 +02:00
Johan Hedberg
2a8357f239 Bluetooth: Fix redundant device (un)blocked events
For the Block/Unblock Device mgmt commands we should only emit the
Blocked/Unblocked events on any socket except for the one which received
the command. The code was previously incorrectly trying to look up a
non-existent pending command and thereby ending up not skipping the
command socket for the event.

We can simplify the code a lot by simply sending the event directly from
the command handler functions. We have the reference to the command
socket available there which makes it easy to pass to the mgmt_event
function for skipping.

The only notable side-effect of this is that the old blacklisting
ioctl's no-longer cause mgmt events to be emitted, however as user space
versions using these ioctl's are not mgmt-aware this is acceptable.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:54 +02:00
Johan Hedberg
fe59a05f94 Bluetooth: Add flag to track STK encryption
There are certain subtle differences in behavior when we're encrypted
with the STK, such as allowing re-encryption even though the security
level stays the same. Because of this, add a flag to track whether we're
encrypted with an STK or not.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:54 +02:00
Marcel Holtmann
c70a7e4cc8 Bluetooth: Add support for Not Connectable flag for Device Found events
The Device Found events of the management interface should indicate if
it is possible to connect to a remote device or if it is broadcaster
only advertising. To allow this differentation the Not Connectable flag
is introduced that will be set when it is known that a device can not
be connected.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:53 +02:00
Marcel Holtmann
af58925ca6 Bluetooth: Provide flags parameter direct to mgmt_device_found
Providing the flags parameter directly to mgmt_device_found function
makes the core simpler and more readable. With this it becomes a lot
easier to add new flags in the future.

This also changes hci_inquiry_cache_update to just return that flags
needed for mgmt_device_found since that is its only use for the two
return parameters anyway.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:53 +02:00
Marcel Holtmann
d06b50ce14 Bluetooth: Remove connection interval parameters from hci_conn_params_set
The connection interval parameter of hci_conn_params_set are always used
with the controller defaults. So just let hci_conn_params_add set the
controller default and not bother resetting them to controller defaults
every time the hci_conn_params_set is called.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:53 +02:00
Marcel Holtmann
51d167c097 Bluetooth: Change hci_conn_params_add to return the parameter struct
When adding new connection parameters, it is useful to return either
the existing struct or the newly created one.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:52 +02:00
Andre Guedes
d4905f2453 Bluetooth: Connection parameters check helper
This patch renames l2cap_check_conn_param() to hci_check_conn_params()
and moves it to hci_core.h so it can reused in others files. This helper
will be reused in the next patch.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:52 +02:00
Marcel Holtmann
bf5b3c8be0 Bluetooth: Provide function to create and set connection parameters
In some cases it is useful to not overwrite connection parametes and
instead just create default ones if they don't exist. This function
does exactly that. hci_conn_params_add will allow to create new
default connection parameters. hci_conn_params_set will set the
values and also create new parameters if they don't exist.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:51 +02:00
Marcel Holtmann
04fb7d9066 Bluetooth: Provide defaults for LE connection latency and timeout
Store the connection latency and supervision timeout default values
with all the other controller defaults. And when needed use them
for new connections.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:51 +02:00
Marcel Holtmann
8afef092a1 Bluetooth: Add Device Added and Device Removed management events
When devices are added or removed, then make sure that events are send
out to all other clients so that the list of devices can be easily
tracked. This is especially important when external clients are
adding or removing devices within the auto-connection list.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:50 +02:00
Marcel Holtmann
2faade53e6 Bluetooth: Add support for Add/Remove Device management commands
This allows adding or removing devices from the background scanning
list the kernel maintains. Device flagged for auto-connection will
be automatically connected if they are found.

The passive scanning required for auto-connection will be started
and stopped on demand.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:50 +02:00
Marcel Holtmann
f044eb0524 Bluetooth: Store latency and supervision timeout in connection params
When the slave updates the connection parameters, store also the
connection latency and supervision timeout information in the
internal list of connection parameters for known devices.

Having these values available allowes the auto-connection
procedure to use the correct values from the beginning without
having to request an update on every connection establishment.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:50 +02:00
Johan Hedberg
958684263d Bluetooth: Add support for Get Clock Info mgmt command
This patch implements support for the Get Clock Information mgmt
command. This is done by performing one or two HCI_Read_Clock commands
and creating the response from the stored values in the hci_dev and
hci_conn structs.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:49 +02:00
Johan Hedberg
33f3572103 Bluetooth: Add tracking of local and piconet clock values
This patch adds support for storing the local and piconet clock values
from the HCI_Read_Clock command response to the hci_dev and hci_conn
structs. This will be later used in another patch to implement support
for the Get Clock Info mgmt command.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:48 +02:00
Marcel Holtmann
df935429be Bluetooth: Send HCI_Read_Clock_Offset before disconnecting
When the connection is in master role and it is going to be
disconnected based on the disconnection timeout, then send
the HCI_Read_Clock_Offset command in an attempt to update the
clock offset value in the inquiry cache.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:48 +02:00
Johan Hedberg
b10e8017bd Bluetooth: Remove unnecessary hcon->smp_conn variable
The smp_conn member of struct hci_conn was simply a pointer to the
l2cap_conn object. Since we already have hcon->l2cap_data that points to
the same thing there's no need to have this second variable. This patch
removes it and changes the single place that was using it to use
hcon->l2cap_data instead.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:47 +02:00
Andre Guedes
dbbfa2ab7a Bluetooth: Use macro instead of hard-coded value
This patch replaces the hard-coded value in hci_bdaddr_is_rpa() helper
by the corresponding macro ADDR_LE_DEV_RANDOM.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:46 +02:00
Johan Hedberg
4dae27983e Bluetooth: Convert hci_conn->link_mode into flags
Since the link_mode member of the hci_conn struct is a bit field and we
already have a flags member as well it makes sense to merge these two
together. This patch moves all used link_mode bits into corresponding
flags. To keep backwards compatibility with user space we still need to
provide a get_link_mode() helper function for the ioctl's that expect a
link_mode style value.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:46 +02:00
Johan Hedberg
3769972bad Bluetooth: Add a new HCI_USE_DEBUG_KEYS flag
To pave the way for actively using debug keys for pairing this patch
adds a new HCI_USE_DEBUG_KEYS flag for the purpose. When the flag is set
we issue a HCI_Write_SSP_Debug mode whenever HCI_Write_SSP_Mode(0x01)
has been issued as well as before issuing a HCI_Write_SSP_Mode(0x00)
command.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:46 +02:00
Johan Hedberg
af6a9c3213 Bluetooth: Convert hcon->flush_key to a proper flag
There's no point in having boolean variables in the hci_conn struct
since it already has a flags member. This patch converts the flush_key
member into a proper flag.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:46 +02:00
Johan Hedberg
0663b297f1 Bluetooth: Rename HCI_DEBUG_KEYS to HCI_KEEP_DEBUG_KEYS
We're planning to add a flag to actively use debug keys in addition to
simply just accepting them, which makes the current generically named
DEBUG_KEYS flag a bit confusing. Since the flag in practice affects
whether the kernel keeps debug keys around or not rename it to
HCI_KEEP_DEBUG_KEYS.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:45 +02:00
Johan Hedberg
7652ff6aea Bluetooth: Move mgmt event sending out from hci_add_link_key()
There are two callers of hci_add_link_key(). The first one is the HCI
Link Key Notification event and the second one the mgmt code that
receives a list of link keys from user space. Previously we've had the
hci_add_link_key() function being responsible for also emitting a mgmt
signal but for the latter use case this should not happen. Because of
this a rather awkward new_key paramter has been passed to the function.

This patch moves the mgmt event sending out from the hci_add_link_key()
function, thereby making the code a bit more understandable.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:45 +02:00
Johan Hedberg
567fa2aa3d Bluetooth: Update hci_add_link_key() to return pointer to key
By returning the added (or updated) key we pave the way for further
refactoring (in subsequent patches) that allows moving the mgmt event
sending out from this function (and thereby removal of the awkward
new_key parameter).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:45 +02:00
Marcel Holtmann
1855d92dce Bluetooth: Track LE connection parameter update event
When the LE controller changes its connection parameters, it will send
a connection parameter update event. Make sure that the new set of
parameters are stored in hci_conn struct and thus will properly update
the previous values retrieved from the connection complete event.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:44 +02:00
Marcel Holtmann
e04fde60ef Bluetooth: Store current LE connection parameters in hci_conn struct
The LE connection parameters are needed later on to be able to decide
if it is required to trigger connection update procedures. So when the
connection has been established successfully, store the current used
parameters in hci_conn struct.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:44 +02:00
Jukka Rissanen
6b8d4a6a03 Bluetooth: 6LoWPAN: Use connected oriented channel instead of fixed one
Create a CoC dynamically instead of one fixed channel for communication
to peer devices.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:44 +02:00
Jukka Rissanen
0498878b18 Bluetooth: Provide L2CAP ops callback for memcpy_fromiovec
The highly optimized TX path for L2CAP channels and its fragmentation
within the HCI ACL packets requires to copy data from user provided
IO vectors and also kernel provided memory buffers.

This patch allows channel clients to provide a memcpy_fromiovec callback
to keep this optimized behavior, but adapt it to kernel vs user memory
for the TX path. For all kernel internal L2CAP channels, a default
implementation is provided that can be referenced.

In case of A2MP, this fixes a long-standing issue with wrongly accessing
kernel memory as user memory.

This patch originally by Marcel Holtmann.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:43 +02:00
Marcel Holtmann
111902f723 Bluetooth: Use separate dbg_flags to special debugfs options
All the special settings configured via debugfs are either developer
only options or temporary solutions. To not clutter the standard flags,
move them to their own dbg_flags entry.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:43 +02:00
Johan Hedberg
ddf4ba078f Bluetooth: Remove unused LTK authentication defines
These defines were probably put in to track authenticated vs
unauthenticated LTKs, however since the LTK struct has a separate
boolean authenticated member these were never used.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:42 +02:00
Johan Hedberg
2ceba53936 Bluetooth: Remove HCI prefix from SMP LTK defines
The LTK type has really nothing to do with HCI so it makes more sense to
have these in smp.h than hci.h. This patch moves the defines to smp.h
and removes the HCI_ prefix in the same go.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:42 +02:00
Johan Hedberg
7d5843b7b7 Bluetooth: Remove unnecessary SMP STK define
We never store the "master" type of STKs since we request encryption
directly with them so we only need one STK type (the one that's
looked-up on the slave side). Simply remove the unnecessary define and
rename the _SLAVE one to the shorter form.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-07-03 17:42:42 +02:00
Marcel Holtmann
65cc2b49db Bluetooth: Use struct delayed_work for HCI command timeout
Since the whole HCI command, event and data packet processing has been
migrated to use workqueues instead of tasklets, it makes sense to use
struct delayed_work instead of struct timer_list for the timeout
handling. This patch converts the hdev->cmd_timer to use workqueue
as well.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:42 +02:00
Marcel Holtmann
d9fbd02be5 Bluetooth: Use explicit header and body length for L2CAP SKB allocation
When allocating the L2CAP SKB for transmission, provide the upper layers
with a clear distinction on what is the header and what is the body
portion of the SKB.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:42 +02:00
Marcel Holtmann
0775899158 Bluetooth: Shrink size of struct l2cap_ctrl fields
The struct l2cap_ctrl fields are wasting an unsigned int when all the
bits can fit into an __u8 field.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:41 +02:00
Marcel Holtmann
8d46321c4f Bluetooth: Assign L2CAP socket priority when allocating SKB
The SKB for L2CAP sockets are all allocated in a central callback
in the socket support. Instead of having to pass around the socket
priority all the time, assign it to skb->priority when actually
allocating the SKB.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:41 +02:00
Marcel Holtmann
67f86a45bb Bluetooth: Use const for struct l2cap_ops field
The struct l2cap_ops field should not allow any modifications and thus
it is better declared as const.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:41 +02:00
Daniel Borkmann
8f61059a96 net: sctp: improve timer slack calculation for transport HBs
RFC4960, section 8.3 says:

  On an idle destination address that is allowed to heartbeat,
  it is recommended that a HEARTBEAT chunk is sent once per RTO
  of that destination address plus the protocol parameter
  'HB.interval', with jittering of +/- 50% of the RTO value,
  and exponential backoff of the RTO if the previous HEARTBEAT
  is unanswered.

Currently, we calculate jitter via sctp_jitter() function first,
and then add its result to the current RTO for the new timeout:

  TMO = RTO + (RAND() % RTO) - (RTO / 2)
              `------------------------^-=> sctp_jitter()

Instead, we can just simplify all this by directly calculating:

  TMO = (RTO / 2) + (RAND() % RTO)

With the help of prandom_u32_max(), we don't need to open code
our own global PRNG, but can instead just make use of the per
CPU implementation of prandom with better quality numbers. Also,
we can now spare us the conditional for divide by zero check
since no div or mod operation needs to be used. Note that
prandom_u32_max() won't emit the same result as a mod operation,
but we really don't care here as we only want to have a random
number scaled into RTO interval.

Note, exponential RTO backoff is handeled elsewhere, namely in
sctp_do_8_2_transport_strike().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 18:44:07 -07:00
Alexander Aring
48bc03433c ieee802154: reassembly: fix possible buffer overflow
The max_dsize attribute in ctl_table for lowpan_frags_ns_ctl_table is
configured with integer accessing methods. This patch change the
max_dsize attribute to int to avoid a possible buffer overflow.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 18:34:25 -07:00
Eric Dumazet
5925a0555b net: fix sparse warning in sk_dst_set()
sk_dst_cache has __rcu annotation, so we need a cast to avoid
following sparse error :

include/net/sock.h:1774:19: warning: incorrect type in initializer (different address spaces)
include/net/sock.h:1774:19:    expected struct dst_entry [noderef] <asn:4>*__ret
include/net/sock.h:1774:19:    got struct dst_entry *dst

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 7f50236153 ("ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 17:03:06 -07:00
Eric Dumazet
9fe516ba3f inet: move ipv6only in sock_common
When an UDP application switches from AF_INET to AF_INET6 sockets, we
have a small performance degradation for IPv4 communications because of
extra cache line misses to access ipv6only information.

This can also be noticed for TCP listeners, as ipv6_only_sock() is also
used from __inet_lookup_listener()->compute_score()

This is magnified when SO_REUSEPORT is used.

Move ipv6only into struct sock_common so that it is available at
no extra cost in lookups.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 23:46:21 -07:00
Eric Dumazet
7f50236153 ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix
We have two different ways to handle changes to sk->sk_dst

First way (used by TCP) assumes socket lock is owned by caller, and use
no extra lock : __sk_dst_set() & __sk_dst_reset()

Another way (used by UDP) uses sk_dst_lock because socket lock is not
always taken. Note that sk_dst_lock is not softirq safe.

These ways are not inter changeable for a given socket type.

ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used
the socket lock as synchronization, but users might be UDP sockets.

Instead of converting sk_dst_lock to a softirq safe version, use xchg()
as we did for sk_rx_dst in commit e47eb5dfb2 ("udp: ipv4: do not use
sk_dst_lock from softirq context")

In a follow up patch, we probably can remove sk_dst_lock, as it is
only used in IPv6.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Fixes: 9cb3a50c5f ("ipv4: Invalidate the socket cached route on pmtu events if possible")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-30 23:40:58 -07:00
Octavian Purdila
1fb6f159fd tcp: add tcp_conn_request
Create tcp_conn_request and remove most of the code from
tcp_v4_conn_request and tcp_v6_conn_request.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:37 -07:00
Octavian Purdila
695da14eb0 tcp: add queue_add_hash to tcp_request_sock_ops
Add queue_add_hash member to tcp_request_sock_ops so that we can later
unify tcp_v4_conn_request and tcp_v6_conn_request.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:36 -07:00
Octavian Purdila
2aec4a297b tcp: add mss_clamp to tcp_request_sock_ops
Add mss_clamp member to tcp_request_sock_ops so that we can later
unify tcp_v4_conn_request and tcp_v6_conn_request.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:36 -07:00
Octavian Purdila
5db92c9949 tcp: unify tcp_v4_rtx_synack and tcp_v6_rtx_synack
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:36 -07:00
Octavian Purdila
d6274bd8d6 tcp: add send_synack method to tcp_request_sock_ops
Create a new tcp_request_sock_ops method to unify the IPv4/IPv6
signature for tcp_v[46]_send_synack. This allows us to later unify
tcp_v4_rtx_synack with tcp_v6_rtx_synack and tcp_v4_conn_request with
tcp_v4_conn_request.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:36 -07:00
Octavian Purdila
936b8bdb53 tcp: add init_seq method to tcp_request_sock_ops
More work in preparation of unifying tcp_v4_conn_request and
tcp_v6_conn_request: indirect the init sequence calls via the
tcp_request_sock_ops.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:36 -07:00
Octavian Purdila
d94e0417ad tcp: add route_req method to tcp_request_sock_ops
Create wrappers with same signature for the IPv4/IPv6 request routing
calls and use these wrappers (via route_req method from
tcp_request_sock_ops) in tcp_v4_conn_request and tcp_v6_conn_request
with the purpose of unifying the two functions in a later patch.

We can later drop the wrapper functions and modify inet_csk_route_req
and inet6_cks_route_req to use the same signature.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:36 -07:00
Octavian Purdila
fb7b37a7f3 tcp: add init_cookie_seq method to tcp_request_sock_ops
Move the specific IPv4/IPv6 cookie sequence initialization to a new
method in tcp_request_sock_ops in preparation for unifying
tcp_v4_conn_request and tcp_v6_conn_request.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:35 -07:00
Octavian Purdila
16bea70aa7 tcp: add init_req method to tcp_request_sock_ops
Move the specific IPv4/IPv6 intializations to a new method in
tcp_request_sock_ops in preparation for unifying tcp_v4_conn_request
and tcp_v6_conn_request.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:35 -07:00
Octavian Purdila
476eab8251 net: remove inet6_reqsk_alloc
Since pktops is only used for IPv6 only and opts is used for IPv4
only, we can move these fields into a union and this allows us to drop
the inet6_reqsk_alloc function as after this change it becomes
equivalent with inet_reqsk_alloc.

This patch also fixes a kmemcheck issue in the IPv6 stack: the flags
field was not annotated after a request_sock was allocated.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:35 -07:00
Octavian Purdila
57b47553f6 tcp: cookie_v4_init_sequence: skb should be const
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:35 -07:00
Pablo Neira Ayuso
960649d192 netfilter: bridge: add generic packet logger
This adds the generic plain text packet loggger for bridged packets.
It routes the logging message to the real protocol packet logger.
I decided not to refactor the ebt_log code for two reasons:

1) The ebt_log output is not consistent with the IPv4 and IPv6
   Netfilter packet loggers. The output is different for no good
   reason and it adds redundant code to handle packet logging.

2) To avoid breaking backward compatibility for applications
   outthere that are parsing the specific ebt_log output, the ebt_log
   output has been left as is. So only nftables will use the new
   consistent logging format for logged bridged packets.

More decisions coming in this patch:

1) This also removes ebt_log as default logger for bridged packets.
   Thus, nf_log_packet() routes packet to this new packet logger
   instead. This doesn't break backward compatibility since
   nf_log_packet() is not used to log packets in plain text format
   from anywhere in the ebtables/netfilter bridge code.

2) The new bridge packet logger also performs a lazy request to
   register the real IPv4, ARP and IPv6 netfilter packet loggers.
   If the real protocol logger is no available (not compiled or the
   module is not available in the system, not packet logging happens.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-27 13:20:47 +02:00
Pablo Neira Ayuso
fab4085f4e netfilter: log: nf_log_packet() as real unified interface
Before this patch, the nf_loginfo parameter specified the logging
configuration in case the specified default logger was loaded. This
patch updates the semantics of the nf_loginfo parameter in
nf_log_packet() which now indicates the logger that you explicitly
want to use.

Thus, nf_log_packet() is exposed as an unified interface which
internally routes the log message to the corresponding logger type
by family.

The module dependencies are expressed by the new nf_logger_find_get()
and nf_logger_put() functions which bump the logger module refcount.
Thus, you can not remove logger modules that are used by rules anymore.

Another important effect of this change is that the family specific
module is only loaded when required. Therefore, xt_LOG and nft_log
will just trigger the autoload of the nf_log_{ip,ip6} modules
according to the family.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-27 13:20:13 +02:00
Pablo Neira Ayuso
83e96d443b netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files
The plain text logging is currently embedded into the xt_LOG target.
In order to be able to use the plain text logging from nft_log, as a
first step, this patch moves the family specific code to the following
files and Kconfig symbols:

1) net/ipv4/netfilter/nf_log_ip.c: CONFIG_NF_LOG_IPV4
2) net/ipv6/netfilter/nf_log_ip6.c: CONFIG_NF_LOG_IPV6
3) net/netfilter/nf_log_common.c: CONFIG_NF_LOG_COMMON

These new modules will be required by xt_LOG and nft_log. This patch
is based on original patch from Arturo Borrero Gonzalez.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-27 13:19:59 +02:00
David S. Miller
9b8d90b963 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-06-25 22:40:43 -07:00
Eric Dumazet
f886497212 ipv4: fix dst race in sk_dst_get()
When IP route cache had been removed in linux-3.6, we broke assumption
that dst entries were all freed after rcu grace period. DST_NOCACHE
dst were supposed to be freed from dst_release(). But it appears
we want to keep such dst around, either in UDP sockets or tunnels.

In sk_dst_get() we need to make sure dst refcount is not 0
before incrementing it, or else we might end up freeing a dst
twice.

DST_NOCACHE set on a dst does not mean this dst can not be attached
to a socket or a tunnel.

Then, before actual freeing, we need to observe a rcu grace period
to make sure all other cpus can catch the fact the dst is no longer
usable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dormando <dormando@rydia.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25 17:41:44 -07:00
Pablo Neira Ayuso
27fd8d90c9 netfilter: nf_log: move log buffering to core logging
This patch moves Eric Dumazet's log buffer implementation from the
xt_log.h header file to the core net/netfilter/nf_log.c. This also
includes the renaming of the structure and functions to avoid possible
undesired namespace clashes.

This change allows us to use it from the arp and bridge packet logging
implementation in follow up patches.
2014-06-25 19:28:43 +02:00
Pablo Neira Ayuso
5962815a6a netfilter: nf_log: use an array of loggers instead of list
Now that legacy ulog targets are not available anymore in the tree, we
can have up to two possible loggers:

1) The plain text logging via kernel logging ring.
2) The nfnetlink_log infrastructure which delivers log messages
   to userspace.

This patch replaces the list of loggers by an array of two pointers
per family for each possible logger and it also introduces a new field
to the nf_logger structure which indicates the position in the logger
array (based on the logger type).

This prepares a follow up patch that consolidates the nf_log_packet()
interface by allowing to specify the logger as parameter.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-25 19:28:43 +02:00
Florian Westphal
9500507c61 netfilter: conntrack: remove timer from ecache extension
This brings the (per-conntrack) ecache extension back to 24 bytes in size
(was 152 byte on x86_64 with lockdep on).

When event delivery fails, re-delivery is attempted via work queue.

Redelivery is attempted at least every 0.1 seconds, but can happen
more frequently if userspace is not congested.

The nf_ct_release_dying_list() function is removed.
With this patch, ownership of the to-be-redelivered conntracks
(on-dying-list-with-DYING-bit not yet set) is with the work queue,
which will release the references once event is out.

Joint work with Pablo Neira Ayuso.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-25 19:15:38 +02:00
Michal Kazior
97dc94f1d9 cfg80211: remove channel_switch combination check
Driver is now responsible for veryfing if the
switch is possible.

Since this is inherently tricky driver may decide
to disconnect an interface later with
cfg80211_stop_iface().

This doesn't mean driver can accept everything. It
should do it's best to verify requests and reject
them as soon as possible.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-25 18:06:20 +02:00
Michal Kazior
5bcae31d9c mac80211: implement multi-vif in-place reservations
Multi-vif in-place reservations happen when
it is impossible to allocate more channel contexts
as indicated by interface combinations.

Such reservations are not finalized until all
assigned interfaces are ready.

This still doesn't handle all possible cases
(i.e. degradation of number of channels) properly.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-25 18:06:20 +02:00
David Spinadel
633e271326 mac80211: split sched scan IEs
Split sched scan IEs to band specific and not band specific
blocks. Common IEs blocks may be sent to the FW once per command,
instead of per band.

This allows optimization of size of the command, which may be
required by some drivers (eg. iwlmvm with newer firmware version).

As this changes the mac80211 API, update all drivers to use the
new version correctly, even if they don't (yet) make use of the
split data.

Signed-off-by: David Spinadel <david.spinadel@intel.com>
Reviewed-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-25 09:10:43 +02:00
David Spinadel
c56ef67250 mac80211: support more than one band in scan request
Some drivers (such as iwlmvm) can handle multiple bands in a single
HW scan request. Add a HW flag to indicate that the driver support
this. To hold the required data, create a separate structure for
HW scan request that holds cfg scan request and data about
different parts of the scan IEs.

As this changes the mac80211 API, update all drivers using it to
use the correct new function type/argument.

Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-25 09:10:42 +02:00
Govindarajulu Varadarajan
e0f31d8498 flow_keys: Record IP layer protocol in skb_flow_dissect()
skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
give any information about IPv4 or IPv6.

This patch adds new member, n_proto, to struct flow_keys. Which records the
IP layer type. i.e IPv4 or IPv6.

This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.

Adding new member to flow_keys increases the struct size by around 4 bytes.
This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
qdisc_cb_private_validate()

So increase data size by 4

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-23 14:32:19 -07:00
Arik Nemtsov
ee10f2c779 mac80211: protect TDLS discovery session
After sending a TDLS discovery-request, we expect a reply to arrive on
the AP's channel. We must stay on the channel (no PSM, scan, etc.), since
a TDLS setup-response is a direct packet not buffered by the AP.
Add a new mac80211 driver callback to allow discovery session protection.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:28:19 +02:00
Arik Nemtsov
c887f0d3a0 mac80211: add API to request TDLS operation from userspace
Write a mac80211 to the cfg80211 API for requesting a userspace TDLS
operation. Define TDLS specific reason codes that can be used here.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:28:17 +02:00
Arik Nemtsov
31fa97c5de cfg80211: pass TDLS initiator in tdls_mgmt operations
The TDLS initiator is set once during link setup. If determines the
address ordering in the link identifier IE.

Fix dependent drivers - mwifiex and mac80211.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:24:55 +02:00
Johannes Berg
b7ffbd7ef6 cfg80211: make ethtool the driver's responsibility
Currently, cfg80211 tries to implement ethtool, but that doesn't
really scale well, with all the different operations. Make the
lower-level driver responsible for it, which currently only has
an effect on mac80211. It will similarly not scale well at that
level though, since mac80211 also has many drivers.

To cleanly implement this in mac80211, introduce a new file and
move some code to appropriate places.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 11:05:33 +02:00
Octavian Purdila
e0f802fbca tcp: move ir_mark initialization to tcp_openreq_init
ir_mark initialization is done for both TCP v4 and v6, move it in the
common tcp_openreq_init function.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-17 15:30:54 -07:00
Pablo Neira Ayuso
a0a7379e16 netfilter: nf_tables: use u32 for chain use counter
Since 4fefee5 ("netfilter: nf_tables: allow to delete several objects
from a batch"), every new rule bumps the chain use counter. However,
this is limited to 16 bits, which means that it will overrun after
2^16 rules.

Use a u32 chain counter and check for overflows (just like we do for
table objects).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16 13:07:44 +02:00
Tom Herbert
bbdff225ed udp: call __skb_checksum_complete when doing full checksum
In __udp_lib_checksum_complete check if checksum is being done over all
the data (len is equal to skb->len) and if it is call
__skb_checksum_complete instead of __skb_checksum_complete_head. This
allows checksum to be saved in checksum complete.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 01:00:49 -07:00
Linus Torvalds
f9da455b93 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

 4) BPF now has a "random" opcode, from Chema Gonzalez.

 5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

 6) Support TCP fastopen over ipv6, from Daniel Lee.

 7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers.  From Ezequiel Garcia.

 8) Support software TSO in fec driver too, from Nimrod Andy.

 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
  rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
  tcp: fixing TLP's FIN recovery
  net: fec: Add software TSO support
  net: fec: Add Scatter/gather support
  net: fec: Increase buffer descriptor entry number
  net: fec: Factorize feature setting
  net: fec: Enable IP header hardware checksum
  net: fec: Factorize the .xmit transmit function
  bridge: fix compile error when compiling without IPv6 support
  bridge: fix smatch warning / potential null pointer dereference
  via-rhine: fix full-duplex with autoneg disable
  bnx2x: Enlarge the dorq threshold for VFs
  bnx2x: Check for UNDI in uncommon branch
  bnx2x: Fix 1G-baseT link
  bnx2x: Fix link for KR with swapped polarity lane
  sctp: Fix sk_ack_backlog wrap-around problem
  net/core: Add VF link state control policy
  net/fsl: xgmac_mdio is dependent on OF_MDIO
  net/fsl: Make xgmac_mdio read error message useful
  net_sched: drr: warn when qdisc is not work conserving
  ...
2014-06-12 14:27:40 -07:00
Florian Westphal
6e765a009a net_sched: drr: warn when qdisc is not work conserving
The DRR scheduler requires that items on the active list are work
conserving, i.e. do not hold on to skbs for throttling purposes, etc.
Attaching e.g. tbf renders DRR useless because all other classes on the
active list are delayed as well.

So, warn users that this configuration won't work as expected; we
already do this in couple of other qdiscs, see e.g.

commit b00355db3f
('pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler')

The 'const' change is needed to avoid compiler warning ("discards 'const'
qualifier from pointer target type").

tested with:
drr_hier() {
        parent=$1
        classes=$2
        for i in  $(seq 1 $classes); do
                classid=$parent$(printf %x $i)
                tc class add dev eth0 parent $parent classid $classid drr
		tc qdisc add dev eth0 parent $classid tbf rate 64kbit burst 256kbit limit 64kbit
        done
}
tc qdisc add dev eth0 root handle 1: drr
drr_hier 1: 32
tc filter add dev eth0 protocol all pref 1 parent 1: handle 1 flow hash keys dst perturb 1 divisor 32

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:50:59 -07:00
Daniel Borkmann
e575235fc6 net: sctp: migrate most recently used transport to ktime
Be more precise in transport path selection and use ktime
helpers instead of jiffies to compare and pick the better
primary and secondary recently used transports. This also
avoids any side-effects during a possible roll-over, and
could lead to better path decision-making.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 12:23:17 -07:00
Octavian Purdila
6cc55e096f tcp: add gfp parameter to tcp_fragment
tcp_fragment can be called from process context (from tso_fragment).
Add a new gfp parameter to allow it to preserve atomic memory if
possible.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10 22:30:58 -07:00
Linus Torvalds
3f17ea6dea Merge branch 'next' (accumulated 3.16 merge window patches) into master
Now that 3.15 is released, this merges the 'next' branch into 'master',
bringing us to the normal situation where my 'master' branch is the
merge window.

* accumulated work in next: (6809 commits)
  ufs: sb mutex merge + mutex_destroy
  powerpc: update comments for generic idle conversion
  cris: update comments for generic idle conversion
  idle: remove cpu_idle() forward declarations
  nbd: zero from and len fields in NBD_CMD_DISCONNECT.
  mm: convert some level-less printks to pr_*
  MAINTAINERS: adi-buildroot-devel is moderated
  MAINTAINERS: add linux-api for review of API/ABI changes
  mm/kmemleak-test.c: use pr_fmt for logging
  fs/dlm/debug_fs.c: replace seq_printf by seq_puts
  fs/dlm/lockspace.c: convert simple_str to kstr
  fs/dlm/config.c: convert simple_str to kstr
  mm: mark remap_file_pages() syscall as deprecated
  mm: memcontrol: remove unnecessary memcg argument from soft limit functions
  mm: memcontrol: clean up memcg zoneinfo lookup
  mm/memblock.c: call kmemleak directly from memblock_(alloc|free)
  mm/mempool.c: update the kmemleak stack trace for mempool allocations
  lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations
  mm: introduce kmemleak_update_trace()
  mm/kmemleak.c: use %u to print ->checksum
  ...
2014-06-08 11:31:16 -07:00
Tom Herbert
359a0ea987 vxlan: Add support for UDP checksums (v4 sending, v6 zero csums)
Added VXLAN link configuration for sending UDP checksums, and allowing
TX and RX of UDP6 checksums.

Also, call common iptunnel_handle_offloads and added GSO support for
checksums.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 22:46:39 -07:00
Tom Herbert
4749c09c37 gre: Call gso_make_checksum
Call gso_make_checksum. This should have the benefit of using a
checksum that may have been previously computed for the packet.

This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that
offload GRE GSO with and without the GRE checksum offloaed.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 22:46:38 -07:00
Tom Herbert
af5fcba7f3 udp: Generic functions to set checksum
Added udp_set_csum and udp6_set_csum functions to set UDP checksums
in packets. These are for simple UDP packets such as those that might
be created in UDP tunnels.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 22:46:38 -07:00
Linus Torvalds
1aacb90eaa Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next
Pull trivial tree changes from Jiri Kosina:
 "Usual pile of patches from trivial tree that make the world go round"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  staging: go7007: remove reference to CONFIG_KMOD
  aic7xxx: Remove obsolete preprocessor define
  of: dma: doc fixes
  doc: fix incorrect formula to calculate CommitLimit value
  doc: Note need of bc in the kernel build from 3.10 onwards
  mm: Fix printk typo in dmapool.c
  modpost: Fix comment typo "Modules.symvers"
  Kconfig.debug: Grammar s/addition/additional/
  wimax: Spelling s/than/that/, wording s/destinatary/recipient/
  aic7xxx: Spelling s/termnation/termination/
  arm64: mm: Remove superfluous "the" in comment
  of: Spelling s/anonymouns/anonymous/
  dma: imx-sdma: Spelling s/determnine/determine/
  ath10k: Improve grammar in comments
  ath6kl: Spelling s/determnine/determine/
  of: Improve grammar for of_alias_get_id() documentation
  drm/exynos: Spelling s/contro/control/
  radio-bcm2048.c: fix wrong overflow check
  doc: printk-formats: do not mention casts for u64/s64
  doc: spelling error changes
  ...
2014-06-04 08:50:34 -07:00
David S. Miller
c99f7abf0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	include/net/inetpeer.h
	net/ipv6/output_core.c

Changes in net were fixing bugs in code removed in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03 23:32:12 -07:00
Linus Torvalds
776edb5931 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - reduced/streamlined smp_mb__*() interface that allows more usecases
     and makes the existing ones less buggy, especially in rarer
     architectures

   - add rwsem implementation comments

   - bump up lockdep limits"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  rwsem: Add comments to explain the meaning of the rwsem's count field
  lockdep: Increase static allocations
  arch: Mass conversion of smp_mb__*()
  arch,doc: Convert smp_mb__*()
  arch,xtensa: Convert smp_mb__*()
  arch,x86: Convert smp_mb__*()
  arch,tile: Convert smp_mb__*()
  arch,sparc: Convert smp_mb__*()
  arch,sh: Convert smp_mb__*()
  arch,score: Convert smp_mb__*()
  arch,s390: Convert smp_mb__*()
  arch,powerpc: Convert smp_mb__*()
  arch,parisc: Convert smp_mb__*()
  arch,openrisc: Convert smp_mb__*()
  arch,mn10300: Convert smp_mb__*()
  arch,mips: Convert smp_mb__*()
  arch,metag: Convert smp_mb__*()
  arch,m68k: Convert smp_mb__*()
  arch,m32r: Convert smp_mb__*()
  arch,ia64: Convert smp_mb__*()
  ...
2014-06-03 12:57:53 -07:00
Eric Dumazet
39c36094d7 net: fix inet_getid() and ipv6_select_ident() bugs
I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
is disabled.
Note how GSO/TSO packets do not have monotonically incrementing ID.

06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)

It appears I introduced this bug in linux-3.1.

inet_getid() must return the old value of peer->ip_id_count,
not the new one.

Lets revert this part, and remove the prevention of
a null identification field in IPv6 Fragment Extension Header,
which is dubious and not even done properly.

Fixes: 87c48fa3b4 ("ipv6: make fragment identifications less predictable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 14:09:28 -07:00
David S. Miller
31595de219 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-06-02

Please pull this remaining batch of updates intended for the 3.16 stream...

For the mac80211 bits, Johannes says:

"The remainder for -next right now is mostly fixes, and a handful of
small new things like some CSA infrastructure, the regdb script mW/dBm
conversion change and sending wiphy notifications."

For the bluetooth bits, Gustavo says:

"Some more patches for 3.16. There is nothing really special here, just a
bunch of clean ups, fixes plus some small improvements. Please pull."

For the nfc bits, Samuel says:

"We have:

- Felica (Type3) tags support for trf7970a
- Type 4b tags support for port100
- st21nfca DTS typo fix
- A few sparse warning fixes"

For the atheros bits, Kalle says:

"Ben added support for setting antenna configurations. Michal improved
warm reset so that we would not need to fall back to cold reset that
often, an issue where ath10k stripped protected flag while in monitor
mode and made module initialisation asynchronous to fix the problems
with firmware loading when the driver is linked to the kernel.

Luca removed unused channel_switch_beacon callbacks both from ath9k and
ath10k. Marek fixed Protected Management Frames (PMF) when using Action
Frames. Also we had other small fixes everywhere in the driver."

Along with that, there are a handful of updates to a variety
of drivers.  This includes updates to at76c50x-usb, ath9k, b43,
brcmfmac, mwifiex, rsi, rtlwifi, and wil6210.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 11:17:35 -07:00
Eric Dumazet
73f156a6e8 inetpeer: get rid of ip_id_count
Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 11:00:41 -07:00
John W. Linville
fcb2c0d6cf Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-06-02 11:20:17 -04:00
David S. Miller
90d0e08e57 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

This small patchset contains three accumulated Netfilter/IPVS updates,
they are:

1) Refactorize common NAT code by encapsulating it into a helper
   function, similarly to what we do in other conntrack extensions,
   from Florian Westphal.

2) A minor format string mismatch fix for IPVS, from Masanari Iida.

3) Add quota support to the netfilter accounting infrastructure, now
   you can add quotas to accounting objects via the nfnetlink interface
   and use them from iptables. You can also listen to quota
   notifications from userspace. This enhancement from Mathieu Poirier.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:54:47 -07:00
John W. Linville
a5eb1aeb25 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Conflicts:
	drivers/bluetooth/btusb.c
2014-05-29 13:03:47 -04:00
John W. Linville
737be10d8c Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-05-29 12:55:38 -04:00
John W. Linville
9db7cb6901 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-05-27 13:51:31 -04:00
Luciano Coelho
1a5f0c13d1 mac80211: add a single-transaction driver op to switch contexts
In some cases, when the driver is already using all the channel
contexts it can handle at once, we have to do an in-place switch
(ie. we cannot afford using an extra context temporarily for the
transaction).  But some drivers may not support switching the channel
context assigned to a vif on the fly (ie. without unassigning and
assigning it) while others may only work if the context is changed on
the fly, without unassigning it first.

To allow these different scenarios, add a new driver operation that
let's the driver decide how to handle an in-place switch.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-26 11:04:41 +02:00
David S. Miller
54e5c4def0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_alb.c
	drivers/net/ethernet/altera/altera_msgdma.c
	drivers/net/ethernet/altera/altera_sgdma.c
	net/ipv6/xfrm6_output.c

Several cases of overlapping changes.

The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.

In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.

Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-24 00:32:30 -04:00
Tom Herbert
28448b8045 net: Split sk_no_check into sk_no_check_{rx,tx}
Define separate fields in the sock structure for configuring disabling
checksums in both TX and RX-- sk_no_check_tx and sk_no_check_rx.
The SO_NO_CHECK socket option only affects sk_no_check_tx. Also,
removed UDP_CSUM_* defines since they are no longer necessary.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23 16:28:53 -04:00
Tom Herbert
b26ba202e0 net: Eliminate no_check from protosw
It doesn't seem like an protocols are setting anything other
than the default, and allowing to arbitrarily disable checksums
for a whole protocol seems dangerous. This can be done on a per
socket basis.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23 16:28:53 -04:00
Johan Hedberg
d7b2545023 Bluetooth: Clearly distinguish mgmt LTK type from authenticated property
On the mgmt level we have a key type parameter which currently accepts
two possible values: 0x00 for unauthenticated and 0x01 for
authenticated. However, in the internal struct smp_ltk representation we
have an explicit "authenticated" boolean value.

To make this distinction clear, add defines for the possible mgmt values
and do conversion to and from the internal authenticated value.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-23 11:24:04 -07:00
David S. Miller
65db611a5c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2014-05-22

This is the last ipsec pull request before I leave for
a three weeks vacation tomorrow. David, can you please
take urgent ipsec patches directly into net/net-next
during this time?

I'll continue to run the ipsec/ipsec-next trees as soon
as I'm back.

1) Simplify the xfrm audit handling, from Tetsuo Handa.

2) Codingstyle cleanup for xfrm_output, from abian Frederick.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 16:00:00 -04:00
Ezequiel Garcia
e876f208af net: Add a software TSO helper API
Although the implementation probably needs a lot of work, this initial API
allows to implement software TSO in mvneta and mv643xx_eth drivers in a not
so intrusive way.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 14:57:15 -04:00
John W. Linville
40a10fd740 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-05-22 13:58:36 -04:00
John W. Linville
99abe65ff1 NFC: 3.16: First pull request
This is the NFC pull request for 3.16. We have:
 
 - STMicroeectronics st21nfca support. The st21nfca is an HCI chipset and
   thus relies on the HCI stack. This submission provides support for tag
   redaer/writer mode (including Type 5) and device tree bindings.
 
 - PM runtime support and a bunch of bug fixes for TI's trf7970a.
 
 - Device tree support for NXP's pn544. Legacy platform data support is
   obviously kept intact.
 
 - NFC Tag type 4B support to the NFC Digital stack.
 
 - SOCK_RAW type support to the raw NFC socket, and allow NCI
   sniffing from that. This can be extended to report HCI frames and also
   proprietarry ones like e.g. the pn533 ones.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTepRlAAoJEIqAPN1PVmxKnF0P/RvfrZs6CbGNJC+dkEbk90p1
 nsngy4+4MmPwJYVzObnLz4Br0k1kmFKiOKske6drjMpgzDWeuQelw3B7bd3FYfxD
 YkQsc5RC984xrDoDH5pn8mA6VJqmn7whrmcibTYAixrDqTvo8gw6uja4ryAnSdZm
 n7cRbh/A5F/sa7O4mPA0bCTdp4jAS/vOP9rGFDOth/b5yJVs99XmC+AZp/Ad9BUx
 +/osWGmBV5jshtX7aPTSxIQB4BUaP/lP1DW8yF5whKDjsHC9QyJcAtw9HfZ4tv2h
 YNteZZ8yjM+rSjnDw/LvDc2Gp8Z8P1GYf8D3QN3cWhw1ZvXi7CnqKjEnm41sbfaH
 L5esIfsRBUdmk6Ika7zALqmOQFI3PzH+ag96punl29qb2gyBDRSnXKVLirv3xxFG
 h7vYtQL43Rosn/4pSilRbYReRwyKbSCxW3un/tUJy0Faafs6q+9oMC2aWbIfTT6l
 40n4H9EmzYy2OaaXSFckiIIYYgVDAji8GLXTf+dPHb+NrH3QQOR3m27WzHc4rmYk
 kUrv0lKoFswA+VLlIcJTrSKNF21FDjwuImzIWiPz6Fx/+rWJ0b4GlQyIynD72LpR
 2LkUhTrxuRuRtxVCtvTdkPlL6Bdp3HO7t4qZ0EirgnpmGK6NScBgABoqFJSbz9uS
 UUvZbHVIjLrDU9zzoyz8
 =cSl+
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz <sameo@linux.intel.com> says:

"NFC: 3.16: First pull request

This is the NFC pull request for 3.16. We have:

- STMicroeectronics st21nfca support. The st21nfca is an HCI chipset and
  thus relies on the HCI stack. This submission provides support for tag
  redaer/writer mode (including Type 5) and device tree bindings.

- PM runtime support and a bunch of bug fixes for TI's trf7970a.

- Device tree support for NXP's pn544. Legacy platform data support is
  obviously kept intact.

- NFC Tag type 4B support to the NFC Digital stack.

- SOCK_RAW type support to the raw NFC socket, and allow NCI
  sniffing from that. This can be extended to report HCI frames and also
  proprietarry ones like e.g. the pn533 ones."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-05-22 13:56:46 -04:00
David S. Miller
8af750d739 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says:

====================
Netfilter/nftables updates for net-next

The following patchset contains Netfilter/nftables updates for net-next,
most relevantly they are:

1) Add set element update notification via netlink, from Arturo Borrero.

2) Put all object updates in one single message batch that is sent to
   kernel-space. Before this patch only rules where included in the batch.
   This series also introduces the generic transaction infrastructure so
   updates to all objects (tables, chains, rules and sets) are applied in
   an all-or-nothing fashion, these series from me.

3) Defer release of objects via call_rcu to reduce the time required to
   commit changes. The assumption is that all objects are destroyed in
   reverse order to ensure that dependencies betweem them are fulfilled
   (ie. rules and sets are destroyed first, then chains, and finally
   tables).

4) Allow to match by bridge port name, from Tomasz Bursztyka. This series
   include two patches to prepare this new feature.

5) Implement the proper set selection based on the characteristics of the
   data. The new infrastructure also allows you to specify your preferences
   in terms of memory and computational complexity so the underlying set
   type is also selected according to your needs, from Patrick McHardy.

6) Several cleanup patches for nft expressions, including one minor possible
   compilation breakage due to missing mark support, also from Patrick.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 12:06:23 -04:00
Neal Cardwell
ca8a226343 tcp: make cwnd-limited checks measurement-based, and gentler
Experience with the recent e114a710aa ("tcp: fix cwnd limited
checking to improve congestion control") has shown that there are
common cases where that commit can cause cwnd to be much larger than
necessary. This leads to TSO autosizing cooking skbs that are too
large, among other things.

The main problems seemed to be:

(1) That commit attempted to predict the future behavior of the
connection by looking at the write queue (if TSO or TSQ limit
sending). That prediction sometimes overestimated future outstanding
packets.

(2) That commit always allowed cwnd to grow to twice the number of
outstanding packets (even in congestion avoidance, where this is not
needed).

This commit improves both of these, by:

(1) Switching to a measurement-based approach where we explicitly
track the largest number of packets in flight during the past window
("max_packets_out"), and remember whether we were cwnd-limited at the
moment we finished sending that flight.

(2) Only allowing cwnd to grow to twice the number of outstanding
packets ("max_packets_out") in slow start. In congestion avoidance
mode we now only allow cwnd to grow if it was fully utilized.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 12:04:49 -04:00
Emmanuel Grumbach
67af981153 cfg80211: allow RSSI compensation
Channels in 2.4GHz band overlap, this means that if we
send a probe request on channel 1 and then move to channel
2, we will hear the probe response on channel 2. In this
case, the RSSI will be lower than if we had heard it on
the channel on which it was sent (1 in this case).

The firmware / low level driver can parse the channel in
the DS IE or HT IE and compensate the RSSI so that it will
still have a valid value even if we heard the frame on an
adjacent channel. This can be done up to a certain offset.

Add this offset as a configuration for the low level driver.
A low level driver that can compensate the low RSSI in this
case should assign the maximal offset for which the RSSI
value is still valid.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-22 09:58:49 +02:00
Antonio Quartulli
7406353d43 cfg80211: implement cfg80211_get_station cfg80211 API
Implement and export the new cfg80211_get_station() API.
This utility can be used by other kernel modules to obtain
detailed information about a given wireless station.

It will be in particular useful to batman-adv which will
implement a wireless rate based metric.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-21 09:15:17 +02:00
Antonio Quartulli
cca674d47e mac80211: export the expected throughput
Add get_expected_throughput() API to mac80211 so that each
driver can implement its own version based on the RC
algorithm they are using (might be using an HW RC algo).
The API returns a value expressed in Kbps.

Also, add the new get_expected_throughput() member
to the rate_control_ops structure in order to be
able to query the RC algorithm (this patch provides an
implementation of this API for both minstrel and
minstrel_ht).

The related member in the station_info object is now
filled accordingly when dumping a station.

Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-21 09:15:16 +02:00
Antonio Quartulli
867d849fc8 cfg80211: export expected throughput through get_station()
Users may need information about the expected throughput
towards a given peer.
This value is supposed to consider the size overhead
generated by the 802.11 header.

This value is exported in kbps through the get_station() API
by including it into the station_info object.
Moreover, it is sent to user space when replying to the
nl80211 GET_STATION command.

This information will be useful to the batman-adv module
which will use it for its new metric computation.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-20 15:13:32 +02:00
Hiren Tandel
57be1f3f3e NFC: Add RAW socket type support for SOCKPROTO_RAW
This allows for a more generic NFC sniffing by using SOCKPROTO_RAW
SOCK_RAW to read RAW NFC frames. This is for sniffing anything but LLCP
(HCI, NCI, etc...).

Signed-off-by: Hiren Tandel <hirent@marvell.com>
Signed-off-by: Rahul Tank <rahult@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-05-20 00:06:04 +02:00
Johannes Berg
922bd80fc3 cfg80211: constify wowlan/coalesce mask/pattern pointers
This requires changing the nl80211 parsing code a bit to use
intermediate pointers for the allocation, but clarifies the
API towards the drivers.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19 18:06:50 +02:00
Johannes Berg
c1e5f4714d cfg80211: constify more pointers in the cfg80211 API
This also propagates through the drivers.

The orinoco driver uses the cfg80211 API structs for internal
bookkeeping, and so needs a (void *) cast that removes the
const - but that's OK because it allocates those pointers.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19 17:53:16 +02:00
Johannes Berg
3b3a0162fa cfg80211: constify MAC addresses in cfg80211 ops
This propagates through all the drivers and mac80211.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19 17:34:42 +02:00
Luciano Coelho
8d77ec8562 mac80211: fix csa_counter_offs argument name in docbook
The csa_counter_offs was erroneously described as csa_offs in
the docbook section.

This fixes two warnings when making htmldocs (at least):

Warning(include/net/mac80211.h:3428): No description found for parameter 'csa_counter_offs[IEEE80211_MAX_CSA_COUNTERS_NUM]'
Warning(include/net/mac80211.h:3428): Excess struct/union/enum/typedef member 'csa_offs' description in 'ieee80211_mutable_offsets'

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19 14:39:34 +02:00
Luciano Coelho
c2e4323b33 cfg80211: add documentation for max_num_csa_counters
Move the comment in the structure to a description of the
max_num_csa_counters field in the docbook area.

This fixes a warning when building htmldocs (at least):

 Warning(include/net/cfg80211.h:3064): No description found for parameter 'max_num_csa_counters'

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19 14:38:53 +02:00
Pablo Neira Ayuso
c7c32e72cb netfilter: nf_tables: defer all object release via rcu
Now that all objects are released in the reverse order via the
transaction infrastructure, we can enqueue the release via
call_rcu to save one synchronize_rcu. For small rule-sets loaded
via nft -f, it now takes around 50ms less here.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:13 +02:00
Pablo Neira Ayuso
128ad3322b netfilter: nf_tables: remove skb and nlh from context structure
Instead of caching the original skbuff that contains the netlink
messages, this stores the netlink message sequence number, the
netlink portID and the report flag. This helps to prepare the
introduction of the object release via call_rcu.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:13 +02:00
Pablo Neira Ayuso
60319eb1ca netfilter: nf_tables: use new transaction infrastructure to handle elements
Leave the set content in consistent state if we fail to load the
batch. Use the new generic transaction infrastructure to achieve
this.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso
55dd6f9307 netfilter: nf_tables: use new transaction infrastructure to handle table
This patch speeds up rule-set updates and it also provides a way
to revert updates and leave things in consistent state in case that
the batch needs to be aborted.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso
91c7b38dc9 netfilter: nf_tables: use new transaction infrastructure to handle chain
This patch speeds up rule-set updates and it also introduces a way to
revert chain updates if the batch is aborted. The idea is to store the
changes in the transaction to apply that in the commit step.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso
958bee14d0 netfilter: nf_tables: use new transaction infrastructure to handle sets
This patch reworks the nf_tables API so set updates are included in
the same batch that contains rule updates. This speeds up rule-set
updates since we skip a dialog of four messages between kernel and
user-space (two on each direction), from:

 1) create the set and send netlink message to the kernel
 2) process the response from the kernel that contains the allocated name.
 3) add the set elements and send netlink message to the kernel.
 4) process the response from the kernel (to check for errors).

To:

 1) add the set to the batch.
 2) add the set elements to the batch.
 3) add the rule that points to the set.
 4) send batch to the kernel.

This also introduces an internal set ID (NFTA_SET_ID) that is unique
in the batch so set elements and rules can refer to new sets.

Backward compatibility has been only retained in userspace, this
means that new nft versions can talk to the kernel both in the new
and the old fashion.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso
b380e5c733 netfilter: nf_tables: add message type to transactions
The patch adds message type to the transaction to simplify the
commit the and abort routines. Yet another step forward in the
generalisation of the transaction infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso
1081d11b08 netfilter: nf_tables: generalise transaction infrastructure
This patch generalises the existing rule transaction infrastructure
so it can be used to handle set, table and chain object transactions
as well. The transaction provides a data area that stores private
information depending on the transaction type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso
7c95f6d866 netfilter: nf_tables: deconstify table and chain in context structure
The new transaction infrastructure updates the family, table and chain
objects in the context structure, so let's deconstify them. While at it,
move the context structure initialization routine to the top of the
source file as it will be also used from the table and chain routines.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:09 +02:00
Phoebe Buckheister
f0f77dc6be ieee802154, mac802154: implement devkey record option
The 802.15.4-2011 standard states that for each key, a list of devices
that use this key shall be kept. Previous patches have only considered
two options:

 * a device "uses" (or may use) all keys, rendering the list useless
 * a device is restricted to a certain set of keys

Another option would be that a device *may* use all keys, but need not
do so, and we are interested in the actual set of keys the device uses.
Recording keys used by any given device may have a noticable performance
impact and might not be needed as often. The common case, in which a
device will not switch keys too often, should still perform well.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:42 -04:00
Phoebe Buckheister
29e023746a mac802154: add llsec configuration functions
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:41 -04:00
Phoebe Buckheister
af9eed5bbf ieee802154: add dgram sockopts for security control
Allow datagram sockets to override the security settings of the device
they send from on a per-socket basis. Requires CAP_NET_ADMIN or
CAP_NET_RAW, since raw sockets can send arbitrary packets anyway.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:41 -04:00
Phoebe Buckheister
f30be4d53c mac802154: integrate llsec with wpan devices
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:41 -04:00
Phoebe Buckheister
dc20759f28 ieee802154: add types for link-layer security
The added structures match 802.15.4-2011 link-layer security PIBs as
closely as is reasonable. Some lists required by the standard were
modeled as bitmaps (frame_types and command_frame_ids in *llsec_key,
802.15.4-2011 7.5/Table 61), since using lists for those seems a bit
excessive and not particularly useful. The DeviceDescriptorHandleList
was inverted and is here a per-device list, since operations on this
list are likely to have both a key and a device at hand, and per-device
lists of keys are shorter than per-key lists of devices.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:40 -04:00
Andrzej Kaczmarek
d0455ed996 Bluetooth: Store max TX power level for connection
This patch adds support to store local maximum TX power level for
connection when reply for HCI_Read_Transmit_Power_Level is received.

Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-15 21:48:07 -07:00
Andrzej Kaczmarek
dd9838087b Bluetooth: Add support to get connection information
This patch adds support for Get Connection Information mgmt command
which can be used to query for information about connection, i.e. RSSI
and local TX power level.

In general values cached in hci_conn are returned as long as they are
considered valid, i.e. do not exceed age limit set in hdev. This limit
is calculated as random value between min/max values to avoid client
trying to guess when to poll for updated information.

Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-15 21:48:06 -07:00
Andrzej Kaczmarek
31ad169148 Bluetooth: Add conn info lifetime parameters to debugfs
This patch adds conn_info_min_age and conn_info_max_age parameters to
debugfs which determine lifetime of connection information. Actual
lifetime will be random value between min and max age.

Default values for min and max age are 1000ms and 3000ms respectively.

Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-15 21:48:05 -07:00
Duan Jiong
be7a010d6f ipv6: update Destination Cache entries when gateway turn into host
RFC 4861 states in 7.2.5:

	The IsRouter flag in the cache entry MUST be set based on the
         Router flag in the received advertisement.  In those cases
         where the IsRouter flag changes from TRUE to FALSE as a result
         of this update, the node MUST remove that router from the
         Default Router List and update the Destination Cache entries
         for all destinations using that neighbor as a router as
         specified in Section 7.3.3.  This is needed to detect when a
         node that is used as a router stops forwarding packets due to
         being configured as a host.

Currently, when dealing with NA Message which IsRouter flag changes from
TRUE to FALSE, the kernel only removes router from the Default Router List,
and don't update the Destination Cache entries.

Now in order to update those Destination Cache entries, i introduce
function rt6_clean_tohost().

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-15 23:26:27 -04:00
Phoebe Buckheister
32edc40ae6 ieee802154: change _cb handling slightly
The current mac_cb handling of ieee802154 is rather awkward and limited.
Decompose the single flags field into multiple fields with the meanings
of each subfield of the flags field to make future extensions (for
example, link-layer security) easier. Also don't set the frame sequence
number in upper layers, since that's a thing the MAC is supposed to set
on frame transmit - we set it on header creation, but assuming that
upper layers do not blindly duplicate our headers, this is fine.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-15 15:51:42 -04:00
Phoebe Buckheister
c3a6114f31 ieee802154: add definitions for link-layer security and header functions
When dealing with 802.15.4, one often has to know the maximum payload
size for a given packet. This depends on many factors, one of which is
whether or not a security header is present in the frame. These
definitions and functions provide an easy way for any upper layer to
calculate the maximum payload size for a packet. The first obvious user
for this is 6lowpan, which duplicates this calculation and gets it
partially wrong because it ignores security headers.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-15 15:51:42 -04:00
David S. Miller
fcd77db07d net: Fix CONFIG_SYSCTL ifdef test.
> include/net/ip.h:211:5: warning: "CONFIG_SYSCTL" is not defined [-Wundef]
>  #if CONFIG_SYSCTL
>      ^

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-15 13:43:14 -04:00
Andrei Otcheretianski
1af586c911 mac80211: Handle the CSA counters correctly
Make the beacon CSA counters part of ieee80211_mutable_offsets and don't
decrement CSA counters when generating a beacon template. This permits the
driver to offload the CSA counters handling. Since mac80211 updates the probe
responses with the correct counter, the driver should sync the counter's value
with mac80211 using ieee80211_csa_update_counter function.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-15 15:01:00 +02:00
Andrei Otcheretianski
6ec8c332a0 mac80211: Provide ieee80211_beacon_get_template API
Add a new API ieee80211_beacon_get_template, which doesn't
affect DTIM counter and should be used if the device generates beacon
frames, and new beacon template is needed. In addition set the offsets
to TIM IE for MESH interface.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-15 15:01:00 +02:00
Andrei Otcheretianski
9a774c78e2 cfg80211: Support multiple CSA counters
Change the type of NL80211_ATTR_CSA_C_OFF_BEACON and
NL80211_ATTR_CSA_C_OFF_PRESP to be NLA_BINARY which allows
userspace to use beacons and probe responses with
multiple CSA counters.
This isn't breaking the API since userspace can
continue to use nla_put_u16 for this attributes, which
is equivalent to a single element u16 array.
In addition advertise max number of supported CSA counters.
This is needed when using CSA and eCSA IEs together.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-15 15:00:42 +02:00
Andrei Otcheretianski
34d22ce22b cfg80211: Add API to update CSA counters in mgmt frames
Add NL80211_ATTR_CSA_C_OFFSETS_TX which holds an array
of offsets to the CSA counters which should be updated
when sending a management frames with NL80211_CMD_FRAME.

This API should be used by the drivers that wish to keep the
CSA counter updated in probe responses, but do not implement
probe response offloading and so, do not use
ieee80211_proberesp_get function.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-15 14:52:44 +02:00
Joe Perches
c722831744 net: Use a more standard macro for INET_ADDR_COOKIE
Missing a colon on definition use is a bit odd so
change the macro for the 32 bit case to declare an
__attribute__((unused)) and __deprecated variable.

The __deprecated attribute will cause gcc to emit
an error if the variable is actually used.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 16:07:23 -04:00
WANG Cong
122ff243f5 ipv4: make ip_local_reserved_ports per netns
ip_local_port_range is already per netns, so should ip_local_reserved_ports
be. And since it is none by default we don't actually need it when we don't
enable CONFIG_SYSCTL.

By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port()

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 15:31:45 -04:00
Lorenzo Colitti
84f39b08d7 net: support marking accepting TCP sockets
When using mark-based routing, sockets returned from accept()
may need to be marked differently depending on the incoming
connection request.

This is the case, for example, if different socket marks identify
different networks: a listening socket may want to accept
connections from all networks, but each connection should be
marked with the network that the request came in on, so that
subsequent packets are sent on the correct network.

This patch adds a sysctl to mark TCP sockets based on the fwmark
of the incoming SYN packet. If enabled, and an unmarked socket
receives a SYN, then the SYN packet's fwmark is written to the
connection's inet_request_sock, and later written back to the
accepted socket when the connection is established.  If the
socket already has a nonzero mark, then the behaviour is the same
as it is today, i.e., the listening socket's fwmark is used.

Black-box tested using user-mode linux:

- IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the
  mark of the incoming SYN packet.
- The socket returned by accept() is marked with the mark of the
  incoming SYN packet.
- Tested with syncookies=1 and syncookies=2.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 18:35:09 -04:00
Lorenzo Colitti
e110861f86 net: add a sysctl to reflect the fwmark on replies
Kernel-originated IP packets that have no user socket associated
with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
are emitted with a mark of zero. Add a sysctl to make them have
the same mark as the packet they are replying to.

This allows an administrator that wishes to do so to use
mark-based routing, firewalling, etc. for these replies by
marking the original packets inbound.

Tested using user-mode linux:
 - ICMP/ICMPv6 echo replies and errors.
 - TCP RST packets (IPv4 and IPv6).

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 18:35:08 -04:00
Yuchung Cheng
843f4a55e3 tcp: use tcp_v4_send_synack on first SYN-ACK
To avoid large code duplication in IPv6, we need to first simplify
the complicate SYN-ACK sending code in tcp_v4_conn_request().

To use tcp_v4(6)_send_synack() to send all SYN-ACKs, we need to
initialize the mini socket's receive window before trying to
create the child socket and/or building the SYN-ACK packet. So we move
that initialization from tcp_make_synack() to tcp_v4_conn_request()
as a new function tcp_openreq_init_req_rwin().

After this refactoring the SYN-ACK sending code is simpler and easier
to implement Fast Open for IPv6.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 17:53:02 -04:00
Yuchung Cheng
89278c9dc9 tcp: simplify fast open cookie processing
Consolidate various cookie checking and generation code to simplify
the fast open processing. The main goal is to reduce code duplication
in tcp_v4_conn_request() for IPv6 support.

Removes two experimental sysctl flags TFO_SERVER_ALWAYS and
TFO_SERVER_COOKIE_NOT_CHKD used primarily for developmental debugging
purposes.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 17:53:02 -04:00
Yuchung Cheng
5b7ed0892f tcp: move fastopen functions to tcp_fastopen.c
Move common TFO functions that will be used by both v4 and v6
to tcp_fastopen.c. Create a helper tcp_fastopen_queue_check().

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 17:53:02 -04:00
John W. Linville
3231d65ffe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2014-05-13 15:27:44 -04:00
Felix Fietkau
8c48b50a1a cfg80211: allow restricting supported dfs regions
At the moment, the ath9k/ath10k DFS module only supports detecting ETSI
radar patterns.
Add a bitmap in the interface combinations, indicating which DFS regions
are supported by the detector. If unset, support for all regions is
assumed.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-13 15:50:06 +02:00
WANG Cong
60ff746739 net: rename local_df to ignore_df
As suggested by several people, rename local_df to ignore_df,
since it means "ignore df bit if it is set".

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-12 14:03:41 -04:00
David S. Miller
5f013c9bc7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/altera/altera_sgdma.c
	net/netlink/af_netlink.c
	net/sched/cls_api.c
	net/sched/sch_api.c

The netlink conflict dealt with moving to netlink_capable() and
netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
in non-init namespaces.  These were simple transformations from
netlink_capable to netlink_ns_capable.

The Altera driver conflict was simply code removal overlapping some
void pointer cast cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-12 13:19:14 -04:00
Andrzej Kaczmarek
5a134faeef Bluetooth: Store TX power level for connection
This patch adds support to store local TX power level for connection
when reply for HCI_Read_Transmit_Power_Level is received.

Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-09 14:16:42 -07:00
David S. Miller
1448eb5669 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-05-08

This one is all from Johannes:

"Here are a few small fixes for the current cycle: radiotap TX flags were
wrong (fix by Bob), Chun-Yeow fixes an SMPS issue with mesh interfaces,
Eliad fixes a locking bug and a cfg80211 state problem and finally
Henning sent me a fix for IBSS rate information."

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-09 16:46:53 -04:00
Johannes Berg
f6837ba8c9 mac80211: handle failed restart/resume better
When the driver fails during HW restart or resume, the whole
stack goes into a very confused state with interfaces being
up while the hardware is down etc.

Address this by shutting down everything; we'll run into a
lot of warnings in the process but that's better than having
the whole stack get messed up.

Reviewed-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-09 12:21:34 +02:00
Cong Wang
ba6b918ab2 ping: move ping_group_range out of CONFIG_SYSCTL
Similarly, when CONFIG_SYSCTL is not set, ping_group_range should still
work, just that no one can change it. Therefore we should move it out of
sysctl_net_ipv4.c. And, it should not share the same seqlock with
ip_local_port_range.

BTW, rename it to ->ping_group_range instead.

Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: Stefan de Konink <stefan@konink.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-08 22:50:47 -04:00
Cong Wang
c9d8f1a642 ipv4: move local_port_range out of CONFIG_SYSCTL
When CONFIG_SYSCTL is not set, ip_local_port_range should still work,
just that no one can change it. Therefore we should move it out of sysctl_inet.c.
Also, rename it to ->ip_local_ports instead.

Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: Stefan de Konink <stefan@konink.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-08 22:50:47 -04:00
John W. Linville
6153871f77 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2014-05-08 11:13:41 -04:00
Andrzej Kaczmarek
5ae76a9415 Bluetooth: Store RSSI for connection
This patch adds support to store RSSI for connection when reply for
HCI_Read_RSSI is received.

Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-08 08:01:57 -07:00
Luciano Coelho
c3d620362d cfg80211: fix docbook warning
When trying to generate documentation, at least xmldocs, we get the
following warning:

Warning(include/net/cfg80211.h:461): No description found for parameter 'nl80211_iftype'

Fix it by adding the iftype argument name to the
cfg80211_chandef_dfs_required() function declaration.

Reported-and-tested-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-08 11:59:35 +02:00
WANG Cong
698365fa18 net: clean up snmp stats code
commit 8f0ea0fe3a (snmp: reduce percpu needs by 50%)
reduced snmp array size to 1, so technically it doesn't have to be
an array any more. What's more, after the following commit:

	commit 933393f58f
	Date:   Thu Dec 22 11:58:51 2011 -0600

	    percpu: Remove irqsafe_cpu_xxx variants

	    We simply say that regular this_cpu use must be safe regardless of
	    preemption and interrupt state.  That has no material change for x86
	    and s390 implementations of this_cpu operations.  However, arches that
	    do not provide their own implementation for this_cpu operations will
	    now get code generated that disables interrupts instead of preemption.

probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
almost 3 years, no one complains.

So, just convert the array to a single pointer and remove snmp_mib_init()
and snmp_mib_free() as well.

Cc: Christoph Lameter <cl@linux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-07 16:06:05 -04:00
Neal Cardwell
d28071d102 tunnel: fix RFC number in comment for INET_ECN_decapsulate()
The quoted text and figure are from RFC 6040 ("Tunnelling of Explicit
Congestion Notification").

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-07 15:30:52 -04:00
John W. Linville
cabae81103 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2014-05-06 14:05:51 -04:00
Michal Kazior
f04c22033c cfg80211: export interface stopping function
This exports a new cfg80211_stop_iface() function.

This is intended for driver internal interface
combination management and channel switching.

Due to locking issues (it re-enters driver) the
call is asynchronous and uses cfg80211 event
list/worker.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-06 15:16:34 +02:00
Michal Kazior
59af6928d2 mac80211: fix CSA tx queue stopping
It was possible for tx queues to be stuck stopped
if AP CSA finalization failed. In that case
neither stop_ap nor do_stop woke the queues up.
This means it was impossible to perform tx at all
until driver was reloaded or a successful CSA was
performed later.

It was possible to solve this in a simpler manner
however this is more robust and future proof
(having multi-vif CSA in mind).

New sdata->csa_block_tx is introduced to keep
track of which interfaces requested tx to be
blocked for CSA. This is required because mac80211
stops all tx queues for that purpose. This means
queues must be awoken only when last tx-blocking
CSA interface is finished.

It is still possible to have tx queues stopped
after CSA failure but as soon as offending
interfaces are stopped from userspace (stop_ap or
ifdown) tx queues are woken up properly.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-06 15:10:00 +02:00
Libor Pechacek
86aae6c7b5 Bluetooth: Convert RFCOMM spinlocks into mutexes
Enabling CONFIG_DEBUG_ATOMIC_SLEEP has shown that some rfcomm functions
acquiring spinlocks call sleeping locks further in the chain.  Converting
the offending spinlocks into mutexes makes sleeping safe.

Signed-off-by: Libor Pechacek <lpechacek@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-05 19:25:06 -07:00
Tom Herbert
e4f45b7f40 net: Call skb_checksum_init in IPv6
Call skb_checksum_init instead of private functions.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:26:30 -04:00
Tom Herbert
ed70fcfcee net: Call skb_checksum_init in IPv4
Call skb_checksum_init instead of private functions.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:26:30 -04:00
Tom Herbert
07064c6e02 net: Allow csum_add to be provided in arch
csum_add is really nothing more then add-with-carry which
can be implemented efficiently in some architectures.
Allow architecture to define this protected by HAVE_ARCH_CSUM_ADD.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:26:29 -04:00
David S. Miller
2ad0649687 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-05-02

Please pull this batch of updates intended for the 3.16 stream...

For the mac80211 bits, Johannes says:

"In this round we have a large number of small features and
improvements from people too numerous to list here. The only really
bit thing is Michał and Luca's CSA work (including changing how
interface combination verification is done)."

For the Bluetooth bits, Gustavo says:

"Here goes some patches for the -next release. There is nothing
really special for this pull request, just a bunch of refactors,
fixes and clean ups."

For the ath10k/ath6kl bits, Kalle says:

"For ath6kl Kalle fixed a bunch of checkpatch warnings.

In ath10k we had more changes, major ones being:

* fix memory allocation failures after a firmware crash (Michal)

* some rework of DFS configuration to enable it correctly in all cases
  (Michal)

* add a new firmware crash option to make it possible to crash 10.1
  firmware for testing purposes (Marek P)

* fix RTS/CTS protection in certain cases (Marek K)

* fix wrong RSSI and rate reporting in some cases (Janusz)

* fix firmware stats reporting (Chun, Ben & Bartosz)"

For the iwlwifi bits, Emmanuel says:

"I have here a bunch of unrelated things. I disabled support for
-7.ucode which means that I can removed a lot of code. Eliad has
a brand new feature: we reduce the Tx power when the link allows -
this reduces our power consumption. The regular changes in power and
scan area. One interesting thing though is the patches from Johannes,
we have now GRO which allows to increase our throughput in TCP Rx. The
main advantage is that it reduces the number of TCP Acks - these TCP
Acks are completely useless when we are using A-MPDU since the first
packet of the A-MPDU generates a TCP Ack which is made obsolete by
the next packets."

Along with that, there are a variety of updates to b43, mwifiex,
rtl8180 and wil6210 drivers and a handful of other updates here
and there.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 13:36:26 -04:00
Andy King
2c4a336e0a vsock: Make transport the proto owner
Right now the core vsock module is the owner of the proto family. This
means there's nothing preventing the transport module from unloading if
there are open sockets, which results in a panic. Fix that by allowing
the transport to be the owner, which will refcount it properly.

Includes version bump to 1.0.1.0-k

Passes checkpatch this time, I swear...

Acked-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 13:13:50 -04:00
Arik Nemtsov
0c4972ccaa mac80211: set an external flag for TDLS stations
Expose a new tdls flag for the public ieee80211_sta struct.
This can be used in some rate control decisions.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-05 15:56:02 +02:00
Geert Uytterhoeven
325483a9bf wimax: Spelling s/than/that/, wording s/destinatary/recipient/
Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Cc: wimax@linuxwimax.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-05-05 15:37:55 +02:00
Eliad Peller
792e6aa7a1 cfg80211: add cfg80211_sched_scan_stopped_rtnl
Add locked-version for cfg80211_sched_scan_stopped.
This is used for some users that might want to
call it when rtnl is already locked.

Fixes: d43c6b6 ("mac80211: reschedule sched scan after HW restart")
Cc: stable@vger.kernel.org (3.14+)
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-05 15:14:57 +02:00
Eric Dumazet
249015515f tcp: remove in_flight parameter from cong_avoid() methods
Commit e114a710aa ("tcp: fix cwnd limited checking to improve
congestion control") obsoleted in_flight parameter from
tcp_is_cwnd_limited() and its callers.

This patch does the removal as promised.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03 19:23:07 -04:00
Eric Dumazet
e114a710aa tcp: fix cwnd limited checking to improve congestion control
Yuchung discovered tcp_is_cwnd_limited() was returning false in
slow start phase even if the application filled the socket write queue.

All congestion modules take into account tcp_is_cwnd_limited()
before increasing cwnd, so this behavior limits slow start from
probing the bandwidth at full speed.

The problem is that even if write queue is full (aka we are _not_
application limited), cwnd can be under utilized if TSO should auto
defer or TCP Small queues decided to hold packets.

So the in_flight can be kept to smaller value, and we can get to the
point tcp_is_cwnd_limited() returns false.

With TCP Small Queues and FQ/pacing, this issue is more visible.

We fix this by having tcp_cwnd_validate(), which is supposed to track
such things, take into account unsent_segs, the number of segs that we
are not sending at the moment due to TSO or TSQ, but intend to send
real soon. Then when we are cwnd-limited, remember this fact while we
are processing the window of ACKs that comes back.

For example, suppose we have a brand new connection with cwnd=10; we
are in slow start, and we send a flight of 9 packets. By the time we
have received ACKs for all 9 packets we want our cwnd to be 18.
We implement this by setting tp->lsnd_pending to 9, and
considering ourselves to be cwnd-limited while cwnd is less than
twice tp->lsnd_pending (2*9 -> 18).

This makes tcp_is_cwnd_limited() more understandable, by removing
the GSO/TSO kludge, that tried to work around the issue.

Note the in_flight parameter can be removed in a followup cleanup
patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-02 17:54:35 -04:00
John W. Linville
406a94d7fa Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-05-02 13:47:50 -04:00
Lorenzo Colitti
5c98631cca net: ipv6: Introduce ip6_sk_dst_hoplimit.
This replaces 6 identical code snippets with a call to a new
static inline function.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-30 13:31:26 -04:00
Florian Fainelli
7fa857ed04 net: dsa: add ds_to_priv
DSA drivers have a trick which consists in allocating "priv_size" more
bytes to account for the DSA driver private context. Add a helper
function to access that private context instead of open-coding it in
drivers with (void *)(ds + 1).

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-30 13:31:25 -04:00
John W. Linville
f6595444c1 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	net/mac80211/chan.c
2014-04-30 12:04:27 -04:00
John W. Linville
0006433a5b Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-04-30 11:56:43 -04:00
Florian Westphal
f768e5bdef netfilter: add helper for adding nat extension
Reduce copy-past a bit by adding a common helper.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-29 20:56:22 +02:00
Jouni Malinen
e16821bcfb cfg80211: Dynamic channel bandwidth changes in AP mode
This extends NL80211_CMD_SET_CHANNEL to allow dynamic channel bandwidth
changes in AP mode (including P2P GO) during a lifetime of the BSS. This
can be used to implement, e.g., HT 20/40 MHz co-existence rules on the
2.4 GHz band.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-28 18:09:59 +02:00
Cong Wang
2f7ef2f879 sched, cls: check if we could overwrite actions when changing a filter
When actions are attached to a filter, they are a part of the filter
itself, so when changing a filter we should allow to overwrite the actions
inside as well.

In my specific case, when I tried to _append_ a new action to an existing
filter which already has an action, I got EEXIST since kernel refused
to overwrite the existing one in kernel.

This patch checks if we are changing the filter checking NLM_F_CREATE flag
(Sigh, filters don't use NLM_F_REPLACE...) and then passes the boolean down
to actions. This fixes the problem above.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-27 23:42:39 -04:00
Rostislav Lisovy
ea077c1cea cfg80211: Add attributes describing prohibited channel bandwidth
Since there are frequency bands (e.g. 5.9GHz) allowing channels
with only 10 or 5 MHz bandwidth, this patch adds attributes that
allow keeping track about this information.

When channel attributes are reported to user-space, make sure to
not break old tools, i.e. if the 'split wiphy dump' is enabled,
report the extra attributes (if present) describing the bandwidth
restrictions.  If the 'split wiphy dump' is not enabled,
completely omit those channels that have flags set to either
IEEE80211_CHAN_NO_10MHZ or IEEE80211_CHAN_NO_20MHZ.

Add the check for new bandwidth restriction flags in
cfg80211_chandef_usable() to comply with the restrictions.

Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-25 17:38:23 +02:00
Marek Kwaczynski
17d38fa8c2 mac80211: add option to generate CCMP IVs only for mgmt frames
Some chips can encrypt managment frames in HW, but
require generated IV in the frame. Add a key flag
that allows us to achieve this.

Signed-off-by: Marek Kwaczynski <marek.kwaczynski@tieto.com>
[use BIT(0) to fill that spot, fix indentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-25 17:26:15 +02:00
Michal Kazior
65a124dd71 cfg80211: allow drivers to iterate over matching combinations
The patch splits cfg80211_check_combinations()
into an iterator function and a simple iteration
user.

This makes it possible for drivers to asses how
many channels can use given iftype setup. This in
turn can be used for future
multi-interface/multi-channel channel switching.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-25 17:08:14 +02:00
Nicolas Dichtel
f01ec1c017 vxlan: add x-netns support
This patch allows to switch the netns when packet is encapsulated or
decapsulated.
The vxlan socket is openned into the i/o netns, ie into the netns where
encapsulated packets are received. The socket lookup is done into this netns to
find the corresponding vxlan tunnel. After decapsulation, the packet is
injecting into the corresponding interface which may stand to another netns.

When one of the two netns is removed, the tunnel is destroyed.

Configuration example:
ip netns add netns1
ip netns exec netns1 ip link set lo up
ip link add vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
ip link set vxlan10 netns netns1
ip netns exec netns1 ip addr add 192.168.0.249/24 broadcast 192.168.0.255 dev vxlan10
ip netns exec netns1 ip link set vxlan10 up

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-24 16:18:26 -04:00
Eric W. Biederman
a3b299da86 net: Add variants of capable for use on on sockets
sk_net_capable - The common case, operations that are safe in a network namespace.
sk_capable - Operations that are not known to be safe in a network namespace
sk_ns_capable - The general case for special cases.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-24 13:44:53 -04:00
Luis R. Rodriguez
7e65eac8e3 6lowpan: nuke net_ieee802154_lowpan() accessor when 6lowpan is disabled
Johannes noted this is not needed, all of the fragment
accessors don't need CONFIG_NET_NS. This goes test compiled with
CONFIG_BT_6LOWPAN=y and a disabled CONFIG_NET_NS.

CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: linux-zigbee-devel@lists.sourceforge.net
Cc: David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-24 12:36:00 -04:00
Tomasz Bursztyka
aa45660c6b netfilter: nf_tables: Make meta expression core functions public
This will be useful to create network family dedicated META expression
as for NFPROTO_BRIDGE for instance.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-23 13:55:30 +02:00
Tetsuo Handa
2e71029e2c xfrm: Remove useless xfrm_audit struct.
Commit f1370cc4 "xfrm: Remove useless secid field from xfrm_audit." changed
"struct xfrm_audit" to have either
{ audit_get_loginuid(current) / audit_get_sessionid(current) } or
{ INVALID_UID / -1 } pair.

This means that we can represent "struct xfrm_audit" as "bool".
This patch replaces "struct xfrm_audit" argument with "bool".

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-04-23 08:21:04 +02:00
Tetsuo Handa
f1370cc4a0 xfrm: Remove useless secid field from xfrm_audit.
It seems to me that commit ab5f5e8b "[XFRM]: xfrm audit calls" is doing
something strange at xfrm_audit_helper_usrinfo().
If secid != 0 && security_secid_to_secctx(secid) != 0, the caller calls
audit_log_task_context() which basically does
secid != 0 && security_secid_to_secctx(secid) == 0 case
except that secid is obtained from current thread's context.

Oh, what happens if secid passed to xfrm_audit_helper_usrinfo() was
obtained from other thread's context? It might audit current thread's
context rather than other thread's context if security_secid_to_secctx()
in xfrm_audit_helper_usrinfo() failed for some reason.

Then, are all the caller of xfrm_audit_helper_usrinfo() passing either
secid obtained from current thread's context or secid == 0?
It seems to me that they are.

If I didn't miss something, we don't need to pass secid to
xfrm_audit_helper_usrinfo() because audit_log_task_context() will
obtain secid from current thread's context.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-04-22 10:47:53 +02:00
Mark A. Greer
51d98fa47c NFC: digital: Add macros for the ISO/IEC 14443-B Protocol
Add RF tech and framing macros for the ISO/IEC 14443-B Protocol.

Cc: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-04-22 00:37:28 +02:00
Christophe Ricard
e240bc3612 NFC: hci: Add load_session HCI operand
load_session allows a CLF to restore the gate <-> pipe table from some
proprietary location.
The main advantage to add this function is to reduce the memory wear by
running pipe creation (and storing) only once.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-04-22 00:37:26 +02:00
Weiping Pan
86fd14ad1e tcp: make tcp_cwnd_application_limited() static
Make tcp_cwnd_application_limited() static and move it from tcp_input.c to
tcp_output.c

Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-20 18:18:56 -04:00
Luis R. Rodriguez
17d8ecb8ff 6lowpan: include net/net_namespace.h on 6lowpan namepsace header
Don't rely on driver files or other headers having this file included.

CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: linux-zigbee-devel@lists.sourceforge.net
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-20 18:18:55 -04:00
Luis R. Rodriguez
599018a710 6lowpan: add helper to get 6lowpan namespace
This will simplify the new reassembly backport
with no code changes being required.

CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: linux-zigbee-devel@lists.sourceforge.net
Cc: David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-20 18:18:55 -04:00
Vlad Yasevich
b14878ccb7 net: sctp: cache auth_enable per endpoint
Currently, it is possible to create an SCTP socket, then switch
auth_enable via sysctl setting to 1 and crash the system on connect:

Oops[#1]:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
[...]
Call Trace:
[<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80
[<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4
[<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c
[<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8
[<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214
[<ffffffff8043af68>] sctp_rcv+0x588/0x630
[<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24
[<ffffffff803acb50>] ip6_input+0x2c0/0x440
[<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564
[<ffffffff80310650>] process_backlog+0xb4/0x18c
[<ffffffff80313cbc>] net_rx_action+0x12c/0x210
[<ffffffff80034254>] __do_softirq+0x17c/0x2ac
[<ffffffff800345e0>] irq_exit+0x54/0xb0
[<ffffffff800075a4>] ret_from_irq+0x0/0x4
[<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48
[<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148
[<ffffffff805a88b0>] start_kernel+0x37c/0x398
Code: dd0900b8  000330f8  0126302d <dcc60000> 50c0fff1  0047182a  a48306a0
03e00008  00000000
---[ end trace b530b0551467f2fd ]---
Kernel panic - not syncing: Fatal exception in interrupt

What happens while auth_enable=0 in that case is, that
ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
when endpoint is being created.

After that point, if an admin switches over to auth_enable=1,
the machine can crash due to NULL pointer dereference during
reception of an INIT chunk. When we enter sctp_process_init()
via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
the INIT verification succeeds and while we walk and process
all INIT params via sctp_process_param() we find that
net->sctp.auth_enable is set, therefore do not fall through,
but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
dereference what we have set to NULL during endpoint
initialization phase.

The fix is to make auth_enable immutable by caching its value
during endpoint initialization, so that its original value is
being carried along until destruction. The bug seems to originate
from the very first days.

Fix in joint work with Daniel Borkmann.

Reported-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-18 18:32:00 -04:00
Peter Zijlstra
4e857c58ef arch: Mass conversion of smp_mb__*()
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 14:20:48 +02:00
Cong Wang
6a662719c9 ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif
As suggested by Julian:

	Simply, flowi4_iif must not contain 0, it does not
	look logical to ignore all ip rules with specified iif.

because in fib_rule_match() we do:

        if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
                goto out;

flowi4_iif should be LOOPBACK_IFINDEX by default.

We need to move LOOPBACK_IFINDEX to include/net/flow.h:

1) It is mostly used by flowi_iif

2) Fix the following compile error if we use it in flow.h
by the patches latter:

In file included from include/linux/netfilter.h:277:0,
                 from include/net/netns/netfilter.h:5,
                 from include/net/net_namespace.h:21,
                 from include/linux/netdevice.h:43,
                 from include/linux/icmpv6.h:12,
                 from include/linux/ipv6.h:61,
                 from include/net/ipv6.h:16,
                 from include/linux/sunrpc/clnt.h:27,
                 from include/linux/nfs_fs.h:30,
                 from init/do_mounts.c:32:
include/net/flow.h: In function ‘flowi4_init_output’:
include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-16 15:05:11 -04:00
Eric Dumazet
aad88724c9 ipv4: add a sock pointer to dst->output() path.
In the dst->output() path for ipv4, the code assumes the skb it has to
transmit is attached to an inet socket, specifically via
ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
provider of the packet is an AF_PACKET socket.

The dst->output() method gets an additional 'struct sock *sk'
parameter. This needs a cascade of changes so that this parameter can
be propagated from vxlan to final consumer.

Fixes: 8f646c922d ("vxlan: keep original skb ownership")
Reported-by: lucien xin <lucien.xin@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15 13:47:15 -04:00
Eric Dumazet
b0270e9101 ipv4: add a sock pointer to ip_queue_xmit()
ip_queue_xmit() assumes the skb it has to transmit is attached to an
inet socket. Commit 31c70d5956 ("l2tp: keep original skb ownership")
changed l2tp to not change skb ownership and thus broke this assumption.

One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
so that we do not assume skb->sk points to the socket used by l2tp
tunnel.

Fixes: 31c70d5956 ("l2tp: keep original skb ownership")
Reported-by: Zhan Jianyu <nasa4836@gmail.com>
Tested-by: Zhan Jianyu <nasa4836@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15 12:58:34 -04:00
David S. Miller
00cbc3dcd1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains three Netfilter fixes for your net tree,
they are:

* Fix missing generation sequence initialization which results in a splat
  if lockdep is enabled, it was introduced in the recent works to improve
  nf_conntrack scalability, from Andrey Vagin.

* Don't flush the GRE keymap list in nf_conntrack when the pptp helper is
  disabled otherwise this crashes due to a double release, from Andrey
  Vagin.

* Fix nf_tables cmp fast in big endian, from Patrick McHardy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 19:00:10 -04:00
Daniel Borkmann
362d52040c Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
This reverts commit ef2820a735 ("net: sctp: Fix a_rwnd/rwnd management
to reflect real state of the receiver's buffer") as it introduced a
serious performance regression on SCTP over IPv4 and IPv6, though a not
as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs.

Current state:

[root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 17:56:21 GMT
Connecting to host 192.168.241.3, port 5201
      Cookie: Lab200slot2.1397238981.812898.548918
[  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
[  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
[  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
[  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
[  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
[  4]   6.21-6.21   sec  0.00 Bytes    0.00 bits/sec
[  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
[  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
[  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
[  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
[  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
[  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
[  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
[  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
[  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
[  4]  16.79-16.79  sec  0.00 Bytes    0.00 bits/sec
[  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
(etc)

[root@Lab200slot2 ~]#  iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 19:08:41 GMT
Connecting to host 2001:db8:0:f101::1, port 5201
      Cookie: Lab200slot2.1397243321.714295.2b3f7c
[  4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201
Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   169 MBytes  1.42 Gbits/sec
[  4]   1.00-2.00   sec   201 MBytes  1.69 Gbits/sec
[  4]   2.00-3.00   sec   188 MBytes  1.58 Gbits/sec
[  4]   3.00-4.00   sec   174 MBytes  1.46 Gbits/sec
[  4]   4.00-5.00   sec   165 MBytes  1.39 Gbits/sec
[  4]   5.00-6.00   sec   199 MBytes  1.67 Gbits/sec
[  4]   6.00-7.00   sec   163 MBytes  1.36 Gbits/sec
[  4]   7.00-8.00   sec   174 MBytes  1.46 Gbits/sec
[  4]   8.00-9.00   sec   193 MBytes  1.62 Gbits/sec
[  4]   9.00-10.00  sec   196 MBytes  1.65 Gbits/sec
[  4]  10.00-11.00  sec   157 MBytes  1.31 Gbits/sec
[  4]  11.00-12.00  sec   175 MBytes  1.47 Gbits/sec
[  4]  12.00-13.00  sec   192 MBytes  1.61 Gbits/sec
[  4]  13.00-14.00  sec   199 MBytes  1.67 Gbits/sec
(etc)

After patch:

[root@Lab200slot2 ~]#  iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
Time: Mon, 14 Apr 2014 16:40:48 GMT
Connecting to host 192.168.240.3, port 5201
      Cookie: Lab200slot2.1397493648.413274.65e131
[  4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   240 MBytes  2.02 Gbits/sec
[  4]   1.00-2.00   sec   239 MBytes  2.01 Gbits/sec
[  4]   2.00-3.00   sec   240 MBytes  2.01 Gbits/sec
[  4]   3.00-4.00   sec   239 MBytes  2.00 Gbits/sec
[  4]   4.00-5.00   sec   245 MBytes  2.05 Gbits/sec
[  4]   5.00-6.00   sec   240 MBytes  2.01 Gbits/sec
[  4]   6.00-7.00   sec   240 MBytes  2.02 Gbits/sec
[  4]   7.00-8.00   sec   239 MBytes  2.01 Gbits/sec

With the reverted patch applied, the SCTP/IPv4 performance is back
to normal on latest upstream for IPv4 and IPv6 and has same throughput
as 3.4.2 test kernel, steady and interval reports are smooth again.

Fixes: ef2820a735 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer")
Reported-by: Peter Butler <pbutler@sonusnet.com>
Reported-by: Dongsheng Song <dongsheng.song@gmail.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Peter Butler <pbutler@sonusnet.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 16:26:48 -04:00
Eric Dumazet
30f78d8ebf ipv6: Limit mtu to 65575 bytes
Francois reported that setting big mtu on loopback device could prevent
tcp sessions making progress.

We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets.

We must limit the IPv6 MTU to (65535 + 40) bytes in theory.

Tested:

ifconfig lo mtu 70000
netperf -H ::1

Before patch : Throughput :   0.05 Mbits

After patch : Throughput : 35484 Mbits

Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 12:39:59 -04:00
Patrick McHardy
b855d416dc netfilter: nf_tables: fix nft_cmp_fast failure on big endian for size < 4
nft_cmp_fast is used for equality comparisions of size <= 4. For
comparisions of size < 4 byte a mask is calculated that is applied to
both the data from userspace (during initialization) and the register
value (during runtime). Both values are stored using (in effect) memcpy
to a memory area that is then interpreted as u32 by nft_cmp_fast.

This works fine on little endian since smaller types have the same base
address, however on big endian this is not true and the smaller types
are interpreted as a big number with trailing zero bytes.

The mask therefore must not include the lower bytes, but the higher bytes
on big endian. Add a helper function that does a cpu_to_le32 to switch
the bytes on big endian. Since we're dealing with a mask of just consequitive
bits, this works out fine.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-14 10:38:02 +02:00
Linus Torvalds
454fd351f2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull yet more networking updates from David Miller:

 1) Various fixes to the new Redpine Signals wireless driver, from
    Fariya Fatima.

 2) L2TP PPP connect code takes PMTU from the wrong socket, fix from
    Dmitry Petukhov.

 3) UFO and TSO packets differ in whether they include the protocol
    header in gso_size, account for that in skb_gso_transport_seglen().
   From Florian Westphal.

 4) If VLAN untagging fails, we double free the SKB in the bridging
    output path.  From Toshiaki Makita.

 5) Several call sites of sk->sk_data_ready() were referencing an SKB
    just added to the socket receive queue in order to calculate the
    second argument via skb->len.  This is dangerous because the moment
    the skb is added to the receive queue it can be consumed in another
    context and freed up.

    It turns out also that none of the sk->sk_data_ready()
    implementations even care about this second argument.

    So just kill it off and thus fix all these use-after-free bugs as a
    side effect.

 6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti.

 7) pktgen needs to do locking properly for LLTX devices, from Daniel
    Borkmann.

 8) xen-netfront driver initializes TX array entries in RX loop :-) From
    Vincenzo Maffione.

 9) After refactoring, some tunnel drivers allow a tunnel to be
    configured on top itself.  Fix from Nicolas Dichtel.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
  vti: don't allow to add the same tunnel twice
  gre: don't allow to add the same tunnel twice
  drivers: net: xen-netfront: fix array initialization bug
  pktgen: be friendly to LLTX devices
  r8152: check RTL8152_UNPLUG
  net: sun4i-emac: add promiscuous support
  net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
  net: ipv6: Fix oif in TCP SYN+ACK route lookup.
  drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts
  drivers: net: cpsw: discard all packets received when interface is down
  net: Fix use after free by removing length arg from sk_data_ready callbacks.
  Drivers: net: hyperv: Address UDP checksum issues
  Drivers: net: hyperv: Negotiate suitable ndis version for offload support
  Drivers: net: hyperv: Allocate memory for all possible per-pecket information
  bridge: Fix double free and memory leak around br_allowed_ingress
  bonding: Remove debug_fs files when module init fails
  i40evf: program RSS LUT correctly
  i40evf: remove open-coded skb_cow_head
  ixgb: remove open-coded skb_cow_head
  igbvf: remove open-coded skb_cow_head
  ...
2014-04-12 17:31:22 -07:00
Linus Torvalds
582076ab16 A bunch of updates and cleanup within the transport layer,
particularly with a focus on RDMA.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJTRXV1AAoJEDZk62b0Tg6xYWAP/0Kfp9/8Z05SQ1fnJveySw1O
 nKK//1/GBmquntqFAVHI6yJRLzPJ+Z/Y4u4a0qriwfgpPQvUJQrL77tY/VfEBUPB
 VoJG1tj7lpdLrO3p4YyPkPjymyC7YOoFNjGEstWFg7HetwnnqqZL2LB+5yJzyqjx
 y9nv1HzsrbAE7j8C4hQ1Nmds5muUb5VhnTtPhjrx4tP1sWWh8XTVJbsVDiEqx6cu
 uJXFFTbkONr9jKfv+Ki3H2pZej2yD7w4tU4lkdcGNyij/Q4Xn1iERqroW2/GT6Cl
 AXxlIKN24ASjWo4VqW0Wf8gO8vbUtHRChoiZ69DvzNTkbWmAIWSFHyQJ4cinwuyr
 UbOQZuccO59QtpNVpBvG/vjnbI54rg+VGLy+xE0vcrBDlyoptc56IAFSg8zJY5UN
 ysbyHCGME//9VZ1zeeZvkMjm8z5Enp6x4zmtnUHmufO7DVMTFUePED6U1u9WIyP5
 FFy5EboXMSh97yB8REvbIlY2MgBJWYdnyzLKFMeRpzC8fOXJqBoCX8i3Z5SkMYWJ
 1FS/pGr7ec/VX1iHXSYi9hhzTJ6o9mEmOIhaO4UcqAuK8Rk2jbCp1Lx0iDhmJtdT
 zjofGDe57ro7nOZf8A/TI5z+6BZq8KYVfZrXtYaPqC4rXOzJAox/yHlz9FmAJYgv
 ssOIKXX0ujLwqrMBVatj
 =yzy/
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9p changes from Eric Van Hensbergen:
 "A bunch of updates and cleanup within the transport layer,
  particularly with a focus on RDMA"

* tag 'for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9pnet_rdma: check token type before int conversion
  9pnet: trans_fd : allocate struct p9_trans_fd and struct p9_conn together.
  9pnet: p9_client->conn field is unused. Remove it.
  9P: Get rid of REQ_STATUS_FLSH
  9pnet_rdma: add cancelled()
  9pnet_rdma: update request status during send
  9P: Add cancelled() to the transport functions.
  net: Mark function as static in 9p/client.c
  9P: Add memory barriers to protect request fields over cb/rpc threads handoff
2014-04-11 14:14:57 -07:00
David S. Miller
676d23690f net: Fix use after free by removing length arg from sk_data_ready callbacks.
Several spots in the kernel perform a sequence like:

	skb_queue_tail(&sk->s_receive_queue, skb);
	sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-11 16:15:36 -04:00
Rostislav Lisovy
041f607de1 mac80211: Update conf_is_ht() to work properly with 5/10MHz channels
The channels with 5/10MHz bandwidth are not HT. We have to
reflect this in conf_is_ht() function which returns whether the
particular channel is HT or not.

Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-09 10:56:04 +02:00
Kalle Valo
ce26151bc3 cfg80211: update comment about WIPHY_FLAG_CUSTOM_REGULATORY
Commit a2f73b6c5d ("cfg80211: move regulatory flags to their own variable")
renamed WIPHY_FLAG_CUSTOM_REGULATORY to REGULATORY_CUSTOM_REG, but missed to
update one comment.

Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-09 10:56:02 +02:00
Luciano Coelho
5d52ee8110 mac80211: allow reservation of a running chanctx
With single-channel drivers, we need to be able to change a running
chanctx if we want to use chanctx reservation.  Not all drivers may be
able to do this, so add a flag that indicates support for it.

Changing a running chanctx can also be used as an optimization in
multi-channel drivers when the context needs to be reserved for future
usage.

Introduce IEEE80211_CHANCTX_RESERVED chanctx mode to mark a channel as
reserved so nobody else can use it (since we know it's going to
change).  In the future, we may allow several vifs to use the same
reservation as long as they plan to use the chanctx on the same
future channel.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-09 10:55:56 +02:00
Luciano Coelho
73de86a389 cfg80211/mac80211: move interface counting for combination check to mac80211
Move the counting part of the interface combination check from
cfg80211 to mac80211.

This is needed to simplify locking when the driver has to perform a
combination check by itself (eg. with channel-switch).

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-09 10:55:43 +02:00
Luciano Coelho
2beb6dab2d cfg80211/mac80211: refactor cfg80211_chandef_dfs_required()
Some interface types don't require DFS (such as STATION, P2P_CLIENT
etc).  In order to centralize these decisions, make
cfg80211_chandef_dfs_required() take the iftype into consideration.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-09 10:55:41 +02:00
Luciano Coelho
cb2d956dd3 cfg80211: refactor cfg80211_can_use_iftype_chan()
Separate the code that counts the interface types and channels from
the code that check the interface combinations.  The new function that
checks for combinations is exported so it can be called by the
drivers.

This is done in preparation for moving the interface combinations
checks out of cfg80211.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-09 10:55:39 +02:00
Ilan Peer
174e0cd28a cfg80211: Enable GO operation on additional channels
Allow GO operation on a channel marked with IEEE80211_CHAN_GO_CONCURRENT
iff there is an active station interface that is associated to
an AP operating on the same channel in the 2 GHz band or the same UNII band
(in the 5 GHz band). This relaxation is not allowed if the channel is
marked with IEEE80211_CHAN_RADAR.

Note that this is a permissive approach to the FCC definitions,
that require a clear assessment that the device operating the AP is
an authorized master, i.e., with radar detection and DFS capabilities.

It is assumed that such restrictions are enforced by user space.
Furthermore, it is assumed, that if the conditions that allowed for
the operation of the GO on such a channel change, i.e., the station
interface disconnected from the AP, it is the responsibility of user
space to evacuate the GO from the channel.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-09 10:55:34 +02:00
David Spinadel
570dbde137 cfg80211: Add indoor only and GO concurrent channel attributes
The FCC are clarifying some soft configuration requirements,
which among other include the following:

1. Indoor operation, where a device can use channels requiring indoor
   operation, subject to that it can guarantee indoor operation,
   i.e., the device is connected to AC Power or the device is under
   the control of a local master that is acting as an AP and is
   connected to AC Power.
2. Concurrent GO operation, where devices may instantiate a P2P GO
   while they are under the guidance of an authorized master. For example,
   on a channel on which a BSS is connected to an authorized master, i.e.,
   with DFS and radar detection capability in the UNII band.

See https://apps.fcc.gov/eas/comments/GetPublishedDocument.html?id=327&tn=528122

Add support for advertising Indoor-only and GO-Concurrent channel
properties.

Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-09 10:55:32 +02:00
Emmanuel Grumbach
77be2c54c5 mac80211: add vif to flush call
This will allow the low level driver to make decision based
on the vif such as queues etc...
Since the vif might be NULL, we can't add it to the tracing
functions.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[fix staging rtl8821ae driver]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-09 10:55:29 +02:00
Johannes Berg
78f22b6a3a cfg80211: allow userspace to take ownership of interfaces
When dynamically creating interfaces from userspace, e.g. for P2P usage,
such interfaces are usually owned by the process that created them, i.e.
wpa_supplicant. Should wpa_supplicant crash, such interfaces will often
cease operating properly and cause problems on restarting the process.

To avoid this problem, introduce an ownership concept for interfaces. If
an interface is owned by a netlink socket, then it will be destroyed if
the netlink socket is closed for any reason, including if the process it
belongs to crashed. This gives us a race-free way to get rid of any such
interfaces.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-04-09 10:55:28 +02:00
Linus Torvalds
ce7613db2d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull more networking updates from David Miller:

 1) If a VXLAN interface is created with no groups, we can crash on
    reception of packets.  Fix from Mike Rapoport.

 2) Missing includes in CPTS driver, from Alexei Starovoitov.

 3) Fix string validations in isdnloop driver, from YOSHIFUJI Hideaki
    and Dan Carpenter.

 4) Missing irq.h include in bnxw2x, enic, and qlcnic drivers.  From
    Josh Boyer.

 5) AF_PACKET transmit doesn't statistically count TX drops, from Daniel
    Borkmann.

 6) Byte-Queue-Limit enabled drivers aren't handled properly in
    AF_PACKET transmit path, also from Daniel Borkmann.

    Same problem exists in pktgen, and Daniel fixed it there too.

 7) Fix resource leaks in driver probe error paths of new sxgbe driver,
    from Francois Romieu.

 8) Truesize of SKBs can gradually get more and more corrupted in NAPI
    packet recycling path, fix from Eric Dumazet.

 9) Fix uniprocessor netfilter build, from Florian Westphal.  In the
    longer term we should perhaps try to find a way for ARRAY_SIZE() to
    work even with zero sized array elements.

10) Fix crash in netfilter conntrack extensions due to mis-estimation of
    required extension space.  From Andrey Vagin.

11) Since we commit table rule updates before trying to copy the
    counters back to userspace (it's the last action we perform), we
    really can't signal the user copy with an error as we are beyond the
    point from which we can unwind everything.  This causes all kinds of
    use after free crashes and other mysterious behavior.

    From Thomas Graf.

12) Restore previous behvaior of div/mod by zero in BPF filter
    processing.  From Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
  net: sctp: wake up all assocs if sndbuf policy is per socket
  isdnloop: several buffer overflows
  netdev: remove potentially harmful checks
  pktgen: fix xmit test for BQL enabled devices
  net/at91_ether: avoid NULL pointer dereference
  tipc: Let tipc_release() return 0
  at86rf230: fix MAX_CSMA_RETRIES parameter
  mac802154: fix duplicate #include headers
  sxgbe: fix duplicate #include headers
  net: filter: be more defensive on div/mod by X==0
  netfilter: Can't fail and free after table replacement
  xen-netback: Trivial format string fix
  net: bcmgenet: Remove unnecessary version.h inclusion
  net: smc911x: Remove unused local variable
  bonding: Inactive slaves should keep inactive flag's value
  netfilter: nf_tables: fix wrong format in request_module()
  netfilter: nf_tables: set names cannot be larger than 15 bytes
  netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len
  netfilter: Add {ipt,ip6t}_osf aliases for xt_osf
  netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks
  ...
2014-04-08 12:41:23 -07:00
Andrey Vagin
223b02d923 netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len
"len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
case it can contain all extensions. Bellow you can find sizes for all
types of extensions. Their sum is definitely bigger than 256.

nf_ct_ext_types[0]->len = 24
nf_ct_ext_types[1]->len = 32
nf_ct_ext_types[2]->len = 24
nf_ct_ext_types[3]->len = 32
nf_ct_ext_types[4]->len = 152
nf_ct_ext_types[5]->len = 2
nf_ct_ext_types[6]->len = 16
nf_ct_ext_types[7]->len = 8

I have seen "len" up to 280 and my host has crashes w/o this patch.

The right way to fix this problem is reducing the size of the ecache
extension (4) and Florian is going to do this, but these changes will
be quite large to be appropriate for a stable tree.

Fixes: 5b423f6a40 (netfilter: nf_conntrack: fix racy timer handling with reliable)
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-03 23:52:31 +02:00
Linus Torvalds
32d01dc7be Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "A lot updates for cgroup:

   - The biggest one is cgroup's conversion to kernfs.  cgroup took
     after the long abandoned vfs-entangled sysfs implementation and
     made it even more convoluted over time.  cgroup's internal objects
     were fused with vfs objects which also brought in vfs locking and
     object lifetime rules.  Naturally, there are places where vfs rules
     don't fit and nasty hacks, such as credential switching or lock
     dance interleaving inode mutex and cgroup_mutex with object serial
     number comparison thrown in to decide whether the operation is
     actually necessary, needed to be employed.

     After conversion to kernfs, internal object lifetime and locking
     rules are mostly isolated from vfs interactions allowing shedding
     of several nasty hacks and overall simplification.  This will also
     allow implmentation of operations which may affect multiple cgroups
     which weren't possible before as it would have required nesting
     i_mutexes.

   - Various simplifications including dropping of module support,
     easier cgroup name/path handling, simplified cgroup file type
     handling and task_cg_lists optimization.

   - Prepatory changes for the planned unified hierarchy, which is still
     a patchset away from being actually operational.  The dummy
     hierarchy is updated to serve as the default unified hierarchy.
     Controllers which aren't claimed by other hierarchies are
     associated with it, which BTW was what the dummy hierarchy was for
     anyway.

   - Various fixes from Li and others.  This pull request includes some
     patches to add missing slab.h to various subsystems.  This was
     triggered xattr.h include removal from cgroup.h.  cgroup.h
     indirectly got included a lot of files which brought in xattr.h
     which brought in slab.h.

  There are several merge commits - one to pull in kernfs updates
  necessary for converting cgroup (already in upstream through
  driver-core), others for interfering changes in the fixes branch"

* 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits)
  cgroup: remove useless argument from cgroup_exit()
  cgroup: fix spurious lockdep warning in cgroup_exit()
  cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
  cgroup: break kernfs active_ref protection in cgroup directory operations
  cgroup: fix cgroup_taskset walking order
  cgroup: implement CFTYPE_ONLY_ON_DFL
  cgroup: make cgrp_dfl_root mountable
  cgroup: drop const from @buffer of cftype->write_string()
  cgroup: rename cgroup_dummy_root and related names
  cgroup: move ->subsys_mask from cgroupfs_root to cgroup
  cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
  cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()
  cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
  cgroup: reorganize cgroup bootstrapping
  cgroup: relocate setting of CGRP_DEAD
  cpuset: use rcu_read_lock() to protect task_cs()
  cgroup_freezer: document freezer_fork() subtleties
  cgroup: update cgroup_transfer_tasks() to either succeed or fail
  cgroup: drop task_lock() protection around task->cgroups
  cgroup: update how a newly forked task gets associated with css_set
  ...
2014-04-03 13:05:42 -07:00
Linus Torvalds
cd6362befe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Here is my initial pull request for the networking subsystem during
  this merge window:

   1) Support for ESN in AH (RFC 4302) from Fan Du.

   2) Add full kernel doc for ethtool command structures, from Ben
      Hutchings.

   3) Add BCM7xxx PHY driver, from Florian Fainelli.

   4) Export computed TCP rate information in netlink socket dumps, from
      Eric Dumazet.

   5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
      Dichtel.

   6) Convert many drivers to pci_enable_msix_range(), from Alexander
      Gordeev.

   7) Record SKB timestamps more efficiently, from Eric Dumazet.

   8) Switch to microsecond resolution for TCP round trip times, also
      from Eric Dumazet.

   9) Clean up and fix 6lowpan fragmentation handling by making use of
      the existing inet_frag api for it's implementation.

  10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.

  11) Auto size SKB lengths when composing netlink messages based upon
      past message sizes used, from Eric Dumazet.

  12) qdisc dumps can take a long time, add a cond_resched(), From Eric
      Dumazet.

  13) Sanitize netpoll core and drivers wrt.  SKB handling semantics.
      Get rid of never-used-in-tree netpoll RX handling.  From Eric W
      Biederman.

  14) Support inter-address-family and namespace changing in VTI tunnel
      driver(s).  From Steffen Klassert.

  15) Add Altera TSE driver, from Vince Bridgers.

  16) Optimizing csum_replace2() so that it doesn't adjust the checksum
      by checksumming the entire header, from Eric Dumazet.

  17) Expand BPF internal implementation for faster interpreting, more
      direct translations into JIT'd code, and much cleaner uses of BPF
      filtering in non-socket ocntexts.  From Daniel Borkmann and Alexei
      Starovoitov"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
  netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
  net: Add a test to see if a skb is freeable in irq context
  qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
  net: ptp: move PTP classifier in its own file
  net: sxgbe: make "core_ops" static
  net: sxgbe: fix logical vs bitwise operation
  net: sxgbe: sxgbe_mdio_register() frees the bus
  Call efx_set_channels() before efx->type->dimension_resources()
  xen-netback: disable rogue vif in kthread context
  net/mlx4: Set proper build dependancy with vxlan
  be2net: fix build dependency on VxLAN
  mac802154: make csma/cca parameters per-wpan
  mac802154: allow only one WPAN to be up at any given time
  net: filter: minor: fix kdoc in __sk_run_filter
  netlink: don't compare the nul-termination in nla_strcmp
  can: c_can: Avoid led toggling for every packet.
  can: c_can: Simplify TX interrupt cleanup
  can: c_can: Store dlc private
  can: c_can: Reduce register access
  can: c_can: Make the code readable
  ...
2014-04-02 20:53:45 -07:00
Linus Torvalds
159d8133d0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual rocket science -- mostly documentation and comment updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  sparse: fix comment
  doc: fix double words
  isdn: capi: fix "CAPI_VERSION" comment
  doc: DocBook: Fix typos in xml and template file
  Bluetooth: add module name for btwilink
  driver core: unexport static function create_syslog_header
  mmc: core: typo fix in printk specifier
  ARM: spear: clean up editing mistake
  net-sysfs: fix comment typo 'CONFIG_SYFS'
  doc: Insert MODULE_ in module-signing macros
  Documentation: update URL to hfsplus Technote 1150
  gpio: update path to documentation
  ixgbe: Fix format string in ixgbe_fcoe.
  Kconfig: Remove useless "default N" lines
  user_namespace.c: Remove duplicated word in comment
  CREDITS: fix formatting
  treewide: Fix typo in Documentation/DocBook
  mm: Fix warning on make htmldocs caused by slab.c
  ata: ata-samsung_cf: cleanup in header file
  idr: remove unused prototype of idr_free()
2014-04-02 16:23:38 -07:00
Patrick McHardy
c50b960ccc netfilter: nf_tables: implement proper set selection
The current set selection simply choses the first set type that provides
the requested features, which always results in the rbtree being chosen
by virtue of being the first set in the list.

What we actually want to do is choose the implementation that can provide
the requested features and is optimal from either a performance or memory
perspective depending on the characteristics of the elements and the
preferences specified by the user.

The elements are not known when creating a set. Even if we would provide
them for anonymous (literal) sets, we'd still have standalone sets where
the elements are not known in advance. We therefore need an abstract
description of the data charcteristics.

The kernel already knows the size of the key, this patch starts by
introducing a nested set description which so far contains only the maximum
amount of elements. Based on this the set implementations are changed to
provide an estimate of the required amount of memory and the lookup
complexity class.

The set ops have a new callback ->estimate() that is invoked during set
selection. It receives a structure containing the attributes known to the
kernel and is supposed to populate a struct nft_set_estimate with the
complexity class and, in case the size is known, the complete amount of
memory required, or the amount of memory required per element otherwise.

Based on the policy specified by the user (performance/memory, defaulting
to performance) the kernel will then select the best suited implementation.

Even if the set implementation would allow to add more than the specified
maximum amount of elements, they are enforced since new implementations
might not be able to add more than maximum based on which they were
selected.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-02 21:32:57 +02:00
Phoebe Buckheister
e462ded699 mac802154: make csma/cca parameters per-wpan
Commit 9b2777d608 (ieee802154: add TX power control to wpan_phy)
and following erroneously added CSMA and CCA parameters for 802.15.4
devices as PHY parameters, while they are actually MAC parameters and
can differ for any two WPAN instances. Since it is now sensible to have
multiple WPAN devices with differing CSMA/CCA parameters, make these
parameters MAC parameters instead.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-01 16:25:51 -04:00
David S. Miller
ce22bb6122 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-03-31

Please accept this one last round of general wireless updates for
the 3.15 merge window!

For the Bluetooth bits, Gustavo says:

"Here follow another set of patches to 3.15. This is mostly a bug fix
pull request with the exception of one commit from Marcel which adds
tracking to the current configured LE scan type parameter."

Beyond that, notable bits include some final refactoring of rtl8180
and the addition of the rtl8187se driver, fixes for a number of
problems identified by Dan Carpenter and his static analysis tools,
and a handful of other bits here and there.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:44:55 -04:00
Wang Yufen
60ea37f7a5 ipv6: reuse rt6_need_strict
Move the whole rt6_need_strict as static inline into ip6_route.h,
so that it can be reused

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:16:16 -04:00
John W. Linville
96da266e77 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-03-31 15:22:17 -04:00
Daniel Borkmann
fbc907f0b1 net: filter: move filter accounting to filter core
This patch basically does two things, i) removes the extern keyword
from the include/linux/filter.h file to be more consistent with the
rest of Joe's changes, and ii) moves filter accounting into the filter
core framework.

Filter accounting mainly done through sk_filter_{un,}charge() take
care of the case when sockets are being cloned through sk_clone_lock()
so that removal of the filter on one socket won't result in eviction
as it's still referenced by the other.

These functions actually belong to net/core/filter.c and not
include/net/sock.h as we want to keep all that in a central place.
It's also not in fast-path so uninlining them is fine and even allows
us to get rd of sk_filter_release_rcu()'s EXPORT_SYMBOL and a forward
declaration.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
David S. Miller
64c27237a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/marvell/mvneta.c

The mvneta.c conflict is a case of overlapping changes,
a conversion to devm_ioremap_resource() vs. a conversion
to netdev_alloc_pcpu_stats.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:48:54 -04:00
Hannes Frederic Sowa
c15b1ccadb ipv6: move DAD and addrconf_verify processing to workqueue
addrconf_join_solict and addrconf_join_anycast may cause actions which
need rtnl locked, especially on first address creation.

A new DAD state is introduced which defers processing of the initial
DAD processing into a workqueue.

To get rtnl lock we need to push the code paths which depend on those
calls up to workqueues, specifically addrconf_verify and the DAD
processing.

(v2)
addrconf_dad_failure needs to be queued up to the workqueue, too. This
patch introduces a new DAD state and stop the DAD processing in the
workqueue (this is because of the possible ipv6_del_addr processing
which removes the solicited multicast address from the device).

addrconf_verify_lock is removed, too. After the transition it is not
needed any more.

As we are not processing in bottom half anymore we need to be a bit more
careful about disabling bottom half out when we lock spin_locks which are also
used in bh.

Relevant backtrace:
[  541.030090] RTNL: assertion failed at net/core/dev.c (4496)
[  541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.10.33-1-amd64-vyatta #1
[  541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  541.031146]  ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
[  541.031148]  0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
[  541.031150]  0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
[  541.031152] Call Trace:
[  541.031153]  <IRQ>  [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17
[  541.031180]  [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180
[  541.031183]  [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0
[  541.031185]  [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0
[  541.031189]  [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90
[  541.031198]  [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
[  541.031208]  [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0
[  541.031212]  [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
[  541.031216]  [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6]
[  541.031219]  [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
[  541.031223]  [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
[  541.031226]  [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
[  541.031229]  [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
[  541.031233]  [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6]
[  541.031241]  [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50
[  541.031244]  [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6]
[  541.031247]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
[  541.031255]  [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90
[  541.031258]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]

Hunks and backtrace stolen from a patch by Stephen Hemminger.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:54:50 -04:00
Lukasz Rymanowski
3d5a76f08b Bluetooth: Keep msec in DISCOV_LE_TIMEOUT
To be consistent, lets use msec for this timeout as well.

Note: This define value is a minimum scan time taken from BT Core spec 4.0,
Vol 3, Part C, chapter 9.2.6

Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-28 00:09:30 -07:00
Lukasz Rymanowski
b9a7a61e5c Bluetooth: Add new debugfs parameter
With this patch it is possible to control discovery interleaved
timeout value from debugfs.

It is for fine tuning of this timeout.

Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-28 00:09:30 -07:00
Lukasz Rymanowski
ae55f5982a Bluetooth: Keep msec in DISCOV_INTERLEAVED_TIMEOUT
Keep msec instead of jiffies in this define. This is needed by following
patch where we want this timeout to be exposed in debugfs.

Note: Value of this timeout comes from recommendation in BT Core Spec.4.0,
Vol 3, Part C, chapter 13.2.1.

Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-28 00:09:30 -07:00
Michal Kubeček
e5fd387ad5 ipv6: do not overwrite inetpeer metrics prematurely
If an IPv6 host route with metrics exists, an attempt to add a
new route for the same target with different metrics fails but
rewrites the metrics anyway:

12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip -6 route show
fe80::/64 dev eth0  proto kernel  metric 256
fec0::1 dev eth0  metric 1024  rto_min lock 1s
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
RTNETLINK answers: File exists
12sp0:~ # ip -6 route show
fe80::/64 dev eth0  proto kernel  metric 256
fec0::1 dev eth0  metric 1024  rto_min lock 1.5s

This is caused by all IPv6 host routes using the metrics in
their inetpeer (or the shared default). This also holds for the
new route created in ip6_route_add() which shares the metrics
with the already existing route and thus ip6_route_add()
rewrites the metrics even if the new route ends up not being
used at all.

Another problem is that old metrics in inetpeer can reappear
unexpectedly for a new route, e.g.

12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip route del fec0::1
12sp0:~ # ip route add fec0::1 dev eth0
12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10
12sp0:~ # ip -6 route show
fe80::/64 dev eth0  proto kernel  metric 256
fec0::1 dev eth0  metric 1024  hoplimit 10 rto_min lock 1s

Resolve the first problem by moving the setting of metrics down
into fib6_add_rt2node() to the point we are sure we are
inserting the new route into the tree. Second problem is
addressed by introducing new flag DST_METRICS_FORCE_OVERWRITE
which is set for a new host route in ip6_route_add() and makes
ipv6_cow_metrics() always overwrite the metrics in inetpeer
(even if they are not "new"); it is reset after that.

v5: use a flag in _metrics member rather than one in flags

v4: fix a typo making a condition always true (thanks to Hannes
Frederic Sowa)

v3: rewritten based on David Miller's idea to move setting the
metrics (and allocation in non-host case) down to the point we
already know the route is to be inserted. Also rebased to
net-next as it is quite late in the cycle.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 15:09:07 -04:00
John W. Linville
2de21e5899 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-03-27 14:16:50 -04:00
Tom Herbert
61b905da33 net: Rename skb->rxhash to skb->hash
The packet hash can be considered a property of the packet, not just
on RX path.

This patch changes name of rxhash and l4_rxhash skbuff fields to be
hash and l4_hash respectively. This includes changing uses of the
field in the code which don't call the access functions.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 15:58:20 -04:00
Johan Hedberg
ff5cd29f5c Bluetooth: Store also RSSI for pending advertising reports
Especially in crowded environments it can become frequent that we have
to send out whatever pending event there is stored. Since user space
has its own filtering of small RSSI changes sending a 0 value will
essentially force user space to wake up the higher layers (e.g. over
D-Bus) even though the RSSI didn't actually change more than the
threshold value.

This patch adds storing also of the RSSI for pending advertising reports
so that we report an as accurate RSSI as possible when we have to send
out the stored information to user space.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-26 09:31:40 -07:00
Johan Hedberg
73cf71d986 Bluetooth: Fix line splitting of mgmt_device_found parameters
The line was incorrectly split between the variable type and its name.
This patch fixes the issue.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-26 09:31:39 -07:00
Johan Hedberg
b9a6328f2a Bluetooth: Merge ADV_IND/ADV_SCAN_IND and SCAN_RSP together
To avoid too many events being sent to user space and to help parsing of
all available remote device data it makes sense for us to wait for the
scan response and send a single merged Device Found event to user space.

This patch adds a few new variables to hci_dev to track the last
received ADV_IND/ADV_SCAN_IND, i.e. those which will cause a SCAN_REQ to
be send in the case of active scanning. When the SCAN_RSP is received
the pending data is passed together with the SCAN_RSP to the
mgmt_device_found function which takes care of merging them into a
single Device Found event.

We also need a bit of extra logic to handle situations where we don't
receive a SCAN_RSP after caching some data. In such a scenario we simply
have to send out the pending data as it is and then operate on the new
report as if there was no pending data.

We also need to send out any pending data when scanning stops as
well as ensure that the storage is empty at the start of a new active
scanning session. These both cases are covered by the update to the
hci_cc_le_set_scan_enable function in this patch.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-26 09:31:38 -07:00
Johan Hedberg
3c857757ef Bluetooth: Add directed advertising support through connect()
When we're in peripheral mode (HCI_ADVERTISING flag is set) the most
natural mapping of connect() is to perform directed advertising to the
peer device.

This patch does the necessary changes to enable directed advertising and
keeps the hci_conn state as BT_CONNECT in a similar way as is done for
central or BR/EDR connection initiation.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-26 09:31:38 -07:00
Johan Hedberg
5d2e9fadf4 Bluetooth: Add scan_rsp parameter to mgmt_device_found()
In preparation for being able to merge ADV_IND/ADV_SCAN_IND and SCAN_RSP
together into a single device found event add a second parameter to the
mgmt_device_found function. For now all callers pass NULL as this
parameters since we don't yet have storing of the last received
advertising report.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-26 09:31:37 -07:00
David S. Miller
04f58c8854 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	Documentation/devicetree/bindings/net/micrel-ks8851.txt
	net/core/netpoll.c

The net/core/netpoll.c conflict is a bug fix in 'net' happening
to code which is completely removed in 'net-next'.

In micrel-ks8851.txt we simply have overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25 20:29:20 -04:00
David S. Miller
0fc3196603 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please pull this batch of wireless updates intended for 3.15!

For the mac80211 bits, Johannes says:

"This has a whole bunch of bugfixes for things that went into -next
previously as well as some other bugfixes I didn't want to rush into
3.14 at this point. The rest of it is some cleanups and a few small
features, the biggest of which is probably Janusz's regulatory DFS CAC
time code."

For the Bluetooth bits, Gustavo says:

"One more pull request to 3.15. This is mostly and bug fix pull request, it
contains several fixes and clean up all over the tree, plus some small new
features."

For the NFC bits, Samuel says:

"This is the NFC pull request for 3.15. With this one we have:

- Support for ISO 15693 a.k.a. NFC vicinity a.k.a. Type 5 tags. ISO
  15693 are long range (1 - 2 meters) vicinity tags/cards. The kernel
  now supports those through the NFC netlink and digital APIs.

- Support for TI's trf7970a chipset. This chipset relies on the NFC
  digital layer and the driver currently supports type 2, 4A and 5 tags.

- Support for NXP's pn544 secure firmare download. The pn544 C3 chipsets
  relies on a different firmware download protocal than the C2 one. We
  now support both and use the right one depending on the version we
  detect at runtime.

- Support for 4A tags from the NFC digital layer.

- A bunch of cleanups and minor fixes from Axel Lin and Thierry Escande."

For the iwlwifi bits, Emmanuel says:

"We were sending a host command while the mutex wasn't held. This
led to hard-to-catch races."

And...

"I have a fix for a "merge damage" which is not really a merge
damage: it enables scheduled scan which has been disabled in
wireless.git. Since you merged wireless.git into wireless-next.git,
this can now be fixed in wireless-next.git.

Besides this, Alex made a workaround for a hardware bug. This fix
allows us to consume less power in S3. Arik and Eliad continue to
work on D0i3 which is a run-time power saving feature. Eliad also
contributes a few bits to the rate scaling logic to which Eyal adds his
own contribution. Avri dives deep in the power code - newer firmware
will allow to enable power save in newer scenarios. Johannes made a few
clean-ups. I have the regular amount of BT Coex boring stuff. I disable
uAPSD since we identified firmware bugs that cause packet loss. One
thing that do stand out is the udev event that we now send when the
FW asserts. I hope it will allow us to debug the FW more easily."

Also included is one last iwlwifi pull for a build breakage fix...

For the Atheros bits, Kalle says:

"Michal now did some optimisations and was able to improve throughput by
100 Mbps on our MIPS based AP135 platform. Chun-Yeow added some
workarounds to be able to better use ad-hoc mode. Ben improved log
messages and added support for MSDU chaining. And, as usual, also some
smaller fixes."

Beyond that...

Andrea Merello continues his rtl8180 refactoring, in preparation for
a long-awaited rtl8187 driver.  We get a new driver (rsi) for the
RS9113 chip, from Fariya Fatima.  And, of course, we get the usual
round of updates for ath9k, brcmfmac, mwifiex, wil6210, etc. as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25 19:25:39 -04:00
Simon Derr
7b4f307276 9pnet: p9_client->conn field is unused. Remove it.
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-03-25 16:38:16 -05:00
Simon Derr
0bfd6845c0 9P: Get rid of REQ_STATUS_FLSH
This request state is mostly useless, and properly implementing it
for RDMA would require an extra lock to be taken in handle_recv()
and in rdma_cancel() to avoid this race:

    handle_recv()           rdma_cancel()
        .                     .
        .                   if req->state == SENT
    req->state = RCVD         .
        .                           req->state = FLSH

So just get rid of it.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-03-25 16:38:15 -05:00
Simon Derr
afd8d65411 9P: Add cancelled() to the transport functions.
And move transport-specific code out of net/9p/client.c

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-03-25 16:38:11 -05:00
Dominique Martinet
2b6e72ed74 9P: Add memory barriers to protect request fields over cb/rpc threads handoff
We need barriers to guarantee this pattern works as intended:
[w] req->rc, 1		[r] req->status, 1
wmb			rmb
[w] req->status, 1	[r] req->rc

Where the wmb ensures that rc gets written before status,
and the rmb ensures that if you observe status == 1, rc is the new value.

Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-03-25 16:37:59 -05:00
Li RongQing
0b8c7f6f2a ipv4: remove ip_rt_dump from route.c
ip_rt_dump do nothing after IPv4 route caches removal, so we can remove it.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-24 12:45:01 -04:00
Eric Dumazet
99f0b958b1 net: optimize csum_replace2()
When changing one 16bit value by another in IP header, we can adjust
the IP checksum by doing a simple operation described in RFC 1624, as
reminded by David.

csum_partial() is a complex function on x86_64, not really suited for
small number of checksummed bytes.

I spotted csum_partial() being in the top 20 most consuming functions
(more than 1 %) in a GRO workload, which was rather unexpected.

The caller was inet_gro_complete() doing a csum_replace2() when
building the new IP header for the GRO packet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-24 00:18:44 -04:00
Marcel Holtmann
533553f873 Bluetooth: Track current configured LE scan type parameter
The LE scan type paramter defines if active scanning or passive scanning
is in use. Track the currently set value so it can be used for decision
making from other pieces in the core.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-03-21 22:02:12 +02:00
John W. Linville
49c0ca17ee Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-03-21 14:02:04 -04:00
Eric Dumazet
6326231531 tcp: syncookies: do not use getnstimeofday()
While it is true that getnstimeofday() uses about 40 cycles if TSC
is available, it can use 1600 cycles if hpet is the clocksource.

Switch to get_jiffies_64(), as this is more than enough, and
go back to 60 seconds periods.

Fixes: 8c27bd75f0 ("tcp: syncookies: reduce cookie lifetime to 128 seconds")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 16:22:42 -04:00
John W. Linville
370c5acef0 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-03-20 11:54:22 -04:00
John W. Linville
7eb2450a51 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-03-20 11:53:20 -04:00
Johan Hedberg
39adbffe4b Bluetooth: Fix passkey endianess in user_confirm and notify_passkey
The passkey_notify and user_confirm functions in mgmt.c were expecting
different endianess for the passkey, leading to a big endian bug and
sparse warning in recently added SMP code. This patch converts both
functions to expect host endianess and do the conversion to little
endian only when assigning to the mgmt event struct.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-19 23:22:07 -07:00
Emmanuel Grumbach
fb378c231d mac80211: set beamforming bit in radiotap
Add a bit in rx_status.vht_flags to let the low level driver
notify mac80211 about a beamformed packet. Propagate this
to the radiotap header.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-19 21:29:57 +01:00
Emmanuel Grumbach
3afc2167f6 cfg80211/mac80211: ignore signal if the frame was heard on wrong channel
On 2.4Ghz band, the channels overlap since the delta
between different channels is 5Mhz while the width of the
receiver is 20Mhz (at least).

This means that we can hear beacons or probe responses from
adjacent channels. These frames will have a significant
lower RSSI which will feed all kinds of logic with inaccurate
data. An obvious example is the roaming algorithm that will
think our AP is getting weak and will try to move to another
AP.

In order to avoid this, update the signal only if the frame
has been heard on the same channel as the one advertised by
the AP in its DS / HT IEs.
We refrain from updating the values only if the AP is
already in the BSS list so that we will still have a valid
(but inaccurate) value if the AP was heard on an adjacent
channel only.

To achieve this, stop taking the channel from DS / HT IEs
in mac80211. The DS / HT IEs is taken into account to
discard the frame if it was received on a disabled channel.
This can happen due to the same phenomenon: the frame is
sent on channel 12, but heard on channel 11 while channel
12 can be disabled on certain devices. Since this check
is done in cfg80211, stop even checking this in mac80211.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[remove unused rx_freq variable]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-19 21:29:56 +01:00
Eliad Peller
a0f995a561 mac80211: add status_driver_data array to ieee80211_tx_info
Drivers might want to have private data in addition
to all other ieee80211_tx_info.status fields.

The current ieee80211_tx_info.rate_driver_data overlaps
with some of the non-rate data (e.g. ampdu_ack_len), so
it might not be good enough.

Since we already know how much free bytes remained,
simply use this size to define (void *) array.

While on it, change ack_signal type from int to the more
explicit s32 type.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-19 21:29:55 +01:00
David S. Miller
995dca4ce9 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
One patch to rename a newly introduced struct. The rest is
the rework of the IPsec virtual tunnel interface for ipv6 to
support inter address family tunneling and namespace crossing.

1) Rename the newly introduced struct xfrm_filter to avoid a
   conflict with iproute2. From Nicolas Dichtel.

2) Introduce xfrm_input_afinfo to access the address family
   dependent tunnel callback functions properly.

3) Add and use a IPsec protocol multiplexer for ipv6.

4) Remove dst_entry caching. vti can lookup multiple different
   dst entries, dependent of the configured xfrm states. Therefore
   it does not make to cache a dst_entry.

5) Remove caching of flow informations. vti6 does not use the the
   tunnel endpoint addresses to do route and xfrm lookups.

6) Update the vti6 to use its own receive hook.

7) Remove the now unused xfrm_tunnel_notifier. This was used from vti
   and is replaced by the IPsec protocol multiplexer hooks.

8) Support inter address family tunneling for vti6.

9) Check if the tunnel endpoints of the xfrm state and the vti interface
   are matching and return an error otherwise.

10) Enable namespace crossing for vti devices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-18 14:09:07 -04:00
David S. Miller
e86e180b82 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next,
most relevantly they are:

* cleanup to remove double semicolon from stephen hemminger.

* calm down sparse warning in xt_ipcomp, from Fan Du.

* nf_ct_labels support for nf_tables, from Florian Westphal.

* new macros to simplify rcu dereferences in the scope of nfnetlink
  and nf_tables, from Patrick McHardy.

* Accept queue and drop (including reason for drop) to verdict
  parsing in nf_tables, also from Patrick.

* Remove unused random seed initialization in nfnetlink_log, from
  Florian Westphal.

* Allow to attach user-specific information to nf_tables rules, useful
  to attach user comments to rule, from me.

* Return errors in ipset according to the manpage documentation, from
  Jozsef Kadlecsik.

* Fix coccinelle warnings related to incorrect bool type usage for ipset,
  from Fengguang Wu.

* Add hash:ip,mark set type to ipset, from Vytas Dauksa.

* Fix message for each spotted by ipset for each netns that is created,
  from Ilia Mirkin.

* Add forceadd option to ipset, which evicts a random entry from the set
  if it becomes full, from Josh Hunt.

* Minor IPVS cleanups and fixes from Andi Kleen and Tingwei Liu.

* Improve conntrack scalability by removing a central spinlock, original
  work from Eric Dumazet. Jesper Dangaard Brouer took them over to address
  remaining issues. Several patches to prepare this change come in first
  place.

* Rework nft_hash to resolve bugs (leaking chain, missing rcu synchronization
  on element removal, etc. from Patrick McHardy.

* Restore context in the rule deletion path, as we now release rule objects
  synchronously, from Patrick McHardy. This gets back event notification for
  anonymous sets.

* Fix NAT family validation in nft_nat, also from Patrick.

* Improve scalability of xt_connlimit by using an array of spinlocks and
  by introducing a rb-tree of hashtables for faster lookup of accounted
  objects per network. This patch was preceded by several patches and
  refactorizations to accomodate this change including the use of kmem_cache,
  from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17 15:06:24 -04:00
John W. Linville
20d83f2464 NFC: 3.15: First pull request
This is the NFC pull request for 3.15. With this one we have:
 
 - Support for ISO 15693 a.k.a. NFC vicinity a.k.a. Type 5 tags. ISO
   15693 are long range (1 - 2 meters) vicinity tags/cards. The kernel
   now supports those through the NFC netlink and digital APIs.
 
 - Support for TI's trf7970a chipset. This chipset relies on the NFC
   digital layer and the driver currently supports type 2, 4A and 5 tags.
 
 - Support for NXP's pn544 secure firmare download. The pn544 C3 chipsets
   relies on a different firmware download protocal than the C2 one. We
   now support both and use the right one depending on the version we
   detect at runtime.
 
 - Support for 4A tags from the NFC digital layer.
 
 - A bunch of cleanups and minor fixes from Axel Lin and Thierry Escande.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTI1oQAAoJEIqAPN1PVmxKmxgP/iiwhQwsehiUZ6oXYjAc06WT
 e+vmJwHuDgbEnr5QDSIdpeaKuxYtCErbFcG4SD5+thTQNEeiRsmTLYdkwZ0FCDF+
 WKU3F59EdUZu4F0GyUXvt8lUC3D96bJSIHMCJdg1YDJtyaJcuITeK7lW325CK4hF
 4ik0+HXcv9mt/Dq4k6xgqdgCOSz7o9dJSbzw5NTCN1I+G6CNgYrvYqSmvsKut2C7
 DQvFu8KhbUkSlXgY61YIA2qpf1hfATkfgLdSsjGJfYdKvWLUAyKbEwoG8dkFA38Y
 yqc49SJ6DJ4FGmSt4Qm2VtRa8HLqwjA30jBEeb7Zog80j7zEBS+th0NJxqDWp/82
 eaeNdfrA5d17CHiR9ODuGOFt3VJZeN83u+JCli3FKGF727A/2Z0UI8UgnB5ZfcbZ
 +5IxV6mQp/h8H4AaQzGQ88x2xibhnrW/XYmTpoxj7CcT3AdbGw/IerZXf+afXBce
 iyJZ+Ktr6d5zFucCbm0QyZDSwAhfYDIYGiM6AIM3oum9n3K31dA0Qxy6saMiiK8c
 ArYY6raQbRCaOBVCI8JI6PoQFvw5MNug40v7Q/G88dElte9mSLsGslVVhCNBpAVB
 Atdwa+I3gwgsea1oqGJo0rSEaWv6PsOeQcFG8boDoQDUvgapzk17bTCz4k2DlXO5
 U0Q/GcJnFEdcDhqVKqP5
 =xikL
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz <sameo@linux.intel.com> says:

"NFC: 3.15: First pull request

This is the NFC pull request for 3.15. With this one we have:

- Support for ISO 15693 a.k.a. NFC vicinity a.k.a. Type 5 tags. ISO
  15693 are long range (1 - 2 meters) vicinity tags/cards. The kernel
  now supports those through the NFC netlink and digital APIs.

- Support for TI's trf7970a chipset. This chipset relies on the NFC
  digital layer and the driver currently supports type 2, 4A and 5 tags.

- Support for NXP's pn544 secure firmare download. The pn544 C3 chipsets
  relies on a different firmware download protocal than the C2 one. We
  now support both and use the right one depending on the version we
  detect at runtime.

- Support for 4A tags from the NFC digital layer.

- A bunch of cleanups and minor fixes from Axel Lin and Thierry Escande."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-03-17 13:16:50 -04:00
David S. Miller
85dcce7a73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	drivers/net/xen-netback/netback.c

Both the r8152 and netback conflicts were simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:31:55 -04:00
Phoebe Buckheister
a13061ec04 6lowpan: move lowpan frag_info out of 802.15.4 headers
Fragmentation and reassembly information for 6lowpan is independent from
the 802.15.4 stack and used only by the 6lowpan reassembly process. Move
the ieee802154_frag_info struct to a private are, it needn't be in the
802.15.4 skb control block.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:15:26 -04:00
Phoebe Buckheister
ae531b9475 ieee802154: use ieee802154_addr instead of *_sa variants
Change all internal uses of ieee802154_addr_sa to ieee802154_addr,
except for those instances that communicate directly with userspace.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:15:26 -04:00
Phoebe Buckheister
e6278d9200 mac802154: use header operations to create/parse headers
Use the operations on 802.15.4 header structs introduced in a previous
patch to create and parse all headers in the mac802154 stack. This patch
reduces code duplication between different parts of the mac802154 stack
that needed information from headers, and also fixes a few bugs that
seem to have gone unnoticed until now:

 * 802.15.4 dgram sockets would return a slightly incorrect value for
   the SIOCINQ ioctl
 * mac802154 would not drop frames with the "security enabled" bit set,
   even though it does not support security, in violation of the
   standard

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:15:26 -04:00
Phoebe Buckheister
94b4f6c21c ieee802154: add header structs with endiannes and operations
This patch provides a set of structures to represent 802.15.4 MAC
headers, and a set of operations to push/pull/peek these structs from
skbs. We cannot simply pointer-cast the skb MAC header pointer to these
structs, because 802.15.4 headers are wildly variable - depending on the
first three bytes, virtually all other fields of the header may be
present or not, and be present with different lengths.

The new header creation/parsing routines also support 802.15.4 security
headers, which are currently not supported by the mac802154
implementation of the protocol.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:15:26 -04:00
Phoebe Buckheister
b70ab2e87f ieee802154: enforce consistent endianness in the 802.15.4 stack
Enable sparse warnings about endianness, replace the remaining fields
regarding network operations without explicit endianness annotations
with such that are annotated, and propagate this through the entire
stack.

Uses of ieee802154_addr_sa are not changed yet, this patch is only
concerned with all other fields (such as address filters, operation
parameters and the likes).

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:15:26 -04:00
Phoebe Buckheister
46ef0eb3ea ieee802154: add address struct with proper endiannes and some operations
Add a replacement ieee802154_addr struct with proper endianness on
fields. Short address fields are stored as __le16 as on the network,
extended (EUI64) addresses are __le64 as opposed to the u8[8] format
used previously. This disconnect with the netdev address, which is
stored as big-endian u8[8], is intentional.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:15:26 -04:00
Phoebe Buckheister
376b7bd355 ieee802154: rename struct ieee802154_addr to *_sa
The struct as currently defined uses host byte order for some fields,
and most big endian/EUI display byte order for other fields. Inside the
stack, endianness should ideally match network byte order where possible
to minimize the number of byteswaps done in critical paths, but this
patch does not address this; it is only preparatory.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:15:25 -04:00
Steffen Klassert
573ce1c11b xfrm6: Remove xfrm_tunnel_notifier
This was used from vti and is replaced by the IPsec protocol
multiplexer hooks. It is now unused, so remove it.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-03-14 07:28:08 +01:00
Steffen Klassert
7e14ea1521 xfrm6: Add IPsec protocol multiplexer
This patch adds an IPsec protocol multiplexer for ipv6. With
this it is possible to add alternative protocol handlers, as
needed for IPsec virtual tunnel interfaces.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-03-14 07:28:07 +01:00
Steffen Klassert
2f32b51b60 xfrm: Introduce xfrm_input_afinfo to access the the callbacks properly
IPv6 can be build as a module, so we need mechanism to access
the address family dependent callback functions properly.
Therefore we introduce xfrm_input_afinfo, similar to that
what we have for the address family dependent part of
policies and states.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-03-14 07:28:07 +01:00
John W. Linville
42775a34d2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/ath/ath9k/recv.c
2014-03-13 14:21:43 -04:00
Steffen Klassert
4a93f5095a flowcache: Fix resource leaks on namespace exit.
We leak an active timer, the hotcpu notifier and all allocated
resources when we exit a namespace. Fix this by introducing a
flow_cache_fini() function where we release the resources before
we exit.

Fixes: ca925cf153 ("flowcache: Make flow cache name space aware")
Reported-by: Jakub Kicinski <moorray3@wp.pl>
Tested-by: Jakub Kicinski <moorray3@wp.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:31:18 -04:00
Eric Dumazet
c3f9b01849 tcp: tcp_release_cb() should release socket ownership
Lars Persson reported following deadlock :

-000 |M:0x0:0x802B6AF8(asm) <-- arch_spin_lock
-001 |tcp_v4_rcv(skb = 0x8BD527A0) <-- sk = 0x8BE6B2A0
-002 |ip_local_deliver_finish(skb = 0x8BD527A0)
-003 |__netif_receive_skb_core(skb = 0x8BD527A0, ?)
-004 |netif_receive_skb(skb = 0x8BD527A0)
-005 |elk_poll(napi = 0x8C770500, budget = 64)
-006 |net_rx_action(?)
-007 |__do_softirq()
-008 |do_softirq()
-009 |local_bh_enable()
-010 |tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th = 0x814EBE14, ?)
-011 |tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0)
-012 |tcp_delack_timer_handler(sk = 0x8BE6B2A0)
-013 |tcp_release_cb(sk = 0x8BE6B2A0)
-014 |release_sock(sk = 0x8BE6B2A0)
-015 |tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?)
-016 |sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096)
-017 |kernel_sendmsg(?, ?, ?, ?, size = 4096)
-018 |smb_send_kvec()
-019 |smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0)
-020 |cifs_call_async()
-021 |cifs_async_writev(wdata = 0x87FD6580)
-022 |cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88)
-023 |__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88)
-024 |writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work = 0x87D8DD88)
-025 |__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88)
-026 |wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88)
-027 |wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0)
-028 |bdi_writeback_workfn(work = 0x87E4A9CC)
-029 |process_one_work(worker = 0x8B045880, work = 0x87E4A9CC)
-030 |worker_thread(__worker = 0x8B045880)
-031 |kthread(_create = 0x87CADD90)
-032 |ret_from_kernel_thread(asm)

Bug occurs because __tcp_checksum_complete_user() enables BH, assuming
it is running from softirq context.

Lars trace involved a NIC without RX checksum support but other points
are problematic as well, like the prequeue stuff.

Problem is triggered by a timer, that found socket being owned by user.

tcp_release_cb() should call tcp_write_timer_handler() or
tcp_delack_timer_handler() in the appropriate context :

BH disabled and socket lock held, but 'owned' field cleared,
as if they were running from timer handlers.

Fixes: 6f458dfb40 ("tcp: improve latencies of timer triggered events")
Reported-by: Lars Persson <lars.persson@axis.com>
Tested-by: Lars Persson <lars.persson@axis.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:45:59 -04:00
Eric Dumazet
d32d9bb85c flowcache: restore a single flow_cache kmem_cache
It is not legal to create multiple kmem_cache having the same name.

flowcache can use a single kmem_cache, no need for a per netns
one.

Fixes: ca925cf153 ("flowcache: Make flow cache name space aware")
Reported-by: Jakub Kicinski <moorray3@wp.pl>
Tested-by: Jakub Kicinski <moorray3@wp.pl>
Tested-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-10 21:45:11 -04:00
Mark A. Greer
ceeee42d85 NFC: digital: Rename Type V tags to Type 5 tags
According to the latest draft specification from
the NFC-V committee, ISO/IEC 15693 tags will be
referred to as "Type 5" tags and not "Type V"
tags anymore.  Make the code reflect the new
terminology.

Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-03-11 00:40:59 +01:00
Gu Zheng
5812521be0 net: add a pre-check of net_ns in sk_change_net()
We do not need to switch the net_ns if the target net_ns the same
as the current one, so here we add a pre-check of net_ns to avoid
this as David suggested.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-10 16:29:48 -04:00
Marcel Holtmann
53ac6ab612 Bluetooth: Make LTK and CSRK only persisent when bonding
In case the pairable option has been disabled, the pairing procedure
does not create keys for bonding. This means that these generated keys
should not be stored persistently.

For LTK and CSRK this is important to tell userspace to not store these
new keys. They will be available for the lifetime of the device, but
after the next power cycle they should not be used anymore.

Inform userspace to actually store the keys persistently only if both
sides request bonding.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-03-10 14:57:33 +02:00
Marcel Holtmann
7ee4ea3692 Bluetooth: Add support for handling signature resolving keys
The connection signature resolving key (CSRK) is used for attribute
protocol signed write procedures. This change generates a new local
key during pairing and requests the peer key as well.

Newly generated key and received key will be provided to userspace
using the New Signature Resolving Key management event.

The Master CSRK can be used for verification of remote signed write
PDUs and the Slave CSRK can be used for sending signed write PDUs
to the remote device.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-03-09 21:39:50 +02:00
Patrick McHardy
62472bcefb netfilter: nf_tables: restore context for expression destructors
In order to fix set destruction notifications and get rid of unnecessary
members in private data structures, pass the context to expressions'
destructor functions again.

In order to do so, replace various members in the nft_rule_trans structure
by the full context.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-08 12:35:17 +01:00
Jesper Dangaard Brouer
93bb0ceb75 netfilter: conntrack: remove central spinlock nf_conntrack_lock
nf_conntrack_lock is a monolithic lock and suffers from huge contention
on current generation servers (8 or more core/threads).

Perf locking congestion is clear on base kernel:

-  72.56%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock_bh
   - _raw_spin_lock_bh
      + 25.33% init_conntrack
      + 24.86% nf_ct_delete_from_lists
      + 24.62% __nf_conntrack_confirm
      + 24.38% destroy_conntrack
      + 0.70% tcp_packet
+   2.21%  ksoftirqd/6  [kernel.kallsyms]    [k] fib_table_lookup
+   1.15%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_free
+   0.77%  ksoftirqd/6  [kernel.kallsyms]    [k] inet_getpeer
+   0.70%  ksoftirqd/6  [nf_conntrack]       [k] nf_ct_delete
+   0.55%  ksoftirqd/6  [ip_tables]          [k] ipt_do_table

This patch change conntrack locking and provides a huge performance
improvement.  SYN-flood attack tested on a 24-core E5-2695v2(ES) with
10Gbit/s ixgbe (with tool trafgen):

 Base kernel:   810.405 new conntrack/sec
 After patch: 2.233.876 new conntrack/sec

Notice other floods attack (SYN+ACK or ACK) can easily be deflected using:
 # iptables -A INPUT -m state --state INVALID -j DROP
 # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0

Use an array of hashed spinlocks to protect insertions/deletions of
conntracks into the hash table. 1024 spinlocks seem to give good
results, at minimal cost (4KB memory). Due to lockdep max depth,
1024 becomes 8 if CONFIG_LOCKDEP=y

The hash resize is a bit tricky, because we need to take all locks in
the array. A seqcount_t is used to synchronize the hash table users
with the resizing process.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07 11:41:13 +01:00
Jesper Dangaard Brouer
ca7433df3a netfilter: conntrack: seperate expect locking from nf_conntrack_lock
Netfilter expectations are protected with the same lock as conntrack
entries (nf_conntrack_lock).  This patch split out expectations locking
to use it's own lock (nf_conntrack_expect_lock).

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07 11:41:01 +01:00
Jesper Dangaard Brouer
b7779d06f9 netfilter: conntrack: spinlock per cpu to protect special lists.
One spinlock per cpu to protect dying/unconfirmed/template special lists.
(These lists are now per cpu, a bit like the untracked ct)
Add a @cpu field to nf_conn, to make sure we hold the appropriate
spinlock at removal time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07 11:40:38 +01:00
Jesper Dangaard Brouer
b476b72a0f netfilter: trivial code cleanup and doc changes
Changes while reading through the netfilter code.

Added hint about how conntrack nf_conn refcnt is accessed.
And renamed repl_hash to reply_hash for readability

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07 11:40:04 +01:00
Nicolas Dichtel
870a2df4ca xfrm: rename struct xfrm_filter
iproute2 already defines a structure with that name, let's use another one to
avoid any conflict.

CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-03-07 08:12:37 +01:00
Alexander Aring
cefc8c8a7c 6lowpan: move 6lowpan header to include/net
This header is used by bluetooth and ieee802154 branch. This patch
move this header to the include/net directory to avoid a use of a
relative path in include.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 17:21:38 -05:00
Andrew Lutomirski
adca476782 net: Improve SO_TIMESTAMPING documentation and fix a minor code bug
The original documentation was very unclear.

The code fix is presumably related to the formerly unclear
documentation: SOCK_TIMESTAMPING_RX_SOFTWARE has no effect on
__sock_recv_timestamp's behavior, so calling __sock_recv_ts_and_drops
from sock_recv_ts_and_drops if only SOCK_TIMESTAMPING_RX_SOFTWARE is
set is pointless.  This should have no user-observable effect.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 16:18:01 -05:00
David S. Miller
f7324acd98 tcp: Use NET_ADD_STATS instead of NET_ADD_STATS_BH in tcp_event_new_data_sent()
Can be invoked from non-BH context.

Based upon a patch by Eric Dumazet.

Fixes: f19c29e3e3 ("tcp: snmp stats for Fast Open, SYN rtx, and data pkts")
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 15:19:43 -05:00
Hannes Frederic Sowa
e90c14835b inet: remove now unused flag DST_NOPEER
Commit e688a60480 ("net: introduce DST_NOPEER dst flag") introduced
DST_NOPEER because because of crashes in ipv6_select_ident called from
udp6_ufo_fragment.

Since commit 916e4cf46d ("ipv6: reuse ip6_frag_id from
ip6_ufo_append_data") we don't call ipv6_select_ident any more from
ip6_ufo_append_data, thus this flag lost its purpose and can be removed.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 13:15:52 -05:00
David S. Miller
67ddc87f16 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/wireless/ath/ath9k/recv.c
	drivers/net/wireless/mwifiex/pcie.c
	net/ipv6/sit.c

The SIT driver conflict consists of a bug fix being done by hand
in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
was created (netdev_alloc_pcpu_stats()) which takes care of this.

The two wireless conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-05 20:32:02 -05:00
John W. Linville
9ffd2e9a17 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-03-03 14:59:12 -05:00
Alexander Aring
7240cdec60 6lowpan: handling 6lowpan fragmentation via inet_frag api
This patch drops the current way of 6lowpan fragmentation on receiving
side and replace it with a implementation which use the inet_frag api.
The old fragmentation handling has some race conditions and isn't
rfc4944 compatible. Also adding support to match fragments on
destination address, source address, tag value and datagram_size
which is missing in the current implementation.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-28 17:05:22 -05:00
Alexander Aring
633fc86ff6 net: ns: add ieee802154_6lowpan namespace
This patch adds necessary ieee802154 6lowpan namespace to provide the
inet_frag information. This is a initial support for handling 6lowpan
fragmentation with the inet_frag api.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-28 17:05:22 -05:00
Alexander Aring
89745c9c41 6lowpan: add frag information struct
This patch adds a 6lowpan fragmentation struct into cb of skb which
is necessary to hold fragmentation information.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-28 17:05:21 -05:00
Johan Hedberg
81ad6fd969 Bluetooth: Remove unnecessary stop_scan_complete function
The stop_scan_complete function was used as an intermediate step before
doing the actual connection creation. Since we're using hci_request
there's no reason to have this extra function around, i.e. we can simply
put both HCI commands into the same request.

The single task that the intermediate function had, i.e. indicating
discovery as stopped is now taken care of by a new
HCI_LE_SCAN_INTERRUPTED flag which allows us to do the discovery state
update when the stop scan command completes.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-28 10:28:17 -08:00
Johan Hedberg
9489eca4ab Bluetooth: Add timeout for LE connection attempts
LE connection attempts do not have a controller side timeout in the same
way as BR/EDR has (in form of the page timeout). Since we always do
scanning before initiating connections the attempts are always expected
to succeed in some reasonable time.

This patch adds a timer which forces a cancellation of the connection
attempt within 20 seconds if it has not been successful by then. This
way we e.g. ensure that mgmt_pair_device times out eventually and gives
an error response.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-28 07:56:42 -08:00
Johan Hedberg
a7139edd28 Bluetooth: Add defines for LE initiator filter policy
This patch adds defines for the initiator filter policy parameter values
of the HCI_LE_Create_Connection command. They will be used in a
subsequent patch to check whether we should have a timeout for the
connection attempt or not.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-28 07:56:42 -08:00
Johan Hedberg
cb1d68f7a3 Bluetooth: Track LE initiator and responder address information
For SMP we need the local and remote addresses (and their types) that
were used to establish the connection. These may be different from the
Identity Addresses or even the current RPA. To guarantee that we have
this information available and it is correct track these values
separately from the very beginning of the connection.

For outgoing connections we set the values as soon as we get a
successful command status for HCI_LE_Create_Connection (for which the
patch adds a command status handler function) and for incoming
connections as soon as we get a LE Connection Complete HCI event.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-28 07:53:07 -08:00
Marcel Holtmann
fe39c7b2da Bluetooth: Use __le64 type for LE random numbers
The random numbers in Bluetooth Low Energy are 64-bit numbers and should
also be little endian since the HCI specification is little endian.

Change the whole Low Energy pairing to use __le64 instead of a byte
array.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-28 12:36:04 +02:00
Johan Hedberg
a3172b7eb4 Bluetooth: Add timer to force power off
If some of the cleanup commands caused by mgmt_set_powered(off) never
complete we should still force the adapter to be powered down. This is
rather easy to do since hdev->power_off is already a delayed work
struct. This patch schedules this delayed work if at least one HCI
command was sent by the cleanup procedure.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-27 23:41:07 -08:00
Marcel Holtmann
d2ab0ac18d Bluetooth: Add support for storing LE white list entries
The current LE white list entries require storing in the HCI controller
structure. So provide a storage and access functions for it. In addition
export the current list via debugfs.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-28 09:31:20 +02:00
Marcel Holtmann
d9a7b0a53f Bluetooth: Add definitions for LE white list HCI commands
Add the definitions for clearing the LE white list, adding entries to
the LE white list and removing entries from the LE white list.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-28 09:31:09 +02:00
Marcel Holtmann
c9507490ab Bluetooth: Make hci_blacklist_clear function static
The hci_blacklist_clear function is not used outside of hci_core.c and
can be made static.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-28 09:31:05 +02:00
Bjørn Mork
bc86195910 ipv6: addrconf: silence sparse endianness warnings
Avoid the following sparse __CHECK_ENDIAN__ warnings:

 include/net/addrconf.h:318:25: warning: restricted __be64 degrades to integer
 include/net/addrconf.h:318:70: warning: restricted __be64 degrades to integer
 include/net/addrconf.h:330:25: warning: restricted __be64 degrades to integer
 include/net/addrconf.h:330:70: warning: restricted __be64 degrades to integer
 include/net/addrconf.h:347:25: warning: restricted __be64 degrades to integer
 include/net/addrconf.h:348:26: warning: restricted __be64 degrades to integer
 include/net/addrconf.h:349:18: warning: restricted __be64 degrades to integer

The warnings are false but they make it harder to spot real
bugs.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-27 17:13:20 -05:00
David S. Miller
8e1f40ec77 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
This is the rework of the IPsec virtual tunnel interface
for ipv4 to support inter address family tunneling and
namespace crossing. The only change to the last RFC version
is a compile fix for an odd configuration where CONFIG_XFRM
is set but CONFIG_INET is not set.

1) Add and use a IPsec protocol multiplexer.

2) Add xfrm_tunnel_skb_cb to the skb common buffer
   to store a receive callback there.

3) Make vti work with i_key set by not including the i_key
   when comupting the hash for the tunnel lookup in case of
   vti tunnels.

4) Update ip_vti to use it's own receive hook.

5) Remove xfrm_tunnel_notifier, this is replaced by the IPsec
   protocol multiplexer.

6) We need to be protocol family indepenent, so use the on xfrm_lookup
   returned dst_entry instead of the ipv4 rtable in vti_tunnel_xmit().

7) Add support for inter address family tunneling.

8) Check if the tunnel endpoints of the xfrm state and the vti interface
   are matching and return an error otherwise.

8) Enable namespace crossing tor vti devices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-27 16:31:54 -05:00
David S. Miller
23187212e7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
1) Build fix for ip_vti when NET_IP_TUNNEL is not set.
   We need this set to have ip_tunnel_get_stats64()
   available.

2) Fix a NULL pointer dereference on sub policy usage.
   We try to access a xfrm_state from the wrong array.

3) Take xfrm_state_lock in xfrm_migrate_state_find(),
   we need it to traverse through the state lists.

4) Clone states properly on migration, otherwise we crash
   when we migrate a state with aead algorithm attached.

5) Fix unlink race when between thread context and timer
   when policies are deleted.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-27 16:19:41 -05:00
Johan Hedberg
a1f4c3188b Bluetooth: Add hci_copy_identity_address convenience function
The number of places needing the local Identity Address are starting to
grow so it's better to have a single place for the logic of determining
it. This patch adds a convenience function for getting the Identity
Address and updates the two current places needing this to use it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-27 08:50:21 -08:00
Johan Hedberg
56ed2cb88c Bluetooth: Add tracking of advertising address type
To know the real source address for incoming connections (needed e.g.
for SMP) we should store the own_address_type parameter that was used
for the last HCI_LE_Write_Advertising_Parameters command. This patch
adds a proper command complete handler for the command and stores the
address type in a new adv_addr_type variable in the hci_dev struct.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-27 08:50:21 -08:00
Pablo Neira Ayuso
0768b3b3d2 netfilter: nf_tables: add optional user data area to rules
This allows us to store user comment strings, but it could be also
used to store any kind of information that the user application needs
to link to the rule.

Scratch 8 bits for the new ulen field that indicates the length the
user data area. 4 bits from the handle (so it's 42 bits long, according
to Patrick, it would last 139 years with 1000 new rules per second)
and 4 bits from dlen (so the expression data area is 4K, which seems
sufficient by now even considering the compatibility layer).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
2014-02-27 16:56:00 +01:00
Andre Guedes
8ef30fd3d1 Bluetooth: Create hci_req_add_le_passive_scan helper
This patches creates the public hci_req_add_le_passive_scan helper so
it can be re-used outside hci_core.c in the next patch.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-26 19:41:35 -08:00
Andre Guedes
a9b0a04c2a Bluetooth: Connection parameters and resolvable address
We should only accept connection parameters from identity addresses
(public or random static). Thus, we should check the address type
in hci_conn_params_add().

Additionally, since the IRK is removed during unpair, we should also
remove the connection parameters from that device.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-26 19:41:35 -08:00
Andre Guedes
9fcb18ef3a Bluetooth: Introduce LE auto connect options
This patch introduces the LE auto connection options: HCI_AUTO_CONN_
ALWAYS and HCI_AUTO_CONN_LINK_LOSS. Their working mechanism are
described as follows:

The HCI_AUTO_CONN_ALWAYS option configures the kernel to always re-
establish the connection, no matter the reason the connection was
terminated. This feature is required by some LE profiles such as
HID over GATT, Health Thermometer and Blood Pressure. These profiles
require the host autonomously connect to the device as soon as it
enters in connectable mode (start advertising) so the device is able
to delivery notifications or indications.

The BT_AUTO_CONN_LINK_LOSS option configures the kernel to re-
establish the connection in case the connection was terminated due
to a link loss. This feature is required by the majority of LE
profiles such as Proximity, Find Me, Cycling Speed and Cadence and
Time.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-26 19:41:34 -08:00
Andre Guedes
a4790dbd43 Bluetooth: Introduce LE auto connection infrastructure
This patch introduces the LE auto connection infrastructure which
will be used to implement the LE auto connection options.

In summary, the auto connection mechanism works as follows: Once the
first pending LE connection is created, the background scanning is
started. When the target device is found in range, the kernel
autonomously starts the connection attempt. If connection is
established successfully, that pending LE connection is deleted and
the background is stopped.

To achieve that, this patch introduces the hci_update_background_scan()
which controls the background scanning state. This function starts or
stops the background scanning based on the hdev->pend_le_conns list. If
there is no pending LE connection, the background scanning is stopped.
Otherwise, we start the background scanning.

Then, every time a pending LE connection is added we call hci_update_
background_scan() so the background scanning is started (in case it is
not already running). Likewise, every time a pending LE connection is
deleted we call hci_update_background_scan() so the background scanning
is stopped (in case this was the last pending LE connection) or it is
started again (in case we have more pending LE connections). Finally,
we also call hci_update_background_scan() in hci_le_conn_failed() so
the background scan is restarted in case the connection establishment
fails. This way the background scanning keeps running until all pending
LE connection are established.

At this point, resolvable addresses are not support by this
infrastructure. The proper support is added in upcoming patches.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-26 19:41:34 -08:00
Andre Guedes
77a77a30ae Bluetooth: Introduce hdev->pend_le_conn list
This patch introduces the hdev->pend_le_conn list which holds the
device addresses the kernel should autonomously connect. It also
introduces some helper functions to manipulate the list.

The list and helper functions will be used by the next patch which
implements the LE auto connection infrastructure.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-26 19:41:34 -08:00
Andre Guedes
04a6c5898e Bluetooth: Refactor HCI connection code
hci_connect() is a very simple and useless wrapper of hci_connect_acl
and hci_connect_le functions. Addtionally, all places where hci_connect
is called the link type value is passed explicitly. This way, we can
safely delete hci_connect, declare hci_connect_acl and hci_connect_le
in hci_core.h and call them directly.

No functionality is changed by this patch.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-26 19:41:34 -08:00
Andre Guedes
2acf3d9066 Bluetooth: Stop scanning on LE connection
Some LE controllers don't support scanning and creating a connection
at the same time. So we should always stop scanning in order to
establish the connection.

Since we may prematurely stop the discovery procedure in favor of
the connection establishment, we should also cancel hdev->le_scan_
disable delayed work and set the discovery state to DISCOVERY_STOPPED.

This change does a small improvement since it is not mandatory the
user stops scanning before connecting anymore. Moreover, this change
is required by upcoming LE auto connection mechanism in order to work
properly with controllers that don't support background scanning and
connection establishment at the same time.

In future, we might want to do a small optimization by checking if
controller is able to scan and connect at the same time. For now,
we want the simplest approach so we always stop scanning (even if
the controller is able to carry out both operations).

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-26 19:41:33 -08:00
Andre Guedes
06c053fb54 Bluetooth: Declare le_conn_failed in hci_core.h
This patch adds the "hci_" prefix to le_conn_failed() helper and
declares it in hci_core.h so it can be reused in hci_event.c.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-26 19:41:33 -08:00
Andre Guedes
b1efcc2870 Bluetooth: Create hci_req_add_le_scan_disable helper
This patch moves stop LE scanning duplicate code to one single
place and reuses it. This will avoid more duplicate code in
upcoming patches.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-26 19:41:33 -08:00
Eric Dumazet
740b0f1841 tcp: switch rtt estimations to usec resolution
Upcoming congestion controls for TCP require usec resolution for RTT
estimations. Millisecond resolution is simply not enough these days.

FQ/pacing in DC environments also require this change for finer control
and removal of bimodal behavior due to the current hack in
tcp_update_pacing_rate() for 'small rtt'

TCP_CONG_RTT_STAMP is no longer needed.

As Julian Anastasov pointed out, we need to keep user compatibility :
tcp_metrics used to export RTT and RTTVAR in msec resolution,
so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
to use the new attributes if provided by the kernel.

In this example ss command displays a srtt of 32 usecs (10Gbit link)

lpk51:~# ./ss -i dst lpk52
Netid  State      Recv-Q Send-Q   Local Address:Port       Peer
Address:Port
tcp    ESTAB      0      1         10.246.11.51:42959
10.246.11.52:64614
         cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448
cwnd:10 send
3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559

Updated iproute2 ip command displays :

lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source
10.246.11.51

Old binary displays :

lpk51:~# ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source
10.246.11.51

With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-26 17:08:40 -05:00
Hannes Frederic Sowa
0b95227a7b ipv6: yet another new IPV6_MTU_DISCOVER option IPV6_PMTUDISC_OMIT
This option has the same semantic as IP_PMTUDISC_OMIT for IPv4 which
got recently introduced. It doesn't honor the path mtu discovered by the
host but in contrary to IPV6_PMTUDISC_INTERFACE allows the generation of
fragments if the packet size exceeds the MTU of the outgoing interface
MTU.

Fixes: 93b36cf342 ("ipv6: support IPV6_PMTU_INTERFACE on sockets")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-26 15:51:01 -05:00
Hannes Frederic Sowa
1b34657635 ipv4: yet another new IP_MTU_DISCOVER option IP_PMTUDISC_OMIT
IP_PMTUDISC_INTERFACE has a design error: because it does not allow the
generation of fragments if the interface mtu is exceeded, it is very
hard to make use of this option in already deployed name server software
for which I introduced this option.

This patch adds yet another new IP_MTU_DISCOVER option to not honor any
path mtu information and not accepting new icmp notifications destined for
the socket this option is enabled on. But we allow outgoing fragmentation
in case the packet size exceeds the outgoing interface mtu.

As such this new option can be used as a drop-in replacement for
IP_PMTUDISC_DONT, which is currently in use by most name server software
making the adoption of this option very smooth and easy.

The original advantage of IP_PMTUDISC_INTERFACE is still maintained:
ignoring incoming path MTU updates and not honoring discovered path MTUs
in the output path.

Fixes: 482fc6094a ("ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-26 15:51:00 -05:00
Janusz Dziedzic
31559f35c5 cfg80211: DFS get CAC time from regulatory database
Send Channel Availability Check time as a parameter
of start_radar_detection() callback.
Get CAC time from regulatory database.

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-25 17:32:54 +01:00
Janusz Dziedzic
089027e57c cfg80211: regulatory: allow getting DFS CAC time from userspace
Introduce DFS CAC time as a regd param, configured per REG_RULE and
set per channel in cfg80211. DFS CAC time is close connected with
regulatory database configuration. Instead of using hardcoded values,
get DFS CAC time form regulatory database. Pass DFS CAC time to user
mode (mainly for iw reg get, iw list, iw info). Allow setting DFS CAC
time via CRDA. Add support for internal regulatory database.

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
[rewrap commit log]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-25 17:29:25 +01:00
Patrick McHardy
67a8fc27cc netfilter: nf_tables: add nft_dereference() macro
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-25 11:29:23 +01:00
Steffen Klassert
9994bb8e1e xfrm4: Remove xfrm_tunnel_notifier
This was used from vti and is replaced by the IPsec protocol
multiplexer hooks. It is now unused, so remove it.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-25 07:04:18 +01:00
Steffen Klassert
70be6c91c8 xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer
IPsec vti_rcv needs to remind the tunnel pointer to
check it later at the vti_rcv_cb callback. So add
this pointer to the IPsec common buffer, initialize
it and check it to avoid transport state matching of
a tunneled packet.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-25 07:04:17 +01:00
Steffen Klassert
3328715e6c xfrm4: Add IPsec protocol multiplexer
This patch add an IPsec protocol multiplexer. With this
it is possible to add alternative protocol handlers as
needed for IPsec virtual tunnel interfaces.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-25 07:04:16 +01:00
David S. Miller
1f5a7407e4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1) Introduce skb_to_sgvec_nomark function to add further data to the sg list
   without calling sg_unmark_end first. Needed to add extended sequence
   number informations. From Fan Du.

2) Add IPsec extended sequence numbers support to the Authentication Header
   protocol for ipv4 and ipv6. From Fan Du.

3) Make the IPsec flowcache namespace aware, from Fan Du.

4) Avoid creating temporary SA for every packet when no key manager is
   registered. From Horia Geanta.

5) Support filtering of SA dumps to show only the SAs that match a
   given filter. From Nicolas Dichtel.

6) Remove caching of xfrm_policy_sk_bundles. The cached socket policy bundles
   are never used, instead we create a new cache entry whenever xfrm_lookup()
   is called on a socket policy. Most protocols cache the used routes to the
   socket, so this caching is not needed.

7)  Fix a forgotten SADB_X_EXT_FILTER length check in pfkey, from Nicolas
    Dichtel.

8) Cleanup error handling of xfrm_state_clone.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-24 18:13:33 -05:00
John W. Linville
db18014f65 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-02-24 15:04:35 -05:00
Johan Hedberg
12d4a3b2cc Bluetooth: Move check for MGMT_CONNECTED flag into mgmt.c
Once mgmt_set_powered(off) starts doing disconnections we'll need to
care about any disconnections in mgmt.c and not just those with the
MGMT_CONNECTED flag set. Therefore, move the check into mgmt.c from
hci_event.c.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-24 11:10:36 -08:00
Johan Hedberg
778b235a3b Bluetooth: Move HCI_ADVERTISING handling into mgmt.c
We'll soon need to make decisions on toggling the HCI_ADVERTISING flag
based on pending mgmt_set_powered commands. Therefore, move the handling
from hci_event.c into mgmt.c.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-24 11:10:36 -08:00
Johan Hedberg
f4f0750500 Bluetooth: Add convenience function for getting total connection count
This patch adds a convenience function to return the number of
connections in the conn_hash list. This will be useful once we update
the power off procedure to disconnect any open connections.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-24 11:10:35 -08:00
Marcel Holtmann
2b5224dca5 Bluetooth: Store current RPA and update it if needed
The RPA needs to be stored to know which is the current one. Otherwise
it is impossible to ensure that always the correct RPA can be programmed
into the controller when it is needed.

Current code checks if the address in the controller is a RPA, but that
can potentially lead to using a RPA that can not be resolved with the
IRK that has been distributed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-24 08:49:31 +02:00
Marcel Holtmann
94b1fc92cd Bluetooth: Use unresolvable private address for active scanning
When running active scanning during LE discovery, do not reveal the own
identity to the peer devices. In case LE privacy has been enabled, then
a resolvable private address is used. If the LE privacy option is off,
then use an unresolvable private address.

The public address or static random address is never used in active
scanning anymore. This ensures that scan request are send using a
random address.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-24 08:45:58 +02:00
Johan Hedberg
7bf32048b1 Bluetooth: Remove unneeded hdev->own_addr_type
Now that the identity address type is always looked up for all
successful connections, the hdev->own_addr_type variable has become
completely unnecessary. Simply remove it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-23 12:24:27 -08:00
Johan Hedberg
ebd3a74765 Bluetooth: Add hci_update_random_address() convenience function
This patch adds a convenience function for updating the local random
address which is needed before advertising, scanning and initiating LE
connections.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-23 12:24:25 -08:00
Johan Hedberg
d6bfd59cae Bluetooth: Add timer for regenerating local RPA
This patch adds a timer for updating the local RPA periodically. The
default timeout is set to 15 minutes.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-23 12:24:25 -08:00
Johan Hedberg
755a900fcd Bluetooth: Add mgmt defines for privacy
This patch adds basic mgmt defines for enabling privacy. This includes a
new setting flag as well as the Set Privacy command.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-22 09:59:24 -08:00
Johan Hedberg
863efaf224 Bluetooth: Add initial code for distributing local IRK
This code adds a HCI_PRIVACY flag to track whether Privacy support is
enabled (meaning we have a local IRK) and makes sure the IRK is
distributed during SMP key distribution in case this flag is set.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-22 09:59:23 -08:00
Eric Dumazet
f5ddcbbb40 net-tcp: fastopen: fix high order allocations
This patch fixes two bugs in fastopen :

1) The tcp_sendmsg(...,  @size) argument was ignored.

   Code was relying on user not fooling the kernel with iovec mismatches

2) When MTU is about 64KB, tcp_send_syn_data() attempts order-5
allocations, which are likely to fail when memory gets fragmented.

Fixes: 783237e8da ("net-tcp: Fast Open client - sending SYN-data")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Tested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-22 00:05:21 -05:00
Luciano Coelho
b80edbc177 cfg80211: docbook: add interface combinations documentation
Add the ieee80211_iface_limit and the ieee80211_iface_combination
structures to docbook.  Reformat the examples of combinations
slightly, so it looks a bit better on docbook.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-21 09:27:34 +01:00
Luciano Coelho
0fcf8ac5ac cfg80211: docbook: fix small formatting error
docbook (or one of its friends) gets confused with semi-colons in the
argument descriptions, causing it to think that the semi-colon is
marking a new section in the description of addr_mask in wiphy struct.

Prevent this by using hyphens instead of semi-colons in the mask
example.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-21 09:27:34 +01:00
Marcel Holtmann
3f959d46a6 Bluetooth: Provide option for changing LE advertising channel map
For testing purposes it is useful to provide an option to change the
advertising channel map. So add a debugfs option to allow this.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-21 06:20:59 +02:00
John W. Linville
88daf80dcc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-02-20 15:02:02 -05:00
Nicolas Dichtel
cf71d2bc0b sit: fix panic with route cache in ip tunnels
Bug introduced by commit 7d442fab0a ("ipv4: Cache dst in tunnels").

Because sit code does not call ip_tunnel_init(), the dst_cache was not
initialized.

CC: Tom Herbert <therbert@google.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-20 13:13:50 -05:00
Johannes Berg
37e3308cb2 mac80211: allow driver to return error from sched_scan_stop
In order to solve races with sched_scan_stop, it is necessary
for the driver to be able to return an error to propagate that
to cfg80211 so it doesn't send an event.

Reviewed-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-20 16:09:54 +01:00
Johannes Berg
d9b8396a52 cfg80211: document sched_scan_stop synchronous behaviour
Due to userspace assumptions, the sched_scan_stop operation must
be synchronous, i.e. once it returns a new scheduled scan must be
able to start immediately. Document this in the API.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-20 16:08:28 +01:00
Jiri Kosina
d4263348f7 Merge branch 'master' into for-next 2014-02-20 14:54:28 +01:00
Steffen Klassert
ee5c23176f xfrm: Clone states properly on migration
We loose a lot of information of the original state if we
clone it with xfrm_state_clone(). In particular, there is
no crypto algorithm attached if the original state uses
an aead algorithm. This patch add the missing information
to the clone state.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-20 14:30:10 +01:00
Sunil Dutt Undekari
df942e7ba7 cfg80211: Pass TDLS peer capability information in tdls_mgmt
While framing the TDLS Setup Confirmation frame, the driver needs to
know if the TDLS peer is VHT/HT/WMM capable and thus shall construct
the VHT/HT operation / WMM parameter elements accordingly. Supplicant
determines if the TDLS peer is VHT/HT/WMM capable based on the
presence of the respective IEs in the received TDLS Setup Response frame.

The host driver should not need to parse the received TDLS Response
frame and thus, should be able to rely on the supplicant to indicate
the capability of the peer through additional flags while transmitting
the TDLS Setup Confirmation frame through tdls_mgmt operations.

Signed-off-by: Sunil Dutt Undekari <usdutt@qti.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-20 11:55:25 +01:00
Marcel Holtmann
7a4cd51dec Bluetooth: Track the current configured random address
For Bluetooth controllers with LE support, track the value of the
currently configured random address. It is important to know what
the current random address is to avoid unneeded attempts to set
a new address. This will become important when introducing the
LE privacy support in the future.

In addition expose the current configured random address via
debugfs for debugging purposes.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-20 08:28:04 +02:00
Marcel Holtmann
b32bba6ced Bluetooth: Replace own_address_type with force_static_address debugfs
The own_address_type debugfs option does not providing enough
flexibity for interacting with the upcoming LE privacy support.

What really is needed is an option to force using the static address
compared to the public address. The new force_static_address debugfs
option does exactly that. In addition it is also only available when
the controller does actually have a public address. For single mode
LE only controllers this option will not be available.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-20 08:19:37 +02:00
Hannes Frederic Sowa
c8e6ad0829 ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg
In case we decide in udp6_sendmsg to send the packet down the ipv4
udp_sendmsg path because the destination is either of family AF_INET or
the destination is an ipv4 mapped ipv6 address, we don't honor the
maybe specified ipv4 mapped ipv6 address in IPV6_PKTINFO.

We simply can check for this option in ip_cmsg_send because no calls to
ipv6 module functions are needed to do so.

Reported-by: Gert Doering <gert@space.net>
Cc: Tore Anderson <tore@fud.no>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-19 16:28:42 -05:00
Johan Hedberg
95fbac8a8e Bluetooth: Add support for sending New IRK event
This patch adds the necessary helper function to send the New IRK mgmt
event and makes sure that the function is called at when SMP key
distribution has completed. The event is sent before the New LTK event
so user space knows which remote device to associate with the keys.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-19 08:04:24 -08:00
Johan Hedberg
35d702719d Bluetooth: Move SMP LTK notification after key distribution
This patch moves the SMP Long Term Key notification over mgmt from the
hci_add_ltk function to smp.c when both sides have completed their key
distribution. This way we are also able to update the identity address
into the mgmt_new_ltk event.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-19 08:04:24 -08:00
Johan Hedberg
ba74b666b5 Bluetooth: Move New LTK store hint evaluation into mgmt_new_ltk
It's simpler (one less if-statement) to just evaluate the appropriate
value for store_hint in the mgmt_new_ltk function than to pass a boolean
parameter to the function. Furthermore, this simplifies moving the mgmt
event emission out from hci_add_ltk in subsequent patches.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-19 08:04:23 -08:00
Johan Hedberg
ca9142b882 Bluetooth: Return added key when adding LTKs and IRKs
The SMP code will need to postpone the mgmt event emission for the IRK
and LTKs. To avoid extra lookups at the end of the key distribution
simply return the added value from the add_ltk and add_irk functions.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-19 08:04:23 -08:00
Masanari Iida
e227867f12 treewide: Fix typo in Documentation/DocBook
This patch fix spelling typo in Documentation/DocBook.
It is because .html and .xml files are generated by make htmldocs,
I have to fix a typo within the source files.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-02-19 14:58:17 +01:00
Florian Westphal
d2bf2f34cc netfilter: nft_ct: labels get support
This also adds NF_CT_LABELS_MAX_SIZE so it can be re-used
as BUILD_BUG_ON in nft_ct.

At this time, nft doesn't yet support writing to the label area;
when this changes the label->words handling needs to be moved
out of xt_connlabel.c into nf_conntrack_labels.c.

Also removes a useless run-time check: words cannot grow beyond
4 (32 bit) or 2 (64bit) since xt_connlabel enforces a maximum of
128 labels.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-19 11:41:25 +01:00
Steffen Klassert
1a1ccc96ab xfrm: Remove caching of xfrm_policy_sk_bundles
We currently cache socket policy bundles at xfrm_policy_sk_bundles.
These cached bundles are never used. Instead we create and cache
a new one whenever xfrm_lookup() is called on a socket policy.

Most protocols cache the used routes to the socket, so let's
remove the unused caching of socket policy bundles in xfrm.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-19 10:35:43 +01:00
David S. Miller
1e8d6421cf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_3ad.h
	drivers/net/bonding/bond_main.c

Two minor conflicts in bonding, both of which were overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-19 01:24:22 -05:00
Jiri Pirko
f7b12606b5 rtnl: make ifla_policy static
The only place this is used outside rtnetlink.c is veth. So provide
wrapper function for this usage.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-18 18:15:42 -05:00
Johan Hedberg
f4a407bef2 Bluetooth: Wait for SMP key distribution completion when pairing
When we initiate pairing through mgmt_pair_device the code has so far
been waiting for a successful HCI Encrypt Change event in order to
respond to the mgmt command. However, putting privacy into the play we
actually want the key distribution to be complete before replying so
that we can include the Identity Address in the mgmt response.

This patch updates the various hci_conn callbacks for LE in mgmt.c to
only respond in the case of failure, and adds a new mgmt_smp_complete
function that the SMP code will call once key distribution has been
completed.

Since the smp_chan_destroy function that's used to indicate completion
and clean up the SMP context can be called from various places,
including outside of smp.c, the easiest way to track failure vs success
is a new flag that we set once key distribution has been successfully
completed.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-18 11:48:55 -08:00
Johan Hedberg
387a33e304 Bluetooth: Fix updating Identity Address in L2CAP channels
When we receive a remote identity address during SMP key distribution we
should ensure that any associated L2CAP channel instances get their
address information correspondingly updated (so that e.g. doing
getpeername on associated sockets returns the correct address).

This patch adds a new L2CAP core function l2cap_conn_update_id_addr()
which is used to iterate through all L2CAP channels associated with a
connection and update their address information.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-18 11:48:55 -08:00
Johan Hedberg
2426f3a594 Bluetooth: Add convenience function for fetching IRKs
There are many situations where we need to check if an LE address is an
RPA and if so try to look up the IRK for it. To simplify such cases this
patch adds a convenience function for the job.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-18 08:58:21 -08:00
Johan Hedberg
a7ec73386c Bluetooth: Fix removing any IRKs when unpairing devices
When mgmt_unpair_device is called we should also remove any associated
IRKs. This patch adds a hci_remove_irk convenience function and ensures
that it's called when mgmt_unpair_device is called.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-18 08:58:20 -08:00
Johan Hedberg
35f7498a87 Bluetooth: Remove return values from functions that don't need them
There are many functions that never fail but still declare an integer
return value for no reason. This patch converts these functions to use a
void return value to avoid any confusion of whether they can fail or not.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-18 08:58:20 -08:00
Johan Hedberg
e0b2b27e62 Bluetooth: Fix missing address type check for removing LTKs
When removing Long Term Keys we should also be checking that the given
address type (public vs random) matches. This patch updates the
hci_remove_ltk function to take an extra parameter and uses it for
address type matching.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-18 08:58:20 -08:00
Johan Hedberg
41edf1601a Bluetooth: Implement mgmt_load_irks command
This patch implements the Load IRKs command for the management
interface. The command is used to load the kernel with the initial set
of IRKs. It also sets a HCI_RPA_RESOLVING flag to indicate that we can
start requesting devices to distribute their IRK to us.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-18 00:47:03 -08:00
Johan Hedberg
301cb2d85e Bluetooth: Add hci_bdaddr_is_rpa convenience function
When implementing support for Resolvable Private Addresses (RPAs) we'll
need to in several places be able to identify such addresses. This patch
adds a simple convenience function to do the identification of the
address type.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-18 00:47:03 -08:00
Johan Hedberg
970c4e4603 Bluetooth: Add basic IRK management support
This patch adds the initial IRK storage and management functions to the
HCI core. This includes storing a list of IRKs per HCI device and the
ability to add, remove and lookup entries in that list.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-18 00:47:03 -08:00
Johan Hedberg
99780a7b63 Bluetooth: Add AES crypto context for each HCI device
Previously the crypto context has only been available for LE SMP
sessions, but now that we'll need to perform operations also during
discovery it makes sense to have this context part of the hci_dev
struct. Later, the context can be removed from the SMP context.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-18 00:47:02 -08:00
Phoebe Buckheister
4244db1b0b ieee802154: add netlink APIs for smartMAC configuration
Introduce new netlink attributes for SET_PHY_ATTRS:
 * CSMA minimal backoff exponent
 * CSMA maximal backoff exponent
 * CSMA retry limit
 * frame retransmission limit

The CSMA attributes shall correspond to minBE, maxBE and maxCSMABackoffs of
802.15.4, respectively. The frame retransmission shall correspond to
maxFrameRetries of 802.15.4, unless given as -1: then the old behaviour
of the stack shall apply. For RF2xy, the old behaviour is to not do
channel sensing at all and simply send *right now*, which is not
intended behaviour for most applications and actually prohibited for
some channel/page combinations.

For all values except frame retransmission limit, the defaults of
802.15.4 apply. Frame retransmission limits are set to -1 to indicate
backward-compatible behaviour.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 16:42:39 -05:00
Phoebe Buckheister
6ca001978d ieee802154: add support for setting CCA energy detection levels
Since three of the four clear channel assesment modes make use of energy
detection, provide an API to set the energy detection threshold.
Driver support for this is available in at86rf230 for the RF212 chips.
Since for these chips the minimal energy detection threshold depends on
page and channel used, add a field to struct at86rf230_local that stores
the minimal threshold. Actual ED thresholds are configured as offsets
from this value.

For RF212, setting the ED threshold will not work before a channel/page
has been set due to the dependency of energy detection in the chip and
the actual channel/page selected.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 16:42:38 -05:00
Phoebe Buckheister
ba08fea53a ieee802154: add support for CCA mode in wpan phys
The standard describes four modes of clear channel assesment: "energy
above threshold", "carrier found", and the logical and/or of these two.
Support for CCA mode setting is included in the at86rf230 driver,
predicated for RF212 chips.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 16:42:38 -05:00
Phoebe Buckheister
84dda3c648 ieee802154: add support for listen-before-talk in wpan_phy
Listen-before-talk is an alternative to CSMA in uncoordinated networks
and prescribed by european regulations if one wants to have a device
with radio duty cycles above 10% (or less in some bands). Add a phy
property to enable/disable LBT in the phy, including support in the
at86rf230 driver for RF212 chips.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 16:42:38 -05:00
Phoebe Buckheister
9b2777d608 ieee802154: add TX power control to wpan_phy
Replace the current u8 transmit_power in wpan_phy with s8 transmit_power.
The u8 field contained the actual tx power and a tolerance field,
which no physical radio every used. Adjust sysfs entries to keep
compatibility with userspace, give tolerances of +-1dB statically there.

This patch only adds support for this in the at86rf230 driver and the
RF212 chip. Configuration calculation for RF212 is also somewhat basic,
but does the job - the RF212 datasheet gives a large table with
suggested values for combinations of TX power and page/channel, if this
does not work well, we might have to copy the whole table.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 16:42:38 -05:00
Nicolas Dichtel
d3623099d3 ipsec: add support of limited SA dump
The goal of this patch is to allow userland to dump only a part of SA by
specifying a filter during the dump.
The kernel is in charge to filter SA, this avoids to generate useless netlink
traffic (it save also some cpu cycles). This is particularly useful when there
is a big number of SA set on the system.

Note that I removed the union in struct xfrm_state_walk to fix a problem on arm.
struct netlink_callback->args is defined as a array of 6 long and the first long
is used in xfrm code to flag the cb as initialized. Hence, we must have:
sizeof(struct xfrm_state_walk) <= sizeof(long) * 5.
With the union, it was false on arm (sizeof(struct xfrm_state_walk) was
sizeof(long) * 7), due to the padding.
In fact, whatever the arch is, this union seems useless, there will be always
padding after it. Removing it will not increase the size of this struct (and
reduce it on arm).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-17 07:18:19 +01:00
Matija Glavinic Pecotic
ef2820a735 net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer
Implementation of (a)rwnd calculation might lead to severe performance issues
and associations completely stalling. These problems are described and solution
is proposed which improves lksctp's robustness in congestion state.

1) Sudden drop of a_rwnd and incomplete window recovery afterwards

Data accounted in sctp_assoc_rwnd_decrease takes only payload size (sctp data),
but size of sk_buff, which is blamed against receiver buffer, is not accounted
in rwnd. Theoretically, this should not be the problem as actual size of buffer
is double the amount requested on the socket (SO_RECVBUF). Problem here is
that this will have bad scaling for data which is less then sizeof sk_buff.
E.g. in 4G (LTE) networks, link interfacing radio side will have a large portion
of traffic of this size (less then 100B).

An example of sudden drop and incomplete window recovery is given below. Node B
exhibits problematic behavior. Node A initiates association and B is configured
to advertise rwnd of 10000. A sends messages of size 43B (size of typical sctp
message in 4G (LTE) network). On B data is left in buffer by not reading socket
in userspace.

Lets examine when we will hit pressure state and declare rwnd to be 0 for
scenario with above stated parameters (rwnd == 10000, chunk size == 43, each
chunk is sent in separate sctp packet)

Logic is implemented in sctp_assoc_rwnd_decrease:

socket_buffer (see below) is maximum size which can be held in socket buffer
(sk_rcvbuf). current_alloced is amount of data currently allocated (rx_count)

A simple expression is given for which it will be examined after how many
packets for above stated parameters we enter pressure state:

We start by condition which has to be met in order to enter pressure state:

	socket_buffer < currently_alloced;

currently_alloced is represented as size of sctp packets received so far and not
yet delivered to userspace. x is the number of chunks/packets (since there is no
bundling, and each chunk is delivered in separate packet, we can observe each
chunk also as sctp packet, and what is important here, having its own sk_buff):

	socket_buffer < x*each_sctp_packet;

each_sctp_packet is sctp chunk size + sizeof(struct sk_buff). socket_buffer is
twice the amount of initially requested size of socket buffer, which is in case
of sctp, twice the a_rwnd requested:

	2*rwnd < x*(payload+sizeof(struc sk_buff));

sizeof(struct sk_buff) is 190 (3.13.0-rc4+). Above is stated that rwnd is 10000
and each payload size is 43

	20000 < x(43+190);

	x > 20000/233;

	x ~> 84;

After ~84 messages, pressure state is entered and 0 rwnd is advertised while
received 84*43B ~= 3612B sctp data. This is why external observer notices sudden
drop from 6474 to 0, as it will be now shown in example:

IP A.34340 > B.12345: sctp (1) [INIT] [init tag: 1875509148] [rwnd: 81920] [OS: 10] [MIS: 65535] [init TSN: 1096057017]
IP B.12345 > A.34340: sctp (1) [INIT ACK] [init tag: 3198966556] [rwnd: 10000] [OS: 10] [MIS: 10] [init TSN: 902132839]
IP A.34340 > B.12345: sctp (1) [COOKIE ECHO]
IP B.12345 > A.34340: sctp (1) [COOKIE ACK]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057017] [SID: 0] [SSEQ 0] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057017] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057018] [SID: 0] [SSEQ 1] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057018] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057019] [SID: 0] [SSEQ 2] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057019] [a_rwnd 9914] [#gap acks 0] [#dup tsns 0]
<...>
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057098] [SID: 0] [SSEQ 81] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057098] [a_rwnd 6517] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057099] [SID: 0] [SSEQ 82] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057099] [a_rwnd 6474] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057100] [SID: 0] [SSEQ 83] [PPID 0x18]

--> Sudden drop

IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

At this point, rwnd_press stores current rwnd value so it can be later restored
in sctp_assoc_rwnd_increase. This however doesn't happen as condition to start
slowly increasing rwnd until rwnd_press is returned to rwnd is never met. This
condition is not met since rwnd, after it hit 0, must first reach rwnd_press by
adding amount which is read from userspace. Let us observe values in above
example. Initial a_rwnd is 10000, pressure was hit when rwnd was ~6500 and the
amount of actual sctp data currently waiting to be delivered to userspace
is ~3500. When userspace starts to read, sctp_assoc_rwnd_increase will be blamed
only for sctp data, which is ~3500. Condition is never met, and when userspace
reads all data, rwnd stays on 3569.

IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 1505] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 3010] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057101] [SID: 0] [SSEQ 84] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057101] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]

--> At this point userspace read everything, rwnd recovered only to 3569

IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057102] [SID: 0] [SSEQ 85] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057102] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]

Reproduction is straight forward, it is enough for sender to send packets of
size less then sizeof(struct sk_buff) and receiver keeping them in its buffers.

2) Minute size window for associations sharing the same socket buffer

In case multiple associations share the same socket, and same socket buffer
(sctp.rcvbuf_policy == 0), different scenarios exist in which congestion on one
of the associations can permanently drop rwnd of other association(s).

Situation will be typically observed as one association suddenly having rwnd
dropped to size of last packet received and never recovering beyond that point.
Different scenarios will lead to it, but all have in common that one of the
associations (let it be association from 1)) nearly depleted socket buffer, and
the other association blames socket buffer just for the amount enough to start
the pressure. This association will enter pressure state, set rwnd_press and
announce 0 rwnd.
When data is read by userspace, similar situation as in 1) will occur, rwnd will
increase just for the size read by userspace but rwnd_press will be high enough
so that association doesn't have enough credit to reach rwnd_press and restore
to previous state. This case is special case of 1), being worse as there is, in
the worst case, only one packet in buffer for which size rwnd will be increased.
Consequence is association which has very low maximum rwnd ('minute size', in
our case down to 43B - size of packet which caused pressure) and as such
unusable.

Scenario happened in the field and labs frequently after congestion state (link
breaks, different probabilities of packet drop, packet reordering) and with
scenario 1) preceding. Here is given a deterministic scenario for reproduction:

>From node A establish two associations on the same socket, with rcvbuf_policy
being set to share one common buffer (sctp.rcvbuf_policy == 0). On association 1
repeat scenario from 1), that is, bring it down to 0 and restore up. Observe
scenario 1). Use small payload size (here we use 43). Once rwnd is 'recovered',
bring it down close to 0, as in just one more packet would close it. This has as
a consequence that association number 2 is able to receive (at least) one more
packet which will bring it in pressure state. E.g. if association 2 had rwnd of
10000, packet received was 43, and we enter at this point into pressure,
rwnd_press will have 9957. Once payload is delivered to userspace, rwnd will
increase for 43, but conditions to restore rwnd to original state, just as in
1), will never be satisfied.

--> Association 1, between A.y and B.12345

IP A.55915 > B.12345: sctp (1) [INIT] [init tag: 836880897] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 4032536569]
IP B.12345 > A.55915: sctp (1) [INIT ACK] [init tag: 2873310749] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3799315613]
IP A.55915 > B.12345: sctp (1) [COOKIE ECHO]
IP B.12345 > A.55915: sctp (1) [COOKIE ACK]

--> Association 2, between A.z and B.12346

IP A.55915 > B.12346: sctp (1) [INIT] [init tag: 534798321] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 2099285173]
IP B.12346 > A.55915: sctp (1) [INIT ACK] [init tag: 516668823] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3676403240]
IP A.55915 > B.12346: sctp (1) [COOKIE ECHO]
IP B.12346 > A.55915: sctp (1) [COOKIE ACK]

--> Deplete socket buffer by sending messages of size 43B over association 1

IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315613] [SID: 0] [SSEQ 0] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315613] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]

<...>

IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315696] [a_rwnd 6388] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315697] [SID: 0] [SSEQ 84] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315697] [a_rwnd 6345] [#gap acks 0] [#dup tsns 0]

--> Sudden drop on 1

IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315698] [SID: 0] [SSEQ 85] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315698] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

--> Here userspace read, rwnd 'recovered' to 3698, now deplete again using
    association 1 so there is place in buffer for only one more packet

IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315799] [SID: 0] [SSEQ 186] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315799] [a_rwnd 86] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315800] [SID: 0] [SSEQ 187] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]

--> Socket buffer is almost depleted, but there is space for one more packet,
    send them over association 2, size 43B

IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403240] [SID: 0] [SSEQ 0] [PPID 0x18]
IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403240] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

--> Immediate drop

IP A.60995 > B.12346: sctp (1) [SACK] [cum ack 387491510] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

--> Read everything from the socket, both association recover up to maximum rwnd
    they are capable of reaching, note that association 1 recovered up to 3698,
    and association 2 recovered only to 43

IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 1548] [#gap acks 0] [#dup tsns 0]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 3053] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315801] [SID: 0] [SSEQ 188] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315801] [a_rwnd 3698] [#gap acks 0] [#dup tsns 0]
IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403241] [SID: 0] [SSEQ 1] [PPID 0x18]
IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403241] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]

A careful reader might wonder why it is necessary to reproduce 1) prior
reproduction of 2). It is simply easier to observe when to send packet over
association 2 which will push association into the pressure state.

Proposed solution:

Both problems share the same root cause, and that is improper scaling of socket
buffer with rwnd. Solution in which sizeof(sk_buff) is taken into concern while
calculating rwnd is not possible due to fact that there is no linear
relationship between amount of data blamed in increase/decrease with IP packet
in which payload arrived. Even in case such solution would be followed,
complexity of the code would increase. Due to nature of current rwnd handling,
slow increase (in sctp_assoc_rwnd_increase) of rwnd after pressure state is
entered is rationale, but it gives false representation to the sender of current
buffer space. Furthermore, it implements additional congestion control mechanism
which is defined on implementation, and not on standard basis.

Proposed solution simplifies whole algorithm having on mind definition from rfc:

o  Receiver Window (rwnd): This gives the sender an indication of the space
   available in the receiver's inbound buffer.

Core of the proposed solution is given with these lines:

sctp_assoc_rwnd_update:
	if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
		asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
	else
		asoc->rwnd = 0;

We advertise to sender (half of) actual space we have. Half is in the braces
depending whether you would like to observe size of socket buffer as SO_RECVBUF
or twice the amount, i.e. size is the one visible from userspace, that is,
from kernelspace.
In this way sender is given with good approximation of our buffer space,
regardless of the buffer policy - we always advertise what we have. Proposed
solution fixes described problems and removes necessity for rwnd restoration
algorithm. Finally, as proposed solution is simplification, some lines of code,
along with some bytes in struct sctp_association are saved.

Version 2 of the patch addressed comments from Vlad. Name of the function is set
to be more descriptive, and two parts of code are changed, in one removing the
superfluous call to sctp_assoc_rwnd_update since call would not result in update
of rwnd, and the other being reordering of the code in a way that call to
sctp_assoc_rwnd_update updates rwnd. Version 3 corrected change introduced in v2
in a way that existing function is not reordered/copied in line, but it is
correctly called. Thanks Vlad for suggesting.

Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 00:16:56 -05:00
Thierry Escande
12e3d241e4 NFC: digital: Add poll support for type 4A tag platform
This adds support for ATS request and response handling for type 4A tag
activation.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-02-16 23:49:54 +01:00
Mark A. Greer
e487e4dc2e NFC: Add ISO/IEC 15693 header definitions
Add the header definitions required by upcoming
patches that add support for ISO/IEC 15693.

Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-02-16 23:49:53 +01:00
Peter Hurley
72e5108c6d Bluetooth: Don't fail RFCOMM tty writes
The tty driver api design prefers no-fail writes if the driver
write_room() method has previously indicated space is available
to accept writes. Since this is trivially possible for the
RFCOMM tty driver, do so.

Introduce rfcomm_dlc_send_noerror(), which queues but does not
schedule the krfcomm thread if the dlc is not yet connected
(and thus does not error based on the connection state).
The mtu size test is also unnecessary since the caller already
chunks the written data into mtu size.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-By: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-14 13:39:32 -08:00
Peter Hurley
c10a848cea Bluetooth: Verify dlci not in use before rfcomm_dev create
Only one session/channel combination may be in use at any one
time. However, the failure does not occur until the tty is
opened (in rfcomm_dlc_open()).

Because these settings are actually bound at rfcomm device
creation (via RFCOMMCREATEDEV ioctl), validate and fail before
creating the rfcomm tty device.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-By: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-14 13:39:30 -08:00
Peter Hurley
80ea73378a Bluetooth: Fix unreleased rfcomm_dev reference
When RFCOMM_RELEASE_ONHUP is set, the rfcomm tty driver 'takes over'
the initial rfcomm_dev reference created by the RFCOMMCREATEDEV ioctl.
The assumption is that the rfcomm tty driver will release the
rfcomm_dev reference when the tty is freed (in rfcomm_tty_cleanup()).
However, if the tty is never opened, the 'take over' never occurs,
so when RFCOMMRELEASEDEV ioctl is called, the reference is not
released.

Track the state of the reference 'take over' so that the release
is guaranteed by either the RFCOMMRELEASEDEV ioctl or the rfcomm tty
driver.

Note that the synchronous hangup in rfcomm_release_dev() ensures
that rfcomm_tty_install() cannot race with the RFCOMMRELEASEDEV ioctl.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-By: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-14 13:39:29 -08:00
Peter Hurley
1c64834e06 Bluetooth: Release rfcomm_dev only once
No logic prevents an rfcomm_dev from being released multiple
times. For example, if the rfcomm_dev ref count is large due
to pending tx, then multiple RFCOMMRELEASEDEV ioctls may
mistakenly release the rfcomm_dev too many times. Note that
concurrent ioctls are not required to create this condition.

Introduce RFCOMM_DEV_RELEASED status bit which guarantees the
rfcomm_dev can only be released once.

NB: Since the flags are exported to userspace, introduce the status
field to track state for which userspace should not be aware.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-By: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-14 13:39:29 -08:00
Johan Hedberg
9b7655eafe Bluetooth: Enable LE L2CAP CoC support by default
Now that the LE L2CAP Connection Oriented Channel support has undergone a
decent amount of testing we can make it officially supported. This patch
removes the enable_lecoc module parameter which was previously needed to
enable support for LE L2CAP CoC.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-14 13:39:12 -08:00
Stanislav Fomichev
45f7435968 tcp: remove unused min_cwnd member of tcp_congestion_ops
Commit 684bad1107 "tcp: use PRR to reduce cwin in CWR state" removed all
calls to min_cwnd, so we can safely remove it.
Also, remove tcp_reno_min_cwnd because it was only used for min_cwnd.

Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 18:22:34 -05:00
Johannes Berg
d85dad7556 mac80211: remove erroneous comment about RX radiotap header
There's no way the driver can pre-build the radiotap header,
so remove the comment stating that it can.

Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-13 13:27:42 +01:00
Andre Guedes
15819a7065 Bluetooth: Introduce connection parameters list
This patch adds to hdev the connection parameters list (hdev->le_
conn_params). The elements from this list (struct hci_conn_params)
contains the connection parameters (for now, minimum and maximum
connection interval) that should be used during the connection
establishment.

Moreover, this patch adds helper functions to manipulate hdev->le_
conn_params list. Some of these functions are also declared in
hci_core.h since they will be used outside hci_core.c in upcoming
patches.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-13 09:51:44 +02:00
Marcel Holtmann
424ef94311 Bluetooth: Add constants for LTK key types
The LTK key types available right now are unauthenticated and
authenticated ones. Provide two simple constants for it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:44 +02:00
Marcel Holtmann
03c515d748 Bluetooth: Remove __packed from struct smp_ltk
The struct smp_ltk does not need to be packed and so remove __packed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:43 +02:00
Marcel Holtmann
d40f3eef0b Bluetooth: Rename authentication to key_type in mgmt_ltk_info
The field is not a boolean, it is actually a field for a key type. So
name it properly.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:43 +02:00
Marcel Holtmann
abf76bad8f Bluetooth: Track the AES-CCM encryption status of LE and BR/EDR links
When encryption for LE links has been enabled, it will always be use
AES-CCM encryption. In case of BR/EDR Secure Connections, the link
will also use AES-CCM encryption. In both cases track the AES-CCM
status in the connection flags.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:42 +02:00
Marcel Holtmann
4e39ac8136 Bluetooth: Add management command to allow use of debug keys
Originally allowing the use of debug keys was done via the Load Link
Keys management command. However this is BR/EDR specific and to be
flexible and allow extending this to LE as well, make this an independent
command.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:42 +02:00
Marcel Holtmann
b1de97d8c0 Bluetooth: Add management setting for use of debug keys
When the controller has been enabled to allow usage of debug keys, then
clearly identify that in the current settings information.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:41 +02:00
Andre Guedes
5c136e90a4 Bluetooth: Group list_head fields from strcut hci_dev together
This patch groups the list_head fields from struct hci_dev together
and removes empty lines between them.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-13 09:51:41 +02:00
Andre Guedes
1e406eefbe Bluetooth: Save connection interval parameters in hci_conn
This patch creates two new fields in struct hci_conn to save the
minimum and maximum connection interval values used to establish
the connection this object represents.

This change is required in order to know what parameters the
connection is currently using.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-13 09:51:41 +02:00
Johan Hedberg
98a0b845c6 Bluetooth: Fix differentiating stored master vs slave LTK types
If LTK distribution happens in both directions we will have two LTKs for
the same remote device: one which is used when we're connecting as
master and another when we're connecting as slave. When looking up LTKs
from the locally stored list we shouldn't blindly return the first match
but also consider which type of key is in question. If we do not do this
we may end up selecting an incorrect encryption key for a connection.

This patch fixes the issue by always specifying to the LTK lookup
functions whether we're looking for a master or a slave key.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-13 09:51:41 +02:00
Johan Hedberg
2338a7e044 Bluetooth: Rename L2CAP_CHAN_CONN_FIX_A2MP to L2CAP_CHAN_FIXED
There's no reason why A2MP should need or deserve its on channel type.
Instead we should be able to group all fixed CID users under a single
channel type and reuse as much code as possible for them. Where CID
specific exceptions are needed the chan-scid value can be used.

This patch renames the current A2MP channel type to a generic one and
thereby paves the way to allow converting ATT and SMP (and any future
fixed channel protocols) to use the new channel type.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-13 09:51:37 +02:00
Johan Hedberg
61a939c68e Bluetooth: Queue incoming ACL data until BT_CONNECTED state is reached
This patch adds a queue for incoming L2CAP data that's received before
l2cap_connect_cfm is called and processes the data once
l2cap_connect_cfm is called. This way we ensure that we have e.g. all
remote features before processing L2CAP signaling data (which is very
important for making the correct security decisions).

The processing of the pending rx data needs to be done through
queue_work since unlike l2cap_recv_acldata, l2cap_connect_cfm is called
with the hci_dev lock held which could cause potential deadlocks.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-02-13 09:51:36 +02:00
Marcel Holtmann
134c2a89af Bluetooth: Add debugfs entry to show Secure Connections Only mode
For debugging purposes of Secure Connection Only support a simple
debugfs entry is used to indicate if this mode is active or not.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:35 +02:00
Marcel Holtmann
2c068e0b92 Bluetooth: Handle security level 4 for RFCOMM connections
With the introduction of security level 4, the RFCOMM sockets need to
be made aware of this new level. This change ensures that the pairing
requirements are set correctly for these connections.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:35 +02:00
Marcel Holtmann
7d513e9243 Bluetooth: Handle security level 4 for L2CAP connections
With the introduction of security level 4, the L2CAP sockets need to
be made aware of this new level. This change ensures that the pairing
requirements are set correctly for these connections.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:35 +02:00
Marcel Holtmann
7b5a9241b7 Bluetooth: Introduce requirements for security level 4
The security level 4 is a new strong security requirement that is based
around 128-bit equivalent strength for link and encryption keys required
using FIPS approved algorithms. Which means that E0, SAFER+ and P-192
are not allowed. Only connections created with P-256 resulting from
using Secure Connections support are allowed.

This security level needs to be enforced when Secure Connection Only
mode is enabled for a controller or a service requires FIPS compliant
strong security. Currently it is not possible to enable either of
these two cases. This patch just puts in the foundation for being
able to handle security level 4 in the future.

It should be noted that devices or services with security level 4
requirement can only communicate using Bluetooth 4.1 controllers
with support for Secure Connections. There is no backward compatibilty
if used with older hardware.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:35 +02:00
Marcel Holtmann
eb9a8f3fb6 Bluetooth: Track Secure Connections support of remote devices
It is important to know if Secure Connections support has been enabled
for a given remote device. The information is provided in the remote
host features page. So track this information and provide a simple
helper function to extract the status.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:35 +02:00
Marcel Holtmann
ec1091131f Bluetooth: Add support for remote OOB input of P-256 data
The current management interface only allows to provide the remote
OOB input of P-192 data. This extends the command to also accept
P-256 data as well. To make this backwards compatible, the userspace
can decide to only provide P-192 data or the combined P-192 and P-256
data. It is also allowed to leave the P-192 data empty if userspace
only has the remote P-256 data.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:34 +02:00
Marcel Holtmann
0798872ef1 Bluetooth: Add internal function for storing P-192 and P-256 data
Add function to allow adding P-192 and P-256 data to the internal
storage. This also fixes a few coding style issues from the previous
helper functions for the out-of-band credentials storage.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:34 +02:00
Marcel Holtmann
519ca9d017 Bluetooth: Provide remote OOB data for Secure Connections
When Secure Connections has been enabled it is possible to provide P-192
and/or P-256 data during the pairing process. The internal out-of-band
credentials storage has been extended to also hold P-256 data.

Initially the P-256 data will be empty and with Secure Connections enabled
no P-256 data will be provided. This is according to the specification
since it might be possible that the remote side did not provide either
of the out-of-band credentials.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:33 +02:00
Marcel Holtmann
5afeac149e Bluetooth: Add debugfs quirk for forcing Secure Connections support
The Bluetooth 4.1 specification with Secure Connections support has
just been released and controllers with this feature are still in
an early stage.

A handful of controllers have already support for it, but they do
not always identify this feature correctly. This debugfs entry
allows to tell the kernel that the controller can be treated as
it would fully support Secure Connections.

Using debugfs to force Secure Connections support of course does
not make this feature magically appear in all controllers. This
is a debug functionality for early adopters. Once the majority
of controllers matures this quirk will be removed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:33 +02:00
Marcel Holtmann
4d2d279626 Bluetooth: Add support for local OOB data with Secure Connections
For Secure Connections support and the usage of out-of-band pairing,
it is needed to read the P-256 hash and randomizer or P-192 hash and
randomizer. This change will read P-192 data when Secure Connections
is disabled and P-192 and P-256 data when it is enabled.

The difference is between using HCI Read Local OOB Data and using the
new HCI Read Local OOB Extended Data command. The first one has been
introduced with Bluetooth 2.1 and returns only the P-192 data.

< HCI Command: Read Local OOB Data (0x03|0x0057) plen 0
> HCI Event: Command Complete (0x0e) plen 36
      Read Local OOB Data (0x03|0x0057) ncmd 1
        Status: Success (0x00)
        Hash C from P-192: 975a59baa1c4eee391477cb410b23e6d
        Randomizer R with P-192: 9ee63b7dec411d3b467c5ae446df7f7d

The second command has been introduced with Bluetooth 4.1 and will
return P-192 and P-256 data.

< HCI Command: Read Local OOB Extended Data (0x03|0x007d) plen 0
> HCI Event: Command Complete (0x0e) plen 68
      Read Local OOB Extended Data (0x03|0x007d) ncmd 1
        Status: Success (0x00)
        Hash C from P-192: 6489731804b156fa6355efb8124a1389
        Randomizer R with P-192: 4781d5352fb215b2958222b3937b6026
        Hash C from P-256: 69ef8a928b9d07fc149e630e74ecb991
        Randomizer R with P-256: 4781d5352fb215b2958222b3937b6026

The change for the management interface is transparent and no change
is required for existing userspace. The Secure Connections feature
needs to be manually enabled. When it is disabled, then userspace
only gets the P-192 returned and with Secure Connections enabled,
userspace gets P-192 and P-256 in an extended structure.

It is also acceptable to just ignore the P-256 data since it is not
required to support them. The pairing with out-of-band credentials
will still succeed. However then of course no Secure Connection will
b established.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:33 +02:00
Marcel Holtmann
eac83dc632 Bluetooth: Add management command for enabling Secure Connections
The support for Secure Connections need to be explicitly enabled by
userspace. This is required since only userspace that can handle the
new link key types should enable support for Secure Connections.

This command handling is similar to how Secure Simple Pairing enabling
is done. It also tracks the case when Secure Connections support is
enabled via raw HCI commands. This makes sure that the host features
page is updated as well.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:32 +02:00
Marcel Holtmann
e98d2ce293 Bluetooth: Add flags and setting for Secure Connections support
The MGMT_SETTING_SECURE_CONN setting is used to track the support and
status for Secure Connections from the management interface. For HCI
based tracking HCI_SC_ENABLED flag is used.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:32 +02:00
Marcel Holtmann
11015c7903 Bluetooth: Add definitions for new link key types
With the introduction of Secure Connections, the list of link key types
got extended by P-256 versions of authenticated and unauthenticated
link keys.

To avoid any confusion the previous authenticated and unauthenticated
link key types got ammended with a P912 postfix. And the two new keys
have a P256 postfix now. Existing code using the previous definitions
has been adjusted.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:31 +02:00
Marcel Holtmann
e2f9913157 Bluetooth: Add HCI command definition for extended OOB data
The Secure Connections feature introduces the support for P-256 strength
pairings (compared to P-192 with Secure Simple Pairing). This however
means that for out-of-band pairing the hash and randomizer needs to be
differentiated. Two new commands are introduced to handle the possible
combinations of P-192 and P-256. This add the HCI command definition
for both.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:31 +02:00
Marcel Holtmann
eb4b95c627 Bluetooth: Add HCI command definition for Secure Connections enabling
The Secure Connections feature is optional and host stacks have to
manually enable it. This add the HCI command definiton for reading
and writing this setting.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:31 +02:00
Marcel Holtmann
d5991585d0 Bluetooth: Add LMP feature definitions for Secure Connections support
The support for Secure Connections introduces two new controller
features and one new host feature.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:31 +02:00
Horia Geanta
0f24558e91 xfrm: avoid creating temporary SA when there are no listeners
In the case when KMs have no listeners, km_query() will fail and
temporary SAs are garbage collected immediately after their allocation.
This causes strain on memory allocation, leading even to OOM since
temporary SA alloc/free cycle is performed for every packet
and garbage collection does not keep up the pace.

The sane thing to do is to make sure we have audience before
temporary SA allocation.

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-13 07:40:30 +01:00
WANG Cong
55334a5db5 net_sched: act: refuse to remove bound action outside
When an action is bonnd to a filter, there is no point to
remove it outside. Currently we just silently decrease the refcnt,
we should reject this explicitly with EPERM.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 19:23:32 -05:00
WANG Cong
4f1e9d8949 net_sched: act: move tcf_hashinfo_init() into tcf_register_action()
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 19:23:32 -05:00
WANG Cong
a5b5c958ff net_sched: act: refactor cleanup ops
For bindcnt and refcnt etc., they are common for all actions,
not need to repeat such operations for their own, they can be unified
now. Actions just need to do its specific cleanup if needed.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 19:23:32 -05:00
WANG Cong
86062033fe net_sched: act: hide struct tcf_common from API
Now we can totally hide it from modules. tcf_hash_*() API's
will operate on struct tc_action, modules don't need to care about
the details.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 19:23:32 -05:00
Johannes Berg
06d181a8fd mac80211: add NAPI support back
NAPI was originally added to mac80211 a long time ago (by John in
commit 4e6cbfd09c in July 2010), but then removed years later
(by Stanislaw in commit 30c97120c6 in February 2013). No driver
ever used it, so that was fine.

Now I'm adding support for NAPI to our driver, so add some code
to mac80211 again  to support NAPI. John was originally wrapping
some (but not nearly all NAPI-related functions), but that doesn't
scale very well with the number of functions that are there, some
of which are even only inlines. Thus, instead of doing that, let
the drivers manage the NAPI struct, except for napi_add() which is
needed so mac80211 knows how to call napi_gro_receive().

Also remove some no longer needed definitions that were left when
NAPI support was removed.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Eyal Shapira <eyal@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-12 21:37:27 +01:00
Fan Du
ca925cf153 flowcache: Make flow cache name space aware
Inserting a entry into flowcache, or flushing flowcache should be based
on per net scope. The reason to do so is flushing operation from fat
netns crammed with flow entries will also making the slim netns with only
a few flow cache entries go away in original implementation.

Since flowcache is tightly coupled with IPsec, so it would be easier to
put flow cache global parameters into xfrm namespace part. And one last
thing needs to do is bumping flow cache genid, and flush flow cache should
also be made in per net style.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-12 07:02:11 +01:00
Eliad Peller
448cd2e248 mac80211: reset probe_send_count also in HW_CONNECTION_MONITOR case
In case of beacon_loss with IEEE80211_HW_CONNECTION_MONITOR
device, mac80211 probes the ap (and disconnects on timeout)
but ignores the ack.

If we already got an ack, there's no reason to continue
disconnecting. this can help devices that supports
IEEE80211_HW_CONNECTION_MONITOR only partially (e.g. take
care of keep alives, but does not probe the ap.

In case the device wants to disconnect without probing,
it can just call ieee80211_connection_loss.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-11 12:58:32 +01:00
Rashika Kheria
535d3ae9c8 net: Move prototype declaration to header file include/net/net_namespace.h from net/ipx/af_ipx.c
Move prototype declaration of function to header file
include/net/net_namespace.h from net/ipx/af_ipx.c because they are used
by more than one file.

This eliminates the following warning in net/ipx/sysctl_net_ipx.c:
net/ipx/sysctl_net_ipx.c:33:6: warning: no previous prototype for ‘ipx_register_sysctl’ [-Wmissing-prototypes]
net/ipx/sysctl_net_ipx.c:38:6: warning: no previous prototype for ‘ipx_unregister_sysctl’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:50 -08:00
Rashika Kheria
7780d8ae4a net: Move prototype declaration to header file include/net/datalink.h from net/ipx/af_ipx.c
Move prototype declarations of function to header file
include/net/datalink.h from net/ipx/af_ipx.c because they are used by
more than one file.

This eliminates the following warning in net/ipx/pe2.c:
net/ipx/pe2.c:20:24: warning: no previous prototype for ‘make_EII_client’ [-Wmissing-prototypes]
net/ipx/pe2.c:32:6: warning: no previous prototype for ‘destroy_EII_client’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:50 -08:00
Rashika Kheria
578efbc19f net: Move prototype declaration to header file include/net/ipx.h from net/ipx/af_ipx.c
Move prototype declaration of functions to header file include/net/ipx.h
from net/ipx/af_ipx.c because they are used by more than one file.

This eliminates the following warning in
net/ipx/ipx_route.c:33:19: warning: no previous prototype for ‘ipxrtr_lookup’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:52:5: warning: no previous prototype for ‘ipxrtr_add_route’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:94:6: warning: no previous prototype for ‘ipxrtr_del_routes’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:149:5: warning: no previous prototype for ‘ipxrtr_route_skb’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:171:5: warning: no previous prototype for ‘ipxrtr_route_packet’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:261:5: warning: no previous prototype for ‘ipxrtr_ioctl’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:50 -08:00
Rashika Kheria
493cc5e5ba net: Move prototype declaration to include/net/ipx.h from net/ipx/ipx_route.c
Move prototype definition of function to header file include/net/ipx.h
from net/ipx/ipx_route.c because they are used by more than one file.

This eliminates the following warning from net/ipx/af_ipx.c:
net/ipx/af_ipx.c:193:23: warning: no previous prototype for ‘ipxitf_find_using_net’ [-Wmissing-prototypes]
net/ipx/af_ipx.c:577:5: warning: no previous prototype for ‘ipxitf_send’ [-Wmissing-prototypes]
net/ipx/af_ipx.c:1219:8: warning: no previous prototype for ‘ipx_cksum’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:49 -08:00
Rashika Kheria
ab3301bd96 net: Move prototype declaration to header file include/net/dn.h from net/decnet/af_decnet.c
Move prototype declaration of functions to header file include/net/dn.h
from net/decnet/af_decnet.c because they are used by more than one file.

This eliminates the following warning in net/decnet/af_decnet.c:
net/decnet/sysctl_net_decnet.c:354:6: warning: no previous prototype for ‘dn_register_sysctl’ [-Wmissing-prototypes]
net/decnet/sysctl_net_decnet.c:359:6: warning: no previous prototype for ‘dn_unregister_sysctl’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:49 -08:00
Rashika Kheria
f56b8bf6e4 net: Move prototype declaration to appropriate header file from decnet/af_decnet.c
Move prototype declaration of functions to header file include/net/dn_route.h
from net/decnet/af_decnet.c because it is used by more than one file.

This eliminates the following warning in net/decnet/dn_route.c:
net/decnet/dn_route.c:629:5: warning: no previous prototype for ‘dn_route_rcv’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:49 -08:00
David S. Miller
f41f031960 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/nftables/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes, mostly nftables
fixes, most relevantly they are:

* Fix a crash in the h323 conntrack NAT helper due to expectation list
  corruption, from Alexey Dobriyan.

* A couple of RCU race fixes for conntrack, one manifests by hitting BUG_ON
  in nf_nat_setup_info() and the destroy path, patches from Andrey Vagin and
  me.

* Dump direction attribute in nft_ct only if it is set, from Arturo
  Borrero.

* Fix IPVS bug in its own connection tracking system that may lead to
  copying only 4 bytes of the IPv6 address when initializing the
  ip_vs_conn object, from Michal Kubecek.

* Fix -EBUSY errors in nftables when deleting the rules, chain and tables
  in a row due mixture of asynchronous and synchronous object releasing,
  from me.

* Three fixes for the nf_tables set infrastructure when using intervals and
  mappings, from me.

* Four patches to fixing the nf_tables log, reject and ct expressions from
  the new inet table, from Patrick McHardy.

* Fix memory overrun in the map that is used to dynamically allocate names
  from anonymous sets, also from Patrick.

* Fix a potential oops if you dump a set with NFPROTO_UNSPEC and a table
  name, from Patrick McHardy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 14:20:00 -08:00
Tejun Heo
073219e995 cgroup: clean up cgroup_subsys names and initialization
cgroup_subsys is a bit messier than it needs to be.

* The name of a subsys can be different from its internal identifier
  defined in cgroup_subsys.h.  Most subsystems use the matching name
  but three - cpu, memory and perf_event - use different ones.

* cgroup_subsys_id enums are postfixed with _subsys_id and each
  cgroup_subsys is postfixed with _subsys.  cgroup.h is widely
  included throughout various subsystems, it doesn't and shouldn't
  have claim on such generic names which don't have any qualifier
  indicating that they belong to cgroup.

* cgroup_subsys->subsys_id should always equal the matching
  cgroup_subsys_id enum; however, we require each controller to
  initialize it and then BUG if they don't match, which is a bit
  silly.

This patch cleans up cgroup_subsys names and initialization by doing
the followings.

* cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
  cgroup_subsys with _cgrp_subsys.

* With the above, renaming subsys identifiers to match the userland
  visible names doesn't cause any naming conflicts.  All non-matching
  identifiers are renamed to match the official names.

  cpu_cgroup -> cpu
  mem_cgroup -> memory
  perf -> perf_event

* controllers no longer need to initialize ->subsys_id and ->name.
  They're generated in cgroup core and set automatically during boot.

* Redundant cgroup_subsys declarations removed.

* While updating BUG_ON()s in cgroup_init_early(), convert them to
  WARN()s.  BUGging that early during boot is stupid - the kernel
  can't print anything, even through serial console and the trap
  handler doesn't even link stack frame properly for back-tracing.

This patch doesn't introduce any behavior changes.

v2: Rebased on top of fe1217c4f3 ("net: net_cls: move cgroupfs
    classid handling into core").

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
2014-02-08 10:36:58 -05:00
Tejun Heo
af6363374c cgroup: make CONFIG_CGROUP_NET_PRIO bool and drop unnecessary init_netclassid_cgroup()
net_prio is the only cgroup which is allowed to be built as a module.
The savings from allowing one controller to be built as a module are
tiny especially given that cgroup module support itself adds quite a
bit of complexity.

Given that none of other controllers has much chance of being made a
module and that we're unlikely to add new modular controllers, the
added complexity is simply not justifiable.

As a first step to drop cgroup module support, this patch changes the
config option to bool from tristate and drops module related code from
it.

Also, while an earlier commit fe1217c4f3 ("net: net_cls: move
cgroupfs classid handling into core") dropped module support from
net_cls cgroup, it retained a call to cgroup_load_subsys(), which is
noop for built-in controllers.  Drop it along with
init_netclassid_cgroup().

v2: Removed modular version of task_netprioidx() in
    include/net/netprio_cgroup.h as suggested by Li Zefan.

v3: Rebased on top of fe1217c4f3 ("net: net_cls: move cgroupfs
    classid handling into core").  net_cls cgroup part is mostly
    dropped except for removal of init_netclassid_cgroup().

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Thomas Graf <tgraf@suug.ch>
2014-02-08 10:36:58 -05:00
Pablo Neira Ayuso
0165d9325d netfilter: nf_tables: fix racy rule deletion
We may lost race if we flush the rule-set (which happens asynchronously
via call_rcu) and we try to remove the table (that userspace assumes
to be empty).

Fix this by recovering synchronous rule and chain deletion. This was
introduced time ago before we had no batch support, and synchronous
rule deletion performance was not good. Now that we have the batch
support, we can just postpone the purge of old rule in a second step
in the commit phase. All object deletions are synchronous after this
patch.

As a side effect, we save memory as we don't need rcu_head per rule
anymore.

Cc: Patrick McHardy <kaber@trash.net>
Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-06 11:46:06 +01:00
Patrick McHardy
05513e9e33 netfilter: nf_tables: add reject module for NFPROTO_INET
Add a reject module for NFPROTO_INET. It does nothing but dispatch
to the AF-specific modules based on the hook family.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-06 09:44:18 +01:00
Patrick McHardy
cc4723ca31 netfilter: nft_reject: split up reject module into IPv4 and IPv6 specifc parts
Currently the nft_reject module depends on symbols from ipv6. This is
wrong since no generic module should force IPv6 support to be loaded.
Split up the module into AF-specific and a generic part.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-06 09:44:10 +01:00
Emmanuel Grumbach
63c361f511 mac80211: propagate STBC / LDPC flags to radiotap
This capabilities weren't propagated to the radiotap header.
We don't set here the VHT_KNOWN / MCS_HAVE flag because not
all the low level drivers will know how to properly flag
the frames, hence the low level driver will be in charge
of setting IEEE80211_RADIOTAP_MCS_HAVE_FEC,
IEEE80211_RADIOTAP_MCS_HAVE_STBC and / or
IEEE80211_RADIOTAP_VHT_KNOWN_STBC according to its
capabilities.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-06 09:34:58 +01:00
Emmanuel Grumbach
1b8d242adb mac80211: move VHT related RX_FLAG to another variable
ieee80211_rx_status.flags is full. Define a new vht_flag
variable to be able to set more VHT related flags and make
room in flags.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Acked-by: Kalle Valo <kvalo@qca.qualcomm.com> [ath10k]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-06 09:34:10 +01:00
Emmanuel Grumbach
0059b2b142 mac80211: remove unused radiotap vendor fields in ieee80211_rx_status
The purpose of this housekeeping is to make some room for
VHT flags. The radiotap vendor fields weren't in use.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-06 09:33:46 +01:00
Patrick McHardy
64d46806b6 netfilter: nf_tables: add AF specific expression support
For the reject module, we need to add AF-specific implementations to
get rid of incorrect module dependencies. Try to load an AF-specific
module first and fall back to generic modules.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-06 00:05:36 +01:00
Pablo Neira Ayuso
e53376bef2 netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
With this patch, the conntrack refcount is initially set to zero and
it is bumped once it is added to any of the list, so we fulfill
Eric's golden rule which is that all released objects always have a
refcount that equals zero.

Andrey Vagin reports that nf_conntrack_free can't be called for a
conntrack with non-zero ref-counter, because it can race with
nf_conntrack_find_get().

A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
ref-counter says that this conntrack is used. So when we release
a conntrack with non-zero counter, we break this assumption.

CPU1                                    CPU2
____nf_conntrack_find()
                                        nf_ct_put()
                                         destroy_conntrack()
                                        ...
                                        init_conntrack
                                         __nf_conntrack_alloc (set use = 1)
atomic_inc_not_zero(&ct->use) (use = 2)
                                         if (!l4proto->new(ct, skb, dataoff, timeouts))
                                          nf_conntrack_free(ct); (use = 2 !!!)
                                        ...
                                        __nf_conntrack_alloc (set use = 1)
 if (!nf_ct_key_equal(h, tuple, zone))
  nf_ct_put(ct); (use = 0)
   destroy_conntrack()
                                        /* continue to work with CT */

After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
race in nf_conntrack_find_get" another bug was triggered in
destroy_conntrack():

<4>[67096.759334] ------------[ cut here ]------------
<2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
...
<4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G         C ---------------    2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
<4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>]  [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760255] Call Trace:
<4>[67096.760255]  [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
<4>[67096.760255]  [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
<4>[67096.760255]  [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
<4>[67096.760255]  [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
<4>[67096.760255]  [<ffffffff81484419>] nf_iterate+0x69/0xb0
<4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255]  [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
<4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255]  [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
<4>[67096.760255]  [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
<4>[67096.760255]  [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
<4>[67096.760255]  [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
<4>[67096.760255]  [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
<4>[67096.760255]  [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
<4>[67096.760255]  [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
<4>[67096.760255]  [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff814457c9>] sys_sendto+0x139/0x190
<4>[67096.760255]  [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
<4>[67096.760255]  [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
<4>[67096.760255]  [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
<4>[67096.760255]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5

I have reused the original title for the RFC patch that Andrey posted and
most of the original patch description.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Andrew Vagin <avagin@parallels.com>
Cc: Florian Westphal <fw@strlen.de>
Reported-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-02-05 17:46:06 +01:00
Max Filippov
a13aff0641 net: ethoc: set up MII management bus clock
MII management bus clock is derived from the MAC clock by dividing it by
MIIMODER register CLKDIV field value. This value may need to be set up
in case it is undefined or its default value is too high (and
communication with PHY is too slow) or too low (and communication with
PHY is impossible). The value of CLKDIV is not specified directly, but
is derived from the MAC clock for the default MII management bus frequency
of 2.5MHz. The MAC clock may be specified in the platform data, or in
the 'clocks' device tree attribute.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-04 20:19:51 -08:00
Michal Kazior
9e0e29615a cfg80211: consider existing DFS interfaces
It was possible to break interface combinations in
the following way:

 combo 1: iftype = AP, num_ifaces = 2, num_chans = 2,
 combo 2: iftype = AP, num_ifaces = 1, num_chans = 1, radar = HT20

With the above interface combinations it was
possible to:

 step 1. start AP on DFS channel by matching combo 2
 step 2. start AP on non-DFS channel by matching combo 1

This was possible beacuse (step 2) did not consider
if other interfaces require radar detection.

The patch changes how cfg80211 tracks channels -
instead of channel itself now a complete chandef
is stored.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:58:17 +01:00
Antonio Quartulli
fe94f3a4ff cfg80211: fix channel configuration in IBSS join
When receiving an IBSS_JOINED event select the BSS object
based on the {bssid, channel} couple rather than the bssid
only.
With the current approach if another cell having the same
BSSID (but using a different channel) exists then cfg80211
picks up the wrong BSS object.
The result is a mismatching channel configuration between
cfg80211 and the driver, that can lead to any sort of
problem.

The issue can be triggered by having an IBSS sitting on
given channel and then asking the driver to create a new
cell using the same BSSID but with a different frequency.
By passing the channel to cfg80211_get_bss() we can solve
this ambiguity and retrieve/create the correct BSS object.
All the users of cfg80211_ibss_joined() have been changed
accordingly.

Moreover WARN when cfg80211_ibss_joined() gets a NULL
channel as argument and remove a bogus call of the same
function in ath6kl (it does not make sense to call
cfg80211_ibss_joined() with a zero BSSID on ibss-leave).

Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Arend van Spriel <arend@broadcom.com>
Cc: Bing Zhao <bzhao@marvell.com>
Cc: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Cc: libertas-dev@lists.infradead.org
Acked-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
[minor code cleanup in ath6kl]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:58:16 +01:00
Johannes Berg
ea73cbce4e nl80211: fix scheduled scan RSSI matchset attribute confusion
The scheduled scan matchsets were intended to be a list of filters,
with the found BSS having to pass at least one of them to be passed
to the host. When the RSSI attribute was added, however, this was
broken and currently wpa_supplicant adds that attribute in its own
matchset; however, it doesn't intend that to mean that anything
that passes the RSSI filter should be passed to the host, instead
it wants it to mean that everything needs to also have higher RSSI.

This is semantically problematic because we have a list of filters
like [ SSID1, SSID2, SSID3, RSSI ] with no real indication which
one should be OR'ed and which one AND'ed.

To fix this, move the RSSI filter attribute into each matchset. As
we need to stay backward compatible, treat a matchset with only the
RSSI attribute as a "default RSSI filter" for all other matchsets,
but only if there are other matchsets (an RSSI-only matchset by
itself is still desirable.)

To make driver implementation easier, keep a global min_rssi_thold
for the entire request as well. The only affected driver is ath6kl.

I found this when I looked into the code after Raja Mani submitted
a patch fixing the n_match_sets calculation to disregard the RSSI,
but that patch didn't address the semantic issue.

Reported-by: Raja Mani <rmani@qti.qualcomm.com>
Acked-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:58:12 +01:00
Johannes Berg
cc01f9b55f mac80211: remove module handling from rate control ops
There's not a single rate control algorithm actually in
a separate module where the module refcount would be
required. Similarly, there's no specific rate control
module.

Therefore, all the module handling code in rate control
is really just dead code, so remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:26 +01:00
Johannes Berg
631ad703ba mac80211: make rate control ops const
Change the code to allow making all the rate control ops
const, nothing ever needs to change them. Also change all
drivers to make use of this and mark the ops const.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:21 +01:00
Janusz Dziedzic
0b9323f600 nl80211: add Guard Interval support for set_bitrate_mask
Allow to force SGI, LGI.
Mainly for test purpose.

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:11 +01:00
Johannes Berg
4b5800fec6 cfg80211: make connect ie param const
This required liberally sprinkling 'const' over brcmfmac
and mwifiex but seems like a useful thing to do since the
pointer can't really be written.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:10 +01:00
Jouni Malinen
664834dee6 cfg80211: Clean up connect params and channel fetching
Addition of the frequency hints showed up couple of places in cfg80211
where pointers could be marked const and a shared function could be used
to fetch a valid channel.

Signed-off-by: Jouni Malinen <j@w1.fi>
[fix mwifiex]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:09 +01:00
Jouni Malinen
b43504cf75 cfg80211: Advertise maximum associated STAs in AP mode
This allows drivers to advertise the maximum number of associated
stations they support in AP mode (including P2P GO). User space
applications can use this for cleaner way of handling the limit (e.g.,
hostapd rejecting IEEE 802.11 authentication without manual
configuration of the limit) or to figure out what type of use cases can
be executed with multiple devices before trying and failing.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:08 +01:00
Jouni Malinen
1df4a51082 cfg80211: Allow BSS hint to be provided for connect
This clarifies the expected driver behavior on the older
NL80211_ATTR_MAC and NL80211_ATTR_WIPHY_FREQ attributes and adds a new
set of similar attributes with _HINT postfix to enable use of a
recommendation of the initial BSS to choose. This can be helpful for
some drivers that can avoid an additional full scan on connection
request if the information is provided to them (user space tools like
wpa_supplicant already has that information available based on earlier
scans).

In addition, this can be used to get more expected behavior for cases
where a specific BSS should be picked first based on operations like
Interworking network selection or WPS. These cases were already easily
addressed with drivers that leave BSS selection to user space, but there
was no convenient way to do this with drivers that take care of BSS
selection internally without using the NL80211_ATTR_MAC which is not
really desired since it is needed for other purposes to force the
association to remain with the same BSS.

Signed-off-by: Jouni Malinen <j@w1.fi>
[add const, fix policy]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:07 +01:00
Luciano Coelho
66e01cf99e mac80211: only set CSA beacon when at least one beacon must be transmitted
A beacon should never have a Channel Switch Announcement information
element with a count of 0, because a count of 1 means switch just
before the next beacon.  So, if a count of 0 was valid in a beacon, it
would have been transmitted in the next channel already, which is
useless.  A CSA count equal to zero is only meaningful in action
frames or probe_responses.

Fix the ieee80211_csa_is_complete() and ieee80211_update_csa()
functions accordingly.

With a CSA count of 0, we won't transmit any CSA beacons, because the
switch will happen before the next TBTT.  To avoid extra work and
potential confusion in the drivers, complete the CSA immediately,
instead of waiting for the driver to call ieee80211_csa_finish().

To keep things simpler, we also switch immediately when the CSA count
is 1, while in theory we should delay the switch until just before the
next TBTT.

Additionally, move the ieee80211_csa_finish() function to cfg.c,
where it makes more sense.

Tested-by: Simon Wunderlich <sw@simonwunderlich.de>
Acked-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:06 +01:00
Linus Torvalds
4ba9920e5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BPF debugger and asm tool by Daniel Borkmann.

 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

 3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

 4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match.  From Ben Hutchings.

 5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

 6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

 9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI.  From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
    Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
    address.  From Christoph Paasch.

21) Support 10G in generic phylib.  From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided.  From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
  net/cxgb4: Fix referencing freed adapter
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  fib_frontend: fix possible NULL pointer dereference
  rtnetlink: remove IFLA_BOND_SLAVE definition
  rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
  qlcnic: update version to 5.3.55
  qlcnic: Enhance logic to calculate msix vectors.
  qlcnic: Refactor interrupt coalescing code for all adapters.
  qlcnic: Update poll controller code path
  qlcnic: Interrupt code cleanup
  qlcnic: Enhance Tx timeout debugging.
  qlcnic: Use bool for rx_mac_learn.
  bonding: fix u64 division
  rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
  sfc: Use the correct maximum TX DMA ring size for SFC9100
  Add Shradha Shah as the sfc driver maintainer.
  net/vxlan: Share RX skb de-marking and checksum checks with ovs
  tulip: cleanup by using ARRAY_SIZE()
  ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
  net/cxgb4: Don't retrieve stats during recovery
  ...
2014-01-25 11:17:34 -08:00
Linus Torvalds
6dd9158ae8 Merge git://git.infradead.org/users/eparis/audit
Pull audit update from Eric Paris:
 "Again we stayed pretty well contained inside the audit system.
  Venturing out was fixing a couple of function prototypes which were
  inconsistent (didn't hurt anything, but we used the same value as an
  int, uint, u32, and I think even a long in a couple of places).

  We also made a couple of minor changes to when a couple of LSMs called
  the audit system.  We hoped to add aarch64 audit support this go
  round, but it wasn't ready.

  I'm disappearing on vacation on Thursday.  I should have internet
  access, but it'll be spotty.  If anything goes wrong please be sure to
  cc rgb@redhat.com.  He'll make fixing things his top priority"

* git://git.infradead.org/users/eparis/audit: (50 commits)
  audit: whitespace fix in kernel-parameters.txt
  audit: fix location of __net_initdata for audit_net_ops
  audit: remove pr_info for every network namespace
  audit: Modify a set of system calls in audit class definitions
  audit: Convert int limit uses to u32
  audit: Use more current logging style
  audit: Use hex_byte_pack_upper
  audit: correct a type mismatch in audit_syscall_exit()
  audit: reorder AUDIT_TTY_SET arguments
  audit: rework AUDIT_TTY_SET to only grab spin_lock once
  audit: remove needless switch in AUDIT_SET
  audit: use define's for audit version
  audit: documentation of audit= kernel parameter
  audit: wait_for_auditd rework for readability
  audit: update MAINTAINERS
  audit: log task info on feature change
  audit: fix incorrect set of audit_sock
  audit: print error message when fail to create audit socket
  audit: fix dangling keywords in audit_log_set_loginuid() output
  audit: log on errors from filter user rules
  ...
2014-01-23 18:08:10 -08:00
Jiri Pirko
ba7d49b1f0 rtnetlink: provide api for getting and setting slave info
Recent patch
bonding: add netlink attributes to slave link dev (1d3ee88ae0)

Introduced yet another device specific way to access slave information
over rtnetlink. There is one already there for bridge.

This patch introduces generic way to do this, for getting and setting
info as well by extending link_ops. Later on, this new interface will
be used for bridge ports as well.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 21:57:05 -08:00
FX Le Bail
7c90cc2d40 ipv6: enable anycast addresses as source addresses for datagrams
This change allows to consider an anycast address valid as source address
when given via an IPV6_PKTINFO or IPV6_2292PKTINFO ancillary data item.
So, when sending a datagram with ancillary data, the unicast and anycast
addresses are handled in the same way.

- Adds ipv6_chk_acast_addr_src() to check if an anycast address is link-local
  on given interface or is global.
- Uses it in ip6_datagram_send_ctl().

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 21:57:05 -08:00
Hannes Frederic Sowa
809fa972fd reciprocal_divide: update/correction of the algorithm
Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.

This has been fixed by Eric Dumazet in commit aee636c480
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using Granlund and Montgomery method, so that also
future use is safe and without any non-obvious side-effects.
Known problems with the old implementation were that division by 1
always returned 0 and some off-by-ones when the dividend and divisor
where very large. This seemed to not be problematic with its
current users, as far as we can tell. Eric Dumazet checked for
the slab usage, we cannot surely say so in the case of flex_array.
Still, in order to fix that, we propose an extension from the
original implementation from commit 6a2d7a955d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5], Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d is in
u32 universe:

1) Initialization:

  int l = ceil(log_2 d)
  uword m' = floor((1<<32)*((1<<l)-d)/d)+1
  int sh_1 = min(l,1)
  int sh_2 = max(l-1,0)

2) For q = n/d, all uword:

  uword t = (n*m')>>32
  q = (t+((n-t)>>sh_1))>>sh_2

The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.

Joint work with Daniel Borkmann.

  [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
  [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
  [3] https://gmplib.org/~tege/division-paper.pdf
  [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
  [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
  [6] http://www.agner.org/optimize/asmlib.zip

Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross <jesse@nicira.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 23:17:20 -08:00
Daniel Borkmann
89770b0a69 net: introduce reciprocal_scale helper and convert users
As David Laight suggests, we shouldn't necessarily call this
reciprocal_divide() when users didn't requested a reciprocal_value();
lets keep the basic idea and call it reciprocal_scale(). More
background information on this topic can be found in [1].

Joint work with Hannes Frederic Sowa.

  [1] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html

Suggested-by: David Laight <david.laight@aculab.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 23:17:20 -08:00
wangweidong
5bc1d1b4a2 sctp: remove macros sctp_bh_[un]lock_sock
Redefined bh_[un]lock_sock to sctp_bh[un]lock_sock for user
space friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:41:36 -08:00
wangweidong
048ed4b626 sctp: remove macros sctp_{lock|release}_sock
Redefined {lock|release}_sock to sctp_{lock|release}_sock for user space friendly
code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:41:36 -08:00
wangweidong
1b0de194f1 sctp: remove macros sctp_read_[un]lock
Redefined read_[un]lock to sctp_read_[un]lock for user space
friendly code which we haven't use in years, and the macros
we never used, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:40:41 -08:00
wangweidong
387602dfdc sctp: remove macros sctp_write_[un]_lock
Redefined write_[un]lock to sctp_write_[un]lock for user space
friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:40:41 -08:00
wangweidong
3c8e43ba9f sctp: remove macros sctp_spin_[un]lock
Redefined spin_[un]lock to sctp_spin_[un]lock for user space friendly
code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:40:41 -08:00
wangweidong
79b91130a2 sctp: remove macros sctp_local_bh_{disable|enable}
Redefined local_bh_{disable|enable} to sctp_local_bh_{disable|enable}
for user space friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:40:40 -08:00
wangweidong
940287ee10 sctp: remove macros sctp_spin_[un]lock_irqrestore
Redefined spin_[un]lock_irqstore to sctp_spin_[un]lock_irqrestore for user
space friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:40:40 -08:00
Or Gerlitz
dc01e7d344 net: Add GRO support for vxlan traffic
Add GRO handlers for vxlann, by using the UDP GRO infrastructure.

For single TCP session that goes through vxlan tunneling I got nice
improvement from 6.8Gbs to 11.5Gbs

--> UDP/VXLAN GRO disabled
$ netperf  -H 192.168.52.147 -c -C

$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536  65536    10.00      6799.75   12.54    24.79    0.604   1.195

--> UDP/VXLAN GRO enabled

$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536  65536    10.00      11562.72   24.90    20.34    0.706   0.577

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:05:04 -08:00
Or Gerlitz
b582ef0990 net: Add GRO support for UDP encapsulating protocols
Add GRO handlers for protocols that do UDP encapsulation, with the intent of
being able to coalesce packets which encapsulate packets belonging to
the same TCP session.

For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.

The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.

Use a mark on the skb GRO CB data to disallow (flush) running the udp gro receive
code twice on a packet. This solves the problem of udp encapsulated packets whose
inner VM packet is udp and happen to carry a port which has registered offloads.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:05:04 -08:00
Hannes Frederic Sowa
82b276cd2b ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts
Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow
connecting and binding to them. sendmsg logic does already check msg->name
for this but must trust already connected sockets which could be set up
for connection to ipv4 address family.

Per-socket flag ipv6only is of no use here, as it is under users control
by setsockopt.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:59:19 -08:00
WANG Cong
6e6a50c254 net_sched: act: export tcf_hash_search() instead of tcf_hash_lookup()
So that we will not expose struct tcf_common to modules.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 14:43:16 -08:00
WANG Cong
c779f7af99 net_sched: act: fetch hinfo from a->ops->hinfo
Every action ops has a pointer to hash info, so we don't need to
hard-code it in each module.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 14:43:16 -08:00
Linus Torvalds
a0fa1dd3cd Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:

 - Add the initial implementation of SCHED_DEADLINE support: a real-time
   scheduling policy where tasks that meet their deadlines and
   periodically execute their instances in less than their runtime quota
   see real-time scheduling and won't miss any of their deadlines.
   Tasks that go over their quota get delayed (Available to privileged
   users for now)

 - Clean up and fix preempt_enable_no_resched() abuse all around the
   tree

 - Do sched_clock() performance optimizations on x86 and elsewhere

 - Fix and improve auto-NUMA balancing

 - Fix and clean up the idle loop

 - Apply various cleanups and fixes

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  sched: Fix __sched_setscheduler() nice test
  sched: Move SCHED_RESET_ON_FORK into attr::sched_flags
  sched: Fix up attr::sched_priority warning
  sched: Fix up scheduler syscall LTP fails
  sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls
  sched/core: Fix htmldocs warnings
  sched/deadline: No need to check p if dl_se is valid
  sched/deadline: Remove unused variables
  sched/deadline: Fix sparse static warnings
  m68k: Fix build warning in mac_via.h
  sched, thermal: Clean up preempt_enable_no_resched() abuse
  sched, net: Fixup busy_loop_us_clock()
  sched, net: Clean up preempt_enable_no_resched() abuse
  sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
  sched/preempt, locking: Rework local_bh_{dis,en}able()
  sched/clock, x86: Avoid a runtime condition in native_sched_clock()
  sched/clock: Fix up clear_sched_clock_stable()
  sched/clock, x86: Use a static_key for sched_clock_stable
  sched/clock: Remove local_irq_disable() from the clocks
  sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
  ...
2014-01-20 10:42:08 -08:00
WANG Cong
671314a5ab net_sched: act: remove capab from struct tc_action_ops
It is not actually implemented.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:58:07 -08:00
Hannes Frederic Sowa
4b261c75a9 ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams
We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.

This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.

ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.

I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.

Reported-by: Gert Doering <gert@space.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:53:18 -08:00
Florent Fourcot
6444f72b4b ipv6: add flowlabel_consistency sysctl
With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
flow label unicity. This patch introduces a new sysctl to protect the old
behaviour, enable by default.

Changelog of V3:
 * rename ip6_flowlabel_consistency to flowlabel_consistency
 * use net_info_ratelimited()
 * checkpatch cleanups

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
Florent Fourcot
46e5f40176 ipv6: add a flag to get the flow label used remotly
This information is already available via IPV6_FLOWINFO
of IPV6_2292PKTOPTIONS, and them a filtering to get the flow label
information. But it is probably logical and easier for users to add this
here, and to control both sent/received flow label values with the
IPV6_FLOWLABEL_MGR option.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
David S. Miller
4180442058 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
	net/ipv4/tcp_metrics.c

Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.

Minor overlapping changes in bnx2x driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 00:55:41 -08:00
Eric Dumazet
6c7e7610ff ipv4: fix a dst leak in tunnels
This patch :

1) Remove a dst leak if DST_NOCACHE was set on dst
   Fix this by holding a reference only if dst really cached.

2) Remove a lockdep warning in __tunnel_dst_set()
    This was reported by Cong Wang.

3) Remove usage of a spinlock where xchg() is enough

4) Remove some spurious inline keywords.
   Let compiler decide for us.

Fixes: 7d442fab0a ("ipv4: Cache dst in tunnels")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:36:39 -08:00
Hannes Frederic Sowa
11ffff752c ipv6: simplify detection of first operational link-local address on interface
In commit 1ec047eb47 ("ipv6: introduce per-interface counter for
dad-completed ipv6 addresses") I build the detection of the first
operational link-local address much to complex. Additionally this code
now has a race condition.

Replace it with a much simpler variant, which just scans the address
list when duplicate address detection completes, to check if this is
the first valid link local address and send RS and MLD reports then.

Fixes: 1ec047eb47 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses")
Reported-by: Jiri Pirko <jiri@resnulli.us>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:10:01 -08:00
Florent Fourcot
1d13a96c74 ipv6: tcp: fix flowlabel value in ACK messages send from TIME_WAIT
This patch is following the commit b903d324be (ipv6: tcp: fix TCLASS
value in ACK messages sent from TIME_WAIT).

For the same reason than tclass, we have to store the flow label in the
inet_timewait_sock to provide consistency of flow label on the last ACK.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 17:56:33 -08:00
John W. Linville
7916a07557 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-01-17 14:43:17 -05:00
Jiri Pirko
89740ca74f neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions.
When ndo_neigh_setup is called, the bitfield used by NEIGH_VAR_SET is
not initialized yet. This might cause confusion for the people who use
NEIGH_VAR_SET in ndo_neigh_setup. So rather introduce NEIGH_VAR_INIT for
usage in ndo_neigh_setup.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 11:31:58 -08:00
Li RongQing
d76ed22b22 ipv6: move IPV6_TCLASS_SHIFT into ipv6.h and define a helper
Two places defined IPV6_TCLASS_SHIFT, so we should move it into ipv6.h,
and use this macro as possible. And define ip6_tclass helper to return
tclass

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 15:53:18 -08:00
FX Le Bail
ec35b61ea5 IPv6: move the anycast_src_echo_reply sysctl to netns_sysctl_ipv6
This change move anycast_src_echo_reply sysctl with other ipv6 sysctls.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:18:22 -08:00
Aruna-Hewapathirane
63862b5bef net: replace macros net_random and net_srandom with direct calls to prandom
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:15:25 -08:00
WANG Cong
72c1d3bdd5 ipv4: register igmp_notifier even when !CONFIG_PROC_FS
We still need this notifier even when we don't config
PROC_FS.

It should be rare to have a kernel without PROC_FS,
so just for completeness.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:03:33 -08:00
David S. Miller
aef2b45fe4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Conflicts:
	net/xfrm/xfrm_policy.c

Steffen Klassert says:

====================
This pull request has a merge conflict between commits be7928d20b
("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and
da7c224b1b ("net: xfrm: xfrm_policy: silence compiler warning") from
the net-next tree and commit 2f3ea9a95c ("xfrm: checkpatch erros with
inline keyword position") from the ipsec-next tree.

The version from net-next can be used, like it is done in linux-next.

1) Checkpatch cleanups, from Weilong Chen.

2) Fix lockdep complaints when pktgen is used with IPsec,
   from Fan Du.

3) Update pktgen to allow any combination of IPsec transport/tunnel mode
   and AH/ESP/IPcomp type, from Fan Du.

4) Make pktgen_dst_metrics static, Fengguang Wu.

5) Compile fix for pktgen when CONFIG_XFRM is not set,
   from Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:14:25 -08:00
Eric Paris
4440e85481 audit: convert all sessionid declaration to unsigned int
Right now the sessionid value in the kernel is a combination of u32,
int, and unsigned int.  Just use unsigned int throughout.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13 22:31:46 -05:00
stephen hemminger
6daaf0de2f sctp: make sctp_addto_chunk_fixed local
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 14:42:30 -08:00
WANG Cong
7eb8896df0 net_sched: act: remove struct tcf_act_hdr
It is not necessary at all.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:15 -08:00
WANG Cong
2519a602c2 net_sched: optimize tcf_match_indev()
tcf_match_indev() is called in fast path, it is not wise to
search for a netdev by ifindex and then compare by its name,
just compare the ifindex.

Also, dev->name could be changed by user-space, therefore
the match would be always fail, but dev->ifindex could
be consistent.

BTW, this will also save some bytes from the core struct of u32.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:15 -08:00
WANG Cong
832d1d5bfa net_sched: add struct net pointer to tcf_proto_ops->dump
It will be needed by the next patch.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:14 -08:00
WANG Cong
ddafd34f41 net_sched: act: move idx_gen into struct tcf_hashinfo
There is no need to store the index separatedly
since tcf_hashinfo is allocated statically too.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:14 -08:00
Luis R. Rodriguez
4f7b91404c cfg80211: make regulatory_hint() remove REGULATORY_CUSTOM_REG
The REGULATORY_CUSTOM_REG can be used during early init with
the goal of overriding the wiphy's default regulatory settings
in case the alpha2 of the device is not known. In the case that
the alpha2 becomes known lets avoid having drivers having to
clear the REGULATORY_CUSTOM_REG flag by doing it for them
when regulatory_hint() is used.

Cc: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-01-13 14:46:58 -05:00
John W. Linville
f13352519e Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-01-13 14:40:59 -05:00
John W. Linville
ec665facde This is the first NFC pull request for 3.14
It includes:
 
 * A new NFC driver for Marvell's 8897, and a few NCI fixes and
   improvements needed to support this chipset.
 
 * An LLCP fix for how we were setting the default MIU on a p2p link. If
   there is no explicit MIU extension announced at connection time, we
   must use the default one and not the one announced at LLCP link
   establishement time.
 
 * A pn544 EEPROM config update. Some of the currently EEPROM configured
   values are overwriting the firmware ones while other should not be set
   by the driver itself.
 
 * Some NFC digital stack fixes and improvements. Asynchronous functions
   are better documented, RF technologies and CRC functions are set upon
   PSL_REQ reception, and a few minor bugs are fixed.
 
 * Minor and miscelaneous pn533, mei_phy and port100 fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSzfOKAAoJEIqAPN1PVmxKODEP/i1tmx6bwSjuR0gMyvIkqcBJ
 1mM7BwdXlTKTvS/HaKTqaftS5S9Kj/IYSsHPjqRAJp3ipZdc39D8rR3jiyhWzKyD
 A/o6whTBTnyAgt8/enNp+h8S/Iq+E3itL/51KUOeeFIKSpGqqfcssZ1/3qhvoYZQ
 75zck2OPiEs8KBl1bCrrzK1kP4s8aEH6PepmXd7WS8njKe+dcyl3erw0IVN4WPfP
 FKFemvL/HP8+cUyshdiQGRiSw+TyD1VLaZinhyoeJxVRcXUjcodLwtCIATwqvu54
 2fMk1ccFineAQZGFfZGbtMAjHQLUeOpHHxFfdkW1g7P9IBp4zjtEiNOhNvPnKlR2
 p4g4R/vPdXxbQWjIoWzXI8qw/eFq8xIVC0ap37W/Y65532ParnXESAwk29BJ6770
 kqpHTjfZTUmW2POuvqhEKUKPPVp5nt0ArgfnjvHOS1wxcT885vWeu/YOxpOm9VdU
 rjFSBBaBDC43vGkCHn5szU9sEwu4O1/JFHElSToXsu+bRtS0tA3O62Kv732RZmbm
 1SCbZ63o1ivZr8Q37bY1NDW1/YdwUJMNbEb/t/wDLBqYx0vQcD0aDUWCwoACi2Du
 FbUElc975E1ChvM7VfV7uqFN0Pc++M1IEHLcM2BiXmSIGjziFY02FFDkqHiea/tC
 xlYfYrrUoIj0v/u7q1t4
 =ydBy
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz <sameo@linux.intel.com> says:

"This is the first NFC pull request for 3.14

It includes:

* A new NFC driver for Marvell's 8897, and a few NCI fixes and
  improvements needed to support this chipset.

* An LLCP fix for how we were setting the default MIU on a p2p link. If
  there is no explicit MIU extension announced at connection time, we
  must use the default one and not the one announced at LLCP link
  establishement time.

* A pn544 EEPROM config update. Some of the currently EEPROM configured
  values are overwriting the firmware ones while other should not be set
  by the driver itself.

* Some NFC digital stack fixes and improvements. Asynchronous functions
  are better documented, RF technologies and CRC functions are set upon
  PSL_REQ reception, and a few minor bugs are fixed.

* Minor and miscelaneous pn533, mei_phy and port100 fixes."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-01-13 14:36:42 -05:00
Hannes Frederic Sowa
8ed1dc44d3 ipv4: introduce hardened ip_no_pmtu_disc mode
This new ip_no_pmtu_disc mode only allowes fragmentation-needed errors
to be honored by protocols which do more stringent validation on the
ICMP's packet payload. This knob is useful for people who e.g. want to
run an unmodified DNS server in a namespace where they need to use pmtu
for TCP connections (as they are used for zone transfers or fallback
for requests) but don't want to use possibly spoofed UDP pmtu information.

Currently the whitelisted protocols are TCP, SCTP and DCCP as they check
if the returned packet is in the window or if the association is valid.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: John Heffner <johnwheffner@gmail.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:22:55 -08:00
Hannes Frederic Sowa
f87c10a8aa ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing
While forwarding we should not use the protocol path mtu to calculate
the mtu for a forwarded packet but instead use the interface mtu.

We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was
introduced for multicast forwarding. But as it does not conflict with
our usage in unicast code path it is perfect for reuse.

I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu
along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular
dependencies because of IPSKB_FORWARDED.

Because someone might have written a software which does probe
destinations manually and expects the kernel to honour those path mtus
I introduced a new per-namespace "ip_forward_use_pmtu" knob so someone
can disable this new behaviour. We also still use mtus which are locked on a
route for forwarding.

The reason for this change is, that path mtus information can be injected
into the kernel via e.g. icmp_err protocol handler without verification
of local sockets. As such, this could cause the IPv4 forwarding path to
wrongfully emit fragmentation needed notifications or start to fragment
packets along a path.

Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED
won't be set and further fragmentation logic will use the path mtu to
determine the fragmentation size. They also recheck packet size with
help of path mtu discovery and report appropriate errors.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: John Heffner <johnwheffner@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:22:54 -08:00
Peter Zijlstra
3708983452 sched, net: Fixup busy_loop_us_clock()
The only valid use of preempt_enable_no_resched() is if the very next
line is schedule() or if we know preemption cannot actually be enabled
by that statement due to known more preempt_count 'refs'.

This busy_poll stuff looks to be completely and utterly broken,
sched_clock() can return utter garbage with interrupts enabled (rare
but still) and it can drift unbounded between CPUs.

This means that if you get preempted/migrated and your new CPU is
years behind on the previous CPU we get to busy spin for a _very_ long
time.

There is a _REASON_ sched_clock() warns about preemptability -
papering over it with a preempt_disable()/preempt_enable_no_resched()
is just terminal brain damage on so many levels.

Replace sched_clock() usage with local_clock() which has a bounded
drift between CPUs (<2 jiffies).

There is a further problem with the entire busy wait poll thing in
that the spin time is additive to the syscall timeout, not inclusive.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: rui.zhang@intel.com
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: hpa@zytor.com
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: lenb@kernel.org
Cc: rjw@rjwysocki.net
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 17:39:11 +01:00
John W. Linville
235f939228 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	net/ieee802154/6lowpan.c
2014-01-10 10:59:40 -05:00
Johannes Berg
b77cf4f8e1 mac80211: handle MMPDUs at EOSP correctly
If a uAPSD service period ends with an MMPDU, we currently just
send that MMPDU, but it obviously won't get the EOSP bit set as
it doesn't have a QoS header. This contradicts the standard, so
add a QoS-nulldata frame after the MMPDU to properly terminate
the service period with a frame that has EOSP set.

Also fix a bug wrt. the TID for the MMPDU, it shouldn't be set
to 0 unconditionally but use the actual TID that was assigned.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-10 09:50:02 +01:00
Patrick McHardy
3876d22dba netfilter: nf_tables: rename nft_do_chain_pktinfo() to nft_do_chain()
We don't encode argument types into function names and since besides
nft_do_chain() there are only AF-specific versions, there is no risk
of confusion.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:16 +01:00
Patrick McHardy
fa2c1de0bb netfilter: nf_tables: minor nf_chain_type cleanups
Minor nf_chain_type cleanups:

- reorder struct to plug a hoe
- rename struct module member to "owner" for consistency
- rename nf_hookfn array to "hooks" for consistency
- reorder initializers for better readability

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:15 +01:00
Patrick McHardy
2a37d755b8 netfilter: nf_tables: constify chain type definitions and pointers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:15 +01:00
Patrick McHardy
baae3e62f3 netfilter: nf_tables: fix chain type module reference handling
The chain type module reference handling makes no sense at all: we take
a reference immediately when the module is registered, preventing the
module from ever being unloaded.

Fix by taking a reference when we're actually creating a chain of the
chain type and release the reference when destroying the chain.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:14 +01:00
Ilan Peer
bdfbec2d2d cfg80211: Add a function to get the number of supported channels
Add a utility function to get the number of channels supported by
the device, and update the places in the code that need this data.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
[replace another occurrence in libertas, fix kernel-doc, fix bugs]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-09 14:24:24 +01:00
John W. Linville
300e5fd160 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-01-08 13:44:29 -05:00
Patrick McHardy
4566bf2706 netfilter: nft_meta: add l4proto support
For L3-proto independant rules we need to get at the L4 protocol value
directly. Add it to the nft_pktinfo struct and use the meta expression
to retrieve it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:57:31 +01:00
Patrick McHardy
1d49144c0a netfilter: nf_tables: add "inet" table for IPv4/IPv6
This patch adds a new table family and a new filter chain that you can
use to attach IPv4 and IPv6 rules. This should help to simplify
rule-set maintainance in dual-stack setups.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:57:25 +01:00
Patrick McHardy
115a60b173 netfilter: nf_tables: add support for multi family tables
Add support to register chains to multiple hooks for different address
families for mixed IPv4/IPv6 tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2014-01-07 23:55:46 +01:00
Patrick McHardy
c9484874e7 netfilter: nf_tables: add hook ops to struct nft_pktinfo
Multi-family tables need the AF from the hook ops. Add a pointer to the
hook ops and replace usage of the hooknum member in struct nft_pktinfo.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:50:43 +01:00
Johannes Berg
685328b296 mac80211: remove channel_change_time
This value is no longer used by mac80211, and practically no
driver ever set it to a correct value anyway, so remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-07 23:16:39 +01:00
FX Le Bail
509aba3b0d IPv6: add the option to use anycast addresses as source addresses in echo reply
This change allows to follow a recommandation of RFC4942.

- Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
  as source addresses for ICMPv6 echo reply. This sysctl is false by default
  to preserve existing behavior.
- Add inline check ipv6_anycast_destination().
- Use them in icmpv6_echo_reply().

Reference:
RFC4942 - IPv6 Transition/Coexistence Security Considerations
   (http://tools.ietf.org/html/rfc4942#section-2.1.6)

2.1.6. Anycast Traffic Identification and Security

   [...]
   To avoid exposing knowledge about the internal structure of the
   network, it is recommended that anycast servers now take advantage of
   the ability to return responses with the anycast address as the
   source address if possible.

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 15:51:39 -05:00
Eric Dumazet
438e38fadc gre_offload: statically build GRE offloading support
GRO/GSO layers can be enabled on a node, even if said
node is only forwarding packets.

This patch permits GSO (and upcoming GRO) support for GRE
encapsulated packets, even if the host has no GRE tunnel setup.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 20:28:34 -05:00
David S. Miller
39b6b2992f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
[GIT net-next] Open vSwitch

Open vSwitch changes for net-next/3.14. Highlights are:
 * Performance improvements in the mechanism to get packets to userspace
   using memory mapped netlink and skb zero copy where appropriate.
 * Per-cpu flow stats in situations where flows are likely to be shared
   across CPUs. Standard flow stats are used in other situations to save
   memory and allocation time.
 * A handful of code cleanups and rationalization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 19:48:38 -05:00
Amitkumar Karwar
22c15bf30b NFC: NCI: Add set_config API
This API can be used by drivers to send their custom
configuration using SET_CONFIG NCI command to the device.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-07 01:32:40 +01:00
Amitkumar Karwar
86e8586ed5 NFC: NCI: Add setup handler
Some drivers require special configuration while initializing.
This patch adds setup handler for this custom configuration.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-07 01:32:40 +01:00
Thomas Graf
bb9b18fb55 genl: Add genlmsg_new_unicast() for unicast message allocation
Allocates a new sk_buff large enough to cover the specified payload
plus required Netlink headers. Will check receiving socket for
memory mapped i/o capability and use it if enabled. Will fall back
to non-mapped skb if message size exceeds the frame size of the ring.

Signed-of-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:51:53 -08:00
David S. Miller
56a4342dfe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
	net/ipv6/ip6_tunnel.c
	net/ipv6/ip6_vti.c

ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.

qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 17:37:45 -05:00
David S. Miller
9aa28f2b71 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says: <pablo@netfilter.org>

====================
nftables updates for net-next

The following patchset contains nftables updates for your net-next tree,
they are:

* Add set operation to the meta expression by means of the select_ops()
  infrastructure, this allows us to set the packet mark among other things.
  From Arturo Borrero Gonzalez.

* Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel
  Borkmann.

* Add new queue expression to nf_tables. These comes with two previous patches
  to prepare this new feature, one to add mask in nf_tables_core to
  evaluate the queue verdict appropriately and another to refactor common
  code with xt_NFQUEUE, from Eric Leblond.

* Do not hide nftables from Kconfig if nfnetlink is not enabled, also from
  Eric Leblond.

* Add the reject expression to nf_tables, this adds the missing TCP RST
  support. It comes with an initial patch to refactor common code with
  xt_NFQUEUE, again from Eric Leblond.

* Remove an unused variable assignment in nf_tables_dump_set(), from Michal
  Nazarewicz.

* Remove the nft_meta_target code, now that Arturo added the set operation
  to the meta expression, from me.

* Add help information for nf_tables to Kconfig, also from me.

* Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is
  available to other nf_tables objects, requested by Arturo, from me.

* Expose the table usage counter, so we can know how many chains are using
  this table without dumping the list of chains, from Tomasz Bursztyka.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 13:29:30 -05:00
David S. Miller
855404efae Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for your net-next tree,
they are:

* Add full port randomization support. Some crazy researchers found a way
  to reconstruct the secure ephemeral ports that are allocated in random mode
  by sending off-path bursts of UDP packets to overrun the socket buffer of
  the DNS resolver to trigger retransmissions, then if the timing for the
  DNS resolution done by a client is larger than usual, then they conclude
  that the port that received the burst of UDP packets is the one that was
  opened. It seems a bit aggressive method to me but it seems to work for
  them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
  new NAT mode to fully randomize ports using prandom.

* Add a new classifier to x_tables based on the socket net_cls set via
  cgroups. These includes two patches to prepare the field as requested by
  Zefan Li. Also from Daniel Borkmann.

* Use prandom instead of get_random_bytes in several locations of the
  netfilter code, from Florian Westphal.

* Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack
  mark, also from Florian Westphal.

* Fix compilation warning due to unused variable in IPVS, from Geert
  Uytterhoeven.

* Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.

* Add IPComp extension to x_tables, from Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-05 20:18:50 -05:00
Li RongQing
8f84985fec net: unify the pcpu_tstats and br_cpu_netstats as one
They are same, so unify them as one, pcpu_sw_netstats.

Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats
from if_tunnel and remove br_cpu_netstats from br_private.h

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:10:24 -05:00
Marcel Holtmann
f9f462faa0 Bluetooth: Add quirk for disabling Delete Stored Link Key command
Some controller pretend they support the Delete Stored Link Key command,
but in reality they really don't support it.

  < HCI Command: Delete Stored Link Key (0x03|0x0012) plen 7
      bdaddr 00:00:00:00:00:00 all 1
  > HCI Event: Command Complete (0x0e) plen 4
      Delete Stored Link Key (0x03|0x0012) ncmd 1
      status 0x11 deleted 0
      Error: Unsupported Feature or Parameter Value

Not correctly supporting this command causes the controller setup to
fail and will make a device not work. However sending the command for
controller that handle stored link keys is important. This quirk
allows a driver to disable the command if it knows that this command
handling is broken.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-01-04 20:10:40 +02:00
Thierry Escande
444fb98eed NFC: digital: Add a note about asynchronous functions
This explains how and why the timeout parameter must be handled by the
driver implementation.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-04 03:35:34 +01:00
stephen hemminger
5e419e68a6 llc: make lock static
The llc_sap_list_lock does not need to be global, only acquired
in core.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 20:56:48 -05:00
stephen hemminger
8f09898bf0 socket: cleanups
Namespace related cleaning

 * make cred_to_ucred static
 * remove unused sock_rmalloc function

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 20:55:58 -05:00
Tom Herbert
9a4aa9af44 ipv4: Use percpu Cache route in IP tunnels
percpu route cache eliminates share of dst refcnt between CPUs.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 19:40:57 -05:00
Tom Herbert
7d442fab0a ipv4: Cache dst in tunnels
Avoid doing a route lookup on every packet being tunneled.

In ip_tunnel.c cache the route returned from ip_route_output if
the tunnel is "connected" so that all the rouitng parameters are
taken from tunnel parms for a packet. Specifically, not NBMA tunnel
and tos is from tunnel parms (not inner packet).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 19:38:45 -05:00
Daniel Borkmann
86f8515f97 net: netprio: rename config to be more consistent with cgroup configs
While we're at it and introduced CGROUP_NET_CLASSID, lets also make
NETPRIO_CGROUP more consistent with the rest of cgroups and rename it
into CONFIG_CGROUP_NET_PRIO so that for networking, we now have
CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG
option consistent among networking cgroups, but also among cgroups
CONFIG conventions in general as the vast majority has a prefix of
CONFIG_CGROUP_<SUBSYS>.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:42 +01:00
Daniel Borkmann
fe1217c4f3 net: net_cls: move cgroupfs classid handling into core
Zefan Li requested [1] to perform the following cleanup/refactoring:

- Split cgroupfs classid handling into net core to better express a
  possible more generic use.

- Disable module support for cgroupfs bits as the majority of other
  cgroupfs subsystems do not have that, and seems to be not wished
  from cgroup side. Zefan probably might want to follow-up for netprio
  later on.

- By this, code can be further reduced which previously took care of
  functionality built when compiled as module.

cgroupfs bits are being placed under net/core/netclassid_cgroup.c, so
that we are consistent with {netclassid,netprio}_cgroup naming that is
under net/core/ as suggested by Zefan.

No change in functionality, but only code refactoring that is being
done here.

 [1] http://patchwork.ozlabs.org/patch/304825/

Suggested-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:41 +01:00
stephen hemminger
dcd93ed4cd netfilter: nf_conntrack: remove dead code
The following code is not used in current upstream code.
Some of this seems to be old hooks, other might be used by some
out of tree module (which I don't care about breaking), and
the need_ipv4_conntrack was used by old NAT code but no longer
called.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:37 +01:00
Fan Du
c454997e68 {pktgen, xfrm} Introduce xfrm_state_lookup_byspi for pktgen
Introduce xfrm_state_lookup_byspi to find user specified by custom
from "pgset spi xxx". Using this scheme, any flow regardless its
saddr/daddr could be transform by SA specified with configurable
spi.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-03 07:29:12 +01:00
Vlad Yasevich
619a60ee04 sctp: Remove outqueue empty state
The SCTP outqueue structure maintains a data chunks
that are pending transmission, the list of chunks that
are pending a retransmission and a length of data in
flight.  It also tries to keep the emtpy state so that
it can performe shutdown sequence or notify user.

The problem is that the empy state is inconsistently
tracked.  It is possible to completely drain the queue
without sending anything when using PR-SCTP.  In this
case, the empty state will not be correctly state as
report by Jamal Hadi Salim <jhs@mojatatu.com>.  This
can cause an association to be perminantly stuck in the
SHUTDOWN_PENDING state.

Additionally, SCTP is incredibly inefficient when setting
the empty state.  Even though all the data is availaible
in the outqueue structure, we ignore it and walk a list
of trasnports.

In the end, we can completely remove the extra empty
state and figure out if the queue is empty by looking
at 3 things:  length of pending data, length of in-flight
data, and exisiting of retransmit data.  All of these
are already in the strucutre.

Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 17:22:48 -05:00
stephen hemminger
9c75f4029c sched action: make local function static
No need to export functions only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 03:30:36 -05:00
Li RongQing
0c3584d589 ipv6: remove prune parameter for fib6_clean_all
since the prune parameter for fib6_clean_all always is 0, remove it.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 03:30:35 -05:00
stephen hemminger
e82435341f ipv6: namespace cleanups
Running 'make namespacecheck' shows:
  net/ipv6/route.o
    ipv6_route_table_template
    rt6_bind_peer
  net/ipv6/icmp.o
    icmpv6_route_lookup
    ipv6_icmp_table_template

This addresses some of those warnings by:
 * make icmpv6_route_lookup static
 * move inline's out of ip6_route.h since only used into route.c
 * move rt6_bind_peer into route.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 23:46:09 -05:00
stephen hemminger
3678a9d863 netlink: cleanup rntl_af_register
The function __rtnl_af_register is never called outside this
code, and the return value is always 0.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 23:42:19 -05:00
Daniel Borkmann
7e0309631e net: llc: fix order of evaluation in llc_conn_ac_inc_vr_by_1
Function llc_conn_ac_inc_vr_by_1() evaluates via macro
PDU_GET_NEXT_Vr() into ...

  llc_sk(sk)->vR = ++llc_sk(sk)->vR & 0xffffffffffffff7f

... but the order in which the side effects take place is
undefined because there is no intervening sequence point.

As llc_sk(sk)->vR is written in llc_sk(sk)->vR (assignment
left-hand side) and written in ++llc_sk(sk)->vR & 0xffffffffffffff7f
this might possibly yield undefined behavior.

The final value of llc_sk(sk)->vR is ambiguous, because,
depending on the order of expression evaluation, the
increment may occur before, after, or interleaved with
the assignment. In C, evaluating such an expression yields
undefined behavior.

Since we're doing the increment via PDU_GET_NEXT_Vr() macro
and the only place it is being used is from
llc_conn_ac_inc_vr_by_1(), in order to increment vR by 1
with a follow-up optimized modulo, rewrite the expression
into ((vR + 1) & CONST) in order to fix this.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 22:22:43 -05:00
John W. Linville
ad86c55bac Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-01-01 15:39:56 -05:00
Zhi Yong Wu
c9d8ca0454 net, rps: fix build failure when CONFIG_RPS isn't set
In file included from net/socket.c:99:0:
include/net/sock.h: In function ‘sock_rps_record_flow’:
include/net/sock.h:849:30: error: ‘const struct sock’ has no member named ‘sk_rxhash’
include/net/sock.h: In function ‘sock_rps_reset_flow’:
include/net/sock.h:854:29: error: ‘const struct sock’ has no member named ‘sk_rxhash’

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 15:59:27 -05:00
Tom Herbert
fe47755852 net: Allow setting sock flow hash without a sock
This patch adds sock_rps_record_flow_hash and sock_rps_reset_flow_hash
which take a hash value as an argument and sets the sock_flow_table
accordingly.  This allows the table to be populated in cases where flow
is being tracked outside of a sock structure.

sock_rps_record_flow and sock_rps_reset_flow call this function
where the hash is taken from sk_rxhash.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 13:31:34 -05:00
Eric Leblond
cc70d069e2 netfilter: REJECT: separate reusable code
This patch prepares the addition of TCP reset support in
the nft_reject module by moving reusable code into a header
file.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-30 15:04:41 +01:00
stephen hemminger
f7e56a76ac tcp: make local functions static
The following are only used in one file:
  tcp_connect_init
  tcp_set_rto

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-29 16:34:24 -05:00
Stephen Hemminger
ea074b3495 ipv4: ping make local stuff static
Don't export ping_table or ping_v4_sendmsg. Both are only used
inside ping code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-28 17:05:45 -05:00
Stephen Hemminger
068a6e1834 ipv4: remove unused function
inetpeer_invalidate_family defined but never used

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-28 17:03:20 -05:00
Stephen Hemminger
7195cf7221 arp: make arp_invalidate static
Don't export arp_invalidate, only used in arp.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-28 17:02:46 -05:00
Rashika Kheria
b6835f9c5d drivers: net: Include new header file in sbni.c
Create a new header file include/net/Space.h which contains
prototype declaration of sbni_probe().

Include the new header file in drivers/net/Space.c and
drivers/net/wan/sbni.c because they use this function.

This eliminates the following warning in wan/sbni.c:
drivers/net/wan/sbni.c:224:12: warning: no previous prototype for ‘sbni_probe’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 18:51:20 -05:00
David S. Miller
1669cb9855 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2013-12-19

1) Use the user supplied policy index instead of a generated one
   if present. From Fan Du.

2) Make xfrm migration namespace aware. From Fan Du.

3) Make the xfrm state and policy locks namespace aware. From Fan Du.

4) Remove ancient sleeping when the SA is in acquire state,
   we now queue packets to the policy instead. This replaces the
   sleeping code.

5) Remove FLOWI_FLAG_CAN_SLEEP. This was used to notify xfrm about the
   posibility to sleep. The sleeping code is gone, so remove it.

6) Check user specified spi for IPComp. Thr spi for IPcomp is only
   16 bit wide, so check for a valid value. From Fan Du.

7) Export verify_userspi_info to check for valid user supplied spi ranges
   with pfkey and netlink. From Fan Du.

8) RFC3173 states that if the total size of a compressed payload and the IPComp
   header is not smaller than the size of the original payload, the IP datagram
   must be sent in the original non-compressed form. These packets are dropped
   by the inbound policy check because they are not transformed. Document the need
   to set 'level use' for IPcomp to receive such packets anyway. From Fan Du.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 18:37:49 -05:00
Kyeyoon Park
fa9ffc7456 cfg80211: Add support for QoS mapping
This allows QoS mapping from external networks to be implemented as
defined in IEEE Std 802.11-2012, 10.24.9. APs can use this to advertise
DSCP ranges and exceptions for mapping frames to a specific UP over
Wi-Fi.

The payload of the QoS Map Set element (IEEE Std 802.11-2012, 8.4.2.97)
is sent to the driver through the new NL80211_ATTR_QOS_MAP attribute to
configure the local behavior either on the AP (based on local
configuration) or on a station (based on information received from the
AP).

Signed-off-by: Kyeyoon Park <kyeyoonp@qca.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-19 16:29:22 +01:00
Johannes Berg
567ffc3509 nl80211: support vendor-specific events
In addition to vendor-specific commands, also support vendor-specific
events. These must be registered with cfg80211 before they can be used.
They're also advertised in nl80211 in the wiphy information so that
userspace knows can be expected. The events themselves are sent on a
new multicast group called "vendor".

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-19 13:40:31 +01:00
Felix Fietkau
a7022e65c6 mac80211: add helper functions for tracking P2P NoA state
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-19 13:37:46 +01:00
Hannes Frederic Sowa
93b36cf342 ipv6: support IPV6_PMTU_INTERFACE on sockets
IPV6_PMTU_INTERFACE is the same as IPV6_PMTU_PROBE for ipv6. Add it
nontheless for symmetry with IPv4 sockets. Also drop incoming MTU
information if this mode is enabled.

The additional bit in ipv6_pinfo just eats in the padding behind the
bitfield. There are no changes to the layout of the struct at all.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 17:37:05 -05:00
Hannes Frederic Sowa
974eda11c5 inet: make no_pmtu_disc per namespace and kill ipv4_config
The other field in ipv4_config, log_martians, was converted to a
per-interface setting, so we can just remove the whole structure.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:58:20 -05:00
David S. Miller
143c905494 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/i40e/i40e_main.c
	drivers/net/macvtap.c

Both minor merge hassles, simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:42:06 -05:00
WANG Cong
3627287463 net_sched: convert tcf_proto_ops to use struct list_head
We don't need to maintain our own singly linked list code.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:08 -05:00
WANG Cong
1f747c26c4 net_sched: convert tc_action_ops to use struct list_head
We don't need to maintain our own singly linked list code.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong
89819dc01f net_sched: convert tcf_hashinfo to hlist and use spinlock
So that we don't need to play with singly linked list,
and since the code is not on hot path, we can use spinlock
instead of rwlock.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong
369ba56787 net_sched: init struct tcf_hashinfo at register time
It looks weird to store the lock out of the struct but
still points to a static variable. Just move them into the struct.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong
5da57f422d net_sched: cls: refactor out struct tcf_ext_map
These information can be saved in tcf_exts, and this will
simplify the code.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong
33be627159 net_sched: act: use standard struct list_head
Currently actions are chained by a singly linked list,
therefore it is a bit hard to add and remove a specific
entry. Convert it to struct list_head so that in the
latter patch we can remove an action without finding
its head.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong
d84231d3a2 net_sched: remove get_stats from tc_action_ops
It is not used.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
Tom Herbert
7539fadcb8 net: Add utility functions to clear rxhash
In several places 'skb->rxhash = 0' is being done to clear the
rxhash value in an skb.  This does not clear l4_rxhash which could
still be set so that the rxhash wouldn't be recalculated on subsequent
call to skb_get_rxhash.  This patch adds an explict function to clear
all the rxhash related information in the skb properly.

skb_clear_hash_if_not_l4 clears the rxhash only if it is not marked as
l4_rxhash.

Fixed up places where 'skb->rxhash = 0' was being called.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:36:21 -05:00
Eric Dumazet
d4589926d7 tcp: refine TSO splits
While investigating performance problems on small RPC workloads,
I noticed linux TCP stack was always splitting the last TSO skb
into two parts (skbs). One being a multiple of MSS, and a small one
with the Push flag. This split is done even if TCP_NODELAY is set,
or if no small packet is in flight.

Example with request/response of 4K/4K

IP A > B: . ack 68432 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: . 65537:68433(2896) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: P 68433:69633(1200) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
IP B > A: . ack 68433 win 2768 <nop,nop,timestamp 6525001 6524593>
IP B > A: . 69632:72528(2896) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
IP B > A: P 72528:73728(1200) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
IP A > B: . ack 72528 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: . 69633:72529(2896) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: P 72529:73729(1200) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>

We can avoid this split by including the Nagle tests at the right place.

Note : If some NIC had trouble sending TSO packets with a partial
last segment, we would have hit the problem in GRO/forwarding workload already.

tcp_minshall_update() is moved to tcp_output.c and is updated as we might
feed a TSO packet with a partial last segment.

This patch tremendously improves performance, as the traffic now looks
like :

IP A > B: . ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
IP A > B: P 94209:98305(4096) ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
IP B > A: . ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
IP B > A: P 98304:102400(4096) ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
IP A > B: . ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
IP A > B: P 98305:102401(4096) ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
IP B > A: . ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
IP B > A: P 102400:106496(4096) ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
IP A > B: . ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
IP A > B: P 102401:106497(4096) ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
IP B > A: . ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
IP B > A: P 106496:110592(4096) ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>

Before :

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280774

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     205719.049006 task-clock                #    9.278 CPUs utilized
         8,449,968 context-switches          #    0.041 M/sec
         1,935,997 CPU-migrations            #    0.009 M/sec
           160,541 page-faults               #    0.780 K/sec
   548,478,722,290 cycles                    #    2.666 GHz                     [83.20%]
   455,240,670,857 stalled-cycles-frontend   #   83.00% frontend cycles idle    [83.48%]
   272,881,454,275 stalled-cycles-backend    #   49.75% backend  cycles idle    [66.73%]
   166,091,460,030 instructions              #    0.30  insns per cycle
                                             #    2.74  stalled cycles per insn [83.39%]
    29,150,229,399 branches                  #  141.699 M/sec                   [83.30%]
     1,943,814,026 branch-misses             #    6.67% of all branches         [83.32%]

      22.173517844 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   16851063           0.0
IpExtOutOctets                  23878580777        0.0

After patch :

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280877

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     107496.071918 task-clock                #    4.847 CPUs utilized
         5,635,458 context-switches          #    0.052 M/sec
         1,374,707 CPU-migrations            #    0.013 M/sec
           160,920 page-faults               #    0.001 M/sec
   281,500,010,924 cycles                    #    2.619 GHz                     [83.28%]
   228,865,069,307 stalled-cycles-frontend   #   81.30% frontend cycles idle    [83.38%]
   142,462,742,658 stalled-cycles-backend    #   50.61% backend  cycles idle    [66.81%]
    95,227,712,566 instructions              #    0.34  insns per cycle
                                             #    2.40  stalled cycles per insn [83.43%]
    16,209,868,171 branches                  #  150.795 M/sec                   [83.20%]
       874,252,952 branch-misses             #    5.39% of all branches         [83.37%]

      22.175821286 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   11239428           0.0
IpExtOutOctets                  23595191035        0.0

Indeed, the occupancy of tx skbs (IpExtOutOctets/IpOutRequests) is higher :
2099 instead of 1417, thus helping GRO to be more efficient when using FQ packet
scheduler.

Many thanks to Neal for review and ideas.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Van Jacobson <vanj@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 15:15:25 -05:00
David S. Miller
6ea09d8a09 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please pull this batch of updates for the 3.14 stream...

For the Bluetooth bits, Gustavo says:

"This is the first batch of patches intended for 3.14. There is
nothing big here.  Most of the code are refactors, clean up, small
fixes, plus some new device id support."

And...

"More patches to 3.14. Here we have the support for Low Energy
Connection Oriented Channels (LE CoC). Basically, as the name says,
this adds supports for connection oriented channels in the same way
we already have them for BR/EDR connections so profiles/protocols
that work on top of BR/EDR can now work on LE plus a plenty of new
possibilities for LE."

For the ath10k bits, Kalle says:

"Janusz and Marek implemented DFS support to ath10k, but the code is
not enabled yet due to missing cfg80211/mac80211 patches (it will be
enabled in the next pull request). Michal did some device reset fixes
and made it possible for ath10k to share an interrupt with another
device. And lots of smaller fixes from different people."

For the iwlwifi bits, Emmanuel says:

"I have here a big rework of the rate control by Eyal. This is obviously
the biggest part of this batch.
I also have enhancement of protection flags by Avri and a few bits for
WoWLAN by Eliad and Luca. Johannes cleans up the debugfs plus a few
fixes. I provided a few things for Bluetooth coexistence.
Besides this we have an implementation for low priority scan."

Along with all that, there are big batches of updates to mwifiex and
ath9k, Jeff Kirsher's FSF address fix patches, and a handful of other
bits here and there.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 15:08:17 -05:00
wangweidong
be78cfcb25 sctp: Reorder 'struc association' members to reduce its size
Members of 'struct association' are not in appropriate order to
reuse compiler added padding on 64bit architectures. In this patch
we reorder those struct members and help reduce the size of the
structure from 2776 bytes to 2720 bytes on 64 bit architectures.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:32:43 -05:00
Janusz Dziedzic
204e35a91c nl80211: add VHT support for set_bitrate_mask
Add VHT MCS/NSS set support for nl80211_set_tx_bitrate_mask().
This should be used mainly for test purpose, to check
different MCS/NSS VHT combinations.

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-16 16:05:17 +01:00
Fan Du
776e9dd90c xfrm: export verify_userspi_info for pkfey and netlink interface
In order to check against valid IPcomp spi range, export verify_userspi_info
for both pfkey and netlink interface.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-12-16 12:54:02 +01:00
Felix Fietkau
70dabeb74e mac80211: let the driver reserve extra tailroom in beacons
Can be used to add extra IEs (such as P2P NoA) without having to
reallocate the buffer.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-16 12:14:07 +01:00
Johannes Berg
6a9d1b91f3 mac80211: add pre-RCU-sync sta removal driver operation
Currently, mac80211 allows drivers to keep RCU-protected station
references that are cleared when the station is removed from the
driver and consequently needs to synchronize twice, once before
removing the station from the driver (so it can guarantee that
the station is no longer used in TX towards the driver) and once
after the station is removed from the driver.

Add a new pre-RCU-synchronisation station removal operation to
the API to allow drivers to clear/invalidate their RCU-protected
station pointers before the RCU synchronisation.

This will allow removing the second synchronisation by changing
the driver API so that the driver may no longer assume a valid
RCU-protected pointer after sta_remove/sta_state returns.

The alternative to this would be to synchronize_rcu() in all the
drivers that currently rely on this behaviour (only iwlmvm) but
that would defeat the purpose.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-16 11:29:44 +01:00
Johannes Berg
c4de673b77 Merge remote-tracking branch 'wireless-next/master' into mac80211-next 2013-12-16 11:23:45 +01:00
John W. Linville
f647a52e15 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-12-13 13:14:28 -05:00
Jesper Dangaard Brouer
8cf4d6a224 net: reorder struct netns_ct for better cache-line usage
Reorder struct netns_ct so that atomic_t "count" changes don't
slowdown users of read mostly fields.

This is based on Eric Dumazet's proposed patch:
 "netfilter: conntrack: remove the central spinlock"
 http://thread.gmane.org/gmane.linux.network/268758/focus=47306

The tricky part of cache-aligning this structure, that it is getting
inlined in struct net (include/net/net_namespace.h), thus changes to
other netns_xxx structures affects our alignment.

Eric's original patch contained an ambiguity on 32-bit regarding
alignment in struct net.  This patch also takes 32-bit into account,
and in case of changed (struct net) alignment sysctl_xxx entries have
been ordered according to how often they are accessed.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-13 12:55:55 +01:00
Jiri Benc
7e98056964 ipv6: router reachability probing
RFC 4191 states in 3.5:

   When a host avoids using any non-reachable router X and instead sends
   a data packet to another router Y, and the host would have used
   router X if router X were reachable, then the host SHOULD probe each
   such router X's reachability by sending a single Neighbor
   Solicitation to that router's address.  A host MUST NOT probe a
   router's reachability in the absence of useful traffic that the host
   would have sent to the router if it were reachable.  In any case,
   these probes MUST be rate-limited to no more than one per minute per
   router.

Currently, when the neighbour corresponding to a router falls into
NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state
value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but
should be probed with a single NS. The probe is ratelimited by the existing
code. To better distinguish meanings of the failure values, rename
RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 16:02:58 -05:00
Jukka Rissanen
18722c2470 Bluetooth: Enable 6LoWPAN support for BT LE devices
This is initial version of
http://tools.ietf.org/html/draft-ietf-6lo-btle-00

By default the 6LoWPAN support is not activated and user
needs to tweak /sys/kernel/debug/bluetooth/hci0/6lowpan
file.

The kernel needs IPv6 support before 6LoWPAN is usable.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-11 12:57:55 -08:00
Neil Horman
9f70f46bd4 sctp: properly latch and use autoclose value from sock to association
Currently, sctp associations latch a sockets autoclose value to an association
at association init time, subject to capping constraints from the max_autoclose
sysctl value.  This leads to an odd situation where an application may set a
socket level autoclose timeout, but sliently sctp will limit the autoclose
timeout to something less than that.

Fix this by modifying the autoclose setsockopt function to check the limit, cap
it and warn the user via syslog that the timeout is capped.  This will allow
getsockopt to return valid autoclose timeout values that reflect what subsequent
associations actually use.

While were at it, also elimintate the assoc->autoclose variable, it duplicates
whats in the timeout array, which leads to multiple sources for the same
information, that may differ (as the former isn't subject to any capping).  This
gives us the timeout information in a canonical place and saves some space in
the association structure as well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
CC: Wang Weidong <wangweidong1@huawei.com>
CC: David Miller <davem@davemloft.net>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 22:41:26 -05:00
Jiri Pirko
9a32b86043 dn_dev: add support for IFA_FLAGS nl attribute
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 21:50:00 -05:00
Paul Moore
10ae76faa9 cipso: cleanup cipso_v4_translate() when !CONFIG_NETLABEL
Don't needlessly recompute 'opt[opt_iter + 1]' as we already have it
stored in 'tag_len'.

Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 17:56:54 -05:00
Florent Fourcot
3308de2b84 ipv6: add ip6_flowlabel helper
And use it if possible.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 21:03:49 -05:00
Florent Fourcot
37cfee909c ipv6: move IPV6_TCLASS_MASK definition in ipv6.h
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 21:03:49 -05:00
Jiri Pirko
bba24896f0 neigh: ipv6: respect default values set before an address is assigned to device
Make the behaviour similar to ipv4. This will allow user to set sysctl
default neigh param values and these values will be respected even by
devices registered before (that ones what do not have address set yet).

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
Jiri Pirko
1d4c8c2984 neigh: restore old behaviour of default parms values
Previously inet devices were only constructed when addresses are added.
Therefore the default neigh parms values they get are the ones at the
time of these operations.

Now that we're creating inet devices earlier, this changes the behaviour
of default neigh parms values in an incompatible way (see bug #8519).

This patch creates a compromise by setting the default values at the
same point as before but only for those that have not been explicitly
set by the user since the inet device's creation.

Introduced by:
commit 8030f54499
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Feb 22 01:53:47 2007 +0900

    [IPV4] devinet: Register inetdev earlier.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
Jiri Pirko
73af614aed neigh: use tbl->family to distinguish ipv4 from ipv6
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
Jiri Pirko
cb5b09c17f neigh: wrap proc dointvec functions
This will be needed later on to provide better management of default values.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
Jiri Pirko
1f9248e560 neigh: convert parms to an array
This patch converts the neigh param members to an array. This allows easier
manipulation which will be needed later on to provide better management of
default values.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
David S. Miller
34f9f43710 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge 'net' into 'net-next' to get the AF_PACKET bug fix that
Daniel's direct transmit changes depend upon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:20:14 -05:00
Eric Dumazet
95dc19299f pkt_sched: give visibility to mq slave qdiscs
Commit 6da7c8fcbc ("qdisc: allow setting default queuing discipline")
added the ability to change default qdisc from pfifo_fast to say fq

But as most modern ethernet devices are multiqueue, we cant really
see all the statistics from "tc -s qdisc show", as the default root
qdisc is mq.

This patch adds the calls to qdisc_list_add() to mq and mqprio

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 19:54:47 -05:00
Marcel Holtmann
53b834d233 Bluetooth: Use macros for connectionless slave broadcast features
Add the LMP feature constants for connectionless slave broadcast
and use them for capability testing.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-12-09 00:02:41 +04:00
Eric Leblond
97a2d41c47 netfilter: xt_NFQUEUE: separate reusable code
This patch prepares the addition of nft_queue module by moving
reusable code into a header file.

This patch also converts NFQUEUE to use prandom_u32 to initialize
the random jhash seed as suggested by Florian Westphal.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-07 23:20:45 +01:00
Jiri Pirko
479840ffdb ipv6 addrconf: extend ifa_flags to u32
There is no more space in u8 ifa_flags. So do what davem suffested and
add another netlink attr called IFA_FLAGS for carry more flags.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 16:34:43 -05:00
David S. Miller
f1abb346d8 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please pull this batch of updates intended for the 3.14 stream...

For the mac80211 bits, Johannes says:

"I have various improvements/cleanups/fixes all over, but the shortlog
shows that Luis's regulatory work and mesh work from the cozybit folks
are the biggest ones, along with the CSA fixes."

Along with that, we have big batches of updates to brcmfmac, rtlwifi,
and ath9k.  There are updates to wcn36xx, rt2x00, and a handful of
others as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 14:25:23 -05:00
Eric Dumazet
f54b311142 tcp: auto corking
With the introduction of TCP Small Queues, TSO auto sizing, and TCP
pacing, we can implement Automatic Corking in the kernel, to help
applications doing small write()/sendmsg() to TCP sockets.

Idea is to change tcp_push() to check if the current skb payload is
under skb optimal size (a multiple of MSS bytes)

If under 'size_goal', and at least one packet is still in Qdisc or
NIC TX queues, set the TCP Small Queue Throttled bit, so that the push
will be delayed up to TX completion time.

This delay might allow the application to coalesce more bytes
in the skb in following write()/sendmsg()/sendfile() system calls.

The exact duration of the delay is depending on the dynamics
of the system, and might be zero if no packet for this flow
is actually held in Qdisc or NIC TX ring.

Using FQ/pacing is a way to increase the probability of
autocorking being triggered.

Add a new sysctl (/proc/sys/net/ipv4/tcp_autocorking) to control
this feature and default it to 1 (enabled)

Add a new SNMP counter : nstat -a | grep TcpExtTCPAutoCorking
This counter is incremented every time we detected skb was under used
and its flush was deferred.

Tested:

Interesting effects when using line buffered commands under ssh.

Excellent performance results in term of cpu usage and total throughput.

lpq83:~# echo 1 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
9410.39

 Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':

      35209.439626 task-clock                #    2.901 CPUs utilized
             2,294 context-switches          #    0.065 K/sec
               101 CPU-migrations            #    0.003 K/sec
             4,079 page-faults               #    0.116 K/sec
    97,923,241,298 cycles                    #    2.781 GHz                     [83.31%]
    51,832,908,236 stalled-cycles-frontend   #   52.93% frontend cycles idle    [83.30%]
    25,697,986,603 stalled-cycles-backend    #   26.24% backend  cycles idle    [66.70%]
   102,225,978,536 instructions              #    1.04  insns per cycle
                                             #    0.51  stalled cycles per insn [83.38%]
    18,657,696,819 branches                  #  529.906 M/sec                   [83.29%]
        91,679,646 branch-misses             #    0.49% of all branches         [83.40%]

      12.136204899 seconds time elapsed

lpq83:~# echo 0 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
6624.89

 Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
      40045.864494 task-clock                #    3.301 CPUs utilized
               171 context-switches          #    0.004 K/sec
                53 CPU-migrations            #    0.001 K/sec
             4,080 page-faults               #    0.102 K/sec
   111,340,458,645 cycles                    #    2.780 GHz                     [83.34%]
    61,778,039,277 stalled-cycles-frontend   #   55.49% frontend cycles idle    [83.31%]
    29,295,522,759 stalled-cycles-backend    #   26.31% backend  cycles idle    [66.67%]
   108,654,349,355 instructions              #    0.98  insns per cycle
                                             #    0.57  stalled cycles per insn [83.34%]
    19,552,170,748 branches                  #  488.244 M/sec                   [83.34%]
       157,875,417 branch-misses             #    0.81% of all branches         [83.34%]

      12.130267788 seconds time elapsed

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:51:41 -05:00
Jeff Kirsher
a6227e26d9 include/net/: Fix FSF address in file headers
Several files refer to an old address for the Free Software Foundation
in the file header comment.  Resolve by replacing the address with
the URL <http://www.gnu.org/licenses/> so that we do not have to keep
updating the header comments anytime the address changes.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:37:56 -05:00
Jeff Kirsher
4b2f13a251 sctp: Fix FSF address in file headers
Several files refer to an old address for the Free Software Foundation
in the file header comment.  Resolve by replacing the address with
the URL <http://www.gnu.org/licenses/> so that we do not have to keep
updating the header comments anytime the address changes.

CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:37:56 -05:00
John W. Linville
d86804cb70 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-12-06 10:37:24 -05:00
Steffen Klassert
0e0d44ab42 net: Remove FLOWI_FLAG_CAN_SLEEP
FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the posibility
to sleep until the needed states are resolved. This code is gone,
so FLOWI_FLAG_CAN_SLEEP is not needed anymore.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-12-06 07:24:39 +01:00
Steffen Klassert
5b8ef3415a xfrm: Remove ancient sleeping when the SA is in acquire state
We now queue packets to the policy if the states are not yet resolved,
this replaces the ancient sleeping code. Also the sleeping can cause
indefinite task hangs if the needed state does not get resolved.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-12-06 07:24:31 +01:00
Fan Du
283bc9f35b xfrm: Namespacify xfrm state/policy locks
By semantics, xfrm layer is fully name space aware,
so will the locks, e.g. xfrm_state/pocliy_lock.
Ensure exclusive access into state/policy link list
for different name space with one global lock is not
right in terms of semantics aspect at first place,
as they are indeed mutually independent with each
other, but also more seriously causes scalability
problem.

One practical scenario is on a Open Network Stack,
more than hundreds of lxc tenants acts as routers
within one host, a global xfrm_state/policy_lock
becomes the bottleneck. But onces those locks are
decoupled in a per-namespace fashion, locks contend
is just with in specific name space scope, without
causing additional SPD/SAD access delay for other
name space.

Also this patch improve scalability while as without
changing original xfrm behavior.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-12-06 06:45:06 +01:00
Fan Du
8d549c4f5d xfrm: Using the right namespace to migrate key info
because the home agent could surely be run on a different
net namespace other than init_net. The original behavior
could lead into inconsistent of key info.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-12-06 06:45:05 +01:00
Eric W. Biederman
7f2cbdc28c tcp_memcontrol: Cleanup/fix cg_proto->memory_pressure handling.
kill memcg_tcp_enter_memory_pressure.  The only function of
memcg_tcp_enter_memory_pressure was to reduce deal with the
unnecessary abstraction that was tcp_memcontrol.  Now that struct
tcp_memcontrol is gone remove this unnecessary function, the
unnecessary function pointer, and modify sk_enter_memory_pressure to
set this field directly, just as sk_leave_memory_pressure cleas this
field directly.

This fixes a small bug I intruduced when killing struct tcp_memcontrol
that caused memcg_tcp_enter_memory_pressure to never be called and
thus failed to ever set cg_proto->memory_pressure.

Remove the cg_proto enter_memory_pressure function as it now serves
no useful purpose.

Don't test cg_proto->memory_presser in sk_leave_memory_pressure before
clearing it.  The test was originally there to ensure that the pointer
was non-NULL.  Now that cg_proto is not a pointer the pointer does not
matter.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-05 21:01:01 -05:00
Paul Durrant
1431fb31ec xen-netback: fix fragment detection in checksum setup
The code to detect fragments in checksum_setup() was missing for IPv4 and
too eager for IPv6. (It transpires that Windows seems to send IPv6 packets
with a fragment header even if they are not a fragment - i.e. offset is zero,
and M bit is not set).

This patch also incorporates a fix to callers of maybe_pull_tail() where
skb->network_header was being erroneously added to the length argument.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
cc: David Miller <davem@davemloft.net>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-05 20:31:40 -05:00
Janusz Dziedzic
d1e33e654e cfg80211: in bitrate_mask, rename mcs to ht_mcs
Rename NL80211_TXRATE_MCS to NL80211_TXRATE_HT and also
rename mcs to ht_mcs in struct cfg80211_bitrate_mask.

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
[reword commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-05 16:39:07 +01:00
Johan Hedberg
0ce43ce60d Bluetooth: Simplify l2cap_chan initialization for LE CoC
The values in l2cap_chan that are used for actually transmitting data
only need to be initialized right after we've received an L2CAP Connect
Request or just before we send one. The only thing that we need to
initialize though bind() and connect() is the chan->mode value. This way
all other initializations can be done in the l2cap_le_flowctl_init
function (which now becomes private to l2cap_core.c) and the
l2cap_le_flowctl_start function can be completely removed.

Also, since the l2cap_sock_init function initializes the imtu and omtu
to adequate values these do not need to be part of l2cap_le_flowctl_init.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:36 -08:00
Johan Hedberg
e77af75592 Bluetooth: Fix CID ranges for LE CoC CID allocations
LE CoC used differend CIC ranges than BR/EDR L2CAP. The start of the
range is the same (0x0040) but the range ends at 0x007f (unlike BR/EDR
where it goes all the way to 0xffff).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:36 -08:00
Johan Hedberg
595177f311 Bluetooth: Fix LE L2CAP Connect Request handling together with SMP
Unlike BR/EDR, for LE when we're in the BT_CONNECT state we may or may
not have already have sent the Connect Request. This means that we need
some extra tracking of the request. This patch adds an extra channel
flag to prevent the request from being sent a second time.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:35 -08:00
Johan Hedberg
837776f790 Bluetooth: Introduce L2CAP channel callback for suspending
Setting the BT_SK_SUSPEND socket flag from the L2CAP core is causing a
dependency on the socket. So instead of doing that, use a channel
callback into the socket handling to suspend.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:35 -08:00
Johan Hedberg
1f435424ce Bluetooth: Add new BT_SNDMTU and BT_RCVMTU socket options
This patch adds new socket options for LE sockets since the existing
L2CAP_OPTIONS socket option is not usable for LE. For now, the new
socket options also require LE CoC support to be explicitly enabled to
leave some playroom in case something needs to be changed in a backwards
incompatible way.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:34 -08:00
Johan Hedberg
0cd75f7ed7 Bluetooth: Track LE L2CAP credits in l2cap_chan
This patch adds tracking of L2CAP connection oriented channel local and
remote credits to struct l2cap_chan and ensures that connect requests
and responses contain the right values.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:34 -08:00
Johan Hedberg
3831971355 Bluetooth: Add LE L2CAP flow control mode
The LE connection oriented channels have their own mode with its own
data transfer rules. In order to implement this properly we need to
distinguish L2CAP channels operating in this mode from other modes.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:34 -08:00
Johan Hedberg
27e2d4c8d2 Bluetooth: Add basic LE L2CAP connect request receiving support
This patch adds the necessary boiler plate code to handle receiving
L2CAP connect requests over LE and respond to them with a proper connect
response.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:33 -08:00
Johan Hedberg
ee5ec5cf00 Bluetooth: Add definitions for LE connection oriented channels
This patch adds the necessary defines and structs for LE connection
oriented channels.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:32 -08:00
Johan Hedberg
9149761ad7 Bluetooth: Add module parameter to enable LE CoC support
Along with the L2CAP Connection Oriented Channels features it is now
allowed to use both custom fixed CIDs as well as PSM based (connection
oriented connections). Since the support for this (with the subsequent
patches) is still on an experimental stage, add a module parameter to
enable it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:32 -08:00
Johannes Berg
ad7e718c9b nl80211: vendor command support
Add support for vendor-specific commands to nl80211. This is
intended to be used for really vendor-specific functionality
that can't be implemented in a generic fashion for any reason.
It's *NOT* intended to be used for any normal/generic feature
or any optimisations that could be implemented across drivers.

Currently, only vendor commands (with replies) are supported,
no dump operations or vendor-specific notifications.

Also add a function wdev_to_ieee80211_vif() to mac80211 which
is needed for mac80211-based drivers wanting to implement any
vendor commands.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-03 16:27:17 +01:00
Marek Kwaczynski
60f4a7b167 nl80211/cfg80211: Set Operating Mode Notification
This attribute is needed for setting Operating Mode Notification
in AP mode from User Space. This functionality is required when
User Space received Assoc Request contains Operation Mode
Notification element.

Signed-off-by: Marek Kwaczynski <marek.kwaczynski@tieto.com>
[fix typos, nl80211 documentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-03 13:46:42 +01:00
Johannes Berg
18cfd3bfc9 Revert "mac80211: add driver callback for per-interface multicast filter"
This reverts commit 488b366a45.

The code isn't used by anyone, and the Intel driver isn't planning
to use it either right now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-03 13:21:55 +01:00
John W. Linville
4b074b0762 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-12-02 14:25:38 -05:00
Simon Wunderlich
e487eaeb07 cfg80211/mac80211/ath6kl: acquire wdev lock outside ch_switch_notify
The channel switch notification should be sent under the
wdev/sdata-lock, preferably in the same moment as the channel change
happens, to avoid races by other callers (e.g. start/stop_ap).
This also adds the previously missing sdata_lock protection in
csa_finalize_work.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-02 11:51:54 +01:00
Andrei Otcheretianski
b176e62940 cfg80211: aggregate mgmt_tx parameters into a struct
Change cfg80211 and mac80211 to use cfg80211_mgmt_tx_params
struct to aggregate parameters for mgmt_tx functions.
This makes the functions' signatures less clumsy and allows
less painful parameters extension.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
[fix all other drivers]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-02 11:51:52 +01:00
Xufeng Zhang
6eabca54d6 sctp: Restore 'resent' bit to avoid retransmitted chunks for RTT measurements
Currently retransmitted DATA chunks could also be used for
RTT measurements since there are no flag to identify whether
the transmitted DATA chunk is a new one or a retransmitted one.
This problem is introduced by commit ae19c5486 ("sctp: remove
'resent' bit from the chunk") which inappropriately removed the
'resent' bit completely, instead of doing this, we should set
the resent bit only for the retransmitted DATA chunks.

Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-28 18:29:58 -05:00
Luis R. Rodriguez
4c7d3982a6 cfg80211: use enum nl80211_dfs_regions for dfs_region everywhere
u8 was used in some other places, just stick to the enum,
this forces us to express the values that are expected.

Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:52:12 +01:00
Eliad Peller
21f659bf1f mac80211: add min required channel definition field
Add a new field to ieee80211_chanctx_conf to indicate
the min required channel configuration.

Tuning to a narrower channel might help reducing
the noise level and saving some power.

The min required channel definition is the max of
all min required channel definitions of the interfaces
bound to this channel context.

In AP mode, use 20MHz when there are no connected station.
When a new station is added/removed, calculate the new max
bandwidth supported by any of the stations (e.g. 80MHz when
80MHz and 40MHz stations are connected).

In other cases, simply use bss_conf.chandef as the
min required chandef.

Notify drivers about changes to this field by calling
drv_change_chanctx with a new CHANGE_MIN_WIDTH notification.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:52:05 +01:00
Luis R. Rodriguez
2a901468c2 cfg80211: add an option to disable processing country IEs
Certain vendors may want to disable the processing of
country IEs so that they can continue using the regulatory
domain the driver or user has set.  Currently there is no
way to stop the core from processing country IEs, so add
support to the core to ignore country IE hints.

Cc: Mihir Shete <smihir@qti.qualcomm.com>
Cc: Henri Bahini <hbahini@qca.qualcomm.com>
Cc: Tushnim Bhattacharyya <tushnimb@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:51:51 +01:00
Luis R. Rodriguez
a09a85a013 cfg80211: add flags to define country IE processing rules
802.11 cards may have different country IE parsing behavioural
preferences and vendors may want to support these. These preferences
were managed by the REGULATORY_CUSTOM_REG and the REGULATORY_STRICT_REG
flags and their combination. Instead of using this existing notation,
split out the country IE behavioural preferences as a new flag. This
will allow us to add more customizations easily and make the code more
maintainable.

Cc: Mihir Shete <smihir@qti.qualcomm.com>
Cc: Henri Bahini <hbahini@qca.qualcomm.com>
Cc: Tushnim Bhattacharyya <tushnimb@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[fix up conflicts]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:51:49 +01:00
Luis R. Rodriguez
a2f73b6c5d cfg80211: move regulatory flags to their own variable
We'll expand this later, this will make it easier to
classify and review what things are related to regulatory
or not.

Coccinelle only missed 4 hits, which I had to do manually,
supplying the SmPL in case of merge conflicts.

@@
struct wiphy *wiphy;
@@
-wiphy->flags |= WIPHY_FLAG_CUSTOM_REGULATORY
+wiphy->regulatory_flags |= REGULATORY_CUSTOM_REG
@@
expression e;
@@
-e->flags |= WIPHY_FLAG_CUSTOM_REGULATORY
+e->regulatory_flags |= REGULATORY_CUSTOM_REG
@@
struct wiphy *wiphy;
@@
-wiphy->flags &= ~WIPHY_FLAG_CUSTOM_REGULATORY
+wiphy->regulatory_flags &= ~REGULATORY_CUSTOM_REG
@@
struct wiphy *wiphy;
@@
-wiphy->flags & WIPHY_FLAG_CUSTOM_REGULATORY
+wiphy->regulatory_flags & REGULATORY_CUSTOM_REG

@@
struct wiphy *wiphy;
@@
-wiphy->flags |= WIPHY_FLAG_STRICT_REGULATORY
+wiphy->regulatory_flags |= REGULATORY_STRICT_REG
@@
expression e;
@@
-e->flags |= WIPHY_FLAG_STRICT_REGULATORY
+e->regulatory_flags |= REGULATORY_STRICT_REG
@@
struct wiphy *wiphy;
@@
-wiphy->flags &= ~WIPHY_FLAG_STRICT_REGULATORY
+wiphy->regulatory_flags &= ~REGULATORY_STRICT_REG
@@
struct wiphy *wiphy;
@@
-wiphy->flags & WIPHY_FLAG_STRICT_REGULATORY
+wiphy->regulatory_flags & REGULATORY_STRICT_REG

@@
struct wiphy *wiphy;
@@
-wiphy->flags |= WIPHY_FLAG_DISABLE_BEACON_HINTS
+wiphy->regulatory_flags |= REGULATORY_DISABLE_BEACON_HINTS
@@
expression e;
@@
-e->flags |= WIPHY_FLAG_DISABLE_BEACON_HINTS
+e->regulatory_flags |= REGULATORY_DISABLE_BEACON_HINTS
@@
struct wiphy *wiphy;
@@
-wiphy->flags &= ~WIPHY_FLAG_DISABLE_BEACON_HINTS
+wiphy->regulatory_flags &= ~REGULATORY_DISABLE_BEACON_HINTS
@@
struct wiphy *wiphy;
@@
-wiphy->flags & WIPHY_FLAG_DISABLE_BEACON_HINTS
+wiphy->regulatory_flags & REGULATORY_DISABLE_BEACON_HINTS

Generated-by: Coccinelle SmPL
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Peter Senna Tschudin <peter.senna@gmail.com>
Cc: Mihir Shete <smihir@qti.qualcomm.com>
Cc: Henri Bahini <hbahini@qca.qualcomm.com>
Cc: Tushnim Bhattacharyya <tushnimb@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[fix up whitespace damage, overly long lines]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:51:46 +01:00
Max Stepanov
2475b1cc0d mac80211: add generic cipher scheme support
This adds generic cipher scheme support to mac80211, such schemes
are fully under control by the driver. On hw registration drivers
may specify additional HW ciphers with a scheme how these ciphers
have to be handled by mac80211 TX/RR. A cipher scheme specifies a
cipher suite value, a size of the security header to be added to
or stripped from frames and how the PN is to be verified on RX.

Signed-off-by: Max Stepanov <Max.Stepanov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:50:52 +01:00
Janusz Dziedzic
d2859df5e7 cfg80211/mac80211: DFS setup chandef for cac event
To report channel width correctly we have
to send correct channel parameters from
mac80211 when calling cfg80211_cac_event().

This is required in case of using channel width
higher than 20MHz and we have to set correct
dfs channel state after CAC (NL80211_DFS_AVAILABLE).

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:50:46 +01:00
Luis R. Rodriguez
222ea58199 cfg80211: force WIPHY_FLAG_CUSTOM_REGULATORY on wiphy_apply_custom_regulatory()
wiphy_apply_custom_regulatory() implies WIPHY_FLAG_CUSTOM_REGULATORY
but we never enforced it, do that now and warn if the driver
didn't set it. All drivers should be following this today already.

Having WIPHY_FLAG_CUSTOM_REGULATORY does not however mean you will
use wiphy_apply_custom_regulatory() though, you may have your own
_orig value set up tools / helpers. The intel drivers are examples
of this type of driver.

Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:49:45 +01:00
Luis R. Rodriguez
8fe02e167e cfg80211: consolidate passive-scan and no-ibss flags
These two flags are used for the same purpose, just
combine them into a no-ir flag to annotate no initiating
radiation is allowed.

Old userspace sending either flag will have it treated as
the no-ir flag. To be considerate to older userspace we
also send both the no-ir flag and the old no-ibss flags.
Newer userspace will have to be aware of older kernels.

Update all places in the tree using these flags with the
following semantic patch:

@@
@@
-NL80211_RRF_PASSIVE_SCAN
+NL80211_RRF_NO_IR
@@
@@
-NL80211_RRF_NO_IBSS
+NL80211_RRF_NO_IR
@@
@@
-IEEE80211_CHAN_PASSIVE_SCAN
+IEEE80211_CHAN_NO_IR
@@
@@
-IEEE80211_CHAN_NO_IBSS
+IEEE80211_CHAN_NO_IR
@@
@@
-NL80211_RRF_NO_IR | NL80211_RRF_NO_IR
+NL80211_RRF_NO_IR
@@
@@
-IEEE80211_CHAN_NO_IR | IEEE80211_CHAN_NO_IR
+IEEE80211_CHAN_NO_IR
@@
@@
-(NL80211_RRF_NO_IR)
+NL80211_RRF_NO_IR
@@
@@
-(IEEE80211_CHAN_NO_IR)
+IEEE80211_CHAN_NO_IR

Along with some hand-optimisations in documentation, to
remove duplicates and to fix some indentation.

Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[do all the driver updates in one go]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:49:35 +01:00
Hannes Frederic Sowa
85fbaa7503 inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions
Commit bceaa90240 ("inet: prevent leakage
of uninitialized memory to user in recv syscalls") conditionally updated
addr_len if the msg_name is written to. The recv_error and rxpmtu
functions relied on the recvmsg functions to set up addr_len before.

As this does not happen any more we have to pass addr_len to those
functions as well and set it to the size of the corresponding sockaddr
length.

This broke traceroute and such.

Fixes: bceaa90240 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
Reported-by: Brad Spengler <spender@grsecurity.net>
Reported-by: Tom Labanowski
Cc: mpb <mpb.mail@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-23 14:46:23 -08:00
Johannes Berg
91398a0992 genetlink: fix genl_set_err() group ID
Fix another really stupid bug - I introduced genl_set_err()
precisely to be able to adjust the group and reject invalid
ones, but then forgot to do so.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-21 13:09:43 -05:00
Johannes Berg
220815a966 genetlink: fix genlmsg_multicast() bug
Unfortunately, I introduced a tremendously stupid bug into
genlmsg_multicast() when doing all those multicast group
changes: it adjusts the group number, but then passes it
to genlmsg_multicast_netns() which does that again.

Somehow, my tests failed to catch this, so add a warning
into genlmsg_multicast_netns() and remove the offending
group ID adjustment.

Also add a warning to the similar code in other functions
so people who misuse them are more loudly warned.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-21 13:09:43 -05:00
Linus Torvalds
1ee2dcc224 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Mostly these are fixes for fallout due to merge window changes, as
  well as cures for problems that have been with us for a much longer
  period of time"

 1) Johannes Berg noticed two major deficiencies in our genetlink
    registration.  Some genetlink protocols we passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar.  Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

 2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports.  From Ben Hutchings.

 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

 4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet().  From Alexei Starovoitov.

 5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid.  Fix from Toshiaki
    Makita.

 6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

 7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease.  One particularly problematic area is 802.11 AMPDU in
    wireless.  From Eric Dumazet.

 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here.  Fix from Eric Dumzaet, reported by Dave Jones.

 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account.  From Michael Dalton.

11) Fix seqlock deadlock in ip4_datagram_connect() wrt.  statistic
    bumping, this one has been with us for a while.  From Eric Dumazet.

12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

14) macvlan has the same issue as normal vlans did wrt.  propagating LRO
    disabling down to the real device, fix it the same way.  From Michal
    Kubecek.

15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little.  Fix
    from Vlad Yasevich.

19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace.  From Hannes Frederic Sowa.  There is another fix in the
    works from Hannes which will prevent future problems of this nature.

20) Fix route leak in VTI tunnel transmit, from Fan Du.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  genetlink: make multicast groups const, prevent abuse
  genetlink: pass family to functions using groups
  genetlink: add and use genl_set_err()
  genetlink: remove family pointer from genl_multicast_group
  genetlink: remove genl_unregister_mc_group()
  hsr: don't call genl_unregister_mc_group()
  quota/genetlink: use proper genetlink multicast APIs
  drop_monitor/genetlink: use proper genetlink multicast APIs
  genetlink: only pass array to genl_register_family_with_ops()
  tcp: don't update snd_nxt, when a socket is switched from repair mode
  atm: idt77252: fix dev refcnt leak
  xfrm: Release dst if this dst is improper for vti tunnel
  netlink: fix documentation typo in netlink_set_err()
  be2net: Delete secondary unicast MAC addresses during be_close
  be2net: Fix unconditional enabling of Rx interface options
  net, virtio_net: replace the magic value
  ping: prevent NULL pointer dereference on write to msg_name
  bnx2x: Prevent "timeout waiting for state X"
  bnx2x: prevent CFC attention
  bnx2x: Prevent panic during DMAE timeout
  ...
2013-11-19 15:50:47 -08:00
Johannes Berg
2a94fe48f3 genetlink: make multicast groups const, prevent abuse
Register generic netlink multicast groups as an array with
the family and give them contiguous group IDs. Then instead
of passing the global group ID to the various functions that
send messages, pass the ID relative to the family - for most
families that's just 0 because the only have one group.

This avoids the list_head and ID in each group, adding a new
field for the mcast group ID offset to the family.

At the same time, this allows us to prevent abusing groups
again like the quota and dropmon code did, since we can now
check that a family only uses a group it owns.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg
68eb55031d genetlink: pass family to functions using groups
This doesn't really change anything, but prepares for the
next patch that will change the APIs to pass the group ID
within the family, rather than the global group ID.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg
62b68e99fa genetlink: add and use genl_set_err()
Add a static inline to generic netlink to wrap netlink_set_err()
to make it easier to use here - use it in openvswitch (the only
generic netlink user of netlink_set_err()).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg
c2ebb90846 genetlink: remove family pointer from genl_multicast_group
There's no reason to have the family pointer there since it
can just be passed internally where needed, so remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg
06fb555a27 genetlink: remove genl_unregister_mc_group()
There are no users of this API remaining, and we'll soon
change group registration to be static (like ops are now)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg
c53ed74236 genetlink: only pass array to genl_register_family_with_ops()
As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.

The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:05 -05:00
Johannes Berg
568508aa07 genetlink: unify registration functions
Now that the ops assignment is just two variables rather than a
long list iteration etc., there's no reason to separately export
__genl_register_family() and __genl_register_family_with_ops().

Unify the two functions into __genl_register_family() and make
genl_register_family_with_ops() call it after assigning the ops.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-15 20:50:23 -05:00
Linus Torvalds
9073e1a804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual earth-shaking, news-breaking, rocket science pile from
  trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
  doc: add missing files to timers/00-INDEX
  timekeeping: Fix some trivial typos in comments
  mm: Fix some trivial typos in comments
  irq: Fix some trivial typos in comments
  NUMA: fix typos in Kconfig help text
  mm: update 00-INDEX
  doc: Documentation/DMA-attributes.txt fix typo
  DRM: comment: `halve' -> `half'
  Docs: Kconfig: `devlopers' -> `developers'
  doc: typo on word accounting in kprobes.c in mutliple architectures
  treewide: fix "usefull" typo
  treewide: fix "distingush" typo
  mm/Kconfig: Grammar s/an/a/
  kexec: Typo s/the/then/
  Documentation/kvm: Update cpuid documentation for steal time and pv eoi
  treewide: Fix common typo in "identify"
  __page_to_pfn: Fix typo in comment
  Correct some typos for word frequency
  clk: fixed-factor: Fix a trivial typo
  ...
2013-11-15 16:47:22 -08:00
Johannes Berg
3f5ccd06ae genetlink: make genl_ops flags a u8 and move to end
To save some space in the struct on 32-bit systems,
make the flags a u8 (only 4 bits are used) and also
move them to the end of the struct.

This has no impact on 64-bit systems as alignment of
the struct in an array uses up the space anyway.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:41 -05:00
Johannes Berg
f84f771d94 genetlink: allow making ops const
Allow making the ops array const by not modifying the ops
flags on registration but rather only when ops are sent
out in the family information.

No users are updated yet except for the pre_doit/post_doit
calls in wireless (the only ones that exist now.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:41 -05:00
Johannes Berg
d91824c08f genetlink: register family ops as array
Instead of using a linked list, use an array. This reduces
the data size needed by the users of genetlink, for example
in wireless (net/wireless/nl80211.c) on 64-bit it frees up
over 1K of data space.

Remove the attempted sending of CTRL_CMD_NEWOPS ctrl event
since genl_ctrl_event(CTRL_CMD_NEWOPS, ...) only returns
-EINVAL anyway, therefore no such event could ever be sent.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:41 -05:00
Johannes Berg
3686ec5e84 genetlink: remove genl_register_ops/genl_unregister_ops
genl_register_ops() is still needed for internal registration,
but is no longer available to users of the API.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:40 -05:00
Jiri Pirko
6aafeef03b netfilter: push reasm skb through instead of original frag skbs
Pushing original fragments through causes several problems. For example
for matching, frags may not be matched correctly. Take following
example:

<example>
On HOSTA do:
ip6tables -I INPUT -p icmpv6 -j DROP
ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT

and on HOSTB you do:
ping6 HOSTA -s2000    (MTU is 1500)

Incoming echo requests will be filtered out on HOSTA. This issue does
not occur with smaller packets than MTU (where fragmentation does not happen)
</example>

As was discussed previously, the only correct solution seems to be to use
reassembled skb instead of separete frags. Doing this has positive side
effects in reducing sk_buff by one pointer (nfct_reasm) and also the reams
dances in ipvs and conntrack can be removed.

Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
entirely and use code in net/ipv6/reassembly.c instead.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11 00:19:35 -05:00
Florent Fourcot
3fdfa5ff50 ipv6: enable IPV6_FLOWLABEL_MGR for getsockopt
It is already possible to set/put/renew a label
with IPV6_FLOWLABEL_MGR and setsockopt. This patch
add the possibility to get information about this
label (current value, time before expiration, etc).

It helps application to take decision for a renew
or a release of the label.

v2:
 * Add spin_lock to prevent race condition
 * return -ENOENT if no result found
 * check if flr_action is GET

v3:
 * move the spin_lock to protect only the
   relevant code

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-08 13:42:57 -05:00
John W. Linville
c1f3bb6bd3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-11-08 09:03:10 -05:00
Mathias Krause
0c7ddf36c2 net: move pskb_put() to core code
This function has usage beside IPsec so move it to the core skbuff code.
While doing so, give it some documentation and change its return type to
'unsigned char *' to be in line with skb_put().

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 19:28:58 -05:00
Joe Perches
f9bddcdf1f udp: Remove unnecessary semicolon from do{}while (0) macro
Just an unnecessary semicolon that should be removed...

Whitespace neatening of macro too.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 02:14:33 -05:00
Hannes Frederic Sowa
482fc6094a ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
their sockets won't accept and install new path mtu information and they
will always use the interface mtu for outgoing packets. It is guaranteed
that the packet is not fragmented locally. But we won't set the DF-Flag
on the outgoing frames.

Florian Weimer had the idea to use this flag to ensure DNS servers are
never generating outgoing fragments. They may well be fragmented on the
path, but the server never stores or usees path mtu values, which could
well be forged in an attack.

(The root of the problem with path MTU discovery is that there is
no reliable way to authenticate ICMP Fragmentation Needed But DF Set
messages because they are sent from intermediate routers with their
source addresses, and the IMCP payload will not always contain sufficient
information to identify a flow.)

Recent research in the DNS community showed that it is possible to
implement an attack where DNS cache poisoning is feasible by spoofing
fragments. This work was done by Amir Herzberg and Haya Shulman:
<https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>

This issue was previously discussed among the DNS community, e.g.
<http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
without leading to fixes.

This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
regarding local fragmentation with UFO/CORK" for the enforcement of the
non-fragmentable checks. If other users than ip_append_page/data should
use this semantic too, we have to add a new flag to IPCB(skb)->flags to
suppress local fragmentation and check for this in ip_finish_output.

Many thanks to Florian Weimer for the idea and feedback while implementing
this patch.

Cc: David S. Miller <davem@davemloft.net>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 21:52:27 -05:00
John W. Linville
33b443422e Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2013-11-05 15:50:22 -05:00
John W. Linville
353c78152c Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	net/wireless/reg.c
2013-11-05 15:49:02 -05:00
David S. Miller
cfce0a2b61 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please accept the following pull request intended for the 3.13 tree...

I had intended to pass most of these to you as much as two weeks ago.
Unfortunately, I failed to account for the effects of bad Internet
connections and my own fatique/laziness while traveling.  On the bright
side, at least these have been baking in linux-next for some time!

For the mac80211 bits, Johannes says:

"This time I have two fixes for P2P (which requires not using CCK rates)
and a workaround for APs with broken WMM information."

For the iwlwifi bits, Johannes says:

"I have a few fixes for warnings/issues: one from Alex, fixing scan
timings, one from Emmanuel fixing a WARN_ON in the DVM driver, one from
Stanislaw removing a trigger-happy WARN_ON in the MVM driver and a
change from myself to try to recover when the device isn't processing
commands quickly."

And:

"For this round, I have a lot of changes:
 * power management improvements
 * BT coexistence improvements/updates
 * new device support
 * VHT support
 * IBSS support (though due to a small bug it requires new firmware)
 * various other fixes/improvements."

For the Bluetooth bits, Gustavo says:

"More patches for 3.12, busy times for Bluetooth. More than a 100 commits since
the last pull. The bulk of work comes from Johan and Marcel, they are doing
fixes and improvements all over the Bluetooth subsystem, as the diffstat can
show."

For the ath10k and ath6kl bits, Kalle says:

"Bartosz added support to ath10k for our 10.x AP firmware branch, which
gives us AP specific features and fixes. We still support the main
firmware branch as well just like before, ath10k detects runtime what
firmware is used. Unfortunately the firmware interface in 10.x branch is
somewhat different so there was quite a lot of changes in ath10k for
this.

Michal and Sujith did some performance improvements in ath10k. Vladimir
fixed a compiler warning and Fengguang removed an extra semicolon."

For the NFC bits, Samuel says:

"It's a fairly big one, with the following highlights:

- NFC digital layer implementation: Most NFC chipsets implement the NFC
  digital layer in firmware, but others have more basic functionalities
  and expect the host to implement the digital layer. This layer sits
  below the NFC core.

- Sony's port100 support: This is "soft" NFC USB dongle that expects the
  digital layer to be implemented on the host. This is the first user of
  our NFC digital stack implementation.

- Secure element API: We now provide a netlink API for enabling,
  disabling and discovering NFC attached (embedded or UICC ones) secure
  elements. With some userspace help, this allows us to support NFC
  payments.
  Only the pn544 driver currently supports that API.

- NCI SPI fixes and improvements: In order to support NCI devices over
  SPI, we fixed and improved our NCI/SPI implementation. The currently
  most deployed NFC NCI chipset, Broadcom's bcm2079x, supports that mode
  and we're planning to use our NCI/SPI framework to implement a
  driver for it.

- pn533 fragmentation support in target mode: This was the only missing
  feature from our pn533 impementation. We now support fragmentation in
  both Tx and Rx modes, in target mode."

On top of all that, brcmfmac and rt2x00 both get the usual flurry
of updates.  A few other drivers get hit here or there as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 02:34:57 -05:00
Jesper Dangaard Brouer
1ba3aab303 net: codel: Avoid undefined behavior from signed overflow
As described in commit 5a581b367 (jiffies: Avoid undefined
behavior from signed overflow), according to the C standard
3.4.3p3, overflow of a signed integer results in undefined
behavior.

To fix this, do as the above commit, and do an unsigned
subtraction, and interpreting the result as a signed
two's-complement number.  This is based on the theory from
RFC 1982 and is nicely described in wikipedia here:
 https://en.wikipedia.org/wiki/Serial_number_arithmetic#General_Solution

A side-note, I have seen practical issues with the previous logic
when dealing with 16-bit, on a 64-bit machine (gcc version
4.4.5). This were 32-bit, which I have not observed issues with.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 20:01:29 -05:00
Yuchung Cheng
9f9843a751 tcp: properly handle stretch acks in slow start
Slow start now increases cwnd by 1 if an ACK acknowledges some packets,
regardless the number of packets. Consequently slow start performance
is highly dependent on the degree of the stretch ACKs caused by
receiver or network ACK compression mechanisms (e.g., delayed-ACK,
GRO, etc).  But slow start algorithm is to send twice the amount of
packets of packets left so it should process a stretch ACK of degree
N as if N ACKs of degree 1, then exits when cwnd exceeds ssthresh. A
follow up patch will use the remainder of the N (if greater than 1)
to adjust cwnd in the congestion avoidance phase.

In addition this patch retires the experimental limited slow start
(LSS) feature. LSS has multiple drawbacks but questionable benefit. The
fractional cwnd increase in LSS requires a loop in slow start even
though it's rarely used. Configuring such an increase step via a global
sysctl on different BDPS seems hard. Finally and most importantly the
slow start overshoot concern is now better covered by the Hybrid slow
start (hystart) enabled by default.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 19:57:59 -05:00
David S. Miller
72c39a0ade Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
This is another batch containing Netfilter/IPVS updates for your net-next
tree, they are:

* Six patches to make the ipt_CLUSTERIP target support netnamespace,
  from Gao feng.

* Two cleanups for the nf_conntrack_acct infrastructure, introducing
  a new structure to encapsulate conntrack counters, from Holger
  Eitzenberger.

* Fix missing verdict in SCTP support for IPVS, from Daniel Borkmann.

* Skip checksum recalculation in SCTP support for IPVS, also from
  Daniel Borkmann.

* Fix behavioural change in xt_socket after IP early demux, from
  Florian Westphal.

* Fix bogus large memory allocation in the bitmap port set type in ipset,
  from Jozsef Kadlecsik.

* Fix possible compilation issues in the hash netnet set type in ipset,
  also from Jozsef Kadlecsik.

* Define constants to identify netlink callback data in ipset dumps,
  again from Jozsef Kadlecsik.

* Use sock_gen_put() in xt_socket to replace xt_socket_put_sk,
  from Eric Dumazet.

* Improvements for the SH scheduler in IPVS, from Alexander Frolkin.

* Remove extra delay due to unneeded rcu barrier in IPVS net namespace
  cleanup path, from Julian Anastasov.

* Save some cycles in ip6t_REJECT by skipping checksum validation in
  packets leaving from our stack, from Stanislav Fomichev.

* Fix IPVS_CMD_ATTR_MAX definition in IPVS, larger that required, from
  Julian Anastasov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 19:46:58 -05:00
Daniel Borkmann
cea80ea8d2 net: checksum: fix warning in skb_checksum
This patch fixes a build warning in skb_checksum() by wrapping the
csum_partial() usage in skb_checksum(). The problem is that on a few
architectures, csum_partial is used with prefix asmlinkage whereas
on most architectures it's not. So fix this up generically as we did
with csum_block_add_ext() to match the signature. Introduced by
2817a336d4 ("net: skb_checksum: allow custom update/combine for
walking skb").

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 15:27:08 -05:00
John W. Linville
87bc0728d4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/brcm80211/brcmfmac/sdio_host.h
2013-11-04 14:51:28 -05:00
David S. Miller
394efd19d5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be.h
	drivers/net/netconsole.c
	net/bridge/br_private.h

Three mostly trivial conflicts.

The net/bridge/br_private.h conflict was a function signature (argument
addition) change overlapping with the extern removals from Joe Perches.

In drivers/net/netconsole.c we had one change adjusting a printk message
whilst another changed "printk(KERN_INFO" into "pr_info(".

Lastly, the emulex change was a new inline function addition overlapping
with Joe Perches's extern removals.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 13:48:30 -05:00
Daniel Borkmann
e6d8b64b34 net: sctp: fix and consolidate SCTP checksumming code
This fixes an outstanding bug found through IPVS, where SCTP packets
with skb->data_len > 0 (non-linearized) and empty frag_list, but data
accumulated in frags[] member, are forwarded with incorrect checksum
letting SCTP initial handshake fail on some systems. Linearizing each
SCTP skb in IPVS to prevent that would not be a good solution as
this leads to an additional and unnecessary performance penalty on
the load-balancer itself for no good reason (as we actually only want
to update the checksum, and can do that in a different/better way
presented here).

The actual problem is elsewhere, namely, that SCTP's checksumming
in sctp_compute_cksum() does not take frags[] into account like
skb_checksum() does. So while we are fixing this up, we better reuse
the existing code that we have anyway in __skb_checksum() and use it
for walking through the data doing checksumming. This will not only
fix this issue, but also consolidates some SCTP code with core
sk_buff code, bringing it closer together and removing respectively
avoiding reimplementation of skb_checksum() for no good reason.

As crc32c() can use hardware implementation within the crypto layer,
we leave that intact (it wraps around / falls back to e.g. slice-by-8
algorithm in __crc32c_le() otherwise); plus use the __crc32c_le_combine()
combinator for crc32c blocks.

Also, we remove all other SCTP checksumming code, so that we only
have to use sctp_compute_cksum() from now on; for doing that, we need
to transform SCTP checkumming in output path slightly, and can leave
the rest intact.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03 23:04:57 -05:00
Daniel Borkmann
2817a336d4 net: skb_checksum: allow custom update/combine for walking skb
Currently, skb_checksum walks over 1) linearized, 2) frags[], and
3) frag_list data and calculats the one's complement, a 32 bit
result suitable for feeding into itself or csum_tcpudp_magic(),
but unsuitable for SCTP as we're calculating CRC32c there.

Hence, in order to not re-implement the very same function in
SCTP (and maybe other protocols) over and over again, use an
update() + combine() callback internally to allow for walking
over the skb with different algorithms.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03 23:04:57 -05:00
Holger Eitzenberger
f7b13e4330 netfilter: introduce nf_conn_acct structure
Encapsulate counters for both directions into nf_conn_acct. During
that process also consistently name pointers to the extend 'acct',
not 'counters'. This patch is a cleanup.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-03 21:48:49 +01:00
David S. Miller
296c10639a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Conflicts:
	net/xfrm/xfrm_policy.c

Minor merge conflict in xfrm_policy.c, consisting of overlapping
changes which were trivial to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-02 02:13:48 -04:00
Joseph Gasparakis
e6cd988c27 vxlan: Have the NIC drivers do less work for offloads
This patch removes the burden from the NIC drivers to check if the
vxlan driver is enabled in the kernel and also makes available
the vxlan headrooms to them.

Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 02:39:13 -07:00
Mathias Krause
1c5ad13f7c net: esp{4,6}: get rid of struct esp_data
struct esp_data consists of a single pointer, vanishing the need for it
to be a structure. Fold the pointer into 'data' direcly, removing one
level of pointer indirection.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-29 06:39:42 +01:00
Mathias Krause
123b0d1ba0 net: esp{4,6}: remove padlen from struct esp_data
The padlen member of struct esp_data is always zero. Get rid of it.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-29 06:39:42 +01:00
David S. Miller
5d9efa7ee9 ipv6: Remove privacy config option.
The code for privacy extentions is very mature, and making it
configurable only gives marginal memory/code savings in exchange
for obfuscation and hard to read code via CPP ifdef'ery.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 20:07:50 -04:00
Felix Fietkau
06be6b149f mac80211: add ieee80211_tx_prepare_skb() helper function
This can be used by a driver to prepare skbs for transmission, which were
obtained via functions such as ieee80211_probereq_get or
ieee80211_nullfunc_get.

This is useful for drivers that want to send those frames directly, but
need rate control information to be prepared first.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:27 +01:00
Luis R. Rodriguez
034c6d6e67 cfg80211: export reg_initiator_name()
Drivers can now use this to parse the regulatory request and
be more verbose when needed.

Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:27 +01:00
Johannes Berg
919be62b97 mac80211: add missing IEEE80211_HW_SUPPORTS_HT_CCK_RATES docs
Document the IEEE80211_HW_SUPPORTS_HT_CCK_RATES flag.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:22 +01:00
Simon Wunderlich
5336fa88e8 nl80211/cfg80211: enable DFS for IBSS mode
To use DFS in IBSS mode, userspace is required to react to radar events.
It can inform nl80211 that it is capable of doing so by adding a
NL80211_ATTR_HANDLE_DFS attribute when joining the IBSS.

This attribute is supplied to let the kernelspace know that the
userspace application can and will handle radar events, e.g. by
intiating channel switches to a valid channel. DFS channels may
only be used if this attribute is supplied and the driver supports
it. Driver support will be checked even if a channel without DFS
will be initially joined, as a DFS channel may be chosen later.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
[fix attribute name in commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:21 +01:00
Hannes Frederic Sowa
01ba16d6ec ipv6: reset dst.expires value when clearing expire flag
On receiving a packet too big icmp error we update the expire value by
calling rt6_update_expires. This function uses dst_set_expires which is
implemented that it can only reduce the expiration value of the dst entry.

If we insert new routing non-expiry information into the ipv6 fib where
we already have a matching rt6_info we only clear the RTF_EXPIRES flag
in rt6i_flags and leave the dst.expires value as is.

When new mtu information arrives for that cached dst_entry we again
call dst_set_expires. This time it won't update the dst.expire value
because we left the dst.expire value intact from the last update. So
dst_set_expires won't touch dst.expires.

Fix this by resetting dst.expires when clearing the RTF_EXPIRE flag.
dst_set_expires checks for a zero expiration and updates the
dst.expires.

In the past this (not updating dst.expires) was necessary because
dst.expire was placed in a union with the dst_entry *from reference
and rt6_clean_expires did assign NULL to it. This split happend in
ecd9883724 ("ipv6: fix race condition
regarding dst->expires and dst->from").

Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Reported-by: Valentijn Sessink <valentyn@blub.net>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Valentijn Sessink <valentyn@blub.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25 19:26:59 -04:00
Hannes Frederic Sowa
7088ad74e6 inet: remove old fragmentation hash initializing
All fragmentation hash secrets now get initialized by their
corresponding hash function with net_get_random_once. Thus we can
eliminate the initial seeding.

Also provide a comment that hash secret seeding happens at the first
call to the corresponding hashing function.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 17:01:41 -04:00
Hannes Frederic Sowa
b1190570b4 ipv6: split inet6_hash_frag for netfilter and initialize secrets with net_get_random_once
Defer the fragmentation hash secret initialization for IPv6 like the
previous patch did for IPv4.

Because the netfilter logic reuses the hash secret we have to split it
first. Thus introduce a new nf_hash_frag function which takes care to
seed the hash secret.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 17:01:40 -04:00
David S. Miller
c3fa32b976 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/qmi_wwan.c
	include/net/dst.h

Trivial merge conflicts, both were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 16:49:34 -04:00
Christoph Paasch
35b87f6c13 net: Dereference pointer-value of sk_prot->memory_pressure
2e685cad57 (tcp_memcontrol: Kill struct tcp_memcontrol) falsly modified
the access to memory_pressure of sk->sk_prot->memory_pressure. The patch
did modify the memory_pressure-field of struct cg_proto, but not the one
of struct proto.

So, the access to sk_prot->memory_pressure should not be changed.

Acked-by: Eric Dumazet <edumazet@google.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 16:15:01 -04:00
ZHAO Gang
0a6957e7d4 net: remove function sk_reset_txq()
What sk_reset_txq() does is just calls function sk_tx_queue_reset(),
and sk_reset_txq() is used only in sock.h, by dst_negative_advice().
Let dst_negative_advice() calls sk_tx_queue_reset() directly so we
can remove unneeded sk_reset_txq().

Signed-off-by: ZHAO Gang <gamerh2o@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-22 14:00:21 -04:00
Alexandre Belloni
7e4d8a193f mac802154: correct a typo in ieee802154_alloc_device() prototype
This has no other impact than a cosmetic one.

Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:56:23 -04:00
Eric W. Biederman
2e685cad57 tcp_memcontrol: Kill struct tcp_memcontrol
Replace the pointers in struct cg_proto with actual data fields and kill
struct tcp_memcontrol as it is not fully redundant.

This removes a confusing, unnecessary layer of abstraction.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:43:02 -04:00
Eric W. Biederman
a4fe34bf90 tcp_memcontrol: Remove the per netns control.
The code that is implemented is per memory cgroup not per netns, and
having per netns bits is just confusing.  Remove the per netns bits to
make it easier to see what is really going on.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:43:02 -04:00
Eric W. Biederman
f594d63199 tcp_memcontrol: Remove setting cgroup settings via sysctl
The code is broken and does not constrain sysctl_tcp_mem as
tcp_update_limit does.  With the result that it allows the cgroup tcp
memory limits to be bypassed.

The semantics are broken as the settings are not per netns and are in a
per netns table, and instead looks at current.

Since the code is broken in both design and implementation and does not
implement the functionality for which it was written remove it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:43:02 -04:00
Eric W. Biederman
cd91cce620 tcp_memcontrol: Remove tcp_max_memory
This function is never called. Remove it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:43:02 -04:00
Julian Anastasov
550bab42f8 ipv6: fill rt6i_gateway with nexthop address
Make sure rt6i_gateway contains nexthop information in
all routes returned from lookup or when routes are directly
attached to skb for generated ICMP packets.

The effect of this patch should be a faster version of
rt6_nexthop() and the consideration of local addresses as
nexthop.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:37:01 -04:00
Julian Anastasov
96dc809514 ipv6: always prefer rt6i_gateway if present
In v3.9 6fd6ce2056 ("ipv6: Do not depend on rt->n in
ip6_finish_output2()." changed the behaviour of ip6_finish_output2()
such that the recently introduced rt6_nexthop() is used
instead of an assigned neighbor.

As rt6_nexthop() prefers rt6i_gateway only for gatewayed
routes this causes a problem for users like IPVS, xt_TEE and
RAW(hdrincl) if they want to use different address for routing
compared to the destination address.

Another case is when redirect can create RTF_DYNAMIC
route without RTF_GATEWAY flag, we ignore the rt6i_gateway
in rt6_nexthop().

Fix the above problems by considering the rt6i_gateway if
present, so that traffic routed to address on local subnet is
not wrongly diverted to the destination address.

Thanks to Simon Horman and Phil Oester for spotting the
problematic commit.

Thanks to Hannes Frederic Sowa for his review and help in testing.

Reported-by: Phil Oester <kernel@linuxace.com>
Reported-by: Mark Brooks <mark@loadbalancer.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:37:00 -04:00
Joe Perches
5eccdfaabc nf_tables*.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 17:19:06 -04:00
Gustavo Padovan
d78a32a8fc Bluetooth: Remove sk member from struct l2cap_chan
There is no access to chan->sk in L2CAP core now. This change marks the
end of the task of splitting L2CAP between Core and Socket, thus sk is now
gone from struct l2cap_chan.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21 13:50:56 -07:00
Gustavo Padovan
0e790c64f3 Bluetooth: Add L2CAP channel to skb private data
Adding the channel to the skb private data makes possible to us know which
channel the skb we have came from.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21 13:50:55 -07:00
Hannes Frederic Sowa
222e83d2e0 tcp: switch tcp_fastopen key generation to net_get_random_once
Changed key initialization of tcp_fastopen cookies to net_get_random_once.

If the user sets a custom key net_get_random_once must be called at
least once to ensure we don't overwrite the user provided key when the
first cookie is generated later on.

Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:35 -04:00
Hannes Frederic Sowa
1bbdceef1e inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once
Initialize the ehash and ipv6_hash_secrets with net_get_random_once.

Each compilation unit gets its own secret now:
  ipv4/inet_hashtables.o
  ipv4/udp.o
  ipv6/inet6_hashtables.o
  ipv6/udp.o
  rds/connection.o

The functions still get inlined into the hashing functions. In the fast
path we have at most two (needed in ipv6) if (unlikely(...)).

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:35 -04:00
Hannes Frederic Sowa
b23a002fc6 inet: split syncookie keys for ipv4 and ipv6 and initialize with net_get_random_once
This patch splits the secret key for syncookies for ipv4 and ipv6 and
initializes them with net_get_random_once. This change was the reason I
did this series. I think the initialization of the syncookie_secret is
way to early.

Cc: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:35 -04:00
Hannes Frederic Sowa
b50026b5ac ipv6: split inet6_ehashfn to hash functions per compilation unit
This patch splits the inet6_ehashfn into separate ones in
ipv6/inet6_hashtables.o and ipv6/udp.o to ease the introduction of
seperate secrets keys later.

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:34 -04:00
Hannes Frederic Sowa
65cd8033ff ipv4: split inet_ehashfn to hash functions per compilation unit
This duplicates a bit of code but let's us easily introduce
separate secret keys later. The separate compilation units are
ipv4/inet_hashtabbles.o, ipv4/udp.o and rds/connection.o.

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:34 -04:00
Eric Dumazet
2d26f0a3c0 ipv4: generalize gre_handle_offloads
This patch makes gre_handle_offloads() more generic
and rename it to iptunnel_handle_offloads()

This will be used to add GSO/TSO support to IPIP tunnels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:36:18 -04:00
Seif Mazareeb
f2e5ddcc0d net: fix cipso packet validation when !NETLABEL
When CONFIG_NETLABEL is disabled, the cipso_v4_validate() function could loop
forever in the main loop if opt[opt_iter +1] == 0, this will causing a kernel
crash in an SMP system, since the CPU executing this function will
stall /not respond to IPIs.

This problem can be reproduced by running the IP Stack Integrity Checker
(http://isic.sourceforge.net) using the following command on a Linux machine
connected to DUT:

"icmpsic -s rand -d <DUT IP address> -r 123456"
wait (1-2 min)

Signed-off-by: Seif Mazareeb <seif@marvell.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 18:55:42 -04:00
Marcel Holtmann
4b4148e9ac Bluetooth: Add support for setting DUT mode
The Device Under Test (DUT) mode is useful for doing certification
testing and so expose this as debugfs option.

This mode is actually special since you can only enter it. Restoring
normal operation means that a HCI Reset is required. The current mode
value gets tracked as a new device flag and when disabling it, the
correct command to reset the controller is sent.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 18:56:56 +03:00
Marcel Holtmann
4e70c7e71c Bluetooth: Expose debugfs settings for LE connection interval
For testing purposes expose the default LE connection interval values
via debugfs.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 18:56:54 +03:00
Marcel Holtmann
06f5b7785a Bluetooth: Add support for setting SSP debug mode
Enabling and disabling SSP debug mode is useful for development. This
adds a debugfs entry that allows to configure the SSP debug mode.

On purpose this has been implemented as debugfs entry and not a public
API since it is really only useful during testing and development.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 18:56:52 +03:00
Marcel Holtmann
3497ac84bd Bluetooth: Remove interval parameter from HCI connection
The conn->interval parameter of HCI connections is not used at all
and so just remove it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:50:05 +03:00
Marcel Holtmann
79830f66e3 Bluetooth: Select the own address type during initial setup phase
The own address type is based on the fact if the controller has
a public address or not. This means that this detail can be just
configured once during setup phase.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:28:06 +03:00
Luis R. Rodriguez
03f27120fb cfg80211: export reg_initiator_name()
Drivers can now use this to parse the regulatory request and
be more verbose when needed.

Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-10-18 14:04:40 -04:00
John W. Linville
df3c2adea4 This is the first NFC pull request for the 3.13 kernel.
It's a fairly big one, with the following highlights:
 
 - NFC digital layer implementation: Most NFC chipsets implement the NFC
   digital layer in firmware, but others have more basic functionalities
   and expect the host to implement the digital layer. This layer sits
   below the NFC core.
 
 - Sony's port100 support: This is "soft" NFC USB dongle that expects the
   digital layer to be implemented on the host. This is the first user of
   our NFC digital stack implementation.
 
 - Secure element API: We now provide a netlink API for enabling,
   disabling and discovering NFC attached (embedded or UICC ones) secure
   elements. With some userspace help, this allows us to support NFC
   payments.
   Only the pn544 driver currently supports that API.
 
 - NCI SPI fixes and improvements: In order to support NCI devices over
   SPI, we fixed and improved our NCI/SPI implementation. The currently
   most deployed NFC NCI chipset, Broadcom's bcm2079x, supports that mode
   and we're planning to use our NCI/SPI framework to implement a
   driver for it.
 
 - pn533 fragmentation support in target mode: This was the only missing
   feature from our pn533 impementation. We now support fragmentation in
   both Tx and Rx modes, in target mode.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSUqvzAAoJEIqAPN1PVmxKKHEP/1mK8kWx9ZZ8egkrkzaTw4lW
 J9uJq2kTGv5tfEfWjuigAhz5kHB6qLTJLx8FEBZAHazLfa4Az0Zm2G7I7+GTQAMV
 Mw+uAAI22GGnNgdTNzXDAqgS76Eul2PiWf2UwAbKFZWiyEj5Vf/eZBJAS9nTKLM3
 En37wBjvMpVgj6ZwOHKsqPR801NSa3HiM653tkmgSsk3QVL3b7s+kynSuAtshsmA
 twzymUoJFfYePVnQNRnT3PVPR6w0tRNs4vdq8j5UUuoj1vgKC0/h6nIz2aCURwyE
 j1pQlSy4ykASM8QZNI2jv3VaOFxmASig2GOgzvCeBZOmCBdlNq13ZY/T0SmmiYES
 LTbg6Pmh5BXID95GqcFcJaC+iS+QNZK+RWhNYmj+JWCuqZ9Qcja3++IoQ7m/pAip
 WfyQsO7lD7wPhXUu4IMxfyKNu8u4nCiHdBMp7TBVXiWIVUMxIcoM0auNW4O1qSvW
 eLq0f1oqQXE9hIrJ4oYUZmn9D5CKRLIXr9SVcmat+VxmV/+jFgV5cMQIGRdDAEUq
 4g5g7Avbpkkccn9wdFaIm0VpxMebaNFSWVfBOw4L7PF56lqns6shBcq3W3OvDf60
 /Tn0oe9Qqvr17xarJackFkF8beBkJaKOJv9M5+KjlVgGexcNMMTIiwj8aPl6+mqC
 t5n6tCp0g941jB3qAV2G
 =K1A5
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz <sameo@linux.intel.com> says:

"This is the first NFC pull request for the 3.13 kernel.
It's a fairly big one, with the following highlights:

- NFC digital layer implementation: Most NFC chipsets implement the NFC
  digital layer in firmware, but others have more basic functionalities
  and expect the host to implement the digital layer. This layer sits
  below the NFC core.

- Sony's port100 support: This is "soft" NFC USB dongle that expects the
  digital layer to be implemented on the host. This is the first user of
  our NFC digital stack implementation.

- Secure element API: We now provide a netlink API for enabling,
  disabling and discovering NFC attached (embedded or UICC ones) secure
  elements. With some userspace help, this allows us to support NFC
  payments.
  Only the pn544 driver currently supports that API.

- NCI SPI fixes and improvements: In order to support NCI devices over
  SPI, we fixed and improved our NCI/SPI implementation. The currently
  most deployed NFC NCI chipset, Broadcom's bcm2079x, supports that mode
  and we're planning to use our NCI/SPI framework to implement a
  driver for it.

- pn533 fragmentation support in target mode: This was the only missing
  feature from our pn533 impementation. We now support fragmentation in
  both Tx and Rx modes, in target mode."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-10-18 13:59:28 -04:00
John W. Linville
b231070a18 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2013-10-18 13:56:17 -04:00
David S. Miller
9aaf0435b4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1) Don't use a wildcard SA if a more precise one is in acquire state,
   from Fan Du.

2) Simplify the SA lookup when using wildcard source. We need to check
   only the destination in this case, from Fan Du.

3) Add a receive path hook for IPsec virtual tunnel interfaces
   to xfrm6_mode_tunnel.

4) Add support for IPsec virtual tunnel interfaces to ipv6.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18 13:51:35 -04:00
Eric Dumazet
28be6e07e8 tcp: rename tcp_tso_segment()
Rename tcp_tso_segment() to tcp_gso_segment(), to better reflect
what is going on, and ease grep games.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18 13:38:39 -04:00
Marcel Holtmann
bdc3e0f1d2 Bluetooth: Move device_add handling into hci_register_dev
The device_add handling can be done directly in hci_register_dev and
device_remove within hci_unregister_dev.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-18 10:46:54 +03:00
Marcel Holtmann
b9ee0a783a Bluetooth: Add address type to device blacklist table
The device blacklist is not taking care of the address type. Actually
store the address type in the list entries and also use them when
looking up addresses in the table.

This is actually a serious bug. When adding a LE public address to
the blacklist, then it would be blocking a device on BR/EDR. And this
is not the expected behavior.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-18 10:21:28 +03:00
David S. Miller
5cda73b68e Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
This is a batch of updates intended for the 3.13 stream...

The biggest item of interest in here is wcn36xx, the new mac80211
driver for Qualcomm WCN3660/WCN3680 hardware.

Regarding the mac80211 bits, Johannes says:

"We have an assortment of cleanups and new features, of which the
biggest one is probably the channel-switch support in IBSS. Nothing
else really stands out much."

On top of that, the ath9k and rt2x00 get a lot of update action from
Felix Fietkau and Gabor Juhos, respectively.  There are a handful of
updates to other drivers here and there as well.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 16:14:29 -04:00
Eric Dumazet
0baf2b35fc ipv4: shrink rt_cache_stat
Half of the rt_cache_stat fields are no longer used after IP
route cache removal, lets shrink this per cpu area.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 16:11:04 -04:00
Randy Dunlap
a5bb202b84 netdev: inet_timewait_sock.h missing semi-colon when KMEMCHECK is enabled
Fix (a few hundred) build errors due to missing semi-colon when
KMEMCHECK is enabled:

  include/net/inet_timewait_sock.h:139:2: error: expected ',', ';' or '}' before 'int'
  include/net/inet_timewait_sock.h:148:28: error: 'const struct inet_timewait_sock' has no member named 'tw_death_node'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:56:53 -04:00
Vlad Yasevich
e87b3998d7 net: dst: provide accessor function to dst->xfrm
dst->xfrm is conditionally defined.  Provide accessor funtion that
is always available.

Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:24:44 -04:00
David S. Miller
da33edcceb Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says:

====================
netfilter updates: nf_tables pull request

The following patchset contains the current original nf_tables tree
condensed in 17 patches. I have organized them by chronogical order
since the original nf_tables code was released in 2009 and by
dependencies between the different patches.

The patches are:

1) Adapt all existing hooks in the tree to pass hook ops to the
   hook callback function, required by nf_tables, from Patrick McHardy.

2) Move alloc_null_binding to nf_nat_core, as it is now also needed by
   nf_tables and ip_tables, original patch from Patrick McHardy but
   required major changes to adapt it to the current tree that I made.

3) Add nf_tables core, including the netlink API, the packet filtering
   engine, expressions and built-in tables, from Patrick McHardy. This
   patch includes accumulated fixes since 2009 and minor enhancements.
   The patch description contains a list of references to the original
   patches for the record. For those that are not familiar to the
   original work, see [1], [2] and [3].

4) Add netlink set API, this replaces the original set infrastructure
   to introduce a netlink API to add/delete sets and to add/delete
   set elements. This includes two set types: the hash and the rb-tree
   sets (used for interval based matching). The main difference with
   ipset is that this infrastructure is data type agnostic. Patch from
   Patrick McHardy.

5) Allow expression operation overload, this API change allows us to
   provide define expression subtypes depending on the configuration
   that is received from user-space via Netlink. It is used by follow
   up patches to provide optimized versions of the payload and cmp
   expressions and the x_tables compatibility layer, from Patrick
   McHardy.

6) Add optimized data comparison operation, it requires the previous
   patch, from Patrick McHardy.

7) Add optimized payload implementation, it requires patch 5, from
   Patrick McHardy.

8) Convert built-in tables to chain types. Each chain type have special
   semantics (filter, route and nat) that are used by userspace to
   configure the chain behaviour. The main chain regarding iptables
   is that tables become containers of chain, with no specific semantics.
   However, you may still configure your tables and chains to retain
   iptables like semantics, patch from me.

9) Add compatibility layer for x_tables. This patch adds support to
   use all existing x_tables extensions from nf_tables, this is used
   to provide a userspace utility that accepts iptables syntax but
   used internally the nf_tables kernel core. This patch includes
   missing features in the nf_tables core such as the per-chain
   stats, default chain policy and number of chain references, which
   are required by the iptables compatibility userspace tool. Patch
   from me.

10) Fix transport protocol matching, this fix is a side effect of the
    x_tables compatibility layer, which now provides a pointer to the
    transport header, from me.

11) Add support for dormant tables, this feature allows you to disable
    all chains and rules that are contained in one table, from me.

12) Add IPv6 NAT support. At the time nf_tables was made, there was no
    NAT IPv6 support yet, from Tomasz Bursztyka.

13) Complete net namespace support. This patch register the protocol
    family per net namespace, so tables (thus, other objects contained
    in tables such as sets, chains and rules) are only visible from the
    corresponding net namespace, from me.

14) Add the insert operation to the nf_tables netlink API, this requires
    adding a new position attribute that allow us to locate where in the
    ruleset a rule needs to be inserted, from Eric Leblond.

15) Add rule batching support, including atomic rule-set updates by
    using rule-set generations. This patch includes a change to nfnetlink
    to include two new control messages to indicate the beginning and
    the end of a batch. The end message is interpreted as the commit
    message, if it's missing, then the rule-set updates contained in the
    batch are aborted, from me.

16) Add trace support to the nf_tables packet filtering core, from me.

17) Add ARP filtering support, original patch from Patrick McHardy, but
    adapted to fit into the chain type infrastructure. This was recovered
    to be used by nft userspace tool and our compatibility arptables
    userspace tool.

There is still work to do to fully replace x_tables [4] [5] but that can
be done incrementally by extending our netlink API. Moreover, looking at
netfilter-devel and the amount of contributions to nf_tables we've been
getting, I think it would be good to have it mainstream to avoid accumulating
large patchsets skip continuous rebases.

I tried to provide a reasonable patchset, we have more than 100 accumulated
patches in the original nf_tables tree, so I collapsed many of the small
fixes to the main patch we had since 2009 and provide a small batch for
review to netdev, while trying to retain part of the history.

For those who didn't give a try to nf_tables yet, there's a quick howto
available from Eric Leblond that describes how to get things working [6].

Comments/reviews welcome.

Thanks!

[1] http://lwn.net/Articles/324251/
[2] http://workshop.netfilter.org/2013/wiki/images/e/ee/Nftables-osd-2013-developer.pdf
[3] http://lwn.net/Articles/564095/
[4] http://people.netfilter.org/pablo/map-pending-work.txt
[4] http://people.netfilter.org/pablo/nftables-todo.txt
[5] https://home.regit.org/netfilter-en/nftables-quick-howto/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:22:05 -04:00
Michael Opdenacker
78dea8cc49 irda: update comment mentioning IRQF_DISABLED
This patch removes a comment mentioning IRQF_DISABLED,
which is deprecated.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:13:20 -04:00
John W. Linville
9f96da4dd2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-10-17 14:02:07 -04:00
Johan Hedberg
a74a84f696 Bluetooth: Convert idle timer to use delayed work
There is no need to use a timer since the entire Bluetooth subsystem
runs using workqueues these days.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-16 09:12:13 -07:00
Johan Hedberg
7bc18d9d3d Bluetooth: Convert auto accept timer to use delayed work
Since the entire Bluetooth subsystem runs in workqueues these days there
is no need to use a timer for deferring work.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-16 09:12:12 -07:00
Marcel Holtmann
d3900cb25d Bluetooth: Remove enable_hs declaration
This seems to be a left-over. The module parameter enable_hs has
been removed, but its extern declaration is still present. It is
not needed anymore, so just remove it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-16 12:28:25 +03:00
Marcel Holtmann
db3aebf4a1 Bluetooth: Remove duplicate definitions for advertising event types
The constants for advertising event types have been defined twice. So
remove one copy of it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-16 10:31:48 +03:00
Marcel Holtmann
f14d8f6437 Bluetooth: Set the scan response data when needed
On controller power on and when enabling LE functionality,
make sure that also the scan response data is correctly set.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-16 10:31:24 +03:00
Marcel Holtmann
f8e808bd68 Bluetooth: Store scan response data in HCI device
The scan response data needs to be stored in HCI device and so
add a buffer for it and also ensure to clear it when resetting
the controller.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-16 10:30:05 +03:00
Marcel Holtmann
083368f7b8 Bluetooth: Make mgmt_new_ltk() return void
The return value of mgmt_new_ltk() function is not used and
so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-10-15 20:54:57 -03:00
Marcel Holtmann
3edaf092c2 Bluetooth: Make mgmt_read_local_oob_data_reply_complete() return void
The return value of mgmt_read_local_oob_data_reply_complete() function
is not used and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-10-15 20:54:49 -03:00
Marcel Holtmann
7667da3423 Bluetooth: Make mgmt_set_local_name_complete() return void
The return value of mgmt_set_local_name_complete() function is
not used and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-10-15 20:54:34 -03:00
Marcel Holtmann
4e1b0245f2 Bluetooth: Make mgmt_set_class_of_dev_complete() return void
The return value of mgmt_set_class_of_dev_complete() function is
not used and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-10-15 20:54:28 -03:00
Marcel Holtmann
3e248560d9 Bluetooth: Make mgmt_ssp_enable_complete() return void
The return value of mgmt_ssp_enable_complete() function is not
used and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-10-15 20:54:23 -03:00
Marcel Holtmann
464996aea4 Bluetooth: Make mgmt_auth_enable_complete() return void
The return value of mgmt_auth_enable_complete() function is not
used and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-10-15 20:54:10 -03:00
Marcel Holtmann
e546099c31 Bluetooth: Make mgmt_auth_failed() return void
The return value of mgmt_auth_failed() function is not used
and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-10-15 20:53:21 -03:00
Marcel Holtmann
3eb385289a Bluetooth: Make mgmt_pin_code_neg_reply_complete() return void
The return value of mgmt_pin_code_neg_reply_complete() function is
not used and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-10-15 20:53:13 -03:00
Marcel Holtmann
e669cf803c Bluetooth: Make mgmt_pin_code_reply_complete() return void
The return value of mgmt_pin_code_reply_complete() function is not
used and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-10-15 20:53:01 -03:00
Marcel Holtmann
ce0e4a0d7b Bluetooth: Make mgmt_pin_code_request() return void
The return value of mgmt_pin_code_request() function is not used
and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-10-15 20:52:47 -03:00
Marcel Holtmann
2ce5fb510f Bluetooth: Add l2cap_chan_no_resume stub for A2MP
The A2MP client for L2CAP channels needs to use l2cap_chan_no_resume
empty stub function.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-10-15 20:50:41 -03:00
Gustavo Padovan
dc25306b03 Bluetooth: Move l2cap_wait_ack() to l2cap_sock.c
The wait_ack code has a heavy dependency on the socket data structures
and, as of now, it won't be worthless change it to use non-socket
structures as the only user of such feature is a socket.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-15 16:42:44 -07:00
Gustavo Padovan
5ec1bbe549 Bluetooth: Add chan->ops->set_shutdown()
We need to remove all direct access of struct sock from L2CAP core.
This change is pretty simple and just add a new L2CAP channel callback to
do the work in the L2CAP socket side.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-15 16:42:44 -07:00
Gustavo Padovan
8d836d71e2 Bluetooth: Access sk_sndtimeo indirectly in l2cap_core.c
As part of the work to remove struct sock from l2cap_core.c and make it
more generic we remove in this commit the direct access to sk->sk_sndtimeo
member. This objective of this change is purely remove sk usage from
l2cap_core.c

Now we have a new l2cap ops to get the current value of sk->sndtimeo. A
l2cap_chan_no_get_sndtimeo was added for users of L2CAP that doesn't need
to set a timeout.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-15 16:42:44 -07:00
Gustavo Padovan
53f5212121 Bluetooth: Extend state_change() call to report errors too
Instead of creating an new function pointer to report errors we are just
reusing state_change for that and there is a simple reason for this, one
place in the l2cap_core.c code needs, in a locked sk, set both the sk_state
and sk_err. If we create two different functions for this we would need to
release the lock between the two operation putting the socket in non
desired state.

The change is transparent to the l2cap_core.c code, user that only needs
to set the state won't need any modification.

This is another step of an ongoing work to make l2cap_core.c totally
independent from l2cap's struct sock.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-15 16:42:44 -07:00
Marcel Holtmann
d1967ff88b Bluetooth: Update class of device on discoverable timeout
When the discoverable timeout triggers and limited discoverable mode
was used, then the class of device needs to be updated to remove
the limited discoverable bit.

To keep the class of device logic in a central place, expose a new
function mgmt_discoverable_timeout that can be called from the
timeout callback. In case the class of device value needs updating,
it will add the HCI command to the transaction.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-15 21:28:42 +03:00
Marcel Holtmann
efdcf8e3d7 Bluetooth: Move eir_get_length() function into hci_event.c
The eir_get_length() function is only used from hci_event.c and so
instead of having a public function move it to the location where
it is used.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-15 21:28:38 +03:00
Marcel Holtmann
9493399108 Bluetooth: Move eir_append_data() function into mgmt.c
The eir_append_data() function is only used from mgmt.c and so
instead of having a public function move it to the location where
it is used.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-15 21:28:34 +03:00
Marcel Holtmann
dc4a5ee2a3 Bluetooth: Make mgmt_new_link_key() return void
The return value of mgmt_new_link_key() function is not used
and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-15 21:27:45 +03:00
Marcel Holtmann
3d5053127f Bluetooth: Add HCI command structure for writing current IAC LAP
This patch just adds the HCI command structure for configuring the
current IAC LAP setting. The length of the command is variable and
supports more than two IAC. However since there is only general
discoverable and limited discoverable modes, this can be limited
to two possible IACs.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-15 18:44:24 +03:00
Marcel Holtmann
4796e8af60 Bluetooth: Make mgmt_write_scan_failed() return void
The return value of mgmt_write_scan_failed() function is not used
and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-15 17:21:35 +03:00
Marcel Holtmann
a330916c4f Bluetooth: Make mgmt_connectable() return void
The return value of mgmt_connectable() function is not used
and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-15 17:21:25 +03:00
Marcel Holtmann
86a7564573 Bluetooth: Make mgmt_discoverable() return void
The return value of mgmt_discoverable() function is not used
and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-15 17:20:33 +03:00
Marcel Holtmann
6acd7db41d Bluetooth: Introduce flag for limited discoverable mode
Add a new flag that can be set when in limited discoverable mode. This
flag will cause the limited discoverable bit in the class of device
value to bet set.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-15 17:20:15 +03:00
Marcel Holtmann
441ad2d041 Bluetooth: Update advertising data based on management commands
Magically updating the advertising data when some random command enables
advertising in the controller is not really a good idea. It also caused
a bit of complicated code with the exported hci_udpate_ad function that
is shared from many places.

This patch consolidates the advertising data update into the management
core. It also makes sure that when powering on with LE enabled or later
on enabling LE the controller has a good default for advertising data.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-15 17:20:00 +03:00
Julian Anastasov
9e4e948a3e ipvs: avoid rcu_barrier during netns cleanup
commit 578bc3ef1e ("ipvs: reorganize dest trash") added
rcu_barrier() on cleanup to wait dest users and schedulers
like LBLC and LBLCR to put their last dest reference.
Using rcu_barrier with many namespaces is problematic.

Trying to fix it by freeing dest with kfree_rcu is not
a solution, RCU callbacks can run in parallel and execution
order is random.

Fix it by creating new function ip_vs_dest_put_and_free()
which is heavier than ip_vs_dest_put(). We will use it just
for schedulers like LBLC, LBLCR that can delay their dest
release.

By default, dests reference is above 0 if they are present in
service and it is 0 when deleted but still in trash list.
Change the dest trash code to use ip_vs_dest_put_and_free(),
so that refcnt -1 can be used for freeing. As result,
such checks remain in slow path and the rcu_barrier() from
netns cleanup can be removed.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-10-15 10:36:01 +09:00
Marcel Holtmann
4b836f393b Bluetooth: Read current IAC LAP on controller setup
Read the current IAC LAP values when initializing the controller. The
values are not used, but it is good to have them in the trace files
for debugging purposes.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-10-14 19:31:18 -03:00
Marcel Holtmann
b4cb9fb25e Bluetooth: Read number of supported IAC on controller setup
When initializing a controller make sure to read out the number of
supported IAC and store its result. This value is needed to determine
if limited discoverable for BR/EDR can be configured or not.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-10-14 19:31:12 -03:00
Pablo Neira Ayuso
ed683f138b netfilter: nf_tables: add ARP filtering support
This patch registers the ARP family and he filter chain type
for this family.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 18:01:03 +02:00
Pablo Neira Ayuso
b5bc89bfa0 netfilter: nf_tables: add trace support
This patch adds support for tracing the packet travel through
the ruleset, in a similar fashion to x_tables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 18:01:02 +02:00
Pablo Neira Ayuso
0628b123c9 netfilter: nfnetlink: add batch support and use it from nf_tables
This patch adds a batch support to nfnetlink. Basically, it adds
two new control messages:

* NFNL_MSG_BATCH_BEGIN, that indicates the beginning of a batch,
  the nfgenmsg->res_id indicates the nfnetlink subsystem ID.

* NFNL_MSG_BATCH_END, that results in the invocation of the
  ss->commit callback function. If not specified or an error
  ocurred in the batch, the ss->abort function is invoked
  instead.

The end message represents the commit operation in nftables, the
lack of end message results in an abort. This patch also adds the
.call_batch function that is only called from the batch receival
path.

This patch adds atomic rule updates and dumps based on
bitmask generations. This allows to atomically commit a set of
rule-set updates incrementally without altering the internal
state of existing nf_tables expressions/matches/targets.

The idea consists of using a generation cursor of 1 bit and
a bitmask of 2 bits per rule. Assuming the gencursor is 0,
then the genmask (expressed as a bitmask) can be interpreted
as:

00 active in the present, will be active in the next generation.
01 inactive in the present, will be active in the next generation.
10 active in the present, will be deleted in the next generation.
 ^
 gencursor

Once you invoke the transition to the next generation, the global
gencursor is updated:

00 active in the present, will be active in the next generation.
01 active in the present, needs to zero its future, it becomes 00.
10 inactive in the present, delete now.
^
gencursor

If a dump is in progress and nf_tables enters a new generation,
the dump will stop and return -EBUSY to let userspace know that
it has to retry again. In order to invalidate dumps, a global
genctr counter is increased everytime nf_tables enters a new
generation.

This new operation can be used from the user-space utility
that controls the firewall, eg.

nft -f restore

The rule updates contained in `file' will be applied atomically.

cat file
-----
add filter INPUT ip saddr 1.1.1.1 counter accept #1
del filter INPUT ip daddr 2.2.2.2 counter drop   #2
-EOF-

Note that the rule 1 will be inactive until the transition to the
next generation, the rule 2 will be evicted in the next generation.

There is a penalty during the rule update due to the branch
misprediction in the packet matching framework. But that should be
quickly resolved once the iteration over the commit list that
contain rules that require updates is finished.

Event notification happens once the rule-set update has been
committed. So we skip notifications is case the rule-set update
is aborted, which can happen in case that the rule-set is tested
to apply correctly.

This patch squashed the following patches from Pablo:

* nf_tables: atomic rule updates and dumps
* nf_tables: get rid of per rule list_head for commits
* nf_tables: use per netns commit list
* nfnetlink: add batch support and use it from nf_tables
* nf_tables: all rule updates are transactional
* nf_tables: attach replacement rule after stale one
* nf_tables: do not allow deletion/replacement of stale rules
* nf_tables: remove unused NFTA_RULE_FLAGS

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 18:01:01 +02:00
Pablo Neira Ayuso
99633ab29b netfilter: nf_tables: complete net namespace support
Register family per netnamespace to ensure that sets are
only visible in its approapriate namespace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 18:00:59 +02:00
Pablo Neira Ayuso
0ca743a559 netfilter: nf_tables: add compatibility layer for x_tables
This patch adds the x_tables compatibility layer. This allows you
to use existing x_tables matches and targets from nf_tables.

This compatibility later allows us to use existing matches/targets
for features that are still missing in nf_tables. We can progressively
replace them with native nf_tables extensions. It also provides the
userspace compatibility software that allows you to express the
rule-set using the iptables syntax but using the nf_tables kernel
components.

In order to get this compatibility layer working, I've done the
following things:

* add NFNL_SUBSYS_NFT_COMPAT: this new nfnetlink subsystem is used
to query the x_tables match/target revision, so we don't need to
use the native x_table getsockopt interface.

* emulate xt structures: this required extending the struct nft_pktinfo
to include the fragment offset, which is already obtained from
ip[6]_tables and that is used by some matches/targets.

* add support for default policy to base chains, required to emulate
  x_tables.

* add NFTA_CHAIN_USE attribute to obtain the number of references to
  chains, required by x_tables emulation.

* add chain packet/byte counters using per-cpu.

* support 32-64 bits compat.

For historical reasons, this patch includes the following patches
that were posted in the netfilter-devel mailing list.

From Pablo Neira Ayuso:
* nf_tables: add default policy to base chains
* netfilter: nf_tables: add NFTA_CHAIN_USE attribute
* nf_tables: nft_compat: private data of target and matches in contiguous area
* nf_tables: validate hooks for compat match/target
* nf_tables: nft_compat: release cached matches/targets
* nf_tables: x_tables support as a compile time option
* nf_tables: fix alias for xtables over nftables module
* nf_tables: add packet and byte counters per chain
* nf_tables: fix per-chain counter stats if no counters are passed
* nf_tables: don't bump chain stats
* nf_tables: add protocol and flags for xtables over nf_tables
* nf_tables: add ip[6]t_entry emulation
* nf_tables: move specific layer 3 compat code to nf_tables_ipv[4|6]
* nf_tables: support 32bits-64bits x_tables compat
* nf_tables: fix compilation if CONFIG_COMPAT is disabled

From Patrick McHardy:
* nf_tables: move policy to struct nft_base_chain
* nf_tables: send notifications for base chain policy changes

From Alexander Primak:
* nf_tables: remove the duplicate NF_INET_LOCAL_OUT

From Nicolas Dichtel:
* nf_tables: fix compilation when nf-netlink is a module

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 18:00:04 +02:00
Pablo Neira Ayuso
9370761c56 netfilter: nf_tables: convert built-in tables/chains to chain types
This patch converts built-in tables/chains to chain types that
allows you to deploy customized table and chain configurations from
userspace.

After this patch, you have to specify the chain type when
creating a new chain:

 add chain ip filter output { type filter hook input priority 0; }
                              ^^^^ ------

The existing chain types after this patch are: filter, route and
nat. Note that tables are just containers of chains with no specific
semantics, which is a significant change with regards to iptables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 17:16:11 +02:00
Patrick McHardy
c29b72e025 netfilter: nft_payload: add optimized payload implementation for small loads
Add an optimized payload expression implementation for small (up to 4 bytes)
aligned data loads from the linear packet area.

This patch also includes original Patrick McHardy's entitled (nf_tables:
inline nft_payload_fast_eval() into main evaluation loop).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 17:16:10 +02:00
Patrick McHardy
cb7dbfd039 netfilter: nf_tables: add optimized data comparison for small values
Add an optimized version of nft_data_cmp() that only handles values of to
4 bytes length.

This patch includes original Patrick McHardy's patch entitled (nf_tables:
inline nft_cmp_fast_eval() into main evaluation loop).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 17:16:09 +02:00
Patrick McHardy
ef1f7df917 netfilter: nf_tables: expression ops overloading
Split the expression ops into two parts and support overloading of
the runtime expression ops based on the requested function through
a ->select_ops() callback.

This can be used to provide optimized implementations, for instance
for loading small aligned amounts of data from the packet or inlining
frequently used operations into the main evaluation loop.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 17:16:08 +02:00
Patrick McHardy
20a69341f2 netfilter: nf_tables: add netlink set API
This patch adds the new netlink API for maintaining nf_tables sets
independently of the ruleset. The API supports the following operations:

- creation of sets
- deletion of sets
- querying of specific sets
- dumping of all sets

- addition of set elements
- removal of set elements
- dumping of all set elements

Sets are identified by name, each table defines an individual namespace.
The name of a set may be allocated automatically, this is mostly useful
in combination with the NFT_SET_ANONYMOUS flag, which destroys a set
automatically once the last reference has been released.

Sets can be marked constant, meaning they're not allowed to change while
linked to a rule. This allows to perform lockless operation for set
types that would otherwise require locking.

Additionally, if the implementation supports it, sets can (as before) be
used as maps, associating a data value with each key (or range), by
specifying the NFT_SET_MAP flag and can be used for interval queries by
specifying the NFT_SET_INTERVAL flag.

Set elements are added and removed incrementally. All element operations
support batching, reducing netlink message and set lookup overhead.

The old "set" and "hash" expressions are replaced by a generic "lookup"
expression, which binds to the specified set. Userspace is not aware
of the actual set implementation used by the kernel anymore, all
configuration options are generic.

Currently the implementation selection logic is largely missing and the
kernel will simply use the first registered implementation supporting the
requested operation. Eventually, the plan is to have userspace supply a
description of the data characteristics and select the implementation
based on expected performance and memory use.

This patch includes the new 'lookup' expression to look up for element
matching in the set.

This patch includes kernel-doc descriptions for this set API and it
also includes the following fixes.

From Patrick McHardy:
* netfilter: nf_tables: fix set element data type in dumps
* netfilter: nf_tables: fix indentation of struct nft_set_elem comments
* netfilter: nf_tables: fix oops in nft_validate_data_load()
* netfilter: nf_tables: fix oops while listing sets of built-in tables
* netfilter: nf_tables: destroy anonymous sets immediately if binding fails
* netfilter: nf_tables: propagate context to set iter callback
* netfilter: nf_tables: add loop detection

From Pablo Neira Ayuso:
* netfilter: nf_tables: allow to dump all existing sets
* netfilter: nf_tables: fix wrong type for flags variable in newelem

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 17:16:07 +02:00
Patrick McHardy
96518518cc netfilter: add nftables
This patch adds nftables which is the intended successor of iptables.
This packet filtering framework reuses the existing netfilter hooks,
the connection tracking system, the NAT subsystem, the transparent
proxying engine, the logging infrastructure and the userspace packet
queueing facilities.

In a nutshell, nftables provides a pseudo-state machine with 4 general
purpose registers of 128 bits and 1 specific purpose register to store
verdicts. This pseudo-machine comes with an extensible instruction set,
a.k.a. "expressions" in the nftables jargon. The expressions included
in this patch provide the basic functionality, they are:

* bitwise: to perform bitwise operations.
* byteorder: to change from host/network endianess.
* cmp: to compare data with the content of the registers.
* counter: to enable counters on rules.
* ct: to store conntrack keys into register.
* exthdr: to match IPv6 extension headers.
* immediate: to load data into registers.
* limit: to limit matching based on packet rate.
* log: to log packets.
* meta: to match metainformation that usually comes with the skbuff.
* nat: to perform Network Address Translation.
* payload: to fetch data from the packet payload and store it into
  registers.
* reject (IPv4 only): to explicitly close connection, eg. TCP RST.

Using this instruction-set, the userspace utility 'nft' can transform
the rules expressed in human-readable text representation (using a
new syntax, inspired by tcpdump) to nftables bytecode.

nftables also inherits the table, chain and rule objects from
iptables, but in a more configurable way, and it also includes the
original datatype-agnostic set infrastructure with mapping support.
This set infrastructure is enhanced in the follow up patch (netfilter:
nf_tables: add netlink set API).

This patch includes the following components:

* the netlink API: net/netfilter/nf_tables_api.c and
  include/uapi/netfilter/nf_tables.h
* the packet filter core: net/netfilter/nf_tables_core.c
* the expressions (described above): net/netfilter/nft_*.c
* the filter tables: arp, IPv4, IPv6 and bridge:
  net/ipv4/netfilter/nf_tables_ipv4.c
  net/ipv6/netfilter/nf_tables_ipv6.c
  net/ipv4/netfilter/nf_tables_arp.c
  net/bridge/netfilter/nf_tables_bridge.c
* the NAT table (IPv4 only):
  net/ipv4/netfilter/nf_table_nat_ipv4.c
* the route table (similar to mangle):
  net/ipv4/netfilter/nf_table_route_ipv4.c
  net/ipv6/netfilter/nf_table_route_ipv6.c
* internal definitions under:
  include/net/netfilter/nf_tables.h
  include/net/netfilter/nf_tables_core.h
* It also includes an skeleton expression:
  net/netfilter/nft_expr_template.c
  and the preliminary implementation of the meta target
  net/netfilter/nft_meta_target.c

It also includes a change in struct nf_hook_ops to add a new
pointer to store private data to the hook, that is used to store
the rule list per chain.

This patch is based on the patch from Patrick McHardy, plus merged
accumulated cleanups, fixes and small enhancements to the nftables
code that has been done since 2009, which are:

From Patrick McHardy:
* nf_tables: adjust netlink handler function signatures
* nf_tables: only retry table lookup after successful table module load
* nf_tables: fix event notification echo and avoid unnecessary messages
* nft_ct: add l3proto support
* nf_tables: pass expression context to nft_validate_data_load()
* nf_tables: remove redundant definition
* nft_ct: fix maxattr initialization
* nf_tables: fix invalid event type in nf_tables_getrule()
* nf_tables: simplify nft_data_init() usage
* nf_tables: build in more core modules
* nf_tables: fix double lookup expression unregistation
* nf_tables: move expression initialization to nf_tables_core.c
* nf_tables: build in payload module
* nf_tables: use NFPROTO constants
* nf_tables: rename pid variables to portid
* nf_tables: save 48 bits per rule
* nf_tables: introduce chain rename
* nf_tables: check for duplicate names on chain rename
* nf_tables: remove ability to specify handles for new rules
* nf_tables: return error for rule change request
* nf_tables: return error for NLM_F_REPLACE without rule handle
* nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
* nf_tables: fix NLM_F_MULTI usage in netlink notifications
* nf_tables: include NLM_F_APPEND in rule dumps

From Pablo Neira Ayuso:
* nf_tables: fix stack overflow in nf_tables_newrule
* nf_tables: nft_ct: fix compilation warning
* nf_tables: nft_ct: fix crash with invalid packets
* nft_log: group and qthreshold are 2^16
* nf_tables: nft_meta: fix socket uid,gid handling
* nft_counter: allow to restore counters
* nf_tables: fix module autoload
* nf_tables: allow to remove all rules placed in one chain
* nf_tables: use 64-bits rule handle instead of 16-bits
* nf_tables: fix chain after rule deletion
* nf_tables: improve deletion performance
* nf_tables: add missing code in route chain type
* nf_tables: rise maximum number of expressions from 12 to 128
* nf_tables: don't delete table if in use
* nf_tables: fix basechain release

From Tomasz Bursztyka:
* nf_tables: Add support for changing users chain's name
* nf_tables: Change chain's name to be fixed sized
* nf_tables: Add support for replacing a rule by another one
* nf_tables: Update uapi nftables netlink header documentation

From Florian Westphal:
* nft_log: group is u16, snaplen u32

From Phil Oester:
* nf_tables: operational limit match

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 17:15:48 +02:00
Maxime Jayat
3f79410c7c treewide: Fix common typo in "identify"
Correct common misspelling of "identify" as "indentify" throughout
the kernel

Signed-off-by: Maxime Jayat <maxime@artisandeveloppeur.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-14 15:31:06 +02:00
Marcel Holtmann
d97c899bde Bluetooth: Introduce L2CAP channel callback for resuming
Clearing the BT_SK_SUSPEND socket flag from the L2CAP core is causing
a dependency on the socket. So intead of doing that, use a channel
callback into the socket handling to resume.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-14 14:23:24 +03:00
Marcel Holtmann
bdc2578307 Bluetooth: Introduce L2CAP channel flag for defer setup
The L2CAP core should not look into the socket flags to figure out the
setting of defer setup. So introduce a L2CAP channel flag that mirrors
the socket flag.

Since the defer setup option is only set in one place this becomes a
really easy thing to do.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-14 14:21:06 +03:00
Pablo Neira Ayuso
f59cb0453c netfilter: nf_nat: move alloc_null_binding to nf_nat_core.c
Similar to nat_decode_session, alloc_null_binding is needed for both
ip_tables and nf_tables, so move it to nf_nat_core.c. This change
is required by nf_tables.

This is an adapted version of the original patch from Patrick McHardy.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 11:29:39 +02:00
Marcel Holtmann
2edf870d19 Bluetooth: Provide msg_name callback for L2CAP connectionless channels
The L2CAP connectionless channels use SOCK_DGRAM and recvmsg() and need
to receive the remote BD_ADDR and PSM information via msg_name from
the recvmsg() system call.

So in case the L2CAP socket is for connectionless channels, provide
a msg_name callback that can update the data. Also store the remote
BD_ADDR and PSM in the skb so it can be extracted later on.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-13 23:13:37 +03:00
Marcel Holtmann
d97636980f Bluetooth: Add support for per socket msg_name callback
This allows to add a per socket msg_name callback that can be used
for updating the msg_name information for recvmsg() system calls.

This feature is used by another patch to support address information
on L2CAP connectionless channels.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-13 23:10:33 +03:00
Marcel Holtmann
5f6cd79f47 Bluetooth: Remove src and dst fields from bt_sock structure
Every socket protocol now stores its own address information. So
just remove the generic src and dst fields since they are no longer
needed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-13 21:11:25 +03:00
Marcel Holtmann
94a86df010 Bluetooth: Store RFCOMM address information in its own socket structure
The address information of RFCOMM sockets should be stored in its
own socket structure. Trying to generalize them is not helpful since
different transports have different address types.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-13 21:11:23 +03:00
Marcel Holtmann
eea963641b Bluetooth: Store SCO address information in its own socket structure
The address information of SCO sockets should be stored in its own
socket structure. Trying to generalize them is not helpful since
different transports have different address types.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-13 21:11:20 +03:00
Marcel Holtmann
041987cff6 Bluetooth: Use SCO addresses from HCI connection directly
Instead of storing a pointer to the addresses for the HCI device
and HCI connection, use them directly. With the recent changes
to address tracking of HCI connections, this becomes simple.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-13 21:11:18 +03:00
Marcel Holtmann
4f1654e084 Bluetooth: Return the correct address type for L2CAP sockets
The L2CAP sockets can use BR/EDR public, LE public and LE random
addresses for various combinations of source and destination
devices. So make sure that getsockname(), getpeername() and
accept() return the correct address type.

For this the address type of the source and destination is stored
with the L2CAP channel information. The stored address type is
not the one specific for the HCI protocol. It is the address
type used for the L2CAP sockets and the management interface.

The underlying HCI connections store the HCI address type. If
needed, it gets converted to the socket address type.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-13 18:58:30 +03:00
Marcel Holtmann
7eafc59e2f Bluetooth: Store address information in L2CAP channel structure
With the effort of abstracting the L2CAP socket from the underlying
L2CAP channel it is important to store the source and destination
address information directly in the L2CAP channel structure.

Direct access to the HCI connection address information is not
possible since they might not be avaiable at L2CAP channel
creation time. The address information will be updated when
the underlying BR/EDR or LE connection status changes.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-13 18:52:01 +03:00
Marcel Holtmann
662e8820f3 Bluetooth: Store source address of HCI connections
The source addressed was based on the public address of the HCI device,
but with LE connections this not always the case. For example single
mode LE-only controllers would use a static random address. And this
address is configured by userspace.

To not complicate the lookup of what kind of address is in use, store
the correct source address for each HCI connection.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-13 17:47:37 +03:00
Marcel Holtmann
e7c4096e16 Bluetooth: Store the source address type of LE connections
When establishing LE connections, it is possible to use a public
address (if available) or a random address. The type of address
is only known when creating connections, so make sure it is
stored in hci_conn structure.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-13 17:46:31 +03:00
Marcel Holtmann
79d95a19a4 Bluetooth: Remove pointless bdaddr_to_le() helper function
The bdaddr_to_le() function tries to convert the internal address
type to one that matches the HCI address type for LE. It does not
handle any address types not used by LE and in the end just make
the code a lot harder to read.

So instead of just hiding behind a magic function, just convert
the internal address type where it needs to be converted. And it
turns out that these are only two cases anyway. One when creating
new LE connections and the other when loading the long term keys.
In both cases this makes it more clear on what it going on.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-13 17:45:55 +03:00
Marcel Holtmann
a4de24d437 Bluetooth: Remove l2cap_conn->src and l2cap_conn->dst pointers
The l2cap_conn->src and l2cap_conn->dst pointers are no longer in use
and so just remove them.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-13 17:45:24 +03:00
Marcel Holtmann
d40bffbc4e Bluetooth: The L2CAP fixed channel connectionless data is supported
The implementation actually supports the L2CAP connectionless data
channel. So set it as supported in the fixed channels bitmask.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-12 18:30:29 +03:00
Marcel Holtmann
3124b84309 Bluetooth: Allow 3D profile to use security mode 4 level 0
The PSM 0x0021 is dedicated to the 3D profile and has permission to
use security mode 4 level 0 for L2CAP connectionless unicast data
transfers.

When establishing a L2CAP connectionless channel on PSM 0x0021, it
will no longer force Secure Simple Pairing.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-12 17:30:42 +03:00
Marcel Holtmann
14b49b9a49 Bluetooth: Add management command for setting LE scan parameters
The scan interval and window parameters are used for LE passive
background scanning and connection establishment. This allows
userspace to change the values.

These two values should be kept in sync with whatever is used for
the scan parameters service on remote devices. And it puts the
controlling daemon (for example bluetoothd) in charge of setting
the values.

Main use case would be to switch between two sets of values. One
for foreground applications and one for background applications.

At this moment, the values are only used for manual connection
establishment, but soon that should be extended to background
scanning and automatic connection establishment.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-11 18:18:13 +02:00
Marcel Holtmann
bef64738e3 Bluetooth: Make LE scan interval and window a controller option
The scan interval and window for LE passive scanning and connection
establishment should be configurable on a per controller basis. So
introduce a setting that later on will allow modifying it.

This setting does not affect LE active scanning during device
discovery phase. As long as that phase uses interleaved discovery,
it will continuously scan.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-11 18:18:11 +02:00
Marcel Holtmann
7bd8f09f69 Bluetooth: Add hdev parameter to hdev->send driver callback
Instead of masking hdev inside the skb->dev parameter, hand it
directly to the driver as a parameter to hdev->send. This makes
the driver interface more clear and simpler.

This patch fixes all drivers to accept and handle the new parameter
of hdev->send callback. Special care has been taken for bpa10x
and btusb drivers that require having skb->dev set to hdev for
the URB transmit complete handlers.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-11 15:28:03 +02:00
Sunil Dutt
c01fc9ada9 cfg80211: pass station supported channel and oper class info
The information of the peer's supported channels and supported operating
classes are required for the driver to perform TDLS off channel
operations. This commit enhances the function nl80211_(new)set_station
to pass this information of the peer to the driver.

Signed-off-by: Sunil Dutt <c_duttus@qti.qualcomm.com>
[return errors for malformed tuples]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-11 15:26:58 +02:00
Marcel Holtmann
e1a2617069 Bluetooth: Provide hdev parameter to hci_recv_frame() driver callback
To avoid casting skb->dev into hdev, just let the drivers provide
the hdev directly when calling hci_recv_frame() function.

This patch also fixes up all drivers to provide the hdev.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-11 09:45:34 +02:00
Marcel Holtmann
ac4b723661 Bluetooth: Move smp.h header file into net/bluetooth/
The smp.h header file is only used internally by the bluetooth.ko
module and is not a public API. So make it local to the core
Bluetooth module.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-11 00:10:07 +02:00
Marcel Holtmann
7024728ee5 Bluetooth: Move a2mp.h header file into net/bluetooth/
The a2mp.h header file is only used internally by the bluetooth.ko
module and is not a public API. So make it local to the core
Bluetooth module.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-11 00:10:05 +02:00
Marcel Holtmann
7ef9fbf088 Bluetooth: Move amp.h header file into net/bluetooth/
The amp.h header file is only used internally by the bluetooth.ko
module and is not a public API. So make it local to the core
Bluetooth module.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-11 00:10:03 +02:00
Marcel Holtmann
324d36ed26 Bluetooth: Remove hdev->ioctl driver callback
Since there is no use of hdev->ioctl by any Bluetooth driver since
ever, so just lets remove it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-10 22:11:41 +02:00
Eric Dumazet
b44084c2c8 inet: rename ir_loc_port to ir_num
In commit 634fb979e8 ("inet: includes a sock_common in request_sock")
I forgot that the two ports in sock_common do not have same byte order :

skc_dport is __be16 (network order), but skc_num is __u16 (host order)

So sparse complains because ir_loc_port (mapped into skc_num) is
considered as __u16 while it should be __be16

Let rename ir_loc_port to ireq->ir_num (analogy with inet->inet_num),
and perform appropriate htons/ntohs conversions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-10 14:37:35 -04:00
John W. Linville
e9517fecf2 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-10-10 13:38:50 -04:00
Eric Dumazet
634fb979e8 inet: includes a sock_common in request_sock
TCP listener refactoring, part 5 :

We want to be able to insert request sockets (SYN_RECV) into main
ehash table instead of the per listener hash table to allow RCU
lookups and remove listener lock contention.

This patch includes the needed struct sock_common in front
of struct request_sock

This means there is no more inet6_request_sock IPv6 specific
structure.

Following inet_request_sock fields were renamed as they became
macros to reference fields from struct sock_common.
Prefix ir_ was chosen to avoid name collisions.

loc_port   -> ir_loc_port
loc_addr   -> ir_loc_addr
rmt_addr   -> ir_rmt_addr
rmt_port   -> ir_rmt_port
iif        -> ir_iif

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-10 00:08:07 -04:00
Eric Dumazet
c2bb06db59 net: fix build errors if ipv6 is disabled
CONFIG_IPV6=n is still a valid choice ;)

It appears we can remove dead code.

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-09 13:04:03 -04:00
Steffen Klassert
212e560112 ipv6: Add a receive path hook for vti6 in xfrm6_mode_tunnel.
Add a receive path hook for the IPsec vritual tunnel interface.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-09 13:16:36 +02:00
Eric Dumazet
efe4208f47 ipv6: make lookups simpler and faster
TCP listener refactoring, part 4 :

To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common

Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.

Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).

inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6

This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.

inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.

We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-09 00:01:25 -04:00
Eric Dumazet
05dbc7b594 tcp/dccp: remove twchain
TCP listener refactoring, part 3 :

Our goal is to hash SYN_RECV sockets into main ehash for fast lookup,
and parallel SYN processing.

Current inet_ehash_bucket contains two chains, one for ESTABLISH (and
friend states) sockets, another for TIME_WAIT sockets only.

As the hash table is sized to get at most one socket per bucket, it
makes little sense to have separate twchain, as it makes the lookup
slightly more complicated, and doubles hash table memory usage.

If we make sure all socket types have the lookup keys at the same
offsets, we can use a generic and faster lookup. It turns out TIME_WAIT
and ESTABLISHED sockets already have common lookup fields for IPv4.

[ INET_TW_MATCH() is no longer needed ]

I'll provide a follow-up to factorize IPv6 lookup as well, to remove
INET6_TW_MATCH()

This way, SYN_RECV pseudo sockets will be supported the same.

A new sock_gen_put() helper is added, doing either a sock_put() or
inet_twsk_put() [ and will support SYN_RECV later ].

Note this helper should only be called in real slow path, when rcu
lookup found a socket that was moved to another identity (freed/reused
immediately), but could eventually be used in other contexts, like
sock_edemux()

Before patch :

dmesg | grep "TCP established"

TCP established hash table entries: 524288 (order: 11, 8388608 bytes)

After patch :

TCP established hash table entries: 524288 (order: 10, 4194304 bytes)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08 23:19:24 -04:00
David S. Miller
53af53ae83 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	include/linux/netdevice.h
	net/core/sock.c

Trivial merge issues.

Removal of "extern" for functions declaration in netdevice.h
at the same time "const" was added to an argument.

Two parallel line additions in net/core/sock.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08 23:07:53 -04:00
Shawn Bohrer
fbf8866d65 net: ipv4 only populate IP_PKTINFO when needed
The since the removal of the routing cache computing
fib_compute_spec_dst() does a fib_table lookup for each UDP multicast
packet received.  This has introduced a performance regression for some
UDP workloads.

This change skips populating the packet info for sockets that do not have
IP_PKTINFO set.

Benchmark results from a netperf UDP_RR test:
Before 89789.68 transactions/s
After  90587.62 transactions/s

Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.63us RTT
After  12.48us RTT

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08 16:27:33 -04:00
Shawn Bohrer
421b3885bf udp: ipv4: Add udp early demux
The removal of the routing cache introduced a performance regression for
some UDP workloads since a dst lookup must be done for each packet.
This change caches the dst per socket in a similar manner to what we do
for TCP by implementing early_demux.

For UDP multicast we can only cache the dst if there is only one
receiving socket on the host.  Since caching only works when there is
one receiving socket we do the multicast socket lookup using RCU.

For UDP unicast we only demux sockets with an exact match in order to
not break forwarding setups.  Additionally since the hash chains may be
long we only check the first socket to see if it is a match and not
waste extra time searching the whole chain when we might not find an
exact match.

Benchmark results from a netperf UDP_RR test:
Before 87961.22 transactions/s
After  89789.68 transactions/s

Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.97us RTT
After  12.63us RTT

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08 16:27:33 -04:00
David S. Miller
7009deab19 Merge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
Conflicts:
	drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
	drivers/net/wireless/rtlwifi/rtl8188ee/phy.h
	drivers/net/wireless/rtlwifi/rtl8192ce/phy.h
	drivers/net/wireless/rtlwifi/rtl8192de/phy.h
	drivers/net/wireless/rtlwifi/rtl8723ae/phy.h

Just some minor conflicts between the wireless-next changes
and Joe Perches's "extern" removal from function prototypes
in header files.

John W. Linville says:

====================
Regarding the Bluetooth bits, Gustavo says:

"The big work here is from Marcel and Johan. They did a lot of work
in the L2CAP, HCI and MGMT layers. The most important ones are the
addition of a new MGMT command to enable/disable LE advertisement
and the introduction of the HCI user channel to allow applications
to get directly and exclusive access to Bluetooth devices."

As to the ath10k bits, Kalle says:

"Bartosz dropped support for qca98xx hw1.0 hardware from ath10k, it's
just too much to support it. Michal added support for the new firmware
interface. Marek fixed WEP in AP and IBSS mode. Rest of the changes are
minor fixes or cleanups."

And also:

"Major changes are:

* throughput improvements including aligning the RX frames correctly and
  optimising HTT layer (Michal)

* remove qca98xx hw1.0 support (Bartosz)

* add support for firmware version 999.999.0.636 (Michal)

* firmware htt statistics support (Kalle)

* fix WEP in AP and IBSS mode (Marek)

* fix a mutex unlock balance in debugfs file (Shafi)

And of course there's a lot of smaller fixes and cleanup."

For the wl12xx bits, Luca says:

"Here are some patches intended for 3.13.  Eliad is upstreaming a bunch
of patches that have been pending in the internal tree.  Mostly bugfixes
and other small improvements."

Along with that...

Arend and friends bring us a batch of brcmfmac updates, Larry Finger
offers some rtlwifi refactoring, and Sujith sends the usual batch of
ath9k updates.  As usual, there are a number of other small updates
from a variety of players as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07 15:40:44 -04:00
Alexei Starovoitov
d45ed4a4e3 net: fix unsafe set_memory_rw from softirq
on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[   56.766097]  Possible unsafe locking scenario:
[   56.766097]
[   56.780146]        CPU0
[   56.786807]        ----
[   56.793188]   lock(&(&vb->lock)->rlock);
[   56.799593]   <Interrupt>
[   56.805889]     lock(&(&vb->lock)->rlock);
[   56.812266]
[   56.812266]  *** DEADLOCK ***
[   56.812266]
[   56.830670] 1 lock held by ksoftirqd/1/13:
[   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380
[   56.849757]
[   56.849757] stack backtrace:
[   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
[   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
[   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
[   56.923006] Call Trace:
[   56.929532]  [<ffffffff8175a145>] dump_stack+0x55/0x76
[   56.936067]  [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208
[   56.942445]  [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50
[   56.948932]  [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150
[   56.955470]  [<ffffffff810ccb52>] mark_lock+0x282/0x2c0
[   56.961945]  [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50
[   56.968474]  [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50
[   56.975140]  [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90
[   56.981942]  [<ffffffff810cef72>] lock_acquire+0x92/0x1d0
[   56.988745]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   56.995619]  [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50
[   57.002493]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   57.009447]  [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380
[   57.016477]  [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380
[   57.023607]  [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460
[   57.030818]  [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10
[   57.037896]  [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0
[   57.044789]  [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0
[   57.051720]  [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40
[   57.058727]  [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40
[   57.065577]  [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30
[   57.072338]  [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0
[   57.078962]  [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0
[   57.085373]  [<ffffffff81058245>] run_ksoftirqd+0x35/0x70

cannot reuse jited filter memory, since it's readonly,
so use original bpf insns memory to hold work_struct

defer kfree of sk_filter until jit completed freeing

tested on x86_64 and i386

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07 15:16:45 -04:00
Marcel Holtmann
7528ca1c5a Bluetooth: Read location data on AMP controller init
When initializing an AMP controller, read its current known location
data so that it can be analyzed later on.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-07 16:53:40 +02:00
Marcel Holtmann
2f1e063bc0 Bluetooth: Make mgmt_discovering() return void
The return value of mgmt_discovering() function is not used
and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-07 09:19:28 +02:00
Marcel Holtmann
9cf12aee8b Bluetooth: Make mgmt_remote_name() return void
The return value of mgmt_remote_name() function is not used
and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-07 09:19:25 +02:00
Marcel Holtmann
901801b9a4 Bluetooth: Make mgmt_device_found() return void
The return value of mgmt_device_found() function is not used
and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-07 09:19:23 +02:00
Marcel Holtmann
9b80ec5e8e Bluetooth: Make mgmt_device_disconnected() return void
The return value of mgmt_device_disconnected() function is not used
and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-07 09:19:21 +02:00
Marcel Holtmann
ecd90ae7f6 Bluetooth: Make mgmt_device_connected() return void
The return value of mgmt_device_connected() function is not used
and so just change it to return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-07 09:19:19 +02:00
Marcel Holtmann
445608d078 Bluetooth: Make mgmt_connect_failed() return void
The return value of mgmt_connect_failed() function is not used
so change it to just return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-07 09:19:18 +02:00
Marcel Holtmann
7892924c7d Bluetooth: Make mgmt_disconnect_failed() return void
The return value of mgmt_disconnect_failed() function is not used
so change it to just return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-07 09:19:16 +02:00
Marcel Holtmann
3eec705e42 Bluetooth: Make mgmt_set_powered_failed() return void
The return value of mgmt_set_powered_failed() function is never used
and so make the function just return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-07 09:19:13 +02:00
Marcel Holtmann
bf6b56db0a Bluetooth: Make mgmt_index_added() and mgmt_index_removed() return void
The return value from mgmt_index_added() and mgmt_index_removed()
functions is never used. So do not pretend that returning an error
would actually be handled and just make both functions return void.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-07 09:19:12 +02:00
Marcel Holtmann
1514b8928e Bluetooth: Remove mgmt_valid_hdev() helper function
The helper function mgmt_valid_hdev() is more obfuscating the code
then it makes it easier to read. So intead of this helper, use the
direct check for BR/EDR device type.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-06 17:58:45 +02:00
Marcel Holtmann
a6d811ed28 Bluetooth: Remove no longer needed mgmt_new_settings() function
The mgmt_new_settings() function was only needed to handle the
error case when re-enabling advertising failed. Since that is
now handled internally inside the management core, this function
is not needed anymore. So just remove it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-06 15:01:21 +02:00
Marcel Holtmann
5976e60811 Bluetooth: Use helper function for re-enabling advertising
When the all LE connections have been disconneted, then it is up to
the host to re-enable advertising at that point. To ensure that the
correct advertising parameters are used, force the usage of the
common helper to enable advertising.

The change just moves the manual enabling of advertising from the
event handler into the management core so that the helper can
be actually shared.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-06 15:00:07 +02:00
Marcel Holtmann
c2f5ebd214 Bluetooth: Add constants for LE advertising types
Add constants for ADV_IND, ADV_DIRECT_IND, ADV_SCAN_IND and
ADV_NONCONN_IND advertising types.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-06 14:57:59 +02:00
Marcel Holtmann
1e191893f3 Bluetooth: Add HCI structure for LE advertising parameters command
Add the basic HCI structure for building the LE advertising parameters
command.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-06 14:51:23 +02:00
Marcel Holtmann
80d58d0b5b Bluetooth: Move hci_amp_capable() function into L2CAP core
The hci_amp_capable() function has only a single user inside the L2CAP
core. Instead of exporting the function, place it next to its user.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-06 10:25:58 +02:00
Marcel Holtmann
536619e86d Bluetooth: Rename AMP status constants and use them
The AMP controller status constants need to be actually used to avoid
crypted hardcoded numbers.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-06 10:24:30 +02:00
Marcel Holtmann
6ed971ca4f Bluetooth: Use explicit AMP controller id value for BR/EDR
The special AMP controller id 0 is reserved for the BR/EDR controller
that has the main link. It is a fixed value and so use a constant for
this throughout the code to make it more visible when the handling is
for the BR/EDR channel or when it is for the AMP channel.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-06 10:23:39 +02:00
Marcel Holtmann
ece6912648 Bluetooth: Separate AMP controller type from HCI device type
There are two defined HCI device types. One is for BR/EDR controllers
and the other is for AMP controllers. The HCI device type is not the
same as the AMP controller type. It just happens that currently the
defined types match, but that is not guaranteed.

Split the usage of AMP controller type into its own domain so that
it is possible to separate between BR/EDR controllers, 802.11 AMP
controllers and any other AMP technology that might be defined in
the future.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-06 10:22:38 +02:00
Marcel Holtmann
7c13823d5b Bluetooth: Add constants for AMP controller type
Add the constants for BR/EDR and 802.11 AMP controller types.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-06 10:21:59 +02:00
Marcel Holtmann
f822c411b2 Bluetooth: Remove useless external function to count controllers
The list of controllers can be counted ahead of time and inline
inside the AMP discover handling. There is no need to export such
a function at all.

In addition just count the AMP controller and only allocated space
for a single mandatory BR/EDR controller. No need to allocate more
space than needed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-06 10:21:09 +02:00
Johan Hedberg
d2f5a196d7 Bluetooth: Add public mgmt function to send New Settings event
A function is needed so that the HCI event processing can ask the mgmt
code to emit a new settings event. This is necessary e.g. when the event
processing does updates to mgmt related states without any dependency of
actual mgmt commands.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-05 03:03:38 -07:00
Johan Hedberg
f3d3444a4d Bluetooth: Rename HCI_LE_PERIPHERAL to HCI_ADVERTISING
This flag is used to indicate whether we want to have advertising
enabled or not, so give it a more suitable name.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-05 03:03:38 -07:00
Eric Dumazet
96f817fede tcp: shrink tcp6_timewait_sock by one cache line
While working on tcp listener refactoring, I found that it
would really make things easier if sock_common could include
the IPv6 addresses needed in the lookups, instead of doing
very complex games to get their values (depending on sock
being SYN_RECV, ESTABLISHED, TIME_WAIT)

For this to happen, I need to be sure that tcp6_timewait_sock
and tcp_timewait_sock consume same number of cache lines.

This is possible if we only use 32bits for tw_ttd, as we remove
one 32bit hole in inet_timewait_sock

inet_tw_time_stamp() is defined and used, even if its current
implementation looks like tcp_time_stamp : We might need finer
resolution for tcp_time_stamp in the future.

Before patch : sizeof(struct tcp6_timewait_sock) = 0xc8

After patch : sizeof(struct tcp6_timewait_sock) = 0xc0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03 17:43:39 -04:00
John W. Linville
0d4f55bc37 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2013-10-03 16:19:07 -04:00
Nikolay Aleksandrov
357afe9c46 flow_dissector: factor out the ports extraction in skb_flow_get_ports
Factor out the code that extracts the ports from skb_flow_dissect and
add a new function skb_flow_get_ports which can be re-used.

Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03 15:36:37 -04:00
Eric Dumazet
5080546682 inet: consolidate INET_TW_MATCH
TCP listener refactoring, part 2 :

We can use a generic lookup, sockets being in whatever state, if
we are sure all relevant fields are at the same place in all socket
types (ESTABLISH, TIME_WAIT, SYN_RECV)

This patch removes these macros :

 inet_addrpair, inet_addrpair, tw_addrpair, tw_portpair

And adds :

 sk_portpair, sk_addrpair, sk_daddr, sk_rcv_saddr

Then, INET_TW_MATCH() is really the same than INET_MATCH()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03 15:33:35 -04:00
DoHyun Pyun
2ed01805ee Bluetooth: Add the definition for Slave Page Response Timeout
The Slave Page Response Timeout event indicates to the Host that a
slave page response timeout has occurred in the BR/EDR Controller.

The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.

Bluetooth Core Specification Addendum 4 - Page 110

"7.7.72 Slave Page Response Timeout Event [New Section]
...
Note: this event will be generated if the slave BR/EDR Controller
responds to a page but does not receive the master FHS packet
(see Baseband, Section 8.3.3) within pagerespTO.

Event Parameters: NONE"

Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02 09:55:00 -07:00
DoHyun Pyun
2b359445d5 Bluetooth: Add the definition and stcuture for Sync Train Complete
The Synchronization Train Complete event indicates that the Start
Synchronization Train command has completed.

The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.

Bluetooth Core Specification Addendum 4 - Page 103

"7.7.67 Synchronization Train Complete Event [New Section]
...

Event Parameters:

Status 0x00       Start Synchronization Train command completed
                  successfully.
       0x01-0xFF  Start Synchronization Train command failed.
                  See Part D, Error Codes, for error codes and
                  descriptions."

Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02 09:54:59 -07:00
DoHyun Pyun
cefded9819 Bluetooth: Add the definition for Start Synchronization Train
The Start_Synchronization_Train command controls the Synchronization
Train functionality in the BR/EDR Controller.

The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.

Bluetooth Core Specification Addendum 4 - Page 86

"7.1.51 Start Synchronization Train Command [New Section]
...
If connectionless slave broadcast mode is not enabled, the Command
Disallowed (0x0C) error code shall be returned. After receiving this
command and returning a Command Status event, the Baseband starts
attempting to send synchronization train packets containing information
related to the enabled Connectionless Slave Broadcast packet timing.

Note: The AFH_Channel_Map used in the synchronization train packets is
configured by the Set_AFH_Channel_Classification command and the local
channel classification in the BR/EDR Controller.

The synchronization train packets will be sent using the parameters
specified by the latest Write_Synchronization_Train_Parameters command.
The Synchronization Train will continue until synchronization_trainTO
slots (as specified in the last Write_Synchronization_Train command)
have passed or until the Host disables the Connectionless Slave Broadcast
logical transport."

Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02 09:54:59 -07:00
DoHyun Pyun
8c9a041be2 Bluetooth: Add the definition and structure for Set CSB
he Set_Connectionless_Slave_Broadcast command controls the
Connectionless Slave Broadcast functionality in the BR/EDR
Controller.

The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.

Bluetooth Core Specification Addendum 4 - Page 78

"7.1.49 Set Connectionless Slave Broadcast Command [New Section]
...
The LT_ADDR indicated in the Set_Connectionless_Slave_Broadcast shall be
pre-allocated using the HCI_Set_Reserved_LT_ADDR command. If the
LT_ADDR has not been reserved, the Unknown Connection Identifier (0x02)
error code shall be returned. If the controller is unable to reserve
sufficient bandwidth for the requested activity, the Connection Rejected
Due to Limited Resources (0x0D) error code shall be returned.

The LPO_Allowed parameter informs the BR/EDR Controller whether it is
allowed to sleep.

The Packet_Type parameter specifies which packet types are allowed. The
Host shall either enable BR packet types only, or shall enable EDR and DM1
packet types only.

The Interval_Min and Interval_Max parameters specify the range from which
the BR/EDR Controller must select the Connectionless Slave Broadcast
Interval. The selected Interval is returned."

Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02 09:54:59 -07:00
DoHyun Pyun
a9b07a643f Bluetooth: Add the structure for Write Sync Train Parameters
The Write_Synchronization_Train_Parameters command configures
the Synchronization Train functionality in the BR/EDR Controller.

The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.

Bluetooth Core Specification Addendum 4 - Page 97

"7.3.90 Write Synchronization Train Parameters Command [New Section]
...
Note: The AFH_Channel_Map used in the Synchronization Train packets is
configured by the Set_AFH_Channel_Classification command and the local
channel classification in the BR/EDR Controller.

Interval_Min and Interval_Max specify the allowed range of
Sync_Train_Interval. Refer to [Vol. 2], Part B, section 2.7.2 for
a detailed description of Sync_Train_Interval. The BR/EDR Controller shall
select an interval from this range and return it in Sync_Train_Interval.
If the Controller is unable to select a value from this range, it shall
return the Invalid HCI Command Parameters (0x12) error code.

Once started (via the Start_Synchronization_Train Command) the
Synchronization Train will continue until synchronization_trainTO slots have
passed or Connectionless Slave Broadcast has been disabled."

Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02 09:54:59 -07:00
DoHyun Pyun
7d1dab49f6 Bluetooth: Add the definition and structure for Set CSB Data
The Set_Connectionless_Slave_Broadcast_Data command provides the
ability for the Host to set Connectionless Slave Broadcast data in
the BR/EDR Controller.

The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.

Bluetooth Core Specification Addendum 4 - Page 93

"7.3.88 Set Connectionless Slave Broadcast Data Command [New Section]
...
If connectionless slave broadcast mode is disabled, this data shall be
kept by the BR/EDR Controller and used once connectionless slave broadcast
mode is enabled. If connectionless slave broadcast mode is enabled,
and this command is successful, this data will be sent starting with
the next Connectionless Slave Broadcast instant.

The Data_Length field may be zero, in which case no data needs to be
provided.

The Host may fragment the data using the Fragment field in the command. If
the combined length of the fragments exceeds the capacity of the largest
allowed packet size specified in the Set Connectionless Slave Broadcast
command, all fragments associated with the data being assembled shall be
discarded and the Invalid HCI Command Parameters error (0x12) shall be
returned."

Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02 09:54:59 -07:00
DoHyun Pyun
6a20eaf404 Bluetooth: Add the definition and structure for Delete Reserved LT_ADDR
The Delete_Reserved_LT_ADDR command requests that the BR/EDR
Controller cancel the reservation for a specific LT_ADDR reserved for the
purposes of Connectionless Slave Broadcast.

The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.

Bluetooth Core Specification Addendum 4 - Page 92

"7.3.87 Delete Reserved LT_ADDR Command [New Section]
...
If the LT_ADDR indicated in the LT_ADDR parameter is not reserved by the
BR/EDR Controller, it shall return the Unknown Connection Identifier (0x02)
error code.
If connectionless slave broadcast mode is still active, then the Controller
shall return the Command Disallowed (0x0C) error code."

Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02 09:54:59 -07:00
DoHyun Pyun
d0bf75a51b Bluetooth: Add the definition and structure for Set Reserved LT_ADDR
The Set_Reserved_LT_ADDR command allows the host to request that the
BR/EDR Controller reserve a specific LT_ADDR for Connectionless Slave
Broadcast.

The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.

Bluetooth Core Specification Addendum 4 - Page 90

"7.3.86 Set Reserved LT_ADDR Command [New Section]
...
If the LT_ADDR indicated in the LT_ADDR parameter is already in use by the
BR/EDR Controller, it shall return the ACL Connection Already Exists (0x0B)
error code. If the LT_ADDR indicated in the LT_ADDR parameter is out of
range, the controller shall return the Invalid HCI Command Parameters (0x12)
error code. If the command succeeds, then the reserved LT_ADDR shall be
used when issuing subsequent Set Connectionless Slave Broadcast Data and
Set Connectionless Slave Broadcast commands.
To ensure that the reserved LT_ADDR is not already allocated, it is
recommended that this command be issued at some point after HCI_Reset is
issued but before page scanning is enabled or paging is initiated."

Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02 09:54:59 -07:00
Arik Nemtsov
7578d57520 mac80211: implement STA CSA for drivers using channel contexts
Limit the current implementation to a single channel context used by
a single vif, thereby avoiding multi-vif/channel complexities.

Reuse the main function from AP CSA code, but move a portion out in
order to fit the STA scenario.

Add a new mac80211 HW flag so we don't break devices that don't support
channel switch with channel-contexts. The new behavior will be opt-in.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-02 18:18:23 +02:00
Marcel Holtmann
d13eafce2c Bluetooth: Add management command for setting static address
On dual-mode BR/EDR/LE and LE only controllers it is possible
to configure a random address. There are two types or random
addresses, one is static and the other private. Since the
random private addresses require special privacy feature to
be supported, the configuration of these two are kept separate.

This command allows for setting the static random address. It is
only supported on controllers with LE support. The static random
address is suppose to be valid for the lifetime of the controller
or at least until the next power cycle. To ensure such behavior,
setting of the address is limited to when the controller is
powered off.

The special BDADDR_ANY address (00:00:00:00:00:00) can be used to
disable the static address. This is also the default value.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-02 14:50:58 +03:00
Johan Hedberg
0663ca2a03 Bluetooth: Add a new mgmt_set_bredr command
This patch introduces a new mgmt command for enabling/disabling BR/EDR
functionality. This can be convenient when one wants to make a dual-mode
controller behave like a single-mode one. The command is only available
for dual-mode controllers and requires that LE is enabled before using
it. The BR/EDR setting can be enabled at any point, however disabling it
requires the controller to be powered off (otherwise a "rejected"
response will be sent).

Disabling the BR/EDR setting will automatically disable all other BR/EDR
related settings.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02 03:48:28 -07:00
Johan Hedberg
56f8790102 Bluetooth: Introduce a new HCI_BREDR_ENABLED flag
To allow treating dual-mode (BR/EDR/LE) controllers as single-mode ones
(LE-only) we want to introduce a new HCI_BREDR_ENABLED flag to track
whether BR/EDR is enabled or not (previously we simply looked at the
feature bit with lmp_bredr_enabled).

This patch add the new flag and updates the relevant places to test
against it instead of using lmp_bredr_enabled. The flag is by default
enabled when registering an adapter and only cleared if necessary once
the local features have been read during the HCI init procedure.

We cannot completely block BR/EDR usage in case user space uses raw HCI
sockets but the patch tries to block this in places where possible, such
as the various BR/EDR specific ioctls.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02 03:48:28 -07:00
Marcel Holtmann
848566b381 Bluetooth: Provide high speed configuration option
Hiding the Bluetooth high speed support behind a module parameter is
not really useful. This can be enabled and disabled at runtime via
the management interface. This also has the advantage that this can
now be changed per controller and not just global.

This patch removes the module parameter and exposes the high speed
setting of the management interface to all controllers.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-02 09:09:59 +03:00
Marcel Holtmann
a59ac2f744 Bluetooth: Replace BDADDR_LOCAL with BDADDR_NONE
The BDADDR_LOCAL is a relict from userspace and has never been used
within the kernel. So remove that constant and replace it with a new
BDADDR_NONE that is similar to HCI_DEV_NONE with all bits set.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-02 09:09:57 +03:00
David S. Miller
4fbef95af4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be.h
	drivers/net/usb/qmi_wwan.c
	drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
	include/net/netfilter/nf_conntrack_synproxy.h
	include/net/secure_seq.h

The conflicts are of two varieties:

1) Conflicts with Joe Perches's 'extern' removal from header file
   function declarations.  Usually it's an argument signature change
   or a function being added/removed.  The resolutions are trivial.

2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds
   a new value, another changes an existing value.  That sort of
   thing.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 17:06:14 -04:00
David S. Miller
e024bdc051 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter/IPVS fixes for your net
tree, they are:

* Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from
  Patrick McHardy.

* Fix possible weight overflow in lblc and lblcr schedulers due to
  32-bits arithmetics, from Simon Kirby.

* Fix possible memory access race in the lblc and lblcr schedulers,
  introduced when it was converted to use RCU, two patches from
  Julian Anastasov.

* Fix hard dependency on CPU 0 when reading per-cpu stats in the
  rate estimator, from Julian Anastasov.

* Fix race that may lead to object use after release, when invoking
  ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian
  Anastasov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:39:35 -04:00
Michal Kazior
0cfcefef19 mac80211: support reporting A-MSDU subframes individually
Some devices may not be able to report A-MSDUs in
single buffers. Drivers for such devices were
forced to re-assemble A-MSDUs which would then
be eventually disassembled by mac80211. This could
lead to CPU cache thrashing and poor performance.

Since A-MSDU has a single sequence number all
subframes share it. This was in conflict with
retransmission/duplication recovery
(IEEE802.11-2012: 9.3.2.10).

Patch introduces a new flag that is meant to be
set for all individually reported A-MSDU subframes
except the last one. This ensures the
last_seq_ctrl is updated after the last subframe
is processed. If an A-MSDU is actually a duplicate
transmission all reported subframes will be
properly discarded.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
[johannes: add braces that were missing even before]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-01 12:22:03 +02:00
Johannes Berg
55fff50113 mac80211: add explicit IBSS driver operations
This can be useful for drivers if they have any failure cases
when joining an IBSS. Also move setting the queue parameters
to before this new call, in case the new driver op needs them
already.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-01 12:17:45 +02:00
Eric Dumazet
c3f40d7c04 net: add missing sk_max_pacing_rate doc
Warning(include/net/sock.h:411): No description found for parameter
'sk_max_pacing_rate'

Lets please "make htmldocs" and kbuild bot.

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 22:08:59 -07:00
Eric W. Biederman
0bbf87d852 net ipv4: Convert ipv4.ip_local_port_range to be per netns v3
- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net, and update all
  of the callers.
- Move the initialization of sysctl_local_ports into
   sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c

v2:
- Ensure indentation used tabs
- Fixed ip.h so it applies cleanly to todays net-next

v3:
- Compile fixes of strange callers of inet_get_local_port_range.
  This patch now successfully passes an allmodconfig build.
  Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range

Originally-by: Samya <samya@twitter.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 21:59:38 -07:00
David S. Miller
7b77d161ce Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Conflicts:
	include/net/xfrm.h

Simple conflict between Joe Perches "extern" removal for function
declarations in header files and the changes in Steffen's tree.

Steffen Klassert says:

====================
Two patches that are left from the last development cycle.
Manual merging of include/net/xfrm.h is needed. The conflict
can be solved as it is currently done in linux-next.

1) We announce the creation of temporary acquire state via an asyc event,
   so the deletion should be annunced too. From Nicolas Dichtel.

2) The VTI tunnels do not real tunning, they just provide a routable
   IPsec tunnel interface. So introduce and use xfrm_tunnel_notifier
   instead of xfrm_tunnel for xfrm tunnel mode callback. From Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 15:24:57 -04:00
Pravin B Shelar
559835ea72 vxlan: Use RCU apis to access sk_user_data.
Use of RCU api makes vxlan code easier to understand.  It also
fixes bug due to missing ACCESS_ONCE() on sk_user_data dereference.
In rare case without ACCESS_ONCE() compiler might omit vs on
sk_user_data dereference.
Compiler can use vs as alias for sk->sk_user_data, resulting in
multiple sk_user_data dereference in rcu read context which
could change.

CC: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 14:22:59 -04:00
Patrick McHardy
f4a87e7bd2 netfilter: synproxy: fix BUG_ON triggered by corrupt TCP packets
TCP packets hitting the SYN proxy through the SYNPROXY target are not
validated by TCP conntrack. When th->doff is below 5, an underflow happens
when calculating the options length, causing skb_header_pointer() to
return NULL and triggering the BUG_ON().

Handle this case gracefully by checking for NULL instead of using BUG_ON().

Reported-by: Martin Topholm <mph@one.com>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-30 12:44:38 +02:00
Eric Dumazet
62748f32d5 net: introduce SO_MAX_PACING_RATE
As mentioned in commit afe4fd0624 ("pkt_sched: fq: Fair Queue packet
scheduler"), this patch adds a new socket option.

SO_MAX_PACING_RATE offers the application the ability to cap the
rate computed by transport layer. Value is in bytes per second.

u32 val = 1000000;
setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));

To be effectively paced, a flow must use FQ packet scheduler.

Note that a packet scheduler takes into account the headers for its
computations. The effective payload rate depends on MSS and retransmits
if any.

I chose to make this pacing rate a SOL_SOCKET option instead of a
TCP one because this can be used by other protocols.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-28 15:35:41 -07:00
Francesco Fusco
aa66158145 ipv4: processing ancillary IP_TOS or IP_TTL
If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out
packets with the specified TTL or TOS overriding the socket values specified
with the traditional setsockopt().

The struct inet_cork stores the values of TOS, TTL and priority that are
passed through the struct ipcm_cookie. If there are user-specified TOS
(tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
used to override the per-socket values. In case of TOS also the priority
is changed accordingly.

Two helper functions get_rttos and get_rtconn_flags are defined to take
into account the presence of a user specified TOS value when computing
RT_TOS and RT_CONN_FLAGS.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-28 15:21:52 -07:00
Francesco Fusco
f02db315b8 ipv4: IP_TOS and IP_TTL can be specified as ancillary data
This patch enables the IP_TTL and IP_TOS values passed from userspace to
be stored in the ipcm_cookie struct. Three fields are added to the struct:

- the TTL, expressed as __u8.
  The allowed values are in the [1-255].
  A value of 0 means that the TTL is not specified.

- the TOS, expressed as __s16.
  The allowed values are in the range [0,255].
  A value of -1 means that the TOS is not specified.

- the priority, expressed as a char and computed when
  handling the ancillary data.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-28 15:21:51 -07:00
Eric Dumazet
9a3bab6b05 net: net_secret should not depend on TCP
A host might need net_secret[] and never open a single socket.

Problem added in commit aebda156a5
("net: defer net_secret[] initialization")

Based on prior patch from Hannes Frederic Sowa.

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@strressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-28 15:19:40 -07:00
Eric W. Biederman
50624c934d net: Delay default_device_exit_batch until no devices are unregistering v2
There is currently serialization network namespaces exiting and
network devices exiting as the final part of netdev_run_todo does not
happen under the rtnl_lock.  This is compounded by the fact that the
only list of devices unregistering in netdev_run_todo is local to the
netdev_run_todo.

This lack of serialization in extreme cases results in network devices
unregistering in netdev_run_todo after the loopback device of their
network namespace has been freed (making dst_ifdown unsafe), and after
the their network namespace has exited (making the NETDEV_UNREGISTER,
and NETDEV_UNREGISTER_FINAL callbacks unsafe).

Add the missing serialization by a per network namespace count of how
many network devices are unregistering and having a wait queue that is
woken up whenever the count is decreased.  The count and wait queue
allow default_device_exit_batch to wait until all of the unregistration
activity for a network namespace has finished before proceeding to
unregister the loopback device and then allowing the network namespace
to exit.

Only a single global wait queue is used because there is a single global
lock, and there is a single waiter, per network namespace wait queues
would be a waste of resources.

The per network namespace count of unregistering devices gives a
progress guarantee because the number of network devices unregistering
in an exiting network namespace must ultimately drop to zero (assuming
network device unregistration completes).

The basic logic remains the same as in v1.  This patch is now half
comment and half rtnl_lock_unregistering an expanded version of
wait_event performs no extra work in the common case where no network
devices are unregistering when we get to default_device_exit_batch.

Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-28 15:09:15 -07:00
Catalin\(ux\) M. BOIE
7df37ff33d IPv6 NAT: Do not drop DNATed 6to4/6rd packets
When a router is doing DNAT for 6to4/6rd packets the latest
anti-spoofing commit 218774dc ("ipv6: add anti-spoofing checks for
6to4 and 6rd") will drop them because the IPv6 address embedded does
not match the IPv4 destination. This patch will allow them to pass by
testing if we have an address that matches on 6to4/6rd interface.  I
have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR.
Also, log the dropped packets (with rate limit).

Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-28 15:56:15 -04:00
Hannes Frederic Sowa
8d2ca1d7b5 ipv6: avoid high order memory allocations for /proc/net/ipv6_route
Dumping routes on a system with lots rt6_infos in the fibs causes up to
11-order allocations in seq_file (which fail). While we could switch
there to vmalloc we could just implement the streaming interface for
/proc/net/ipv6_route. This patch switches /proc/net/ipv6_route from
single_open_net to seq_open_net.

loff_t *pos tracks dst entries.

Also kill never used struct rt6_proc_arg and now unused function
fib6_clean_all_ro.

Cc: Ben Greear <greearb@candelatech.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-27 17:32:16 -04:00
John W. Linville
0a878747e1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
Also fixed-up a badly indented closing brace...

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-09-27 13:11:17 -04:00
Gustavo Padovan
1025c04cec Merge git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Conflicts:
	net/bluetooth/hci_core.c
2013-09-27 11:56:14 -03:00
John W. Linville
7c6a4acc64 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2013-09-26 13:47:05 -04:00
Simon Wunderlich
774f073461 cfg80211: export cfg80211_chandef_dfs_required
It will be used later by the IBSS CSA implementation of mac80211.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-09-26 13:27:13 +02:00
Johannes Berg
c7c71066c2 mac80211: add ieee80211_iterate_active_interfaces_rtnl()
If it is needed to disconnect multiple virtual interfaces after
(WoWLAN-) suspend, the most obvious approach would be to iterate
all interfaces by calling ieee80211_iterate_active_interfaces()
and then call ieee80211_resume_disconnect() for each one. This
is what the iwlmvm driver does.

Unfortunately, this causes a locking dependency from mac80211's
iflist_mtx to the key_mtx. This is problematic as the former is
intentionally never held while calling any driver operation to
allow drivers to iterate with their own locks held. The key_mtx
is held while installing a key into the driver though, so this
new lock dependency means drivers implementing the logic above
can no longer hold their own lock while iterating.

To fix this, add a new ieee80211_iterate_active_interfaces_rtnl()
function that iterates while the RTNL is already held. This is
true during suspend/resume, so that then the locking dependency
isn't introduced.

While at it, also refactor the various interface iterators and
keep only a single implementation called by the various cases.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-09-26 13:21:37 +02:00
Johan Hedberg
4375f1037d Bluetooth: Add new mgmt_set_advertising command
This patch adds a new mgmt command for enabling and disabling
LE advertising. The command depends on the LE setting being enabled
first and will return a "rejected" response otherwise. The patch also
adds safeguards so that there will ever only be one set_le or
set_advertising command pending per adapter.

The response handling and new_settings event sending is done in an
asynchronous request callback, meaning raw HCI access from user space to
enable advertising (e.g. hciconfig leadv) will not trigger the
new_settings event. This is intentional since trying to support mixed
raw HCI and mgmt access would mean adding extra state tracking or new
helper functions, essentially negating the benefit of using the
asynchronous request framework. The HCI_LE_ENABLED and HCI_LE_PERIPHERAL
flags however are updated correctly even with raw HCI access so this
will not completely break subsequent access over mgmt.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-25 14:30:11 -03:00
Johan Hedberg
eeca6f8913 Bluetooth: Add new mgmt setting for LE advertising
This patch adds a new mgmt setting for LE advertising and hooks up the
necessary places in the mgmt code to operate on the HCI_LE_PERIPHERAL
flag (which corresponds to this setting). This patch does not yet add
any new command for enabling the setting - that is left for a subsequent
patch.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-25 14:30:11 -03:00
Johan Hedberg
416a4ae56b Bluetooth: Use async request for LE enable/disable
This patch updates the code to use an asynchronous request for handling
the enabling and disabling of LE support. This refactoring is necessary
as a preparation for adding advertising support, since when LE is
disabled we should also disable advertising, and the cleanest way to do
this is to perform the two respective HCI commands in the same
asynchronous request.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-25 14:30:11 -03:00
Eric Lapuyade
2bed278517 NFC: NCI: Modify NCI SPI to implement CS/INT handshake per the spec
The NFC Forum NCI specification defines both a hardware and software
protocol when using a SPI physical transport to connect an NFC NCI
Chipset. The hardware requirement is that, after having raised the chip
select line, the SPI driver must wait for an INT line from the NFC
chipset to raise before it sends the data. The chip select must be
raised first though, because this is the signal that the NFC chipset
will detect to wake up and then raise its INT line. If the INT line
doesn't raise in a timely fashion, the SPI driver should abort
operation.

When data is transferred from Device host (DH) to NFC Controller (NFCC),
the signaling sequence is the following:

Data Transfer from DH to NFCC
• 1-Master asserts SPI_CSN
• 2-Slave asserts SPI_INT
• 3-Master sends NCI-over-SPI protocol header and payload data
• 4-Slave deasserts SPI_INT
• 5-Master deasserts SPI_CSN

When data must be transferred from NFCC to DH, things are a little bit
different.

Data Transfer from NFCC to DH
• 1-Slave asserts SPI_INT -> NFC chipset irq handler called -> process
reading from SPI
• 2-Master asserts SPI_CSN
• 3-Master send 2-octet NCI-over-SPI protocol header
• 4-Slave sends 2-octet NCI-over-SPI protocol payload length
• 5-Slave sends NCI-over-SPI protocol payload
• 6-Master deasserts SPI_CSN

In this case, SPI driver should function normally as it does today. Note
that the INT line can and will be lowered anytime between beginning of
step 3 and end of step 5. A low INT is therefore valid after chip select
has been raised.

This would be easily implemented in a single driver. Unfortunately, we
don't write the SPI driver and I had to imagine some workaround trick to
get the SPI and NFC drivers to work in a synchronized fashion. The trick
is the following:

- send an empty spi message: this will raise the chip select line, and
send nothing. We expect the /CS line will stay arisen because we asked
for it in the spi_transfer cs_change field
- wait for a completion, that will be completed by the NFC driver IRQ
handler when it knows we are in the process of sending data (NFC spec
says that we use SPI in a half duplex mode, so we are either sending or
receiving).
- when completed, proceed with the normal data send.

This has been tested and verified to work very consistently on a Nexus
10 (spi-s3c64xx driver). It may not work the same with other spi
drivers.

The previously defined nci_spi_ops{} whose intended purpose were to
address this problem are not used anymore and therefore totally removed.

The nci_spi_send() takes a new optional write_handshake_completion
completion pointer. If non NULL, the nci spi layer will run the above
trick when sending data to the NFC Chip. If NULL, the data is sent
normally all at once and it is then the NFC driver responsibility to
know what it's doing.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-09-25 14:59:56 +02:00
Eric Lapuyade
22d4aae589 NFC: NCI: nci_spi_recv_frame() now returns (not forward) the read frame
Previously, nci_spi_recv_frame() would directly transmit incoming frames
to the NCI Core. However, it turns out that some NFC NCI Chips will add
additional proprietary headers that must be handled/removed before NCI
Core gets a chance to handle the frame. With this modification, the chip
phy or driver are now responsible to transmit incoming frames to NCI
Core after proper treatment, and NCI SPI becomes a driver helper instead
of sitting between the NFC driver and NCI Core.

As a general rule in NFC, *_recv_frame() APIs are used to deliver an
incoming frame to an upper layer. To better suit the actual purpose of
nci_spi_recv_frame(), and go along with its nci_spi_send()
counterpart, the function is renamed to nci_spi_read()

The skb is returned as the function result

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-09-25 14:25:41 +02:00
Samuel Ortiz
72b70b6ec4 NFC: Define secure element IO API and commands
In order to send and receive ISO7816 APDUs to and from NFC embedded
secure elements, we define a specific netlink command.
On a typical SE use case, host applications will send very few APDUs
(Less than 10) per transaction. This is why we decided to go for a
simple netlink API. Defining another NFC socket protocol for such low
traffic would have been overengineered.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-09-25 02:30:47 +02:00
Samuel Ortiz
b9c0c678f7 NFC: Document NFC targets sens_res field
SENS_RES has no specific endiannes attached to it, the kernel ABI is the
following one: Byte 2 (As described by the NFC Forum Digital spec) is
the u16 most significant byte.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-09-25 02:02:44 +02:00
Thierry Escande
2c66daecc4 NFC Digital: Add NFC-A technology support
This adds support for NFC-A technology at 106 kbits/s. The stack can
detect tags of type 1 and 2. There is no support for collision
detection. Tags can be read and written by using a user space
application or a daemon like neard.

The flow of polling operations for NFC-A detection is as follow:

1 - The digital stack sends the SENS_REQ command to the NFC device.
2 - The NFC device receives a SENS_RES response from a peer device and
    passes it to the digital stack.
3   - If the SENS_RES response identifies a type 1 tag, detection ends.
      NFC core is notified through nfc_targets_found().
4   - Otherwise, the digital stack sets the cascade level of NFCID1 to
      CL1 and sends the SDD_REQ command.
5 - The digital stack selects SEL_CMD and SEL_PAR according to the
    cascade level and sends the SDD_REQ command.
4 - The digital stack receives a SDD_RES response for the cascade level
    passed in the SDD_REQ command.
5 - The digital stack analyses (part of) NFCID1 and verify BCC.
6 - The digital stack sends the SEL_REQ command with the NFCID1
    received in the SDD_RES.
6 - The peer device replies with a SEL_RES response
7   - Detection ends if NFCID1 is complete. NFC core notified of new
      target by nfc_targets_found().
8   - If NFCID1 is not complete, the cascade level is incremented (up
      to and including CL3) and the execution continues at step 5 to
      get the remaining bytes of NFCID1.

Once target detection is done, type 1 and 2 tag commands must be
handled by a user space application (i.e neard) through the NFC core.
Responses for type 1 tag are returned directly to user space via NFC
core.
Responses of type 2 commands are handled differently. The digital stack
doesn't analyse the type of commands sent through im_transceive() and
must differentiate valid responses from error ones.
The response process flow is as follow:

1 - If the response length is 16 bytes, it is a valid response of a
    READ command. the packet is returned to the NFC core through the
    callback passed to im_transceive(). Processing stops.
2 - If the response is 1 byte long and is a ACK byte (0x0A), it is a
    valid response of a WRITE command for example. First packet byte
    is set to 0 for no-error and passed back to the NFC core.
    Processing stops.
3 - Any other response is treated as an error and -EIO error code is
    returned to the NFC core through the response callback.

Moreover, since the driver can't differentiate success response from a
NACK response, the digital stack has to handle CRC calculation.

Thus, this patch also adds support for CRC calculation. If the driver
doesn't handle it, the digital stack will calculate CRC and will add it
to sent frames. CRC will also be checked and removed from received
frames. Pointers to the correct CRC calculation functions are stored in
the digital stack device structure when a target is detected. This
avoids the need to check the current target type for every call to
im_transceive() and for every response received from a peer device.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-09-25 02:02:23 +02:00
Thierry Escande
59ee2361c9 NFC Digital: Implement driver commands mechanism
This implements the mechanism used to send commands to the driver in
initiator mode through in_send_cmd().

Commands are serialized and sent to the driver by using a work item
on the system workqueue. Responses are handled asynchronously by
another work item. Once the digital stack receives the response through
the command_complete callback, the next command is sent to the driver.

This also implements the polling mechanism. It's handled by a work item
cycling on all supported protocols. The start poll command for a given
protocol is sent to the driver using the mechanism described above.
The process continues until a peer is discovered or stop_poll is
called. This patch implements the poll function for NFC-A that sends a
SENS_REQ command and waits for the SENS_RES response.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-09-25 02:02:07 +02:00
Thierry Escande
4b10884eb4 NFC: Digital Protocol stack implementation
This is the initial commit of the NFC Digital Protocol stack
implementation.

It offers an interface for devices that don't have an embedded NFC
Digital protocol stack. The driver instantiates the digital stack by
calling nfc_digital_allocate_device(). Within the nfc_digital_ops
structure, the driver specifies a set of function pointers for driver
operations. These functions must be implemented by the driver and are:

in_configure_hw:
Hardware configuration for RF technology and communication framing in
initiator mode. This is a synchronous function.

in_send_cmd:
Initiator mode data exchange using RF technology and framing previously
set with in_configure_hw. The peer response is returned through
callback cb. If an io error occurs or the peer didn't reply within the
specified timeout (ms), the error code is passed back through the resp
pointer. This is an asynchronous function.

tg_configure_hw:
Hardware configuration for RF technology and communication framing in
target mode. This is a synchronous function.

tg_send_cmd:
Target mode data exchange using RF technology and framing previously
set with tg_configure_hw. The peer next command is returned through
callback cb. If an io error occurs or the peer didn't reply within the
specified timeout (ms), the error code is passed back through the resp
pointer. This is an asynchronous function.

tg_listen:
Put the device in listen mode waiting for data from the peer device.
This is an asynchronous function.

tg_listen_mdaa:
If supported, put the device in automatic listen mode with mode
detection and automatic anti-collision. In this mode, the device
automatically detects the RF technology and executes the
anti-collision detection using the command responses specified in
mdaa_params. The mdaa_params structure contains SENS_RES, NFCID1, and
SEL_RES for 106A RF tech. NFCID2 and system code (sc) for 212F and
424F. The driver returns the NFC-DEP ATR_REQ command through cb. The
digital stack deducts the RF tech by analyzing the SoD of the frame
containing the ATR_REQ command. This is an asynchronous function.

switch_rf:
Turns device radio on or off. The stack does not call explicitly
switch_rf to turn the radio on. A call to in|tg_configure_hw must turn
the device radio on.

abort_cmd:
Discard the last sent command.

Then the driver registers itself against the digital stack by using
nfc_digital_register_device() which in turn registers the digital stack
against the NFC core layer. The digital stack implements common NFC
operations like dev_up(), dev_down(), start_poll(), stop_poll(), etc.

This patch is only a skeleton and NFC operations are just stubs.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-09-25 01:35:42 +02:00
Eric Lapuyade
fa544fff62 NFC: NCI: Simplify NCI SPI to become a simple framing/checking layer
NCI SPI layer should not manage the nci dev, this is the job of the nci
chipset driver. This layer should be limited to frame/deframe nci
packets, and optionnaly check integrity (crc) and manage the ack/nak
protocol.

The NCI SPI must not be mixed up with an NCI dev. spi_[dev|device] are
therefore renamed to a simple spi for more clarity.
The header and crc sizes are moved to nci.h so that drivers can use
them to reserve space in outgoing skbs.
nci_spi_send() is exported to be accessible by drivers.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-09-25 01:35:41 +02:00
Eric Lapuyade
08f13acff9 NFC: Move struct nfc_phy_ops out of HCI up to nfc core level
struct nfc_phy_ops is not an HCI structure only, it can also be used by
NCI or direct NFC Core drivers.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-09-25 01:35:40 +02:00
Eric Lapuyade
d593751129 NFC: NCI: Rename spi ndev -> nsdev and nci_dev -> ndev for consistency
An hci dev is an hdev. An nci dev is an ndev. Calling an nci spi dev an
ndev is misleading since it's not the same thing. The nci dev contained
in the nci spi dev is also named inconsistently.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-09-25 01:35:40 +02:00
Joe Perches
073a625f0b NFC: Convert nfc_dev_info and nfc_dev_err to nfc_<level>
Use a more standard kernel style macro logging name.

Standardize the spacing of the "NFC: " prefix.
Add \n to uses, remove from macro.
Fix the defective uses that already had a \n.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-09-25 01:35:39 +02:00
Joe Perches
b48348395f NFC: Replace nfc_dev_dbg with dev_dbg
Use the generic kernel function instead of a home-grown
one that does the same thing.

Add \n to uses not at the macro.  Don't add \n where
the nfc_dev_dbg macro mistakenly had them already.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-09-25 01:35:39 +02:00
Arron Wang
d8eb18eeca NFC: Export nfc_find_se()
This will be needed by all NFC driver implementing the SE ops.

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-09-25 01:35:39 +02:00
Florian Westphal
8c27bd75f0 tcp: syncookies: reduce cookie lifetime to 128 seconds
We currently accept cookies that were created less than 4 minutes ago
(ie, cookies with counter delta 0-3).  Combined with the 8 mss table
values, this yields 32 possible values (out of 2**32) that will be valid.

Reducing the lifetime to < 2 minutes halves the guessing chance while
still providing a large enough period.

While at it, get rid of jiffies value -- they overflow too quickly on
32 bit platforms.

getnstimeofday is used to create a counter that increments every 64s.
perf shows getnstimeofday cost is negible compared to sha_transform;
normal tcp initial sequence number generation uses getnstimeofday, too.

Reported-by: Jakob Lell <jakob@jakoblell.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-24 10:39:58 -04:00
Noel Burton-Krahn
9fe34f5d92 mrp: add periodictimer to allow retries when packets get lost
MRP doesn't implement the periodictimer in 802.1Q, so it never retries
if packets get lost.  I ran into this problem when MRP sent a MVRP
JoinIn before the interface was fully up.  The JoinIn was lost, MRP
didn't retry, and MVRP registration failed.

Tested against Juniper QFabric switches

Signed-off-by: Noel Burton-Krahn <noel@burton-krahn.com>
Acked-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 16:53:52 -04:00
Joe Perches
7b58446068 sctp: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 16:29:42 -04:00
Joe Perches
4e77be4637 netfilter: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 16:29:42 -04:00
Joe Perches
0e418f94d3 irda: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 16:29:42 -04:00
Joe Perches
a22b8f4b57 caif_hsi.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 16:29:41 -04:00
Joe Perches
e74e58f8d2 bluetooth: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 16:29:41 -04:00
Joe Perches
d511337a1e xfrm.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 16:29:41 -04:00
Joe Perches
5db50ee6e6 x25.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 16:29:41 -04:00
Joe Perches
6dfd43d28c wimax.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 16:29:41 -04:00
Joe Perches
9e4638cdc8 wext.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 16:29:40 -04:00
Joe Perches
160734399f udplite.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 16:29:40 -04:00
Joe Perches
1a50bd064e udp.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 16:29:40 -04:00
Joe Perches
5c9f30236f tcp.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 16:29:40 -04:00
Joe Perches
f6982299f2 stp.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 01:51:09 -04:00
Joe Perches
69336bd2d3 sock.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 01:51:09 -04:00
Joe Perches
a4ea1fef97 secure_seq.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 01:51:09 -04:00
Joe Perches
8153ff5c5f scm.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 01:51:09 -04:00
Joe Perches
efb48ccfcd rtnetlink.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 01:51:09 -04:00
Joe Perches
2702c4bb89 route.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 01:51:08 -04:00
Joe Perches
54df3b961d rose.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 01:51:08 -04:00
Joe Perches
c0f4502a87 request_sock.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 01:51:08 -04:00
Joe Perches
0ad6ad9461 raw/rawv6.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 01:51:08 -04:00
Joe Perches
c64b5c4b91 psnap.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 01:51:08 -04:00
Joe Perches
f307c63656 protocol.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 01:51:08 -04:00
Joe Perches
6d3b334d67 ping.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-23 01:51:07 -04:00
Joe Perches
8dda204164 p8022.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21 14:01:39 -04:00
Joe Perches
96496127fe netrom.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21 14:01:39 -04:00
Joe Perches
4f69053b72 netevent/netlink.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21 14:01:39 -04:00
Joe Perches
70a3926f49 iw_handler.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21 14:01:39 -04:00
Joe Perches
e67e16ea9b net_namespace.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21 14:01:39 -04:00
Joe Perches
3cc818a27d ndisc.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21 14:01:39 -04:00
Joe Perches
a5be1eb648 mrp.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21 14:01:38 -04:00
Joe Perches
bf3c710f71 llc*.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21 14:01:38 -04:00
Joe Perches
cb7d3d71e9 lapb.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21 14:01:38 -04:00
Joe Perches
9d03626a28 ipx.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21 14:01:38 -04:00
Joe Perches
5c3a0fd7d0 ip*.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21 14:01:38 -04:00
Joe Perches
1fd5115538 inet*.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-21 14:01:38 -04:00
Joe Perches
e60ab84dd5 icmp.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 14:49:33 -04:00
Joe Perches
ff2b94d2c3 genetlink.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 14:49:33 -04:00
Joe Perches
8aae218f58 gen_stats.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 14:49:33 -04:00
Joe Perches
2008f21cbd garp.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 14:49:33 -04:00
Joe Perches
4787342c39 flow.h/flow_keys.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 14:49:33 -04:00
Joe Perches
8de6879fa9 fib_rules.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 14:49:33 -04:00
Joe Perches
7b3852a2fd esp.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 14:49:32 -04:00
Joe Perches
a4023dd01a dst.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 14:49:32 -04:00
Joe Perches
59ddd965c2 decnet (dn*.h): Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 14:49:32 -04:00
Joe Perches
126c623b39 dcbevent.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 14:49:32 -04:00
Joe Perches
e8d895a4bb compat.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 14:49:32 -04:00
Eric Dumazet
3e1e3aae1f net_sched: add u64 rate to psched_ratecfg_precompute()
Add an extra u64 rate parameter to psched_ratecfg_precompute()
so that some qdisc can opt-in for 64bit rates in the future,
to overcome the ~34 Gbits limit.

psched_ratecfg_getrate() reports a legacy structure to
tc utility, so if actual rate is above the 32bit rate field,
cap it to the 34Gbit limit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 14:41:02 -04:00
Ansis Atteka
703133de33 ip: generate unique IP identificator if local fragmentation is allowed
If local fragmentation is allowed, then ip_select_ident() and
ip_select_ident_more() need to generate unique IDs to ensure
correct defragmentation on the peer.

For example, if IPsec (tunnel mode) has to encrypt large skbs
that have local_df bit set, then all IP fragments that belonged
to different ESP datagrams would have used the same identificator.
If one of these IP fragments would get lost or reordered, then
peer could possibly stitch together wrong IP fragments that did
not belong to the same datagram. This would lead to a packet loss
or data corruption.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-19 14:11:15 -04:00
Johan Hedberg
d62e6d67a7 Bluetooth: Add event mask page 2 setting support
For those controller that support the HCI_Set_Event_Mask_Page_2 command
we should include it in the init sequence. This patch implements sending
of the command and enables the events in it based on supported features
(currently only CSB is checked).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-19 10:21:44 -05:00
Johan Hedberg
5d4e7e8db0 Bluetooth: Add synchronization train parameters reading support
This patch adds support for reading the synchronization train parameters
for controllers that support the feature. Since the feature is
detectable through the local features page 2, which is retreived only in
stage 3 of the HCI init sequence, there is no other option than to add a
fourth stage to the init sequence.

For now the patch doesn't yet add storing of the parameters, but it is
nevertheless convenient to have around to see what kind of parameters
various controllers use by default (analyzable e.g. with the btmon user
space tool).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-19 10:20:07 -05:00
Johan Hedberg
e793dcf082 Bluetooth: Fix waiting for clearing of BT_SK_SUSPEND flag
In the case of blocking sockets we should not proceed with sendmsg() if
the socket has the BT_SK_SUSPEND flag set. So far the code was only
ensuring that POLLOUT doesn't get set for non-blocking sockets using
poll() but there was no code in place to ensure that blocking sockets do
the right thing when writing to them.

This patch adds a new bt_sock_wait_ready helper function to sleep in the
sendmsg call if the BT_SK_SUSPEND flag is set, and wake up as soon as it
is unset. It also updates the L2CAP and RFCOMM sendmsg callbacks to take
advantage of this new helper function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-18 17:02:59 -05:00
Julian Anastasov
bcbde4c0a7 ipvs: make the service replacement more robust
commit 578bc3ef1e ("ipvs: reorganize dest trash") added
IP_VS_DEST_STATE_REMOVING flag and RCU callback named
ip_vs_dest_wait_readers() to keep dests and services after
removal for at least a RCU grace period. But we have the
following corner cases:

- we can not reuse the same dest if its service is removed
while IP_VS_DEST_STATE_REMOVING is still set because another dest
removal in the first grace period can not extend this period.
It can happen when ipvsadm -C && ipvsadm -R is used.

- dest->svc can be replaced but ip_vs_in_stats() and
ip_vs_out_stats() have no explicit read memory barriers
when accessing dest->svc. It can happen that dest->svc
was just freed (replaced) while we use it to update
the stats.

We solve the problems as follows:

- IP_VS_DEST_STATE_REMOVING is removed and we ensure a fixed
idle period for the dest (IP_VS_DEST_TRASH_PERIOD). idle_start
will remember when for first time after deletion we noticed
dest->refcnt=0. Later, the connections can grab a reference
while in RCU grace period but if refcnt becomes 0 we can
safely free the dest and its svc.

- dest->svc becomes RCU pointer. As result, we add explicit
RCU locking in ip_vs_in_stats() and ip_vs_out_stats().

- __ip_vs_unbind_svc is renamed to __ip_vs_svc_put(), it
now can free the service immediately or after a RCU grace
period. dest->svc is not set to NULL anymore.

	As result, unlinked dests and their services are
freed always after IP_VS_DEST_TRASH_PERIOD period, unused
services are freed after a RCU grace period.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-09-18 14:39:03 -05:00
Simon Kirby
c16526a7b9 ipvs: fix overflow on dest weight multiply
Schedulers such as lblc and lblcr require the weight to be as high as the
maximum number of active connections. In commit b552f7e3a9
("ipvs: unify the formula to estimate the overhead of processing
connections"), the consideration of inactconns and activeconns was cleaned
up to always count activeconns as 256 times more important than inactconns.
In cases where 3000 or more connections are expected, a weight of 3000 *
256 * 3000 connections overflows the 32-bit signed result used to determine
if rescheduling is required.

On amd64, this merely changes the multiply and comparison instructions to
64-bit. On x86, a 64-bit result is already present from imull, so only
a few more comparison instructions are emitted.

Signed-off-by: Simon Kirby <sim@hostway.ca>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-09-18 14:38:53 -05:00
Johan Hedberg
0af784dcbc Bluetooth: Remove unused event mask struct
The struct for HCI_Set_Event_Mask is never used. Instead a local 8-byte
array is used for sending this command. Therefore, remove the
unnecessary struct definition.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-18 12:43:55 -05:00
Johan Hedberg
5e130367d4 Bluetooth: Introduce a new HCI_RFKILLED flag
This makes it more convenient to check for rfkill (no need to check for
dev->rfkill before calling rfkill_blocked()) and also avoids potential
races if the RFKILL state needs to be checked from within the rfkill
callback.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-18 12:37:27 -05:00
David S. Miller
61c5923a2f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes for you net tree,
mostly targeted to ipset, they are:

* Fix ICMPv6 NAT due to wrong comparison, code instead of type, from
  Phil Oester.

* Fix RCU race in conntrack extensions release path, from Michal Kubecek.

* Fix missing inversion in the userspace ipset test command match if
  the nomatch option is specified, from Jozsef Kadlecsik.

* Skip layer 4 protocol matching in ipset in case of IPv6 fragments,
  also from Jozsef Kadlecsik.

* Fix sequence adjustment in nfnetlink_queue due to using the netlink
  skb instead of the network skb, from Gao feng.

* Make sure we cannot swap of sets with different layer 3 family in
  ipset, from Jozsef Kadlecsik.

* Fix possible bogus matching in ipset if hash sets with net elements
  are used, from Oliver Smith.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-17 20:22:53 -04:00
Marcel Holtmann
23500189d7 Bluetooth: Introduce new HCI socket channel for user operation
This patch introcuces a new HCI socket channel that allows user
applications to take control over a specific HCI device. The application
gains exclusive access to this device and forces the kernel to stay away
and not manage it. In case of the management interface it will actually
hide the device.

Such operation is useful for security testing tools that need to operate
underneath the Bluetooth stack and need full control over a device. The
advantage here is that the kernel still provides the service of hardware
abstraction and HCI level access. The use of Bluetooth drivers for
hardware access also means that sniffing tools like btmon or hcidump
are still working and the whole set of transaction can be traced with
existing tools.

With the new channel it is possible to send HCI commands, ACL and SCO
data packets and receive HCI events, ACL and SCO packets from the
device. The format follows the well established H:4 protocol.

The new HCI user channel can only be established when a device has been
through its setup routine and is currently powered down. This is
enforced to not cause any problems with current operations. In addition
only one user channel per HCI device is allowed. It is exclusive access
for one user application. Access to this channel is limited to process
with CAP_NET_RAW capability.

Using this new facility does not require any external library or special
ioctl or socket filters. Just create the socket and bind it. After that
the file descriptor is ready to speak H:4 protocol.

        struct sockaddr_hci addr;
        int fd;

        fd = socket(AF_BLUETOOTH, SOCK_RAW, BTPROTO_HCI);

        memset(&addr, 0, sizeof(addr));
        addr.hci_family = AF_BLUETOOTH;
        addr.hci_dev = 0;
        addr.hci_channel = HCI_CHANNEL_USER;

        bind(fd, (struct sockaddr *) &addr, sizeof(addr));

The example shows on how to create a user channel for hci0 device. Error
handling has been left out of the example. However with the limitations
mentioned above it is advised to handle errors. Binding of the user
cahnnel socket can fail for various reasons. Specifically if the device
is currently activated by BlueZ or if the access permissions are not
present.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-16 14:35:55 -03:00
Marcel Holtmann
0736cfa8e5 Bluetooth: Introduce user channel flag for HCI devices
This patch introduces a new user channel flag that allows to give full
control of a HCI device to a user application. The kernel will stay away
from the device and does not allow any further modifications of the
device states.

The existing raw flag is not used since it has a bit of unclear meaning
due to its legacy. Using a new flag makes the code clearer.

A device with the user channel flag set can still be enumerate using the
legacy API, but it does not longer enumerate using the new management
interface used by BlueZ 5 and beyond. This is intentional to not confuse
users of modern systems.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-16 14:35:55 -03:00
Michal Kubeček
c13a84a830 netfilter: nf_conntrack: use RCU safe kfree for conntrack extensions
Commit 68b80f11 (netfilter: nf_nat: fix RCU races) introduced
RCU protection for freeing extension data when reallocation
moves them to a new location. We need the same protection when
freeing them in nf_ct_ext_free() in order to prevent a
use-after-free by other threads referencing a NAT extension data
via bysource list.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-13 11:58:40 +02:00
Linus Torvalds
bbda1baeeb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Brown paper bag fix in HTB scheduler, class options set incorrectly
    due to a typoe.  Fix from Vimalkumar.

 2) It's possible for the ipv6 FIB garbage collector to run before all
    the necessary datastructure are setup during init, defer the
    notifier registry to avoid this problem.  Fix from Michal Kubecek.

 3) New i40e ethernet driver from the Intel folks.

 4) Add new qmi wwan device IDs, from Bjørn Mork.

 5) Doorbell lock in bnx2x driver is not initialized properly in some
    configurations, fix from Ariel Elior.

 6) Revert an ipv6 packet option padding change that broke standardized
    ipv6 implementation test suites.  From Jiri Pirko.

 7) Fix synchronization of ARP information in bonding layer, from
    Nikolay Aleksandrov.

 8) Fix missing error return resulting in illegal memory accesses in
    openvswitch, from Daniel Borkmann.

 9) SCTP doesn't signal poll events properly due to mistaken operator
    precedence, fix also from Daniel Borkmann.

10) __netdev_pick_tx() passes wrong index to sk_tx_queue_set() which
    essentially disables caching of TX queue in sockets :-/ Fix from
    Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
  net_sched: htb: fix a typo in htb_change_class()
  net: qmi_wwan: add new Qualcomm devices
  ipv6: don't call fib6_run_gc() until routing is ready
  net: tilegx driver: avoid compiler warning
  fib6_rules: fix indentation
  irda: vlsi_ir: Remove casting the return value which is a void pointer
  irda: donauboe: Remove casting the return value which is a void pointer
  net: fix multiqueue selection
  net: sctp: fix smatch warning in sctp_send_asconf_del_ip
  net: sctp: fix bug in sctp_poll for SOCK_SELECT_ERR_QUEUE
  net: fib: fib6_add: fix potential NULL pointer dereference
  net: ovs: flow: fix potential illegal memory access in __parse_flow_nlattrs
  bcm63xx_enet: remove deprecated IRQF_DISABLED
  net: korina: remove deprecated IRQF_DISABLED
  macvlan: Move skb_clone check closer to call
  qlcnic: Fix warning reported by kbuild test robot.
  bonding: fix bond_arp_rcv setting and arp validate desync state
  bonding: fix store_arp_validate race with mode change
  ipv6/exthdrs: accept tlv which includes only padding
  bnx2x: avoid atomic allocations during initialization
  ...
2013-09-11 14:33:16 -07:00
Michal Kubeček
2c861cc65e ipv6: don't call fib6_run_gc() until routing is ready
When loading the ipv6 module, ndisc_init() is called before
ip6_route_init(). As the former registers a handler calling
fib6_run_gc(), this opens a window to run the garbage collector
before necessary data structures are initialized. If a network
device is initialized in this window, adding MAC address to it
triggers a NETDEV_CHANGEADDR event, leading to a crash in
fib6_clean_all().

Take the event handler registration out of ndisc_init() into a
separate function ndisc_late_init() and move it after
ip6_route_init().

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11 17:04:09 -04:00
Linus Torvalds
2b76db6a0f for-linus-3.12-merge minor 9p fixes and tweaks for 3.12 merge window
The first fixes namespace issues which causes a kernel
 NULL pointer dereference, the second fixes uevent
 handling to work better with udev, and the third
 switches some code to use srlcpy instead of strncpy
 in order to be safer.
 
 All changes have been baking in for-next for at least
 2 weeks.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJSMJjZAAoJEDZk62b0Tg6x81sQAKa60QStBKhnL65bvG+ooIsS
 mhwfmFyaWOKw1ezwY2Vk0+JnmKDBpKmqjjwyL3nLP18TcRZStPiFdcJBKWl+czge
 FTv14t54CcjysYPbYN7+gUap4F5mfg0mcHaR0UGow505dNyjwd7mqkZhy1IqhdvP
 Ue/h0RE46GeNtdirxrKBdEfW/7TAL0tcoRgjKu0ev1V2sXCJZywuXgkzWjByRXwT
 JOg04gGnYThuek0/KUPRhf0KxB0CyKrZiics7LGb40HkYYxs7ahADACttLyiDr8l
 GntfHXLgvVlU5QcSbKRfLp0zNbi7AxWmJrwYsEwpas4tUw1Q+pVJ2EE2Ameuq5G+
 LrMGmRVQCVYw8UN+OYUO7glhXEJcCPJj6vxgm+NVXx24yaQyGI1aTsIEjHwZ/hkm
 wlQHC47z6/fIypkXpsU6pYWF/r3GwXHokYReejATQWEPIzIxvHeThe0jjqMLth7F
 zmsHZTpmECqtti1fizy5wBZD25wAIxdf+rf8nKy1VvcSN4s08ESSlC/kV/siNeko
 efFnL8xbjP5SPEVoBtXM6eTDHrQ0S+ACSGWtp0FGXKOW4PKzS60ve2Stp+FYZgQc
 WgXI7+NBU6Z9z+cZ9bsY0hrGwK1YZiR4F3KJ5ofTuxAO6n7zd+N3fGBuQJ2tiW9P
 pKtIXNozWqnAU9Wx4rGa
 =YbFT
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-3.12-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9p updates from Eric Van Hensbergen:
 "Minor 9p fixes and tweaks for 3.12 merge window

  The first fixes namespace issues which causes a kernel NULL pointer
  dereference, the second fixes uevent handling to work better with
  udev, and the third switches some code to use srlcpy instead of
  strncpy in order to be safer.

  All changes have been baking in for-next for at least 2 weeks"

* tag 'for-linus-3.12-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: avoid accessing utsname after namespace has been torn down
  9p: send uevent after adding/removing mount_tag attribute
  fs: 9p: use strlcpy instead of strncpy
2013-09-11 12:34:13 -07:00
Linus Torvalds
cc998ff881 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:
 "Noteworthy changes this time around:

   1) Multicast rejoin support for team driver, from Jiri Pirko.

   2) Centralize and simplify TCP RTT measurement handling in order to
      reduce the impact of bad RTO seeding from SYN/ACKs.  Also, when
      both timestamps and local RTT measurements are available prefer
      the later because there are broken middleware devices which
      scramble the timestamp.

      From Yuchung Cheng.

   3) Add TCP_NOTSENT_LOWAT socket option to limit the amount of kernel
      memory consumed to queue up unsend user data.  From Eric Dumazet.

   4) Add a "physical port ID" abstraction for network devices, from
      Jiri Pirko.

   5) Add a "suppress" operation to influence fib_rules lookups, from
      Stefan Tomanek.

   6) Add a networking development FAQ, from Paul Gortmaker.

   7) Extend the information provided by tcp_probe and add ipv6 support,
      from Daniel Borkmann.

   8) Use RCU locking more extensively in openvswitch data paths, from
      Pravin B Shelar.

   9) Add SCTP support to openvswitch, from Joe Stringer.

  10) Add EF10 chip support to SFC driver, from Ben Hutchings.

  11) Add new SYNPROXY netfilter target, from Patrick McHardy.

  12) Compute a rate approximation for sending in TCP sockets, and use
      this to more intelligently coalesce TSO frames.  Furthermore, add
      a new packet scheduler which takes advantage of this estimate when
      available.  From Eric Dumazet.

  13) Allow AF_PACKET fanouts with random selection, from Daniel
      Borkmann.

  14) Add ipv6 support to vxlan driver, from Cong Wang"

Resolved conflicts as per discussion.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1218 commits)
  openvswitch: Fix alignment of struct sw_flow_key.
  netfilter: Fix build errors with xt_socket.c
  tcp: Add missing braces to do_tcp_setsockopt
  caif: Add missing braces to multiline if in cfctrl_linkup_request
  bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize
  vxlan: Fix kernel panic on device delete.
  net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls
  net: mvneta: properly disable HW PHY polling and ensure adjust_link() works
  icplus: Use netif_running to determine device state
  ethernet/arc/arc_emac: Fix huge delays in large file copies
  tuntap: orphan frags before trying to set tx timestamp
  tuntap: purge socket error queue on detach
  qlcnic: use standard NAPI weights
  ipv6:introduce function to find route for redirect
  bnx2x: VF RSS support - VF side
  bnx2x: VF RSS support - PF side
  vxlan: Notify drivers for listening UDP port changes
  net: usbnet: update addr_assign_type if appropriate
  driver/net: enic: update enic maintainers and driver
  driver/net: enic: Exposing symbols for Cisco's low latency driver
  ...
2013-09-05 14:54:29 -07:00
David S. Miller
06c54055be Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
	net/bridge/br_multicast.c
	net/ipv6/sit.c

The conflicts were minor:

1) sit.c changes overlap with change to ip_tunnel_xmit() signature.

2) br_multicast.c had an overlap between computing max_delay using
   msecs_to_jiffies and turning MLDV2_MRC() into an inline function
   with a name using lowercase instead of uppercase letters.

3) stmmac had two overlapping changes, one which conditionally allocated
   and hooked up a dma_cfg based upon the presence of the pbl OF property,
   and another one handling store-and-forward DMA made.  The latter of
   which should not go into the new of_find_property() basic block.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05 14:58:52 -04:00
Joseph Gasparakis
53cf527513 vxlan: Notify drivers for listening UDP port changes
This patch adds two more ndo ops: ndo_add_rx_vxlan_port() and
ndo_del_rx_vxlan_port().

Drivers can get notifications through the above functions about changes
of the UDP listening port of VXLAN. Also, when physical ports come up,
now they can call vxlan_get_rx_port() in order to obtain the port number(s)
of the existing VXLAN interface in case they already up before them.

This information about the listening UDP port would be used for VXLAN
related offloads.

A big thank you to John Fastabend (john.r.fastabend@intel.com) for his
input and his suggestions on this patch set.

CC: John Fastabend <john.r.fastabend@intel.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05 12:44:30 -04:00
Daniel Borkmann
e3f5b17047 net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation
Get rid of MLDV2_MRC and use our new macros for mantisse and
exponent to calculate Maximum Response Delay out of the Maximum
Response Code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 14:53:20 -04:00
Daniel Borkmann
89225d1ce6 net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12.
i) RFC3810, 9.2. Query Interval [QI] says:

   The Query Interval variable denotes the interval between General
   Queries sent by the Querier. Default value: 125 seconds. [...]

ii) RFC3810, 9.3. Query Response Interval [QRI] says:

  The Maximum Response Delay used to calculate the Maximum Response
  Code inserted into the periodic General Queries. Default value:
  10000 (10 seconds) [...] The number of seconds represented by the
  [Query Response Interval] must be less than the [Query Interval].

iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says:

  The Older Version Querier Present Timeout is the time-out for
  transitioning a host back to MLDv2 Host Compatibility Mode. When an
  MLDv1 query is received, MLDv2 hosts set their Older Version Querier
  Present Timer to [Older Version Querier Present Timeout].

  This value MUST be ([Robustness Variable] times (the [Query Interval]
  in the last Query received)) plus ([Query Response Interval]).

Hence, on *default* the timeout results in:

  [RV] = 2, [QI] = 125sec, [QRI] = 10sec
  [OVQPT] = [RV] * [QI] + [QRI] = 260sec

Having that said, we currently calculate [OVQPT] (here given as 'switchback'
variable) as ...

  switchback = (idev->mc_qrv + 1) * max_delay

RFC3810, 9.12. says "the [Query Interval] in the last Query received". In
section "9.14. Configuring timers", it is said:

  This section is meant to provide advice to network administrators on
  how to tune these settings to their network. Ambitious router
  implementations might tune these settings dynamically based upon
  changing characteristics of the network. [...]

iv) RFC38010, 9.14.2. Query Interval:

  The overall level of periodic MLD traffic is inversely proportional
  to the Query Interval. A longer Query Interval results in a lower
  overall level of MLD traffic. The value of the Query Interval MUST
  be equal to or greater than the Maximum Response Delay used to
  calculate the Maximum Response Code inserted in General Query
  messages.

I assume that was why switchback is calculated as is (3 * max_delay), although
this setting seems to be meant for routers only to configure their [QI]
interval for non-default intervals. So usage here like this is clearly wrong.

Concluding, the current behaviour in IPv6's multicast code is not conform
to the RFC as switch back is calculated wrongly. That is, it has a too small
value, so MLDv2 hosts switch back again to MLDv2 way too early, i.e. ~30secs
instead of ~260secs on default.

Hence, introduce necessary helper functions and fix this up properly as it
should be.

Introduced in 06da92283 ("[IPV6]: Add MLDv2 support."). Credits to Hannes
Frederic Sowa who also had a hand in this as well. Also thanks to Hangbin Liu
who did initial testing.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: David Stevens <dlstevens@us.ibm.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 14:53:20 -04:00
Vijay Subramanian
c995ae2259 tcp: Change return value of tcp_rcv_established()
tcp_rcv_established() returns only one value namely 0. We change the return
value to void (as suggested by David Miller).

After commit 0c24604b (tcp: implement RFC 5961 4.2), we no longer send RSTs in
response to SYNs. We can remove the check and processing on the return value of
tcp_rcv_established().

We also fix jtcp_rcv_established() in tcp_probe.c to match that of
tcp_rcv_established().

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 00:27:28 -04:00
Nicolas Dichtel
ea23192e8e tunnels: harmonize cleanup done on skb on rx path
The goal of this patch is to harmonize cleanup done on a skbuff on rx path.
Before this patch, behaviors were different depending of the tunnel type.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 00:27:26 -04:00
Nicolas Dichtel
963a88b31d tunnels: harmonize cleanup done on skb on xmit path
The goal of this patch is to harmonize cleanup done on a skbuff on xmit path.
Before this patch, behaviors were different depending of the tunnel type.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 00:27:25 -04:00
Nicolas Dichtel
117961878c vxlan: remove net arg from vxlan[6]_xmit_skb()
This argument is not used, let's remove it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 00:27:25 -04:00
Nicolas Dichtel
8b7ed2d91d iptunnels: remove net arg from iptunnel_xmit()
This argument is not used, let's remove it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 00:27:25 -04:00
Joe Perches
951fd874c3 llc: Use normal etherdevice.h tests
Convert the llc_<foo> static inlines to the
equivalents from etherdevice.h and remove
the llc_<foo> static inline functions.

llc_mac_null -> is_zero_ether_addr
llc_mac_multicast -> is_multicast_ether_addr
llc_mac_match -> ether_addr_equal

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03 22:34:47 -04:00
David S. Miller
e7abfe4092 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please accept this batch of updates intended for the 3.12 stream.

For the mac80211 bits, Johannes says this:

"This time I have various improvements all over the place: IBSS, mesh,
testmode, AP client powersave handling, one of the rare rfkill patches
and some code cleanup."

Also for mac80211:

"And I also have some more changes for -next, just a few small fixes and
improvements, nothing really stands out."

And for iwlwifi:

"This time I have some powersave work (notably uAPSD support), CQM
offloads, support for a new firmware API and various code cleanups."

Regarding the Bluetooth bits, Gustavo says:

"Patches to 3.12, here we have:

* implementation of a proper tty_port for RFCOMM devices, this fixes some
issues people were seeing lately in the kernel.
* Add voice_setting option for SCO, it is used for SCO Codec selection
* bugfixes, small improvements and clean ups"

For the NFC bits, Samuel says:

"With this one we have:

- A few pn533 improvements and minor fixes. Testing our pn533 driver
  against Google's NCI stack triggered a few issues that we fixed now.
  We also added Tx fragmentation support to this driver.

- More NFC secure element handling. We added a GET_SE netlink command
  for getting all the discovered secure elements, and we defined 2
  additional secure element netlink event (transaction and connectivity).
  We also fixed a couple of typos and copy-paste bugs from the secure
  element handling code.

- Firmware download support for the pn544 driver. This chipset can enter a
  special mode where it's waiting for firmware blobs to replace the
  already flashed one. We now support that mode."

With repect to the ath tree, Kalle says:

"New features in ath10k are rx/tx checsumming in hw and survey scan
implemented by Michal. Also he made fixes to different areas of the
driver, most notable being fixing the case when using two streams and
reducing the number of interface combinations to avoid firmware crashes.
Bartosz did a clean related to how we handle SoC power save in PCI
layer.

For ath6kl Mohammed and Vasanth sent each a patch to fix two infrequent
crashes."

I also pulled the wireless tree into wireless-next to support a
request from Johannes.  On top of all that, there are the usual
sort of driver updates.  The mwifiex, brcmfmac, brcmsmac, ath9k,
and rt2x00 drivers all get some attention, as does the bcma bus and
a few other random bits here and there.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03 21:45:31 -04:00
Linus Torvalds
32dad03d16 Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "A lot of activities on the cgroup front.  Most changes aren't visible
  to userland at all at this point and are laying foundation for the
  planned unified hierarchy.

   - The biggest change is decoupling the lifetime management of css
     (cgroup_subsys_state) from that of cgroup's.  Because controllers
     (cpu, memory, block and so on) will need to be dynamically enabled
     and disabled, css which is the association point between a cgroup
     and a controller may come and go dynamically across the lifetime of
     a cgroup.  Till now, css's were created when the associated cgroup
     was created and stayed till the cgroup got destroyed.

     Assumptions around this tight coupling permeated through cgroup
     core and controllers.  These assumptions are gradually removed,
     which consists bulk of patches, and css destruction path is
     completely decoupled from cgroup destruction path.  Note that
     decoupling of creation path is relatively easy on top of these
     changes and the patchset is pending for the next window.

   - cgroup has its own event mechanism cgroup.event_control, which is
     only used by memcg.  It is overly complex trying to achieve high
     flexibility whose benefits seem dubious at best.  Going forward,
     new events will simply generate file modified event and the
     existing mechanism is being made specific to memcg.  This pull
     request contains prepatory patches for such change.

   - Various fixes and cleanups"

Fixed up conflict in kernel/cgroup.c as per Tejun.

* 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (69 commits)
  cgroup: fix cgroup_css() invocation in css_from_id()
  cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp()
  cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup
  cgroup: implement CFTYPE_NO_PREFIX
  cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys
  cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax
  cgroup: fix cgroup_write_event_control()
  cgroup: fix subsystem file accesses on the root cgroup
  cgroup: change cgroup_from_id() to css_from_id()
  cgroup: use css_get() in cgroup_create() to check CSS_ROOT
  cpuset: remove an unncessary forward declaration
  cgroup: RCU protect each cgroup_subsys_state release
  cgroup: move subsys file removal to kill_css()
  cgroup: factor out kill_css()
  cgroup: decouple cgroup_subsys_state destruction from cgroup destruction
  cgroup: replace cgroup->css_kill_cnt with ->nr_css
  cgroup: bounce cgroup_subsys_state ref kill confirmation to a work item
  cgroup: move cgroup->subsys[] assignment to online_css()
  cgroup: reorganize css init / exit paths
  cgroup: add __rcu modifier to cgroup->subsys[]
  ...
2013-09-03 18:25:03 -07:00
Cong Wang
5a17a390de net: make snmp_mib_free static inline
Fengguang reported:

   net/built-in.o: In function `in6_dev_finish_destroy':
   (.text+0x4ca7d): undefined reference to `snmp_mib_free'

this is due to snmp_mib_free() is defined when CONFIG_INET is enabled,
but in6_dev_finish_destroy() is now moved to core kernel.

I think snmp_mib_free() is small enough to be inlined, so just make it
static inline.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-02 21:00:50 -07:00
Cong Wang
f564f45c45 vxlan: add ipv6 proxy support
This patch adds the IPv6 version of "arp_reduce", ndisc_send_na()
will be needed.

Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:01 -04:00
Cong Wang
e15a00aafa vxlan: add ipv6 route short circuit support
route short circuit only has IPv4 part, this patch adds
the IPv6 part. nd_tbl will be needed.

Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:00 -04:00
Cong Wang
e4c7ed4153 vxlan: add ipv6 support
This patch adds IPv6 support to vxlan device, as the new version
RFC already mentions it:

   http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03

Cc: David Stevens <dlstevens@us.ibm.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:00 -04:00
Cong Wang
5f81bd2e5d ipv6: export a stub for IPv6 symbols used by vxlan
In case IPv6 is compiled as a module, introduce a stub
for ipv6_sock_mc_join and ipv6_sock_mc_drop etc.. It will be used
by vxlan module. Suggested by Ben.

This is an ugly but easy solution for now.

Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:00 -04:00
Cong Wang
3ce9b35ff6 ipv6: move ip6_dst_hoplimit() into core kernel
It will be used by vxlan, and may not be inlined.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:29:59 -04:00
stephen hemminger
d2a7f269f9 qdisc: make args to qdisc_create_default const
Fixes warnings introduced by the qdisc default patch.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 18:09:45 -04:00
stephen hemminger
6da7c8fcbc qdisc: allow setting default queuing discipline
By default, the pfifo_fast queue discipline has been used by default
for all devices. But we have better choices now.

This patch allow setting the default queueing discipline with sysctl.
This allows easy use of better queueing disciplines on all devices
without having to use tc qdisc scripts. It is intended to allow
an easy path for distributions to make fq_codel or sfq the default
qdisc.

This patch also makes pfifo_fast more of a first class qdisc, since
it is now possible to manually override the default and explicitly
use pfifo_fast. The behavior for systems who do not use the sysctl
is unchanged, they still get pfifo_fast

Also removes leftover random # in sysctl net core.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 00:32:32 -04:00
David S. Miller
79f9ab7e0a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
This pull request fixes some issues that arise when 6in4 or 4in6 tunnels
are used in combination with IPsec, all from Hannes Frederic Sowa and a
null pointer dereference when queueing packets to the policy hold queue.

1) We might access the local error handler of the wrong address family if
   6in4 or 4in6 tunnel is protected by ipsec. Fix this by addind a pointer
   to the correct local_error to xfrm_state_afinet.

2) Add a helper function to always refer to the correct interpretation
   of skb->sk.

3) Call skb_reset_inner_headers to record the position of the inner headers
   when adding a new one in various ipv6 tunnels. This is needed to identify
   the addresses where to send back errors in the xfrm layer.

4) Dereference inner ipv6 header if encapsulated to always call the
   right error handler.

5) Choose protocol family by skb protocol to not call the wrong
   xfrm{4,6}_local_error handler in case an ipv6 sockets is used
   in ipv4 mode.

6) Partly revert "xfrm: introduce helper for safe determination of mtu"
   because this introduced pmtu discovery problems.

7) Set skb->protocol on tcp, raw and ip6_append_data genereated skbs.
   We need this to get the correct mtu informations in xfrm.

8) Fix null pointer dereference in xdst_queue_output.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29 16:05:30 -04:00
Eric Dumazet
95bd09eb27 tcp: TSO packets automatic sizing
After hearing many people over past years complaining against TSO being
bursty or even buggy, we are proud to present automatic sizing of TSO
packets.

One part of the problem is that tcp_tso_should_defer() uses an heuristic
relying on upcoming ACKS instead of a timer, but more generally, having
big TSO packets makes little sense for low rates, as it tends to create
micro bursts on the network, and general consensus is to reduce the
buffering amount.

This patch introduces a per socket sk_pacing_rate, that approximates
the current sending rate, and allows us to size the TSO packets so
that we try to send one packet every ms.

This field could be set by other transports.

Patch has no impact for high speed flows, where having large TSO packets
makes sense to reach line rate.

For other flows, this helps better packet scheduling and ACK clocking.

This patch increases performance of TCP flows in lossy environments.

A new sysctl (tcp_min_tso_segs) is added, to specify the
minimal size of a TSO packet (default being 2).

A follow-up patch will provide a new packet scheduler (FQ), using
sk_pacing_rate as an input to perform optional per flow pacing.

This explains why we chose to set sk_pacing_rate to twice the current
rate, allowing 'slow start' ramp up.

sk_pacing_rate = 2 * cwnd * mss / srtt

v2: Neal Cardwell reported a suspect deferring of last two segments on
initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
into account tp->xmit_size_goal_segs

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Tom Herbert <therbert@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29 15:50:06 -04:00
Daniel Borkmann
76bfd89844 net: sctp: reorder sctp_globals to reduce cacheline usage
Reduce cacheline usage from 2 to 1 cacheline for sctp_globals structure. By
reordering elements, we can close gaps and simply achieve the following:

Current situation:
  /* size: 80, cachelines: 2, members: 10 */
  /* sum members: 57, holes: 4, sum holes: 16 */
  /* padding: 7 */
  /* last cacheline: 16 bytes */

Afterwards:
  /* size: 64, cachelines: 1, members: 10 */
  /* padding: 7 */

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29 14:55:54 -04:00
John W. Linville
0d8165e9fc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/pcie/trans.c
2013-08-29 14:08:24 -04:00
Eliezer Tamir
3046e2f5b7 net: add cpu_relax to busy poll loop
Add a cpu_relaxt to sk_busy_loop.

Julie Cummings reported performance issues when hyperthreading is on.
Arjan van de Ven observed that we should have a cpu_relax() in the
busy poll loop.

Reported-by: Julie Cummings <julie.a.cummings@intel.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-28 17:45:48 -04:00
Pravin B Shelar
33c6b1f6b1 genl: Hold reference on correct module while netlink-dump.
netlink dump operations take module as parameter to hold
reference for entire netlink dump duration.
Currently it holds ref only on genl module which is not correct
when we use ops registered to genl from another module.
Following patch adds module pointer to genl_ops so that netlink
can hold ref count on it.

CC: Jesse Gross <jesse@nicira.com>
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-28 17:19:17 -04:00
John W. Linville
f3e979a52c Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-08-28 13:51:40 -04:00
John W. Linville
b35c809708 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
Conflicts:
	drivers/net/wireless/iwlwifi/pcie/trans.c
	net/mac80211/ibss.c
2013-08-28 10:36:09 -04:00
Fan Du
aba8269588 {ipv4,xfrm}: Introduce xfrm_tunnel_notifier for xfrm tunnel mode callback
Some thoughts on IPv4 VTI implementation:

The connection between VTI receiving part and xfrm tunnel mode input process
is hardly a "xfrm_tunnel", xfrm_tunnel is used in places where, e.g ipip/sit
and xfrm4_tunnel, acts like a true "tunnel" device.

In addition, IMHO, VTI doesn't need vti_err to do something meaningful, as all
VTI needs is just a notifier to be called whenever xfrm_input ingress a packet
to update statistics.

A IPsec protected packet is first handled by protocol handlers, e.g AH/ESP,
to check packet authentication or encryption rightness. PMTU update is taken
care of in this stage by protocol error handler.

Then the packet is rearranged properly depending on whether it's transport
mode or tunnel mode packed by mode "input" handler. The VTI handler code
takes effects in this stage in tunnel mode only. So it neither need propagate
PMTU, as it has already been done if necessary, nor the VTI handler is
qualified as a xfrm_tunnel.

So this patch introduces xfrm_tunnel_notifier and meanwhile wipe out vti_err
code.

Signed-off-by: Fan Du <fan.du@windriver.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-28 09:22:17 +02:00
David S. Miller
5b2941b18d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
A number of significant new features and optimizations for net-next/3.12.
Highlights are:
 * "Megaflows", an optimization that allows userspace to specify which
   flow fields were used to compute the results of the flow lookup.
   This allows for a major reduction in flow setups (the major
   performance bottleneck in Open vSwitch) without reducing flexibility.
 * Converting netlink dump operations to use RCU, allowing for
   additional parallelism in userspace.
 * Matching and modifying SCTP protocol fields.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-27 22:11:18 -04:00
Patrick McHardy
81eb6a1487 net: syncookies: export cookie_v6_init_sequence/cookie_v6_check
Extract the local TCP stack independant parts of tcp_v6_init_sequence()
and cookie_v6_check() and export them for use by the upcoming IPv6 SYNPROXY
target.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28 00:28:04 +02:00
Patrick McHardy
48b1de4c11 netfilter: add SYNPROXY core/target
Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy
core with common functions and an address family specific target.

The SYNPROXY receives the connection request from the client, responds with
a SYN/ACK containing a SYN cookie and announcing a zero window and checks
whether the final ACK from the client contains a valid cookie.

It then establishes a connection to the original destination and, if
successful, sends a window update to the client with the window size
announced by the server.

Support for timestamps, SACK, window scaling and MSS options can be
statically configured as target parameters if the features of the server
are known. If timestamps are used, the timestamp value sent back to
the client in the SYN/ACK will be different from the real timestamp of
the server. In order to now break PAWS, the timestamps are translated in
the direction server->client.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28 00:27:54 +02:00
Patrick McHardy
0198230b77 net: syncookies: export cookie_v4_init_sequence/cookie_v4_check
Extract the local TCP stack independant parts of tcp_v4_init_sequence()
and cookie_v4_check() and export them for use by the upcoming SYNPROXY
target.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28 00:27:44 +02:00
Patrick McHardy
41d73ec053 netfilter: nf_conntrack: make sequence number adjustments usuable without NAT
Split out sequence number adjustments from NAT and move them to the conntrack
core to make them usable for SYN proxying. The sequence number adjustment
information is moved to a seperate extend. The extend is added to new
conntracks when a NAT mapping is set up for a connection using a helper.

As a side effect, this saves 24 bytes per connection with NAT in the common
case that a connection does not have a helper assigned.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28 00:26:48 +02:00
David S. Miller
b05930f5d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/wireless/iwlwifi/pcie/trans.c
	include/linux/inetdevice.h

The inetdevice.h conflict involves moving the IPV4_DEVCONF values
into a UAPI header, overlapping additions of some new entries.

The iwlwifi conflict is a context overlap.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-26 16:37:08 -04:00
Will Deacon
50192abe02 fs/9p: avoid accessing utsname after namespace has been torn down
During trinity fuzzing in a kvmtool guest, I stumbled across the
following:

Unable to handle kernel NULL pointer dereference at virtual address 00000004
PC is at v9fs_file_do_lock+0xc8/0x1a0
LR is at v9fs_file_do_lock+0x48/0x1a0
[<c01e2ed0>] (v9fs_file_do_lock+0xc8/0x1a0) from [<c0119154>] (locks_remove_flock+0x8c/0x124)
[<c0119154>] (locks_remove_flock+0x8c/0x124) from [<c00d9bf0>] (__fput+0x58/0x1e4)
[<c00d9bf0>] (__fput+0x58/0x1e4) from [<c0044340>] (task_work_run+0xac/0xe8)
[<c0044340>] (task_work_run+0xac/0xe8) from [<c002e36c>] (do_exit+0x6bc/0x8d8)
[<c002e36c>] (do_exit+0x6bc/0x8d8) from [<c002e674>] (do_group_exit+0x3c/0xb0)
[<c002e674>] (do_group_exit+0x3c/0xb0) from [<c002e6f8>] (__wake_up_parent+0x0/0x18)

I believe this is due to an attempt to access utsname()->nodename, after
exit_task_namespaces() has been called, leaving current->nsproxy->uts_ns
as NULL and causing the above dereference.

A similar issue was fixed for lockd in 9a1b6bf818 ("LOCKD: Don't call
utsname()->nodename from nlmclnt_setlockargs"), so this patch attempts
something similar for 9pfs.

Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-08-26 10:28:46 -05:00
Hannes Frederic Sowa
5a25cf1e31 xfrm: revert ipv4 mtu determination to dst_mtu
In commit 0ea9d5e3e0 ("xfrm: introduce
helper for safe determination of mtu") I switched the determination of
ipv4 mtus from dst_mtu to ip_skb_dst_mtu. This was an error because in
case of IP_PMTUDISC_PROBE we fall back to the interface mtu, which is
never correct for ipv4 ipsec.

This patch partly reverts 0ea9d5e3e0
("xfrm: introduce helper for safe determination of mtu").

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-26 12:40:53 +02:00
Joe Stringer
280c571e1a net: Add NEXTHDR_SCTP to ipv6.h
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-23 16:43:08 -07:00
John W. Linville
81ca2ff945 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-08-23 11:47:48 -04:00
Vladimir Kondratiev
19504cf5f3 cfg80211: add flags to cfg80211_rx_mgmt()
Add flags intended to report various auxiliary information
and introduce the NL80211_RXMGMT_FLAG_ANSWERED flag to report
that the frame was already answered by the device.

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
[REPLIED->ANSWERED, reword commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-23 16:06:03 +02:00
Duan Jiong
c92a59eca8 ipv6: handle Redirect ICMP Message with no Redirected Header option
rfc 4861 says the Redirected Header option is optional, so
the kernel should not drop the Redirect Message that has no
Redirected Header option. In this patch, the function
ip6_redirect_no_header() is introduced to deal with that
condition.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2013-08-22 20:08:21 -07:00
David S. Miller
baf3b3f227 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1) Some constifications, from Mathias Krause.

2) Catch bugs if a hold timer is still active when xfrm_policy_destroy()
   is called, from Fan Du.

3) Remove a redundant address family checking, from Fan Du.

4) Make xfrm_state timer monotonic to be independent of system clock changes,
   from Fan Du.

5) Remove an outdated comment on returning -EREMOTE in the xfrm_lookup(),
   from Rami Rosen.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-22 16:04:41 -07:00
John W. Linville
69b307a48a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2013-08-22 14:27:31 -04:00
John W. Linville
89b5f74a26 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-08-22 11:35:22 -04:00
Frédéric Dalleau
2dea632f9a Bluetooth: Add SCO connection fallback
When initiating a transparent eSCO connection, make use of T2 settings
at first try. T2 is the recommended settings from HFP 1.6 WideBand
Speech. Upon connection failure, try T1 settings.

When CVSD is requested and eSCO is supported, try to establish eSCO
connection using S3 settings. If it fails, fallback in sequence to S2,
S1, D1, D0 settings.

To know which setting should be used, conn->attempt is used. It
indicates the currently ongoing SCO connection attempt and can be used
as the index for the fallback settings table.

These setting and the fallback order are described in Bluetooth HFP 1.6
specification p. 101.

Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-08-21 16:47:13 +02:00
Frédéric Dalleau
07a5c61eda Bluetooth: Add constants and macro declaration for transparent data
This patch defines constants and macro for transparent data LMP
features. It refers to Bluetooth Core V4.0 specification, Part C, Chap
3.3 which defines LMP feature mask.

Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-08-21 16:47:12 +02:00
Frédéric Dalleau
10c62ddc6f Bluetooth: Parameters for outgoing SCO connections
In order to establish a transparent SCO connection, the correct settings
must be specified in the Setup Synchronous Connection request. For that,
a setting field is added to ACL connection data to set up the desired
parameters. The patch also removes usage of hdev->voice_setting in CVSD
connection and makes use of T2 parameters for transparent data.

Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-08-21 16:47:11 +02:00
Frédéric Dalleau
5d4d62f6ca Bluetooth: Add constants for SCO airmode
This patch defines constants for SCO airmode from SCO voice setting. It
refers to Bluetooth Core V4.0 specification, Part E, Chap 6.12 which
describe SCO voice setting format.

Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-08-21 16:47:10 +02:00
Frédéric Dalleau
ad10b1a487 Bluetooth: Add Bluetooth socket voice option
This patch extends the current Bluetooth socket options with BT_VOICE.
This is intended to choose voice data type at runtime. It only applies
to SCO sockets. Incoming connections shall be setup during deferred
setup. Outgoing connections shall be setup before connect(). The desired
setting is stored in the SCO socket info. This patch declares needed
members, modifies getsockopt() and setsockopt().

Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-08-21 16:47:09 +02:00
Frédéric Dalleau
e660ed6c70 Bluetooth: Use hci_connect_sco directly
hci_connect is a super function for connecting hci protocols. But the
voice_setting parameter (introduced in subsequent patches) is only
needed by SCO and security requirements are not needed for SCO channels.
Thus, it makes sense to have a separate function for SCO.

Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-08-21 16:47:08 +02:00
Felix Fietkau
2dfca312a9 mac80211: add a flag to indicate CCK support for HT clients
brcm80211 cannot handle sending frames with CCK rates as part of an
A-MPDU session. Other drivers may have issues too. Set the flag in all
drivers that have been tested with CCK rates.

This fixes a reported brcmsmac regression introduced in
commit ef47a5e4f1
"mac80211/minstrel_ht: fix cck rate sampling"

Cc: stable@vger.kernel.org # 3.10
Reported-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-21 15:03:25 +02:00
David S. Miller
89d5e23210 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Conflicts:
	net/netfilter/nf_conntrack_proto_tcp.c

The conflict had to do with overlapping changes dealing with
fixing the use of an "s32" to hold the value returned by
NAT_OFFSET().

Pablo Neira Ayuso says:

====================
The following batch contains Netfilter/IPVS updates for your net-next tree.
More specifically, they are:

* Trivial typo fix in xt_addrtype, from Phil Oester.

* Remove net_ratelimit in the conntrack logging for consistency with other
  logging subsystem, from Patrick McHardy.

* Remove unneeded includes from the recently added xt_connlabel support, from
  Florian Westphal.

* Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for
  this, from Florian Westphal.

* Remove tproxy core, now that we have socket early demux, from Florian
  Westphal.

* A couple of patches to refactor conntrack event reporting to save a good
  bunch of lines, from Florian Westphal.

* Fix missing locking in NAT sequence adjustment, it did not manifested in
  any known bug so far, from Patrick McHardy.

* Change sequence number adjustment variable to 32 bits, to delay the
  possible early overflow in long standing connections, also from Patrick.

* Comestic cleanups for IPVS, from Dragos Foianu.

* Fix possible null dereference in IPVS in the SH scheduler, from Daniel
  Borkmann.

* Allow to attach conntrack expectations via nfqueue. Before this patch, you
  had to use ctnetlink instead, thus, we save the conntrack lookup.

* Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20 13:30:54 -07:00
Pravin B Shelar
49560532d7 vxlan: Factor out vxlan send api.
Following patch allows more code sharing between vxlan and ovs-vxlan.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20 00:15:43 -07:00
Pravin B Shelar
012a5729ff vxlan: Extend vxlan handlers for openvswitch.
Following patch adds data field to vxlan socket and export
vxlan handler api.
vh->data is required to store private data per vxlan handler.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20 00:15:43 -07:00
Hannes Frederic Sowa
844d48746e xfrm: choose protocol family by skb protocol
We need to choose the protocol family by skb->protocol. Otherwise we
call the wrong xfrm{4,6}_local_error handler in case an ipv6 sockets is
used in ipv4 mode, in which case we should call down to xfrm4_local_error
(ip6 sockets are a superset of ip4 ones).

We are called before before ip_output functions, so skb->protocol is
not reset.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-19 09:39:04 +02:00
David S. Miller
2ff1cf12c9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-08-16 15:37:26 -07:00
John W. Linville
d074666366 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-08-16 14:24:51 -04:00
Johannes Berg
27b3eb9c06 mac80211: add APIs to allow keeping connections after WoWLAN
In order to be able to (securely) keep connections alive after
the system was suspended for WoWLAN, we need some additional
APIs. We already have API (ieee80211_gtk_rekey_notify) to tell
wpa_supplicant about the new replay counter if GTK rekeying
was done by the device while the host was asleep, but that's
not sufficient.

If GTK rekeying wasn't done, we need to tell the host about
sequence counters for the GTK (and PTK regardless of rekeying)
that was used while asleep, add ieee80211_set_key_rx_seq() for
that.

If GTK rekeying was done, then we need to be able to disable
the old keys (with ieee80211_remove_key()) and allocate the
new GTK key(s) in mac80211 (with ieee80211_gtk_rekey_add()).

If protocol offload (e.g. ARP) is implemented, then also the
TX sequence counter for the PTK must be updated, using the new
ieee80211_set_key_tx_seq() function.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-16 12:58:43 +02:00
Jesper Dangaard Brouer
8a8e3d84b1 net_sched: restore "linklayer atm" handling
commit 56b765b79 ("htb: improved accuracy at high rates")
broke the "linklayer atm" handling.

 tc class add ... htb rate X ceil Y linklayer atm

The linklayer setting is implemented by modifying the rate table
which is send to the kernel.  No direct parameter were
transferred to the kernel indicating the linklayer setting.

The commit 56b765b79 ("htb: improved accuracy at high rates")
removed the use of the rate table system.

To keep compatible with older iproute2 utils, this patch detects
the linklayer by parsing the rate table.  It also supports future
versions of iproute2 to send this linklayer parameter to the
kernel directly. This is done by using the __reserved field in
struct tc_ratespec, to convey the choosen linklayer option, but
only using the lower 4 bits of this field.

Linklayer detection is limited to speeds below 100Mbit/s, because
at high rates the rtab is gets too inaccurate, so bad that
several fields contain the same values, this resembling the ATM
detect.  Fields even start to contain "0" time to send, e.g. at
1000Mbit/s sending a 96 bytes packet cost "0", thus the rtab have
been more broken than we first realized.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15 01:43:08 -07:00
Nicolas Dichtel
0bd8762824 ip6tnl: add x-netns support
This patch allows to switch the netns when packet is encapsulated or
decapsulated. In other word, the encapsulated packet is received in a netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injecting into the corresponding interface which
stands to another netns.

When one of the two netns is removed, the tunnel is destroyed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15 01:00:20 -07:00
Nicolas Dichtel
6c742e714d ipip: add x-netns support
This patch allows to switch the netns when packet is encapsulated or
decapsulated. In other word, the encapsulated packet is received in a netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injecting into the corresponding interface which
stands to another netns.

When one of the two netns is removed, the tunnel is destroyed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15 01:00:20 -07:00
Hannes Frederic Sowa
0ea9d5e3e0 xfrm: introduce helper for safe determination of mtu
skb->sk socket can be of AF_INET or AF_INET6 address family. Thus we
always have to make sure we a referring to the correct interpretation
of skb->sk.

We only depend on header defines to query the mtu, so we don't introduce
a new dependency to ipv6 by this change.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-14 13:09:07 +02:00
Hannes Frederic Sowa
628e341f31 xfrm: make local error reporting more robust
In xfrm4 and xfrm6 we need to take care about sockets of the other
address family. This could happen because a 6in4 or 4in6 tunnel could
get protected by ipsec.

Because we don't want to have a run-time dependency on ipv6 when only
using ipv4 xfrm we have to embed a pointer to the correct local_error
function in xfrm_state_afinet and look it up when returning an error
depending on the socket address family.

Thanks to vi0ss for the great bug report:
<https://bugzilla.kernel.org/show_bug.cgi?id=58691>

v2:
a) fix two more unsafe interpretations of skb->sk as ipv6 socket
   (xfrm6_local_dontfrag and __xfrm6_output)
v3:
a) add an EXPORT_SYMBOL_GPL(xfrm_local_error) to fix a link error when
   building ipv6 as a module (thanks to Steffen Klassert)

Reported-by: <vi0oss@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-14 13:07:12 +02:00
Pravin B Shelar
4221f40513 ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id.
Using inner-id for tunnel id is not safe in some rare cases.
E.g. packets coming from multiple sources entering same tunnel
can have same id. Therefore on tunnel packet receive we
could have packets from two different stream but with same
source and dst IP with same ip-id which could confuse ip packet
reassembly.

Following patch reverts optimization from commit
490ab08127 (IP_GRE: Fix IP-Identification.)

CC: Jarno Rajahalme <jrajahalme@nicira.com>
CC: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-13 16:52:50 -07:00
Eric Lapuyade
352a5f5fb3 NFC: netlink: Add result of firmware operation to completion event
Result is added as an NFC_ATTR_FIRMWARE_DOWNLOAD_STATUS attribute
containing the standard errno positive value of the completion result.
This event will be sent when the firmare download operation is done and
will contain the operation result.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-08-14 01:12:58 +02:00
Eric Lapuyade
ef04158e13 NFC: Move nfc_fw_download_done() definition from private to public
This API must be called by NFC drivers, and its prototype was
incorrectly placed.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-08-14 01:08:01 +02:00
Pablo Neira Ayuso
bd07793705 netfilter: nfnetlink_queue: allow to attach expectations to conntracks
This patch adds the capability to attach expectations via nfnetlink_queue.
This is required by conntrack helpers that trigger expectations based on
the first packet seen like the TFTP and the DHCPv6 user-space helpers.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-13 16:32:10 +02:00
John W. Linville
89c2af3c14 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/ethernet/broadcom/Kconfig
2013-08-12 14:45:06 -04:00
David Spinadel
52981cd794 mac80211: add vif to testmode cmd
Pass the wdev from cfg80211 on to the driver as the vif
if given and it's valid for the driver.

Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-12 14:11:42 +02:00
David Spinadel
fc73f11f5f cfg80211: add wdev to testmode cmd
To allow drivers to implement per-interface testmode operations
more easily, pass a wdev pointer if any identification for one
was given from userspace. Clean up the code a bit while at it.

Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-12 14:11:37 +02:00
Johannes Berg
af61a16518 mac80211: add control port protocol TX control flag
A lot of drivers check the frame protocol for ETH_P_PAE,
for various reasons (like making those more reliable).
Add a new flags bitmap to the TX control info and a new
flag indicating the control port protocol is in use to
let all drivers also apply such logic to other control
port protocols, should they be configured.

Also use the new flag in the iwlwifi drivers.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-12 14:09:29 +02:00
Eric Dumazet
28d6427109 net: attempt high order allocations in sock_alloc_send_pskb()
Adding paged frags skbs to af_unix sockets introduced a performance
regression on large sends because of additional page allocations, even
if each skb could carry at least 100% more payload than before.

We can instruct sock_alloc_send_pskb() to attempt high order
allocations.

Most of the time, it does a single page allocation instead of 8.

I added an additional parameter to sock_alloc_send_pskb() to
let other users to opt-in for this new feature on followup patches.

Tested:

Before patch :

$ netperf -t STREAM_STREAM
STREAM STREAM TEST
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 2304  212992  212992    10.00    46861.15

After patch :

$ netperf -t STREAM_STREAM
STREAM STREAM TEST
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 2304  212992  212992    10.00    57981.11

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10 01:16:44 -07:00
Eric Dumazet
e370a72363 af_unix: improve STREAM behavior with fragmented memory
unix_stream_sendmsg() currently uses order-2 allocations,
and we had numerous reports this can fail.

The __GFP_REPEAT flag present in sock_alloc_send_pskb() is
not helping.

This patch extends the work done in commit eb6a24816b
("af_unix: reduce high order page allocations) for
datagram sockets.

This opens the possibility of zero copy IO (splice() and
friends)

The trick is to not use skb_pull() anymore in recvmsg() path,
and instead add a @consumed field in UNIXCB() to track amount
of already read payload in the skb.

There is a performance regression for large sends
because of extra page allocations that will be addressed
in a follow-up patch, allowing sock_alloc_send_pskb()
to attempt high order page allocations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10 01:16:44 -07:00
Yuchung Cheng
149479d019 tcp: add server ip to encrypt cookie in fast open
Encrypt the cookie with both server and client IPv4 addresses,
such that multi-homed server will grant different cookies
based on both the source and destination IPs. No client change
is needed since cookie is opaque to the client.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10 00:35:33 -07:00
David S. Miller
71acc0ddd4 Revert "net: sctp: convert sctp_checksum_disable module param into sctp sysctl"
This reverts commit cda5f98e36.

As per Vlad's request.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 13:09:41 -07:00
John W. Linville
fa5978447c Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-08-09 15:08:10 -04:00
John W. Linville
4f05444892 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2013-08-09 15:06:28 -04:00
Eliezer Tamir
288a937637 net: rename busy poll MIB counter
Rename mib counter from "low latency" to "busy poll"

v1 also moved the counter to the ip MIB (suggested by Shawn Bohrer)
Eric Dumazet suggested that the current location is better.

So v2 just renames the counter to fit the new naming convention.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:39:08 -07:00
Daniel Borkmann
477143e3fe net: sctp: trivial: update bug report in header comment
With the restructuring of the lksctp.org site, we only allow bug
reports through the SCTP mailing list linux-sctp@vger.kernel.org,
not via SF, as SF is only used for web hosting and nothing more.
While at it, also remove the obvious statement that bugs will be
fixed and incooperated into the kernel.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:33:02 -07:00
Daniel Borkmann
cda5f98e36 net: sctp: convert sctp_checksum_disable module param into sctp sysctl
Get rid of the last module parameter for SCTP and make this
configurable via sysctl for SCTP like all the rest of SCTP's
configuration knobs.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:33:02 -07:00
Florian Westphal
c655bc6896 netfilter: nf_conntrack: don't send destroy events from iterator
Let nf_ct_delete handle delivery of the DESTROY event.

Based on earlier patch from Pablo Neira.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-09 12:03:33 +02:00
Tejun Heo
6d37b97428 netprio_cgroup: pass around @css instead of @cgroup and kill struct cgroup_netprio_state
cgroup controller API will be converted to primarily use struct
cgroup_subsys_state instead of struct cgroup.  In preparation, make
the internal functions of netprio_cgroup pass around @css instead of
@cgrp.

While at it, kill struct cgroup_netprio_state which only contained
struct cgroup_subsys_state without serving any purpose.  All functions
are converted to deal with @css directly.

This patch shouldn't cause any behavior differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David S. Miller <davem@davemloft.net>
2013-08-08 20:11:22 -04:00
Tejun Heo
8af01f56a0 cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/
The names of the two struct cgroup_subsys_state accessors -
cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
The former clashes with the type name and the latter doesn't even
indicate it's somehow related to cgroup.

We're about to revamp large portion of cgroup API, so, let's rename
them so that they're less awkward.  Most per-controller usages of the
accessors are localized in accessor wrappers and given the amount of
scheduled changes, this isn't gonna add any noticeable headache.

Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
to task_css().  This patch is pure rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08 20:11:22 -04:00
stephen hemminger
6261d983f2 ip_tunnel: embed hash list head
The IP tunnel hash heads can be embedded in the per-net structure
since it is a fixed size. Reduce the size so that the total structure
fits in a page size. The original size was overly large, even NETDEV_HASHBITS
is only 8 bits!

Also, add some white space for readability.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>.
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:47:52 -07:00
fan.du
5a139296f8 sctp: Pack dst_cookie into 1st cacheline hole for 64bit host
As dst_cookie is used in fast path sctp_transport_dst_check.

Before:
struct sctp_transport {
	struct list_head           transports;           /*     0    16 */
	atomic_t                   refcnt;               /*    16     4 */
	__u32                      dead:1;               /*    20:31  4 */
	__u32                      rto_pending:1;        /*    20:30  4 */
	__u32                      hb_sent:1;            /*    20:29  4 */
	__u32                      pmtu_pending:1;       /*    20:28  4 */

	/* XXX 28 bits hole, try to pack */

	__u32                      sack_generation;      /*    24     4 */

	/* XXX 4 bytes hole, try to pack */

	struct flowi               fl;                   /*    32    64 */
	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
	union sctp_addr            ipaddr;               /*    96    28 */

After:
struct sctp_transport {
	struct list_head           transports;           /*     0    16 */
	atomic_t                   refcnt;               /*    16     4 */
	__u32                      dead:1;               /*    20:31  4 */
	__u32                      rto_pending:1;        /*    20:30  4 */
	__u32                      hb_sent:1;            /*    20:29  4 */
	__u32                      pmtu_pending:1;       /*    20:28  4 */

	/* XXX 28 bits hole, try to pack */

	__u32                      sack_generation;      /*    24     4 */
	u32                        dst_cookie;           /*    28     4 */
	struct flowi               fl;                   /*    32    64 */
	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
	union sctp_addr            ipaddr;               /*    96    28 */

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 12:20:51 -07:00
Mathias Krause
e473fcb472 xfrm: constify mark argument of xfrm_find_acq()
The mark argument is read only, so constify it. Also make dummy_mark in
af_key const -- only used as dummy argument for this very function.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-05 11:13:53 +02:00
Eliezer Tamir
bf37d2b3fd busy_poll: cleanup do-nothing placeholders
When renaming ll_poll to busy poll, I introduced a typo
in the name of the do-nothing placeholder for sk_busy_loop
and called it sk_busy_poll.
This broke compile when busy poll was not configured.
Cong Wang submitted a patch to fixed that.
This patch removes the now redundant, misspelled placeholder.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-04 19:41:27 -07:00
David S. Miller
0e76a3a587 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge net into net-next to setup some infrastructure Eric
Dumazet needs for usbnet changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03 21:36:46 -07:00
Eric Dumazet
fba3679d34 fib_rules: reorder struct fib_rules fields
Move refcnt, pref, suppress_ifgroup, suppress_prefixlen out of first
cache line, as they are not used in fast path.

Make sure ctarget & fr_net are in first cache line.

(Assuming 64 bit arches and 64 bytes cache lines)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03 11:53:54 -07:00
Stefan Tomanek
73f5698e77 fib_rules: fix suppressor names and default values
This change brings the suppressor attribute names into line; it also changes
the data types to provide a more consistent interface.

While -1 indicates that the suppressor is not enabled, values >= 0 for
suppress_prefixlen or suppress_ifgroup  reject routing decisions violating the
constraint.

This changes the previously presented behaviour of suppress_prefixlen, where a
prefix length _less_ than the attribute value was rejected. After this change,
a prefix length less than *or* equal to the value is considered a violation of
the rule constraint.

It also changes the default values for default and newly added rules (disabling
any suppression for those).

Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03 10:40:23 -07:00
Stefan Tomanek
6ef94cfafb fib_rules: add route suppression based on ifgroup
This change adds the ability to suppress a routing decision based upon the
interface group the selected interface belongs to. This allows it to
exclude specific devices from a routing decision.

Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-02 15:24:22 -07:00
fan.du
d27fc78208 sctp: Don't lookup dst if transport dst is still valid
When sctp sits on IPv6, sctp_transport_dst_check pass cookie as ZERO,
as a result ip6_dst_check always fail out. This behaviour makes
transport->dst useless, because every sctp_packet_transmit must look
for valid dst.

Add a dst_cookie into sctp_transport, and set the cookie whenever we
get new dst for sctp_transport. So dst validness could be checked
against it.

Since I have split genid for IPv4 and IPv6, also delete/add IPv6 address
will also bump IPv6 genid. So issues we discussed in:
http://marc.info/?l=linux-netdev&m=137404469219410&w=4
have all been sloved for this patch.

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-02 12:36:00 -07:00
Joe Perches
574e2af7c0 include: Convert ethernet mac address declarations to use ETH_ALEN
It's convenient to have ethernet mac addresses use
ETH_ALEN to be able to grep for them a bit easier and
also to ensure that the addresses are __aligned(2).

Add #include <linux/if_ether.h> as necessary.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-02 12:33:54 -07:00
Cong Wang
e0d1095ae3 net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL
Eliezer renames several *ll_poll to *busy_poll, but forgets
CONFIG_NET_LL_RX_POLL, so in case of confusion, rename it too.

Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01 15:11:17 -07:00
Cong Wang
dfcefb0be1 net: fix a compile error when CONFIG_NET_LL_RX_POLL is not set
When CONFIG_NET_LL_RX_POLL is not set, I got:

net/socket.c: In function ‘sock_poll’:
net/socket.c:1165:4: error: implicit declaration of function ‘sk_busy_loop’ [-Werror=implicit-function-declaration]

Fix this by adding a nop when !CONFIG_NET_LL_RX_POLL.

Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01 15:10:58 -07:00
Michal Kubeček
2ac3ac8f86 ipv6: prevent fib6_run_gc() contention
On a high-traffic router with many processors and many IPv6 dst
entries, soft lockup in fib6_run_gc() can occur when number of
entries reaches gc_thresh.

This happens because fib6_run_gc() uses fib6_gc_lock to allow
only one thread to run the garbage collector but ip6_dst_gc()
doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
returns. On a system with many entries, this can take some time
so that in the meantime, other threads pass the tests in
ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
the lock. They then have to run the garbage collector one after
another which blocks them for quite long.

Resolve this by replacing special value ~0UL of expire parameter
to fib6_run_gc() by explicit "force" parameter to choose between
spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
force=false if gc_thresh is reached but not max_size.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01 14:16:20 -07:00
John W. Linville
22e02a0272 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-08-01 14:30:59 -04:00
Simon Wunderlich
73da7d5bab mac80211: add channel switch command and beacon callbacks
The count field in CSA must be decremented with each beacon
transmitted. This patch implements the functionality for drivers
using ieee80211_beacon_get(). Other drivers must call back manually
after reaching count == 0.

This patch also contains the handling and finish worker for the channel
switch command, and mac80211/chanctx code to allow to change a channel
definition of an active channel context.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
[small cleanups, catch identical chandef]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-01 18:30:33 +02:00
Simon Wunderlich
16ef1fe272 nl80211/cfg80211: add channel switch command
To allow channel switch announcements within beacons, add
the channel switch command to nl80211/cfg80211. This is
implementation is intended for AP and (later) IBSS mode.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-01 18:30:28 +02:00
Joe Perches
378307217e cls_cgroup.h netprio_cgroup.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 17:50:02 -07:00
Joe Perches
4fc7074703 checksum: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 17:50:02 -07:00
Joe Perches
10dd9b7ce3 cfg80211.h/mac80211.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 17:50:02 -07:00
Joe Perches
c1d8f8041c ax25.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 17:50:02 -07:00
Joe Perches
90972b2211 arp/neighbour.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 17:50:02 -07:00
Joe Perches
cd2cf63a56 af_rxrpc.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 17:50:01 -07:00
Joe Perches
b60a8280ba af_unix.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 17:50:01 -07:00
Joe Perches
e8e54d3c13 addrconf.h: Remove extern function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 17:50:01 -07:00
Stefan Tomanek
7764a45a8f fib_rules: add .suppress operation
This change adds a new operation to the fib_rules_ops struct; it allows the
suppression of routing decisions if certain criteria are not met by its
results.

The first implemented constraint is a minimum prefix length added to the
structures of routing rules. If a rule is added with a minimum prefix length
>0, only routes meeting this threshold will be considered. Any other (more
general) routing table entries will be ignored.

When configuring a system with multiple network uplinks and default routes, it
is often convinient to reference the main routing table multiple times - but
omitting the default route. Using this patch and a modified "ip" utility, this
can be achieved by using the following command sequence:

  $ ip route add table secuplink default via 10.42.23.1

  $ ip rule add pref 100            table main prefixlength 1
  $ ip rule add pref 150 fwmark 0xA table secuplink

With this setup, packets marked 0xA will be processed by the additional routing
table "secuplink", but only if no suitable route in the main routing table can
be found. By using a minimal prefixlength of 1, the default route (/0) of the
table "main" is hidden to packets processed by rule 100; packets traveling to
destinations with more specific routing entries are processed as usual.

Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 17:27:17 -07:00
Joe Perches
5c15257f93 net: Remove extern from include/net/ scheduling prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 17:24:22 -07:00
Joe Perches
d9d10a3096 ndisc: Add missing inline to ndisc_addr_option_pad
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 15:18:17 -07:00
Eric Dumazet
f2f872f927 netem: Introduce skb_orphan_partial() helper
Commit 547669d483 ("tcp: xps: fix reordering issues") added
unexpected reorders in case netem is used in a MQ setup for high
performance test bed.

ETH=eth0
tc qd del dev $ETH root 2>/dev/null
tc qd add dev $ETH root handle 1: mq
for i in `seq 1 32`
do
 tc qd add dev $ETH parent 1:$i netem delay 100ms
done

As all tcp packets are orphaned by netem, TCP stack believes it can
set skb->ooo_okay on all packets.

In order to allow producers to send more packets, we want to
keep sk_wmem_alloc from reaching sk_sndbuf limit.

We can do that by accounting one byte per skb in netem queues,
so that TCP stack is not fooled too much.

Tested:

With above MQ/netem setup, scaling number of concurrent flows gives
linear results and no reorders/retransmits

lpq83:~# for n in 1 10 20 30 40 50 60 70 80 90 100
 do echo -n "n:$n " ; ./super_netperf $n -H 10.7.7.84; done
n:1 198.46
n:10 2002.69
n:20 4000.98
n:30 6006.35
n:40 8020.93
n:50 10032.3
n:60 12081.9
n:70 13971.3
n:80 16009.7
n:90 17117.3
n:100 17425.5

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 14:59:49 -07:00
fan.du
ca4c3fc24e net: split rt_genid for ipv4 and ipv6
Current net name space has only one genid for both IPv4 and IPv6, it has below
drawbacks:

- Add/delete an IPv4 address will invalidate all IPv6 routing table entries.
- Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table
  entries even when the policy is only applied for one address family.

Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6
separately in a fine granularity.

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 14:56:36 -07:00
Dmitry Popov
c0155b2da4 tcp: Remove unused tcpct declarations and comments
Remove declaration, 4 defines and confusing comment that are no longer used
since 1a2c6181c4 ("tcp: Remove TCPCT").

Signed-off-by: Dmitry Popov <dp@highloadlab.com>
Acked-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 12:16:45 -07:00
Patrick McHardy
2d89c68ac7 netfilter: nf_nat: change sequence number adjustments to 32 bits
Using 16 bits is too small, when many adjustments happen the offsets might
overflow and break the connection.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 19:54:51 +02:00
Florian Westphal
02982c27ba netfilter: nf_conntrack: remove duplicate code in ctnetlink
ctnetlink contains copy-paste code from death_by_timeout.  In order to
avoid changing both places in upcoming event delivery patch,
export death_by_timeout functionality and use it in the ctnetlink code.

Based on earlier patch from Pablo Neira.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 18:51:23 +02:00
Florian Westphal
93742cf8af netfilter: tproxy: remove nf_tproxy_core.h
We've removed nf_tproxy_core.ko, so also remove its header.
The lookup helpers are split and then moved to tproxy target/socket match.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 18:43:45 +02:00
Florian Westphal
fd158d79d3 netfilter: tproxy: remove nf_tproxy_core, keep tw sk assigned to skb
The module was "permanent", due to the special tproxy skb->destructor.
Nowadays we have tcp early demux and its sock_edemux destructor in
networking core which can be used instead.

Thanks to early demux changes the input path now also handles
"skb->sk is tw socket" correctly, so this no longer needs the special
handling introduced with commit d503b30bd6
(netfilter: tproxy: do not assign timewait sockets to skb->sk).

Thus:
- move assign_sock function to where its needed
- don't prevent timewait sockets from being assigned to the skb
- remove nf_tproxy_core.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 16:39:40 +02:00
Patrick McHardy
6704af53fc netfilter: nf_conntrack: remove net_ratelimit() for LOG_INVALID()
Logging of invalid packets has to be explicitly enabled. Rate-limiting these
messages is inconsistent with other netfilter logging features and makes
debugging harder.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 16:37:27 +02:00
Samuel Ortiz
9ea7187c53 NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD
Loading a firmware into a target is typically called firmware
download, not firmware upload. So we rename the netlink API to
NFC_CMD_FW_DOWNLOAD in order to avoid any terminology confusion from
userspace.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-07-31 01:19:43 +02:00
Andi Shyti
60ff779c4a 9p: client: remove unused code and any reference to "cancelled" function
This patch reverts commit

80b45261a0

which was implementing a 'cancelled' functionality to notify that
a cancelled request will not be replied.

This implementation was not used anywhere and therefore removed.

Signed-off-by: Andi Shyti <andi@etezian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-30 15:54:28 -07:00
Thomas Graf
c26bf4a513 pktgen: Add UDPCSUM flag to support UDP checksums
UDP checksums are optional, hence pktgen has been omitting them in
favour of performance. The optional flag UDPCSUM enables UDP
checksumming. If the output device supports hardware checksumming
the skb is prepared and marked CHECKSUM_PARTIAL, otherwise the
checksum is generated in software.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-27 22:16:36 -07:00
Asias He
82a54d0ebb VSOCK: Move af_vsock.h and vsock_addr.h to include/net
This is useful for other VSOCK transport implemented outside the
net/vmw_vsock/ directory to use these headers.

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-27 22:14:06 -07:00
Joe Stringer
024ec3deac net/sctp: Refactor SCTP skb checksum computation
This patch consolidates the SCTP checksum calculation code from various
places to a single new function, sctp_compute_cksum(skb, offset).

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-27 20:07:15 -07:00
Mikel Astiz
a77b15a60c Bluetooth: Add HCI authentication capabilities macros
Add macros for the HCI capabilities as described in the Bluetooth Core
Specification v4.0, Volume 2, part E, section 7.1.29.

Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-07-25 14:15:22 +01:00
Eric Dumazet
c9bee3b7fd tcp: TCP_NOTSENT_LOWAT socket option
Idea of this patch is to add optional limitation of number of
unsent bytes in TCP sockets, to reduce usage of kernel memory.

TCP receiver might announce a big window, and TCP sender autotuning
might allow a large amount of bytes in write queue, but this has little
performance impact if a large part of this buffering is wasted :

Write queue needs to be large only to deal with large BDP, not
necessarily to cope with scheduling delays (incoming ACKS make room
for the application to queue more bytes)

For most workloads, using a value of 128 KB or less is OK to give
applications enough time to react to POLLOUT events in time
(or being awaken in a blocking sendmsg())

This patch adds two ways to set the limit :

1) Per socket option TCP_NOTSENT_LOWAT

2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets
not using TCP_NOTSENT_LOWAT socket option (or setting a zero value)
Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect.

This changes poll()/select()/epoll() to report POLLOUT
only if number of unsent bytes is below tp->nosent_lowat

Note this might increase number of sendmsg()/sendfile() calls
when using non blocking sockets,
and increase number of context switches for blocking sockets.

Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is
defined as :
 Specify the minimum number of bytes in the buffer until
 the socket layer will pass the data to the protocol)

Tested:

netperf sessions, and watching /proc/net/protocols "memory" column for TCP

With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory
used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458)

lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6     1880      2   45458   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
TCP       1696    508   45458   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y

lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6     1880      2   20567   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
TCP       1696    508   20567   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y

Using 128KB has no bad effect on the throughput or cpu usage
of a single flow, although there is an increase of context switches.

A bonus is that we hold socket lock for a shorter amount
of time and should improve latencies of ACK processing.

lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
1651584     6291456     16384  20.00   17447.90   10^6bits/s  3.13  S      -1.00  U      0.353   -1.000  usec/KB

 Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':

           412,514 context-switches

     200.034645535 seconds time elapsed

lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
1593240     6291456     16384  20.00   17321.16   10^6bits/s  3.35  S      -1.00  U      0.381   -1.000  usec/KB

 Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':

         2,675,818 context-switches

     200.029651391 seconds time elapsed

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-By: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-24 17:54:48 -07:00
Eric Dumazet
64dc61306c net: add sk_stream_is_writeable() helper
Several call sites use the hardcoded following condition :

sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)

Lets use a helper because TCP_NOTSENT_LOWAT support will change this
condition for TCP sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-24 17:54:48 -07:00
Daniel Borkmann
91705c61b5 net: sctp: trivial: update mailing list address
The SCTP mailing list address to send patches or questions
to is linux-sctp@vger.kernel.org and not
lksctp-developers@lists.sourceforge.net anymore. Therefore,
update all occurences.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-24 17:53:38 -07:00
Yuchung Cheng
5b08e47caf tcp: prefer packet timing to TS-ECR for RTT
Prefer packet timings to TS-ecr for RTT measurements when both
sources are available. That's because broken middle-boxes and remote
peer can return packets with corrupted TS ECR fields. Similarly most
congestion controls that require RTT signals favor timing-based
sources as well. Also check for bad TS ECR values to avoid RTT
blow-ups. It has happened on production Web servers.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-22 17:53:42 -07:00
Yuchung Cheng
375fe02c91 tcp: consolidate SYNACK RTT sampling
The first patch consolidates SYNACK and other RTT measurement to use a
central function tcp_ack_update_rtt(). A (small) bonus is now SYNACK
RTT measurement happens after PAWS check, potentially reducing the
impact of RTO seeding on bad TCP timestamps values.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-22 17:53:42 -07:00
Richard Cochran
cb820f8e4b net: Provide a generic socket error queue delivery method for Tx time stamps.
This patch moves the private error queue delivery function from the
af_packet code to the core socket method. In this way, network layers
only needing the error queue for transmit time stamping can share common
code.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-22 14:58:19 -07:00
Simon Wunderlich
0430c88347 cfg80211/mac80211: use reduced txpower for 5 and 10 MHz
Some regulations (like germany, but also FCC) express their transmission
power limit in dBm/MHz or mW/MHz. To cope with that and be on the safe
side, reduce the maximum power to half (10 MHz) or quarter (5 MHz)
when operating on these reduced bandwidth channels.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2013-07-16 09:58:08 +03:00
Simon Wunderlich
74608aca4d cfg80211/mac80211: get mandatory rates based on scan width
Mandatory rates for 5 and 10 MHz are different from the rates used for
20 MHz in 2.4 GHz mode, as they use OFDM only.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2013-07-16 09:58:07 +03:00
Simon Wunderlich
a5e70697d0 mac80211: add radiotap flag and handling for 5/10 MHz
Wireshark already defines radiotap channel flags for 5 and 10 MHz, so
just use them in Linux radiotap too. Furthermore, add rx status flags to
allow drivers to report when they received data on 5 or 10 MHz channels.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2013-07-16 09:58:05 +03:00
Simon Wunderlich
3de805cf96 mac80211/rc80211: add chandef to rate initialization
5 and 10 MHz support needs to know the current operating channel width,
add the chandef to the rate control API.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2013-07-16 09:58:02 +03:00
Simon Wunderlich
dcd6eac1f3 nl80211: add scan width to bss and scan request structs
To allow scanning and working with 5 MHz and 10 MHz BSS, extend the
inform bss commands and add wrappers to take 5 and 10 MHz bss into
account.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2013-07-16 09:58:01 +03:00
Amitkumar Karwar
be29b99a9b cfg80211/nl80211: Add packet coalesce support
In most cases, host that receives IPv4 and IPv6 multicast/broadcast
packets does not do anything with these packets. Therefore the
reception of these unwanted packets causes unnecessary processing
and power consumption.

Packet coalesce feature helps to reduce number of received
interrupts to host by buffering these packets in firmware/hardware
for some predefined time. Received interrupt will be generated when
one of the following events occur.
a) Expiration of hardware timer whose expiration time is set to
maximum coalescing delay of matching coalesce rule.
b) Coalescing buffer in hardware reaches it's limit.
c) Packet doesn't match any of the configured coalesce rules.

This patch adds set/get configuration support for packet coalesce.
User needs to configure following parameters for creating a coalesce
rule.
a) Maximum coalescing delay
b) List of packet patterns which needs to be matched
c) Condition for coalescence. pattern 'match' or 'no match'
Multiple such rules can be created.

This feature needs to be advertised during driver initialization.
Drivers are supposed to do required firmware/hardware settings based
on user configuration.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
[fix kernel-doc, change free function, fix copy/paste error]
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2013-07-16 09:58:00 +03:00
Johannes Berg
ad24b0da9e wireless: indent kernel-doc with tabs
Almost everywhere tabs are used to indent continuation
lines, replace the few places that use spaces.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2013-07-16 09:57:58 +03:00
Simon Wunderlich
803768f54e nl80211: enable HT overrides for ibss
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2013-07-16 09:57:56 +03:00
Amitkumar Karwar
50ac660784 cfg80211/nl80211: rename packet pattern related structures and enums
Currently packet patterns and it's enum/structures are used only
for WoWLAN feature. As we intend to reuse them for new feature
packet coalesce, they are renamed in this patch.

Older names are kept for backward compatibility purpose.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2013-07-16 09:57:55 +03:00
Linus Torvalds
be9c6d9169 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Just a bunch of small fixes and tidy ups:

   1) Finish the "busy_poll" renames, from Eliezer Tamir.

   2) Fix RCU stalls in IFB driver, from Ding Tianhong.

   3) Linearize buffers properly in tun/macvtap zerocopy code.

   4) Don't crash on rmmod in vxlan, from Pravin B Shelar.

   5) Spinlock used before init in alx driver, from Maarten Lankhorst.

   6) A sparse warning fix in bnx2x broke TSO checksums, fix from Dmitry
      Kravkov.

   7) Dummy and ifb driver load failure paths can oops, fixes from Tan
      Xiaojun and Ding Tianhong.

   8) Correct MTU calculations in IP tunnels, from Alexander Duyck.

   9) Account all TCP retransmits in SNMP stats properly, from Yuchung
      Cheng.

  10) atl1e and via-rhine do not handle DMA mapping failures properly,
      from Neil Horman.

  11) Various equal-cost multipath route fixes in ipv6 from Hannes
      Frederic Sowa"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
  ipv6: only static routes qualify for equal cost multipathing
  via-rhine: fix dma mapping errors
  atl1e: fix dma mapping warnings
  tcp: account all retransmit failures
  usb/net/r815x: fix cast to restricted __le32
  usb/net/r8152: fix integer overflow in expression
  net: access page->private by using page_private
  net: strict_strtoul is obsolete, use kstrtoul instead
  drivers/net/ieee802154: don't use devm_pinctrl_get_select_default() in probe
  drivers/net/ethernet/cadence: don't use devm_pinctrl_get_select_default() in probe
  drivers/net/can/c_can: don't use devm_pinctrl_get_select_default() in probe
  net/usb: add relative mii functions for r815x
  net/tipc: use %*phC to dump small buffers in hex form
  qlcnic: Adding Maintainers.
  gre: Fix MTU sizing check for gretap tunnels
  pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts
  pkt_sched: sch_qfq: improve efficiency of make_eligible
  gso: Update tunnel segmentation to support Tx checksum offload
  inet: fix spacing in assignment
  ifb: fix oops when loading the ifb failed
  ...
2013-07-13 17:42:22 -07:00
Linus Torvalds
19d2f8e0fb Second round of 9p patches for the 3.11 merge window.
Several of these patches were rebased in order to correct style issues.
 Only stylistic changes were made versus the patches which were in linux-next
 for two weeks.  The rebases have been in linux-next for 3 days and have
 passed my regressions.
 
 The bulk of these are RDMA fixes and improvements.  There's also some
 additions on the extended attributes front to support some additional
 namespaces and a new option for TCP to force allocation of mount requests
 from a priviledged port.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJR3rWXAAoJEDZk62b0Tg6xabIP/12I+SkQ57wRN03EQy5fqUdX
 gK/YMHKQ9QuDnZPBvrZ2lypesQNqVU0KINay6VEA86JG1gwzPyUd2MnpQ7F0vV3N
 XwVD54IoflV/M74xUnrgGWB8YxaPcdacQQ8yazX+mEgOgYGdWmDAl7FHmAkdKAFB
 gSl25f3PNJX1Rjay0dssNVXrVPXuJY/fZXKnNQZKtRwXffRWKsWHd8FU0Eq7F30A
 kNQB8tmMSfHBBjP+tzR0My6/kQ09jzHdtZOkH9IgVpNzqrd8tfy0l6tEvFypxqGT
 5oQFoxHHL/tUW05V0P3gYany2A7lEhSUifPKS6omqHO+vPlw+pDJw+xWlNq9fnDt
 8S8znqVuEHhvqRQW7zFdb9ac2MZi8CHHhC2wGIZ7GYjNG2q5XwE8b/QhdXQeFin7
 ibugvoW7+ZdcDewpQW27oO0g7B/8hRt8KC+1lc/8rITKIfGxbNJkGzTDl0F4Co7v
 IH7Ew5PHPe6ZiuU0QSdU+NBuvk8g8sWGxx04Xvzl3WicwOg7XvN3ivrKB9oN2U1x
 50KZRnYpwQQv/9AxyhroYU+Ufje8SF4v++zsq1eMzUcHsC/C73eatw2m764t+X4S
 8yMLrgqY1Nzif4nAMi/SDMnB/R1bXeuc8kXD9xT6XD9d2tf6e+zCHhQklVeC0tuK
 RiVRJqGrfanbKMnWIG0Y
 =n9rI
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull second round of 9p patches from Eric Van Hensbergen:
 "Several of these patches were rebased in order to correct style
  issues.  Only stylistic changes were made versus the patches which
  were in linux-next for two weeks.  The rebases have been in linux-next
  for 3 days and have passed my regressions.

  The bulk of these are RDMA fixes and improvements.  There's also some
  additions on the extended attributes front to support some additional
  namespaces and a new option for TCP to force allocation of mount
  requests from a priviledged port"

* tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: Remove the unused variable "err" in v9fs_vfs_getattr()
  9P: Add cancelled() to the transport functions.
  9P/RDMA: count posted buffers without a pending request
  9P/RDMA: Improve error handling in rdma_request
  9P/RDMA: Do not free req->rc in error handling in rdma_request()
  9P/RDMA: Use a semaphore to protect the RQ
  9P/RDMA: Protect against duplicate replies
  9P/RDMA: increase P9_RDMA_MAXSIZE to 1MB
  9pnet: refactor struct p9_fcall alloc code
  9P/RDMA: rdma_request() needs not allocate req->rc
  9P: Fix fcall allocation for rdma
  fs/9p: xattr: add trusted and security namespaces
  net/9p: add privport option to 9p tcp transport
2013-07-11 10:21:23 -07:00
Eliezer Tamir
64b0dc517e net: rename busy poll socket op and globals
Rename LL_SO to BUSY_POLL_SO
Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll}
Fix up users of these variables.
Fix documentation for sysctl.

a patch for the socket.7  man page will follow separately,
because of limitations of my mail setup.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10 17:08:27 -07:00
Eliezer Tamir
8b80cda536 net: rename ll methods to busy-poll
Rename ndo_ll_poll to ndo_busy_poll.
Rename sk_mark_ll to sk_mark_napi_id.
Rename skb_mark_ll to skb_mark_napi_id.
Correct all useres of these functions.
Update comments and defines  in include/net/busy_poll.h

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10 17:08:27 -07:00
Eliezer Tamir
076bb0c82a net: rename include/net/ll_poll.h to include/net/busy_poll.h
Rename the file and correct all the places where it is included.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10 17:08:27 -07:00
Eliezer Tamir
76b1e9b981 net/fs: change busy poll time accounting
Suggested by Linus:
Changed time accounting for busy-poll:
- Make it microsecond based.
- Use unsigned longs.
- Revert back to use time_after instead of time_in_range.
Reorder poll/select busy loop conditions:
- Clear busy_flag after one time we can't busy-poll.
- Only init busy_end if we actually are going to busy-poll.
Added one more missing need_resched() test.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-09 12:33:04 -07:00
Eliezer Tamir
cbf55001b2 net: rename low latency sockets functions to busy poll
Rename functions in include/net/ll_poll.h to busy wait.
Clarify documentation about expected power use increase.
Rename POLL_LL to POLL_BUSY_LOOP.
Add need_resched() testing to poll/select busy loops.

Note, that in select and poll can_busy_poll is dynamic and is
updated continuously to reflect the existence of supported
sockets with valid queue information.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-08 19:25:45 -07:00
Simon Derr
80b45261a0 9P: Add cancelled() to the transport functions.
RDMA needs to post a buffer for each incoming reply.
Hence it needs to keep count of these and needs to be
aware of whether a flushed request has received a reply
or not.

This patch adds the cancelled() callback to the transport modules.
It is called when RFLUSH has been received and that the corresponding
request will never receive a reply.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:18:18 -05:00
Jim Garlick
2f28c8b31d net/9p: add privport option to 9p tcp transport
If the privport option is specified, the tcp transport binds local
address to a reserved port before connecting to the 9p server.

In some cases when 9P AUTH cannot be implemented, this is better than
nothing.

Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 21:59:54 -05:00
Eric Dumazet
9eb5bf838d net: sock: fix TCP_SKB_MIN_TRUESIZE
commit eea86af6b1 ("net: sock: adapt SOCK_MIN_RCVBUF and
SOCK_MIN_SNDBUF") forgot the sk_buff alignment taken into account
in __alloc_skb() : skb->truesize = SKB_TRUESIZE(size);

While above commit fixed the sender issue, the receiver is still
dropping the second packet (on loopback device), because the receiver
socket can not really hold two skbs :
First packet truesize already is above sk_rcvbuf, so even TCP coalescing
cannot help.

On a typical 64bit build, each tcp skb truesize is 2304, instead of 2272

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 16:52:10 -07:00
Daniel Borkmann
c50cd35788 net: gre: move GSO functions to gre_offload
Similarly to TCP/UDP offloading, move all related GRE functions to
gre_offload.c to make things more explicit and similar to the rest
of the code.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 14:37:39 -07:00
Eliezer Tamir
419076f59f net: lls fix build with allnoconfig
correct placeholder declarations to prevent build breakage when
!CONFIG_NET_LL_RX_POLL

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 23:50:47 -07:00
Eliezer Tamir
1bc2774d86 net: convert lls to use time_in_range()
Time in range will fail safely if we move to a different cpu with an
extremely large clock skew.
Add time_in_range64() and convert lls to use it.

changelog:
v2
- fixed double call to sched_clock in can_poll_ll
- fixed checkpatchisms

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 15:53:53 -07:00
Hannes Frederic Sowa
8822b64a0f ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET pending data
We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
 [<ffffffff8159a9aa>] skb_push+0x3a/0x40
 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
 [<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
 RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Cc: Dave Jones <davej@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 12:44:18 -07:00
Rami Rosen
b48410b4dc ipv4: remove fib_update_nh_saddrs() declaration.
This patch removes the fib_update_nh_saddrs() declaration from
include/net/ip_fib.h, as the fib_update_nh_saddrs() method was removed in
coomit 436c3b6 ("ipv4: Invalidate nexthop cache nh_saddr more correctly").

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 00:33:52 -07:00
Daniel Borkmann
a05b1019cf net: sctp: prevent checksum.h from double inclusion
The header file checksum.h is missing proper defines that prevents
it from double inclusion.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 00:23:57 -07:00
Amerigo Wang
8965779d2c ipv6,mcast: always hold idev->lock before mca_lock
dingtianhong reported the following deadlock detected by lockdep:

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.4.24.05-0.1-default #1 Not tainted
 -------------------------------------------------------
 ksoftirqd/0/3 is trying to acquire lock:
  (&ndev->lock){+.+...}, at: [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120

 but task is already holding lock:
  (&mc->mca_lock){+.+...}, at: [<ffffffff8149d130>] mld_send_report+0x40/0x150

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&mc->mca_lock){+.+...}:
        [<ffffffff810a8027>] validate_chain+0x637/0x730
        [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500
        [<ffffffff810a8734>] lock_acquire+0x114/0x150
        [<ffffffff814f691a>] rt_spin_lock+0x4a/0x60
        [<ffffffff8149e4bb>] igmp6_group_added+0x3b/0x120
        [<ffffffff8149e5d8>] ipv6_mc_up+0x38/0x60
        [<ffffffff81480a4d>] ipv6_find_idev+0x3d/0x80
        [<ffffffff81483175>] addrconf_notify+0x3d5/0x4b0
        [<ffffffff814fae3f>] notifier_call_chain+0x3f/0x80
        [<ffffffff81073471>] raw_notifier_call_chain+0x11/0x20
        [<ffffffff813d8722>] call_netdevice_notifiers+0x32/0x60
        [<ffffffff813d92d4>] __dev_notify_flags+0x34/0x80
        [<ffffffff813d9360>] dev_change_flags+0x40/0x70
        [<ffffffff813ea627>] do_setlink+0x237/0x8a0
        [<ffffffff813ebb6c>] rtnl_newlink+0x3ec/0x600
        [<ffffffff813eb4d0>] rtnetlink_rcv_msg+0x160/0x310
        [<ffffffff814040b9>] netlink_rcv_skb+0x89/0xb0
        [<ffffffff813eb357>] rtnetlink_rcv+0x27/0x40
        [<ffffffff81403e20>] netlink_unicast+0x140/0x180
        [<ffffffff81404a9e>] netlink_sendmsg+0x33e/0x380
        [<ffffffff813c4252>] sock_sendmsg+0x112/0x130
        [<ffffffff813c537e>] __sys_sendmsg+0x44e/0x460
        [<ffffffff813c5544>] sys_sendmsg+0x44/0x70
        [<ffffffff814feab9>] system_call_fastpath+0x16/0x1b

 -> #0 (&ndev->lock){+.+...}:
        [<ffffffff810a798e>] check_prev_add+0x3de/0x440
        [<ffffffff810a8027>] validate_chain+0x637/0x730
        [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500
        [<ffffffff810a8734>] lock_acquire+0x114/0x150
        [<ffffffff814f6c82>] rt_read_lock+0x42/0x60
        [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120
        [<ffffffff8149b036>] mld_newpack+0xb6/0x160
        [<ffffffff8149b18b>] add_grhead+0xab/0xc0
        [<ffffffff8149d03b>] add_grec+0x3ab/0x460
        [<ffffffff8149d14a>] mld_send_report+0x5a/0x150
        [<ffffffff8149f99e>] igmp6_timer_handler+0x4e/0xb0
        [<ffffffff8105705a>] call_timer_fn+0xca/0x1d0
        [<ffffffff81057b9f>] run_timer_softirq+0x1df/0x2e0
        [<ffffffff8104e8c7>] handle_pending_softirqs+0xf7/0x1f0
        [<ffffffff8104ea3b>] __do_softirq_common+0x7b/0xf0
        [<ffffffff8104f07f>] __thread_do_softirq+0x1af/0x210
        [<ffffffff8104f1c1>] run_ksoftirqd+0xe1/0x1f0
        [<ffffffff8106c7de>] kthread+0xae/0xc0
        [<ffffffff814fff74>] kernel_thread_helper+0x4/0x10

actually we can just hold idev->lock before taking pmc->mca_lock,
and avoid taking idev->lock again when iterating idev->addr_list,
since the upper callers of mld_newpack() already take
read_lock_bh(&idev->lock).

Reported-by: dingtianhong <dingtianhong@huawei.com>
Cc: dingtianhong <dingtianhong@huawei.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Chen Weilong <chenweilong@huawei.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01 23:39:21 -07:00
Daniel Borkmann
bb33381d0c net: sctp: rework debugging framework to use pr_debug and friends
We should get rid of all own SCTP debug printk macros and use the ones
that the kernel offers anyway instead. This makes the code more readable
and conform to the kernel code, and offers all the features of dynamic
debbuging that pr_debug() et al has, such as only turning on/off portions
of debug messages at runtime through debugfs. The runtime cost of having
CONFIG_DYNAMIC_DEBUG enabled, but none of the debug statements printing,
is negligible [1]. If kernel debugging is completly turned off, then these
statements will also compile into "empty" functions.

While we're at it, we also need to change the Kconfig option as it /now/
only refers to the ifdef'ed code portions in outqueue.c that enable further
debugging/tracing of SCTP transaction fields. Also, since SCTP_ASSERT code
was enabled with this Kconfig option and has now been removed, we
transform those code parts into WARNs resp. where appropriate BUG_ONs so
that those bugs can be more easily detected as probably not many people
have SCTP debugging permanently turned on.

To turn on all SCTP debugging, the following steps are needed:

 # mount -t debugfs none /sys/kernel/debug
 # echo -n 'module sctp +p' > /sys/kernel/debug/dynamic_debug/control

This can be done more fine-grained on a per file, per line basis and others
as described in [2].

 [1] https://www.kernel.org/doc/ols/2009/ols2009-pages-39-46.pdf
 [2] Documentation/dynamic-debug-howto.txt

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01 23:22:13 -07:00
Eliezer Tamir
91e2fd3378 net: avoid calling sched_clock when LLS is off
Change Low Latency Sockets code for select and poll so that
when LLS is disabled sched_clock() is never called.

Also, avoid sending POLL_LL to sockets if disabled.

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01 14:06:47 -07:00
Eliezer Tamir
ad6276e0fe net: fix LLS debug_smp_processor_id() warning
Our use of sched_clock is OK because we don't mind the side effects
of calling it and occasionally waking up on a different CPU.

When CONFIG_DEBUG_PREEMPT is on, disable preempt before calling
sched_clock() so we don't trigger a debug_smp_processor_id() warning.

Reported-by: Cody P Schafer <devel-lists@codyps.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01 14:06:47 -07:00
David S. Miller
e62bc9e55f Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Yet one more pull request for wireless updates intended for 3.11...

For the mac80211 bits, Johannes says:

"Here we have a few memory leak fixes related to BSS struct handling
mostly from Ben, including a fix for a more theoretical problem
(associating while a BSS struct times out) from myself, a compilation
warning fix from Arend, mesh fixes from Thomas, tracking the beacon
bitrate (Alex), a bandwidth change event fix (Ilan) and some initial
work for 5/10 MHz channels from Simon."

Regarding the iwlwifi bits, Johannes says:

"Emmanuel removed some unneeded/unsupported module parameters and adds a
Bluetooth 1x1 lookup-table for some upcoming products. From Alex I have
an older patch to add low-power receive support, this depended on a
mac80211 commit that only just came in with the merge from wireless-next
I did. Ilan made beacon timings better, and Eytan added some debug
statements for thermal throttling. I have a few cleanups, a fix for a
long-standing but rare warning, and, arguably the most important patch
here, the firmware API version bump for the 7260/3160 devices."

Also included is a Bluetooth pull -- Gustavo says:

"Here goes a set of patches to 3.11. The biggest work here is from Andre Guedes
on the move of the Discovery to use the new request framework. Other than that
Johan provided a bunch of fixes to the L2CAP code. The rest are just small
fixes and clean ups."

On top of all that, there are a variety of updates and fixes to
brcmfmac, rt2x00, wil6210, ath9k, ath10k, and a few others here and
there.  This also includes a pull of the wireless tree, in order to
prevent some merge conflicts.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01 13:21:17 -07:00
David S. Miller
4e144d3a80 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
The following batch contains Netfilter/IPVS updates for net-next,
they are:

* Enforce policy to several nfnetlink subsystem, from Daniel
  Borkmann.

* Use xt_socket to match the third packet (to perform simplistic
  socket-based stateful filtering), from Eric Dumazet.

* Avoid large timeout for picked up from the middle TCP flows,
  from Florian Westphal.

* Exclude IPVS from struct net if IPVS is disabled and removal
  of unnecessary included header file, from JunweiZhang.

* Release SCTP connection immediately under load, to mimic current
  TCP behaviour, from Julian Anastasov.

* Replace and enhance SCTP state machine, from Julian Anastasov.

* Add tweak to reduce sync traffic in the presence of persistence,
  also from Julian Anastasov.

* Add tweak for the IPVS SH scheduler not to reject connections
  directed to a server, choose a new one instead, from Alexander
  Frolkin.

* Add support for sloppy TCP and SCTP modes, that creates state
  information on any packet, not only initial handshake packets,
  from Alexander Frolkin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-30 17:35:13 -07:00
Timo Teräs
2ffae99d1f ipv4: use next hop exceptions also for input routes
Commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops)
assmued that "locally destined, and routed packets, never trigger
PMTU events or redirects that will be processed by us".

However, it seems that tunnel devices do trigger PMTU events in certain
cases. At least ip_gre, ip6_gre, sit, and ipip do use the inner flow's
skb_dst(skb)->ops->update_pmtu to propage mtu information from the
outer flows. These can cause the inner flow mtu to be decreased. If
next hop exceptions are not consulted for pmtu, IP fragmentation will
not be done properly for these routes.

It also seems that we really need to have the PMTU information always
for netfilter TCPMSS clamp-to-pmtu feature to work properly.

So for the time being, cache separate copies of input routes for
each next hop exception.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-28 21:27:47 -07:00
Hannes Frederic Sowa
b173ee488d ipv6: resend MLD report if a link-local address completes DAD
RFC3590/RFC3810 specifies we should resend MLD reports as soon as a
valid link-local address is available.

We now use the valid_ll_addr_cnt to check if it is necessary to resend
a new report.

Changes since Flavio Leitner's version:
a) adapt for valid_ll_addr_cnt
b) resend first reports directly in the path and just arm the timer for
   mc_qrv-1 resends.

Reported-by: Flavio Leitner <fleitner@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-28 21:19:17 -07:00
Hannes Frederic Sowa
1ec047eb47 ipv6: introduce per-interface counter for dad-completed ipv6 addresses
To reduce the number of unnecessary router solicitations, MLDv2 and IGMPv3
messages we need to track the number of valid (as in non-optimistic,
no-dad-failed and non-tentative) link-local addresses. Therefore, this
patch implements a valid_ll_addr_cnt in struct inet6_dev.

We now only emit router solicitations if the first link-local address
finishes duplicate address detection.

The changes for MLDv2 and IGMPv3 are in a follow-up patch.

While there, also simplify one if statement(one minor nit I made in one
of my previous patches):

if (!...)
	do();
else
	return;

<<into>>

if (...)
	return;
do();

Cc: Flavio Leitner <fbl@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David Stevens <dlstevens@us.ibm.com>
Suggested-by: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-28 21:19:17 -07:00
John W. Linville
57ed5cd695 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	net/wireless/nl80211.c
2013-06-28 13:18:21 -04:00
Nicolas Dichtel
5e6700b3bf sit: add support of x-netns
This patch allows to switch the netns when packet is encapsulated or
decapsulated. In other word, the encapsulated packet is received in a netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injecting into the corresponding interface which
stands to another netns.

When one of the two netns is removed, the tunnel is destroyed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-27 22:30:47 -07:00
JunweiZhang
8b4d14d8eb netns: exclude ipvs from struct net when IPVS disabled
no real problem is fixed, just save a few bytes in
net_namespace structure.

Signed-off-by: JunweiZhang <junwei.zhang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Julian Anastasov
4d0c875dcc ipvs: add sync_persist_mode flag
Add sync_persist_mode flag to reduce sync traffic
by syncing only persistent templates.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Julian Anastasov
61e7c420b4 ipvs: replace the SCTP state machine
Convert the SCTP state table, so that it is more readable.
Change the states to be according to the diagram in RFC 2960
and add more states suitable for middle box. Still, such
change in states adds incompatibility if systems in sync
setup include this change and others do not include it.

With this change we also have proper transitions in INPUT-ONLY
mode (DR/TUN) where we see packets only from client. Now
we should not switch to 10-second CLOSED state at a time
when we should stay in ESTABLISHED state.

The short names for states are because we have 16-char space
in ipvsadm and 11-char limit for the connection list format.
It is a sequence of the TCP implementation where the longest
state name is ESTABLISHED.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Alexander Frolkin
c6c96c1883 ipvs: sloppy TCP and SCTP
This adds support for sloppy TCP and SCTP modes to IPVS.

When enabled (sysctls net.ipv4.vs.sloppy_tcp and
net.ipv4.vs.sloppy_sctp), allows IPVS to create connection state on any
packet, not just a TCP SYN (or SCTP INIT).

This allows connections to fail over from one IPVS director to another
mid-flight.

Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Julian Anastasov
bba54de5bd ipvs: provide iph to schedulers
Before now the schedulers needed access only to IP
addresses and it was easy to get them from skb by
using ip_vs_fill_iph_addr_only.

New changes for the SH scheduler will need the protocol
and ports which is difficult to get from skb for the
IPv6 case. As we have all the data in the iph structure,
to avoid the same slow lookups provide the iph to schedulers.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:45 +09:00
Eliezer Tamir
2d48d67fa8 net: poll/select low latency socket support
select/poll busy-poll support.

Split sysctl value into two separate ones, one for read and one for poll.
updated Documentation/sysctl/net.txt

Add a new poll flag POLL_LL. When this flag is set, sock_poll will call
sk_poll_ll if possible. sock_poll sets this flag in its return value
to indicate to select/poll when a socket that can busy poll is found.

When poll/select have nothing to report, call the low-level
sock_poll again until we are out of time or we find something.

Once the system call finds something, it stops setting POLL_LL, so it can
return the result to the user ASAP.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25 16:35:52 -07:00
Daniel Borkmann
52db882f3f net: sctp: migrate cookie life from timeval to ktime
Currently, SCTP code defines its own timeval functions (since timeval
is rarely used inside the kernel by others), namely tv_lt() and
TIMEVAL_ADD() macros, that operate on SCTP cookie expiration.

We might as well remove all those, and operate directly on ktime
structures for a couple of reasons: ktime is available on all archs;
complexity of ktime calculations depending on the arch is less than
(reduces to a simple arithmetic operations on archs with
BITS_PER_LONG == 64 or CONFIG_KTIME_SCALAR) or equal to timeval
functions (other archs); code becomes more readable; macros can be
thrown out.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25 16:33:04 -07:00
Daniel Borkmann
f072d7aba7 net: sctp: remove TEST_FRAME ifdef
We do neither ship a test_frame.h, nor will this be compatible with
the 2.5 out-of-tree lksctp kernel test suite anyway. So remove this
artefact.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25 16:33:04 -07:00
Hannes Frederic Sowa
b7b1bfce0b ipv6: split duplicate address detection and router solicitation timer
This patch splits the timers for duplicate address detection and router
solicitations apart. The router solicitations timer goes into inet6_dev
and the dad timer stays in inet6_ifaddr.

The reason behind this patch is to reduce the number of unneeded router
solicitations send out by the host if additional link-local addresses
are created. Currently we send out RS for every link-local address on
an interface.

If the RS timer fires we pick a source address with ipv6_get_lladdr. This
change could hurt people adding additional link-local addresses and
specifying these addresses in the radvd clients section because we
no longer guarantee that we use every ll address as source address in
router solicitations.

Cc: Flavio Leitner <fleitner@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25 16:23:03 -07:00
Eric Dumazet
7ae8639c9d tcp: remove invalid __rcu annotation
struct tcp_fastopen_context has a field named tfm, which is a pointer
to a crypto_cipher structure.

It currently has a __rcu annotation, which is not needed at all.

tcp_fastopen_ctx is the pointer fetched by rcu_dereference(), but once
we have a pointer to current tcp_fastopen_context, we do not use/need
rcu_dereference() to access tfm.

This fixes a lot of sparse errors like the following :

net/ipv4/tcp_fastopen.c:21:31: warning: incorrect type in argument 1 (different address spaces)
net/ipv4/tcp_fastopen.c:21:31:    expected struct crypto_cipher *tfm
net/ipv4/tcp_fastopen.c:21:31:    got struct crypto_cipher [noderef] <asn:4>*tfm

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25 02:44:05 -07:00
John W. Linville
9fbdc75116 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2013-06-24 14:45:50 -04:00
John W. Linville
66ba271ab9 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-06-24 14:44:59 -04:00
David S. Miller
7e2f934dc5 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
I would guess that this is the last big wireless pull request before
the 3.11 merge window...

Regarding the mac80211 bits, Johannes says:

"I have a number of mesh fixes and improvements from Colleen, Jacob,
Ashok and Thomas, powersave fixes in mac80211 from Alex, improved
management-TX from Antonio, and a few various things, including locking
fixes, from others and myself. Overall though, nothing really stands
out."

As for the iwlwifi bits, Johannes says:

"Emmanuel contributed two AP mode fixes, removed an unused field, fixed a
comment and added a warning for something that shouldn't happen in
practice, and I removed the declaration of a function that doesn't even
exist and cleaned up a small include."

"This time I have a number of cleanups, a small fix from Emmanuel and two
performance improvements that combined reduce our driver's CPU
utilisation as much as 75% in high TX-throughput scenarios."

"These two patches fix two issues with using rfkill randomly during
traffic, which would then cause our driver to stop working and not be
able to recover at all."

Regarding the ath6kl bits, Kalle says:

"Here are few simple patches for ath6kl. We have a suspend crash fix for
USB from Shafi, use of mac_pton(), a compiler warning fix and a fix for
module initialisation error path."

Kalle also sends the biggest single item of note, the new ath10k
driver for Qualcomm Atheros 802.11ac CQA98xx devices.

Included is an NFC pull, of which Samuel says:

"These are the pending NFC patches for the 3.11 merge window.

It contains the pending fixes that were on nfc-fixes (nfc-fixes-3.10-2),
along with a few more for the pn544 and pn533 drivers, the LLCP
disconnection path and an LLCP memory leak.

Highlights for this one are:

- An initial secure element API. NFC chipsets can carry an embedded
  secure element or get access to the SIM one. In both cases they
  control the secure elements and this API provides a way to discover,
  enable and disable the available SEs. It also exports that to
  userspace in order for SE focused middleware to actually do something
  with them (e.g. payments).

- NCI over SPI support. SPI is the most complex NCI specified transport
  layer and we now have support for it in the kernel. The next step will
  be to implement drivers for NCI chipsets using this transport like
  e.g. bcm2079x.

- NFC p2p hardware simulation driver. We now have an nfcsim driver that
  is mostly a loopback device between 2 NFC interfaces. It also
  implements the rest of the NFC core API like polling and target
  detection. This driver, with neard running on top of it, allows us to
  completely test the LLCP, SNEP and Handover implementation without
  physical hardware.

- A Firmware update netlink API. Most (All ?) HCI chipsets have a
  special firmware update mode where applications can push a new
  firmware that will be flashed. We now have a netlink API for providing
  that mode to e.g. nfctool."

On top of all that, there are a variety of updates to brcmfmac,
iwlegacy, rtlwifi, wil6210, and the TI wl12xx drivers.  As usual,
the bcma and ssb busses get a little love as well, as do a handful
of others here and there.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-24 00:31:02 -07:00
Jesse Gross
5243b6ac9e ip_tunnel: Protect tunnel functions with CONFIG_INET guard.
Tunnel constants can be used in generic code but in these cases
the inline functions in ip_tunnels.h cause compilation problems
if CONFIG_INET is not set.

CC: Pravin Shelar <pshelar@nicira.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-24 00:18:38 -07:00
Andrei Emeltchenko
0a804654af Bluetooth: Remove unneeded flag
Remove HCI_LINK_KEYS flag since using HCI_MGMT is enough for test that
user space expects the kernel managing link keys.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-06-23 00:23:53 +01:00
Andre Guedes
b0434345f2 Bluetooth: Remove inquiry helpers
This patch removes hci_do_inquiry and hci_cancel_inquiry helpers. We
now use the HCI request framework in device discovery functionality
and these helpers are no longer needed.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-06-23 00:23:52 +01:00
Andre Guedes
917eedc56c Bluetooth: Remove LE scan helpers
This patch removes the LE scan helpers hci_le_scan and hci_cancel_
le_scan and all code related to it. We now use the HCI request
framework in device discovery functionality and these helpers are
no longer needed.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-06-23 00:23:51 +01:00
Andre Guedes
1183fdcad4 Bluetooth: Make mgmt_stop_discovery_failed static
mgmt_stop_discovery_failed is now only used in mgmt.c so we can
make it a local function. This patch also moves the mgmt_stop_
discovery_failed definition up in mgmt.c to avoid forward
declaration.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-06-23 00:23:51 +01:00
Andre Guedes
4c87eaab01 Bluetooth: Use HCI request in interleaved discovery
In order to have a better HCI error handling in interleaved discovery
functionality, we should use the HCI request framework.

This patch updates le_scan_disable_work function so it uses the
HCI request framework instead of the hci_send_cmd helper. A complete
callback is registered (le_scan_disable_work_complete function) so we
are able to trigger the inquiry procedure (if we are running the
interleaved discovery) or to stop the discovery procedure (if we are
running LE-only discovery).

This patch also removes the extra logic in hci_cc_le_set_scan_enable
to trigger the inquiry procedure and the mgmt_interleaved_discovery
function since they become useless.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-06-23 00:23:50 +01:00
Andre Guedes
0d8cc935e0 Bluetooth: Move discovery macros to hci_core.h
Some of discovery macros will be used in hci_core so we need to
define them in common place such as hci_core.h. Thus, this patch
moves discovery macros to hci_core.h and also adds the DISCOV_
prefix to them.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-06-23 00:23:50 +01:00
Andre Guedes
41dc2bd6d1 Bluetooth: Make mgmt_start_discovery_failed static
mgmt_start_discovery_failed is now only used in mgmt.c so we can
make it a local function. This patch also moves the mgmt_start_
discovery_failed definition up in mgmt.c to avoid forward
declaration.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-06-23 00:23:50 +01:00
Andre Guedes
1f9b9a5dc5 Bluetooth: Make inquiry_cache_flush non-static
In order to use HCI request framework in start_discovery, we'll need
to call inquiry_cache_flush in mgmt.c. Therefore, this patch adds the
hci_ prefix to inquiry_cache_flush and makes it non-static.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-06-23 00:23:49 +01:00
Johan Hedberg
073d1cf35f Bluetooth: Rename L2CAP_CID_LE_DATA to L2CAP_CID_ATT
In future Core Specification versions the ATT CID will be just one of
many possible CIDs that can be used for data transfer. Therefore, it
makes sense to rename the define for the ATT CID to something less
ambigous.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-06-23 00:23:47 +01:00
John W. Linville
7d2a47aab2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	net/wireless/nl80211.c
2013-06-21 15:42:30 -04:00
Joe Perches
fedaf4ffc2 ndisc: Convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 23:18:07 -07:00
Joe Perches
9e8cda3ba8 ipv6: Convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 23:18:07 -07:00
Daniel Borkmann
eea86af6b1 net: sock: adapt SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF
The current situation is that SOCK_MIN_RCVBUF is 2048 + sizeof(struct sk_buff))
while SOCK_MIN_SNDBUF is 2048. Since in both cases, skb->truesize is used for
sk_{r,w}mem_alloc accounting, we should have both sizes adjusted via defining a
TCP_SKB_MIN_TRUESIZE.

Further, as Eric Dumazet points out, the minimal skb truesize in transmit path is
SKB_TRUESIZE(2048) after commit f07d960df3 ("tcp: avoid frag allocation for
small frames"), and tcp_sendmsg() tries to limit skb size to half the congestion
window, meaning we try to build two skbs at minimum. Thus, having SOCK_MIN_SNDBUF
as 2048 can hit a small regression for some applications setting to low
SO_SNDBUF / SO_RCVBUF. Note that we define a TCP_SKB_MIN_TRUESIZE, because
SKB_TRUESIZE(2048) adds SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), but in
case of TCP skbs, the skb_shared_info is part of the 2048 bytes allocation for
skb->head.

The minor adaption in sk_stream_moderate_sndbuf() is to silence a warning by
using a typed max macro, as similarly done in SOCK_MIN_RCVBUF occurences, that
would appear otherwise.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 21:16:53 -07:00
Pravin B Shelar
9a628224a6 ip_tunnel: Add dont fragment flag.
This flag will be used by ovs tunneling.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:41 -07:00
Pravin B Shelar
3d7b46cd20 ip_tunnel: push generic protocol handling to ip_tunnel module.
Process skb tunnel header before sending packet to protocol handler.
this allows code sharing between gre and ovs gre modules.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:41 -07:00
Pravin B Shelar
0e6fbc5b6c ip_tunnels: extend iptunnel_xmit()
Refactor various ip tunnels xmit functions and extend iptunnel_xmit()
so that there is more code sharing.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:41 -07:00
Pravin B Shelar
45f2e9976c gre: export gre_handle_offloads() function.
This is required for OVS GRE offloading.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:41 -07:00
Pravin B Shelar
752f36da68 gre: export gre_build_header() function.
This is required for ovs gre module.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:40 -07:00
Pravin B Shelar
bda7bb4634 gre: Allow multiple protocol listener for gre protocol.
Currently there is only one user is allowed to register for gre
protocol.  Following patch adds de-multiplexer.  So that multiple
modules can listen on gre protocol e.g. kernel gre devices and ovs.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:40 -07:00
David S. Miller
d98cae64e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/wireless/ath/ath9k/Kconfig
	drivers/net/xen-netback/netback.c
	net/batman-adv/bat_iv_ogm.c
	net/wireless/nl80211.c

The ath9k Kconfig conflict was a change of a Kconfig option name right
next to the deletion of another option.

The xen-netback conflict was overlapping changes involving the
handling of the notify list in xen_netbk_rx_action().

Batman conflict resolution provided by Antonio Quartulli, basically
keep everything in both conflict hunks.

The nl80211 conflict is a little more involved.  In 'net' we added a
dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
Linus reported.  Meanwhile in 'net-next' the handlers were converted
to use pre and post doit handlers which use a flag to determine
whether to hold the RTNL mutex around the operation.

However, the dump handlers to not use this logic.  Instead they have
to explicitly do the locking.  There were apparent bugs in the
conversion of nl80211_dump_wiphy() in that we were not dropping the
RTNL mutex in all the return paths, and it seems we very much should
be doing so.  So I fixed that whilst handling the overlapping changes.

To simplify the initial returns, I take the RTNL mutex after we try
to allocate 'tb'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 16:49:39 -07:00
Johannes Berg
959867fa55 cfg80211: require passing BSS struct back to cfg80211_assoc_timeout
Doing so will allow us to hold the BSS (not just ref it) over the
association process, thus ensuring that it doesn't time out and
gets invisible to the user (e.g. in 'iw wlan0 link'.)

This also fixes a leak in mac80211 where it doesn't always release
the BSS struct properly in all cases where calling this function.
This leak was reported by Ben Greear.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-19 18:55:39 +02:00
Simon Wunderlich
30e7473263 nl80211: add rate flags for 5/10 Mhz channels
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-18 16:16:23 +02:00
Simon Wunderlich
2f301ab29e nl80211/cfg80211: add 5 and 10 MHz defines and wiphy flag
Add defines for 5 and 10 MHz channel width and fix channel
handling functions accordingly.

Also check for and report the WIPHY_FLAG_SUPPORTS_5_10_MHZ
capability.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
[fix spelling in comment]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-18 16:06:50 +02:00
Daniel Borkmann
dda9192851 net: sctp: remove SCTP_STATIC macro
SCTP_STATIC is just another define for the static keyword. It's use
is inconsistent in the SCTP code anyway and it was introduced in the
initial implementation of SCTP in 2.5. We have a regression suite in
lksctp-tools, but this is for user space only, so noone makes use of
this macro anymore. The kernel test suite for 2.5 is incompatible with
the current SCTP code anyway.

So simply Remove it, to be more consistent with the rest of the kernel
code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 17:08:05 -07:00
Daniel Borkmann
939cfa75a0 net: sctp: get rid of t_new macro for kzalloc
t_new rather obfuscates things where everyone else is using actual
function names instead of that macro, so replace it with kzalloc,
which is the function t_new wraps.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 17:08:04 -07:00
Eliezer Tamir
dafcc4380d net: add socket option for low latency polling
adds a socket option for low latency polling.
This allows overriding the global sysctl value with a per-socket one.
Unexport sysctl_net_ll_poll since for now it's not needed in modules.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:48:14 -07:00
Eliezer Tamir
9a3c71aa80 net: convert low latency sockets to sched_clock()
Use sched_clock() instead of get_cycles().
We can use sched_clock() because we don't care much about accuracy.
Remove the dependency on X86_TSC

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:48:14 -07:00
Eliezer Tamir
eb6db62282 net: change sysctl_net_ll_poll into an unsigned int
There is no reason for sysctl_net_ll_poll to be an unsigned long.
Change it into an unsigned int.
Fix the proc handler.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:48:13 -07:00
Samuel Ortiz
fed7c25ec0 NFC: Add secure elements addition and removal API
This API will allow NFC drivers to add and remove the secure elements
they know about or detect. Typically this should be called (asynchronously
or not) from the driver or the host interface stack detect_se hook.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-06-14 13:44:58 +02:00
Samuel Ortiz
0a946301c2 NFC: Extend and fix the internal secure element API
Secure elements need to be discovered after enabling the NFC controller.
This is typically done by the NCI core and the HCI drivers (HCI does not
specify how to discover SEs, it is left to the specific drivers).
Also, the SE enable/disable API explicitely takes a SE index as its
argument.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-06-14 13:44:53 +02:00
Samuel Ortiz
0b456c418a NFC: Remove the static supported_se field
Supported secure elements are typically found during a discovery process
initiated when the NFC controller is up and running. For a given NFC
chipset there can be many configurations (embedded SE or not, with or
without a SIM card wired to the NFC controller SWP interface, etc...) and
thus driver code will never know before hand which SEs are available.
So we remove this field, it will be replaced by a real SE discovery
mechanism.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-06-14 13:44:19 +02:00
Samuel Ortiz
322bce957e NFC: pn533: Copy NFCID2 through ATR_REQ
When using NFC-F we should copy the NFCID2 buffer that we got from
SENSF_RES through the ATR_REQ NFCID3 buffer. Not doing so violates
NFC Forum digital requirement #189.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-06-14 13:44:18 +02:00
Frederic Danis
391d8a2da7 NFC: Add NCI over SPI receive
Before any operation, driver interruption is de-asserted to prevent
race condition between TX and RX.

Transaction starts by emitting "Direct read" and acknowledged mode
bytes. Then packet length is read allowing to allocate correct NCI
socket buffer. After that payload is retrieved.

A delay after the transaction can be added.
This delay is determined by the driver during nci_spi_allocate_device()
call and can be 0.

If acknowledged mode is set:
- CRC of header and payload is checked
- if frame reception fails (CRC error): NACK is sent
- if received frame has ACK or NACK flag: unblock nci_spi_send()

Payload is passed to NCI module.

At the end, driver interruption is re asserted.

Signed-off-by: Frederic Danis <frederic.danis@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-06-14 13:44:16 +02:00
Frederic Danis
ee9596d467 NFC: Add NCI over SPI send
Before any operation, driver interruption is de-asserted to prevent
race condition between TX and RX.

The NCI over SPI header is added in front of NCI packet.
If acknowledged mode is set, CRC-16-CCITT is added to the packet.
Then the packet is forwarded to SPI module to be sent.

A delay after the transaction is added.
This delay is determined by the driver during nci_spi_allocate_device()
call and can be 0.

After data has been sent, driver interruption is re-asserted.

If acknowledged mode is set, nci_spi_send will block until
acknowledgment is received.

Signed-off-by: Frederic Danis <frederic.danis@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-06-14 13:44:15 +02:00
Frederic Danis
8a00a61b0e NFC: Add basic NCI over SPI
The NFC Forum defines a transport interface based on
Serial Peripheral Interface (SPI) for the NFC Controller
Interface (NCI).

This module implements the SPI transport of NCI, calling SPI module
directly to read/write data to NFC controller (NFCC).

NFCC driver should provide functions performing device open and close.
It should also provide functions asserting/de-asserting interruption
to prevent TX/RX race conditions.
NFCC driver can also fix a delay between transactions if needed by
the hardware.

Signed-off-by: Frederic Danis <frederic.danis@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-06-14 13:44:03 +02:00
Eric Lapuyade
9a695d23aa NFC: HCI: Implement fw_upload ops
This is a simple forward to the HCI driver. When driver is done with the
operation, it shall directly notify NFC Core by calling
nfc_fw_upload_done().

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-06-14 00:26:09 +02:00
Eric Lapuyade
9674da8759 NFC: Add firmware upload netlink command
As several NFC chipsets can have their firmwares upgraded and
reflashed, this patchset adds a new netlink command to trigger
that the driver loads or flashes a new firmware. This will allows
userspace triggered firmware upgrade through netlink.
The firmware name or hint is passed as a parameter, and the driver
will eventually fetch the firmware binary through the request_firmware
API.
The cmd can only be executed when the nfc dev is not in use. Actual
firmware loading/flashing is an asynchronous operation. Result of the
operation shall send a new event up to user space through the nfc dev
multicast socket. During operation, the nfc dev is not openable and
thus not usable.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-06-14 00:26:08 +02:00
Frederic Danis
1095e69f47 NFC: NCI: Fix skb->dev usage
skb->dev is used for carrying a net_device pointer and not
an nci_dev pointer.

Remove usage of skb-dev to carry nci_dev and replace it by parameter
in nci_recv_frame(), nci_send_frame() and driver send() functions.

NfcWilink driver is also updated to use those functions.

Signed-off-by: Frederic Danis <frederic.danis@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-06-14 00:25:53 +02:00
Eric Dumazet
d3b6f61418 ip_tunnel: remove __net_init/exit from exported functions
If CONFIG_NET_NS is not set then __net_init is the same as __init and
__net_exit is the same as __exit. These functions will be removed from
memory after the module loads or is removed. Functions that are exported
for use by other functions should never be labeled for removal.

Bug introduced by commit c544193214
("GRE: Refactor GRE tunneling code.")

Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13 03:00:59 -07:00
Alexander Bondar
817cee7675 mac80211: track AP's beacon rate and give it to the driver
Track the AP's beacon rate in the scan BSS data and in the
interface configuration to let the drivers know which rate
the AP is using. This information may be used by drivers,
in our case to let the firmware optimise beacon RX.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-13 11:58:47 +02:00
Yuchung Cheng
85f16525a2 tcp: properly send new data in fast recovery in first RTT
Linux sends new unset data during disorder and recovery state if all
(suspected) lost packets have been retransmitted ( RFC5681, section
3.2 step 1 & 2, RFC3517 section 4, NexSeg() Rule 2).  One requirement
is to keep the receive window about twice the estimated sender's
congestion window (tcp_rcv_space_adjust()), assuming the fast
retransmits repair the losses in the next round trip.

But currently it's not the case on the first round trip in either
normal or Fast Open connection, beucase the initial receive window
is identical to (expected) sender's initial congestion window. The
fix is to double it.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13 02:46:29 -07:00
David S. Miller
4a2e667ac1 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
This pull request is intended for the 3.11 stream...

One big highlight is the cw1200 driver the ST-E CW1100 & CW1200
WLAN chipsets.  This one has been lingering for a while, lacking
some review comments.  Once started getting pulled into linux-next,
it got a bit more attention and a number of improvements were made
over the initial cut.  No doubt there will be more changes ahead,
but I think it is looking alright at this point.

Along with that, there is the usual flurry of updates to the mac80211
core and the iwlwifi, mwifiex, ath9k, rt2x00, wil6210, and other
drivers.  A few of the highlights are some rt2x00 refactoring/cleanup
by Gabor Juhos, some rt2800 hardware support enhancements by Stanislaw
Gruszka, some iwlwifi power management updates from Alexander Bondar,
some enhanced bcma SPROM support from Rafał Miłecki, and a variety
of other things here and there.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 14:23:41 -07:00
John W. Linville
812fd64596 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	drivers/net/wireless/iwlwifi/mvm/mac80211.c
2013-06-12 15:39:05 -04:00
John W. Linville
861bca265e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
Conflicts:
	drivers/net/wireless/ath/ath9k/Kconfig
	net/mac80211/iface.c
2013-06-12 14:35:23 -04:00
John W. Linville
42d887a680 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-06-12 10:57:04 -04:00
Johan Hedberg
96570ffcca Bluetooth: Fix mgmt handling of power on failures
If hci_dev_open fails we need to ensure that the corresponding
mgmt_set_powered command gets an appropriate response. This patch fixes
the missing response by adding a new mgmt_set_powered_failed function
that's used to indicate a power on failure to mgmt. Since a situation
with the device being rfkilled may require special handling in user
space the patch uses a new dedicated mgmt status code for this.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-06-12 10:20:55 -04:00
Rami Rosen
503b47eafc ipv4: remove is_data also from ip_options documentation.
commit ef722495c8
( [IPV4]: Remove unused ip_options->is_data) removed the unused is_data
member from ip_options struct.

This patch removes is_data also from the documentation of the ip_options struct.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 03:13:50 -07:00
Daniel Borkmann
da5bab079f net: udp4: move GSO functions to udp_offload
Similarly to TCP offloading and UDPv6 offloading, move all related
UDPv4 functions to udp_offload.c to make things more explicit. Also,
by this, we can make those functions static.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 00:47:25 -07:00
Eric Dumazet
130d3d68b5 net_sched: psched_ratecfg_precompute() improvements
Before allowing 64bits bytes rates, refactor
psched_ratecfg_precompute() to get better comments
and increased accuracy.

rate_bps field is renamed to rate_bytes_ps, as we only
have to worry about bytes per second.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 22:39:47 -07:00
John W. Linville
3899ba90a4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/ath/ath9k/debug.c
	net/mac80211/iface.c
2013-06-11 14:48:32 -04:00
Ashok Nagarajan
ffb3cf3000 {nl,mac,cfg}80211: Allow user to configure basic rates for mesh
Currently mesh uses mandatory rates as the default basic rates. Allow basic
rates to be configured during mesh join. Basic rates are applied only if
channel is also provided with mesh join command.

Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
[some whitespace fixes, refuse basic rates w/o channel]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-11 14:24:36 +02:00
Colleen Twitty
8e7c053853 {nl,cfg}80211: make peer link expiration time configurable
If a STA has a peer that it hasn't seen any tx activity
from for a certain length of time, the peer link is
expired. This means the inactive STA is removed from the
list of peers and that STA is not considered a peer again
unless it re-peers.  Previously, this inactivity time was
always 30 minutes.  Now, add it to the mesh configuration
and allow it to be configured.  Retain 30 minutes as a
default value.

Signed-off-by: Colleen Twitty <colleen@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-11 14:16:29 +02:00
Eric Dumazet
45203a3b38 net_sched: add 64bit rate estimators
struct gnet_stats_rate_est contains u32 fields, so the bytes per second
field can wrap at 34360Mbit.

Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields,
and switch the kernel to use this structure natively.

This structure is dumped to user space as a new attribute :

TCA_STATS_RATE_EST64

Old tc command will now display the capped bps (to 34360Mbit), instead
of wrapped values, and updated tc command will display correct
information.

Old tc command output, after patch :

eric:~# tc -s -d qd sh dev lo
qdisc pfifo 8001: root refcnt 2 limit 1000p
 Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0)
 rate 34360Mbit 189696pps backlog 0b 0p requeues 0

This patch carefully reorganizes "struct Qdisc" layout to get optimal
performance on SMP.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:51:03 -07:00
Eliezer Tamir
0602129286 net: add low latency socket poll
Adds an ndo_ll_poll method and the code that supports it.
This method can be used by low latency applications to busy-poll
Ethernet device queues directly from the socket code.
sysctl_net_ll_poll controls how many microseconds to poll.
Default is zero (disabled).
Individual protocol support will be added by subsequent patches.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 21:22:35 -07:00
Daniel Borkmann
28850dc7c7 net: tcp: move GRO/GSO functions to tcp_offload
Would be good to make things explicit and move those functions to
a new file called tcp_offload.c, thus make this similar to tcpv6_offload.c.
While moving all related functions into tcp_offload.c, we can also
make some of them static, since they are only used there. Also, add
an explicit registration function.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 14:39:05 -07:00
David S. Miller
143554ace8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Conflicts:
	net/netfilter/nf_log.c

The conflict in nf_log.c is that in 'net' we added CONFIG_PROC_FS
protection around foo_proc_entry() calls to fix a build failure,
whereas in Pablo's tree a guard if() test around a call is
remove_proc_entry() was removed.  Trivially resolved.

Pablo Neira Ayuso says:

====================
The following patchset contains the first batch of
Netfilter/IPVS updates for your net-next tree, they are:

* Three patches with improvements and code refactorization
  for nfnetlink_queue, from Florian Westphal.

* FTP helper now parses replies without brackets, as RFC1123
  recommends, from Jeff Mahoney.

* Rise a warning to tell everyone about ULOG deprecation,
  NFLOG has been already in the kernel tree for long time
  and supersedes the old logging over netlink stub, from
  myself.

* Don't panic if we fail to load netfilter core framework,
  just bail out instead, from myself.

* Add cond_resched_rcu, used by IPVS to allow rescheduling
  while walking over big hashtables, from Simon Horman.

* Change type of IPVS sysctl_sync_qlen_max sysctl to avoid
  possible overflow, from Zhang Yanfei.

* Use strlcpy instead of strncpy to skip zeroing of already
  initialized area to write the extension names in ebtables,
  from Chen Gang.

* Use already existing per-cpu notrack object from xt_CT,
  from Eric Dumazet.

* Save explicit socket lookup in xt_socket now that we have
  early demux, also from Eric Dumazet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-06 01:03:06 -07:00
David S. Miller
6bc19fb82d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge 'net' bug fixes into 'net-next' as we have patches
that will build on top of them.

This merge commit includes a change from Emil Goode
(emilgoode@gmail.com) that fixes a warning that would
have been introduced by this merge.  Specifically it
fixes the pingv6_ops method ipv6_chk_addr() to add a
"const" to the "struct net_device *dev" argument and
likewise update the dummy_ipv6_chk_addr() declaration.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-05 16:37:30 -07:00
Johannes Berg
780b40df12 wireless: fix kernel-doc
Some kernel-doc fixes for forgotten fields and renamed things.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-05 09:33:32 +02:00
Alexander Bondar
989c6505cd mac80211: Use suitable semantics for beacon availability indication
Currently beacon availability upon association is marked by have_beacon
flag of assoc_data structure that becomes unavailable when association
completes. However beacon availability indication is required also after
association to inform a driver. Currently dtim_period parameter is used
for this purpose. Move have_beacon flag to another structure, persistant
throughout a interface's life cycle. Use suitable sematics for beacon
availability indication.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
[fix another instance of BSS_CHANGED_DTIM_PERIOD in docs]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-05 09:12:20 +02:00
Joe Perches
f145b67f24 transp_v6.h: style neatening
Use a more current code style.

Remove extern from function prototypes.
Align function arguments and reflow to 80 cols.
Use network comment styles.

Signed-off-by: Joe Perches <joe@perches.com>
cc: Lorenzo Colitti <lorenzo@google.com>,
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 16:43:42 -07:00
Lorenzo Colitti
d862e54614 net: ipv6: Implement /proc/net/icmp6.
The format is based on /proc/net/icmp and /proc/net/{udp,raw}6.

Compiles and displays reasonable results with CONFIG_IPV6={n,m,y}
Couldn't figure out how to test without CONFIG_PROC_FS enabled.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 12:56:14 -07:00
Lorenzo Colitti
8cc785f6f4 net: ipv4: make the ping /proc code AF-independent
Introduce a ping_seq_afinfo structure (similar to its UDP
equivalent) and use it to make some of the ping /proc functions
address-family independent. Rename the remaining ping /proc
functions from ping_* to ping_v4_*.

Compiles and displays reasonable results with CONFIG_IPV6={n,m,y}

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 12:56:14 -07:00
Lorenzo Colitti
17ef66afc0 net: ipv6: Unify {raw,udp}6_sock_seq_show.
udp6_sock_seq_show and raw6_sock_seq_show are identical, except
the UDP version displays ports and the raw version displays the
protocol. Refactor most of the code in these two functions into
a new common ip6_dgram_sock_seq_show function, in preparation
for using it to display ICMPv6 sockets as well.

Also reduce the indentation in parts of include/net/transp_v6.h
to improve readability.

Compiles and displays reasonable results with CONFIG_IPV6={n,m,y}

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 12:56:14 -07:00
Lorenzo Colitti
75698b17ac Clean up indentation in net/ipv6/transp_v6.h
Reduce the indentation of most of the functions and make it a
bit more consistent. This allows longer function and arg names
to be consistently indented without wrapping.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 12:56:14 -07:00
Johannes Berg
ceca7b7121 cfg80211: separate internal SME implementation
The current internal SME implementation in cfg80211 is
very mixed up with the MLME handling, which has been
causing issues for a long time. There are three things
that the implementation has to provide:
 * a basic SME implementation for nl80211's connect()
   call (for drivers implementing auth/assoc, which is
   really just mac80211) and wireless extensions
 * MLME events for the userspace SME
 * SME events (connected, disconnected etc.) for all
   different SME implementation possibilities (driver,
   cfg80211 and userspace)

To achieve these goals it isn't necessary to track the
software SME's connection status outside of it's state
(which is the part that caused many issues.) Instead,
track it only in the SME data (wdev->conn) and in the
general case only track whether the wdev is connected
or not (via wdev->current_bss.)

Also separate the internal implementation to not have
callbacks from the SME events, but rather call it from
the API functions that the driver (or rather mac80211)
calls. This separates the code better.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-04 13:03:11 +02:00
Johannes Berg
6ff57cf888 cfg80211/mac80211: clean up cfg80211 SME APIs
Do some cleanups in the cfg80211 SME APIs, which are
only used by mac80211.

Most of these functions get a frame passed, and there
isn't really any reason to export multiple functions
as cfg80211 can check the frame type instead, do that.

Additionally, the API functions have confusing names
like cfg80211_send_...() which was meant to indicate
that it sends an event to userspace, but gets a bit
confusing when there's both TX and RX and they're not
all clearly labeled.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-04 13:03:10 +02:00
Felix Fietkau
d6d23de278 mac80211: add a tx control flag to indicate PS-Poll/uAPSD response
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-04 12:41:36 +02:00
Johannes Berg
964dc9e2c3 cfg80211: take WoWLAN support information out of wiphy struct
There's no need to take up the space for devices that don't
support WoWLAN, and most drivers can even make the support
data static const (except where it's modified at runtime.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-03 18:43:34 +02:00
Timo Teräs
5aad1de5ea ipv4: use separate genid for next hop exceptions
commit 13d82bf5 (ipv4: Fix flushing of cached routing informations)
added the support to flush learned pmtu information.

However, using rt_genid is quite heavy as it is bumped on route
add/change and multicast events amongst other places. These can
happen quite often, especially if using dynamic routing protocols.

While this is ok with routes (as they are just recreated locally),
the pmtu information is learned from remote systems and the icmp
notification can come with long delays. It is worthy to have separate
genid to avoid excessive pmtu resets.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-03 00:07:43 -07:00
Eric Dumazet
01cb71d2d4 net_sched: restore "overhead xxx" handling
commit 56b765b79 ("htb: improved accuracy at high rates")
broke the "overhead xxx" handling, as well as the "linklayer atm"
attribute.

tc class add ... htb rate X ceil Y linklayer atm overhead 10

This patch restores the "overhead xxx" handling, for htb, tbf
and act_police

The "linklayer atm" thing needs a separate fix.

Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vimalkumar <j.vimal@gmail.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-02 22:22:35 -07:00
Paul Moore
e4c1721642 xfrm: force a garbage collection after deleting a policy
In some cases after deleting a policy from the SPD the policy would
remain in the dst/flow/route cache for an extended period of time
which caused problems for SELinux as its dynamic network access
controls key off of the number of XFRM policy and state entries.
This patch corrects this problem by forcing a XFRM garbage collection
whenever a policy is sucessfully removed.

Reported-by: Ondrej Moris <omoris@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 17:30:07 -07:00
Nicolas Dichtel
bf3d6a8f79 iptunnel: specify protocol outside IP header
Before this patch, ip_tunnel_xmit() was using the field protocol from the IP
header passed into argument.
There is no functional change, this patch prepares the support of IPv4 over
IPv4 for module sit.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 17:19:05 -07:00
Johannes Berg
bd5e14fb77 cfg80211: remove cleanup_work kernel-doc
I evidently forgot this when removing the work itself.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-29 09:11:57 +02:00
Felix Fietkau
e057d3c31b cfg80211: support an active monitor interface flag
An active monitor interface is one that is used for communication (via
injection). It is expected to ACK incoming unicast packets. This is
useful for running various 802.11 testing utilities that associate to an
AP via injection and manage the state in user space.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-29 09:11:44 +02:00
Simon Horman
ced14f6804 net: Correct comparisons and calculations using skb->tail and skb-transport_header
This corrects an regression introduced by "net: Use 16bits for *_headers
fields of struct skbuff" when NET_SKBUFF_DATA_USES_OFFSET is not set. In
that case skb->tail will be a pointer whereas skb->transport_header
will be an offset from head. This is corrected by using wrappers that
ensure that comparisons and calculations are always made using pointers.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-28 23:49:07 -07:00
Johannes Berg
6abb9cb99f cfg80211: make WoWLAN configuration available to drivers
Make the current WoWLAN configuration available to drivers
at runtime. This isn't really useful for the normal WoWLAN
behaviour and accessing it can also be racy, but drivers
may use it for testing the WoWLAN device behaviour while
the host stays up & running to observe the device.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-27 15:10:58 +02:00
Lorenzo Colitti
6d0bfe2261 net: ipv6: Add IPv6 support to the ping socket.
This adds the ability to send ICMPv6 echo requests without a
raw socket. The equivalent ability for ICMPv4 was added in
2011.

Instead of having separate code paths for IPv4 and IPv6, make
most of the code in net/ipv4/ping.c dual-stack and only add a
few IPv6-specific bits (like the protocol definition) to a new
net/ipv6/ping.c. Hopefully this will reduce divergence and/or
duplication of bugs in the future.

Caveats:

- Setting options via ancillary data (e.g., using IPV6_PKTINFO
  to specify the outgoing interface) is not yet supported.
- There are no separate security settings for IPv4 and IPv6;
  everything is controlled by /proc/net/ipv4/ping_group_range.
- The proc interface does not yet display IPv6 ping sockets
  properly.

Tested with a patched copy of ping6 and using raw socket calls.
Compiles and works with all of CONFIG_IPV6={n,m,y}.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-25 21:07:49 -07:00
Zhang Yanfei
0799567424 ipvs: change type of netns_ipvs->sysctl_sync_qlen_max
This member of struct netns_ipvs is calculated from nr_free_buffer_pages
so change its type to unsigned long in case of overflow.  Also, type of
its related proc var sync_qlen_max and the return type of function
sysctl_sync_qlen_max() should be changed to unsigned long, too.

Besides, the type of ipvs_master_sync_state->sync_queue_len should be
changed to unsigned long accordingly.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-05-26 08:17:33 +09:00
David S. Miller
e6ff4c75f9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge net into net-next because some upcoming net-next changes
build on top of bug fixes that went into net.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-24 16:48:28 -07:00
Johannes Berg
8d61ffa5e0 cfg80211/mac80211: use cfg80211 wdev mutex in mac80211
Using separate locks in cfg80211 and mac80211 has always
caused issues, for example having to unlock in places in
mac80211 to call cfg80211, which even needed a framework
to make cfg80211 calls after some functions returned etc.

Additionally, I suspect some issues people have reported
with the cfg80211 state getting confused could be due to
such issues, when cfg80211 is asking mac80211 to change
state but mac80211 is in the process of telling cfg80211
that the state changed (in another way.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-25 00:02:16 +02:00
Johannes Berg
5fe231e873 cfg80211: vastly simplify locking
Virtually all code paths in cfg80211 already (need to) hold
the RTNL. As such, there's little point in having another
four mutexes for various parts of the code, they just cause
lock ordering issues (and much of the time, the RTNL and a
few of the others need thus be held.)

Simplify all this by getting rid of the extra four mutexes
and just use the RTNL throughout. Only a few code changes
were needed to do this and we can get rid of a work struct
for bonus points.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-25 00:02:15 +02:00
Johannes Berg
dde7dc759b Merge remote-tracking branch 'mac80211/master' into mac80211-next 2013-05-25 00:01:30 +02:00
Ashok Nagarajan
b422c6cd7e {cfg,mac}80211: move mandatory rates calculation to cfg80211
Move mandatory rates calculation to cfg80211, shared with non mac80211 drivers.

Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
[extend documentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-24 23:54:43 +02:00
Oleksij Rempel
786677d100 mac80211: add STBC flag for radiotap
Some chips can tell us if received frame was
encoded with STBC or not. To make this information available
in user space we can use updated radiotap specification:
http://www.radiotap.org/defined-fields/MCS

This patch will set number of STBC encoded spatial streams (Nss).
The HAVE_STBC flag should be provided by driver.

Signed-off-by: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-24 12:07:25 +02:00
Pablo Neira Ayuso
de94c4591b netfilter: {ipt,ebt}_ULOG: rise warning on deprecation
This target has been superseded by NFLOG. Spot a warning
so we prepare removal in a couple of years.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-05-23 14:23:16 +02:00
Florian Westphal
2a7851bffb netfilter: add nf_ipv6_ops hook to fix xt_addrtype with IPv6
Quoting https://bugzilla.netfilter.org/show_bug.cgi?id=812:

[ ip6tables -m addrtype ]
When I tried to use in the nat/PREROUTING it messes up the
routing cache even if the rule didn't matched at all.
[..]
If I remove the --limit-iface-in from the non-working scenario, so just
use the -m addrtype --dst-type LOCAL it works!

This happens when LOCAL type matching is requested with --limit-iface-in,
and the default ipv6 route is via the interface the packet we test
arrived on.

Because xt_addrtype uses ip6_route_output, the ipv6 routing implementation
creates an unwanted cached entry, and the packet won't make it to the
real/expected destination.

Silently ignoring --limit-iface-in makes the routing work but it breaks
rule matching (--dst-type LOCAL with limit-iface-in is supposed to only
match if the dst address is configured on the incoming interface;
without --limit-iface-in it will match if the address is reachable
via lo).

The test should call ipv6_chk_addr() instead.  However, this would add
a link-time dependency on ipv6.

There are two possible solutions:

1) Revert the commit that moved ipt_addrtype to xt_addrtype,
   and put ipv6 specific code into ip6t_addrtype.
2) add new "nf_ipv6_ops" struct to register pointers to ipv6 functions.

While the former might seem preferable, Pablo pointed out that there
are more xt modules with link-time dependeny issues regarding ipv6,
so lets go for 2).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23 11:58:55 +02:00
Eric Dumazet
71cea17ed3 tcp: md5: remove spinlock usage in fast path
TCP md5 code uses per cpu variables but protects access to them with
a shared spinlock, which is a contention point.

[ tcp_md5sig_pool_lock is locked twice per incoming packet ]

Makes things much simpler, by allocating crypto structures once, first
time a socket needs md5 keys, and not deallocating them as they are
really small.

Next step would be to allow crypto allocations being done in a NUMA
aware way.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 14:00:42 -07:00
Daniel Borkmann
168fc21a97 net: ipv6: remove 'next' member from inet6_dev
The next pointer within the inet6_dev structure seems not to be used
anywhere. So just remove it. Tested with allmodconfig on x86_64.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 13:49:48 -07:00
John W. Linville
ba7c96bec5 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-05-20 15:19:01 -04:00
Yuchung Cheng
3e59cb0ddf tcp: remove bad timeout logic in fast recovery
tcp_timeout_skb() was intended to trigger fast recovery on timeout,
unfortunately in reality it often causes spurious retransmission
storms during fast recovery. The particular sign is a fast retransmit
over the highest sacked sequence (SND.FACK).

Currently the RTO timer re-arming (as in RFC6298) offers a nice cushion
to avoid spurious timeout: when SND.UNA advances the sender re-arms
RTO and extends the timeout by icsk_rto. The sender does not offset
the time elapsed since the packet at SND.UNA was sent.

But if the next (DUP)ACK arrives later than ~RTTVAR and triggers
tcp_fastretrans_alert(), then tcp_timeout_skb() will mark any packet
sent before the icsk_rto interval lost, including one that's above the
highest sacked sequence. Most likely a large part of scorebard will be
marked.

If most packets are not lost then the subsequent DUPACKs with new SACK
blocks will cause the sender to continue to retransmit packets beyond
SND.FACK spuriously. Even if only one packet is lost the sender may
falsely retransmit almost the entire window.

The situation becomes common in the world of bufferbloat: the RTT
continues to grow as the queue builds up but RTTVAR remains small and
close to the minimum 200ms. If a data packet is lost and the DUPACK
triggered by the next data packet is slightly delayed, then a spurious
retransmission storm forms.

As the original comment on tcp_timeout_skb() suggests: the usefulness
of this feature is questionable. It also wastes cycles walking the
sack scoreboard and is actually harmful because of false recovery.

It's time to remove this.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-19 23:51:17 -07:00
Nicolas Dichtel
caeaba7900 ipv6: add support of peer address
This patch adds the support of peer address for IPv6. For example, it is
possible to specify the remote end of a 6inY tunnel.
This was already possible in IPv4:
 ip addr add ip1 peer ip2 dev dev1

The peer address is specified with IFA_ADDRESS and the local address with
IFA_LOCAL (like explained in include/uapi/linux/if_addr.h).
Note that the API is not changed, because before this patch, it was not
possible to specify two different addresses in IFA_LOCAL and IFA_REMOTE.
There is a small change for the dump: if the peer is different from ::,
IFA_ADDRESS will contain the peer address instead of the local address.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-19 15:09:26 -07:00
David S. Miller
5c4b274981 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains three Netfilter fixes and update
for the MAINTAINER file for your net tree, they are:

* Fix crash if nf_log_packet is called from conntrack, in that case
  both interfaces are NULL, from Hans Schillstrom. This bug introduced
  with the logging netns support in the previous merge window.

* Fix compilation of nf_log and nf_queue without CONFIG_PROC_FS,
  from myself. This bug was introduced in the previous merge window
  with the new netns support for the netfilter logging infrastructure.

* Fix possible crash in xt_TCPOPTSTRIP due to missing sanity
  checkings to validate that the TCP header is well-formed, from
  myself. I can find this bug in 2.6.25, probably it's been there
  since the beginning. I'll pass this to -stable.

* Update MAINTAINER file to point to new nf trees at git.kernel.org,
  remove Harald and use M: instead of P: (now obsolete tag) to
  keep Jozsef in the list of people.

Please, consider pulling this. Thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-16 14:32:42 -07:00
Colleen Twitty
6e16d90b52 cfg80211: Userspace may inform kernel of mesh auth method.
Authentication takes place in userspace, but the beacon is
generated in the kernel.  Allow userspace to inform the
kernel of the authentication method so the appropriate
mesh config IE can be set prior to beacon generation when
joining the MBSS.

Signed-off-by: Colleen Twitty <colleen@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-16 22:39:43 +02:00
Robert P. J. Day
03f831a6f7 wireless: fix kerneldoc content in *80211.h files.
Make kerneldoc content match header file content, no functional
change.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-16 22:39:39 +02:00
Felix Fietkau
ef0621e805 mac80211: add support for per-chain signal strength reporting
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[fix unit documentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-16 22:39:38 +02:00
Felix Fietkau
119363c7dc cfg80211: add support for per-chain signal strength reporting
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-16 22:39:37 +02:00
Felix Fietkau
f6b3d85f7f mac80211: fix spurious RCU warning and update documentation
Document rx vs tx status concurrency requirements.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-05-16 22:38:05 +02:00
Hans Schillstrom
8cdb46da06 netfilter: log: netns NULL ptr bug when calling from conntrack
Since (69b34fb netfilter: xt_LOG: add net namespace support
for xt_LOG), we hit this:

[ 4224.708977] BUG: unable to handle kernel NULL pointer dereference at 0000000000000388
[ 4224.709074] IP: [<ffffffff8147f699>] ipt_log_packet+0x29/0x270

when callling log functions from conntrack both in and out
are NULL i.e. the net pointer is invalid.

Adding struct net *net in call to nf_logfn() will secure that
there always is a vaild net ptr.

Reported as netfilter's bugzilla bug 818:
https://bugzilla.netfilter.org/show_bug.cgi?id=818

Reported-by: Ronald <ronald645@gmail.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-15 14:11:07 +02:00
Eric Dumazet
f77d602124 ipv6: do not clear pinet6 field
We have seen multiple NULL dereferences in __inet6_lookup_established()

After analysis, I found that inet6_sk() could be NULL while the
check for sk_family == AF_INET6 was true.

Bug was added in linux-2.6.29 when RCU lookups were introduced in UDP
and TCP stacks.

Once an IPv6 socket, using SLAB_DESTROY_BY_RCU is inserted in a hash
table, we no longer can clear pinet6 field.

This patch extends logic used in commit fcbdf09d96
("net: fix nulls list corruptions in sk_prot_alloc")

TCP/UDP/UDPLite IPv6 protocols provide their own .clear_sk() method
to make sure we do not clear pinet6 field.

At socket clone phase, we do not really care, as cloning the parent (non
NULL) pinet6 is not adding a fatal race.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-11 16:26:38 -07:00
Konstantin Khlebnikov
b56141ab34 net: frag, fix race conditions in LRU list maintenance
This patch fixes race between inet_frag_lru_move() and inet_frag_lru_add()
which was introduced in commit 3ef0eb0db4
("net: frag, move LRU list maintenance outside of rwlock")

One cpu already added new fragment queue into hash but not into LRU.
Other cpu found it in hash and tries to move it to the end of LRU.
This leads to NULL pointer dereference inside of list_move_tail().

Another possible race condition is between inet_frag_lru_move() and
inet_frag_lru_del(): move can happens after deletion.

This patch initializes LRU list head before adding fragment into hash and
inet_frag_lru_move() doesn't touches it if it's empty.

I saw this kernel oops two times in a couple of days.

[119482.128853] BUG: unable to handle kernel NULL pointer dereference at           (null)
[119482.132693] IP: [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.136456] PGD 2148f6067 PUD 215ab9067 PMD 0
[119482.140221] Oops: 0000 [#1] SMP
[119482.144008] Modules linked in: vfat msdos fat 8021q fuse nfsd auth_rpcgss nfs_acl nfs lockd sunrpc ppp_async ppp_generic bridge slhc stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek kvm_amd k10temp kvm snd_hda_intel snd_hda_codec edac_core radeon snd_hwdep ath9k snd_pcm ath9k_common snd_page_alloc ath9k_hw snd_timer snd soundcore drm_kms_helper ath ttm r8169 mii
[119482.152692] CPU 3
[119482.152721] Pid: 20, comm: ksoftirqd/3 Not tainted 3.9.0-zurg-00001-g9f95269 #132 To Be Filled By O.E.M. To Be Filled By O.E.M./RS880D
[119482.161478] RIP: 0010:[<ffffffff812ede89>]  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.166004] RSP: 0018:ffff880216d5db58  EFLAGS: 00010207
[119482.170568] RAX: 0000000000000000 RBX: ffff88020882b9c0 RCX: dead000000200200
[119482.175189] RDX: 0000000000000000 RSI: 0000000000000880 RDI: ffff88020882ba00
[119482.179860] RBP: ffff880216d5db58 R08: ffffffff8155c7f0 R09: 0000000000000014
[119482.184570] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020882ba00
[119482.189337] R13: ffffffff81c8d780 R14: ffff880204357f00 R15: 00000000000005a0
[119482.194140] FS:  00007f58124dc700(0000) GS:ffff88021fcc0000(0000) knlGS:0000000000000000
[119482.198928] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[119482.203711] CR2: 0000000000000000 CR3: 00000002155f0000 CR4: 00000000000007e0
[119482.208533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[119482.213371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[119482.218221] Process ksoftirqd/3 (pid: 20, threadinfo ffff880216d5c000, task ffff880216d3a9a0)
[119482.223113] Stack:
[119482.228004]  ffff880216d5dbd8 ffffffff8155dcda 0000000000000000 ffff000200000001
[119482.233038]  ffff8802153c1f00 ffff880000289440 ffff880200000014 ffff88007bc72000
[119482.238083]  00000000000079d5 ffff88007bc72f44 ffffffff00000002 ffff880204357f00
[119482.243090] Call Trace:
[119482.248009]  [<ffffffff8155dcda>] ip_defrag+0x8fa/0xd10
[119482.252921]  [<ffffffff815a8013>] ipv4_conntrack_defrag+0x83/0xe0
[119482.257803]  [<ffffffff8154485b>] nf_iterate+0x8b/0xa0
[119482.262658]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.267527]  [<ffffffff815448e4>] nf_hook_slow+0x74/0x130
[119482.272412]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.277302]  [<ffffffff8155d068>] ip_rcv+0x268/0x320
[119482.282147]  [<ffffffff81519992>] __netif_receive_skb_core+0x612/0x7e0
[119482.286998]  [<ffffffff81519b78>] __netif_receive_skb+0x18/0x60
[119482.291826]  [<ffffffff8151a650>] process_backlog+0xa0/0x160
[119482.296648]  [<ffffffff81519f29>] net_rx_action+0x139/0x220
[119482.301403]  [<ffffffff81053707>] __do_softirq+0xe7/0x220
[119482.306103]  [<ffffffff81053868>] run_ksoftirqd+0x28/0x40
[119482.310809]  [<ffffffff81074f5f>] smpboot_thread_fn+0xff/0x1a0
[119482.315515]  [<ffffffff81074e60>] ? lg_local_lock_cpu+0x40/0x40
[119482.320219]  [<ffffffff8106d870>] kthread+0xc0/0xd0
[119482.324858]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.329460]  [<ffffffff816c32dc>] ret_from_fork+0x7c/0xb0
[119482.334057]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.338661] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
[119482.343787] RIP  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.348675]  RSP <ffff880216d5db58>
[119482.353493] CR2: 0000000000000000

Oops happened on this path:
ip_defrag() -> ip_frag_queue() -> inet_frag_lru_move() -> list_move_tail() -> __list_del_entry()

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-06 11:06:51 -04:00
Linus Torvalds
20b4fb4852 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
2013-05-01 17:51:54 -07:00
Linus Torvalds
73287a43cc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights (1721 non-merge commits, this has to be a record of some
  sort):

   1) Add 'random' mode to team driver, from Jiri Pirko and Eric
      Dumazet.

   2) Make it so that any driver that supports configuration of multiple
      MAC addresses can provide the forwarding database add and del
      calls by providing a default implementation and hooking that up if
      the driver doesn't have an explicit set of handlers.  From Vlad
      Yasevich.

   3) Support GSO segmentation over tunnels and other encapsulating
      devices such as VXLAN, from Pravin B Shelar.

   4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

   5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
      Dukkipati.

   6) In the PHY layer, allow supporting wake-on-lan in situations where
      the PHY registers have to be written for it to be configured.

      Use it to support wake-on-lan in mv643xx_eth.

      From Michael Stapelberg.

   7) Significantly improve firewire IPV6 support, from YOSHIFUJI
      Hideaki.

   8) Allow multiple packets to be sent in a single transmission using
      network coding in batman-adv, from Martin Hundebøll.

   9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

  10) Generalize the VXLAN forwarding tables so that there is more
      flexibility in configurating various aspects of the endpoints.
      From David Stevens.

  11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
      from Dmitry Kravkov.

  12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
      Neira Ayuso.

  13) Start adding networking selftests.

  14) In situations of overload on the same AF_PACKET fanout socket, or
      per-cpu packet receive queue, minimize drop by distributing the
      load to other cpus/fanouts.  From Willem de Bruijn and Eric
      Dumazet.

  15) Add support for new payload offset BPF instruction, from Daniel
      Borkmann.

  16) Convert several drivers over to mdoule_platform_driver(), from
      Sachin Kamat.

  17) Provide a minimal BPF JIT image disassembler userspace tool, from
      Daniel Borkmann.

  18) Rewrite F-RTO implementation in TCP to match the final
      specification of it in RFC4138 and RFC5682.  From Yuchung Cheng.

  19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
      you like netlink, so I implemented netlink dumping of netlink
      sockets.") From Andrey Vagin.

  20) Remove ugly passing of rtnetlink attributes into rtnl_doit
      functions, from Thomas Graf.

  21) Allow userspace to be able to see if a configuration change occurs
      in the middle of an address or device list dump, from Nicolas
      Dichtel.

  22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
      Frederic Sowa.

  23) Increase accuracy of packet length used by packet scheduler, from
      Jason Wang.

  24) Beginning set of changes to make ipv4/ipv6 fragment handling more
      scalable and less susceptible to overload and locking contention,
      from Jesper Dangaard Brouer.

  25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
      instead.  From Hong Zhiguo.

  26) Optimize route usage in IPVS by avoiding reference counting where
      possible, from Julian Anastasov.

  27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

  28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
      Eitzenberger.

  29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
      nfnetlink_log, and nfnetlink_queue.  From Gao feng.

  30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

  31) Support several new r8169 chips, from Hayes Wang.

  32) Support tokenized interface identifiers in ipv6, from Daniel
      Borkmann.

  33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

  34) Add 802.1ad vlan offload support, from Patrick McHardy.

  35) Support mmap() based netlink communication, also from Patrick
      McHardy.

  36) Support HW timestamping in mlx4 driver, from Amir Vadai.

  37) Rationalize AF_PACKET packet timestamping when transmitting, from
      Willem de Bruijn and Daniel Borkmann.

  38) Bring parity to what's provided by /proc/net/packet socket dumping
      and the info provided by netlink socket dumping of AF_PACKET
      sockets.  From Nicolas Dichtel.

  39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
      Poirier"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  filter: fix va_list build error
  af_unix: fix a fatal race with bit fields
  bnx2x: Prevent memory leak when cnic is absent
  bnx2x: correct reading of speed capabilities
  net: sctp: attribute printl with __printf for gcc fmt checks
  netlink: kconfig: move mmap i/o into netlink kconfig
  netpoll: convert mutex into a semaphore
  netlink: Fix skb ref counting.
  net_sched: act_ipt forward compat with xtables
  mlx4_en: fix a build error on 32bit arches
  Revert "bnx2x: allow nvram test to run when device is down"
  bridge: avoid OOPS if root port not found
  drivers: net: cpsw: fix kernel warn on cpsw irq enable
  sh_eth: use random MAC address if no valid one supplied
  3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
  tg3: fix to append hardware time stamping flags
  unix/stream: fix peeking with an offset larger than data in queue
  unix/dgram: fix peeking with an offset larger than data in queue
  unix/dgram: peek beyond 0-sized skbs
  openvswitch: Remove unneeded ovs_netdev_get_ifindex()
  ...
2013-05-01 14:08:52 -07:00
Eric Dumazet
60bc851ae5 af_unix: fix a fatal race with bit fields
Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
uses 64bit instructions to manipulate them.
If the 64bit word includes any atomic_t or spinlock_t, we can lose
critical concurrent changes.

This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.

This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.

A safer way would be to use a long to store flags.
This way we are sure compiler/arch wont do bad things.

As we own unix_gc_lock spinlock when clearing or setting bits,
we can use the non atomic __set_bit()/__clear_bit().

recursion_level can share the same 64bit location with the spinlock,
as it is set only with this spinlock held.

[1] bug fixed in gcc-4.8.0 :
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080

Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-01 15:13:49 -04:00
Linus Torvalds
5d434fcb25 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small
  code cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits)
  mm: Convert print_symbol to %pSR
  gfs2: Convert print_symbol to %pSR
  m32r: Convert print_symbol to %pSR
  iostats.txt: add easy-to-find description for field 6
  x86 cmpxchg.h: fix wrong comment
  treewide: Fix typo in printk and comments
  doc: devicetree: Fix various typos
  docbook: fix 8250 naming in device-drivers
  pata_pdc2027x: Fix compiler warning
  treewide: Fix typo in printks
  mei: Fix comments in drivers/misc/mei
  treewide: Fix typos in kernel messages
  pm44xx: Fix comment for "CONFIG_CPU_IDLE"
  doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP"
  mmzone: correct "pags" to "pages" in comment.
  kernel-parameters: remove outdated 'noresidual' parameter
  Remove spurious _H suffixes from ifdef comments
  sound: Remove stray pluses from Kconfig file
  radio-shark: Fix printk "CONFIG_LED_CLASS"
  doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE
  ...
2013-04-30 09:36:50 -07:00
David Howells
6bbefe8679 hostap: Don't use create_proc_read_entry()
Don't use create_proc_read_entry() as that is deprecated, but rather use
proc_create_data() and seq_file instead.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Jouni Malinen <j@w1.fi>
cc: John W. Linville <linville@tuxdriver.com>
cc: Johannes Berg <johannes@sipsolutions.net>
cc: linux-wireless@vger.kernel.org
cc: netdev@vger.kernel.org
cc: devel@driverdev.osuosl.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29 15:41:56 -04:00
John W. Linville
17a2911f33 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-04-29 15:31:57 -04:00
Eric Dumazet
aebda156a5 net: defer net_secret[] initialization
Instead of feeding net_secret[] at boot time, defer the init
at the point first socket is created.

This permits some platforms to use better entropy sources than
the ones available at boot time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-29 15:14:02 -04:00
David S. Miller
14d3692f04 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contains relevant updates for the Netfilter
tree, they are:

* Enhancements for ipset: Add the counter extension for sets, this
  information can be used from the iptables set match, to change
  the matching behaviour. Jozsef required to add the extension
  infrastructure and moved the existing timeout support upon it.
  This also includes a change in net/sched/em_ipset to adapt it to
  the new extension structure.

* Enhancements for performance boosting in nfnetlink_queue: Add new
  configuration flags that allows user-space to receive big packets (GRO)
  and to disable checksumming calculation. This were proposed by Eric
  Dumazet during the Netfilter Workshop 2013 in Copenhagen. Florian
  Westphal was kind enough to find the time to materialize the proposal.

* A sparse fix from Simon, he noticed it in the SCTP NAT helper, the fix
  required a change in the interface of sctp_end_cksum.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-29 14:29:06 -04:00
Simon Horman
eee1d5a147 sctp: Correct type and usage of sctp_end_cksum()
Change the type of the crc32 parameter of sctp_end_cksum()
from __be32 to __u32 to reflect that fact that it is passed
to cpu_to_le32().

There are five in-tree users of sctp_end_cksum().
The following four had warnings flagged by sparse which are
no longer present with this change.

net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_nat_csum()
net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_csum_check()
net/sctp/input.c:sctp_rcv_checksum()
net/sctp/output.c:sctp_packet_transmit()

The fifth user is net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt().
It has been updated to pass a __u32 instead of a __be32,
the value in question was already calculated in cpu byte-order.

net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt() has also
been updated to assign the return value of sctp_end_cksum()
directly to a variable of type __le32, matching the
type of the return value. Previously the return value
was assigned to a variable of type __be32 and then that variable
was finally assigned to another variable of type __le32.

Problems flagged by sparse.
Compile and sparse tested only.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-29 20:09:08 +02:00
Florian Westphal
a5fedd43d5 netfilter: move skb_gso_segment into nfnetlink_queue module
skb_gso_segment is expensive, so it would be nice if we could
avoid it in the future. However, userspace needs to be prepared
to receive larger-than-mtu-packets (which will also have incorrect
l3/l4 checksums), so we cannot simply remove it.

The plan is to add a per-queue feature flag that userspace can
set when binding the queue.

The problem is that in nf_queue, we only have a queue number,
not the queue context/configuration settings.

This patch should have no impact other than the skb_gso_segment
call now being in a function that has access to the queue config
data.

A new size attribute in nf_queue_entry is needed so
nfnetlink_queue can duplicate the entry of the gso skb
when segmenting the skb while also copying the route key.

The follow up patch adds switch to disable skb_gso_segment when
queue config says so.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-29 20:09:05 +02:00
Jesper Dangaard Brouer
a4c4009f4f net: increase frag hash size
Increase fragmentation hash bucket size to 1024 from old 64 elems.

After we increased the frag mem limits commit c2a93660 (net: increase
fragment memory usage limits) the hash size of 64 elements is simply
too small.  Also considering the mem limit is per netns and the hash
table is shared for all netns.

For the embedded people, note that this increase will change the hash
table/array from using approx 1 Kbytes to 16 Kbytes.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-29 13:33:06 -04:00
John W. Linville
375e875c69 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2013-04-26 08:36:00 -04:00
Pravin B Shelar
def3117493 genl: Allow concurrent genl callbacks.
All genl callbacks are serialized by genl-mutex. This can become
bottleneck in multi threaded case.
Following patch adds an parameter to genl_family so that a
particular family can get concurrent netlink callback without
genl_lock held.
New rw-sem is used to protect genl callback from genl family unregister.
in case of parallel_ops genl-family read-lock is taken for callbacks and
write lock is taken for register or unregistration for any family.
In case of locked genl family semaphore and gel-mutex is locked for
any openration.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-25 01:43:15 -04:00
David S. Miller
d3734b0496 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contains fixes for recently applied
Netfilter/IPVS updates to the net-next tree, most relevantly
they are:

* Fix sparse warnings introduced in the RCU conversion, from
  Julian Anastasov.

* Fix wrong endianness in the size field of IPVS sync messages,
  from Simon Horman.

* Fix missing if checking in nf_xfrm_me_harder, from Dan Carpenter.

* Fix off by one access in the IPVS SCTP tracking code, again from
  Dan Carpenter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-25 00:53:40 -04:00
John W. Linville
6ed0e321a0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-04-24 10:54:20 -04:00
sjur.brandeland@stericsson.com
26ee65e680 caif: Remove my bouncing email address.
Remove my soon bouncing email address.
Also remove the "Contact:" line in file header.
The MAINTAINERS file is a better place to find the
contact person anyway.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-23 13:25:51 -04:00
Julian Anastasov
0a925864c1 ipvs: fix sparse warnings for some parameters
Some service fields are in network order:

- netmask: used once in network order and also as prefix len for IPv6
- port

Other parameters are in host order:

- struct ip_vs_flags: flags and mask moved between user and kernel only
- sync state: moved between user and kernel only
- syncid: sent over network as single octet

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-23 11:43:05 +09:00
David S. Miller
6e0895c2ea Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be_main.c
	drivers/net/ethernet/intel/igb/igb_main.c
	drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
	include/net/scm.h
	net/batman-adv/routing.c
	net/ipv4/tcp_input.c

The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
cleanup in net-next to now pass cred structs around.

The be2net driver had a bug fix in 'net' that overlapped with the VLAN
interface changes by Patrick McHardy in net-next.

An IGB conflict existed because in 'net' the build_skb() support was
reverted, and in 'net-next' there was a comment style fix within that
code.

Several batman-adv conflicts were resolved by making sure that all
calls to batadv_is_my_mac() are changed to have a new bat_priv first
argument.

Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
rewrite in 'net-next', mostly overlapping changes.

Thanks to Stephen Rothwell and Antonio Quartulli for help with several
of these merge resolutions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-22 20:32:51 -04:00
Daniel Borkmann
3e3251b3f2 net: sctp: minor: remove dead code from sctp_packet
struct sctp_packet is currently embedded into sctp_transport or
sits on the stack as 'singleton' in sctp_outq_flush(). Therefore,
its member 'malloced' is always 0, thus a kfree() is never called.
Because of that, we can just remove this code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-22 16:25:21 -04:00
Eric Dumazet
3fb62c5d3f net: remove a stale comment for dl_next
dl_next member in struct request_sock doesn't need to be first.

We expect to insert a "struct common_sock" or a subset of it,
so this claim had to be verified.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-22 15:55:48 -04:00
John W. Linville
6475cb05ee Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-04-22 14:58:14 -04:00
John W. Linville
e563589f71 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2013-04-22 14:56:41 -04:00
Felix Fietkau
0d528d85c5 mac80211: improve the rate control API
Allow rate control modules to pass a rate selection table to mac80211
and the driver. This allows drivers to fetch the most recent rate
selection from the sta pointer for already buffered frames. This allows
rate control to respond faster to sudden link changes and it is also a
step towards adding minstrel_ht support to drivers like iwlwifi.

When a driver sets IEEE80211_HW_SUPPORTS_RC_TABLE, mac80211 will not
fill info->control.rates with rates from the rate table (to preserve
explicit overrides by the rate control module). The driver then
explicitly calls ieee80211_get_tx_rates to merge overrides from
info->control.rates with defaults from the sta rate table.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-22 16:16:41 +02:00
Arend van Spriel
5de1798489 cfg80211: introduce critical protocol indication from user-space
Some protocols need a more reliable connection to complete
successful in reasonable time. This patch adds a user-space
API to indicate the wireless driver that a critical protocol
is about to commence and when it is done, using nl80211 primitives
NL80211_CMD_CRIT_PROTOCOL_START and NL80211_CRIT_PROTOCOL_STOP.

There can be only on critical protocol session started per
registered cfg80211 device.

The driver can support this by implementing the cfg80211 callbacks
.crit_proto_start() and .crit_proto_stop(). Examples of protocols
that can benefit from this are DHCP, EAPOL, APIPA. Exactly how the
link can/should be made more reliable is up to the driver. Things
to consider are avoid scanning, no multi-channel operations, and
alter coexistence schemes.

Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-22 15:48:00 +02:00
Alexander Bondar
908f8d07e9 mac80211: indicate admission control in TX queue parameters
Some driver implementations need to know whether mandatory
admission control is required by the AP for some ACs. Add
a parameter to the TX queue parameters indicating this.

As there's currently no support for admission control in
mac80211's AP implementation, it's only ever set for the
client implementation.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-22 15:33:06 +02:00
Johannes Berg
a42c74ee60 Merge remote-tracking branch 'wireless-next/master' into mac80211-next 2013-04-22 15:31:43 +02:00
Linus Torvalds
83f1b4ba91 net: fix incorrect credentials passing
Commit 257b5358b3 ("scm: Capture the full credentials of the scm
sender") changed the credentials passing code to pass in the effective
uid/gid instead of the real uid/gid.

Obviously this doesn't matter most of the time (since normally they are
the same), but it results in differences for suid binaries when the wrong
uid/gid ends up being used.

This just undoes that (presumably unintentional) part of the commit.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-20 16:56:42 -04:00
Dan Carpenter
e15465e180 irda: small read past the end of array in debug code
The "reason" can come from skb->data[] and it hasn't been capped so it
can be from 0-255 instead of just 0-6.  For example in irlmp_state_dtr()
the code does:

	reason = skb->data[3];
	...
	irlmp_disconnect_indication(self, reason, skb);

Also LMREASON has a couple other values which don't have entries in the
irlmp_reasons[] array.  And 0xff is a valid reason as well which means
"unknown".

So far as I can see we don't actually care about "reason" except for in
the debug code.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:32:31 -04:00
Patrick McHardy
ec464e5dc5 netfilter: rename netlink related "pid" variables to "portid"
Get rid of the confusing mix of pid and portid and use portid consistently
for all netlink related socket identities.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 14:58:36 -04:00
Johan Hedberg
07dc93dd14 Bluetooth: Fix HCI command send functions to use const specifier
All HCI command send functions that take a pointer to the command
parameters do not need to modify the content in any way (they merely
copy the data to an skb). Therefore, the parameter type should be
declared const. This also allows passing already const parameters to
these APIs which previously would have generated a compiler warning.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-19 10:31:58 -03:00
Andre Guedes
76a388beaf Bluetooth: Rename LE_SCANNING_* macros
This patch renames LE_SCANNING_ENABLED and LE_SCANNING_DISABLED
macros to LE_SCAN_ENABLE and LE_SCAN_DISABLE in order to keep
the same prefix others LE scan macros have.

It also fixes le_scan_enable_req function so it uses the LE_SCAN_
ENABLE macro instead of a magic number.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-18 01:17:27 -03:00
Andre Guedes
525e296a28 Bluetooth: Add macros for filter duplicates values
This patch adds macros for filter_duplicates parameter values from
HCI LE Set Scan Enable command. It also fixes le_scan_enable_req
function so it uses the LE_SCAN_FILTER_DUP_ENABLE macro instead of
a magic number.

The LE_SCAN_FILTER_DUP_DISABLE was also defined since it will be
required to properly support the GAP Observer Role.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-18 01:17:05 -03:00
Andre Guedes
5df480b56e Bluetooth: Add LE scan type macros
This patch adds macros for active and passive LE scan type values.
The LE_SCAN_PASSIVE was also defined since it will be used in future
by LE connection routine and GAP Observer Role support.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-18 01:16:25 -03:00
Johan Hedberg
d2c5d77fff Bluetooth: Add reading of all local feature pages
With the introduction of CSA4 there is now also a features page number 2
available. This patch increments the maximum supported page number to 2
and adds code for reading all available pages (as long as we have
support for them - indicated by HCI_MAX_PAGES).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-18 00:26:25 -03:00
Johan Hedberg
cad718ed2f Bluetooth: Track feature pages in a single table
The local and remote features are organized by page number. Page 0
are the LMP features, page 1 the host features, and any pages beyond 1
features that future core specification versions may define. So far
we've only had the first two pages and two separate variables has been
convenient enough, however with the introduction of Core Specification
Addendum 4 there are features defined on page 2.

Instead of requiring the addition of a new variable each time a new page
number is defined, this patch refactors the code to use a single table
for the features. The patch needs to update both the hci_dev and
hci_conn structures since there are macros that depend on the features
being represented in the same way in both of them.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-18 00:26:20 -03:00
Frédéric Dalleau
fa5513be2b Bluetooth: Move and rename hci_conn_accept
Since this function is only used by sco, move it from hci_event.c to
sco.c and rename to sco_conn_defer_accept. Make it static.

Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-18 00:17:54 -03:00
Daniel Borkmann
c1db7a26ac net: sctp: sctp_ulpq: remove 'malloced' struct member
The structure sctp_ulpq is embedded into sctp_association and never
separately allocated, also ulpq->malloced is always 0, so that
kfree() is never called. Therefore, remove this code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-17 14:13:02 -04:00
Daniel Borkmann
50181c07cb net: sctp: sctp_bind_addr: remove dead code
The sctp_bind_addr structure has a 'malloced' member that is
always set to 0, thus in sctp_bind_addr_free() the kfree()
part can never be called. This part is embedded into
sctp_ep_common anyway and never alloced.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-17 14:13:02 -04:00
Daniel Borkmann
8fa5df6d21 net: sctp: sctp_transport: remove unused variable
sctp_transport's member 'malloced' is set to 1, never evaluated
and the structure is kfreed anyway. So just remove it.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-17 14:13:02 -04:00
Daniel Borkmann
165a4c3127 net: sctp: sctp_outq: remove 'malloced' from its struct
sctp_outq is embedded into sctp_association, and thus never
kmalloced in any way. Also, malloced is always 0, thus kfree()
is never called. Therefore, remove that dead piece of code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-17 14:13:02 -04:00
Daniel Borkmann
ee16371e6c net: sctp: sctp_inq: remove dead code
sctp_inq is never kmalloced, since it's integrated into sctp_ep_common
and only initialized from eps and assocs. Therefore, remove the dead
code from there.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-17 14:13:02 -04:00
Daniel Borkmann
542c2d8320 net: sctp: sctp_ssnmap: remove 'malloced' element from struct
sctp_ssnmap_init() can only be called from sctp_ssnmap_new()
where malloced is always set to 1. Thus, when we call
sctp_ssnmap_free() the test for map->malloced evaluates always
to true.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-17 14:13:02 -04:00
David Herrmann
2c8e1411e9 Bluetooth: l2cap: add l2cap_user sub-modules
Several sub-modules like HIDP, rfcomm, ... need to track l2cap
connections. The l2cap_conn->hcon->dev object is used as parent for sysfs
devices so the sub-modules need to be notified when the hci_conn object is
removed from sysfs.

As submodules normally use the l2cap layer, the l2cap_user objects are
registered there instead of on the underlying hci_conn object. This avoids
any direct dependency on the HCI layer and lets the l2cap core handle any
specifics.

This patch introduces l2cap_user objects which contain a "probe" and
"remove" callback. You can register them on any l2cap_conn object and if
it is active, the "probe" callback will get called. Otherwise, an error is
returned.

The l2cap_conn object will call your "remove" callback directly before it
is removed from user-space. This allows you to remove your submodules
_before_ the parent l2cap_conn and hci_conn object is removed.

At any time you can asynchronously unregister your l2cap_user object if
your submodule vanishes before the l2cap_conn object does.

There is no way around l2cap_user. If we want wire-protocols in the
kernel, we always want the hci_conn object as parent in the sysfs tree. We
cannot use a channel here since we might need multiple channels for a
single protocol.
But the problem is, we _must_ get notified when an l2cap_conn object is
removed. We cannot use reference-counting for object-removal! This is not
how it works. If a hardware is removed, we should immediately remove the
object from sysfs. Any other behavior would be inconsistent with the rest
of the system. Also note that device_del() might sleep, but it doesn't
wait for user-space or block very long. It only _unlinks_ the object from
sysfs and the whole device-tree. Everything else is handled by ref-counts!
This is exactly what the other sub-modules must do: unlink their devices
when the "remove" l2cap_user callback is called. They should not do any
cleanup or synchronous shutdowns.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-17 03:03:43 -03:00
David Herrmann
9c903e373c Bluetooth: l2cap: introduce l2cap_conn ref-counting
If we want to use l2cap_conn outside of l2cap_core.c, we need refcounting
for these objects. Otherwise, we cannot synchronize l2cap locks with
outside locks and end up with deadlocks.

Hence, introduce ref-counting for l2cap_conn objects. This doesn't affect
l2cap internals at all, as they use a direct synchronization.
We also keep a reference to the parent hci_conn for locking purposes as
l2cap_conn depends on this. This doesn't affect the connection itself but
only the lifetime of the (dead) object.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-17 03:02:10 -03:00
David Herrmann
f53c20e936 Bluetooth: allow constant arguments for bacmp()/bacpy()
There is no reason to require the source arguments to be writeable so fix
this to allow constant source addresses.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-17 02:56:37 -03:00
David Herrmann
8d12356f33 Bluetooth: introduce hci_conn ref-counting
We currently do not allow using hci_conn from outside of HCI-core.
However, several other users could make great use of it. This includes
HIDP, rfcomm and all other sub-protocols that rely on an active
connection.

Hence, we now introduce hci_conn ref-counting. We currently never call
get_device(). put_device() is exclusively used in hci_conn_del_sysfs().
Hence, we currently never have a greater device-refcnt than 1.
Therefore, it is safe to move the put_device() call from
hci_conn_del_sysfs() to hci_conn_del() (it's the only caller). In fact,
this even fixes a "use-after-free" bug as we access hci_conn after calling
hci_conn_del_sysfs() in hci_conn_del().

From now on we can add references to hci_conn objects in other layers
(like l2cap_sock, HIDP, rfcomm, ...) and grab a reference via
hci_conn_get(). This does _not_ guarantee, that the connection is still
alive. But, this isn't what we want. We can simply lock the hci_conn
device and use "device_is_registered(hci_conn->dev)" to test that.
However, this is hardly necessary as outside users should never rely on
the HCI connection to be alive, anyway. Instead, they should solely rely
on the device-object to be available.
But if sub-devices want the hci_conn object as sysfs parent, they need to
be notified when the connection drops. This will be introduced in later
patches with l2cap_users.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-17 02:45:22 -03:00
David Herrmann
fc225c3f5d Bluetooth: remove unneeded hci_conn_hold/put_device()
hci_conn_hold/put_device() is used to control when hci_conn->dev is no
longer needed and can be deleted from the system. Lets first look how they
are currently used throughout the code (excluding HIDP!).

All code that uses hci_conn_hold_device() looks like this:
    ...
    hci_conn_hold_device();
    hci_conn_add_sysfs();
    ...
On the other side, hci_conn_put_device() is exclusively used in
hci_conn_del().

So, considering that hci_conn_del() must not be called twice (which would
fail horribly), we know that hci_conn_put_device() is only called _once_
(which is in hci_conn_del()).
On the other hand, hci_conn_add_sysfs() must not be called twice, either
(it would call device_add twice, which breaks the device, see
drivers/base/core.c). So we know that hci_conn_hold_device() is also
called only once (it's only called directly before hci_conn_add_sysfs()).

So hold and put are known to be called only once. That means we can safely
remove them and directly call hci_conn_del_sysfs() in hci_conn_del().

But there is one issue left: HIDP also uses hci_conn_hold/put_device().
However, this case can be ignored and simply removed as it is totally
broken. The issue is, the only thing HIDP delays with
hci_conn_hold_device() is the removal of the hci_conn->dev from sysfs.
But, the hci_conn device has no mechanism to get notified when its own
parent (hci_dev) gets removed from sysfs. hci_dev_hold/put() does _not_
control when it is removed but only when the device object is created
and destroyed.
And hci_dev calls hci_conn_flush_*() when it removes itself from sysfs,
which itself causes hci_conn_del() to be called, but it does _not_ cause
hci_conn_del_sysfs() to be called, which is wrong.

Hence, we fix it to call hci_conn_del_sysfs() in hci_conn_del(). This
guarantees that a hci_conn object is removed from sysfs _before_ its
parent hci_dev is removed.

The changes to HIDP look scary, wrong and broken. However, if you look at
the HIDP session management, you will notice they're already broken in the
exact _same_ way (ever tried "unplugging" HIDP devices? Breaks _all_ the
time).
So this patch only makes HIDP look _scary_ and _obviously broken_. It does
not break HIDP itself, it already is!

See later patches in this series which fix HIDP to use proper
session-management.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-17 02:38:36 -03:00
Felix Fietkau
991fec0910 mac80211: fix CTS protection handling
The rates[0] CTS and RTS flags are only set after rate control has been
called, so minstrel cannot use them to for setting the number of
retries. This patch adds two new flags to explicitly indicate RTS/CTS use.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-16 23:42:30 +02:00
Felix Fietkau
2ffbe6d333 mac80211: fix and optimize MCS mask handling
Currently the code always copies the configured MCS mask (even if it is
set to default), but only uses it if legacy rates were also masked out.
Fix this by adding a flag that tracks whether the configured MCS mask is
set to default or not.
Optimize the code further by storing a pointer to the configured rate
mask in txrc instead of using memcpy.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-16 23:42:29 +02:00
Karl Beldan
6bc8312f95 mac80211: VHT off-by-one NSS
The number of VHT spatial streams (NSS) is found in:
- s8 ieee80211_tx_rate.rate.idx[6:4] (tx - filled by rate control)
- u8 ieee80211_rx_status.vht_nss     (rx - filled by driver)
Tx discriminates valid rates indexes with the sign bit and encodes NSS
starting from 0 to 7 (note this matches some hw encodings e.g IWLMVM).
Rx does not have the same constraints, and encodes NSS starting from 1
to 8 (note this matches what wireshark expects in the radiotap header).

To handle ieee80211_tx_rate.rate.idx[6:4] ieee80211_rate_set_vht() and
ieee80211_rate_get_vht_nss() assume their nss parameter and return value
respectively runs from 0 to 7.
ATM, there are only 2 users of these: cfg.c:sta_set_rate_info_t() and
iwlwifi/mvm/tx.c:iwl_mvm_hwrate_to_tx_control(), but both assume nss
runs from 1 to 8.
This patch fixes this inconsistency by making ieee80211_rate_set_vht()
and ieee80211_rate_get_vht_nss() handle an nss running from 1 to 8.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-16 16:02:18 +02:00
Johannes Berg
85220d71bf mac80211: support secondary channel offset in CSA
Add support for the secondary channel offset IE in channel
switch announcements. This is necessary for proper handling
of CSA on HT access points.

For this to work it is also necessary to convert everything
here to use chandef structs instead of just channels. The
driver updates aren't really correct though. In particular,
the TI wl18xx driver update can't possibly be right since
it just ignores the new channel width for lack of firmware
API.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-16 15:29:44 +02:00
Johannes Berg
1ce3e82b0e cfg80211: add ieee80211_operating_class_to_band
This function converts a (global only!) operating
class to an internal band identifier. This will
be needed for extended channel switch support.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-16 15:29:43 +02:00
Daniel Borkmann
0022d2dd4d net: sctp: minor: make sctp_ep_common's member 'dead' a bool
Since dead only holds two states (0,1), make it a bool instead
of a 'char', which is more appropriate for its purpose.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-15 14:11:37 -04:00
Daniel Borkmann
ff2266cddd net: sctp: remove sctp_ep_common struct member 'malloced'
There is actually no need to keep this member in the structure, because
after init it's always 1 anyway, thus always kfree called. This seems to
be an ancient leftover from the very initial implementation from 2.5
times. Only in case the initialization of an association fails, we leave
base.malloced as 0, but we nevertheless kfree it in the error path in
sctp_association_new().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-15 14:11:37 -04:00
Daniel Borkmann
bf84a01063 net: sock: make sock_tx_timestamp void
Currently, sock_tx_timestamp() always returns 0. The comment that
describes the sock_tx_timestamp() function wrongly says that it
returns an error when an invalid argument is passed (from commit
20d4947353, ``net: socket infrastructure for SO_TIMESTAMPING'').
Make the function void, so that we can also remove all the unneeded
if conditions that check for such a _non-existant_ error case in the
output path.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-14 15:41:49 -04:00
Cong Wang
f88c91ddba ipv6: statically link register_inet6addr_notifier()
Tomas reported the following build error:

net/built-in.o: In function `ieee80211_unregister_hw':
(.text+0x10f0e1): undefined reference to `unregister_inet6addr_notifier'
net/built-in.o: In function `ieee80211_register_hw':
(.text+0x10f610): undefined reference to `register_inet6addr_notifier'
make: *** [vmlinux] Error 1

when built IPv6 as a module.

So we have to statically link these symbols.

Reported-by: Tomas Melin <tomas.melin@iki.fi>
Cc: Tomas Melin <tomas.melin@iki.fi>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: YOSHIFUJI Hidaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-14 15:24:17 -04:00
Eric Dumazet
d6a4a10411 tcp: GSO should be TSQ friendly
I noticed that TSQ (TCP Small queues) was less effective when TSO is
turned off, and GSO is on. If BQL is not enabled, TSQ has then no
effect.

It turns out the GSO engine frees the original gso_skb at the time the
fragments are generated and queued to the NIC.

We should instead call the tcp_wfree() destructor for the last fragment,
to keep the flow control as intended in TSQ. This effectively limits
the number of queued packets on qdisc + NIC layers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-12 18:17:06 -04:00
Samuel Ortiz
be055b2f89 NFC: RFKILL support
All NFC devices will now get proper RFKILL support as long as they provide
some dev_up and dev_down hooks. Rfkilling an NFC device will bring it down
while it is left to userspace to bring it back up when being rfkill unblocked.
This is very similar to what Bluetooth does.

Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-12 16:54:45 +02:00
David S. Miller
16e3d9648a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1)  Allow to avoid copying DSCP during encapsulation
    by setting a SA flag. From Nicolas Dichtel.

2) Constify the netlink dispatch table, no need to modify it
   at runtime. From Mathias Krause.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-11 16:14:37 -04:00
David Herrmann
76a68ba0ae Bluetooth: rename hci_conn_put to hci_conn_drop
We use _get() and _put() for device ref-counting in the kernel. However,
hci_conn_put() is _not_ used for ref-counting, hence, rename it to
hci_conn_drop() so we can later fix ref-counting and introduce
hci_conn_put().

hci_conn_hold() and hci_conn_put() are currently used to manage how long a
connection should be held alive. When the last user drops the connection,
we spawn a delayed work that performs the disconnect. Obviously, this has
nothing to do with ref-counting for the _object_ but rather for the
keep-alive of the connection.

But we really _need_ proper ref-counting for the _object_ to allow
connection-users like rfcomm-tty, HIDP or others.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-11 16:34:15 -03:00
Marek Puzyniak
0ca54f6c5f mac80211: provide SSID in IBSS mode
Some drivers need SSID in AP and IBSS mode. AP SSID is provided
through BSS_CHANGED_SSID notification. There was no easy way to
do the same for IBSS. In IBSS mode SSID is known but was not
stored in BSS configuration. Extend the AP-mode functionality
to also work in IBSS mode.

Signed-off-by: Marek Puzyniak <marek.puzyniak@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-10 20:24:18 +02:00
John W. Linville
655d8e2328 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	drivers/net/wireless/ath/carl9170/debug.c
	drivers/net/wireless/ath/carl9170/main.c
	net/mac80211/ieee80211_i.h
2013-04-10 14:09:54 -04:00
John W. Linville
d3641409a0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/rt2x00/rt2x00pci.c
	net/mac80211/sta_info.c
	net/wireless/core.h
2013-04-10 10:39:27 -04:00
Al Viro
c10c062cad bluetooth: kill unused fops field in struct bt_sock_list
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:13:37 -04:00
Al Viro
b03166152f bluetooth: kill unused 'module' argument of bt_procfs_init()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:13:36 -04:00
Daniel Borkmann
1b86643411 net: sctp: introduce uapi header for sctp
This patch introduces an UAPI header for the SCTP protocol,
so that we can facilitate the maintenance and development of
user land applications or libraries, in particular in terms
of header synchronization.

To not break compatibility, some fragments from lksctp-tools'
netinet/sctp.h have been carefully included, while taking care
that neither kernel nor user land breaks, so both compile fine
with this change (for lksctp-tools I tested with the old
netinet/sctp.h header and with a newly adapted one that includes
the uapi sctp header). lksctp-tools smoke test run through
successfully as well in both cases.

Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:19:39 -04:00
Zefan Li
6ffd464102 netprio_cgroup: remove task_struct parameter from sock_update_netprio()
The callers always pass current to sock_update_netprio().

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:19:37 -04:00
Zefan Li
211d2f97e9 cls_cgroup: remove task_struct parameter from sock_update_classid()
The callers always pass current to sock_update_classid().

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:19:35 -04:00
Daniel Borkmann
617fe29d45 net: ipv6: only invalidate previously tokenized addresses
Instead of invalidating all IPv6 addresses with global scope
when one decides to use IPv6 tokens, we should only invalidate
previous tokens and leave the rest intact until they expire
eventually (or are intact forever). For doing this less greedy
approach, we're adding a bool at the end of inet6_ifaddr structure
instead, for two reasons: i) per-inet6_ifaddr flag space is
already used up, making it wider might not be a good idea,
since ii) also we do not necessarily need to export this
information into user space.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:12:23 -04:00
Ursula Braun
f9c41a62bb af_iucv: fix recvmsg by replacing skb_pull() function
When receiving data messages, the "BUG_ON(skb->len < skb->data_len)" in
the skb_pull() function triggers a kernel panic.

Replace the skb_pull logic by a per skb offset as advised by
Eric Dumazet.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 17:16:57 -04:00
Daniel Borkmann
f53adae4ea net: ipv6: add tokenized interface identifier support
This patch adds support for IPv6 tokenized IIDs, that allow
for administrators to assign well-known host-part addresses
to nodes whilst still obtaining global network prefix from
Router Advertisements. It is currently in draft status.

  The primary target for such support is server platforms
  where addresses are usually manually configured, rather
  than using DHCPv6 or SLAAC. By using tokenised identifiers,
  hosts can still determine their network prefix by use of
  SLAAC, but more readily be automatically renumbered should
  their network prefix change. [...]

  The disadvantage with static addresses is that they are
  likely to require manual editing should the network prefix
  in use change.  If instead there were a method to only
  manually configure the static identifier part of the IPv6
  address, then the address could be automatically updated
  when a new prefix was introduced, as described in [RFC4192]
  for example.  In such cases a DNS server might be
  configured with such a tokenised interface identifier of
  ::53, and SLAAC would use the token in constructing the
  interface address, using the advertised prefix. [...]

  http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02

The implementation is partially based on top of Mark K.
Thompson's proof of concept. However, it uses the Netlink
interface for configuration resp. data retrival, so that
it can be easily extended in future. Successfully tested
by myself.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 16:55:28 -04:00
Werner Almesberger
56aa091d60 ieee802154/nl-mac.c: make some MLME operations optional
Check for NULL before calling the following operations from "struct
ieee802154_mlme_ops": assoc_req, assoc_resp, disassoc_req, start_req,
and scan_req.

This fixes a current oops where those functions are called but not
implemented. It also updates the documentation to clarify that they
are now optional by design. If a call to an unimplemented function
is attempted, the kernel returns EOPNOTSUPP via netlink.

The following operations are still required: get_phy, get_pan_id,
get_short_addr, and get_dsn.

Note that the places where this patch changes the initialization
of "ret" should not affect the rest of the code since "ret" was
always set (again) before returning its value.

Signed-off-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 12:00:16 -04:00
Werner Almesberger
d87c8c6d15 IEEE 802.15.4: remove get_bsn from "struct ieee802154_mlme_ops"
It served no purpose: we never call it from anywhere in the stack
and the only driver that did implement it (fakehard) merely provided
a dummy value.

There is also considerable doubt whether it would make sense to
even attempt beacon processing at this level in the Linux kernel.

Signed-off-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 12:00:16 -04:00
Eric W. Biederman
6b0ee8c036 scm: Stop passing struct cred
Now that uids and gids are completely encapsulated in kuid_t
and kgid_t we no longer need to pass struct cred which allowed
us to test both the uid and the user namespace for equality.

Passing struct cred potentially allows us to pass the entire group
list as BSD does but I don't believe the cost of cache line misses
justifies retaining code for a future potential application.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 18:58:55 -04:00
David S. Miller
d16658206a Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter and IPVS updates for
your net-next tree, most relevantly they are:

* Add net namespace support to NFLOG, ULOG and ebt_ulog and NFQUEUE.
  The LOG and ebt_log target has been also adapted, but they still
  depend on the syslog netnamespace that seems to be missing, from
  Gao Feng.

* Don't lose indications of congestion in IPv6 fragmentation handling,
  from Hannes Frederic Sowa.i

* IPVS conversion to use RCU, including some code consolidation patches
  and optimizations, also some from Julian Anastasov.

* cpu fanout support for NFQUEUE, from Holger Eitzenberger.

* Better error reporting to userspace when dropping packets from
  all our _*_[xfrm|route]_me_harder functions, from Patrick McHardy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 12:22:06 -04:00
David Herrmann
b3916db32c Bluetooth: hidp: verify l2cap sockets
We need to verify that the given sockets actually are l2cap sockets. If
they aren't, we are not supposed to access bt_sk(sock) and we shouldn't
start the session if the offsets turn out to be valid local BT addresses.

That is, if someone passes a TCP socket to HIDCONNADD, then we access some
random offset in the TCP socket (which isn't even guaranteed to be valid).

Fix this by checking that the socket is an l2cap socket.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-05 23:44:14 -03:00
Gao feng
30e0c6a6be netfilter: nf_log: prepare net namespace support for loggers
This patch adds netns support to nf_log and it prepares netns
support for existing loggers. It is composed of four major
changes.

1) nf_log_register has been split to two functions: nf_log_register
   and nf_log_set. The new nf_log_register is used to globally
   register the nf_logger and nf_log_set is used for enabling
   pernet support from nf_loggers.

   Per netns is not yet complete after this patch, it comes in
   separate follow up patches.

2) Add net as a parameter of nf_log_bind_pf. Per netns is not
   yet complete after this patch, it only allows to bind the
   nf_logger to the protocol family from init_net and it skips
   other cases.

3) Adapt all nf_log_packet callers to pass netns as parameter.
   After this patch, this function only works for init_net.

4) Make the sysctl net/netfilter/nf_log pernet.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:12:54 +02:00
Gao feng
f3c1a44a22 netfilter: make /proc/net/netfilter pernet
This patch makes this proc dentry pernet. So far only init_net
had a /proc/net/netfilter directory.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 19:35:02 +02:00
Jesper Dangaard Brouer
19952cc4f8 net: frag queue per hash bucket locking
This patch implements per hash bucket locking for the frag queue
hash.  This removes two write locks, and the only remaining write
lock is for protecting hash rebuild.  This essentially reduce the
readers-writer lock to a rebuild lock.

This patch is part of "net: frag performance followup"
 http://thread.gmane.org/gmane.linux.network/263644
of which two patches have already been accepted:

Same test setup as previous:
 (http://thread.gmane.org/gmane.linux.network/257155)
 Two 10G interfaces, on seperate NUMA nodes, are under-test, and uses
 Ethernet flow-control.  A third interface is used for generating the
 DoS attack (with trafgen).

Notice, I have changed the frag DoS generator script to be more
efficient/deadly.  Before it would only hit one RX queue, now its
sending packets causing multi-queue RX, due to "better" RX hashing.

Test types summary (netperf UDP_STREAM):
 Test-20G64K     == 2x10G with 65K fragments
 Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
 Test-20G64K+DoS == Same as 20G64K with frag DoS
 Test-20G3F+DoS  == Same as 20G3F  with frag DoS
 Test-20G64K+MQ  == Same as 20G64K with Multi-Queue frag DoS
 Test-20G3F+MQ   == Same as 20G3F  with Multi-Queue frag DoS

When I rebased this-patch(03) (on top of net-next commit a210576c) and
removed the _bh spinlock, I saw a performance regression.  BUT this
was caused by some unrelated change in-between.  See tests below.

Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
Test (B) verifying-retest of commit 1b5ab0de corrospond to patch-02.
Test (C) is what I reported before for this-patch

Test (D) is net-next master HEAD (commit a210576c), which reveals some
(unknown) performance regression (compared against test (B)).
Test (D) function as a new base-test.

Performance table summary (in Mbit/s):

(#) Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
    ----------  -------   -------  ----------  ---------  --------  -------
(A) Patch-02  : 18848.7   13230.1   4103.04     5310.36     130.0    440.2
(B) 1b5ab0de  : 18841.5   13156.8   4101.08     5314.57     129.0    424.2
(C) Patch-03v1: 18838.0   13490.5   4405.11     6814.72     196.6    461.6

(D) a210576c  : 18321.5   11250.4   3635.34     5160.13     119.1    405.2
(E) with _bh  : 17247.3   11492.6   3994.74     6405.29     166.7    413.6
(F) without bh: 17471.3   11298.7   3818.05     6102.11     165.7    406.3

Test (E) and (F) is this-patch(03), with(V1) and without(V2) the _bh spinlocks.

I cannot explain the slow down for 20G64K (but its an artificial
"lab-test" so I'm not worried).  But the other results does show
improvements.  And test (E) "with _bh" version is slightly better.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>

----
V2:
- By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
  need the spinlock _bh versions, as Netfilter currently does a
  local_bh_disable() before entering inet_fragment.
- Fold-in desc from cover-mail
V3:
- Drop the chain_len counter per hash bucket.
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-04 17:37:05 -04:00
Marcel Holtmann
5afff03815 Bluetooth: Remove driver init queue from core
The driver init queue is no longer needed. This can be all handled
inside the drivers now. So remove it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-04-04 19:28:25 +03:00
Marcel Holtmann
f41c70c4d5 Bluetooth: Add driver setup stage for early init
Some drivers require a special stage for their early init. This is
always specific to the driver or transport. So call back into driver to
allow bringing up the device.

The advantage with this stage is that the Bluetooth core is actually
handling the HCI layer now. This means that command and event processing
is available.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-04-04 19:16:12 +03:00
Johan Hedberg
7b1abbbed0 Bluetooth: Add __hci_cmd_sync_ev function
This patch adds a __hci_cmd_sync_ev function, analogous to
__hci_cmd_sync except that it also takes an event parameter to indicate
that the command completes with a special event instead of command
complete. Internally this new function takes advantage of the
hci_req_add_ev function introduced in the previous patch.

The primary expected user of this new function are the setup routines of
HCI drivers which may want to send custom commands and return only when
they have completed.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
2013-04-04 19:16:10 +03:00
Johan Hedberg
02350a725f Bluetooth: Add support for custom event terminated commands
This patch adds support for having commands within HCI requests that do
not result in a command complete but some other event. This is at least
needed for some vendor specific commands to be issued in the
hdev->setup() procecure, but might also be useful for other commands.

The way that the support is implemented is by extending the skb control
buffer to have a field to indicate that the command is expected to
terminate with a special event. After sending the command each received
event can then be compared against this field through hdev->sent_cmd.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
2013-04-04 19:16:08 +03:00
Johan Hedberg
75e84b7c52 Bluetooth: Add __hci_cmd_sync() helper function
This patch adds a helper function for sending a single HCI command
waiting for its completion and then returning back the parameters in the
resulting command complete event (if there was one).

The implementation is very similar to that of hci_req_sync() except that
instead of invocing a callback for sending HCI commands the function
constructs and sends one itself and after being woken up picks the last
received event from hdev->recv_evt (if it matches the right criteria)
and returns it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
2013-04-04 19:16:06 +03:00
Johan Hedberg
b6ddb63823 Bluetooth: Track received events in hdev
This patch adds tracking of received HCI events to the hci_dev struct.
This is necessary so that a subsequent patch can implement a function
for sending a single command synchronously and returning the resulting
command complete parameters in the function return value.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
2013-04-04 19:16:04 +03:00
Andre Guedes
d4299ce6b3 Bluetooth: Remove unneeded hci_req_cmd_status function
This patch removes the hci_req_cmd_status function since it is not
used anymore. The HCI request framework now considers the HCI command
has complete once the Command Status or Command Complete Event is
received.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-04-04 11:12:34 +03:00
Neal Cardwell
8466563e16 tcp: Remove dead sysctl_tcp_cookie_size declaration
Remove a declaration left over from the TCPCT-ectomy. This sysctl is
no longer referenced anywhere since 1a2c6181c4 ("tcp: Remove TCPCT").

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 14:26:50 -04:00
Julian Anastasov
ceec4c3816 ipvs: convert services to rcu
This is the final step in RCU conversion.

Things that are removed:

- svc->usecnt: now svc is accessed under RCU read lock
- svc->inc: and some unused code
- ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE
- __ip_vs_svc_lock: replaced with RCU
- IP_VS_WAIT_WHILE: now readers lookup svcs and dests under
	RCU and work in parallel with configuration

Other changes:

- before now, a RCU read-side critical section included the
calling of the schedule method, now it is extended to include
service lookup
- ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist
- svc->pe and svc->scheduler remain to the end (of grace period),
	the schedulers are prepared for such RCU readers
	even after done_service is called but they need
	to use synchronize_rcu because last ip_vs_scheduler_put
	can happen while RCU read-side critical sections
	use an outdated svc->scheduler pointer
- as planned, update_service is removed
- empty services can be freed immediately after grace period.
	If dests were present, the services are freed from
	the dest trash code

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:58 +02:00
Julian Anastasov
413c2d04e9 ipvs: convert dests to rcu
In previous commits the schedulers started to access
svc->destinations with _rcu list traversal primitives
because the IP_VS_WAIT_WHILE macro still plays the role of
grace period. Now it is time to finish the updating part,
i.e. adding and deleting of dests with _rcu suffix before
removing the IP_VS_WAIT_WHILE in next commit.

We use the same rule for conns as for the
schedulers: dests can be searched in RCU read-side critical
section where ip_vs_dest_hold can be called by ip_vs_bind_dest.

Some things are not perfect, for example, calling
functions like ip_vs_lookup_dest from updating code under
RCU, just because we use some function both from reader
and from updater.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:57 +02:00
Julian Anastasov
ba3a3ce14e ipvs: convert sched_lock to spin lock
As all read_locks are gone spin lock is preferred.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:56 +02:00
Julian Anastasov
ed3ffc4e48 ipvs: do not expect result from done_service
This method releases the scheduler state,
it can not fail. Such change will help to properly
replace the scheduler in following patch.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:56 +02:00
Julian Anastasov
578bc3ef1e ipvs: reorganize dest trash
All dests will go to trash, no exceptions.
But we have to use new list node t_list for this, due
to RCU changes in following patches. Dests will wait there
initial grace period and later all conns and schedulers to
put their reference. The dests don't get reference for
staying in dest trash as before.

	As result, we do not load ip_vs_dest_put with
extra checks for last refcnt and the schedulers do not
need to play games with atomic_inc_not_zero while
selecting best destination.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:55 +02:00
Julian Anastasov
fca9c20ae1 ipvs: add ip_vs_dest_hold and ip_vs_dest_put
ip_vs_dest_hold will be used under RCU lock
while ip_vs_dest_put can be called even after dest
is removed from service, as it happens for conns and
some schedulers.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:48 +02:00
Julian Anastasov
6b6df46663 ipvs: preparations for using rcu in schedulers
Allow schedulers to use rcu_dereference when
returning destination on lookup. The RCU read-side critical
section will allow ip_vs_bind_dest to get dest refcnt as
preparation for the step where destinations will be
deleted without an IP_VS_WAIT_WHILE guard that holds the
packet processing during update.

	Add new optional scheduler methods add_dest,
del_dest and upd_dest. For now the methods are called
together with update_service but update_service will be
removed in a following change.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:47 +02:00
Julian Anastasov
9a05475ceb ipvs: avoid kmem_cache_zalloc in ip_vs_conn_new
We have many fields to set and few to reset,
use kmem_cache_alloc instead to save some cycles.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:46 +02:00
Julian Anastasov
1845ed0bb2 ipvs: reorder keys in connection structure
__ip_vs_conn_in_get and ip_vs_conn_out_get are
hot places. Optimize them, so that ports are matched first.
By moving net and fwmark below, on 32-bit arch we can fit
caddr in 32-byte cache line and all addresses in 64-byte
cache line.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:45 +02:00
Julian Anastasov
088339a57d ipvs: convert connection locking
Convert __ip_vs_conntbl_lock_array as follows:

- readers that do not modify conn lists will use RCU lock
- updaters that modify lists will use spinlock_t

Now for conn lookups we will use RCU read-side
critical section. Without using __ip_vs_conn_get such
places have access to connection fields and can
dereference some pointers like pe and pe_data plus
the ability to update timer expiration. If full access
is required we contend for reference.

We add barrier in __ip_vs_conn_put, so that
other CPUs see the refcnt operation after other writes.

With the introduction of ip_vs_conn_unlink()
we try to reorganize ip_vs_conn_expire(), so that
unhashing of connections that should stay more time is
avoided, even if it is for very short time.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:45 +02:00
Julian Anastasov
276472eae0 ipvs: remove rs_lock by using RCU
rs_lock was used to protect rs_table (hash table)
from updaters (under global mutex) and readers (packet handlers).
We can remove rs_lock by using RCU lock for readers. Reclaiming
dest only with kfree_rcu is enough because the readers access
only fields from the ip_vs_dest structure.

Use hlist for rs_table.

As we are now using hlist_del_rcu, introduce in_rs_table
flag as replacement for the list_empty checks which do not
work with RCU. It is needed because only NAT dests are in
the rs_table.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:43 +02:00
Julian Anastasov
363c97d743 ipvs: convert app locks
We use locks like tcp_app_lock, udp_app_lock,
sctp_app_lock to protect access to the protocol hash tables
from readers in packet context while the application
instances (inc) are [un]registered under global mutex.

As the hash tables are mostly read when conns are
created and bound to app, use RCU for readers and reclaim
app instance after grace period.

Simplify ip_vs_app_inc_get because we use usecnt
only for statistics and rely on module refcounting.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:43 +02:00
Julian Anastasov
026ace060d ipvs: optimize dst usage for real server
Currently when forwarding requests to real servers
we use dst_lock and atomic operations when cloning the
dst_cache value. As the dst_cache value does not change
most of the time it is better to use RCU and to lock
dst_lock only when we need to replace the obsoleted dst.
For this to work we keep dst_cache in new structure protected
by RCU. For packets to remote real servers we will use noref
version of dst_cache, it will be valid while we are in RCU
read-side critical section because now dst_release for replaced
dsts will be invoked after the grace period. Packets to
local real servers that are passed to local stack with
NF_ACCEPT need a dst clone.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:42 +02:00
Julian Anastasov
d1deae4d3a ipvs: rename functions related to dst_cache reset
Move and give better names to two functions:

- ip_vs_dst_reset to __ip_vs_dst_cache_reset
- __ip_vs_dev_reset to ip_vs_forget_dev

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:39 +02:00
Julian Anastasov
c90558dae5 ipvs: avoid routing by TOS for real server
Avoid replacing the cached route for real server
on every packet with different TOS. I doubt that routing
by TOS for real server is used at all, so we should be
better with such optimization.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:37 +02:00
Keller, Jacob E
7d4c04fc17 net: add option to enable error queue packets waking select
Currently, when a socket receives something on the error queue it only wakes up
the socket on select if it is in the "read" list, that is the socket has
something to read. It is useful also to wake the socket if it is in the error
list, which would enable software to wait on error queue packets without waking
up for regular data on the socket. The main use case is for receiving
timestamped transmit packets which return the timestamp to the socket via the
error queue. This enables an application to select on the socket for the error
queue only instead of for the regular traffic.

-v2-
* Added the SO_SELECT_ERR_QUEUE socket option to every architechture specific file
* Modified every socket poll function that checks error queue

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jeffrey Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Matthew Vick <matthew.vick@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-31 19:44:20 -04:00
Jesper Dangaard Brouer
1b5ab0def4 net: use the frag lru_lock to protect netns_frags.nqueues update
Move the protection of netns_frags.nqueues updates under the LRU_lock,
instead of the write lock.  As they are located on the same cacheline,
and this is also needed when transitioning to use per hash bucket locking.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 12:48:33 -04:00
Pravin B Shelar
330305cc4a ipv4: Fix ip-header identification for gso packets.
ip-header id needs to be incremented even if IP_DF flag is set.
This behaviour was changed in commit 490ab08127
(IP_GRE: Fix IP-Identification).

Following patch fixes it so that identification is always
incremented.

Reported-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 13:50:05 -04:00
YOSHIFUJI Hideaki / 吉藤英明
6752c8db8e firewire net, ipv4 arp: Extend hardware address and remove driver-level packet inspection.
Inspection of upper layer protocol is considered harmful, especially
if it is about ARP or other stateful upper layer protocol; driver
cannot (and should not) have full state of them.

IPv4 over Firewire module used to inspect ARP (both in sending path
and in receiving path), and record peer's GUID, max packet size, max
speed and fifo address.  This patch removes such inspection by extending
our "hardware address" definition to include other information as well:
max packet size, max speed and fifo.  By doing this, The neighbour
module in networking subsystem can cache them.

Note: As we have started ignoring sspd and max_rec in ARP/NDP, those
      information will not be used in the driver when sending.

When a packet is being sent, the IP layer fills our pseudo header with
the extended "hardware address", including GUID and fifo.  The driver
can look-up node-id (the real but rather volatile low-level address)
by GUID, and then the module can send the packet to the wire using
parameters provided in the extendedn hardware address.

This approach is realistic because IP over IEEE1394 (RFC2734) and IPv6
over IEEE1394 (RFC3146) share same "hardware address" format
in their address resolution protocols.

Here, extended "hardware address" is defined as follows:

union fwnet_hwaddr {
	u8 u[16];
	struct {
		__be64 uniq_id;		/* EUI-64			*/
		u8 max_rec;		/* max packet size		*/
		u8 sspd;		/* max speed			*/
		__be16 fifo_hi;		/* hi 16bits of FIFO addr	*/
		__be32 fifo_lo;		/* lo 32bits of FIFO addr	*/
	} __packed uc;
};

Note that Hardware address is declared as union, so that we can map full
IP address into this, when implementing MCAP (Multicast Cannel Allocation
Protocol) for IPv6, but IP and ARP subsystem do not need to know this
format in detail.

One difference between original ARP (RFC826) and 1394 ARP (RFC2734)
is that 1394 ARP Request/Reply do not contain the target hardware address
field (aka ar$tha).  This difference is handled in the ARP subsystem.

CC: Stephan Gatzka <stephan.gatzka@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:32:13 -04:00
Pravin B Shelar
c544193214 GRE: Refactor GRE tunneling code.
Following patch refactors GRE code into ip tunneling code and GRE
specific code. Common tunneling code is moved to ip_tunnel module.
ip_tunnel module is written as generic library which can be used
by different tunneling implementations.

ip_tunnel module contains following components:
 - packet xmit and rcv generic code. xmit flow looks like
   (gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out.
 - hash table of all devices.
 - lookup for tunnel devices.
 - control plane operations like device create, destroy, ioctl, netlink
   operations code.
 - registration for tunneling modules, like gre, ipip etc.
 - define single pcpu_tstats dev->tstats.
 - struct tnl_ptk_info added to pass parsed tunnel packet parameters.

ipip.h header is renamed to ip_tunnel.h

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:27:18 -04:00
John W. Linville
48b81cc1d9 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2013-03-25 16:39:06 -04:00
John W. Linville
c78b3841fa Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-03-25 16:38:02 -04:00
Karl Beldan
675a0b049a mac80211: Use a cfg80211_chan_def in ieee80211_hw_conf_chan
Drivers that don't use chanctxes cannot perform VHT association because
they still use a "backward compatibility" pair of {ieee80211_channel,
nl80211_channel_type} in ieee80211_conf and ieee80211_local.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
[fix kernel-doc]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-25 19:19:35 +01:00
Pravin B Shelar
25c7704d8b ipv4: Fix ip-header identification for gso packets.
ip-header id needs to be incremented even if IP_DF flag is set.
This behaviour was changed in commit 490ab08127
(IP_GRE: Fix IP-Identification).

Following patch fixes it so that identification is always
incremented.

Reported-by: Cong Wang <amwang@redhat.com>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2013-03-25 12:30:25 -04:00
David S. Miller
da13482534 Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter/IPVS updates for
your net-next tree, they are:

* Better performance in nfnetlink_queue by avoiding copy from the
  packet to netlink message, from Eric Dumazet.

* Remove unnecessary locking in the exit path of ebt_ulog, from Gao Feng.

* Use new function ipv6_iface_scope_id in nf_ct_ipv6, from Hannes Frederic Sowa.

* A couple of sparse fixes for IPVS, from Julian Anastasov.

* Use xor hashing in nfnetlink_queue, as suggested by Eric Dumazet, from
  myself.

* Allow to dump expectations per master conntrack via ctnetlink, from myself.

* A couple of cleanups to use PTR_RET in module init path, from Silviu-Mihai
  Popescu.

* Remove nf_conntrack module a bit faster if netns are in use, from
  Vladimir Davydov.

* Use checksum_partial in ip6t_NPT, from YOSHIFUJI Hideaki.

* Sparse fix for nf_conntrack, from Stephen Hemminger.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25 12:11:44 -04:00
Alexander Bondar
219c38674c mac80211: allow drivers to set default uAPSD parameters
mac80211 currently sets uAPSD parameters to have VO AC trigger-
and delivery-enabled, with maximum service period length.

Allow drivers to change these default settings since different
uAPSD client implementations may handle errors differently and
be able to recover from some errors.

Note: some APs may not function correctly if one or all ACs are
trigger- and delivery-enabled, see
http://thread.gmane.org/gmane.linux.kernel.wireless.general/93577.
We retested with this AP and later firmware doesn't have this
bug any more.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-25 14:43:05 +01:00
Jouni Malinen
8bf2429345 cfg80211: Document update_ft_ies() cfg80211_ops
This was forgotten from the commit that added support for FT
operations with drivers that implement SME.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-25 10:21:36 +01:00
Hannes Frederic Sowa
eec2e6185f ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling
Hello!

After patch 1 got accepted to net-next I will also send a patch to
netfilter-devel to make the corresponding changes to the netfilter
reassembly logic.

Thanks,

  Hannes

-- >8 --
[PATCH 2/2] ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling

This patch also ensures that INET_ECN_CE is propagated if one fragment
had the codepoint set.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:16:30 -04:00
Hannes Frederic Sowa
be991971d5 inet: generalize ipv4-only RFC3168 5.3 ecn fragmentation handling for future use by ipv6
This patch just moves some code arround to make the ip4_frag_ecn_table
and IPFRAG_ECN_* constants accessible from the other reassembly engines. I
also renamed ip4_frag_ecn_table to ip_frag_ecn_table.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:16:30 -04:00
Nicolas Dichtel
63998ac24f ipv6: provide addr and netconf dump consistency info
This patch adds a dev_addr_genid for IPv6. The goal is to use it, combined with
dev_base_seq to check if a change occurs during a netlink dump.
If a change is detected, the flag NLM_F_DUMP_INTR is set in the first message
after the dump was interrupted.

Note that only dump of unicast addresses is checked (multicast and anycast are
not checked).

Reported-by: Junwei Zhang <junwei.zhang@6wind.com>
Reported-by: Hongjun Li <hongjun.li@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:16:29 -04:00
Thomas Graf
661d2967b3 rtnetlink: Remove passing of attributes into rtnl_doit functions
With decnet converted, we can finally get rid of rta_buf and its
computations around it. It also gets rid of the minimal header
length verification since all message handlers do that explicitly
anyway.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22 10:31:16 -04:00
Thomas Graf
58d7d8f9b2 decnet: Parse netlink attributes on our own
decnet is the only subsystem left that is relying on the global
netlink attribute buffer rta_buf. It's horrible design and we
want to get rid of it.

This converts all of decnet to do implicit attribute parsing. It
also gets rid of the error prone struct dn_kern_rta.

Yes, the fib_magic() stuff is not pretty.

It's compiled tested but I need someone with appropriate hardware
to test the patch since I don't have access to it.

Cc: linux-decnet-user@lists.sourceforge.net
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22 10:31:16 -04:00
Janusz Dziedzic
67baf66339 mac80211: add P2P NoA settings
Add P2P NoA settings for STA mode.

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
[fix docs]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-22 14:13:42 +01:00
Yuchung Cheng
9b44190dc1 tcp: refactor F-RTO
The patch series refactor the F-RTO feature (RFC4138/5682).

This is to simplify the loss recovery processing. Existing F-RTO
was developed during the experimental stage (RFC4138) and has
many experimental features.  It takes a separate code path from
the traditional timeout processing by overloading CA_Disorder
instead of using CA_Loss state. This complicates CA_Disorder state
handling because it's also used for handling dubious ACKs and undos.
While the algorithm in the RFC does not change the congestion control,
the implementation intercepts congestion control in various places
(e.g., frto_cwnd in tcp_ack()).

The new code implements newer F-RTO RFC5682 using CA_Loss processing
path.  F-RTO becomes a small extension in the timeout processing
and interfaces with congestion control and Eifel undo modules.
It lets congestion control (module) determines how many to send
independently.  F-RTO only chooses what to send in order to detect
spurious retranmission. If timeout is found spurious it invokes
existing Eifel undo algorithms like DSACK or TCP timestamp based
detection.

The first patch removes all F-RTO code except the sysctl_tcp_frto is
left for the new implementation.  Since CA_EVENT_FRTO is removed, TCP
westwood now computes ssthresh on regular timeout CA_EVENT_LOSS event.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21 11:47:50 -04:00
John W. Linville
5470b462c3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-03-20 15:24:57 -04:00
David S. Miller
61816596d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull in the 'net' tree to get Daniel Borkmann's flow dissector
infrastructure change.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:46:26 -04:00
Daniel Borkmann
8ed781668d flow_keys: include thoff into flow_keys for later usage
In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're
doing the work here anyway, also store thoff for a later usage, e.g. in
the BPF filter.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:14:36 -04:00
David S. Miller
90b2621fd4 Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains 7 Netfilter/IPVS fixes for 3.9-rc, they are:

* Restrict IPv6 stateless NPT targets to the mangle table. Many users are
  complaining that this target does not work in the nat table, which is the
  wrong table for it, from Florian Westphal.

* Fix possible use before initialization in the netns init path of several
  conntrack protocol trackers (introduced recently while improving conntrack
  netns support), from Gao Feng.

* Fix incorrect initialization of copy_range in nfnetlink_queue, spotted
  by Eric Dumazet during the NFWS2013, patch from myself.

* Fix wrong calculation of next SCTP chunk in IPVS, from Julian Anastasov.

* Remove rcu_read_lock section in IPVS while calling ipv4_update_pmtu
  not required anymore after change introduced in 3.7, again from Julian.

* Fix SYN looping in IPVS state sync if the backup is used a real server
  in DR/TUN modes, this required a new /proc entry to disable the director
  function when acting as backup, also from Julian.

* Remove leftover IP_NF_QUEUE Kconfig after ip_queue removal, noted by
  Paul Bolle.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 10:23:52 -04:00
Vladimir Davydov
dece40e848 netfilter: nf_conntrack: speed up module removal path if netns in use
The patch introduces nf_conntrack_cleanup_net_list(), which cleanups
nf_conntrack for a list of netns and calls synchronize_net() only once
for them all. This should reduce netns destruction time.

I've measured cleanup time for 1k dummy net ns. Here are the results:

 <without the patch>
 # modprobe nf_conntrack
 # time modprobe -r nf_conntrack

 real	0m10.337s
 user	0m0.000s
 sys	0m0.376s

 <with the patch>
 # modprobe nf_conntrack
 # time modprobe -r nf_conntrack

 real    0m5.661s
 user    0m0.000s
 sys     0m0.216s

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-19 17:08:31 +01:00
Hannes Frederic Sowa
5a3da1fe95 inet: limit length of fragment queue hash table bucket lists
This patch introduces a constant limit of the fragment queue hash
table bucket list lengths. Currently the limit 128 is choosen somewhat
arbitrary and just ensures that we can fill up the fragment cache with
empty packets up to the default ip_frag_high_thresh limits. It should
just protect from list iteration eating considerable amounts of cpu.

If we reach the maximum length in one hash bucket a warning is printed.
This is implemented on the caller side of inet_frag_find to distinguish
between the different users of inet_fragment.c.

I dropped the out of memory warning in the ipv4 fragment lookup path,
because we already get a warning by the slab allocator.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19 10:28:36 -04:00
Julian Anastasov
0c12582fbc ipvs: add backup_only flag to avoid loops
Dmitry Akindinov is reporting for a problem where SYNs are looping
between the master and backup server when the backup server is used as
real server in DR mode and has IPVS rules to function as director.

Even when the backup function is enabled we continue to forward
traffic and schedule new connections when the current master is using
the backup server as real server. While this is not a problem for NAT,
for DR and TUN method the backup server can not determine if a request
comes from client or from director.

To avoid such loops add new sysctl flag backup_only. It can be needed
for DR/TUN setups that do not need backup and director function at the
same time. When the backup function is enabled we stop any forwarding
and pass the traffic to the local stack (real server mode). The flag
disables the director function when the backup function is enabled.

For setups that enable backup function for some virtual services and
director function for other virtual services there should be another
more complex solution to support DR/TUN mode, may be to assign
per-virtual service syncid value, so that we can differentiate the
requests.

Reported-by: Dmitry Akindinov <dimak@stalker.com>
Tested-by: German Myzovsky <lawyer@sipnet.ru>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-03-19 21:21:51 +09:00
Julian Anastasov
b962abdc65 ipvs: fix some sparse warnings
Add missing __percpu annotations and make ip_vs_net_id static.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-03-19 21:18:38 +09:00
Johannes Berg
445ea4e83e mac80211: stop queues temporarily for flushing
Sometimes queues are flushed in the middle of
operation, which can lead to driver issues.
Stop queues temporarily, while flushing, to
avoid transmitting new packets while they are
being flushed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-18 20:15:05 +01:00
Johannes Berg
39ecc01d1b mac80211: pass queue bitmap to flush operation
There are a number of situations in which mac80211 only
really needs to flush queues for one virtual interface,
and in fact during this frames might be transmitted on
other virtual interfaces. Calculate and pass a queue
bitmap to the driver so it knows which queues to flush.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-18 20:15:03 +01:00
Stanislaw Gruszka
d260ff12e7 mac80211: remove vif debugfs driver callbacks
This basically reverts commit b207cdb07f.

Now is possible to use drv_{add,remove}_interface() and vif->debugfs_dir
to create/remove per interface debugfs files. Remove redundant
callbacks.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-18 20:10:05 +01:00
Stanislaw Gruszka
ddbfe860ac mac80211: move sdata debugfs dir to vif
There is need create driver own per interface debugfs files. This is
currently done by drv_{add,remove}_interface_debugfs() callbacks. But it
is possible that after we remove interface from the driver (i.e.
on suspend) we call drv_remove_interface_debugfs() function. Fixing this
problem will require to add call drv_{add,remove}_interface_debugfs()
anytime we create and remove interface in mac80211. So it's better to
add debugfs dir dentry to vif structure to allow to create/remove
custom debugfs driver files on drv_{add,remove}_interface().

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-18 20:10:04 +01:00
Johan Hedberg
f332ec6699 Bluetooth: Add reading of page scan parameters
These parameters are related to the "fast connectable" mode that can be
changed through the mgmt interface. Not all controllers properly reset
these values with HCI_Reset so they need to be read in order to be able
to verify whether the values are correct or not before enabling page
scan.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-18 15:35:02 -03:00
Johan Hedberg
1a4d3c4b37 Bluetooth: Add proper flag for fast connectable mode
In order to be able to represent fast connectable mode in the mgmt
settings we need to have a HCI dev flag for it. This patch adds the flag
and makes sure its value is changed whenever a mgmt_set_fast_connectable
command completes.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-18 14:02:08 -03:00
Johan Hedberg
04b4edcbc9 Bluetooth: Handle AD updating through an async request
For proper control of the AD update and the related HCI commands it's
best to run the AD update through an async request instead of a
standalone HCI command. This patch changes the hci_update_ad() function
to take a request pointer and updates its users appropriately. E.g. the
function is no longer called after the init sequence but during stage 3
of the init sequence.

The TX power is read during the init sequence, so we don't need an
explicit update whenever it is read and the AD update based on the local
name should be done through the local name mgmt handler. The only other
user is the update based on enabling advertising. This part is still
kept as there is no mgmt API to enable it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-18 14:02:04 -03:00
Johan Hedberg
2cc6fb0049 Bluetooth: Add a define for the HCI persistent flags mask
We'll need to use this mask also when powering off the HCI device
so it's better to have this in a single and visible place.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-18 14:02:02 -03:00
Johan Hedberg
2908fe31cf Bluetooth: Remove useless HCI_PENDING_CLASS flag
Now that class related operations are tracked through asynchronous HCI
requests this flag is no longer needed.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-18 14:02:01 -03:00
John W. Linville
49c87cd1ea Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
Conflicts:
	net/nfc/llcp/llcp.c
2013-03-18 09:39:21 -04:00
Zhang Yanfei
3b77d6617a net: sctp: remove cast for kmalloc/kzalloc return value
remove cast for kmalloc/kzalloc return value.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-sctp@vger.kernel.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-03-18 14:16:00 +01:00
Christoph Paasch
1a2c6181c4 tcp: Remove TCPCT
TCPCT uses option-number 253, reserved for experimental use and should
not be used in production environments.
Further, TCPCT does not fully implement RFC 6013.

As a nice side-effect, removing TCPCT increases TCP's performance for
very short flows:

Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests
for files of 1KB size.

before this patch:
	average (among 7 runs) of 20845.5 Requests/Second
after:
	average (among 7 runs) of 21403.6 Requests/Second

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17 14:35:13 -04:00
Erwan Yvin
7a87590338 caif: remove caif_shm
caif_shm is an old implementation
caif_shm will be replaced by caif_virtio

[ As explained by Linus Walleij: "U5500 used this, but was cancelled
  and the silicon did not reach anyone outside ST-Ericsson.  Then for
  the next platforms, we have gone for the leaner & cleaner approach
  of using virtio, rpmesg and rproc." ]

Signed-off-by: Erwan Yvin <erwan.yvin@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Sjur Brendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17 12:16:38 -04:00
Zhouyi Zhou
aaa0c23cb9 Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug
When neighbour table is full, dst_neigh_lookup/dst_neigh_lookup_skb will return
-ENOBUFS which is absolutely non zero, while all the code in kernel which use
above functions assume failure only on zero return which will cause panic. (for
example: : https://bugzilla.kernel.org/show_bug.cgi?id=54731).

This patch corrects above error with smallest changes to kernel source code and
also correct two return value check missing bugs in drivers/infiniband/hw/cxgb4/cm.c

Tested on my x86_64 SMP machine

Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-15 09:06:58 -04:00
Denis V. Lunev
5b9e12dbf9 ipv4: fix definition of FIB_TABLE_HASHSZ
a long time ago by the commit

  commit 93456b6d77
  Author: Denis V. Lunev <den@openvz.org>
  Date:   Thu Jan 10 03:23:38 2008 -0800

    [IPV4]: Unify access to the routing tables.

the defenition of FIB_HASH_TABLE size has obtained wrong dependency:
it should depend upon CONFIG_IP_MULTIPLE_TABLES (as was in the original
code) but it was depended from CONFIG_IP_ROUTE_MULTIPATH

This patch returns the situation to the original state.

The problem was spotted by Tingwei Liu.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Tingwei Liu <tingw.liu@gmail.com>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-13 10:47:09 -04:00
Nandita Dukkipati
6ba8a3b19e tcp: Tail loss probe (TLP)
This patch series implement the Tail loss probe (TLP) algorithm described
in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The
first patch implements the basic algorithm.

TLP's goal is to reduce tail latency of short transactions. It achieves
this by converting retransmission timeouts (RTOs) occuring due
to tail losses (losses at end of transactions) into fast recovery.
TLP transmits one packet in two round-trips when a connection is in
Open state and isn't receiving any ACKs. The transmitted packet, aka
loss probe, can be either new or a retransmission. When there is tail
loss, the ACK from a loss probe triggers FACK/early-retransmit based
fast recovery, thus avoiding a costly RTO. In the absence of loss,
there is no change in the connection state.

PTO stands for probe timeout. It is a timer event indicating
that an ACK is overdue and triggers a loss probe packet. The PTO value
is set to max(2*SRTT, 10ms) and is adjusted to account for delayed
ACK timer when there is only one oustanding packet.

TLP Algorithm

On transmission of new data in Open state:
  -> packets_out > 1: schedule PTO in max(2*SRTT, 10ms).
  -> packets_out == 1: schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
  -> PTO = min(PTO, RTO)

Conditions for scheduling PTO:
  -> Connection is in Open state.
  -> Connection is either cwnd limited or no new data to send.
  -> Number of probes per tail loss episode is limited to one.
  -> Connection is SACK enabled.

When PTO fires:
  new_segment_exists:
    -> transmit new segment.
    -> packets_out++. cwnd remains same.

  no_new_packet:
    -> retransmit the last segment.
       Its ACK triggers FACK or early retransmit based recovery.

ACK path:
  -> rearm RTO at start of ACK processing.
  -> reschedule PTO if need be.

In addition, the patch includes a small variation to the Early Retransmit
(ER) algorithm, such that ER and TLP together can in principle recover any
N-degree of tail loss through fast recovery. TLP is controlled by the same
sysctl as ER, tcp_early_retrans sysctl.
tcp_early_retrans==0; disables TLP and ER.
		 ==1; enables RFC5827 ER.
		 ==2; delayed ER.
		 ==3; TLP and delayed ER. [DEFAULT]
		 ==4; TLP only.

The TLP patch series have been extensively tested on Google Web servers.
It is most effective for short Web trasactions, where it reduced RTOs by 15%
and improved HTTP response time (average by 6%, 99th percentile by 10%).
The transmitted probes account for <0.5% of the overall transmissions.

Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 08:30:34 -04:00
Alexander Bondar
488b366a45 mac80211: add driver callback for per-interface multicast filter
Some devices have multicast filter capability for each individual
virtual interface rather than just a global one. Add an interface
specific driver callback allowing such drivers to configure this.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-11 16:22:14 +02:00
Cong Wang
e8f72ea4a1 ipv6: introduce ip6tunnel_xmit() helper
Similar to iptunnel_xmit(), group these operations into a
helper function.

This by the way fixes the missing u64_stats_update_begin()
and u64_stats_update_end() for 32 bit arch.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 16:53:34 -04:00
Cong Wang
6aed0c8bf7 tunnel: use iptunnel_xmit() again
With recent patches from Pravin, most tunnels can't use iptunnel_xmit()
any more, due to ip_select_ident() and skb->ip_summed. But we can just
move these operations out of iptunnel_xmit(), so that tunnels can
use it again.

This by the way fixes a bug in vxlan (missing nf_reset()) for net-next.

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 03:05:44 -04:00
Andre Guedes
e348fe6bba Bluetooth: Make hci_req_add returning void
Since no one checks the returning value of hci_req_add and HCI
request errors are now handled in hci_req_run, we can make hci_
req_add returning void.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-09 17:11:23 -03:00
Andre Guedes
5d73e0342f Bluetooth: HCI request error handling
When we are building a HCI request with more than one HCI command
and one of the hci_req_add calls fail, we should have some cleanup
routine so the HCI commands already queued on HCI request can be
deleted. Otherwise, we will face some memory leaks issues.

This patch implements the HCI request error handling which is the
following: If a hci_req_add fails, we save the error code in hci_
request. Once hci_req_run is called, we verify the error field. If
it is different from zero, we delete all HCI commands already queued
and return the error code.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-09 17:10:47 -03:00
Hannes Frederic Sowa
b7ef213ef6 ipv6: introdcue __ipv6_addr_needs_scope_id and ipv6_iface_scope_id helper functions
__ipv6_addr_needs_scope_id checks if an ipv6 address needs to supply
a 'sin6_scope_id != 0'. 'sin6_scope_id != 0' was enforced in case
of link-local addresses. To support interface-local multicast these
checks had to be enhanced and are now consolidated into these new helper
functions.

v2:
a) migrated to struct ipv6_addr_props

v3:
a) reverted changes for ipv6_addr_props
b) test for address type instead of comparing scope

v4:
a) unchanged

Suggested-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 12:29:22 -05:00
Johan Hedberg
cecbb967b2 Bluetooth: Remove unused hdev->init_last_cmd
This variable is no longer needed (due to async HCI request support and
the conversion of hci_req_sync to use it), so it can be safely removed.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-08 10:40:27 -03:00
Johan Hedberg
42c6b129cd Bluetooth: Use async requests internally in hci_req_sync
This patch converts the hci_req_sync() procedure to internaly use the
asynchronous HCI requests.

The hci_req_sync mechanism relies on hci_req_complete() calls from
hci_event.c into hci_core.c whenever a HCI command completes. This is
very similar to what asynchronous requests do and makes the conversion
fairly straight forward by converting hci_req_complete into a request
complete callback. By this change hci_req_complete (renamed to
hci_req_sync_complete) becomes private to hci_core.c and all calls to it
can be removed from hci_event.c.

The commands in each hci_req_sync procedure are collected into their own
request by passing the hci_request pointer to the request callback
(instead of the hci_dev pointer). The one slight exception is the HCI
init request which has the special handling of HCI driver specific
initialization commands. These commands are run in their own request
prior to the "main" init request.

One other extra change that this patch must contain is the handling of
spontaneous HCI reset complete events that some controllers exhibit.
These were previously handled in the hci_req_complete function but the
right place for them now becomes the hci_req_cmd_complete function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-08 10:40:27 -03:00
Johan Hedberg
9238f36a5a Bluetooth: Add request cmd_complete and cmd_status functions
This patch introduces functions to process the HCI request state when
receiving HCI Command Status or Command Complete events. Some HCI
commands, like Inquiry do not result in a Command complete event so
special handling is needed for them. Inquiry is a particularly important
one since it is the only forseeable "non-cmd_complete" command that will
make good use of the request functionality, and its completion is either
indicated by an Inquiry Complete event of a successful Command Complete
for HCI_Inquiry_Cancel.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-08 10:40:26 -03:00
Johan Hedberg
71c76a170e Bluetooth: Introduce new hci_req_add function
This function is analogous to hci_send_cmd() but instead of directly
queuing the command to hdev->cmd_q it adds it to the local queue of the
asynchronous HCI request being build (inside struct hci_request).

This is the main function used for building asynchronous requests and
there should be one or more calls to it between calls to hci_req_init
and hci_req_run.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-08 10:40:26 -03:00
Johan Hedberg
3119ae9599 Bluetooth: Add initial skeleton for asynchronous HCI requests
This patch adds the initial definitions and functions for asynchronous
HCI requests. Asynchronous requests are essentially a group of HCI
commands together with an optional completion callback. The request is
tracked through the already existing command queue by having the
necessary context information as part of the control buffer of each skb.

The only information needed in the skb control buffer is a flag for
indicating that the skb is the start of a request as well as the
optional complete callback that should be used when the request is
complete (this will be found in the last skb of the request).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-08 10:40:26 -03:00
Dean Jenkins
08c30aca9e Bluetooth: Remove RFCOMM session refcnt
Previous commits have improved the handling of the RFCOMM session
timer and the RFCOMM session pointers such that freed RFCOMM
session structures should no longer be erroneously accessed. The
RFCOMM session refcnt now has no purpose and will be deleted by
this commit.

Note that the RFCOMM session is now deleted as soon as the
RFCOMM control channel link is no longer required. This makes the
lifetime of the RFCOMM session deterministic and absolute.
Previously with the refcnt, there was uncertainty about when
the session structure would be deleted because the relative
refcnt prevented the session structure from being deleted at will.

It was noted that the refcnt could malfunction under very heavy
real-time processor loading in embedded SMP environments. This
could cause premature RFCOMM session deletion or double session
deletion that could result in kernel crashes. Removal of the
refcnt prevents this issue.

There are 4 connection / disconnection RFCOMM session scenarios:
host initiated control link ---> host disconnected control link
host initiated ctrl link ---> remote device disconnected ctrl link
remote device initiated ctrl link ---> host disconnected ctrl link
remote device initiated ctrl link ---> remote device disc'ed ctrl link

The control channel connection procedures are independent of the
disconnection procedures. Strangely, the RFCOMM session refcnt was
applying special treatment so erroneously combining connection and
disconnection events. This commit fixes this issue by removing
some session code that used the "initiator" member of the session
structure that was intended for use with the data channels.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-08 10:40:24 -03:00
Dean Jenkins
8ff52f7d04 Bluetooth: Return RFCOMM session ptrs to avoid freed session
Unfortunately, the design retains local copies of the s RFCOMM
session pointer in various code blocks and this invites the erroneous
access to a freed RFCOMM session structure.

Therefore, return the RFCOMM session pointer back up the call stack
to avoid accessing a freed RFCOMM session structure. When the RFCOMM
session is deleted, NULL is passed up the call stack.

If active DLCs exist when the rfcomm session is terminating,
avoid a memory leak of rfcomm_dlc structures by ensuring that
rfcomm_session_close() is used instead of rfcomm_session_del().

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-08 10:40:24 -03:00
David Herrmann
be9f97f045 Bluetooth: change bt_sock_unregister() to return void
There is no reason a caller ever wants to check the return type of this
call. _Iff_ a user successfully called bt_sock_register(), they're allowed
to call bt_sock_unregister().
All other calls in the kernel (device_del, device_unregister, kfree(), ..)
that are logically equivalent return void. Lets not make callers think
they have to check the return type of this call and instead simply return
void.

We guarantee that after bt_sock_unregister() is called, the socket type
_is_ unregistered. If that is not what the caller wants, they're using the
wrong function, anyway.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-08 10:38:44 -03:00
Andre Guedes
bed7174834 Bluetooth: Rename hci_acl_disconn
As hci_acl_disconn function basically sends the HCI Disconnect Command
and it is used to disconnect ACL, SCO and LE links, renaming it to
hci_disconnect is more suitable.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-08 10:38:43 -03:00
Eric Dumazet
7f0e44ac9f ipv6 flowlabel: add __rcu annotations
Commit 18367681a1 (ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.)
omitted proper __rcu annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:33:10 -05:00
Eric Dumazet
b2fb4f54ec tcp: uninline tcp_prequeue()
tcp_prequeue() became too big to be inlined.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:22:39 -05:00
Johannes Berg
e943789edb mac80211: provide ieee80211_sta_eosp()
The irqsafe version ieee80211_sta_eosp_irqsafe() exists, but
drivers must not mix calls to any irqsafe/non-irqsafe function.
Both ath9k and iwlwifi, the likely first users of this interface,
use non-irqsafe RX/TX/TX status so must also use a non-irqsafe
version of this function. Since no driver uses the _irqsafe()
version, remove that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-07 14:33:21 +01:00
Thomas Pedersen
eef941e6d6 cfg80211: rename mesh station types
The mesh station types used to refer to whether the
station was secure or nonsecure. Really the salient
information is whether it is managed by the kernel or
userspace

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:11 +01:00
Thomas Pedersen
bb2798d45f nl80211: explicit userspace MPM
Secure mesh had the implicit requirement that the Mesh
Peering Management entity be in userspace.  However
userspace might want to implement an open MPM as well, so
specify a mesh setup parameter to indicate this.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:11 +01:00
Johannes Berg
55d942f424 mac80211: restrict peer's VHT capabilities to own
Implement restricting peer VHT capabilities to the device's own
capabilities. This is useful when a single driver supports more
than one device and the devices have different capabilities
(often they will differ in the number of spatial streams), but
in particular is also necessary for VHT capability overrides to
work correctly -- otherwise it'd be possible to e.g. advertise,
due to overrides, that TX-STBC is not supported, but then still
use it to TX to the AP because it supports RX-STBC.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:03 +01:00
Johannes Berg
a87121051c mac80211: remove IEEE80211_KEY_FLAG_WMM_STA
There's no driver using this flag, so it seems
that all drivers support HW crypto with WMM or
don't support it at all. Remove the flag and
code setting it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:59 +01:00
Jouni Malinen
355199e02b cfg80211: Extend support for IEEE 802.11r Fast BSS Transition
Add NL80211_CMD_UPDATE_FT_IES to support update of FT IEs to the WLAN
driver and NL80211_CMD_FT_EVENT to send FT events from the WLAN driver.
This will carry the target AP's MAC address along with the relevant
Information Elements. This event is used to report received FT IEs
(MDIE, FTIE, RSN IE, TIE, RICIE). These changes allow FT to be supported
with drivers that use an internal SME instead of user space option (like
FT implementation in wpa_supplicant with mac80211-based drivers).

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:51 +01:00
Ilan Peer
d339d5ca8e mac80211: Allow drivers to differentiate between ROC types
Some devices can handle remain on channel requests differently
based on the request type/priority. Add support to
differentiate between different ROC types, i.e., indicate that
the ROC is required for sending managment frames.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:49 +01:00
Johannes Berg
ee2aca343c cfg80211: add ability to override VHT capabilities
For testing it's sometimes useful to be able to
override certain VHT capability advertisement,
add the ability to do that in cfg80211.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:47 +01:00
Johannes Berg
77ee7c891a cfg80211: comprehensively check station changes
The station change API isn't being checked properly before
drivers are called, and as a result it is difficult to see
what should be allowed and what not.

In order to comprehensively check the API parameters parse
everything first, and then have the driver call a function
(cfg80211_check_station_change()) with the additionally
information about the kind of station that is being changed;
this allows the function to make better decisions than the
old code could.

While at it, also add a few checks, particularly in mesh
and clarify the TDLS station lifetime in documentation.

To be able to reduce a few checks, ignore any flag set bits
when the mask isn't set, they shouldn't be applied then.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:40 +01:00
Johannes Berg
2c1aabf33d cfg80211: constify station parameter pointers
All the pointers point right into the skb data and
not to anything that would be useful to change, so
make them const.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:38 +01:00
Johannes Berg
f8bacc2104 cfg80211: clean up mesh plink station change API
Make the ability to leave the plink_state unchanged not use a
magic -1 variable that isn't in the enum, but an explicit change
flag; reject invalid plink states or actions and move the needed
constants for plink actions to the right header file. Also
reject plink_state changes for non-mesh interfaces.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:37 +01:00
Nicolas Dichtel
a947b0a93e xfrm: allow to avoid copying DSCP during encapsulation
By default, DSCP is copying during encapsulation.
Copying the DSCP in IPsec tunneling may be a bit dangerous because packets with
different DSCP may get reordered relative to each other in the network and then
dropped by the remote IPsec GW if the reordering becomes too big compared to the
replay window.

It is possible to avoid this copy with netfilter rules, but it's very convenient
to be able to configure it for each SA directly.

This patch adds a toogle for this purpose. By default, it's not set to maintain
backward compatibility.

Field flags in struct xfrm_usersa_info is full, hence I add a new attribute.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-03-06 07:02:45 +01:00
Linus Torvalds
9da060d0ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A moderately sized pile of fixes, some specifically for merge window
  introduced regressions although others are for longer standing items
  and have been queued up for -stable.

  I'm kind of tired of all the RDS protocol bugs over the years, to be
  honest, it's way out of proportion to the number of people who
  actually use it.

   1) Fix missing range initialization in netfilter IPSET, from Jozsef
      Kadlecsik.

   2) ieee80211_local->tim_lock needs to use BH disabling, from Johannes
      Berg.

   3) Fix DMA syncing in SFC driver, from Ben Hutchings.

   4) Fix regression in BOND device MAC address setting, from Jiri
      Pirko.

   5) Missing usb_free_urb in ISDN Hisax driver, from Marina Makienko.

   6) Fix UDP checksumming in bnx2x driver for 57710 and 57711 chips,
      fix from Dmitry Kravkov.

   7) Missing cfgspace_lock initialization in BCMA driver.

   8) Validate parameter size for SCTP assoc stats getsockopt(), from
      Guenter Roeck.

   9) Fix SCTP association hangs, from Lee A Roberts.

  10) Fix jumbo frame handling in r8169, from Francois Romieu.

  11) Fix phy_device memory leak, from Petr Malat.

  12) Omit trailing FCS from frames received in BGMAC driver, from Hauke
      Mehrtens.

  13) Missing socket refcount release in L2TP, from Guillaume Nault.

  14) sctp_endpoint_init should respect passed in gfp_t, rather than use
      GFP_KERNEL unconditionally.  From Dan Carpenter.

  15) Add AISX AX88179 USB driver, from Freddy Xin.

  16) Remove MAINTAINERS entries for drivers deleted during the merge
      window, from Cesar Eduardo Barros.

  17) RDS protocol can try to allocate huge amounts of memory, check
      that the user's request length makes sense, from Cong Wang.

  18) SCTP should use the provided KMALLOC_MAX_SIZE instead of it's own,
      bogus, definition.  From Cong Wang.

  19) Fix deadlocks in FEC driver by moving TX reclaim into NAPI poll,
      from Frank Li.  Also, fix a build error introduced in the merge
      window.

  20) Fix bogus purging of default routes in ipv6, from Lorenzo Colitti.

  21) Don't double count RTT measurements when we leave the TCP receive
      fast path, from Neal Cardwell."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
  tcp: fix double-counted receiver RTT when leaving receiver fast path
  CAIF: fix sparse warning for caif_usb
  rds: simplify a warning message
  net: fec: fix build error in no MXC platform
  net: ipv6: Don't purge default router if accept_ra=2
  net: fec: put tx to napi poll function to fix dead lock
  sctp: use KMALLOC_MAX_SIZE instead of its own MAX_KMALLOC_SIZE
  rds: limit the size allocated by rds_message_alloc()
  MAINTAINERS: remove eexpress
  MAINTAINERS: remove drivers/net/wan/cycx*
  MAINTAINERS: remove 3c505
  caif_dev: fix sparse warnings for caif_flow_cb
  ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver
  sctp: use the passed in gfp flags instead GFP_KERNEL
  ipv[4|6]: correct dropwatch false positive in local_deliver_finish
  l2tp: Restore socket refcount when sendmsg succeeds
  net/phy: micrel: Disable asymmetric pause for KSZ9021
  bgmac: omit the fcs
  phy: Fix phy_device_free memory leak
  bnx2x: Fix KR2 work-around condition
  ...
2013-03-05 18:42:29 -08:00
Linus Torvalds
56a79b7b02 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull  more VFS bits from Al Viro:
 "Unfortunately, it looks like xattr series will have to wait until the
  next cycle ;-/

  This pile contains 9p cleanups and fixes (races in v9fs_fid_add()
  etc), fixup for nommu breakage in shmem.c, several cleanups and a bit
  more file_inode() work"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  constify path_get/path_put and fs_struct.c stuff
  fix nommu breakage in shmem.c
  cache the value of file_inode() in struct file
  9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry
  9p: make sure ->lookup() adds fid to the right dentry
  9p: untangle ->lookup() a bit
  9p: double iput() in ->lookup() if d_materialise_unique() fails
  9p: v9fs_fid_add() can't fail now
  v9fs: get rid of v9fs_dentry
  9p: turn fid->dlist into hlist
  9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine
  more file_inode() open-coded instances
  selinux: opened file can't have NULL or negative ->f_path.dentry

(In the meantime, the hlist traversal macros have changed, so this
required a semantic conflict fixup for the newly hlistified fid->dlist)
2013-03-03 13:23:03 -08:00
Eric Dumazet
79ffef1fe2 tcp: avoid wakeups for pure ACK
TCP prequeue mechanism purpose is to let incoming packets
being processed by the thread currently blocked in tcp_recvmsg(),
instead of behalf of the softirq handler, to better adapt flow
control on receiver host capacity to schedule the consumer.

But in typical request/answer workloads, we send request, then
block to receive the answer. And before the actual answer, TCP
stack receives the ACK packets acknowledging the request.

Processing pure ACK on behalf of the thread blocked in tcp_recvmsg()
is a waste of resources, as thread has to immediately sleep again
because it got no payload.

This patch avoids the extra context switches and scheduler overhead.

Before patch :

a:~# echo 0 >/proc/sys/net/ipv4/tcp_low_latency
a:~# perf stat ./super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k
231676

 Performance counter stats for './super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k':

     116251.501765 task-clock                #   11.369 CPUs utilized
         5,025,463 context-switches          #    0.043 M/sec
         1,074,511 CPU-migrations            #    0.009 M/sec
           216,923 page-faults               #    0.002 M/sec
   311,636,972,396 cycles                    #    2.681 GHz
   260,507,138,069 stalled-cycles-frontend   #   83.59% frontend cycles idle
   155,590,092,840 stalled-cycles-backend    #   49.93% backend  cycles idle
   100,101,255,411 instructions              #    0.32  insns per cycle
                                             #    2.60  stalled cycles per insn
    16,535,930,999 branches                  #  142.243 M/sec
       646,483,591 branch-misses             #    3.91% of all branches

      10.225482774 seconds time elapsed

After patch :

a:~# echo 0 >/proc/sys/net/ipv4/tcp_low_latency
a:~# perf stat ./super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k
233297

 Performance counter stats for './super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k':

      91084.870855 task-clock                #    8.887 CPUs utilized
         2,485,916 context-switches          #    0.027 M/sec
           815,520 CPU-migrations            #    0.009 M/sec
           216,932 page-faults               #    0.002 M/sec
   245,195,022,629 cycles                    #    2.692 GHz
   202,635,777,041 stalled-cycles-frontend   #   82.64% frontend cycles idle
   124,280,372,407 stalled-cycles-backend    #   50.69% backend  cycles idle
    83,457,289,618 instructions              #    0.34  insns per cycle
                                             #    2.43  stalled cycles per insn
    13,431,472,361 branches                  #  147.461 M/sec
       504,470,665 branch-misses             #    3.76% of all branches

      10.249594448 seconds time elapsed

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-28 15:37:29 -05:00
Al Viro
c4d30967f3 9p: turn fid->dlist into hlist
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-27 22:51:08 -05:00
Sasha Levin
b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Linus Torvalds
1cef9350cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) ping_err() ICMP error handler looks at wrong ICMP header, from Li
    Wei.

 2) TCP socket hash function on ipv6 is too weak, from Eric Dumazet.

 3) netif_set_xps_queue() forgets to drop mutex on errors, fix from
    Alexander Duyck.

 4) sum_frag_mem_limit() can deadlock due to lack of BH disabling, fix
    from Eric Dumazet.

 5) TCP SYN data is miscalculated in tcp_send_syn_data(), because the
    amount of TCP option space was not taken into account properly in
    this code path.  Fix from yuchung Cheng.

 6) MLX4 driver allocates device queues with the wrong size, from Kleber
    Sacilotto.

 7) sock_diag can access past the end of the sock_diag_handlers[] array,
    from Mathias Krause.

 8) vlan_set_encap_proto() makes incorrect assumptions about where
    skb->data points, rework the logic so that it works regardless of
    where skb->data happens to be.  From Jesse Gross.

 9) Fix gianfar build failure with NET_POLL enabled, from Paul
    Gortmaker.

10) Fix Ipv4 ID setting and checksum calculations in GRE driver, from
   Pravin B Shelar.

11) bgmac driver does:

        int i;

        for (i = 0; ...; ...) {
                ...
                for (i = 0; ...; ...) {

    effectively corrupting the outer loop index, use a seperate
    variable for the inner loops.  From Rafał Miłecki.

12) Fix suspend bugs in smsc95xx driver, from Ming Lei.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
  usbnet: smsc95xx: rename FEATURE_AUTOSUSPEND
  usbnet: smsc95xx: fix broken runtime suspend
  usbnet: smsc95xx: fix suspend failure
  bgmac: fix indexing of 2nd level loops
  b43: Fix lockdep splat on module unload
  Revert "ip_gre: propogate target device GSO capability to the tunnel device"
  IP_GRE: Fix GRE_CSUM case.
  VXLAN: Use tunnel_ip_select_ident() for tunnel IP-Identification.
  IP_GRE: Fix IP-Identification.
  net/pasemi: Fix missing coding style
  vmxnet3: fix ethtool ring buffer size setting
  vmxnet3: make local function static
  bnx2x: remove dead code and make local funcs static
  gianfar: fix compile fail for NET_POLL=y due to struct packing
  vlan: adjust vlan_set_encap_proto() for its callers
  sock_diag: Simplify sock_diag_handlers[] handling in __sock_diag_rcv_msg
  sock_diag: Fix out-of-bounds access to sock_diag_handlers[]
  vxlan: remove depends on CONFIG_EXPERIMENTAL
  mlx4_en: fix allocation of CPU affinity reverse-map
  mlx4_en: fix allocation of device tx_cq
  ...
2013-02-26 11:44:11 -08:00
Linus Torvalds
94f2f14234 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
 "This set of changes starts with a few small enhnacements to the user
  namespace.  reboot support, allowing more arbitrary mappings, and
  support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
  user namespace root.

  I do my best to document that if you care about limiting your
  unprivileged users that when you have the user namespace support
  enabled you will need to enable memory control groups.

  There is a minor bug fix to prevent overflowing the stack if someone
  creates way too many user namespaces.

  The bulk of the changes are a continuation of the kuid/kgid push down
  work through the filesystems.  These changes make using uids and gids
  typesafe which ensures that these filesystems are safe to use when
  multiple user namespaces are in use.  The filesystems converted for
  3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs.  The
  changes for these filesystems were a little more involved so I split
  the changes into smaller hopefully obviously correct changes.

  XFS is the only filesystem that remains.  I was hoping I could get
  that in this release so that user namespace support would be enabled
  with an allyesconfig or an allmodconfig but it looks like the xfs
  changes need another couple of days before it they are ready."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
  cifs: Enable building with user namespaces enabled.
  cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
  cifs: Convert struct cifs_sb_info to use kuids and kgids
  cifs: Modify struct smb_vol to use kuids and kgids
  cifs: Convert struct cifsFileInfo to use a kuid
  cifs: Convert struct cifs_fattr to use kuid and kgids
  cifs: Convert struct tcon_link to use a kuid.
  cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
  cifs: Convert from a kuid before printing current_fsuid
  cifs: Use kuids and kgids SID to uid/gid mapping
  cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
  cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
  cifs: Override unmappable incoming uids and gids
  nfsd: Enable building with user namespaces enabled.
  nfsd: Properly compare and initialize kuids and kgids
  nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
  nfsd: Modify nfsd4_cb_sec to use kuids and kgids
  nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
  nfsd: Convert nfsxdr to use kuids and kgids
  nfsd: Convert nfs3xdr to use kuids and kgids
  ...
2013-02-25 16:00:49 -08:00
Pravin B Shelar
490ab08127 IP_GRE: Fix IP-Identification.
GRE-GSO generates ip fragments with id 0,2,3,4... for every
GSO packet, which is not correct. Following patch fixes it
by setting ip-header id unique id of fragments are allowed.
As Eric Dumazet suggested it is optimized by using inner ip-header
whenever inner packet is ipv4.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-25 15:47:41 -05:00
Eric Dumazet
4cfb04854d net: fix possible deadlock in sum_frag_mem_limit
Dave Jones reported a lockdep splat occurring in IP defrag code.

commit 6d7b857d54 (net: use lib/percpu_counter API for
fragmentation mem accounting) added a possible deadlock.

Because percpu_counter_sum_positive() needs to acquire
a lock that can be used from softirq, we need to disable BH
in sum_frag_mem_limit()

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-22 15:10:19 -05:00
Li Wei
5b0520425e ipv4: fix error handling in icmp_protocol.
Now we handle icmp errors in each transport protocol's err_handler,
for icmp protocols, that is ping_err. Since this handler only care
of those icmp errors triggered by echo request, errors triggered
by echo reply(which sent by kernel) are sliently ignored.

So wrap ping_err() with icmp_err() to deal with those icmp errors.

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-22 15:10:18 -05:00
Linus Torvalds
9afa3195b9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "Assorted tiny fixes queued in trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (22 commits)
  DocBook: update EXPORT_SYMBOL entry to point at export.h
  Documentation: update top level 00-INDEX file with new additions
  ARM: at91/ide: remove unsused at91-ide Kconfig entry
  percpu_counter.h: comment code for better readability
  x86, efi: fix comment typo in head_32.S
  IB: cxgb3: delay freeing mem untill entirely done with it
  net: mvneta: remove unneeded version.h include
  time: x86: report_lost_ticks doesn't exist any more
  pcmcia: avoid static analysis complaint about use-after-free
  fs/jfs: Fix typo in comment : 'how may' -> 'how many'
  of: add missing documentation for of_platform_populate()
  btrfs: remove unnecessary cur_trans set before goto loop in join_transaction
  sound: soc: Fix typo in sound/codecs
  treewide: Fix typo in various drivers
  btrfs: fix comment typos
  Update ibmvscsi module name in Kconfig.
  powerpc: fix typo (utilties -> utilities)
  of: fix spelling mistake in comment
  h8300: Fix home page URL in h8300/README
  xtensa: Fix home page URL in Kconfig
  ...
2013-02-21 17:40:58 -08:00
Eric Dumazet
08dcdbf6a7 ipv6: use a stronger hash for tcp
It looks like its possible to open thousands of TCP IPv6
sessions on a server, all landing in a single slot of TCP hash
table. Incoming packets have to lookup sockets in a very
long list.

We should hash all bits from foreign IPv6 addresses, using
a salt and hash mix, not a simple XOR.

inet6_ehashfn() can also separately use the ports, instead
of xoring them.

Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-21 18:15:58 -05:00
YOSHIFUJI Hideaki / 吉藤英明
ecd9883724 ipv6: fix race condition regarding dst->expires and dst->from.
Eric Dumazet wrote:
| Some strange crashes happen in rt6_check_expired(), with access
| to random addresses.
|
| At first glance, it looks like the RTF_EXPIRES and
| stuff added in commit 1716a96101
| (ipv6: fix problem with expired dst cache)
| are racy : same dst could be manipulated at the same time
| on different cpus.
|
| At some point, our stack believes rt->dst.from contains a dst pointer,
| while its really a jiffie value (as rt->dst.expires shares the same area
| of memory)
|
| rt6_update_expires() should be fixed, or am I missing something ?
|
| CC Neil because of https://bugzilla.redhat.com/show_bug.cgi?id=892060

Because we do not have any locks for dst_entry, we cannot change
essential structure in the entry; e.g., we cannot change reference
to other entity.

To fix this issue, split 'from' and 'expires' field in dst_entry
out of union.  Once it is 'from' is assigned in the constructor,
keep the reference until the very last stage of the life time of
the object.

Of course, it is unsafe to change 'from', so make rt6_set_from simple
just for fresh entries.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Neil Horman <nhorman@tuxdriver.com>
CC: Gao Feng <gaofeng@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reported-by: Steinar H. Gunderson <sesse@google.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-20 15:11:45 -05:00
David S. Miller
2ccba5433b Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contain updates for your net-next tree, they are:

* Fix (for just added) connlabel dependencies, from Florian Westphal.

* Add aliasing support for conntrack, thus users can either use -m state
  or -m conntrack from iptables while using the same kernel module, from
  Jozsef Kadlecsik.

* Some code refactoring for the CT target to merge common code in
  revision 0 and 1, from myself.

* Add aliasing support for CT, based on patch from Jozsef Kadlecsik.

* Add one mutex per nfnetlink subsystem, from myself.

* Improved logging for packets that are dropped by helpers, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 23:42:09 -05:00
David S. Miller
6338a53a2b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net into net
Pull in 'net' to take in the bug fixes that didn't make it into
3.8-final.

Also, deal with the semantic conflict of the change made to
net/ipv6/xfrm6_policy.c   A missing rt6->n neighbour release
was added to 'net', but in 'net-next' we no longer cache the
neighbour entries in the ipv6 routes so that change is not
appropriate there.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 23:34:21 -05:00
Pablo Neira Ayuso
b20ab9cc63 netfilter: nf_ct_helper: better logging for dropped packets
Connection tracking helpers have to drop packets under exceptional
situations. Currently, the user gets the following logging message
in case that happens:

	nf_ct_%s: dropping packet ...

However, depending on the helper, there are different reasons why a
packet can be dropped.

This patch modifies the existing code to provide more specific
error message in the scope of each helper to help users to debug
the reason why the packet has been dropped, ie:

	nf_ct_%s: dropping packet: reason ...

Thanks to Joe Perches for many formatting suggestions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-02-19 02:48:05 +01:00
John W. Linville
98d5fac233 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/dvm/tx.c
	drivers/net/wireless/ti/wlcore/sdio.c
	drivers/net/wireless/ti/wlcore/spi.c
2013-02-18 13:47:13 -05:00
Ying Xue
dec34fb0f5 net: fix a compile error when SOCK_REFCNT_DEBUG is enabled
When SOCK_REFCNT_DEBUG is enabled, below build error is met:

kernel/sysctl_binary.o: In function `sk_refcnt_debug_release':
include/net/sock.h:1025: multiple definition of `sk_refcnt_debug_release'
kernel/sysctl.o:include/net/sock.h:1025: first defined here
kernel/audit.o: In function `sk_refcnt_debug_release':
include/net/sock.h:1025: multiple definition of `sk_refcnt_debug_release'
kernel/sysctl.o:include/net/sock.h:1025: first defined here
make[1]: *** [kernel/built-in.o] Error 1
make: *** [kernel] Error 2

So we decide to make sk_refcnt_debug_release static to eliminate
the error.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:29:52 -05:00
Jouni Malinen
9d62a98617 cfg80211: Pass station (extended) capability info to kernel
The information of the peer's capabilities and extended capabilities are
required for the driver to perform TDLS Peer UAPSD operations and off
channel operations. This information of the peer is passed from user space
using NL80211_CMD_SET_STATION command. This commit enhances
the function nl80211_set_station to pass the capability information of
the peer to the driver.

Similarly, there may be need for capability information for other modes,
so allow this to be provided with both add_station and change_station.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15 09:41:43 +01:00
Johannes Berg
a50df0c4c0 cfg80211: advertise extended capabilities to userspace
In many cases, userspace may need to know which of the
802.11 extended capabilities ("Extended Capabilities
element") are implemented in the driver or device, to
include them e.g. in beacons, assoc request/response
or other frames. Add a new nl80211 attribute to hold
the extended capabilities bitmap for this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15 09:41:42 +01:00
Johannes Berg
af0ed69bad mac80211: stop modifying HT SMPS capability
Instead of modifying the HT SMPS capability field
for stations, track the SMPS mode explicitly in a
new field in the station struct and use it in the
drivers that care about it. This simplifies the
code using it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15 09:41:41 +01:00
Johannes Berg
c7a6ee27ab cfg80211: allow drivers to selectively disable 80/160 MHz
Some drivers might support 80 or 160 MHz only on some
channels for whatever reason, so allow them to disable
these channel widths. Also maintain the new flags when
regulatory bandwidth limitations would disable these
wide channels.

Reviewed-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15 09:41:38 +01:00
Johannes Berg
2c9b735982 mac80211: add ieee80211_vif_change_bandwidth
For HT and VHT the current bandwidth can change,
add the function ieee80211_vif_change_bandwidth()
to take care of this. It returns a failure if the
new bandwidth isn't compatible with the existing
channel context, the caller has to handle that.
When it happens, also inform the driver that the
bandwidth changed for this virtual interface (no
drivers would actually care today though.)

Changing to/from HT/VHT isn't allowed though.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15 09:41:36 +01:00
Johannes Berg
0af83d3df5 mac80211: handle VHT operating mode notification
Handle the operating mode notification action frame.
When the supported streams or the bandwidth change
let the driver and rate control algorithm know.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15 09:41:32 +01:00
Johannes Berg
8921d04e8d mac80211: track number of spatial streams
With VHT, a station can change the number of spatial
streams it can receive on the fly, not unlike spatial
multiplexing in HT. Prepare for that by tracking the
maximum number of spatial streams it can receive when
the connection is established.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15 09:41:31 +01:00
Johannes Berg
e1a0c6b3a4 mac80211: stop toggling IEEE80211_HT_CAP_SUP_WIDTH_20_40
For VHT, many more bandwidth changes are possible. As a first
step, stop toggling the IEEE80211_HT_CAP_SUP_WIDTH_20_40 flag
in the HT capabilities and instead introduce a bandwidth field
indicating the currently usable bandwidth to transmit to the
station. Of course, make all drivers use it.

To achieve this, make ieee80211_ht_cap_ie_to_sta_ht_cap() get
the station as an argument, rather than the new capabilities,
so it can set up the new bandwidth field.

If the station is a VHT station and VHT bandwidth is in use,
also set the bandwidth accordingly.

Doing this allows us to get rid of the supports_40mhz flag as
the HT capabilities now reflect the true capability instead of
the current setting.

While at it, also fix ieee80211_ht_cap_ie_to_sta_ht_cap() to not
ignore HT cap overrides when MCS TX isn't supported (not that it
really happens...)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15 09:41:30 +01:00
Simon Wunderlich
164eb02d07 mac80211: add radar detection command/event
Add command to trigger radar detection in the driver/FW.
Once radar detection is started it should continuously
monitor for radars as long as the channel active.
If radar is detected usermode notified with 'radar
detected' event.

Scanning and remain on channel functionality must be disabled
while doing radar detection/scanning, and vice versa.

Based on original patch by Victor Goldenshtein <victorg@ti.com>

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15 09:41:04 +01:00
Simon Wunderlich
04f39047af nl80211/cfg80211: add radar detection command/event
Add new NL80211_CMD_RADAR_DETECT, which starts the Channel
Availability Check (CAC). This command will also notify the
usermode about events (CAC finished, CAC aborted, radar
detected, NOP finished).
Once radar detection has started it should continuously
monitor for radars as long as the channel is active.

This patch enables DFS for AP mode in nl80211/cfg80211.

Based on original patch by Victor Goldenshtein <victorg@ti.com>

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
[remove WIPHY_FLAG_HAS_RADAR_DETECT again -- my mistake]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-15 09:40:18 +01:00
David S. Miller
e0376d0043 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1) Remove a duplicated call to skb_orphan() in pf_key, from Cong Wang.

2) Prepare xfrm and pf_key for algorithms without pf_key support,
   from Jussi Kivilinna.

3) Fix an unbalanced lock in xfrm_output_one(), from Li RongQing.

4) Add an IPsec state resolution packet queue to handle
   packets that are send before the states are resolved.

5) xfrm4_policy_fini() is unused since 2.6.11, time to remove it.
   From Michal Kubecek.

6) The xfrm gc threshold was configurable just in the initial
   namespace, make it configurable in all namespaces. From
   Michal Kubecek.

7) We currently can not insert policies with mark and mask
   such that some flows would be matched from both policies.
   Allow this if the priorities of these policies are different,
   the one with the higher priority is used in this case.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-14 13:29:20 -05:00
Johannes Berg
2a0e047ed6 cfg80211: configuration for WoWLAN over TCP
Intel Wireless devices are able to make a TCP connection
after suspending, sending some data and waking up when
the connection receives wakeup data (or breaks). Add the
WoWLAN configuration and feature advertising API for it.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-13 14:33:42 +01:00
Amitkumar Karwar
bb92d19983 nl80211: add packet offset information for wowlan pattern
If user knows the location of a wowlan pattern to be matched in
Rx packet, he can provide an offset with the pattern. This will
help drivers to ignore initial bytes and match the pattern
efficiently.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
[refactor pattern sending]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-13 10:09:48 +01:00
Jiri Pirko
0e243218ba act_police: move struct tcf_police to act_police.c
It's not used anywhere else, so move it.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-12 18:59:45 -05:00
Jiri Pirko
34c5d292ce sch_api: introduce qdisc_watchdog_schedule_ns()
tbf will need to schedule watchdog in ns. No need to convert it twice.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-12 18:59:45 -05:00
Jiri Pirko
292f1c7ff6 sch: make htb_rate_cfg and functions around that generic
As it is going to be used in tbf as well, push these to generic code.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-12 18:59:45 -05:00
Daniel Borkmann
570617e79c net: sctp: remove unused multiple cookie keys
Vlad says: The whole multiple cookie keys code is completely unused
and has been all this time. Noone uses anything other then the
secret_key[0] since there is no changeover support anywhere.

Thus, for now clean up its left-over fragments.

Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-12 16:05:11 -05:00
Eric W. Biederman
b464255699 9p: Modify struct 9p_fid to use a kuid_t not a uid_t
Change struct 9p_fid and it's associated functions to
use kuid_t's instead of uid_t.

Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-12 03:19:32 -08:00
Eric W. Biederman
447c50943f 9p: Modify the stat structures to use kuid_t and kgid_t
9p has thre strucrtures that can encode inode stat information.  Modify
all of those structures to contain kuid_t and kgid_t values.  Modify
he wire encoders and decoders of those structures to use 'u' and 'g' instead of
'd' in the format string where uids and gids are present.

This results in all kuid and kgid conversion to and from on the wire values
being performed by the same code in protocol.c where the client is known
at the time of the conversion.

Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2013-02-12 03:19:31 -08:00
Eric W. Biederman
f791f7c5e3 9p: Transmit kuid and kgid values
Modify the p9_client_rpc format specifiers of every function that
directly transmits a uid or a gid from 'd' to 'u' or 'g' as
appropriate.

Modify those same functions to take kuid_t and kgid_t parameters
instead of uid_t and gid_t parameters.

Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2013-02-12 03:19:30 -08:00
Seth Forshee
6c17b77b67 mac80211: Fix tx queue handling during scans
Scans currently work by stopping the netdev tx queues but leaving the
mac80211 queues active. This stops the flow of incoming packets while
still allowing mac80211 to transmit nullfunc and probe request frames to
facilitate scanning. However, the driver may try to wake the mac80211
queues while in this state, which will also wake the netdev queues.

To prevent this, add a new queue stop reason,
IEEE80211_QUEUE_STOP_REASON_OFFCHANNEL, to be used when stopping the tx
queues for off-channel operation. This prevents the netdev queues from
waking when a driver wakes the mac80211 queues.

This also stops all frames from being transmitted, even those meant to
be sent off-channel. Add a new tx control flag,
IEEE80211_TX_CTL_OFFCHAN_TX_OK, which allows frames to be transmitted
when the queues are stopped only for the off-channel stop reason. Update
all locations transmitting off-channel frames to use this flag.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-11 22:52:21 +01:00
Johannes Berg
f1e3e05156 mac80211: remove IEEE80211_HW_SCAN_WHILE_IDLE
There are only a few drivers that use HW scan, and
all of those don't need a non-idle transition before
starting the scan -- some don't even care about idle
at all. Remove the flag and code associated with it.

The only driver that really actually needed this is
wl1251 and it can just do it itself in the hw_scan
callback -- implement that.

Acked-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-11 18:45:01 +01:00
Johannes Berg
09b85568c1 mac80211: remove dynamic PS driver interface
The functions were added for some sort of Bluetooth
coexistence, but aren't used, so remove them again.

Reviewed-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-11 18:45:00 +01:00
Johannes Berg
ef429dadf3 mac80211: introduce beacon-only timing data
In order to be able to predict the next DTIM TBTT
in the driver, add the ability to use timing data
from beacons only with the new hardware flag
IEEE80211_HW_TIMING_BEACON_ONLY and the BSS info
value sync_dtim_count which is only valid if the
timing data came from a beacon. The data can only
come from a beacon, and if no beacon was received
before association it is updated later together
with the DTIM count notification.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-11 18:45:00 +01:00
Johannes Berg
8cef2c9df8 cfg80211: move TSF into IEs
While technically the TSF isn't an IE, it can be
necessary to distinguish between the TSF from a
beacon and a probe response, in particular in
order to know the next DTIM TBTT, as not all APs
are spec compliant wrt. TSF==0 being a DTIM TBTT
and thus the DTIM count needs to be taken into
account as well.

To allow this, move the TSF into the IE struct
so it can be known whence it came.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-11 18:44:59 +01:00
Johannes Berg
83c7aa1a14 cfg80211: remove scan ies NULL check
There's no way scan BSS IEs can be NULL as even
if the allocation fails the frame is discarded.
Remove some code checking for this and document
that it is always non-NULL.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-11 18:44:58 +01:00
Alexander Bondar
b207cdb07f mac80211: add vif debugfs driver callbacks
Add debugfs driver callbacks so drivers can add
debugfs entries for interfaces. Note that they
_must_ remove the entries again as add/remove in
the driver doesn't correspond to add/remove in
debugfs; the former is up/down while the latter
is netdev create/destroy.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-11 18:44:58 +01:00
Johannes Berg
776b358017 cfg80211: track hidden SSID networks properly
Currently, cfg80211 will copy beacon IEs from a previously
received hidden SSID beacon to a probe response entry, if
that entry is created after the beacon entry. However, if
it is the other way around, or if the beacon is updated,
such changes aren't propagated.

Fix this by tracking the relation between the probe
response and beacon BSS structs in this case.

In case drivers have private data stored in a BSS struct
and need access to such data from a beacon entry, cfg80211
now provides the hidden_beacon_bss pointer from the probe
response entry to the beacon entry.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-11 18:44:57 +01:00
Johannes Berg
077f897a8b wireless: fix kernel-doc
Fix most kernel-doc warnings, for some reason it
seems to have issues with __aligned, don't remove
the documentation entries it considers to be in
excess due to that.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-11 18:44:53 +01:00
Johannes Berg
5b112d3d09 cfg80211: pass wiphy to cfg80211_ref_bss/put_bss
This prepares for using the spinlock instead of krefs
which is needed in the next patch to track the refs
of combined BSSes correctly.

Acked-by: Bing Zhao <bzhao@marvell.com> [mwifiex]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-11 18:44:52 +01:00
YOSHIFUJI Hideaki / 吉藤英明
daaba4fa17 net neighbour, decnet: Ensure to align device private data on preferred alignment.
To allow both of protocol-specific data and device-specific data
attached with neighbour entry, and to eliminate size calculation
cost when allocating entry, sizeof protocol-speicic data must be
multiple of NEIGH_PRIV_ALIGN.  On 64bit archs,
sizeof(struct dn_neigh) is multiple of NEIGH_PRIV_ALIGN, but on
32bit archs, it was not.

Introduce NEIGH_ENTRY_SPACE() macro to ensure that protocol-specific
entry-size meets our requirement.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-11 00:21:44 -05:00
David Ward
febf018d22 net/802: Implement Multiple Registration Protocol (MRP)
Initial implementation of the Multiple Registration Protocol (MRP)
from IEEE 802.1Q-2011, based on the existing implementation of the
Generic Attribute Registration Protocol (GARP).

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-10 20:37:22 -05:00
John W. Linville
3549c6b195 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Fixed-up drivers/net/wireless/iwlwifi/mvm/mac80211.c to change change
IEEE80211_HW_NEED_DTIM_PERIOD to IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC
as requested by Johannes Berg. -- JWL

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-02-08 14:39:54 -05:00
John W. Linville
f5237f278f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-02-08 13:16:17 -05:00
Michal Kubecek
8d068875ca xfrm: make gc_thresh configurable in all namespaces
The xfrm gc threshold can be configured via xfrm{4,6}_gc_thresh
sysctl but currently only in init_net, other namespaces always
use the default value. This can substantially limit the number
of IPsec tunnels that can be effectively used.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-02-06 11:36:29 +01:00
Steffen Klassert
a0073fe18e xfrm: Add a state resolution packet queue
As the default, we blackhole packets until the key manager resolves
the states. This patch implements a packet queue where IPsec packets
are queued until the states are resolved. We generate a dummy xfrm
bundle, the output routine of the returned route enqueues the packet
to a per policy queue and arms a timer that checks for state resolution
when dst_output() is called. Once the states are resolved, the packets
are sent out of the queue. If the states are not resolved after some
time, the queue is flushed.

This patch keeps the defaut behaviour to blackhole packets as long
as we have no states. To enable the packet queue the sysctl
xfrm_larval_drop must be switched off.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-02-06 08:31:10 +01:00
Stephen Hemminger
ca2eb5679f tcp: remove Appropriate Byte Count support
TCP Appropriate Byte Count was added by me, but later disabled.
There is no point in maintaining it since it is a potential source
of bugs and Linux already implements other better window protection
heuristics.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-05 14:51:16 -05:00
David S. Miller
188d1f76d0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/e1000e/ethtool.c
	drivers/net/vmxnet3/vmxnet3_drv.c
	drivers/net/wireless/iwlwifi/dvm/tx.c
	net/ipv6/route.c

The ipv6 route.c conflict is simple, just ignore the 'net' side change
as we fixed the same problem in 'net-next' by eliminating cached
neighbours from ipv6 routes.

The e1000e conflict is an addition of a new statistic in the ethtool
code, trivial.

The vmxnet3 conflict is about one change in 'net' removing a guarding
conditional, whilst in 'net-next' we had a netdev_info() conversion.

The iwlwifi conflict is dealing with a WARN_ON() conversion in
'net-next' vs. a revert happening in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-05 14:12:20 -05:00
Johannes Berg
37e0838117 cfg80211: remove unused cfg80211_get_mesh
As Thomas pointed out, cfg80211_get_mesh() is
unused and can be removed.

Cc: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-04 18:57:42 +01:00
Vladimir Kondratiev
42745e0393 cfg80211: expand per-station byte counters to 64bit
In per-station statistics, present 32bit counters are too small
for practical purposes - with gigabit speeds, it get overlapped
every few seconds.

Expand counters in the struct station_info to be 64-bit.
Driver can still fill only 32-bit and indicate in @filled
only bits like STATION_INFO_[TR]X_BYTES; in case driver provides
full 64-bit counter, it should also set in @filled
bit STATION_INFO_[TR]RX_BYTES64

Netlink sends both 32-bit and 64-bit counters, if present, to not
break userspace.

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
[change to also have 32-bit counters if driver advertises 64-bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-04 18:57:22 +01:00
Johannes Berg
682bd38b8a mac80211: always allow calling ieee80211_connection_loss()
With multi-channel, there's a corner case where a driver
doesn't receive a beacon soon enough to be able to sync
its timers with the AP. In this case, the only recovery
(after trying again) is to disconnect from the AP. Allow
calling ieee80211_connection_loss() for such cases. To
make that possible, modify the work function to not rely
on the IEEE80211_HW_CONNECTION_MONITOR flag but use new
state kept in the interface instead.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-04 16:09:58 +01:00
Johan Hedberg
83be8eca2e Bluetooth: Keep track of UUID type upon addition
The primary purpose of the UUIDs is to enable generation of EIR and AD
data. In these data formats the UUIDs are split into separate fields
based on whether they're 16, 32 or 128 bit UUIDs. To make the generation
of these data fields simpler this patch adds a type member to the
bt_uuid struct and assigns a value to it as soon as the UUID is added to
the kernel. This way the type doesn't need to be calculated each time
the UUID list is later iterated.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-02-01 15:50:17 -02:00
Jussi Kivilinna
7e50f84c94 pf_key/xfrm_algo: prepare pf_key and xfrm_algo for new algorithms without pfkey support
Mark existing algorithms as pfkey supported and make pfkey only use algorithms
that have pfkey_supported set.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-02-01 10:13:43 +01:00
Tom Parkin
73df66f8b1 ipv6: rename datagram_send_ctl and datagram_recv_ctl
The datagram_*_ctl functions in net/ipv6/datagram.c are IPv6-specific.  Since
datagram_send_ctl is publicly exported it should be appropriately named to
reflect the fact that it's for IPv6 only.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 13:53:08 -05:00
Johannes Berg
1672c0e319 mac80211: start auth/assoc timeout on frame status
When sending authentication/association frames they
might take a bit of time to go out because we may
have to synchronise with the AP, in particular in
the case where it's really a P2P GO. In this case
the 200ms fixed timeout could potentially be too
short if the beacon interval is relatively large.

For drivers that report TX status we can do better.
Instead of starting the timeout directly, start it
only when the frame status arrives. Since then the
frame was out on the air, we can wait shorter (the
typical response time is supposed to be 30ms, wait
100ms.) Also, if the frame failed to be transmitted
try again right away instead of waiting.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-31 14:28:43 +01:00
Johannes Berg
3ff9a827c6 cfg80211: remove free_priv BSS API
Now that mac80211 no longer uses this API, remove
it completely. If anyone needs it again, we can
revert this patch of course, but mac80211 was the
only user right now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-31 14:07:30 +01:00
Emmanuel Grumbach
c65dd1477b mac80211: inform the driver about update of dtim_period
Currently, when the driver requires the DTIM period,
mac80211 will wait to hear a beacon before association.
This behavior is suboptimal since some drivers may be
able to deal with knowing the DTIM period after the
association, if they get it at all.

To address this, notify the drivers with bss_info_changed
with the new BSS_CHANGED_DTIM_PERIOD flag when the DTIM
becomes known. This might be when changing to associated,
or later when the entire association was done with only
probe response information.

Rename the hardware flag for the current behaviour to
IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC to more accurately
reflect its behaviour. IEEE80211_HW_NEED_DTIM_PERIOD is
no longer accurate as all drivers get the DTIM period
now, just not before association.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-31 14:05:38 +01:00
Johannes Berg
cd8f7cb4e6 cfg80211/mac80211: support reporting wakeup reason
When waking up from WoWLAN, it is useful to know
what triggered the wakeup. Support reporting the
wakeup reason(s) in cfg80211 (and a pass-through
in mac80211) to allow userspace to know.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-31 14:00:21 +01:00
YOSHIFUJI Hideaki / 吉藤英明
18367681a1 ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 22:41:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明
d3aedd5ebd ipv6 flowlabel: Convert hash list to RCU.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 22:41:13 -05:00
John W. Linville
20fb9e5033 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-01-30 14:22:19 -05:00
John W. Linville
0f496df2d9 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2013-01-30 14:21:04 -05:00
YOSHIFUJI Hideaki / 吉藤英明
70e94e66ae xfrm: Convert xfrm_addr_cmp() to boolean xfrm_addr_equal().
All users of xfrm_addr_cmp() use its result as boolean.
Introduce xfrm_addr_equal() (which is equal to !xfrm_addr_cmp())
and convert all users.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 22:58:40 -05:00
YOSHIFUJI Hideaki / 吉藤英明
ff88b30c71 xfrm: Use ipv6_addr_equal() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 22:58:40 -05:00
David S. Miller
f1e7b73acc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Bring in the 'net' tree so that we can get some ipv4/ipv6 bug
fixes that some net-next work will build upon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 15:32:13 -05:00
Jiri Pirko
5c766d642b ipv4: introduce address lifetime
There are some usecase when lifetime of ipv4 addresses might be helpful.
For example:
1) initramfs networkmanager uses a DHCP daemon to learn network
configuration parameters
2) initramfs networkmanager addresses, routes and DNS configuration
3) initramfs networkmanager is requested to stop
4) initramfs networkmanager stops all daemons including dhclient
5) there are addresses and routes configured but no daemon running. If
the system doesn't start networkmanager for some reason, addresses and
routes will be used forever, which violates RFC 2131.

This patch is essentially a backport of ivp6 address lifetime mechanism
for ipv4 addresses.

Current "ip" tool supports this without any patch (since it does not
distinguish between ipv4 and ipv6 addresses in this perspective.

Also, this should be back-compatible with all current netlink users.

Reported-by: Pavel Šimerda <psimerda@redhat.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 13:59:57 -05:00
Jesper Dangaard Brouer
3ef0eb0db4 net: frag, move LRU list maintenance outside of rwlock
Updating the fragmentation queues LRU (Least-Recently-Used) list,
required taking the hash writer lock.  However, the LRU list isn't
tied to the hash at all, so we can use a separate lock for it.

Original-idea-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 13:36:24 -05:00
Jesper Dangaard Brouer
6d7b857d54 net: use lib/percpu_counter API for fragmentation mem accounting
Replace the per network namespace shared atomic "mem" accounting
variable, in the fragmentation code, with a lib/percpu_counter.

Getting percpu_counter to scale to the fragmentation code usage
requires some tweaks.

At first view, percpu_counter looks superfast, but it does not
scale on multi-CPU/NUMA machines, because the default batch size
is too small, for frag code usage.  Thus, I have adjusted the
batch size by using __percpu_counter_add() directly, instead of
percpu_counter_sub() and percpu_counter_add().

The batch size is increased to 130.000, based on the largest 64K
fragment memory usage.  This does introduce some imprecise
memory accounting, but its does not need to be strict for this
use-case.

It is also essential, that the percpu_counter, does not
share cacheline with other writers, to make this scale.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 13:36:24 -05:00
Jesper Dangaard Brouer
d433673e5f net: frag helper functions for mem limit tracking
This change is primarily a preparation to ease the extension of memory
limit tracking.

The change does reduce the number atomic operation, during freeing of
a frag queue.  This does introduce a some performance improvement, as
these atomic operations are at the core of the performance problems
seen on NUMA systems.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 13:36:24 -05:00
Jesper Dangaard Brouer
6e34a8b37a net: cacheline adjust struct inet_frag_queue
Fragmentation code cacheline adjusting of struct inet_frag_queue.

Take advantage of the size of struct timer_list, and move all but
spinlock_t lock, below the timer struct.  On 64-bit 'lru_list',
'list' and 'refcnt', fits exactly into the next cacheline, and a
new cacheline starts at 'fragments'.

The netns_frags *net pointer is moved to the end of the struct,
because its used in a compare, with "next/close-by" elements of
which this struct is embedded into.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 13:36:23 -05:00
Jesper Dangaard Brouer
5f8e1e8b76 net: cacheline adjust struct inet_frags for better frag performance
The globally shared rwlock, of struct inet_frags, shares
cacheline with the 'rnd' number, which is used by the hash
calculations.  Fix this, as this obviously is a bad idea, as
unnecessary cache-misses will occur when accessing the 'rnd'
number.

Also small note that, moving function ptr (*match) up in struct,
is to avoid it lands on the next cacheline (on 64-bit).

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 13:36:23 -05:00
Jesper Dangaard Brouer
cd39a7890a net: cacheline adjust struct netns_frags for better frag performance
This small cacheline adjustment of struct netns_frags improves
performance significantly for the fragmentation code.

Struct members 'lru_list' and 'mem' are both hot elements, and it
hurts performance, due to cacheline bouncing at every call point,
when they share a cacheline.  Also notice, how mem is placed
together with 'high_thresh' and 'low_thresh', as they are used in
the compare operations together.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 13:36:23 -05:00
Stanislaw Gruszka
8df6b7b11a mac80211: remove IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL
This is basically a revert of:

commit 5b632fe85e
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date:   Mon Dec 3 12:56:33 2012 +0100

    mac80211: introduce IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL

We do not need this flag any longer, rt2x00 BAR/BA problem was fixed
correctly by wireless-testing commit:

commit 84e9e8ebd3
Author: Helmut Schaa <helmut.schaa@googlemail.com>
Date:   Thu Jan 17 17:34:32 2013 +0100

    rt2x00: Improve TX status handling for BlockAckReq frames

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-29 12:16:34 +01:00
Johannes Berg
448cd55c37 Merge remote-tracking branch 'wireless-next/master' into HEAD 2013-01-29 12:16:22 +01:00
Jiri Kosina
617677295b Merge branch 'master' into for-next
Conflicts:
	drivers/devfreq/exynos4_bus.c

Sync with Linus' tree to be able to apply patches that are
against newer code (mvneta).
2013-01-29 10:48:30 +01:00
YOSHIFUJI Hideaki / 吉藤英明
08433eff2d net neigh: Optimize neighbor entry size calculation.
When allocating memory for neighbour cache entry, if
tbl->entry_size is not set, we always calculate
sizeof(struct neighbour) + tbl->key_len, which is common
in the same table.

With this change, set tbl->entry_size during the table
initialization phase, if it was not set, and use it in
neigh_alloc() and neighbour_priv().

This change also allow us to have both of protocol private
data and device priate data at tha same time.

Note that the only user of prototcol private is DECnet
and the only user of device private is ATM CLIP.
Since those are exclusive, we have not been facing issues
here.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28 23:17:51 -05:00
John W. Linville
4205e6ef4e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-01-28 14:43:00 -05:00
Cong Wang
0e36cbb344 net: add RCU annotation to sk_dst_cache field
sock->sk_dst_cache is protected by RCU.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28 00:15:28 -05:00
Cong Wang
cec771d646 decnet: use correct RCU API to deref sk_dst_cache field
sock->sk_dst_cache is protected by RCU, therefore we should
use __sk_dst_get() to deref it once we lock the sock.

This fixes several sparse warnings.

Cc: linux-decnet-user@lists.sourceforge.net
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28 00:15:27 -05:00
Joe Perches
a1b1add07f gro: Fix kcalloc argument order
First number, then size.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-27 22:46:33 -05:00
David S. Miller
b640bee6d9 Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
This batch contains netfilter updates for you net-next tree, they are:

* The new connlabel extension for x_tables, that allows us to attach
  labels to each conntrack flow. The kernel implementation uses a
  bitmask and there's a file in user-space that maps the bits with the
  corresponding string for each existing label. By now, you can attach
  up to 128 overlapping labels. From Florian Westphal.

* A new round of improvements for the netns support for conntrack.
  Gao feng has moved many of the initialization code of each module
  of the netns init path. He also made several code refactoring, that
  code looks cleaner to me now.

* Added documentation for all possible tweaks for nf_conntrack via
  sysctl, from Jiri Pirko.

* Cisco 7941/7945 IP phone support for our SIP conntrack helper,
  from Kevin Cernekee.

* Missing header file in the snmp helper, from Stephen Hemminger.

* Finally, a couple of fixes to resolve minor issues with these
  changes, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-27 00:56:10 -05:00
Vasanthakumar Thiagarajan
77765eaf5c cfg80211/nl80211: add API for MAC address ACLs
Add API to enable drivers to implement MAC address based
access control in AP/P2P GO mode. Capable drivers advertise
this capability by setting the maximum number of MAC
addresses in such a list in wiphy->max_acl_mac_addrs.

An initial ACL may be given to the NL80211_CMD_START_AP
command and/or changed later with NL80211_CMD_SET_MAC_ACL.

Black- and whitelists are supported, but not simultaneously.

Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
[rewrite commit log, many cleanups]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-25 18:36:44 +01:00
Vasanthakumar Thiagarajan
6d45a74b1f cfg80211: Move the definition of struct mac_address up
struct mac_address will be used by ACL related configuration ops.

Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-25 18:14:52 +01:00
Emmanuel Grumbach
887da9176e mac80211: provide the vif in rssi_callback
Since drivers can support several BSS / P2P Client
interfaces, the rssi callback needs to inform the driver
about the interface teh rssi event relates to.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-24 15:41:29 +01:00
David S. Miller
93b9c1ddd3 Merge branch 'testing' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1) Add a statistic counter for invalid output states and
   remove a superfluous state valid check, from Li RongQing.

2) Probe for asynchronous block ciphers instead of synchronous block
   ciphers to make the asynchronous variants available even if no
   synchronous block ciphers are found, from Jussi Kivilinna.

3) Make rfc3686 asynchronous block cipher and make use of
   the new asynchronous variant, from Jussi Kivilinna.

4) Replace some rwlocks by rcu, from Cong Wang.

5) Remove some unused defines.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23 14:00:16 -05:00
Tom Herbert
5ba24953e9 soreuseport: TCP/IPv6 implementation
Motivation for soreuseport would be something like a web server
binding to port 80 running with multiple threads, where each thread
might have it's own listener socket.  This could be done as an
alternative to other models: 1) have one listener thread which
dispatches completed connections to workers. 2) accept on a single
listener socket from multiple threads.  In case #1 the listener thread
can easily become the bottleneck with high connection turn-over rate.
In case #2, the proportion of connections accepted per thread tends
to be uneven under high connection load (assuming simple event loop:
while (1) { accept(); process() }, wakeup does not promote fairness
among the sockets.  We have seen the  disproportion to be as high
as 3:1 ratio between thread accepting most connections and the one
accepting the fewest.  With so_reusport the distribution is
uniform.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23 13:44:01 -05:00
Tom Herbert
da5e36308d soreuseport: TCP/IPv4 implementation
Allow multiple listener sockets to bind to the same port.

Motivation for soresuseport would be something like a web server
binding to port 80 running with multiple threads, where each thread
might have it's own listener socket.  This could be done as an
alternative to other models: 1) have one listener thread which
dispatches completed connections to workers. 2) accept on a single
listener socket from multiple threads.  In case #1 the listener thread
can easily become the bottleneck with high connection turn-over rate.
In case #2, the proportion of connections accepted per thread tends
to be uneven under high connection load (assuming simple event loop:
while (1) { accept(); process() }, wakeup does not promote fairness
among the sockets.  We have seen the  disproportion to be as high
as 3:1 ratio between thread accepting most connections and the one
accepting the fewest.  With so_reusport the distribution is
uniform.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23 13:44:01 -05:00
Tom Herbert
055dc21a1d soreuseport: infrastructure
Definitions and macros for implementing soreusport.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23 13:44:00 -05:00
Gao feng
c296bb4d5d netfilter: nf_conntrack: refactor l4proto support for netns
Move the code that register/unregister l4proto to the
module_init/exit context.

Given that we have to modify some interfaces to accomodate
these changes, it is a good time to use shorter function names
for this using the nf_ct_* prefix instead of nf_conntrack_*,
that is:

nf_ct_l4proto_register
nf_ct_l4proto_pernet_register
nf_ct_l4proto_unregister
nf_ct_l4proto_pernet_unregister

We same many line breaks with it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 14:40:53 +01:00
Gao feng
6330750d56 netfilter: nf_conntrack: refactor l3proto support for netns
Move the code that register/unregister l3proto to the
module_init/exit context.

Given that we have to modify some interfaces to accomodate
these changes, it is a good time to use shorter function names
for this using the nf_ct_* prefix instead of nf_conntrack_*,
that is:

nf_ct_l3proto_register
nf_ct_l3proto_pernet_register
nf_ct_l3proto_unregister
nf_ct_l3proto_pernet_unregister

We same many line breaks with it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 14:39:20 +01:00
Gao feng
04d8700179 netfilter: nf_ct_proto: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:56:33 +01:00
Gao feng
5f69b8f521 netfilter: nf_ct_labels: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:56:23 +01:00
Gao feng
5e615b2200 netfilter: nf_ct_helper: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:56:13 +01:00
Gao feng
8684094cf1 netfilter: nf_ct_timeout: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:56:02 +01:00
Gao feng
3fe0f943d4 netfilter: nf_ct_ecache: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:55:50 +01:00
Gao feng
73f4001a52 netfilter: nf_ct_tstamp: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:55:39 +01:00
Gao feng
b7ff3a1fae netfilter: nf_ct_acct: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:55:29 +01:00
Gao feng
83b4dbe198 netfilter: nf_ct_expect: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:55:00 +01:00
Gao feng
f94161c1bb netfilter: nf_conntrack: move initialization out of pernet operations
nf_conntrack initialization and cleanup codes happens in pernet
operations function. This task should be done in module_init/exit.
We can't use init_net to identify if it's the right time to initialize
or cleanup since we cannot make assumption on the order netns are
created/destroyed.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:53:35 +01:00
Johan Hedberg
9b008c0457 Bluetooth: Add support for reading LE supported states
The LE supported states indicate the states and state combinations that
the link layer supports. This is important information for knowing what
operations are possible when dealing with multiple connected devices.
This patch adds reading of the supported states to the HCI init
sequence.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-01-23 02:09:16 -02:00
Johan Hedberg
cf1d081f65 Bluetooth: Add support for reading LE White List Size
The LE White List Size is necessary to be known before attempting to
feed the controller with any addresses intended for the white list. This
patch adds the necessary HCI command sending to the HCI init sequence.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-01-23 02:08:43 -02:00
Johan Hedberg
60e7732198 Bluetooth: Add LE Local Features reading support
To be able to make the appropriate decisions for some LE procedures we
need to know the LE features that the local controller supports.
Therefore, it's important to have the LE Read Local Supported Features
HCI comand as part of the HCI init sequence.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-01-23 02:08:18 -02:00
Johan Hedberg
679efe2b4f Bluetooth: Add helper functions for testing bdaddr types
This patch adds two helper functions to test for valid bdaddr type
values. These will be particularely useful in the mgmt code to check
that user space has passed valid values to the kernel.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-01-23 01:58:08 -02:00
John W. Linville
066433a6fa Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-01-22 15:40:56 -05:00
Steffen Klassert
8141ed9fce ipv4: Add a socket release callback for datagram sockets
This implements a socket release callback function to check
if the socket cached route got invalid during the time
we owned the socket. The function is used from udp, raw
and ping sockets.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 14:17:05 -05:00
YOSHIFUJI Hideaki / 吉藤英明
2576f17dfa ipv6: Unshare ip6_nd_hdr() and change return type to void.
- move ip6_nd_hdr() to its users' source files.
  In net/ipv6/mcast.c, it will be called ip6_mc_hdr().
- make return type to void since this function never fails.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:15 -05:00
YOSHIFUJI Hideaki / 吉藤英明
c558e9fca8 ndisc: Move ndisc_opt_addr_space() to include/net/ndisc.h.
This also makes ndisc_opt_addr_data() and ndisc_fill_addr_option()
use ndisc_opt_addr_space().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:14 -05:00
Steffen Klassert
02bfd8ecf5 xfrm: Remove unused defines
XFRM_REPLAY_SEQ, XFRM_REPLAY_OSEQ and XFRM_REPLAY_SEQ_MASK
were introduced years ago but actually never used.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-01-21 11:19:04 +01:00
YOSHIFUJI Hideaki / 吉藤英明
d1641565f6 ipv6: Optimize ipv6_addr_is_ll_all_{nodes,routers}().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-20 22:29:49 -05:00
YOSHIFUJI Hideaki / 吉藤英明
9d10077400 ipv6: Optimize ipv6_addr_is_solict_mult().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-20 22:29:49 -05:00
YOSHIFUJI Hideaki / 吉藤英明
ca97a644d7 ipv6: Introduce ipv6_addr_is_solict_mult() to check Solicited Node Multicast Addresses.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-20 22:29:49 -05:00
YOSHIFUJI Hideaki
b27b28cb44 ipv6: Make ipv6_addr_is_XXX() return boolean.
ipv6_addr_is_{multicast,ll_all_nodes,ll_all_routers,isatap}()
return boolean.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-20 22:29:24 -05:00
Johannes Berg
a65240c101 mac80211: allow drivers to access IPv6 information
To be able to implement NS response offloading (in
regular operation or while in WoWLAN) drivers need
to know the IPv6 addresses assigned to interfaces.
Implement an IPv6 notifier in mac80211 to call the
driver when addresses change.

Unlike for IPv4, implement it as a callback rather
than as a list in the BSS configuration, that is
more flexible.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-18 21:55:38 +01:00
Johannes Berg
0f19b41e22 mac80211: remove ARP filter enable/disable logic
Depending on the driver, having ARP filtering for
some addresses may be possible. Remove the logic
that tracks whether ARP filter is enabled or not
and give the driver the total number of addresses
instead of the length of the list so it can make
its own decision.

Reviewed-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-18 21:20:34 +01:00
YOSHIFUJI Hideaki / 吉藤英明
12fd84f438 ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers.
Because of rt->n removal, we do not need neigh argument any more.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18 14:41:13 -05:00
Yoni Divinsky
de5fad8157 mac80211: add op to configure default key id
There are hardwares which support offload of data packets
for example when auto ARP is enabled the hw will send
the ARP response. In such cases if WEP encryption is
configured the hw must know the default WEP key in order
to encrypt the packets correctly.

When hw_accel is enabled and encryption type is set to WEP,
the driver should get the default key index from mac80211.

Signed-off-by: Yoni Divinsky <yoni.divinsky@ti.com>
[cleanups, fixes, documentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-18 13:30:21 +01:00
Johan Hedberg
6ead1bbc38 Bluetooth: Add a new workqueue for hci_request operations
The hci_request function is blocking and cannot be called through the
usual per-HCI device workqueue (hdev->workqueue). While hci_request is
in progress any other work from the queue, including sending HCI
commands to the controller would be blocked and eventually cause the
hci_request call to time out.

This patch adds a second workqueue to be used by operations needing
hci_request and thereby avoiding issues with blocking other workqueue
users.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-01-18 02:54:21 -02:00
YOSHIFUJI Hideaki / 吉藤英明
887c95cc1d ipv6: Complete neighbour entry removal from dst_entry.
CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明
9bb5a14813 ipv6: Introduce rt6_nexthop() to select nexthop address.
For RTF_GATEWAY route, return rt->rt6i_gateway.
Otherwise, return 2nd argument (destination address).

This will be used by following patches which remove rt->n
dependency patches in ip6_dst_lookup_tail() and ip6_finish_output2().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明
ac3175fe7a ndisc: Introduce __ipv6_neigh_lookup_noref().
This function, which looks up neighbour entry for an IPv6 address
without touching refcnt, will be used for patches to remove
dependency on rt->n (neighbour entry in rt6_info).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明
8e022ee63f ndisc: Remove tbl argument for __ipv6_neigh_lookup().
We can refer to nd_tbl directly.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:18 -05:00
Florian Westphal
9b21f6a909 netfilter: ctnetlink: allow userspace to modify labels
Add the ability to set/clear labels assigned to a conntrack
via ctnetlink.

To allow userspace to only alter specific bits, Pablo suggested to add
a new CTA_LABELS_MASK attribute:

The new set of active labels is then determined via

active = (active & ~mask) ^ changeset

i.e., the mask selects those bits in the existing set that should be
changed.

This follows the same method already used by MARK and CONNMARK targets.

Omitting CTA_LABELS_MASK is the same as setting all bits in CTA_LABELS_MASK
to 1: The existing set is replaced by the one from userspace.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-18 00:28:17 +01:00
Florian Westphal
c539f01717 netfilter: add connlabel conntrack extension
similar to connmarks, except labels are bit-based; i.e.
all labels may be attached to a flow at the same time.

Up to 128 labels are supported.  Supporting more labels
is possible, but requires increasing the ct offset delta
from u8 to u16 type due to increased extension sizes.

Mapping of bit-identifier to label name is done in userspace.

The extension is enabled at run-time once "-m connlabel" netfilter
rules are added.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-18 00:28:15 +01:00
Fabio Baltieri
512613d7dd ipv6: fix ipv6_prefix_equal64_half mask conversion
Fix the 64bit optimized version of ipv6_prefix_equal to convert the
bitmask to network byte order only after the bit-shift.

The bug was introduced in:

3867517 ipv6: 64bit version of ipv6_prefix_equal().

Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 14:29:58 -05:00
Jesper Dangaard Brouer
c2a936600f net: increase fragment memory usage limits
Increase the amount of memory usage limits for incomplete
IP fragments.

Arguing for new thresh high/low values:

 High threshold = 4 MBytes
 Low  threshold = 3 MBytes

The fragmentation memory accounting code, tries to account for the
real memory usage, by measuring both the size of frag queue struct
(inet_frag_queue (ipv4:ipq/ipv6:frag_queue)) and the SKB's truesize.

We want to be able to handle/hold-on-to enough fragments, to ensure
good performance, without causing incomplete fragments to hurt
scalability, by causing the number of inet_frag_queue to grow too much
(resulting longer searches for frag queues).

For IPv4, how much memory does the largest frag consume.

Maximum size fragment is 64K, which is approx 44 fragments with
MTU(1500) sized packets. Sizeof(struct ipq) is 200.  A 1500 byte
packet results in a truesize of 2944 (not 2048 as I first assumed)

  (44*2944)+200 = 129736 bytes

The current default high thresh of 262144 bytes, is obviously
problematic, as only two 64K fragments can fit in the queue at the
same time.

How many 64K fragment can we fit into 4 MBytes:

  4*2^20/((44*2944)+200) = 32.34 fragment in queues

An attacker could send a separate/distinct fake fragment packets per
queue, causing us to allocate one inet_frag_queue per packet, and thus
attacking the hash table and its lists.

How many frag queue do we need to store, and given a current hash size
of 64, what is the average list length.

Using one MTU sized fragment per inet_frag_queue, each consuming
(2944+200) 3144 bytes.

  4*2^20/(2944+200) = 1334 frag queues -> 21 avg list length

An attack could send small fragments, the smallest packet I could send
resulted in a truesize of 896 bytes (I'm a little surprised by this).

  4*2^20/(896+200)  = 3827 frag queues -> 59 avg list length

When increasing these number, we also need to followup with
improvements, that is going to help scalability.  Simply increasing
the hash size, is not enough as the current implementation does not
have a per hash bucket locking.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 14:29:53 -05:00
Vincent Bernat
d59577b6ff sk-filter: Add ability to lock a socket filter program
While a privileged program can open a raw socket, attach some
restrictive filter and drop its privileges (or send the socket to an
unprivileged program through some Unix socket), the filter can still
be removed or modified by the unprivileged program. This commit adds a
socket option to lock the filter (SO_LOCK_FILTER) preventing any
modification of a socket filter program.

This is similar to OpenBSD BIOCLOCK ioctl on bpf sockets, except even
root is not allowed change/drop the filter.

The state of the lock can be read with getsockopt(). No error is
triggered if the state is not changed. -EPERM is returned when a user
tries to remove the lock or to change/remove the filter while the lock
is active. The check is done directly in sk_attach_filter() and
sk_detach_filter() and does not affect only setsockopt() syscall.

Signed-off-by: Vincent Bernat <bernat@luffy.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 03:21:25 -05:00
YOSHIFUJI Hideaki
07f623d3b2 ipv6: Fix endianess warning in ip6_flow_hdr().
Commit 3e4e4c1f ("ipv6: Introduce ip6_flow_hdr() to fill version,
tclass and flowlabel.) uses ntohl(), which should be htonl().

Found by Fengguang Wu <fengguang.wu@intel.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-16 22:12:36 -05:00
Simon Wunderlich
11c4a075db cfg80211: check radar interface combinations
To ease further DFS development regarding interface combinations, use
the interface combinations structure to test for radar capabilities.
Drivers can specify which channel widths they support, and in which
modes. Right now only a single AP interface is allowed, but as the
DFS code evolves other combinations can be enabled.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-16 23:41:54 +01:00
Jouni Malinen
cee00a959c cfg80211: Allow use_mfp to be specified with the connect command
The NL80211_ATTR_USE_MFP attribute was originally added for
NL80211_CMD_ASSOCIATE, but it is actually as useful (if not even more
useful) with NL80211_CMD_CONNECT, so process that attribute with the
connect command, too.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-16 23:27:49 +01:00
Arend van Spriel
1c18f1452a nl80211: allow user-space to set address for P2P_DEVICE
As per email discussion Jouni Malinen pointed out that:

"P2P message exchanges can be executed on the current operating channel
of any operation (both P2P and non-P2P station). These can be on 5 GHz
and even on 60 GHz (so yes, you _can_ do GO Negotiation on 60 GHz).

As an example, it would be possible to receive a GO Negotiation Request
frame on a 5 GHz only radio and then to complete GO Negotiation on that
band. This can happen both when connected to a P2P group (through client
discoverability mechanism) and when connected to a legacy AP (assuming
the station receive Probe Request frame from full scan in the beginning
of P2P device discovery)."

This means that P2P messages can be sent over different radio devices.
However, these should use the same P2P device address so it should be
able to provision this from user-space. This patch adds a parameter for
this to struct vif_params which should only be used during creation of
the P2P device interface.

Cc: Jouni Malinen <j@w1.fi>
Cc: Greg Goldman <ggoldman@broadcom.com>
Cc: Jithu Jance <jithu@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
[add error checking]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-16 23:20:32 +01:00
Marco Porsch
3b1c5a5307 {cfg,nl}80211: mesh power mode primitives and userspace access
Add the nl80211_mesh_power_mode enumeration which holds possible
values for the mesh power mode. These modes are unknown, active,
light sleep and deep sleep.

Add power_mode entry to the mesh config structure to hold the
user-configured default mesh power mode. This value will be used
for new peer links.

Add the dot11MeshAwakeWindowDuration value to the mesh config.
The awake window is a duration in TU describing how long the STA
will stay awake after transmitting its beacon in PS mode.

Add access routines to:
 - get/set local link-specific power mode (STA)
 - get remote STA's link-specific power mode (STA)
 - get remote STA's non-peer power mode (STA)
 - get/set default mesh power mode (mesh config)
 - get/set mesh awake window duration (mesh config)

All config changes may be done at mesh runtime and take effect
immediately.

Signed-off-by: Marco Porsch <marco@cozybit.com>
Signed-off-by: Ivan Bezyazychnyy <ivan.bezyazychnyy@gmail.com>
Signed-off-by: Mike Krinkin <krinkin.m.u@gmail.com>
[fix commit message line length, error handling in set station]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-16 22:48:04 +01:00
Marco Porsch
9bdbf04db0 {cfg,nl,mac}80211: set beacon interval and DTIM period on mesh join
Move the default mesh beacon interval and DTIM period to cfg80211
and make them accessible to nl80211. This enables setting both
values when joining an MBSS.

Previously the DTIM parameter was not set by mac80211 so the
driver's default value was used.

Signed-off-by: Marco Porsch <marco@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-16 22:44:04 +01:00
Johannes Berg
8f21b0adfe mac80211: call restart complete at wowlan resume time
When the driver's resume function can't completely
restore the configuration in the device, it returns
1 from the callback which will be treated like a HW
restart request, but done directly.

In this case, also call the driver's restart_complete()
function so it can finish the reconfiguration there.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-16 15:19:01 +01:00
Yacine Belkadi
0ae997dc75 {cfg,mac}80211.h: fix some kernel-doc warnings
When building the 80211 DocBook, scripts/kernel-doc reports
the following type of warnings:

Warning(include/net/cfg80211.h:334): No description found for return value of 'cfg80211_get_chandef_type'

These warnings are only reported when scripts/kernel-doc
runs in verbose mode.

To fix these use "Return:" to describe function return values.

Signed-off-by: Yacine Belkadi <yacine.belkadi.1@gmail.com>
[adjust for freq_reg_info() change]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-16 15:14:18 +01:00
David S. Miller
4b87f92259 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	Documentation/networking/ip-sysctl.txt
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c

Both conflicts were simply overlapping context.

A build fix for qlcnic is in here too, simply removing the added
devinit annotations which no longer exist.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-15 15:05:59 -05:00
Benjamin LaHaise
c1b52739e4 pkt_sched: namespace aware act_mirred
Eric Dumazet pointed out that act_mirred needs to find the current net_ns,
and struct net pointer is not provided in the call chain.  His original
patch made use of current->nsproxy->net_ns to find the network namespace,
but this fails to work correctly for userspace code that makes use of
netlink sockets in different network namespaces.  Instead, pass the
"struct net *" down along the call chain to where it is needed.

This version removes the ifb changes as Eric has submitted that patch
separately, but is otherwise identical to the previous version.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 15:09:36 -05:00
YOSHIFUJI Hideaki / 吉藤英明
6059283378 ipv6 netevent: Remove old_neigh from netevent_redirect.
The only user is cxgb3 driver.

old_neigh is used to check device change, but it must not happen
on redirect.  In this sense, we can remove old_neigh argument.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 15:04:59 -05:00
YOSHIFUJI Hideaki / 吉藤英明
38675170e4 ipv6: 64bit version of ipv6_prefix_equal().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 13:17:01 -05:00
YOSHIFUJI Hideaki / 吉藤英明
2ef9733203 ipv6: Remove __ipv6_prefix_equal().
ipv6_prefix_equal() just casts its arguments and it is the only
user of __ipv6_prefix_equal().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 13:17:01 -05:00
YOSHIFUJI Hideaki / 吉藤英明
5206c579da ipv6: 64bit version of ipv6_addr_set().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 13:17:01 -05:00
YOSHIFUJI Hideaki / 吉藤英明
a04d40b895 ipv6: 64bit version of ipv6_addr_v4mapped().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 13:17:00 -05:00
YOSHIFUJI Hideaki / 吉藤英明
e287656b36 ipv6: 64bit version of ipv6_addr_loopback().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 13:17:00 -05:00
YOSHIFUJI Hideaki / 吉藤英明
9f2e73345a ipv6: 64bit version of ipv6_addr_diff().
Introduce __ipv6_addr_diff64() to to find the first different
bit between two addresses on 64bit architectures.

32bit version is still available as __ipv6_addr_diff32(),
and __ipv6_addr_diff() automatically selects appropriate
version.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 13:17:00 -05:00
Luis R. Rodriguez
0c0280bd0b wireless: make the reg_notifier() void
The reg_notifier()'s return value need not be checked
as it is only supposed to do post regulatory work and
that should never fail. Any behaviour to regulatory
that needs to be considered before cfg80211 does work
to a driver should be specified by using the already
existing flags, the reg_notifier() just does post
processing should it find it needs to.

Also make lbs_reg_notifier static.

Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[move lbs_reg_notifier to not break compile]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-14 11:32:44 +01:00
YOSHIFUJI Hideaki / 吉藤英明
daad151263 ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().
Move generalized version of ipv6_is_mld() to header,
and use it from ip6_mc_input().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明
6502ca527f ipv6: Introduce ip6_flowinfo() to extract flowinfo (tclass + flowlabel).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明
3e4e4c1f2d ipv6: Introduce ip6_flow_hdr() to fill version, tclass and flowlabel.
This is not only for readability but also for optimization.
What we do here is to build the 32bit word at the beginning of the ipv6
header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and
we do not need to read the tclass portion of the target buffer.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:13 -05:00
Pablo Neira Ayuso
1e47ee8367 netfilter: nf_conntrack: fix BUG_ON while removing nf_conntrack with netns
canqun zhang reported that we're hitting BUG_ON in the
nf_conntrack_destroy path when calling kfree_skb while
rmmod'ing the nf_conntrack module.

Currently, the nf_ct_destroy hook is being set to NULL in the
destroy path of conntrack.init_net. However, this is a problem
since init_net may be destroyed before any other existing netns
(we cannot assume any specific ordering while releasing existing
netns according to what I read in recent emails).

Thanks to Gao feng for initial patch to address this issue.

Reported-by: canqun zhang <canqunzhang@gmail.com>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-12 14:12:36 +01:00
YOSHIFUJI Hideaki / 吉藤英明
a0376db0f2 ipv6: Optimize ipv6_change_dsfield().
Do not convert endian back and forth.
If the caller uses contant "mask" argument (and most callers do),
we can omit runtime endian conversion here.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09 23:59:53 -08:00
Samuel Ortiz
390a1bd853 NFC: Initial Secure Element API
Each NFC adapter can have several links to different secure elements and
that property needs to be exported by the drivers.
A secure element link can be enabled and disabled, and card emulation will
be handled by the currently active one. Otherwise card emulation will be
host implemented.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-01-10 00:51:54 +01:00
Eric Lapuyade
bf71ab8ba5 NFC: Add HCI quirks to support driver (non)standard implementations
Some chips diverge from the HCI spec in their implementation of standard
features. This adds a new quirks parameter to
nfc_hci_allocate_device() to let the driver indicate its divergence.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-01-10 00:51:51 +01:00
Eric Lapuyade
27c31191b3 NFC: Added error handling in event_received hci ops
There is no use to return an error if the caller doesn't get it.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-01-10 00:51:49 +01:00
Eric Lapuyade
f0c9103813 NFC: Fixed nfc core and hci unregistration and cleanup
When an adapter is removed, it will unregister itself from hci and/or
nfc core. In order to do that safely, work tasks must first be canceled
and prevented to be scheduled again, before the hci or nfc device can be
destroyed.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-01-10 00:51:48 +01:00
Andrei Emeltchenko
cb6801c640 Bluetooth: AMP: Use set_bit / test_bit for amp_mgr state
Using bit operations solves problems with multiple requests
and clearing state.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-01-09 17:05:05 -02:00
Andrei Emeltchenko
8e05e3ba88 Bluetooth: AMP: Send A2MP Create Phylink Rsp after Assoc write
Postpone sending A2MP Create Phylink Response until we got successful
HCI Command Complete after HCI Write Remote AMP Assoc.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-01-09 17:05:05 -02:00
Rami Rosen
7952861f4a Bluetooth: remove an unused variable in a header file
This patch removes srej_queue_next from include/net/bluetooth/l2cap.h as it
is not used.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-01-09 17:05:05 -02:00
Cong Wang
acb3e04119 ipv6: move csum_ipv6_magic() and udp6_csum_init() into static library
As suggested by David, udp6_csum_init() is too big to be inlined,
move it to ipv6 static library, net/ipv6/ip6_checksum.c.

And the generic csum_ipv6_magic() too.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-08 17:56:10 -08:00
Hannes Frederic Sowa
5d134f1c1f tcp: make sysctl_tcp_ecn namespace aware
As per suggestion from Eric Dumazet this patch makes tcp_ecn sysctl
namespace aware.  The reason behind this patch is to ease the testing
of ecn problems on the internet and allows applications to tune their
own use of ecn.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-06 21:09:56 -08:00
Jiri Pirko
81135548e6 net: use ETHTOOL_FWVERS_LEN instead of ETHTOOL_BUSINFO_LEN for fw_ver strings
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-06 21:06:31 -08:00
Jorrit Schippers
d82603c6da treewide: Replace incomming with incoming in all comments and strings
Signed-off-by: Jorrit Schippers <jorrit@ncode.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-01-03 16:15:49 +01:00
Johannes Berg
1c06ef9831 wireless: use __aligned
Use __aligned(...) instead of __attribute__((aligned(...)))
in mac80211 and cfg80211.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:44 +01:00
Johannes Berg
18b559d5db mac80211: split TX aggregation stop action
When TX aggregation is stopped, there are a few
different cases:
 - connection with the peer was dropped
 - session stop was requested locally
 - session stop was requested by the peer
 - connection was dropped while a session is stopping

The behaviour in these cases should be different, if
the connection is dropped then the driver should drop
all frames, otherwise the frames may continue to be
transmitted, aggregated in the case of a locally
requested session stop or unaggregated in the case of
the peer requesting session stop.

Split these different cases so that the driver can
act accordingly; however, treat local and remote stop
the same way and ask the driver to not send frames as
aggregated packets any more.

In the case of connection drop, the stop callback the
driver is otherwise supposed to call is no longer
required.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:42 +01:00
Johannes Berg
8a61af65c6 mac80211: fix channel context iteration
During suspend/resume channel contexts might be
iterated even if they haven't been re-added to
the driver, keep track of this and skip them in
iteration. Also use the new status for sanity
checks.

Also clarify the fact that during HW restart all
contexts are iterated over (thanks Eliad.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:35 +01:00
Johannes Berg
361c9c8b0e regulatory: use IS_ERR macro family for freq_reg_info
Instead of returning an error and filling a pointer
return the pointer and an ERR_PTR value in error cases.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:31 +01:00
Johannes Berg
c492db370c regulatory: use RCU to protect last_request
This will allow making freq_reg_info() lock-free.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:30 +01:00
Johannes Berg
458f4f9e96 regulatory: use RCU to protect global and wiphy regdomains
To simplify the locking and not require cfg80211_mutex
(which nl80211 uses to access the global regdomain) and
also to make it possible for drivers to access their
wiphy->regd safely, use RCU to protect these pointers.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:29 +01:00
Johannes Berg
fe7ef5e9ba regulatory: remove handling of channel bandwidth
The channel bandwidth handling isn't really quite right,
it assumes that a 40 MHz channel is really two 20 MHz
channels, which isn't strictly true. This is the way the
regulatory database handling is defined right now though
so remove the logic to handle other channel widths.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:28 +01:00
David S. Miller
ac196f8c92 Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following batch contains Netfilter fixes for 3.8-rc1. They are
a mixture of old bugs that have passed unnoticed (I'll pass these to
stable) and more fresh ones from the previous merge window, they are:

* Fix for MAC address in 6in4 tunnels via NFLOG that results in ulogd
  showing up wrong address, from Bob Hockney.

* Fix a comment in nf_conntrack_ipv6, from Florent Fourcot.

* Fix a leak an error path in ctnetlink while creating an expectation,
  from Jesper Juhl.

* Fix missing ICMP time exceeded in the IPv6 defragmentation code, from
  Haibo Xi.

* Fix inconsistent handling of routing changes in MASQUERADE for the
  new connections case, from Andrew Collins.

* Fix a missing skb_reset_transport in ip[6]t_REJECT that leads to
  crashes in the ixgbe driver (since it seems to access the transport
  header with TSO enabled), from Mukund Jampala.

* Recover obsoleted NOTRACK target by including it into the CT and spot
  a warning via printk about being obsoleted. Many people don't check the
  scheduled to be removal file under Documentation, so we follow some
  less agressive approach to kill this in a year or so. Spotted by Florian
  Westphal, patch from myself.

* Fix race condition in xt_hashlimit that allows to create two or more
  entries, from myself.

* Fix crash if the CT is used due to the recently added facilities to
  consult the dying and unconfirmed conntrack lists, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-28 14:28:17 -08:00
Li Zefan
3d0dcfbd8f netprio_cgroup: define sk_cgrp_prioidx only if NETPRIO_CGROUP is enabled
sock->sk_cgrp_prioidx won't be used at all if CONFIG_NETPRIO_CGROUP=n.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26 14:16:23 -08:00
Pablo Neira Ayuso
10db9069eb netfilter: xt_CT: recover NOTRACK target support
Florian Westphal reported that the removal of the NOTRACK target
(9655050 netfilter: remove xt_NOTRACK) is breaking some existing
setups.

That removal was scheduled for removal since long time ago as
described in Documentation/feature-removal-schedule.txt

What:  xt_NOTRACK
Files: net/netfilter/xt_NOTRACK.c
When:  April 2011
Why:   Superseded by xt_CT

Still, people may have not notice / may have decided to stick to an
old iptables version. I agree with him in that some more conservative
approach by spotting some printk to warn users for some time is less
agressive.

Current iptables 1.4.16.3 already contains the aliasing support
that makes it point to the CT target, so upgrading would fix it.
Still, the policy so far has been to avoid pushing our users to
upgrade.

As a solution, this patch recovers the NOTRACK target inside the CT
target and it now spots a warning.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-24 12:55:09 +01:00
Linus Torvalds
9eb127cc04 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Really fix tuntap SKB use after free bug, from Eric Dumazet.

 2) Adjust SKB data pointer to point past the transport header before
    calling icmpv6_notify() so that the headers are in the state which
    that function expects.  From Duan Jiong.

 3) Fix ambiguities in the new tuntap multi-queue APIs.  From Jason
    Wang.

 4) mISDN needs to use del_timer_sync(), from Konstantin Khlebnikov.

 5) Don't destroy mutex after freeing up device private in mac802154,
    fix also from Konstantin Khlebnikov.

 6) Fix INET request socket leak in TCP and DCCP, from Christoph Paasch.

 7) SCTP HMAC kconfig rework, from Neil Horman.

 8) Fix SCTP jprobes function signature, otherwise things explode, from
    Daniel Borkmann.

 9) Fix typo in ipv6-offload Makefile variable reference, from Simon
    Arlott.

10) Don't fail USBNET open just because remote wakeup isn't supported,
    from Oliver Neukum.

11) be2net driver bug fixes from Sathya Perla.

12) SOLOS PCI ATM driver bug fixes from Nathan Williams and David
    Woodhouse.

13) Fix MTU changing regression in 8139cp driver, from John Greene.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
  solos-pci: ensure all TX packets are aligned to 4 bytes
  solos-pci: add firmware upgrade support for new models
  solos-pci: remove superfluous debug output
  solos-pci: add GPIO support for newer versions on Geos board
  8139cp: Prevent dev_close/cp_interrupt race on MTU change
  net: qmi_wwan: add ZTE MF880
  drivers/net: Use of_match_ptr() macro in smsc911x.c
  drivers/net: Use of_match_ptr() macro in smc91x.c
  ipv6: addrconf.c: remove unnecessary "if"
  bridge: Correctly encode addresses when dumping mdb entries
  bridge: Do not unregister all PF_BRIDGE rtnl operations
  use generic usbnet_manage_power()
  usbnet: generic manage_power()
  usbnet: handle PM failure gracefully
  ksz884x: fix receive polling race condition
  qlcnic: update driver version
  qlcnic: fix unused variable warnings
  net: fec: forbid FEC_PTP on SoCs that do not support
  be2net: fix wrong frag_idx reported by RX CQ
  be2net: fix be_close() to ensure all events are ack'ed
  ...
2012-12-19 20:29:15 -08:00
Linus Torvalds
6a2b60b17b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace changes from Eric Biederman:
 "While small this set of changes is very significant with respect to
  containers in general and user namespaces in particular.  The user
  space interface is now complete.

  This set of changes adds support for unprivileged users to create user
  namespaces and as a user namespace root to create other namespaces.
  The tyranny of supporting suid root preventing unprivileged users from
  using cool new kernel features is broken.

  This set of changes completes the work on setns, adding support for
  the pid, user, mount namespaces.

  This set of changes includes a bunch of basic pid namespace
  cleanups/simplifications.  Of particular significance is the rework of
  the pid namespace cleanup so it no longer requires sending out
  tendrils into all kinds of unexpected cleanup paths for operation.  At
  least one case of broken error handling is fixed by this cleanup.

  The files under /proc/<pid>/ns/ have been converted from regular files
  to magic symlinks which prevents incorrect caching by the VFS,
  ensuring the files always refer to the namespace the process is
  currently using and ensuring that the ptrace_mayaccess permission
  checks are always applied.

  The files under /proc/<pid>/ns/ have been given stable inode numbers
  so it is now possible to see if different processes share the same
  namespaces.

  Through the David Miller's net tree are changes to relax many of the
  permission checks in the networking stack to allowing the user
  namespace root to usefully use the networking stack.  Similar changes
  for the mount namespace and the pid namespace are coming through my
  tree.

  Two small changes to add user namespace support were commited here adn
  in David Miller's -net tree so that I could complete the work on the
  /proc/<pid>/ns/ files in this tree.

  Work remains to make it safe to build user namespaces and 9p, afs,
  ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the
  Kconfig guard remains in place preventing that user namespaces from
  being built when any of those filesystems are enabled.

  Future design work remains to allow root users outside of the initial
  user namespace to mount more than just /proc and /sys."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits)
  proc: Usable inode numbers for the namespace file descriptors.
  proc: Fix the namespace inode permission checks.
  proc: Generalize proc inode allocation
  userns: Allow unprivilged mounts of proc and sysfs
  userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
  procfs: Print task uids and gids in the userns that opened the proc file
  userns: Implement unshare of the user namespace
  userns: Implent proc namespace operations
  userns: Kill task_user_ns
  userns: Make create_new_namespaces take a user_ns parameter
  userns: Allow unprivileged use of setns.
  userns: Allow unprivileged users to create new namespaces
  userns: Allow setting a userns mapping to your current uid.
  userns: Allow chown and setgid preservation
  userns: Allow unprivileged users to create user namespaces.
  userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
  userns: fix return value on mntns_install() failure
  vfs: Allow unprivileged manipulation of the mount namespace.
  vfs: Only support slave subtrees across different user namespaces
  vfs: Add a user namespace reference from struct mnt_namespace
  ...
2012-12-17 15:44:47 -08:00
Pablo Neira Ayuso
252b3e8c1b netfilter: xt_CT: fix crash while destroy ct templates
In (d871bef netfilter: ctnetlink: dump entries from the dying and
unconfirmed lists), we assume that all conntrack objects are
inserted in any of the existing lists. However, template conntrack
objects were not. This results in hitting BUG_ON in the
destroy_conntrack path while removing a rule that uses the CT target.

This patch fixes the situation by adding the template lists, which
is where template conntrack objects reside now.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16 23:44:12 +01:00
Christoph Paasch
e337e24d66 inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock
If in either of the above functions inet_csk_route_child_sock() or
__inet_inherit_port() fails, the newsk will not be freed:

unreferenced object 0xffff88022e8a92c0 (size 1592):
  comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
  hex dump (first 32 bytes):
    0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00  ................
    02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e
    [<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5
    [<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd
    [<ffffffff8149b784>] sk_clone_lock+0x16/0x21e
    [<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b
    [<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481
    [<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b
    [<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416
    [<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc
    [<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701
    [<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4
    [<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f
    [<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233
    [<ffffffff814cee68>] ip_rcv+0x217/0x267
    [<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553
    [<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82

This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus
a single sock_put() is not enough to free the memory. Additionally, things
like xfrm, memcg, cookie_values,... may have been initialized.
We have to free them properly.

This is fixed by forcing a call to tcp_done(), ending up in
inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary,
because it ends up doing all the cleanup on xfrm, memcg, cookie_values,
xfrm,...

Before calling tcp_done, we have to set the socket to SOCK_DEAD, to
force it entering inet_csk_destroy_sock. To avoid the warning in
inet_csk_destroy_sock, inet_num has to be set to 0.
As inet_csk_destroy_sock does a dec on orphan_count, we first have to
increase it.

Calling tcp_done() allows us to remove the calls to
tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().

A similar approach is taken for dccp by calling dccp_done().

This is in the kernel since 093d282321 (tproxy: fix hash locking issue
when using port redirection in __inet_inherit_port()), thus since
version >= 2.6.37.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14 13:14:07 -05:00
Duan Jiong
093d04d42f ipv6: Change skb->data before using icmpv6_notify() to propagate redirect
In function ndisc_redirect_rcv(), the skb->data points to the transport
header, but function icmpv6_notify() need the skb->data points to the
inner IP packet. So before using icmpv6_notify() to propagate redirect,
change skb->data to point the inner IP packet that triggered the sending
of the Redirect, and introduce struct rd_msg to make it easy.

Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14 13:14:07 -05:00
Linus Torvalds
a2013a13e6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial branch from Jiri Kosina:
 "Usual stuff -- comment/printk typo fixes, documentation updates, dead
  code elimination."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  HOWTO: fix double words typo
  x86 mtrr: fix comment typo in mtrr_bp_init
  propagate name change to comments in kernel source
  doc: Update the name of profiling based on sysfs
  treewide: Fix typos in various drivers
  treewide: Fix typos in various Kconfig
  wireless: mwifiex: Fix typo in wireless/mwifiex driver
  messages: i2o: Fix typo in messages/i2o
  scripts/kernel-doc: check that non-void fcts describe their return value
  Kernel-doc: Convention: Use a "Return" section to describe return values
  radeon: Fix typo and copy/paste error in comments
  doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
  various: Fix spelling of "asynchronous" in comments.
  Fix misspellings of "whether" in comments.
  eisa: Fix spelling of "asynchronous".
  various: Fix spelling of "registered" in comments.
  doc: fix quite a few typos within Documentation
  target: iscsi: fix comment typos in target/iscsi drivers
  treewide: fix typo of "suport" in various comments and Kconfig
  treewide: fix typo of "suppport" in various comments
  ...
2012-12-13 12:00:02 -08:00
Linus Torvalds
6be35c700f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

1) Allow to dump, monitor, and change the bridge multicast database
   using netlink.  From Cong Wang.

2) RFC 5961 TCP blind data injection attack mitigation, from Eric
   Dumazet.

3) Networking user namespace support from Eric W. Biederman.

4) tuntap/virtio-net multiqueue support by Jason Wang.

5) Support for checksum offload of encapsulated packets (basically,
   tunneled traffic can still be checksummed by HW).  From Joseph
   Gasparakis.

6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
   Daniel Borkmann.

7) Bridge port parameters over netlink and BPDU blocking support
   from Stephen Hemminger.

8) Improve data access patterns during inet socket demux by rearranging
   socket layout, from Eric Dumazet.

9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
   Jon Maloy.

10) Update TCP socket hash sizing to be more in line with current day
    realities.  The existing heurstics were choosen a decade ago.
    From Eric Dumazet.

11) Fix races, queue bloat, and excessive wakeups in ATM and
    associated drivers, from Krzysztof Mazur and David Woodhouse.

12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
    in VXLAN driver, from David Stevens.

13) Add "oops_only" mode to netconsole, from Amerigo Wang.

14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
    allow DCB netlink to work on namespaces other than the initial
    namespace.  From John Fastabend.

15) Support PTP in the Tigon3 driver, from Matt Carlson.

16) tun/vhost zero copy fixes and improvements, plus turn it on
    by default, from Michael S. Tsirkin.

17) Support per-association statistics in SCTP, from Michele
    Baldessari.

And many, many, driver updates, cleanups, and improvements.  Too
numerous to mention individually.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  net/mlx4_en: Add support for destination MAC in steering rules
  net/mlx4_en: Use generic etherdevice.h functions.
  net: ethtool: Add destination MAC address to flow steering API
  bridge: add support of adding and deleting mdb entries
  bridge: notify mdb changes via netlink
  ndisc: Unexport ndisc_{build,send}_skb().
  uapi: add missing netconf.h to export list
  pkt_sched: avoid requeues if possible
  solos-pci: fix double-free of TX skb in DMA mode
  bnx2: Fix accidental reversions.
  bna: Driver Version Updated to 3.1.2.1
  bna: Firmware update
  bna: Add RX State
  bna: Rx Page Based Allocation
  bna: TX Intr Coalescing Fix
  bna: Tx and Rx Optimizations
  bna: Code Cleanup and Enhancements
  ath9k: check pdata variable before dereferencing it
  ath5k: RX timestamp is reported at end of frame
  ath9k_htc: RX timestamp is reported at end of frame
  ...
2012-12-12 18:07:07 -08:00
YOSHIFUJI Hideaki
fd0ea7dbfa ndisc: Unexport ndisc_{build,send}_skb().
These symbols were exported for bonding device by commit 305d552a
("bonding: send IPv6 neighbor advertisement on failover").

It bacame obsolete by commit 7c899432 ("bonding, ipv4, ipv6, vlan: Handle
NETDEV_BONDING_FAILOVER like NETDEV_NOTIFY_PEERS") and removed by
commit 4f5762ec ("bonding: Remove obsolete source file 'bond_ipv6.c'").

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12 12:42:29 -05:00
Linus Torvalds
d206e09036 Merge branch 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
 "A lot of activities on cgroup side.  The big changes are focused on
  making cgroup hierarchy handling saner.

   - cgroup_rmdir() had peculiar semantics - it allowed cgroup
     destruction to be vetoed by individual controllers and tried to
     drain refcnt synchronously.  The vetoing never worked properly and
     caused good deal of contortions in cgroup.  memcg was the last
     reamining user.  Michal Hocko removed the usage and cgroup_rmdir()
     path has been simplified significantly.  This was done in a
     separate branch so that the memcg people can base further memcg
     changes on top.

   - The above allowed cleaning up cgroup lifecycle management and
     implementation of generic cgroup iterators which are used to
     improve hierarchy support.

   - cgroup_freezer updated to allow migration in and out of a frozen
     cgroup and handle hierarchy.  If a cgroup is frozen, all descendant
     cgroups are frozen.

   - netcls_cgroup and netprio_cgroup updated to handle hierarchy
     properly.

   - Various fixes and cleanups.

   - Two merge commits.  One to pull in memcg and rmdir cleanups (needed
     to build iterators).  The other pulled in cgroup/for-3.7-fixes for
     device_cgroup fixes so that further device_cgroup patches can be
     stacked on top."

Fixed up a trivial conflict in mm/memcontrol.c as per Tejun (due to
commit bea8c150a7 ("memcg: fix hotplugged memory zone oops") in master
touching code close to commit 2ef37d3fe4 ("memcg: Simplify
mem_cgroup_force_empty_list error handling") in for-3.8)

* 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (65 commits)
  cgroup: update Documentation/cgroups/00-INDEX
  cgroup_rm_file: don't delete the uncreated files
  cgroup: remove subsystem files when remounting cgroup
  cgroup: use cgroup_addrm_files() in cgroup_clear_directory()
  cgroup: warn about broken hierarchies only after css_online
  cgroup: list_del_init() on removed events
  cgroup: fix lockdep warning for event_control
  cgroup: move list add after list head initilization
  netprio_cgroup: allow nesting and inherit config on cgroup creation
  netprio_cgroup: implement netprio[_set]_prio() helpers
  netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx
  netprio_cgroup: reimplement priomap expansion
  netprio_cgroup: shorten variable names in extend_netdev_table()
  netprio_cgroup: simplify write_priomap()
  netcls_cgroup: move config inheritance to ->css_online() and remove .broken_hierarchy marking
  cgroup: remove obsolete guarantee from cgroup_task_migrate.
  cgroup: add cgroup->id
  cgroup, cpuset: remove cgroup_subsys->post_clone()
  cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/
  cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()
  ...
2012-12-12 08:18:24 -08:00
Eric Dumazet
1abbe1394a pkt_sched: avoid requeues if possible
With BQL being deployed, we can more likely have following behavior :

We dequeue a packet from qdisc in dequeue_skb(), then we realize target
tx queue is in XOFF state in sch_direct_xmit(), and we have to hold the
skb into gso_skb for later.

This shows in stats (tc -s qdisc dev eth0) as requeues.

Problem of these requeues is that high priority packets can not be
dequeued as long as this (possibly low prio and big TSO packet) is not
removed from gso_skb.

At 1Gbps speed, a full size TSO packet is 500 us of extra latency.

In some cases, we know that all packets dequeued from a qdisc are
for a particular and known txq :

- If device is non multi queue
- For all MQ/MQPRIO slave qdiscs

This patch introduces a new qdisc flag, TCQ_F_ONETXQUEUE to mark
this capability, so that dequeue_skb() is allowed to dequeue a packet
only if the associated txq is not stopped.

This indeed reduce latencies for high prio packets (or improve fairness
with sfq/fq_codel), and almost remove qdisc 'requeues'.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12 00:16:47 -05:00
John W. Linville
f9c4d420c1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-12-11 16:24:55 -05:00
John W. Linville
c66cfd5325 Merge branch 'for-john' of git://git.sipsolutions.net/mac80211-next 2012-12-11 16:04:03 -05:00
Eric Dumazet
f8e8f97c11 net: fix a race in gro_cell_poll()
Dmitry Kravkov reported packet drops for GRE packets since GRO support
was added.

There is a race in gro_cell_poll() because we call napi_complete()
without any synchronization with a concurrent gro_cells_receive()

Once bug was triggered, we queued packets but did not schedule NAPI
poll.

We can fix this issue using the spinlock protected the napi_skbs queue,
as we have to hold it to perform skb dequeue anyway.

As we open-code skb_dequeue(), we no longer need to mask IRQS, as both
producer and consumer run under BH context.

Bug added in commit c9e6bc644e (net: add gro_cells infrastructure)

Reported-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-11 12:49:53 -05:00
Yuchung Cheng
93b174ad71 tcp: bug fix Fast Open client retransmission
If SYN-ACK partially acks SYN-data, the client retransmits the
remaining data by tcp_retransmit_skb(). This increments lost recovery
state variables like tp->retrans_out in Open state. If loss recovery
happens before the retransmission is acked, it triggers the WARN_ON
check in tcp_fastretrans_alert(). For example: the client sends
SYN-data, gets SYN-ACK acking only ISN, retransmits data, sends
another 4 data packets and get 3 dupacks.

Since the retransmission is not caused by network drop it should not
update the recovery state variables. Further the server may return a
smaller MSS than the cached MSS used for SYN-data, so the retranmission
needs a loop. Otherwise some data will not be retransmitted until timeout
or other loss recovery events.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07 14:39:28 -05:00
Thomas Graf
45122ca26c sctp: Add RCU protection to assoc->transport_addr_list
peer.transport_addr_list is currently only protected by sk_sock
which is inpractical to acquire for procfs dumping purposes.

This patch adds RCU protection allowing for the procfs readers to
enter RCU read-side critical sections.

Modification of the list continues to be serialized via sk_lock.

V2: Use list_del_rcu() in sctp_association_free() to be safe
    Skip transports marked dead when dumping for procfs

Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07 14:15:04 -05:00
John W. Linville
8024dc1910 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-12-07 13:03:50 -05:00
John W. Linville
403e16731f Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	drivers/net/wireless/mwifiex/sta_ioctl.c
	net/mac80211/scan.c
2012-12-06 14:58:41 -05:00
Stanislaw Gruszka
5b632fe85e mac80211: introduce IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL
Commit f0425beda4 "mac80211: retry sending
failed BAR frames later instead of tearing down aggr" caused regression
on rt2x00 hardware (connection hangs). This regression was fixed by
commit be03d4a45c "rt2x00: Don't let
mac80211 send a BAR when an AMPDU subframe fails". But the latter
commit caused yet another problem reported in
https://bugzilla.kernel.org/show_bug.cgi?id=42828#c22

After long discussion in this thread:
http://mid.gmane.org/20121018075615.GA18212@redhat.com
and testing various alternative solutions, which failed on one or other
setup, we have no other good fix for the issues like just revert both
mentioned earlier commits.

To do not affect other hardware which benefit from commit
f0425beda4, instead of reverting it,
introduce flag that when used will restore mac80211 behaviour before
the commit.

Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
[replaced link with mid.gmane.org that has message-id]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-12-05 09:53:31 +01:00
Nicolas Dichtel
d67b8c616b netconf: advertise mc_forwarding status
This patch advertise the MC_FORWARDING status for IPv4 and IPv6.
This field is readonly, only multicast engine in the kernel updates it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:10 -05:00
David S. Miller
e8ad1a8fab Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
* Remove limitation in the maximum number of supported sets in ipset.
  Now ipset automagically increments the number of slots in the array
  of sets by 64 new spare slots, from Jozsef Kadlecsik.

* Partially remove the generic queue infrastructure now that ip_queue
  is gone. Its only client is nfnetlink_queue now, from Florian
  Westphal.

* Add missing attribute policy checkings in ctnetlink, from Florian
  Westphal.

* Automagically kill conntrack entries that use the wrong output
  interface for the masquerading case in case of routing changes,
  from Jozsef Kadlecsik.

* Two patches two improve ct object traceability. Now ct objects are
  always placed in any of the existing lists. This allows us to dump
  the content of unconfirmed and dying conntracks via ctnetlink as
  a way to provide more instrumentation in case you suspect leaks,
  from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:01:19 -05:00
John W. Linville
06ef5c4bbb Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2012-12-03 13:46:03 -05:00
Michele Baldessari
196d675934 sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call
The current SCTP stack is lacking a mechanism to have per association
statistics. This is an implementation modeled after OpenSolaris'
SCTP_GET_ASSOC_STATS.

Userspace part will follow on lksctp if/when there is a general ACK on
this.
V4:
- Move ipackets++ before q->immediate.func() for consistency reasons
- Move sctp_max_rto() at the end of sctp_transport_update_rto() to avoid
  returning bogus RTO values
- return asoc->rto_min when max_obs_rto value has not changed

V3:
- Increase ictrlchunks in sctp_assoc_bh_rcv() as well
- Move ipackets++ to sctp_inq_push()
- return 0 when no rto updates took place since the last call

V2:
- Implement partial retrieval of stat struct to cope for future expansion
- Kill the rtxpackets counter as it cannot be precise anyway
- Rename outseqtsns to outofseqtsns to make it clearer that these are out
  of sequence unexpected TSNs
- Move asoc->ipackets++ under a lock to avoid potential miscounts
- Fold asoc->opackets++ into the already existing asoc check
- Kill unneeded (q->asoc) test when increasing rtxchunks
- Do not count octrlchunks if sending failed (SCTP_XMIT_OK != 0)
- Don't count SHUTDOWNs as SACKs
- Move SCTP_GET_ASSOC_STATS to the private space API
- Adjust the len check in sctp_getsockopt_assoc_stats() to allow for
  future struct growth
- Move association statistics in their own struct
- Update idupchunks when we send a SACK with dup TSNs
- return min_rto in max_rto when RTO has not changed. Also return the
  transport when max_rto last changed.

Signed-off: Michele Baldessari <michele@acksyn.org>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 13:32:15 -05:00
Andrei Emeltchenko
f2592d3ee3 Bluetooth: trivial: Change NO_FCS_RECV to RECV_NO_FCS
Make code more readable by changing CONF_NO_FCS_RECV which is read
as "No L2CAP FCS option received" to CONF_RECV_NO_FCS which means
"Received L2CAP option NO_FCS". This flag really means that we have
received L2CAP FRAME CHECK SEQUENCE (FCS) OPTION with value "No FCS".

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 16:00:01 -02:00
Andrei Emeltchenko
5d05416e09 Bluetooth: AMP: Check that AMP is present and active
Before starting quering remote AMP controllers make sure
that there is local active AMP controller.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 16:00:00 -02:00
Gustavo Padovan
ffa88e02bc Bluetooth: Move double negation to macros
Some comparisons needs to double negation(!!) in order to make the value
of the field boolean. Add it to the macro makes the code more readable.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 15:59:59 -02:00
Frédéric Dalleau
20714bfef8 Bluetooth: Implement deferred sco socket setup
In order to authenticate and configure an incoming SCO connection, the
BT_DEFER_SETUP option was added. This option is intended to defer reply
to Connect Request on SCO sockets.
When a connection is requested, the listening socket is unblocked but
the effective connection setup happens only on first recv. Any send
between accept and recv fails with -ENOTCONN.

Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 15:59:58 -02:00
Jozsef Kadlecsik
a0ecb85a2c netfilter: nf_nat: Handle routing changes in MASQUERADE target
When the route changes (backup default route, VPNs) which affect a
masqueraded target, the packets were sent out with the outdated source
address. The patch addresses the issue by comparing the outgoing interface
directly with the masqueraded interface in the nat table.

Events are inefficient in this case, because it'd require adding route
events to the network core and then scanning the whole conntrack table
and re-checking the route for all entry.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-03 15:14:20 +01:00
Florian Westphal
0360ae412d netfilter: kill support for per-af queue backends
We used to have several queueing backends, but nowadays only
nfnetlink_queue remains.

In light of this there doesn't seem to be a good reason to
support per-af registering -- just hook up nfnetlink_queue on module
load and remove it on unload.

This means that the userspace BIND/UNBIND_PF commands are now obsolete;
the kernel will ignore them.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-03 15:07:48 +01:00
Pablo Neira Ayuso
04dac0111d netfilter: nf_conntrack: improve nf_conn object traceability
This patch modifies the conntrack subsystem so that all existing
allocated conntrack objects can be found in any of the following
places:

* the hash table, this is the typical place for alive conntrack objects.
* the unconfirmed list, this is the place for newly created conntrack objects
  that are still traversing the stack.
* the dying list, this is where you can find conntrack objects that are dying
  or that should die anytime soon (eg. once the destroy event is delivered to
  the conntrackd daemon).

Thus, we make sure that we follow the track for all existing conntrack
objects. This patch, together with some extension of the ctnetlink interface
to dump the content of the dying and unconfirmed lists, will help in case
to debug suspected nf_conn object leaks.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-03 15:06:33 +01:00
Simon Wunderlich
5d7fad48ca mac80211: Fix typo in mac80211.h
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-12-03 11:21:40 +01:00
Eric Dumazet
077b393d05 net: fix sparse endianness warnings on sock_common
# make C=2 CF=-D__CHECK_ENDIAN__ net/ipv4/inet_hashtables.o
...
net/ipv4/inet_hashtables.c:242:7: warning: restricted __portpair degrades to integer
net/ipv4/inet_hashtables.c:242:7: warning: restricted __addrpair degrades to integer
...

Move __portpair/__addrpair from include/net/inet_hashtables.h
to include/net/sock.h where we need them in struct sock_common

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ling Ma <ling.ma.program@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-02 20:23:01 -05:00
Shmulik Ladkani
aeaf6e9d2f ipv6: unify logic evaluating inet6_dev's accept_ra property
As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted], the
logic determining whether to send Router Solicitations is identical
to the logic determining whether kernel accepts Router Advertisements.

However the condition itself is repeated in several code locations.

Unify it by introducing 'ipv6_accept_ra()' accessor.

Also, simplify the condition expression, making it more readable.
No semantic change.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-01 11:36:37 -05:00
Eric Dumazet
ce43b03e88 net: move inet_dport/inet_num in sock_common
commit 68835aba4d (net: optimize INET input path further)
moved some fields used for tcp/udp sockets lookup in the first cache
line of struct sock_common.

This patch moves inet_dport/inet_num as well, filling a 32bit hole
on 64 bit arches and reducing number of cache line misses in lookups.

Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
before addresses match, as this check is more discriminant.

Remove the hash check from MATCH() macros because we dont need to
re validate the hash value after taking a refcount on socket, and
use likely/unlikely compiler hints, as the sk_hash/hash check
makes the following conditional tests 100% predicted by cpu.

Introduce skc_addrpair/skc_portpair pair values to better
document the alignment requirements of the port/addr pairs
used in the various MATCH() macros, and remove some casts.

The namespace check can also be done at last.

This slightly improves TCP/UDP lookup times.

IP/TCP early demux needs inet->rx_dst_ifindex and
TCP needs inet->min_ttl, lets group them together in same cache line.

With help from Ben Hutchings & Joe Perches.

Idea of this patch came after Ling Ma proposal to move skc_hash
to the beginning of struct sock_common, and should allow him
to submit a final version of his patch. My tests show an improvement
doing so.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ling Ma <ling.ma.program@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 15:02:56 -05:00
Rami Rosen
c07135633b rtnelink: remove unused parameter from rtnl_create_link().
This patch removes an unused parameter (src_net) from rtnl_create_link()
method and from the method single invocation, in veth.
This parameter was used in the past when calling
ops->get_tx_queues(src_net, tb) in rtnl_create_link().
The get_tx_queues() member of rtnl_link_ops was replaced by two methods,
get_num_tx_queues() and get_num_rx_queues(), which do not get any
parameter. This was done in commit d40156aa5e by
Jiri Pirko ("rtnl: allow to specify different num for rx and tx queue count").

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 12:24:40 -05:00
David S. Miller
e7165030db Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Conflicts:
	net/ipv6/exthdrs_core.c

Jesse Gross says:

====================
This series of improvements for 3.8/net-next contains four components:
 * Support for modifying IPv6 headers
 * Support for matching and setting skb->mark for better integration with
   things like iptables
 * Ability to recognize the EtherType for RARP packets
 * Two small performance enhancements

The movement of ipv6_find_hdr() into exthdrs_core.c causes two small merge
conflicts.  I left it as is but can do the merge if you want.  The conflicts
are:
 * ipv6_find_hdr() and ipv6_find_tlv() were both moved to the bottom of
   exthdrs_core.c.  Both should stay.
 * A new use of ipv6_find_hdr() was added to net/netfilter/ipvs/ip_vs_core.c
   after this patch.  The IPVS user has two instances of the old constant
   name IP6T_FH_F_FRAG which has been renamed to IP6_FH_F_FRAG.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 12:01:30 -05:00
Johannes Berg
9caf036402 cfg80211: fix BSS struct IE access races
When a BSS struct is updated, the IEs are currently
overwritten or freed. This can lead to races if some
other CPU is accessing the BSS struct and using the
IEs concurrently.

Fix this by always allocating the IEs in a new struct
that holds the data and length and protecting access
to this new struct with RCU.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-30 13:42:20 +01:00
Johannes Berg
b9a9ada14a mac80211: remove probe response temporary buffer allocation
Instead of allocating a temporary buffer to build IEs
build them right into the SKB.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-30 13:41:27 +01:00
John W. Linville
79d38f7d6c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/pcie/tx.c
2012-11-28 10:56:03 -05:00
Johannes Berg
53cabad70e nl80211: support P2P GO powersave configuration
If a driver supports P2P GO powersave, allow it to
set the new feature flags for it and allow userspace
to configure the parameters for it. This can be done
at GO startup and later changed with SET_BSS.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-27 11:56:18 +01:00
Johannes Berg
5164892184 mac80211: support (partial) VHT radiotap information
Add some information that we have about VHT to radiotap.
This at least lets one see the MCS and NSS information.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-27 11:56:18 +01:00
Johannes Berg
9f5e8f6efc cfg80211: rework chandef checking and export it
Some of the chandef checking that we do in cfg80211
to check if a channel is supported or not is also
needed in mac80211, so rework that a bit and export
the functions that are needed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-27 09:18:25 +01:00
John W. Linville
62c8003ecb Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-11-26 14:46:41 -05:00
Johannes Berg
8bc83c2463 mac80211: support VHT rates in TX info
To achieve this, limit the number of retries to
31 (instead of 255) and use the three bits that
are then free for VHT flags.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-26 12:43:00 +01:00
Johannes Berg
5614618ec4 mac80211: support drivers reporting VHT RX
Add support to mac80211 for having drivers report
received VHT MCS information.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-26 12:42:59 +01:00
Johannes Berg
db9c64cf8d nl80211/cfg80211: add VHT MCS support
Add support for reporting and calculating VHT MCSes.

Note that I'm not completely sure that the bitrate
calculations are correct, nor that they can't be
simplified.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-26 12:42:59 +01:00
Johannes Berg
4bf88530be mac80211: convert to channel definition struct
Convert mac80211 (and where necessary, some drivers a
little bit) to the new channel definition struct.

This will allow extending mac80211 for VHT, which is
currently restricted to channel contexts since there
are no drivers using that which makes it easier. As
I also don't care about VHT for drivers not using the
channel context API, I won't convert the previous API
to VHT support.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-26 12:42:59 +01:00
Johannes Berg
3d9d1d6656 nl80211/cfg80211: support VHT channel configuration
Change nl80211 to support specifying a VHT (or HT)
using the control channel frequency (as before) and
new attributes for the channel width and first and
second center frequency. The old channel type is of
course still supported for HT.

Also change the cfg80211 channel definition struct
to support these by adding the relevant fields to
it (and removing the _type field.)

This also adds new helper functions:
 - cfg80211_chandef_create to create a channel def
   struct given the control channel and channel type,
 - cfg80211_chandef_identical to check if two channel
   definitions are identical
 - cfg80211_chandef_compatible to check if the given
   channel definitions are compatible, and return the
   wider of the two

This isn't entirely complete, but that doesn't matter
until we have a driver using it. In particular, it's
missing
 - regulatory checks on the usable bandwidth (if that
   even makes sense)
 - regulatory TX power (database can't deal with it)
 - a proper channel compatibility calculation for the
   new channel types

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-26 12:42:59 +01:00
Johannes Berg
683b6d3b31 cfg80211: pass a channel definition struct
Instead of passing a channel pointer and channel type
to all functions and driver methods, pass a new channel
definition struct. Right now, this struct contains just
the control channel and channel type, but for VHT this
will change.

Also, add a small inline cfg80211_get_chandef_type() so
that drivers don't need to use the _type field of the
new structure all the time, which will change.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-26 12:42:58 +01:00
Johannes Berg
42d97a599e cfg80211: remove remain-on-channel channel type
As mwifiex (and mac80211 in the software case) are the
only drivers actually implementing remain-on-channel
with channel type, userspace can't be relying on it.
This is the case, as it's used only for P2P operations
right now.

Rather than adding a flag to tell userspace whether or
not it can actually rely on it, simplify all the code
by removing the ability to use different channel types.
Leave only the validation of the attribute, so that if
we extend it again later (with the needed capability
flag), it can't break userspace sending invalid data.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-26 12:42:58 +01:00
Arend van Spriel
c216e6417f cfg80211: change function signature of cfg80211_get_p2p_attr()
The function cfg80211_get_p2p_attr() can fail and returns
a negative error code. However, the return type is unsigned
int. The largest positive number is determined by desired_len
variable in the function, which is u16. So changing the return
type to int to allow easy error checking. Also change the type
for the attribute to enum for improved type checking.

Signed-off-by: Arend van Spriel <arend@broadcom.com>
[fix indentation, don't use u8 attr variable]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-26 11:28:55 +01:00
David S. Miller
24bc518a68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/wireless/iwlwifi/pcie/tx.c

Minor iwlwifi conflict in TX queue disabling between 'net', which
removed a bogus warning, and 'net-next' which added some status
register poking code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-25 12:49:17 -05:00
Neal Cardwell
8d8be8389f tcp: remove dead prototype for tcp_v4_get_peer()
This function no longer exists.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-23 14:10:31 -05:00
Tejun Heo
88d642fa2c netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx
With priomap expansion no longer depending on knowing max id
allocated, netprio_cgroup can use cgroup->id insted of cs->prioidx.
Drop prioidx alloc/free logic and convert all uses to cgroup->id.

* In cgrp_css_alloc(), parent->id test is moved above @cs allocation
  to simplify error path.

* In cgrp_css_free(), @cs assignment is made initialization.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: David S. Miller <davem@davemloft.net>
2012-11-22 07:32:47 -08:00
John W. Linville
75c8ec71fb Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-11-21 14:43:51 -05:00
John W. Linville
d2ff5ee919 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2012-11-21 13:03:00 -05:00
Sujith Manoharan
77d2ece6fd mac80211: Add debugfs callbacks for station addition/removal
Provide drivers with hooks to create debugfs files when
a new station is added. This would help drivers to take
advantage of mac80211's station list infrastructure and not maintain
tedious station management code internally.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
[ifdef inline wrapper functions]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-21 11:46:25 +01:00
Neil Horman
de4594a51c sctp: send abort chunk when max_retrans exceeded
In the event that an association exceeds its max_retrans attempts, we should
send an ABORT chunk indicating that we are closing the assocation as a result.
Because of the nature of the error, its unlikely to be received, but its a nice
clean way to close the association if it does make it through, and it will give
anyone watching via tcpdump a clue as to what happened.

Change notes:
v2)
	* Removed erroneous changes from sctp_make_violation_parmlen

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-20 15:50:37 -05:00
Eric W. Biederman
98f842e675 proc: Usable inode numbers for the namespace file descriptors.
Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.

A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.

This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.

We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowd by the design if it becomes important.

I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-20 04:19:49 -08:00
Eric Lapuyade
9c5121a034 NFC: Export nfc_hci_sak_to_protocol()
Some HCI drivers will need it.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-11-19 23:56:59 +01:00
Eric Lapuyade
84d4819033 NFC: Export nfc_hci_result_to_errno as it can be needed by HCI drivers
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-11-19 23:56:59 +01:00
Szymon Janc
d1244adc42 Bluetooth: Increase HCI command tx timeout
Read Local OOB Data command can take more than 1 second on some chips.
e.g. on CSR 0a12:0001 first call to Read Local OOB Data after reset
takes about 1300ms resulting in tx timeout error.

[27698.368655] Bluetooth: hci0 command 0x0c57 tx timeout

2012-10-31 15:53:36.178585 < HCI Command: Read Local OOB Data (0x03|0x0057) plen 0
2012-10-31 15:53:37.496996 > HCI Event: Command Complete (0x0e) plen 36
    Read Local OOB Data (0x03|0x0057) ncmd 1
    status 0x00
    hash 0x92219d9b447f2aa9dc12dda2ae7bae6a
    randomizer 0xb1948d0febe4ea38ce85c4e66313beba

Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-19 19:54:38 -02:00
Andrei Emeltchenko
a514b17fab Bluetooth: Refactor locking in amp_physical_cfm
Remove locking from l2cap_physical_cfm and lock chan inside
amp_physical_cfm.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-19 19:30:00 -02:00
Eliad Peller
4988456862 mac80211: make remain_on_channel() op pass vif param
Drivers (e.g. wl12xx) might need to know the vif
to roc on (mainly in order to configure the
rx filters correctly).

Add the vif to the op params, and update the current
users (iwlwifi) to use the new api.

Signed-off-by: Eliad Peller <eliad@wizery.com>
[fix hwsim]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-19 16:20:37 +01:00
Jouni Malinen
3475b0946b cfg80211: Add TDLS event to allow drivers to request operations
The NL80211_CMD_TDLS_OPER command was previously used only for userspace
request for the kernel code to perform TDLS operations. However, there
are also cases where the driver may need to request operations from
userspace, e.g., when using security on the AP path. Add a new cfg80211
function for generating a TDLS operation event for drivers to request a
new link to be set up (NL80211_TDLS_SETUP) or an existing link to be
torn down (NL80211_TDLS_TEARDOWN). Drivers can optionally use these
events, e.g., based on noticing data traffic being sent to a peer
station that is seen with good signal strength.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-19 15:47:32 +01:00
Johannes Berg
90b9e446fb mac80211: support radiotap vendor namespace RX data
In some cases, in particular for experimentation, it
can be useful to be able to add vendor namespace data
to received frames in addition to the normal radiotap
data.

Allow doing this through mac80211 by adding fields to
the RX status descriptor that describe the data while
the data itself is prepended to the frame.

Also add some example code to hwsim, but don't enable
it because it doesn't use a proper OUI identifier.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-19 15:45:46 +01:00
Adam Buchbinder
48fc7f7e78 Fix misspellings of "whether" in comments.
"Whether" is misspelled in various comments across the tree; this
fixes them. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-19 14:31:35 +01:00
Adam Buchbinder
d93cf0687c various: Fix spelling of "registered" in comments.
Some comments misspell "registered"; this fixes them. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-19 14:29:46 +01:00
Eric W. Biederman
038e7332b8 userns: make each net (net_ns) belong to a user_ns
The user namespace which creates a new network namespace owns that
namespace and all resources created in it.  This way we can target
capability checks for privileged operations against network resources to
the user_ns which created the network namespace in which the resource
lives.  Privilege to the user namespace which owns the network
namespace, or any parent user namespace thereof, provides the same
privilege to the network resource.

This patch is reworked from a version originally by
Serge E. Hallyn <serge.hallyn@canonical.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-18 22:46:23 -08:00
Eric W. Biederman
d727abcb23 netns: Deduplicate and fix copy_net_ns when !CONFIG_NET_NS
The copy of copy_net_ns used when the network stack is not
built is broken as it does not return -EINVAL when attempting
to create a new network namespace.  We don't even have
a previous network namespace.

Since we need a copy of copy_net_ns in net/net_namespace.h that is
available when the networking stack is not built at all move the
correct version of copy_net_ns from net_namespace.c into net_namespace.h
Leaving us with just 2 versions of copy_net_ns.  One version for when
we compile in network namespace suport and another stub for all other
occasions.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-18 22:46:19 -08:00
Eric W. Biederman
d328b83682 userns: make each net (net_ns) belong to a user_ns
The user namespace which creates a new network namespace owns that
namespace and all resources created in it.  This way we can target
capability checks for privileged operations against network resources to
the user_ns which created the network namespace in which the resource
lives.  Privilege to the user namespace which owns the network
namespace, or any parent user namespace thereof, provides the same
privilege to the network resource.

This patch is reworked from a version originally by
Serge E. Hallyn <serge.hallyn@canonical.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:30:55 -05:00
Eric W. Biederman
2407dc25f7 netns: Deduplicate and fix copy_net_ns when !CONFIG_NET_NS
The copy of copy_net_ns used when the network stack is not
built is broken as it does not return -EINVAL when attempting
to create a new network namespace.  We don't even have
a previous network namespace.

Since we need a copy of copy_net_ns in net/net_namespace.h that is
available when the networking stack is not built at all move the
correct version of copy_net_ns from net_namespace.c into net_namespace.h
Leaving us with just 2 versions of copy_net_ns.  One version for when
we compile in network namespace suport and another stub for all other
occasions.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:30:55 -05:00
Johan Hedberg
c1d5dc4ac1 Bluetooth: Fix updating advertising state flags and data
This patch adds a callback for the HCI_LE_Set_Advertise_Enable command.
The callback is responsible for updating the HCI_LE_PERIPHERAL flag
updating as well as updating the advertising data flags field to
indicate undirected connectable advertising.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-18 23:03:01 -02:00
Johan Hedberg
3f0f524baf Bluetooth: Add support for setting LE advertising data
This patch adds support for setting basing LE advertising data. The
three elements supported for now are the advertising flags, the TX power
and the friendly name.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-18 23:03:01 -02:00
Johan Hedberg
bbaf444a89 Bluetooth: Use proper invalid value for tx_power
The core specification defines 127 as the "not available" value (well,
"reserved" for BR/EDR and "not available" for LE - but essentially the
same). Therefore, instead of testing for 0 (which is in fact a valid
value) we should be using this invalid value to test if the tx_power is
available.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-18 23:03:00 -02:00
John W. Linville
0f62248501 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-11-16 13:59:51 -05:00
Nicolas Dichtel
130cd273d4 ipv6: export IP6_RT_PRIO_* to userland
The kernel uses some default metric when routes are managed. For example, a
static route added with a metric set to 0 is inserted in the kernel with
metric 1024 (IP6_RT_PRIO_USER).
It is useful for routing daemons to know these values, to be able to set routes
without interfering with what the kernel does.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-16 01:47:40 -05:00
Vlad Yasevich
f191a1d17f net: Remove code duplication between offload structures
Move the offload callbacks into its own structure.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:39:51 -05:00
Vlad Yasevich
c6b641a4c6 ipv6: Pull IPv6 GSO registration out of the module
Sing GSO support is now separate, pull it out of the module
and make it its own init call.
Remove the cleanup functions as they are no longer called.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:39:24 -05:00
Vlad Yasevich
8663e02aba ipv6: Separate tcp offload functionality
Pull TCPv6 offload functionality into its won file in preparation
for moving it out of the module.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:36:18 -05:00
Vlad Yasevich
3336288a9f ipv6: Switch to using new offload infrastructure.
Switch IPv6 protocol to using the new GRO/GSO calls and data.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:36:17 -05:00
Vlad Yasevich
bca49f843e ipv4: Switch to using the new offload infrastructure.
Switch IPv4 code base to using the new GRO/GSO calls and data.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:36:17 -05:00
Vlad Yasevich
8ca896cfdd ipv6: Add new offload registration infrastructure.
Create a new data structure for IPv6 protocols that holds GRO/GSO
callbacks and a new array to track the protocols that register GRO/GSO.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:36:17 -05:00
Vlad Yasevich
de27d001d1 net: Add net protocol offload registration infrustructure
Create a new data structure for IPv4 protocols that holds GRO/GSO
callbacks and a new array to track the protocols that register GRO/GSO.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:36:17 -05:00
David S. Miller
b092d92a68 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Included is a Bluetooth pull -- Gustavo says:

"These are the Bluetooth bits for inclusion in 3.8, there is basically one big
thing here which is the High Speed patches from Andrei, he did a lot of work on
A2MP and management of AMP devices. The rest are mostly clean up and bug
fixes."

Also included is an NFC pull -- Samuel says:

"With this one we have:

- pn544 p2p support.
- pn544 physical and HCI layers separation. We are getting the pn544 driver
  ready to support non i2c physical layers.
- LLCP SNL (Service Name Lookup). This is the NFC p2p service discovery
  protocol.
- LLCP datagram sockets (connection less) support.
- IDR library usage for NFC devices indexes assignement.
- NFC netlink extension for setting and getting LLCP link characteristics.
- Various code style fixes and cleanups spread over the pn533, LLCP, HCI and
  pn544 code."

There are a couple of mac80211 pulls as well -- Johannes says:

"Please pull my mac80211-next tree to get the first round of new features
for 3.8. We have:
 * finally, the mac80211 multi-channel work
 * scan improvements:
   - bg scan
   - scan flush
   - forced AP scan
 * cfg80211 tracing
 * a bit of new code to allow implementing SAE (secure authentication of
   equals) in managed mode

Along with a few random improvements, features and fixes."

and...

"Please pull from mac80211-next (per below pull request) to get a few
updates. Most important is probably the fix for the WDS regression that
my previous pull request introduced. Other than that, I have some
tracing code, two mesh updates and a change to allow drivers to
calculate the AES CMAC subkeys without having to implement the GF_mulx
operation themselves."

On top of that are the usual updates to iwlwifi, ath9k, rt2x00,
brcmfmac, mwifiex, and a few others here and there.  Of note is the
addition of the ar5523 driver, ported from an original FreeBSD driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-14 22:06:57 -05:00
Amerigo Wang
aa0010f880 net: convert __IPTUNNEL_XMIT() to an inline function
__IPTUNNEL_XMIT() is an ugly macro, convert it to a static
inline function, so make it more readable.

IPTUNNEL_XMIT() is unused, just remove it.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-14 18:49:50 -05:00
John W. Linville
b7fd76d114 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2012-11-14 14:51:06 -05:00
John W. Linville
5bdf502dd9 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-11-14 13:33:43 -05:00
Thomas Pedersen
f4bda337bb mac80211: support RX_FLAG_MACTIME_END
Allow drivers to indicate their mactime is at RX completion and adjust
for this in mac80211. Also rename the existing RX_FLAG_MACTIME_MPDU to
RX_FLAG_MACTIME_START to clarify its intent. Based on similar code by
Johannes Berg.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
[fix docs, atheros drivers]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-13 21:43:55 +01:00
Steffen Klassert
703fb94ec5 xfrm: Fix the gc threshold value for ipv4
The xfrm gc threshold value depends on ip_rt_max_size. This
value was set to INT_MAX with the routing cache removal patch,
so we start doing garbage collecting when we have INT_MAX/2
IPsec routes cached. Fix this by going back to the static
threshold of 1024 routes.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2012-11-13 09:15:07 +01:00
Ansis Atteka
9195bb8e38 ipv6: improve ipv6_find_hdr() to skip empty routing headers
This patch prepares ipv6_find_hdr() function so that it could be
able to skip routing headers, where segements_left is 0. This is
required to handle multiple routing header case correctly when
changing IPv6 addresses.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-11-12 12:31:37 -08:00
David S. Miller
d4185bbf62 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c

Minor conflict between the BCM_CNIC define removal in net-next
and a bug fix added to net.  Based upon a conflict resolution
patch posted by Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-10 18:32:51 -05:00
Jesse Gross
f8f626754e ipv6: Move ipv6_find_hdr() out of Netfilter code.
Open vSwitch will soon also use ipv6_find_hdr() so this moves it
out of Netfilter-specific code into a more common location.

Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-11-09 17:05:07 -08:00
Johannes Berg
8b2c98243e mac80211: clarify interface iteration and make it configurable
During hardware restart, all interfaces are iterated even
though they haven't been re-added to the driver, document
this behaviour. The same also happens during resume, which
is even more confusing since all of the interfaces were
previously removed from the driver. Make this optional so
drivers relying on the current behaviour can still use it,
but to let drivers that don't want this behaviour disable
it.

Also convert all API users, keeping the old semantics
except in hwsim, where the new normal ones are desired.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-09 17:34:35 +01:00
Johannes Berg
9214ad7f9a mac80211: call driver method when restart completes
When the driver requests a restart (reconfiguration) it
gets all the normal method calls, but can't really tell
why they're happening. Call a new restart_complete op
in the driver when the restart completes, so it could
keep its own state about the restart and clear it there.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-09 17:34:35 +01:00
Li RongQing
a4477c4ddb ipv6: remove rt6i_peer_genid from rt6_info and its handler
6431cbc25f(Create a mechanism for upward inetpeer propagation into routes)
introduces these codes, but this mechanism is never enabled since
rt6i_peer_genid always is zero whether it is not assigned or assigned by
rt6_peer_genid(). After 5943634fc5 (ipv4: Maintain redirect and PMTU info
in struct rtable again), the ipv4 related codes of this mechanism has been
removed, I think we maybe able to remove them now.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-08 21:16:08 -05:00
Johannes Berg
488dd7b53d mac80211: pass P2P powersave parameters to driver
While connected to a GO, parse the P2P NoA attribute
and pass the CT Window and opportunistic powersave
parameters to the driver.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-06 13:25:06 +01:00
Johannes Berg
0ee453552f wireless: add utility function to get P2P attribute
Parsing the P2P attributes can be tricky as their
contents can be split across multiple (vendor) IEs.
Thus, it's not possible to parse them like IEs (by
returning a pointer to the data.) Instead, provide
a function that copies the attribute data into a
caller-provided buffer and returns the size needed
(useful in case the buffer was too small.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-06 13:24:52 +01:00
Ben Greear
37c73b5f32 cfg80211: allow registering more than one beacon listener
The commit:

commit 5e760230e4
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Fri Nov 4 11:18:17 2011 +0100

    cfg80211: allow registering to beacons

allowed only a single process to register for beacon events
per wiphy.  This breaks cases where a user may want two or
more VIFs on a wiphy and run a seperate hostapd process on
each vif.

This patch allows multiple beacon listeners, fixing the
regression.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-05 16:33:45 +01:00
Antonio Quartulli
f4e583c893 nl/cfg80211: add the NL80211_CMD_SET_MCAST_RATE command
This command triggers a new callback: set_mcast_rate(). It enables
the user to change the rate used to send multicast frames for vif
configured as IBSS or MESH_POINT

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-11-05 15:54:45 +01:00
Amerigo Wang
94e187c015 ipv6: introduce ip6_rt_put()
As suggested by Eric, we could introduce a helper function
for ipv6 too, to avoid checking if rt is NULL before
dst_release().

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03 14:59:05 -04:00
Eric Dumazet
6da025fa23 ipv4: avoid a test in ip_rt_put()
We can save a test in ip_rt_put(), considering dst_release() accepts
a NULL parameter, and dst is first element in rtable.

Add a BUILD_BUG_ON() to catch any change that could break this
assertion.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <amwang@redhat.com>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03 14:59:04 -04:00
Neil Horman
b26ddd8130 sctp: Clean up type-punning in sctp_cmd_t union
Lots of points in the sctp_cmd_interpreter function treat the sctp_cmd_t arg as
a void pointer, even though they are written as various other types.  Theres no
need for this as doing so just leads to possible type-punning issues that could
cause crashes, and if we remain type-consistent we can actually just remove the
void * member of the union entirely.

Change Notes:

v2)
	* Dropped chunk that modified SCTP_NULL to create a marker pattern
	 should anyone try to use a SCTP_NULL() assigned sctp_arg_t, Assigning
	 to .zero provides the same effect and should be faster, per Vlad Y.

v3)
	* Reverted part of V2, opting to use memset instead of .zero, so that
	 the entire union is initalized thus avoiding the i164 speculative load
	 problems previously encountered, per Dave M..  Also rewrote
	 SCTP_[NO]FORCE so as to use common infrastructure a little more

Signed-off-by: Neil Horman <nhorman@tuxdriver.com
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03 14:54:55 -04:00
Eric Dumazet
e6c022a4fa tcp: better retrans tracking for defer-accept
For passive TCP connections using TCP_DEFER_ACCEPT facility,
we incorrectly increment req->retrans each time timeout triggers
while no SYNACK is sent.

SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for
which we received the ACK from client). Only the last SYNACK is sent
so that we can receive again an ACK from client, to move the req into
accept queue. We plan to change this later to avoid the useless
retransmit (and potential problem as this SYNACK could be lost)

TCP_INFO later gives wrong information to user, claiming imaginary
retransmits.

Decouple req->retrans field into two independent fields :

num_retrans : number of retransmit
num_timeout : number of timeouts

num_timeout is the counter that is incremented at each timeout,
regardless of actual SYNACK being sent or not, and used to
compute the exponential timeout.

Introduce inet_rtx_syn_ack() helper to increment num_retrans
only if ->rtx_syn_ack() succeeded.

Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
when we re-send a SYNACK in answer to a (retransmitted) SYN.
Prior to this patch, we were not counting these retransmits.

Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
only if a synack packet was successfully queued.

Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Elliott Hughes <enh@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03 14:45:00 -04:00
Johan Hedberg
0c0afedf55 Bluetooth: Fix parameter order of hci_get_route
The actual parameter order of hci_get_route is (dst, src) and not (src,
dst). All current callers use the right order but the header file shows
the parameters in the wrong order.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-01 20:27:11 -02:00
Andrei Emeltchenko
fffadc08eb Bluetooth: Rename ctrl_id to remote_amp_id
Since we have started to use local_amp_id for local AMP
Controller Id it makes sense to rename ctrl_id to remote_amp_id
since it represents remote AMP controller Id.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-01 20:27:11 -02:00
Andrei Emeltchenko
cf70ff220a Bluetooth: AMP: Use l2cap_physical_cfm in phylink complete evt
When receiving HCI Phylink Complete event run amp_physical_cfm
which initialize BR/EDR L2CAP channel associated with High Speed
link and run l2cap_physical_cfm which shall send L2CAP Create
Chan Request.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-01 20:27:10 -02:00
Andrei Emeltchenko
419e08c112 Bluetooth: Disconnect logical link when deleting chan
Disconnect logical link for high speed channel hs_hchan
associated with L2CAP channel chan.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-01 20:27:09 -02:00
Andrei Emeltchenko
606e2a10a6 Bluetooth: AMP: Process Disc Logical Link
Add processing for HCI Disconnection Logical Link Complete
Event.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-01 20:27:07 -02:00
Andrei Emeltchenko
5ce66b59d7 Bluetooth: AMP: Add Logical Link Create function
After physical link is created logical link needs to be created.
The process starts after L2CAP channel is created and L2CAP
Configuration Response with result PENDING is received.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-01 20:27:07 -02:00
Andrei Emeltchenko
27695fb415 Bluetooth: AMP: Process Logical Link complete evt
After receiving HCI Logical Link Complete event finish EFS
configuration by sending L2CAP Conf Response with success code.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-01 20:27:02 -02:00
Johan Hedberg
6b4b73ee75 Bluetooth: Fix sending unnecessary HCI_Write_SSP_Mode command
This patch fixes sending an unnecessary HCI_Write_SSP_Mode command if
the command has already been sent as part of the default HCI init
sequence.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-01 20:26:59 -02:00
Johan Hedberg
33f8f5269e Bluetooth: Add flag for LE GAP Peripheral role
This patch adds a flag to be used for LE GAP Peripheral role. In this
role undirected advertising will be enabled and operations not allowed
in Peripheral role (such as scanning and initiating connections) will be
disallowed.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-01 20:26:58 -02:00
Johan Hedberg
761f09e4d6 Bluetooth: Add missing feature test macros
This patch adds missing feature test macros needed for various use cases
and also sorts the macros according to the feature bit location in the
feature mask (for easy lookup).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-01 20:26:57 -02:00
Denis Kirjanov
711584ea4c Bluetooth:Replace list_for_each with list_for_each_entry() helper
Replace list_for_each with list_for_each_entry() helper

Signed-off-by: Denis Kirjanov <kirjanov@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-11-01 20:26:43 -02:00
Johannes Berg
1ea6f9c0d4 mac80211: handle TX power per virtual interface
Even before channel contexts/multi-channel, having a
single global TX power limit was already problematic,
in particular if two managed interfaces connected to
two APs with different power constraints. The channel
context introduction completely broke this though and
in fact I had disabled TX power configuration there
for drivers using channel contexts.

Change everything to track TX power per interface so
that different user settings and different channel
maxima are treated correctly. Also continue tracking
the global TX power though for compatibility with
applications that attempt to configure the wiphy's
TX power globally.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-30 09:11:34 +01:00
Johannes Berg
c8442118ad cfg80211: allow per interface TX power setting
The TX power setting is currently per wiphy (hardware
device) but with multi-channel capabilities that doesn't
make much sense any more.

Allow drivers (and mac80211) to advertise support for
per-interface TX power configuration. When the TX power
is configured for the wiphy, the wdev will be NULL and
the driver can still handle that, but when a wdev is
given the TX power can be set only for that wdev now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-30 09:11:34 +01:00
Johannes Berg
6fb47de9cf Merge remote-tracking branch 'wireless-next/master' into mac80211-next 2012-10-30 09:09:48 +01:00
John W. Linville
ab3d59d265 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
Conflicts:
	drivers/net/wireless/mwifiex/cfg80211.c
2012-10-29 16:05:51 -04:00
John W. Linville
e298c79efc This is the first NFC pull request for 3.8
With this one we have:
 
 - pn544 p2p support.
 - pn544 physical and HCI layers separation. We are getting the pn544 driver
   ready to support non i2c physical layers.
 - LLCP SNL (Service Name Lookup). This is the NFC p2p service discovery
   protocol.
 - LLCP datagram sockets (connection less) support.
 - IDR library usage for NFC devices indexes assignement.
 - NFC netlink extension for setting and getting LLCP link characteristics.
 - Various code style fixes and cleanups spread over the pn533, LLCP, HCI and
   pn544 code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQjcPOAAoJEIqAPN1PVmxKLQUP/3rkHa4enQTyBMU326z+OE6B
 MtY0Xg6C5U5mksNiqAelkKPMOYy3nuoLcx2sZ9ltgJfG6qCNSV273Hr109JFlhI+
 qYAybb+VOb4MqcbkUcMFzO+CBGfcDaGhJ4ibjPAQCgBxIOjKqFzbLVZ4K/xf+GUj
 eTEeEryJYb+jFt/RSMXzoHipEM30ROLa9VRvoyxL6m0/3uRrxK/NPVbOxPRHaJqa
 /oNJxC2BLPb7t4V2iBOzqChFWP2tknmVWzCuI+X0484hH2BGhm4dUzlKXEu286U9
 iMueUv0vx68IpKXgzTStgfeIzxA84ufVWV5d+rPo4iu4HbCth+Fsk6Lds2nqmGAn
 QoEYhds0r2CrPnuWN7zP+cc/mXTDHDrPj8pf9iC/KdMSe2bUg0kHOyF08A9zLNh0
 nOr2DSzYQ9IqgvWtrAUXdfR2QaFHFsb/+pXWKZsE8+PHxXR/sv7mGPD2OCR1TAqx
 G1OmOtHHvnwTlxHpNhIOuH35gfi5HPvPCZ9in5fnN8n0eaX3JyZw96fk5QUqYJ3w
 UQBAsagMnWRJUTUYYKyG+IjxkGrdNjmLRQQScYuKktZ07/PL4T5t3RpSjxzlMFwc
 ImdOdPntHXI2d+m874yCRAhwdp6vTfpgJl4V4XuON13DPbVO6qBqtNGxhYMqJFWF
 cxqrWdaPydmFKy/KLNwF
 =OC0R
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-3.0

This is the first NFC pull request for 3.8

With this one we have:

- pn544 p2p support.
- pn544 physical and HCI layers separation. We are getting the pn544 driver
  ready to support non i2c physical layers.
- LLCP SNL (Service Name Lookup). This is the NFC p2p service discovery
  protocol.
- LLCP datagram sockets (connection less) support.
- IDR library usage for NFC devices indexes assignement.
- NFC netlink extension for setting and getting LLCP link characteristics.
- Various code style fixes and cleanups spread over the pn533, LLCP, HCI and
  pn544 code.
2012-10-29 14:52:37 -04:00
John W. Linville
d1f1030256 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-10-29 14:52:04 -04:00
John W. Linville
efec22b468 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2012-10-29 14:14:48 -04:00
Johannes Berg
9b395bc3be mac80211: verify that skb data is present
A number of places in the mesh code don't check that
the frame data is present and in the skb header when
trying to access. Add those checks and the necessary
pskb_may_pull() calls. This prevents accessing data
that doesn't actually exist.

To do this, export ieee80211_get_mesh_hdrlen() to be
able to use it in mac80211.

Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-26 22:52:42 +02:00
David S. Miller
f019948dbb Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
The following changeset contains updates for IPVS from Jesper Dangaard
Brouer that did not reach the previous merge window in time.

More specifically, updates to improve IPv6 support in IPVS. More
relevantly, some of the existing code performed wrong handling of the
extensions headers and better fragmentation handling.

Jesper promised more follow-up patches to refine this after this batch
hits net-next. Yet to come.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26 14:40:55 -04:00
Samuel Ortiz
7eda8b8e96 NFC: Use IDR library to assing NFC devices IDs
As a consequence the NFC device IDs won't be increasing all the time,
as IDR provides the first available ID.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-10-26 18:26:51 +02:00
Eric Lapuyade
97f18414af NFC: Separate pn544 hci driver in HW dependant and independant parts
The driver now has all HCI stuff isolated in one file, and all the
hardware link specifics in another. Writing a pn544 driver on top of
another hardware link is now just a matter of adding a new file for that
new hardware specifics.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-10-26 18:26:46 +02:00
Arron Wang
e81076235b NFC: Implement HCI DEP send and receive data
And implement the corresponding hooks for pn544.

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-10-26 18:26:46 +02:00
Arron Wang
c40d17401f NFC: Implement HCI DEP link up and down
And implement the corresponding hooks for pn544.

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-10-26 18:26:45 +02:00
Arron Wang
f7a5f6c532 NFC: Pass hardware specific HCI event to driver
Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-10-26 18:26:45 +02:00
Arron Wang
7e2afc9d07 NFC: Set local gb and DEP registries
Set the local general bytes and default value for NFCIP1
Target/Initiator registries if the protocol is NFC-DEP

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-10-26 18:26:45 +02:00
Johannes Berg
1041638f2b mac80211: add explicit AP/GO driver operations
Depending on the driver, a lot of setup may be
necessary to start operating as an AP, some of
which may fail. Add an explicit AP start driver
method to make such failures easier to handle,
and add an AP stop driver method for symmetry.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-26 12:57:06 +02:00
David S. Miller
dc95a2c006 net: Update args to dummy sock_update_classid().
Only the real implementation got updated.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26 05:07:00 -04:00
Daniel Wagner
fd9a08a7b8 cgroup: net_cls: Pass in task to sock_update_classid()
sock_update_classid() assumes that the update operation always are
applied on the current task. sock_update_classid() needs to know on
which tasks to work on in order to be able to migrate task between
cgroups using the struct cgroup_subsys attach() callback.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Joe Perches <joe@perches.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <netdev@vger.kernel.org>
Cc: <cgroups@vger.kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26 03:40:50 -04:00
Daniel Wagner
920750ce38 cgroup: net_cls: Fix local variable type decleration
The classid type used throughout the kernel is u32.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <netdev@vger.kernel.org>
Cc: <cgroups@vger.kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26 03:40:50 -04:00
Neil Horman
3c68198e75 sctp: Make hmac algorithm selection for cookie generation dynamic
Currently sctp allows for the optional use of md5 of sha1 hmac algorithms to
generate cookie values when establishing new connections via two build time
config options.  Theres no real reason to make this a static selection.  We can
add a sysctl that allows for the dynamic selection of these algorithms at run
time, with the default value determined by the corresponding crypto library
availability.
This comes in handy when, for example running a system in FIPS mode, where use
of md5 is disallowed, but SHA1 is permitted.

Note: This new sysctl has no corresponding socket option to select the cookie
hmac algorithm.  I chose not to implement that intentionally, as RFC 6458
contains no option for this value, and I opted not to pollute the socket option
namespace.

Change notes:
v2)
	* Updated subject to have the proper sctp prefix as per Dave M.
	* Replaced deafult selection options with new options that allow
	  developers to explicitly select available hmac algs at build time
	  as per suggestion by Vlad Y.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26 02:22:18 -04:00
Johan Hedberg
8fa19098eb Bluetooth: Read adversiting channel TX power during init sequence
This patch adds the reading of the LE advertising channel TX power to
the HCI init sequence of LE-capable controllers. This data will be used
e.g. for inclusion in the advertising data packets when advertising is
enabled.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-24 11:17:17 -02:00
Mat Martineau
3f7a56c4ff Bluetooth: Start channel move when socket option is changed
Channel moves are triggered by changes to the BT_CHANNEL_POLICY
sockopt when an ERTM or streaming-mode channel is connected.

Moves are only started if enable_hs is true.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-24 00:26:30 -02:00
Mat Martineau
5b155ef960 Bluetooth: Move channel response
The move response command includes a result code indicating
"pending", "success", or "failure" status.  A pending result is
received when the remote address is still setting up a physical link,
and will be followed by success or failure.  On success, logical link
setup will proceed.  On failure, the move is stopped.  The receiver of
a move channel response must always follow up by sending a move
channel confirm command.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-24 00:20:54 -02:00
Mat Martineau
168df8e57e Bluetooth: Add state to hci_chan
On an AMP controller, hci_chan maps to a logical link.  When a channel
is being moved, the logical link may or may not be connected already.
The hci_chan->state is used to determine the existance of a useable
logical link so the link can be either used or requested.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-24 00:16:23 -02:00
Mat Martineau
08333283a7 Bluetooth: Add new l2cap_chan struct members for high speed channels
An L2CAP channel using high speed continues to be associated with a
BR/EDR l2cap_conn, while also tracking an additional hci_conn
(representing a physical link on a high speed controller) and hci_chan
(representing a logical link).  There may only be one physical link
between two high speed controllers.  Each physical link may contain
several logical links, with each logical link representing a channel
with specific quality of service.

During a channel move, the destination channel id, current move state,
and role (initiator vs. responder) are tracked and used by the channel
move state machine.  The ident value associated with a move request
must also be stored in order to use it in later move responses.

The active channel is stored in local_amp_id.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-23 23:57:02 -02:00
Assaf Krauss
5d0d04e477 mac80211: expose AES-CMAC subkey calculation
Expose a function for the AES-CMAC subkey calculation
to drivers. This is the first step of the AES-CMAC
cipher key setup and may be required for CMAC hardware
offloading.

Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-23 19:52:52 +02:00
John W. Linville
9b34f40c20 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
Conflicts:
	drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c
	net/mac80211/mlme.c
2012-10-23 11:41:46 -04:00
Nicolas Dichtel
51ebd31815 ipv6: add support of equal cost multipath (ECMP)
Each nexthop is added like a single route in the routing table. All routes
that have the same metric/weight and destination but not the same gateway
are considering as ECMP routes. They are linked together, through a list called
rt6i_siblings.

ECMP routes can be added in one shot, with RTA_MULTIPATH attribute or one after
the other (in both case, the flag NLM_F_EXCL should not be set).

The patch is based on a previous work from
Luc Saillard <luc.saillard@6wind.com>.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-23 02:38:32 -04:00
Jesper Dangaard Brouer
54d83efa44 ipvs: fix build errors related to config option combinations
Fix two build error introduced by commit 63dca2c0:
 "ipvs: Fix faulty IPv6 extension header handling in IPVS"

First build error was fairly trivial and can occur, when
CONFIG_IP_VS_IPV6 is disabled.

The second build error was tricky, and can occur when deselecting
both all Netfilter and IPVS, but selecting CONFIG_IPV6.  This is
caused by "kernel/sysctl_binary.c" including "net/ip_vs.h", which
includes "linux/netfilter_ipv6/ip6_tables.h" causing include
of "include/linux/netfilter/x_tables.h" which then cannot find
the typedef nf_hookfn.

Fix this by only including "linux/netfilter_ipv6/ip6_tables.h" in
case of CONFIG_IP_VS_IPV6 as its already used to guard the usage
of ipv6_find_hdr().

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-10-23 09:23:40 +09:00
Pablo Neira Ayuso
bcc58c4d91 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next
Pull updates from Jesper Dangaard Brouer for IPVS mostly targeted
to improve IPv6 support (7 commits):

 ipvs: Trivial changes, use compressed IPv6 address in output
 ipvs: IPv6 extend ICMPv6 handling for future types
 ipvs: Use config macro IS_ENABLED()
 ipvs: Fix faulty IPv6 extension header handling in IPVS
 ipvs: Complete IPv6 fragment handling for IPVS
 ipvs: API change to avoid rescan of IPv6 exthdr
 ipvs: SIP fragment handling
2012-10-22 12:30:41 +02:00
Pavel Emelyanov
0da5f7c6d2 unix: Remove unused field from unix_sock
The struct sock *other one seem to be unused. Grep and make do not object.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-21 20:37:06 -04:00
John W. Linville
3cd17638fd Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-10-19 15:36:53 -04:00
John W. Linville
bc27d5f143 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2012-10-19 15:22:27 -04:00
Andrei Emeltchenko
f706adfead Bluetooth: AMP: Get amp_mgr reference in HS hci_conn
When assigning amp_mgr in hci_conn (type AMP_LINK) get also reference.
In hci_conn_del those references would be put for both conn types
AMP_LINK and ACL_LINK associated with amp_mgr.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-18 07:27:20 -03:00
Sujith Manoharan
c13a765bd9 mac80211: Notify new IBSS network creation
Initialization of beacon transmission in IBSS mode depends
on whether a new BSS is being created or joined. When joining
an existing IBSS network, beaconing has to start only after
a TSF-sync has happened - this is explained in 11.1.4.

Introduce a new parameter in the BSS information structure to
indicate creator/joiner mode.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-18 09:01:53 +02:00
Sam Leffler
15d6030b4b cfg80211: add support for flushing old scan results
Add an NL80211_SCAN_FLAG_FLUSH flag that causes old bss cache
entries to be flushed on scan completion. This is useful for
collecting guaranteed fresh scan/survey result (e.g. on resume).

For normal scan, flushing only happens on successful completion
of a scan; i.e. it does not happen if the scan is aborted.
For scheduled scan, previous scan results are flushed everytime
when we get new scan results.

This feature is enabled by default. Drivers can disable it by
unsetting the NL80211_FEATURE_SCAN_FLUSH flag.

Signed-off-by: Sam Leffler <sleffler@chromium.org>
Tested-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
[invert polarity of feature flag to account for old kernels]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-18 09:01:52 +02:00
Sam Leffler
ed47377154 {nl,cfg}80211: add a flags word to scan requests
Add a flags word to direct and scheduled scan requests; it will
be used for control of optional behaviours such as flushing the
bss cache prior to doing a scan.

Signed-off-by: Sam Leffler <sleffler@chromium.org>
Tested-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-18 09:01:23 +02:00
John W. Linville
290eddc4b3 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2012-10-17 16:23:33 -04:00
Mahesh Palivela
f461be3eff {nl,cfg}80211: Peer STA VHT caps
To save STAs VHT caps in AP mode

Signed-off-by: Mahesh Palivela <maheshp@posedge.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-17 11:02:15 +02:00
Mahesh Palivela
818255ea47 mac80211: VHT peer STA caps
Save the AP's VHT capabilities (in managed
mode) and make them available to the driver
in the station information.

Unlike HT capabilities, they aren't restricted
to the common capabilities, so drivers must be
aware of their own capabilities.

Signed-off-by: Mahesh Palivela <maheshp@posedge.com>
[fix endian conversion bug ...]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-17 11:02:14 +02:00
Jouni Malinen
e39e5b5e72 cfg80211: Allow user space to specify non-IEs to SAE Authentication
SAE extends Authentication frames with fields that are not information
elements. NL80211_ATTR_IE is not suitable for these, so introduce a new
attribute that can be used to specify the fields needed for SAE in
station mode.

Signed-off-by: Jouni Malinen <j@w1.fi>
[change to verify that SAE is only used with authenticate command]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-17 11:02:11 +02:00
Johannes Berg
3448c00583 mac80211: add channel context iterator
Drivers may need to iterate the active channel
contexts, export an iterator function to allow
that. To make it possible, use RCU-safe list
functions.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-17 11:02:10 +02:00
Johannes Berg
04ecd2578e mac80211: track needed RX chains for channel contexts
On each channel that the device is operating on, it
may need to listen using one or more chains depending
on the SMPS settings of the interfaces using it. The
previous channel context changes completely removed
this ability (before, it was available as the SMPS
mode).

Add per-context tracking of the required static and
dynamic RX chains and notify the driver on changes.
To achieve this, track the chains and SMPS mode used
on each virtual interface and update the channel
context whenever this changes.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-17 11:02:09 +02:00
Michal Kazior
c3645eac47 mac80211: introduce new ieee80211_ops
Introduce channel context driver methods. The channel
on a context channel is immutable, but the channel type
and other properties can change.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-16 20:22:42 +02:00
Michal Kazior
d01a1e6586 mac80211: introduce channel context skeleton code
Channel context are the foundation for multi-channel
operation. They are are immutable and are re-created
(or re-used if other interfaces are bound to a certain
channel and a compatible channel type) on channel
switching.

This is an initial implementation and more features
will come in separate patches.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
[some changes including RCU protection]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-16 20:22:41 +02:00
Stanislaw Gruszka
6863255bd0 cfg80211/mac80211: avoid state mishmash on deauth
Avoid situation when we are on associate state in mac80211 and
on disassociate state in cfg80211. This can results on crash
during modules unload (like showed on this thread:
http://marc.info/?t=134373976300001&r=1&w=2) and possibly other
problems.

Reported-by: Pedro Francisco <pedrogfrancisco@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-15 17:21:34 +02:00
Andrei Emeltchenko
d73a098804 Bluetooth: AMP: Handle complete frames in l2cap
Check flags type in switch statement and handle new frame
type ACL_COMPLETE used for High Speed data over AMP.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-15 09:46:39 -03:00
Gustavo Padovan
2dc4e5105f Bluetooth: Add chan->ops->defer()
When DEFER_SETUP is set defer() will trigger an authorization
request to the userspace.

l2cap_chan_no_defer() is meant to be used when one does not want to
support DEFER_SETUP (A2MP for example).

Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-15 09:43:29 -03:00
Andrei Emeltchenko
716e4ab5c9 Bluetooth: AMP: Hanlde AMP_LINK case in conn_put
Handle AMP link when setting up disconnect timeout.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-11 14:34:46 +08:00
Andrei Emeltchenko
bd1eb66ba4 Bluetooth: AMP: Handle AMP_LINK connection
AMP_LINK represents physical link between AMP controllers.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-11 14:34:24 +08:00
Andrei Emeltchenko
42c4e53e7a Bluetooth: AMP: Add handle to hci_chan structure
hci_chan will be identified by handle used in logical link creation
process. This handle is used in AMP ACL-U packet handle field.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-11 14:33:05 +08:00
Andrei Emeltchenko
53502d69be Bluetooth: AMP: Handle AMP_LINK timeout
When AMP_LINK timeouts execute HCI_OP_DISCONN_PHY_LINK as analog to
HCI_OP_DISCONNECT for ACL_LINK.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-11 14:30:58 +08:00
Julian Anastasov
c92b96553a ipv4: Add FLOWI_FLAG_KNOWN_NH
Add flag to request that output route should be
returned with known rt_gateway, in case we want to use
it as nexthop for neighbour resolving.

	The returned route can be cached as follows:

- in NH exception: because the cached routes are not shared
	with other destinations
- in FIB NH: when using gateway because all destinations for
	NH share same gateway

	As last option, to return rt_gateway!=0 we have to
set DST_NOCACHE.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 17:42:36 -04:00
Julian Anastasov
155e8336c3 ipv4: introduce rt_uses_gateway
Add new flag to remember when route is via gateway.
We will use it to allow rt_gateway to contain address of
directly connected host for the cases when DST_NOCACHE is
used or when the NH exception caches per-destination route
without DST_NOCACHE flag, i.e. when routes are not used for
other destinations. By this way we force the neighbour
resolving to work with the routed destination but we
can use different address in the packet, feature needed
for IPVS-DR where original packet for virtual IP is routed
via route to real IP.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 17:42:36 -04:00
Andrei Emeltchenko
8936fa6d1c Bluetooth: L2CAP: Fix using default Flush Timeout for EFS
There are two Flush Timeouts: one is old Flush Timeot Option
which is 2 octets and the second is Flush Timeout inside EFS
which is 4 octets long.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-08 20:40:12 +08:00
Andrei Emeltchenko
0b4558e388 Bluetooth: Adjust L2CAP Max PDU size for AMP packets
Maximum PDU size is defined by new BT Spec as 1492 octets.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Reviewed-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-08 06:22:48 +08:00
Andrei Emeltchenko
a0c234fe89 Bluetooth: AMP: Factor out phylink_add
Add direction parameter to phylink_add since it is anyway set later.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-08 06:21:51 +08:00
Andrei Emeltchenko
fa4ebc66c4 Bluetooth: AMP: Factor out amp_ctrl_add
Add ctrl_id parameter to amp_ctrl_add since we always set it
after function ctrl is created.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-08 06:19:04 +08:00
Linus Torvalds
283dbd8205 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking changes from David Miller:
 "The most important bit in here is the fix for input route caching from
  Eric Dumazet, it's a shame we couldn't fully analyze this in time for
  3.6 as it's a 3.6 regression introduced by the routing cache removal.

  Anyways, will send quickly to -stable after you pull this in.

  Other changes of note:

   1) Fix lockdep splats in team and bonding, from Eric Dumazet.

   2) IPV6 adds link local route even when there is no link local
      address, from Nicolas Dichtel.

   3) Fix ixgbe PTP implementation, from Jacob Keller.

   4) Fix excessive stack usage in cxgb4 driver, from Vipul Pandya.

   5) MAC length computed improperly in VLAN demux, from Antonio
      Quartulli."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt
  Remove noisy printks from llcp_sock_connect
  tipc: prevent dropped connections due to rcvbuf overflow
  silence some noisy printks in irda
  team: set qdisc_tx_busylock to avoid LOCKDEP splat
  bonding: set qdisc_tx_busylock to avoid LOCKDEP splat
  sctp: check src addr when processing SACK to update transport state
  sctp: fix a typo in prototype of __sctp_rcv_lookup()
  ipv4: add a fib_type to fib_info
  can: mpc5xxx_can: fix section type conflict
  can: peak_pcmcia: fix error return code
  can: peak_pci: fix error return code
  cxgb4: Fix build error due to missing linux/vmalloc.h include.
  bnx2x: fix ring size for 10G functions
  cxgb4: Dynamically allocate memory in t4_memory_rw() and get_vpd_params()
  ixgbe: add support for X540-AT1
  ixgbe: fix poll loop for FDIRCTRL.INIT_DONE bit
  ixgbe: fix PTP ethtool timestamping function
  ixgbe: (PTP) Fix PPS interrupt code
  ixgbe: Fix PTP X540 SDP alignment code for PPS signal
  ...
2012-10-06 03:11:59 +09:00
Andi Kleen
04a6f82cf0 sections: fix section conflicts in net
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:04:45 +09:00
Nicolas Dichtel
edfee0339e sctp: check src addr when processing SACK to update transport state
Suppose we have an SCTP connection with two paths. After connection is
established, path1 is not available, thus this path is marked as inactive. Then
traffic goes through path2, but for some reasons packets are delayed (after
rto.max). Because packets are delayed, the retransmit mechanism will switch
again to path1. At this time, we receive a delayed SACK from path2. When we
update the state of the path in sctp_check_transmitted(), we do not take into
account the source address of the SACK, hence we update the wrong path.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-04 15:53:48 -04:00
Eric Dumazet
f4ef85bbda ipv4: add a fib_type to fib_info
commit d2d68ba9fe (ipv4: Cache input routes in fib_info nexthops.)
introduced a regression for forwarding.

This was hard to reproduce but the symptom was that packets were
delivered to local host instead of being forwarded.

David suggested to add fib_type to fib_info so that we dont
inadvertently share same fib_info for different purposes.

With help from Julian Anastasov who provided very helpful
hints, reproduced here :

<quote>
        Can it be a problem related to fib_info reuse
from different routes. For example, when local IP address
is created for subnet we have:

broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src
192.168.0.1
192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1

        The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
a reused fib_info structure where we put cached routes.
The result can be same fib_info for 192.168.0.255 and
192.168.0.0/24. RTN_BROADCAST is cached only for input
routes. Incoming broadcast to 192.168.0.255 can be cached
and can cause problems for traffic forwarded to 192.168.0.0/24.
So, this patch should solve the problem because it
separates the broadcast from unicast traffic.

        And the ip_route_input_slow caching will work for
local and broadcast input routes (above routes 1 and 3) just
because they differ in scope and use different fib_info.

</quote>

Many thanks to Chris Clayton for his patience and help.

Reported-by: Chris Clayton <chris2553@googlemail.com>
Bisected-by: Chris Clayton <chris2553@googlemail.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-04 13:58:26 -04:00
Linus Torvalds
aecdc33e11 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

 1) GRE now works over ipv6, from Dmitry Kozlov.

 2) Make SCTP more network namespace aware, from Eric Biederman.

 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

 4) Make openvswitch network namespace aware, from Pravin B Shelar.

 5) IPV6 NAT implementation, from Patrick McHardy.

 6) Server side support for TCP Fast Open, from Jerry Chu and others.

 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

 8) Increate the loopback default MTU to 64K, from Eric Dumazet.

 9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic.  This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail.  Benefits are
    a) less segments for driver to process b) less calls to page
    allocator c) less waste of space.

    From Eric Dumazet.

11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.

Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
  hyperv: Add buffer for extended info after the RNDIS response message.
  hyperv: Report actual status in receive completion packet
  hyperv: Remove extra allocated space for recv_pkt_list elements
  hyperv: Fix page buffer handling in rndis_filter_send_request()
  hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
  hyperv: Fix the max_xfer_size in RNDIS initialization
  vxlan: put UDP socket in correct namespace
  vxlan: Depend on CONFIG_INET
  sfc: Fix the reported priorities of different filter types
  sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
  sfc: Fix loopback self-test with separate_tx_channels=1
  sfc: Fix MCDI structure field lookup
  sfc: Add parentheses around use of bitfield macro arguments
  sfc: Fix null function pointer in efx_sriov_channel_type
  vxlan: virtual extensible lan
  igmp: export symbol ip_mc_leave_group
  netlink: add attributes to fdb interface
  tg3: unconditionally select HWMON support when tg3 is enabled.
  Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
  gre: fix sparse warning
  ...
2012-10-02 13:38:27 -07:00
Linus Torvalds
437589a74b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace changes from Eric Biederman:
 "This is a mostly modest set of changes to enable basic user namespace
  support.  This allows the code to code to compile with user namespaces
  enabled and removes the assumption there is only the initial user
  namespace.  Everything is converted except for the most complex of the
  filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
  nfs, ocfs2 and xfs as those patches need a bit more review.

  The strategy is to push kuid_t and kgid_t values are far down into
  subsystems and filesystems as reasonable.  Leaving the make_kuid and
  from_kuid operations to happen at the edge of userspace, as the values
  come off the disk, and as the values come in from the network.
  Letting compile type incompatible compile errors (present when user
  namespaces are enabled) guide me to find the issues.

  The most tricky areas have been the places where we had an implicit
  union of uid and gid values and were storing them in an unsigned int.
  Those places were converted into explicit unions.  I made certain to
  handle those places with simple trivial patches.

  Out of that work I discovered we have generic interfaces for storing
  quota by projid.  I had never heard of the project identifiers before.
  Adding full user namespace support for project identifiers accounts
  for most of the code size growth in my git tree.

  Ultimately there will be work to relax privlige checks from
  "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
  root in a user names to do those things that today we only forbid to
  non-root users because it will confuse suid root applications.

  While I was pushing kuid_t and kgid_t changes deep into the audit code
  I made a few other cleanups.  I capitalized on the fact we process
  netlink messages in the context of the message sender.  I removed
  usage of NETLINK_CRED, and started directly using current->tty.

  Some of these patches have also made it into maintainer trees, with no
  problems from identical code from different trees showing up in
  linux-next.

  After reading through all of this code I feel like I might be able to
  win a game of kernel trivial pursuit."

Fix up some fairly trivial conflicts in netfilter uid/git logging code.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
  userns: Convert the ufs filesystem to use kuid/kgid where appropriate
  userns: Convert the udf filesystem to use kuid/kgid where appropriate
  userns: Convert ubifs to use kuid/kgid
  userns: Convert squashfs to use kuid/kgid where appropriate
  userns: Convert reiserfs to use kuid and kgid where appropriate
  userns: Convert jfs to use kuid/kgid where appropriate
  userns: Convert jffs2 to use kuid and kgid where appropriate
  userns: Convert hpfs to use kuid and kgid where appropriate
  userns: Convert btrfs to use kuid/kgid where appropriate
  userns: Convert bfs to use kuid/kgid where appropriate
  userns: Convert affs to use kuid/kgid wherwe appropriate
  userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
  userns: On ia64 deal with current_uid and current_gid being kuid and kgid
  userns: On ppc convert current_uid from a kuid before printing.
  userns: Convert s390 getting uid and gid system calls to use kuid and kgid
  userns: Convert s390 hypfs to use kuid and kgid where appropriate
  userns: Convert binder ipc to use kuids
  userns: Teach security_path_chown to take kuids and kgids
  userns: Add user namespace support to IMA
  userns: Convert EVM to deal with kuids and kgids in it's hmac computation
  ...
2012-10-02 11:11:09 -07:00
Linus Torvalds
c0e8a139a5 Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - xattr support added.  The implementation is shared with tmpfs.  The
   usage is restricted and intended to be used to manage per-cgroup
   metadata by system software.  tmpfs changes are routed through this
   branch with Hugh's permission.

 - cgroup subsystem ID handling simplified.

* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Define CGROUP_SUBSYS_COUNT according the configuration
  cgroup: Assign subsystem IDs during compile time
  cgroup: Do not depend on a given order when populating the subsys array
  cgroup: Wrap subsystem selection macro
  cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT
  cgroup: net_prio: Do not define task_netpioidx() when not selected
  cgroup: net_cls: Do not define task_cls_classid() when not selected
  cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h
  cgroup: trivial fixes for Documentation/cgroups/cgroups.txt
  xattr: mark variable as uninitialized to make both gcc and smatch happy
  fs: add missing documentation to simple_xattr functions
  cgroup: add documentation on extended attributes usage
  cgroup: rename subsys_bits to subsys_mask
  cgroup: add xattr support
  cgroup: revise how we re-populate root directory
  xattr: extract simple_xattr code from tmpfs
2012-10-02 10:50:47 -07:00
Eric Dumazet
60769a5dcd ipv4: gre: add GRO capability
Add GRO capability to IPv4 GRE tunnels, using the gro_cells
infrastructure.

Tested using IPv4 and IPv6 TCP traffic inside this tunnel, and
checking GRO is building large packets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-01 17:01:57 -04:00
Eric Dumazet
c9e6bc644e net: add gro_cells infrastructure
This adds a new include file (include/net/gro_cells.h), to bring GRO
(Generic Receive Offload) capability to tunnels, in a modular way.

Because tunnels receive path is lockless, and GRO adds a serialization
using a napi_struct, I chose to add an array of up to
DEFAULT_MAX_NUM_RSS_QUEUES cells, so that multi queue devices wont be
slowed down because of GRO layer.

skb_get_rx_queue() is used as selector.

In the future, we might add optional fanout capabilities, using rxhash
for example.

With help from Ben Hutchings who reminded me
netif_get_num_default_rss_queues() function.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-01 17:01:46 -04:00
Linus Torvalds
3498d13b80 TTY merge for 3.7-rc1
As we skipped the merge window for 3.6-rc1 for the tty tree, everything
 is now settled down and working properly, so we are ready for 3.7-rc1.
 Here's the patchset, it's big, but the large changes are removing a
 firmware file and adding a staging tty driver (it depended on the tty
 core changes, so it's going through this tree instead of the staging
 tree.)
 
 All of these patches have been in the linux-next tree for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlBp36oACgkQMUfUDdst+yk4WgCdEy13hot8fI2Lqnc7W0LKu7GX
 4p8AoLTjzrXhLosxdijskDQ9X1OtjrxU
 =S5Ng
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull TTY changes from Greg Kroah-Hartman:
 "As we skipped the merge window for 3.6-rc1 for the tty tree,
  everything is now settled down and working properly, so we are ready
  for 3.7-rc1.  Here's the patchset, it's big, but the large changes are
  removing a firmware file and adding a staging tty driver (it depended
  on the tty core changes, so it's going through this tree instead of
  the staging tree.)

  All of these patches have been in the linux-next tree for a while.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

Fix up more-or-less trivial conflicts in
 - drivers/char/pcmcia/synclink_cs.c:
    tty NULL dereference fix vs tty_port_cts_enabled() helper function
 - drivers/staging/{Kconfig,Makefile}:
    add-add conflict (dgrp driver added close to other staging drivers)
 - drivers/staging/ipack/devices/ipoctal.c:
    "split ipoctal_channel from iopctal" vs "TTY: use tty_port_register_device"

* tag 'tty-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (235 commits)
  tty/serial: Add kgdb_nmi driver
  tty/serial/amba-pl011: Quiesce interrupts in poll_get_char
  tty/serial/amba-pl011: Implement poll_init callback
  tty/serial/core: Introduce poll_init callback
  kdb: Turn KGDB_KDB=n stubs into static inlines
  kdb: Implement disable_nmi command
  kernel/debug: Mask KGDB NMI upon entry
  serial: pl011: handle corruption at high clock speeds
  serial: sccnxp: Make 'default' choice in switch last
  serial: sccnxp: Remove mask termios caps for SW flow control
  serial: sccnxp: Report actual baudrate back to core
  serial: samsung: Add poll_get_char & poll_put_char
  Powerpc 8xx CPM_UART setting MAXIDL register proportionaly to baud rate
  Powerpc 8xx CPM_UART maxidl should not depend on fifo size
  Powerpc 8xx CPM_UART too many interrupts
  Powerpc 8xx CPM_UART desynchronisation
  serial: set correct baud_base for EXSYS EX-41092 Dual 16950
  serial: omap: fix the reciever line error case
  8250: blacklist Winbond CIR port
  8250_pnp: do pnp probe before legacy probe
  ...
2012-10-01 12:26:52 -07:00
Rami Rosen
dfee1ebc0e Bluetooth: remove unused member of hci_dev.
This patch removes core_data member from hci_dev struct as it is unused.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-01 13:09:22 -03:00
David S. Miller
a248afdc1b Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Here is another batch of updates intended for 3.7...

Highlights include an hci_connect re-write in Bluetooth, HCI/LLC
layer separation in NFC, removal of the raw pn544 NFC driver, NFC LLCP
raw sockets support, improved IBSS auth frame handling in mac80211,
full-MAC AP mode notification support in mac80211, a lot of attention
paid to brcmfmac, and the usual level of updates to iwlwifi, ath9k,
mwifiex, and rt2x00, and various other updates.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-30 02:30:16 -04:00
David S. Miller
6a06e5e1bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/team/team.c
	drivers/net/usb/qmi_wwan.c
	net/batman-adv/bat_iv_ogm.c
	net/ipv4/fib_frontend.c
	net/ipv4/route.c
	net/l2tp/l2tp_netlink.c

The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.

qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

With help from Antonio Quartulli.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-28 14:40:49 -04:00
John W. Linville
c487606f83 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	net/nfc/netlink.c

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-09-28 11:11:16 -04:00
Jesper Dangaard Brouer
d4383f04d1 ipvs: API change to avoid rescan of IPv6 exthdr
Reduce the number of times we scan/skip the IPv6 exthdrs.

This patch contains a lot of API changes.  This is done, to avoid
repeating the scan of finding the IPv6 headers, via ipv6_find_hdr(),
which is called by ip_vs_fill_iph_skb().

Finding the IPv6 headers is done as early as possible, and passed on
as a pointer "struct ip_vs_iphdr *" to the affected functions.

This patch reduce/removes 19 calls to ip_vs_fill_iph_skb().

Notice, I have choosen, not to change the API of function
pointer "(*schedule)" (in struct ip_vs_scheduler) as it can be
used by external schedulers, via {un,}register_ip_vs_scheduler.
Only 4 out of 10 schedulers use info from ip_vs_iphdr*, and when
they do, they are only interested in iph->{s,d}addr.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-09-28 11:34:33 +09:00
Jesper Dangaard Brouer
2f74713d14 ipvs: Complete IPv6 fragment handling for IPVS
IPVS now supports fragmented packets, with support from nf_conntrack_reasm.c

Based on patch from: Hans Schillstrom.

IPVS do like conntrack i.e. use the skb->nfct_reasm
(i.e. when all fragments is collected, nf_ct_frag6_output()
starts a "re-play" of all fragments into the interrupted
PREROUTING chain at prio -399 (NF_IP6_PRI_CONNTRACK_DEFRAG+1)
with nfct_reasm pointing to the assembled packet.)

Notice, module nf_defrag_ipv6 must be loaded for this to work.
Report unhandled fragments, and recommend user to load nf_defrag_ipv6.

To handle fw-mark for fragments.  Add a new IPVS hook into prerouting
chain at prio -99 (NF_IP6_PRI_NAT_DST+1) to catch fragments, and copy
fw-mark info from the first packet with an upper layer header.

IPv6 fragment handling should be the last thing on the IPVS IPv6
missing support list.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-09-28 11:34:24 +09:00
Jesper Dangaard Brouer
63dca2c0b0 ipvs: Fix faulty IPv6 extension header handling in IPVS
IPv6 packets can contain extension headers, thus its wrong to assume
that the transport/upper-layer header, starts right after (struct
ipv6hdr) the IPv6 header.  IPVS uses this false assumption, and will
write SNAT & DNAT modifications at a fixed pos which will corrupt the
message.

To fix this, proper header position must be found before modifying
packets.  Introducing ip_vs_fill_iph_skb(), which uses ipv6_find_hdr()
to skip the exthdrs. It finds (1) the transport header offset, (2) the
protocol, and (3) detects if the packet is a fragment.

Note, that fragments in IPv6 is represented via an exthdr.  Thus, this
is detected while skipping through the exthdrs.

This patch depends on commit 84018f55a:
 "netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()"
This also adds a dependency to ip6_tables.

Originally based on patch from: Hans Schillstrom

kABI notes:
Changing struct ip_vs_iphdr is a potential minor kABI breaker,
because external modules can be compiled with another version of
this struct.  This should not matter, as they would most-likely
be using a compiled-in version of ip_vs_fill_iphdr().  When
recompiled, they will notice ip_vs_fill_iphdr() no longer exists,
and they have to used ip_vs_fill_iph_skb() instead.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-09-28 11:34:15 +09:00
Jesper Dangaard Brouer
a638e51437 ipvs: Use config macro IS_ENABLED()
Cleanup patch.

Use the IS_ENABLED macro, instead of having to check
both the build and the module CONFIG_ option.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-09-28 11:34:05 +09:00
Jesper Dangaard Brouer
120b9c14f4 ipvs: Trivial changes, use compressed IPv6 address in output
Have not converted the proc file output to compressed IPv6 addresses.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-09-28 11:33:52 +09:00
Eric Dumazet
e2bcabec6e net: remove sk_init() helper
It seems sk_init() has no value today and even does strange things :

# grep . /proc/sys/net/core/?mem_*
/proc/sys/net/core/rmem_default:212992
/proc/sys/net/core/rmem_max:131071
/proc/sys/net/core/wmem_default:212992
/proc/sys/net/core/wmem_max:131071

We can remove it completely.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27 18:42:00 -04:00
stephen hemminger
eccc1bb8d4 tunnel: drop packet if ECN present with not-ECT
Linux tunnels were written before RFC6040 and therefore never
implemented the corner case of ECN getting set in the outer header
and the inner header not being ready for it.

Section 4.2.  Default Tunnel Egress Behaviour.
 o If the inner ECN field is Not-ECT, the decapsulator MUST NOT
      propagate any other ECN codepoint onwards.  This is because the
      inner Not-ECT marking is set by transports that rely on dropped
      packets as an indication of congestion and would not understand or
      respond to any other ECN codepoint [RFC4774].  Specifically:

      *  If the inner ECN field is Not-ECT and the outer ECN field is
         CE, the decapsulator MUST drop the packet.

      *  If the inner ECN field is Not-ECT and the outer ECN field is
         Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the
         outgoing packet with the ECN field cleared to Not-ECT.

This patch moves the ECN decap logic out of the individual tunnels
into a common place.

It also adds logging to allow detecting broken systems that
set ECN bits incorrectly when tunneling (or an intermediate
router might be changing the header).

Overloads rx_frame_error to keep track of ECN related error.

Thanks to Chris Wright who caught this while reviewing the new VXLAN
tunnel.

This code was tested by injecting faulty logic in other end GRE
to send incorrectly encapsulated packets.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27 18:12:37 -04:00
Andrei Emeltchenko
d945df256a bluetooth: Remove unneeded batostr function
batostr is not needed anymore since for printing Bluetooth
addresses we use %pMR specifier.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 18:10:43 -03:00
Andrei Emeltchenko
0b26ab9dce Bluetooth: AMP: Handle Accept phylink command status evt
When receiving HCI Command Status event for Accept Physical Link
execute HCI Write Remote AMP Assoc with data saved from A2MP Create
Physical Link Request.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:35:09 -03:00
Andrei Emeltchenko
dffa387110 Bluetooth: AMP: Accept Physical Link
When receiving A2MP Create Physical Link message execute HCI
Accept Physical Link command to AMP controller.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:34:38 -03:00
Andrei Emeltchenko
9495b2ee75 Bluetooth: AMP: Process Chan Selected event
Channel Selected event indicates that link information data is available.
Read it with Read Local AMP Assoc command. The data shall be sent in the
A2MP Create Physical Link Request.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:34:06 -03:00
Andrei Emeltchenko
2766be48a7 Bluetooth: A2MP: Add fallback to normal l2cap init sequence
When there is no remote AMP controller found fallback to normal
L2CAP sequence.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:31:52 -03:00
Andrei Emeltchenko
93c284ee90 Bluetooth: AMP: Write remote AMP Assoc
When receiving HCI Command Status after HCI Create Physical Link
execute HCI Write Remote AMP Assoc command to AMP controller.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:31:18 -03:00
Andrei Emeltchenko
a02226d6ff Bluetooth: AMP: Create Physical Link
When receiving A2MP Get AMP Assoc Response execute HCI Create Physical
Link to AMP controller. Define function which will run when receiving
HCI Command Status.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:30:40 -03:00
Andrei Emeltchenko
5a34918669 Bluetooth: AMP: Add AMP key calculation
Function calculates AMP keys using hmac_sha256 helper. Calculated keys
are Generic AMP Link Key (gamp) and Dedicated AMP Link Key with
keyID "802b" for 802.11 PAL.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:30:22 -03:00
Andrei Emeltchenko
93c3e8f5c9 Bluetooth: Choose connection based on capabilities
Choose which L2CAP connection to establish by checking support
for HS and remote side supported features.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:18:36 -03:00
Andrei Emeltchenko
52c0d6e56b Bluetooth: AMP: Remote AMP ctrl definitions
Create remote AMP controllers structure. It is used to keep information
about discovered remote AMP controllers by A2MP protocol.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:12:46 -03:00
Andrei Emeltchenko
3161ae1c72 Bluetooth: AMP: Physical link struct and helpers
Define physical link structures. Physical links are represented by
hci_conn structure. For BR/EDR we use type ACL_LINK and for AMP
we use AMP_LINK.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:11:52 -03:00
Andrei Emeltchenko
903e454110 Bluetooth: AMP: Use HCI cmd to Read Loc AMP Assoc
When receiving A2MP Get AMP Assoc Request execute Read Local AMP Assoc
HCI command to AMP controller. If the AMP Assoc data is larger than it
can fit to HCI event only fragment is read. When all fragments are read
send A2MP Get AMP Assoc Response.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:10:32 -03:00
Andrei Emeltchenko
8e2a0d92c5 Bluetooth: AMP: Use HCI cmd to Read AMP Info
When receiving A2MP Get Info Request execute Read Local AMP Info HCI
command to AMP controller with function to be executed upon receiving
command complete event. Function will handle A2MP Get Info Response.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:10:18 -03:00
Andrei Emeltchenko
f97268fccd Bluetooth: A2MP: Create amp_mgr global list
Create amp_mgr_list global list which will be used by different
hci devices to find amp_mgr.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:10:03 -03:00
Andrei Emeltchenko
b078b56429 Bluetooth: Add HCI logical link cmds definitions
Add a few definitions to hci.h

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-27 17:09:29 -03:00
John W. Linville
5419575e83 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-09-25 15:54:32 -04:00
Waldemar Rymarkiewicz
96e324024b NFC: xmit from hci ops must return 0 or negative
xmit callback provided by a driver encapsulates upper layers
data and sends it to the hardware. So, HCI does not know the
exact amount of data being sent and thus can't handle partially
sent frames properly.

Therefore, the driver must return 0 for completely sent frame or
negative for failure.

Signed-off-by: Waldemar Rymarkiewicz <waldemar.rymarkiewicz@tieto.com>
Acked-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-09-25 00:17:27 +02:00
Eric Lapuyade
412fda538f NFC: Changed HCI and PN544 HCI driver to use the new HCI LLC Core
The previous shdlc HCI driver and its header are removed from the tree.
PN544 now registers directly with HCI and passes the name of the llc it
requires (shdlc).
HCI instantiation now allocates the required llc instance. The llc is
started when the HCI device is brought up.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-09-25 00:17:26 +02:00
Eric Lapuyade
4a61cd6687 NFC: Add an shdlc llc module to llc core
This is used by HCI drivers such as the one for the pn544 which require
communications between HCI and the chip to use shdlc.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-09-25 00:17:25 +02:00
Eric Lapuyade
8af00d48dc NFC: Add a nop (passthrough) llc module to llc core
This is a passthrough llc. It can be used by HCI drivers that don't
need link layer control. HCI will then write directly to the driver, and
driver will deliver incoming frames directly to HCI without any
processing.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-09-25 00:17:25 +02:00
Eric Lapuyade
67cccfe17d NFC: Add an LLC Core layer to HCI
The LLC layer manages modules that control the link layer protocol (such
as shdlc) between HCI and an HCI driver. The driver must simply specify
the required llc when it registers with HCI.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-09-25 00:17:25 +02:00
Eric Lapuyade
f3e8fb5527 NFC: Modified hci_transceive to become an asynchronous operation
This enables the completion callback to be called from a different
context, preventing a possible deadlock if the callback resulted in the
invocation of a nested call to the currently locked nfc_dev.
This is also more in line with the im_transceive nfc_ops for NFC Core or
NCI drivers which already behave asynchronously.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-09-25 00:17:25 +02:00
Eric Lapuyade
e4c4789e55 NFC: Add a public nfc_hci_send_cmd_async method
This method initiates execution of an HCI cmd. Result will be delivered
through an asynchronous callback.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-09-25 00:17:25 +02:00
Tejun Heo
474fee3db1 NFC: Use system_nrt_wq instead of custom ones
NFC is using a number of custom ordered workqueues w/ WQ_MEM_RECLAIM.
WQ_MEM_RECLAIM is unnecessary unless NFC is gonna be used as transport
for storage device, and all use cases match one work item to one
ordered workqueue - IOW, there's no actual ordering going on at all
and using system_nrt_wq gives the same behavior.

There's nothing to be gained by using custom workqueues.  Use
system_nrt_wq instead and drop all the custom ones.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-09-25 00:17:23 +02:00
Ilan Elias
767f19ae69 NFC: Implement NCI dep_link_up and dep_link_down
During NFC-DEP target activation, store the remote
general bytes to be used later in dep_link_up.
When dep_link_up is called, activate the NFC-DEP target,
and forward the remote general bytes.
When dep_link_down is called, deactivate the target.

Signed-off-by: Ilan Elias <ilane@ti.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-09-25 00:17:23 +02:00
Ilan Elias
ac20683840 NFC: Parse NCI NFC-DEP activation params
Signed-off-by: Ilan Elias <ilane@ti.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-09-25 00:17:23 +02:00
Ilan Elias
7e0352306f NFC: Set local general bytes in nci_start_poll
If initiator protocol is NFC-DEP, set the local general bytes
in nci_start_poll.

Signed-off-by: Ilan Elias <ilane@ti.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-09-25 00:17:23 +02:00
Eric Dumazet
5640f76858 net: use a per task frag allocator
We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page

Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, thats order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-24 16:31:37 -04:00
David S. Miller
2a6c8c7998 net: Remove unnecessary NULL check in scm_destroy().
All callers provide a non-NULL scm argument.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-24 15:52:33 -04:00
John W. Linville
791ef39cd1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2012-09-24 14:39:16 -04:00
John W. Linville
9b4e9e7565 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-09-24 14:34:40 -04:00
Neal Cardwell
016818d076 tcp: TCP Fast Open Server - take SYNACK RTT after completing 3WHS
When taking SYNACK RTT samples for servers using TCP Fast Open, fix
the code to ensure that we only call tcp_valid_rtt_meas() after we
receive the ACK that completes the 3-way handshake.

Previously we were always taking an RTT sample in
tcp_v4_syn_recv_sock(). However, for TCP Fast Open connections
tcp_v4_conn_req_fastopen() calls tcp_v4_syn_recv_sock() at the time we
receive the SYN. So for TFO we must wait until tcp_rcv_state_process()
to take the RTT sample.

To fix this, we wait until after TFO calls tcp_v4_syn_recv_sock()
before we set the snt_synack timestamp, since tcp_synack_rtt_meas()
already ensures that we only take a SYNACK RTT sample if snt_synack is
non-zero. To be careful, we only take a snt_synack timestamp when
a SYNACK transmit or retransmit succeeds.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-22 15:47:10 -04:00
Neal Cardwell
623df484a7 tcp: extract code to compute SYNACK RTT
In preparation for adding another spot where we compute the SYNACK
RTT, extract this code so that it can be shared.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-22 15:47:10 -04:00
Linus Torvalds
abef3bd710 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:
 "More bug fixes, nothing gets past these guys"

 1) More kernel info leaks found by Mathias Krause, this time in the
    IPSEC configuration layers.

 2) When IPSEC policies change, we do not properly make sure that cached
    routes (which could now be stale) throughout the system will be
    revalidated.  Fix this by generalizing the generation count
    invalidation scheme used by ipv4.  From Nicolas Dichtel.

 3) When repairing TCP sockets, we need to allow to restore not just the
    send window scale, but the receive one too.  Extend the existing
    interface to achieve this in a backwards compatible way.  From
    Andrey Vagin.

 4) A fix for FCOE scatter gather feature validation erroneously caused
    scatter gather to be disabled for things like AOE too.  From Ed L
    Cashin.

 5) Several cases of mishandling of error pointers, from Mathias Krause,
    Wei Yongjun, and Devendra Naga.

 6) Fix gianfar build, from Richard Cochran.

 7) CAP_NET_* failures should return -EPERM not -EACCES, from Zhao
    Hongjiang.

 8) Hardware reset fix in janz-ican3 CAN driver, from Ira W Snyder.

 9) Fix oops during rmmod in ti_hecc CAN driver, from Marc Kleine-Budde.

10) The removal of the conditional compilation of the clk support code
    in the stmmac driver broke things.  This is because the interfaces
    used are the ones that don't also perform the enable/disable of the
    clk.  Fix from Stefan Roese.

11) The QFQ packet scheduler can record out of range virtual start
    times, resulting later in misbehavior and even crashes.  Fix from
    Paolo Valente.

12) If MSG_WAITALL is used with IOAT DMA under TCP, we can wedge the
    receiver when the advertised receive window goes to zero.  Detect
    this case and force the processing of the IOAT DMA queue when it
    happens to avoid getting stuck.  Fix from Michal Kubecek.

13) batman-adv assumes that test_bit() returns only 0 or 1, but this is
    not true for x86 (which returns -1 or 0, via the 'sbb' instruction).
    Fix from Linus Lussing.

14) Fix small packet corruption in e1000, from Tushar Dave.

15) make_blackhole() in the IPSEC policy code can do one read unlock too
    many, fix from Li RongQing.

16) The new tcp_try_coalesce() code introduced a bug in TCP URG
    handling, fix from Eric Dumazet.

17) Fix memory leak in __netif_receive_skb() when doing zerocopy and
    when hit an OOM condition.  From Michael S Tsirkin.

18) netxen blindly deferences pdev->bus->self, which is not guarenteed
    to be non-NULL.  Fix from Nikolay Aleksandrov.

19) Fix a performance regression caused by mistakes in ipv6 checksum
    validation in the bnx2x driver, fix from Michal Schmidt.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
  net/stmmac: Use clk_prepare_enable and clk_disable_unprepare
  net: change return values from -EACCES to -EPERM
  net/irda: sh_sir: fix return value check in sh_sir_set_baudrate()
  stmmac: fix return value check in stmmac_open_ext_timer()
  gianfar: fix phc index build failure
  ipv6: fix return value check in fib6_add()
  bnx2x: remove false warning regarding interrupt number
  can: ti_hecc: fix oops during rmmod
  can: janz-ican3: fix support for older hardware revisions
  net: do not disable sg for packets requiring no checksum
  aoe: assert AoE packets marked as requiring no checksum
  at91ether: return PTR_ERR if call to clk_get fails
  xfrm_user: don't copy esn replay window twice for new states
  xfrm_user: ensure user supplied esn replay window is valid
  xfrm_user: fix info leak in copy_to_user_tmpl()
  xfrm_user: fix info leak in copy_to_user_policy()
  xfrm_user: fix info leak in copy_to_user_state()
  xfrm_user: fix info leak in copy_to_user_auth()
  net: qmi_wwan: adding Huawei E367, ZTE MF683 and Pantech P4200
  tcp: restore rcv_wscale in a repair mode (v2)
  ...
2012-09-21 14:32:55 -07:00
Amerigo Wang
6b102865e7 ipv6: unify fragment thresh handling code
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 17:23:28 -04:00
Amerigo Wang
d4915c087f ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inline
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 17:23:28 -04:00
Amerigo Wang
b836c99fd6 ipv6: unify conntrack reassembly expire code with standard one
Two years ago, Shan Wei tried to fix this:
http://patchwork.ozlabs.org/patch/43905/

The problem is that RFC2460 requires an ICMP Time
Exceeded -- Fragment Reassembly Time Exceeded message should be
sent to the source of that fragment, if the defragmentation
times out.

"
   If insufficient fragments are received to complete reassembly of a
   packet within 60 seconds of the reception of the first-arriving
   fragment of that packet, reassembly of that packet must be
   abandoned and all the fragments that have been received for that
   packet must be discarded.  If the first fragment (i.e., the one
   with a Fragment Offset of zero) has been received, an ICMP Time
   Exceeded -- Fragment Reassembly Time Exceeded message should be
   sent to the source of that fragment.
"

As Herbert suggested, we could actually use the standard IPv6
reassembly code which follows RFC2460.

With this patch applied, I can see ICMP Time Exceeded sent
from the receiver when the sender sent out 3/4 fragmented
IPv6 UDP packet.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 17:23:28 -04:00
Amerigo Wang
c038a767cd ipv6: add a new namespace for nf_conntrack_reasm
As pointed by Michal, it is necessary to add a new
namespace for nf_conntrack_reasm code, this prepares
for the second patch.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 17:23:28 -04:00
Johannes Berg
552bff0c2f cfg80211: constify name parameter to add_virtual_intf
The name can't be modified by the driver,
make it const.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-19 09:32:59 +02:00
Johan Hedberg
92a25256f1 Bluetooth: mgmt: Implement support for passkey notification
This patch adds support for Secure Simple Pairing with devices that have
KeyboardOnly as their IO capability. Such devices will cause a passkey
notification on our side and optionally also keypress notifications.
Without this patch some keyboards cannot be paired using the mgmt
interface.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-18 22:27:29 -03:00
Nicolas Dichtel
6f3118b571 ipv6: use net->rt_genid to check dst validity
IPv6 dst should take care of rt_genid too. When a xfrm policy is inserted or
deleted, all dst should be invalidated.
To force the validation, dst entries should be created with ->obsolete set to
DST_OBSOLETE_FORCE_CHK. This was already the case for all functions calling
ip6_dst_alloc(), except for ip6_rt_copy().

As a consequence, we can remove the specific code in inet6_connection_sock.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 15:57:03 -04:00
Nicolas Dichtel
b42664f898 netns: move net->ipv4.rt_genid to net->rt_genid
This commit prepares the use of rt_genid by both IPv4 and IPv6.
Initialization is left in IPv4 part.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 15:57:03 -04:00
Nicolas Dichtel
bafa6d9d89 ipv4/route: arg delay is useless in rt_cache_flush()
Since route cache deletion (89aef8921b), delay is no
more used. Remove it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 15:44:34 -04:00
Pandiyarajan Pitchaimuthu
ed44a951c7 cfg80211/nl80211: Notify connection request failure in AP mode
In AP mode, when a station requests connection to an AP and if the
request is failed for particular reason, userspace is notified about the
failure through NL80211_CMD_CONN_FAILED command. Reason for the failure
is sent through the attribute NL80211_ATTR_CONN_FAILED_REASON.

Signed-off-by: Pandiyarajan Pitchaimuthu <c_ppitch@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-18 19:54:06 +02:00
Arend van Spriel
30d08a46ea cfg80211: remove obsolete comment for .sched_scan_stop() callback
The kerneldoc comment for .sched_scan_stop() callback describes a
driver_initiated flag, but the interface does not hold such a flag.

Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-18 19:54:05 +02:00
Eric W. Biederman
e1760bd5ff userns: Convert the audit loginuid to be a kuid
Always store audit loginuids in type kuid_t.

Print loginuids by converting them into uids in the appropriate user
namespace, and then printing the resulting uid.

Modify audit_get_loginuid to return a kuid_t.

Modify audit_set_loginuid to take a kuid_t.

Modify /proc/<pid>/loginuid on read to convert the loginuid into the
user namespace of the opener of the file.

Modify /proc/<pid>/loginud on write to convert the loginuid
rom the user namespace of the opener of the file.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Paul Moore <paul@paul-moore.com> ?
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-17 18:08:54 -07:00
Chuck Lever
35c448a8a3 include/net/sock.h: squelch compiler warning in sk_rmem_schedule()
This warning:

  In file included from linux/include/linux/tcp.h:227:0,
                   from linux/include/linux/ipv6.h:221,
                   from linux/include/net/ipv6.h:16,
                   from linux/include/linux/sunrpc/clnt.h:26,
                   from linux/net/sunrpc/stats.c:22:
  linux/include/net/sock.h: In function `sk_rmem_schedule':
  linux/nfs-2.6/include/net/sock.h:1339:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

is seen with gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2) using the
-Wextra option.

Commit c76562b670 ("netvm: prevent a stream-specific deadlock")
accidentally replaced the "size" parameter of sk_rmem_schedule() with an
unsigned int.  This changes the semantics of the comparison in the
return statement.

In sk_wmem_schedule we have syntactically the same comparison, but
"size" is a signed integer.  In addition, __sk_mem_schedule() takes a
signed integer for its "size" parameter, so there is an implicit type
conversion in sk_rmem_schedule() anyway.

Revert the "size" parameter back to a signed integer so that the
semantics of the expressions in both sk_[rw]mem_schedule() are exactly
the same.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-17 15:00:38 -07:00
David S. Miller
b4516a288e llc: Remove stray reference to sysctl_llc_station_ack_timeout.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-17 13:13:24 -04:00
David S. Miller
ba01dfe182 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
This is another batch of updates intended for the 3.7 stream.

There are not a lot of large items, but iwlwifi, mwifiex, rt2x00,
ath9k, and brcmfmac all get some attention.  Wei Yongjun also provides
a series of small maintenance fixes.

This also includes a pull of the wireless tree in order to satisfy
some prerequisites for later patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-17 00:57:32 -04:00
Greg Kroah-Hartman
7ac3c93e5d Merge 3.6-rc6 into tty-next
This pulls in the fixes in 3.6-rc6

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-16 17:31:36 -07:00
David S. Miller
b48b63a1f6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/netfilter/nfnetlink_log.c
	net/netfilter/xt_LOG.c

Rather easy conflict resolution, the 'net' tree had bug fixes to make
sure we checked if a socket is a time-wait one or not and elide the
logging code if so.

Whereas on the 'net-next' side we are calculating the UID and GID from
the creds using different interfaces due to the user namespace changes
from Eric Biederman.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-15 11:43:53 -04:00
John W. Linville
9316f0e3c6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-09-14 13:53:49 -04:00
Daniel Wagner
8a8e04df47 cgroup: Assign subsystem IDs during compile time
WARNING: With this change it is impossible to load external built
controllers anymore.

In case where CONFIG_NETPRIO_CGROUP=m and CONFIG_NET_CLS_CGROUP=m is
set, corresponding subsys_id should also be a constant. Up to now,
net_prio_subsys_id and net_cls_subsys_id would be of the type int and
the value would be assigned during runtime.

By switching the macro definition IS_SUBSYS_ENABLED from IS_BUILTIN
to IS_ENABLED, all *_subsys_id will have constant value. That means we
need to remove all the code which assumes a value can be assigned to
net_prio_subsys_id and net_cls_subsys_id.

A close look is necessary on the RCU part which was introduces by
following patch:

  commit f845172531
  Author:	Herbert Xu <herbert@gondor.apana.org.au>  Mon May 24 09:12:34 2010
  Committer:	David S. Miller <davem@davemloft.net>  Mon May 24 09:12:34 2010

  cls_cgroup: Store classid in struct sock

  Tis code was added to init_cgroup_cls()

	  /* We can't use rcu_assign_pointer because this is an int. */
	  smp_wmb();
	  net_cls_subsys_id = net_cls_subsys.subsys_id;

  respectively to exit_cgroup_cls()

	  net_cls_subsys_id = -1;
	  synchronize_rcu();

  and in module version of task_cls_classid()

	  rcu_read_lock();
	  id = rcu_dereference(net_cls_subsys_id);
	  if (id >= 0)
		  classid = container_of(task_subsys_state(p, id),
					 struct cgroup_cls_state, css)->classid;
	  rcu_read_unlock();

Without an explicit explaination why the RCU part is needed. (The
rcu_deference was fixed by exchanging it to rcu_derefence_index_check()
in a later commit, but that is a minor detail.)

So here is my pondering why it was introduced and why it safe to
remove it now. Note that this code was copied over to net_prio the
reasoning holds for that subsystem too.

The idea behind the RCU use for net_cls_subsys_id is to make sure we
get a valid pointer back from task_subsys_state(). task_subsys_state()
is just blindly accessing the subsys array and returning the
pointer. Obviously, passing in -1 as id into task_subsys_state()
returns an invalid value (out of lower bound).

So this code makes sure that only after module is loaded and the
subsystem registered, the id is assigned.

Before unregistering the module all old readers must have left the
critical section. This is done by assigning -1 to the id and issuing a
synchronized_rcu(). Any new readers wont call task_subsys_state()
anymore and therefore it is safe to unregister the subsystem.

The new code relies on the same trick, but it looks at the subsys
pointer return by task_subsys_state() (remember the id is constant
and therefore we allways have a valid index into the subsys
array).

No precautions need to be taken during module loading
module. Eventually, all CPUs will get a valid pointer back from
task_subsys_state() because rebind_subsystem() which is called after
the module init() function will assigned subsys[net_cls_subsys_id] the
newly loaded module subsystem pointer.

When the subsystem is about to be removed, rebind_subsystem() will
called before the module exit() function. In this case,
rebind_subsys() will assign subsys[net_cls_subsys_id] a NULL pointer
and then it calls synchronize_rcu(). All old readers have left by then
the critical section. Any new reader wont access the subsystem
anymore.  At this point we are safe to unregister the subsystem. No
synchronize_rcu() call is needed.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
2012-09-14 09:57:43 -07:00
Daniel Wagner
51e4e7faba cgroup: net_prio: Do not define task_netpioidx() when not selected
task_netprioidx() should not be defined in case the configuration is
CONFIG_NETPRIO_CGROUP=n. The reason is that in a following patch the
net_prio_subsys_id will only be defined if CONFIG_NETPRIO_CGROUP!=n.
When net_prio is not built at all any callee should only get an empty
task_netprioidx() without any references to net_prio_subsys_id.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
2012-09-14 09:57:28 -07:00
Daniel Wagner
8fb974c937 cgroup: net_cls: Do not define task_cls_classid() when not selected
task_cls_classid() should not be defined in case the configuration is
CONFIG_NET_CLS_CGROUP=n. The reason is that in a following patch the
net_cls_subsys_id will only be defined if CONFIG_NET_CLS_CGROUP!=n.
When net_cls is not built at all a callee should only get an empty
task_cls_classid() without any references to net_cls_subsys_id.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
2012-09-14 09:57:25 -07:00
Daniel Wagner
f341980771 cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h
The only user of sock_update_classid() is net/socket.c which happens
to include cls_cgroup.h directly.

tj: Fix build breakage due to missing cls_cgroup.h inclusion in
    drivers/net/tun.c reported in linux-next by Stephen.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
2012-09-14 09:55:57 -07:00
Eric W. Biederman
15e473046c netlink: Rename pid to portid to avoid confusion
It is a frequent mistake to confuse the netlink port identifier with a
process identifier.  Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10 15:30:41 -04:00
Johannes Berg
e548c49e6d mac80211: add key flag for management keys
Mark keys that might be used to receive management
frames so drivers can fall back on software crypto
for them if they don't support hardware offload.
As the new flag is only set correctly for RX keys
and the existing IEEE80211_KEY_FLAG_SW_MGMT flag
can only affect TX, also rename the latter to
IEEE80211_KEY_FLAG_SW_MGMT_TX.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-10 11:29:17 +02:00
Andrei Emeltchenko
376261ae36 Bluetooth: debug: Print refcnt for hci_dev
Add debug output for HCI kref.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-08 18:06:10 -03:00
Andrei Emeltchenko
9472007c62 Bluetooth: trivial: Make hci_chan_del return void
Return code is not needed in hci_chan_del

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-08 17:27:18 -03:00
Andrei Emeltchenko
6b536b5e5e Bluetooth: Remove unneeded zero init
hdev is allocated with kzalloc so zero initialization is not needed.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-08 16:53:48 -03:00
John W. Linville
fac805f8c1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2012-09-07 15:07:55 -04:00
John W. Linville
4a3e12fd7a Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-09-07 14:49:46 -04:00
Nicolas Dichtel
4ccfe6d410 ipv4/route: arg delay is useless in rt_cache_flush()
Since route cache deletion (89aef8921b), delay is no
more used. Remove it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 14:44:08 -04:00
Eric W. Biederman
dbe9a4173e scm: Don't use struct ucred in NETLINK_CB and struct scm_cookie.
Passing uids and gids on NETLINK_CB from a process in one user
namespace to a process in another user namespace can result in the
wrong uid or gid being presented to userspace.  Avoid that problem by
passing kuids and kgids instead.

- define struct scm_creds for use in scm_cookie and netlink_skb_parms
  that holds uid and gid information in kuid_t and kgid_t.

- Modify scm_set_cred to fill out scm_creds by heand instead of using
  cred_to_ucred to fill out struct ucred.  This conversion ensures
  userspace does not get incorrect uid or gid values to look at.

- Modify scm_recv to convert from struct scm_creds to struct ucred
  before copying credential values to userspace.

- Modify __scm_send to populate struct scm_creds on in the scm_cookie,
  instead of just copying struct ucred from userspace.

- Modify netlink_sendmsg to copy scm_creds instead of struct ucred
  into the NETLINK_CB.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 14:42:05 -04:00
John W. Linville
777bf135b7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
John W. Linville says:

====================
Please pull these fixes intended for 3.6.  There are more commits
here than I would like -- I got a bit behind while I was stalking
Steven Rostedt in San Diego last week...  I'll slow it down after this!

There are a couple of pulls here.  One is from Johannes:

"Please pull (according to the below information) to get a few fixes.

 * a fix to properly disconnect in the driver when authentication or
   association fails
 * a fix to prevent invalid information about mesh paths being reported
   to userspace
 * a memory leak fix in an nl80211 error path"

The other comes via Gustavo:

"A few updates for the 3.6 kernel. There are two btusb patches to add
more supported devices through the new USB_VENDOR_AND_INTEFACE_INFO()
macro and another one that add a new device id for a Sony Vaio laptop,
one fix for a user-after-free and, finally, two patches from Vinicius
to fix a issue in SMP pairing."

Along with those...

Arend van Spriel provides a fix for a use-after-free bug in brcmfmac.

Daniel Drake avoids a hang by not trying to touch the libertas hardware
duing suspend if it is already powered-down.

Felix Fietkau provides a batch of ath9k fixes that adress some
potential problems with power settings, as well as a fix to avoid a
potential interrupt storm.

Gertjan van Wingerde provides a register-width fix for rt2x00, and
a rt2x00 fix to prevent incorrectly detecting the rfkill status.
He also provides a device ID patch.

Hante Meuleman gives us three brcmfmac fixes, one that properly
initializes a command structure, one that fixes a race condition that
could lose usb requests, and one that removes some log spam.

Marc Kleine-Budde offers an rt2x00 fix for a voltage setting on some
specific devices.

Mohammed Shafi Shajakhan sent an ath9k fix to avoid a crash related to
using timers that aren't allocated when 2 wire bluetooth coexistence
hardware is in use.

Sergei Poselenov changes rt2800usb to do some validity checking for
received packets, avoiding crashes on an ARM Soc.

Stone Piao gives us an mwifiex fix for an incorrectly set skb length
value for a command buffer.

All of these are localized to their specific drivers, and relatively
small.  The power-related patches from Felix are bigger than I would
like, but I merged them in consideration of their isolation to ath9k
and the sensitive nature of power settings in wireless devices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 14:38:50 -04:00
Johannes Berg
00b14825ee Merge remote-tracking branch 'wireless-next/master' into mac80211-next 2012-09-06 17:05:28 +02:00
Johannes Berg
944b9e375d Merge remote-tracking branch 'mac80211/master' into mac80211-next
Pull in mac80211.git to let the next patch apply
without conflicts, also resolving a hwsim conflict.

Conflicts:
	drivers/net/wireless/mac80211_hwsim.c

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-06 15:56:02 +02:00
Nicolas Dichtel
ef2c7d7b59 ipv6: fix handling of blackhole and prohibit routes
When adding a blackhole or a prohibit route, they were handling like classic
routes. Moreover, it was only possible to add this kind of routes by specifying
an interface.

Bug already reported here:
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498498

Before the patch:
  $ ip route add blackhole 2001::1/128
  RTNETLINK answers: No such device
  $ ip route add blackhole 2001::1/128 dev eth0
  $ ip -6 route | grep 2001
  2001::1 dev eth0  metric 1024

After:
  $ ip route add blackhole 2001::1/128
  $ ip -6 route | grep 2001
  blackhole 2001::1 dev lo  metric 1024  error -22

v2: wrong patch
v3: add a field fc_type in struct fib6_config to store RTN_* type

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-05 17:49:28 -04:00
Robert P. J. Day
c9a0a30252 cfg80211: add kerneldoc entry for "vht_cap"
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-05 15:54:07 +02:00
Steffen Klassert
3b59df46a4 xfrm: Workaround incompatibility of ESN and async crypto
ESN for esp is defined in RFC 4303. This RFC assumes that the
sequence number counters are always up to date. However,
this is not true if an async crypto algorithm is employed.

If the sequence number counters are not up to date on sequence
number check, we may incorrectly update the upper 32 bit of
the sequence number. This leads to a DOS.

We workaround this by comparing the upper sequence number,
(used for authentication) with the upper sequence number
computed after the async processing. We drop the packet
if these numbers are different.

To do this, we introduce a recheck function that does this
check in the ESN case.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-04 14:09:45 -04:00
David S. Miller
1e9f0207d3 Merge branch 'master' of git://1984.lsi.us.es/nf-next 2012-09-03 20:26:45 -04:00
Yuchung Cheng
684bad1107 tcp: use PRR to reduce cwin in CWR state
Use proportional rate reduction (PRR) algorithm to reduce cwnd in CWR state,
in addition to Recovery state. Retire the current rate-halving in CWR.
When losses are detected via ACKs in CWR state, the sender enters Recovery
state but the cwnd reduction continues and does not restart.

Rename and refactor cwnd reduction functions since both CWR and Recovery
use the same algorithm:
tcp_init_cwnd_reduction() is new and initiates reduction state variables.
tcp_cwnd_reduction() is previously tcp_update_cwnd_in_recovery().
tcp_ends_cwnd_reduction() is previously  tcp_complete_cwr().

The rate halving functions and logic such as tcp_cwnd_down(), tcp_min_cwnd(),
and the cwnd moderation inside tcp_enter_cwr() are removed. The unused
parameter, flag, in tcp_cwnd_reduction() is also removed.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-03 14:34:02 -04:00
Pablo Neira Ayuso
ace1fe1231 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
This merges (3f509c6 netfilter: nf_nat_sip: fix incorrect handling
of EBUSY for RTCP expectation) to Patrick McHardy's IPv6 NAT changes.
2012-09-03 15:34:51 +02:00
Pablo Neira Ayuso
84b5ee939e netfilter: nf_conntrack: add nf_ct_timeout_lookup
This patch adds the new nf_ct_timeout_lookup function to encapsulate
the timeout policy attachment that is called in the nf_conntrack_in
path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:33:03 +02:00
Jerry Chu
8336886f78 tcp: TCP Fast Open Server - support TFO listeners
This patch builds on top of the previous patch to add the support
for TFO listeners. This includes -

1. allocating, properly initializing, and managing the per listener
fastopen_queue structure when TFO is enabled

2. changes to the inet_csk_accept code to support TFO. E.g., the
request_sock can no longer be freed upon accept(), not until 3WHS
finishes

3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
if it's a TFO socket

4. properly closing a TFO listener, and a TFO socket before 3WHS
finishes

5. supporting TCP_FASTOPEN socket option

6. modifying tcp_check_req() to use to check a TFO socket as well
as request_sock

7. supporting TCP's TFO cookie option

8. adding a new SYN-ACK retransmit handler to use the timer directly
off the TFO socket rather than the listener socket. Note that TFO
server side will not retransmit anything other than SYN-ACK until
the 3WHS is completed.

The patch also contains an important function
"reqsk_fastopen_remove()" to manage the somewhat complex relation
between a listener, its request_sock, and the corresponding child
socket. See the comment above the function for the detail.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 20:02:19 -04:00
Jerry Chu
1046716368 tcp: TCP Fast Open Server - header & support functions
This patch adds all the necessary data structure and support
functions to implement TFO server side. It also documents a number
of flags for the sysctl_tcp_fastopen knob, and adds a few Linux
extension MIBs.

In addition, it includes the following:

1. a new TCP_FASTOPEN socket option an application must call to
supply a max backlog allowed in order to enable TFO on its listener.

2. A number of key data structures:
"fastopen_rsk" in tcp_sock - for a big socket to access its
request_sock for retransmission and ack processing purpose. It is
non-NULL iff 3WHS not completed.

"fastopenq" in request_sock_queue - points to a per Fast Open
listener data structure "fastopen_queue" to keep track of qlen (# of
outstanding Fast Open requests) and max_qlen, among other things.

"listener" in tcp_request_sock - to point to the original listener
for book-keeping purpose, i.e., to maintain qlen against max_qlen
as part of defense against IP spoofing attack.

3. various data structure and functions, many in tcp_fastopen.c, to
support server side Fast Open cookie operations, including
/proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 20:02:18 -04:00
Alex Bergmann
6c9ff979d1 tcp: Increase timeout for SYN segments
Commit 9ad7c049 ("tcp: RFC2988bis + taking RTT sample from 3WHS for
the passive open side") changed the initRTO from 3secs to 1sec in
accordance to RFC6298 (former RFC2988bis). This reduced the time till
the last SYN retransmission packet gets sent from 93secs to 31secs.

RFC1122 is stating that the retransmission should be done for at least 3
minutes, but this seems to be quite high.

  "However, the values of R1 and R2 may be different for SYN
  and data segments.  In particular, R2 for a SYN segment MUST
  be set large enough to provide retransmission of the segment
  for at least 3 minutes.  The application can close the
  connection (i.e., give up on the open attempt) sooner, of
  course."

This patch increases the value of TCP_SYN_RETRIES to the value of 6,
providing a retransmission window of 63secs.

The comments for SYN and SYNACK retries have also been updated to
describe the current settings. The same goes for the documentation file
"Documentation/networking/ip-sysctl.txt".

Signed-off-by: Alexander Bergmann <alex@linlab.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 15:42:10 -04:00
David S. Miller
c32f38619a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge the 'net' tree to get the recent set of netfilter bug fixes in
order to assist with some merge hassles Pablo is going to have to deal
with for upcoming changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 15:14:18 -04:00
David S. Miller
0dcd5052c8 Merge branch 'master' of git://1984.lsi.us.es/nf 2012-08-31 13:06:37 -04:00
Pablo Neira Ayuso
5b423f6a40 netfilter: nf_conntrack: fix racy timer handling with reliable events
Existing code assumes that del_timer returns true for alive conntrack
entries. However, this is not true if reliable events are enabled.
In that case, del_timer may return true for entries that were
just inserted in the dying list. Note that packets / ctnetlink may
hold references to conntrack entries that were just inserted to such
list.

This patch fixes the issue by adding an independent timer for
event delivery. This increases the size of the ecache extension.
Still we can revisit this later and use variable size extensions
to allocate this area on demand.

Tested-by: Oliver Smith <olipro@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-31 15:50:28 +02:00
Patrick McHardy
b3f644fc82 netfilter: ip6tables: add MASQUERADE target
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:18 +02:00
Patrick McHardy
58a317f106 netfilter: ipv6: add IPv6 NAT support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:17 +02:00
Patrick McHardy
2cf545e835 net: core: add function for incremental IPv6 pseudo header checksum updates
Add inet_proto_csum_replace16 for incrementally updating IPv6 pseudo header
checksums for IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
2012-08-30 03:00:16 +02:00
Patrick McHardy
c7232c9979 netfilter: add protocol independent NAT core
Convert the IPv4 NAT implementation to a protocol independent core and
address family specific modules.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:14 +02:00
Patrick McHardy
051966c0c6 netfilter: nf_nat: add protoff argument to packet mangling functions
For mangling IPv6 packets the protocol header offset needs to be known
by the NAT packet mangling functions. Add a so far unused protoff argument
and convert the conntrack and NAT helpers to use it in preparation of
IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:13 +02:00
Vinicius Costa Gomes
cc110922da Bluetooth: Change signature of smp_conn_security()
To make it clear that it may be called from contexts that may not have
any knowledge of L2CAP, we change the connection parameter, to receive
a hci_conn.

This also makes it clear that it is checking the security of the link.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-27 08:07:18 -07:00
Greg Kroah-Hartman
e372dc6c62 Merge 3.6-rc3 into tty-next
This picks up all of the different fixes in Linus's tree that we also need here.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-27 07:13:33 -07:00
Patrick McHardy
5f2d04f1f9 ipv4: fix path MTU discovery with connection tracking
IPv4 conntrack defragments incoming packet at the PRE_ROUTING hook and
(in case of forwarded packets) refragments them at POST_ROUTING
independent of the IP_DF flag. Refragmentation uses the dst_mtu() of
the local route without caring about the original fragment sizes,
thereby breaking PMTUD.

This patch fixes this by keeping track of the largest received fragment
with IP_DF set and generates an ICMP fragmentation required error during
refragmentation if that size exceeds the MTU.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
2012-08-26 19:13:55 +02:00
David S. Miller
e6acb38480 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 18:54:37 -04:00
John W. Linville
f20b6213f1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-08-24 12:25:30 -04:00
Rami Rosen
f63c45e0e6 packet: fix broken build.
This patch fixes a broken build due to a missing header:
...
  CC      net/ipv4/proc.o
In file included from include/net/net_namespace.h:15,
                 from net/ipv4/proc.c:35:
include/net/netns/packet.h:11: error: field 'sklist_lock' has incomplete type
...

The lock of netns_packet has been replaced by a recent patch to be a mutex instead of a spinlock,
but we need to replace the header file to be linux/mutex.h instead of linux/spinlock.h as well.

See commit 0fa7fa98db:
packet: Protect packet sk list with mutex (v2) patch,

Signed-off-by: Rami Rosen <rosenr@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23 09:29:45 -07:00
Pavel Emelyanov
0fa7fa98db packet: Protect packet sk list with mutex (v2)
Change since v1:

* Fixed inuse counters access spotted by Eric

In patch eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in packet diag module -- the
socket list is traversed under rcu_read_lock() while performed under it sk
mclist access requires rtnl lock (i.e. -- mutex) to be taken.

[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581]  #0:  (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613]  #1:  (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644]  #2:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693]  #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af

Similar thing was then re-introduced by further packet diag patches (fanount
mutex and pgvec mutex for rings) :(

Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:

* sklist
* prot inuse counters

The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
David S. Miller
bf277b0cce Merge git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
This is the first batch of Netfilter and IPVS updates for your
net-next tree. Mostly cleanups for the Netfilter side. They are:

* Remove unnecessary RTNL locking now that we have support
  for namespace in nf_conntrack, from Patrick McHardy.

* Cleanup to eliminate unnecessary goto in the initialization
  path of several Netfilter tables, from Jean Sacren.

* Another cleanup from Wu Fengguang, this time to PTR_RET instead
  of if IS_ERR then return PTR_ERR.

* Use list_for_each_entry_continue_rcu in nf_iterate, from
  Michael Wang.

* Add pmtu_disc sysctl option to disable PMTU in their tunneling
  transmitter, from Julian Anastasov.

* Generalize application protocol registration in IPVS and modify
  IPVS FTP helper to use it, from Julian Anastasov.

* update Kconfig. The IPVS FTP helper depends on the Netfilter FTP
  helper for NAT support, from Julian Anastasov.

* Add logic to update PMTU for IPIP packets in IPVS, again
  from Julian Anastasov.

* A couple of sparse warning fixes for IPVS and Netfilter from
  Claudiu Ghioc and Patrick McHardy respectively.

Patrick's IPv6 NAT changes will follow after this batch, I need
to flush this batch first before refreshing my tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 18:48:52 -07:00
David S. Miller
1304a7343b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-08-22 14:21:38 -07:00
Eric Dumazet
e0e3cea46d af_netlink: force credentials passing [CVE-2012-3520]
Pablo Neira Ayuso discovered that avahi and
potentially NetworkManager accept spoofed Netlink messages because of a
kernel bug.  The kernel passes all-zero SCM_CREDENTIALS ancillary data
to the receiver if the sender did not provide such data, instead of not
including any such data at all or including the correct data from the
peer (as it is the case with AF_UNIX).

This bug was introduced in commit 16e5726269
(af_unix: dont send SCM_CREDENTIALS by default)

This patch forces passing credentials for netlink, as
before the regression.

Another fix would be to not add SCM_CREDENTIALS in
netlink messages if not provided by the sender, but it
might break some programs.

With help from Florian Weimer & Petr Matousek

This issue is designated as CVE-2012-3520

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:53:01 -07:00
John W. Linville
01e17dacd4 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	drivers/net/wireless/mac80211_hwsim.c
2012-08-21 16:00:21 -04:00
Syam Sidhardhan
144ad33020 Bluetooth: Use kref for l2cap channel reference counting
This patch changes the struct l2cap_chan and associated code to use
kref api for object refcounting and freeing.

Suggested-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-21 14:54:41 -03:00
Mikel Astiz
f0d6a0ea33 Bluetooth: mgmt: Add device disconnect reason
MGMT_EV_DEVICE_DISCONNECTED will now expose the disconnection reason to
userland, distinguishing four possible values:

	0x00	Reason not known or unspecified
	0x01	Connection timeout
	0x02	Connection terminated by local host
	0x03	Connection terminated by remote host

Note that the local/remote distinction just determines which side
terminated the low-level connection, regardless of the disconnection of
the higher-level profiles.

This can sometimes be misleading and thus must be used with care. For
example, some hardware combinations would report a locally initiated
disconnection even if the user turned Bluetooth off in the remote side.

Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-21 14:54:40 -03:00
Mikel Astiz
cdcba7c650 Bluetooth: Add more HCI error codes
Add more HCI error codes as defined in the specification.

Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-21 14:54:39 -03:00
Johannes Berg
6d71117a27 mac80211: add IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF
Some devices like the current iwlwifi implementation
require that the P2P interface address match the P2P
Device address (only one P2P interface is supported.)
Add the HW flag IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF
that allows drivers to request that P2P Interfaces
added while a P2P Device is active get the same MAC
address by default.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:58:23 +02:00
Johannes Berg
98104fdeda cfg80211: add P2P Device abstraction
In order to support using a different MAC address
for the P2P Device address we must first have a
P2P Device abstraction that can be assigned a MAC
address.

This abstraction will also be useful to support
offloading P2P operations to the device, e.g.
periodic listen for discoverability.

Currently, the driver is responsible for assigning
a MAC address to the P2P Device, but this could be
changed by allowing a MAC address to be given to
the NEW_INTERFACE command.

As it has no associated netdev, a P2P Device can
only be identified by its wdev identifier but the
previous patches allowed using the wdev identifier
in various APIs, e.g. remain-on-channel.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:58:21 +02:00
Johannes Berg
4c29867790 mac80211: support A-MPDU status reporting
Support getting A-MPDU status information from the
drivers and reporting it to userspace via radiotap
in the standard fields.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:53:22 +02:00
Johannes Berg
48613ece3d wireless: add radiotap A-MPDU status field
Define the A-MPDU status field in radiotap, also
update the radiotap parser for it and the MCS field
that was apparently missed last time.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:53:09 +02:00
Antonio Quartulli
e687f61eed mac80211: add supported rates change notification in IBSS
In IBSS it is possible that the supported rates set for a station changes over
time (e.g. it gets first initialised as an empty set because of no available
information about rates and updated later). In this case the driver has to be
notified about the change in order to update its internal table accordingly (if
needed).

This behaviour is needed by all those drivers that handle rc internally but
leave stations management to mac80211

Reported-by: Gui Iribarren <gui@altermundi.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
[Johannes - add docs, validate IBSS mode only, fix compilation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:31:43 +02:00
Patrick McHardy
9d7b0fc1ef net: ipv6: fix oops in inet_putpeer()
Commit 97bab73f (inet: Hide route peer accesses behind helpers.) introduced
a bug in xfrm6_policy_destroy(). The xfrm_dst's _rt6i_peer member is not
initialized, causing a false positive result from inetpeer_ptr_is_peer(),
which in turn causes a NULL pointer dereference in inet_putpeer().

Pid: 314, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #17 To Be Filled By O.E.M. To Be Filled By O.E.M./P4S800D-X
EIP: 0060:[<c03abf93>] EFLAGS: 00010246 CPU: 0
EIP is at inet_putpeer+0xe/0x16
EAX: 00000000 EBX: f3481700 ECX: 00000000 EDX: 000dd641
ESI: f3481700 EDI: c05e949c EBP: f551def4 ESP: f551def4
 DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000070 CR3: 3243d000 CR4: 00000750
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 f551df04 c0423de1 00000000 f3481700 f551df18 c038d5f7 f254b9f8 f551df28
 f34f85d8 f551df20 c03ef48d f551df3c c0396870 f30697e8 f24e1738 c05e98f4
 f5509540 c05cd2b4 f551df7c c0142d2b c043feb5 f5509540 00000000 c05cd2e8
 [<c0423de1>] xfrm6_dst_destroy+0x42/0xdb
 [<c038d5f7>] dst_destroy+0x1d/0xa4
 [<c03ef48d>] xfrm_bundle_flo_delete+0x2b/0x36
 [<c0396870>] flow_cache_gc_task+0x85/0x9f
 [<c0142d2b>] process_one_work+0x122/0x441
 [<c043feb5>] ? apic_timer_interrupt+0x31/0x38
 [<c03967eb>] ? flow_cache_new_hashrnd+0x2b/0x2b
 [<c0143e2d>] worker_thread+0x113/0x3cc

Fix by adding a init_dst() callback to struct xfrm_policy_afinfo to
properly initialize the dst's peer pointer.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:56:56 -07:00
David S. Miller
2ea214929d Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
This is a batch of updates intended for 3.7.  The ath9k, mwifiex,
and b43 drivers get the bulk of the commits this time, with a handful
of other driver bits thrown-in.  It is mostly just minor fixes and
cleanups, etc.

Also included is a Bluetooth pull, with a lot of refactoring.
Gustavo says:

	"These are the changes I queued for 3.7. There are a many
	small fixes/improvements by Andre Guedes. A l2cap channel
	refcounting refactor by Jaganath. Bluetooth sockets now
	appears in /proc/net, by Masatake Yamato and Sachin Kamat
	changes ours drivers to use devm_kzalloc()."
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 15:26:05 -07:00
Fan Du
65e0736bc2 xfrm: remove redundant parameter "int dir" in struct xfrm_mgr.acquire
Sematically speaking, xfrm_mgr.acquire is called when kernel intends to ask
user space IKE daemon to negotiate SAs with peers. IOW the direction will
*always* be XFRM_POLICY_OUT, so remove int dir for clarity.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 15:13:30 -07:00
John W. Linville
16698918cd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-08-15 14:29:37 -04:00
Cong Wang
1f07b62f32 sctp: fix a compile error in sctp.h
I got the following compile error:

In file included from include/net/sctp/checksum.h:46:0,
                 from net/ipv4/netfilter/nf_nat_proto_sctp.c:14:
include/net/sctp/sctp.h: In function ‘sctp_dbg_objcnt_init’:
include/net/sctp/sctp.h:370:88: error: parameter name omitted
include/net/sctp/sctp.h: In function ‘sctp_dbg_objcnt_exit’:
include/net/sctp/sctp.h:371:88: error: parameter name omitted

which is caused by

	commit 13d782f6b4
	Author: Eric W. Biederman <ebiederm@xmission.com>
	Date:   Mon Aug 6 08:45:15 2012 +0000

	    sctp: Make the proc files per network namespace.

This patch could fix it.

Cc: David S. Miller <davem@davemloft.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 03:43:43 -07:00
Eric W. Biederman
e1fc3b14f9 sctp: Make sysctl tunables per net
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:32:16 -07:00
Eric W. Biederman
f53b5b097e sctp: Push struct net down into sctp_verify_ext_param
Add struct net as a parameter to sctp_verify_param so it can be passed
to sctp_verify_ext_param where struct net will be needed when the sctp
tunables become per net tunables.

Add struct net as a parameter to sctp_verify_init so struct net can be
passed to sctp_verify_param.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
24cb81a6a9 sctp: Push struct net down into all of the state machine functions
There are a handle of state machine functions primarily those dealing
with processing INIT packets where there is neither a valid endpoint nor
a valid assoication from which to derive a struct net.  Therefore add
struct net * to the parameter list of sctp_state_fn_t and update all of
the state machine functions.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
e7ff4a7037 sctp: Push struct net down into sctp_in_scope
struct net will be needed shortly when the tunables are made per network
namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
89bf3450cb sctp: Push struct net down into sctp_transport_init
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
55e26eb95a sctp: Push struct net down to sctp_chunk_event_lookup
This trickles up through sctp_sm_lookup_event up to sctp_do_sm
and up further into sctp_primitiv_NAME before the code reaches
places where struct net can be reliably found.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
ebb7e95d93 sctp: Add infrastructure for per net sysctls
Start with an empty sctp_net_table that will be populated as the various
tunable sysctls are made per net.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
b01a24078f sctp: Make the mib per network namespace
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:36 -07:00
Eric W. Biederman
13d782f6b4 sctp: Make the proc files per network namespace.
- Convert all of the files under /proc/net/sctp to be per
  network namespace.

- Don't print anything for /proc/net/sctp/snmp except in
  the initial network namespaces as the snmp counters still
  have to be converted to be per network namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:29:53 -07:00
Eric W. Biederman
2ce9550350 sctp: Make the ctl_sock per network namespace
- Kill sctp_get_ctl_sock, it is useless now.
- Pass struct net where needed so net->sctp.ctl_sock is accessible.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:17:26 -07:00
Eric W. Biederman
4db67e8086 sctp: Make the address lists per network namespace
- Move the address lists into struct net
- Add per network namespace initialization and cleanup
- Pass around struct net so it is everywhere I need it.
- Rename all of the global variable references into references
  to the variables moved into struct net

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:12:17 -07:00
Eric W. Biederman
4110cc255d sctp: Make the association hashtable handle multiple network namespaces
- Use struct net in the hash calculation
- Use sock_net(association.base.sk) in the association lookups.
- On receive calculate the network namespace from skb->dev.
- Pass struct net from receive down to the functions that actually
  do the association lookup.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 22:44:12 -07:00
Eric W. Biederman
4cdadcbcb6 sctp: Make the endpoint hashtable handle multiple network namespaces
- Use struct net in the hash calculation
- Use sock_net(endpoint.base.sk) in the endpoint lookups.
- On receive calculate the network namespace from skb->dev.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 22:44:12 -07:00
Eric W. Biederman
f1f4376307 sctp: Make the port hash table use struct net in it's key.
- Add struct net into the port hash table hash calculation
- Add struct net inot the struct sctp_bind_bucket so there
  is a memory of which network namespace a port is allocated in.
  No need for a ref count because sctp_bind_bucket only exists
  when there are sockets in the hash table and sockets can not
  change their network namspace, and sockets already ref count
  their network namespace.
- Add struct net into the key comparison when we are testing
  to see if we have found the port hash table entry we are
  looking for.

With these changes lookups in the port hash table becomes
safe to use in multiple network namespaces.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 22:44:12 -07:00
Eric W. Biederman
af4c6641f5 net sched: Pass the skb into change so it can access NETLINK_CB
cls_flow.c plays with uids and gids.  Unless I misread that
code it is possible for classifiers to depend on the specific uid and
gid values.  Therefore I need to know the user namespace of the
netlink socket that is installing the packet classifiers.  Pass
in the rtnetlink skb so I can access the NETLINK_CB of the passed
packet.  In particular I want access to sk_user_ns(NETLINK_CB(in_skb).ssk).

Pass in not the user namespace but the incomming rtnetlink skb into
the the classifier change routines as that is generally the more useful
parameter.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:28 -07:00
Eric W. Biederman
c336d148ad userns: Implement sk_user_ns
Add a helper sk_user_ns to make it easy to find the user namespace
of the process that opened a socket.

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:56 -07:00
Eric W. Biederman
d13fda8564 userns: Convert net/ax25 to use kuid_t where appropriate
Cc: Ralf Baechle <ralf@linux-mips.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:42 -07:00
Eric W. Biederman
4f82f45730 net ip6 flowlabel: Make owner a union of struct pid * and kuid_t
Correct a long standing omission and use struct pid in the owner
field of struct ip6_flowlabel when the share type is IPV6_FL_S_PROCESS.
This guarantees we don't have issues when pid wraparound occurs.

Use a kuid_t in the owner field of struct ip6_flowlabel when the
share type is IPV6_FL_S_USER to add user namespace support.

In /proc/net/ip6_flowlabel capture the current pid namespace when
opening the file and release the pid namespace when the file is
closed ensuring we print the pid owner value that is meaning to
the reader of the file.  Similarly use from_kuid_munged to print
uid values that are meaningful to the reader of the file.

This requires exporting pid_nr_ns so that ipv6 can continue to built
as a module.  Yoiks what silliness

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:25 -07:00
Eric W. Biederman
7064d16e16 userns: Use kgids for sysctl_ping_group_range
- Store sysctl_ping_group_range as a paire of kgid_t values
  instead of a pair of gid_t values.
- Move the kgid conversion work from ping_init_sock into ipv4_ping_group_range
- For invalid cases reset to the default disabled state.

With the kgid_t conversion made part of the original value sanitation
from userspace understand how the code will react becomes clearer
and it becomes possible to set the sysctl ping group range from
something other than the initial user namespace.

Cc: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:10 -07:00
Eric W. Biederman
a7cb5a49bf userns: Print out socket uids in a user namespace aware fashion.
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:48:06 -07:00
Eric W. Biederman
976d020150 userns: Convert sock_i_uid to return a kuid_t
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:47:34 -07:00
Vinicius Costa Gomes
57f5d0d1d9 Bluetooth: Remove some functions from being exported
Some connection related functions are only used inside hci_conn.c
so no need to have them exported.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-15 00:53:11 -03:00
Ben Hutchings
6024935f5f llc2: Fix silent failure of llc_station_init()
llc_station_init() creates and processes an event skb with no effect
other than to change the state from DOWN to UP.  Allocation failure is
reported, but then ignored by its caller, llc2_init().  Remove this
possibility by simply initialising the state as UP.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:51:18 -07:00
xeb@mail.ru
c12b395a46 gre: Support GRE over IPv6
GRE over IPv6 implementation.

Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:28:32 -07:00
Eric Dumazet
2359a47671 codel: refine one condition to avoid a nul rec_inv_sqrt
One condition before codel_Newton_step() was not good if
we never left the dropping state for a flow. As a result
rec_inv_sqrt was 0, instead of the ~0 initial value.

codel control law was then set to a very aggressive mode, dropping
many packets before reaching 'target' and recovering from this problem.

To keep codel_vars_init() as efficient as possible, refine
the condition to make sure rec_inv_sqrt initial value is correct

Many thanks to Anton Mich for discovering the issue and suggesting
a fix.

Reported-by: Anton Mich <lp2s1h@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-10 16:52:54 -07:00
Eric Dumazet
b5ec8eeac4 ipv4: fix ip_send_skb()
ip_send_skb() can send orphaned skb, so we must pass the net pointer to
avoid possible NULL dereference in error path.

Bug added by commit 3a7c384ffd (ipv4: tcp: unicast_sock should not
land outside of TCP stack)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-10 14:08:57 -07:00
John W. Linville
57f784fed3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2012-08-10 15:13:12 -04:00
Eric Dumazet
63d02d157e net: tcp: ipv6_mapped needs sk_rx_dst_set method
commit 5d299f3d3c (net: ipv6: fix TCP early demux) added a
regression for ipv6_mapped case.

[   67.422369] SELinux: initialized (dev autofs, type autofs), uses
genfs_contexts
[   67.449678] SELinux: initialized (dev autofs, type autofs), uses
genfs_contexts
[   92.631060] BUG: unable to handle kernel NULL pointer dereference at
(null)
[   92.631435] IP: [<          (null)>]           (null)
[   92.631645] PGD 0
[   92.631846] Oops: 0010 [#1] SMP
[   92.632095] Modules linked in: autofs4 sunrpc ipv6 dm_mirror
dm_region_hash dm_log dm_multipath dm_mod video sbs sbshc battery ac lp
parport sg snd_hda_intel snd_hda_codec snd_seq_oss snd_seq_midi_event
snd_seq snd_seq_device pcspkr snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer serio_raw button floppy snd i2c_i801 i2c_core soundcore
snd_page_alloc shpchp ide_cd_mod cdrom microcode ehci_hcd ohci_hcd
uhci_hcd
[   92.634294] CPU 0
[   92.634294] Pid: 4469, comm: sendmail Not tainted 3.6.0-rc1 #3
[   92.634294] RIP: 0010:[<0000000000000000>]  [<          (null)>]
(null)
[   92.634294] RSP: 0018:ffff880245fc7cb0  EFLAGS: 00010282
[   92.634294] RAX: ffffffffa01985f0 RBX: ffff88024827ad00 RCX:
0000000000000000
[   92.634294] RDX: 0000000000000218 RSI: ffff880254735380 RDI:
ffff88024827ad00
[   92.634294] RBP: ffff880245fc7cc8 R08: 0000000000000001 R09:
0000000000000000
[   92.634294] R10: 0000000000000000 R11: ffff880245fc7bf8 R12:
ffff880254735380
[   92.634294] R13: ffff880254735380 R14: 0000000000000000 R15:
7fffffffffff0218
[   92.634294] FS:  00007f4516ccd6f0(0000) GS:ffff880256600000(0000)
knlGS:0000000000000000
[   92.634294] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   92.634294] CR2: 0000000000000000 CR3: 0000000245ed1000 CR4:
00000000000007f0
[   92.634294] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[   92.634294] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[   92.634294] Process sendmail (pid: 4469, threadinfo ffff880245fc6000,
task ffff880254b8cac0)
[   92.634294] Stack:
[   92.634294]  ffffffff813837a7 ffff88024827ad00 ffff880254b6b0e8
ffff880245fc7d68
[   92.634294]  ffffffff81385083 00000000001d2680 ffff8802547353a8
ffff880245fc7d18
[   92.634294]  ffffffff8105903a ffff88024827ad60 0000000000000002
00000000000000ff
[   92.634294] Call Trace:
[   92.634294]  [<ffffffff813837a7>] ? tcp_finish_connect+0x2c/0xfa
[   92.634294]  [<ffffffff81385083>] tcp_rcv_state_process+0x2b6/0x9c6
[   92.634294]  [<ffffffff8105903a>] ? sched_clock_cpu+0xc3/0xd1
[   92.634294]  [<ffffffff81059073>] ? local_clock+0x2b/0x3c
[   92.634294]  [<ffffffff8138caf3>] tcp_v4_do_rcv+0x63a/0x670
[   92.634294]  [<ffffffff8133278e>] release_sock+0x128/0x1bd
[   92.634294]  [<ffffffff8139f060>] __inet_stream_connect+0x1b1/0x352
[   92.634294]  [<ffffffff813325f5>] ? lock_sock_nested+0x74/0x7f
[   92.634294]  [<ffffffff8104b333>] ? wake_up_bit+0x25/0x25
[   92.634294]  [<ffffffff813325f5>] ? lock_sock_nested+0x74/0x7f
[   92.634294]  [<ffffffff8139f223>] ? inet_stream_connect+0x22/0x4b
[   92.634294]  [<ffffffff8139f234>] inet_stream_connect+0x33/0x4b
[   92.634294]  [<ffffffff8132e8cf>] sys_connect+0x78/0x9e
[   92.634294]  [<ffffffff813fd407>] ? sysret_check+0x1b/0x56
[   92.634294]  [<ffffffff81088503>] ? __audit_syscall_entry+0x195/0x1c8
[   92.634294]  [<ffffffff811cc26e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   92.634294]  [<ffffffff813fd3e2>] system_call_fastpath+0x16/0x1b
[   92.634294] Code:  Bad RIP value.
[   92.634294] RIP  [<          (null)>]           (null)
[   92.634294]  RSP <ffff880245fc7cb0>
[   92.634294] CR2: 0000000000000000
[   92.648982] ---[ end trace 24e2bed94314c8d9 ]---
[   92.649146] Kernel panic - not syncing: Fatal exception in interrupt

Fix this using inet_sk_rx_dst_set(), and export this function in case
IPv6 is modular.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09 20:56:09 -07:00
Julian Anastasov
3654e61137 ipvs: add pmtu_disc option to disable IP DF for TUN packets
Disabling PMTU discovery can increase the output packet
rate but some users have enough resources and prefer to fragment
than to drop traffic. By default, we copy the DF bit but if
pmtu_disc is disabled we do not send FRAG_NEEDED messages anymore.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-10 10:35:07 +09:00
Julian Anastasov
be97fdb5fb ipvs: generalize app registration in netns
Get rid of the ftp_app pointer and allow applications
to be registered without adding fields in the netns_ipvs structure.

v2: fix coding style as suggested by Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-10 10:34:51 +09:00
Pavel Emelyanov
1fb9489bf1 net: Loopback ifindex is constant now
As pointed out, there are places, that access net->loopback_dev->ifindex
and after ifindex generation is made per-net this value becomes constant
equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
it where appropriate.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09 16:18:07 -07:00
Pavel Emelyanov
aa79e66eee net: Make ifindex generation per-net namespace
Strictly speaking this is only _really_ required for checkpoint-restore to
make loopback device always have the same index.

This change appears to be safe wrt "ifindex should be unique per-system"
concept, as all the ifindex usage is either already made per net namespace
of is explicitly limited with init_net only.

There are two cool side effects of this. The first one -- ifindices of
devices in container are always small, regardless of how many containers
we've started (and re-started) so far. The second one is -- we can speed
up the loopback ifidex access as shown in the next patch.

v2: Place ifindex right after dev_base_seq : avoid two holes and use the
    same cache line, dirtied in list_netdevice()/unlist_netdevice()

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09 16:18:07 -07:00
Pavel Emelyanov
b14f243a42 net: Dont use ifindices in hash fns
Eric noticed, that when there will be devices with equal indices, some
hash functions that use them will become less effective as they could.
Fix this in advance by mixing the net_device address into the hash value
instead of the device index.

This is true for arp and ndisc hash fns. The netlabel, can and llc ones
are also ifindex-based, but that three are init_net-only, thus will not
be affected.

Many thanks to David and Eric for the hash32_ptr implementation!

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09 16:18:06 -07:00
Eric Dumazet
a37e6e3449 net: force dst_default_metrics to const section
While investigating on network performance problems, I found this little
gem :

$ nm -v vmlinux | grep -1 dst_default_metrics
ffffffff82736540 b busy.46605
ffffffff82736560 B dst_default_metrics
ffffffff82736598 b dst_busy_list

Apparently, declaring a const array without initializer put it in
(writeable) bss section, in middle of possibly often dirtied cache
lines.

Since we really want dst_default_metrics be const to avoid any possible
false sharing and catch any buggy writes, I force a null initializer.

ffffffff818a4c20 R dst_default_metrics

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-08 16:00:28 -07:00
Eric Dumazet
425f09ab7d net: output path optimizations
1) Avoid dirtying neighbour's confirmed field.

  TCP workloads hits this cache line for each incoming ACK.
  Lets write n->confirmed only if there is a jiffie change.

2) Optimize neigh_hh_output() for the common Ethernet case, were
   hh_len is less than 16 bytes. Replace the memcpy() call
   by two inlined 64bit load/stores on x86_64.

Bench results using udpflood test, with -C option (MSG_CONFIRM flag
added to sendto(), to reproduce the n->confirmed dirtying on UDP)

24 threads doing 1.000.000 UDP sendto() on dummy device, 4 runs.

before : 2.247s, 2.235s, 2.247s, 2.318s
after  : 1.884s, 1.905s, 1.891s, 1.895s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-07 16:24:55 -07:00
Eric Dumazet
d25398df59 net: avoid reloads in SNMP_UPD_PO_STATS
Avoid two instructions to reload dev->nd_net->mib.ip_statistics pointer,
unsing a temp variable, in ip_rcv(), ip_output() paths for example.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-06 13:40:47 -07:00
Eric Dumazet
5d299f3d3c net: ipv6: fix TCP early demux
IPv6 needs a cookie in dst_check() call.

We need to add rx_dst_cookie and provide a family independent
sk_rx_dst_set(sk, skb) method to properly support IPv6 TCP early demux.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-06 13:33:21 -07:00
Andre Guedes
b9b343d254 Bluetooth: Fix hci_le_conn_complete_evt
We need to check the 'Role' parameter from the LE Connection
Complete Event in order to properly set 'out' and 'link_mode'
fields from hci_conn structure.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:05:10 -03:00
Masatake YAMATO
256a06c8a8 Bluetooth: /proc/net/ entries for bluetooth protocols
lsof command can tell the type of socket processes are using.
Internal lsof uses inode numbers on socket fs to resolve the type of
sockets. Files under /proc/net/, such as tcp, udp, unix, etc provides
such inode information.

Unfortunately bluetooth related protocols don't provide such inode
information. This patch series introduces /proc/net files for the protocols.

This patch against af_bluetooth.c provides facility to the implementation
of protocols. This patch extends bt_sock_list and introduces two exported
function bt_procfs_init, bt_procfs_cleanup.

The type bt_sock_list is already used in some of implementation of
protocols. bt_procfs_init prepare seq_operations which converts
protocol own bt_sock_list data to protocol own proc entry when the
entry is accessed.

What I, lsof user, need is just inode number of bluetooth
socket. However, people may want more information. The bt_procfs_init
takes a function pointer for customizing the show handler of
seq_operations.

In v4 patch, __acquires and __releases attributes are added to suppress
sparse warning. Suggested by Andrei Emeltchenko.

In v5 patch, linux/proc_fs.h is included to use PDE. Build error is
reported by Fengguang Wu.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:58 -03:00
Jaganath Kanakkassery
4af66c691f Bluetooth: Free the l2cap channel list only when refcount is zero
Move the l2cap channel list chan->global_l under the refcnt
protection and free it based on the refcnt.

Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Reviewed-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:58 -03:00
Jaganath Kanakkassery
3064837289 Bluetooth: Move l2cap_chan_hold/put to l2cap_core.c
Refactor the code in order to use the l2cap_chan_destroy()
from l2cap_chan_put() under the refcnt protection.

Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Reviewed-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:58 -03:00
Andrei Emeltchenko
9e66463127 Bluetooth: Make connect / disconnect cfm functions return void
Return values are never used because callers hci_proto_connect_cfm
and hci_proto_disconn_cfm return void.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:58 -03:00
Andrei Emeltchenko
ab846ec4ea Bluetooth: Define AMP controller statuses
AMP status codes copied from Bluez patch sent by Peter Krystad
<pkrystad@codeaurora.org>.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:55 -03:00
Andrei Emeltchenko
b93a68295f Bluetooth: trivial: Fix mixing spaces and tabs in smp
Change spaces to tabs in smp code

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:55 -03:00
Andrei Emeltchenko
71becf0cea Bluetooth: debug: Fix printing refcnt for hci_conn
Use the same style for refcnt printing through all Bluetooth code
taking the reference the l2cap_chan refcnt printing.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:55 -03:00
Andrei Emeltchenko
bb4b2a9ae3 Bluetooth: mgmt: Managing only BR/EDR HCI controllers
Add check that HCI controller is BR/EDR. AMP controller shall not be
managed by mgmt interface and consequently user space.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:54 -03:00
Andre Guedes
3f1732462c Bluetooth: Remove missing code
This patch removes the struct adv_entry since it is not used anymore.
This struct should have been removed in commit 479453d (Bluetooth:
Remove advertising cache).

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06 15:02:54 -03:00
Greg Kroah-Hartman
c87985a3ce Merge tty-next into 3.6-rc1
This handles the merge issue in:
	arch/um/drivers/line.c
	arch/um/drivers/line.h
And resolves the duplicate patches that were in both trees do to the
tty-next branch not getting merged into 3.6-rc1.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-06 09:48:31 -07:00
Jiri Pirko
4778e0be16 netlink: add signed types
Signed types might be needed in NL communication from time to time
(I need s32 in team driver), so add them.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-03 20:40:11 -07:00
John W. Linville
1a26904eb6 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2012-08-02 13:49:38 -04:00
Seth Forshee
03f6b0843a cfg80211: add channel flag to prohibit OFDM operation
Currently the only way for wireless drivers to tell whether or not OFDM
is allowed on the current channel is to check the regulatory
information. However, this requires hodling cfg80211_mutex, which is not
visible to the drivers.

Other regulatory restrictions are provided as flags in the channel
definition, so let's do similarly with OFDM. This patch adds a new flag,
IEEE80211_CHAN_NO_OFDM, to tell drivers that OFDM on a channel is not
allowed. This flag is set on any channels for which regulatory indicates
that OFDM is prohibited.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Tested-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-02 15:30:49 +02:00
Fan Du
e3c0d04750 Fix unexpected SA hard expiration after changing date
After SA is setup, one timer is armed to detect soft/hard expiration,
however the timer handler uses xtime to do the math. This makes hard
expiration occurs first before soft expiration after setting new date
with big interval. As a result new child SA is deleted before rekeying
the new one.

Signed-off-by: Fan Du <fdu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-02 00:19:17 -07:00
Ben Hutchings
1485348d24 tcp: Apply device TSO segment limit earlier
Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to
limit the size of TSO skbs.  This avoids the need to fall back to
software GSO for local TCP senders.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-02 00:19:17 -07:00
Linus Torvalds
ac694dbdbc Merge branch 'akpm' (Andrew's patch-bomb)
Merge Andrew's second set of patches:
 - MM
 - a few random fixes
 - a couple of RTC leftovers

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
  rtc/rtc-88pm80x: remove unneed devm_kfree
  rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
  mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
  tmpfs: distribute interleave better across nodes
  mm: remove redundant initialization
  mm: warn if pg_data_t isn't initialized with zero
  mips: zero out pg_data_t when it's allocated
  memcg: gix memory accounting scalability in shrink_page_list
  mm/sparse: remove index_init_lock
  mm/sparse: more checks on mem_section number
  mm/sparse: optimize sparse_index_alloc
  memcg: add mem_cgroup_from_css() helper
  memcg: further prevent OOM with too many dirty pages
  memcg: prevent OOM with too many dirty pages
  mm: mmu_notifier: fix freed page still mapped in secondary MMU
  mm: memcg: only check anon swapin page charges for swap cache
  mm: memcg: only check swap cache pages for repeated charging
  mm: memcg: split swapin charge function into private and public part
  mm: memcg: remove needless !mm fixup to init_mm when charging
  mm: memcg: remove unneeded shmem charge type
  ...
2012-07-31 19:25:39 -07:00
Mel Gorman
c76562b670 netvm: prevent a stream-specific deadlock
This patch series is based on top of "Swap-over-NBD without deadlocking
v15" as it depends on the same reservation of PF_MEMALLOC reserves logic.

When a user or administrator requires swap for their application, they
create a swap partition and file, format it with mkswap and activate it
with swapon.  In diskless systems this is not an option so if swap if
required then swapping over the network is considered.  The two likely
scenarios are when blade servers are used as part of a cluster where the
form factor or maintenance costs do not allow the use of disks and thin
clients.

The Linux Terminal Server Project recommends the use of the Network Block
Device (NBD) for swap but this is not always an option.  There is no
guarantee that the network attached storage (NAS) device is running Linux
or supports NBD.  However, it is likely that it supports NFS so there are
users that want support for swapping over NFS despite any performance
concern.  Some distributions currently carry patches that support swapping
over NFS but it would be preferable to support it in the mainline kernel.

Patch 1 avoids a stream-specific deadlock that potentially affects TCP.

Patch 2 is a small modification to SELinux to avoid using PFMEMALLOC
	reserves.

Patch 3 adds three helpers for filesystems to handle swap cache pages.
	For example, page_file_mapping() returns page->mapping for
	file-backed pages and the address_space of the underlying
	swap file for swap cache pages.

Patch 4 adds two address_space_operations to allow a filesystem
	to pin all metadata relevant to a swapfile in memory. Upon
	successful activation, the swapfile is marked SWP_FILE and
	the address space operation ->direct_IO is used for writing
	and ->readpage for reading in swap pages.

Patch 5 notes that patch 3 is bolting
	filesystem-specific-swapfile-support onto the side and that
	the default handlers have different information to what
	is available to the filesystem. This patch refactors the
	code so that there are generic handlers for each of the new
	address_space operations.

Patch 6 adds an API to allow a vector of kernel addresses to be
	translated to struct pages and pinned for IO.

Patch 7 adds support for using highmem pages for swap by kmapping
	the pages before calling the direct_IO handler.

Patch 8 updates NFS to use the helpers from patch 3 where necessary.

Patch 9 avoids setting PF_private on PG_swapcache pages within NFS.

Patch 10 implements the new swapfile-related address_space operations
	for NFS and teaches the direct IO handler how to manage
	kernel addresses.

Patch 11 prevents page allocator recursions in NFS by using GFP_NOIO
	where appropriate.

Patch 12 fixes a NULL pointer dereference that occurs when using
	swap-over-NFS.

With the patches applied, it is possible to mount a swapfile that is on an
NFS filesystem.  Swap performance is not great with a swap stress test
taking roughly twice as long to complete than if the swap device was
backed by NBD.

This patch: netvm: prevent a stream-specific deadlock

It could happen that all !SOCK_MEMALLOC sockets have buffered so much data
that we're over the global rmem limit.  This will prevent SOCK_MEMALLOC
buffers from receiving data, which will prevent userspace from running,
which is needed to reduce the buffered data.

Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit.  Once
this change it applied, it is important that sockets that set
SOCK_MEMALLOC do not clear the flag until the socket is being torn down.
If this happens, a warning is generated and the tokens reclaimed to avoid
accounting errors until the bug is fixed.

[davem@davemloft.net: Warning about clearing SOCK_MEMALLOC]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:47 -07:00
Mel Gorman
b4b9e35585 netvm: set PF_MEMALLOC as appropriate during SKB processing
In order to make sure pfmemalloc packets receive all memory needed to
proceed, ensure processing of pfmemalloc SKBs happens under PF_MEMALLOC.
This is limited to a subset of protocols that are expected to be used for
writing to swap.  Taps are not allowed to use PF_MEMALLOC as these are
expected to communicate with userspace processes which could be paged out.

[a.p.zijlstra@chello.nl: Ideas taken from various patches]
[jslaby@suse.cz: Lock imbalance fix]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:46 -07:00
Mel Gorman
c93bdd0e03 netvm: allow skb allocation to use PFMEMALLOC reserves
Change the skb allocation API to indicate RX usage and use this to fall
back to the PFMEMALLOC reserve when needed.  SKBs allocated from the
reserve are tagged in skb->pfmemalloc.  If an SKB is allocated from the
reserve and the socket is later found to be unrelated to page reclaim, the
packet is dropped so that the memory remains available for page reclaim.
Network protocols are expected to recover from this packet loss.

[a.p.zijlstra@chello.nl: Ideas taken from various patches]
[davem@davemloft.net: Use static branches, coding style corrections]
[sebastian@breakpoint.cc: Avoid unnecessary cast, fix !CONFIG_NET build]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:46 -07:00
Mel Gorman
7cb0240492 netvm: allow the use of __GFP_MEMALLOC by specific sockets
Allow specific sockets to be tagged SOCK_MEMALLOC and use __GFP_MEMALLOC
for their allocations.  These sockets will be able to go below watermarks
and allocate from the emergency reserve.  Such sockets are to be used to
service the VM (iow.  to swap over).  They must be handled kernel side,
exposing such a socket to user-space is a bug.

There is a risk that the reserves be depleted so for now, the
administrator is responsible for increasing min_free_kbytes as necessary
to prevent deadlock for their workloads.

[a.p.zijlstra@chello.nl: Original patches]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:46 -07:00
Mel Gorman
99a1dec70d net: introduce sk_gfp_atomic() to allow addition of GFP flags depending on the individual socket
Introduce sk_gfp_atomic(), this function allows to inject sock specific
flags to each sock related allocation.  It is only used on allocation
paths that may be required for writing pages back to network storage.

[davem@davemloft.net: Use sk_gfp_atomic only when necessary]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:46 -07:00
Andrew Morton
c255a45805 memcg: rename config variables
Sanity:

CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM

[mhocko@suse.cz: fix missed bits]
Cc: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:43 -07:00
David S. Miller
caacf05e5a ipv4: Properly purge netdev references on uncached routes.
When a device is unregistered, we have to purge all of the
references to it that may exist in the entire system.

If a route is uncached, we currently have no way of accomplishing
this.

So create a global list that is scanned when a network device goes
down.  This mirrors the logic in net/core/dst.c's dst_ifdown().

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-31 15:06:50 -07:00
David S. Miller
c5038a8327 ipv4: Cache routes in nexthop exception entries.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-31 15:02:02 -07:00
Eric Dumazet
d26b3a7c4b ipv4: percpu nh_rth_output cache
Input path is mostly run under RCU and doesnt touch dst refcnt

But output path on forwarding or UDP workloads hits
badly dst refcount, and we have lot of false sharing, for example
in ipv4_mtu() when reading rt->rt_pmtu

Using a percpu cache for nh_rth_output gives a nice performance
increase at a small cost.

24 udpflood test on my 24 cpu machine (dummy0 output device)
(each process sends 1.000.000 udp frames, 24 processes are started)

before : 5.24 s
after : 2.06 s
For reference, time on linux-3.5 : 6.60 s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-31 14:41:39 -07:00
Eric Dumazet
54764bb647 ipv4: Restore old dst_free() behavior.
commit 404e0a8b6a (net: ipv4: fix RCU races on dst refcounts) tried
to solve a race but added a problem at device/fib dismantle time :

We really want to call dst_free() as soon as possible, even if sockets
still have dst in their cache.
dst_release() calls in free_fib_info_rcu() are not welcomed.

Root of the problem was that now we also cache output routes (in
nh_rth_output), we must use call_rcu() instead of call_rcu_bh() in
rt_free(), because output route lookups are done in process context.

Based on feedback and initial patch from David Miller (adding another
call_rcu_bh() call in fib, but it appears it was not the right fix)

I left the inet_sk_rx_dst_set() helper and added __rcu attributes
to nh_rth_output and nh_rth_input to better document what is going on in
this code.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-31 14:41:38 -07:00
Thomas Huehn
36323f817a mac80211: move TX station pointer and restructure TX
Remove the control.sta pointer from ieee80211_tx_info to free up
sufficient space in the TX skb control buffer for the upcoming
Transmit Power Control (TPC).
Instead, the pointer is now on the stack in a new control struct
that is passed as a function parameter to the drivers' tx method.

Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[reworded commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-31 16:18:39 +02:00
Eliad Peller
ab09587740 mac80211: add PS flag to bss_conf
Currently, ps mode is indicated per device (rather than
per interface), which doesn't make a lot of sense.

Moreover, there are subtle bugs caused by the inability
to indicate ps change along with other changes
(e.g. when the AP deauth us, we'd like to indicate
CHANGED_PS | CHANGED_ASSOC, as changing PS before
notifying about disassociation will result in null-packets
being sent (if IEEE80211_HW_SUPPORTS_DYNAMIC_PS) while
the sta is already disconnected.)

Keep the current per-device notifications, and add
parallel per-vif notifications.

In order to keep it simple, the per-device ps and
the per-vif ps are orthogonal - the per-vif ps
configuration is determined only by the user
configuration (enable/disable) and the connection
state, and is not affected by other vifs state and
(temporary) dynamic_ps/offchannel operations
(unlike per-device ps).

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-31 16:11:04 +02:00
Eric Dumazet
0c7462a235 ipv4: remove rt_cache_rebuild_count
After IP route cache removal, rt_cache_rebuild_count is no longer
used.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-30 14:53:22 -07:00
Eric Dumazet
404e0a8b6a net: ipv4: fix RCU races on dst refcounts
commit c6cffba4ff (ipv4: Fix input route performance regression.)
added various fatal races with dst refcounts.

crashes happen on tcp workloads if routes are added/deleted at the same
time.

The dst_free() calls from free_fib_info_rcu() are clearly racy.

We need instead regular dst refcounting (dst_release()) and make
sure dst_release() is aware of RCU grace periods :

Add DST_RCU_FREE flag so that dst_release() respects an RCU grace period
before dst destruction for cached dst

Introduce a new inet_sk_rx_dst_set() helper, using atomic_inc_not_zero()
to make sure we dont increase a zero refcount (On a dst currently
waiting an rcu grace period before destruction)

rt_cache_route() must take a reference on the new cached route, and
release it if was not able to install it.

With this patch, my machines survive various benchmarks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-30 14:53:22 -07:00
Eric Dumazet
c7109986db ipv6: Early TCP socket demux
This is the IPv6 missing bits for infrastructure added in commit
41063e9dd1 (ipv4: Early TCP socket demux.)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 15:50:39 -07:00
David S. Miller
c6cffba4ff ipv4: Fix input route performance regression.
With the routing cache removal we lost the "noref" code paths on
input, and this can kill some routing workloads.

Reinstate the noref path when we hit a cached route in the FIB
nexthops.

With help from Eric Dumazet.

Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 15:50:39 -07:00
Linus Torvalds
3c4cfadef6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David S Miller:

 1) Remove the ipv4 routing cache.  Now lookups go directly into the FIB
    trie and use prebuilt routes cached there.

    No more garbage collection, no more rDOS attacks on the routing
    cache.  Instead we now get predictable and consistent performance,
    no matter what the pattern of traffic we service.

    This has been almost 2 years in the making.  Special thanks to
    Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who
    have helped along the way.

    I'm sure that with a change of this magnitude there will be some
    kind of fallout, but such things ought the be simple to fix at this
    point.  Luckily I'm not European so I'll be around all of August to
    fix things :-)

    The major stages of this work here are each fronted by a forced
    merge commit whose commit message contains a top-level description
    of the motivations and implementation issues.

 2) Pre-demux of established ipv4 TCP sockets, saves a route demux on
    input.

 3) TCP SYN/ACK performance tweaks from Eric Dumazet.

 4) Add namespace support for netfilter L4 conntrack helpers, from Gao
    Feng.

 5) Add config mechanism for Energy Efficient Ethernet to ethtool, from
    Yuval Mintz.

 6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet.

 7) Support for connection tracker helpers in userspace, from Pablo
    Neira Ayuso.

 8) Allow userspace driven TX load balancing functions in TEAM driver,
    from Jiri Pirko.

 9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with
    embedded gotos.

10) TCP Small Queues, essentially minimize the amount of TCP data queued
    up in the packet scheduler layer.  Whereas the existing BQL (Byte
    Queue Limits) limits the pkt_sched --> netdevice queuing levels,
    this controls the TCP --> pkt_sched queueing levels.

    From Eric Dumazet.

11) Reduce the number of get_page/put_page ops done on SKB fragments,
    from Alexander Duyck.

12) Implement protection against blind resets in TCP (RFC 5961), from
    Eric Dumazet.

13) Support the client side of TCP Fast Open, basically the ability to
    send data in the SYN exchange, from Yuchung Cheng.

    Basically, the sender queues up data with a sendmsg() call using
    MSG_FASTOPEN, then they do the connect() which emits the queued up
    fastopen data.

14) Avoid all the problems we get into in TCP when timers or PMTU events
    hit a locked socket.  The TCP Small Queues changes added a
    tcp_release_cb() that allows us to queue work up to the
    release_sock() caller, and that's what we use here too.  From Eric
    Dumazet.

15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits)
  genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP
  r8169: revert "add byte queue limit support".
  ipv4: Change rt->rt_iif encoding.
  net: Make skb->skb_iif always track skb->dev
  ipv4: Prepare for change of rt->rt_iif encoding.
  ipv4: Remove all RTCF_DIRECTSRC handliing.
  ipv4: Really ignore ICMP address requests/replies.
  decnet: Don't set RTCF_DIRECTSRC.
  net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse.
  ipv4: Remove redundant assignment
  rds: set correct msg_namelen
  openvswitch: potential NULL deref in sample()
  tcp: dont drop MTU reduction indications
  bnx2x: Add new 57840 device IDs
  tcp: avoid oops in tcp_metrics and reset tcpm_stamp
  niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value
  niu: Fix to check for dma mapping errors.
  net: Fix references to out-of-scope variables in put_cmsg_compat()
  net: ethernet: davinci_emac: add pm_runtime support
  net: ethernet: davinci_emac: Remove unnecessary #include
  ...
2012-07-24 10:01:50 -07:00
David S. Miller
13378cad02 ipv4: Change rt->rt_iif encoding.
On input packet processing, rt->rt_iif will be zero if we should
use skb->dev->ifindex.

Since we access rt->rt_iif consistently via inet_iif(), that is
the only spot whose interpretation have to adjust.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23 16:36:27 -07:00
David S. Miller
92101b3b2e ipv4: Prepare for change of rt->rt_iif encoding.
Use inet_iif() consistently, and for TCP record the input interface of
cached RX dst in inet sock.

rt->rt_iif is going to be encoded differently, so that we can
legitimately cache input routes in the FIB info more aggressively.

When the input interface is "use SKB device index" the rt->rt_iif will
be set to zero.

This forces us to move the TCP RX dst cache installation into the ipv4
specific code, and as well it should since doing the route caching for
ipv6 is pointless at the moment since it is not inspected in the ipv6
input paths yet.

Also, remove the unlikely on dst->obsolete, all ipv4 dsts have
obsolete set to a non-zero value to force invocation of the check
callback.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23 16:36:26 -07:00
Linus Torvalds
a66d2c8f7e Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull the big VFS changes from Al Viro:
 "This one is *big* and changes quite a few things around VFS.  What's in there:

   - the first of two really major architecture changes - death to open
     intents.

     The former is finally there; it was very long in making, but with
     Miklos getting through really hard and messy final push in
     fs/namei.c, we finally have it.  Unlike his variant, this one
     doesn't introduce struct opendata; what we have instead is
     ->atomic_open() taking preallocated struct file * and passing
     everything via its fields.

     Instead of returning struct file *, it returns -E...  on error, 0
     on success and 1 in "deal with it yourself" case (e.g.  symlink
     found on server, etc.).

     See comments before fs/namei.c:atomic_open().  That made a lot of
     goodies finally possible and quite a few are in that pile:
     ->lookup(), ->d_revalidate() and ->create() do not get struct
     nameidata * anymore; ->lookup() and ->d_revalidate() get lookup
     flags instead, ->create() gets "do we want it exclusive" flag.

     With the introduction of new helper (kern_path_locked()) we are rid
     of all struct nameidata instances outside of fs/namei.c; it's still
     visible in namei.h, but not for long.  Come the next cycle,
     declaration will move either to fs/internal.h or to fs/namei.c
     itself.  [me, miklos, hch]

   - The second major change: behaviour of final fput().  Now we have
     __fput() done without any locks held by caller *and* not from deep
     in call stack.

     That obviously lifts a lot of constraints on the locking in there.
     Moreover, it's legal now to call fput() from atomic contexts (which
     has immediately simplified life for aio.c).  We also don't need
     anti-recursion logics in __scm_destroy() anymore.

     There is a price, though - the damn thing has become partially
     asynchronous.  For fput() from normal process we are guaranteed
     that pending __fput() will be done before the caller returns to
     userland, exits or gets stopped for ptrace.

     For kernel threads and atomic contexts it's done via
     schedule_work(), so theoretically we might need a way to make sure
     it's finished; so far only one such place had been found, but there
     might be more.

     There's flush_delayed_fput() (do all pending __fput()) and there's
     __fput_sync() (fput() analog doing __fput() immediately).  I hope
     we won't need them often; see warnings in fs/file_table.c for
     details.  [me, based on task_work series from Oleg merged last
     cycle]

   - sync series from Jan

   - large part of "death to sync_supers()" work from Artem; the only
     bits missing here are exofs and ext4 ones.  As far as I understand,
     those are going via the exofs and ext4 trees resp.; once they are
     in, we can put ->write_super() to the rest, along with the thread
     calling it.

   - preparatory bits from unionmount series (from dhowells).

   - assorted cleanups and fixes all over the place, as usual.

  This is not the last pile for this cycle; there's at least jlayton's
  ESTALE work and fsfreeze series (the latter - in dire need of fixes,
  so I'm not sure it'll make the cut this cycle).  I'll probably throw
  symlink/hardlink restrictions stuff from Kees into the next pile, too.
  Plus there's a lot of misc patches I hadn't thrown into that one -
  it's large enough as it is..."

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (127 commits)
  ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file()
  btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file()
  switch dentry_open() to struct path, make it grab references itself
  spufs: shift dget/mntget towards dentry_open()
  zoran: don't bother with struct file * in zoran_map
  ecryptfs: don't reinvent the wheels, please - use struct completion
  don't expose I_NEW inodes via dentry->d_inode
  tidy up namei.c a bit
  unobfuscate follow_up() a bit
  ext3: pass custom EOF to generic_file_llseek_size()
  ext4: use core vfs llseek code for dir seeks
  vfs: allow custom EOF in generic_file_llseek code
  vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
  vfs: Remove unnecessary flushing of block devices
  vfs: Make sys_sync writeout also block device inodes
  vfs: Create function for iterating over block devices
  vfs: Reorder operations during sys_sync
  quota: Move quota syncing to ->sync_fs method
  quota: Split dquot_quota_sync() to writeback and cache flushing part
  vfs: Move noop_backing_dev_info check from sync into writeback
  ...
2012-07-23 12:27:27 -07:00
Eric Dumazet
563d34d057 tcp: dont drop MTU reduction indications
ICMP messages generated in output path if frame length is bigger than
mtu are actually lost because socket is owned by user (doing the xmit)

One example is the ipgre_tunnel_xmit() calling
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));

We had a similar case fixed in commit a34a101e1e (ipv6: disable GSO on
sockets hitting dst_allfrag).

Problem of such fix is that it relied on retransmit timers, so short tcp
sessions paid a too big latency increase price.

This patch uses the tcp_release_cb() infrastructure so that MTU
reduction messages (ICMP messages) are not lost, and no extra delay
is added in TCP transmits.

Reported-by: Maciej Żenczykowski <maze@google.com>
Diagnosed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23 00:58:46 -07:00
David S. Miller
5e9965c15b Merge branch 'kill_rtcache'
The ipv4 routing cache is non-deterministic, performance wise, and is
subject to reasonably easy to launch denial of service attacks.

The routing cache works great for well behaved traffic, and the world
was a much friendlier place when the tradeoffs that led to the routing
cache's design were considered.

What it boils down to is that the performance of the routing cache is
a product of the traffic patterns seen by a system rather than being a
product of the contents of the routing tables.  The former of which is
controllable by external entitites.

Even for "well behaved" legitimate traffic, high volume sites can see
hit rates in the routing cache of only ~%10.

The general flow of this patch series is that first the routing cache
is removed.  We build a completely new rtable entry every lookup
request.

Next we make some simplifications due to the fact that removing the
routing cache causes several members of struct rtable to become no
longer necessary.

Then we need to make some amends such that we can legally cache
pre-constructed routes in the FIB nexthops.  Firstly, we need to
invalidate routes which are hit with nexthop exceptions.  Secondly we
have to change the semantics of rt->rt_gateway such that zero means
that the destination is on-link and non-zero otherwise.

Now that the preparations are ready, we start caching precomputed
routes in the FIB nexthops.  Output and input routes need different
kinds of care when determining if we can legally do such caching or
not.  The details are in the commit log messages for those changes.

The patch series then winds down with some more struct rtable
simplifications and other tidy ups that remove unnecessary overhead.

On a SPARC-T3 output route lookups are ~876 cycles.  Input route
lookups are ~1169 cycles with rpfilter disabled, and about ~1468
cycles with rpfilter enabled.

These measurements were taken with the kbench_mod test module in the
net_test_tools GIT tree:

git://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git

That GIT tree also includes a udpflood tester tool and stresses
route lookups on packet output.

For example, on the same SPARC-T3 system we can run:

	time ./udpflood -l 10000000 10.2.2.11

with routing cache:
real    1m21.955s       user    0m6.530s        sys     1m15.390s

without routing cache:
real    1m31.678s       user    0m6.520s        sys     1m25.140s

Performance undoubtedly can easily be improved further.

For example fib_table_lookup() performs a lot of excessive
computations with all the masking and shifting, some of it
conditionalized to deal with edge cases.

Also, Eric's no-ref optimization for input route lookups can be
re-instated for the FIB nexthop caching code path.  I would be really
pleased if someone would work on that.

In fact anyone suitable motivated can just fire up perf on the loading
of the test net_test_tools benchmark kernel module.  I spend much of
my time going:

bash# perf record insmod ./kbench_mod.ko dst=172.30.42.22 src=74.128.0.1 iif=2
bash# perf report

Thanks to helpful feedback from Joe Perches, Eric Dumazet, Ben
Hutchings, and others.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-22 17:04:15 -07:00
Al Viro
6120d3dbb1 get rid of ->scm_work_list
recursion in __scm_destroy() will be cut by delaying final fput()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:00 +04:00
John Fastabend
406a3c638c net: netprio_cgroup: rework update socket logic
Instead of updating the sk_cgrp_prioidx struct field on every send
this only updates the field when a task is moved via cgroup
infrastructure.

This allows sockets that may be used by a kernel worker thread
to be managed. For example in the iscsi case today a user can
put iscsid in a netprio cgroup and control traffic will be sent
with the correct sk_cgrp_prioidx value set but as soon as data
is sent the kernel worker thread isssues a send and sk_cgrp_prioidx
is updated with the kernel worker threads value which is the
default case.

It seems more correct to only update the field when the user
explicitly sets it via control group infrastructure. This allows
the users to manage sockets that may be used with other threads.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-22 12:44:01 -07:00
Neil Horman
5aa93bcf66 sctp: Implement quick failover draft from tsvwg
I've seen several attempts recently made to do quick failover of sctp transports
by reducing various retransmit timers and counters.  While its possible to
implement a faster failover on multihomed sctp associations, its not
particularly robust, in that it can lead to unneeded retransmits, as well as
false connection failures due to intermittent latency on a network.

Instead, lets implement the new ietf quick failover draft found here:
http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05

This will let the sctp stack identify transports that have had a small number of
errors, and avoid using them quickly until their reliability can be
re-established.  I've tested this out on two virt guests connected via multiple
isolated virt networks and believe its in compliance with the above draft and
works well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
CC: joe@perches.com
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-22 12:13:46 -07:00
David S. Miller
0bb4087cbe ipv4: Fix neigh lookup keying over loopback/point-to-point devices.
We were using a special key "0" for all loopback and point-to-point
device neigh lookups under ipv4, but we wouldn't use that special
key for the neigh creation.

So basically we'd make a new neigh at each and every lookup :-)

This special case to use only one neigh for these device types
is of dubious value, so just remove it entirely.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 16:06:10 -07:00
David S. Miller
2860583fe8 ipv4: Kill rt->fi
It's not really needed.

We only grabbed a reference to the fib_info for the sake of fib_info
local metrics.

However, fib_info objects are freed using RCU, as are therefore their
private metrics (if any).

We would have triggered a route cache flush if we eliminated a
reference to a fib_info object in the routing tables.

Therefore, any existing cached routes will first check and see that
they have been invalidated before an errant reference to these
metric values would occur.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:40:07 -07:00
David S. Miller
9917e1e876 ipv4: Turn rt->rt_route_iif into rt->rt_is_input.
That is this value's only use, as a boolean to indicate whether
a route is an input route or not.

So implement it that way, using a u16 gap present in the struct
already.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:40:02 -07:00
David S. Miller
4fd551d7be ipv4: Kill rt->rt_oif
Never actually used.

It was being set on output routes to the original OIF specified in the
flow key used for the lookup.

Adjust the only user, ipmr_rt_fib_lookup(), for greater correctness of
the flowi4_oif and flowi4_iif values, thanks to feedback from Julian
Anastasov.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:38:34 -07:00
David S. Miller
ba3f7f04ef ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:36:54 -07:00
David S. Miller
d2d68ba9fe ipv4: Cache input routes in fib_info nexthops.
Caching input routes is slightly simpler than output routes, since we
don't need to be concerned with nexthop exceptions.  (locally
destined, and routed packets, never trigger PMTU events or redirects
that will be processed by us).

However, we have to elide caching for the DIRECTSRC and non-zero itag
cases.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:36:40 -07:00
David S. Miller
f2bb4bedf3 ipv4: Cache output routes in fib_info nexthops.
If we have an output route that lacks nexthop exceptions, we can cache
it in the FIB info nexthop.

Such routes will have DST_HOST cleared because such routes refer to a
family of destinations, rather than just one.

The sequence of the handling of exceptions during route lookup is
adjusted to make the logic work properly.

Before we allocate the route, we lookup the exception.

Then we know if we will cache this route or not, and therefore whether
DST_HOST should be set on the allocated route.

Then we use DST_HOST to key off whether we should store the resulting
route, during rt_set_nexthop(), in the FIB nexthop cache.

With help from Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:36:16 -07:00
David S. Miller
ceb3320610 ipv4: Kill routes during PMTU/redirect updates.
Mark them obsolete so there will be a re-lookup to fetch the
FIB nexthop exception info.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:31:22 -07:00
David S. Miller
f5b0a87436 net: Document dst->obsolete better.
Add a big comment explaining how the field works, and use defines
instead of magic constants for the values assigned to it.

Suggested by Joe Perches.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:31:21 -07:00
David S. Miller
f8126f1d51 ipv4: Adjust semantics of rt->rt_gateway.
In order to allow prefixed routes, we have to adjust how rt_gateway
is set and interpreted.

The new interpretation is:

1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr

2) rt_gateway != 0, destination requires a nexthop gateway

Abstract the fetching of the proper nexthop value using a new
inline helper, rt_nexthop(), as suggested by Joe Perches.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-07-20 13:31:20 -07:00
David S. Miller
f1ce3062c5 ipv4: Remove 'rt_dst' from 'struct rtable'
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:31:19 -07:00
David Miller
b48698895d ipv4: Remove 'rt_mark' from 'struct rtable'
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:31:18 -07:00
David Miller
d6c0a4f609 ipv4: Kill 'rt_src' from 'struct rtable'
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:31:00 -07:00
David Miller
1a00fee4ff ipv4: Remove rt_key_{src,dst,tos} from struct rtable.
They are always used in contexts where they can be reconstituted,
or where the finally resolved rt->rt_{src,dst} is semantically
equivalent.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:30:59 -07:00
David Miller
38a424e465 ipv4: Kill ip_route_input_noref().
The "noref" argument to ip_route_input_common() is now always ignored
because we do not cache routes, and in that case we must always grab
a reference to the resulting 'dst'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:30:59 -07:00
David S. Miller
89aef8921b ipv4: Delete routing cache.
The ipv4 routing cache is non-deterministic, performance wise, and is
subject to reasonably easy to launch denial of service attacks.

The routing cache works great for well behaved traffic, and the world
was a much friendlier place when the tradeoffs that led to the routing
cache's design were considered.

What it boils down to is that the performance of the routing cache is
a product of the traffic patterns seen by a system rather than being a
product of the contents of the routing tables.  The former of which is
controllable by external entitites.

Even for "well behaved" legitimate traffic, high volume sites can see
hit rates in the routing cache of only ~%10.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:30:27 -07:00
Jiri Pirko
df4ab5b3c2 net: rename bond_queue_mapping to slave_dev_queue_mapping
As this is going to be used not only by bonding.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:07:00 -07:00
Jiri Pirko
d40156aa5e rtnl: allow to specify different num for rx and tx queue count
Also cut out unused function parameters and possible err in return
value.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:06:59 -07:00
Eric Dumazet
6f458dfb40 tcp: improve latencies of timer triggered events
Modern TCP stack highly depends on tcp_write_timer() having a small
latency, but current implementation doesn't exactly meet the
expectations.

When a timer fires but finds the socket is owned by the user, it rearms
itself for an additional delay hoping next run will be more
successful.

tcp_write_timer() for example uses a 50ms delay for next try, and it
defeats many attempts to get predictable TCP behavior in term of
latencies.

Use the recently introduced tcp_release_cb(), so that the user owning
the socket will call various handlers right before socket release.

This will permit us to post a followup patch to address the
tcp_tso_should_defer() syndrome (some deferred packets have to wait
RTO timer to be transmitted, while cwnd should allow us to send them
sooner)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 10:59:41 -07:00
Eric Dumazet
5815d5e7aa tcp: use hash_32() in tcp_metrics
Fix a missing roundup_pow_of_two(), since tcpmhash_entries is not
guaranteed to be a power of two.

Uses hash_32() instead of custom hash.

tcpmhash_entries should be an unsigned int.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 10:59:41 -07:00
John W. Linville
90b90f60c4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-07-20 12:30:48 -04:00
David S. Miller
abaa72d7fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
2012-07-19 11:17:30 -07:00
Yuchung Cheng
67da22d23f net-tcp: Fast Open client - cookie-less mode
In trusted networks, e.g., intranet, data-center, the client does not
need to use Fast Open cookie to mitigate DoS attacks. In cookie-less
mode, sendmsg() with MSG_FASTOPEN flag will send SYN-data regardless
of cookie availability.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
Yuchung Cheng
aab4874355 net-tcp: Fast Open client - detecting SYN-data drops
On paths with firewalls dropping SYN with data or experimental TCP options,
Fast Open connections will have experience SYN timeout and bad performance.
The solution is to track such incidents in the cookie cache and disables
Fast Open temporarily.

Since only the original SYN includes data and/or Fast Open option, the
SYN-ACK has some tell-tale sign (tcp_rcv_fastopen_synack()) to detect
such drops. If a path has recurring Fast Open SYN drops, Fast Open is
disabled for 2^(recurring_losses) minutes starting from four minutes up to
roughly one and half day. sendmsg with MSG_FASTOPEN flag will succeed but
it behaves as connect() then write().

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
Yuchung Cheng
cf60af03ca net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)
sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2)
and write(2). The application should replace connect() with it to
send data in the opening SYN packet.

For blocking socket, sendmsg() blocks until all the data are buffered
locally and the handshake is completed like connect() call. It
returns similar errno like connect() if the TCP handshake fails.

For non-blocking socket, it returns the number of bytes queued (and
transmitted in the SYN-data packet) if cookie is available. If cookie
is not available, it transmits a data-less SYN packet with Fast Open
cookie request option and returns -EINPROGRESS like connect().

Using MSG_FASTOPEN on connecting or connected socket will result in
simlar errno like repeating connect() calls. Therefore the application
should only use this flag on new sockets.

The buffer size of sendmsg() is independent of the MSS of the connection.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
Yuchung Cheng
783237e8da net-tcp: Fast Open client - sending SYN-data
This patch implements sending SYN-data in tcp_connect(). The data is
from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch).

The length of the cookie in tcp_fastopen_req, init'd to 0, controls the
type of the SYN. If the cookie is not cached (len==0), the host sends
data-less SYN with Fast Open cookie request option to solicit a cookie
from the remote. If cookie is not available (len > 0), the host sends
a SYN-data with Fast Open cookie option. If cookie length is negative,
  the SYN will not include any Fast Open option (for fall back operations).

To deal with middleboxes that may drop SYN with data or experimental TCP
option, the SYN-data is only sent once. SYN retransmits do not include
data or Fast Open options. The connection will fall back to regular TCP
handshake.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
Yuchung Cheng
1fe4c481ba net-tcp: Fast Open client - cookie cache
With help from Eric Dumazet, add Fast Open metrics in tcp metrics cache.
The basic ones are MSS and the cookies. Later patch will cache more to
handle unfriendly middleboxes.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:55:36 -07:00
Yuchung Cheng
2100c8d2d9 net-tcp: Fast Open base
This patch impelements the common code for both the client and server.

1. TCP Fast Open option processing. Since Fast Open does not have an
   option number assigned by IANA yet, it shares the experiment option
   code 254 by implementing draft-ietf-tcpm-experimental-options
   with a 16 bits magic number 0xF989. This enables global experiments
   without clashing the scarce(2) experimental options available for TCP.

   When the draft status becomes standard (maybe), the client should
   switch to the new option number assigned while the server supports
   both numbers for transistion.

2. The new sysctl tcp_fastopen

3. A place holder init function

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:55:36 -07:00
David S. Miller
d8f1641b58 net: Fix warnings in dst_ops.h
include/net/dst_ops.h:28:20: warning: ‘struct sock’ declared inside parameter list

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:43:03 -07:00
Eric Dumazet
be9f4a44e7 ipv4: tcp: remove per net tcp_sock
tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
per network namespace.

This leads to bad behavior on multiqueue NICS, because many cpus
contend for the socket lock and once socket lock is acquired, extra
false sharing on various socket fields slow down the operations.

To better resist to attacks, we use a percpu socket. Each cpu can
run without contention, using appropriate memory (local node)

Additional features :

1) We also mirror the queue_mapping of the incoming skb, so that
answers use the same queue if possible.

2) Setting SOCK_USE_WRITE_QUEUE socket flag speedup sock_wfree()

3) We now limit the number of in-flight RST/ACK [1] packets
per cpu, instead of per namespace, and we honor the sysctl_wmem_default
limit dynamically. (Prior to this patch, sysctl_wmem_default value was
copied at boot time, so any further change would not affect tcp_sock
limit)

[1] These packets are only generated when no socket was matched for
the incoming packet.

Reported-by: Bill Sommerfeld <wsommerfeld@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:35:30 -07:00
Julian Anastasov
aee06da672 ipv4: use seqlock for nh_exceptions
Use global seqlock for the nh_exceptions. Call
fnhe_oldest with the right hash chain. Correct the diff
value for dst_set_expires.

v2: after suggestions from Eric Dumazet:
* get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
* continue daddr search in rt_bind_exception

v3:
* remove the daddr check before seqlock in rt_bind_exception
* restart lookup in rt_bind_exception on detected seqlock change,
as suggested by David Miller

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:30:14 -07:00
John W. Linville
0cd06647b7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2012-07-18 14:53:10 -04:00
Eric Dumazet
ddbe503203 ipv6: add ipv6_addr_hash() helper
Introduce ipv6_addr_hash() helper doing a XOR on all bits
of an IPv6 address, with an optimized x86_64 version.

Use it in flow dissector, as suggested by Andrew McGregor,
to reduce hash collision probabilities in fq_codel (and other
users of flow dissector)

Use it in ip6_tunnel.c and use more bit shuffling, as suggested
by David Laight, as existing hash was ignoring most of them.

Use it in sunrpc and use more bit shuffling, using hash_32().

Use it in net/ipv6/addrconf.c, using hash_32() as well.

As a cleanup, use it in net/ipv4/tcp_metrics.c

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew McGregor <andrewmcgr@gmail.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18 11:28:46 -07:00
Saurabh
eb8637cd4a net/ipv4: VTI support rx-path hook in xfrm4_mode_tunnel.
Incorporated David and Steffen's comments.
Add hook for rx-path xfmr4_mode_tunnel for VTI tunnel module.

Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
Reviewed-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18 09:36:12 -07:00
Eric Dumazet
d3818c92af ipv6: fix inet6_csk_xmit()
We should provide to inet6_csk_route_socket a struct flowi6 pointer,
so that net6_csk_xmit() works correctly instead of sending garbage.

Also add some consts

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18 08:59:58 -07:00
John W. Linville
707be0ae13 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-07-17 15:07:31 -04:00
David S. Miller
a6ff1a2f1e Merge branch 'nexthop_exceptions'
These patches implement the final mechanism necessary to really allow
us to go without the route cache in ipv4.

We need a place to have long-term storage of PMTU/redirect information
which is independent of the routes themselves, yet does not get us
back into a situation where we have to write to metrics or anything
like that.

For this we use an "next-hop exception" table in the FIB nexthops.

The one thing I desperately want to avoid is having to create clone
routes in the FIB trie for this purpose, because that is very
expensive.   However, I'm willing to entertain such an idea later
if this current scheme proves to have downsides that the FIB trie
variant would not have.

In order to accomodate this any such scheme, we need to be able to
produce a full flow key at PMTU/redirect time.  That required an
adjustment of the interface call-sites used to propagate these events.

For a PMTU/redirect with a fully specified socket, we pass that socket
and use it to produce the flow key.

Otherwise we use a passed in SKB to formulate the key.  There are two
cases that need to be distinguished, ICMP message processing (in which
case the IP header is at skb->data) and output packet processing
(mostly tunnels, and in all such cases the IP header is at ip_hdr(skb)).

We also have to make the code able to handle the case where the dst
itself passed into the dst_ops->{update_pmtu,redirect} method is
invalidated.  This matters for calls from sockets that have cached
that route.  We provide a inet{,6} helper function for this purpose,
and edit SCTP specially since it caches routes at the transport rather
than socket level.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17 10:48:26 -07:00
David S. Miller
4895c771c7 ipv4: Add FIB nexthop exceptions.
In a regime where we have subnetted route entries, we need a way to
store persistent storage about destination specific learned values
such as redirects and PMTU values.

This is implemented here via nexthop exceptions.

The initial implementation is a 2048 entry hash table with relaiming
starting at chain length 5.  A more sophisticated scheme can be
devised if that proves necessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17 08:48:50 -07:00
David S. Miller
6700c2709c net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()
This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more.  In the
future the routes will be without destination address, source address,
etc. keying.  One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information.  This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here.  Using that flow key will do a fib_lookup()
and create/update the persistent entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17 03:29:28 -07:00
Luis R. Rodriguez
57b5ce072e cfg80211: add cellular base station regulatory hint support
Cellular base stations can provide hints to cfg80211 about
where they think we are. This can be done for example on
a cell phone. To enable these hints we simply allow them
through as user regulatory hints but we allow userspace
to clasify the hint as either coming directly from the
user or coming from a cellular base station. This option
is only available when you enable
CONFIG_CFG80211_CERTIFICATION_ONUS.

The base station hints themselves will not be processed
by the core unless at least one device on the system
supports this feature.

Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-17 12:16:39 +02:00
Lin Ming
9e33ce453f ipvs: fix oops on NAT reply in br_nf context
IPVS should not reset skb->nf_bridge in FORWARD hook
by calling nf_reset for NAT replies. It triggers oops in
br_nf_forward_finish.

[  579.781508] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[  579.781669] IP: [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
[  579.781792] PGD 218f9067 PUD 0
[  579.781865] Oops: 0000 [#1] SMP
[  579.781945] CPU 0
[  579.781983] Modules linked in:
[  579.782047]
[  579.782080]
[  579.782114] Pid: 4644, comm: qemu Tainted: G        W    3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard  /30E8
[  579.782300] RIP: 0010:[<ffffffff817b1ca5>]  [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
[  579.782455] RSP: 0018:ffff88007b003a98  EFLAGS: 00010287
[  579.782541] RAX: 0000000000000008 RBX: ffff8800762ead00 RCX: 000000000001670a
[  579.782653] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8800762ead00
[  579.782845] RBP: ffff88007b003ac8 R08: 0000000000016630 R09: ffff88007b003a90
[  579.782957] R10: ffff88007b0038e8 R11: ffff88002da37540 R12: ffff88002da01a02
[  579.783066] R13: ffff88002da01a80 R14: ffff88002d83c000 R15: ffff88002d82a000
[  579.783177] FS:  0000000000000000(0000) GS:ffff88007b000000(0063) knlGS:00000000f62d1b70
[  579.783306] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[  579.783395] CR2: 0000000000000004 CR3: 00000000218fe000 CR4: 00000000000027f0
[  579.783505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  579.783684] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  579.783795] Process qemu (pid: 4644, threadinfo ffff880021b20000, task ffff880021aba760)
[  579.783919] Stack:
[  579.783959]  ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00
[  579.784110]  ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7
[  579.784260]  ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0
[  579.784477] Call Trace:
[  579.784523]  <IRQ>
[  579.784562]
[  579.784603]  [<ffffffff817b26c7>] br_nf_forward_ip+0x275/0x2c8
[  579.784707]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
[  579.784797]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
[  579.784906]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
[  579.784995]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
[  579.785175]  [<ffffffff8187fa95>] ? _raw_write_unlock_bh+0x19/0x1b
[  579.785179]  [<ffffffff817ac417>] __br_forward+0x97/0xa2
[  579.785179]  [<ffffffff817ad366>] br_handle_frame_finish+0x1a6/0x257
[  579.785179]  [<ffffffff817b2386>] br_nf_pre_routing_finish+0x26d/0x2cb
[  579.785179]  [<ffffffff817b2cf0>] br_nf_pre_routing+0x55d/0x5c1
[  579.785179]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
[  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
[  579.785179]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
[  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
[  579.785179]  [<ffffffff81551525>] ? sky2_poll+0xb35/0xb54
[  579.785179]  [<ffffffff817ad62a>] br_handle_frame+0x213/0x229
[  579.785179]  [<ffffffff817ad417>] ? br_handle_frame_finish+0x257/0x257
[  579.785179]  [<ffffffff816e3b47>] __netif_receive_skb+0x2b4/0x3f1
[  579.785179]  [<ffffffff816e69fc>] process_backlog+0x99/0x1e2
[  579.785179]  [<ffffffff816e6800>] net_rx_action+0xdf/0x242
[  579.785179]  [<ffffffff8107e8a8>] __do_softirq+0xc1/0x1e0
[  579.785179]  [<ffffffff8135a5ba>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[  579.785179]  [<ffffffff8188812c>] call_softirq+0x1c/0x30

The steps to reproduce as follow,

1. On Host1, setup brige br0(192.168.1.106)
2. Boot a kvm guest(192.168.1.105) on Host1 and start httpd
3. Start IPVS service on Host1
   ipvsadm -A -t 192.168.1.106:80 -s rr
   ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m
4. Run apache benchmark on Host2(192.168.1.101)
   ab -n 1000 http://192.168.1.106/

ip_vs_reply4
  ip_vs_out
    handle_response
      ip_vs_notrack
        nf_reset()
        {
          skb->nf_bridge = NULL;
        }

Actually, IPVS wants in this case just to replace nfct
with untracked version. So replace the nf_reset(skb) call
in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call.

Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-07-17 12:00:46 +02:00
Thomas Pedersen
84f10708f7 cfg80211: support TX error rate CQM
Let the user configure serveral TX error conection quality monitoring
parameters: % error rate, survey interval, and # of attempted packets.

On exceeding the TX failure rate over the given interval, the driver
will send a CQM notify event with the actual TX failure rate and
packets attempted.

Signed-off-by: Thomas Pedersen <c_tpeder@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-17 11:57:23 +02:00
Eric Dumazet
282f23c6ee tcp: implement RFC 5961 3.2
Implement the RFC 5691 mitigation against Blind
Reset attack using RST bit.

Idea is to validate incoming RST sequence,
to match RCV.NXT value, instead of previouly accepted
window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND)

If sequence is in window but not an exact match, send
a "challenge ACK", so that the other part can resend an
RST with the appropriate sequence.

Add a new sysctl, tcp_challenge_ack_limit, to limit
number of challenge ACK sent per second.

Add a new SNMP counter to count number of challenge acks sent.
(netstat -s | grep TCPChallengeACK)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17 01:36:20 -07:00
Andrey Vagin
51d7cccf07 net: make sock diag per-namespace
Before this patch sock_diag works for init_net only and dumps
information about sockets from all namespaces.

This patch expands sock_diag for all name-spaces.
It creates a netlink kernel socket for each netns and filters
data during dumping.

v2: filter accoding with netns in all places
    remove an unused variable.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-16 22:31:34 -07:00
Greg Kroah-Hartman
467a3ca5ca Merge branch 'v3.6-rc7' into tty-next
This is to sync up on Linus's branch to get the other tty and core changes.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16 12:32:42 -07:00
David S. Miller
02f3d4ce9e sctp: Adjust PMTU updates to accomodate route invalidation.
This adjusts the call to dst_ops->update_pmtu() so that we can
transparently handle the fact that, in the future, the dst itself can
be invalidated by the PMTU update (when we have non-host routes cached
in sockets).

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-16 03:57:14 -07:00
David S. Miller
35ad9b9cf7 ipv6: Add helper inet6_csk_update_pmtu().
This is the ipv6 version of inet_csk_update_pmtu().

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-16 03:44:56 -07:00
David S. Miller
80d0a69fc5 ipv4: Add helper inet_csk_update_pmtu().
This abstracts away the call to dst_ops->update_pmtu() so that we can
transparently handle the fact that, in the future, the dst itself can
be invalidated by the PMTU update (when we have non-host routes cached
in sockets).

So we try to rebuild the socket cached route after the method
invocation if necessary.

This isn't used by SCTP because it needs to cache dsts per-transport,
and thus will need it's own local version of this helper.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-16 03:28:06 -07:00
Mat Martineau
c20f8e35ca Bluetooth: Use tx window from config response for ack timing
This change addresses an L2CAP ERTM throughput problem when a remote
device does not fully utilize the available transmit window.

The L2CAP ERTM transmit window size determines the maximum number of
unacked frames that may be outstanding at any time. It is configured
separately for each direction of an ERTM connection. Each side sends a
configuration request with a tx_win field indicating how many unacked
frames it is capable of receiving before sending an ack. The
configuration response's tx_win field shows how many frames the
transmitter will actually send before waiting for an ack.

It's important to trace both the actual transmit window (to check for
validity of incoming frames) and the number of frames that the
transmitter will send before waiting (to send acks at the appropriate
time). Now there are separate tx_win and ack_win values. ack_win is
updated based on configuration responses, and is used to determine
when acks are sent.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-07-15 12:18:29 -03:00
David S. Miller
921a678cb6 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John Linville says:

====================
Several drivers see updates: mwifiex, ath9k, iwlwifi, brcmsmac,
wlcore/wl12xx/wl18xx, and a handful of others.  The bcma bus got a
lot of attention from Hauke Mehrtens.  The cfg80211 component gets
a flurry of patches for multi-channel support, and the mac80211
component gets the first few VHT (11ac) and 60GHz (11ad) patches.
This also includes the removal of the iwmc3200 drivers, since the
hardware never became available to normal people.

Additionally, the NFC subsystem gets a series of updates.  According to
Samuel, "Here are the interesting bits:

- A better error management for the HCI stack.
- An LLCP "late" binding implementation for a better NFC SAP usage. SAPs are
  now reserved only when there's a client for it.
- Support for Sony RC-S360 (a.k.a. PaSoRi) pn533 based dongle. We can read and
  write NFC tags and also establish a p2p link with this dongle now.
- A few LLCP fixes."

Finally, this includes another pull of the fixes from the wireless
tree in order to resolve some merge issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-13 23:02:28 -07:00
David S. Miller
85b91b0339 ipv4: Don't store a rule pointer in fib_result.
We only use it to fetch the rule's tclassid, so just store the
tclassid there instead.

This also decreases the size of fib_result by a full 8 bytes on
64-bit.  On 32-bits it's a wash.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-13 08:21:29 -07:00
Johannes Berg
4290cb4bf2 cfg80211: reduce monitor interface tracking
Revert commit b78e8ceac2
("cfg80211: track monitor channel") and remove the
set_monitor_enabled() callback.

Due to the tracking happening in NETDEV_PRE_UP, it had
introduced bugs because the monitor interface callback
would be called before the device was started. It looks
like there's no way to fix this, and using NETDEV_PRE_UP
is broken anyway (since there's no NETDEV_UP_FAIL), so
remove all that code, track interfaces in NETDEV_UP and
also stop tracking the monitor channel in cfg80211.

This mostly reverts to before the tracking, except that
we keep the interface count tracking so that setting the
monitor channel can be rejected properly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-13 16:16:11 +02:00
Johannes Berg
5b7ccaf3fc cfg80211/mac80211: re-add get_channel operation
This essentially reverts commit 2e165b8184 but
introduces the get_channel operation with a new
wireless_dev argument so that you can retrieve
the channel per interface. This is necessary as
even though we can track all interface channels
(except monitor) we can't track the channel type
used.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-13 16:16:11 +02:00
John W. Linville
d07d152892 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	drivers/net/wireless/iwmc3200wifi/cfg80211.c
	drivers/net/wireless/mwifiex/cfg80211.c
2012-07-12 15:21:05 -04:00
John W. Linville
38a0084063 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-07-12 13:44:50 -04:00
David S. Miller
391e5c22f5 ipv4: Remove tb_peers from fib_table.
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12 09:39:28 -07:00
Johannes Berg
8c358bcd09 mac80211: add time synchronisation with BSS for assoc
Some drivers (iwlegacy, iwlwifi and rt2x00) today use the
bss_conf.last_tsf value. By itself though that value is
completely worthless since it may be ancient. What really
is needed is synchronisation between some device time and
the TSF.

To clarify this, rename bss_conf.last_tsf to sync_tsf and
add sync_device_ts which is obtained from rx_status which
gets a new field device_timestamp for this purpose. This
is intentionally not using the mactime field since that
is used for other things and in IBSS is expected to sync
with the IBSS's TSF which isn't necessarily true for the
device timestamp.

Also, since we have the information and it's useful even
before the connection has been established, give all the
timing details to the driver before authenticating.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-12 12:10:46 +02:00
Johannes Berg
30f422925c mac80211: optimize ieee80211_rx_status struct layout
We waste a lot of space in this struct because it uses
int values where smaller ones would be sufficient. The
upcoming A-MPDU information needs some space, optimize
the struct now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-12 12:10:45 +02:00
Johannes Berg
fd0142844e nl80211: move scan API to wdev
The new P2P Device will have to be able to scan for
P2P search, so move scanning to use struct wireless_dev
instead of struct net_device.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-12 12:10:41 +02:00
Johannes Berg
84efbb84cf cfg80211: use wireless_dev for interface management
In order to be able to create P2P Device wdevs, move
the virtual interface management over to wireless_dev
structures.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-12 12:08:10 +02:00
David S. Miller
b94f1c0904 ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect().
And delete rt6_redirect(), since it is no longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12 00:33:37 -07:00
David S. Miller
ec18d9a269 ipv6: Add redirect support to all protocol icmp error handlers.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12 00:25:15 -07:00
David S. Miller
3a5ad2ee5e ipv6: Add ip6_redirect() and ip6_sk_redirect() helper functions.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12 00:08:07 -07:00
David S. Miller
e8599ff4b1 ipv6: Move bulk of redirect handling into rt6_redirect().
This sets things up so that we can have the protocol error handlers
call down into the ipv6 route code for redirects just as ipv4 already
does.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11 23:43:53 -07:00
David S. Miller
30f2a5f379 ipv6: Export ndisc option parsing from ndisc.c
This is going to be used internally by the rt6 redirect code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11 23:39:11 -07:00
David S. Miller
1f42539d25 ipv4: Kill ip_rt_redirect().
No longer needed, as the protocol handlers now all properly
propagate the redirect back into the routing code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11 21:30:08 -07:00
David S. Miller
b42597e2f3 ipv4: Add ipv4_redirect() and ipv4_sk_redirect() helper functions.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11 21:25:45 -07:00
David S. Miller
e47a185b31 ipv4: Generalize ip_do_redirect() and hook into new dst_ops->redirect.
All of the redirect acceptance policy is now contained within.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11 20:55:47 -07:00
David S. Miller
94206125c4 ipv4: Rearrange arguments to ip_rt_redirect()
Pass in the SKB rather than just the IP addresses, so that policy
and other aspects can reside in ip_rt_redirect() rather then
icmp_redirect().

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11 20:38:08 -07:00
Eric Dumazet
46d3ceabd8 tcp: TCP Small Queues
This introduce TSQ (TCP Small Queues)

TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
device queues), to reduce RTT and cwnd bias, part of the bufferbloat
problem.

sk->sk_wmem_alloc not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.

TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.

As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2 : It can help to reduce
latencies of high prio packets, having smaller TSO packets.

This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.

Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.

Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)

I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and both side socket autotuning no longer use 4 Mbytes.

As skb destructor cannot restart xmit itself ( as qdisc lock might be
taken at this point ), we delegate the work to a tasklet. We use one
tasklest per cpu for performance reasons.

If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.

[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
  but some drivers call it in their start_xmit() handler.
  These drivers should at least use BQL, or else a single TCP
  session can still fill the whole NIC TX ring, since TSQ will
  have no effect.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11 18:12:59 -07:00
Andrei Emeltchenko
4b10b274e2 Bluetooth: debug: Print l2cap_chan refcount
Improve debug output.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-07-11 10:09:20 -03:00
David S. Miller
04c9f416e3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/batman-adv/bridge_loop_avoidance.c
	net/batman-adv/bridge_loop_avoidance.h
	net/batman-adv/soft-interface.c
	net/mac80211/mlme.c

With merge help from Antonio Quartulli (batman-adv) and
Stephen Rothwell (drivers/net/usb/qmi_wwan.c).

The net/mac80211/mlme.c conflict seemed easy enough, accounting for a
conversion to some new tracing macros.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 23:56:33 -07:00
Eric Dumazet
1a203cb33a ipv6: optimize ipv6 addresses compares
On 64 bit arches having efficient unaligned accesses (eg x86_64) we can
use long words to reduce number of instructions for free.

Joe Perches suggested to change ipv6_masked_addr_cmp() to return a bool
instead of 'int', to make sure ipv6_masked_addr_cmp() cannot be used
in a sorting function.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 23:13:46 -07:00
David S. Miller
f185071ddf ipv4: Remove inetpeer from routes.
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 22:40:18 -07:00
David S. Miller
5943634fc5 ipv4: Maintain redirect and PMTU info in struct rtable again.
Maintaining this in the inetpeer entries was not the right way to do
this at all.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 22:40:14 -07:00
David S. Miller
3e12939a2a inet: Kill FLOWI_FLAG_PRECOW_METRICS.
No longer needed.  TCP writes metrics, but now in it's own special
cache that does not dirty the route metrics.  Therefore there is no
longer any reason to pre-cow metrics in this way.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 22:40:12 -07:00
David S. Miller
16d1839907 inet: Remove ->get_peer() method.
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 22:40:10 -07:00
David S. Miller
81166dd6fa tcp: Move timestamps from inetpeer to metrics cache.
With help from Lin Ming.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 22:40:08 -07:00
David S. Miller
94334d5ed4 net: Kill set_dst_metric_rtt().
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 22:40:07 -07:00
David S. Miller
51c5d0c4b1 tcp: Maintain dynamic metrics in local cache.
Maintain a local hash table of TCP dynamic metrics blobs.

Computed TCP metrics are no longer maintained in the route metrics.

The table uses RCU and an extremely simple hash so that it has low
latency and low overhead.  A simple hash is legitimate because we only
make metrics blobs for fully established connections.

Some tweaking of the default hash table sizes, metric timeouts, and
the hash chain length limit certainly could use some tweaking.  But
the basic design seems sound.

With help from Eric Dumazet and Joe Perches.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 22:39:57 -07:00
David S. Miller
ab92bb2f67 tcp: Abstract back handling peer aliveness test into helper function.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 20:33:49 -07:00
David S. Miller
4aabd8ef8c tcp: Move dynamnic metrics handling into seperate file.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 20:31:36 -07:00
David S. Miller
e044a651b9 ipv4: Fix crashes in fib_rules_tclass().
All paths assume, when CONFIG_IP_MULTIPLE_TABLES is enabled, that any
successful call to fib_lookup() will initialize the fib_result->r
value to something.

We violated that expectation in the new fib_lookup() fast path.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 18:05:28 -07:00
Eric Lapuyade
a10d595b10 NFC: Allow HCI driver to pre-open pipes to some gates
Some NFC chips will statically create and open pipes for both standard
and proprietary gates. The driver can now pass this information to HCI
such that HCI will not attempt to create and open them, but will instead
directly use the passed pipe ids.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-07-09 16:42:12 -04:00
Eric Lapuyade
456411ca81 NFC: Driver failure API
This API should be used by drivers, HCI, SHDLC or NCI stacks to report an
unrecoverable error.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-07-09 16:42:08 -04:00
Eric Lapuyade
a9a741a7e2 NFC: Prepare asynchronous error management for driver and shdlc
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-07-09 16:42:04 -04:00
Johannes Berg
71bbc99438 cfg80211: use wdev in mgmt-tx/ROC APIs
The management frame and remain-on-channel APIs will be
needed in the P2P device abstraction, so move them over
to the new wdev-based APIs. Userspace can still use both
the interface index and wdev identifier for them so it's
backward compatible, but for the P2P Device wdev it will
be able to use the wdev identifier only.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-09 14:51:47 +02:00
Johannes Berg
89a54e48b9 nl80211: prepare for non-netdev wireless devs
In order to support a P2P device abstraction and
Bluetooth high-speed AMPs, we need to have a way
to identify virtual interfaces that don't have a
netdev associated.

Do this by adding a NL80211_ATTR_WDEV attribute
to identify a wdev which may or may not also be
a netdev.

To simplify things, use a 64-bit value with the
high 32 bits being the wiphy index for this new
wdev identifier in the nl80211 API.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-09 14:51:46 +02:00
Johannes Berg
f72b85b8eb mac80211: remove ieee80211_key_removed
This API call was intended to be used by drivers
if they want to optimize key handling by removing
one key when another is added. Remove it since no
driver is using it. If needed, it can always be
added back.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-09 14:49:15 +02:00
Pablo Neira Ayuso
6bd0405bb4 netfilter: nf_ct_ecache: fix crash with multiple containers, one shutting down
Hans reports that he's still hitting:

BUG: unable to handle kernel NULL pointer dereference at 000000000000027c
IP: [<ffffffff813615db>] netlink_has_listeners+0xb/0x60
PGD 0
Oops: 0000 [#3] PREEMPT SMP
CPU 0

It happens when adding a number of containers with do:

nfct_query(h, NFCT_Q_CREATE, ct);

and most likely one namespace shuts down.

this problem was supposed to be fixed by:
70e9942 netfilter: nf_conntrack: make event callback registration per-netns

Still, it was missing one rcu_access_pointer to check if the callback
is set or not.

Reported-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-07-09 10:53:19 +02:00
David S. Miller
d3a5ea6e21 Merge branch 'master' of git://1984.lsi.us.es/nf-next 2012-07-07 16:18:50 -07:00
David S. Miller
f4530fa574 ipv4: Avoid overhead when no custom FIB rules are installed.
If the user hasn't actually installed any custom rules, or fiddled
with the default ones, don't go through the whole FIB rules layer.

It's just pure overhead.

Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check
the individual tables by hand, one by one.

Also, move fib_num_tclassid_users into the ipv4 network namespace.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 22:13:13 -07:00
Vladimir Kondratiev
95ddc1fc45 cfg80211: bitrate calculation for 60g
60g band uses different from .11n MCS scheme, so bitrate
should be calculated differently

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-05 15:18:32 +02:00
Vladimir Kondratiev
8eb41c8dfb {nl,cfg}80211: support high bitrates
Until now, a u16 value was used to represent bitrate value.
With VHT bitrates this becomes too small.

Introduce a new 32-bit bitrate attribute. nl80211 will report
both the new and the old attribute, unless the bitrate doesn't
fit into the old u16 attribute in which case only the new one
will be reported.

User space tools encouraged to prefer the 32-bit attribute, if
available (since it won't be available on older kernels.)

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
[reword commit message and comments a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-05 15:18:30 +02:00
David S. Miller
c90a9bb907 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-05 03:44:25 -07:00
David S. Miller
36bdbcae2f net: Kill dst->_neighbour, accessors, and final uses.
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 02:42:00 -07:00
David S. Miller
97cac0821a ipv6: Store route neighbour in rt6_info struct.
This makes for a simplified conversion away from dst_get_neighbour*().

All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*().

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 02:41:58 -07:00
David S. Miller
1d248b1cf4 net: Pass neighbours and dest address into NETEVENT_REDIRECT events.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 02:21:55 -07:00
David S. Miller
fccd7d5c77 decnet: Use neighbours privately in dn_route struct.
This allows an easy conversion away from dst_get_neighbour*().

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 01:12:14 -07:00
David S. Miller
f894cbf847 net: Add optional SKB arg to dst_ops->neigh_lookup().
Causes the handler to use the daddr in the ipv4/ipv6 header when
the route gateway is unspecified (local subnet).

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 01:04:01 -07:00
David S. Miller
5110effee8 net: Do delayed neigh confirmation.
When a dst_confirm() happens, mark the confirmation as pending in the
dst.  Then on the next packet out, when we have the neigh in-hand, do
the update.

This removes the dependency in dst_confirm() of dst's having an
attached neigh.

While we're here, remove the explicit 'dst' NULL check, all except 2
or 3 call sites ensure it's not NULL.  So just fix those cases up.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 01:03:06 -07:00
David S. Miller
a263b30936 ipv4: Make neigh lookups directly in output packet path.
Do not use the dst cached neigh, we'll be getting rid of that.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 01:02:12 -07:00
Pablo Neira Ayuso
08911475d1 netfilter: nf_conntrack: generalize nf_ct_l4proto_net
This patch generalizes nf_ct_l4proto_net by splitting it into chunks and
moving the corresponding protocol part to where it really belongs to.

To clarify, note that we follow two different approaches to support per-net
depending if it's built-in or run-time loadable protocol tracker.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
2012-07-04 19:37:22 +02:00
Johannes Berg
a1845fc7c5 mac80211: add TX prepare API
Some drivers require setup before being able to send
management frames in managed mode, in particular in
multi-channel cases.

Introduce API to allow the drivers to do such setup
while being able to sleep waiting for the setup to
finish in the device. This isn't possible inside the
TX call since that can't sleep.

A future patch may also restructure the TX retry to
wait for the driver to report the frame status, as
suggested by Arik in
http://mid.gmane.org/CA+XVXffKSEL6ZQPQ98x-zO-NL2=TNF1uN==mprRyUmAaRn254g@mail.gmail.com

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-03 13:50:34 +02:00
Thomas Huehn
e3e1a0bcb3 mac80211: reduce IEEE80211_TX_MAX_RATES
IEEE80211_TX_MAX_RATES can be reduced from 5 to 4 as there
is no current hardware supporting a rate chain with 5 multi
rate stages (mrr), so 4 mrr stages are sufficient.

The memory that is freed within the ieee80211_tx_info struct
will be used in the upcoming Transmission Power Control (TPC)
implementation.

Suggested-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
[reword commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-03 13:48:37 +02:00
Johannes Berg
cb831b537d mac80211: remove tx_frags driver callback
The implementation of tx_frags is buggy due to
not handling queue stop, and there's no driver
implementing it so remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-02 15:40:18 +02:00
Vladimir Kondratiev
3a0c52a6d8 cfg80211: add 802.11ad (60gHz band) support
Add enumerations for both cfg80211 and nl80211.
This expands wiphy.bands etc. arrays.

Extend channel <-> frequency translation to cover 60g band
and modify the rate check logic since there are no legacy
mandatory rates (only MCS is used.)

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-02 15:11:10 +02:00
Neil Horman
4244854d22 sctp: be more restrictive in transport selection on bundled sacks
It was noticed recently that when we send data on a transport, its possible that
we might bundle a sack that arrived on a different transport.  While this isn't
a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
2960:

 An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
   etc.) to the same destination transport address from which it
   received the DATA or control chunk to which it is replying.  This
   rule should also be followed if the endpoint is bundling DATA chunks
   together with the reply chunk.

This patch seeks to correct that.  It restricts the bundling of sack operations
to only those transports which have moved the ctsn of the association forward
since the last sack.  By doing this we guarantee that we only bundle outbound
saks on a transport that has received a chunk since the last sack.  This brings
us into stricter compliance with the RFC.

Vlad had initially suggested that we strictly allow only sack bundling on the
transport that last moved the ctsn forward.  While this makes sense, I was
concerned that doing so prevented us from bundling in the case where we had
received chunks that moved the ctsn on multiple transports.  In those cases, the
RFC allows us to select any of the transports having received chunks to bundle
the sack on.  so I've modified the approach to allow for that, by adding a state
variable to each transport that tracks weather it has moved the ctsn since the
last sack.  This I think keeps our behavior (and performance), close enough to
our current profile that I think we can do this without a sysctl knob to
enable/disable it.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yaseivch <vyasevich@gmail.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Reported-by: Michele Baldessari <michele@redhat.com>
Reported-by: sorin serban <sserban@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-30 22:44:35 -07:00
Andrei Emeltchenko
38b3fef173 Bluetooth: Improve debugging messages for hci_conn
Improve debugging of hci_conn objects by: adding print to hci_conn
refcounting, adding object spcifier when missing, change conn to hcon
since conn is heavily used for l2cap_conn objects and this is misleading.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-30 11:41:24 -03:00
John W. Linville
8732baafc3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
2012-06-29 12:42:14 -04:00
Michal Kazior
2e165b8184 cfg80211/mac80211: remove .get_channel
We do not need it anymore since cfg80211 tracks
monitor channel and monitor channel type.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-29 13:39:18 +02:00
Michal Kazior
dbbae26afa cfg80211: track monitor interfaces count
Implements .set_monitor_enabled(wiphy, enabled).

Notifies driver upon change of interface layout.

If only monitor interfaces become present it is
called with 2nd argument being true. If
non-monitor interface appears then 2nd argument
is false. Driver is notified only upon change.

This makes it more obvious about the fact that
cfg80211 supports single monitor channel. Once we
implement multi-channel we don't want to allow
setting monitor channel while other interface
types are running. Otherwise it would be ambiguous
once we start considering num_different_channels.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-29 13:39:16 +02:00
Michal Kazior
c30a3d3868 cfg80211: track ibss fixed channel
IBSS may hop between channels. It is necessary to
account this special case when considering
interface combinations.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-29 13:39:15 +02:00
Michal Kazior
f4489ebeff cfg80211: add channel tracking for AP and mesh
We need to know which channel is used by a running
AP and mesh for channel context accounting and
finding matching/active interface combination.

STA/IBSS have current_bss already which allows us
to check which channel a vif is tuned to.
Non-fixed channel IBSS can be handled with
additional changes.

Monitor mode is going to be handled differently.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-29 13:39:15 +02:00
David S. Miller
7a9bc9b81a ipv4: Elide fib_validate_source() completely when possible.
If rpfilter is off (or the SKB has an IPSEC path) and there are not
tclassid users, we don't have to do anything at all when
fib_validate_source() is invoked besides setting the itag to zero.

We monitor tclassid uses with a counter (modified only under RTNL and
marked __read_mostly) and we protect the fib_validate_source() real
work with a test against this counter and whether rpfilter is to be
done.

Having a way to know whether we need no tclassid processing or not
also opens the door for future optimized rpfilter algorithms that do
not perform full FIB lookups.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-29 01:36:36 -07:00
Ville Nuorvala
d0087b29f7 ipv6_tunnel: Allow receiving packets on the fallback tunnel if they pass sanity checks
At Facebook, we do Layer-3 DSR via IP-in-IP tunneling. Our load balancers wrap
an extra IP header on incoming packets so they can be routed to the backend.
In the v4 tunnel driver, when these packets fall on the default tunl0 device,
the behavior is to decapsulate them and drop them back on the stack. So our
setup is that tunl0 has the VIP and eth0 has (obviously) the backend's real
address.

In IPv6 we do the same thing, but the v6 tunnel driver didn't have this same
behavior - if you didn't have an explicit tunnel setup, it would drop the
packet.

This patch brings that v4 feature to the v6 driver.

The same IPv6 address checks are performed as with any normal tunnel,
but as the fallback tunnel endpoint addresses are unspecified, the checks
must be performed on a per-packet basis, rather than at tunnel
configuration time.

[Patch description modified by phil@ipom.com]

Signed-off-by: Ville Nuorvala <ville.nuorvala@gmail.com>
Tested-by: Phil Dibowitz <phil@ipom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-29 00:52:32 -07:00
David S. Miller
9e56e3800e ipv4: Adjust in_dev handling in fib_validate_source()
Checking for in_dev being NULL is pointless.

In fact, all of our callers have in_dev precomputed already,
so just pass it in and remove the NULL checking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-28 18:54:02 -07:00
Thomas Graf
58050fce35 net: Use NLMSG_DEFAULT_SIZE in combination with nlmsg_new()
Using NLMSG_GOODSIZE results in multiple pages being used as
nlmsg_new() will automatically add the size of the netlink
header to the payload thus exceeding the page limit.

NLMSG_DEFAULT_SIZE takes this into account.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Sergey Lapin <slapin@ossfans.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-28 17:56:43 -07:00
Neal Cardwell
3840a06e60 tcp: pass fl6 to inet6_csk_route_req()
This commit changes inet_csk_route_req() so that it uses a pointer to
a struct flowi6, rather than allocating its own on the stack. This
brings its behavior in line with its IPv4 cousin,
inet_csk_route_req(), and allows a follow-on patch to fix a dst leak.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-28 17:53:50 -07:00
Johannes Berg
b1fbd46976 Merge remote-tracking branch 'wireless-next/master' into mac80211-next 2012-06-28 13:45:58 +02:00
Mahesh Palivela
bf0c111ec8 cfg80211: allow advertising VHT capabilities
Allow drivers to advertise their VHT capabilities
and export them to userspace via nl80211.

Signed-off-by: Mahesh Palivela <maheshp@posedge.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-28 13:08:34 +02:00
David S. Miller
41347dcdd8 ipv4: Kill rt->rt_spec_dst, no longer used.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-28 04:05:27 -07:00
David S. Miller
35ebf65e85 ipv4: Create and use fib_compute_spec_dst() helper.
The specific destination is the host we direct unicast replies to.
Usually this is the original packet source address, but if we are
responding to a multicast or broadcast packet we have to use something
different.

Specifically we must use the source address we would use if we were to
send a packet to the unicast source of the original packet.

The routing cache precomputes this value, but we want to remove that
precomputation because it creates a hard dependency on the expensive
rpfilter source address validation which we'd like to make cheaper.

There are only three places where this matters:

1) ICMP replies.

2) pktinfo CMSG

3) IP options

Now there will be no real users of rt->rt_spec_dst and we can simply
remove it altogether.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-28 03:59:11 -07:00
David S. Miller
70e7341673 ipv4: Show that ip_send_reply() is purely unicast routine.
Rename it to ip_send_unicast_reply() and add explicit 'saddr'
argument.

This removed one of the few users of rt->rt_spec_dst.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-28 03:21:41 -07:00
Johannes Berg
fc8a7321d3 mac80211: don't expose ieee80211_add_srates_ie()
This and ieee80211_add_ext_srates_ie() aren't
exported, so can't be used by drivers anyway,
but there's also no reason that they should be
so make them private to mac80211 and use sdata
instead of vif arguments.

Acked-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-28 10:35:50 +02:00
David S. Miller
160eb5a6b1 ipv4: Kill early demux method return value.
It's completely unnecessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27 22:01:22 -07:00
David S. Miller
1d1e34ddd4 xfrm_user: Propagate netlink error codes properly.
Instead of using a fixed value of "-1" or "-EMSGSIZE", propagate what
the nla_*() interfaces actually return.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27 21:57:03 -07:00
David S. Miller
c10237e077 Revert "ipv4: tcp: dont cache unconfirmed intput dst"
This reverts commit c074da2810.

This change has several unwanted side effects:

1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll
   thus never create a real cached route.

2) All TCP traffic will use DST_NOCACHE and never use the routing
   cache at all.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27 17:05:06 -07:00
Eric Dumazet
c074da2810 ipv4: tcp: dont cache unconfirmed intput dst
DDOS synflood attacks hit badly IP route cache.

On typical machines, this cache is allowed to hold up to 8 Millions dst
entries, 256 bytes for each, for a total of 2GB of memory.

rt_garbage_collect() triggers and tries to cleanup things.

Eventually route cache is disabled but machine is under fire and might
OOM and crash.

This patch exploits the new TCP early demux, to set a nocache
boolean in case incoming TCP frame is for a not yet ESTABLISHED or
TIMEWAIT socket.

This 'nocache' boolean is then used in case dst entry is not found in
route cache, to create an unhashed dst entry (DST_NOCACHE)

SYN-cookie-ACK sent use a similar mechanism (ipv4: tcp: dont cache
output dst for syncookies), so after this patch, a machine is able to
absorb a DDOS synflood attack without polluting its IP route cache.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27 15:34:24 -07:00
Gao feng
f28997e27a netfilter: nf_conntrack: add nf_ct_kfree_compat_sysctl_table
This patch is a cleanup.

It adds nf_ct_kfree_compat_sysctl_table to release l4proto's
compat sysctl table and set the compat sysctl table point to NULL.

This new function will be used by follow-up patches.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 18:36:25 +02:00
Gao feng
f1caad2745 netfilter: nf_conntrack: prepare l4proto->init_net cleanup
l4proto->init contain quite redundant code. We can simplify this
by adding a new parameter l3proto.

This patch prepares that code simplification.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 18:31:14 +02:00
Johannes Berg
dfb89c56ad cfg80211: don't allow WoWLAN support without CONFIG_PM
When CONFIG_PM is disabled, no device can possibly
support WoWLAN since it can't go to sleep to start
with. Due to this, mac80211 had even rejected the
hardware registration. By making all the code and
data for WoWLAN depend on CONFIG_PM we can promote
this runtime error to a compile-time error.

Add #ifdef around all WoWLAN code to remove it in
systems that don't need it as they never suspend.

Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Acked-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-27 17:55:11 +02:00
alex.bluesman.smirnov@gmail.com
32bad7e30f mac802154: add wpan device-class support
Every real 802.15.4 transceiver, which works with software MAC layer,
can be classified as a wpan device in this stack. So the wpan device
implementation provides missing link in datapath between the device
drivers and the Linux network queue.

According to the IEEE 802.15.4 standard each packet can be one of the
following types:
 - beacon
 - MAC layer command
 - ACK
 - data

This patch adds support for the data packet-type only, but this is
enough to perform data transmission and receiving over radio.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-26 21:06:11 -07:00
Greg Kroah-Hartman
fc915c8b93 Merge 3.5-rc4 into tty-next
This is to pick up the serial port and tty changes in Linus's tree to allow
everyone to sync up.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-26 16:04:29 -07:00
John W. Linville
2c443443e7 Merge branch 'for-john' of git://git.sipsolutions.net/mac80211-next 2012-06-26 14:27:34 -04:00
Thomas Pedersen
88e920b450 nl80211: specify RSSI threshold in scheduled scan
Support configuring an RSSI threshold in dBm (s32) when requesting
scheduled scan, below which a BSS won't be reported by the cfg80211
driver.

Signed-off-by: Thomas Pedersen <c_tpeder@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-26 09:32:28 +02:00
Sjur Brændeland
91fa0cbc0c caif-hsi: Remove use of module parameters
Remove use of module parameters on caif hsi device, as
rtnl configuration parameters are already supported.

All caif hsi configuration data is put in cfhsi_config,
and default values in hsi_default_config.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:44:12 -07:00
Sjur Brændeland
1c385f1fdf caif-hsi: Replace platform device with ops structure.
Remove use of struct platform_device, and replace it with
struct cfhsi_ops. Updated variable names in the same
spirit:
cfhsi_get_dev to cfhsi_get_ops,
cfhsi->dev to cfhsi->ops and,
cfhsi->dev.drv to cfhsi->ops->cb_ops.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:44:12 -07:00
Sjur Brændeland
c412540063 caif-hsi: Add rtnl support
Add RTNL support for managing the caif hsi interface.
The HSI HW interface is no longer registering as a device,
instead we use symbol_get to get hold of the HSI API.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:44:12 -07:00
Eric Dumazet
deaa58542b net: struct sock cleanups
Add missing kernel doc for sk_rx_dst

Move sk_rx_dst to avoid two 32bit holes on 64bit arches

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:09:18 -07:00
Vijay Subramanian
efc27f8cee net: Remove 'unlikely' qualifier in skb_steal_sock()
With early demux enabled by default for TCP flows, there is high chance that
skb->sk will be non-null. 'unlikely()' was removed from __inet_lookup_skb() but
maybe it can be removed from skb_steal_sock() as well.

Note: skb_steal_sock() is also called by __inet6_lookup_skb() and
__udp4_lib_lookup_skb() but they are protected by their own 'unlikely' calls.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:08:36 -07:00
David S. Miller
e486463e82 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/qmi_wwan.c
	net/batman-adv/translation-table.c
	net/ipv6/route.c

qmi_wwan.c resolution provided by Bjørn Mork.

batman-adv conflict is dealing merely with the changes
of global function names to have a proper subsystem
prefix.

ipv6's route.c conflict is merely two side-by-side additions
of network namespace methods.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 15:50:32 -07:00
Johannes Berg
bdcbd8e0e3 mac80211: clean up debugging
There are a few things that make the logging and
debugging in mac80211 less useful than it should
be right now:
 * a lot of messages should be pr_info, not pr_debug
 * wholesale use of pr_debug makes it require *both*
   Kconfig and dynamic configuration
 * there are still a lot of ifdefs
 * the style is very inconsistent, sometimes the
   sdata->name is printed in front

Clean up everything, introducing new macros and
separating out the station MLME debugging into
a new Kconfig symbol.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-24 11:32:29 +02:00
Eric Dumazet
7586eceb0a ipv4: tcp: dont cache output dst for syncookies
Don't cache output dst for syncookies, as this adds pressure on IP route
cache and rcu subsystem for no gain.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-22 21:47:33 -07:00
Alexander Duyck
6648bd7e0e ipv4: Add sysctl knob to control early socket demux
This change is meant to add a control for disabling early socket demux.
The main motivation behind this patch is to provide an option to disable
the feature as it adds an additional cost to routing that reduces overall
throughput by up to 5%.  For example one of my systems went from 12.1Mpps
to 11.6 after the early socket demux was added.  It looks like the reason
for the regression is that we are now having to perform two lookups, first
the one for an established socket, and then the one for the routing table.

By adding this patch and toggling the value for ip_early_demux to 0 I am
able to get back to the 12.1Mpps I was previously seeing.

[ Move local variables in ip_rcv_finish() down into the basic
  block in which they are actually used.  -DaveM ]

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-22 17:11:13 -07:00
John W. Linville
133189a46c Merge branch 'for-john' of git://git.sipsolutions.net/mac80211-next 2012-06-22 14:39:53 -04:00
Victor Goldenshtein
66572cfc30 mac80211: add command to get current rssi
Get current rssi (in dBm) from the driver/FW.

Instead of reporting the signal received in the last
rx packet, which might be inaccurate if rx traffic is
low and beacon filtering is enabled, get the signal
from the driver/FW.

Signed-off-by: Victor Goldenshtein <victorg@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-21 16:42:17 +02:00
David S. Miller
41063e9dd1 ipv4: Early TCP socket demux.
Input packet processing for local sockets involves two major demuxes.
One for the route and one for the socket.

But we can optimize this down to one demux for certain kinds of local
sockets.

Currently we only do this for established TCP sockets, but it could
at least in theory be expanded to other kinds of connections.

If a TCP socket is established then it's identity is fully specified.

This means that whatever input route was used during the three-way
handshake must work equally well for the rest of the connection since
the keys will not change.

Once we move to established state, we cache the receive packet's input
route to use later.

Like the existing cached route in sk->sk_dst_cache used for output
packets, we have to check for route invalidations using dst->obsolete
and dst->ops->check().

Early demux occurs outside of a socket locked section, so when a route
invalidation occurs we defer the fixup of sk->sk_rx_dst until we are
actually inside of established state packet processing and thus have
the socket locked.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-19 21:22:05 -07:00
David S. Miller
f9242b6b28 inet: Sanitize inet{,6} protocol demux.
Don't pretend that inet_protos[] and inet6_protos[] are hashes, thay
are just a straight arrays.  Remove all unnecessary hash masking.

Document MAX_INET_PROTOS.

Use RAW_HTABLE_SIZE when appropriate.

Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-19 18:56:21 -07:00
David S. Miller
a77f4b4acf Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John Linville says:

====================
This is a sizeable batch of updates intended for 3.6...

The bulk of the changes here are Bluetooth.  Gustavo says:

	Here goes the first Bluetooth pull request for 3.6, we have
	queued quite a lot of work. Andrei Emeltchenko added the AMP
	Manager code, a lot of work is needed, but the first bit are
	already there. This code is disabled by default.  Mat Martineau
	changed the whole L2CAP ERTM state machine code, replacing
	the old one with a new implementation. Besides that we had
	lot of coding style fixes (to follow net rules), more l2cap
	core separation from socket and many clean ups and fixed all
	over the tree.

Along with the above, there is a healthy dose of ath9k, iwlwifi,
and other driver updates.  There is also another pull from the
wireless tree to resolve some merge issues.  I also fixed-up some
merge discrepencies between net-next and wireless-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-19 14:37:15 -07:00
John W. Linville
b3c911eeb4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/dvm/testmode.c
	drivers/net/wireless/iwlwifi/pcie/trans.c
2012-06-19 14:41:22 -04:00
Pablo Neira Ayuso
674147e211 netfilter: fix missing symbols if CONFIG_NETFILTER_NETLINK_QUEUE_CT unset
ERROR: "nfqnl_ct_parse" [net/netfilter/nfnetlink_queue.ko] undefined!
ERROR: "nfqnl_ct_seq_adjust" [net/netfilter/nfnetlink_queue.ko] undefined!
ERROR: "nfqnl_ct_put" [net/netfilter/nfnetlink_queue.ko] undefined!
ERROR: "nfqnl_ct_get" [net/netfilter/nfnetlink_queue.ko] undefined!

We have to use CONFIG_NETFILTER_NETLINK_QUEUE_CT in
include/net/netfilter/nfnetlink_queue.h, not CONFIG_NF_CONNTRACK.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-18 21:09:17 -07:00
Andrei Emeltchenko
9345d40c58 Bluetooth: Use AUTO_OFF constant in jiffies
Move AUTO_OFF_TIMEOUT to other constants changing name to
HCI_AUTO_OFF_TIMEOUT and convert to jiffies.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-19 00:12:37 -03:00
Pablo Neira Ayuso
7c62234547 netfilter: nfnetlink_queue: fix compilation with NF_CONNTRACK disabled
In "9cb0176 netfilter: add glue code to integrate nfnetlink_queue and ctnetlink"
the compilation with NF_CONNTRACK disabled is broken. This patch fixes this
issue.

I have moved the conntrack part into nfnetlink_queue_ct.c to avoid
peppering the entire nfnetlink_queue.c code with ifdefs.

I also needed to rename nfnetlink_queue.c to nfnetlink_queue_pkt.c
to update the net/netfilter/Makefile to support conditional compilation
of the conntrack integration.

This patch also adds CONFIG_NETFILTER_QUEUE_CT in case you want to explicitly
disable the integration between nf_conntrack and nfnetlink_queue.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-19 04:44:57 +02:00
John W. Linville
8cfe523a12 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-06-18 15:13:27 -04:00
Chun-Yeow Yeoh
728b19e5fb {nl,cfg,mac}80211: implement dot11MeshHWMPconfirmationInterval
As defined in section 13.10.9.3 Case D (802.11-2012), this
control variable is used to limit the mesh STA to send only
one PREQ to a root mesh STA within this interval of time
(in TUs). The default value for this variable is set to
2000 TUs. However, for current implementation, the maximum
configurable of dot11MeshHWMPconfirmationInterval is
restricted by dot11MeshHWMPactivePathTimeout.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
[line-break commit log]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-18 13:55:15 +02:00
Rémi Denis-Courmont
31fdc5553b net: remove my future former mail address
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
Cc: Sakari Ailus <sakari.ailus@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-17 16:29:38 -07:00
David S. Miller
82f437b950 Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo says:

====================
This is the second batch of Netfilter updates for net-next. It contains the
kernel changes for the new user-space connection tracking helper
infrastructure.

More details on this infrastructure are provides here:
http://lwn.net/Articles/500196/

Still, I plan to provide some official documentation through the
conntrack-tools user manual on how to setup user-space utilities for this.
So far, it provides two helper in user-space, one for NFSv3 and another for
Oracle/SQLnet/TNS. Yet in my TODO list.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-16 15:23:35 -07:00
Eldad Zack
7f95e1880e include/net/dst.h: neaten asterisk placement
Fix code style - place the asterisk where it belongs.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-16 15:20:35 -07:00
Pablo Neira Ayuso
12f7a50533 netfilter: add user-space connection tracking helper infrastructure
There are good reasons to supports helpers in user-space instead:

* Rapid connection tracking helper development, as developing code
  in user-space is usually faster.

* Reliability: A buggy helper does not crash the kernel. Moreover,
  we can monitor the helper process and restart it in case of problems.

* Security: Avoid complex string matching and mangling in kernel-space
  running in privileged mode. Going further, we can even think about
  running user-space helpers as a non-root process.

* Extensibility: It allows the development of very specific helpers (most
  likely non-standard proprietary protocols) that are very likely not to be
  accepted for mainline inclusion in the form of kernel-space connection
  tracking helpers.

This patch adds the infrastructure to allow the implementation of
user-space conntrack helpers by means of the new nfnetlink subsystem
`nfnetlink_cthelper' and the existing queueing infrastructure
(nfnetlink_queue).

I had to add the new hook NF_IP6_PRI_CONNTRACK_HELPER to register
ipv[4|6]_helper which results from splitting ipv[4|6]_confirm into
two pieces. This change is required not to break NAT sequence
adjustment and conntrack confirmation for traffic that is enqueued
to our user-space conntrack helpers.

Basic operation, in a few steps:

1) Register user-space helper by means of `nfct':

 nfct helper add ftp inet tcp

 [ It must be a valid existing helper supported by conntrack-tools ]

2) Add rules to enable the FTP user-space helper which is
   used to track traffic going to TCP port 21.

For locally generated packets:

 iptables -I OUTPUT -t raw -p tcp --dport 21 -j CT --helper ftp

For non-locally generated packets:

 iptables -I PREROUTING -t raw -p tcp --dport 21 -j CT --helper ftp

3) Run the test conntrackd in helper mode (see example files under
   doc/helper/conntrackd.conf

 conntrackd

4) Generate FTP traffic going, if everything is OK, then conntrackd
   should create expectations (you can check that with `conntrack':

 conntrack -E expect

    [NEW] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp
[DESTROY] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp

This confirms that our test helper is receiving packets including the
conntrack information, and adding expectations in kernel-space.

The user-space helper can also store its private tracking information
in the conntrack structure in the kernel via the CTA_HELP_INFO. The
kernel will consider this a binary blob whose layout is unknown. This
information will be included in the information that is transfered
to user-space via glue code that integrates nfnetlink_queue and
ctnetlink.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:40:02 +02:00
Pablo Neira Ayuso
ae243bee39 netfilter: ctnetlink: add CTA_HELP_INFO attribute
This attribute can be used to modify and to dump the internal
protocol information.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:09:15 +02:00
Pablo Neira Ayuso
8c88f87cb2 netfilter: nfnetlink_queue: add NAT TCP sequence adjustment if packet mangled
User-space programs that receive traffic via NFQUEUE may mangle packets.
If NAT is enabled, this usually puzzles sequence tracking, leading to
traffic disruptions.

With this patch, nfnl_queue will make the corresponding NAT TCP sequence
adjustment if:

1) The packet has been mangled,
2) the NFQA_CFG_F_CONNTRACK flag has been set, and
3) NAT is detected.

There are some records on the Internet complaning about this issue:
http://stackoverflow.com/questions/260757/packet-mangling-utilities-besides-iptables

By now, we only support TCP since we have no helpers for DCCP or SCTP.
Better to add this if we ever have some helper over those layer 4 protocols.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:09:08 +02:00
Pablo Neira Ayuso
1afc56794e netfilter: nf_ct_helper: implement variable length helper private data
This patch uses the new variable length conntrack extensions.

Instead of using union nf_conntrack_help that contain all the
helper private data information, we allocate variable length
area to store the private helper data.

This patch includes the modification of all existing helpers.
It also includes a couple of include header to avoid compilation
warnings.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:08:55 +02:00
Pablo Neira Ayuso
3cf4c7e381 netfilter: nf_ct_ext: support variable length extensions
We can now define conntrack extensions of variable size. This
patch is useful to get rid of these unions:

union nf_conntrack_help
union nf_conntrack_proto
union nf_conntrack_nat_help

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:08:49 +02:00
Pablo Neira Ayuso
3a8fc53a45 netfilter: nf_ct_helper: allocate 16 bytes for the helper and policy names
This patch modifies the struct nf_conntrack_helper to allocate
the room for the helper name. The maximum length is 16 bytes
(this was already introduced in 2.6.24).

For the maximum length for expectation policy names, I have
also selected 16 bytes.

This patch is required by the follow-up patch to support
user-space connection tracking helpers.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:08:39 +02:00
David S. Miller
aee289baaa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv6/route.c

Pull in 'net' again to get the revert of Thomas's change
which introduced regressions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-16 01:23:04 -07:00
David S. Miller
e8803b6c38 Revert "ipv6: Prevent access to uninitialized fib_table_hash via /proc/net/ipv6_route"
This reverts commit 2a0c451ade.

It causes crashes, because now ip6_null_entry is used before
it is initialized.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-16 01:12:19 -07:00
David S. Miller
7e52b33bd5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv6/route.c

This deals with a merge conflict between the net-next addition of the
inetpeer network namespace ops, and Thomas Graf's bug fix in
2a0c451ade which makes sure we don't
register /proc/net/ipv6_route before it is actually safe to do so.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15 15:51:55 -07:00
Thomas Graf
2a0c451ade ipv6: Prevent access to uninitialized fib_table_hash via /proc/net/ipv6_route
/proc/net/ipv6_route reflects the contents of fib_table_hash. The proc
handler is installed in ip6_route_net_init() whereas fib_table_hash is
allocated in fib6_net_init() _after_ the proc handler has been installed.

This opens up a short time frame to access fib_table_hash with its pants
down.

fib6_init() as a whole can't be moved to an earlier position as it also
registers the rtnetlink message handlers which should be registered at
the end. Therefore split it into fib6_init() which is run early and
fib6_init_late() to register the rtnetlink message handlers.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15 15:30:15 -07:00
David S. Miller
81aded2467 ipv6: Handle PMTU in ICMP error handlers.
One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts
to handle the error pass the 32-bit info cookie in network byte order
whereas ipv4 passes it around in host byte order.

Like the ipv4 side, we have two helper functions.  One for when we
have a socket context and one for when we do not.

ip6ip6 tunnels are not handled here, because they handle PMTU events
by essentially relaying another ICMP packet-too-big message back to
the original sender.

This patch allows us to get rid of rt6_do_pmtu_disc().  It handles all
kinds of situations that simply cannot happen when we do the PMTU
update directly using a fully resolved route.

In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very
likely be removed or changed into a BUG_ON() check.  We should never
have a prefixed ipv6 route when we get there.

Another piece of strange history here is that TCP and DCCP, unlike in
ipv4, never invoke the update_pmtu() method from their ICMP error
handlers.  This is incredibly astonishing since this is the context
where we have the most accurate context in which to make a PMTU
update, namely we have a fully connected socket and associated cached
socket route.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15 14:54:11 -07:00
David S. Miller
3639339553 ipv4: Handle PMTU in all ICMP error handlers.
With ip_rt_frag_needed() removed, we have to explicitly update PMTU
information in every ICMP error handler.

Create two helper functions to facilitate this.

1) ipv4_sk_update_pmtu()

   This updates the PMTU when we have a socket context to
   work with.

2) ipv4_update_pmtu()

   Raw version, used when no socket context is available.  For this
   interface, we essentially just pass in explicit arguments for
   the flow identity information we would have extracted from the
   socket.

   And you'll notice that ipv4_sk_update_pmtu() is simply implemented
   in terms of ipv4_update_pmtu()

Note that __ip_route_output_key() is used, rather than something like
ip_route_output_flow() or ip_route_output_key().  This is because we
absolutely do not want to end up with a route that does IPSEC
encapsulation and the like.  Instead, we only want the route that
would get us to the node described by the outermost IP header.

Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-14 22:22:07 -07:00
Chun-Yeow Yeoh
ac1073a61d {nl,cfg,mac}80211: implement dot11MeshHWMProotInterval and dot11MeshHWMPactivePathToRootTimeout
Add the mesh configuration parameters dot11MeshHWMProotInterval
and dot11MeshHWMPactivePathToRootTimeout to be used by
proactive PREQ mechanism.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
[line-break commit log]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-14 09:08:22 +02:00
John W. Linville
211c17aaee Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
Conflicts:
	drivers/net/wireless/ath/ath9k/main.c
	net/bluetooth/hci_event.c
2012-06-13 15:35:35 -04:00
John W. Linville
ec8eb9ae58 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2012-06-13 15:12:07 -04:00
John W. Linville
1f7e010282 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2012-06-13 14:05:40 -04:00
Johannes Berg
73c3df3ba3 cfg80211/nl80211: fix kernel-doc
Add missing entries to nl80211.h and fix
the kernel-doc notation in cfg80211.h.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-13 11:17:14 +02:00
David S. Miller
43b03f1f6d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	MAINTAINERS
	drivers/net/wireless/iwlwifi/pcie/trans.c

The iwlwifi conflict was resolved by keeping the code added
in 'net' that turns off the buggy chip feature.

The MAINTAINERS conflict was merely overlapping changes, one
change updated all the wireless web site URLs and the other
changed some GIT trees to be Johannes's instead of John's.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-12 21:59:18 -07:00
Jefferson Delfes
af7985bf85 Bluetooth: Fix flags of mgmt_device_found event
Change flags field to matches userspace structure.
This field needs to be converted to little endian before forward it.

Signed-off-by: Jefferson Delfes <jefferson.delfes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-12 23:19:21 -03:00
Jiri Slaby
62f228acb8 TTY: ircomm, use tty from tty_port
This also includes a switch to tty refcounting. It makes sure, the
code no longer can access a freed TTY struct.

Sometimes the only thing needed is to pass tty down to the callies.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-12 15:50:24 -07:00
Jiri Slaby
e673927d8a TTY: ircomm, revamp locking
Use self->spinlock only for ctrl_skb and tx_skb. TTY stuff is now
protected by tty_port->lock. This is needed for further cleanup (and
conversion to tty_port helpers).

This also closes the race in the end of close.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-12 15:50:23 -07:00
Jiri Slaby
849d5a997f TTY: ircomm, use flags from tty_port
Switch to tty_port->flags. And while at it, remove redefined flags for
them.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-12 15:50:23 -07:00
Jiri Slaby
580d27b449 TTY: ircomm, use open counts from tty_port
Switch to tty_port->count and blocked_open.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-12 15:50:23 -07:00
Jiri Slaby
2a0213cb1e TTY: ircomm, use close times from tty_port
Switch to tty_port->close_delay and closing_wait.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-12 15:50:23 -07:00
Jiri Slaby
a3cc9fcff8 TTY: ircomm, add tty_port
And use close/open_wait from there.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-12 15:50:23 -07:00
Eric Dumazet
5ee31c6898 bonding: Fix corrupted queue_mapping
In the transmit path of the bonding driver, skb->cb is used to
stash the skb->queue_mapping so that the bonding device can set its
own queue mapping.  This value becomes corrupted since the skb->cb is
also used in __dev_xmit_skb.

When transmitting through bonding driver, bond_select_queue is
called from dev_queue_xmit.  In bond_select_queue the original
skb->queue_mapping is copied into skb->cb (via bond_queue_mapping)
and skb->queue_mapping is overwritten with the bond driver queue.

Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes
the packet length into skb->cb, thereby overwriting the stashed
queue mappping.  In bond_dev_queue_xmit (called from hard_start_xmit),
the queue mapping for the skb is set to the stashed value which is now
the skb length and hence is an invalid queue for the slave device.

If we want to save skb->queue_mapping into skb->cb[], best place is to
add a field in struct qdisc_skb_cb, to make sure it wont conflict with
other layers (eg : Qdiscc, Infiniband...)

This patchs also makes sure (struct qdisc_skb_cb)->data is aligned on 8
bytes :

netem qdisc for example assumes it can store an u64 in it, without
misalignment penalty.

Note : we only have 20 bytes left in (struct qdisc_skb_cb)->data[].
The largest user is CHOKe and it fills it.

Based on a previous patch from Tom Herbert.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Tom Herbert <therbert@google.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-12 15:29:21 -07:00
John W. Linville
0440507bbc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-06-12 14:25:04 -04:00
Andrei Emeltchenko
5f246e8905 Bluetooth: Update HCI timeouts constants to use msecs_to_jiffies
The HCI constants are always used in form of jiffies. So just
include the conversion from msecs in the define itself. This has the
advantage of making the code where the timeout is used more readable
and avoiding unnecessary conversions.

The patch is similar to commit ba13ccd9 doing the same job for L2CAP

Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-12 00:07:05 -03:00
Gustavo Padovan
cbe461c526 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Conflicts:
	net/bluetooth/hci_event.c
2012-06-11 22:36:42 -03:00
David S. Miller
55afabaa0d inet: Fix BUG triggered by __rt{,6}_get_peer().
If no peer actually gets attached (either because create is zero or
the peer allocation fails) we'll trigger a BUG because we
unconditionally do an rt{,6}_peer_ptr() afterwards.

Fix this by guarding it with the proper check.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 15:52:29 -07:00
David S. Miller
67da255210 Merge branch 'master' of git://1984.lsi.us.es/net-next 2012-06-11 12:56:14 -07:00
John W. Linville
2e48686835 Merge tag 'nfc-next-3.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-3.0 2012-06-11 14:46:04 -04:00
David S. Miller
7b34ca2ac7 inet: Avoid potential NULL peer dereference.
We handle NULL in rt{,6}_set_peer but then our caller will try to pass
that NULL pointer into inet_putpeer() which isn't ready for it.

Fix this by moving the NULL check one level up, and then remove the
now unnecessary NULL check from inetpeer_ptr_set_peer().

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 04:13:57 -07:00
David S. Miller
8e77327783 inet: Add inetpeer tree roots to the FIB tables.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 02:09:16 -07:00
David S. Miller
b48c80ece9 inet: Add family scope inetpeer flushes.
This implementation can deal with having many inetpeer roots, which is
a necessary prerequisite for per-FIB table rooted peer tables.

Each family (AF_INET, AF_INET6) has a sequence number which we bump
when we get a family invalidation request.

Each peer lookup cheaply checks whether the flush sequence of the
root we are using is out of date, and if so flushes it and updates
the sequence number.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 02:09:10 -07:00
David S. Miller
46517008e1 ipv4: Kill ip_rt_frag_needed().
There is zero point to this function.

It's only real substance is to perform an extremely outdated BSD4.2
ICMP check, which we can safely remove.  If you really have a MTU
limited link being routed by a BSD4.2 derived system, here's a nickel
go buy yourself a real router.

The other actions of ip_rt_frag_needed(), checking and conditionally
updating the peer, are done by the per-protocol handlers of the ICMP
event.

TCP, UDP, et al. have a handler which will receive this event and
transmit it back into the associated route via dst_ops->update_pmtu().

This simplification is important, because it eliminates the one place
where we do not have a proper route context in which to make an
inetpeer lookup.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 02:08:59 -07:00
David S. Miller
97bab73f98 inet: Hide route peer accesses behind helpers.
We encode the pointer(s) into an unsigned long with one state bit.

The state bit is used so we can store the inetpeer tree root to use
when resolving the peer later.

Later the peer roots will be per-FIB table, and this change works to
facilitate that.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 02:08:47 -07:00
Chun-Yeow Yeoh
a4f606ea73 {nl,cfg,mac}80211: fix the coding style related to mesh parameters
fix the coding style related to mesh parameters, especially the indentation,
as pointed out by Johannes Berg.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-11 09:23:45 +02:00
Chun-Yeow Yeoh
3ddd53f392 cfg80211: add missing kernel-doc for mesh configuration structure
Add the missing kernel-doc for mesh configuration parameters as pointed
out by Johannes Berg.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-11 09:22:57 +02:00
Roland Dreier
c5d21c4b2a net: Reorder initialization in ip_route_output to fix gcc warning
If I build with W=1, for every file that includes <net/route.h>, I get the warning

    include/net/route.h: In function 'ip_route_output':
    include/net/route.h:135:3: warning: initialized field overwritten [-Woverride-init]
    include/net/route.h:135:3: warning: (near initialization for 'fl4') [-Woverride-init]

(This is with "gcc (Debian 4.6.3-1) 4.6.3")

A fix seems pretty trivial: move the initialization of .flowi4_tos
earlier.  As far as I can tell, this has no effect on code generation.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 00:04:47 -07:00
David S. Miller
c0efc887dc inet: Pass inetpeer root into inet_getpeer*() interfaces.
Otherwise we reference potentially non-existing members when
ipv6 is disabled.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09 19:12:36 -07:00
David S. Miller
56a6b248eb inet: Consolidate inetpeer_invalidate_tree() interfaces.
We only need one interface for this operation, since we always know
which inetpeer root we want to flush.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09 16:32:41 -07:00
David S. Miller
c3426b4719 inet: Initialize per-netns inetpeer roots in net/ipv{4,6}/route.c
Instead of net/ipv4/inetpeer.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09 16:27:05 -07:00
David S. Miller
2397849baa [PATCH] tcp: Cache inetpeer in timewait socket, and only when necessary.
Since it's guarenteed that we will access the inetpeer if we're trying
to do timewait recycling and TCP options were enabled on the
connection, just cache the peer in the timewait socket.

In the future, inetpeer lookups will be context dependent (per routing
realm), and this helps facilitate that as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09 14:56:12 -07:00
Johannes Berg
d13e141481 mac80211: add some missing kernel-doc
Add a few kernel-doc descriptions that were missed
during development.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-06-09 10:31:09 +02:00
David S. Miller
4670fd819e tcp: Get rid of inetpeer special cases.
The get_peer method TCP uses is full of special cases that make no
sense accommodating, and it also gets in the way of doing more
reasonable things here.

First of all, if the socket doesn't have a usable cached route, there
is no sense in trying to optimize timewait recycling.

Likewise for the case where we have IP options, such as SRR enabled,
that make the IP header destination address (and thus the destination
address of the route key) differ from that of the connection's
destination address.

Just return a NULL peer in these cases, and thus we're also able to
get rid of the clumsy inetpeer release logic.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09 01:25:47 -07:00
David S. Miller
fbfe95a42e inet: Create and use rt{,6}_get_peer_create().
There's a lot of places that open-code rt{,6}_get_peer() only because
they want to set 'create' to one.  So add an rt{,6}_get_peer_create()
for their sake.

There were also a few spots open-coding plain rt{,6}_get_peer() and
those are transformed here as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08 23:24:18 -07:00
Johan Hedberg
1c2e004183 Bluetooth: Add support for encryption key refresh
With LE/SMP the completion of a security level elavation from medium to
high is indicated by a HCI Encryption Key Refresh Complete event. The
necessary behavior upon receiving this event is a mix of what's done for
auth_complete and encryption_change, which is also where most of the
event handling code has been copied from.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-08 21:00:40 -03:00
Eric Dumazet
7123aaa3a1 af_unix: speedup /proc/net/unix
/proc/net/unix has quadratic behavior, and can hold unix_table_lock for
a while if high number of unix sockets are alive. (90 ms for 200k
sockets...)

We already have a hash table, so its quite easy to use it.

Problem is unbound sockets are still hashed in a single hash slot
(unix_socket_table[UNIX_HASH_TABLE])

This patch also spreads unbound sockets to 256 hash slots, to speedup
both /proc/net/unix and unix_diag.

Time to read /proc/net/unix with 200k unix sockets :
(time dd if=/proc/net/unix of=/dev/null bs=4k)

before : 520 secs
after : 2 secs

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08 14:27:23 -07:00
Gao feng
54db0cc2ba inetpeer: add parameter net for inet_getpeer_v4,v6
add struct net as a parameter of inet_getpeer_v[4,6],
use net to replace &init_net.

and modify some places to provide net for inet_getpeer_v[4,6]

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08 14:27:23 -07:00
Gao feng
c8a627ed06 inetpeer: add namespace support for inetpeer
now inetpeer doesn't support namespace,the information will
be leaking across namespace.

this patch move the global vars v4_peers and v6_peers to
netns_ipv4 and netns_ipv6 as a field peers.

add struct pernet_operations inetpeer_ops to initial pernet
inetpeer data.

and change family_to_base and inet_getpeer to support namespace.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08 14:27:23 -07:00
Gao feng
8264deb818 netfilter: nf_conntrack: add namespace support for cttimeout
This patch adds namespace support for cttimeout.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:41 +02:00
Pablo Neira Ayuso
e76d0af5e4 netfilter: nf_conntrack: remove now unused sysctl for nf_conntrack_l[3|4]proto
Since the sysctl data for l[3|4]proto now resides in pernet nf_proto_net.
We can now remove this unused fields from struct nf_contrack_l[3,4]proto.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:41 +02:00
Gao feng
7080ba0955 netfilter: nf_ct_icmp: add namespace support
This patch adds namespace support for ICMPv6 protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:40 +02:00
Gao feng
4b626b9c5d netfilter: nf_ct_icmp: add namespace support
This patch adds namespace support for ICMP protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:40 +02:00
Gao feng
0ce490ad43 netfilter: nf_ct_udp: add namespace support
This patch adds namespace support for UDP protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:40 +02:00
Gao feng
d2ba1fde42 netfilter: nf_ct_tcp: add namespace support
This patch adds namespace support for TCP protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:39 +02:00
Gao feng
15f585bd76 netfilter: nf_ct_generic: add namespace support
This patch adds namespace support for the generic layer 4 protocol
tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:39 +02:00
Gao feng
524a53e5ad netfilter: nf_conntrack: prepare namespace support for l3 protocol trackers
This patch prepares the namespace support for layer 3 protocol trackers.
Basically, this modifies the following interfaces:

* nf_ct_l3proto_[un]register_sysctl.
* nf_conntrack_l3proto_[un]register.

We add a new nf_ct_l3proto_net is used to get the pernet data of l3proto.

This adds rhe new struct nf_ip_net that is used to store the sysctl header
and l3proto_ipv4,l4proto_tcp(6),l4proto_udp(6),l4proto_icmp(v6) because the
protos such tcp and tcp6 use the same data,so making nf_ip_net as a field
of netns_ct is the easiest way to manager it.

This patch also adds init_net to struct nf_conntrack_l3proto to initial
the layer 3 protocol pernet data.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:39 +02:00
Gao feng
2c352f444c netfilter: nf_conntrack: prepare namespace support for l4 protocol trackers
This patch prepares the namespace support for layer 4 protocol trackers.
Basically, this modifies the following interfaces:

* nf_ct_[un]register_sysctl
* nf_conntrack_l4proto_[un]register

to include the namespace parameter. We still use init_net in this patch
to prepare the ground for follow-up patches for each layer 4 protocol
tracker.

We add a new net_id field to struct nf_conntrack_l4proto that is used
to store the pernet_operations id for each layer 4 protocol tracker.

Note that AF_INET6's protocols do not need to do sysctl compat. Thus,
we only register compat sysctl when l4proto.l3proto != AF_INET6.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:39 +02:00
Johannes Berg
2eb278e083 mac80211: unify SW/offload remain-on-channel
Redesign all the off-channel code, getting rid of
the generic off-channel work concept, replacing
it with a simple remain-on-channel list.

This fixes a number of small issues with the ROC
implementation:
 * offloaded remain-on-channel couldn't be queued,
   now we can queue it as well, if needed
 * in iwlwifi (the only user) offloaded ROC is
   mutually exclusive with scanning, use the new
   queue to handle that case -- I expect that it
   will later depend on a HW flag

The bigger issue though is that there's a bad bug
in the current implementation: if we get a mgmt
TX request while HW roc is active, and this new
request has a wait time, we actually schedule a
software ROC instead since we can't guarantee the
existing offloaded ROC will still be that long.
To fix this, the queuing mechanism was needed.

The queuing mechanism for offloaded ROC isn't yet
optimal, ideally we should add API to have the HW
extend the ROC if needed. We could add that later
but for now use a software implementation.

Overall, this unifies the behaviour between the
offloaded and software-implemented case as much
as possible.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-06 15:31:18 -04:00
Johannes Berg
196ac1c13d mac80211: do remain-on-channel while idle
The IDLE handling in HW off-channel is broken right
now since we turn off IDLE only when the off-channel
period already started. Therefore, all drivers that
use it today (only iwlwifi!) must support off-channel
while idle, so playing with idle isn't needed at all.

Off-channel in general, since it's no longer used for
authentication/association, shouldn't affect PS, so
also remove that logic.

Also document a small caveat for reporting TX status
from off-channel frames in HW remain-on-channel.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-06 15:20:33 -04:00
Johannes Berg
e8c9bd5b8d cfg80211: clarify set_channel APIs
Now that we've removed all uses of the set_channel
API except for the monitor channel and in libertas,
clarify this. Split the libertas mesh use into a
new libertas_set_mesh_channel() operation, just to
keep backward compatibility, and rename the normal
set_channel() to set_monitor_channel().

Also describe the desired set_monitor_channel()
semantics more clearly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-06 15:18:17 -04:00
Eric Dumazet
55432d2b54 inetpeer: fix a race in inetpeer_gc_worker()
commit 5faa5df1fa (inetpeer: Invalidate the inetpeer tree along with
the routing cache) added a race :

Before freeing an inetpeer, we must respect a RCU grace period, and make
sure no user will attempt to increase refcnt.

inetpeer_invalidate_tree() waits for a RCU grace period before inserting
inetpeer tree into gc_list and waking the worker. At that time, no
concurrent lookup can find a inetpeer in this tree.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-06 10:45:15 -07:00
Johannes Berg
cc1d2806bf cfg80211: provide channel to join_mesh function
Just like the AP mode patch, instead of setting
the channel and then joining the mesh network,
provide the channel to join the network on to
the join_mesh() function.

Like in AP mode, you can also give the channel
to the join-mesh nl80211 command now.

Unlike AP mode, it picks a default channel if
none was given.

As libertas uses mesh mode interfaces but has
no join_mesh callback and we can't simply break
it, keep some compatibility code for that case
and configure the channel directly for it.

In the non-libertas case, where we store the
channel until join, allow setting it while the
interface is down.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-05 15:32:18 -04:00
Johannes Berg
aa430da410 cfg80211: provide channel to start_ap function
Instead of setting the channel first and then
starting the AP, let cfg80211 store the channel
and provide it as one of the AP settings.

This means that now you have to set the channel
before you can start an AP interface, but since
hostapd/wpa_supplicant always do that we're OK
with this change.

Alternatively, it's now possible to give the
channel as an attribute to the start-ap nl80211
command, overriding any preset channel.

Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-05 15:32:16 -04:00
Johannes Berg
d58e7e37aa cfg80211: simplify cfg80211_can_beacon_sec_chan API
Change cfg80211_can_beacon_sec_chan() to return true
if there is no secondary channel to simplify all the
current users of it. They all check the channel type
before calling the function because it returns false
if there's no secondary channel.

Also actually document the return value.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-05 15:32:16 -04:00
Eliad Peller
51ca9d8db2 mac80211: remove ieee80211_get_operstate()
ieee80211_get_operstate() was used by drivers in order to
know whether the sta link is up, but it's no longer needed
(nor used) as mac80211 notifies the drivers about
authorization changes (via the sta_state callback)

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-05 15:32:10 -04:00
Joe Perches
499f42bb03 net: mac80211: Add and use ibss_vdbg debugging macro
Simplify the use of #ifdef CONFIG_MAC80211_IBSS_DEBUG/#endif
by adding a logging macro to encapsulate the test.

Convert the appropriate uses too.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-05 15:32:10 -04:00
Joe Perches
d63e9ae3b1 net: mac80211: Add and use ht_vdbg debugging macro
Simplify the use of #ifdef CONFIG_MAC80211_HT_DEBUG/#endif
by adding a logging macro to encapsulate the test.

Convert the appropriate uses too.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-05 15:32:10 -04:00
Arik Nemtsov
72d7872852 mac80211: allow low-level drivers to set netdev feature bits
Low level drivers can now set certain netdev feature bits in
netdev_features member of the ieee80211_hw struct. These will be
propagated to every netdev created from this HW.

The white-listed features currently include only ones related to HW
checksumming.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-05 15:21:46 -04:00
Gustavo Padovan
7e1af8a3a5 Bluetooth: Create empty l2cap ops function
A2MP doesn't use part of the L2CAP chan ops API so we just create general
empty function instead of the A2MP specific one.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-06-05 06:34:16 +03:00
Andre Guedes
8c3a4f004e Bluetooth: Rename L2CAP_LE_DEFAULT_MTU
This patch renames L2CAP_LE_DEFAULT_MTU macro to L2CAP_LE_MIN_MTU
since it represents the minimum MTU value, not the default MTU
value for LE.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:16 +03:00
Szymon Janc
1afd5be87e Bluetooth: Remove unused HCI timeouts definitions
Those are not used anywhere in code (and never were since introduction
in 2006) so just remove them.

Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:14 +03:00
Andrei Emeltchenko
97e8e89d2d Bluetooth: A2MP: Manage incoming connections
Handle incoming A2MP connection by creating AMP manager and
processing A2MP messages.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:14 +03:00
Andrei Emeltchenko
416fa7527d Bluetooth: A2MP: Handling fixed channels
A2MP fixed channel do not have sk

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:13 +03:00
Andrei Emeltchenko
8598d064cb Bluetooth: A2MP: Process A2MP Discover Request
Adds helper functions to count HCI devs and process A2MP Discover
Request, code makes sure that first controller in the list is
BREDR one. Trace is shown below:

...
> ACL data: handle 11 flags 0x02 dlen 16
    A2MP: Discover req: mtu/mps 670 mask: 0x0000
< ACL data: handle 11 flags 0x00 dlen 22
    A2MP: Discover rsp: mtu/mps 670 mask: 0x0000
      Controller list:
        id 0 type 0 (BR-EDR) status 0x01 (Bluetooth only)
        id 1 type 1 (802.11 AMP) status 0x01 (Bluetooth only)
...

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:12 +03:00
Andrei Emeltchenko
e7af522e04 Bluetooth: A2MP: Define A2MP status codes
A2MP status codes copied from Bluez patch sent by Peter Krystad
<pkrystad@codeaurora.org>.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:12 +03:00
Andrei Emeltchenko
b9058fb67c Bluetooth: A2MP: Definitions for A2MP commands
Define A2MP command IDs and packet structures.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:12 +03:00
Andrei Emeltchenko
f6d3c6e783 Bluetooth: A2MP: Build and Send msg helpers
Helper function to build and send A2MP messages.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:12 +03:00
Andrei Emeltchenko
9740e49d17 Bluetooth: A2MP: AMP Manager basic functions
Define AMP Manager and some basic functions.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:11 +03:00
Andrei Emeltchenko
466f8004f3 Bluetooth: A2MP: Create A2MP channel
Create and initialize fixed A2MP channel

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:11 +03:00
Andrei Emeltchenko
54a59aa2b5 Bluetooth: Add l2cap_chan->ops->ready()
This move socket specific code to l2cap_sock.c.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-06-05 06:34:11 +03:00
Andrei Emeltchenko
c0df7f6e06 Bluetooth: Move clean up code and set of SOCK_ZAPPED to l2cap_sock.c
This remove a bit more of socket code from l2cap core, this calls set the
SOCK_ZAPPED and do some clean up depending on the socket state.

Reported-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-06-05 06:34:10 +03:00
Gustavo Padovan
80b9802795 Bluetooth: Use chan as parameters for l2cap chan ops
Use chan instead of void * makes more sense here.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-06-05 06:34:10 +03:00
Andrei Emeltchenko
523e93cdb3 Bluetooth: Define HCI AMP cmd struct
Add HCI commands to deal with Bluetooth AMP controllers.
Those commands will be used by bluetooth and softamp code.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-06-05 06:34:09 +03:00
Andrei Emeltchenko
2983fd6824 Bluetooth: Define and use PSM identifiers
Define assigned Protocol and Service Multiplexor (PSM) identifiers
and use them instead of magic numbers.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:09 +03:00
Andrei Emeltchenko
59e54bd15d Bluetooth: Define L2CAP conf continuation flag
Define Continuation flag which the only flag used from Flags field
in L2CAP Configuration Request and Response.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-06-05 06:34:08 +03:00
Gustavo Padovan
8c520a5992 Bluetooth: Remove unnecessary headers include
Most of the include were unnecessary or already included by some other
header.
Replace module.h by export.h where possible.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-06-05 06:34:08 +03:00
Gustavo Padovan
c3c7ea6594 Bluetooth: Fix coding style in include/net/bluetooth
Fix all warning and errors reported by checkpatch but license trailing
whitespace and bdaddr_t definition.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-06-05 06:34:08 +03:00
Andrei Emeltchenko
9b3b44604a Bluetooth: Use defined link key size
Remove magic number with defined link key size.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-06-05 06:34:06 +03:00
Szymon Janc
a6c511c636 Bluetooth: Rename HCI_QUIRK_NO_RESET to HCI_QUIRK_RESET_ON_CLOSE
HCI_QUIRK_NO_RESET name is misleading - purpose of this quirk is to
reset device on close instead of init, not to not reset at all.
Rename it to HCI_QUIRK_RESET_ON_CLOSE to avoid confusion.

Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-06-05 06:34:06 +03:00
Gustavo Padovan
38351c66e4 Bluetooth: Fix trailing whitespaces in license text
As reported by checkpatch.pl

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-06-05 06:34:06 +03:00
Mat Martineau
522cc2ee6e Bluetooth: Remove unused ERTM control field macros
Now that l2cap_ctrl is used to set up control fields, these macros are
not needed.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:05 +03:00
Mat Martineau
4239d16f36 Bluetooth: Check rules when setting retransmit or monitor timers
The ERTM specification requires the retransmit timer to be cancelled
when the monitor timer is set.  The retransmit timer cannot be set
again while the monitor timer is pending.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:04 +03:00
Mat Martineau
f5dbb0772d Bluetooth: Remove receive code that has been superceded
This deletes the receive code that had handlers for each frame type at
the top level, and then had logic to determine the receive state
within each handler.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-05 06:34:03 +03:00
Mat Martineau
2827011f66 Bluetooth: Fix early return from l2cap_chan_del
This fixes a regression from commit
2ead70b839 that is present in all
kernels starting at v3.0.

When L2CAP information was moved to struct l2cap_chan, a check was
added to l2cap_chan_del to avoid certain cleanup operations when ERTM
or streaming mode had not yet been initialized.  The logic in the
check did not take in to account that chan->conf_state is set to 0 in
l2cap_chan_ready, so l2cap_chan_del failed to cancel timers and leaked
memory any time the ERTM queues or lists were not empty.

This change makes sure that l2cap_chan_del only returns early if
ERTM initialization was not performed.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-06-05 06:34:02 +03:00
Samuel Ortiz
73167ced31 NFC: Introduce target mode rx data callback
This routine will be called by drivers whenever they receive data in target
mode. This should be unexpected events and as such should be handled by a
standalone API (i.e. not as a callback pointer from an existing API).

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-06-04 21:34:31 +02:00
Samuel Ortiz
be9ae4ce4e NFC: Introduce target mode tx ops
And rename the initiator mode data exchange ops for consistency sake.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-06-04 21:34:30 +02:00
Samuel Ortiz
f212ad5e99 NFC: Set the NFC device RF mode appropriately
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-06-04 21:34:30 +02:00
Samuel Ortiz
fc40a8c1a0 NFC: Add target mode activation netlink event
Userspace gets a netlink event upon target mode activation.
The LLCP layer is also signaled when we get an ATR_REQ in order to get
the remote general bytes.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-06-04 21:34:30 +02:00
Samuel Ortiz
fe7c580073 NFC: Add target mode protocols to the polling loop startup routine
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-06-04 21:34:29 +02:00
Samuel Ortiz
ab73b75130 NFC: Export LLCP general bytes getter
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-06-04 21:34:29 +02:00
Paul Moore
20e2a86485 cipso: handle CIPSO options correctly when NetLabel is disabled
When NetLabel is not enabled, e.g. CONFIG_NETLABEL=n, and the system
receives a CIPSO tagged packet it is dropped (cipso_v4_validate()
returns non-zero).  In most cases this is the correct and desired
behavior, however, in the case where we are simply forwarding the
traffic, e.g. acting as a network bridge, this becomes a problem.

This patch fixes the forwarding problem by providing the basic CIPSO
validation code directly in ip_options_compile() without the need for
the NetLabel or CIPSO code.  The new validation code can not perform
any of the CIPSO option label/value verification that
cipso_v4_validate() does, but it can verify the basic CIPSO option
format.

The behavior when NetLabel is enabled is unchanged.

Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-01 14:18:29 -04:00
Linus Torvalds
13199a0845 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking changes from David S. Miller:

 1) Fix IPSEC header length calculation for transport mode in ESP.  The
    issue is whether to do the calculation before or after alignment.
    Fix from Benjamin Poirier.

 2) Fix regression in IPV6 IPSEC fragment length calculations, from Gao
    Feng.  This is another transport vs tunnel mode issue.

 3) Handle AF_UNSPEC connect()s properly in L2TP to avoid OOPSes.  Fix
    from James Chapman.

 4) Fix USB ASIX driver's reception of full sized VLAN packets, from
    Eric Dumazet.

 5) Allow drop monitor (and, more generically, all generic netlink
    protocols) to be automatically loaded as a module.  From Neil
    Horman.

Fix up trivial conflict in Documentation/feature-removal-schedule.txt
due to new entries added next to each other at the end. As usual.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
  net/smsc911x: Repair broken failure paths
  virtio-net: remove useless disable on freeze
  netdevice: Update netif_dbg for CONFIG_DYNAMIC_DEBUG
  drop_monitor: Add module alias to enable automatic module loading
  genetlink: Build a generic netlink family module alias
  net: add MODULE_ALIAS_NET_PF_PROTO_NAME
  r6040: Do a Proper deinit at errorpath and also when driver unloads (calling r6040_remove_one)
  r6040: disable pci device if the subsequent calls (after pci_enable_device) fails
  skb: avoid unnecessary reallocations in __skb_cow
  net: sh_eth: fix the rxdesc pointer when rx descriptor empty happens
  asix: allow full size 8021Q frames to be received
  rds_rdma: don't assume infiniband device is PCI
  l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case
  mac80211: fix ADDBA declined after suspend with wowlan
  wlcore: fix undefined symbols when CONFIG_PM is not defined
  mac80211: fix flag check for QoS NOACK frames
  ath9k_hw: apply internal regulator settings on AR933x
  ath9k_hw: update AR933x initvals to fix issues with high power devices
  ath9k: fix a use-after-free-bug when ath_tx_setup_buffer() fails
  ath9k: stop rx dma before stopping tx
  ...
2012-05-31 10:32:36 -07:00
Glauber Costa
3f13461939 memcg: decrement static keys at real destroy time
We call the destroy function when a cgroup starts to be removed, such as
by a rmdir event.

However, because of our reference counters, some objects are still
inflight.  Right now, we are decrementing the static_keys at destroy()
time, meaning that if we get rid of the last static_key reference, some
objects will still have charges, but the code to properly uncharge them
won't be run.

This becomes a problem specially if it is ever enabled again, because now
new charges will be added to the staled charges making keeping it pretty
much impossible.

We just need to be careful with the static branch activation: since there
is no particular preferred order of their activation, we need to make sure
that we only start using it after all call sites are active.  This is
achieved by having a per-memcg flag that is only updated after
static_key_slow_inc() returns.  At this time, we are sure all sites are
active.

This is made per-memcg, not global, for a reason: it also has the effect
of making socket accounting more consistent.  The first memcg to be
limited will trigger static_key() activation, therefore, accounting.  But
all the others will then be accounted no matter what.  After this patch,
only limited memcgs will have its sockets accounted.

[akpm@linux-foundation.org: move enum sock_flag_bits into sock.h,
                            document enum sock_flag_bits,
                            convert memcg_proto_active() and memcg_proto_activated() to test_bit(),
                            redo tcp_update_limit() comment to 80 cols]
Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:28 -07:00
Gao feng
0c1833797a ipv6: fix incorrect ipsec fragment
Since commit ad0081e43a
"ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed"
the fragment of packets is incorrect.
because tunnel mode needs IPsec headers and trailer for all fragments,
while on transport mode it is sufficient to add the headers to the
first fragment and the trailer to the last.

so modify mtu and maxfraglen base on ipsec mode and if fragment is first
or last.

with my test,it work well(every fragment's size is the mtu)
and does not trigger slow fragment path.

Changes from v1:
	though optimization, mtu_prev and maxfraglen_prev can be delete.
	replace xfrm mode codes with dst_entry's new frag DST_XFRM_TUNNEL.
	add fuction ip6_append_data_mtu to make codes clearer.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-27 01:11:22 -04:00
Linus Torvalds
28f3d71761 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull more networking updates from David Miller:
 "Ok, everything from here on out will be bug fixes."

1) One final sync of wireless and bluetooth stuff from John Linville.
   These changes have all been in his tree for more than a week, and
   therefore have had the necessary -next exposure.  John was just away
   on a trip and didn't have a change to send the pull request until a
   day or two ago.

2) Put back some defines in user exposed header file areas that were
   removed during the tokenring purge.  From Stephen Hemminger and Paul
   Gortmaker.

3) A bug fix for UDP hash table allocation got lost in the pile due to
   one of those "you got it..  no I've got it.." situations.  :-)

   From Tim Bird.

4) SKB coalescing in TCP needs to have stricter checks, otherwise we'll
   try to coalesce overlapping frags and crash.  Fix from Eric Dumazet.

5) RCU routing table lookups can race with free_fib_info(), causing
   crashes when we deref the device pointers in the route.  Fix by
   releasing the net device in the RCU callback.  From Yanmin Zhang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (293 commits)
  tcp: take care of overlaps in tcp_try_coalesce()
  ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
  mm: add a low limit to alloc_large_system_hash
  ipx: restore token ring define to include/linux/ipx.h
  if: restore token ring ARP type to header
  xen: do not disable netfront in dom0
  phy/micrel: Fix ID of KSZ9021
  mISDN: Add X-Tensions USB ISDN TA XC-525
  gianfar:don't add FCB length to hard_header_len
  Bluetooth: Report proper error number in disconnection
  Bluetooth: Create flags for bt_sk()
  Bluetooth: report the right security level in getsockopt
  Bluetooth: Lock the L2CAP channel when sending
  Bluetooth: Restore locking semantics when looking up L2CAP channels
  Bluetooth: Fix a redundant and problematic incoming MTU check
  Bluetooth: Add support for Foxconn/Hon Hai AR5BBU22 0489:E03C
  Bluetooth: Fix EIR data generation for mgmt_device_found
  Bluetooth: Fix Inquiry with RSSI event mask
  Bluetooth: improve readability of l2cap_seq_list code
  Bluetooth: Fix skb length calculation
  ...
2012-05-24 11:54:29 -07:00
Linus Torvalds
88d6ae8dc3 Merge branch 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "cgroup file type addition / removal is updated so that file types are
  added and removed instead of individual files so that dynamic file
  type addition / removal can be implemented by cgroup and used by
  controllers.  blkio controller changes which will come through block
  tree are dependent on this.  Other changes include res_counter cleanup
  and disallowing kthread / PF_THREAD_BOUND threads to be attached to
  non-root cgroups.

  There's a reported bug with the file type addition / removal handling
  which can lead to oops on cgroup umount.  The issue is being looked
  into.  It shouldn't cause problems for most setups and isn't a
  security concern."

Fix up trivial conflict in Documentation/feature-removal-schedule.txt

* 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
  res_counter: Account max_usage when calling res_counter_charge_nofail()
  res_counter: Merge res_counter_charge and res_counter_charge_nofail
  cgroups: disallow attaching kthreadd or PF_THREAD_BOUND threads
  cgroup: remove cgroup_subsys->populate()
  cgroup: get rid of populate for memcg
  cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg
  cgroup: make css->refcnt clearing on cgroup removal optional
  cgroup: use negative bias on css->refcnt to block css_tryget()
  cgroup: implement cgroup_rm_cftypes()
  cgroup: introduce struct cfent
  cgroup: relocate __d_cgrp() and __d_cft()
  cgroup: remove cgroup_add_file[s]()
  cgroup: convert memcg controller to the new cftype interface
  memcg: always create memsw files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP
  cgroup: convert all non-memcg controllers to the new cftype interface
  cgroup: relocate cftype and cgroup_subsys definitions in controllers
  cgroup: merge cft_release_agent cftype array into the base files array
  cgroup: implement cgroup_add_cftypes() and friends
  cgroup: build list of all cgroups under a given cgroupfs_root
  cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir()
  ...
2012-05-22 17:40:19 -07:00
John W. Linville
a0d0d1685f Merge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2012-05-22 15:18:06 -04:00
Eric Dumazet
a50feda546 ipv6: bool/const conversions phase2
Mostly bool conversions, some inline removals and const additions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19 01:08:16 -04:00
Eric Dumazet
92113bfde2 ipv6: bool conversions phase1
ipv6_opt_accepted() returns a bool, and can use const pointers

ipv6_addr_equal(), ipv6_addr_any(), ipv6_addr_loopback(),
ipv6_addr_orchid() return a bool.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 02:24:13 -04:00
Eric Dumazet
cbc264cacd ip_frag: struct inet_frags match() method returns a bool
- match() method returns a boolean
- return (A && B && C && D) -> return A && B && C && D
- fix indentation

Signed-off-by: Eric Dumazet <edumazet@google.com>
2012-05-18 01:40:27 -04:00
Joe Perches
a508da6cc0 lapb: Neaten debugging
Enable dynamic debugging and remove a bunch of #ifdef/#endifs.

Add a lapb_dbg(level, fmt, ...) macro and replace the
printk(KERN_DEBUG uses.
Add pr_fmt and remove embedded prefixes.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17 18:45:20 -04:00
Eric Dumazet
a2a385d627 tcp: bool conversions
bool conversions where possible.

__inline__ -> inline

space cleanups

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17 14:59:59 -04:00
Eric Dumazet
dc6b9b7823 net: include/net/sock.h cleanup
bool/const conversions where possible

__inline__ -> inline

space cleanups

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17 04:50:21 -04:00
David S. Miller
028940342a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-05-16 22:17:37 -04:00
John W. Linville
05f8f25276 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2012-05-16 15:38:11 -04:00
Eric Dumazet
1b23a5dfc2 net: sock_flag() cleanup
- sock_flag() accepts a const pointer

- sock_flag() returns a boolean

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 15:30:26 -04:00
Eric Dumazet
865ec5523d fq_codel: should use qdisc backlog as threshold
codel_should_drop() logic allows a packet being not dropped if queue
size is under max packet size.

In fq_codel, we have two possible backlogs : The qdisc global one, and
the flow local one.

The meaningful one for codel_should_drop() should be the global backlog,
not the per flow one, so that thin flows can have a non zero drop/mark
probability.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 15:30:26 -04:00
alex.bluesman.smirnov@gmail.com
0606069d9e mac802154: monitor device support
Support for monitor device intended to capture all the network activity.
This interface could be used by networks sniffers and is already
supported by WireShark. That's a good test point to check that basic
MAC support works.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 15:17:08 -04:00
alex.bluesman.smirnov@gmail.com
90c049b2c6 ieee802154: interface type to be added
This stack implementation distinguishes several types of slave
interfaces. Another parameter to 'add_iface_' function is added
to clarify the interface type is going to be registered.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 15:17:08 -04:00
alex.bluesman.smirnov@gmail.com
74a02fcf77 mac802154: declare reduced mlme operations
According IEEE 802.15.4 standard each node can be either full functionality
device (FFD) or reduce functionality device (RFD). So 2 sets of operations
are needed. This patch declare RFD operations structure.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 15:16:56 -04:00
alex.bluesman.smirnov@gmail.com
1cd829c83e mac802154: RX data path
Main RX data path implementation between physical and mac layers.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 15:16:44 -04:00
alex.bluesman.smirnov@gmail.com
1010f54018 mac802154: allocation of ieee802154 device
An interface to allocate and register ieee802154 compatible device.
The allocated device has the following representation in memory:

	+-----------------------+
	| struct wpan_phy       |
	+-----------------------+
	| struct mac802154_priv |
	+-----------------------+
	| driver's private data |
	+-----------------------+

Used by device drivers to register new instance in the stack.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 15:16:35 -04:00
alex.bluesman.smirnov@gmail.com
0afd7ad9de mac802154: basic ieee802.15.4 device structures
The IEEE 802.15.4 Working Group focuses on the standardization of the
bottom two layers of ISO/OSI protocol stack: Physical (PHY) and MAC.
The MAC layer provides access control to a shared channel and reliable
data delivery. The main functions performed by the MAC sublayer are:
association and disassociation, security control, optional star
network topology functions, such as beacon generation and Guaranteed
Time Slots (GTSs) management, generation of ACK frames (if used), and,
finally, application support for the two possible network topologies
described in the standard.

This is an initial commit which describes main data structures needed
for ieee802.15.4 compatible devices representation in the MAC layer.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 15:16:14 -04:00
Gustavo Padovan
c5daa683f2 Bluetooth: Create flags for bt_sk()
defer_setup and suspended are now flags into bt_sk().

Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-05-16 16:14:17 -03:00
Mat Martineau
a6a5568c03 Bluetooth: Lock the L2CAP channel when sending
The ERTM and streaming mode transmit queue must only be accessed while
the L2CAP channel lock is held.  Locking the channel before calling
l2cap_chan_send ensures that multiple threads cannot simultaneously
manipulate the queue when sending and receiving concurrently.

L2CAP channel locking had previously moved to the l2cap_chan struct
instead of the associated socket, so some of the old socket locking
can also be removed in this patch.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-16 16:14:02 -03:00
Vishal Agarwal
9d939d9484 Bluetooth: Fix EIR data generation for mgmt_device_found
The mgmt_device_found function expects to receive only the significant
part of the EIR data so it needs to be removed before calling the
function. This patch adds a new eir_get_length() helper function to
calculate the length of the significant part.

Signed-off-by: Vishal Agarwal <vishal.agarwal@stericsson.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-05-16 16:13:19 -03:00
Gustavo Padovan
08e6d907fe Merge git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2012-05-16 16:11:44 -03:00
Johannes Berg
294a20e039 cfg80211: fix cfg80211_can_beacon_sec_chan prototype
It should return bool, not int. The function even
does return true/false.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-16 13:08:15 -04:00
Johannes Berg
ac55d2fe05 mac80211: (selectively) add HT details in radiotap
Add a flag for the HT format (mixed vs. greenfield)
to allow drivers to report that on receive. Not all
drivers will do that though, so allow drivers to set
which radiotap MCS details they report.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-16 12:46:38 -04:00
Janusz.Dziedzic@tieto.com
ee70108fa2 mac80211: Add IV-room in the skb for TKIP and WEP
Add IV-room in skb also for TKIP and WEP.
Extend patch: "mac80211: support adding IV-room in the skb for CCMP keys"

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-16 12:46:37 -04:00
David S. Miller
c727e7f007 Merge branch 'delete-tokenring' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2012-05-16 01:02:40 -04:00
Paul Gortmaker
211ed86510 net: delete all instances of special processing for token ring
We are going to delete the Token ring support.  This removes any
special processing in the core networking for token ring, (aside
from net/tr.c itself), leaving the drivers and remaining tokenring
support present but inert.

The mass removal of the drivers and net/tr.c will be in a separate
commit, so that the history of these files that we still care
about won't have the giant deletion tied into their history.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-15 20:14:35 -04:00
Eric Lapuyade
03bed29e05 NFC: HCI drivers don't have to keep track of polling state
The NFC core code already does that for them.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-15 17:31:22 -04:00
Eric Lapuyade
1676f75159 NFC: Add HCI/SHDLC support to let driver check for tag presence
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-15 17:28:00 -04:00
Eric Lapuyade
d4ccb13280 NFC: Specify usage for targets found and target lost events
It is now specified that nfc_target_found() and nfc_target_lost() core
functions must not be called from an atomic context. This allow us to
serialize calls and protect the targets table using the nfc device lock
instead of a spinlock.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-15 17:28:00 -04:00
Eric Lapuyade
addfabf98d NFC: Remove useless HCI private nfc target table
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-15 17:28:00 -04:00
Eric Lapuyade
9009943326 NFC: Cache the core NFC active target pointer instead of its index
The NFC Core now caches the active nfc target pointer, thereby avoiding
the need to lookup the target table for each invocation of a driver ops.
Consequently, pn533, HCI and NCI now directly receive an nfc_target
pointer instead of a target index.

Cc: Ilan Elias <ilane@ti.com>
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-15 17:27:59 -04:00
John W. Linville
6037463148 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-05-15 16:38:00 -04:00
David S. Miller
bc9b35ad41 xfrm: Convert several xfrm policy match functions to bool.
xfrm_selector_match
xfrm_sec_ctx_match
__xfrm4_selector_match
__xfrm6_selector_match

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-15 15:04:57 -04:00
Eric Dumazet
6ff272c9ad codel: use u16 field instead of 31bits for rec_inv_sqrt
David pointed out gcc might generate poor code with 31bit fields.

Using u16 is more than enough and permits a better code output.

Also make the code intent more readable using constants, fixed point arithmetic
not being trivial for everybody.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-14 18:32:56 -04:00
David S. Miller
c597f6653d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2012-05-14 18:00:48 -04:00
Gustavo Padovan
a7d7723ae7 Bluetooth: notify userspace of security level change
It fixes L2CAP socket based security level elevation during a
connection. The HID profile needs this (for keyboards) and it is the only
way to achieve the security level elevation when using the management
interface to talk to the kernel (hence the management enabling patch
being the one that exposes this issue).

It enables the userspace a security level change when the socket is
already connected and create a way to notify the socket the result of the
request. At the moment of the request the socket is made non writable, if
the request fails the connections closes, otherwise the socket is made
writable again, POLL_OUT is emmited.

Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-14 13:51:25 -04:00
Eric Dumazet
536edd6710 codel: use Newton method instead of sqrt() and divides
As Van pointed out, interval/sqrt(count) can be implemented using
multiplies only.

http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Iterative_methods_for_reciprocal_square_roots

This patch implements the Newton method and reciprocal divide.

Total cost is 15 cycles instead of 120 on my Corei5 machine (64bit
kernel).

There is a small 'error' for count values < 5, but we don't really care.

I reuse a hole in struct codel_vars :
 - pack the dropping boolean into one bit
 - use 31bit to store the reciprocal value of sqrt(count).

Suggested-by: Van Jacobson <van@pollere.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 15:50:49 -04:00
Eric Dumazet
76e3cc126b codel: Controlled Delay AQM
An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson.

http://queue.acm.org/detail.cfm?id=2209336

This AQM main input is no longer queue size in bytes or packets, but the
delay packets stay in (FIFO) queue.

As we don't have infinite memory, we still can drop packets in enqueue()
in case of massive load, but mean of CoDel is to drop packets in
dequeue(), using a control law based on two simple parameters :

target : target sojourn time (default 5ms)
interval : width of moving time window (default 100ms)

Based on initial work from Dave Taht.

Refactored to help future codel inclusion as a plugin for other linux
qdisc (FQ_CODEL, ...), like RED.

include/net/codel.h contains codel algorithm as close as possible than
Kathleen reference.

net/sched/sch_codel.c contains the linux qdisc specific glue.

Separate structures permit a memory efficient implementation of fq_codel
(to be sent as a separate work) : Each flow has its own struct
codel_vars.

timestamps are taken at enqueue() time with 1024 ns precision, allowing
a range of 2199 seconds in queue, and 100Gb links support. iproute2 uses
usec as base unit.

Selected packets are dropped, unless ECN is enabled and packets can get
ECN mark instead.

Tested from 2Mb to 10Gb speeds with no particular problems, on ixgbe and
tg3 drivers (BQL enabled).

Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ]
                          [ interval TIME ] [ ecn ]

qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn
 Sent 13347099587 bytes 8815805 pkt (dropped 0, overlimits 0 requeues 0)
 rate 202365Kbit 16708pps backlog 113550b 75p requeues 0
  count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us
  maxpacket 1514 ecn_mark 84399 drop_overlimit 0

CoDel must be seen as a base module, and should be used keeping in mind
there is still a FIFO queue. So a typical setup will probably need a
hierarchy of several qdiscs and packet classifiers to be able to meet
whatever constraints a user might have.

One possible example would be to use fq_codel, which combines Fair
Queueing and CoDel, in replacement of sfq / sfq_red.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-10 23:35:02 -04:00
Pavel Emelyanov
292e8d8c85 tcp: Move rcvq sending to tcp_input.c
It actually works on the input queue and will use its read mem
routines, thus it's better to have in in the tcp_input.c file.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-10 23:24:35 -04:00
Nicolas Dichtel
e0268868ba sctp: check cached dst before using it
dst_check() will take care of SA (and obsolete field), hence
IPsec rekeying scenario is taken into account.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Vlad Yaseivch <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-10 23:15:47 -04:00
Mat Martineau
94122bbe9c Bluetooth: Refactor L2CAP ERTM and streaming transmit segmentation
Use more common code for ERTM and streaming mode segmentation and
transmission, and begin using skb control block data for delaying
extended or enhanced header generation until just before the packet is
transmitted.  This code is also better suited for resegmentation,
which is needed when L2CAP links are reconfigured after an AMP channel
move.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Reviewed-by: Ulisses Furquim <ulisses@profusion.mobi>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 01:40:53 -03:00
Marcel Holtmann
9d42820f37 Bluetooth: Enable Low Energy support by default
The Bluetooth Low Energy support so far was disabled by default via
a module parameter. With this change the module parameter will be removed
and Low Energy is enabled by default.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 01:40:52 -03:00
Syam Sidhardhan
2ee8ce35b1 Bluetooth: Remove unused hci_le_ltk_neg_reply()
No one is using hci_le_ltk_neg_reply() in bluetooth subsystem.

Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 01:40:51 -03:00
Syam Sidhardhan
e10b9969f2 Bluetooth: Remove unused hci_le_ltk_reply()
In this API, we were using sizeof operator for an array
given as function argument, which is invalid.
However this API is not used anywhere.

Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 01:40:50 -03:00
Mat Martineau
3ce3514f5d Bluetooth: Remove duplicate structure members from bt_skb_cb
These values are now in the nested l2cap_ctrl struct.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 01:40:47 -03:00
Mat Martineau
5a364bd399 Bluetooth: Improve ERTM sequence number offset calculation
Instead of using modular division, the offset can be calculated using
only addition and subtraction.  The previous calculation did not work
as intended and was more difficult to understand, involving unsigned
integer underflow and a check for a negative value where one was not
possible.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 01:40:46 -03:00
Andre Guedes
479453d5fe Bluetooth: Remove advertising cache
User-space pass the remote device address type to kernel through
struct sockaddr_l2 what makes the advertising useless. This patch
removes all advertising cache code.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-05-09 01:40:46 -03:00
Andre Guedes
8e9f98921c Bluetooth: Use address type info from user-space
In order to establish a LE connection we need the address type
information. User-space already pass this information to kernel
through struct sockaddr_l2.

This patch adds the dst_type parameter to l2cap_chan_connect so we
are able to pass the address type info from user-space down to
hci_conn layer.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-05-09 01:40:45 -03:00
Andre Guedes
b12f62cfd9 Bluetooth: Add dst_type parameter to hci_connect
This patch adds the dst_type parameter to hci_connect function.
Instead of searching the address type in advertising cache, we
use the dst_type parameter to establish LE connections.

The dst_type is ignored for BR/EDR connection establishment.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-05-09 01:40:45 -03:00
Andre Guedes
31f7956c66 Bluetooth: Move bdaddr_to_le to hci_core
This patch moves the helper function bdaddr_to_le to hci_core, so it
can be used in mgmt.c and hci_conn.c.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-05-09 01:40:44 -03:00
Andre Guedes
43ef0b8b8d Bluetooth: Add address type to struct sockaddr_l2
This patch adds the address type info to struct sockaddr_l2 so
user-space can inform the remote device address type required
to establish LE connections.

Soon, instead of looking the advertising cache up to discover the
address type, we'll use this address type info to establish LE
connections.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-05-09 01:40:43 -03:00
Andre Guedes
591f47f31b Bluetooth: Move address type macros to bluetooth.h
This patch moves address type macros to bluetooth.h since they will be
used by management interface and Bluetooth socket interface. It also
replaces the macro prefix MGMT_ADDR_ by BDADDR_.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-05-09 01:40:42 -03:00
Andrei Emeltchenko
2bbf2968e5 Bluetooth: trivial: Remove empty line
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-05-09 01:40:42 -03:00
Syam Sidhardhan
e47872209d Bluetooth: Remove strtoba header declared but not defined
No one is using strtoba() in the bluetooth subsystem.

Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 01:40:34 -03:00
Syam Sidhardhan
270ca16bc7 Bluetooth: remove header declared but not defined
hci_del_off_timer() doesn't exist anymore.

Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 01:40:34 -03:00
Mat Martineau
3c588192b5 Bluetooth: Add the l2cap_seq_list structure for tracking frames
A sequence list is a data structure used to track frames that need to
be retransmitted, and frames that have been requested for
retransmission by the remote device.  It can compactly represent a
list of sequence numbers within the ERTM transmit window.  Memory for
the list is allocated once at connection time, and common operations
in ERTM are O(1).

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2012-05-09 01:40:30 -03:00
Gustavo Padovan
9033894722 Bluetooth: Remove err parameter from alloc_skb()
Use ERR_PTR maginc instead.

Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-05-09 01:40:26 -03:00
Andrei Emeltchenko
bd4b165312 Bluetooth: Adds set_default function in L2CAP setup
Some parameters in L2CAP chan are set to default similar way in
socket based channels and A2MP channels. Adds common function which
sets all defaults.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 00:41:39 -03:00
Andre Guedes
0ed09148fa Bluetooth: Remove MGMT_ADDR_INVALID macro
This patch removes the MGMT_ADDR_INVALID macro. If the address type
isn't LE, we consider it is BR/EDR type.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-05-09 00:41:37 -03:00
Gustavo Padovan
eef1d9b668 Bluetooth: Remove sk parameter from l2cap_chan_create()
Following the separation if core and sock code this change avoid
manipulation of sk inside l2cap_chan_create().

Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 00:41:36 -03:00
Mat Martineau
00e3112c5a Bluetooth: Add a structure to carry ERTM data in skb control blocks
Every field from ERTM control headers is now carried in the control
block so it only has to be parsed or generated once, and can be
efficiently accessed throughout the ERTM code.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 00:41:35 -03:00
Mat Martineau
d5f7ac3810 Bluetooth: Add definitions and struct members for new ERTM state machine
Adds some missing values for control field parsing, additional data
for the new state machine, and enumerations for states, incoming
packet classification, and state machine events.

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 00:41:35 -03:00
Andrei Emeltchenko
6f74b6f36f Bluetooth: Comments and style fixes
Add comments to timer implementation and style fixes.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 00:41:35 -03:00
Andre Guedes
21693c15c0 Bluetooth: Add HCI_PERIODIC_INQ to dev_flags
This patch adds the HCI_PERIODIC_INQ flag to dev_flags. This flag
tracks if periodic inquiry is enabled or not.

Signed-off-by: Andre Guedes <aguedespe@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 00:41:35 -03:00
Andre Guedes
79d6e068be Bluetooth: Add Periodic Inquiry command complete handler
This patch adds a handler function to Periodic Inquiry command
complete event.

Signed-off-by: Andre Guedes <aguedespe@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-05-09 00:41:35 -03:00
Andre Guedes
7dbfac1d72 Bluetooth: Add hci_cancel_le_scan() to hci_core
This patch adds to hci_core the hci_cancel_le_scan function which
should be used to cancel an ongoing LE scan.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-05-09 00:41:32 -03:00
Andrei Emeltchenko
58115373e7 Bluetooth: Correct ediv in SMP
ediv is already in little endian order.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-05-09 00:41:30 -03:00
Marcel Holtmann
cdbaccca73 Bluetooth: Add management command for setting Device ID
The Device ID details need to be programmed into the kernel for every
controller at least once. So provide management command for this.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-05-09 00:41:30 -03:00
Marcel Holtmann
2b9be137b7 Bluetooth: Handle EIR tags for Device ID
The Device ID information can be provided via Extended Inquiry Data
as well. If a valid source is present, then include it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-05-09 00:41:30 -03:00
Marcel Holtmann
91c4e9b1ac Bluetooth: Add TX power tag to EIR data
The Inquiry Response TX power tag should be added to the Extended
Inquiry Data (EIR) as well.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-05-09 00:41:30 -03:00
David Herrmann
6935e0f518 Bluetooth: Remove redundant hdev->parent field
We initialize the "struct device" in hci_alloc_dev() for a long time now
so we can access hdev->dev.parent directly. Hence, we can drop the
temporary field hdev->parent which is used in no other place than
hci_add_sysfs().

SET_HCIDEV_DEV() is never called after registering a device by the
drivers so we do not overwrite internal device-state. Furthermore,
hdev->dev is initialized to 0 by kzalloc() inside hci_alloc_dev() so the
default behavior with dev.parent = NULL is kept.

Signed-off-by: David Herrmann <dh.herrmann@googlemail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-05-09 00:41:30 -03:00
Andrei Emeltchenko
9a00665792 Bluetooth: Correct type for ediv to __le16
Correct type warnings reported by sparse to show that this
functions takes ediv argument in __le16 format.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2012-05-09 00:41:29 -03:00
Andrei Emeltchenko
7d69230c43 Bluetooth: Correct type for hdev lmp_subver
Keep lmp_subver in host byte order. We have following conversion
in hci_cc_read_local_version:
hdev->lmp_subver = __le16_to_cpu(rp->lmp_subver);

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2012-05-09 00:41:28 -03:00
Ashok Nagarajan
70c33eaae7 {nl,cfg,mac}80211: Allow user to see/configure HT protection mode
This patch introduces a new mesh configuration parameter "ht_opmode" and will
allow user to check the current HT protection mode selected. Users could
configure the protection mode by the command "iw mesh_iface set mesh_param
mesh_ht_protection_mode=2". The default protection mode of mesh is set to
non-HT mixed mode.

Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
Reviewed-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:59 -04:00
Ben Greear
e352114fd6 mac80211: Framework to get wifi-driver stats via ethtool.
This adds hooks to call into the driver to get additional
stats for the ethtool API.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:51 -04:00
Ben Greear
d61992182e cfg80211: Add framework to support ethtool stats.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:49 -04:00
Pablo Neira Ayuso
f73181c828 ipvs: add support for sync threads
Allow master and backup servers to use many threads
for sync traffic. Add sysctl var "sync_ports" to define the
number of threads. Every thread will use single UDP port,
thread 0 will use the default port 8848 while last thread
will use port 8848+sync_ports-1.

	The sync traffic for connections is scheduled to many
master threads based on the cp address but one connection is
always assigned to same thread to avoid reordering of the
sync messages.

	Remove ip_vs_sync_switch_mode because this check
for sync mode change is still risky. Instead, check for mode
change under sync_buff_lock.

	Make sure the backup socks do not block on reading.

Special thanks to Aleksey Chudov for helping in all tests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:33 +02:00
Julian Anastasov
749c42b620 ipvs: reduce sync rate with time thresholds
Add two new sysctl vars to control the sync rate with the
main idea to reduce the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should be useful also for normal connections
with high traffic.

sync_refresh_period: in seconds, difference in reported connection
	timer that triggers new sync message. It can be used to
	avoid sync messages for the specified period (or half of
	the connection timeout if it is lower) if connection state
	is not changed from last sync.

sync_retries: integer, 0..3, defines sync retries with period of
	sync_refresh_period/8. Useful to protect against loss of
	sync messages.

	Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only single sync message is sent
if sync_refresh_period is also 0.

	Add new field "sync_endtime" in connection structure to
hold the reported time when connection expires. The 2 lowest
bits will represent the retry count.

	As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
avoid division by zero.

	Special thanks to Aleksey Chudov for being patient with me,
for his extensive reports and helping in all tests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:10 +02:00
Pablo Neira Ayuso
1c003b1580 ipvs: wakeup master thread
High rate of sync messages in master can lead to
overflowing the socket buffer and dropping the messages.
Fixed sleep of 1 second without wakeup events is not suitable
for loaded masters,

	Use delayed_work to schedule sending for queued messages
and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
reduce the rate of wakeups but to avoid sending long bursts we
wakeup the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.

	Add hard limit for the queued messages before sending
by using "sync_qlen_max" sysctl var. It defaults to 1/32 of
the memory pages but actually represents number of messages.
It will protect us from allocating large parts of memory
when the sending rate is lower than the queuing rate.

	As suggested by Pablo, add new sysctl var
"sync_sock_size" to configure the SNDBUF (master) or
RCVBUF (slave) socket limit. Default value is 0 (preserve
system defaults).

	Change the master thread to detect and block on
SNDBUF overflow, so that we do not drop messages when
the socket limit is low but the sync_qlen_max limit is
not reached. On ENOBUFS or other errors just drop the
messages.

	Change master thread to enter TASK_INTERRUPTIBLE
state early, so that we do not miss wakeups due to messages or
kthread_should_stop event.

Thanks to Pablo Neira Ayuso for his valuable feedback!

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:39:53 +02:00
Eric Dumazet
ac3a546ac8 netfilter: nf_conntrack: use this_cpu_inc()
this_cpu_inc() is IRQ safe and faster than
local_bh_disable()/__this_cpu_inc()/local_bh_enable(), at least on x86.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:36:33 +02:00
Eric Leblond
a900689264 netfilter: nf_ct_helper: allow to disable automatic helper assignment
This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, eg.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper

[ Note: flows that already got a helper will keep using it even
  if automatic helper assignment has been disabled ]

Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helper to flows.

There are good reasons to stop supporting automatic helper
assignment, for further information, please read:

http://www.netfilter.org/news.html#2012-04-03

This patch also adds one message to inform that automatic helper
assignment is deprecated and it will be removed soon (this is
spotted only once, with the first flow that gets a helper attached
to make it as less annoying as possible).

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:35:18 +02:00
David S. Miller
0d6c4a2e46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/e1000e/param.c
	drivers/net/wireless/iwlwifi/iwl-agn-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans.h

Resolved the iwlwifi conflict with mainline using 3-way diff posted
by John Linville and Stephen Rothwell.  In 'net' we added a bug
fix to make iwlwifi report a more accurate skb->truesize but this
conflicted with RX path changes that happened meanwhile in net-next.

In e1000e a conflict arose in the validation code for settings of
adapter->itr.  'net-next' had more sophisticated logic so that
logic was used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-07 23:35:40 -04:00
Eric Dumazet
bd14b1b2e2 tcp: be more strict before accepting ECN negociation
It appears some networks play bad games with the two bits reserved for
ECN. This can trigger false congestion notifications and very slow
transferts.

Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can
disable TCP ECN negociation if it happens we receive mangled CT bits in
the SYN packet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Perry Lorier <perryl@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Wilmer van der Gaast <wilmer@google.com>
Cc: Ankur Jain <jankur@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Dave Täht <dave.taht@bufferbloat.net>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-04 12:05:27 -04:00
Eric Dumazet
b081f85c29 net: implement tcp coalescing in tcp_queue_rcv()
Extend tcp coalescing implementing it from tcp_queue_rcv(), the main
receiver function when application is not blocked in recvmsg().

Function tcp_queue_rcv() is moved a bit to allow its call from
tcp_data_queue()

This gives good results especially if GRO could not kick, and if skb
head is a fragment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 21:11:11 -04:00
Yuchung Cheng
750ea2bafa tcp: early retransmit: delayed fast retransmit
Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2).
Delays the fast retransmit by an interval of RTT/4. We borrow the
RTO timer to implement the delay. If we receive another ACK or send
a new packet, the timer is cancelled and restored to original RTO
value offset by time elapsed.  When the delayed-ER timer fires,
we enter fast recovery and perform fast retransmit.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 20:56:10 -04:00
Yuchung Cheng
eed530b6c6 tcp: early retransmit
This patch implements RFC 5827 early retransmit (ER) for TCP.
It reduces DUPACK threshold (dupthresh) if outstanding packets are
less than 4 to recover losses by fast recovery instead of timeout.

While the algorithm is simple, small but frequent network reordering
makes this feature dangerous: the connection repeatedly enter
false recovery and degrade performance. Therefore we implement
a mitigation suggested in the appendix of the RFC that delays
entering fast recovery by a small interval, i.e., RTT/4. Currently
ER is conservative and is disabled for the rest of the connection
after the first reordering event. A large scale web server
experiment on the performance impact of ER is summarized in
section 6 of the paper "Proportional Rate Reduction for TCP”,
IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf

Note that Linux has a similar feature called THIN_DUPACK. The
differences are THIN_DUPACK do not mitigate reorderings and is only
used after slow start. Currently ER is disabled if THIN_DUPACK is
enabled. I would be happy to merge THIN_DUPACK feature with ER if
people think it's a good idea.

ER is enabled by sysctl_tcp_early_retrans:
  0: Disables ER

  1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4.

  2: (Default) reduce dupthresh like mode 1. In addition, delay
     entering fast recovery by RTT/4.

Note: mode 2 is implemented in the third part of this patch series.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 20:56:10 -04:00
John W. Linville
076e7779c0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-05-01 14:14:05 -04:00
Eric Dumazet
518fbf9cdf net: fix sk_sockets_allocated_read_positive
Denys Fedoryshchenko reported frequent crashes on a proxy server and kindly
provided a lockdep report that explains it all :

  [  762.903868]
  [  762.903880] =================================
  [  762.903890] [ INFO: inconsistent lock state ]
  [  762.903903] 3.3.4-build-0061 #8 Not tainted
  [  762.904133] ---------------------------------
  [  762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
  [  762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
  [  762.904542]  (key#3){+.?...}, at: [<c0232cc4>]
__percpu_counter_sum+0xd/0x58
  [  762.904542] {IN-SOFTIRQ-W} state was registered at:
  [  762.904542]   [<c0158b84>] __lock_acquire+0x284/0xc26
  [  762.904542]   [<c01598e8>] lock_acquire+0x71/0x85
  [  762.904542]   [<c0349765>] _raw_spin_lock+0x33/0x40
  [  762.904542]   [<c0232c93>] __percpu_counter_add+0x58/0x7c
  [  762.904542]   [<c02cfde1>] sk_clone_lock+0x1e5/0x200
  [  762.904542]   [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
  [  762.904542]   [<c0315778>] tcp_create_openreq_child+0x1b/0x404
  [  762.904542]   [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
  [  762.904542]   [<c031615a>] tcp_check_req+0x1fd/0x2d7
  [  762.904542]   [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
  [  762.904542]   [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
  [  762.904542]   [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
  [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
  [  762.904542]   [<c02fc652>] ip_local_deliver+0x41/0x45
  [  762.904542]   [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
  [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
  [  762.904542]   [<c02fc857>] ip_rcv+0x201/0x23e
  [  762.904542]   [<c02daa3a>] __netif_receive_skb+0x319/0x368
  [  762.904542]   [<c02dac07>] netif_receive_skb+0x4e/0x7d
  [  762.904542]   [<c02dacf6>] napi_skb_finish+0x1e/0x34
  [  762.904542]   [<c02db122>] napi_gro_receive+0x20/0x24
  [  762.904542]   [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
  [  762.904542]   [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
  [  762.904542]   [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
  [  762.904542]   [<c02db228>] net_rx_action+0x90/0x160
  [  762.904542]   [<c012a445>] __do_softirq+0x7b/0x118
  [  762.904542] irq event stamp: 156915469
  [  762.904542] hardirqs last  enabled at (156915469): [<c019b4f4>]
__slab_alloc.clone.58.clone.63+0xc4/0x2de
  [  762.904542] hardirqs last disabled at (156915468): [<c019b452>]
__slab_alloc.clone.58.clone.63+0x22/0x2de
  [  762.904542] softirqs last  enabled at (156915466): [<c02ce677>]
lock_sock_nested+0x64/0x6c
  [  762.904542] softirqs last disabled at (156915464): [<c0349914>]
_raw_spin_lock_bh+0xe/0x45
  [  762.904542]
  [  762.904542] other info that might help us debug this:
  [  762.904542]  Possible unsafe locking scenario:
  [  762.904542]
  [  762.904542]        CPU0
  [  762.904542]        ----
  [  762.904542]   lock(key#3);
  [  762.904542]   <Interrupt>
  [  762.904542]     lock(key#3);
  [  762.904542]
  [  762.904542]  *** DEADLOCK ***
  [  762.904542]
  [  762.904542] 1 lock held by squid/1603:
  [  762.904542]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>]
lock_sock+0xa/0xc
  [  762.904542]
  [  762.904542] stack backtrace:
  [  762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
  [  762.904542] Call Trace:
  [  762.904542]  [<c0347b73>] ? printk+0x18/0x1d
  [  762.904542]  [<c015873a>] valid_state+0x1f6/0x201
  [  762.904542]  [<c0158816>] mark_lock+0xd1/0x1bb
  [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
  [  762.904542]  [<c015805d>] ? check_usage_forwards+0x77/0x77
  [  762.904542]  [<c0158bf8>] __lock_acquire+0x2f8/0xc26
  [  762.904542]  [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
  [  762.904542]  [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
  [  762.904542]  [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
  [  762.904542]  [<c01598e8>] lock_acquire+0x71/0x85
  [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
  [  762.904542]  [<c0349765>] _raw_spin_lock+0x33/0x40
  [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
  [  762.904542]  [<c0232cc4>] __percpu_counter_sum+0xd/0x58
  [  762.904542]  [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
  [  762.904542]  [<c02d178d>] ? __alloc_skb+0x76/0x100
  [  762.904542]  [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
  [  762.904542]  [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
  [  762.904542]  [<c0306567>] tcp_sendmsg+0x18f/0x68b
  [  762.904542]  [<c031f3dc>] ? ip_fast_csum+0x30/0x30
  [  762.904542]  [<c0320193>] inet_sendmsg+0x53/0x5a
  [  762.904542]  [<c02cb633>] sock_aio_write+0xd2/0xda
  [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
  [  762.904542]  [<c01a1017>] do_sync_write+0x9f/0xd9
  [  762.904542]  [<c01a2111>] ? file_free_rcu+0x2f/0x2f
  [  762.904542]  [<c01a17a1>] vfs_write+0x8f/0xab
  [  762.904542]  [<c01a284d>] ? fget_light+0x75/0x7c
  [  762.904542]  [<c01a1900>] sys_write+0x3d/0x5e
  [  762.904542]  [<c0349ec9>] syscall_call+0x7/0xb
  [  762.904542]  [<c0340000>] ? rp_sidt+0x41/0x83

Bug is that sk_sockets_allocated_read_positive() calls
percpu_counter_sum_positive() without BH being disabled.

This bug was added in commit 180d8cd942
(foundations of per-cgroup memory pressure controlling.), since previous
code was using percpu_counter_read_positive() which is IRQ safe.

In __sk_mem_schedule() we dont need the precise count of allocated
sockets and can revert to previous behavior.

Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Sined-off-by: Eric Dumazet <edumazet@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Acked-by: Neal Cardwell <ncardwell@google.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 13:37:59 -04:00
David S. Miller
5414fc12e3 Merge branch 'master' of git://1984.lsi.us.es/net 2012-04-30 13:23:22 -04:00
Hans Schillstrom
8537de8a7a ipvs: kernel oops - do_ip_vs_get_ctl
Change order of init so netns init is ready
when register ioctl and netlink.

Ver2
	Whitespace fixes and __init added.

Reported-by: "Ryan O'Hara" <rohara@redhat.com>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:35 +02:00
Hans Schillstrom
582b8e3ead ipvs: take care of return value from protocol init_netns
ip_vs_create_timeout_table() can return NULL
All functions protocol init_netns is affected of this patch.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:35 +02:00
Benjamin LaHaise
d7f3f62167 net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6
Now that the sematics of udpv6_queue_rcv_skb() match IPv4's
udp_queue_rcv_skb(), introduce the UDP encap_rcv() hook for IPv6.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 22:21:51 -04:00
John W. Linville
4dcc0637fc Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2012-04-27 15:16:43 -04:00
Eric Dumazet
6746960140 ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing
Quoting Tore Anderson from :
https://bugzilla.kernel.org/show_bug.cgi?id=42572

When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment
size does not take into account the size of the IPv6 Fragmentation
header that needs to be included in outbound packets, causing every
transmitted TCP segment to be fragmented across two IPv6 packets, the
latter of which will only contain 8 bytes of actual payload.

RTAX_FEATURE_ALLFRAG is typically set on a route in response to
receving a ICMPv6 Packet Too Big message indicating a Path MTU of less
than 1280 bytes. 1280 bytes is the minimum IPv6 MTU, however ICMPv6
PTBs with MTU < 1280 are still valid, in particular when an IPv6
packet is sent to an IPv4 destination through a stateless translator.
Any ICMPv4 Need To Fragment packets originated from the IPv4 part of
the path will be translated to ICMPv6 PTB which may then indicate an
MTU of less than 1280.

The Linux kernel refuses to reduce the effective MTU to anything below
1280 bytes, instead it sets it to exactly 1280 bytes, and
RTAX_FEATURE_ALLFRAG is also set. However, the TCP segment size appears
to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header),
instead of 1232 (additionally taking into account the 8 bytes required
by the IPv6 Fragmentation extension header).

This in turn results in rather inefficient transmission, as every
transmitted TCP segment now is split in two fragments containing
1232+8 bytes of payload.

After this patch, all the outgoing packets that includes a
Fragmentation header all are "atomic" or "non-fragmented" fragments,
i.e., they both have Offset=0 and More Fragments=0.

With help from David S. Miller

Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27 00:03:34 -04:00
John W. Linville
d9b8ae6bd8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-testmode.c
2012-04-26 15:03:48 -04:00
Peter Huang (Peng)
a881e963c7 set fake_rtable's dst to NULL to avoid kernel Oops
bridge: set fake_rtable's dst to NULL to avoid kernel Oops

when bridge is deleted before tap/vif device's delete, kernel may
encounter an oops because of NULL reference to fake_rtable's dst.
Set fake_rtable's dst to NULL before sending packets out can solve
this problem.

v4 reformat, change br_drop_fake_rtable(skb) to {}

v3 enrich commit header

v2 introducing new flag DST_FAKE_RTABLE to dst_entry struct.

[ Use "do { } while (0)" for nop br_drop_fake_rtable()
  implementation -DaveM ]

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Peter Huang <peter.huangpeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:16:24 -04:00
David S. Miller
f24001941c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Fix merge between commit 3adadc08cc ("net ax25: Reorder ax25_exit to
remove races") and commit 0ca7a4c87d ("net ax25: Simplify and
cleanup the ax25 sysctl handling")

The former moved around the sysctl register/unregister calls, the
later simply removed them.

With help from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 23:15:17 -04:00
Eric Dumazet
f545a38f74 net: add a limit parameter to sk_add_backlog()
sk_add_backlog() & sk_rcvqueues_full() hard coded sk_rcvbuf as the
memory limit. We need to make this limit a parameter for TCP use.

No functional change expected in this patch, all callers still using the
old sk_rcvbuf limit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:28:28 -04:00
Eric W. Biederman
b98985073b net ax25: Fix the build when sysctl support is disabled.
Randy Dunlap <rdunlap@xenotime.net> reported:

> On 04/23/2012 12:07 AM, Stephen Rothwell wrote:
>
>> Hi all,
>>
>> Changes since 20120420:
>
>
> include/net/ax25.h:447:75: error: expected ';' before '}' token
>
> static inline int ax25_register_dev_sysctl(ax25_dev *ax25_dev) { return 0 };
> static inline void ax25_unregister_dev_sysctl(ax25_dev *ax25_dev) {};
>
> First function:  move ';' inside braces.
> Second function:  drop the ';'.

Put the semicolons where it makes sense.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:14:47 -04:00
Eric W. Biederman
48c7495857 net sysctl: Add place holder functions for when sysctl support is compiled out of the kernel.
Randy Dunlap <rdunlap@xenotime.net> reported:
> On 04/23/2012 12:07 AM, Stephen Rothwell wrote:
>
>> Hi all,
>>
>> Changes since 20120420:
>
>
>
> ERROR: "unregister_net_sysctl_table" [net/phonet/phonet.ko] undefined!
> ERROR: "register_net_sysctl" [net/phonet/phonet.ko] undefined!
>
> when CONFIG_SYSCTL is not enabled.

Add static inline stub functions to gracefully handle the case when sysctl
support is not present.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 19:24:28 -04:00
Wey-Yi Guy
0d8a0a1728 mac80211: declare ieee80211_ave_rssi as EXPORT
ieee80211_ave_rssi need to be declare as export for driver to use it.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:37:41 -04:00
Neal Cardwell
900f65d361 tcp: move duplicate code from tcp_v4_init_sock()/tcp_v6_init_sock()
This commit moves the (substantial) common code shared between
tcp_v4_init_sock() and tcp_v6_init_sock() to a new address-family
independent function, tcp_init_sock().

Centralizing this functionality should help avoid drift issues,
e.g. where the IPv4 side is updated without a corresponding update to
IPv6. There was already some drift: IPv4 initialized snd_cwnd to
TCP_INIT_CWND, while the IPv6 side was still initializing snd_cwnd to
2 (in this case it should not matter, since snd_cwnd is also
initialized in tcp_init_metrics(), but the general risks and
maintenance overhead remain).

When diffing the old and new code, note that new tcp_init_sock()
function uses the order of steps from the tcp_v4_init_sock()
implementation (the order is slightly different in
tcp_v6_init_sock()).

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 16:36:42 -04:00
Pavel Emelyanov
ee9952831c tcp: Initial repair mode
This includes (according the the previous description):

* TCP_REPAIR sockoption

This one just puts the socket in/out of the repair mode.
Allowed for CAP_NET_ADMIN and for closed/establised sockets only.
When repair mode is turned off and the socket happens to be in
the established state the window probe is sent to the peer to
'unlock' the connection.

* TCP_REPAIR_QUEUE sockoption

This one sets the queue which we're about to repair. The
'no-queue' is set by default.

* TCP_QUEUE_SEQ socoption

Sets the write_seq/rcv_nxt of a selected repaired queue.
Allowed for TCP_CLOSE-d sockets only. When the socket changes
its state the other seq-s are changed by the kernel according
to the protocol rules (most of the existing code is actually
reused).

* Ability to forcibly bind a socket to a port

The sk->sk_reuse is set to SK_FORCE_REUSE.

* Immediate connect modification

The connect syscall initializes the connection, then directly jumps
to the code which finalizes it.

* Silent close modification

The close just aborts the connection (similar to SO_LINGER with 0
time) but without sending any FIN/RST-s to peer.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Pavel Emelyanov
370816aef0 tcp: Move code around
This is just the preparation patch, which makes the needed for
TCP repair code ready for use.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Pavel Emelyanov
4a17fd5229 sock: Introduce named constants for sk_reuse
Name them in a "backward compatible" manner, i.e. reuse or not
are still 1 and 0 respectively. The reuse value of 2 means that
the socket with it will forcibly reuse everyone else's port.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Eric W. Biederman
5f568e5afe net: Remove register_net_sysctl_table
All of the users have been converted to use registera_net_sysctl so we
no longer need register_net_sysctl.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman
a5347fe36b net: Delete all remaining instances of ctl_path
We don't use struct ctl_path anymore so delete the exported constants.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman
f99e8f715a net: Convert nf_conntrack_proto to use register_net_sysctl
There isn't much advantage here except that strings paths are a bit
easier to read, and converting everything to them allows me to kill off
ctl_path.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman
6dceb03687 net ipv6: Don't use sysctl tables with .child entries.
The sysctl core no longer natively understands sysctl tables
with .child entries.

Split the ipv6_table to remove the .child entries.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:29 -04:00
Eric W. Biederman
0ca7a4c87d net ax25: Simplify and cleanup the ax25 sysctl handling.
Don't register/unregister every ax25 table in a batch.  Instead register
and unregister per device ax25 sysctls as ax25 devices come and go.

This moves ax25 to be a completely modern sysctl user.  Registering the
sysctls in just the initial network namespace, removing the use of
.child entries that are no longer natively supported by the sysctl core
and taking advantage of the fact that there are no longer any ordering
constraints between registering and unregistering different sysctl
tables.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:28 -04:00
Eric W. Biederman
4e5ca78541 net ipv4: Remove the unneeded registration of an empty net/ipv4/neigh
sysctl no longer requires explicit creation of directories.  The neigh
directory is always populated with at least a default entry so this
won't cause any user visible changes.

Delete the ipv4_path and the ipv4_skeleton these are no longer needed.

Directly register the ipv4_route_table.

And since I am an idiot remove the header definitions that I should
have removed in the previous patch.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:18 -04:00
Eric W. Biederman
4344475797 net: Kill register_sysctl_rotable
register_sysctl_rotable never caught on as an interesting way to
register sysctls.  My take on the situation is that what we want are
sysctls that we can only see in the initial network namespace.  What we
have implemented with register_sysctl_rotable are sysctls that we can
see in all of the network namespaces and can only change in the initial
network namespace.

That is a very silly way to go.  Just register the network sysctls
in the initial network namespace and we don't have any weird special
cases to deal with.

The sysctls affected are:
/proc/sys/net/ipv4/ipfrag_secret_interval
/proc/sys/net/ipv4/ipfrag_max_dist
/proc/sys/net/ipv6/ip6frag_secret_interval
/proc/sys/net/ipv6/mld_max_msf

I really don't expect anyone will miss them if they can't read them in a
child user namespace.

CC: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:17 -04:00
Eric W. Biederman
2ca794e5e8 net sysctl: Initialize the network sysctls sooner to avoid problems.
If the netfilter code is modified to use register_net_sysctl_table the
kernel fails to boot because the per net sysctl infrasturce is not setup
soon enough.  So to avoid races call net_sysctl_init from sock_init().

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:16 -04:00
Eric W. Biederman
ab41a2ca50 net: Implement register_net_sysctl.
Right now all of the networking sysctl registrations are running in a
compatibiity mode.  The natvie sysctl registration api takes a cstring
for a path and a simple ctl_table.  Implement register_net_sysctl so
that we can register network sysctls without needing to use
compatiblity code in the sysctl core.

Switching from a ctl_path to a cstring results in less boiler plate
and denser code that is a little easier to read.

I would simply have changed the arguments to register_net_sysctl_table
instead of keeping two functions in parallel but gcc will allow a
ctl_path pointer to be passed to a char * pointer with only issuing a
warning resulting in completely incorrect code can be built.  Since I
have to change the function name I am taking advantage of the situation
to let both register_net_sysctl and register_net_sysctl_table live for a
short time in parallel which makes clean conversion patches a bit easier
to read and write.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:15 -04:00
John W. Linville
59ef43e681 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-testmode.c
	include/net/nfc/nfc.h
	net/nfc/netlink.c
	net/wireless/nl80211.c
2012-04-18 14:27:48 -04:00
Randy Dunlap
d3d4f0a025 net/sock.h: fix sk_peek_off kernel-doc warning
Fix kernel-doc warning in net/sock.h:

Warning(include/net/sock.h:377): No description found for parameter 'sk_peek_off'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-17 22:32:00 -04:00
Jiri Bohac
cda31e10ba ipv6: clean up rt6_clean_expires
Functionally, this change is a NOP.

Semantically, rt6_clean_expires() wants to do rt->dst.from = NULL instead of
rt->dst.expires = 0. It is clearing the RTF_EXPIRES flag, so the union is going
to be treated as a pointer (dst.from) not a long (dst.expires).

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-17 22:31:59 -04:00
Jiri Bohac
edfb5d4687 ipv6: fix rt6_update_expires
Commit 1716a961 (ipv6: fix problem with expired dst cache) broke PMTU
discovery. rt6_update_expires() calls dst_set_expires(), which only updates
dst->expires if it has not been set previously (expires == 0) or if the new
expires is earlier than the current dst->expires.

rt6_update_expires() needs to zero rt->dst.expires, otherwise it will contain
ivalid data left over from rt->dst.from and will confuse dst_set_expires().

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-17 22:31:51 -04:00
David Ward
4362aaf605 net_sched: red: Make minor corrections to comments
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-16 23:53:11 -04:00
Wey-Yi Guy
1dae27f84b mac80211: add function retrieve average rssi
Add utility function to provide the average rssi per vif

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:38:49 -04:00
Neal Cardwell
f4f9f6e75d tcp: restore formatting of macros for tcp_skb_cb sacked field
Commit b82d1bb4 inadvertendly placed unrelated new code between
TCPCB_EVER_RETRANS and TCPCB_RETRANS and the other macros that refer
to the sacked field in the struct tcp_skb_cb (probably because there
was a misleading empty line there). This commit fixes up the
formatting so that all macros related to the sacked field are adjacent
again.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-16 14:38:16 -04:00
Johannes Berg
8e8b41f9d8 cfg80211: enforce lack of interface combinations
My grand plan to allow drivers to gradually move over
to advertising virtual interface combinations and only
enforce with drivers that do want it enforced doesn't
seem to be working out, only Christian ever added the
advertising (to carl9170), nobody else did.

Begin enforcing combinations in cfg80211 so that users
can rely on the information reported about a device.

Cc: "Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>
Cc: Jouni Malinen <jouni@qca.qualcomm.com>
Cc: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
Cc: Senthil Balasubramanian <senthilb@qca.qualcomm.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Nick Kossifidis <mickflemm@gmail.com>
Cc: Bob Copeland <me@bobcopeland.com>
Cc: Bing Zhao <bzhao@marvell.com>
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Ivo van Doorn <IvDoorn@gmail.com>
Cc: Gertjan van Wingerde <gwingerde@gmail.com>
Cc: Helmut Schaa <helmut.schaa@googlemail.com>
Cc: Luciano Coelho <coelho@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:16:58 -04:00
Vishal Agarwal
6ec5bcadc2 Bluetooth: Temporary keys should be retained during connection
If a key is non persistent then it should not be used in future
connections but it should be kept for current connection. And it
should be removed when connecion is removed.

Signed-off-by: Vishal Agarwal <vishal.agarwal@stericsson.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-04-16 12:57:45 +03:00
Vishal Agarwal
745c0ce35f Bluetooth: hci_persistent_key should return bool
This patch changes the return type of function hci_persistent_key
from int to bool because it makes more sense to return information
whether a key is persistent or not as a bool.

Signed-off-by: Vishal Agarwal <vishal.agarwal@stericsson.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-04-16 12:57:40 +03:00
David S. Miller
56845d78ce Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/atheros/atlx/atl1.c
	drivers/net/ethernet/atheros/atlx/atl1.h

Resolved a conflict between a DMA error bug fix and NAPI
support changes in the atl1 driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 13:19:04 -04:00
Eric Dumazet
95c9617472 net: cleanup unsigned to unsigned int
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:44:40 -04:00
Alex Copot
aacd9289af tcp: bind() use stronger condition for bind_conflict
We must try harder to get unique (addr, port) pairs when
doing port autoselection for sockets with SO_REUSEADDR
option set.

We achieve this by adding a relaxation parameter to
inet_csk_bind_conflict. When 'relax' parameter is off
we return a conflict whenever the current searched
pair (addr, port) is not unique.

This tries to address the problems reported in patch:
	8d238b25b1
	Revert "tcp: bind() fix when many ports are bound"

Tests where ran for creating and binding(0) many sockets
on 100 IPs. The results are, on average:

	* 60000 sockets, 600 ports / IP:
		* 0.210 s, 620 (IP, port) duplicates without patch
		* 0.219 s, no duplicates with patch
	* 100000 sockets, 1000 ports / IP:
		* 0.371 s, 1720 duplicates without patch
		* 0.373 s, no duplicates with patch
	* 200000 sockets, 2000 ports / IP:
		* 0.766 s, 6900 duplicates without patch
		* 0.768 s, no duplicates with patch
	* 500000 sockets, 5000 ports / IP:
		* 2.227 s, 41500 duplicates without patch
		* 2.284 s, no duplicates with patch

Signed-off-by: Alex Copot <alex.mihai.c@gmail.com>
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 15:28:55 -04:00
Eric Dumazet
fd4f2cead6 tcp: RFC6298 supersedes RFC2988bis
Updates some comments to track RFC6298

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 15:24:26 -04:00
stephen hemminger
87b6d218f3 tunnel: implement 64 bits statistics
Convert the per-cpu statistics kept for GRE, IPIP, and SIT tunnels
to use 64 bit statistics.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 14:47:05 -04:00
Michal Kazior
4ee73f338a mac80211: remove hw.conf.channel usage where possible
Removes hw.conf.channel usage from the following functions:
 * ieee80211_mandatory_rates
 * ieee80211_sta_get_rates
 * ieee80211_frame_duration
 * ieee80211_rts_duration
 * ieee80211_ctstoself_duration

This is in preparation for multi-channel operation.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:32:50 -04:00
Pontus Fuchs
d91df0e3a1 cfg80211: Add channel information to NL80211_CMD_GET_INTERFACE
If the current channel is known, add frequency and channel type to
NL80211_CMD_GET_INTERFACE.

Signed-off-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:32:49 -04:00
David S. Miller
be38395204 rtnetlink: ops->get_tx_queue() cannot take a const 'tb'.
net/core/rtnetlink.c: In function ‘rtnl_create_link’:
net/core/rtnetlink.c:1645:3: warning: passing argument 2 of ‘ops->get_tx_queues’ from incompatible pointer type [enabled by default]
net/core/rtnetlink.c:1645:3: note: expected ‘const struct nlattr **’ but argument is of type ‘struct nlattr **’

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 14:21:04 -04:00
Hiroaki SHIMODA
dcd2ba92e8 neighbour: Make neigh_table_init_no_netlink() static.
neigh_table_init_no_netlink() is only used in net/core/neighbour.c file.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 14:00:44 -04:00
Eric Dumazet
447167bf56 udp: intoduce udp_encap_needed static_key
Most machines dont use UDP encapsulation (L2TP)

Adds a static_key so that udp_queue_rcv_skb() doesnt have to perform a
test if L2TP never setup the encap_rcv on a socket.

Idea of this patch came after Simon Horman proposal to add a hook on TCP
as well.

If static_key is not yet enabled, the fast path does a single JMP .

When static_key is enabled, JMP destination is patched to reach the real
encap_type/encap_rcv logic, possibly adding cache misses.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: dev@openvswitch.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 13:39:37 -04:00
stephen hemminger
9b17876f3e rtnetlink: fix comments
Fix spelling and references in rtnetlink.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 13:32:39 -04:00
stephen hemminger
efacb309b5 rtnetlink & bonding: change args got get_tx_queues
Change get_tx_queues, drop unsused arg/return value real_tx_queues,
and use return by value (with error) rather than call by reference.

Probably bonding should just change to LLTX and the whole get_tx_queues
API could disappear!

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 13:31:00 -04:00
Gao feng
1716a96101 ipv6: fix problem with expired dst cache
If the ipv6 dst cache which copy from the dst generated by ICMPV6 RA packet.
this dst cache will not check expire because it has no RTF_EXPIRES flag.
So this dst cache will always be used until the dst gc run.

Change the struct dst_entry,add a union contains new pointer from and expires.
When rt6_info.rt6i_flags has no RTF_EXPIRES flag,the dst.expires has no use.
we can use this field to point to where the dst cache copy from.
The dst.from is only used in IPV6.

rt6_check_expired check if rt6_info.dst.from is expired.

ip6_rt_copy only set dst.from when the ort has flag RTF_ADDRCONF
and RTF_DEFAULT.then hold the ort.

ip6_dst_destroy release the ort.

Add some functions to operate the RTF_EXPIRES flag and expires(from) together.
and change the code to use these new adding functions.

Changes from v5:
modify ip6_route_add and ndisc_router_discovery to use new adding functions.

Only set dst.from when the ort has flag RTF_ADDRCONF
and RTF_DEFAULT.then hold the ort.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 12:58:29 -04:00
Dmitry Tarnyagin
ece367d53a caif-hsi: robust frame aggregation for HSI
Implement aggregation algorithm, combining more data into a single
HSI transfer. 4 different traffic categories are supported:
 1. TC_PRIO_CONTROL .. TC_PRIO_MAX (CTL)
 2. TC_PRIO_INTERACTIVE            (VO)
 3. TC_PRIO_INTERACTIVE_BULK       (VI)
 4. TC_PRIO_BESTEFFORT, TC_PRIO_BULK, TC_PRIO_FILLER (BEBK)

Signed-off-by: Dmitry Tarnyagin <dmitry.tarnyagin@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:37:36 -04:00
Dmitry Tarnyagin
447648128e caif: set traffic class for caif packets
Set traffic class for CAIF packets, based on socket
priority, CAIF protocol type, or type of message.

Traffic class mapping for different packet types:
 - control:       TC_PRIO_CONTROL;
 - flow control:  TC_PRIO_CONTROL;
 - at:            TC_PRIO_CONTROL;
 - rfm:           TC_PRIO_INTERACTIVE_BULK;
 - other sockets: equals to socket's TC;
 - network data:  no change.

Signed-off-by: Dmitry Tarnyagin <dmitry.tarnyagin@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:37:36 -04:00
David S. Miller
816a7854d5 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2012-04-12 20:12:31 -04:00
David S. Miller
011e3c6325 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-04-12 19:41:23 -04:00
Alexey I. Froloff
e35f30c131 Treat ND option 31 as userland (DNSSL support)
As specified in RFC6106, DNSSL option contains one or more domain names
of DNS suffixes.  8-bit identifier of the DNSSL option type as assigned
by the IANA is 31.  This option should also be treated as userland.

Signed-off-by: Alexey I. Froloff <raorn@raorn.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-12 15:56:57 -04:00
Eric Lapuyade
c8d56ae786 NFC: Add Core support to generate tag lost event
Some HW/drivers get notifications when a tag moves out of the radio field.
This notification is now forwarded to user space through netlink.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:39 -04:00
Eric Lapuyade
144612cacc NFC: Changed target activated state logic
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:38 -04:00
Eric Lapuyade
01ae0eea9b NFC: Fix next target_idx type and rename for clarity
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:37 -04:00
Samuel Ortiz
c4fbb6515a NFC: The core part should generate the target index
The target index can be used by userspace to uniquely identify a target
and thus should be kept unique, per NFC adapter. Moreover, some protocols
do not provide a logical index when discovering new targets, so we have to
generate one for them.
For NCI or pn533 to fetch their logical index, we added a logical_idx field
to the target structure.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:37 -04:00
Eric Lapuyade
eb738fe535 NFC: SHDLC implementation
Most NFC HCI chipsets actually use a simplified HDLC link layer to
carry HCI payloads.
This implementation registers itself as an HCI device on behalf of the
NFC driver.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:35 -04:00
Eric Lapuyade
8b8d2e08bf NFC: HCI support
This is an implementation of ETSI TS 102 622 specification.
Many NFC chipsets use HCI as the host <-> target protocol on top of a
serial link like i2c.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:34 -04:00
Eric Lapuyade
e1da0efa2e NFC: Export target lost function
NFC drivers will call this routine when they detect that a tag leaves the
RF field. This will eventually lead to the corresponding netlink event
to be sent.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:34 -04:00
John W. Linville
7eab0f64a9 Merge branch 'master' into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-testmode.c
	net/wireless/nl80211.c
2012-04-12 14:41:59 -04:00
John W. Linville
8065248069 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2012-04-12 13:49:28 -04:00
Thomas Pedersen
5314526b17 cfg80211: add channel switch notify event
The firmware may decide to switch channels while already beaconing, e.g.
in response to a cfg80211 connect request on a different vif. Add this
event to notify userspace when an AP or GO interface has successfully
migrated to a new channel, so it can update its configuration
accordingly.

Signed-off-by: Thomas Pedersen <c_tpeder@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:59 -04:00
Johannes Berg
6d52563f2b cfg80211/mac80211: enable proper device_set_wakeup_enable handling
In WoWLAN, we only get the triggers when we actually get
to suspend. As a consequence, drivers currently don't
know that the device should enable wakeup. However, the
device_set_wakeup_enable() API is intended to be called
when the wakeup is enabled, not later when needed.

Add a new set_wakeup() call to cfg80211 and mac80211 to
allow drivers to properly call device_set_wakeup_enable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:57 -04:00
Johannes Berg
3a25a8c8b7 mac80211: add improved HW queue control
mac80211 currently only supports one hardware queue
per AC. This is already problematic for off-channel
uses since if we go off channel while the BE queue
is full and then try to send an off-channel frame
the frame will never go out. This will become worse
when we support multi-channel since then a queue on
one channel might be full, but we have to stop the
software queue for all channels. That is obviously
not desirable.

To address this problem allow drivers to register
more hardware queues, and allow them to map them to
virtual interfaces. When they stop a hardware queue
the corresponding AC software queues on the correct
interfaces will be stopped as well. Additionally,
there's an off-channel queue to solve that problem
and a per-interface after-DTIM beacon queue. This
allows drivers to manage software queues closer to
how the hardware works.

Currently, there's a limit of 16 hardware queues.
This may or may not be sufficient, we can adjust it
as needed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:50 -04:00
Johannes Berg
4b6f1dd6a6 mac80211: add explicit monitor interface if needed
The queue mapping redesign that I'm planning to do
will break pure injection unless we handle monitor
interfaces explicitly. One possible option would
be to have the driver tell mac80211 about monitor
mode queues etc., but that would duplicate the API
since we already need to have queue assignments
handled per virtual interface.

So in order to solve this, have a virtual monitor
interface that is added whenever all active vifs
are monitors. We could also use the state of one
of the monitor interfaces, but managing that would
be complicated, so allocate separate state.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:49 -04:00
Ashok Nagarajan
657c3e0c41 mac80211: Indicate basic rates when adding rate IEs
Basic rates are added with supported rates IE and extended supported
rates IE.

Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:46 -04:00
Javier Cardona
d299a1f21e {nl,cfg}80211: Support for mesh synchronization
Report Toffset to userspace.
Let userspace select the mesh synchronization method.

Signed-off-by: Marco Porsch <marco.porsch@s2005.tu-chemnitz.de>
Signed-off-by: Pavel Zubarev <pavel.zubarev@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 15:20:33 -04:00
Johannes Berg
a3304b0a17 cfg80211/nl80211: clarify TX queue API
With the plan to change mac80211's queue API to
not map ACs to queues 1:1, it seems necessary to
clarify some APIs that act on ACs rather than on
queues to spell that out explicitly. Do this.

Also verify that the AC number given is valid.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:09 -04:00
Johannes Berg
d748b4642a mac80211: remove antenna_sel_tx TX info field
This field is never set to anything non-zero in
mac80211, so we should be able to remove it.
Unfortunately though, the iwlwifi and iwlegacy
drivers use it for their internal TX status
processing (which shouldn't be using the rate
control API to start with), so add a new field
"status.antenna" for them, at least for now.

In the future, I plan to use the new field to
hold the hardware queue, while the SKB's queue
mapping holds the AC.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:09 -04:00
Johannes Berg
8f727ef3c4 mac80211: notify driver of rate control updates
Devices that have internal rate control need to be
notified when the bandwidth or SMPS state changes
just like external rate control algorithms get a
notification now.

Add this notification and clarify the change bits
while at it, the HT_CHANGED bit really meant only
bandwidth changed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:08 -04:00
Johannes Berg
64f68e5d15 mac80211: remove channel type argument from rate_update
The channel type argument to the rate_update()
callback isn't really the correct way to give
the rate control algorithm about the desired
RX bandwidth of the peer.

Remove this argument, and instead update the
STA capabilities with 20/40 appropriately. The
SMPS update done by this callback works in the
same way, so this makes the callback cleaner.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:08 -04:00
David S. Miller
06eb4eafbd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-04-10 14:30:45 -04:00