Commit Graph

37423 Commits

Author SHA1 Message Date
Henning Rogge
33b4b015e1 net/ipv6/udp: Fix ipv6 multicast socket filter regression
Commit <5cf3d46192fc> ("udp: Simplify__udp*_lib_mcast_deliver")
simplified the filter for incoming IPv6 multicast but removed
the check of the local socket address and the UDP destination
address.

This patch restores the filter to prevent sockets bound to a IPv6
multicast IP to receive other UDP traffic link unicast.

Signed-off-by: Henning Rogge <hrogge@gmail.com>
Fixes: 5cf3d46192 ("udp: Simplify__udp*_lib_mcast_deliver")
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 16:34:43 -04:00
David S. Miller
456cdf53ef Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:

====================
pull request: bluetooth 2015-05-17

A couple more Bluetooth updates for 4.1:

- New USB IDs for ath3k & btusb
- Fix for remote name resolving during device discovery

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-18 16:15:31 -04:00
Florent Fourcot
21858cd02d tcp/ipv6: fix flow label setting in TIME_WAIT state
commit 1d13a96c74 ("ipv6: tcp: fix flowlabel value in ACK messages
send from TIME_WAIT") added the flow label in the last TCP packets.
Unfortunately, it was not casted properly.

This patch replace the buggy shift with be32_to_cpu/cpu_to_be32.

Fixes: 1d13a96c74 ("ipv6: tcp: fix flowlabel value in ACK messages")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-17 23:41:59 -04:00
Nicolas Dichtel
ed2a80ab7b rtnl/bond: don't send rtnl msg for unregistered iface
Before the patch, the command 'ip link add bond2 type bond mode 802.3ad'
causes the kernel to send a rtnl message for the bond2 interface, with an
ifindex 0.

'ip monitor' shows:
0: bond2: <BROADCAST,MULTICAST,MASTER> mtu 1500 state DOWN group default
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
9: bond2@NONE: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN group default
    link/ether ea:3e:1f:53:92:7b brd ff:ff:ff:ff:ff:ff
[snip]

The patch fixes the spotted bug by checking in bond driver if the interface
is registered before calling the notifier chain.
It also adds a check in rtmsg_ifinfo() to prevent this kind of bug in the
future.

Fixes: d4261e5650 ("bonding: create netlink event when bonding option is changed")
CC: Jiri Pirko <jiri@resnulli.us>
Reported-by: Julien Meunier <julien.meunier@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-17 22:43:07 -04:00
Herbert Xu
c0bb07df7d netlink: Reset portid after netlink_insert failure
The commit c5adde9468 ("netlink:
eliminate nl_sk_hash_lock") breaks the autobind retry mechanism
because it doesn't reset portid after a failed netlink_insert.

This means that should autobind fail the first time around, then
the socket will be stuck in limbo as it can never be bound again
since it already has a non-zero portid.

Fixes: c5adde9468 ("netlink: eliminate nl_sk_hash_lock")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-16 17:08:57 -04:00
David S. Miller
1d6057019e Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes for your net tree, they are:

1) Fix a leak in IPVS, the sysctl table is not released accordingly when
   destroying a netns, patch from Tommi Rantala.

2) Fix a build error when TPROXY and socket are built-in but IPv6 defrag is
   compiled as module, from Florian Westphal.

3) Fix TCP tracket wrt. RFC5961 challenge ACK when in LAST_ACK state, patch
   from Jesper Dangaard Brouer.

4) Fix a bogus WARN_ON() in nf_tables when deleting a set element that stores
   a map, from Mirek Kratochvil.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-16 16:40:22 -04:00
Mirek Kratochvil
960bd2c264 netfilter: nf_tables: fix bogus warning in nft_data_uninit()
The values 0x00000000-0xfffffeff are reserved for userspace datatype. When,
deleting set elements with maps, a bogus warning is triggered.

WARNING: CPU: 0 PID: 11133 at net/netfilter/nf_tables_api.c:4481 nft_data_uninit+0x35/0x40 [nf_tables]()

This fixes the check accordingly to enum definition in
include/linux/netfilter/nf_tables.h

Fixes: https://bugzilla.netfilter.org/show_bug.cgi?id=1013
Signed-off-by: Mirek Kratochvil <exa.exa@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-15 22:07:30 +02:00
Jesper Dangaard Brouer
b3cad287d1 conntrack: RFC5961 challenge ACK confuse conntrack LAST-ACK transition
In compliance with RFC5961, the network stack send challenge ACK in
response to spurious SYN packets, since commit 0c228e833c ("tcp:
Restore RFC5961-compliant behavior for SYN packets").

This pose a problem for netfilter conntrack in state LAST_ACK, because
this challenge ACK is (falsely) seen as ACKing last FIN, causing a
false state transition (into TIME_WAIT).

The challenge ACK is hard to distinguish from real last ACK.  Thus,
solution introduce a flag that tracks the potential for seeing a
challenge ACK, in case a SYN packet is let through and current state
is LAST_ACK.

When conntrack transition LAST_ACK to TIME_WAIT happens, this flag is
used for determining if we are expecting a challenge ACK.

Scapy based reproducer script avail here:
 https://github.com/netoptimizer/network-testing/blob/master/scapy/tcp_hacks_3WHS_LAST_ACK.py

Fixes: 0c228e833c ("tcp: Restore RFC5961-compliant behavior for SYN packets")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-15 20:50:56 +02:00
Florian Westphal
595ca5880b netfilter: avoid build error if TPROXY/SOCKET=y && NF_DEFRAG_IPV6=m
With TPROXY=y but DEFRAG_IPV6=m we get build failure:

net/built-in.o: In function `tproxy_tg_init':
net/netfilter/xt_TPROXY.c:588: undefined reference to `nf_defrag_ipv6_enable'

If DEFRAG_IPV6 is modular, TPROXY must be too.
(or both must be builtin).

This enforces =m for both.

Reported-and-tested-by: Liu Hua <liusdu@126.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-15 20:18:27 +02:00
Roopa Prabhu
eea39946a1 rename RTNH_F_EXTERNAL to RTNH_F_OFFLOAD
RTNH_F_EXTERNAL today is printed as "offload" in iproute2 output.

This patch renames the flag to be consistent with what the user sees.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14 22:45:39 -04:00
Vlad Yasevich
e87a468eb9 ipv6: Fix udp checksums with raw sockets
It was reported that trancerout6 would cause
a kernel to crash when trying to compute checksums
on raw UDP packets.  The cause was the check in
__ip6_append_data that would attempt to use
partial checksums on the packet.  However,
raw sockets do not initialize partial checksum
fields so partial checksums can't be used.

Solve this the same way IPv4 does it.  raw sockets
pass transhdrlen value of 0 to ip_append_data which
causes the checksum to be computed in software.  Use
the same check in ip6_append_data (check transhdrlen).

Reported-by: Wolfgang Walter <linux@stwm.de>
CC: Wolfgang Walter <linux@stwm.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14 22:27:03 -04:00
Eric Dumazet
91dd93f956 netlink: move nl_table in read_mostly section
netlink sockets creation and deletion heavily modify nl_table_users
and nl_table_lock.

If nl_table is sharing one cache line with one of them, netlink
performance is really bad on SMP.

ffffffff81ff5f00 B nl_table
ffffffff81ff5f0c b nl_table_users

Putting nl_table in read_mostly section increased performance
of my open/delete netlink sockets test by about 80 %

This came up while diagnosing a getaddrinfo() problem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14 17:49:06 -04:00
Wesley Kuo
177d0506a9 Bluetooth: Fix remote name event return directly.
This patch fixes hci_remote_name_evt dose not resolve name during
discovery status is RESOLVING. Before simultaneous dual mode scan enabled,
hci_check_pending_name will set discovery status to STOPPED eventually.

Signed-off-by: Wesley Kuo <wesley.kuo@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-14 10:35:04 +02:00
Vlad Yasevich
be346ffaad vlan: Correctly propagate promisc|allmulti flags in notifier.
Currently vlan notifier handler will try to update all vlans
for a device when that device comes up.  A problem occurs,
however, when the vlan device was set to promiscuous, but not
by the user (ex: a bridge).  In that case, dev->gflags are
not updated.  What results is that the lower device ends
up with an extra promiscuity count.  Here are the
backtraces that prove this:
[62852.052179]  [<ffffffff814fe248>] __dev_set_promiscuity+0x38/0x1e0
[62852.052186]  [<ffffffff8160bcbb>] ? _raw_spin_unlock_bh+0x1b/0x40
[62852.052188]  [<ffffffff814fe4be>] ? dev_set_rx_mode+0x2e/0x40
[62852.052190]  [<ffffffff814fe694>] dev_set_promiscuity+0x24/0x50
[62852.052194]  [<ffffffffa0324795>] vlan_dev_open+0xd5/0x1f0 [8021q]
[62852.052196]  [<ffffffff814fe58f>] __dev_open+0xbf/0x140
[62852.052198]  [<ffffffff814fe88d>] __dev_change_flags+0x9d/0x170
[62852.052200]  [<ffffffff814fe989>] dev_change_flags+0x29/0x60

The above comes from the setting the vlan device to IFF_UP state.

[62852.053569]  [<ffffffff814fe248>] __dev_set_promiscuity+0x38/0x1e0
[62852.053571]  [<ffffffffa032459b>] ? vlan_dev_set_rx_mode+0x2b/0x30
[8021q]
[62852.053573]  [<ffffffff814fe8d5>] __dev_change_flags+0xe5/0x170
[62852.053645]  [<ffffffff814fe989>] dev_change_flags+0x29/0x60
[62852.053647]  [<ffffffffa032334a>] vlan_device_event+0x18a/0x690
[8021q]
[62852.053649]  [<ffffffff8161036c>] notifier_call_chain+0x4c/0x70
[62852.053651]  [<ffffffff8109d456>] raw_notifier_call_chain+0x16/0x20
[62852.053653]  [<ffffffff814f744d>] call_netdevice_notifiers+0x2d/0x60
[62852.053654]  [<ffffffff814fe1a3>] __dev_notify_flags+0x33/0xa0
[62852.053656]  [<ffffffff814fe9b2>] dev_change_flags+0x52/0x60
[62852.053657]  [<ffffffff8150cd57>] do_setlink+0x397/0xa40

And this one comes from the notification code.  What we end
up with is a vlan with promiscuity count of 1 and and a physical
device with a promiscuity count of 2.  They should both have
a count 1.

To resolve this issue, vlan code can use dev_get_flags() api
which correctly masks promiscuity and allmulti flags.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14 00:54:32 -04:00
Linus Torvalds
110bc76729 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Handle max TX power properly wrt VIFs and the MAC in iwlwifi, from
    Avri Altman.

 2) Use the correct FW API for scan completions in iwlwifi, from Avraham
    Stern.

 3) FW monitor in iwlwifi accidently uses unmapped memory, fix from Liad
    Kaufman.

 4) rhashtable conversion of mac80211 station table was buggy, the
    virtual interface was not taken into account.  Fix from Johannes
    Berg.

 5) Fix deadlock in rtlwifi by not using a zero timeout for
    usb_control_msg(), from Larry Finger.

 6) Update reordering state before calculating loss detection, from
    Yuchung Cheng.

 7) Fix off by one in bluetooth firmward parsing, from Dan Carpenter.

 8) Fix extended frame handling in xiling_can driver, from Jeppe
    Ledet-Pedersen.

 9) Fix CODEL packet scheduler behavior in the presence of TSO packets,
    from Eric Dumazet.

10) Fix NAPI budget testing in fm10k driver, from Alexander Duyck.

11) macvlan needs to propagate promisc settings down the the lower
    device, from Vlad Yasevich.

12) igb driver can oops when changing number of rings, from Toshiaki
    Makita.

13) Source specific default routes not handled properly in ipv6, from
    Markus Stenberg.

14) Use after free in tc_ctl_tfilter(), from WANG Cong.

15) Use softirq spinlocking in netxen driver, from Tony Camuso.

16) Two ARM bpf JIT fixes from Nicolas Schichan.

17) Handle MSG_DONTWAIT properly in ring based AF_PACKET sends, from
    Mathias Kretschmer.

18) Fix x86 bpf JIT implementation of FROM_{BE16,LE16,LE32}, from Alexei
    Starovoitov.

19) ll_temac driver DMA maps TX packet header with incorrect length, fix
    from Michal Simek.

20) We removed pm_qos bits from netdevice.h, but some indirect
    references remained.  Kill them.  From David Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
  net: Remove remaining remnants of pm_qos from netdevice.h
  e1000e: Add pm_qos header
  net: phy: micrel: Fix regression in kszphy_probe
  net: ll_temac: Fix DMA map size bug
  x86: bpf_jit: fix FROM_BE16 and FROM_LE16/32 instructions
  netns: return RTM_NEWNSID instead of RTM_GETNSID on a get
  Update be2net maintainers' email addresses
  net_sched: gred: use correct backlog value in WRED mode
  pppoe: drop pppoe device in pppoe_unbind_sock_work
  net: qca_spi: Fix possible race during probe
  net: mdio-gpio: Allow for unspecified bus id
  af_packet / TX_RING not fully non-blocking (w/ MSG_DONTWAIT).
  bnx2x: limit fw delay in kdump to 5s after boot
  ARM: net: delegate filter to kernel interpreter when imm_offset() return value can't fit into 12bits.
  ARM: net fix emit_udiv() for BPF_ALU | BPF_DIV | BPF_K intruction.
  mpls: Change reserved label names to be consistent with netbsd
  usbnet: avoid integer overflow in start_xmit
  netxen_nic: use spin_[un]lock_bh around tx_clean_lock (2)
  net: xgene_enet: Set hardware dependency
  net: amd-xgbe: Add hardware dependency
  ...
2015-05-12 21:10:38 -07:00
Nicolas Dichtel
e3d8ecb70e netns: return RTM_NEWNSID instead of RTM_GETNSID on a get
Usually, RTM_NEWxxx is returned on a get (same as a dump).

Fixes: 0c7aecd4bd ("netns: add rtnl cmd to add and get peer netns ids")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:53:25 -04:00
David Ward
145a42b3a9 net_sched: gred: use correct backlog value in WRED mode
In WRED mode, the backlog for a single virtual queue (VQ) should not be
used to determine queue behavior; instead the backlog is summed across
all VQs. This sum is currently used when calculating the average queue
lengths. It also needs to be used when determining if the queue's hard
limit has been reached, or when reporting each VQ's backlog via netlink.
q->backlog will only be used if the queue switches out of WRED mode.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 13:26:26 -04:00
Kretschmer, Mathias
fbf33a2802 af_packet / TX_RING not fully non-blocking (w/ MSG_DONTWAIT).
This patch fixes an issue where the send(MSG_DONTWAIT) call
on a TX_RING is not fully non-blocking in cases where the device's sndBuf is
full. We pass nonblock=true to sock_alloc_send_skb() and return any possibly
occuring error code (most likely EGAIN) to the caller. As the fast-path stays
as it is, we keep the unlikely() around skb == NULL.

Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-10 19:40:08 -04:00
Tom Herbert
78f5b89919 mpls: Change reserved label names to be consistent with netbsd
Since these are now visible to userspace it is nice to be consistent
with BSD (sys/netmpls/mpls.h in netBSD).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 22:29:50 -04:00
WANG Cong
d744318574 net_sched: fix a use-after-free in tc_ctl_tfilter()
When tcf_destroy() returns true, tp could be already destroyed,
we should not use tp->next after that.

For long term, we probably should move tp list to list_head.

Fixes: 1e052be69d ("net_sched: destroy proto tp when all filters are gone")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 16:14:04 -04:00
Sowmini Varadhan
c82ac7e69e net/rds: RDS-TCP: only initiate reconnect attempt on outgoing TCP socket.
When the peer of an RDS-TCP connection restarts, a reconnect
attempt should only be made from the active side  of the TCP
connection, i.e. the side that has a transient TCP port
number. Do not add the passive side of the TCP connection
to the c_hash_node and thus avoid triggering rds_queue_reconnect()
for passive rds connections.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 16:03:28 -04:00
Sowmini Varadhan
f711a6ae06 net/rds: RDS-TCP: Always create a new rds_sock for an incoming connection.
When running RDS over TCP, the active (client) side connects to the
listening ("passive") side at the RDS_TCP_PORT.  After the connection
is established, if the client side reboots (potentially without even
sending a FIN) the server still has a TCP socket in the esablished
state.  If the server-side now gets a new SYN comes from the client
with a different client port, TCP will create a new socket-pair, but
the RDS layer will incorrectly pull up the old rds_connection (which
is still associated with the stale t_sock and RDS socket state).

This patch corrects this behavior by having rds_tcp_accept_one()
always create a new connection for an incoming TCP SYN.
The rds and tcp state associated with the old socket-pair is cleaned
up via the rds_tcp_state_change() callback which would typically be
invoked in most cases when the client-TCP sends a FIN on TCP restart,
triggering a transition to CLOSE_WAIT state. In the rarer event of client
death without a FIN, TCP_KEEPALIVE probes on the socket will detect
the stale socket, and the TCP transition to CLOSE state will trigger
the RDS state cleanup.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 16:03:27 -04:00
Markus Stenberg
e16e888b52 ipv6: Fixed source specific default route handling.
If there are only IPv6 source specific default routes present, the
host gets -ENETUNREACH on e.g. connect() because ip6_dst_lookup_tail
calls ip6_route_output first, and given source address any, it fails,
and ip6_route_get_saddr is never called.

The change is to use the ip6_route_get_saddr, even if the initial
ip6_route_output fails, and then doing ip6_route_output _again_ after
we have appropriate source address available.

Note that this is '99% fix' to the problem; a correct fix would be to
do route lookups only within addrconf.c when picking a source address,
and never call ip6_route_output before source address has been
populated.

Signed-off-by: Markus Stenberg <markus.stenberg@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 15:58:41 -04:00
David S. Miller
0a801445db Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:

====================
Here are a couple of important Bluetooth & mac802154 fixes for 4.1:

 - mac802154 fix for crypto algorithm allocation failure checking
 - mac802154 wpan phy leak fix for error code path
 - Fix for not calling Bluetooth shutdown() if interface is not up

Let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 15:51:00 -04:00
Tommi Rantala
f30bf2a5ca ipvs: fix memory leak in ip_vs_ctl.c
Fix memory leak introduced in commit a0840e2e16 ("IPVS: netns,
ip_vs_ctl local vars moved to ipvs struct."):

unreferenced object 0xffff88005785b800 (size 2048):
  comm "(-localed)", pid 1434, jiffies 4294755650 (age 1421.089s)
  hex dump (first 32 bytes):
    bb 89 0b 83 ff ff ff ff b0 78 f0 4e 00 88 ff ff  .........x.N....
    04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8262ea8e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811fba74>] __kmalloc_track_caller+0x244/0x430
    [<ffffffff811b88a0>] kmemdup+0x20/0x50
    [<ffffffff823276b7>] ip_vs_control_net_init+0x1f7/0x510
    [<ffffffff8231d630>] __ip_vs_init+0x100/0x250
    [<ffffffff822363a1>] ops_init+0x41/0x190
    [<ffffffff82236583>] setup_net+0x93/0x150
    [<ffffffff82236cc2>] copy_net_ns+0x82/0x140
    [<ffffffff810ab13d>] create_new_namespaces+0xfd/0x190
    [<ffffffff810ab49a>] unshare_nsproxy_namespaces+0x5a/0xc0
    [<ffffffff810833e3>] SyS_unshare+0x173/0x310
    [<ffffffff8265cbd7>] system_call_fastpath+0x12/0x6f
    [<ffffffffffffffff>] 0xffffffffffffffff

Fixes: a0840e2e16 ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.")
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-05-08 17:58:54 +09:00
Eric Dumazet
31ccd0e66d tcp_westwood: fix tcp_westwood_info()
I forgot to update tcp_westwood when changing get_info() behavior,
this patch should fix this.

Fixes: 64f40ff5bb ("tcp: prepare CC get_info() access from getsockopt()")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05 19:50:09 -04:00
Tom Herbert
c967a0873a mpls: Move reserved label definitions
Move to include/uapi/linux/mpls.h to be externally visibile.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05 19:40:36 -04:00
David S. Miller
b7ba7b469a We have only a few fixes right now:
* a fix for an issue with hash collision handling in the
    rhashtable conversion
  * a merge issue - rhashtable removed default shrinking
    just before mac80211 was converted, so enable it now
  * remove an invalid WARN that can trigger with legitimate
    userspace behaviour
  * add a struct member missing from kernel-doc that caused
    a lot of warnings
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJVR1FIAAoJEDBSmw7B7bqrbkQP/RPleNMLHNOLRoJva6vNVers
 hiyLwG+lpV9yHCUX1I7TsvyTHddjW3ANNC1LvlKyUiNzBBsr/Yg/Qe1RVGkRTK7E
 ZPTU7xutk2nSLjvbVLjTQtIAynKZM6W9vM3X16WtsZ9+25z4e8jN8BjhzP99Vs7X
 IkwL7Qe2q1O/ta+/ByVR4dFWtGyJDxRjXkFs4cQ/VdL/fJalKxIa9YJrmUwIBMOU
 UIuJ1lh7kkNFGky+4Y3dBA/2L+O/iS64dxe+ygLMJGf9xdGGHwyuM6oe6OKEKPi1
 pF0NIWgHRGOUEJNyUKZTOwCyk1PeOr9aFFDHN4GJyE463KzZqVYdA4ajI6udwL2h
 I/AjsBnM0zQcoBD8HUeJK2ZgndPxOrMZGVjs0/M3K9lFocYBALoLkt5YPD1PXOZs
 9EQ8mhej3/xbwyEaaAp+HJPOr/wLZ//juxsPP8HSR1BDm8cR40rd9rETEGWnmrxt
 5vpmb2vNpKekw+pKf2vwuEH6BwH02nuz4YPudTswiEzObEQAuGx+rpopQVI9bdgE
 V/r3WY0cl+bDMrcSpoZ7v6FDoHHA/zPq9aAAV0JsMyYApY0nMdqwShBJq5cTSoK9
 rK5I6oIbqChskpoWULUVbbjb1fLsNVI4C89KjxW8xziTEY1d2fEeeJ2tR0lYBX/5
 /KOygSze3Qi0MjrdE9D2
 =sf/A
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2015-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
We have only a few fixes right now:
 * a fix for an issue with hash collision handling in the
   rhashtable conversion
 * a merge issue - rhashtable removed default shrinking
   just before mac80211 was converted, so enable it now
 * remove an invalid WARN that can trigger with legitimate
   userspace behaviour
 * add a struct member missing from kernel-doc that caused
   a lot of warnings
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 16:00:55 -04:00
David S. Miller
73e84313ee Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-05-04

Here's the first bluetooth-next pull request for 4.2:

 - Various fixes for at86rf230 driver
 - ieee802154: trace events support for rdev->ops
 - HCI UART driver refactoring
 - New Realtek IDs added to btusb driver
 - Off-by-one fix for rtl8723b in btusb driver
 - Refactoring of btbcm driver for both UART & USB use

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 15:36:07 -04:00
David Ahern
e2783717a7 net/rds: Fix new sparse warning
c0adf54a10 introduced new sparse warnings:
  CHECK   /home/dahern/kernels/linux.git/net/rds/ib_cm.c
net/rds/ib_cm.c:191:34: warning: incorrect type in initializer (different base types)
net/rds/ib_cm.c:191:34:    expected unsigned long long [unsigned] [usertype] dp_ack_seq
net/rds/ib_cm.c:191:34:    got restricted __be64 <noident>
net/rds/ib_cm.c:194:51: warning: cast to restricted __be64

The temporary variable for sequence number should have been declared as __be64
rather than u64. Make it so.

Signed-off-by: David Ahern <david.ahern@oracle.com>
Cc: shamir rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 15:23:06 -04:00
Vlad Yasevich
d66bf7dd27 net: core: Correct an over-stringent device loop detection.
The code in __netdev_upper_dev_link() has an over-stringent
loop detection logic that actually prevents valid configurations
from working correctly.

In particular, the logic returns an error if an upper device
is already in the list of all upper devices for a given dev.
This particular check seems to be a overzealous as it disallows
perfectly valid configurations.  For example:
  # ip l a link eth0 name eth0.10 type vlan id 10
  # ip l a dev br0 typ bridge
  # ip l s eth0.10 master br0
  # ip l s eth0 master br0  <--- Will fail

If you switch the last two commands (add eth0 first), then both
will succeed.  If after that, you remove eth0 and try to re-add
it, it will fail!

It appears to be enough to simply check adj_list to keeps things
safe.

I've tried stacking multiple devices multiple times in all different
combinations, and either rx_handler registration prevented the stacking
of the device linking cought the error.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 14:57:59 -04:00
Scott Mayhew
9507271d96 svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures
In an environment where the KDC is running Active Directory, the
exported composite name field returned in the context could be large
enough to span a page boundary.  Attaching a scratch buffer to the
decoding xdr_stream helps deal with those cases.

The case where we saw this was actually due to behavior that's been
fixed in newer gss-proxy versions, but we're fixing it here too.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-05-04 12:02:40 -04:00
Herbert Xu
2e70aedd3d Revert "net: kernel socket should be released in init_net namespace"
This reverts commit c243d7e209.

That patch is solving a non-existant problem while creating a
real problem.  Just because a socket is allocated in the init
name space doesn't mean that it gets hashed in the init name space.

When we unhash it the name space must be the same as the one
we had when we hashed it.  So this patch is completely bogus
and causes socket leaks.

Reported-by: Andrey Wagin <avagin@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 00:13:16 -04:00
shamir rabinovitch
c0adf54a10 net/rds: fix unaligned memory access
rdma_conn_param private data is copied using memcpy after headers such
as cma_hdr (see cma_resolve_ib_udp as example). so the start of the
private data is aligned to the end of the structure that come before. if
this structure end with u32 the meaning is that the start of the private
data will be 4 bytes aligned. structures that use u8/u16/u32/u64 are
naturally aligned but in case the structure start is not 8 bytes aligned,
all u64 members of this structure will not be aligned. to solve this issue
we must use special macros that allow unaligned access to those
unaligned members.

Addresses the following kernel log seen when attempting to use RDMA:

Kernel unaligned access at TPC[10507a88] rds_ib_cm_connect_complete+0x1bc/0x1e0 [rds_rdma]

Acked-by: Chien Yen <chien.yen@oracle.com>
Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
[Minor tweaks for top of tree by:]
Signed-off-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03 23:39:01 -04:00
Herbert Xu
edac450d5b netlink: Remove max_size setting
We currently limit the hash table size to 64K which is very bad
as even 10 years ago it was relatively easy to generate millions
of sockets.

Since the hash table is naturally limited by memory allocation
failure, we don't really need an explicit limit so this patch
removes it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03 23:27:02 -04:00
Eric Dumazet
a5d2809040 codel: fix maxpacket/mtu confusion
Under presence of TSO/GSO/GRO packets, codel at low rates can be quite
useless. In following example, not a single packet was ever dropped,
while average delay in codel queue is ~100 ms !

qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
 Sent 134376498 bytes 88797 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 13626b 3p requeues 0
  count 0 lastcount 0 ldelay 96.9ms drop_next 0us
  maxpacket 9084 ecn_mark 0 drop_overlimit 0

This comes from a confusion of what should be the minimal backlog. It is
pretty clear it is not 64KB or whatever max GSO packet ever reached the
qdisc.

codel intent was to use MTU of the device.

After the fix, we finally drop some packets, and rtt/cwnd of my single
TCP flow are meeting our expectations.

qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
 Sent 102798497 bytes 67912 pkt (dropped 1365, overlimits 0 requeues 0)
 backlog 6056b 3p requeues 0
  count 1 lastcount 1 ldelay 36.3ms drop_next 0us
  maxpacket 10598 ecn_mark 0 drop_overlimit 0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Van Jacobson <vanj@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03 22:17:40 -04:00
David S. Miller
a134f083e7 ipv4: Missing sk_nulls_node_init() in ping_unhash().
If we don't do that, then the poison value is left in the ->pprev
backlink.

This can cause crashes if we do a disconnect, followed by a connect().

Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Wen Xu <hotdog3645@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-01 22:02:47 -04:00
Alexander Aring
1add156466 ieee802154: trace: fix endian convertion
This patch fix endian convertions for extended address and short address
handling when TP_printk is called.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Guido Günther <agx@sigxcpu.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-30 18:48:11 +02:00
Varka Bhadram
5b4a103904 cfg802154: pass name_assign_type to rdev_add_virtual_intf()
This code is based on commit 6bab2e19c5
("cfg80211: pass name_assign_type to rdev_add_virtual_intf()")

This will expose in sysfs whether the ifname of a IEEE-802.15.4
device is set by userspace or generated by the kernel.
We are using two types of name_assign_types
 o NET_NAME_ENUM: Default interface name provided by kernel
 o NET_NAME_USER: Interface name provided by user.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-30 18:48:10 +02:00
Guido Günther
1cc800e7aa ieee802154: Add trace events for rdev->ops
Enabling tracing via

 echo 1 > /sys/kernel/debug/tracing/events/cfg802154/enable

enables event tracing like

 iwpan dev wpan0 set pan_id 0xbeef
 cat /sys/kernel/debug/tracing/trace
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 2/2   #P:1
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
            iwpan-2663  [000] ....   170.369142: 802154_rdev_set_pan_id: phy0, wpan_dev(1), pan id: 0xbeef
            iwpan-2663  [000] ....   170.369177: 802154_rdev_return_int: phy0, returned: 0

Signed-off-by: Guido Günther <agx@sigxcpu.org>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-30 18:48:09 +02:00
Wei Yongjun
89eb6d0677 mac802154: llsec: fix return value check in llsec_key_alloc()
In case of error, the functions crypto_alloc_aead() and crypto_alloc_blkcipher()
returns ERR_PTR() and never returns NULL. The NULL test in the return value check
should be replaced with IS_ERR().

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-30 18:46:28 +02:00
Alexander Aring
2b4d413c38 mac802154: fix ieee802154_register_hw error handling
Currently if ieee802154_if_add failed, we don't unregister the wpan phy
which was registered before. This patch adds a correct error handling
for unregister the wpan phy when ieee802154_if_add failed.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-30 18:46:21 +02:00
Gabriele Mazzotta
d24d81444f Bluetooth: Skip the shutdown routine if the interface is not up
Most likely, the shutdown routine requires the interface to be up.
This is the case for BTUSB_INTEL: the routine tries to send a command
to the interface, but since this one is down, it fails and exits once
HCI_INIT_TIMEOUT has expired.

Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 4.0.x
2015-04-30 18:45:27 +02:00
Yuchung Cheng
9dac883544 tcp: update reordering first before detecting loss
tcp_mark_lost_retrans is not used when FACK is disabled. Since
tcp_update_reordering may disable FACK, it should be called first
before tcp_mark_lost_retrans.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29 17:10:38 -04:00
Eric Dumazet
6e9250f59e tcp: add TCP_CC_INFO socket option
Some Congestion Control modules can provide per flow information,
but current way to get this information is to use netlink.

Like TCP_INFO, let's add TCP_CC_INFO so that applications can
issue a getsockopt() if they have a socket file descriptor,
instead of playing complex netlink games.

Sample usage would be :

  union tcp_cc_info info;
  socklen_t len = sizeof(info);

  if (getsockopt(fd, SOL_TCP, TCP_CC_INFO, &info, &len) == -1)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29 17:10:38 -04:00
Eric Dumazet
64f40ff5bb tcp: prepare CC get_info() access from getsockopt()
We would like that optional info provided by Congestion Control
modules using netlink can also be read using getsockopt()

This patch changes get_info() to put this information in a buffer,
instead of skb, like tcp_get_info(), so that following patch
can reuse this common infrastructure.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29 17:10:38 -04:00
Eric Dumazet
bdd1f9edac tcp: add tcpi_bytes_received to tcp_info
This patch tracks total number of payload bytes received on a TCP socket.
This is the sum of all changes done to tp->rcv_nxt

RFC4898 named this : tcpEStatsAppHCThruOctetsReceived

This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->bytes_received was placed near tp->rcv_nxt for
best data locality and minimal performance impact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29 17:10:37 -04:00
Eric Dumazet
0df48c26d8 tcp: add tcpi_bytes_acked to tcp_info
This patch tracks total number of bytes acked for a TCP socket.
This is the sum of all changes done to tp->snd_una, and allows
for precise tracking of delivered data.

RFC4898 named this : tcpEStatsAppHCThruOctetsAcked

This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->bytes_acked was placed near tp->snd_una for
best data locality and minimal performance impact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29 17:10:37 -04:00
Guenter Roeck
50d4964f1d net: dsa: Fix scope of eeprom-length property
eeprom-length is a switch property, not a dsa property, and thus
needs to be attached to the switch node, not to the dsa node.

Reported-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 6793abb4e8 ("net: dsa: Add support for switch EEPROM access")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29 15:35:04 -04:00
Jon Paul Maloy
0d699f28ee tipc: fix problem with parallel link synchronization mechanism
Currently, we try to accumulate arrived packets in the links's
'deferred' queue during the parallel link syncronization phase.

This entails two problems:

- With an unlucky combination of arriving packets the algorithm
  may go into a lockstep with the out-of-sequence handling function,
  where the synch mechanism is adding a packet to the deferred queue,
  while the out-of-sequence handling is retrieving it again, thus
  ending up in a loop inside the node_lock scope.

- Even if this is avoided, the link will very often send out
  unnecessary protocol messages, in the worst case leading to
  redundant retransmissions.

We fix this by just dropping arriving packets on the upcoming link
during the synchronization phase, thus relying on the retransmission
protocol to resolve the situation once the two links have arrived to
a synchronized state.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29 15:08:59 -04:00