Commit Graph

48399 Commits

Author SHA1 Message Date
David S. Miller
f8ddadc4db Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
There were quite a few overlapping sets of changes here.

Daniel's bug fix for off-by-ones in the new BPF branch instructions,
along with the added allowances for "data_end > ptr + x" forms
collided with the metadata additions.

Along with those three changes came veritifer test cases, which in
their final form I tried to group together properly.  If I had just
trimmed GIT's conflict tags as-is, this would have split up the
meta tests unnecessarily.

In the socketmap code, a set of preemption disabling changes
overlapped with the rename of bpf_compute_data_end() to
bpf_compute_data_pointers().

Changes were made to the mv88e6060.c driver set addr method
which got removed in net-next.

The hyperv transport socket layer had a locking change in 'net'
which overlapped with a change of socket state macro usage
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 13:39:14 +01:00
Linus Torvalds
b5ac3beb5a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A little more than usual this time around. Been travelling, so that is
  part of it.

  Anyways, here are the highlights:

   1) Deal with memcontrol races wrt. listener dismantle, from Eric
      Dumazet.

   2) Handle page allocation failures properly in nfp driver, from Jaku
      Kicinski.

   3) Fix memory leaks in macsec, from Sabrina Dubroca.

   4) Fix crashes in pppol2tp_session_ioctl(), from Guillaume Nault.

   5) Several fixes in bnxt_en driver, including preventing potential
      NVRAM parameter corruption from Michael Chan.

   6) Fix for KRACK attacks in wireless, from Johannes Berg.

   7) rtnetlink event generation fixes from Xin Long.

   8) Deadlock in mlxsw driver, from Ido Schimmel.

   9) Disallow arithmetic operations on context pointers in bpf, from
      Jakub Kicinski.

  10) Missing sock_owned_by_user() check in sctp_icmp_redirect(), from
      Xin Long.

  11) Only TCP is supported for sockmap, make that explicit with a
      check, from John Fastabend.

  12) Fix IP options state races in DCCP and TCP, from Eric Dumazet.

  13) Fix panic in packet_getsockopt(), also from Eric Dumazet.

  14) Add missing locked in hv_sock layer, from Dexuan Cui.

  15) Various aquantia bug fixes, including several statistics handling
      cures. From Igor Russkikh et al.

  16) Fix arithmetic overflow in devmap code, from John Fastabend.

  17) Fix busted socket memory accounting when we get a fault in the tcp
      zero copy paths. From Willem de Bruijn.

  18) Don't leave opt->tot_len uninitialized in ipv6, from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  stmmac: Don't access tx_q->dirty_tx before netif_tx_lock
  ipv6: flowlabel: do not leave opt->tot_len with garbage
  of_mdio: Fix broken PHY IRQ in case of probe deferral
  textsearch: fix typos in library helpers
  rxrpc: Don't release call mutex on error pointer
  net: stmmac: Prevent infinite loop in get_rx_timestamp_status()
  net: stmmac: Fix stmmac_get_rx_hwtstamp()
  net: stmmac: Add missing call to dev_kfree_skb()
  mlxsw: spectrum_router: Configure TIGCR on init
  mlxsw: reg: Add Tunneling IPinIP General Configuration Register
  net: ethtool: remove error check for legacy setting transceiver type
  soreuseport: fix initialization race
  net: bridge: fix returning of vlan range op errors
  sock: correct sk_wmem_queued accounting on efault in tcp zerocopy
  bpf: add test cases to bpf selftests to cover all access tests
  bpf: fix pattern matches for direct packet access
  bpf: fix off by one for range markings with L{T, E} patterns
  bpf: devmap fix arithmetic overflow in bitmap_size calculation
  net: aquantia: Bad udp rate on default interrupt coalescing
  net: aquantia: Enable coalescing management via ethtool interface
  ...
2017-10-21 22:44:48 -04:00
Eric Dumazet
864e2a1f8a ipv6: flowlabel: do not leave opt->tot_len with garbage
When syzkaller team brought us a C repro for the crash [1] that
had been reported many times in the past, I finally could find
the root cause.

If FlowLabel info is merged by fl6_merge_options(), we leave
part of the opt_space storage provided by udp/raw/l2tp with random value
in opt_space.tot_len, unless a control message was provided at sendmsg()
time.

Then ip6_setup_cork() would use this random value to perform a kzalloc()
call. Undefined behavior and crashes.

Fix is to properly set tot_len in fl6_merge_options()

At the same time, we can also avoid consuming memory and cpu cycles
to clear it, if every option is copied via a kmemdup(). This is the
change in ip6_setup_cork().

[1]
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801cb64a100 task.stack: ffff8801cc350000
RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168
RSP: 0018:ffff8801cc357550 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010
RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014
RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10
R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0
R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0
FS:  00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0
DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
 ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729
 udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340
 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x358/0x5a0 net/socket.c:1750
 SyS_sendto+0x40/0x50 net/socket.c:1718
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4520a9
RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9
RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016
RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee
R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029
Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:22:24 +01:00
Lawrence Brakmo
85cce21578 bpf: Add BPF_SOCKET_OPS_BASE_RTT support to tcp_nv
TCP_NV will try to get the base RTT from a socket_ops BPF program if one
is loaded. NV will then use the base RTT to bound its min RTT (its
notion of the base RTT). It uses the base RTT as an upper bound and 80%
of the base RTT as its lower bound.

In other words, NV will consider filtered RTTs larger than base RTT as a
sign of congestion. As a result, there is no minRTT inflation when there
is a lot of congestion. For example, in a DC where the RTTs are less
than 40us when there is no congestion, a base RTT value of 80us improves
the performance of NV. The difference between the uncongested RTT and
the base RTT provided represents how much queueing we are willing to
have (in practice it can be higher).

NV has been tunned to reduce congestion when there are many flows at the
cost of one flow not achieving full bandwith utilization. When a
reasonable base RTT is provided, one NV flow can now fully utilize the
full bandwidth. In addition, the performance is also improved when there
are many flows.

In the following examples the NV results are using a kernel with this
patch set (i.e. both NV results are using the new nv_loss_dec_factor).

With one host sending to another host and only one flow the
goodputs are:
  Cubic: 9.3 Gbps, NV: 5.5 Gbps, NV (baseRTT=80us): 9.2 Gbps

With 2 hosts sending to one host (1 flow per host, the goodput per flow
is:
  Cubic: 4.6 Gbps, NV: 4.5 Gbps, NV (baseRTT=80us)L 4.6 Gbps

But the RTTs seen by a ping process in the sender is:
  Cubic: 3.3ms  NV: 97us,  NV (baseRTT=80us): 146us

With a lot of flows things look even better for NV with baseRTT. Here we
have 3 hosts sending to one host. Each sending host has 6 flows: 1
stream, 4x1MB RPC, 1x10KB RPC. Cubic, NV and NV with baseRTT all fully
utilize the full available bandwidth. However, the distribution of
bandwidth among the flows is very different. For the 10KB RPC flow:
  Cubic: 27Mbps, NV: 111Mbps, NV (baseRTT=80us): 222Mbps

The 99% latencies for the 10KB flows are:
  Cubic: 26ms,  NV: 1ms,  NV (baseRTT=80us): 500us

The RTT seen by a ping process at the senders:
  Cubic: 3.2ms  NV: 720us,  NV (baseRTT=80us): 330us

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:12:05 +01:00
Lawrence Brakmo
cd86d1fd21 bpf: Adding helper function bpf_getsockops
Adding support for helper function bpf_getsockops to socket_ops BPF
programs. This patch only supports TCP_CONGESTION.

Signed-off-by: Vlad Vysotsky <vlad@cs.ucla.edu>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:12:05 +01:00
Gustavo A. R. Silva
0cea8e28df net: x25: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:08:46 +01:00
Gustavo A. R. Silva
110af3acb8 net: af_unix: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:07:50 +01:00
David Howells
6cb3ece968 rxrpc: Don't release call mutex on error pointer
Don't release call mutex at the end of rxrpc_kernel_begin_call() if the
call pointer actually holds an error value.

Fixes: 540b1c48c3 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:05:39 +01:00
Jon Maloy
0d5fcebf3c tipc: refactor tipc_sk_timeout() function
The function tipc_sk_timeout() is more complex than necessary, and
even seems to contain an undetected bug. At one of the occurences
where we renew the timer we just order it with (HZ / 20), instead
of (jiffies + HZ / 20);

In this commit we clean up the function.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:36:35 +01:00
Niklas Söderlund
95491e3cf3 net: ethtool: remove error check for legacy setting transceiver type
Commit 9cab88726929605 ("net: ethtool: Add back transceiver type")
restores the transceiver type to struct ethtool_link_settings and
convert_link_ksettings_to_legacy_settings() but forgets to remove the
error check for the same in convert_legacy_settings_to_link_ksettings().
This prevents older versions of ethtool to change link settings.

    # ethtool --version
    ethtool version 3.16

    # ethtool -s eth0 autoneg on speed 100 duplex full
    Cannot set new settings: Invalid argument
      not setting speed
      not setting duplex
      not setting autoneg

While newer versions of ethtool works.

    # ethtool --version
    ethtool version 4.10

    # ethtool -s eth0 autoneg on speed 100 duplex full
    [   57.703268] sh-eth ee700000.ethernet eth0: Link is Down
    [   59.618227] sh-eth ee700000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx

Fixes: 19cab88726 ("net: ethtool: Add back transceiver type")
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reported-by: Renjith R V <renjith.rv@quest-global.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:14:18 +01:00
Gustavo A. R. Silva
f3ae608edb net: sched: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:07:08 +01:00
Craig Gallek
1b5f962e71 soreuseport: fix initialization race
Syzkaller stumbled upon a way to trigger
WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41
reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39

There are two initialization paths for the sock_reuseport structure in a
socket: Through the udp/tcp bind paths of SO_REUSEPORT sockets or through
SO_ATTACH_REUSEPORT_[CE]BPF before bind.  The existing implementation
assumedthat the socket lock protected both of these paths when it actually
only protects the SO_ATTACH_REUSEPORT path.  Syzkaller triggered this
double allocation by running these paths concurrently.

This patch moves the check for double allocation into the reuseport_alloc
function which is protected by a global spin lock.

Fixes: e32ea7e747 ("soreuseport: fast reuseport UDP socket selection")
Fixes: c125e80b88 ("soreuseport: fast reuseport TCP socket selection")
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:03:51 +01:00
Gustavo A. R. Silva
a05b8c43ac net: rose: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:02:26 +01:00
Gustavo A. R. Silva
279badc2a8 openvswitch: conntrack: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case I placed a "fall through" comment on
its own line, which is what GCC is expecting to find.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:01:26 +01:00
Gustavo A. R. Silva
e28101a37c net: netrom: nr_in: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:00:33 +01:00
Nikolay Aleksandrov
66c5451754 net: bridge: fix returning of vlan range op errors
When vlan tunnels were introduced, vlan range errors got silently
dropped and instead 0 was returned always. Restore the previous
behaviour and return errors to user-space.

Fixes: efa5356b0d ("bridge: per vlan dst_metadata netlink support")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 01:46:32 +01:00
Willem de Bruijn
54d4311764 sock: correct sk_wmem_queued accounting on efault in tcp zerocopy
Syzkaller hits WARN_ON(sk->sk_wmem_queued) in sk_stream_kill_queues
after triggering an EFAULT in __zerocopy_sg_from_iter.

On this error, skb_zerocopy_stream_iter resets the skb to its state
before the operation with __pskb_trim. It cannot kfree_skb like
datagram callers, as the skb may have data from a previous send call.

__pskb_trim calls skb_condense for unowned skbs, which adjusts their
truesize. These tcp skbuffs are owned and their truesize must add up
to sk_wmem_queued. But they match because their skb->sk is NULL until
tcp_transmit_skb.

Temporarily set skb->sk when calling __pskb_trim to signal that the
skbuffs are owned and avoid the skb_condense path.

Fixes: 52267790ef ("sock: add MSG_ZEROCOPY")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 01:45:52 +01:00
Jon Maloy
cb4dc41eaa tipc: fix broken tipc_poll() function
In commit ae236fb208 ("tipc: receive group membership events via
member socket") we broke the tipc_poll() function by checking the
state of the receive queue before the call to poll_sock_wait(), while
relying that state afterwards, when it might have changed.

We restore this in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 12:27:05 +01:00
Jiri Pirko
8d26d5636d net: sched: avoid ndo_setup_tc calls for TC_SETUP_CLS*
All drivers are converted to use block callbacks for TC_SETUP_CLS*.
So it is now safe to remove the calls to ndo_setup_tc from cls_*

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:08 +01:00
Jiri Pirko
6b3eb752b4 dsa: Convert ndo_setup_tc offloads to block callbacks
Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for matchall offloads to block callbacks.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:08 +01:00
Jiri Pirko
3f7889c4c7 net: sched: cls_bpf: call block callbacks for offload
Use the newly introduced callbacks infrastructure and call block
callbacks alongside with the existing per-netdev ndo_setup_tc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:07 +01:00
Jiri Pirko
245dc5121a net: sched: cls_u32: call block callbacks for offload
Use the newly introduced callbacks infrastructure and call block
callbacks alongside with the existing per-netdev ndo_setup_tc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:07 +01:00
Jiri Pirko
7746041192 net: sched: cls_u32: swap u32_remove_hw_knode and u32_remove_hw_hnode
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:07 +01:00
Jiri Pirko
2447a96f88 net: sched: cls_matchall: call block callbacks for offload
Use the newly introduced callbacks infrastructure and call block
callbacks alongside with the existing per-netdev ndo_setup_tc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:07 +01:00
Jiri Pirko
208c0f4b52 net: sched: use tc_setup_cb_call to call per-block callbacks
Extend the tc_setup_cb_call entrypoint function originally used only for
action egress devices callbacks to call per-block callbacks as well.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:07 +01:00
Jiri Pirko
acb674428c net: sched: introduce per-block callbacks
Introduce infrastructure that allows drivers to register callbacks that
are called whenever tc would offload inserted rule for a specific block.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:06 +01:00
Jiri Pirko
6e40cf2d4d net: sched: use extended variants of block_get/put in ingress and clsact qdiscs
Use previously introduced extended variants of block get and put
functions. This allows to specify a binder types specific to clsact
ingress/egress which is useful for drivers to distinguish who actually
got the block.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:06 +01:00
Jiri Pirko
8c4083b30e net: sched: add block bind/unbind notif. and extended block_get/put
Introduce new type of ndo_setup_tc message to propage binding/unbinding
of a block to driver. Call this ndo whenever qdisc gets/puts a block.
Alongside with this, there's need to propagate binder type from qdisc
code down to the notifier. So introduce extended variants of
block_get/put in order to pass this info.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:06 +01:00
Matteo Croce
197df02cb3 udp: make some messages more descriptive
In the UDP code there are two leftover error messages with very few meaning.
Replace them with a more descriptive error message as some users
reported them as "strange network error".

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 02:52:35 +01:00
David S. Miller
c69d75ae15 linux-can-fixes-for-4.14-20171019
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAlnohyMTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAe/sog7Dgc9jRFCACmSBz2H1yW5VfjrN3ZgoEoUo8P5qCZ
 7mXvTstTASF20T7kFqNS2K94CMVjBYRUIHvihp0LmPmPAmpmQRNedAssWuZpMflw
 xhaH8T8YWbyzHU2MBZP9WtTjrTilPEPcjvFSgYw6wpHW7VC3S/ffEN8Mj3ymR6bW
 KRw7gemXpvYUxPrGGCzyvFy4G+KptyPcAD6I8ceDUtdkwK4TXGYqEAoSqxtAIUHX
 dgRLzheOv8sDh+dM+ZNpH6UUkzK0gVHQ40lgXydZazEwPPq9zdj2Ec/6qpR5fkUH
 dnCIhzLNiCJOMGSfzkf0Esef1zBpU8oJmILbgf97CdayaW9cA1EUleGW
 =HYTx
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-4.14-20171019' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2017-10-19

this is a pull request of 11 patches for the upcoming 4.14 release.

There are 6 patches by ZHU Yi for the flexcan driver, that work around
the CAN error handling state transition problems found in various
incarnations of the flexcan IP core.

The patch by Colin Ian King fixes a potential NULL pointer deref in the
CAN broad cast manager (bcm). One patch by me replaces a direct deref of a RCU
protected pointer by rcu_access_pointer. My second patch adds missing
OOM error handling in af_can. A patch by Stefan Mätje for the esd_usb2
driver fixes the dlc in received RTR frames. And the last patch is by
Wolfgang Grandegger, it fixes a busy loop in the gs_usb driver in case
it runs out of TX contexts.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 02:30:31 +01:00
Paolo Abeni
b65f164d37 ipv6: let trace_fib6_table_lookup() dereference the fib table
The perf traces for ipv6 routing code show a relevant cost around
trace_fib6_table_lookup(), even if no trace is enabled. This is
due to the fib6_table de-referencing currently performed by the
caller.

Let's the tracing code pay this overhead, passing to the trace
helper the table pointer. This gives small but measurable
performance improvement under UDP flood.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 02:23:38 +01:00
David S. Miller
f730cc9fee Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2017-10-19

Here's the first bluetooth-next pull request targeting the 4.15 kernel
release.

 - Multiple fixes & improvements to the hci_bcm driver
 - DT improvements, e.g. new local-bd-address property
 - Fixes & improvements to ECDH usage. Private key is now generated by
   the crypto subsystem.
 - gcc-4.9 warning fixes

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 02:22:19 +01:00
Dexuan Cui
b4562ca792 hv_sock: add locking in the open/close/release code paths
Without the patch, when hvs_open_connection() hasn't completely established
a connection (e.g. it has changed sk->sk_state to SS_CONNECTED, but hasn't
inserted the sock into the connected queue), vsock_stream_connect() may see
the sk_state change and return the connection to the userspace, and next
when the userspace closes the connection quickly, hvs_release() may not see
the connection in the connected queue; finally hvs_open_connection()
inserts the connection into the queue, but we won't be able to purge the
connection for ever.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 02:21:08 +01:00
Gavin Shan
0a90e25198 net/ncsi: Fix length of GVI response packet
The length of GVI (GetVersionInfo) response packet should be 40 instead
of 36. This issue was found from /sys/kernel/debug/ncsi/eth0/stats.

 # ethtool --ncsi eth0 swstats
     :
 RESPONSE     OK       TIMEOUT  ERROR
 =======================================
 GVI          0        0        2

With this applied, no error reported on GVI response packets:

 # ethtool --ncsi eth0 swstats
     :
 RESPONSE     OK       TIMEOUT  ERROR
 =======================================
 GVI          2        0        0

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 01:56:38 +01:00
Gavin Shan
52b4c8627f net/ncsi: Enforce failover on link monitor timeout
The NCSI channel has been configured to provide service if its link
monitor timer is enabled, regardless of its state (inactive or active).
So the timeout event on the link monitor indicates the out-of-service
on that channel, for which a failover is needed.

This sets NCSI_DEV_RESHUFFLE flag to enforce failover on link monitor
timeout, regardless the channel's original state (inactive or active).
Also, the link is put into "down" state to give the failing channel
lowest priority when selecting for the active channel. The state of
failing channel should be set to active in order for deinitialization
and failover to be done.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 01:56:38 +01:00
Gavin Shan
100ef01f3e net/ncsi: Disable HWA mode when no channels are found
When there are no NCSI channels probed, HWA (Hardware Arbitration)
mode is enabled. It's not correct because HWA depends on the fact:
NCSI channels exist and all of them support HWA mode. This disables
HWA when no channels are probed.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 01:56:38 +01:00
Samuel Mendoza-Jonas
0795fb2021 net/ncsi: Stop monitor if channel times out or is inactive
ncsi_channel_monitor() misses stopping the channel monitor in several
places that it should, causing a WARN_ON_ONCE() to trigger when the
monitor is re-started later, eg:

[  459.040000] WARNING: CPU: 0 PID: 1093 at net/ncsi/ncsi-manage.c:269 ncsi_start_channel_monitor+0x7c/0x90
[  459.040000] CPU: 0 PID: 1093 Comm: kworker/0:3 Not tainted 4.10.17-gaca2fdd #140
[  459.040000] Hardware name: ASpeed SoC
[  459.040000] Workqueue: events ncsi_dev_work
[  459.040000] [<80010094>] (unwind_backtrace) from [<8000d950>] (show_stack+0x20/0x24)
[  459.040000] [<8000d950>] (show_stack) from [<801dbf70>] (dump_stack+0x20/0x28)
[  459.040000] [<801dbf70>] (dump_stack) from [<80018d7c>] (__warn+0xe0/0x108)
[  459.040000] [<80018d7c>] (__warn) from [<80018e70>] (warn_slowpath_null+0x30/0x38)
[  459.040000] [<80018e70>] (warn_slowpath_null) from [<803f6a08>] (ncsi_start_channel_monitor+0x7c/0x90)
[  459.040000] [<803f6a08>] (ncsi_start_channel_monitor) from [<803f7664>] (ncsi_configure_channel+0xdc/0x5fc)
[  459.040000] [<803f7664>] (ncsi_configure_channel) from [<803f8160>] (ncsi_dev_work+0xac/0x474)
[  459.040000] [<803f8160>] (ncsi_dev_work) from [<8002d244>] (process_one_work+0x1e0/0x450)
[  459.040000] [<8002d244>] (process_one_work) from [<8002d510>] (worker_thread+0x5c/0x570)
[  459.040000] [<8002d510>] (worker_thread) from [<80033614>] (kthread+0x124/0x164)
[  459.040000] [<80033614>] (kthread) from [<8000a5e8>] (ret_from_fork+0x14/0x2c)

This also updates the monitor instead of just returning if
ncsi_xmit_cmd() fails to send the get-link-status command so that the
monitor properly times out.

Fixes: e6f44ed6d0 "net/ncsi: Package and channel management"

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 01:56:38 +01:00
Samuel Mendoza-Jonas
6850d0f8b2 net/ncsi: Fix AEN HNCDSC packet length
Correct the value of the HNCDSC AEN packet.
Fixes: 7a82ecf4cf "net/ncsi: NCSI AEN packet handler"

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 01:56:38 +01:00
Eric Dumazet
164a5e7ad5 ipv4: ipv4_default_advmss() should use route mtu
ipv4_default_advmss() incorrectly uses the device MTU instead
of the route provided one. IPv6 has the proper behavior,
lets harmonize the two protocols.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 01:55:05 +01:00
Eric Dumazet
509c7a1ecc packet: avoid panic in packet_getsockopt()
syzkaller got crashes in packet_getsockopt() processing
PACKET_ROLLOVER_STATS command while another thread was managing
to change po->rollover

Using RCU will fix this bug. We might later add proper RCU annotations
for sparse sake.

In v2: I replaced kfree(rollover) in fanout_add() to kfree_rcu()
variant, as spotted by John.

Fixes: a9b6391814 ("packet: rollover statistics")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: John Sperbeck <jsperbeck@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 01:51:34 +01:00
David S. Miller
520d0d75dd Merge branch 'ieee802154-for-davem-2017-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next
Stefan Schmidt says:

====================
pull-request: ieee802154 2017-10-18

Please find below a pull request from the ieee802154 subsystem for net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 01:45:56 +01:00
Eric Dumazet
ba233b3474 tcp: fix tcp_send_syn_data()
syn_data was allocated by sk_stream_alloc_skb(), meaning
its destructor and _skb_refdst fields are mangled.

We need to call tcp_skb_tsorted_anchor_cleanup() before
calling kfree_skb() or kernel crashes.

Bug was reported by syzkaller bot.

Fixes: e2080072ed ("tcp: new list for sent but unacked skbs for RACK recovery")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 01:44:05 +01:00
Paolo Abeni
1859bac04f ipv6: remove from fib tree aged out RTF_CACHE dst
The commit 2b760fcf5c ("ipv6: hook up exception table to store
dst cache") partially reverted the commit 1e2ea8ad37 ("ipv6: set
dst.obsolete when a cached route has expired").

As a result, RTF_CACHE dst referenced outside the fib tree will
not be removed until the next sernum change; dst_check() does not
fail on aged-out dst, and dst->__refcnt can't decrease: the aged
out dst will stay valid for a potentially unlimited time after the
timeout expiration.

This change explicitly removes RTF_CACHE dst from the fib tree when
aged out. The rt6_remove_exception() logic will then obsolete the
dst and other entities will drop the related reference on next
dst_check().

pMTU exceptions are not aged-out, and are removed from the exception
table only when the - usually considerably longer - ip6_rt_mtu_expires
timeout expires.

v1 -> v2:
  - do not touch dst.obsolete in rt6_remove_exception(), not needed
v2 -> v3:
  - take care of pMTU exceptions, too

Fixes: 2b760fcf5c ("ipv6: hook up exception table to store dst cache")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 01:39:10 +01:00
Paolo Abeni
b886d5f2f2 ipv6: start fib6 gc on RTF_CACHE dst creation
After the commit 2b760fcf5c ("ipv6: hook up exception table
to store dst cache"), the fib6 gc is not started after the
creation of a RTF_CACHE via a redirect or pmtu update, since
fib6_add() isn't invoked anymore for such dsts.

We need the fib6 gc to run periodically to clean the RTF_CACHE,
or the dst will stay there forever.

Fix it by explicitly calling fib6_force_start_gc() on successful
exception creation. gc_args->more accounting will ensure that
the gc timer will run for whatever time needed to properly
clean the table.

v2 -> v3:
 - clarified the commit message

Fixes: 2b760fcf5c ("ipv6: hook up exception table to store dst cache")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 01:39:10 +01:00
Eric Dumazet
c92e8c02fe tcp/dccp: fix ireq->opt races
syzkaller found another bug in DCCP/TCP stacks [1]

For the reasons explained in commit ce1050089c ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
ireq->opt unless we own the request sock.

Note the opt field is renamed to ireq_opt to ease grep games.

[1]
BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295

CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 print_address_description+0x73/0x250 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x25b/0x340 mm/kasan/report.c:409
 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
 ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
 tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
 __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
 tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
 tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
 tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
 tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x40c341
RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341
RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1
R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000

Allocated by task 3295:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
 __do_kmalloc mm/slab.c:3725 [inline]
 __kmalloc+0x162/0x760 mm/slab.c:3734
 kmalloc include/linux/slab.h:498 [inline]
 tcp_v4_save_options include/net/tcp.h:1962 [inline]
 tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
 tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
 tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
 tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
 tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
 tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 3306:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
 __cache_free mm/slab.c:3503 [inline]
 kfree+0xca/0x250 mm/slab.c:3820
 inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
 __sk_destruct+0xfd/0x910 net/core/sock.c:1560
 sk_destruct+0x47/0x80 net/core/sock.c:1595
 __sk_free+0x57/0x230 net/core/sock.c:1603
 sk_free+0x2a/0x40 net/core/sock.c:1614
 sock_put include/net/sock.h:1652 [inline]
 inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959
 tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765
 tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Fixes: e994b2f0fb ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 01:33:19 +01:00
Chenbo Feng
6e71b04a82 bpf: Add file mode configuration into bpf maps
Introduce the map read/write flags to the eBPF syscalls that returns the
map fd. The flags is used to set up the file mode when construct a new
file descriptor for bpf maps. To not break the backward capability, the
f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise
it should be O_RDONLY or O_WRONLY. When the userspace want to modify or
read the map content, it will check the file mode to see if it is
allowed to make the change.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20 13:32:59 +01:00
David Ahern
6eba87c781 net: ipv4: Change fib notifiers to take a fib_alias
All of the notifier data (fib_info, tos, type and table id) are
contained in the fib_alias. Pass it to the notifier instead of
each data separately shortening the argument list by 3.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20 13:29:26 +01:00
Yuchung Cheng
1fba70e5b6 tcp: socket option to set TCP fast open key
New socket option TCP_FASTOPEN_KEY to allow different keys per
listener.  The listener by default uses the global key until the
socket option is set.  The key is a 16 bytes long binary data. This
option has no effect on regular non-listener TCP sockets.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20 13:21:36 +01:00
David Ahern
de95e04791 net: Add extack to validator_info structs used for address notifier
Add extack to in_validator_info and in6_validator_info. Update the one
user of each, ipvlan, to return an error message for failures.

Only manual configuration of an address is plumbed in the IPv6 code path.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20 13:15:07 +01:00
David Ahern
ff7883ea60 net: ipv6: Make inet6addr_validator a blocking notifier
inet6addr_validator chain was added by commit 3ad7d2468f ("Ipvlan
should return an error when an address is already in use") to allow
address validation before changes are committed and to be able to
fail the address change with an error back to the user. The address
validation is not done for addresses received from router
advertisements.

Handling RAs in softirq context is the only reason for the notifier
chain to be atomic versus blocking. Since the only current user, ipvlan,
of the validator chain ignores softirq context, the notifier can be made
blocking and simply not invoked for softirq path.

The blocking option is needed by spectrum for example to validate
resources for an adding an address to an interface.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20 13:15:07 +01:00