Commit Graph

53071 Commits

Author SHA1 Message Date
Cong Wang
f5b9bac745 net_sched: notify filter deletion when deleting a chain
When we delete a chain of filters, we need to notify
user-space we are deleting each filters in this chain
too.

Fixes: 32a4f5ecd7 ("net: sched: introduce chain object to uapi")
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:07:40 -07:00
Cong Wang
9708d2b5b7 llc: avoid blocking in llc_sap_close()
llc_sap_close() is called by llc_sap_put() which
could be called in BH context in llc_rcv(). We can't
block in BH.

There is no reason to block it here, kfree_rcu() should
be sufficient.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:04:58 -07:00
Xin Long
22d0bd82cc ipv6: use rt6_info members when dst is set in rt6_fill_node
In inet6_rtm_getroute, since Commit 93531c6743 ("net/ipv6: separate
handling of FIB entries from dst based routes"), it has used rt->from
to dump route info instead of rt.

However for some route like cache, some of its information like flags
or gateway is not the same as that of the 'from' one. It caused 'ip
route get' to dump the wrong route information.

In Jianlin's testing, the output information even lost the expiration
time for a pmtu route cache due to the wrong fib6_flags.

So change to use rt6_info members for dst addr, src addr, flags and
gateway when it tries to dump a route entry without fibmatch set.

v1->v2:
  - not use rt6i_prefsrc.
  - also fix the gw dump issue.

Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 08:20:37 -07:00
Andre Naujoks
15033f0457 ipv6: Add sockopt IPV6_MULTICAST_ALL analogue to IP_MULTICAST_ALL
The socket option will be enabled by default to ensure current behaviour
is not changed. This is the same for the IPv4 version.

A socket bound to in6addr_any and a specific port will receive all traffic
on that port. Analogue to IP_MULTICAST_ALL, disable this behaviour, if
one or more multicast groups were joined (using said socket) and only
pass on multicast traffic from groups, which were explicitly joined via
this socket.

Without this option disabled a socket (system even) joined to multiple
multicast groups is very hard to get right. Filtering by destination
address has to take place in user space to avoid receiving multicast
traffic from other multicast groups, which might have traffic on the same
port.

The extension of the IP_MULTICAST_ALL socketoption to just apply to ipv6,
too, is not done to avoid changing the behaviour of current applications.

Signed-off-by: Andre Naujoks <nautsch2@gmail.com>
Acked-By: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 08:17:27 -07:00
Hauke Mehrtens
7969119293 net: dsa: Add Lantiq / Intel GSWIP tag support
This handles the tag added by the PMAC on the VRX200 SoC line.

The GSWIP uses internally a GSWIP special tag which is located after the
Ethernet header. The PMAC which connects the GSWIP to the CPU converts
this special tag used by the GSWIP into the PMAC special tag which is
added in front of the Ethernet header.

This was tested with GSWIP 2.1 found in the VRX200 SoCs, other GSWIP
versions use slightly different PMAC special tags.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 08:14:33 -07:00
David S. Miller
aaf9253025 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-09-12 22:22:42 -07:00
Nikolay Aleksandrov
435f2e7cc0 net: bridge: add support for sticky fdb entries
Add support for entries which are "sticky", i.e. will not change their port
if they show up from a different one. A new ndm flag is introduced for that
purpose - NTF_STICKY. We allow to set it only to non-local entries.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 20:30:03 -07:00
Tushar Dave
4c3d795cb0 bpf: use __GFP_COMP while allocating page
Helper bpg_msg_pull_data() can allocate multiple pages while
linearizing multiple scatterlist elements into one shared page.
However, if the shared page has size > PAGE_SIZE, using
copy_page_to_iter() causes below warning.

e.g.
[ 6367.019832] WARNING: CPU: 2 PID: 7410 at lib/iov_iter.c:825
page_copy_sane.part.8+0x0/0x8

To avoid above warning, use __GFP_COMP while allocating multiple
contiguous pages.

Fixes: 015632bb30 ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-12 23:47:28 +02:00
Cong Wang
12a78b026f tipc: check return value of __tipc_dump_start()
When __tipc_dump_start() fails with running out of memory,
we have no reason to continue, especially we should avoid
calling tipc_dump_done().

Fixes: 8f5c5fcf35 ("tipc: call start and done ops directly in __tipc_nl_compat_dumpit()")
Reported-and-tested-by: syzbot+3f8324abccfbf8c74a9f@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 13:15:04 -07:00
Cong Wang
cc4dfb7f70 rds: fix two RCU related problems
When a rds sock is bound, it is inserted into the bind_hash_table
which is protected by RCU. But when releasing rds sock, after it
is removed from this hash table, it is freed immediately without
respecting RCU grace period. This could cause some use-after-free
as reported by syzbot.

Mark the rds sock with SOCK_RCU_FREE before inserting it into the
bind_hash_table, so that it would be always freed after a RCU grace
period.

The other problem is in rds_find_bound(), the rds sock could be
freed in between rhashtable_lookup_fast() and rds_sock_addref(),
so we need to extend RCU read lock protection in rds_find_bound()
to close this race condition.

Reported-and-tested-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com
Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: rds-devel@oss.oracle.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oarcle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 00:09:19 -07:00
Li RongQing
0041195d55 netlink: remove hash::nelems check in netlink_insert
The type of hash::nelems has been changed from size_t to atom_t
which in fact is int, so not need to check if BITS_PER_LONG, that
is bit number of size_t, is bigger than 32

and rht_grow_above_max() will be called to check if hashtable is
too big, ensure it can not bigger than 1<<31

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 00:08:44 -07:00
Vakul Garg
150085791a net/tls: Fixed return value when tls_complete_pending_work() fails
In tls_sw_sendmsg() and tls_sw_sendpage(), the variable 'ret' has
been set to return value of tls_complete_pending_work(). This allows
return of proper error code if tls_complete_pending_work() fails.

Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 00:04:28 -07:00
Haishuang Yan
51dc63e391 erspan: fix error handling for erspan tunnel
When processing icmp unreachable message for erspan tunnel, tunnel id
should be erspan_net_id instead of ipgre_net_id.

Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 23:50:54 -07:00
Haishuang Yan
5a64506b5c erspan: return PACKET_REJECT when the appropriate tunnel is not found
If erspan tunnel hasn't been established, we'd better send icmp port
unreachable message after receive erspan packets.

Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 23:50:53 -07:00
Willem de Bruijn
0297c1c2ea tcp: rate limit synflood warnings further
Convert pr_info to net_info_ratelimited to limit the total number of
synflood warnings.

Commit 946cedccbd ("tcp: Change possible SYN flooding messages")
rate limits synflood warnings to one per listener.

Workloads that open many listener sockets can still see a high rate of
log messages. Syzkaller is one frequent example.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 23:34:20 -07:00
David S. Miller
4ecdf77091 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for you net tree:

1) Remove duplicated include at the end of UDP conntrack, from Yue Haibing.

2) Restore conntrack dependency on xt_cluster, from Martin Willi.

3) Fix splat with GSO skbs from the checksum target, from Florian Westphal.

4) Rework ct timeout support, the template strategy to attach custom timeouts
   is not correct since it will not work in conjunction with conntrack zones
   and we have a possible free after use when removing the rule due to missing
   refcounting. To fix these problems, do not use conntrack template at all
   and set custom timeout on the already valid conntrack object. This
   fix comes with a preparation patch to simplify timeout adjustment by
   initializating the first position of the timeout array for all of the
   existing trackers. Patchset from Florian Westphal.

5) Fix missing dependency on from IPv4 chain NAT type, from Florian.

6) Release chain reference counter from the flush path, from Taehee Yoo.

7) After flushing an iptables ruleset, conntrack hooks are unregistered
   and entries are left stale to be cleaned up by the timeout garbage
   collector. No TCP tracking is done on established flows by this time.
   If ruleset is reloaded, then hooks are registered again and TCP
   tracking is restored, which considers packets to be invalid. Clear
   window tracking to exercise TCP flow pickup from the middle given that
   history is lost for us. Again from Florian.

8) Fix crash from netlink interface with CONFIG_NF_CONNTRACK_TIMEOUT=y
   and CONFIG_NF_CT_NETLINK_TIMEOUT=n.

9) Broken CT target due to returning incorrect type from
   ctnl_timeout_find_get().

10) Solve conntrack clash on NF_REPEAT verdicts too, from Michal Vaner.

11) Missing conversion of hashlimit sysctl interface to new API, from
    Cong Wang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 21:17:30 -07:00
Anders Roxell
1edb6e035e net/core/filter: fix unused-variable warning
Building with CONFIG_INET=n will show the warning below:
net/core/filter.c: In function ‘____bpf_getsockopt’:
net/core/filter.c:4048:19: warning: unused variable ‘tp’ [-Wunused-variable]
  struct tcp_sock *tp;
                   ^~
net/core/filter.c:4046:31: warning: unused variable ‘icsk’ [-Wunused-variable]
  struct inet_connection_sock *icsk;
                               ^~~~
Move the variable declarations inside the {} block where they are used.

Fixes: 1e215300f1 ("bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-09-11 14:54:52 -07:00
Matias Karhumaa
4ba5175f2c Bluetooth: Use correct tfm to generate OOB data
In case local OOB data was generated and other device initiated pairing
claiming that it has got OOB data, following crash occurred:

[  222.847853] general protection fault: 0000 [#1] SMP PTI
[  222.848025] CPU: 1 PID: 42 Comm: kworker/u5:0 Tainted: G         C        4.18.0-custom #4
[  222.848158] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  222.848307] Workqueue: hci0 hci_rx_work [bluetooth]
[  222.848416] RIP: 0010:compute_ecdh_secret+0x5a/0x270 [bluetooth]
[  222.848540] Code: 0c af f5 48 8b 3d 46 de f0 f6 ba 40 00 00 00 be c0 00 60 00 e8 b7 7b c5 f5 48 85 c0 0f 84 ea 01 00 00 48 89 c3 e8 16 0c af f5 <49> 8b 47 38 be c0 00 60 00 8b 78 f8 48 83 c7 48 e8 51 84 c5 f5 48
[  222.848914] RSP: 0018:ffffb1664087fbc0 EFLAGS: 00010293
[  222.849021] RAX: ffff8a5750d7dc00 RBX: ffff8a5671096780 RCX: ffffffffc08bc32a
[  222.849111] RDX: 0000000000000000 RSI: 00000000006000c0 RDI: ffff8a5752003800
[  222.849192] RBP: ffffb1664087fc60 R08: ffff8a57525280a0 R09: ffff8a5752003800
[  222.849269] R10: ffffb1664087fc70 R11: 0000000000000093 R12: ffff8a5674396e00
[  222.849350] R13: ffff8a574c2e79aa R14: ffff8a574c2e796a R15: 020e0e100d010101
[  222.849429] FS:  0000000000000000(0000) GS:ffff8a5752500000(0000) knlGS:0000000000000000
[  222.849518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  222.849586] CR2: 000055856016a038 CR3: 0000000110d2c005 CR4: 00000000000606e0
[  222.849671] Call Trace:
[  222.849745]  ? sc_send_public_key+0x110/0x2a0 [bluetooth]
[  222.849825]  ? sc_send_public_key+0x115/0x2a0 [bluetooth]
[  222.849925]  smp_recv_cb+0x959/0x2490 [bluetooth]
[  222.850023]  ? _cond_resched+0x19/0x40
[  222.850105]  ? mutex_lock+0x12/0x40
[  222.850202]  l2cap_recv_frame+0x109d/0x3420 [bluetooth]
[  222.850315]  ? l2cap_recv_frame+0x109d/0x3420 [bluetooth]
[  222.850426]  ? __switch_to_asm+0x34/0x70
[  222.850515]  ? __switch_to_asm+0x40/0x70
[  222.850625]  ? __switch_to_asm+0x34/0x70
[  222.850724]  ? __switch_to_asm+0x40/0x70
[  222.850786]  ? __switch_to_asm+0x34/0x70
[  222.850846]  ? __switch_to_asm+0x40/0x70
[  222.852581]  ? __switch_to_asm+0x34/0x70
[  222.854976]  ? __switch_to_asm+0x40/0x70
[  222.857475]  ? __switch_to_asm+0x40/0x70
[  222.859775]  ? __switch_to_asm+0x34/0x70
[  222.861218]  ? __switch_to_asm+0x40/0x70
[  222.862327]  ? __switch_to_asm+0x34/0x70
[  222.863758]  l2cap_recv_acldata+0x266/0x3c0 [bluetooth]
[  222.865122]  hci_rx_work+0x1c9/0x430 [bluetooth]
[  222.867144]  process_one_work+0x210/0x4c0
[  222.868248]  worker_thread+0x41/0x4d0
[  222.869420]  kthread+0x141/0x160
[  222.870694]  ? process_one_work+0x4c0/0x4c0
[  222.871668]  ? kthread_create_worker_on_cpu+0x90/0x90
[  222.872896]  ret_from_fork+0x35/0x40
[  222.874132] Modules linked in: algif_hash algif_skcipher af_alg rfcomm bnep btusb btrtl btbcm btintel snd_intel8x0 cmac intel_rapl_perf vboxvideo(C) snd_ac97_codec bluetooth ac97_bus joydev ttm snd_pcm ecdh_generic drm_kms_helper snd_timer snd input_leds drm serio_raw fb_sys_fops soundcore syscopyarea sysfillrect sysimgblt mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci psmouse libahci i2c_piix4 video e1000 pata_acpi
[  222.883153] fbcon_switch: detected unhandled fb_set_par error, error code -16
[  222.886774] fbcon_switch: detected unhandled fb_set_par error, error code -16
[  222.890503] ---[ end trace 6504aa7a777b5316 ]---
[  222.890541] RIP: 0010:compute_ecdh_secret+0x5a/0x270 [bluetooth]
[  222.890551] Code: 0c af f5 48 8b 3d 46 de f0 f6 ba 40 00 00 00 be c0 00 60 00 e8 b7 7b c5 f5 48 85 c0 0f 84 ea 01 00 00 48 89 c3 e8 16 0c af f5 <49> 8b 47 38 be c0 00 60 00 8b 78 f8 48 83 c7 48 e8 51 84 c5 f5 48
[  222.890555] RSP: 0018:ffffb1664087fbc0 EFLAGS: 00010293
[  222.890561] RAX: ffff8a5750d7dc00 RBX: ffff8a5671096780 RCX: ffffffffc08bc32a
[  222.890565] RDX: 0000000000000000 RSI: 00000000006000c0 RDI: ffff8a5752003800
[  222.890571] RBP: ffffb1664087fc60 R08: ffff8a57525280a0 R09: ffff8a5752003800
[  222.890576] R10: ffffb1664087fc70 R11: 0000000000000093 R12: ffff8a5674396e00
[  222.890581] R13: ffff8a574c2e79aa R14: ffff8a574c2e796a R15: 020e0e100d010101
[  222.890586] FS:  0000000000000000(0000) GS:ffff8a5752500000(0000) knlGS:0000000000000000
[  222.890591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  222.890594] CR2: 000055856016a038 CR3: 0000000110d2c005 CR4: 00000000000606e0

This commit fixes a bug where invalid pointer to crypto tfm was used for
SMP SC ECDH calculation when OOB was in use. Solution is to use same
crypto tfm than when generating OOB material on generate_oob() function.

This bug was introduced in commit c0153b0b90 ("Bluetooth: let the crypto
subsystem generate the ecc privkey"). Bug was found by fuzzing kernel SMP
implementation using Synopsys Defensics.

Signed-off-by: Matias Karhumaa <matias.karhumaa@gmail.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-11 13:33:57 +02:00
Johan Hedberg
94f14e4728 Bluetooth: SMP: Fix trying to use non-existent local OOB data
A remote device may claim that it has received our OOB data, even
though we never geneated it. Add a new flag to track whether we
actually have OOB data, and ignore the remote peer's flag if haven't
generated OOB data.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-11 13:33:57 +02:00
Cong Wang
1286df269f netfilter: xt_hashlimit: use s->file instead of s->private
After switching to the new procfs API, it is supposed to
retrieve the private pointer from PDE_DATA(file_inode(s->file)),
s->private is no longer referred.

Fixes: 1cd6718272 ("netfilter/x_tables: switch to proc_create_seq_private")
Reported-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:35:32 +02:00
Michal 'vorner' Vaner
ad18d7bf68 netfilter: nfnetlink_queue: Solve the NFQUEUE/conntrack clash for NF_REPEAT
NF_REPEAT places the packet at the beginning of the iptables chain
instead of accepting or rejecting it right away. The packet however will
reach the end of the chain and continue to the end of iptables
eventually, so it needs the same handling as NF_ACCEPT and NF_DROP.

Fixes: 368982cd7d ("netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracks")
Signed-off-by: Michal 'vorner' Vaner <michal.vaner@avast.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:31:47 +02:00
Pablo Neira Ayuso
99e25d071f netfilter: cttimeout: ctnl_timeout_find_get() returns incorrect pointer to type
Compiler did not catch incorrect typing in the rcu hook assignment.

 % nfct add timeout test-tcp inet tcp established 100 close 10 close_wait 10
 % iptables -I OUTPUT -t raw -p tcp -j CT --timeout test-tcp
 dmesg - xt_CT: Timeout policy `test-tcp' can only be used by L3 protocol number 25000

The CT target bails out with incorrect layer 3 protocol number.

Fixes: 6c1fd7dc48 ("netfilter: cttimeout: decouple timeout policy from nfnetlink_cttimeout object")
Reported-by: Harsha Sharma <harshasharmaiitr@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:31:10 +02:00
Pablo Neira Ayuso
a874752a10 netfilter: conntrack: timeout interface depend on CONFIG_NF_CONNTRACK_TIMEOUT
Now that cttimeout support for nft_ct is in place, these should depend
on CONFIG_NF_CONNTRACK_TIMEOUT otherwise we can crash when dumping the
policy if this option is not enabled.

[   71.600121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[...]
[   71.600141] CPU: 3 PID: 7612 Comm: nft Not tainted 4.18.0+ #246
[...]
[   71.600188] Call Trace:
[   71.600201]  ? nft_ct_timeout_obj_dump+0xc6/0xf0 [nft_ct]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:30:25 +02:00
Florian Westphal
f94e63801a netfilter: conntrack: reset tcp maxwin on re-register
Doug Smythies says:
  Sometimes it is desirable to temporarily disable, or clear,
  the iptables rule set on a computer being controlled via a
  secure shell session (SSH). While unwise on an internet facing
  computer, I also do it often on non-internet accessible computers
  while testing. Recently, this has become problematic, with the
  SSH session being dropped upon re-load of the rule set.

The problem is that when all rules are deleted, conntrack hooks get
unregistered.

In case the rules are re-added later, its possible that tcp window
has moved far enough so that all packets are considered invalid (out of
window) until entry expires (which can take forever, default
established timeout is 5 days).

Fix this by clearing maxwin of existing tcp connections on register.

v2: don't touch entries on hook removal.
v3: remove obsolete expiry check.

Reported-by: Doug Smythies <dsmythies@telus.net>
Fixes: 4d3a57f23d ("netfilter: conntrack: do not enable connection tracking unless needed")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:29:24 +02:00
Cong Wang
11957be20f htb: use anonymous union for simplicity
cl->leaf.q is slightly more readable than cl->un.leaf.q.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:39:57 -07:00
Cong Wang
8ecc7c8a1c net_sched: remove redundant qdisc lock classes
We no longer take any spinlock on RX path for ingress qdisc,
so this lockdep annotation is no longer needed.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:39:38 -07:00
Vlad Buslov
86c55361e5 net: sched: cls_flower: dump offload count value
Change flower in_hw_count type to fixed-size u32 and dump it as
TCA_FLOWER_IN_HW_COUNT. This change is necessary to properly test shared
blocks and re-offload functionality.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:35:15 -07:00
David S. Miller
992cba7e27 net: Add and use skb_list_del_init().
It documents what is happening, and eliminates the spurious list
pointer poisoning.

In the long term, in order to get proper list head debugging, we
might want to use the list poison value as the indicator that
an SKB is a singleton and not on a list.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:06:54 -07:00
David S. Miller
a8305bff68 net: Add and use skb_mark_not_on_list().
An SKB is not on a list if skb->next is NULL.

Codify this convention into a helper function and use it
where we are dequeueing an SKB and need to mark it as such.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:06:54 -07:00
David S. Miller
1181d629cc sctp: Use skb_queue_is_first().
Instead of direct skb_queue_head pointer accesses.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:06:53 -07:00
David S. Miller
7957a9dea8 mac80211: Don't access sk_queue_head->next directly.
Use __skb_peek() instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:06:53 -07:00
David S. Miller
596977300a sch_netem: Move private queue handler to generic location.
By hand copies of SKB list handlers do not belong in individual packet
schedulers.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:06:53 -07:00
David S. Miller
aea890b8b2 sch_htb: Remove local SKB queue handling code.
Instead, adjust __qdisc_enqueue_tail() such that HTB can use it
instead.

The only other caller of __qdisc_enqueue_tail() is
qdisc_enqueue_tail() so we can move the backlog and return value
handling (which HTB doesn't need/want) to the latter.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:06:52 -07:00
David Ahern
0153167aeb net/ipv6: Remove rt6i_prefsrc
After the conversion to fib6_info, rt6i_prefsrc has a single user that
reads the value and otherwise it is only set. The one reader can be
converted to use rt->from so rt6i_prefsrc can be removed, reducing
rt6_info by another 20 bytes.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:02:25 -07:00
Taehee Yoo
5d407b071d ip: frags: fix crash in ip_do_fragment()
A kernel crash occurrs when defragmented packet is fragmented
in ip_do_fragment().
In defragment routine, skb_orphan() is called and
skb->ip_defrag_offset is set. but skb->sk and
skb->ip_defrag_offset are same union member. so that
frag->sk is not NULL.
Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
defragmented packet is fragmented.

test commands:
   %iptables -t nat -I POSTROUTING -j MASQUERADE
   %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000

splat looks like:
[  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
[  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
[  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
[  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
[  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
[  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
[  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
[  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
[  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
[  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
[  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
[  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
[  261.198158] Call Trace:
[  261.199018]  ? dst_output+0x180/0x180
[  261.205011]  ? save_trace+0x300/0x300
[  261.209018]  ? ip_copy_metadata+0xb00/0xb00
[  261.213034]  ? sched_clock_local+0xd4/0x140
[  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
[  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
[  261.227014]  ? find_held_lock+0x39/0x1c0
[  261.233008]  ip_finish_output+0x51d/0xb50
[  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
[  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
[  261.250152]  ? rcu_is_watching+0x77/0x120
[  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
[  261.261033]  ? nf_hook_slow+0xb1/0x160
[  261.265007]  ip_output+0x1c7/0x710
[  261.269005]  ? ip_mc_output+0x13f0/0x13f0
[  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
[  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
[  261.282996]  ? nf_hook_slow+0xb1/0x160
[  261.287007]  raw_sendmsg+0x21f9/0x4420
[  261.291008]  ? dst_output+0x180/0x180
[  261.297003]  ? sched_clock_cpu+0x126/0x170
[  261.301003]  ? find_held_lock+0x39/0x1c0
[  261.306155]  ? stop_critical_timings+0x420/0x420
[  261.311004]  ? check_flags.part.36+0x450/0x450
[  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
[  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
[  261.326142]  ? cyc2ns_read_end+0x10/0x10
[  261.330139]  ? raw_bind+0x280/0x280
[  261.334138]  ? sched_clock_cpu+0x126/0x170
[  261.338995]  ? check_flags.part.36+0x450/0x450
[  261.342991]  ? __lock_acquire+0x4500/0x4500
[  261.348994]  ? inet_sendmsg+0x11c/0x500
[  261.352989]  ? dst_output+0x180/0x180
[  261.357012]  inet_sendmsg+0x11c/0x500
[ ... ]

v2:
 - clear skb->sk at reassembly routine.(Eric Dumarzet)

Fixes: fa0f527358 ("ip: use rb trees for IP frag queue.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 14:50:56 -07:00
Vakul Garg
52ea992cfa net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC
tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
function sk_alloc_sg(). In case the number of SG entries hit
MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
current SG index to '0'. This leads to calling of function
tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes
kernel crash. To fix this, set the number of SG elements to the number
of elements in plaintext/encrypted SG arrays in case sk_alloc_sg()
returns -ENOSPC.

Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 08:10:01 -07:00
Vlad Buslov
f20a4d0117 net: sched: act_nat: remove dependency on rtnl lock
According to the new locking rule, we have to take tcf_lock for both
->init() and ->dump(), as RTNL will be removed.

Use tcf spinlock to protect private nat action data from concurrent
modification during dump. (nat init already uses tcf spinlock when changing
action state)

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-08 10:18:25 -07:00
Vlad Buslov
6d7a8df6df net: sched: act_skbedit: remove dependency on rtnl lock
According to the new locking rule, we have to take tcf_lock for both
->init() and ->dump(), as RTNL will be removed.

Use tcf lock to protect skbedit action struct private data from concurrent
modification in init and dump. Use rcu swap operation to reassign params
pointer under protection of tcf lock. (old params value is not used by
init, so there is no need of standalone rcu dereference step)

Remove rtnl lock assertion that is no longer required.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-08 10:17:35 -07:00
Vincent Whitchurch
5cf4a8532c tcp: really ignore MSG_ZEROCOPY if no SO_ZEROCOPY
According to the documentation in msg_zerocopy.rst, the SO_ZEROCOPY
flag was introduced because send(2) ignores unknown message flags and
any legacy application which was accidentally passing the equivalent of
MSG_ZEROCOPY earlier should not see any new behaviour.

Before commit f214f915e7 ("tcp: enable MSG_ZEROCOPY"), a send(2) call
which passed the equivalent of MSG_ZEROCOPY without setting SO_ZEROCOPY
would succeed.  However, after that commit, it fails with -ENOBUFS.  So
it appears that the SO_ZEROCOPY flag fails to fulfill its intended
purpose.  Fix it.

Fixes: f214f915e7 ("tcp: enable MSG_ZEROCOPY")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-07 23:11:06 -07:00
Cong Wang
a162c35114 net_sched: properly cancel netlink dump on failure
When nla_put*() fails after nla_nest_start(), we need
to call nla_nest_cancel() to cancel the message, otherwise
we end up calling nla_nest_end() like a success.

Fixes: 0ed5269f9e ("net/sched: add tunnel option support to act_tunnel_key")
Cc: Davide Caratti <dcaratti@redhat.com>
Cc: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-07 23:05:07 -07:00
Florian Fainelli
98cdb48071 net: dsa: Expose tagging protocol to user-space
There is no way for user-space to know what a given DSA network device's
tagging protocol is. Expose this information through a dsa/tagging
attribute which reflects the tagging protocol currently in use.

This is helpful for configuration (e.g: none behaves dramatically
different wrt. bridges) as well as for packet capture tools when there
is not a proper Ethernet type available.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-07 14:49:55 -07:00
Marek Lindner
4c4af69008 batman-adv: fix hardif_neigh refcount on queue_work() failure
The hardif_neigh refcounter is to be decreased by the queued work and
currently is never decreased if the queue_work() call fails.
Fix by checking the queue_work() return value and decrease refcount
if necessary.

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-07 08:14:26 +02:00
Marek Lindner
5af96b9c59 batman-adv: fix backbone_gw refcount on queue_work() failure
The backbone_gw refcounter is to be decreased by the queued work and
currently is never decreased if the queue_work() call fails.
Fix by checking the queue_work() return value and decrease refcount
if necessary.

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-07 08:14:26 +02:00
Jesper Dangaard Brouer
47b123ed9e xdp: split code for map vs non-map redirect
The compiler does an efficient job of inlining static C functions.
Perf top clearly shows that almost everything gets inlined into the
function call xdp_do_redirect.

The function xdp_do_redirect end-up containing and interleaving the
map and non-map redirect code.  This is sub-optimal, as it would be
strange for an XDP program to use both types of redirect in the same
program. The two use-cases are separate, and interleaving the code
just cause more instruction-cache pressure.

I would like to stress (again) that the non-map variant bpf_redirect
is very slow compared to the bpf_redirect_map variant, approx half the
speed.  Measured with driver i40e the difference is:

- map     redirect: 13,250,350 pps
- non-map redirect:  7,491,425 pps

For this reason, the function name of the non-map variant of redirect
have been called xdp_do_redirect_slow.  This hopefully gives a hint
when using perf, that this is not the optimal XDP redirect operating mode.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-09-06 22:34:08 -07:00
Jesper Dangaard Brouer
2a68d85fe1 xdp: explicit inline __xdp_map_lookup_elem
The compiler chooses to not-inline the function __xdp_map_lookup_elem,
because it can see that it is used by both Generic-XDP and native-XDP
do redirect calls (xdp_do_generic_redirect_map and xdp_do_redirect_map).

The compiler cannot know that this is a bad choice, as it cannot know
that a net device cannot run both XDP modes (Generic or Native) at the
same time.  Thus, mark this function inline, even-though we normally
leave this up-to the compiler.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-09-06 22:34:08 -07:00
Jesper Dangaard Brouer
e1302542e3 xdp: unlikely instrumentation for xdp map redirect
Notice the compiler generated ASM code layout was suboptimal.  It
assumed map enqueue errors as the likely case, which is shouldn't.
It assumed that xdp_do_flush_map() was a likely case, due to maps
changing between packets, which should be very unlikely.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-09-06 22:34:08 -07:00
Cong Wang
8f5c5fcf35 tipc: call start and done ops directly in __tipc_nl_compat_dumpit()
__tipc_nl_compat_dumpit() uses a netlink_callback on stack,
so the only way to align it with other ->dumpit() call path
is calling tipc_dump_start() and tipc_dump_done() directly
inside it. Otherwise ->dumpit() would always get NULL from
cb->args[].

But tipc_dump_start() uses sock_net(cb->skb->sk) to retrieve
net pointer, the cb->skb here doesn't set skb->sk, the net pointer
is saved in msg->net instead, so introduce a helper function
__tipc_dump_start() to pass in msg->net.

Ying pointed out cb->args[0...3] are already used by other
callbacks on this call path, so we can't use cb->args[0] any
more, use cb->args[4] instead.

Fixes: 9a07efa9ae ("tipc: switch to rhashtable iterator")
Reported-and-tested-by: syzbot+e93a2c41f91b8e2c7d9b@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-06 21:49:18 -07:00
Yi-Hung Wei
fa642f0883 openvswitch: Derive IP protocol number for IPv6 later frags
Currently, OVS only parses the IP protocol number for the first
IPv6 fragment, but sets the IP protocol number for the later fragments
to be NEXTHDF_FRAGMENT.  This patch tries to derive the IP protocol
number for the IPV6 later frags so that we can match that.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-06 21:47:49 -07:00
Sven Eckelmann
ae3cdc97dc batman-adv: Prevent duplicated tvlv handler
The function batadv_tvlv_handler_register is responsible for adding new
tvlv_handler to the handler_list. It first checks whether the entry
already is in the list or not. If it is, then the creation of a new entry
is aborted.

But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.

The check and the manipulation of the list must therefore be in the same
locked code section.

Fixes: ef26157747 ("batman-adv: tvlv - basic infrastructure")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 14:37:13 +02:00
Sven Eckelmann
e7136e48ff batman-adv: Prevent duplicated global TT entry
The function batadv_tt_global_orig_entry_add is responsible for adding new
tt_orig_list_entry to the orig_list. It first checks whether the entry
already is in the list or not. If it is, then the creation of a new entry
is aborted.

But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.

The check and the manipulation of the list must therefore be in the same
locked code section.

Fixes: d657e621a0 ("batman-adv: add reference counting for type batadv_tt_orig_list_entry")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 14:37:12 +02:00
Sven Eckelmann
94cb82f594 batman-adv: Prevent duplicated softif_vlan entry
The function batadv_softif_vlan_get is responsible for adding new
softif_vlan to the softif_vlan_list. It first checks whether the entry
already is in the list or not. If it is, then the creation of a new entry
is aborted.

But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.

The check and the manipulation of the list must therefore be in the same
locked code section.

Fixes: 5d2c05b213 ("batman-adv: add per VLAN interface attribute framework")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 14:37:05 +02:00
Sven Eckelmann
fa122fec86 batman-adv: Prevent duplicated nc_node entry
The function batadv_nc_get_nc_node is responsible for adding new nc_nodes
to the in_coding_list and out_coding_list. It first checks whether the
entry already is in the list or not. If it is, then the creation of a new
entry is aborted.

But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.

The check and the manipulation of the list must therefore be in the same
locked code section.

Fixes: d56b1705e2 ("batman-adv: network coding - detect coding nodes and remove these after timeout")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 13:55:58 +02:00
Sven Eckelmann
dff9bc42ab batman-adv: Prevent duplicated gateway_node entry
The function batadv_gw_node_add is responsible for adding new gw_node to
the gateway_list. It is expecting that the caller already checked that
there is not already an entry with the same key or not.

But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.

The check and the manipulation of the list must therefore be in the same
locked code section.

Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 13:55:20 +02:00
Sven Eckelmann
a25bab9d72 batman-adv: Fix segfault when writing to sysfs elp_interval
The per hardif sysfs file "batman_adv/elp_interval" is using the generic
functions to store/show uint values. The helper __batadv_store_uint_attr
requires the softif net_device as parameter to print the resulting change
as info text when the users writes to this file. It uses the helper
function batadv_info to add it at the same time to the kernel ring buffer
and to the batman-adv debug log (when CONFIG_BATMAN_ADV_DEBUG is enabled).

The function batadv_info requires as first parameter the batman-adv softif
net_device. This parameter is then used to find the private buffer which
contains the debug log for this batman-adv interface. But
batadv_store_throughput_override used as first argument the slave
net_device. This slave device doesn't have the batadv_priv private data
which is access by batadv_info.

Writing to this file with CONFIG_BATMAN_ADV_DEBUG enabled can either lead
to a segfault or to memory corruption.

Fixes: 0744ff8fa8 ("batman-adv: Add hard_iface specific sysfs wrapper macros for UINT")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 13:54:48 +02:00
Sven Eckelmann
b9fd14c208 batman-adv: Fix segfault when writing to throughput_override
The per hardif sysfs file "batman_adv/throughput_override" prints the
resulting change as info text when the users writes to this file. It uses
the helper function batadv_info to add it at the same time to the kernel
ring buffer and to the batman-adv debug log (when CONFIG_BATMAN_ADV_DEBUG
is enabled).

The function batadv_info requires as first parameter the batman-adv softif
net_device. This parameter is then used to find the private buffer which
contains the debug log for this batman-adv interface. But
batadv_store_throughput_override used as first argument the slave
net_device. This slave device doesn't have the batadv_priv private data
which is access by batadv_info.

Writing to this file with CONFIG_BATMAN_ADV_DEBUG enabled can either lead
to a segfault or to memory corruption.

Fixes: 0b5ecc6811 ("batman-adv: add throughput override attribute to hard_ifaces")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 13:54:48 +02:00
Sven Eckelmann
88d0895d0e batman-adv: Avoid probe ELP information leak
The probe ELPs for WiFi interfaces are expanded to contain at least
BATADV_ELP_MIN_PROBE_SIZE bytes. This is usually a lot more than the
number of bytes which the template ELP packet requires.

These extra padding bytes were not initialized and thus could contain data
which were previously stored at the same location. It is therefore required
to set it to some predefined or random values to avoid leaking private
information from the system transmitting these kind of packets.

Fixes: e4623c913508 ("batman-adv: Avoid probe ELP information leak")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 13:54:47 +02:00
Julian Wiedmann
b7f4156554 net/iucv: declare iucv_path_table_empty() as static
Fixes a compile warning.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:32:22 -07:00
Julian Wiedmann
b2f543949a net/af_iucv: fix skb handling on HiperTransport xmit error
When sending an skb, afiucv_hs_send() bails out on various error
conditions. But currently the caller has no way of telling whether the
skb was freed or not - resulting in potentially either
a) leaked skbs from iucv_send_ctrl(), or
b) double-free's from iucv_sock_sendmsg().

As dev_queue_xmit() will always consume the skb (even on error), be
consistent and also free the skb from all other error paths. This way
callers no longer need to care about managing the skb.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:32:22 -07:00
Julian Wiedmann
222440996d net/af_iucv: drop inbound packets with invalid flags
Inbound packets may have any combination of flag bits set in their iucv
header. If we don't know how to handle a specific combination, drop the
skb instead of leaking it.

To clarify what error is returned in this case, replace the hard-coded
0 with the corresponding macro.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:32:21 -07:00
Christian Brauner
203651b665 ipv6: add inet6_fill_args
inet6_fill_if{addr,mcaddr, acaddr}() already took 6 arguments which
meant the 7th argument would need to be pushed onto the stack on x86.
Add a new struct inet6_fill_args which holds common information passed
to inet6_fill_if{addr,mcaddr, acaddr}() and shortens the functions to
three pointer arguments.

Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
978a46fa6c ipv4: add inet_fill_args
inet_fill_ifaddr() already took 6 arguments which meant the 7th argument
would need to be pushed onto the stack on x86.
Add a new struct inet_fill_args which holds common information passed
to inet_fill_ifaddr() and shortens the function to three pointer arguments.

Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
7e4a8d5a93 rtnetlink: s/IFLA_IF_NETNSID/IFLA_TARGET_NETNSID/g
IFLA_TARGET_NETNSID is the new alias for IFLA_IF_NETNSID. This commit
replaces all occurrences of IFLA_IF_NETNSID with the new alias to
indicate that this identifier is the preferred one.

Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
87ccbb1f94 rtnetlink: move type calculation out of loop
I don't see how the type - which is one of
RTM_{GETADDR,GETROUTE,GETNETCONF} - can change. So do the message type
calculation once before entering the for loop.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
6ecf4c37eb ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR
- Backwards Compatibility:
  If userspace wants to determine whether ipv6 RTM_GETADDR requests
  support the new IFA_TARGET_NETNSID property it should verify that the
  reply includes the IFA_TARGET_NETNSID property. If it does not
  userspace should assume that IFA_TARGET_NETNSID is not supported for
  ipv6 RTM_GETADDR requests on this kernel.
- From what I gather from current userspace tools that make use of
  RTM_GETADDR requests some of them pass down struct ifinfomsg when they
  should actually pass down struct ifaddrmsg. To not break existing
  tools that pass down the wrong struct we will do the same as for
  RTM_GETLINK | NLM_F_DUMP requests and not error out when the
  nlmsg_parse() fails.

- Security:
  Callers must have CAP_NET_ADMIN in the owning user namespace of the
  target network namespace.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
d38071455f ipv4: enable IFA_TARGET_NETNSID for RTM_GETADDR
- Backwards Compatibility:
  If userspace wants to determine whether ipv4 RTM_GETADDR requests
  support the new IFA_TARGET_NETNSID property it should verify that the
  reply includes the IFA_TARGET_NETNSID property. If it does not
  userspace should assume that IFA_TARGET_NETNSID is not supported for
  ipv4 RTM_GETADDR requests on this kernel.
- From what I gather from current userspace tools that make use of
  RTM_GETADDR requests some of them pass down struct ifinfomsg when they
  should actually pass down struct ifaddrmsg. To not break existing
  tools that pass down the wrong struct we will do the same as for
  RTM_GETLINK | NLM_F_DUMP requests and not error out when the
  nlmsg_parse() fails.

- Security:
  Callers must have CAP_NET_ADMIN in the owning user namespace of the
  target network namespace.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
c383edc424 rtnetlink: add rtnl_get_net_ns_capable()
get_target_net() will be used in follow-up patches in ipv{4,6} codepaths to
retrieve network namespaces based on network namespace identifiers. So
remove the static declaration and export in the rtnetlink header. Also,
rename it to rtnl_get_net_ns_capable() to make it obvious what this
function is doing.
Export rtnl_get_net_ns_capable() so it can be used when ipv6 is built as
a module.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Davide Caratti
ee28bb56ac net/sched: fix memory leak in act_tunnel_key_init()
If users try to install act_tunnel_key 'set' rules with duplicate values
of 'index', the tunnel metadata are allocated, but never released. Then,
kmemleak complains as follows:

 # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
 # echo clear > /sys/kernel/debug/kmemleak
 # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
 Error: TC IDR already exists.
 We have an error talking to the kernel
 # echo scan > /sys/kernel/debug/kmemleak
 # cat /sys/kernel/debug/kmemleak
 unreferenced object 0xffff8800574e6c80 (size 256):
   comm "tc", pid 5617, jiffies 4298118009 (age 57.990s)
   hex dump (first 32 bytes):
     00 00 00 00 00 00 00 00 00 1c e8 b0 ff ff ff ff  ................
     81 24 c2 ad ff ff ff ff 00 00 00 00 00 00 00 00  .$..............
   backtrace:
     [<00000000b7afbf4e>] tunnel_key_init+0x8a5/0x1800 [act_tunnel_key]
     [<000000007d98fccd>] tcf_action_init_1+0x698/0xac0
     [<0000000099b8f7cc>] tcf_action_init+0x15c/0x590
     [<00000000dc60eebe>] tc_ctl_action+0x336/0x5c2
     [<000000002f5a2f7d>] rtnetlink_rcv_msg+0x357/0x8e0
     [<000000000bfe7575>] netlink_rcv_skb+0x124/0x350
     [<00000000edab656f>] netlink_unicast+0x40f/0x5d0
     [<00000000b322cdcb>] netlink_sendmsg+0x6e8/0xba0
     [<0000000063d9d490>] sock_sendmsg+0xb3/0xf0
     [<00000000f0d3315a>] ___sys_sendmsg+0x654/0x960
     [<00000000c06cbd42>] __sys_sendmsg+0xd3/0x170
     [<00000000ce72e4b0>] do_syscall_64+0xa5/0x470
     [<000000005caa2d97>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
     [<00000000fac1b476>] 0xffffffffffffffff

This problem theoretically happens also in case users attempt to setup a
geneve rule having wrong configuration data, or when the kernel fails to
allocate 'params_new'. Ensure that tunnel_key_init() releases the tunnel
metadata also in the above conditions.

Addresses-Coverity-ID: 1373974 ("Resource leak")
Fixes: d0f6dd8a91 ("net/sched: Introduce act_tunnel_key")
Fixes: 0ed5269f9e ("net/sched: add tunnel option support to act_tunnel_key")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:18:54 -07:00
Cong Wang
0a3b8b2b21 tipc: orphan sock in tipc_release()
Before we unlock the sock in tipc_release(), we have to
detach sk->sk_socket from sk, otherwise a parallel
tipc_sk_fill_sock_diag() could stil read it after we
free this socket.

Fixes: c30b70deb5 ("tipc: implement socket diagnostics for AF_TIPC")
Reported-and-tested-by: syzbot+48804b87c16588ad491d@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:14:00 -07:00
Dmitry Safonov
428f944bd5 netlink: Make groups check less stupid in netlink_bind()
As Linus noted, the test for 0 is needless, groups type can follow the
usual kernel style and 8*sizeof(unsigned long) is BITS_PER_LONG:

> The code [..] isn't technically incorrect...
> But it is stupid.
> Why stupid? Because the test for 0 is pointless.
>
> Just doing
>        if (nlk->ngroups < 8*sizeof(groups))
>                groups &= (1UL << nlk->ngroups) - 1;
>
> would have been fine and more understandable, since the "mask by shift
> count" already does the right thing for a ngroups value of 0. Now that
> test for zero makes me go "what's special about zero?". It turns out
> that the answer to that is "nothing".
[..]
> The type of "groups" is kind of silly too.
>
> Yeah, "long unsigned int" isn't _technically_ wrong. But we normally
> call that type "unsigned long".

Cleanup my piece of pointlessness.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Fairly-blamed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:11:33 -07:00
Vincent Whitchurch
fa788d986a packet: add sockopt to ignore outgoing packets
Currently, the only way to ignore outgoing packets on a packet socket is
via the BPF filter.  With MSG_ZEROCOPY, packets that are looped into
AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even
if the filter run from packet_rcv() would reject them.  So the presence
of a packet socket on the interface takes away the benefits of
MSG_ZEROCOPY, even if the packet socket is not interested in outgoing
packets.  (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily
cloned, but the cost for that is much lower.)

Add a socket option to allow AF_PACKET sockets to ignore outgoing
packets to solve this.  Note that the *BSDs already have something
similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT.

The first intended user is lldpd.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:09:37 -07:00
Stanislaw Gruszka
014f5a250f cfg80211: validate wmm rule when setting
Add validation check for wmm rule when copy rules from fwdb and print
error when rule is invalid.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:16:59 +02:00
Gustavo A. R. Silva
40b5a0f8c6 mac80211: remove unnecessary NULL check
Both old and new cannot be NULL at the same time, hence checking
new when old is not NULL is unnecessary.

Also, notice that new is being dereferenced before it is checked:

	idx = new->conf.keyidx;

The above triggers a static code analysis warning.

Address this by removing the NULL check on new and adding a code
comment based on the following piece of code:

387        /* caller must provide at least one old/new */
388        if (WARN_ON(!new && !old))
389                return 0;

Addresses-Coverity-ID: 1473176 ("Dereference before null check")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:15:48 +02:00
Sara Sharon
9739fe29a2 mac80211: add an option for drivers to check if packets can be aggregated
Some hardwares have limitations on the packets' type in AMSDU.
Add an optional driver callback to determine if two skbs can
be used in the same AMSDU or not.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:11:50 +02:00
Sara Sharon
edba6bdad6 mac80211: allow AMSDU size limitation per-TID
Some drivers may have AMSDU size limitation per TID, due to
HW constrains. Add an option to set this limit.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:10:26 +02:00
Sara Sharon
0eeb2b674f mac80211: add an option for station management TXQ
We have a TXQ abstraction for non-data packets that need
powersave buffering. Since the AP cannot sleep, in case
of station we can use this TXQ for all management frames,
regardless if they are bufferable. Add HW flag to allow
that.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:10:11 +02:00
Shaul Triebitz
add7453ad6 wireless: align to draft 11ax D3.0
Align to new 11ax draft D3.0.  Change/add new MAC and PHY capabilities
and update drivers' 11ax capabilities and mac80211's debugfs
accordingly.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:09:50 +02:00
Naftali Goldstein
77cbbc35a4 mac80211: fix saving a few HE values
After masking the he_oper_params, to get the requested values as
integers one must rshift and not lshift.  Fix that by using the
le32_get_bits() macro.

Fixes: 41cbb0f5a2 ("mac80211: add support for HE")
Signed-off-by: Naftali Goldstein <naftali.goldstein@intel.com>
[converted to use le32_get_bits()]
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:08:42 +02:00
Shaul Triebitz
c3d1f87528 mac80211: support reporting 0-length PSDU in radiotap
For certain sounding frames, it may be useful to report them
to userspace even though they don't have a PSDU in order to
determine the PHY parameters (e.g. VHT rate/stream config.)
Add support for this to mac80211.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:08:25 +02:00
Alexander Wetzel
62872a9b9a mac80211: Fix PTK rekey freezes and clear text leak
Rekeying PTK keys without "Extended Key ID for Individually Addressed
Frames" did use a procedure not suitable to replace in-use keys and
could caused the following issues:

 1) Freeze caused by incoming frames:
    If the local STA installed the key prior to the remote STA we still
    had the old key active in the hardware when mac80211 switched over
    to the new key.
    Therefore there was a window where the card could hand over frames
    decoded with the old key to mac80211 and bump the new PN (IV) value
    to an incorrect high number. When it happened the local replay
    detection silently started to drop all frames sent with the new key.

 2) Freeze caused by outgoing frames:
    If mac80211 was providing the PN (IV) and handed over a clear text
    frame for encryption to the hardware prior to a key change the
    driver/card could have processed the queued frame after switching
    to the new key. This bumped the PN value on the remote STA to an
    incorrect high number, tricking the remote STA to discard all frames
    we sent later.

 3) Freeze caused by RX aggregation reorder buffer:
    An aggregation session started with the old key and ending after the
    switch to the new key also bumped the PN to an incorrect high number,
    freezing the connection quite similar to 1).

 4) Freeze caused by repeating lost frames in an aggregation session:
    A driver could repeat a lost frame and encrypt it with the new key
    while in a TX aggregation session without updating the PN for the
    new key. This also could freeze connections similar to 2).

 5) Clear text leak:
    Removing encryption offload from the card cleared the encryption
    offload flag only after the card had deleted the key and we did not
    stop TX during the rekey. The driver/card could therefore get
    unencrypted frames from mac80211 while no longer be instructed to
    encrypt them.

To prevent those issues the key install logic has been changed:
 - Mac80211 divers known to be able to rekey PTK0 keys have to set
   @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0,
 - mac80211 stops queuing frames depending on the key during the replace
 - the key is first replaced in the hardware and after that in mac80211
 - and mac80211 stops/blocks new aggregation sessions during the rekey.

For drivers not setting
@NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 the user space must avoid PTK
rekeys if "Extended Key ID for Individually Addressed Frames" is not
being used. Rekeys for mac80211 drivers without this flag will generate a
warning and use an extra call to ieee80211_flush_queues() to both
highlight and try to prevent the issues with not updated drivers.

The core of the fix changes the key install procedure from:
 - atomic switch over to the new key in mac80211
 - remove the old key in the hardware (stops encryption offloading, fall
   back to software encryption with a potential clear text packet leak
   in between)
 - delete the inactive old key in mac80211
 - enable hardware encryption offloading for the new key
to:
 - if it's a PTK mark the old key as tainted to drop TX frames with the
   outgoing key
 - replace the key in hardware with the new one
 - atomic switch over to the new (not marked as tainted) key in
   mac80211 (which also resumes TX)
 - delete the inactive old key in mac80211

With the new sequence the hardware will be unable to decrypt frames
encrypted with the old key prior to switching to the new key in mac80211
and thus prevent PNs from packets decrypted with the old key to be
accounted against the new key.

For that to work the drivers have to provide a clear boundary.
Mac80211 drivers setting @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 confirm
to provide it and mac80211 will then be able to correctly rekey in-use
PTK keys with those drivers.

The mac80211 requirements for drivers to set the flag have been added to
the "Hardware crypto acceleration" documentation section. It drills down
to:
The drivers must not hand over frames decrypted with the old key to
mac80211 once the call to set_key() with %DISABLE_KEY has been
completed. It's allowed to either drop or continue to use the old key
for any outgoing frames which are already in the queues, but it must not
send out any of them unencrypted or encrypted with the new key.

Even with the new boundary in place aggregation sessions with the
reorder buffer are problematic:
RX aggregation session started prior and completed after the rekey could
still dump frames received with the old key at mac80211 after it
switched over to the new key. This is side stepped by stopping all (RX
and TX) aggregation sessions when replacing a PTK key and hardware key
offloading.
Stopping TX aggregation sessions avoids the need to get
the PNs (IVs) updated in frames prepared for the old key and
(re)transmitted after the switch to the new key. As a bonus it improves
the compatibility when the remote STA is not handling rekeys as it
should.

When using software crypto aggregation sessions are not stopped.
Mac80211 won't be able to decode the dangerous frames and discard them
without special handling.

Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
[trim overly long rekey warning]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:17 +02:00
Shaul Triebitz
d1332e7be2 mac80211: support radiotap L-SIG data
As before with HE, the data needs to be provided by the
driver in the skb head, since there's not enough space
in the skb CB.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:15 +02:00
Wen Gong
70e53669c4 mac80211: Store sk_pacing_shift in ieee80211_hw
Make it possibly for drivers to adjust the default skb_pacing_shift
by storing it in the hardware struct.

Signed-off-by: Wen Gong <wgong@codeaurora.org>
[adjust commit log, move & adjust comment]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:15 +02:00
Johannes Berg
e80d642552 mac80211: copy VHT EXT NSS BW Support/Capable data to station
When taking VHT capabilities for a station, copy the new
fields if we support them as a transmitter. Also adjust
the maximum bandwidth the station supports appropriately.

Also, since it was missing, copy tx_highest and rx_highest.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:14 +02:00
Johannes Berg
7eb26df297 mac80211: add ability to parse CCFS2
With newer VHT implementations, it's necessary to look at the
HT operation's CCFS2 field to identify the actual bandwidth
used.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:14 +02:00
Johannes Berg
09b4a4faf9 mac80211: introduce capability flags for VHT EXT NSS support
Depending on whether or not rate control supports selecting
rates depending on the bandwidth, we can use VHT extended
NSS support. In essence, this is dot11VHTExtendedNSSBWCapable
from the spec, since depending on that we'll need to parse
the bandwidth.

If needed, also set/clear the VHT Capability Element bit for
this capability so that we don't advertise it erroneously or
don't advertise it when we actually use it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:14 +02:00
Johannes Berg
b0aa75f0b1 ieee80211: add new VHT capability fields/parsing
IEEE 802.11-2016 extended the VHT capability fields to allow
indicating the number of spatial streams depending on the
actually used bandwidth, add support for decoding this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:14 +02:00
Shaul Triebitz
34fb190ec0 mac80211: in AP mode, set bss_conf::he_supported
In AP mode, If AP advertises HE capabilities, set to true
bss_conf::he_supported so that the Driver knows about it.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:13 +02:00
Shaul Triebitz
244eb9ae79 cfg80211: add he_capabilities (ext) IE to AP settings
Same as for HT and VHT.
This helps the lower level to know whether the AP supports HE.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:13 +02:00
Johannes Berg
adf8ed01e4 mac80211: add an optional TXQ for other PS-buffered frames
Some drivers may want to also use the TXQ abstraction with
non-data packets that need powersave buffering, so add a
hardware flag to allow this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:13 +02:00
Johannes Berg
331aead520 mac80211: use le16_encode_bits() instead of open-coding
Instead of open-coding the equivalent of le16_encode_bits(),
just use that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:13 +02:00
Johannes Berg
6f527287e0 mac80211: remove pointless 'params' NULL checks
These checks aren't necessary, cfg80211 never passes NULL.
Some static checkers complain about the missing checks on
the next line, but really the NULL checks are unnecessary.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:13 +02:00
Ido Yariv
80aaa9c164 mac80211: Add he_capa debugfs entry
Export HE capabilities information via debugfs, similar to HT & VHT.

Signed-off-by: Ido Yariv <idox.yariv@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:13 +02:00
David S. Miller
36302685f5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-09-04 21:33:03 -07:00
Linus Torvalds
28619527b8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Must perform TXQ teardown before unregistering interfaces in
    mac80211, from Toke Høiland-Jørgensen.

 2) Don't allow creating mac80211_hwsim with less than one channel, from
    Johannes Berg.

 3) Division by zero in cfg80211, fix from Johannes Berg.

 4) Fix endian issue in tipc, from Haiqing Bai.

 5) BPF sockmap use-after-free fixes from Daniel Borkmann.

 6) Spectre-v1 in mac80211_hwsim, from Jinbum Park.

 7) Missing rhashtable_walk_exit() in tipc, from Cong Wang.

 8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when
    kvzalloc() tries to use kmalloc() pages. From Eric Dumazet.

 9) Fix deadlock in hv_netvsc, from Dexuan Cui.

10) Do not restart timewait timer on RST, from Florian Westphal.

11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev.

12) Unsolicit report count handling is off-by-one, fix from Hangbin Liu.

13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai.

14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu.

15) Use-after-free in act_ife, fix from Cong Wang.

16) Missing hold to meta module in act_ife, from Vlad Buslov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
  net: phy: sfp: Handle unimplemented hwmon limits and alarms
  net: sched: action_ife: take reference to meta module
  act_ife: fix a potential use-after-free
  net/mlx5: Fix SQ offset in QPs with small RQ
  tipc: correct spelling errors for tipc_topsrv_queue_evt() comments
  tipc: correct spelling errors for struct tipc_bc_base's comment
  bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.
  bnxt_en: Clean up unused functions.
  bnxt_en: Fix firmware signaled resource change logic in open.
  sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel
  sctp: fix invalid reference to the index variable of the iterator
  net/ibm/emac: wrong emac_calc_base call was used by typo
  net: sched: null actions array pointer before releasing action
  vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition
  r8169: add support for NCube 8168 network card
  ip6_tunnel: respect ttl inherit for ip6tnl
  mac80211: shorten the IBSS debug messages
  mac80211: don't Tx a deauth frame if the AP forbade Tx
  mac80211: Fix station bandwidth setting after channel switch
  mac80211: fix a race between restart and CSA flows
  ...
2018-09-04 12:45:11 -07:00
Vlad Buslov
84cb8eb26c net: sched: action_ife: take reference to meta module
Recent refactoring of add_metainfo() caused use_all_metadata() to add
metainfo to ife action metalist without taking reference to module. This
causes warning in module_put called from ife action cleanup function.

Implement add_metainfo_and_get_ops() function that returns with reference
to module taken if metainfo was added successfully, and call it from
use_all_metadata(), instead of calling __add_metainfo() directly.

Example warning:

[  646.344393] WARNING: CPU: 1 PID: 2278 at kernel/module.c:1139 module_put+0x1cb/0x230
[  646.352437] Modules linked in: act_meta_skbtcindex act_meta_mark act_meta_skbprio act_ife ife veth nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc mlx5_ib ib_uverbs ib_core intel_rapl sb_edac x86_pkg_temp_thermal mlx5_core coretemp kvm_intel kvm nfsd igb irqbypass crct10dif_pclmul devlink crc32_pclmul mei_me joydev ses crc32c_intel enclosure auth_rpcgss i2c_algo_bit ioatdma ptp mei pps_core ghash_clmulni_intel iTCO_wdt iTCO_vendor_support pcspkr dca ipmi_ssif lpc_ich target_core_mod i2c_i801 ipmi_si ipmi_devintf pcc_cpufreq wmi ipmi_msghandler nfs_acl lockd acpi_pad acpi_power_meter grace sunrpc mpt3sas raid_class scsi_transport_sas
[  646.425631] CPU: 1 PID: 2278 Comm: tc Not tainted 4.19.0-rc1+ #799
[  646.432187] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  646.440595] RIP: 0010:module_put+0x1cb/0x230
[  646.445238] Code: f3 66 94 02 e8 26 ff fa ff 85 c0 74 11 0f b6 1d 51 30 94 02 80 fb 01 77 60 83 e3 01 74 13 65 ff 0d 3a 83 db 73 e9 2b ff ff ff <0f> 0b e9 00 ff ff ff e8 59 01 fb ff 85 c0 75 e4 48 c7 c2 20 62 6b
[  646.464997] RSP: 0018:ffff880354d37068 EFLAGS: 00010286
[  646.470599] RAX: 0000000000000000 RBX: ffffffffc0a52518 RCX: ffffffff8c2668db
[  646.478118] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffc0a52518
[  646.485641] RBP: ffffffffc0a52180 R08: fffffbfff814a4a4 R09: fffffbfff814a4a3
[  646.493164] R10: ffffffffc0a5251b R11: fffffbfff814a4a4 R12: 1ffff1006a9a6e0d
[  646.500687] R13: 00000000ffffffff R14: ffff880362bab890 R15: dead000000000100
[  646.508213] FS:  00007f4164c99800(0000) GS:ffff88036fe40000(0000) knlGS:0000000000000000
[  646.516961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  646.523080] CR2: 00007f41638b8420 CR3: 0000000351df0004 CR4: 00000000001606e0
[  646.530595] Call Trace:
[  646.533408]  ? find_symbol_in_section+0x260/0x260
[  646.538509]  tcf_ife_cleanup+0x11b/0x200 [act_ife]
[  646.543695]  tcf_action_cleanup+0x29/0xa0
[  646.548078]  __tcf_action_put+0x5a/0xb0
[  646.552289]  ? nla_put+0x65/0xe0
[  646.555889]  __tcf_idr_release+0x48/0x60
[  646.560187]  tcf_generic_walker+0x448/0x6b0
[  646.564764]  ? tcf_action_dump_1+0x450/0x450
[  646.569411]  ? __lock_is_held+0x84/0x110
[  646.573720]  ? tcf_ife_walker+0x10c/0x20f [act_ife]
[  646.578982]  tca_action_gd+0x972/0xc40
[  646.583129]  ? tca_get_fill.constprop.17+0x250/0x250
[  646.588471]  ? mark_lock+0xcf/0x980
[  646.592324]  ? check_chain_key+0x140/0x1f0
[  646.596832]  ? debug_show_all_locks+0x240/0x240
[  646.601839]  ? memset+0x1f/0x40
[  646.605350]  ? nla_parse+0xca/0x1a0
[  646.609217]  tc_ctl_action+0x215/0x230
[  646.613339]  ? tcf_action_add+0x220/0x220
[  646.617748]  rtnetlink_rcv_msg+0x56a/0x6d0
[  646.622227]  ? rtnl_fdb_del+0x3f0/0x3f0
[  646.626466]  netlink_rcv_skb+0x18d/0x200
[  646.630752]  ? rtnl_fdb_del+0x3f0/0x3f0
[  646.634959]  ? netlink_ack+0x500/0x500
[  646.639106]  netlink_unicast+0x2d0/0x370
[  646.643409]  ? netlink_attachskb+0x340/0x340
[  646.648050]  ? _copy_from_iter_full+0xe9/0x3e0
[  646.652870]  ? import_iovec+0x11e/0x1c0
[  646.657083]  netlink_sendmsg+0x3b9/0x6a0
[  646.661388]  ? netlink_unicast+0x370/0x370
[  646.665877]  ? netlink_unicast+0x370/0x370
[  646.670351]  sock_sendmsg+0x6b/0x80
[  646.674212]  ___sys_sendmsg+0x4a1/0x520
[  646.678443]  ? copy_msghdr_from_user+0x210/0x210
[  646.683463]  ? lock_downgrade+0x320/0x320
[  646.687849]  ? debug_show_all_locks+0x240/0x240
[  646.692760]  ? do_raw_spin_unlock+0xa2/0x130
[  646.697418]  ? _raw_spin_unlock+0x24/0x30
[  646.701798]  ? __handle_mm_fault+0x1819/0x1c10
[  646.706619]  ? __pmd_alloc+0x320/0x320
[  646.710738]  ? debug_show_all_locks+0x240/0x240
[  646.715649]  ? restore_nameidata+0x7b/0xa0
[  646.720117]  ? check_chain_key+0x140/0x1f0
[  646.724590]  ? check_chain_key+0x140/0x1f0
[  646.729070]  ? __fget_light+0xbc/0xd0
[  646.733121]  ? __sys_sendmsg+0xd7/0x150
[  646.737329]  __sys_sendmsg+0xd7/0x150
[  646.741359]  ? __ia32_sys_shutdown+0x30/0x30
[  646.746003]  ? up_read+0x53/0x90
[  646.749601]  ? __do_page_fault+0x484/0x780
[  646.754105]  ? do_syscall_64+0x1e/0x2c0
[  646.758320]  do_syscall_64+0x72/0x2c0
[  646.762353]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  646.767776] RIP: 0033:0x7f4163872150
[  646.771713] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
[  646.791474] RSP: 002b:00007ffdef7d6b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  646.799721] RAX: ffffffffffffffda RBX: 0000000000000024 RCX: 00007f4163872150
[  646.807240] RDX: 0000000000000000 RSI: 00007ffdef7d6bd0 RDI: 0000000000000003
[  646.814760] RBP: 000000005b8b9482 R08: 0000000000000001 R09: 0000000000000000
[  646.822286] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdef7dad20
[  646.829807] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000679bc0
[  646.837360] irq event stamp: 6083
[  646.841043] hardirqs last  enabled at (6081): [<ffffffff8c220a7d>] __call_rcu+0x17d/0x500
[  646.849882] hardirqs last disabled at (6083): [<ffffffff8c004f06>] trace_hardirqs_off_thunk+0x1a/0x1c
[  646.859775] softirqs last  enabled at (5968): [<ffffffff8d4004a1>] __do_softirq+0x4a1/0x6ee
[  646.868784] softirqs last disabled at (6082): [<ffffffffc0a78759>] tcf_ife_cleanup+0x39/0x200 [act_ife]
[  646.878845] ---[ end trace b1b8c12ffe51e657 ]---

Fixes: 5ffe57da29 ("act_ife: fix a potential deadlock")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-04 12:20:21 -07:00
Cong Wang
6d784f1625 act_ife: fix a potential use-after-free
Immediately after module_put(), user could delete this
module, so e->ops could be already freed before we call
e->ops->release().

Fix this by moving module_put() after ops->release().

Fixes: ef6980b6be ("introduce IFE action")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-04 12:18:25 -07:00
David S. Miller
fc3e3bf55f Here are quite a large number of fixes, notably:
* various A-MSDU building fixes (currently only affects mt76)
  * syzkaller & spectre fixes in hwsim
  * TXQ vs. teardown fix that was causing crashes
  * embed WMM info in reg rule, bad code here had been causing crashes
  * one compilation issue with fix from Arnd (rfkill-gpio includes)
  * fixes for a race and bad data during/after channel switch
  * nl80211: a validation fix, attribute type & unit fixes
 along with other small fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAluNJXcACgkQB8qZga/f
 l8Qvfw//dBwlhMII862Evk4M8OzhdHfkJ4Kp/d2C476whbEySU/jRIIeetmVpXYV
 5cfStTxBpGkwMj5PXy3DaA2PO++L5qaApDJfHc8DNWNmvt9rRRJul1zP05HjZRxW
 G7aFCFRWVK0dlmVP9GC/b20KyUvz4OpklBnxylkIrx0FCkw5bAHs1SsjGZCg/6Tm
 008DAhFz3Ds6hNLxwricvrk5oQ6eC1cDfDd4Rtk3jCYQ4t7KFn5gFoKzKldfLdWe
 TFTpVQ26XAGzn9QVXzAiXN4ZNpUpZrFXosC7cn5Ugiyic4YtnHxS2wVDuL3vs1cL
 J2hoW6wjEBg+U6vmHMcijo1lnQwW7ueYUDWLJPNIXHA6A7sGyA6z6D7vbbvHfoG6
 L681BrYmTmKkXXquu5+r85/9WgP2cmzbRpoIxTQl3sU2Liw2k5IJ9ryLLyul+8z7
 spnDPOY7h4c0JrAvhjHkrKIbbW4FKYunxZJ8dn9eyAzOd/58iKoXzu4yAggwm+0V
 DtZiu0gSr52sKrh1vqEyfhrPFCN1Mc19DRsJBtabUfVEveQTwToCkbZ5s1sLqSId
 m30XUjjYOiRk7MZnncar0lE4//eJ6bnL3Wie3UTmO3xsMwlgKQPqjI4TprNogUCk
 R2dVeGmhm3HSriRHKJL3/D8uzw5mMBI3Kicw9tFSSyVjtJgxvpg=
 =lLBA
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2018-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Here are quite a large number of fixes, notably:
 * various A-MSDU building fixes (currently only affects mt76)
 * syzkaller & spectre fixes in hwsim
 * TXQ vs. teardown fix that was causing crashes
 * embed WMM info in reg rule, bad code here had been causing crashes
 * one compilation issue with fix from Arnd (rfkill-gpio includes)
 * fixes for a race and bad data during/after channel switch
 * nl80211: a validation fix, attribute type & unit fixes
along with other small fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03 22:12:02 -07:00
Zhenbo Gao
a484ef3442 tipc: correct spelling errors for tipc_topsrv_queue_evt() comments
tipc_conn_queue_evt -> tipc_topsrv_queue_evt
tipc_send_work -> tipc_conn_send_work
tipc_send_to_sock -> tipc_conn_send_to_sock

Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03 22:03:07 -07:00
Zhenbo Gao
9cc1bf3928 tipc: correct spelling errors for struct tipc_bc_base's comment
Trivial fix for two spelling mistakes.

Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03 22:03:07 -07:00
Xin Long
741880e1f2 sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel
When users set params.spp_address and get a trans, ipv6_flowlabel flag
should be applied into this trans. But even if this one is not an ipv6
trans, it should not go to apply it into all other transes of the asoc
but simply ignore it.

Fixes: 0b0dce7a36 ("sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03 21:57:54 -07:00
Xin Long
af8a2b8ba7 sctp: fix invalid reference to the index variable of the iterator
Now in sctp_apply_peer_addr_params(), if SPP_IPV6_FLOWLABEL flag is set
and trans is NULL, it would use trans as the index variable to traverse
transport_addr_list, then trans is set as the last transport of it.

Later, if SPP_DSCP flag is set, it would enter into the wrong branch as
trans is actually an invalid reference.

So fix it by using a new index variable to traverse transport_addr_list
for both SPP_DSCP and SPP_IPV6_FLOWLABEL flags process.

Fixes: 0b0dce7a36 ("sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams")
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03 21:57:54 -07:00
Vlad Buslov
c10bbfae3a net: sched: null actions array pointer before releasing action
Currently, tcf_action_delete() nulls actions array pointer after putting
and deleting it. However, if tcf_idr_delete_index() returns an error,
pointer to action is not set to null. That results it being released second
time in error handling code of tca_action_gd().

Kasan error:

[  807.367755] ==================================================================
[  807.375844] BUG: KASAN: use-after-free in tc_setup_cb_call+0x14e/0x250
[  807.382763] Read of size 8 at addr ffff88033e636000 by task tc/2732

[  807.391289] CPU: 0 PID: 2732 Comm: tc Tainted: G        W         4.19.0-rc1+ #799
[  807.399542] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  807.407948] Call Trace:
[  807.410763]  dump_stack+0x92/0xeb
[  807.414456]  print_address_description+0x70/0x360
[  807.419549]  kasan_report+0x14d/0x300
[  807.423582]  ? tc_setup_cb_call+0x14e/0x250
[  807.428150]  tc_setup_cb_call+0x14e/0x250
[  807.432539]  ? nla_put+0x65/0xe0
[  807.436146]  fl_dump+0x394/0x3f0 [cls_flower]
[  807.440890]  ? fl_tmplt_dump+0x140/0x140 [cls_flower]
[  807.446327]  ? lock_downgrade+0x320/0x320
[  807.450702]  ? lock_acquire+0xe2/0x220
[  807.454819]  ? is_bpf_text_address+0x5/0x140
[  807.459475]  ? memcpy+0x34/0x50
[  807.462980]  ? nla_put+0x65/0xe0
[  807.466582]  tcf_fill_node+0x341/0x430
[  807.470717]  ? tcf_block_put+0xe0/0xe0
[  807.474859]  tcf_node_dump+0xdb/0xf0
[  807.478821]  fl_walk+0x8e/0x170 [cls_flower]
[  807.483474]  tcf_chain_dump+0x35a/0x4d0
[  807.487703]  ? tfilter_notify+0x170/0x170
[  807.492091]  ? tcf_fill_node+0x430/0x430
[  807.496411]  tc_dump_tfilter+0x362/0x3f0
[  807.500712]  ? tc_del_tfilter+0x850/0x850
[  807.505104]  ? kasan_unpoison_shadow+0x30/0x40
[  807.509940]  ? __mutex_unlock_slowpath+0xcf/0x410
[  807.515031]  netlink_dump+0x263/0x4f0
[  807.519077]  __netlink_dump_start+0x2a0/0x300
[  807.523817]  ? tc_del_tfilter+0x850/0x850
[  807.528198]  rtnetlink_rcv_msg+0x46a/0x6d0
[  807.532671]  ? rtnl_fdb_del+0x3f0/0x3f0
[  807.536878]  ? tc_del_tfilter+0x850/0x850
[  807.541280]  netlink_rcv_skb+0x18d/0x200
[  807.545570]  ? rtnl_fdb_del+0x3f0/0x3f0
[  807.549773]  ? netlink_ack+0x500/0x500
[  807.553913]  netlink_unicast+0x2d0/0x370
[  807.558212]  ? netlink_attachskb+0x340/0x340
[  807.562855]  ? _copy_from_iter_full+0xe9/0x3e0
[  807.567677]  ? import_iovec+0x11e/0x1c0
[  807.571890]  netlink_sendmsg+0x3b9/0x6a0
[  807.576192]  ? netlink_unicast+0x370/0x370
[  807.580684]  ? netlink_unicast+0x370/0x370
[  807.585154]  sock_sendmsg+0x6b/0x80
[  807.589015]  ___sys_sendmsg+0x4a1/0x520
[  807.593230]  ? copy_msghdr_from_user+0x210/0x210
[  807.598232]  ? do_wp_page+0x174/0x880
[  807.602276]  ? __handle_mm_fault+0x749/0x1c10
[  807.607021]  ? __handle_mm_fault+0x1046/0x1c10
[  807.611849]  ? __pmd_alloc+0x320/0x320
[  807.615973]  ? check_chain_key+0x140/0x1f0
[  807.620450]  ? check_chain_key+0x140/0x1f0
[  807.624929]  ? __fget_light+0xbc/0xd0
[  807.628970]  ? __sys_sendmsg+0xd7/0x150
[  807.633172]  __sys_sendmsg+0xd7/0x150
[  807.637201]  ? __ia32_sys_shutdown+0x30/0x30
[  807.641846]  ? up_read+0x53/0x90
[  807.645442]  ? __do_page_fault+0x484/0x780
[  807.649949]  ? do_syscall_64+0x1e/0x2c0
[  807.654164]  do_syscall_64+0x72/0x2c0
[  807.658198]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  807.663625] RIP: 0033:0x7f42e9870150
[  807.667568] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
[  807.687328] RSP: 002b:00007ffdbf595b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  807.695564] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f42e9870150
[  807.703083] RDX: 0000000000000000 RSI: 00007ffdbf595b80 RDI: 0000000000000003
[  807.710605] RBP: 00007ffdbf599d90 R08: 0000000000679bc0 R09: 000000000000000f
[  807.718127] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdbf599d88
[  807.725651] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

[  807.735048] Allocated by task 2687:
[  807.738902]  kasan_kmalloc+0xa0/0xd0
[  807.742852]  __kmalloc+0x118/0x2d0
[  807.746615]  tcf_idr_create+0x44/0x320
[  807.750738]  tcf_nat_init+0x41e/0x530 [act_nat]
[  807.755638]  tcf_action_init_1+0x4e0/0x650
[  807.760104]  tcf_action_init+0x1ce/0x2d0
[  807.764395]  tcf_exts_validate+0x1d8/0x200
[  807.768861]  fl_change+0x55a/0x26b4 [cls_flower]
[  807.773845]  tc_new_tfilter+0x748/0xa20
[  807.778051]  rtnetlink_rcv_msg+0x56a/0x6d0
[  807.782517]  netlink_rcv_skb+0x18d/0x200
[  807.786804]  netlink_unicast+0x2d0/0x370
[  807.791095]  netlink_sendmsg+0x3b9/0x6a0
[  807.795387]  sock_sendmsg+0x6b/0x80
[  807.799240]  ___sys_sendmsg+0x4a1/0x520
[  807.803445]  __sys_sendmsg+0xd7/0x150
[  807.807473]  do_syscall_64+0x72/0x2c0
[  807.811506]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  807.818776] Freed by task 2728:
[  807.822283]  __kasan_slab_free+0x122/0x180
[  807.826752]  kfree+0xf4/0x2f0
[  807.830080]  __tcf_action_put+0x5a/0xb0
[  807.834281]  tcf_action_put_many+0x46/0x70
[  807.838747]  tca_action_gd+0x232/0xc40
[  807.842862]  tc_ctl_action+0x215/0x230
[  807.846977]  rtnetlink_rcv_msg+0x56a/0x6d0
[  807.851444]  netlink_rcv_skb+0x18d/0x200
[  807.855731]  netlink_unicast+0x2d0/0x370
[  807.860021]  netlink_sendmsg+0x3b9/0x6a0
[  807.864312]  sock_sendmsg+0x6b/0x80
[  807.868166]  ___sys_sendmsg+0x4a1/0x520
[  807.872372]  __sys_sendmsg+0xd7/0x150
[  807.876401]  do_syscall_64+0x72/0x2c0
[  807.880431]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  807.887704] The buggy address belongs to the object at ffff88033e636000
                which belongs to the cache kmalloc-256 of size 256
[  807.900909] The buggy address is located 0 bytes inside of
                256-byte region [ffff88033e636000, ffff88033e636100)
[  807.913155] The buggy address belongs to the page:
[  807.918322] page:ffffea000cf98d80 count:1 mapcount:0 mapping:ffff88036f80ee00 index:0x0 compound_mapcount: 0
[  807.928831] flags: 0x5fff8000008100(slab|head)
[  807.933647] raw: 005fff8000008100 ffffea000db44f00 0000000400000004 ffff88036f80ee00
[  807.942050] raw: 0000000000000000 0000000080190019 00000001ffffffff 0000000000000000
[  807.950456] page dumped because: kasan: bad access detected

[  807.958240] Memory state around the buggy address:
[  807.963405]  ffff88033e635f00: fc fc fc fc fb fb fb fb fb fb fb fc fc fc fc fb
[  807.971288]  ffff88033e635f80: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
[  807.979166] >ffff88033e636000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  807.994882]                    ^
[  807.998477]  ffff88033e636080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  808.006352]  ffff88033e636100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[  808.014230] ==================================================================
[  808.022108] Disabling lock debugging due to kernel taint

Fixes: edfaf94fa7 ("net_sched: improve and refactor tcf_action_put_many()")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03 21:47:33 -07:00
Zhenbo Gao
05a6843c92 tipc: correct structure parameter comments for topsrv
Remove the following obsolete parameter comments of tipc_topsrv struct:
  @rcvbuf_cache
  @tipc_conn_new
  @tipc_conn_release
  @tipc_conn_recvmsg
  @imp
  @type

Add the comments for the missing parameters below of tipc_topsrv struct:
  @awork
  @listener

Remove the unused or duplicated parameter comments of tipc_conn struct:
  @outqueue_lock
  @rx_action

Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03 21:43:42 -07:00
Hangbin Liu
36feaac354 ip6_tunnel: respect ttl inherit for ip6tnl
man ip-tunnel ttl section says:
0 is a special value meaning that packets inherit the TTL value.

IPv4 tunnel respect this in ip_tunnel_xmit(), but IPv6 tunnel has not
implement it yet. To make IPv6 behave consistently with IP tunnel,
add ipv6 tunnel inherit support.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03 19:04:12 -07:00
Emmanuel Grumbach
c6e57b3896 mac80211: shorten the IBSS debug messages
When tracing is enabled, all the debug messages are recorded and must
not exceed MAX_MSG_LEN (100) columns. Longer debug messages grant the
user with:

WARNING: CPU: 3 PID: 32642 at /tmp/wifi-core-20180806094828/src/iwlwifi-stack-dev/net/mac80211/./trace_msg.h:32 trace_event_raw_event_mac80211_msg_event+0xab/0xc0 [mac80211]
Workqueue: phy1 ieee80211_iface_work [mac80211]
 RIP: 0010:trace_event_raw_event_mac80211_msg_event+0xab/0xc0 [mac80211]
 Call Trace:
  __sdata_dbg+0xbd/0x120 [mac80211]
  ieee80211_ibss_rx_queued_mgmt+0x15f/0x510 [mac80211]
  ieee80211_iface_work+0x21d/0x320 [mac80211]

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-03 10:41:27 +02:00
Emmanuel Grumbach
6c18b27d6e mac80211: don't Tx a deauth frame if the AP forbade Tx
If the driver fails to properly prepare for the channel
switch, mac80211 will disconnect. If the CSA IE had mode
set to 1, it means that the clients are not allowed to send
any Tx on the current channel, and that includes the
deauthentication frame.

Make sure that we don't send the deauthentication frame in
this case.

In iwlwifi, this caused a failure to flush queues since the
firmware already closed the queues after having parsed the
CSA IE. Then mac80211 would wait until the deauthentication
frame would go out (drv_flush(drop=false)) and that would
never happen.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-03 10:39:34 +02:00
Ilan Peer
0007e94355 mac80211: Fix station bandwidth setting after channel switch
When performing a channel switch flow for a managed interface, the
flow did not update the bandwidth of the AP station and the rate
scale algorithm. In case of a channel width downgrade, this would
result with the rate scale algorithm using a bandwidth that does not
match the interface channel configuration.

Fix this by updating the AP station bandwidth and rate scaling algorithm
before the actual channel change in case of a bandwidth downgrade, or
after the actual channel change in case of a bandwidth upgrade.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-03 10:39:11 +02:00
Emmanuel Grumbach
f3ffb6c3a2 mac80211: fix a race between restart and CSA flows
We hit a problem with iwlwifi that was caused by a bug in
mac80211. A bug in iwlwifi caused the firwmare to crash in
certain cases in channel switch. Because of that bug,
drv_pre_channel_switch would fail and trigger the restart
flow.
Now we had the hw restart worker which runs on the system's
workqueue and the csa_connection_drop_work worker that runs
on mac80211's workqueue that can run together. This is
obviously problematic since the restart work wants to
reconfigure the connection, while the csa_connection_drop_work
worker does the exact opposite: it tries to disconnect.

Fix this by cancelling the csa_connection_drop_work worker
in the restart worker.

Note that this can sound racy: we could have:

driver   iface_work   CSA_work   restart_work
+++++++++++++++++++++++++++++++++++++++++++++
              |
 <--drv_cs ---|
<FW CRASH!>
-CS FAILED-->
              |                       |
              |                 cancel_work(CSA)
           schedule                   |
           CSA work                   |
                         |            |
                        Race between those 2

But this is not possible because we flush the workqueue
in the restart worker before we cancel the CSA worker.
That would be bullet proof if we could guarantee that
we schedule the CSA worker only from the iface_work
which runs on the workqueue (and not on the system's
workqueue), but unfortunately we do have an instance
in which we schedule the CSA work outside the context
of the workqueue (ieee80211_chswitch_done).

Note also that we should probably cancel other workers
like beacon_connection_loss_work and possibly others
for different types of interfaces, at the very least,
IBSS should suffer from the exact same problem, but for
now, do the minimum to fix the actual bug that was actually
experienced and reproduced.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-03 10:38:39 +02:00
Dreyfuss, Haim
abd76d255d mac80211: fix WMM TXOP calculation
In commit 9236c4523e5b ("mac80211: limit wmm params to comply
with ETSI requirements"), we have limited the WMM parameters to
comply with 802.11 and ETSI standard.  Mistakenly the TXOP value
was caluclated wrong.  Fix it by taking the minimum between
802.11 to ETSI to make sure we are not violating both.

Fixes: e552af0581 ("mac80211: limit wmm params to comply with ETSI requirements")
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-03 10:35:03 +02:00
Dan Carpenter
8442938c3a cfg80211: fix a type issue in ieee80211_chandef_to_operating_class()
The "chandef->center_freq1" variable is a u32 but "freq" is a u16 so we
are truncating away the high bits.  I noticed this bug because in commit
9cf0a0b4b6 ("cfg80211: Add support for 60GHz band channels 5 and 6")
we made "freq <= 56160 + 2160 * 6" a valid requency when before it was
only "freq <= 56160 + 2160 * 4" that was valid.  It introduces a static
checker warning:

    net/wireless/util.c:1571 ieee80211_chandef_to_operating_class()
    warn: always true condition '(freq <= 56160 + 2160 * 6) => (0-u16max <= 69120)'

But really we probably shouldn't have been truncating the high bits
away to begin with.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-03 10:34:12 +02:00
Lorenzo Bianconi
66eb02d839 mac80211: fix an off-by-one issue in A-MSDU max_subframe computation
Initialize 'n' to 2 in order to take into account also the first
packet in the estimation of max_subframe limit for a given A-MSDU
since frag_tail pointer is NULL when ieee80211_amsdu_aggregate
routine analyzes the second frame.

Fixes: 6e0456b545 ("mac80211: add A-MSDU tx support")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-03 10:29:14 +02:00
Yafang Shao
743e481580 tcp: remove useless add operation when init sysctl_max_tw_buckets
cp_hashinfo.ehash_mask is always an odd number, which is set in function
alloc_large_system_hash(). See bellow,
        if (_hash_mask)
                *_hash_mask = (1 << log2qty) - 1; <<< always odd number

Hence the local variable 'cnt' is a even number, as a result of that it is
no difference to do the incrementation here.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-02 16:12:41 -07:00
David S. Miller
a80afe89d8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-09-02

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix one remaining buggy offset override in sockmap's bpf_msg_pull_data()
   when linearizing multiple scatterlist elements, from Tushar.

2) Fix BPF sockmap's misuse of ULP when a collision with another ULP is
   found on map update where it would release existing ULP. syzbot found and
   triggered this couple of times now, fix from John.

3) Add missing xskmap type to bpftool so it will properly show the type
   on map dump, from Prashant.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-02 15:53:38 -07:00
David Ahern
15a81b418e net/ipv6: Only update MTU metric if it set
Jan reported a regression after an update to 4.18.5. In this case ipv6
default route is setup by systemd-networkd based on data from an RA. The
RA contains an MTU of 1492 which is used when the route is first inserted
but then systemd-networkd pushes down updates to the default route
without the mtu set.

Prior to the change to fib6_info, metrics such as MTU were held in the
dst_entry and rt6i_pmtu in rt6_info contained an update to the mtu if
any. ip6_mtu would look at rt6i_pmtu first and use it if set. If not,
the value from the metrics is used if it is set and finally falling
back to the idev value.

After the fib6_info change metrics are contained in the fib6_info struct
and there is no equivalent to rt6i_pmtu. To maintain consistency with
the old behavior the new code should only reset the MTU in the metrics
if the route update has it set.

Fixes: d4ead6b34b ("net/ipv6: move metrics from dst to rt6_info")
Reported-by: Jan Janssen <medhefgo@web.de>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-02 14:03:54 -07:00
Hangbin Liu
ff06525fcb igmp: fix incorrect unsolicit report count after link down and up
After link down and up, i.e. when call ip_mc_up(), we doesn't init
im->unsolicit_count. So after igmp_timer_expire(), we will not start
timer again and only send one unsolicit report at last.

Fix it by initializing im->unsolicit_count in igmp_group_added(), so
we can respect igmp robustness value.

Fixes: 24803f38a5 ("igmp: do not remove igmp souce list info when set link down")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-02 13:39:37 -07:00
Hangbin Liu
4fb7253e4f igmp: fix incorrect unsolicit report count when join group
We should not start timer if im->unsolicit_count equal to 0 after decrease.
Or we will send one more unsolicit report message. i.e. 3 instead of 2 by
default.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-02 13:39:37 -07:00
Tushar Dave
9db39f4d4f bpf: Fix bpf_msg_pull_data()
Helper bpf_msg_pull_data() mistakenly reuses variable 'offset' while
linearizing multiple scatterlist elements. Variable 'offset' is used
to find first starting scatterlist element
    i.e. msg->data = sg_virt(&sg[first_sg]) + start - offset"

Use different variable name while linearizing multiple scatterlist
elements so that value contained in variable 'offset' won't get
overwritten.

Fixes: 015632bb30 ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-02 22:29:53 +02:00
Linus Torvalds
501dacbc24 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:
 "A small set of updates for core code:

   - Prevent tracing in functions which are called from trace patching
     via stop_machine() to prevent executing half patched function trace
     entries.

   - Remove old GCC workarounds

   - Remove pointless includes of notifier.h"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Remove workaround for unreachable warnings from old GCC
  notifier: Remove notifier header file wherever not used
  watchdog: Mark watchdog touch functions as notrace
2018-09-02 09:41:45 -07:00
Arnd Bergmann
2de9d505fb rds: store socket timestamps as ktime_t
rds is the last in-kernel user of the old do_gettimeofday()
function. Convert it over to ktime_get_real() to make it
work more like the generic socket timestamps, and to let
us kill off do_gettimeofday().

A follow-up patch will have to change the user space interface
to deal better with 32-bit tasks, which may use an incompatible
layout for 'struct timespec'.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-01 19:52:50 -07:00
Vakul Garg
94524d8fc9 net/tls: Add support for async decryption of tls records
When tls records are decrypted using asynchronous acclerators such as
NXP CAAM engine, the crypto apis return -EINPROGRESS. Presently, on
getting -EINPROGRESS, the tls record processing stops till the time the
crypto accelerator finishes off and returns the result. This incurs a
context switch and is not an efficient way of accessing the crypto
accelerators. Crypto accelerators work efficient when they are queued
with multiple crypto jobs without having to wait for the previous ones
to complete.

The patch submits multiple crypto requests without having to wait for
for previous ones to complete. This has been implemented for records
which are decrypted in zero-copy mode. At the end of recvmsg(), we wait
for all the asynchronous decryption requests to complete.

The references to records which have been sent for async decryption are
dropped. For cases where record decryption is not possible in zero-copy
mode, asynchronous decryption is not used and we wait for decryption
crypto api to complete.

For crypto requests executing in async fashion, the memory for
aead_request, sglists and skb etc is freed from the decryption
completion handler. The decryption completion handler wakesup the
sleeping user context when recvmsg() flags that it has done sending
all the decryption requests and there are no more decryption requests
pending to be completed.

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Reviewed-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-01 19:52:50 -07:00
Alexey Kodanev
93bbadd6e0 ipv6: don't get lwtstate twice in ip6_rt_copy_init()
Commit 80f1a0f4e0 ("net/ipv6: Put lwtstate when destroying fib6_info")
partially fixed the kmemleak [1], lwtstate can be copied from fib6_info,
with ip6_rt_copy_init(), and it should be done only once there.

rt->dst.lwtstate is set by ip6_rt_init_dst(), at the start of the function
ip6_rt_copy_init(), so there is no need to get it again at the end.

With this patch, lwtstate also isn't copied from RTF_REJECT routes.

[1]:
unreferenced object 0xffff880b6aaa14e0 (size 64):
  comm "ip", pid 10577, jiffies 4295149341 (age 1273.903s)
  hex dump (first 32 bytes):
    01 00 04 00 04 00 00 00 10 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000018664623>] lwtunnel_build_state+0x1bc/0x420
    [<00000000b73aa29a>] ip6_route_info_create+0x9f7/0x1fd0
    [<00000000ee2c5d1f>] ip6_route_add+0x14/0x70
    [<000000008537b55c>] inet6_rtm_newroute+0xd9/0xe0
    [<000000002acc50f5>] rtnetlink_rcv_msg+0x66f/0x8e0
    [<000000008d9cd381>] netlink_rcv_skb+0x268/0x3b0
    [<000000004c893c76>] netlink_unicast+0x417/0x5a0
    [<00000000f2ab1afb>] netlink_sendmsg+0x70b/0xc30
    [<00000000890ff0aa>] sock_sendmsg+0xb1/0xf0
    [<00000000a2e7b66f>] ___sys_sendmsg+0x659/0x950
    [<000000001e7426c8>] __sys_sendmsg+0xde/0x170
    [<00000000fe411443>] do_syscall_64+0x9f/0x4a0
    [<000000001be7b28b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000006d21f353>] 0xffffffffffffffff

Fixes: 6edb3c96a5 ("net/ipv6: Defer initialization of dst to data path")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-01 17:42:12 -07:00
Andy Shevchenko
459479da97 bridge: Switch to bitmap_zalloc()
Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:13:04 -07:00
Florian Westphal
63cc357f7b tcp: do not restart timewait timer on rst reception
RFC 1337 says:
 ''Ignore RST segments in TIME-WAIT state.
   If the 2 minute MSL is enforced, this fix avoids all three hazards.''

So with net.ipv4.tcp_rfc1337=1, expected behaviour is to have TIME-WAIT sk
expire rather than removing it instantly when a reset is received.

However, Linux will also re-start the TIME-WAIT timer.

This causes connect to fail when tying to re-use ports or very long
delays (until syn retry interval exceeds MSL).

packetdrill test case:
// Demonstrate bogus rearming of TIME-WAIT timer in rfc1337 mode.
`sysctl net.ipv4.tcp_rfc1337=1`

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Receive first segment
0.310 < P. 1:1001(1000) ack 1 win 46

// Send one ACK
0.310 > . 1:1(0) ack 1001

// read 1000 byte
0.310 read(4, ..., 1000) = 1000

// Application writes 100 bytes
0.350 write(4, ..., 100) = 100
0.350 > P. 1:101(100) ack 1001

// ACK
0.500 < . 1001:1001(0) ack 101 win 257

// close the connection
0.600 close(4) = 0
0.600 > F. 101:101(0) ack 1001 win 244

// Our side is in FIN_WAIT_1 & waits for ack to fin
0.7 < . 1001:1001(0) ack 102 win 244

// Our side is in FIN_WAIT_2 with no outstanding data.
0.8 < F. 1001:1001(0) ack 102 win 244
0.8 > . 102:102(0) ack 1002 win 244

// Our side is now in TIME_WAIT state, send ack for fin.
0.9 < F. 1002:1002(0) ack 102 win 244
0.9 > . 102:102(0) ack 1002 win 244

// Peer reopens with in-window SYN:
1.000 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>

// Therefore, reply with ACK.
1.000 > . 102:102(0) ack 1002 win 244

// Peer sends RST for this ACK.  Normally this RST results
// in tw socket removal, but rfc1337=1 setting prevents this.
1.100 < R 1002:1002(0) win 244

// second syn. Due to rfc1337=1 expect another pure ACK.
31.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
31.0 > . 102:102(0) ack 1002 win 244

// .. and another RST from peer.
31.1 < R 1002:1002(0) win 244
31.2 `echo no timer restart;ss -m -e -a -i -n -t -o state TIME-WAIT`

// third syn after one minute.  Time-Wait socket should have expired by now.
63.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>

// so we expect a syn-ack & 3whs to proceed from here on.
63.0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>

Without this patch, 'ss' shows restarts of tw timer and last packet is
thus just another pure ack, more than one minute later.

This restores the original code from commit 283fd6cf0be690a83
("Merge in ANK networking jumbo patch") in netdev-vger-cvs.git .

For some reason the else branch was removed/lost in 1f28b683339f7
("Merge in TCP/UDP optimizations and [..]") and timer restart became
unconditional.

Reported-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:10:35 -07:00
Pavel Machek
b0e0b0abbd net/rds: RDS is not Radio Data System
Getting prompt "The RDS Protocol" (RDS) is not too helpful, and it is
easily confused with Radio Data System (which we may want to support
in kernel, too).

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:09:53 -07:00
David Ahern
1367bbf52a net/ipv6: Do not reset nl_net in ip6_route_info_create
nl_net is set on entry to ip6_route_info_create. Only devices
within that namespace are considered so no need to reset it
before returning.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:04:52 -07:00
David Ahern
066b103008 net/ipv4: Add extack message that dev is required for ONLINK
Make IPv4 consistent with IPv6 and return an extack message that the
ONLINK flag requires a nexthop device.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:04:03 -07:00
Yuchung Cheng
7788174e87 tcp: change IPv6 flow-label upon receiving spurious retransmission
Currently a Linux IPv6 TCP sender will change the flow label upon
timeouts to potentially steer away from a data path that has gone
bad. However this does not help if the problem is on the ACK path
and the data path is healthy. In this case the receiver is likely
to receive repeated spurious retransmission because the sender
couldn't get the ACKs in time and has recurring timeouts.

This patch adds another feature to mitigate this problem. It
leverages the DSACK states in the receiver to change the flow
label of the ACKs to speculatively re-route the ACK packets.
In order to allow triggering on the second consecutive spurious
RTO, the receiver changes the flow label upon sending a second
consecutive DSACK for a sequence number below RCV.NXT.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:03:00 -07:00
Eric Dumazet
3a7ad0634f Revert "packet: switch kvzalloc to allocate memory"
This reverts commit 71e4128620.

mmap()/munmap() can not be backed by kmalloced pages :

We fault in :

    VM_BUG_ON_PAGE(PageSlab(page), page);

    unmap_single_vma+0x8a/0x110
    unmap_vmas+0x4b/0x90
    unmap_region+0xc9/0x140
    do_munmap+0x274/0x360
    vm_munmap+0x81/0xc0
    SyS_munmap+0x2b/0x40
    do_syscall_64+0x13e/0x1c0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

Fixes: 71e4128620 ("packet: switch kvzalloc to allocate memory")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: John Sperbeck <jsperbeck@google.com>
Bisected-by: John Sperbeck <jsperbeck@google.com>
Cc: Zhang Yu <zhangyu31@baidu.com>
Cc: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:00:28 -07:00
Cong Wang
506a03aa04 net_sched: add missing tcf_lock for act_connmark
According to the new locking rule, we have to take tcf_lock
for both ->init() and ->dump(), as RTNL will be removed.
However, it is missing for act_connmark.

Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 22:57:43 -07:00
Cong Wang
f061b48c17 Revert "net: sched: act: add extack for lookup callback"
This reverts commit 331a9295de ("net: sched: act: add extack for lookup callback").

This extack is never used after 6 months... In fact, it can be just
set in the caller, right after ->lookup().

Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 22:50:15 -07:00
David S. Miller
fd3c040b24 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-09-01

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add AF_XDP zero-copy support for i40e driver (!), from Björn and Magnus.

2) BPF verifier improvements by giving each register its own liveness
   chain which allows to simplify and getting rid of skip_callee() logic,
   from Edward.

3) Add bpf fs pretty print support for percpu arraymap, percpu hashmap
   and percpu lru hashmap. Also add generic percpu formatted print on
   bpftool so the same can be dumped there, from Yonghong.

4) Add bpf_{set,get}sockopt() helper support for TCP_SAVE_SYN and
   TCP_SAVED_SYN options to allow reflection of tos/tclass from received
   SYN packet, from Nikita.

5) Misc improvements to the BPF sockmap test cases in terms of cgroup v2
   interaction and removal of incorrect shutdown() calls, from John.

6) Few cleanups in xdp_umem_assign_dev() and xdpsock samples, from Prashant.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 17:41:08 -07:00
Magnus Karlsson
93ee30f3e8 xsk: i40e: get rid of useless struct xdp_umem_props
This commit gets rid of the structure xdp_umem_props. It was there to
be able to break a dependency at one point, but this is no longer
needed. The values in the struct are instead stored directly in the
xdp_umem structure. This simplifies the xsk code as well as af_xdp
zero-copy drivers and as a bonus gets rid of one internal header file.

The i40e driver is also adapted to the new interface in this commit.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01 01:38:16 +02:00
Prashant Bhole
a29c8bb640 xsk: remove unnecessary assignment
Since xdp_umem_query() was added one assignment of bpf.command was
missed from cleanup. Removing the assignment statement.

Fixes: 84c6b86875 ("xsk: don't allow umem replace at stack level")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01 01:36:08 +02:00
Nikita V. Shirokov
1e215300f1 bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt
Adding support for two new bpf get/set sockopts: TCP_SAVE_SYN (set)
and TCP_SAVED_SYN (get). This would allow for bpf program to build
logic based on data from ingress SYN packet (e.g. doing tcp's tos/
tclass reflection (see sample prog)) and do it transparently from
userspace program point of view.

Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01 01:35:59 +02:00
Colin Ian King
7296216776 xdp: remove redundant variable 'headroom'
Variable 'headroom' is being assigned but is never used hence it is
redundant and can be removed.

Cleans up clang warning:
variable ‘headroom’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01 01:35:53 +02:00
Taehee Yoo
7acfda539c netfilter: nf_tables: release chain in flushing set
When element of verdict map is deleted, the delete routine should
release chain. however, flush element of verdict map routine doesn't
release chain.

test commands:
   %nft add table ip filter
   %nft add chain ip filter c1
   %nft add map ip filter map1 { type ipv4_addr : verdict \; }
   %nft add element ip filter map1 { 1 : jump c1 }
   %nft flush map ip filter map1
   %nft flush ruleset

splat looks like:
[ 4895.170899] kernel BUG at net/netfilter/nf_tables_api.c:1415!
[ 4895.178114] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 4895.178880] CPU: 0 PID: 1670 Comm: nft Not tainted 4.18.0+ #55
[ 4895.178880] RIP: 0010:nf_tables_chain_destroy.isra.28+0x39/0x220 [nf_tables]
[ 4895.178880] Code: fc ff df 53 48 89 fb 48 83 c7 50 48 89 fa 48 c1 ea 03 0f b6 04 02 84 c0 74 09 3c 03 7f 05 e8 3e 4c 25 e1 8b 43 50 85 c0 74 02 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02
[ 4895.228342] RSP: 0018:ffff88010b98f4c0 EFLAGS: 00010202
[ 4895.234841] RAX: 0000000000000001 RBX: ffff8801131c6968 RCX: ffff8801146585b0
[ 4895.234841] RDX: 1ffff10022638d37 RSI: ffff8801191a9348 RDI: ffff8801131c69b8
[ 4895.234841] RBP: ffff8801146585a8 R08: 1ffff1002323526a R09: 0000000000000000
[ 4895.234841] R10: 0000000000000000 R11: 0000000000000000 R12: dead000000000200
[ 4895.234841] R13: dead000000000100 R14: ffffffffa3638af8 R15: dffffc0000000000
[ 4895.234841] FS:  00007f6d188e6700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
[ 4895.234841] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4895.234841] CR2: 00007ffe72b8df88 CR3: 000000010e2d4000 CR4: 00000000001006f0
[ 4895.234841] Call Trace:
[ 4895.234841]  nf_tables_commit+0x2704/0x2c70 [nf_tables]
[ 4895.234841]  ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink]
[ 4895.234841]  ? nf_tables_setelem_notify.constprop.48+0x1a0/0x1a0 [nf_tables]
[ 4895.323824]  ? __lock_is_held+0x9d/0x130
[ 4895.323824]  ? kasan_unpoison_shadow+0x30/0x40
[ 4895.333299]  ? kasan_kmalloc+0xa9/0xc0
[ 4895.333299]  ? kmem_cache_alloc_trace+0x2c0/0x310
[ 4895.333299]  ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink]
[ 4895.333299]  nfnetlink_rcv_batch+0xdb9/0x11b0 [nfnetlink]
[ 4895.333299]  ? debug_show_all_locks+0x290/0x290
[ 4895.333299]  ? nfnetlink_net_init+0x150/0x150 [nfnetlink]
[ 4895.333299]  ? sched_clock_cpu+0xe5/0x170
[ 4895.333299]  ? sched_clock_local+0xff/0x130
[ 4895.333299]  ? sched_clock_cpu+0xe5/0x170
[ 4895.333299]  ? find_held_lock+0x39/0x1b0
[ 4895.333299]  ? sched_clock_local+0xff/0x130
[ 4895.333299]  ? memset+0x1f/0x40
[ 4895.333299]  ? nla_parse+0x33/0x260
[ 4895.333299]  ? ns_capable_common+0x6e/0x110
[ 4895.333299]  nfnetlink_rcv+0x2c0/0x310 [nfnetlink]
[ ... ]

Fixes: 591054469b ("netfilter: nf_tables: revisit chain/object refcounting from elements")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-31 19:07:11 +02:00
Florian Westphal
e075841220 netfilter: kconfig: nat related expression depend on nftables core
NF_TABLES_IPV4 is now boolean so it is possible to set

NF_TABLES=m
NF_TABLES_IPV4=y
NFT_CHAIN_NAT_IPV4=y

which causes:
nft_chain_nat_ipv4.c:(.text+0x6d): undefined reference to `nft_do_chain'

Wrap NFT_CHAIN_NAT_IPV4 and related nat expressions with NF_TABLES to
restore the dependency.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: 02c7b25e5f ("netfilter: nf_tables: build-in filter chain type")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-31 19:07:10 +02:00
Björn Töpel
18baed2684 xsk: include XDP meta data in AF_XDP frames
Previously, the AF_XDP (XDP_DRV/XDP_SKB copy-mode) ingress logic did
not include XDP meta data in the data buffers copied out to the user
application.

In this commit, we check if meta data is available, and if so, it is
prepended to the frame.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-30 15:25:40 +02:00
Mukesh Ojha
13ba17bee1 notifier: Remove notifier header file wherever not used
The conversion of the hotplug notifiers to a state machine left the
notifier.h includes around in some places. Remove them.

Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1535114033-4605-1-git-send-email-mojha@codeaurora.org
2018-08-30 12:56:40 +02:00
Johannes Berg
aa58acf325 mac80211: always account for A-MSDU header changes
In the error path of changing the SKB headroom of the second
A-MSDU subframe, we would not account for the already-changed
length of the first frame that just got converted to be in
A-MSDU format and thus is a bit longer now.

Fix this by doing the necessary accounting.

It would be possible to reorder the operations, but that would
make the code more complex (to calculate the necessary pad),
and the headroom expansion should not fail frequently enough
to make that worthwhile.

Fixes: 6e0456b545 ("mac80211: add A-MSDU tx support")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-30 11:04:19 +02:00
Lorenzo Bianconi
1eb5079036 mac80211: do not convert to A-MSDU if frag/subframe limited
Do not start to aggregate packets in a A-MSDU frame (converting the
first subframe to A-MSDU, adding the header) if max_tx_fragments or
max_amsdu_subframes limits are already exceeded by it. In particular,
this happens when drivers set the limit to 1 to avoid A-MSDUs at all.

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
[reword commit message to be more precise]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-30 09:59:22 +02:00
Arunk Khandavalli
4f0223bfe9 cfg80211: nl80211_update_ft_ies() to validate NL80211_ATTR_IE
nl80211_update_ft_ies() tried to validate NL80211_ATTR_IE with
is_valid_ie_attr() before dereferencing it, but that helper function
returns true in case of NULL pointer (i.e., attribute not included).
This can result to dereferencing a NULL pointer. Fix that by explicitly
checking that NL80211_ATTR_IE is included.

Fixes: 355199e02b ("cfg80211: Extend support for IEEE 802.11r Fast BSS Transition")
Signed-off-by: Arunk Khandavalli <akhandav@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-30 09:58:21 +02:00
David S. Miller
f0259b6ac4 Only a few changes at this point:
* new channels in 60 GHz
  * clarify (average) ACK signal reporting API
  * expose ieee80211_send_layer2_update() for all drivers
  * start/stop mac80211's TXQs properly when required
  * avoid regulatory restore with IE ignoring
  * spelling: contidion -> condition
  * fully implement WFA Multi-AP backhaul
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAluGiMwACgkQB8qZga/f
 l8SHMRAAhlYZNjZIHMqbyqRMFeGkgyfgQRYKb4xhrYr7v0542g5U99MWMNBtUJmq
 8aP4dzUFuPkR0qOi220PCs8PBBjuVcdTK1vq7AYiwiK4Um10/MtAlay6BUJFKVqU
 sJMaMbPy4mB3ocWl/q2K4nKaCZARsr854xwiIJZVwbc8n8t60Mr5ELbzELb5prGS
 jPpeRzYd7m4y4xnSYaiXWchdNOplFRN04NcuKJx10Pr3oWilGlj/ujGvwp78U6Uy
 v1H3T9S4XWMFkvl3deeOS6SVkejx76cvH8Ryoq+/qqQsAgs3c9tPQX+mwj6mNicq
 KsQcMIX6WmHNG5IcyaWat4LzdPgb5Xv31brA5tciZ3jebmIbc0P4dSYDLs7Jq1fg
 gkYuyNV3Jlwzv93RzqcrxfAIquZAvI7fy4CGiiRwtMk3wuHJlGy21PjGqZtWwqbn
 v0MbQf9riv1e653ygKSUpm1UoT5HMFbs6ZzbqpSy7Vr0e9+6B78Xkcp5u1DAvIan
 09YSqzKlypoGC+/802BL34HTpoUnf/hiBzVjYFmqvL/X2qv7oEUMOIv2x9Lg7NHh
 NZOPWcwjtPN57UP97Y6gCRAI2kJTigNdVnKISbIzZgNJ/HhB0M7ZmJ2UpB7EuJ1M
 q5aTIolqdoruwdGJ8d3gRr9xjDcuhhjr+FS8h6KfByJ0Qixroqk=
 =BW1J
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2018-08-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Only a few changes at this point:
 * new channels in 60 GHz
 * clarify (average) ACK signal reporting API
 * expose ieee80211_send_layer2_update() for all drivers
 * start/stop mac80211's TXQs properly when required
 * avoid regulatory restore with IE ignoring
 * spelling: contidion -> condition
 * fully implement WFA Multi-AP backhaul
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 22:13:47 -07:00
Paolo Abeni
97763dc0f4 net_sched: reject unknown tcfa_action values
After the commit 802bfb1915 ("net/sched: user-space can't set
unknown tcfa_action values"), unknown tcfa_action values are
converted to TC_ACT_UNSPEC, but the common agreement is instead
rejecting such configurations.

This change also introduces a helper to simplify the destruction
of a single action, avoiding code duplication.

v1 -> v2:
 - helper is now static and renamed according to act_* convention
 - updated extack message, according to the new behavior

Fixes: 802bfb1915 ("net/sched: user-space can't set unknown tcfa_action values")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 22:10:59 -07:00
Doron Roberts-Kedes
0927f71dbc net/tls: Calculate nsg for zerocopy path without skb_cow_data.
decrypt_skb fails if the number of sg elements required to map it
is greater than MAX_SKB_FRAGS. nsg must always be calculated, but
skb_cow_data adds unnecessary memcpy's for the zerocopy case.

The new function skb_nsg calculates the number of scatterlist elements
required to map the skb without the extra overhead of skb_cow_data.
This patch reduces memcpy by 50% on my encrypted NBD benchmarks.

Reported-by: Vakul Garg <Vakul.garg@nxp.com>
Reviewed-by: Vakul Garg <Vakul.garg@nxp.com>
Tested-by: Vakul Garg <Vakul.garg@nxp.com>
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 19:57:04 -07:00
Peter Oskolkov
0ff89efb52 ip: fail fast on IP defrag errors
The current behavior of IP defragmentation is inconsistent:
- some overlapping/wrong length fragments are dropped without
  affecting the queue;
- most overlapping fragments cause the whole frag queue to be dropped.

This patch brings consistency: if a bad fragment is detected,
the whole frag queue is dropped. Two major benefits:
- fail fast: corrupted frag queues are cleared immediately, instead of
  by timeout;
- testing of overlapping fragments is now much easier: any kind of
  random fragment length mutation now leads to the frag queue being
  discarded (IP packet dropped); before this patch, some overlaps were
  "corrected", with tests not seeing expected packet drops.

Note that in one case (see "if (end&7)" conditional) the current
behavior is preserved as there are concerns that this could be
legitimate padding.

Signed-off-by: Peter Oskolkov <posk@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 19:49:36 -07:00
Michal Kubecek
9b30049535 ethtool: drop get_settings and set_settings callbacks
Since [gs]et_settings ethtool_ops callbacks have been deprecated in
February 2016, all in tree NIC drivers have been converted to provide
[gs]et_link_ksettings() and out of tree drivers have had enough time to do
the same.

Drop get_settings() and set_settings() and implement both ETHTOOL_[GS]SET
and ETHTOOL_[GS]LINKSETTINGS only using [gs]et_link_ksettings().

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 19:46:10 -07:00
Sabrina Dubroca
f707ef61e1 net: rtnl: return early from rtnl_unregister_all when protocol isn't registered
rtnl_unregister_all(PF_INET6) gets called from inet6_init in cases when
no handler has been registered for PF_INET6 yet, for example if
ip6_mr_init() fails. Abort and avoid a NULL pointer deref in that case.

Example of panic (triggered by faking a failure of
 register_pernet_subsys):

    general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
    [...]
    RIP: 0010:rtnl_unregister_all+0x17e/0x2a0
    [...]
    Call Trace:
     ? rtnetlink_net_init+0x250/0x250
     ? sock_unregister+0x103/0x160
     ? kernel_getsockopt+0x200/0x200
     inet6_init+0x197/0x20d

Fixes: e2fddf5e96 ("[IPV6]: Make af_inet6 to check ip6_route_init return value.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 19:28:55 -07:00
Sabrina Dubroca
a03dc36bdc ipv6: fix cleanup ordering for pingv6 registration
Commit 6d0bfe2261 ("net: ipv6: Add IPv6 support to the ping socket.")
contains an error in the cleanup path of inet6_init(): when
proto_register(&pingv6_prot, 1) fails, we try to unregister
&pingv6_prot. When rawv6_init() fails, we skip unregistering
&pingv6_prot.

Example of panic (triggered by faking a failure of
 proto_register(&pingv6_prot, 1)):

    general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
    [...]
    RIP: 0010:__list_del_entry_valid+0x79/0x160
    [...]
    Call Trace:
     proto_unregister+0xbb/0x550
     ? trace_preempt_on+0x6f0/0x6f0
     ? sock_no_shutdown+0x10/0x10
     inet6_init+0x153/0x1b8

Fixes: 6d0bfe2261 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 19:28:55 -07:00
Sabrina Dubroca
afe49de44c ipv6: fix cleanup ordering for ip6_mr failure
Commit 15e668070a ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
moved the cleanup label for ipmr_fail, but should have changed the
contents of the cleanup labels as well. Now we can end up cleaning up
icmpv6 even though it hasn't been initialized (jump to icmp_fail or
ipmr_fail).

Simply undo things in the reverse order of their initialization.

Example of panic (triggered by faking a failure of icmpv6_init):

    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
    [...]
    RIP: 0010:__list_del_entry_valid+0x79/0x160
    [...]
    Call Trace:
     ? lock_release+0x8a0/0x8a0
     unregister_pernet_operations+0xd4/0x560
     ? ops_free_list+0x480/0x480
     ? down_write+0x91/0x130
     ? unregister_pernet_subsys+0x15/0x30
     ? down_read+0x1b0/0x1b0
     ? up_read+0x110/0x110
     ? kmem_cache_create_usercopy+0x1b4/0x240
     unregister_pernet_subsys+0x1d/0x30
     icmpv6_cleanup+0x1d/0x30
     inet6_init+0x1b5/0x23f

Fixes: 15e668070a ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 19:28:55 -07:00
YueHaibing
8bad008e79 net/ncsi: remove duplicated include from ncsi-netlink.c
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 19:10:40 -07:00
Davide Caratti
85eb9af182 net/sched: act_pedit: fix dump of extended layered op
in the (rare) case of failure in nla_nest_start(), missing NULL checks in
tcf_pedit_key_ex_dump() can make the following command

 # tc action add action pedit ex munge ip ttl set 64

dereference a NULL pointer:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 PGD 800000007d1cd067 P4D 800000007d1cd067 PUD 7acd3067 PMD 0
 Oops: 0002 [#1] SMP PTI
 CPU: 0 PID: 3336 Comm: tc Tainted: G            E     4.18.0.pedit+ #425
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_pedit_dump+0x19d/0x358 [act_pedit]
 Code: be 02 00 00 00 48 89 df 66 89 44 24 20 e8 9b b1 fd e0 85 c0 75 46 8b 83 c8 00 00 00 49 83 c5 08 48 03 83 d0 00 00 00 4d 39 f5 <66> 89 04 25 00 00 00 00 0f 84 81 01 00 00 41 8b 45 00 48 8d 4c 24
 RSP: 0018:ffffb5d4004478a8 EFLAGS: 00010246
 RAX: ffff8880fcda2070 RBX: ffff8880fadd2900 RCX: 0000000000000000
 RDX: 0000000000000002 RSI: ffffb5d4004478ca RDI: ffff8880fcda206e
 RBP: ffff8880fb9cb900 R08: 0000000000000008 R09: ffff8880fcda206e
 R10: ffff8880fadd2900 R11: 0000000000000000 R12: ffff8880fd26cf40
 R13: ffff8880fc957430 R14: ffff8880fc957430 R15: ffff8880fb9cb988
 FS:  00007f75a537a740(0000) GS:ffff8880fda00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007a2fa005 CR4: 00000000001606f0
 Call Trace:
  ? __nla_reserve+0x38/0x50
  tcf_action_dump_1+0xd2/0x130
  tcf_action_dump+0x6a/0xf0
  tca_get_fill.constprop.31+0xa3/0x120
  tcf_action_add+0xd1/0x170
  tc_ctl_action+0x137/0x150
  rtnetlink_rcv_msg+0x263/0x2d0
  ? _cond_resched+0x15/0x40
  ? rtnl_calcit.isra.30+0x110/0x110
  netlink_rcv_skb+0x4d/0x130
  netlink_unicast+0x1a3/0x250
  netlink_sendmsg+0x2ae/0x3a0
  sock_sendmsg+0x36/0x40
  ___sys_sendmsg+0x26f/0x2d0
  ? do_wp_page+0x8e/0x5f0
  ? handle_pte_fault+0x6c3/0xf50
  ? __handle_mm_fault+0x38e/0x520
  ? __sys_sendmsg+0x5e/0xa0
  __sys_sendmsg+0x5e/0xa0
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f75a4583ba0
 Code: c3 48 8b 05 f2 62 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d fd c3 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
 RSP: 002b:00007fff60ee7418 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 00007fff60ee7540 RCX: 00007f75a4583ba0
 RDX: 0000000000000000 RSI: 00007fff60ee7490 RDI: 0000000000000003
 RBP: 000000005b842d3e R08: 0000000000000002 R09: 0000000000000000
 R10: 00007fff60ee6ea0 R11: 0000000000000246 R12: 0000000000000000
 R13: 00007fff60ee7554 R14: 0000000000000001 R15: 000000000066c100
 Modules linked in: act_pedit(E) ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul ext4 crc32_pclmul mbcache ghash_clmulni_intel jbd2 pcbc snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd snd_timer cryptd glue_helper snd joydev pcspkr soundcore virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk virtio_console failover qxl crc32c_intel drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix virtio_pci libata virtio_ring i2c_core virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_pedit]
 CR2: 0000000000000000

Like it's done for other TC actions, give up dumping pedit rules and return
an error if nla_nest_start() returns NULL.

Fixes: 71d0ed7079 ("net/act_pedit: Support using offset relative to the conventional network headers")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 18:11:05 -07:00
Cong Wang
9a07efa9ae tipc: switch to rhashtable iterator
syzbot reported a use-after-free in tipc_group_fill_sock_diag(),
where tipc_group_fill_sock_diag() still reads tsk->group meanwhile
tipc_group_delete() just deletes it in tipc_release().

tipc_nl_sk_walk() aims to lock this sock when walking each sock
in the hash table to close race conditions with sock changes like
this one, by acquiring tsk->sk.sk_lock.slock spinlock, unfortunately
this doesn't work at all. All non-BH call path should take
lock_sock() instead to make it work.

tipc_nl_sk_walk() brutally iterates with raw rht_for_each_entry_rcu()
where RCU read lock is required, this is the reason why lock_sock()
can't be taken on this path. This could be resolved by switching to
rhashtable iterator API's, where taking a sleepable lock is possible.
Also, the iterator API's are friendly for restartable calls like
diag dump, the last position is remembered behind the scence,
all we need to do here is saving the iterator into cb->args[].

I tested this with parallel tipc diag dump and thousands of tipc
socket creation and release, no crash or memory leak.

Reported-by: syzbot+b9c8f3ab2994b7cd1625@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 18:04:54 -07:00
Cong Wang
bd583fe304 tipc: fix a missing rhashtable_walk_exit()
rhashtable_walk_exit() must be paired with rhashtable_walk_enter().

Fixes: 40f9f43970 ("tipc: Fix tipc_sk_reinit race conditions")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 17:52:58 -07:00
Alexey Kodanev
9f28954614 vti6: remove !skb->ignore_df check from vti6_xmit()
Before the commit d6990976af ("vti6: fix PMTU caching and reporting
on xmit") '!skb->ignore_df' check was always true because the function
skb_scrub_packet() was called before it, resetting ignore_df to zero.

In the commit, skb_scrub_packet() was moved below, and now this check
can be false for the packet, e.g. when sending it in the two fragments,
this prevents successful PMTU updates in such case. The next attempts
to send the packet lead to the same tx error. Moreover, vti6 initial
MTU value relies on PMTU adjustments.

This issue can be reproduced with the following LTP test script:
    udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000

Fixes: ccd740cbc6 ("vti6: Add pmtu handling to vti6_xmit.")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 17:51:44 -07:00
Björn Töpel
9025403420 xsk: expose xdp_umem_get_{data,dma} to drivers
Move the xdp_umem_get_{data,dma} functions to include/net/xdp_sock.h,
so that the upcoming zero-copy implementation in the Ethernet drivers
can utilize them.

Also, supply some dummy function implementations for
CONFIG_XDP_SOCKETS=n configs.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 12:25:53 -07:00
Björn Töpel
dce5bd6140 xdp: export xdp_rxq_info_unreg_mem_model
Export __xdp_rxq_info_unreg_mem_model as xdp_rxq_info_unreg_mem_model,
so it can be used from netdev drivers. Also, add additional checks for
the memory type.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 12:25:53 -07:00
Björn Töpel
b0d1beeff2 xdp: implement convert_to_xdp_frame for MEM_TYPE_ZERO_COPY
This commit adds proper MEM_TYPE_ZERO_COPY support for
convert_to_xdp_frame. Converting a MEM_TYPE_ZERO_COPY xdp_buff to an
xdp_frame is done by transforming the MEM_TYPE_ZERO_COPY buffer into a
MEM_TYPE_PAGE_ORDER0 frame. This is costly, and in the future it might
make sense to implement a more sophisticated thread-safe alloc/free
scheme for MEM_TYPE_ZERO_COPY, so that no allocation and copy is
required in the fast-path.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 12:25:53 -07:00
Daniel Borkmann
a8cf76a902 bpf: fix sg shift repair start offset in bpf_msg_pull_data
When we perform the sg shift repair for the scatterlist ring, we
currently start out at i = first_sg + 1. However, this is not
correct since the first_sg could point to the sge sitting at slot
MAX_SKB_FRAGS - 1, and a subsequent i = MAX_SKB_FRAGS will access
the scatterlist ring (sg) out of bounds. Add the sk_msg_iter_var()
helper for iterating through the ring, and apply the same rule
for advancing to the next ring element as we do elsewhere. Later
work will use this helper also in other places.

Fixes: 015632bb30 ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 10:47:17 -07:00
Daniel Borkmann
2e43f95dd8 bpf: fix shift upon scatterlist ring wrap-around in bpf_msg_pull_data
If first_sg and last_sg wraps around in the scatterlist ring, then we
need to account for that in the shift as well. E.g. crafting such msgs
where this is the case leads to a hang as shift becomes negative. E.g.
consider the following scenario:

  first_sg := 14     |=>    shift := -12     msg->sg_start := 10
  last_sg  :=  3     |                       msg->sg_end   :=  5

round  1:  i := 15, move_from :=   3, sg[15] := sg[  3]
round  2:  i :=  0, move_from := -12, sg[ 0] := sg[-12]
round  3:  i :=  1, move_from := -11, sg[ 1] := sg[-11]
round  4:  i :=  2, move_from := -10, sg[ 2] := sg[-10]
[...]
round 13:  i := 11, move_from :=  -1, sg[ 2] := sg[ -1]
round 14:  i := 12, move_from :=   0, sg[ 2] := sg[  0]
round 15:  i := 13, move_from :=   1, sg[ 2] := sg[  1]
round 16:  i := 14, move_from :=   2, sg[ 2] := sg[  2]
round 17:  i := 15, move_from :=   3, sg[ 2] := sg[  3]
[...]

This means we will loop forever and never hit the msg->sg_end condition
to break out of the loop. When we see that the ring wraps around, then
the shift should be MAX_SKB_FRAGS - first_sg + last_sg - 1. Meaning,
the remainder slots from the tail of the ring and the head until last_sg
combined.

Fixes: 015632bb30 ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 10:47:17 -07:00
Daniel Borkmann
0e06b227c5 bpf: fix msg->data/data_end after sg shift repair in bpf_msg_pull_data
In the current code, msg->data is set as sg_virt(&sg[i]) + start - offset
and msg->data_end relative to it as msg->data + bytes. Using iterator i
to point to the updated starting scatterlist element holds true for some
cases, however not for all where we'd end up pointing out of bounds. It
is /correct/ for these ones:

1) When first finding the starting scatterlist element (sge) where we
   find that the page is already privately owned by the msg and where
   the requested bytes and headroom fit into the sge's length.

However, it's /incorrect/ for the following ones:

2) After we made the requested area private and updated the newly allocated
   page into first_sg slot of the scatterlist ring; when we find that no
   shift repair of the ring is needed where we bail out updating msg->data
   and msg->data_end. At that point i will point to last_sg, which in this
   case is the next elem of first_sg in the ring. The sge at that point
   might as well be invalid (e.g. i == msg->sg_end), which we use for
   setting the range of sg_virt(&sg[i]). The correct one would have been
   first_sg.

3) Similar as in 2) but when we find that a shift repair of the ring is
   needed. In this case we fix up all sges and stop once we've reached the
   end. In this case i will point to will point to the new msg->sg_end,
   and the sge at that point will be invalid. Again here the requested
   range sits in first_sg.

Fixes: 015632bb30 ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 10:47:17 -07:00
Florian Westphal
0434ccdcf8 netfilter: nf_tables: rework ct timeout set support
Using a private template is problematic:

1. We can't assign both a zone and a timeout policy
   (zone assigns a conntrack template, so we hit problem 1)
2. Using a template needs to take care of ct refcount, else we'll
   eventually free the private template due to ->use underflow.

This patch reworks template policy to instead work with existing conntrack.

As long as such conntrack has not yet been placed into the hash table
(unconfirmed) we can still add the timeout extension.

The only caveat is that we now need to update/correct ct->timeout to
reflect the initial/new state, otherwise the conntrack entry retains the
default 'new' timeout.

Side effect of this change is that setting the policy must
now occur from chains that are evaluated *after* the conntrack lookup
has taken place.

No released kernel contains the timeout policy feature yet, so this change
should be ok.

Changes since v2:
 - don't handle 'ct is confirmed case'
 - after previous patch, no need to special-case tcp/dccp/sctp timeout
   anymore

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-29 13:04:38 +02:00
Florian Westphal
ef39078d63 netfilter: conntrack: place 'new' timeout in first location too
tcp, sctp and dccp trackers re-use the userspace ctnetlink states
to index their timeout arrays, which means timeout[0] is never
used.  Copy the 'new' state (syn-sent, dccp-request, ..) to 0 as well
so external users can simply read it off timeouts[0] without need to
differentiate dccp/sctp/tcp and udp/icmp/gre/generic.

The alternative is to map all array accesses to 'i - 1', but that
is a much more intrusive change.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-29 13:04:38 +02:00
Sara Sharon
166ac9d55b mac80211: avoid kernel panic when building AMSDU from non-linear SKB
When building building AMSDU from non-linear SKB, we hit a
kernel panic when trying to push the padding to the tail.
Instead, put the padding at the head of the next subframe.
This also fixes the A-MSDU subframes to not have the padding
accounted in the length field and not have pad at all for
the last subframe, both required by the spec.

Fixes: 6e0456b545 ("mac80211: add A-MSDU tx support")
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Reviewed-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-29 12:17:55 +02:00
Yuan-Chi Pang
1f631c3201 mac80211: mesh: fix HWMP sequence numbering to follow standard
IEEE 802.11-2016 14.10.8.3 HWMP sequence numbering says:
If it is a target mesh STA, it shall update its own HWMP SN to
maximum (current HWMP SN, target HWMP SN in the PREQ element) + 1
immediately before it generates a PREP element in response to a
PREQ element.

Signed-off-by: Yuan-Chi Pang <fu3mo6goo@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-29 11:15:30 +02:00
Balaji Pothunoori
9c06602b1b cfg80211: clarify frames covered by average ACK signal report
Modify the API to include all ACK frames in average ACK
signal strength reporting, not just ACKs for data frames.
Make exposing the data conditional on implementing the
extended feature flag.

This is how it was really implemented in mac80211, update
the code there to use the new defines and clean up some of
the setting code.

Keep nl80211.h source compatibility by keeping the old names.

Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org>
[rewrite commit log, change compatibility to be old=new
 instead of the other way around, update kernel-doc,
 roll in mac80211 changes, make mac80211 depend on valid
 bit instead of HW flag]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-29 11:01:51 +02:00
Sathishkumar Muruganandam
1ecef20cf1 mac80211: add missing WFA Multi-AP backhaul STA Rx requirement
The current mac80211 WDS (4-address mode) can be used to cover most of the
Multi-AP requirements for Data frames per the WFA Multi-AP Specification v1.0.
When configuring AP/STA interfaces in 4-address mode, they are able to function
as fronthaul AP/backhaul STA of Multi-AP device complying below
Tx, Rx requirements except one missing STA Rx requirement added by this patch.

Multi-AP specification section 14.1 describes the following requirements:

Transmitter requirements
------------------------
1. Fronthaul AP
        i) When DA!=RA of backhaul STA, must use 4-address format
        ii) When DA==RA of backhaul STA, shall use either 3-address
            or 4-address format with RA updated with STA MAC

            (mac80211 support 4-address format via AP/VLAN interface)

2. Backhaul STA
        i) When SA!=TA of backhaul STA, must use 4-address format
        ii) When SA==TA of backhaul STA, shall use either 3-address
            or 4-address format with RA updated with AP MAC

            (mac80211 support 4-address format via use_4addr)

Receiver requirements
---------------------
1. Fronthaul AP
        i) When SA!=TA of backhaul STA, must support receiving 4-address
           format frames
        ii) When SA==TA of backhaul STA, must support receiving both
            3-address and 4-address format frames

            (mac80211 support both 3-addr & 4-addr via AP/VLAN interface)

2. Backhaul STA
        i) When DA!=RA of backhaul STA, must support receiving 4-address
           format frames
        ii) When DA==RA of backhaul STA,  must support receiving both
            3-address and 4-address format frames

            (mac80211 support only receiving 4-address format via
             use_4addr)

This patch addresses the above Rx requirement (ii) for backhaul STA to receive
unicast (DA==RA) 3-address frames in addition to 4-address frames.

The current design doesn't accept 3-address frames when configured in 4-address
mode (use_4addr). Hence add a check to allow 3-address frames when DA==RA of
backhaul STA (adhering to Table 9-26 of IEEE Std 802.11™-2016).

This case was tested with a bridged station interface when associated with
a non-mac80211 based vendor AP implementation using 3-address frames for WDS.

STA was able to support the Multi-AP Rx requirement when DA==RA. No issues,
no loops seen when tested with mac80211 based AP as well.

Verified and confirmed all other Tx and Rx requirements of AP and STA for
Multi-AP respectively. They all work using the current mac80211-WDS design.

Signed-off-by: Sathishkumar Muruganandam <murugana@codeaurora.org>

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-29 09:25:17 +02:00
Shannon Nelson
4a132095dd xfrm: allow driver to quietly refuse offload
If the "offload" attribute is used to create an IPsec SA
and the .xdo_dev_state_add() fails, the SA creation fails.
However, if the "offload" attribute is used on a device that
doesn't offer it, the attribute is quietly ignored and the SA
is created without an offload.

Along the same line of that second case, it would be good to
have a way for the device to refuse to offload an SA without
failing the whole SA creation.  This patch adds that feature
by allowing the driver to return -EOPNOTSUPP as a signal that
the SA may be fine, it just can't be offloaded.

This allows the user a little more flexibility in requesting
offloads and not needing to know every detail at all times about
each specific NIC when trying to create SAs.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-08-29 08:04:44 +02:00
Haishuang Yan
0c05f98376 esp: remove redundant define esph
The pointer 'esph' is defined but is never used hence it is redundant
and canbe removed.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-08-29 08:02:43 +02:00
Wei Yongjun
211d6f2dc8 xfrm: Make function xfrmi_get_link_net() static
Fixes the following sparse warning:

net/xfrm/xfrm_interface.c:745:12: warning:
 symbol 'xfrmi_get_link_net' was not declared. Should it be static?

Fixes: f203b76d78 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-08-29 07:56:36 +02:00
Daniel Borkmann
5b24109b05 bpf: fix several offset tests in bpf_msg_pull_data
While recently going over bpf_msg_pull_data(), I noticed three
issues which are fixed in here:

1) When we attempt to find the first scatterlist element (sge)
   for the start offset, we add len to the offset before we check
   for start < offset + len, whereas it should come after when
   we iterate to the next sge to accumulate the offsets. For
   example, given a start offset of 12 with a sge length of 8
   for the first sge in the list would lead us to determine this
   sge as the first sge thinking it covers first 16 bytes where
   start is located, whereas start sits in subsequent sges so
   we would end up pulling in the wrong data.

2) After figuring out the starting sge, we have a short-cut test
   in !msg->sg_copy[i] && bytes <= len. This checks whether it's
   not needed to make the page at the sge private where we can
   just exit by updating msg->data and msg->data_end. However,
   the length test is not fully correct. bytes <= len checks
   whether the requested bytes (end - start offsets) fit into the
   sge's length. The part that is missing is that start must not
   be sge length aligned. Meaning, the start offset into the sge
   needs to be accounted as well on top of the requested bytes
   as otherwise we can access the sge out of bounds. For example
   the sge could have length of 8, our requested bytes could have
   length of 8, but at a start offset of 4, so we also would need
   to pull in 4 bytes of the next sge, when we jump to the out
   label we do set msg->data to sg_virt(&sg[i]) + start - offset
   and msg->data_end to msg->data + bytes which would be oob.

3) The subsequent bytes < copy test for finding the last sge has
   the same issue as in point 2) but also it tests for less than
   rather than less or equal to. Meaning if the sge length is of
   8 and requested bytes of 8 while having the start aligned with
   the sge, we would unnecessarily go and pull in the next sge as
   well to make it private.

Fixes: 015632bb30 ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-28 22:23:45 -07:00
Haim Dreyfuss
b88d26d97c nl80211: Pass center frequency in kHz instead of MHz
freq_reg_info expects to get the frequency in kHz. Instead we
accidently pass it in MHz.  Thus, currently the function always
return ERR rule. Fix that.

Fixes: 50f32718e1 ("nl80211: Add wmm rule attribute to NL80211_CMD_GET_WIPHY dump command")
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
[fix kHz/MHz in commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28 11:41:23 +02:00
Haim Dreyfuss
d3c89bbc74 nl80211: Fix nla_put_u8 to u16 for NL80211_WMMR_TXOP
TXOP (also known as Channel Occupancy Time) is u16 and should be
added using nla_put_u16 instead of u8, fix that.

Fixes: 50f32718e1 ("nl80211: Add wmm rule attribute to NL80211_CMD_GET_WIPHY dump command")
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28 11:37:28 +02:00
Alexei Avshalom Lazar
9cf0a0b4b6 cfg80211: Add support for 60GHz band channels 5 and 6
The current support in the 60GHz band is for channels 1-4.
Add support for channels 5 and 6.
This requires enlarging ieee80211_channel.center_freq from u16 to u32.

Signed-off-by: Alexei Avshalom Lazar <ailizaro@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28 11:23:08 +02:00
Manikanta Pubbisetty
21a5d4c3a4 mac80211: add stop/start logic for software TXQs
Sometimes, it is required to stop the transmissions momentarily and
resume it later; stopping the txqs becomes very critical in scenarios where
the packet transmission has to be ceased completely. For example, during
the hardware restart, during off channel operations,
when initiating CSA(upon detecting a radar on the DFS channel), etc.

The TX queue stop/start logic in mac80211 works well in stopping the TX
when drivers make use of netdev queues, i.e, when Qdiscs in network layer
take care of traffic scheduling. Since the devices implementing
wake_tx_queue can run without Qdiscs, packets will be handed to mac80211
directly without queueing them in the netdev queues.

Also, mac80211 does not invoke any of the
netif_stop_*/netif_wake_* APIs if wake_tx_queue is implemented.
Since the queues are not stopped in this case, transmissions can continue
and this will impact negatively on the operation of the wireless device.

For example,
During hardware restart, we stop the netdev queues so that packets are
not sent to the driver. Since ath10k implements wake_tx_queue,
TX queues will not be stopped and packets might reach the hardware while
it is restarting; this can make hardware unresponsive and the only
possible option for recovery is to reboot the entire system.

There is another problem to this, it is observed that the packets
were sent on the DFS channel for a prolonged duration after radar
detection impacting the channel closing time.

We can still invoke netif stop/wake APIs when wake_tx_queue is implemented
but this could lead to packet drops in network layer; adding stop/start
logic for software TXQs in mac80211 instead makes more sense; the change
proposed adds the same in mac80211.

Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28 11:16:35 +02:00
Rajeev Kumar Sirasanagandla
7417844b63 cfg80211: Avoid regulatory restore when COUNTRY_IE_IGNORE is set
When REGULATORY_COUNTRY_IE_IGNORE is set,  __reg_process_hint_country_ie()
ignores the country code change request from __cfg80211_connect_result()
via regulatory_hint_country_ie().

After Disconnect, similar to above, country code should not be reset to
world when country IE ignore is set. But this is violated and restore of
regulatory settings is invoked by cfg80211_disconnect_work via
regulatory_hint_disconnect().

To address this, avoid regulatory restore from regulatory_hint_disconnect()
when COUNTRY_IE_IGNORE is set.

Note: Currently, restore_regulatory_settings() takes care of clearing
beacon hints. But in the proposed change, regulatory restore is avoided.
Therefore, explicitly clear beacon hints when DISABLE_BEACON_HINTS
is not set.

Signed-off-by: Rajeev Kumar Sirasanagandla <rsirasan@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28 11:15:51 +02:00
Dedy Lansky
30ca1aa536 cfg80211/mac80211: make ieee80211_send_layer2_update a public function
Make ieee80211_send_layer2_update() a common function so other drivers
can re-use it.

Signed-off-by: Dedy Lansky <dlansky@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28 11:15:27 +02:00
Richard Guy Briggs
f404c3ecc4 rfkill: fix spelling mistake contidion to condition
This came about while trying to determine if there would be any pattern
match on contid, a new audit container identifier internal variable.
This was the only one.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28 11:15:27 +02:00
Emmanuel Grumbach
20932750d9 mac80211: don't update the PM state of a peer upon a multicast frame
I changed the way mac80211 updates the PM state of the peer.
I forgot that we could also have multicast frames from the
peer and that those frame should of course not change the
PM state of the peer: A peer goes to power save when it
needs to scan, but it won't send the broadcast Probe Request
with the PM bit set.

This made us mark the peer as awake when it wasn't and then
Intel's firmware would fail to transmit because the peer is
asleep according to its database. The driver warned about
this and it looked like this:

 WARNING: CPU: 0 PID: 184 at /usr/src/linux-4.16.14/drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1369 iwl_mvm_rx_tx_cmd+0x53b/0x860
 CPU: 0 PID: 184 Comm: irq/124-iwlwifi Not tainted 4.16.14 #1
 RIP: 0010:iwl_mvm_rx_tx_cmd+0x53b/0x860
 Call Trace:
  iwl_pcie_rx_handle+0x220/0x880
  iwl_pcie_irq_handler+0x6c9/0xa20
  ? irq_forced_thread_fn+0x60/0x60
  ? irq_thread_dtor+0x90/0x90

The relevant code that spits the WARNING is:

        case TX_STATUS_FAIL_DEST_PS:
                /* the FW should have stopped the queue and not
                 * return this status
                 */
                WARN_ON(1);
                info->flags |= IEEE80211_TX_STAT_TX_FILTERED;

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=199967.

Fixes: 9fef654433 ("mac80211: always update the PM state of a peer on MGMT / DATA frames")
Cc: <stable@vger.kernel.org>   #4.16+
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28 11:12:42 +02:00
Stanislaw Gruszka
38cb87ee47 cfg80211: make wmm_rule part of the reg_rule structure
Make wmm_rule be part of the reg_rule structure. This simplifies the
code a lot at the cost of having bigger memory usage. However in most
cases we have only few reg_rule's and when we do have many like in
iwlwifi we do not save memory as it allocates a separate wmm_rule for
each channel anyway.

This also fixes a bug reported in various places where somewhere the
pointers were corrupted and we ended up doing a null-dereference.

Fixes: 230ebaa189 ("cfg80211: read wmm rules from regulatory database")
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
[rephrase commit message slightly]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28 11:11:47 +02:00
Danek Duvall
67d1ba8a6d mac80211: correct use of IEEE80211_VHT_CAP_RXSTBC_X
The mod mask for VHT capabilities intends to say that you can override
the number of STBC receive streams, and it does, but only by accident.
The IEEE80211_VHT_CAP_RXSTBC_X aren't bits to be set, but values (albeit
left-shifted).  ORing the bits together gets the right answer, but we
should use the _MASK macro here instead.

Signed-off-by: Danek Duvall <duvall@comfychair.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28 11:08:59 +02:00
Stefan Agner
3f6e138d41 bpf: fix build error with clang
Building the newly introduced BPF_PROG_TYPE_SK_REUSEPORT leads to
a compile time error when building with clang:
net/core/filter.o: In function `sk_reuseport_convert_ctx_access':
  ../net/core/filter.c:7284: undefined reference to `__compiletime_assert_7284'

It seems that clang has issues resolving hweight_long at compile
time. Since SK_FL_PROTO_MASK is a constant, we can use the interface
for known constant arguments which works fine with clang.

Fixes: 2dbb9b9e6d ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-27 20:37:05 -07:00
Zhu Yanjun
53ae914d89 net/rds: Use rdma_read_gids to get connection SGID/DGID in IPv6
In IPv4, the newly introduced rdma_read_gids is used to read the SGID/DGID
for the connection which returns GID correctly for RoCE transport as well.

In IPv6, rdma_read_gids is also used. The following are why rdma_read_gids
is introduced.

rdma_addr_get_dgid() for RoCE for client side connections returns MAC
address, instead of DGID.
rdma_addr_get_sgid() for RoCE doesn't return correct SGID for IPv6 and
when more than one IP address is assigned to the netdevice.

So the transport agnostic rdma_read_gids() API is provided by rdma_cm
module.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-27 15:26:01 -07:00
Linus Walleij
ad8619864f net: dsa: Drop GPIO includes
Commit 52638f71fc ("dsa: Move gpio reset into switch driver")
moved the GPIO handling into the switch drivers but forgot
to remove the GPIO header includes.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-27 15:24:33 -07:00
Haiqing Bai
30935198b7 tipc: fix the big/little endian issue in tipc_dest
In function tipc_dest_push, the 32bit variables 'node' and 'port'
are stored separately in uppper and lower part of 64bit 'value'.
Then this value is assigned to dst->value which is a union like:
union
{
  struct {
    u32 port;
    u32 node;
  };
  u64 value;
}
This works on little-endian machines like x86 but fails on big-endian
machines.

The fix remove the 'value' stack parameter and even the 'value'
member of the union in tipc_dest, assign the 'node' and 'port' member
directly with the input parameter to avoid the endian issue.

Fixes: a80ae5306a ("tipc: improve destination linked list")
Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Haiqing Bai <Haiqing.Bai@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-27 15:23:31 -07:00
Jiri Pirko
b7b4247d55 net: sched: return -ENOENT when trying to remove filter from non-existent chain
When chain 0 was implicitly created, removal of non-existent filter from
chain 0 gave -ENOENT. Once chain 0 became non-implicit, the same call is
giving -EINVAL. Fix this by returning -ENOENT in that case.

Reported-by: Roman Mashak <mrv@mojatatu.com>
Fixes: f71e0ca4db ("net: sched: Avoid implicit chain 0 creation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-27 15:16:16 -07:00
Jiri Pirko
d5ed72a55b net: sched: fix extack error message when chain is failed to be created
Instead "Cannot find" say "Cannot create".

Fixes: c35a4acc29 ("net: sched: cls_api: handle generic cls errors")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-27 15:16:16 -07:00
Xin Long
84581bdae9 erspan: set erspan_ver to 1 by default when adding an erspan dev
After erspan_ver is introudced, if erspan_ver is not set in iproute, its
value will be left 0 by default. Since Commit 02f99df187 ("erspan: fix
invalid erspan version."), it has broken the traffic due to the version
check in erspan_xmit if users are not aware of 'erspan_ver' param, like
using an old version of iproute.

To fix this compatibility problem, it sets erspan_ver to 1 by default
when adding an erspan dev in erspan_setup. Note that we can't do it in
ipgre_netlink_parms, as this function is also used by ipgre_changelink.

Fixes: 02f99df187 ("erspan: fix invalid erspan version.")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-27 15:13:17 -07:00
Xin Long
834539e69a sctp: remove useless start_fail from sctp_ht_iter in proc
After changing rhashtable_walk_start to return void, start_fail would
never be set other value than 0, and the checking for start_fail is
pointless, so remove it.

Fixes: 97a6ec4ac0 ("rhashtable: Change rhashtable_walk_start to return void")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-27 15:13:17 -07:00
Xin Long
bab1be79a5 sctp: hold transport before accessing its asoc in sctp_transport_get_next
As Marcelo noticed, in sctp_transport_get_next, it is iterating over
transports but then also accessing the association directly, without
checking any refcnts before that, which can cause an use-after-free
Read.

So fix it by holding transport before accessing the association. With
that, sctp_transport_hold calls can be removed in the later places.

Fixes: 626d16f50f ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
Reported-by: syzbot+fe62a0c9aa6a85c6de16@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-27 15:13:17 -07:00
Linus Torvalds
050cdc6c95 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) ICE, E1000, IGB, IXGBE, and I40E bug fixes from the Intel folks.

 2) Better fix for AB-BA deadlock in packet scheduler code, from Cong
    Wang.

 3) bpf sockmap fixes (zero sized key handling, etc.) from Daniel
    Borkmann.

 4) Send zero IPID in TCP resets and SYN-RECV state ACKs, to prevent
    attackers using it as a side-channel. From Eric Dumazet.

 5) Memory leak in mediatek bluetooth driver, from Gustavo A. R. Silva.

 6) Hook up rt->dst.input of ipv6 anycast routes properly, from Hangbin
    Liu.

 7) hns and hns3 bug fixes from Huazhong Tan.

 8) Fix RIF leak in mlxsw driver, from Ido Schimmel.

 9) iova range check fix in vhost, from Jason Wang.

10) Fix hang in do_tcp_sendpages() with tls, from John Fastabend.

11) More r8152 chips need to disable RX aggregation, from Kai-Heng Feng.

12) Memory exposure in TCA_U32_SEL handling, from Kees Cook.

13) TCP BBR congestion control fixes from Kevin Yang.

14) hv_netvsc, ignore non-PCI devices, from Stephen Hemminger.

15) qed driver fixes from Tomer Tayar.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
  net: sched: Fix memory exposure from short TCA_U32_SEL
  qed: fix spelling mistake "comparsion" -> "comparison"
  vhost: correctly check the iova range when waking virtqueue
  qlge: Fix netdev features configuration.
  net: macb: do not disable MDIO bus at open/close time
  Revert "net: stmmac: fix build failure due to missing COMMON_CLK dependency"
  net: macb: Fix regression breaking non-MDIO fixed-link PHYs
  mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge
  i40e: fix condition of WARN_ONCE for stat strings
  i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled
  ixgbe: fix driver behaviour after issuing VFLR
  ixgbe: Prevent unsupported configurations with XDP
  ixgbe: Replace GFP_ATOMIC with GFP_KERNEL
  igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback()
  igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init()
  igb: Use an advanced ctx descriptor for launchtime
  e1000: ensure to free old tx/rx rings in set_ringparam()
  e1000: check on netif_running() before calling e1000_up()
  ixgb: use dma_zalloc_coherent instead of allocator/memset
  ice: Trivial formatting fixes
  ...
2018-08-27 11:59:39 -07:00
Kees Cook
98c8f125fd net: sched: Fix memory exposure from short TCA_U32_SEL
Via u32_change(), TCA_U32_SEL has an unspecified type in the netlink
policy, so max length isn't enforced, only minimum. This means nkeys
(from userspace) was being trusted without checking the actual size of
nla_len(), which could lead to a memory over-read, and ultimately an
exposure via a call to u32_dump(). Reachability is CAP_NET_ADMIN within
a namespace.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-26 14:21:50 -07:00
Linus Torvalds
aba16dc5cf Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax
Pull IDA updates from Matthew Wilcox:
 "A better IDA API:

      id = ida_alloc(ida, GFP_xxx);
      ida_free(ida, id);

  rather than the cumbersome ida_simple_get(), ida_simple_remove().

  The new IDA API is similar to ida_simple_get() but better named.  The
  internal restructuring of the IDA code removes the bitmap
  preallocation nonsense.

  I hope the net -200 lines of code is convincing"

* 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax: (29 commits)
  ida: Change ida_get_new_above to return the id
  ida: Remove old API
  test_ida: check_ida_destroy and check_ida_alloc
  test_ida: Convert check_ida_conv to new API
  test_ida: Move ida_check_max
  test_ida: Move ida_check_leaf
  idr-test: Convert ida_check_nomem to new API
  ida: Start new test_ida module
  target/iscsi: Allocate session IDs from an IDA
  iscsi target: fix session creation failure handling
  drm/vmwgfx: Convert to new IDA API
  dmaengine: Convert to new IDA API
  ppc: Convert vas ID allocation to new IDA API
  media: Convert entity ID allocation to new IDA API
  ppc: Convert mmu context allocation to new IDA API
  Convert net_namespace to new IDA API
  cb710: Convert to new IDA API
  rsxx: Convert to new IDA API
  osd: Convert to new IDA API
  sd: Convert to new IDA API
  ...
2018-08-26 11:48:42 -07:00
Florian Westphal
10568f6c57 netfilter: xt_checksum: ignore gso skbs
Satish Patel reports a skb_warn_bad_offload() splat caused
by -j CHECKSUM rules:

-A POSTROUTING -p tcp -m tcp --sport 80 -j CHECKSUM

The CHECKSUM target has never worked with GSO skbs, and the above rule
makes no sense as kernel will handle checksum updates on transmit.

Unfortunately, there are 3rd party tools that install such rules, so we
cannot reject this from the config plane without potential breakage.

Amend Kconfig text to clarify that the CHECKSUM target is only useful
in virtualized environments, where old dhcp clients that use AF_PACKET
used to discard UDP packets with a 'bad' header checksum and add a
one-time warning in case such rule isn't restricted to UDP.

v2: check IP6T_F_PROTO flag before cmp (Michal Kubecek)

Reported-by: Satish Patel <satish.txt@gmail.com>
Reported-by: Markos Chandras <markos.chandras@suse.com>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-24 09:58:16 +02:00
David S. Miller
ff0fadfffe Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-08-24

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix BPF sockmap and tls where we get a hang in do_tcp_sendpages()
   when sndbuf is full due to missing calls into underlying socket's
   sk_write_space(), from John.

2) Two BPF sockmap fixes to reject invalid parameters on map creation
   and to fix a map element miscount on allocation failure. Another fix
   for BPF hash tables to use per hash table salt for jhash(), from Daniel.

3) Fix for bpftool's command line parsing in order to terminate on bad
   arguments instead of keeping looping in some border cases, from Quentin.

4) Fix error value of xdp_umem_assign_dev() in order to comply with
   expected bind ops error codes, from Prashant.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-23 22:41:55 -07:00
Linus Torvalds
33e17876ea Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - the rest of MM

 - various misc fixes and tweaks

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
  mm: Change return type int to vm_fault_t for fault handlers
  lib/fonts: convert comments to utf-8
  s390: ebcdic: convert comments to UTF-8
  treewide: convert ISO_8859-1 text comments to utf-8
  drivers/gpu/drm/gma500/: change return type to vm_fault_t
  docs/core-api: mm-api: add section about GFP flags
  docs/mm: make GFP flags descriptions usable as kernel-doc
  docs/core-api: split memory management API to a separate file
  docs/core-api: move *{str,mem}dup* to "String Manipulation"
  docs/core-api: kill trailing whitespace in kernel-api.rst
  mm/util: add kernel-doc for kvfree
  mm/util: make strndup_user description a kernel-doc comment
  fs/proc/vmcore.c: hide vmcoredd_mmap_dumps() for nommu builds
  treewide: correct "differenciate" and "instanciate" typos
  fs/afs: use new return type vm_fault_t
  drivers/hwtracing/intel_th/msu.c: change return type to vm_fault_t
  mm: soft-offline: close the race against page allocation
  mm: fix race on soft-offlining free huge pages
  namei: allow restricted O_CREAT of FIFOs and regular files
  hfs: prevent crash on exit from failed search
  ...
2018-08-23 19:20:12 -07:00
Arnd Bergmann
3723c63247 treewide: convert ISO_8859-1 text comments to utf-8
Almost all files in the kernel are either plain text or UTF-8 encoded.  A
couple however are ISO_8859-1, usually just a few characters in a C
comments, for historic reasons.

This converts them all to UTF-8 for consistency.

Link: http://lkml.kernel.org/r/20180724111600.4158975-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Simon Horman <horms@verge.net.au>			[IPVS portion]
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>	[IIO]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>			[powerpc]
Acked-by: Rob Herring <robh@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23 18:48:43 -07:00
Linus Torvalds
53a01c9a5f NFS client updates for Linux 4.19
Stable bufixes:
 - v3.17+: Fix an off-by-one in bl_map_stripe()
 - v4.9+: NFSv4 client live hangs after live data migration recovery
 - v4.18+: xprtrdma: Fix disconnect regression
 - v4.14+: Fix locking in pnfs_generic_recover_commit_reqs
 - v4.9+: Fix a sleep in atomic context in nfs4_callback_sequence()
 
 Features:
 - Add support for asynchronous server-side COPY operations
 
 Other bugfixes and cleanups:
 - Optitmizations and fixes involving NFS v4.1 / pNFS layout handling
 - Optimize lseek(fd, SEEK_CUR, 0) on directories to avoid locking
 - Immediately reschedule writeback when the server replies with an error
 - Fix excessive attribute revalidation in nfs_execute_ok()
 - Add error checking to nfs_idmap_prepare_message()
 - Use new vm_fault_t return type
 - Return a delegation when reclaiming one that the server has recalled
 - Referrals should inherit proto setting from parents
 - Make rpc_auth_create_args a const
 - Improvements to rpc_iostats tracking
 - Fix a potential reference leak when there is an error processing a callback
 - Fix rmdir / mkdir / rename nlink accounting
 - Fix updating inode change attribute
 - Fix error handling in nfsn4_sp4_select_mode()
 - Use an appropriate work queue for direct-write completion
 - Don't busy wait if NFSv4 session draining is interrupted
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlt/CYIACgkQ18tUv7Cl
 QOu8gBAA0xQWmgRoG6oIdYUxvgYqhuJmMqC4SU1E6mCJ93xEuUSvEFw51X+84KCt
 r6UPkp/bKiVe3EIinKTplIzuxgggXNG0EQmO46FYNTl7nqpN85ffLsQoWsiD23fp
 j8afqKPFR2zfhHXLKQC7k1oiOpwGqJ+EJWgIW4llE80pSNaErEoEaDqSPds5thMN
 dHEjjLr8ef6cbBux6sSPjwWGNbE82uoSu3MDuV2+e62hpGkgvuEYo1vyE6ujeZW5
 MUsmw+AHZkwro0msTtNBOHcPZAS0q/2UMPzl1tsDeCWNl2mugqZ6szQLSS2AThKq
 Zr6iK9Q5dWjJfrQHcjRMnYJB+SCX1SfPA7ASuU34opwcWPjecbS9Q92BNTByQYwN
 o9ngs2K0mZfqpYESMAmf7Il134cCBrtEp3skGko2KopJcYcE5YUFhdKihi1yQQjU
 UbOOubMpQk8vY9DpDCAwGbICKwUZwGvq27uuUWL20kFVDb1+jvfHwcV4KjRAJo/E
 J9aFtU+qOh4rMPMnYlEVZcAZBGfenlv/DmBl1upRpjzBkteUpUJsAbCmGyAk4616
 3RECasehgsjNCQpFIhv3FpUkWzP5jt0T3gRr1NeY6WKJZwYnHEJr9PtapS+EIsCT
 tB5DvvaJqFtuHFOxzn+KlGaxdSodHF7klOq7NM3AC0cX8AkWqaU=
 =8+9t
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "These patches include adding async support for the v4.2 COPY
  operation. I think Bruce is planning to send the server patches for
  the next release, but I figured we could get the client side out of
  the way now since it's been in my tree for a while. This shouldn't
  cause any problems, since the server will still respond with
  synchronous copies even if the client requests async.

  Features:
   - Add support for asynchronous server-side COPY operations

  Stable bufixes:
   - Fix an off-by-one in bl_map_stripe() (v3.17+)
   - NFSv4 client live hangs after live data migration recovery (v4.9+)
   - xprtrdma: Fix disconnect regression (v4.18+)
   - Fix locking in pnfs_generic_recover_commit_reqs (v4.14+)
   - Fix a sleep in atomic context in nfs4_callback_sequence() (v4.9+)

  Other bugfixes and cleanups:
   - Optimizations and fixes involving NFS v4.1 / pNFS layout handling
   - Optimize lseek(fd, SEEK_CUR, 0) on directories to avoid locking
   - Immediately reschedule writeback when the server replies with an
     error
   - Fix excessive attribute revalidation in nfs_execute_ok()
   - Add error checking to nfs_idmap_prepare_message()
   - Use new vm_fault_t return type
   - Return a delegation when reclaiming one that the server has
     recalled
   - Referrals should inherit proto setting from parents
   - Make rpc_auth_create_args a const
   - Improvements to rpc_iostats tracking
   - Fix a potential reference leak when there is an error processing a
     callback
   - Fix rmdir / mkdir / rename nlink accounting
   - Fix updating inode change attribute
   - Fix error handling in nfsn4_sp4_select_mode()
   - Use an appropriate work queue for direct-write completion
   - Don't busy wait if NFSv4 session draining is interrupted"

* tag 'nfs-for-4.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (54 commits)
  pNFS: Remove unwanted optimisation of layoutget
  pNFS/flexfiles: ff_layout_pg_init_read should exit on error
  pNFS: Treat RECALLCONFLICT like DELAY...
  pNFS: When updating the stateid in layoutreturn, also update the recall range
  NFSv4: Fix a sleep in atomic context in nfs4_callback_sequence()
  NFSv4: Fix locking in pnfs_generic_recover_commit_reqs
  NFSv4: Fix a typo in nfs4_init_channel_attrs()
  NFSv4: Don't busy wait if NFSv4 session draining is interrupted
  NFS recover from destination server reboot for copies
  NFS add a simple sync nfs4_proc_commit after async COPY
  NFS handle COPY ERR_OFFLOAD_NO_REQS
  NFS send OFFLOAD_CANCEL when COPY killed
  NFS export nfs4_async_handle_error
  NFS handle COPY reply CB_OFFLOAD call race
  NFS add support for asynchronous COPY
  NFS COPY xdr handle async reply
  NFS OFFLOAD_CANCEL xdr
  NFS CB_OFFLOAD xdr
  NFS: Use an appropriate work queue for direct-write completion
  NFSv4: Fix error handling in nfs4_sp4_select_mode()
  ...
2018-08-23 16:03:58 -07:00
Linus Torvalds
9157141c95 A mistake on my part caused me to tag my branch 6 commits too early,
missing Chuck's fixes for the problem with callbacks over GSS from
 multi-homed servers, and a smaller fix from Laura Abbott.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbftA8AAoJECebzXlCjuG+QPMQALieEKkX0YoqRhPz5G+RrWFy
 KgOBFAoiRcjFQD6wMt9FzD6qYEZqSJ+I2b+K5N3BkdyDDQu845iD0wK0zBGhMgLm
 7ith85nphIMbe18+5jPorqAsI9RlfBQjiSGw1MEx5dicLQQzTObHL5q+l5jcWna4
 jWS3yUKv1URpOsR1hIryw74ktSnhuH8n//zmntw8aWrCkq3hnXOZK/agtYxZ7Viv
 V3kiQsiNpL2FPRcHN7ejhLUTnRkkuD2iYKrzP/SpTT/JfdNEUXlMhKkAySogNpus
 nvR9X7hwta8Lgrt7PSB9ibFTXtCupmuICg5mbDWy6nXea2NvpB01QhnTzrlX17Eh
 Yfk/18z95b6Qs1v4m3SI8ESmyc6l5dMZozLudtHzifyCqooWZriEhCR1PlQfQ/FJ
 4cYQ8U/qiMiZIJXL7N2wpSoSaWR5bqU1rXen29Np1WEDkiv4Nf5u2fsCXzv0ZH2C
 ReWpNkbnNxsNiKpp4geBZtlcSEU1pk+1PqE0MagTdBV3iptiUHRSP4jR7qLnc0zT
 J1lCvU7Fodnt9vNSxMpt2Jd6XxQ6xtx7n6aMQAiYFnXDs+hP2hPnJVCScnYW3L6R
 2r1sHRKKeoOzCJ2thw+zu4lOwMm7WPkJPWAYfv90reWkiKoy2vG0S9P7wsNGoJuW
 fuEjB2b9pow1Ffynat6q
 =JnLK
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Chuck Lever fixed a problem with NFSv4.0 callbacks over GSS from
  multi-homed servers.

  The only new feature is a minor bit of protocol (change_attr_type)
  which the client doesn't even use yet.

  Other than that, various bugfixes and cleanup"

* tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linux: (27 commits)
  sunrpc: Add comment defining gssd upcall API keywords
  nfsd: Remove callback_cred
  nfsd: Use correct credential for NFSv4.0 callback with GSS
  sunrpc: Extract target name into svc_cred
  sunrpc: Enable the kernel to specify the hostname part of service principals
  sunrpc: Don't use stack buffer with scatterlist
  rpc: remove unneeded variable 'ret' in rdma_listen_handler
  nfsd: use true and false for boolean values
  nfsd: constify write_op[]
  fs/nfsd: Delete invalid assignment statements in nfsd4_decode_exchange_id
  NFSD: Handle full-length symlinks
  NFSD: Refactor the generic write vector fill helper
  svcrdma: Clean up Read chunk path
  svcrdma: Avoid releasing a page in svc_xprt_release()
  nfsd: Mark expected switch fall-through
  sunrpc: remove redundant variables 'checksumlen','blocksize' and 'data'
  nfsd: fix leaked file lock with nfs exported overlayfs
  nfsd: don't advertise a SCSI layout for an unsupported request_queue
  nfsd: fix corrupted reply to badly ordered compound
  nfsd: clarify check_op_ordering
  ...
2018-08-23 16:00:10 -07:00
Linus Torvalds
1290290c92 Second merge window update
- Switch SMC over to rdma_get_gid_attr and remove the compat
 
 - Fix a crash in HFI1 with some BIOS's
 
 - Fix a randconfig failure
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlt9z6kACgkQOG33FX4g
 mxqLag//Zr3RrNSmswNjl+6PrWeg88GQstAm4gQps8mR03eI8Ha5fAS6MWuq1U8+
 LGP8bbBtE6fHSCkPSnN/owReZSTcTIbgWB021vBIUKUpytmxie+ugMVXIHwm6cIM
 E3pgPsRngTrqtuRxyOhyoaMKSjRCbSdMe43SWf42/S+9SxDkmXtBzymYT119bu/A
 eqU7z/HuRqcVxrgOinhPjcQkEFdRYUaRk+g6jzaQmZl8IjKHp/d3BSzmzbE6txMt
 txP5hu/msz4xkyusX/I6ZS3do+4WMIAMNhq9bVMo5pNu2RG1eo0LLmHgjsDsFI3u
 ZTaZQj6eYWlbwWRiOU+2Frf4kH4CGAyxlFVCr4yEvfe2VvEbZsLZbYuF1iasnqrP
 3Mrbh7Esj4MfDT/QbunnoX+49X/Rzma/a+tu6hqAhnGjvYsaDyy8WSNwHMs65f7m
 sii58arik9exJHVKMXZ5FJJydBYBcPt5IlkUdoTTZ1eLkoc2B9lG6lKZbWxUYNPx
 uS6XOyR4fcsmLfKfEvQuLZznWi3Jo4R17QKlL5ulqHrZUJV4PZ4Eiu5izScZ4ci/
 xcPGz1a4MT7Zg3V611qGSGt7zYDdD5+R+k/sDIOleCFIpueFzzh/I1oLkSIoXFIx
 DmwqCdRGeB5F7ZhfArFTezgiaMe55SPJKY5PAiThxzoostUGOfw=
 =6z1N
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull more rdma updates from Jason Gunthorpe:
 "This is the SMC cleanup promised, a randconfig regression fix, and
  kernel oops fix.

  Summary:

   - Switch SMC over to rdma_get_gid_attr and remove the compat

   - Fix a crash in HFI1 with some BIOS's

   - Fix a randconfig failure"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/ucm: fix UCM link error
  IB/hfi1: Invalid NUMA node information can cause a divide by zero
  RDMA/smc: Replace ib_query_gid with rdma_get_gid_attr
2018-08-23 15:34:48 -07:00
Martin Willi
c1dc291205 netfilter: xt_cluster: add dependency on conntrack module
The cluster match requires conntrack for matching packets. If the
netns does not have conntrack hooks registered, the match does not
work at all.

Implicitly load the conntrack hook for the family, exactly as many
other extensions do. This ensures that the match works even if the
hooks have not been registered by other means.

Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-23 20:26:53 +02:00