Commit Graph

56932 Commits

Author SHA1 Message Date
Chris Packham
d12cffe932 tipc: ensure head->lock is initialised
tipc_named_node_up() creates a skb list. It passes the list to
tipc_node_xmit() which has some code paths that can call
skb_queue_purge() which relies on the list->lock being initialised.

The spin_lock is only needed if the messages end up on the receive path
but when the list is created in tipc_named_node_up() we don't
necessarily know if it is going to end up there.

Once all the skb list users are updated in tipc it will then be possible
to update them to use the unlocked variants of the skb list functions
and initialise the lock when we know the message will follow the receive
path.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-12 15:34:26 -07:00
Vlad Buslov
c1a970d06f net: sched: Fix NULL-pointer dereference in tc_indr_block_ing_cmd()
After recent refactoring of block offlads infrastructure, indr_dev->block
pointer is dereferenced before it is verified to be non-NULL. Example stack
trace where this behavior leads to NULL-pointer dereference error when
creating vxlan dev on system with mlx5 NIC with offloads enabled:

[ 1157.852938] ==================================================================
[ 1157.866877] BUG: KASAN: null-ptr-deref in tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.880877] Read of size 4 at addr 0000000000000090 by task ip/3829
[ 1157.901637] CPU: 22 PID: 3829 Comm: ip Not tainted 5.2.0-rc6+ #488
[ 1157.914438] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 1157.929031] Call Trace:
[ 1157.938318]  dump_stack+0x9a/0xeb
[ 1157.948362]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.960262]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.972082]  __kasan_report+0x176/0x192
[ 1157.982513]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.994348]  kasan_report+0xe/0x20
[ 1158.004324]  tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1158.015950]  ? tcf_block_setup+0x430/0x430
[ 1158.026558]  ? kasan_unpoison_shadow+0x30/0x40
[ 1158.037464]  __tc_indr_block_cb_register+0x5f5/0xf20
[ 1158.049288]  ? mlx5e_rep_indr_tc_block_unbind+0xa0/0xa0 [mlx5_core]
[ 1158.062344]  ? tc_indr_block_dev_put.part.47+0x5c0/0x5c0
[ 1158.074498]  ? rdma_roce_rescan_device+0x20/0x20 [ib_core]
[ 1158.086580]  ? br_device_event+0x98/0x480 [bridge]
[ 1158.097870]  ? strcmp+0x30/0x50
[ 1158.107578]  mlx5e_nic_rep_netdevice_event+0xdd/0x180 [mlx5_core]
[ 1158.120212]  notifier_call_chain+0x6d/0xa0
[ 1158.130753]  register_netdevice+0x6fc/0x7e0
[ 1158.141322]  ? netdev_change_features+0xa0/0xa0
[ 1158.152218]  ? vxlan_config_apply+0x210/0x310 [vxlan]
[ 1158.163593]  __vxlan_dev_create+0x2ad/0x520 [vxlan]
[ 1158.174770]  ? vxlan_changelink+0x490/0x490 [vxlan]
[ 1158.185870]  ? rcu_read_unlock+0x60/0x60 [vxlan]
[ 1158.196798]  vxlan_newlink+0x99/0xf0 [vxlan]
[ 1158.207303]  ? __vxlan_dev_create+0x520/0x520 [vxlan]
[ 1158.218601]  ? rtnl_create_link+0x3d0/0x450
[ 1158.228900]  __rtnl_newlink+0x8a7/0xb00
[ 1158.238701]  ? stack_access_ok+0x35/0x80
[ 1158.248450]  ? rtnl_link_unregister+0x1a0/0x1a0
[ 1158.258735]  ? find_held_lock+0x6d/0xd0
[ 1158.268379]  ? is_bpf_text_address+0x67/0xf0
[ 1158.278330]  ? lock_acquire+0xc1/0x1f0
[ 1158.287686]  ? is_bpf_text_address+0x5/0xf0
[ 1158.297449]  ? is_bpf_text_address+0x86/0xf0
[ 1158.307310]  ? kernel_text_address+0xec/0x100
[ 1158.317155]  ? arch_stack_walk+0x92/0xe0
[ 1158.326497]  ? __kernel_text_address+0xe/0x30
[ 1158.336213]  ? unwind_get_return_address+0x2f/0x50
[ 1158.346267]  ? create_prof_cpu_mask+0x20/0x20
[ 1158.355936]  ? arch_stack_walk+0x92/0xe0
[ 1158.365117]  ? stack_trace_save+0x8a/0xb0
[ 1158.374272]  ? stack_trace_consume_entry+0x80/0x80
[ 1158.384226]  ? match_held_lock+0x33/0x210
[ 1158.393216]  ? kasan_unpoison_shadow+0x30/0x40
[ 1158.402593]  rtnl_newlink+0x53/0x80
[ 1158.410925]  rtnetlink_rcv_msg+0x3a5/0x600
[ 1158.419777]  ? validate_linkmsg+0x400/0x400
[ 1158.428620]  ? find_held_lock+0x6d/0xd0
[ 1158.437117]  ? match_held_lock+0x1b/0x210
[ 1158.445760]  ? validate_linkmsg+0x400/0x400
[ 1158.454642]  netlink_rcv_skb+0xc7/0x1f0
[ 1158.463150]  ? netlink_ack+0x470/0x470
[ 1158.471538]  ? netlink_deliver_tap+0x1f3/0x5a0
[ 1158.480607]  netlink_unicast+0x2ae/0x350
[ 1158.489099]  ? netlink_attachskb+0x340/0x340
[ 1158.497935]  ? _copy_from_iter_full+0xde/0x3b0
[ 1158.506945]  ? __virt_addr_valid+0xb6/0xf0
[ 1158.515578]  ? __check_object_size+0x159/0x240
[ 1158.524515]  netlink_sendmsg+0x4d3/0x630
[ 1158.532879]  ? netlink_unicast+0x350/0x350
[ 1158.541400]  ? netlink_unicast+0x350/0x350
[ 1158.549805]  sock_sendmsg+0x94/0xa0
[ 1158.557561]  ___sys_sendmsg+0x49d/0x570
[ 1158.565625]  ? copy_msghdr_from_user+0x210/0x210
[ 1158.574457]  ? __fput+0x1e2/0x330
[ 1158.581948]  ? __kasan_slab_free+0x130/0x180
[ 1158.590407]  ? kmem_cache_free+0xb6/0x2d0
[ 1158.598574]  ? mark_lock+0xc7/0x790
[ 1158.606177]  ? task_work_run+0xcf/0x100
[ 1158.614165]  ? exit_to_usermode_loop+0x102/0x110
[ 1158.622954]  ? __lock_acquire+0x963/0x1ee0
[ 1158.631199]  ? lockdep_hardirqs_on+0x260/0x260
[ 1158.639777]  ? match_held_lock+0x1b/0x210
[ 1158.647918]  ? lockdep_hardirqs_on+0x260/0x260
[ 1158.656501]  ? match_held_lock+0x1b/0x210
[ 1158.664643]  ? __fget_light+0xa6/0xe0
[ 1158.672423]  ? __sys_sendmsg+0xd2/0x150
[ 1158.680334]  __sys_sendmsg+0xd2/0x150
[ 1158.688063]  ? __ia32_sys_shutdown+0x30/0x30
[ 1158.696435]  ? lock_downgrade+0x2e0/0x2e0
[ 1158.704541]  ? mark_held_locks+0x1a/0x90
[ 1158.712611]  ? mark_held_locks+0x1a/0x90
[ 1158.720619]  ? do_syscall_64+0x1e/0x2c0
[ 1158.728530]  do_syscall_64+0x78/0x2c0
[ 1158.736254]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1158.745414] RIP: 0033:0x7f62d505cb87
[ 1158.753070] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00[87/1817]
 48 89 f3 48
[ 1158.780924] RSP: 002b:00007fffd9832268 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1158.793204] RAX: ffffffffffffffda RBX: 000000005d26048f RCX: 00007f62d505cb87
[ 1158.805111] RDX: 0000000000000000 RSI: 00007fffd98322d0 RDI: 0000000000000003
[ 1158.817055] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[ 1158.828987] R10: 00007f62d50ce260 R11: 0000000000000246 R12: 0000000000000001
[ 1158.840909] R13: 000000000067e540 R14: 0000000000000000 R15: 000000000067ed20
[ 1158.852873] ==================================================================

Introduce new function tcf_block_non_null_shared() that verifies block
pointer before dereferencing it to obtain index. Use the function in
tc_indr_block_ing_cmd() to prevent NULL pointer dereference.

Fixes: 955bcb6ea0 ("drivers: net: use flow block API")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-12 15:21:53 -07:00
Taehee Yoo
6b660c4177 net: openvswitch: do not update max_headroom if new headroom is equal to old headroom
When a vport is deleted, the maximum headroom size would be changed.
If the vport which has the largest headroom is deleted,
the new max_headroom would be set.
But, if the new headroom size is equal to the old headroom size,
updating routine is unnecessary.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-12 15:16:58 -07:00
David S. Miller
3194d6adfe Merge branch 'net/rds-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux
Santosh Shilimkar says:

====================
rds fixes

Few rds fixes which makes rds rdma transport reliably working on mainline

First two fixes are applicable to v4.11+ stable versions and last
three patches applies to only v5.1 stable and current mainline.

Patchset is re-based against 'net' and also available on below tree
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-11 15:27:20 -07:00
Eric Dumazet
d44e3fa5d7 ipv6: fix static key imbalance in fl_create()
fl_create() should call static_branch_deferred_inc() only in
case of success.

Also we should not call fl_free() in error path, as this could
cause a static key imbalance.

jump label: negative count!
WARNING: CPU: 0 PID: 15907 at kernel/jump_label.c:221 static_key_slow_try_dec kernel/jump_label.c:221 [inline]
WARNING: CPU: 0 PID: 15907 at kernel/jump_label.c:221 static_key_slow_try_dec+0x1ab/0x1d0 kernel/jump_label.c:206
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 15907 Comm: syz-executor.2 Not tainted 5.2.0-rc6+ #62
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 panic+0x2cb/0x744 kernel/panic.c:219
 __warn.cold+0x20/0x4d kernel/panic.c:576
 report_bug+0x263/0x2b0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:179 [inline]
 fixup_bug arch/x86/kernel/traps.c:174 [inline]
 do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:986
RIP: 0010:static_key_slow_try_dec kernel/jump_label.c:221 [inline]
RIP: 0010:static_key_slow_try_dec+0x1ab/0x1d0 kernel/jump_label.c:206
Code: c0 e8 e9 3e e5 ff 83 fb 01 0f 85 32 ff ff ff e8 5b 3d e5 ff 45 31 ff eb a0 e8 51 3d e5 ff 48 c7 c7 40 99 92 87 e8 13 75 b7 ff <0f> 0b eb 8b 4c 89 e7 e8 a9 c0 1e 00 e9 de fe ff ff e8 bf 6d b7 ff
RSP: 0018:ffff88805f9c7450 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000000
RDX: 000000000000e3e1 RSI: ffffffff815adb06 RDI: ffffed100bf38e7c
RBP: ffff88805f9c74e0 R08: ffff88806acf0700 R09: ffffed1015d060a9
R10: ffffed1015d060a8 R11: ffff8880ae830547 R12: ffffffff89832ce0
R13: ffff88805f9c74b8 R14: 1ffff1100bf38e8b R15: 00000000ffffff01
 __static_key_slow_dec_deferred+0x65/0x110 kernel/jump_label.c:272
 fl_free+0xa9/0xe0 net/ipv6/ip6_flowlabel.c:121
 fl_create+0x6af/0x9f0 net/ipv6/ip6_flowlabel.c:457
 ipv6_flowlabel_opt+0x80e/0x2730 net/ipv6/ip6_flowlabel.c:624
 do_ipv6_setsockopt.isra.0+0x2119/0x4100 net/ipv6/ipv6_sockglue.c:825
 ipv6_setsockopt+0xf6/0x170 net/ipv6/ipv6_sockglue.c:944
 tcp_setsockopt net/ipv4/tcp.c:3131 [inline]
 tcp_setsockopt+0x8f/0xe0 net/ipv4/tcp.c:3125
 sock_common_setsockopt+0x94/0xd0 net/core/sock.c:3130
 __sys_setsockopt+0x253/0x4b0 net/socket.c:2080
 __do_sys_setsockopt net/socket.c:2096 [inline]
 __se_sys_setsockopt net/socket.c:2093 [inline]
 __x64_sys_setsockopt+0xbe/0x150 net/socket.c:2093
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4597c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f2670556c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000004597c9
RDX: 0000000000000020 RSI: 0000000000000029 RDI: 0000000000000003
RBP: 000000000075bfc8 R08: 000000000000fdf7 R09: 0000000000000000
R10: 0000000020000000 R11: 0000000000000246 R12: 00007f26705576d4
R13: 00000000004cec00 R14: 00000000004dd520 R15: 00000000ffffffff
Kernel Offset: disabled
Rebooting in 86400 seconds..

Fixes: 59c820b231 ("ipv6: elide flowlabel check if no exclusive leases exist")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-11 14:43:25 -07:00
Eric Dumazet
8975a3abc3 ipv6: fix potential crash in ip6_datagram_dst_update()
Willem forgot to change one of the calls to fl6_sock_lookup(),
which can now return an error or NULL.

syzbot reported :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 31763 Comm: syz-executor.0 Not tainted 5.2.0-rc6+ #63
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:ip6_datagram_dst_update+0x559/0xc30 net/ipv6/datagram.c:83
Code: 00 00 e8 ea 29 3f fb 4d 85 f6 0f 84 96 04 00 00 e8 dc 29 3f fb 49 8d 7e 20 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 16 06 00 00 4d 8b 6e 20 e8 b4 29 3f fb 4c 89 ee
RSP: 0018:ffff88809ba97ae0 EFLAGS: 00010207
RAX: dffffc0000000000 RBX: ffff8880a81254b0 RCX: ffffc90008118000
RDX: 0000000000000003 RSI: ffffffff86319a84 RDI: 000000000000001e
RBP: ffff88809ba97c10 R08: ffff888065e9e700 R09: ffffed1015d26c80
R10: ffffed1015d26c7f R11: ffff8880ae9363fb R12: ffff8880a8124f40
R13: 0000000000000001 R14: fffffffffffffffe R15: ffff88809ba97b40
FS:  00007f38e606a700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000202c0140 CR3: 00000000a026a000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __ip6_datagram_connect+0x5e9/0x1390 net/ipv6/datagram.c:246
 ip6_datagram_connect+0x30/0x50 net/ipv6/datagram.c:269
 ip6_datagram_connect_v6_only+0x69/0x90 net/ipv6/datagram.c:281
 inet_dgram_connect+0x14a/0x2d0 net/ipv4/af_inet.c:571
 __sys_connect+0x264/0x330 net/socket.c:1824
 __do_sys_connect net/socket.c:1835 [inline]
 __se_sys_connect net/socket.c:1832 [inline]
 __x64_sys_connect+0x73/0xb0 net/socket.c:1832
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4597c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f38e6069c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004597c9
RDX: 000000000000001c RSI: 0000000020000040 RDI: 0000000000000003
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f38e606a6d4
R13: 00000000004bfd07 R14: 00000000004d1838 R15: 00000000ffffffff
Modules linked in:
RIP: 0010:ip6_datagram_dst_update+0x559/0xc30 net/ipv6/datagram.c:83
Code: 00 00 e8 ea 29 3f fb 4d 85 f6 0f 84 96 04 00 00 e8 dc 29 3f fb 49 8d 7e 20 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 16 06 00 00 4d 8b 6e 20 e8 b4 29 3f fb 4c 89 ee

Fixes: 59c820b231 ("ipv6: elide flowlabel check if no exclusive leases exist")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-11 14:43:25 -07:00
Eric Dumazet
052e0690f1 ipv6: tcp: fix flowlabels reflection for RST packets
In 323a53c412 ("ipv6: tcp: enable flowlabel reflection in some RST packets")
and 50a8accf10 ("ipv6: tcp: send consistent flowlabel in TIME_WAIT state")
we took care of IPv6 flowlabel reflections for two cases.

This patch takes care of the remaining case, when the RST packet
is sent on behalf of a 'full' socket.

In Marek use case, this was a socket in TCP_CLOSE state.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Marek Majkowski <marek@cloudflare.com>
Tested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-11 14:43:25 -07:00
yangxingwu
416e8126a2 ipv6: Use ipv6_authlen for len
The length of AH header is computed manually as (hp->hdrlen+2)<<2.
However, in include/linux/ipv6.h, a macro named ipv6_authlen is
already defined for exactly the same job. This commit replaces
the manual computation code with the macro.

Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-11 14:43:25 -07:00
Cong Wang
311633b604 hsr: switch ->dellink() to ->ndo_uninit()
Switching from ->priv_destructor to dellink() has an unexpected
consequence: existing RCU readers, that is, hsr_port_get_hsr()
callers, may still be able to read the port list.

Instead of checking the return value of each hsr_port_get_hsr(),
we can just move it to ->ndo_uninit() which is called after
device unregister and synchronize_net(), and we still have RTNL
lock there.

Fixes: b9a1e62740 ("hsr: implement dellink to clean up resources")
Fixes: edf070a0fb ("hsr: fix a NULL pointer deref in hsr_dev_xmit()")
Reported-by: syzbot+097ef84cdc95843fbaa8@syzkaller.appspotmail.com
Cc: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-11 14:37:45 -07:00
Linus Torvalds
237f83dfbe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Some highlights from this development cycle:

   1) Big refactoring of ipv6 route and neigh handling to support
      nexthop objects configurable as units from userspace. From David
      Ahern.

   2) Convert explored_states in BPF verifier into a hash table,
      significantly decreased state held for programs with bpf2bpf
      calls, from Alexei Starovoitov.

   3) Implement bpf_send_signal() helper, from Yonghong Song.

   4) Various classifier enhancements to mvpp2 driver, from Maxime
      Chevallier.

   5) Add aRFS support to hns3 driver, from Jian Shen.

   6) Fix use after free in inet frags by allocating fqdirs dynamically
      and reworking how rhashtable dismantle occurs, from Eric Dumazet.

   7) Add act_ctinfo packet classifier action, from Kevin
      Darbyshire-Bryant.

   8) Add TFO key backup infrastructure, from Jason Baron.

   9) Remove several old and unused ISDN drivers, from Arnd Bergmann.

  10) Add devlink notifications for flash update status to mlxsw driver,
      from Jiri Pirko.

  11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski.

  12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes.

  13) Various enhancements to ipv6 flow label handling, from Eric
      Dumazet and Willem de Bruijn.

  14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van
      der Merwe, and others.

  15) Various improvements to axienet driver including converting it to
      phylink, from Robert Hancock.

  16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean.

  17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana
      Radulescu.

  18) Add devlink health reporting to mlx5, from Moshe Shemesh.

  19) Convert stmmac over to phylink, from Jose Abreu.

  20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from
      Shalom Toledo.

  21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera.

  22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel.

  23) Track spill/fill of constants in BPF verifier, from Alexei
      Starovoitov.

  24) Support bounded loops in BPF, from Alexei Starovoitov.

  25) Various page_pool API fixes and improvements, from Jesper Dangaard
      Brouer.

  26) Just like ipv4, support ref-countless ipv6 route handling. From
      Wei Wang.

  27) Support VLAN offloading in aquantia driver, from Igor Russkikh.

  28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy.

  29) Add flower GRE encap/decap support to nfp driver, from Pieter
      Jansen van Vuuren.

  30) Protect against stack overflow when using act_mirred, from John
      Hurley.

  31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen.

  32) Use page_pool API in netsec driver, Ilias Apalodimas.

  33) Add Google gve network driver, from Catherine Sullivan.

  34) More indirect call avoidance, from Paolo Abeni.

  35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan.

  36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek.

  37) Add MPLS manipulation actions to TC, from John Hurley.

  38) Add sending a packet to connection tracking from TC actions, and
      then allow flower classifier matching on conntrack state. From
      Paul Blakey.

  39) Netfilter hw offload support, from Pablo Neira Ayuso"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2080 commits)
  net/mlx5e: Return in default case statement in tx_post_resync_params
  mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync().
  net: dsa: add support for BRIDGE_MROUTER attribute
  pkt_sched: Include const.h
  net: netsec: remove static declaration for netsec_set_tx_de()
  net: netsec: remove superfluous if statement
  netfilter: nf_tables: add hardware offload support
  net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload
  net: flow_offload: add flow_block_cb_is_busy() and use it
  net: sched: remove tcf block API
  drivers: net: use flow block API
  net: sched: use flow block API
  net: flow_offload: add flow_block_cb_{priv, incref, decref}()
  net: flow_offload: add list handling functions
  net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()
  net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*
  net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND
  net: flow_offload: add flow_block_cb_setup_simple()
  net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC
  net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC
  ...
2019-07-11 10:55:49 -07:00
Linus Torvalds
d2b6b4c832 Highlights:
- Add a new /proc/fs/nfsd/clients/ directory which exposes some
   long-requested information about NFSv4 clients (like open files) and
   allows forced revocation of client state.
 
 - Replace the global duplicate reply cache by a cache per network
   namespace; previously, a request in one network namespace could
   incorrectly match an entry from another, though we haven't seen this
   in production.  This is the last remaining container bug that I'm
   aware of; at this point you should be able to run separate nfsd's in
   each network namespace, each with their own set of exports, and
   everything should work.
 
 - Cleanup and modify lock code to show the pid of lockd as the owner of
   NLM locks.  This is the correct version of the bugfix originally
   attempted in b8eee0e90f "lockd: Show pid of lockd for remote locks".
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAl0mX+YVHGJmaWVsZHNA
 ZmllbGRzZXMub3JnAAoJECebzXlCjuG+EoYQAIbNV7tqpnWRk19ulxveif9zRLMV
 ImW99rNhzjfLoIBBTclncCrU1+b2VHqlVGYvml+rdsl+fUCESj2m9/P+D70WHDsl
 tk2NJoXkSe1tW4G3YltRfSNNQIsUsEGRa88/4gAT0vYA2OCFDpYzrMleENISQFTp
 QQ+p1ct5tofTZbelx5KqdFnLRnQlUeykJbW68/YKIdtNF+nhq07LlvpVKjy4f3MB
 rK93qn9YUtnNKldkrP2tWjiPAnzJFiX9XFRPLo2JCv13G28XhhuNp2PmWqsVoY+/
 8YMfXY9C028YbrHG9ebwH197XcY1p6ROBZhRxGczEmiSrAHLap8rNGjyYk6+4eO9
 5HAFUQJcFEA1NUD84kpUKNZs9PIi818IgI5FhuJrcCKt8OAeyNJaOo0YU3EhzND2
 /iPt+FCBlJwEwXI9WSjZiyW3OFKuvCZZk99iN2s33X0dNqMSrkQVe4AmHm7vYlzF
 KD0pthVaOwAA9sHua5MSTpi5LHH/IBdWU49NoCgzK277w8xi05oI6ZkYFJQ9hncV
 PIWtmmW1b3uHF95s6Ko7mSU7GLEWB9Ux6B1sfOVNgMETK4i2z0ezUDJ+Hp9RSDcJ
 iHrU3kaGZ60uq3HPwunlhOYuSDt5sew5GIpNdheGoLOjuhySK7ZBwFuvupqZKC7H
 4nxqlrHVI4B8FOAH
 =pAAs
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Highlights:

   - Add a new /proc/fs/nfsd/clients/ directory which exposes some
     long-requested information about NFSv4 clients (like open files)
     and allows forced revocation of client state.

   - Replace the global duplicate reply cache by a cache per network
     namespace; previously, a request in one network namespace could
     incorrectly match an entry from another, though we haven't seen
     this in production. This is the last remaining container bug that
     I'm aware of; at this point you should be able to run separate
     nfsd's in each network namespace, each with their own set of
     exports, and everything should work.

   - Cleanup and modify lock code to show the pid of lockd as the owner
     of NLM locks. This is the correct version of the bugfix originally
     attempted in b8eee0e90f ("lockd: Show pid of lockd for remote
     locks")"

* tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linux: (34 commits)
  nfsd: Make __get_nfsdfs_client() static
  nfsd: Make two functions static
  nfsd: Fix misuse of strlcpy
  sunrpc/cache: remove the exporting of cache_seq_next
  nfsd: decode implementation id
  nfsd: create xdr_netobj_dup helper
  nfsd: allow forced expiration of NFSv4 clients
  nfsd: create get_nfsdfs_clp helper
  nfsd4: show layout stateids
  nfsd: show lock and deleg stateids
  nfsd4: add file to display list of client's opens
  nfsd: add more information to client info file
  nfsd: escape high characters in binary data
  nfsd: copy client's address including port number to cl_addr
  nfsd4: add a client info file
  nfsd: make client/ directory names small ints
  nfsd: add nfsd/clients directory
  nfsd4: use reference count to free client
  nfsd: rename cl_refcount
  nfsd: persist nfsd filesystem across mounts
  ...
2019-07-10 21:22:43 -07:00
Linus Torvalds
e6983afd92 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl0kWgwACgkQnJ2qBz9k
 QNkkdggA7bdy6xZRPdumZMxtASGDs1JJ4diNs+apgyc6wUfsT1lCE2ap20EdzfzK
 drAvlJt1vYEW+6apOzUXJ0qWXMVRzy4XRl+jVMO9GW6BoY4OyJQ86AQZlEv1zZ4n
 vxeYnlbxA7JyfkWgup0ZSb5EKRSO1eSxZKEZou0wu2jRCRr/E5RyjPQHXaiE5ihc
 7ilEtTI3Qg3nnAK30F0Iy0X3lGqgXj+rlJ0TgR8BBEDllct2wV16vvMl/Sy+BXip
 5sSWjSy8zntMnkSN8yH/oJN0D+fqmCsnYafwqTpPek8izvEz4xpjshbWTDnPm0HM
 eiMC1U3ZJoD3Z4/wxRZ91m60VYgJBA==
 =SVKR
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:
 "This contains cleanups of the fsnotify name removal hook and also a
  patch to disable fanotify permission events for 'proc' filesystem"

* tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: get rid of fsnotify_nameremove()
  fsnotify: move fsnotify_nameremove() hook out of d_delete()
  configfs: call fsnotify_rmdir() hook
  debugfs: call fsnotify_{unlink,rmdir}() hooks
  debugfs: simplify __debugfs_remove_file()
  devpts: call fsnotify_unlink() hook
  tracefs: call fsnotify_{unlink,rmdir}() hooks
  rpc_pipefs: call fsnotify_{unlink,rmdir}() hooks
  btrfs: call fsnotify_rmdir() hook
  fsnotify: add empty fsnotify_{unlink,rmdir}() hooks
  fanotify: Disallow permission events for proc filesystem
2019-07-10 20:09:17 -07:00
Linus Torvalds
028db3e290 Revert "Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs"
This reverts merge 0f75ef6a9c (and thus
effectively commits

   7a1ade8475 ("keys: Provide KEYCTL_GRANT_PERMISSION")
   2e12256b9a ("keys: Replace uid/gid/perm permissions checking with an ACL")

that the merge brought in).

It turns out that it breaks booting with an encrypted volume, and Eric
biggers reports that it also breaks the fscrypt tests [1] and loading of
in-kernel X.509 certificates [2].

The root cause of all the breakage is likely the same, but David Howells
is off email so rather than try to work it out it's getting reverted in
order to not impact the rest of the merge window.

 [1] https://lore.kernel.org/lkml/20190710011559.GA7973@sol.localdomain/
 [2] https://lore.kernel.org/lkml/20190710013225.GB7973@sol.localdomain/

Link: https://lore.kernel.org/lkml/CAHk-=wjxoeMJfeBahnWH=9zShKp2bsVy527vo3_y8HfOdhwAAw@mail.gmail.com/
Reported-by: Eric Biggers <ebiggers@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-10 18:43:43 -07:00
Santosh Shilimkar
dc205a8d34 rds: avoid version downgrade to legitimate newer peer connections
Connections with legitimate tos values can get into usual connection
race. It can result in consumer reject. We don't want tos value or
protocol version to be demoted for such connections otherwise
piers would end up different tos values which can results in
no connection. Example a peer initiated connection with say
tos 8 while usual connection racing can get downgraded to tos 0
which is not desirable.

Patch fixes above issue introduced by commit
commit d021fabf52 ("rds: rdma: add consumer reject")

Reported-by: Yanjun Zhu <yanjun.zhu@oracle.com>
Tested-by: Yanjun Zhu <yanjun.zhu@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2019-07-09 21:45:43 -07:00
Gerd Rausch
fc640d4cbe rds: Return proper "tos" value to user-space
The proper "tos" value needs to be returned
to user-space (sockopt RDS_INFO_CONNECTIONS).

Fixes: 3eb450367d ("rds: add type of service(tos) infrastructure")
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2019-07-09 21:45:42 -07:00
Gerd Rausch
8c6166cfc9 rds: Accept peer connection reject messages due to incompatible version
Prior to
commit d021fabf52 ("rds: rdma: add consumer reject")

function "rds_rdma_cm_event_handler_cmn" would always honor a rejected
connection attempt by issuing a "rds_conn_drop".

The commit mentioned above added a "break", eliminating
the "fallthrough" case and made the "rds_conn_drop" rather conditional:

Now it only happens if a "consumer defined" reject (i.e. "rdma_reject")
carries an integer-value of "1" inside "private_data":

  if (!conn)
    break;
    err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
    if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
      pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
              &conn->c_laddr, &conn->c_faddr);
              conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
              rds_conn_drop(conn);
    }
    rdsdebug("Connection rejected: %s\n",
             rdma_reject_msg(cm_id, event->status));
    break;
    /* FALLTHROUGH */
A number of issues are worth mentioning here:
   #1) Previous versions of the RDS code simply rejected a connection
       by calling "rdma_reject(cm_id, NULL, 0);"
       So the value of the payload in "private_data" will not be "1",
       but "0".

   #2) Now the code has become dependent on host byte order and sizing.
       If one peer is big-endian, the other is little-endian,
       or there's a difference in sizeof(int) (e.g. ILP64 vs LP64),
       the *err check does not work as intended.

   #3) There is no check for "len" to see if the data behind *err is even valid.
       Luckily, it appears that the "rdma_reject(cm_id, NULL, 0)" will always
       carry 148 bytes of zeroized payload.
       But that should probably not be relied upon here.

   #4) With the added "break;",
       we might as well drop the misleading "/* FALLTHROUGH */" comment.

This commit does _not_ address issue #2, as the sender would have to
agree on a byte order as well.

Here is the sequence of messages in this observed error-scenario:
   Host-A is pre-QoS changes (excluding the commit mentioned above)
   Host-B is post-QoS changes (including the commit mentioned above)

   #1 Host-B
      issues a connection request via function "rds_conn_path_transition"
      connection state transitions to "RDS_CONN_CONNECTING"

   #2 Host-A
      rejects the incompatible connection request (from #1)
      It does so by calling "rdma_reject(cm_id, NULL, 0);"

   #3 Host-B
      receives an "RDMA_CM_EVENT_REJECTED" event (from #2)
      But since the code is changed in the way described above,
      it won't drop the connection here, simply because "*err == 0".

   #4 Host-A
      issues a connection request

   #5 Host-B
      receives an "RDMA_CM_EVENT_CONNECT_REQUEST" event
      and ends up calling "rds_ib_cm_handle_connect".
      But since the state is already in "RDS_CONN_CONNECTING"
      (as of #1) it will end up issuing a "rdma_reject" without
      dropping the connection:
         if (rds_conn_state(conn) == RDS_CONN_CONNECTING) {
             /* Wait and see - our connect may still be succeeding */
             rds_ib_stats_inc(s_ib_connect_raced);
         }
         goto out;

   #6 Host-A
      receives an "RDMA_CM_EVENT_REJECTED" event (from #5),
      drops the connection and tries again (goto #4) until it gives up.

Tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2019-07-09 21:45:42 -07:00
Gerd Rausch
a552078847 Revert "RDS: IB: split the mr registration and invalidation path"
This reverts commit 5601245931.

RDS kept spinning inside function "rds_ib_post_reg_frmr", waiting for
"i_fastreg_wrs" to become incremented:
         while (atomic_dec_return(&ibmr->ic->i_fastreg_wrs) <= 0) {
                 atomic_inc(&ibmr->ic->i_fastreg_wrs);
                 cpu_relax();
         }

Looking at the original commit:

commit 5601245931 ("RDS: IB: split the mr registration and
invalidation path")

In there, the "rds_ib_mr_cqe_handler" was changed in the following
way:

 void rds_ib_mr_cqe_handler(struct
 rds_ib_connection *ic,
 struct ib_wc *wc)
        if (frmr->fr_inv) {
                  frmr->fr_state = FRMR_IS_FREE;
                  frmr->fr_inv = false;
                atomic_inc(&ic->i_fastreg_wrs);
        } else {
                atomic_inc(&ic->i_fastunreg_wrs);
        }

It looks like it's got it exactly backwards:

Function "rds_ib_post_reg_frmr" keeps track of the outstanding
requests via "i_fastreg_wrs".

Function "rds_ib_post_inv" keeps track of the outstanding requests
via "i_fastunreg_wrs" (post original commit). It also sets:
         frmr->fr_inv = true;

However the completion handler "rds_ib_mr_cqe_handler" adjusts
"i_fastreg_wrs" when "fr_inv" had been true, and adjusts
"i_fastunreg_wrs" otherwise.

The original commit was done in the name of performance:
to remove the performance bottleneck

No performance benefit could be observed with a fixed-up version
of the original commit measured between two Oracle X7 servers,
both equipped with Mellanox Connect-X5 HCAs.

The prudent course of action is to revert this commit.

Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2019-07-09 21:45:41 -07:00
Santosh Shilimkar
616d37a070 rds: fix reordering with composite message notification
RDS composite message(rdma + control) user notification needs to be
triggered once the full message is delivered and such a fix was
added as part of commit 941f8d55f6 ("RDS: RDMA: Fix the composite
message user notification"). But rds_send_remove_from_sock is missing
data part notify check and hence at times the user don't get
notification which isn't desirable.

One way is to fix the rds_send_remove_from_sock to check of that case
but considering the ordering complexity with completion handler and
rdma + control messages are always dispatched back to back in same send
context, just delaying the signaled completion on rmda work request also
gets the desired behaviour. i.e Notifying application only after
RDMA + control message send completes. So patch updates the earlier
fix with this approach. The delay signaling completions of rdma op
till the control message send completes fix was done by Venkat
Venkatsubra in downstream kernel.

Reviewed-and-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2019-07-09 21:45:41 -07:00
Vivien Didelot
08cc83cc7f net: dsa: add support for BRIDGE_MROUTER attribute
This patch adds support for enabling or disabling the flooding of
unknown multicast traffic on the CPU ports, depending on the value
of the switchdev SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute.

The current behavior is kept unchanged but a user can now prevent
the CPU conduit to be flooded with a lot of unregistered traffic that
the network stack needs to filter in software with e.g.:

    echo 0 > /sys/class/net/br0/multicast_router

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:49:34 -07:00
Pablo Neira Ayuso
c9626a2cbd netfilter: nf_tables: add hardware offload support
This patch adds hardware offload support for nftables through the
existing netdev_ops->ndo_setup_tc() interface, the TC_SETUP_CLSFLOWER
classifier and the flow rule API. This hardware offload support is
available for the NFPROTO_NETDEV family and the ingress hook.

Each nftables expression has a new ->offload interface, that is used to
populate the flow rule object that is attached to the transaction
object.

There is a new per-table NFT_TABLE_F_HW flag, that is set on to offload
an entire table, including all of its chains.

This patch supports for basic metadata (layer 3 and 4 protocol numbers),
5-tuple payload matching and the accept/drop actions; this also includes
basechain hardware offload only.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:51 -07:00
Pablo Neira Ayuso
f9e30088d2 net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload
And any other existing fields in this structure that refer to tc.
Specifically:

* tc_cls_flower_offload_flow_rule() to flow_cls_offload_flow_rule().
* TC_CLSFLOWER_* to FLOW_CLS_*.
* tc_cls_common_offload to tc_cls_common_offload.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:51 -07:00
Pablo Neira Ayuso
0d4fd02e71 net: flow_offload: add flow_block_cb_is_busy() and use it
This patch adds a function to check if flow block callback is already in
use.  Call this new function from flow_block_cb_setup_simple() and from
drivers.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
722d36e6e2 net: sched: remove tcf block API
Unused, now replaced by flow block API.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
955bcb6ea0 drivers: net: use flow block API
This patch updates flow_block_cb_setup_simple() to use the flow block API.
Several drivers are also adjusted to use it.

This patch introduces the per-driver list of flow blocks to account for
blocks that are already in use.

Remove tc_block_offload alias.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
59094b1e50 net: sched: use flow block API
This patch adds tcf_block_setup() which uses the flow block API.

This infrastructure takes the flow block callbacks coming from the
driver and register/unregister to/from the cls_api core.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
67bd0d5ea7 net: flow_offload: add flow_block_cb_{priv, incref, decref}()
This patch completes the flow block API to introduce:

* flow_block_cb_priv() to access callback private data.
* flow_block_cb_incref() to bump reference counter on this flow block.
* flow_block_cb_decref() to decrement the reference counter.

These functions are taken from the existing tcf_block_cb_priv(),
tcf_block_cb_incref() and tcf_block_cb_decref().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
da3eeb904f net: flow_offload: add list handling functions
This patch adds the list handling functions for the flow block API:

* flow_block_cb_lookup() allows drivers to look up for existing flow blocks.
* flow_block_cb_add() adds a flow block to the per driver list to be registered
  by the core.
* flow_block_cb_remove() to remove a flow block from the list of existing
  flow blocks per driver and to request the core to unregister this.

The flow block API also annotates the netns this flow block belongs to.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
d63db30c85 net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()
Add a new helper function to allocate flow_block_cb objects.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
32f8c4093a net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*
Rename from TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_* and
remove temporary tcf_block_binder_type alias.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
9c0e189ec9 net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND
Rename from TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND and remove
temporary tc_block_command alias.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso
4e95bc268b net: flow_offload: add flow_block_cb_setup_simple()
Most drivers do the same thing to set up the flow block callbacks, this
patch adds a helper function to do this.

This preparation patch reduces the number of changes to adapt the
existing drivers to use the flow block callback API.

This new helper function takes a flow block list per-driver, which is
set to NULL until this driver list is used.

This patch also introduces the flow_block_command and
flow_block_binder_type enumerations, which are renamed to use
FLOW_BLOCK_* in follow up patches.

There are three definitions (aliases) in order to reduce the number of
updates in this patch, which go away once drivers are fully adapted to
use this flow block API.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Linus Torvalds
e9a83bd232 It's been a relatively busy cycle for docs:
- A fair pile of RST conversions, many from Mauro.  These create more
    than the usual number of simple but annoying merge conflicts with other
    trees, unfortunately.  He has a lot more of these waiting on the wings
    that, I think, will go to you directly later on.
 
  - A new document on how to use merges and rebases in kernel repos, and one
    on Spectre vulnerabilities.
 
  - Various improvements to the build system, including automatic markup of
    function() references because some people, for reasons I will never
    understand, were of the opinion that :c:func:``function()`` is
    unattractive and not fun to type.
 
  - We now recommend using sphinx 1.7, but still support back to 1.4.
 
  - Lots of smaller improvements, warning fixes, typo fixes, etc.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl0krAEPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Yg98H/AuLqO9LpOgUjF4LhyjxGPdzJkY9RExSJ7km
 gznyreLCZgFaJR+AY6YDsd4Jw6OJlPbu1YM/Qo3C3WrZVFVhgL/s2ebvBgCo50A8
 raAFd8jTf4/mGCHnAqRotAPQ3mETJUk315B66lBJ6Oc+YdpRhwXWq8ZW2bJxInFF
 3HDvoFgMf0KhLuMHUkkL0u3fxH1iA+KvDu8diPbJYFjOdOWENz/CV8wqdVkXRSEW
 DJxIq89h/7d+hIG3d1I7Nw+gibGsAdjSjKv4eRKauZs4Aoxd1Gpl62z0JNk6aT3m
 dtq4joLdwScydonXROD/Twn2jsu4xYTrPwVzChomElMowW/ZBBY=
 =D0eO
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.3' of git://git.lwn.net/linux

Pull Documentation updates from Jonathan Corbet:
 "It's been a relatively busy cycle for docs:

   - A fair pile of RST conversions, many from Mauro. These create more
     than the usual number of simple but annoying merge conflicts with
     other trees, unfortunately. He has a lot more of these waiting on
     the wings that, I think, will go to you directly later on.

   - A new document on how to use merges and rebases in kernel repos,
     and one on Spectre vulnerabilities.

   - Various improvements to the build system, including automatic
     markup of function() references because some people, for reasons I
     will never understand, were of the opinion that
     :c:func:``function()`` is unattractive and not fun to type.

   - We now recommend using sphinx 1.7, but still support back to 1.4.

   - Lots of smaller improvements, warning fixes, typo fixes, etc"

* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
  docs: automarkup.py: ignore exceptions when seeking for xrefs
  docs: Move binderfs to admin-guide
  Disable Sphinx SmartyPants in HTML output
  doc: RCU callback locks need only _bh, not necessarily _irq
  docs: format kernel-parameters -- as code
  Doc : doc-guide : Fix a typo
  platform: x86: get rid of a non-existent document
  Add the RCU docs to the core-api manual
  Documentation: RCU: Add TOC tree hooks
  Documentation: RCU: Rename txt files to rst
  Documentation: RCU: Convert RCU UP systems to reST
  Documentation: RCU: Convert RCU linked list to reST
  Documentation: RCU: Convert RCU basic concepts to reST
  docs: filesystems: Remove uneeded .rst extension on toctables
  scripts/sphinx-pre-install: fix out-of-tree build
  docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
  Documentation: PGP: update for newer HW devices
  Documentation: Add section about CPU vulnerabilities for Spectre
  Documentation: platform: Delete x86-laptop-drivers.txt
  docs: Note that :c:func: should no longer be used
  ...
2019-07-09 12:34:26 -07:00
Paul Blakey
e0ace68af2 net/sched: cls_flower: Add matching on conntrack info
New matches for conntrack mark, label, zone, and state.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 12:12:00 -07:00
Paul Blakey
75a56758d6 net/flow_dissector: add connection tracking dissection
Retreives connection tracking zone, mark, label, and state from
a SKB.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 12:11:59 -07:00
Paul Blakey
b57dc7c13e net/sched: Introduce action ct
Allow sending a packet to conntrack module for connection tracking.

The packet will be marked with conntrack connection's state, and
any metadata such as conntrack mark and label. This state metadata
can later be matched against with tc classifers, for example with the
flower classifier as below.

In addition to committing new connections the user can optionally
specific a zone to track within, set a mark/label and configure nat
with an address range and port range.

Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress

$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_state +trk+new \
  action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 1 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_0

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>

Changelog:
V5->V6:
	Added CONFIG_NF_DEFRAG_IPV6 in handle fragments ipv6 case
V4->V5:
	Reordered nf_conntrack_put() in tcf_ct_skb_nfct_cached()
V3->V4:
	Added strict_start_type for act_ct policy
V2->V3:
	Fixed david's comments: Removed extra newline after rcu in tcf_ct_params , and indent of break in act_ct.c
V1->V2:
	Fixed parsing of ranges TCA_CT_NAT_IPV6_MAX as 'else' case overwritten ipv4 max
	Refactored NAT_PORT_MIN_MAX range handling as well
	Added ipv4/ipv6 defragmentation
	Removed extra skb pull push of nw offset in exectute nat
	Refactored tcf_ct_skb_network_trim after pull
	Removed TCA_ACT_CT define

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 12:11:59 -07:00
Parav Pandit
e41b6bf3cd devlink: Introduce PCI VF port flavour and port attribute
In an eswitch, PCI VF may have port which is normally represented using
a representor netdevice.
To have better visibility of eswitch port, its association with VF,
and its representor netdevice, introduce a PCI VF port flavour.

When devlink port flavour is PCI VF, fill up PCI VF attributes of
the port.

Extend port name creation using PCI PF and VF number scheme on best
effort basis, so that vendor drivers can skip defining their own scheme.

$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 12:02:13 -07:00
Parav Pandit
98fd2d6563 devlink: Introduce PCI PF port flavour and port attribute
In an eswitch, PCI PF may have port which is normally represented
using a representor netdevice.
To have better visibility of eswitch port, its association with
PF and a representor netdevice, introduce a PCI PF port
flavour and port attriute.

When devlink port flavour is PCI PF, fill up PCI PF attributes of the
port.

Extend port name creation using PCI PF number on best effort basis.
So that vendor drivers can skip defining their own scheme.

$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 12:02:13 -07:00
Parav Pandit
a2c6b87dd0 devlink: Return physical port fields only for applicable port flavours
Physical port number and split group fields are applicable only to
physical port flavours such as PHYSICAL, CPU and DSA.
Hence limit returning those values in netlink response to such port
flavours.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 12:02:13 -07:00
Parav Pandit
378ef01b5f devlink: Refactor physical port attributes
To support additional devlink port flavours and to support few common
and few different port attributes, move physical port attributes to a
different structure.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 12:02:13 -07:00
Linus Torvalds
8a3367cc80 LED updates for 5.3-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQUwxxKyE5l/npt8ARiEGxRG/Sl2wUCXRozKAAKCRBiEGxRG/Sl
 25okAP9I0Rmscpqjb/+GEeXH4EmL3moGzc9o/BzHRqfeO4wqYAEA+8f7L20xHe8g
 tvEGfP7mN/oBmcAfqgH5K9F4eJsBRAw=
 =NkCn
 -----END PGP SIGNATURE-----

Merge tag 'leds-for-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds

Pull LED updates from Jacek Anaszewski:

 - Add a new LED common module for ti-lmu driver family

 - Modify MFD ti-lmu bindings
        - add ti,brightness-resolution
        - add the ramp up/down property

 - Add regulator support for LM36274 driver to lm363x-regulator.c

 - New LED class drivers with DT bindings:
        - leds-spi-byte
        - leds-lm36274
        - leds-lm3697 (move the support from MFD to LED subsystem)

 - Simplify getting the I2C adapter of a client:
        - leds-tca6507
        - leds-pca955x

 - Convert LED documentation to ReST

* tag 'leds-for-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  dt: leds-lm36274.txt: fix a broken reference to ti-lmu.txt
  docs: leds: convert to ReST
  leds: leds-tca6507: simplify getting the adapter of a client
  leds: leds-pca955x: simplify getting the adapter of a client
  leds: lm36274: Introduce the TI LM36274 LED driver
  dt-bindings: leds: Add LED bindings for the LM36274
  regulator: lm363x: Add support for LM36274
  mfd: ti-lmu: Add LM36274 support to the ti-lmu
  dt-bindings: mfd: Add lm36274 bindings to ti-lmu
  leds: max77650: Remove set but not used variable 'parent'
  leds: avoid flush_work in atomic context
  leds: lm3697: Introduce the lm3697 driver
  mfd: ti-lmu: Remove support for LM3697
  dt-bindings: ti-lmu: Modify dt bindings for the LM3697
  leds: TI LMU: Add common code for TI LMU devices
  leds: spi-byte: add single byte SPI LED driver
  dt-bindings: leds: Add binding for spi-byte LED.
  dt-bindings: mfd: LMU: Add ti,brightness-resolution
  dt-bindings: mfd: LMU: Add the ramp up/down property
2019-07-09 08:59:39 -07:00
Linus Torvalds
5ad18b2e60 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull force_sig() argument change from Eric Biederman:
 "A source of error over the years has been that force_sig has taken a
  task parameter when it is only safe to use force_sig with the current
  task.

  The force_sig function is built for delivering synchronous signals
  such as SIGSEGV where the userspace application caused a synchronous
  fault (such as a page fault) and the kernel responded with a signal.

  Because the name force_sig does not make this clear, and because the
  force_sig takes a task parameter the function force_sig has been
  abused for sending other kinds of signals over the years. Slowly those
  have been fixed when the oopses have been tracked down.

  This set of changes fixes the remaining abusers of force_sig and
  carefully rips out the task parameter from force_sig and friends
  making this kind of error almost impossible in the future"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
  signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
  signal: Remove the signal number and task parameters from force_sig_info
  signal: Factor force_sig_info_to_task out of force_sig_info
  signal: Generate the siginfo in force_sig
  signal: Move the computation of force into send_signal and correct it.
  signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
  signal: Remove the task parameter from force_sig_fault
  signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
  signal: Explicitly call force_sig_fault on current
  signal/unicore32: Remove tsk parameter from __do_user_fault
  signal/arm: Remove tsk parameter from __do_user_fault
  signal/arm: Remove tsk parameter from ptrace_break
  signal/nds32: Remove tsk parameter from send_sigtrap
  signal/riscv: Remove tsk parameter from do_trap
  signal/sh: Remove tsk parameter from force_sig_info_fault
  signal/um: Remove task parameter from send_sigtrap
  signal/x86: Remove task parameter from send_sigtrap
  signal: Remove task parameter from force_sig_mceerr
  signal: Remove task parameter from force_sig
  signal: Remove task parameter from force_sigsegv
  ...
2019-07-08 21:48:15 -07:00
Linus Torvalds
4d2fa8b44b Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 5.3:

  API:
   - Test shash interface directly in testmgr
   - cra_driver_name is now mandatory

  Algorithms:
   - Replace arc4 crypto_cipher with library helper
   - Implement 5 way interleave for ECB, CBC and CTR on arm64
   - Add xxhash
   - Add continuous self-test on noise source to drbg
   - Update jitter RNG

  Drivers:
   - Add support for SHA204A random number generator
   - Add support for 7211 in iproc-rng200
   - Fix fuzz test failures in inside-secure
   - Fix fuzz test failures in talitos
   - Fix fuzz test failures in qat"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (143 commits)
  crypto: stm32/hash - remove interruptible condition for dma
  crypto: stm32/hash - Fix hmac issue more than 256 bytes
  crypto: stm32/crc32 - rename driver file
  crypto: amcc - remove memset after dma_alloc_coherent
  crypto: ccp - Switch to SPDX license identifiers
  crypto: ccp - Validate the the error value used to index error messages
  crypto: doc - Fix formatting of new crypto engine content
  crypto: doc - Add parameter documentation
  crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR
  crypto: arm64/aes-ce - add 5 way interleave routines
  crypto: talitos - drop icv_ool
  crypto: talitos - fix hash on SEC1.
  crypto: talitos - move struct talitos_edesc into talitos.h
  lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE
  crypto/NX: Set receive window credits to max number of CRBs in RxFIFO
  crypto: asymmetric_keys - select CRYPTO_HASH where needed
  crypto: serpent - mark __serpent_setkey_sbox noinline
  crypto: testmgr - dynamically allocate crypto_shash
  crypto: testmgr - dynamically allocate testvec_config
  crypto: talitos - eliminate unneeded 'done' functions at build time
  ...
2019-07-08 20:57:08 -07:00
Jakub Kicinski
5c4b4608fe net/tls: fix socket wmem accounting on fallback with netem
netem runs skb_orphan_partial() which "disconnects" the skb
from normal TCP write memory accounting.  We should not adjust
sk->sk_wmem_alloc on the fallback path for such skbs.

Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 20:21:10 -07:00
Jakub Kicinski
ab232e61e7 net/tls: add missing prot info init
Turns out TLS_TX in HW offload mode does not initialize tls_prot_info.
Since commit 9cd81988cc ("net/tls: use version from prot") we actually
use this field on the datapath.  Luckily we always compare it to TLS 1.3,
and assume 1.2 otherwise. So since zero is not equal to 1.3, everything
worked fine.

Fixes: 9cd81988cc ("net/tls: use version from prot")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 20:21:09 -07:00
Dirk van der Merwe
b5d9a834f4 net/tls: don't clear TX resync flag on error
Introduce a return code for the tls_dev_resync callback.

When the driver TX resync fails, kernel can retry the resync again
until it succeeds.  This prevents drivers from attempting to offload
TLS packets if the connection is known to be out of sync.

We don't worry about the RX resync since they will be retried naturally
as more encrypted records get received.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 20:21:09 -07:00
Xin Long
3cab2afb14 sctp: remove rcu_read_lock from sctp_bind_addr_state
sctp_bind_addr_state() is called either in packet rcv path or
by sctp_copy_local_addr_list(), which are under rcu_read_lock.
So there's no need to call it again in sctp_bind_addr_state().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 20:18:11 -07:00
Xin Long
e55f4b8bf4 sctp: rename sp strm_interleave to ep intl_enable
Like other endpoint features, strm_interleave should be moved to
sctp_endpoint and renamed to intl_enable.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 20:16:25 -07:00
Xin Long
da1f6d4de7 sctp: rename asoc intl_enable to asoc peer.intl_capable
To keep consistent with other asoc features, we move intl_enable
to peer.intl_capable in asoc.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 20:16:25 -07:00
Xin Long
1c13475368 sctp: remove prsctp_enable from asoc
Like reconf_enable, prsctp_enable should also be removed from asoc,
as asoc->peer.prsctp_capable has taken its job.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 20:16:24 -07:00
Xin Long
a96701fb35 sctp: remove reconf_enable from asoc
asoc's reconf support is actually decided by the 4-shakehand negotiation,
not something that users can set by sockopt. asoc->peer.reconf_capable is
working for this. So remove it from asoc.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 20:16:24 -07:00