linux

Author	SHA1	Message	Date
Alexei Starovoitov	9f4686c41b	bpf: improve verification speed by droping states Branch instructions, branch targets and calls in a bpf program are the places where the verifier remembers states that led to successful verification of the program. These states are used to prune brute force program analysis. For unprivileged programs there is a limit of 64 states per such 'branching' instructions (maximum length is tracked by max_states_per_insn counter introduced in the previous patch). Simply reducing this threshold to 32 or lower increases insn_processed metric to the point that small valid programs get rejected. For root programs there is no limit and cilium programs can have max_states_per_insn to be 100 or higher. Walking 100+ states multiplied by number of 'branching' insns during verification consumes significant amount of cpu time. Turned out simple LRU-like mechanism can be used to remove states that unlikely will be helpful in future search pruning. This patch introduces hit_cnt and miss_cnt counters: hit_cnt - this many times this state successfully pruned the search miss_cnt - this many times this state was not equivalent to other states (and that other states were added to state list) The heuristic introduced in this patch is: if (sl->miss_cnt > sl->hit_cnt * 3 + 3) /* drop this state from future considerations / Higher numbers increase max_states_per_insn (allow more states to be considered for pruning) and slow verification speed, but do not meaningfully reduce insn_processed metric. Lower numbers drop too many states and insn_processed increases too much. Many different formulas were considered. This one is simple and works well enough in practice. (the analysis was done on selftests/progs/ and on cilium programs) The end result is this heuristic improves verification speed by 10 times. Large synthetic programs that used to take a second more now take 1/10 of a second. In cases where max_states_per_insn used to be 100 or more, now it's ~10. There is a slight increase in insn_processed for cilium progs: before after bpf_lb-DLB_L3.o 1831 1838 bpf_lb-DLB_L4.o 3029 3218 bpf_lb-DUNKNOWN.o 1064 1064 bpf_lxc-DDROP_ALL.o 26309 26935 bpf_lxc-DUNKNOWN.o 33517 34439 bpf_netdev.o 9713 9721 bpf_overlay.o 6184 6184 bpf_lcx_jit.o 37335 39389 And 2-3 times improvement in the verification speed. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:37 +02:00
Alexei Starovoitov	06ee7115b0	bpf: add verifier stats and log_level bit 2 In order to understand the verifier bottlenecks add various stats and extend log_level: log_level 1 and 2 are kept as-is: bit 0 - level=1 - print every insn and verifier state at branch points bit 1 - level=2 - print every insn and verifier state at every insn bit 2 - level=4 - print verifier error and stats at the end of verification When verifier rejects the program the libbpf is trying to load the program twice. Once with log_level=0 (no messages, only error code is reported to user space) and second time with log_level=1 to tell the user why the verifier rejected it. With introduction of bit 2 - level=4 the libbpf can choose to always use that level and load programs once, since the verification speed is not affected and in case of error the verbose message will be available. Note that the verifier stats are not part of uapi just like all other verbose messages. They're expected to change in the future. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:37 +02:00
Andrii Nakryiko	e83b9f5544	kbuild: add ability to generate BTF type info for vmlinux This patch adds new config option to trigger generation of BTF type information from DWARF debuginfo for vmlinux and kernel modules through pahole, which in turn relies on libbpf for btf_dedup() algorithm. The intent is to record compact type information of all types used inside kernel, including all the structs/unions/typedefs/etc. This enables BPF's compile-once-run-everywhere ([0]) approach, in which tracing programs that are inspecting kernel's internal data (e.g., struct task_struct) can be compiled on a system running some kernel version, but would be possible to run on other kernel versions (and configurations) without recompilation, even if the layout of structs changed and/or some of the fields were added, removed, or renamed. This is only possible if BPF loader can get kernel type info to adjust all the offsets correctly. This patch is a first time in this direction, making sure that BTF type info is part of Linux kernel image in non-loadable ELF section. BTF deduplication ([1]) algorithm typically provides 100x savings compared to DWARF data, so resulting .BTF section is not big as is typically about 2MB in size. [0] http://vger.kernel.org/lpc-bpf2018.html#session-2 [1] https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-03 00:53:07 +02:00
Daniel Borkmann	99182beed8	Merge branch 'bpf-selftest-clang-fixes' Stanislav Fomichev says: ==================== This series contains small fixes to make bpf selftests compile cleanly with clangs. All of them are not real problems, but it's nice to have an option to use clang for the tests themselves. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:19 +02:00
Stanislav Fomichev	7596aa3ea8	selftests: bpf: remove duplicate .flags initialization in ctx_skb.c verifier/ctx_skb.c:708:11: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides] .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:18 +02:00
Stanislav Fomichev	a918b03e8c	selftests: bpf: fix -Wformat-invalid-specifier for bpf_obj_id.c Use standard C99 %zu for sizeof, not GCC's custom %Zu: bpf_obj_id.c:76:48: warning: invalid conversion specifier 'Z' Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:18 +02:00
Stanislav Fomichev	94e8f3c712	selftests: bpf: fix -Wformat-security warning for flow_dissector_load.c flow_dissector_load.c:55:19: warning: format string is not a string literal (potentially insecure) [-Wformat-security] error(1, errno, command); ^~~~~~~ flow_dissector_load.c:55:19: note: treat the string as an argument to avoid this error(1, errno, command); ^ "%s", 1 warning generated. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:18 +02:00
Stanislav Fomichev	6b7b6995c4	selftests: bpf: tests.h should depend on .c files, not the output This makes sure we don't put headers as input files when doing compilation, because clang complains about the following: clang-9: error: cannot specify -o when generating multiple output files ../lib.mk:152: recipe for target 'xxx/tools/testing/selftests/bpf/test_verifier' failed make: * [xxx/tools/testing/selftests/bpf/test_verifier] Error 1 make: * Waiting for unfinished jobs.... clang-9: error: cannot specify -o when generating multiple output files ../lib.mk:152: recipe for target 'xxx/tools/testing/selftests/bpf/test_progs' failed Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:18 +02:00
Yonghong Song	9de2640b06	bpf: add bpffs multi-dimensional array tests in test_btf For multiple dimensional arrays like below, int a[2][3] both llvm and pahole generated one BTF_KIND_ARRAY type like . element_type: int . index_type: unsigned int . number of elements: 6 Such a collapsed BTF_KIND_ARRAY type will cause the divergence in BTF vs. the user code. In the compile-once-run-everywhere project, the header file is generated from BTF and used for bpf program, and the definition in the header file will be different from what user expects. But the kernel actually supports chained multi-dimensional array types properly. The above "int a[2][3]" can be represented as Type #n: . element_type: int . index_type: unsigned int . number of elements: 3 Type #(n+1): . element_type: type #n . index_type: unsigned int . number of elements: 2 The following llvm commit https://reviews.llvm.org/rL357215 also enables llvm to generated proper chained multi-dimensional arrays. The test_btf already has a raw test ("struct test #1") for chained multi-dimensional arrays. This patch added amended bpffs test for chained multi-dimensional arrays. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-01 15:41:05 +02:00
Alexei Starovoitov	c3969de8ac	Merge branch 'variable-stack-access' Andrey Ignatov says: ==================== The patch set adds support for stack access with variable offset from helpers. Patch 1 is the main patch in the set and provides more details. Patch 2 adds selftests for new functionality. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-03-29 12:05:36 -07:00
Andrey Ignatov	8ff80e96e3	selftests/bpf: Test variable offset stack access Test different scenarios of indirect variable-offset stack access: out of bound access (>0), min_off below initialized part of the stack, max_off+size above initialized part of the stack, initialized stack. Example of output: ... #856/p indirect variable-offset stack access, out of bound OK #857/p indirect variable-offset stack access, max_off+size > max_initialized OK #858/p indirect variable-offset stack access, min_off < min_initialized OK #859/p indirect variable-offset stack access, ok OK ... Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-03-29 12:05:35 -07:00
Andrey Ignatov	2011fccfb6	bpf: Support variable offset stack access from helpers Currently there is a difference in how verifier checks memory access for helper arguments for PTR_TO_MAP_VALUE and PTR_TO_STACK with regard to variable part of offset. check_map_access, that is used for PTR_TO_MAP_VALUE, can handle variable offsets just fine, so that BPF program can call a helper like this: some_helper(map_value_ptr + off, size); , where offset is unknown at load time, but is checked by program to be in a safe rage (off >= 0 && off + size < map_value_size). But it's not the case for check_stack_boundary, that is used for PTR_TO_STACK, and same code with pointer to stack is rejected by verifier: some_helper(stack_value_ptr + off, size); For example: 0: (7a) (u64 )(r10 -16) = 0 1: (7a) (u64 )(r10 -8) = 0 2: (61) r2 = (u32 )(r1 +0) 3: (57) r2 &= 4 4: (17) r2 -= 16 5: (0f) r2 += r10 6: (18) r1 = 0xffff888111343a80 8: (85) call bpf_map_lookup_elem#1 invalid variable stack read R2 var_off=(0xfffffffffffffff0; 0x4) Add support for variable offset access to check_stack_boundary so that if offset is checked by program to be in a safe range it's accepted by verifier. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-03-29 12:05:35 -07:00
Luca Boccassi	dd399ac9e3	tools/bpf: generate pkg-config file for libbpf Generate a libbpf.pc file at build time so that users can rely on pkg-config to find the library, its CFLAGS and LDFLAGS. Signed-off-by: Luca Boccassi <bluca@debian.org> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-03-28 17:06:03 +01:00
David S. Miller	356d71e00d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2019-03-27 17:37:58 -07:00
Eric Dumazet	df453700e8	inet: switch IP ID generator to siphash According to Amit Klein and Benny Pinkas, IP ID generation is too weak and might be used by attackers. Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix()) having 64bit key and Jenkins hash is risky. It is time to switch to siphash and its 128bit keys. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Amit Klein <aksecurity@gmail.com> Reported-by: Benny Pinkas <benny@pinkas.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 14:29:26 -07:00
Colin Ian King	180a8c3d5d	net: phy: mdio-bcm-unimac: remove redundant !timeout check The check for zero timeout is always true at the end of the proceeding while loop; the only other exit path in the loop is if the unimac MDIO is not busy. Remove the redundant zero timeout check and always return -ETIMEDOUT on this timeout return path. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 14:27:30 -07:00
Eric Dumazet	4f661542a4	tcp: fix zerocopy and notsent_lowat issues My recent patch had at least three problems : 1) TX zerocopy wants notification when skb is acknowledged, thus we need to call skb_zcopy_clear() if the skb is cached into sk->sk_tx_skb_cache 2) Some applications might expect precise EPOLLOUT notifications, so we need to update sk->sk_wmem_queued and call sk_mem_uncharge() from sk_wmem_free_skb() in all cases. The SOCK_QUEUE_SHRUNK flag must also be set. 3) Reuse of saved skb should have used skb_cloned() instead of simply checking if the fast clone has been freed. Fixes: `472c2e07ee` ("tcp: add one skb cache for tx") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:59:02 -07:00
Numan Siddique	4d5ec89fc8	net: openvswitch: Add a new action check_pkt_len This patch adds a new action - 'check_pkt_len' which checks the packet length and executes a set of actions if the packet length is greater than the specified length or executes another set of actions if the packet length is lesser or equal to. This action takes below nlattrs * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER - Nested actions to apply if the packet length is greater than the specified 'pkt_len' * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL - Nested actions to apply if the packet length is lesser or equal to the specified 'pkt_len'. The main use case for adding this action is to solve the packet drops because of MTU mismatch in OVN virtual networking solution. When a VM (which belongs to a logical switch of OVN) sends a packet destined to go via the gateway router and if the nic which provides external connectivity, has a lesser MTU, OVS drops the packet if the packet length is greater than this MTU. With the help of this action, OVN will check the packet length and if it is greater than the MTU size, it will generate an ICMP packet (type 3, code 4) and includes the next hop mtu in it so that the sender can fragment the packets. Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html Suggested-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> CC: Gregory Rose <gvrose8192@gmail.com> CC: Pravin B Shelar <pshelar@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:53:23 -07:00
David S. Miller	d7aa033831	Merge branch 'ethtool-add-support-for-Fast-Link-Down-as-new-PHY-tunable' Heiner Kallweit says: ==================== ethtool: add support for Fast Link Down as new PHY tunable This adds support for Fast Link Down as new PHY tunable. Fast Link Down reduces the time until a link down event is reported for 1000BaseT. According to the standard it's 750ms what is too long for several use cases. This is the kernel-related series, the ethtool userspace extension I'd submit once the kernel part has been applied. v2: - add describing comment in patch 1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:51:49 -07:00
Heiner Kallweit	69f42be8af	net: phy: marvell: add PHY tunable fast link down support for 88E1540 1000BaseT standard requires that a link is reported as down earliest after 750ms. Several use case however require a much faster detecion of a broken link. Fast Link Down supports this by intentionally violating a the standard. This patch exposes the Fast Link Down feature of 88E1540 and 88E6390. These PHY's can be found as internal PHY's in several switches: 88E6352, 88E6240, 88E6176, 88E6172, and 88E6390(X). Fast Link Down and EEE are mutually exclusive. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:51:49 -07:00
Heiner Kallweit	3aeb0803f7	ethtool: add PHY Fast Link Down support This adds support for Fast Link Down as new PHY tunable. Fast Link Down reduces the time until a link down event is reported for 1000BaseT. According to the standard it's 750ms what is too long for several use cases. v2: - add comment describing the constants Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:51:49 -07:00
Bart Van Assche	7b7ed885af	net/core: Allow the compiler to verify declaration and definition consistency Instead of declaring a function in a .c file, declare it in a header file and include that header file from the source files that define and that use the function. That allows the compiler to verify consistency of declaration and definition. See also commit `52267790ef` ("sock: add MSG_ZEROCOPY") # v4.14. Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:49:44 -07:00
Bart Van Assche	a986967eb8	net/core: Fix rtnetlink kernel-doc headers This patch avoids that the following warnings are reported when building with W=1: net/core/rtnetlink.c:3580: warning: Function parameter or member 'ndm' not described in 'ndo_dflt_fdb_add' net/core/rtnetlink.c:3580: warning: Function parameter or member 'tb' not described in 'ndo_dflt_fdb_add' net/core/rtnetlink.c:3580: warning: Function parameter or member 'dev' not described in 'ndo_dflt_fdb_add' net/core/rtnetlink.c:3580: warning: Function parameter or member 'addr' not described in 'ndo_dflt_fdb_add' net/core/rtnetlink.c:3580: warning: Function parameter or member 'vid' not described in 'ndo_dflt_fdb_add' net/core/rtnetlink.c:3580: warning: Function parameter or member 'flags' not described in 'ndo_dflt_fdb_add' net/core/rtnetlink.c:3718: warning: Function parameter or member 'ndm' not described in 'ndo_dflt_fdb_del' net/core/rtnetlink.c:3718: warning: Function parameter or member 'tb' not described in 'ndo_dflt_fdb_del' net/core/rtnetlink.c:3718: warning: Function parameter or member 'dev' not described in 'ndo_dflt_fdb_del' net/core/rtnetlink.c:3718: warning: Function parameter or member 'addr' not described in 'ndo_dflt_fdb_del' net/core/rtnetlink.c:3718: warning: Function parameter or member 'vid' not described in 'ndo_dflt_fdb_del' net/core/rtnetlink.c:3861: warning: Function parameter or member 'skb' not described in 'ndo_dflt_fdb_dump' net/core/rtnetlink.c:3861: warning: Function parameter or member 'cb' not described in 'ndo_dflt_fdb_dump' net/core/rtnetlink.c:3861: warning: Function parameter or member 'filter_dev' not described in 'ndo_dflt_fdb_dump' net/core/rtnetlink.c:3861: warning: Function parameter or member 'idx' not described in 'ndo_dflt_fdb_dump' net/core/rtnetlink.c:3861: warning: Excess function parameter 'nlh' description in 'ndo_dflt_fdb_dump' Cc: Hubert Sokolowski <hubert.sokolowski@intel.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:49:44 -07:00
Bart Van Assche	d79b3bafab	net/core: Document __skb_flow_dissect() flags argument This patch avoids that the following warning is reported when building with W=1: warning: Function parameter or member 'flags' not described in '__skb_flow_dissect' Cc: Tom Herbert <tom@herbertland.com> Fixes: `cd79a2382a` ("flow_dissector: Add flags argument to skb_flow_dissector functions") # v4.3. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:49:43 -07:00
Bart Van Assche	b3c0fd61e6	net/core: Document all dev_ioctl() arguments This patch avoids that the following warnings are reported when building with W=1: net/core/dev_ioctl.c:378: warning: Function parameter or member 'ifr' not described in 'dev_ioctl' net/core/dev_ioctl.c:378: warning: Function parameter or member 'need_copyout' not described in 'dev_ioctl' net/core/dev_ioctl.c:378: warning: Excess function parameter 'arg' description in 'dev_ioctl' Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: `44c02a2c3d` ("dev_ioctl(): move copyin/copyout to callers") # v4.16. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:49:43 -07:00
Bart Van Assche	37f3c421e8	net/core: Document reuseport_add_sock() bind_inany argument This patch avoids that the following warning is reported when building with W=1: warning: Function parameter or member 'bind_inany' not described in 'reuseport_add_sock' Cc: Martin KaFai Lau <kafai@fb.com> Fixes: `2dbb9b9e6d` ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT") # v4.19. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:49:43 -07:00
Heiner Kallweit	863d1a8d55	net: dsa: mv88e6xxx: remove unneeded cmode initialization This partially reverts `ed8fe20205` ("net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode"). I missed that chip->ports[].cmode is overwritten anyway by the cmode caching in mv88e6xxx_setup(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:47:23 -07:00
Sudarsana Reddy Kalluru	32705592f9	bnx2x: Utilize FW 7.13.11.0. Commit 8fcf0ec44c11f "bnx2x: Add FW 7.13.11.0" added said .bin FW to linux-firmware; This patch incorporates the FW in the bnx2x driver. This introduces few FW fixes and the support for Tx VLAN filtering. Please consider applying it to 'net-next' tree. Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:43:14 -07:00
Kristian Evensen	1713cb37bf	fou: Support binding FoU socket An FoU socket is currently bound to the wildcard-address. While this works fine, there are several use-cases where the use of the wildcard-address is not desirable. For example, I use FoU on some multi-homed servers and would like to use FoU on only one of the interfaces. This commit adds support for binding FoU sockets to a given source address/interface, as well as connecting the socket to a given destination address/port. udp_tunnel already provides the required infrastructure, so most of the code added is for exposing and setting the different attributes (local address, peer address, etc.). The lookups performed when we add, delete or get an FoU-socket has also been updated to compare all the attributes a user can set. Since the comparison now involves several elements, I have added a separate comparison-function instead of open-coding. In order to test the code and ensure that the new comparison code works correctly, I started by creating a wildcard socket bound to port 1234 on my machine. I then tried to create a non-wildcarded socket bound to the same port, as well as fetching and deleting the socket (including source address, peer address or interface index in the netlink request). Both the create, fetch and delete request failed. Deleting/fetching the socket was only successful when my netlink request attributes matched those used to create the socket. I then repeated the tests, but with a socket bound to a local ip address, a socket bound to a local address + interface, and a bound socket that was also «connected» to a peer. Add only worked when no socket with the matching source address/interface (or wildcard) existed, while fetch/delete was only successful when all attributes matched. In addition to testing that the new code work, I also checked that the current behavior is kept. If none of the new attributes are provided, then an FoU-socket is configured as before (i.e., wildcarded). If any of the new attributes are provided, the FoU-socket is configured as expected. v1->v2: * Fixed building with IPv6 disabled (kbuild). * Fixed a return type warning and make the ugly comparison function more readable (kbuild). * Describe more in detail what has been tested (thanks David Miller). * Make peer port required if peer address is specified. Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:30:07 -07:00
Linus Torvalds	1a9df9e29c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "Fixes here and there, a couple new device IDs, as usual: 1) Fix BQL race in dpaa2-eth driver, from Ioana Ciornei. 2) Fix 64-bit division in iwlwifi, from Arnd Bergmann. 3) Fix documentation for some eBPF helpers, from Quentin Monnet. 4) Some UAPI bpf header sync with tools, also from Quentin Monnet. 5) Set descriptor ownership bit at the right time for jumbo frames in stmmac driver, from Aaro Koskinen. 6) Set IFF_UP properly in tun driver, from Eric Dumazet. 7) Fix load/store doubleword instruction generation in powerpc eBPF JIT, from Naveen N. Rao. 8) nla_nest_start() return value checks all over, from Kangjie Lu. 9) Fix asoc_id handling in SCTP after the SCTP__ASSOC changes this merge window. From Marcelo Ricardo Leitner and Xin Long. 10) Fix memory corruption with large MTUs in stmmac, from Aaro Koskinen. 11) Do not use ipv4 header for ipv6 flows in TCP and DCCP, from Eric Dumazet. 12) Fix topology subscription cancellation in tipc, from Erik Hugne. 13) Memory leak in genetlink error path, from Yue Haibing. 14) Valid control actions properly in packet scheduler, from Davide Caratti. 15) Even if we get EEXIST, we still need to rehash if a shrink was delayed. From Herbert Xu. 16) Fix interrupt mask handling in interrupt handler of r8169, from Heiner Kallweit. 17) Fix leak in ehea driver, from Wen Yang" git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (168 commits) dpaa2-eth: fix race condition with bql frame accounting chelsio: use BUG() instead of BUG_ON(1) net: devlink: skip info_get op call if it is not defined in dumpit net: phy: bcm54xx: Encode link speed and activity into LEDs tipc: change to check tipc_own_id to return in tipc_net_stop net: usb: aqc111: Extend HWID table by QNAP device net: sched: Kconfig: update reference link for PIE net: dsa: qca8k: extend slave-bus implementations net: dsa: qca8k: remove leftover phy accessors dt-bindings: net: dsa: qca8k: support internal mdio-bus dt-bindings: net: dsa: qca8k: fix example net: phy: don't clear BMCR in genphy_soft_reset bpf, libbpf: clarify bump in libbpf version info bpf, libbpf: fix version info and add it to shared object rxrpc: avoid clang -Wuninitialized warning tipc: tipc clang warning net: sched: fix cleanup NULL pointer exception in act_mirr r8169: fix cable re-plugging issue net: ethernet: ti: fix possible object reference leak net: ibm: fix possible object reference leak ...	2019-03-27 12:22:57 -07:00
David S. Miller	eec7e2954d	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2019-03-26 This series contains more updates to the ice driver only. Jeremiah provides his first patch to the Linux kernel to clean up un-necessary newlines in driver log messages. Mitch updates the ice driver to use existing status codes in the iavf driver so that when errors occur, it will not report nonsensical results. Adds support for VF admin queue interrupts by programming the VPINT_MBX_CTL register array. Brett adds a check for a bit that we set while preparing for a reset, to ensure we are prepared to do a proper reset. Also implemented PCI error handling operations. Went through and audited the hot path with pahole and made modifications based on the results since 2 structures were taking up more space than necessary due to cache alignment issues. Fixed an issue where when flow control was disabled, the state of flow control was being displayed as "Unknown". Anirudh fixes adaptive interrupt moderation changes by adding code that was missed, that should have been added in the initial patch to add that support. Cleaned up a function prototype that was never implemented. Did additional code cleanup by removing unneeded braces and redundant code comments. Akeem fixes an issue that occurs when the VF is attempting to remove the default LAN/MAC address, which is programmed by the administrator by updating the error message to explicitly say that the VF cannot change the MAC programmed by the PF. Preethi fixes the driver to not fall into the error path when a added filter already exists, but instead continue to process the rest of the function and add appropriate checks after adding MAC filters. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:19:13 -07:00
David S. Miller	b0be25c575	Merge branch 'net-mvpp2-Classifier-updates-and-cleanups' Maxime Chevallier says: ==================== net: mvpp2: Classifier updates and cleanups In preparation for the future addition of classification offload support, this series cleans-up the current code dealing with the classification engines. As of today, the classification code only deals with RSS, which is a very narrow use-case when considering all of the classifier's features. This lead to design and naming decisions that don't really stand when considering using more classification features. This series therefore includes quite a lot a renaming of functions and macros, and tries to make the naming schemes more consistent and the code more readable. The debugfs interface that allows reading the various Parsing and Classification engines has been cleaned-up and made more generic, allowing to read the Flow Table and C2 engine hit counters on a per-entry basis. The Classifier itself has been made more robust by introducing the lu_type field in classification lookups, that prevents false-positive matches in the future. We also initialise the various engine in a more extensive and less error-prone way by assining default values to all entries in the C2 and Flow table. This is a pretty big series considering it's mostly cleanup, but my goal is to make the future series that will contain new big features easier to review, focusing on the real logic. Besides the debugfs interface, this series doesn't intend to introduce any new features or change in behaviours. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:58 -07:00
Maxime Chevallier	c2d3d8eebe	net: mvpp2: cls: Rework C2 engine macros The C2 classification engine has a 256 entry TCAM, used for ternary matches on an 8 byte Header Extracted Key. For now, we compute the various indices for classification and RSS that use this engine thanks to a set of macros. This commit mainly renames the macros used to make it clear that they should be used with the C2 engine, but also make use of the full 256 entries in the engine. For now, the C2 entries are only used for RSS. These entries are put at the end of the TCAM range, in case we want to add higher priority matches later on. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:58 -07:00
Maxime Chevallier	693131db1d	net: mvpp2: cls: Initialize lookup priorities for all entries in the flow When classifying a packet pertaining to a given flow, the classifier will issue multiple lookup commands until it finds one with the 'last' bit set. It expects all prorities to be assign continuously (although not necessarily in an ordered fashion) from 0 to the number of lookups. We can initialize this once, and make sure unused lookups are given an empty port map. This avoids having to maintain priorities and the information of which lookup is the last. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:58 -07:00
Maxime Chevallier	8d2847d946	net: mvpp2: cls: Invalidate all C2 entries except the ones we use C2 TCAM entries can be invalidated to avoid unwanted matches. Make sure all entries are invalidated at init, then validate only the ones we use. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:58 -07:00
Maxime Chevallier	ff2f3cb6eb	net: mvpp2: cls: Rename the flow table macros The Flow Table dictates what lookups will be issued for each flow type. The lookup sequence for each flow is similar, and the index of each lookup is computed by some macros. There are similar mechanisms for the C2 TCAM lookups, so in order to avoid confusion, rename the flow table index computing macros with a common prefix. The only difference in behaviour is that we now use the very first entry in the flow for the RSS lookup (the first entry was previously unused). Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:58 -07:00
Maxime Chevallier	5b35380636	net: mvpp2: cls: Don't use the sequence attribute for classification The classifier allows to combine multiple lookups in one "sequence" that is counted as a single lookup to an engine, with a single result. We don't actually use that feature, so remove any places where we set this field, so that the classifier doesn't try to interpret these fields. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:58 -07:00
Maxime Chevallier	6310f77d99	net: mvpp2: cls: Rename classifer per-port functions This commit renames some of the classifier functions to follow the naming 'mvpp2_port_*' that's used for function that act on a given port. This commit is purely cosmetic. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:58 -07:00
Maxime Chevallier	b11ffdc538	net: mvpp2: cls: Move C2 read/write helpers around Move C2 read/write helpers higher in the file to ease future work that rely on these helpers Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:58 -07:00
Maxime Chevallier	147c538e79	net: mvpp2: cls: Write C2 TCAM data last when writing a C2 entry When writing a C2 entry to hardware, some registers writes will only take effect when the TCAM_DATA4 register is written. This includes all C2 TCAM registers, and the C2 invalidate register. To make sure we always write C2 entries correctly, document that behaviour with a comment, and move TCAM writes to the end of the mvpp2_cls_c2_write helper. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:58 -07:00
Maxime Chevallier	e4bfb4aced	net: mvpp2: cls: Use iterators to go through the cls_table The cls_table is a global read-only table containing the different parameters that are used by various tables in the classifier. It describes the links between the Header Parser, the decoding table and the flow_table. There are several possible way we want to iterate over that table, depending on wich classifier engine we want to configure. For the Header Parser, we want to iterate over each entry. For the Decoding table, we want to iterate over each entry having a unique flow_id. Finally, when configuring an ethtool flow, we want to iterate over each entry having a unique flow_id and that has a given flow_type. This commit introduces some iterator to both provide syntactic sugar and also clarify the way we want to iterate over the table. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:58 -07:00
Maxime Chevallier	b607cc61be	net: mvpp2: debugfs: Allow reading the C2 engine table from debugfs PPv2's Classifier uses multiple engines to perform classification. So far, only the C2 engine is used, which has a 256 entries TCAM. So far, we only accessed the relevant entries from the C2 engines, which are the one implementing RSS. To implement and debug ntuple classification offload, beaing able to see the hit count for each C2 entry is helpful, so this commit moves the logic to a dedicated directory allowing to access each entry. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:57 -07:00
Maxime Chevallier	8aa651060f	net: mvpp2: debugfs: Allow reading the flow table from debugfs The Classifier flow table is the central part of the PPv2 Classifier, since it describes all classification steps performed for each flow. It has 512 entries, shared between all ports, which are divided into sequences that are pointed-to by the decoding table. Being able to see which entries in the flow table were hit is a key point when implementing and debugging classification offload. This commit allows reading each flow table entry's hit count independently, with a clear-on-read behaviour. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:57 -07:00
Maxime Chevallier	7cb5e36859	net: mvpp2: debugfs: Store debugfs entries data in mvpp2 struct The current way to store the required private data needed to access various debugfs entries is to alloc them on the fly, share them within the entries that need to access them, and finally have one entry free that data upon closing. This leads to hard to maintain code, and is very error-prone. This commit stores all debugfs related data in the same place, making sure this is allocated only when the debugfs directory is successfully created, so that we don't waste memory when we don't use this feature. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:57 -07:00
Maxime Chevallier	0b27f8650f	net: mvpp2: cls: Make the flow definitions const The cls_flow table represent the overall configuration of the classifier, used to match the different traffic classes in the Parsing and Classification engines. This configuration is static, and applies to all PPv2 instances, we must therefore keep it const so that no modifications of this table are performed at runtime. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:57 -07:00
Maxime Chevallier	93c2589c92	net: mvpp2: cls: Rename MVPP2_N_FLOWS to MVPP2_N_PRS_FLOWS The macro definition MVPP2_N_FLOWS is ambiguous because it really represents the number of entries in the Header Parser that are used to identify the classification flows. Rename the macro to clearly state that we represent the number of flows in the Header Parser. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:57 -07:00
Maxime Chevallier	32f1a672d4	net: mvpp2: cls: use Lookup Type in classification engines The PPv2 classifier allows to perform multiple lookups on the same engine when classifying a packet. These lookups can match similar parts of a packet header, but perform different actions upon matching. To differentiate these types of lookups, it's possible to specify a Lookup Type in the flow table entries, which will be part of the key for the lookup engines. This commit introduces the use of Lookup Types for C2 matches. Since for now we only perform C2 lookups to enable RSS, we only need one Lookup Type. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:57 -07:00
Maxime Chevallier	dc61b37fd9	net: mvpp2: cls: Start cls flow entries from beginning of table The Classifier flow table has 512 entries, that contains lookups commands executed consecutively for every flow. Since we have 21 different flows, we have to carefully manage the flow table use. As of today, the start index of a lookup sequence is computed directly based in the flow->id. There are 8 reserved flow ids, from 0-7, which don't have any corresponding sequence in the flow table. We can therefore ignore them when computing the index, and make so that the first non-reserved flow point to the very beginning of the flow table. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Suggested-by: Alan Winkowski <walan@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:57 -07:00
Maxime Chevallier	1f29a8c4c6	net: mvpp2: cls: Add missing MAC_DA field extraction PPv2's classifier supports extracting the MAC Destination Address from the L2 header to perform RSS and flow steering. Add the missing case when setting the Header Extracted Key fields in the flow table. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:57 -07:00
Maxime Chevallier	c9dbb6cf51	net: mvpp2: Don't use an int to store netdev_features_t int is not long enough to store all netdev_features, use the correct dedicated type to store them when building the list of dev->features. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:57 -07:00

1 2 3 4 5 ...

825475 Commits