linux

Author	SHA1	Message	Date
Nikolay Aleksandrov	7d850abd5f	net: bridge: add support for port isolation This patch adds support for a new port flag - BR_ISOLATED. If it is set then isolated ports cannot communicate between each other, but they can still communicate with non-isolated ports. The same can be achieved via ACLs but they can't scale with large number of ports and also the complexity of the rules grows. This feature can be used to achieve isolated vlan functionality (similar to pvlan) as well, though currently it will be port-wide (for all vlans on the port). The new test in should_deliver uses data that is already cache hot and the new boolean is used to avoid an additional source port test in should_deliver. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-25 14:37:20 -04:00
David S. Miller	9c5904904b	Merge branch 'nfp-offload-LAG-for-tc-flower-egress' Jakub Kicinski says: ==================== nfp: offload LAG for tc flower egress This series from John adds bond offload to the nfp driver. Patch 5 exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp hashing matches that of the software LAG. This may be unnecessarily conservative, let's see what LAG maintainers think :) John says: This patchset sets up the infrastructure and offloads output actions for when a TC flower rule attempts to egress a packet to a LAG port. Firstly it adds some of the infrastructure required to the flower app and to the nfp core. This includes the ability to change the MAC address of a repr, a function for combining lookup and write to a FW symbol, and the addition of private data to a repr on a per app basis. Patch 6 continues by implementing notifiers that track Linux bonds and communicates to the FW those which enslave reprs, along with the current state of reprs within the bond. Patch 7 ensures bonds are synchronised with FW by receiving and acting upon cmsgs sent to the kernel. These may request that a bond message is retransmitted when FW can process it, or may request a full sync of the bonds defined in the kernel. Patch 8 offloads a flower action when that action requires egressing to a pre-defined Linux bond. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:10:57 -04:00
John Hurley	7e24a59311	nfp: flower: compute link aggregation action If the egress device of an offloaded rule is a LAG port, then encode the output port to the NFP with a LAG identifier and the offloaded group ID. A prelag action is also offloaded which must be the first action of the series (although may appear after other pre-actions - e.g. tunnels). This causes the FW to check that it has the necessary information to output to the requested LAG port. If it does not, the packet is sent to the kernel before any other actions are applied to it. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:10:57 -04:00
John Hurley	2e1cc5226b	nfp: flower: implement host cmsg handler for LAG Adds the control message handler to synchronize offloaded group config with that of the kernel. Such messages are sent from fw to driver and feature the following 3 flags: - Data: an attached cmsg could not be processed - store for retransmission - Xon: FW can accept new messages - retransmit any stored cmsgs - Sync: full sync requested so retransmit all kernel LAG group info Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:10:57 -04:00
John Hurley	bb9a8d0311	nfp: flower: monitor and offload LAG groups Monitor LAG events via the NETDEV_CHANGEUPPER/NETDEV_CHANGELOWERSTATE notifiers to maintain a list of offloadable groups. Sync these groups with HW via a delayed workqueue to prevent excessive re-configuration. When the workqueue is triggered it may generate multiple control messages for different groups. These messages are linked via a batch ID and flags to indicate a new batch and the end of a batch. Update private data in each repr to track their LAG lower state flags. The state of a repr is used to determine the active netdevs that can be offloaded. For example, in active-backup mode, we only offload the netdev currently active. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:10:57 -04:00
John Hurley	f44aa9ef79	net: include hash policy in LAG changeupper info LAG upper event notifiers contain the tx type used by the LAG device. Extend this to also include the hash policy used for tx types that utilize hashing. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:10:57 -04:00
John Hurley	b945245297	nfp: flower: add per repr private data for LAG offload Add a bitmap to each flower repr to track its state if it is enslaved by a bond. This LAG state may be different to the port state - for example, the port may be up but LAG state may be down due to the selection in an active/backup bond. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:10:56 -04:00
John Hurley	898bc7d634	nfp: flower: check for/turn on LAG support in firmware Check if the fw contains the _abi_flower_balance_sync_enable symbol. If it does then write a 1 to this indicating that the driver is willing to receive NIC to kernel LAG related control messages. If the write is successful, update the list of extra features supported by the fw and add a stub to accept LAG cmsgs. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:10:56 -04:00
John Hurley	1945ca7a81	nfp: nfpcore: add rtsym writing function Add an rtsym API function that combines the lookup of a symbol and the writing of a value to it. Values can be written as unsigned 32 or 64 bits. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:10:56 -04:00
John Hurley	24f132e29c	nfp: add ndo_set_mac_address for representors Adding a netdev to a bond requires that its mac address can be modified. The default eth_mac_addr is sufficient to satisfy this requirement. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:10:56 -04:00
Stephen Hemminger	d97cde6ab5	hv_netvsc: fix bogus ifalias on network device If the guest network adapter is not configured with DeviceNaming enabled on the host, then the query for friendly name will return success but with a zero length name. Which then leads to a garbage value (stack contents) for ifalias. Fix is simple, just don't set name if host doesn't return it. Fixes: `0fe554a46a` ("hv_netvsc: propogate Hyper-V friendly name into interface alias") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:06:33 -04:00
David S. Miller	de0a8267e2	Merge branch 'net-Update-fib_table_lookup-tracepoints' David Ahern says: ==================== net: Update fib_table_lookup tracepoints Update the FIB lookup tracepoints to include ip proto and port fields from the flow struct. In the process make the IPv4 tracepoint inline with IPv6 which is much easier to use and follow the lookup and result. Remove the tracepoint in fib_validate_source which does not provide value above the fib_table_lookup which immediately follows it. v2 - move CREATE_TRACE_POINTS for the v6 tracepoint to route.c to handle its need for an internal function to convert route type to error and handle IPv6 as a module or builtin. Reported by kbuild robot. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:01:23 -04:00
David Ahern	c949cbbbe5	net/ipv4: Remove tracepoint in fib_validate_source Tracepoint does not add value and the call to fib_lookup follows it which shows the same information and the fib lookup result. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:01:15 -04:00
David Ahern	30d444d300	net/ipv6: Udate fib6_table_lookup tracepoint Commit `bb0ad1987e` ("ipv6: fib6_rules: support for match on sport, dport and ip proto") added support for protocol and ports to FIB rules. Update the FIB lookup tracepoint to dump the parameters. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:01:15 -04:00
David Ahern	9f323973c9	net/ipv4: Udate fib_table_lookup tracepoint Commit `4a2d73a4fb` ("ipv4: fib_rules: support match on sport, dport and ip proto") added support for protocol and ports to FIB rules. Update the FIB lookup tracepoint to dump the parameters. In addition, make the IPv4 tracepoint similar to the IPv6 one where the lookup parameters and result are dumped in 1 event. It is much easier to use and understand the outcome of the lookup. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 23:00:31 -04:00
Cong Wang	aaa908ffbe	net_sched: switch to rcu_work Commit `05f0fe6b74` ("RCU, workqueue: Implement rcu_work") introduces new API's for dispatching work in a RCU callback. Now we can just switch to the new API's for tc filters. This could get rid of a lot of code. Cc: Tejun Heo <tj@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:56:15 -04:00
David S. Miller	1bb58d2d3c	Merge branch 'Mirroring-tests-involving-VLAN' Petr Machata says: ==================== Mirroring tests involving VLAN This patchset tests mirror-to-gretap with various underlay configurations involving VLAN netdevice in particular. Some of the tests involve bridges as well, but tests aimed specifically at testing bridges (i.e. FDB, STP) are not part of this patchset. In patches #1-#6, the codebase is adapted to support the new tests. In patch #7, a test for mirroring to VLAN is introduced. Patches #8-#10 add three tests where VLAN is part of underlay path after gretap encapsulation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:26:20 -04:00
Petr Machata	181d95f8e1	selftests: forwarding: Test mirror-to-gre w/ UL 802.1d+VLAN Test for "tc action mirred egress mirror" that mirrors to GRE when the underlay route points at an 802.1d bridge and packet egresses through a VLAN device. Besides testing basic connectivity, this also tests that the traffic is properly tagged. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:26:20 -04:00
Petr Machata	a08fb9f1ad	selftests: forwarding: Test mirror-to-gre w/ UL VLAN Test for "tc action mirred egress mirror" that mirrors to a gretap netdevice whose underlay route points at a vlan device. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:26:20 -04:00
Petr Machata	0056042f80	selftests: forwarding: Test mirror-to-gre w/ UL VLAN+802.1q Test for "tc action mirred egress mirror" that mirrors to GRE when the underlay route points at a vlan device on top of a bridge device with vlan filtering (802.1q). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:26:19 -04:00
Petr Machata	35388a6a0c	selftests: forwarding: Test mirror-to-vlan Test for "tc action mirred egress mirror" that mirrors to a vlan device. - test_vlan() tests that the packets get mirrored - test_tagged_vlan() tests that the mirrored packets have correct inner VLAN tag. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:26:19 -04:00
Petr Machata	87c0c046e8	selftests: forwarding: lib: Extract trap_{, un}install() A mirror-to-vlan test that's coming next needs to install the trap unconditionally. Therefore extract from slow_path_trap_{,un}install() a more generic functions trap_install() and trap_uninstall(), and covert the former two to conditional wrappers around these. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:26:19 -04:00
Petr Machata	1893150fd5	selftests: forwarding: mirror_gre_lib: Support VLAN Add full_test_span_gre_dir_vlan_ips() and full_test_span_gre_dir_vlan() to support mirror-to-gre tests that involve VLAN. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:26:19 -04:00
Petr Machata	0e7a504c09	selftests: forwarding: lib: Support VLAN devices Add vlan_create() and vlan_destroy() to manage VLAN netdevices. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:26:19 -04:00
Petr Machata	91bac7f997	selftests: forwarding: Add $h3's clsact to mirror_topo_lib.sh Having a clsact qdisc on $h3 is useful in several tests, and will be useful in more tests to come. Move the registration from all the tests that need it into the topology file itself. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:26:19 -04:00
Petr Machata	d5ea2bfc80	selftests: forwarding: mirror_gre_lib: Extract generic functions For non-GRE mirroring tests, a functions along the lines of do_test_span_gre_dir_ips() and test_span_gre_dir_ips() are necessary, but such that they don't assume tunnels are involved. Extract the code from mirror_gre_lib.sh to mirror_lib.sh and convert to just use a given device without assuming it's named "h3-$tundev". Convert the two above-mentioned functions to wrappers that pass along the correct device name. Add test_span_dir() and fail_test_span_dir() to round up the API for use by following patches. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:26:19 -04:00
Petr Machata	74ed089d48	selftests: forwarding: Split mirror_gre_topo_lib.sh Move generic parts of mirror_gre_topo_lib.sh into a new file mirror_topo_lib.sh. Reuse the functions in GRE topo, adding the tunnel devices as necessary. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:26:19 -04:00
David S. Miller	90fed9c946	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2018-05-24 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Björn Töpel cleans up AF_XDP (removes rebind, explicit cache alignment from uapi, etc). 2) David Ahern adds mtu checks to bpf_ipv{4,6}_fib_lookup() helpers. 3) Jesper Dangaard Brouer adds bulking support to ndo_xdp_xmit. 4) Jiong Wang adds support for indirect and arithmetic shifts to NFP 5) Martin KaFai Lau cleans up BTF uapi and makes the btf_header extensible. 6) Mathieu Xhonneux adds an End.BPF action to seg6local with BPF helpers allowing to edit/grow/shrink a SRH and apply on a packet generic SRv6 actions. 7) Sandipan Das adds support for bpf2bpf function calls in ppc64 JIT. 8) Yonghong Song adds BPF_TASK_FD_QUERY command for introspection of tracing events. 9) other misc fixes from Gustavo A. R. Silva, Sirio Balmelli, John Fastabend, and Magnus Karlsson ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:20:51 -04:00
David S. Miller	49a473f5b5	Merge branch 'ibmvnic-Failover-hardening' Thomas Falcon says: ==================== ibmvnic: Failover hardening Introduce additional transport event hardening to handle events during device reset. In the driver's current state, if a transport event is received during device reset, it can cause the device to become unresponsive as invalid operations are processed as the backing device context changes. After a transport event, the device expects a request to begin the initialization process. If the driver is still processing a previously queued device reset in this state, it is likely to fail as firmware will reject any commands other than the one to initialize the client driver's Command-Response Queue. Instead of failing and becoming dormant, the driver will make one more attempt to recover and continue operation. This is achieved by setting a state flag, which if true will direct the driver to clean up all allocated resources and perform a hard reset in an attempt to bring the driver back to an operational state. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00
Thomas Falcon	2770a7984d	ibmvnic: Introduce hard reset recovery Introduce a recovery hard reset to handle reset failure as a result of change of device context following a transport event, such as a backing device failover or partition migration. These operations reset the device context to its initial state. If this occurs during a reset, any initialization commands are likely to fail with an invalid state error as backing device firmware requests reinitialization. When this happens, make one more attempt by performing a hard reset, which frees any resources currently allocated and performs device initialization. If a transport event occurs during a device reset, a flag is set which will trigger a new hard reset following the completionof the current reset event. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00
Thomas Falcon	06e43d7f9f	ibmvnic: Set resetting state at earliest possible point Set device resetting state at the earliest possible point: as soon as a reset is successfully scheduled. The reset state is toggled off when all resets have been processed to completion. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00
Thomas Falcon	8a348450a0	ibmvnic: Create separate initialization routine for resets Instead of having one initialization routine for all cases, create a separate, simpler function for standard initialization, such as during device probe. Use the original initialization function to handle device reset scenarios. The goal of this patch is to avoid having a single, cluttered init function to handle all possible scenarios. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00
Thomas Falcon	ab5ec33b9a	ibmvnic: Handle error case when setting link state If setting the link state is not successful, print a warning with the resulting return code and return it to be handled by the caller. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00
Thomas Falcon	17c8705838	ibmvnic: Return error code if init interrupted by transport event If device init is interrupted by a failover, set the init return code so that it can be checked and handled appropriately by the init routine. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00
Thomas Falcon	9c4eaabd1b	ibmvnic: Check CRQ command return codes Check whether CRQ command is successful before awaiting a response from the management partition. If the command was not successful, the driver may hang waiting for a response that will never come. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00
Thomas Falcon	5153698e55	ibmvnic: Introduce active CRQ state Introduce an "active" state for a IBM vNIC Command-Response Queue. A CRQ is considered active once it has initialized or linked with its partner by sending an initialization request and getting a successful response back from the management partition. Until this has happened, do not allow CRQ commands to be sent other than the initialization request. This change will avoid a protocol error in case of a device transport event occurring during a initialization. When the driver receives a transport event notification indicating that the backing hardware has changed and needs reinitialization, any further commands other than the initialization handshake with the VIOS management partition will result in an invalid state error. Instead of sending a command that will be returned with an error, print a warning and return an error that will be handled by the caller. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:25 -04:00
Thomas Falcon	c3f2241547	ibmvnic: Mark NAPI flag as disabled when released Set adapter NAPI state as disabled if they are removed. This will allow them to be enabled again if reallocated in case of a hard reset. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:25 -04:00
David S. Miller	180f848b8f	Merge branch 'gretap-mirroring-selftests' Petr Machata says: ==================== selftests: forwarding: Additions to mirror-to-gretap tests This patchset is for a handful of edge cases in mirror-to-gretap scenarios: removal of mirrored-to netdevice (#1), removal of underlay route for tunnel remote endpoint (#2) and cessation of mirroring upon removal of flower mirroring rule (#3). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:14:37 -04:00
Petr Machata	a96d81a20b	selftests: forwarding: Test removal of mirroring Test that when flower-based mirror action is removed, mirroring stops. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:14:36 -04:00
Petr Machata	77a8df3810	selftests: forwarding: Test removal of underlay route When underlay route is removed, the mirrored traffic should not be forwarded. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:14:36 -04:00
Petr Machata	6b45432d78	selftests: forwarding: Test mirroring to deleted device Tests that the mirroring code catches up with deletion of a mirrored-to device. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:14:36 -04:00
YueHaibing	d624613e42	cxgb4: Check for kvzalloc allocation failure t4_prep_fw doesn't check for card_fw pointer before store the read data, which could lead to a NULL pointer dereference if kvzalloc failed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 21:52:44 -04:00
Alexei Starovoitov	10f678683e	Merge branch 'xdp_xmit-bulking' Jesper Dangaard Brouer says: ==================== This patchset change ndo_xdp_xmit API to take a bulk of xdp frames. When kernel is compiled with CONFIG_RETPOLINE, every indirect function pointer (branch) call hurts performance. For XDP this have a huge negative performance impact. This patchset reduce the needed (indirect) calls to ndo_xdp_xmit, but also prepares for further optimizations. The DMA APIs use of indirect function pointer calls is the primary source the regression. It is left for a followup patchset, to use bulking calls towards the DMA API (via the scatter-gatter calls). The other advantage of this API change is that drivers can easier amortize the cost of any sync/locking scheme, over the bulk of packets. The assumption of the current API is that the driver implemementing the NDO will also allocate a dedicated XDP TX queue for every CPU in the system. Which is not always possible or practical to configure. E.g. ixgbe cannot load an XDP program on a machine with more than 96 CPUs, due to limited hardware TX queues. E.g. virtio_net is hard to configure as it requires manually increasing the queues. E.g. tun driver chooses to use a per XDP frame producer lock modulo smp_processor_id over avail queues. I'm considered adding 'flags' to ndo_xdp_xmit, but it's not part of this patchset. This will be a followup patchset, once we know if this will be needed (e.g. for non-map xdp_redirect flush-flag, and if AF_XDP chooses to use ndo_xdp_xmit for TX). --- V5: Fixed up issues spotted by Daniel and John V4: Splitout the patches from 4 to 8 patches. I cannot split the driver changes from the NDO change, but I've tried to isolated the NDO change together with the driver change as much as possible. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:16 -07:00
Jesper Dangaard Brouer	a570e48fee	samples/bpf: xdp_monitor use err code from tracepoint xdp:xdp_devmap_xmit Update xdp_monitor to use the recently added err code introduced in tracepoint xdp:xdp_devmap_xmit, to show if the drop count is caused by some driver general delivery problem. Other kind of drops will likely just be more normal TX space issues. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:15 -07:00
Jesper Dangaard Brouer	e74de52e55	xdp/trace: extend tracepoint in devmap with an err Extending tracepoint xdp:xdp_devmap_xmit in devmap with an err code allow people to easier identify the reason behind the ndo_xdp_xmit call to a given driver is failing. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:15 -07:00
Jesper Dangaard Brouer	735fc4054b	xdp: change ndo_xdp_xmit API to support bulking This patch change the API for ndo_xdp_xmit to support bulking xdp_frames. When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown. Most of the slowdown is caused by DMA API indirect function calls, but also the net_device->ndo_xdp_xmit() call. Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed performance improved: for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps With frames avail as a bulk inside the driver ndo_xdp_xmit call, further optimizations are possible, like bulk DMA-mapping for TX. Testing without CONFIG_RETPOLINE show the same performance for physical NIC drivers. The virtual NIC driver tun sees a huge performance boost, as it can avoid doing per frame producer locking, but instead amortize the locking cost over the bulk. V2: Fix compile errors reported by kbuild test robot <lkp@intel.com> V4: Isolated ndo, driver changes and callers. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:15 -07:00
Jesper Dangaard Brouer	389ab7f01a	xdp: introduce xdp_return_frame_rx_napi When sending an xdp_frame through xdp_do_redirect call, then error cases can happen where the xdp_frame needs to be dropped, and returning an -errno code isn't sufficient/possible any-longer (e.g. for cpumap case). This is already fully supported, by simply calling xdp_return_frame. This patch is an optimization, which provides xdp_return_frame_rx_napi, which is a faster variant for these error cases. It take advantage of the protection provided by XDP RX running under NAPI protection. This change is mostly relevant for drivers using the page_pool allocator as it can take advantage of this. (Tested with mlx5). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:15 -07:00
Jesper Dangaard Brouer	9940fbf633	samples/bpf: xdp_monitor use tracepoint xdp:xdp_devmap_xmit The xdp_monitor sample/tool is updated to use the new tracepoint xdp:xdp_devmap_xmit the previous patch just introduced. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:15 -07:00
Jesper Dangaard Brouer	38edddb811	xdp: add tracepoint for devmap like cpumap have Notice how this allow us get XDP statistic without affecting the XDP performance, as tracepoint is no-longer activated on a per packet basis. V5: Spotted by John Fastabend. Fix 'sent' also counted 'drops' in this patch, a later patch corrected this, but it was a mistake in this intermediate step. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:15 -07:00
Jesper Dangaard Brouer	5d053f9da4	bpf: devmap prepare xdp frames for bulking Like cpumap create queue for xdp frames that will be bulked. For now, this patch simply invoke ndo_xdp_xmit foreach frame. This happens, either when the map flush operation is envoked, or when the limit DEV_MAP_BULK_SIZE is reached. V5: Avoid memleak on error path in dev_map_update_elem() Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:14 -07:00

1 2 3 4 5 ...

755318 Commits