linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-17 01:22:07 +00:00

Author	SHA1	Message	Date
Florian Westphal	600911ff5f	mptcp: add rmem queue accounting If userspace never drains the receive buffers we must stop draining the subflow socket(s) at some point. This adds the needed rmem accouting for this. If the threshold is reached, we stop draining the subflows. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:46:26 -08:00
Florian Westphal	6771bfd9ee	mptcp: update mptcp ack sequence from work queue If userspace is not reading data, all the mptcp-level acks contain the ack_seq from the last time userspace read data rather than the most recent in-sequence value. This causes pointless retransmissions for data that is already queued. The reason for this is that all the mptcp protocol level processing happens at mptcp_recv time. This adds work queue to move skbs from the subflow sockets receive queue on the mptcp socket receive queue (which was not used so far). This allows us to announce the correct mptcp ack sequence in a timely fashion, even when the application does not call recv() on the mptcp socket for some time. We still wake userspace tasks waiting for POLLIN immediately: If the mptcp level receive queue is empty (because the work queue is still pending) it can be filled from in-sequence subflow sockets at recv time without a need to wait for the worker. The skb_orphan when moving skbs from subflow to mptcp level is needed, because the destructor (sock_rfree) relies on skb->sk (ssk!) lock being taken. A followup patch will add needed rmem accouting for the moved skbs. Other problem: In case application behaves as expected, and calls recv() as soon as mptcp socket becomes readable, the work queue will only waste cpu cycles. This will also be addressed in followup patches. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:46:26 -08:00
Paolo Abeni	8099201715	mptcp: add work queue skeleton Will be extended with functionality in followup patches. Initial user is moving skbs from subflows receive queue to the mptcp-level receive queue. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:46:26 -08:00
Florian Westphal	101f6f851e	mptcp: add and use mptcp_data_ready helper allows us to schedule the work queue to drain the ssk receive queue in a followup patch. This is needed to avoid sending all-to-pessimistic mptcp-level acknowledgements. At this time, the ack_seq is what was last read by userspace instead of the highest in-sequence number queued for reading. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:46:26 -08:00
David S. Miller	5cd129dd5e	Merge branch 'mlxsw-Small-driver-update' Jiri Pirko says: ==================== mlxsw: Small driver update This patchset contains couple of patches not related to each other. They are small optimization and extension changes to the driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:44:42 -08:00
Petr Machata	3b909c552a	mlxsw: spectrum: Add mlxsw_sp_span_ops.buffsize_get for Spectrum-3 The buffer factor on Spectrum-3 is larger than on Spectrum-2. Add a new callback and use it for mlxsw_sp->span_ops on Spectrum-3. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:44:42 -08:00
Ido Schimmel	b401ff8541	mlxsw: spectrum: Initialize advertised speeds to supported speeds During port initialization the driver instructs the device to only advertise speeds that can be supported by the port's current width. Since the device now returns the supported speeds based on the port's current width, the driver no longer needs to compute the speeds that can be advertised. Simplify port initialization by setting the advertised speeds to the queried supported speeds. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:44:42 -08:00
Petr Machata	8a29581eb0	mlxsw: spectrum: Move the ECN-marked packet counter to ethtool Spectrum-1 and Spectrum-2 do not have a per-TC counter of number of packets marked by ECN. The value reported currently is the total number of marked packets. Showing this value at individual TC Qdiscs is misleading. Move the counter to ethtool instead. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:44:42 -08:00
Jiri Pirko	648e53cac7	mlxsw: spectrum_switchdev: Optimize SFN records processing Currently, only one SFN query is done from repetitive work at a time, processing 64 entries. Another work iteration is scheduled in 100ms, that means that the max rate of learned FDB entries is limited to 6400/s. That is slow. Fix this by doing 2 optimizations: 1) Run 10 SFN queries at a time. 2) In case the SFN is not drained, schedule work with 0 delay to allow to continue processing rest of the records. On a testing setup with 500K entries the time to process decreased from 870secs to 10secs. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Alex Kushnarov <alexanderk@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:44:42 -08:00
Randy Dunlap	c535f92032	af_llc: fix if-statement empty body warning When debugging via dprintk() is not enabled, make the dprintk() macro be an empty do-while loop, as is done in <linux/sunrpc/debug.h>. This fixes a gcc warning when -Wextra is set: ../net/llc/af_llc.c:974:51: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] I have verified that there is not object code change (with gcc 7.5.0). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: netdev@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:38:13 -08:00
David S. Miller	165b94ffcf	mlx5-updates-2020-02-25 The following series provides some misc updates to mlx5 driver: 1) From Maxim, Refactoring for mlx5e netdev channels recreation flow. - Add error handling - Add context to the preactivate hook - Use preactivate hook with context where it can be used and subsequently unify channel recreation flow everywhere. - Fix XPS cpumask to not reset upon channel recreation. 2) From Tariq: - Use indirect calls wrapper on RX. - Check LRO capability bit 3) Multiple small cleanups -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl5VxJAACgkQSD+KveBX +j5q6gf+NPVz9uaWlegeAD0J/AkrycIOlceaBhwQ7zDJ1u2qy8Aan6QuprFLKwkW hHDfyqxMXdsHnQJAfOiUU1HGgK2v092BWoArdHMJwift333TQkRzJNXCM9WBKAOU PpyOgvRESLWOjcgOcm3AYv/FKjiW5akm7rlW+jYy3JswjY4a7rqJvVKUnXXPP9B1 5oiis1OIQVz9y2GTEOYwBx3euZ87TCqzsgF+qNkKtFil74gOnVHNx4kqFTLE3b0Z awq+dvKOUlCIobAX69BmKDj6ieAKNO7l0mfMYnmbVk8nNt33F0z94MBIw2ztCcPq AN9hDZdQitnQFH9okdVCf4rKxaQoxA== =lixf -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2020-02-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-02-25 The following series provides some misc updates to mlx5 driver: 1) From Maxim, Refactoring for mlx5e netdev channels recreation flow. - Add error handling - Add context to the preactivate hook - Use preactivate hook with context where it can be used and subsequently unify channel recreation flow everywhere. - Fix XPS cpumask to not reset upon channel recreation. 2) From Tariq: - Use indirect calls wrapper on RX. - Check LRO capability bit 3) Multiple small cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:30:43 -08:00
David S. Miller	06baf4be20	Merge branch 'net-smc-improve-peer-ID-in-CLC-decline' Hans Wippel says: ==================== net/smc: improve peer ID in CLC decline The following two patches improve the peer ID in CLC decline messages if RoCE devices are present in the host but no suitable device is found for a connection. The first patch reworks the peer ID initialization. The second patch contains the actual changes of the CLC decline messages. Changes v1 -> v2: * make smc_ib_is_valid_local_systemid() static in first patch * changed if in smc_clc_send_decline() to remove curly braces Changes RFC -> v1: * split the patch into two parts * removed zero assignment to global variable (thanks Leon) Thanks to Leon Romanovsky and Karsten Graul for the feedback! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:27:06 -08:00
Hans Wippel	a082ec897f	net/smc: improve peer ID in CLC decline for SMC-R According to RFC 7609, all CLC messages contain a peer ID that consists of a unique instance ID and the MAC address of one of the host's RoCE devices. But if a SMC-R connection cannot be established, e.g., because no matching pnet table entry is found, the current implementation uses a zero value in the CLC decline message although the host's peer ID is set to a proper value. If no RoCE and no ISM device is usable for a connection, there is no LGR and the LGR check in smc_clc_send_decline() prevents that the peer ID is copied into the CLC decline message for both SMC-D and SMC-R. So, this patch modifies the check to also accept the case of no LGR. Also, only a valid peer ID is copied into the decline message. Signed-off-by: Hans Wippel <ndev@hwipl.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:27:06 -08:00
Hans Wippel	366bb249b5	net/smc: rework peer ID handling This patch initializes the peer ID to a random instance ID and a zero MAC address. If a RoCE device is in the host, the MAC address part of the peer ID is overwritten with the respective address. Also, a function for checking if the peer ID is valid is added. A peer ID is considered valid if the MAC address part contains a non-zero MAC address. Signed-off-by: Hans Wippel <ndev@hwipl.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:27:06 -08:00
Arjun Roy	0b7f41f687	tcp-zerocopy: Update returned getsockopt() optlen. TCP receive zerocopy currently does not update the returned optlen for getsockopt() if the user passed in a larger than expected value. Thus, userspace cannot properly determine if all the fields are set in the passed-in struct. This patch sets the optlen for this case before returning, in keeping with the expected operation of getsockopt(). Fixes: `c8856c0514` ("tcp-zerocopy: Return inq along with tcp receive zerocopy.") Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:24:22 -08:00
David S. Miller	ebb4a4bf76	Merge branch 'net-fix-sysfs-permssions-when-device-changes-network' Christian Brauner says: ==================== net: fix sysfs permssions when device changes network /* v7 / This is v7 with a build warning fixup that slipped past me the last time. It removes to unused variables in sysfs_group_change_owner(). I observed no warning when building just now. / v6 / This is v6 with two small fixups. I missed adapting the commit message to reflect the renamed helper for changing the owner of sysfs files and I also forgot to make the new dpm helper static inline. / v5 / This is v5 with a small fixup requested by Rafael. / v4 / This is v4 with more documentation and other fixes that Greg requested. / v3 */ This is v3 with explicit uid and gid parameters added to functions that change sysfs object ownership as Greg requested. (I've tagged this with net-next since it's triggered by a bug for network device files but it also touches driver core aspects so it's not clear-cut. I can of course split this series into separate patchsets.) We have been struggling with a bug surrounding the ownership of network device sysfs files when moving network devices between network namespaces owned by different user namespaces reported by multiple users. Currently, when moving network devices between network namespaces the ownership of the corresponding sysfs entries is not changed. This leads to problems when tools try to operate on the corresponding sysfs files. I also causes a bug when creating a network device in a network namespaces owned by a user namespace and moving that network device back to the host network namespaces. Because when a network device is created in a network namespaces it will be owned by the root user of the user namespace and all its associated sysfs files will also be owned by the root user of the corresponding user namespace. If such a network device has to be moved back to the host network namespace the permissions will still be set to the root user of the owning user namespaces of the originating network namespace. This means unprivileged users can e.g. re-trigger uevents for such incorrectly owned devices on the host or in other network namespaces. They can also modify the settings of the device itself through sysfs when they wouldn't be able to do the same through netlink. Both of these things are unwanted. For example, quite a few workloads will create network devices in the host network namespace. Other tools will then proceed to move such devices between network namespaces owner by other user namespaces. While the ownership of the device itself is updated in net/core/net-sysfs.c:dev_change_net_namespace() the corresponding sysfs entry for the device is not. Below you'll find that moving a network device (here a veth device) from a network namespace into another network namespaces owned by a different user namespace with a different id mapping. As you can see the permissions are wrong even though it is owned by the userns root user after it has been moved and can be interacted with through netlink: drwxr-xr-x 5 nobody nobody 0 Jan 25 18:08 . drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 .. -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_assign_type -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_len -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 address -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 broadcast -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_changes -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_down_count -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_up_count -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_id -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_port -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dormant -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 duplex -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 flags -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 gro_flush_timeout -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifalias -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifindex -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 iflink -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 link_mode -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 mtu -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 name_assign_type -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 netdev_group -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 operstate -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_id -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_name -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_switch_id drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 power -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 proto_down drwxr-xr-x 4 nobody nobody 0 Jan 25 18:09 queues -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 speed drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 statistics lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:08 subsystem -> ../../../../class/net -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 tx_queue_len -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 type -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:08 uevent Constrast this with creating a device of the same type in the network namespace directly. In this case the device's sysfs permissions will be correctly updated. (Please also note, that in a lot of workloads this strategy of creating the network device directly in the network device to workaround this issue can not be used. Either because the network device is dedicated after it has been created or because it used by a process that is heavily sandboxed and couldn't create network devices itself.): drwxr-xr-x 5 root root 0 Jan 25 18:12 . drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 .. -r--r--r-- 1 root root 4096 Jan 25 18:12 addr_assign_type -r--r--r-- 1 root root 4096 Jan 25 18:12 addr_len -r--r--r-- 1 root root 4096 Jan 25 18:12 address -r--r--r-- 1 root root 4096 Jan 25 18:12 broadcast -rw-r--r-- 1 root root 4096 Jan 25 18:12 carrier -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_changes -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_down_count -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_up_count -r--r--r-- 1 root root 4096 Jan 25 18:12 dev_id -r--r--r-- 1 root root 4096 Jan 25 18:12 dev_port -r--r--r-- 1 root root 4096 Jan 25 18:12 dormant -r--r--r-- 1 root root 4096 Jan 25 18:12 duplex -rw-r--r-- 1 root root 4096 Jan 25 18:12 flags -rw-r--r-- 1 root root 4096 Jan 25 18:12 gro_flush_timeout -rw-r--r-- 1 root root 4096 Jan 25 18:12 ifalias -r--r--r-- 1 root root 4096 Jan 25 18:12 ifindex -r--r--r-- 1 root root 4096 Jan 25 18:12 iflink -r--r--r-- 1 root root 4096 Jan 25 18:12 link_mode -rw-r--r-- 1 root root 4096 Jan 25 18:12 mtu -r--r--r-- 1 root root 4096 Jan 25 18:12 name_assign_type -rw-r--r-- 1 root root 4096 Jan 25 18:12 netdev_group -r--r--r-- 1 root root 4096 Jan 25 18:12 operstate -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_id -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_name -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_switch_id drwxr-xr-x 2 root root 0 Jan 25 18:12 power -rw-r--r-- 1 root root 4096 Jan 25 18:12 proto_down drwxr-xr-x 4 root root 0 Jan 25 18:12 queues -r--r--r-- 1 root root 4096 Jan 25 18:12 speed drwxr-xr-x 2 root root 0 Jan 25 18:12 statistics lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:12 subsystem -> ../../../../class/net -rw-r--r-- 1 root root 4096 Jan 25 18:12 tx_queue_len -r--r--r-- 1 root root 4096 Jan 25 18:12 type -rw-r--r-- 1 root root 4096 Jan 25 18:12 uevent Now, when creating a network device in a network namespace owned by a user namespace and moving it to the host the permissions will be set to the id that the user namespace root user has been mapped to on the host leading to all sorts of permission issues mentioned above: 458752 drwxr-xr-x 5 458752 458752 0 Jan 25 18:12 . drwxr-xr-x 9 root root 0 Jan 25 18:08 .. -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_assign_type -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_len -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 address -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 broadcast -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_changes -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_down_count -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_up_count -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_id -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_port -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dormant -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 duplex -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 flags -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 gro_flush_timeout -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 ifalias -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 ifindex -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 iflink -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 link_mode -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 mtu -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 name_assign_type -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 netdev_group -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 operstate -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_id -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_name -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_switch_id drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 power -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 proto_down drwxr-xr-x 4 458752 458752 0 Jan 25 18:12 queues -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 speed drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 statistics lrwxrwxrwx 1 root root 0 Jan 25 18:12 subsystem -> ../../../../class/net -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 tx_queue_len -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 type -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 uevent Fix this by changing the basic sysfs files associated with network devices when moving them between network namespaces. To this end we add some infrastructure to sysfs. The patchset takes care to only do this when the owning user namespaces changes and the kids differ. So there's only a performance overhead, when the owning user namespace of the network namespace is different __and__ the kid mappings for the root user are different for the two user namespaces: Assume we have a netdev eth0 which we create in netns1 owned by userns1. userns1 has an id mapping of 0 100000 100000. Now we move eth0 into netns2 which is owned by userns2 which also defines an id mapping of 0 100000 100000. In this case sysfs doesn't need updating. The patch will handle this case and not do any needless work. Now assume eth0 is moved into netns3 which is owned by userns3 which defines an id mapping of 0 123456 65536. In this case the root user in each namespace corresponds to different kid and sysfs needs updating. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:07:26 -08:00
Christian Brauner	ef6a4c88e9	net: fix sysfs permssions when device changes network namespace Now that we moved all the helpers in place and make use netdev_change_owner() to fixup the permissions when moving network devices between network namespaces. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:07:26 -08:00
Christian Brauner	d755407d44	net-sysfs: add queue_change_owner() Add a function to change the owner of the queue entries for a network device when it is moved between network namespaces. Currently, when moving network devices between network namespaces the ownership of the corresponding queue sysfs entries are not changed. This leads to problems when tools try to operate on the corresponding sysfs files. Fix this. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:07:26 -08:00
Christian Brauner	e6dee9f389	net-sysfs: add netdev_change_owner() Add a function to change the owner of a network device when it is moved between network namespaces. Currently, when moving network devices between network namespaces the ownership of the corresponding sysfs entries is not changed. This leads to problems when tools try to operate on the corresponding sysfs files. This leads to a bug whereby a network device that is created in a network namespaces owned by a user namespace will have its corresponding sysfs entry owned by the root user of the corresponding user namespace. If such a network device has to be moved back to the host network namespace the permissions will still be set to the user namespaces. This means unprivileged users can e.g. trigger uevents for such incorrectly owned devices. They can also modify the settings of the device itself. Both of these things are unwanted. For example, workloads will create network devices in the host network namespace. Other tools will then proceed to move such devices between network namespaces owner by other user namespaces. While the ownership of the device itself is updated in net/core/net-sysfs.c:dev_change_net_namespace() the corresponding sysfs entry for the device is not: drwxr-xr-x 5 nobody nobody 0 Jan 25 18:08 . drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 .. -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_assign_type -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_len -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 address -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 broadcast -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_changes -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_down_count -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_up_count -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_id -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_port -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dormant -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 duplex -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 flags -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 gro_flush_timeout -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifalias -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifindex -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 iflink -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 link_mode -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 mtu -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 name_assign_type -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 netdev_group -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 operstate -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_id -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_name -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_switch_id drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 power -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 proto_down drwxr-xr-x 4 nobody nobody 0 Jan 25 18:09 queues -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 speed drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 statistics lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:08 subsystem -> ../../../../class/net -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 tx_queue_len -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 type -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:08 uevent However, if a device is created directly in the network namespace then the device's sysfs permissions will be correctly updated: drwxr-xr-x 5 root root 0 Jan 25 18:12 . drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 .. -r--r--r-- 1 root root 4096 Jan 25 18:12 addr_assign_type -r--r--r-- 1 root root 4096 Jan 25 18:12 addr_len -r--r--r-- 1 root root 4096 Jan 25 18:12 address -r--r--r-- 1 root root 4096 Jan 25 18:12 broadcast -rw-r--r-- 1 root root 4096 Jan 25 18:12 carrier -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_changes -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_down_count -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_up_count -r--r--r-- 1 root root 4096 Jan 25 18:12 dev_id -r--r--r-- 1 root root 4096 Jan 25 18:12 dev_port -r--r--r-- 1 root root 4096 Jan 25 18:12 dormant -r--r--r-- 1 root root 4096 Jan 25 18:12 duplex -rw-r--r-- 1 root root 4096 Jan 25 18:12 flags -rw-r--r-- 1 root root 4096 Jan 25 18:12 gro_flush_timeout -rw-r--r-- 1 root root 4096 Jan 25 18:12 ifalias -r--r--r-- 1 root root 4096 Jan 25 18:12 ifindex -r--r--r-- 1 root root 4096 Jan 25 18:12 iflink -r--r--r-- 1 root root 4096 Jan 25 18:12 link_mode -rw-r--r-- 1 root root 4096 Jan 25 18:12 mtu -r--r--r-- 1 root root 4096 Jan 25 18:12 name_assign_type -rw-r--r-- 1 root root 4096 Jan 25 18:12 netdev_group -r--r--r-- 1 root root 4096 Jan 25 18:12 operstate -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_id -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_name -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_switch_id drwxr-xr-x 2 root root 0 Jan 25 18:12 power -rw-r--r-- 1 root root 4096 Jan 25 18:12 proto_down drwxr-xr-x 4 root root 0 Jan 25 18:12 queues -r--r--r-- 1 root root 4096 Jan 25 18:12 speed drwxr-xr-x 2 root root 0 Jan 25 18:12 statistics lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:12 subsystem -> ../../../../class/net -rw-r--r-- 1 root root 4096 Jan 25 18:12 tx_queue_len -r--r--r-- 1 root root 4096 Jan 25 18:12 type -rw-r--r-- 1 root root 4096 Jan 25 18:12 uevent Now, when creating a network device in a network namespace owned by a user namespace and moving it to the host the permissions will be set to the id that the user namespace root user has been mapped to on the host leading to all sorts of permission issues: 458752 drwxr-xr-x 5 458752 458752 0 Jan 25 18:12 . drwxr-xr-x 9 root root 0 Jan 25 18:08 .. -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_assign_type -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_len -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 address -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 broadcast -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_changes -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_down_count -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_up_count -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_id -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_port -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dormant -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 duplex -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 flags -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 gro_flush_timeout -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 ifalias -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 ifindex -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 iflink -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 link_mode -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 mtu -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 name_assign_type -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 netdev_group -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 operstate -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_id -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_name -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_switch_id drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 power -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 proto_down drwxr-xr-x 4 458752 458752 0 Jan 25 18:12 queues -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 speed drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 statistics lrwxrwxrwx 1 root root 0 Jan 25 18:12 subsystem -> ../../../../class/net -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 tx_queue_len -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 type -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 uevent Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:07:25 -08:00
Christian Brauner	3b52fc5d78	drivers/base/power: add dpm_sysfs_change_owner() Add a helper to change the owner of a device's power entries. This needs to happen when the ownership of a device is changed, e.g. when moving network devices between network namespaces. This function will be used to correctly account for ownership changes, e.g. when moving network devices between network namespaces. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:07:25 -08:00
Christian Brauner	b8f33e5d76	device: add device_change_owner() Add a helper to change the owner of a device's sysfs entries. This needs to happen when the ownership of a device is changed, e.g. when moving network devices between network namespaces. This function will be used to correctly account for ownership changes, e.g. when moving network devices between network namespaces. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:07:25 -08:00
Christian Brauner	2c4f9401ce	sysfs: add sysfs_change_owner() Add a helper to change the owner of sysfs objects. This function will be used to correctly account for kobject ownership changes, e.g. when moving network devices between network namespaces. This mirrors how a kobject is added through driver core which in its guts is done via kobject_add_internal() which in summary creates the main directory via create_dir(), populates that directory with the groups associated with the ktype of the kobject (if any) and populates the directory with the basic attributes associated with the ktype of the kobject (if any). These are the basic steps that are associated with adding a kobject in sysfs. Any additional properties are added by the specific subsystem itself (not by driver core) after it has registered the device. So for the example of network devices, a network device will e.g. register a queue subdirectory under the basic sysfs directory for the network device and than further subdirectories within that queues subdirectory. But that is all specific to network devices and they call the corresponding sysfs functions to do that directly when they create those queue objects. So anything that a subsystem adds outside of what driver core does must also be changed by it (That's already true for removal of files it created outside of driver core.) and it's the same for ownership changes. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:07:25 -08:00
Christian Brauner	303a42769c	sysfs: add sysfs_group{s}_change_owner() Add helpers to change the owner of sysfs groups. This function will be used to correctly account for kobject ownership changes, e.g. when moving network devices between network namespaces. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:07:25 -08:00
Christian Brauner	0666a3aee7	sysfs: add sysfs_link_change_owner() Add a helper to change the owner of a sysfs link. This function will be used to correctly account for kobject ownership changes, e.g. when moving network devices between network namespaces. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:07:25 -08:00
Christian Brauner	f70ce18568	sysfs: add sysfs_file_change_owner() Add helpers to change the owner of a sysfs files. This function will be used to correctly account for kobject ownership changes, e.g. when moving network devices between network namespaces. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 20:07:25 -08:00
Gustavo A. R. Silva	d1c73cbdf9	net: cisco: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] Lastly, fix the following checkpatch warning: CHECK: Prefer kernel type 'u32' over 'u_int32_t' #61: FILE: drivers/net/ethernet/cisco/enic/vnic_devcmd.h:653: + u_int32_t val[]; This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 16:43:09 -08:00
Gustavo A. R. Silva	274ac2831a	net: marvell: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 16:43:09 -08:00
Gustavo A. R. Silva	c5d6cf903f	net: hns: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 16:43:09 -08:00
Gustavo A. R. Silva	62f1914251	sfc: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 16:43:09 -08:00
Gustavo A. R. Silva	4a34d825b8	qlogic: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was detected with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 16:43:09 -08:00
Florian Fainelli	3f02735e5d	Revert "net: dsa: bcm_sf2: Also configure Port 5 for 2Gb/sec on 7278" This reverts commit `7458bd540f` ("net: dsa: bcm_sf2: Also configure Port 5 for 2Gb/sec on 7278") as it causes advanced congestion buffering issues with 7278 switch devices when using their internal Giabit PHY. While this is being debugged, continue with conservative defaults that work and do not cause packet loss. Fixes: `7458bd540f` ("net: dsa: bcm_sf2: Also configure Port 5 for 2Gb/sec on 7278") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 16:33:35 -08:00
Jiri Pirko	bb0858d8bc	iavf: use tc_cls_can_offload_and_chain0() instead of chain check Looks like the iavf code actually experienced a race condition, when a developer took code before the check for chain 0 was put to helper. So use tc_cls_can_offload_and_chain0() helper instead of direct check and move the check to _cb() so this is similar to i40e code. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-26 09:01:19 -08:00
Saeed Mahameed	586ee9e8a3	net/mlx5: sparse: warning: Using plain integer as NULL pointer Return NULL instead of 0. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com>	2020-02-25 17:06:21 -08:00
Saeed Mahameed	5edc4c7275	net/mlx5: sparse: warning: incorrect type in assignment drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c:191:13: sparse: warning: incorrect type in assignment (different base types) Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com>	2020-02-25 17:06:19 -08:00
Nathan Chancellor	fa2b491287	net/mlx5: Fix header guard in rsc_dump.h Clang warns: In file included from ../drivers/net/ethernet/mellanox/mlx5/core/main.c:73: ../drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h:4:9: warning: '__MLX5_RSC_DUMP_H' is used as a header guard here, followed by #define of a different macro [-Wheader-guard] #ifndef __MLX5_RSC_DUMP_H ^~~~~~~~~~~~~~~~~ ../drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h:5:9: note: '__MLX5_RSC_DUMP__H' is defined here; did you mean '__MLX5_RSC_DUMP_H'? #define __MLX5_RSC_DUMP__H ^~~~~~~~~~~~~~~~~~ __MLX5_RSC_DUMP_H 1 warning generated. Make them match to get the intended behavior and remove the warning. Fixes: `12206b1723` ("net/mlx5: Add support for resource dump") Link: https://github.com/ClangBuiltLinux/linux/issues/897 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:06:16 -08:00
Hans Wippel	fa194707a9	Documentation: fix vxlan typo in mlx5.rst Fix a vxlan typo in the mlx5 driver documentation. Signed-off-by: Hans Wippel <ndev@hwipl.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:06:13 -08:00
Tariq Toukan	e9c1d2539d	net/mlx5e: RX, Use indirect calls wrapper for handling compressed completions We can avoid an indirect call per compressed completion wrapping the completion handling call with the appropriate helper. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:06:10 -08:00
Tariq Toukan	2c8f80b3e3	net/mlx5e: RX, Use indirect calls wrapper for posting descriptors We can avoid an indirect call per NAPI cycle wrapping the RX descriptors posting call with the appropriate helper. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:06:07 -08:00
Maxim Mikityanskiy	6e0504c698	net/mlx5e: Change inline mode correctly when changing trust state The current steps that are performed when the trust state changes, if the channels are active: 1. The trust state is changed in hardware. 2. The new inline mode is calculated. 3. If the new inline mode is different, the channels are recreated using the new inline mode. This approach has some issues: 1. There is a time gap between changing trust state in hardware and starting sending enough inline headers (the latter happens after recreation of channels). It leads to failed transmissions and error CQEs. 2. If the new channels fail to open, we'll be left with the old ones, but the hardware will be configured for the new trust state, so the interval when we can see TX errors never ends. This patch fixes the issues above by moving the trust state change into the preactivate hook that runs during the recreation of the channels when no channels are active, so it eliminates the gap of partially applied configuration. If the inline mode doesn't change with the change of the trust state, the channels won't be recreated, just like before this patch. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:06:04 -08:00
Maxim Mikityanskiy	b9ab5d0ecf	net/mlx5e: Add context to the preactivate hook Sometimes the preactivate hook of mlx5e_safe_switch_channels needs more parameters than just struct mlx5e_priv . For such cases, a new parameter (void context) is added to preactivate hooks. Some of the existing normal functions are currently used as preactivate callbacks. To avoid adding an extra unused parameter, they are wrapped in an automatic way using the MLX5E_DEFINE_PREACTIVATE_WRAPPER_CTX macro. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:06:02 -08:00
Maxim Mikityanskiy	35a78ed4c3	net/mlx5e: Allow mlx5e_switch_priv_channels to fail and recover Currently mlx5e_switch_priv_channels expects that the preactivate hook doesn't fail, however, it can fail, because it may set hardware parameters. This commit addresses this issue and provides a way to recover from failures of the preactivate hook: the old channels are not closed until the point where nothing can fail anymore, so in case preactivate fails, the driver can roll back the old channels and activate them again. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:05:59 -08:00
Maxim Mikityanskiy	600a3952a2	net/mlx5e: Remove unneeded netif_set_real_num_tx_queues The number of queues is now updated by mlx5e_update_netdev_queues in a centralized way, when no channels are active. Remove an extra occurrence of netif_set_real_num_tx_queues to prepare it for the next commit. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:05:56 -08:00
Maxim Mikityanskiy	3909a12e79	net/mlx5e: Fix configuration of XPS cpumasks and netdev queues in corner cases Currently, mlx5e notifies the kernel about the number of queues and sets the default XPS cpumasks when channels are activated. This implementation has several corner cases, in which the kernel may not be updated on time, or XPS cpumasks may be reset when not directly touched by the user. This commit fixes these corner cases to match the following expected behavior: 1. The number of queues always corresponds to the number of channels configured. 2. XPS cpumasks are set to driver's defaults on netdev attach. 3. XPS cpumasks set by user are not reset, unless the number of channels changes. If the number of channels changes, they are reset to driver's defaults. (In general case, when the number of channels increases or decreases, it's not possible to guess how to convert the current XPS cpumasks to work with the new number of channels, so we let the user reconfigure it if they change the number of channels.) XPS cpumasks are no longer stored per channel. Only one temporary cpumask is used. The old stored cpumasks didn't reflect the user's changes and were not used after applying them. A scratchpad area is added to struct mlx5e_priv. As cpumask_var_t requires allocation, and the preactivate hook can't fail, we need to preallocate the temporary cpumask in advance. It's stored in the scratchpad. Fixes: `149e566fef` ("net/mlx5e: Expand XPS cpumask to cover all online cpus") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:05:53 -08:00
Maxim Mikityanskiy	fe867cac9e	net/mlx5e: Use preactivate hook to set the indirection table mlx5e_ethtool_set_channels updates the indirection table before switching to the new channels. If the switch fails, the indirection table is new, but the channels are old, which is wrong. Fix it by using the preactivate hook of mlx5e_safe_switch_channels to update the indirection table at the stage when nothing can fail anymore. As the code that updates the indirection table is now encapsulated into a new function, use that function in the attach flow when the driver has to reduce the number of channels, and prepare the code for the next commit. Fixes: `85082dba0a` ("net/mlx5e: Correctly handle RSS indirection table when changing number of channels") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:05:51 -08:00
Maxim Mikityanskiy	dca147b3dc	net/mlx5e: Rename hw_modify to preactivate mlx5e_safe_switch_channels accepts a callback to be called before activating new channels. It is intended to configure some hardware parameters in cases where channels are recreated because some configuration has changed. Recently, this callback has started being used to update the driver's internal MLX5E_STATE_XDP_OPEN flag, and the following patches also intend to use this callback for software preparations. This patch renames the hw_modify callback to preactivate, so that the name fits better. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:05:47 -08:00
Maxim Mikityanskiy	c2c95271f9	net/mlx5e: Encapsulate updating netdev queues into a function As a preparation for one of the following commits, create a function to encapsulate the code that notifies the kernel about the new amount of RX and TX queues. The code will be called multiple times in the next commit. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:05:45 -08:00
Tariq Toukan	02377e6edf	net/mlx5e: Add missing LRO cap check The LRO boolean state in params->lro_en must not be set in case the NIC is not capable. Enforce this check and remove the TODO comment. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:05:42 -08:00
Eran Ben Elisha	4229e0ea2c	net/mlx5e: Define one flow for TXQ selection when TCs are configured We shall always extract channel index out of the txq, regardless of the relation between txq_ix and num channels. The extraction is always valid, as if txq is smaller than number of channels, txq_ix == priv->txq2sq[txq_ix]->ch_ix. By doing so, we can remove an if clause from the select queue method, and have one flow for all packets. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-02-25 17:05:39 -08:00
David S. Miller	f13e4415d2	Merge branch 'mlxsw-Implement-ACL-dropped-packets-identification' Jiri Pirko says: ==================== mlxsw: Implement ACL-dropped packets identification mlxsw hardware allows to insert a ACL-drop action with a value defined by user that would be later on passed with a dropped packet. To implement this, use the existing TC action cookie and pass it to the driver. As the cookie format coming down from TC and the mlxsw HW cookie format is different, do the mapping of these two using idr and rhashtable. The cookie is passed up from the HW through devlink_trap_report() to drop_monitor code. A new metadata type is used for that. Example: $ tc qdisc add dev enp0s16np1 clsact $ tc filter add dev enp0s16np1 ingress protocol ip pref 10 flower skip_sw dst_ip 192.168.1.2 action drop cookie 3b45fa38c8 ^^^^^^^^^^ $ devlink trap set pci/0000:00:10.0 trap acl action trap $ dropwatch Initializing null lookup method dropwatch> set hw true setting hardware drops monitoring to 1 dropwatch> set alertmode packet Setting alert mode Alert mode successfully set dropwatch> start Enabling monitoring... Kernel monitoring activated. Issue Ctrl-C to stop monitoring drop at: ingress_flow_action_drop (acl_drops) origin: hardware input port ifindex: 30 input port name: enp0s16np1 cookie: 3b45fa38c8 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< timestamp: Fri Jan 24 17:10:53 2020 715387671 nsec protocol: 0x800 length: 98 original length: 98 This way the user may insert multiple drop rules and monitor the dropped packets with the information of which action caused the drop. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-25 11:05:55 -08:00
Jiri Pirko	7a3c3f4440	selftests: netdevsim: Extend devlink trap test to include flow action cookie Extend existing devlink trap test to include metadata type for flow action cookie. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-25 11:05:55 -08:00

1 2 3 4 5 ...

901553 Commits