linux

Author	SHA1	Message	Date
Michal Swiatkowski	ff5411ef88	ice: manage VSI antispoof and destination override Implement functions to make setting VSI security config easier. Main function ice_update_security fills security section field and checks against error in updating VSI. Reset functions are responsible for correct filling config according to user expectations. This helper is needed because destination override is located in this section. Driver has to set this bit to allow strering Tx packet on VSI based on value in Tx descriptors. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-07 10:41:42 -07:00
Michal Swiatkowski	ac19e03ef7	ice: allow process VF opcodes in different ways In switchdev driver shouldn't add MAC, VLAN and promisc filters on iavf demand but should return success to not break normal iavf flow. Achieve that by creating table of functions pointer with default functions used to parse iavf command. While parse iavf command, call correct function from table instead of calling function direct. When port representors are being created change functions in table to new one that behaves correctly for switchdev puprose (ignoring new filters). Change back to default ops when representors are being removed. Co-developed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-07 10:41:42 -07:00
Michal Swiatkowski	37165e3f56	ice: introduce VF port representor Port representor is used to manage VF from host side. To allow it each created representor registers netdevice with random hw address. Also devlink port is created for all representors. Port representor name is created based on switch id or managed by devlink core if devlink port was registered with success. Open and stop ndo ops are implemented to allow managing the VF link state. Link state is tracked in VF struct. Struct ice_netdev_priv is extended by pointer to representor field. This is needed to get correct representor from netdev struct mostly used in ndo calls. Implement helper functions to check if given netdev is netdev of port representor (ice_is_port_repr_netdev) and to get representor from netdev (ice_netdev_to_repr). As driver mostly will create or destroy port representors on all VFs instead of on single one, write functions to add and remove representor for each VF. Representor struct contains pointer to source VSI, which is VSI configured on VF, backpointer to VF, backpointer to netdev, q_vector pointer and metadata_dst which will be used in data path. Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-07 10:41:41 -07:00
Wojciech Drewek	2ae0aa4758	ice: Move devlink port to PF/VF struct Keeping devlink port inside VSI data structure causes some issues. Since VF VSI is released during reset that means that we have to unregister devlink port and register it again every time reset is triggered. With the new changes in devlink API it might cause deadlock issues. After calling devlink_port_register/devlink_port_unregister devlink API is going to lock rtnl_mutex. It's an issue when VF reset is triggered in netlink operation context (like setting VF MAC address or VLAN), because rtnl_lock is already taken by netlink. Another call of rtnl_lock from devlink API results in dead-lock. By moving devlink port to PF/VF we avoid creating/destroying it during reset. Since this patch, devlink ports are created during ice_probe, destroyed during ice_remove for PF and created during ice_repr_add, destroyed during ice_repr_rem for VF. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-07 10:41:41 -07:00
Michal Swiatkowski	3ea9bd5d02	ice: support basic E-Switch mode control Write set and get eswitch mode functions used by devlink ops. Use new pf struct member eswitch_mode to track current eswitch mode in driver. Changing eswitch mode is only allowed when there are no VFs created. Create new file for eswitch related code. Add config flag ICE_SWITCHDEV to allow user to choose if switchdev support should be enabled or disabled. Use case examples: - show current eswitch mode ('legacy' is the default one) [root@localhost]# devlink dev eswitch show pci/0000:03:00.1 pci/0000:03:00.1: mode legacy - move to 'switchdev' mode [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode switchdev [root@localhost]# devlink dev eswitch show pci/0000:03:00.1 pci/0000:03:00.1: mode switchdev - create 2 VFs [root@localhost]# echo 2 > /sys/class/net/ens4f1/device/sriov_numvfs - unsuccessful attempt to change eswitch mode while VFs are created [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy devlink answers: Operation not supported - destroy VFs [root@localhost]# echo 0 > /sys/class/net/ens4f1/device/sriov_numvfs - restore 'legacy' mode [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy [root@localhost]# devlink dev eswitch show pci/0000:03:00.1 pci/0000:03:00.1: mode legacy Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-07 10:41:41 -07:00
Stefan Assmann	54ee39439a	iavf: fix double unlock of crit_lock The crit_lock mutex could be unlocked twice as reported here https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210823/025525.html Remove the superfluous unlock. Technically the problem was already present before `5ac49f3c27` as that commit only replaced the locking primitive, but no functional change. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `5ac49f3c27` ("iavf: use mutexes for locking of critical sections") Fixes: `bac8486116` ("iavf: Refactor the watchdog state machine") Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-06 09:49:34 -07:00
Sylwester Dziedziuch	2e5a20573a	i40e: Fix freeing of uninitialized misc IRQ vector When VSI set up failed in i40e_probe() as part of PF switch set up driver was trying to free misc IRQ vectors in i40e_clear_interrupt_scheme and produced a kernel Oops: Trying to free already-free IRQ 266 WARNING: CPU: 0 PID: 5 at kernel/irq/manage.c:1731 __free_irq+0x9a/0x300 Workqueue: events work_for_cpu_fn RIP: 0010:__free_irq+0x9a/0x300 Call Trace: ? synchronize_irq+0x3a/0xa0 free_irq+0x2e/0x60 i40e_clear_interrupt_scheme+0x53/0x190 [i40e] i40e_probe.part.108+0x134b/0x1a40 [i40e] ? kmem_cache_alloc+0x158/0x1c0 ? acpi_ut_update_ref_count.part.1+0x8e/0x345 ? acpi_ut_update_object_reference+0x15e/0x1e2 ? strstr+0x21/0x70 ? irq_get_irq_data+0xa/0x20 ? mp_check_pin_attr+0x13/0xc0 ? irq_get_irq_data+0xa/0x20 ? mp_map_pin_to_irq+0xd3/0x2f0 ? acpi_register_gsi_ioapic+0x93/0x170 ? pci_conf1_read+0xa4/0x100 ? pci_bus_read_config_word+0x49/0x70 ? do_pci_enable_device+0xcc/0x100 local_pci_probe+0x41/0x90 work_for_cpu_fn+0x16/0x20 process_one_work+0x1a7/0x360 worker_thread+0x1cf/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x112/0x130 ? kthread_flush_work_fn+0x10/0x10 ret_from_fork+0x1f/0x40 The problem is that at that point misc IRQ vectors were not allocated yet and we get a call trace that driver is trying to free already free IRQ vectors. Add a check in i40e_clear_interrupt_scheme for __I40E_MISC_IRQ_REQUESTED PF state before calling i40e_free_misc_vector. This state is set only if misc IRQ vectors were properly initialized. Fixes: `c17401a1dd` ("i40e: use separate state bit for miscellaneous IRQ setup") Reported-by: PJ Waskiewicz <pwaskiewicz@jumptrading.com> Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-06 09:49:34 -07:00
Jiri Benc	857b6c6f66	i40e: fix endless loop under rtnl The loop in i40e_get_capabilities can never end. The problem is that although i40e_aq_discover_capabilities returns with an error if there's a firmware problem, the returned error is not checked. There is a check for pf->hw.aq.asq_last_status but that value is set to I40E_AQ_RC_OK on most firmware problems. When i40e_aq_discover_capabilities encounters a firmware problem, it will encounter the same problem on its next invocation. As the result, the loop becomes endless. We hit this with I40E_ERR_ADMIN_QUEUE_TIMEOUT but looking at the code, it can happen with a range of other firmware errors. I don't know what the correct behavior should be: whether the firmware should be retried a few times, or whether pf->hw.aq.asq_last_status should be always set to the encountered firmware error (but then it would be pointless and can be just replaced by the i40e_aq_discover_capabilities return value). However, the current behavior with an endless loop under the rtnl mutex(!) is unacceptable and Intel has not submitted a fix, although we explained the bug to them 7 months ago. This may not be the best possible fix but it's better than hanging the whole system on a firmware bug. Fixes: `56a62fc868` ("i40e: init code and hardware support") Tested-by: Stefan Assmann <sassmann@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-06 09:49:23 -07:00
Jakub Kicinski	a05e4c0af4	ethernet: use eth_hw_addr_set() for dev->addr_len cases Convert all Ethernet drivers from memcpy(... dev->addr_len) to eth_hw_addr_set(): @@ expression dev, np; @@ - memcpy(dev->dev_addr, np, dev->addr_len) + eth_hw_addr_set(dev, np) In theory addr_len may not be ETH_ALEN, but we don't expect non-Ethernet devices to live under this directory, and only the following cases of setting addr_len exist: - cxgb4 for mgmt device, and the drivers which set it to ETH_ALEN: s2io, mlx4, vxge. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 13:16:48 +01:00
Jakub Kicinski	16be9a1634	ethernet: use eth_hw_addr_set() - casts eth_hw_addr_set() takes a u8 pointer, like other etherdevice helpers. Convert the few drivers which require casts because they memcpy from "endian marked" types. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-02 14:18:26 +01:00
Jakub Kicinski	f3956ebb3b	ethernet: use eth_hw_addr_set() instead of ether_addr_copy() Convert Ethernet from ether_addr_copy() to eth_hw_addr_set(): @@ expression dev, np; @@ - ether_addr_copy(dev->dev_addr, np) + eth_hw_addr_set(dev, np) Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-02 14:18:25 +01:00
Jakub Kicinski	6b7b0c3091	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2021-10-02 We've added 85 non-merge commits during the last 15 day(s) which contain a total of 132 files changed, 13779 insertions(+), 6724 deletions(-). The main changes are: 1) Massive update on test_bpf.ko coverage for JITs as preparatory work for an upcoming MIPS eBPF JIT, from Johan Almbladh. 2) Add a batched interface for RX buffer allocation in AF_XDP buffer pool, with driver support for i40e and ice from Magnus Karlsson. 3) Add legacy uprobe support to libbpf to complement recently merged legacy kprobe support, from Andrii Nakryiko. 4) Add bpf_trace_vprintk() as variadic printk helper, from Dave Marchevsky. 5) Support saving the register state in verifier when spilling <8byte bounded scalar to the stack, from Martin Lau. 6) Add libbpf opt-in for stricter BPF program section name handling as part of libbpf 1.0 effort, from Andrii Nakryiko. 7) Add a document to help clarifying BPF licensing, from Alexei Starovoitov. 8) Fix skel_internal.h to propagate errno if the loader indicates an internal error, from Kumar Kartikeya Dwivedi. 9) Fix build warnings with -Wcast-function-type so that the option can later be enabled by default for the kernel, from Kees Cook. 10) Fix libbpf to ignore STT_SECTION symbols in legacy map definitions as it otherwise errors out when encountering them, from Toke Høiland-Jørgensen. 11) Teach libbpf to recognize specialized maps (such as for perf RB) and internally remove BTF type IDs when creating them, from Hengqi Chen. 12) Various fixes and improvements to BPF selftests. ==================== Link: https://lore.kernel.org/r/20211002001327.15169-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-01 19:58:02 -07:00
Jakub Kicinski	dd9a887b35	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/phy/bcm7xxx.c `d88fd1b546` ("net: phy: bcm7xxx: Fixed indirect MMD operations") `f68d08c437` ("net: phy: bcm7xxx: Add EPHY entry for 72165") net/sched/sch_api.c `b193e15ac6` ("net: prevent user from passing illegal stab size") `69508d4333` ("net_sched: Use struct_size() and flex_array_size() helpers") Both cases trivial - adjacent code additions. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-09-30 14:49:21 -07:00
Jason Xing	4fe815850b	ixgbe: let the xdpdrv work with more than 64 cpus Originally, ixgbe driver doesn't allow the mounting of xdpdrv if the server is equipped with more than 64 cpus online. So it turns out that the loading of xdpdrv causes the "NOMEM" failure. Actually, we can adjust the algorithm and then make it work through mapping the current cpu to some xdp ring with the protect of @tx_lock. Here are some numbers before/after applying this patch with xdp-example loaded on the eth0X: As client (tx path): Before After TCP_STREAM send-64 734.14 714.20 TCP_STREAM send-128 1401.91 1395.05 TCP_STREAM send-512 5311.67 5292.84 TCP_STREAM send-1k 9277.40 9356.22 (not stable) TCP_RR send-1 22559.75 21844.22 TCP_RR send-128 23169.54 22725.13 TCP_RR send-512 21670.91 21412.56 As server (rx path): Before After TCP_STREAM send-64 1416.49 1383.12 TCP_STREAM send-128 3141.49 3055.50 TCP_STREAM send-512 9488.73 9487.44 TCP_STREAM send-1k 9491.17 9356.22 (not stable) TCP_RR send-1 23617.74 23601.60 ... Notice: the TCP_RR mode is unstable as the official document explains. I tested many times with different parameters combined through netperf. Though the result is not that accurate, I cannot see much influence on this patch. The static key is places on the hot path, but it actually shouldn't cause a huge regression theoretically. Co-developed-by: Shujin Li <lishujin@kuaishou.com> Signed-off-by: Shujin Li <lishujin@kuaishou.com> Signed-off-by: Jason Xing <xingwanli@kuaishou.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-30 13:38:08 +01:00
Feng Zhou	513e605d7a	ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup The ixgbe driver currently generates a NULL pointer dereference with some machine (online cpus < 63). This is due to the fact that the maximum value of num_xdp_queues is nr_cpu_ids. Code is in "ixgbe_set_rss_queues"". Here's how the problem repeats itself: Some machine (online cpus < 63), And user set num_queues to 63 through ethtool. Code is in the "ixgbe_set_channels", adapter->ring_feature[RING_F_FDIR].limit = count; It becomes 63. When user use xdp, "ixgbe_set_rss_queues" will set queues num. adapter->num_rx_queues = rss_i; adapter->num_tx_queues = rss_i; adapter->num_xdp_queues = ixgbe_xdp_queues(adapter); And rss_i's value is from f = &adapter->ring_feature[RING_F_FDIR]; rss_i = f->indices = f->limit; So "num_rx_queues" > "num_xdp_queues", when run to "ixgbe_xdp_setup", for (i = 0; i < adapter->num_rx_queues; i++) if (adapter->xdp_ring[i]->xsk_umem) It leads to panic. Call trace: [exception RIP: ixgbe_xdp+368] RIP: ffffffffc02a76a0 RSP: ffff9fe16202f8d0 RFLAGS: 00010297 RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000000000000001c RDI: ffffffffa94ead90 RBP: ffff92f8f24c0c18 R8: 0000000000000000 R9: 0000000000000000 R10: ffff9fe16202f830 R11: 0000000000000000 R12: ffff92f8f24c0000 R13: ffff9fe16202fc01 R14: 000000000000000a R15: ffffffffc02a7530 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 7 [ffff9fe16202f8f0] dev_xdp_install at ffffffffa89fbbcc 8 [ffff9fe16202f920] dev_change_xdp_fd at ffffffffa8a08808 9 [ffff9fe16202f960] do_setlink at ffffffffa8a20235 10 [ffff9fe16202fa88] rtnl_setlink at ffffffffa8a20384 11 [ffff9fe16202fc78] rtnetlink_rcv_msg at ffffffffa8a1a8dd 12 [ffff9fe16202fcf0] netlink_rcv_skb at ffffffffa8a717eb 13 [ffff9fe16202fd40] netlink_unicast at ffffffffa8a70f88 14 [ffff9fe16202fd80] netlink_sendmsg at ffffffffa8a71319 15 [ffff9fe16202fdf0] sock_sendmsg at ffffffffa89df290 16 [ffff9fe16202fe08] __sys_sendto at ffffffffa89e19c8 17 [ffff9fe16202ff30] __x64_sys_sendto at ffffffffa89e1a64 18 [ffff9fe16202ff38] do_syscall_64 at ffffffffa84042b9 19 [ffff9fe16202ff50] entry_SYSCALL_64_after_hwframe at ffffffffa8c0008c So I fix ixgbe_max_channels so that it will not allow a setting of queues to be higher than the num_online_cpus(). And when run to ixgbe_xdp_setup, take the smaller value of num_rx_queues and num_xdp_queues. Fixes: `4a9b32f30f` ("ixgbe: fix potential RX buffer starvation for AF_XDP") Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-29 10:51:51 +01:00
Len Baker	30cba287eb	ice: Prefer kcalloc over open coded arithmetic As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. In this case this is not actually dynamic sizes: both sides of the multiplication are constant values. However it is best to refactor this anyway, just to keep the open-coded math idiom out of code. So, use the purpose specific kcalloc() function instead of the argument size * count in the kzalloc() function. [1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Signed-off-by: Len Baker <len.baker@gmx.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-09-28 09:42:04 -07:00
Jeff Guo	b37e4e94c1	ice: Fix macro name for IPv4 fragment flag In IPv4 header, fragment flags indicate whether the packet needs to be fragmented or not. The value 0x20 represents MF (More Fragment); fix the macro name to match this. Signed-off-by: Ting Xu <ting.xu@intel.com> Signed-off-by: Jeff Guo <jia.guo@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-09-28 09:42:04 -07:00
Jacob Keller	0128cc6e92	ice: refactor devlink getter/fallback functions to void After commit `a8f89fa277` ("ice: do not abort devlink info if board identifier can't be found"), the getter/fallback() functions no longer report an error. Convert the interface to a void so that it is no longer possible to add a version field that is fatal. This makes sense, because we should not fail to report other versions just because one of the version pieces could not be found. Finally, clean up the getter functions line wrapping so that none of them take more than 80 columns, as is the usual style for networking files. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-09-28 09:42:04 -07:00
Anirudh Venkataramanan	4fc5fbee5c	ice: Fix link mode handling The messaging for unsupported module detection is different for lenient mode and strict mode. Update the code to print the right messaging for a given link mode. Media topology conflict is not an error in lenient mode, so return an error code only if not in lenient mode. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-09-28 09:42:04 -07:00
Anirudh Venkataramanan	40b247608b	ice: Add feature bitmap, helpers and a check for DSCP DSCP a.k.a L3 QoS is only supported on certain devices. To enforce this, this patch introduces a bitmap of features and helper functions. The feature bitmap is set based on device IDs on driver init. Currently, DSCP is the only feature in this bitmap, but there will be more in the future. In the DCB netlink flow, check if the feature bit is set before exercising DSCP. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-09-28 09:42:04 -07:00
Dave Ertman	2a87bd73e5	ice: Add DSCP support Implement code to handle submission of APP TLV's containing DSCP to TC mapping. The first such mapping received on an interface will cause that PF to switch to L3 DSCP QoS mode, apply the default config for that mode, and apply the received mapping. Only one such mapping will be allowed per DSCP value, and when the last DSCP mapping is deleted, the PF will switch back into L2 VLAN QoS mode, applying the appropriate default QoS settings. L3 DSCP QoS mode will only be allowed in SW DCBx mode, in other words, when the FW LLDP engine is disabled. Commands that break this mutual exclusivity will be blocked. Co-developed-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-09-28 09:42:04 -07:00
Magnus Karlsson	6aab0bb0c5	i40e: Use the xsk batched rx allocation interface Use the new xsk batched rx allocation interface for the zero-copy data path. As the array of struct xdp_buff pointers kept by the driver is really a ring that wraps, the allocation routine is modified to detect a wrap and in that case call the allocation function twice. The allocation function cannot deal with wrapped rings, only arrays. As we now know exactly how many buffers we get and that there is no wrapping, the allocation function can be simplified even more as all if-statements in the allocation loop can be removed, improving performance. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-6-magnus.karlsson@gmail.com	2021-09-28 00:18:35 +02:00
Magnus Karlsson	db804cfc21	ice: Use the xsk batched rx allocation interface Use the new xsk batched rx allocation interface for the zero-copy data path. As the array of struct xdp_buff pointers kept by the driver is really a ring that wraps, the allocation routine is modified to detect a wrap and in that case call the allocation function twice. The allocation function cannot deal with wrapped rings, only arrays. As we now know exactly how many buffers we get and that there is no wrapping, the allocation function can be simplified even more as all if-statements in the allocation loop can be removed, improving performance. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-5-magnus.karlsson@gmail.com	2021-09-28 00:18:35 +02:00
Magnus Karlsson	57f7f8b6bc	ice: Use xdp_buf instead of rx_buf for xsk zero-copy In order to use the new xsk batched buffer allocation interface, a pointer to an array of struct xsk_buff pointers need to be provided so that the function can put the result of the allocation there. In the ice driver, we already have a ring that stores pointers to xdp_buffs. This is only used for the xsk zero-copy driver and is a union with the structure that is used for the regular non zero-copy path. Unfortunately, that structure is larger than the xdp_buffs pointers which mean that there will be a stride (of 20 bytes) between each xdp_buff pointer. And feeding this into the xsk_buff_alloc_batch interface will not work since it assumes a regular array of xdp_buff pointers (each 8 bytes with 0 bytes in-between them on a 64-bit system). To fix this, remove the xdp_buff pointer from the rx_buf union and move it one step higher to the union above which only has pointers to arrays in it. This solves the problem and we can directly feed the SW ring of xdp_buff pointers straight into the allocation function in the next patch when that interface is used. This will improve performance. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-4-magnus.karlsson@gmail.com	2021-09-28 00:18:34 +02:00
Jacob Keller	51032e6f17	e100: fix buffer overrun in e100_get_regs The e100_get_regs function is used to implement a simple register dump for the e100 device. The data is broken into a couple of MAC control registers, and then a series of PHY registers, followed by a memory dump buffer. The total length of the register dump is defined as (1 + E100_PHY_REGS) * sizeof(u32) + sizeof(nic->mem->dump_buf). The logic for filling in the PHY registers uses a convoluted inverted count for loop which counts from E100_PHY_REGS (0x1C) down to 0, and assigns the slots 1 + E100_PHY_REGS - i. The first loop iteration will fill in [1] and the final loop iteration will fill in [1 + 0x1C]. This is actually one more than the supposed number of PHY registers. The memory dump buffer is then filled into the space at [2 + E100_PHY_REGS] which will cause that memcpy to assign 4 bytes past the total size. The end result is that we overrun the total buffer size allocated by the kernel, which could lead to a panic or other issues due to memory corruption. It is difficult to determine the actual total number of registers here. The only 8255x datasheet I could find indicates there are 28 total MDI registers. However, we're reading 29 here, and reading them in reverse! In addition, the ethtool e100 register dump interface appears to read the first PHY register to determine if the device is in MDI or MDIx mode. This doesn't appear to be documented anywhere within the 8255x datasheet. I can only assume it must be in register 28 (the extra register we're reading here). Lets not change any of the intended meaning of what we copy here. Just extend the space by 4 bytes to account for the extra register and continue copying the data out in the same order. Change the E100_PHY_REGS value to be the correct total (29) so that the total register dump size is calculated properly. Fix the offset for where we copy the dump buffer so that it doesn't overrun the total size. Re-write the for loop to use counting up instead of the convoluted down-counting. Correct the mdio_read offset to use the 0-based register offsets, but maintain the bizarre reverse ordering so that we have the ABI expected by applications like ethtool. This requires and additional subtraction of 1. It seems a bit odd but it makes the flow of assignment into the register buffer easier to follow. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Felicitas Hetzelt <felicitashetzelt@gmail.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-09-27 08:57:30 -07:00
Jacob Keller	4329c8dc11	e100: fix length calculation in e100_get_regs_len commit `abf9b90205` ("e100: cleanup unneeded math") tried to simplify e100_get_regs_len and remove a double 'divide and then multiply' calculation that the e100_reg_regs_len function did. This change broke the size calculation entirely as it failed to account for the fact that the numbered registers are actually 4 bytes wide and not 1 byte. This resulted in a significant under allocation of the register buffer used by e100_get_regs. Fix this by properly multiplying the register count by u32 first before adding the size of the dump buffer. Fixes: `abf9b90205` ("e100: cleanup unneeded math") Reported-by: Felicitas Hetzelt <felicitashetzelt@gmail.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-09-27 08:57:29 -07:00
Leon Romanovsky	838cefd5e5	ice: Open devlink when device is ready Move devlink_registration routine to be the last command, when the device is fully initialized. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-27 16:31:59 +01:00
Leon Romanovsky	2ff04286a9	ice: Delete always true check of PF pointer PF pointer is always valid when PCI core calls its .shutdown() and .remove() callbacks. There is no need to check it again. Fixes: `837f08fdec` ("ice: Add basic driver framework for Intel(R) E800 Series") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-24 14:12:57 +01:00
Jakub Kicinski	2fcd14d0f7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net net/mptcp/protocol.c `977d293e23` ("mptcp: ensure tx skbs always have the MPTCP ext") `efe686ffce` ("mptcp: ensure tx skbs always have the MPTCP ext") same patch merged in both trees, keep net-next. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-09-23 11:19:49 -07:00
Leon Romanovsky	db4278c55f	devlink: Make devlink_register to be void devlink_register() can't fail and always returns success, but all drivers are obligated to check returned status anyway. This adds a lot of boilerplate code to handle impossible flow. Make devlink_register() void and simplify the drivers that use that API call. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Simon Horman <simon.horman@corigine.com> Acked-by: Vladimir Oltean <olteanv@gmail.com> # dsa Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-22 14:15:12 +01:00
Randy Dunlap	8775851107	igc: fix build errors for PTP When IGC=y and PTP_1588_CLOCK=m, the ptp_() interface family is not available to the igc driver. Make this driver depend on PTP_1588_CLOCK_OPTIONAL so that it will build without errors. Various igc commits have used ptp_() functions without checking that PTP_1588_CLOCK is enabled. Fix all of these here. Fixes these build errors: ld: drivers/net/ethernet/intel/igc/igc_main.o: in function `igc_msix_other': igc_main.c:(.text+0x6494): undefined reference to `ptp_clock_event' ld: igc_main.c:(.text+0x64ef): undefined reference to `ptp_clock_event' ld: igc_main.c:(.text+0x6559): undefined reference to `ptp_clock_event' ld: drivers/net/ethernet/intel/igc/igc_ethtool.o: in function `igc_ethtool_get_ts_info': igc_ethtool.c:(.text+0xc7a): undefined reference to `ptp_clock_index' ld: drivers/net/ethernet/intel/igc/igc_ptp.o: in function `igc_ptp_feature_enable_i225': igc_ptp.c:(.text+0x330): undefined reference to `ptp_find_pin' ld: igc_ptp.c:(.text+0x36f): undefined reference to `ptp_find_pin' ld: drivers/net/ethernet/intel/igc/igc_ptp.o: in function `igc_ptp_init': igc_ptp.c:(.text+0x11cd): undefined reference to `ptp_clock_register' ld: drivers/net/ethernet/intel/igc/igc_ptp.o: in function `igc_ptp_stop': igc_ptp.c:(.text+0x12dd): undefined reference to `ptp_clock_unregister' ld: drivers/platform/x86/dell/dell-wmi-privacy.o: in function `dell_privacy_wmi_probe': Fixes: `64433e5bf4` ("igc: Enable internal i225 PPS") Fixes: `60dbede0c4` ("igc: Add support for ethtool GET_TS_INFO command") Fixes: `87938851b6` ("igc: enable auxiliary PHC functions for the i225") Fixes: `5f2958052c` ("igc: Add basic skeleton for PTP") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Ederson de Souza <ederson.desouza@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: intel-wired-lan@lists.osuosl.org Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-19 12:20:53 +01:00
Hao Chen	6042d4348a	net: e1000e: solve insmod 'Unknown symbol mutex_lock' error After I turn on the CONFIG_LOCK_STAT=y, insmod e1000e.ko will report: [ 5.641579] e1000e: Unknown symbol mutex_lock (err -2) [ 90.775705] e1000e: Unknown symbol mutex_lock (err -2) [ 132.252339] e1000e: Unknown symbol mutex_lock (err -2) This problem fixed after include <linux/mutex.h>. Signed-off-by: Hao Chen <chenhaoa@uniontech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-17 11:11:25 +01:00
Paolo Abeni	40ee363c84	igc: fix tunnel offloading Checking tunnel offloading, it turns out that offloading doesn't work as expected. The following script allows to reproduce the issue. Call it as `testscript DEVICE LOCALIP REMOTEIP NETMASK' === SNIP === if [ $# -ne 4 ] then echo "Usage $0 DEVICE LOCALIP REMOTEIP NETMASK" exit 1 fi DEVICE="$1" LOCAL_ADDRESS="$2" REMOTE_ADDRESS="$3" NWMASK="$4" echo "Driver: $(ethtool -i ${DEVICE} \| awk '/^driver:/{print $2}') " ethtool -k "${DEVICE}" \| grep tx-udp echo echo "Set up NIC and tunnel..." ip addr add "${LOCAL_ADDRESS}/${NWMASK}" dev "${DEVICE}" ip link set "${DEVICE}" up sleep 2 ip link add vxlan1 type vxlan id 42 \ remote "${REMOTE_ADDRESS}" \ local "${LOCAL_ADDRESS}" \ dstport 0 \ dev "${DEVICE}" ip addr add fc00::1/64 dev vxlan1 ip link set vxlan1 up sleep 2 rm -f vxlan.pcap echo "Running tcpdump and iperf3..." ( nohup tcpdump -i any -w vxlan.pcap >/dev/null 2>&1 ) & sleep 2 iperf3 -c fc00::2 >/dev/null pkill tcpdump echo echo -n "Max. Paket Size: " tcpdump -r vxlan.pcap -nnle 2>/dev/null \ \| grep "${LOCAL_ADDRESS}.> ${REMOTE_ADDRESS}.OTV" \ \| awk '{print $8}' \| awk -F ':' '{print $1}' \ \| sort -n \| tail -1 echo ip link del vxlan1 ip addr del ${LOCAL_ADDRESS}/${NWMASK} dev "${DEVICE}" === SNAP === The expected outcome is Max. Paket Size: 64904 This is what you see on igb, the code igc has been taken from. However, on igc the output is Max. Paket Size: 1516 so the GSO aggregate packets are segmented by the kernel before calling igc_xmit_frame. Inside the subsequent call to igc_tso, the check for skb_is_gso(skb) fails and the function returns prematurely. It turns out that this occurs because the feature flags aren't set entirely correctly in igc_probe. In contrast to the original code from igb_probe, igc_probe neglects to set the flags required to allow tunnel offloading. Setting the same flags as igb fixes the issue on igc. Fixes: `34428dff36` ("igc: Add GSO partial support") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Corinna Vinschen <vinschen@redhat.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-16 14:29:58 +01:00
Dave Ertman	bfe8443509	ice: Correctly deal with PFs that do not support RDMA There are two cases where the current PF does not support RDMA functionality. The first is if the NVM loaded on the device is set to not support RDMA (common_caps.rdma is false). The second is if the kernel bonding driver has included the current PF in an active link aggregate. When the driver has determined that this PF does not support RDMA, then auxiliary devices should not be created on the auxiliary bus. Without a device on the auxiliary bus, even if the irdma driver is present, there will be no RDMA activity attempted on this PF. Currently, in the reset flow, an attempt to create auxiliary devices is performed without regard to the ability of the PF. There needs to be a check in ice_aux_plug_dev (as the central point that creates auxiliary devices) to see if the PF is in a state to support the functionality. When disabling and re-enabling RDMA due to the inclusion/removal of the PF in a link aggregate, we also need to set/clear the bit which controls auxiliary device creation so that a reset recovery in a link aggregate situation doesn't try to create auxiliary devices when it shouldn't. Fixes: `f9f5301e7e` ("ice: Register auxiliary device to provide RDMA") Reported-by: Yongxin Liu <yongxin.liu@windriver.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-10 09:58:55 +01:00
Jakub Kicinski	29ce8f9701	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/linux/netdevice.h net/socket.c `d0efb16294` ("net: don't unconditionally copy_from_user a struct ifreq for socket ioctls") `876f0bf9d0` ("net: socket: simplify dev_ifconf handling") `29c4964822` ("net: socket: rework compat_ifreq_ioctl()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-31 09:06:04 -07:00
Brett Creeley	b357d9717b	ice: Only lock to update netdev dev_addr commit `3ba7f53f8b` ("ice: don't remove netdev->dev_addr from uc sync list") introduced calls to netif_addr_lock_bh() and netif_addr_unlock_bh() in the driver's ndo_set_mac() callback. This is fine since the driver is updated the netdev's dev_addr, but since this is a spinlock, the driver cannot sleep when the lock is held. Unfortunately the functions to add/delete MAC filters depend on a mutex. This was causing a trace with the lock debug kernel config options enabled when changing the mac address via iproute. [ 203.273059] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281 [ 203.273065] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6698, name: ip [ 203.273068] Preemption disabled at: [ 203.273068] [<ffffffffc04aaeab>] ice_set_mac_address+0x8b/0x1c0 [ice] [ 203.273097] CPU: 31 PID: 6698 Comm: ip Tainted: G S W I 5.14.0-rc4 #2 [ 203.273100] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020 [ 203.273102] Call Trace: [ 203.273107] dump_stack_lvl+0x33/0x42 [ 203.273113] ? ice_set_mac_address+0x8b/0x1c0 [ice] [ 203.273124] ___might_sleep.cold.150+0xda/0xea [ 203.273131] mutex_lock+0x1c/0x40 [ 203.273136] ice_remove_mac+0xe3/0x180 [ice] [ 203.273155] ? ice_fltr_add_mac_list+0x20/0x20 [ice] [ 203.273175] ice_fltr_prepare_mac+0x43/0xa0 [ice] [ 203.273194] ice_set_mac_address+0xab/0x1c0 [ice] [ 203.273206] dev_set_mac_address+0xb8/0x120 [ 203.273210] dev_set_mac_address_user+0x2c/0x50 [ 203.273212] do_setlink+0x1dd/0x10e0 [ 203.273217] ? __nla_validate_parse+0x12d/0x1a0 [ 203.273221] __rtnl_newlink+0x530/0x910 [ 203.273224] ? __kmalloc_node_track_caller+0x17f/0x380 [ 203.273230] ? preempt_count_add+0x68/0xa0 [ 203.273236] ? _raw_spin_lock_irqsave+0x1f/0x30 [ 203.273241] ? kmem_cache_alloc_trace+0x4d/0x440 [ 203.273244] rtnl_newlink+0x43/0x60 [ 203.273245] rtnetlink_rcv_msg+0x13a/0x380 [ 203.273248] ? rtnl_calcit.isra.40+0x130/0x130 [ 203.273250] netlink_rcv_skb+0x4e/0x100 [ 203.273256] netlink_unicast+0x1a2/0x280 [ 203.273258] netlink_sendmsg+0x242/0x490 [ 203.273260] sock_sendmsg+0x58/0x60 [ 203.273263] ____sys_sendmsg+0x1ef/0x260 [ 203.273265] ? copy_msghdr_from_user+0x5c/0x90 [ 203.273268] ? ____sys_recvmsg+0xe6/0x170 [ 203.273270] ___sys_sendmsg+0x7c/0xc0 [ 203.273272] ? copy_msghdr_from_user+0x5c/0x90 [ 203.273274] ? ___sys_recvmsg+0x89/0xc0 [ 203.273276] ? __netlink_sendskb+0x50/0x50 [ 203.273278] ? mod_objcg_state+0xee/0x310 [ 203.273282] ? __dentry_kill+0x114/0x170 [ 203.273286] ? get_max_files+0x10/0x10 [ 203.273288] __sys_sendmsg+0x57/0xa0 [ 203.273290] do_syscall_64+0x37/0x80 [ 203.273295] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 203.273296] RIP: 0033:0x7f8edf96e278 [ 203.273298] Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 63 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55 [ 203.273300] RSP: 002b:00007ffcb8bdac08 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 203.273303] RAX: ffffffffffffffda RBX: 000000006115e0ae RCX: 00007f8edf96e278 [ 203.273304] RDX: 0000000000000000 RSI: 00007ffcb8bdac70 RDI: 0000000000000003 [ 203.273305] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007ffcb8bda5b0 [ 203.273306] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 [ 203.273306] R13: 0000555e10092020 R14: 0000000000000000 R15: 0000000000000005 Fix this by only locking when changing the netdev->dev_addr. Also, make sure to restore the old netdev->dev_addr on any failures. Fixes: `3ba7f53f8b` ("ice: don't remove netdev->dev_addr from uc sync list") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-27 13:15:55 -07:00
Jacob Keller	9ee313433c	ice: restart periodic outputs around time changes When we enabled auxiliary input/output support for the E810 device, we forgot to add logic to restart the output when we change time. This is important as the periodic output will be incorrect after a time change otherwise. This unfortunately includes the adjust time function, even though it uses an atomic hardware interface. The atomic adjustment can still cause the pin output to stall permanently, so we need to stop and restart it. Introduce wrapper functions to temporarily disable and then re-enable the clock outputs. Fixes: `172db5f91d` ("ice: add support for auxiliary input/output pins") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sunitha D Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-27 13:14:41 -07:00
Aravindhan Gunasekaran	1ab011b0bf	igc: Add support for CBS offloading Implement support for Credit-based shaper(CBS) Qdisc hardware offload mode in the driver. There are two sets of IEEE802.1Qav (CBS) HW logic in i225 controller and this patch supports enabling them in the top two priority TX queues. Driver implemented as recommended by Foxville External Architecture Specification v0.993. Idleslope and Hi-credit are the CBS tunable parameters for i225 NIC, programmed in TQAVCC and TQAVHC registers respectively. In-order for IEEE802.1Qav (CBS) algorithm to work as intended and provide BW reservation CBS should be enabled in highest priority queue first. If we enable CBS on any of low priority queues, the traffic in high priority queue does not allow low priority queue to be selected for transmission and bandwidth reservation is not guaranteed. Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Signed-off-by: Mallikarjuna Chilakala <mallikarjuna.chilakala@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-27 09:31:09 -07:00
Vinicius Costa Gomes	61572d5f8f	igc: Simplify TSN flags handling Separates the procedure done during reset from applying a configuration, knowing when the code is executing allow us to separate the better what changes the hardware state from what changes only the driver state. Introduces a flag for bookkeeping the driver state of TSN features. When Qav and frame-preemption is also implemented this flag makes it easier to keep track on whether a TSN feature driver state is enabled or not though controller state changes, say, during a reset. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Signed-off-by: Mallikarjuna Chilakala <mallikarjuna.chilakala@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-27 09:31:08 -07:00
Vinicius Costa Gomes	c814a2d2d4	igc: Use default cycle 'start' and 'end' values for queues Sets default values for each queue cycle start and cycle end. This allows some simplification in the handling of these configurations as most TSN features in i225 require a cycle to be configured. In i225, cycle start and end time is required to be programmed for CBS to work properly. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Signed-off-by: Mallikarjuna Chilakala <mallikarjuna.chilakala@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-27 09:31:08 -07:00
Jacob Keller	4dd0d5c33c	ice: add lock around Tx timestamp tracker flush The driver didn't take the lock while flushing the Tx tracker, which could cause a race where one thread is trying to read timestamps out while another thread is trying to read the tracker to check the timestamps. Avoid this by ensuring that flushing is locked against read accesses. Fixes: `ea9b847cda` ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-27 09:14:49 -07:00
Jacob Keller	1f0cbb3e89	ice: remove dead code for allocating pin_config We have code in the ice driver which allocates the pin_config structure if n_pins is > 0, but we never set n_pins to be greater than zero. There's no reason to keep this code until we actually have pin_config support. Remove this. We can re-add it properly when we implement support for pin_config for E810-T devices. Fixes: `172db5f91d` ("ice: add support for auxiliary input/output pins") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-27 09:11:39 -07:00
Jacob Keller	84c5fb8c42	ice: fix Tx queue iteration for Tx timestamp enablement The driver accidentally copied the ice_for_each_rxq iterator when implementing enablement of the ptp_tx bit for the Tx rings. We still load the Tx rings and set the ptp_tx field, but we iterate over the count of the num_rxq. If the number of Tx and Rx queues differ, this could either cause a buffer overrun when accessing the tx_rings list if num_txq is greater than num_rxq, or it could cause us to fail to enable Tx timestamps for some rings. This was not noticed originally as we generally have the same number of Tx and Rx queues. Fixes: `ea9b847cda` ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-27 08:38:55 -07:00
Jakub Kicinski	97c78d0af5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-26 17:57:57 -07:00
Vinicius Costa Gomes	a90ec84837	igc: Add support for PTP getcrosststamp() i225 supports PCIe Precision Time Measurement (PTM), allowing us to support the PTP_SYS_OFFSET_PRECISE ioctl() in the driver via the getcrosststamp() function. The easiest way to expose the PTM registers would be to configure the PTM dialogs to run periodically, but the PTP_SYS_OFFSET_PRECISE ioctl() semantics are more aligned to using a kind of "one-shot" way of retrieving the PTM timestamps. But this causes a bit more code to be written: the trigger registers for the PTM dialogs are not cleared automatically. i225 can be configured to send "fake" packets with the PTM information, adding support for handling these types of packets is left for the future. PTM improves the accuracy of time synchronization, for example, using phc2sys, while a simple application is sending packets as fast as possible. First, without .getcrosststamp(): phc2sys[191.382]: enp4s0 sys offset -959 s2 freq -454 delay 4492 phc2sys[191.482]: enp4s0 sys offset 798 s2 freq +1015 delay 4069 phc2sys[191.583]: enp4s0 sys offset 962 s2 freq +1418 delay 3849 phc2sys[191.683]: enp4s0 sys offset 924 s2 freq +1669 delay 3753 phc2sys[191.783]: enp4s0 sys offset 664 s2 freq +1686 delay 3349 phc2sys[191.883]: enp4s0 sys offset 218 s2 freq +1439 delay 2585 phc2sys[191.983]: enp4s0 sys offset 761 s2 freq +2048 delay 3750 phc2sys[192.083]: enp4s0 sys offset 756 s2 freq +2271 delay 4061 phc2sys[192.183]: enp4s0 sys offset 809 s2 freq +2551 delay 4384 phc2sys[192.283]: enp4s0 sys offset -108 s2 freq +1877 delay 2480 phc2sys[192.383]: enp4s0 sys offset -1145 s2 freq +807 delay 4438 phc2sys[192.484]: enp4s0 sys offset 571 s2 freq +2180 delay 3849 phc2sys[192.584]: enp4s0 sys offset 241 s2 freq +2021 delay 3389 phc2sys[192.684]: enp4s0 sys offset 405 s2 freq +2257 delay 3829 phc2sys[192.784]: enp4s0 sys offset 17 s2 freq +1991 delay 3273 phc2sys[192.884]: enp4s0 sys offset 152 s2 freq +2131 delay 3948 phc2sys[192.984]: enp4s0 sys offset -187 s2 freq +1837 delay 3162 phc2sys[193.084]: enp4s0 sys offset -1595 s2 freq +373 delay 4557 phc2sys[193.184]: enp4s0 sys offset 107 s2 freq +1597 delay 3740 phc2sys[193.284]: enp4s0 sys offset 199 s2 freq +1721 delay 4010 phc2sys[193.385]: enp4s0 sys offset -169 s2 freq +1413 delay 3701 phc2sys[193.485]: enp4s0 sys offset -47 s2 freq +1484 delay 3581 phc2sys[193.585]: enp4s0 sys offset -65 s2 freq +1452 delay 3778 phc2sys[193.685]: enp4s0 sys offset 95 s2 freq +1592 delay 3888 phc2sys[193.785]: enp4s0 sys offset 206 s2 freq +1732 delay 4445 phc2sys[193.885]: enp4s0 sys offset -652 s2 freq +936 delay 2521 phc2sys[193.985]: enp4s0 sys offset -203 s2 freq +1189 delay 3391 phc2sys[194.085]: enp4s0 sys offset -376 s2 freq +955 delay 2951 phc2sys[194.185]: enp4s0 sys offset -134 s2 freq +1084 delay 3330 phc2sys[194.285]: enp4s0 sys offset -22 s2 freq +1156 delay 3479 phc2sys[194.386]: enp4s0 sys offset 32 s2 freq +1204 delay 3602 phc2sys[194.486]: enp4s0 sys offset 122 s2 freq +1303 delay 3731 Statistics for this run (total of 2179 lines), in nanoseconds: average: -1.12 stdev: 634.80 max: 1551 min: -2215 With .getcrosststamp() via PCIe PTM: phc2sys[367.859]: enp4s0 sys offset 6 s2 freq +1727 delay 0 phc2sys[367.959]: enp4s0 sys offset -2 s2 freq +1721 delay 0 phc2sys[368.059]: enp4s0 sys offset 5 s2 freq +1727 delay 0 phc2sys[368.160]: enp4s0 sys offset -1 s2 freq +1723 delay 0 phc2sys[368.260]: enp4s0 sys offset -4 s2 freq +1719 delay 0 phc2sys[368.360]: enp4s0 sys offset -5 s2 freq +1717 delay 0 phc2sys[368.460]: enp4s0 sys offset 1 s2 freq +1722 delay 0 phc2sys[368.560]: enp4s0 sys offset -3 s2 freq +1718 delay 0 phc2sys[368.660]: enp4s0 sys offset 5 s2 freq +1725 delay 0 phc2sys[368.760]: enp4s0 sys offset -1 s2 freq +1721 delay 0 phc2sys[368.860]: enp4s0 sys offset 0 s2 freq +1721 delay 0 phc2sys[368.960]: enp4s0 sys offset 0 s2 freq +1721 delay 0 phc2sys[369.061]: enp4s0 sys offset 4 s2 freq +1725 delay 0 phc2sys[369.161]: enp4s0 sys offset 1 s2 freq +1724 delay 0 phc2sys[369.261]: enp4s0 sys offset 4 s2 freq +1727 delay 0 phc2sys[369.361]: enp4s0 sys offset 8 s2 freq +1732 delay 0 phc2sys[369.461]: enp4s0 sys offset 7 s2 freq +1733 delay 0 phc2sys[369.561]: enp4s0 sys offset 4 s2 freq +1733 delay 0 phc2sys[369.661]: enp4s0 sys offset 1 s2 freq +1731 delay 0 phc2sys[369.761]: enp4s0 sys offset 1 s2 freq +1731 delay 0 phc2sys[369.861]: enp4s0 sys offset -5 s2 freq +1725 delay 0 phc2sys[369.961]: enp4s0 sys offset -4 s2 freq +1725 delay 0 phc2sys[370.062]: enp4s0 sys offset 2 s2 freq +1730 delay 0 phc2sys[370.162]: enp4s0 sys offset -7 s2 freq +1721 delay 0 phc2sys[370.262]: enp4s0 sys offset -3 s2 freq +1723 delay 0 phc2sys[370.362]: enp4s0 sys offset 1 s2 freq +1726 delay 0 phc2sys[370.462]: enp4s0 sys offset -3 s2 freq +1723 delay 0 phc2sys[370.562]: enp4s0 sys offset -1 s2 freq +1724 delay 0 phc2sys[370.662]: enp4s0 sys offset -4 s2 freq +1720 delay 0 phc2sys[370.762]: enp4s0 sys offset -7 s2 freq +1716 delay 0 phc2sys[370.862]: enp4s0 sys offset -2 s2 freq +1719 delay 0 Statistics for this run (total of 2179 lines), in nanoseconds: average: 0.14 stdev: 5.03 max: 48 min: -27 For reference, the statistics for runs without PCIe congestion show that the improvements from enabling PTM are less dramatic. For two runs of 16466 entries: without PTM: avg -0.04 stdev 10.57 max 39 min -42 with PTM: avg 0.01 stdev 4.20 max 19 min -16 One possible explanation is that when PTM is not enabled, and there's a lot of traffic in the PCIe fabric, some register reads will take more time than the others because of congestion on the PCIe fabric. When PTM is enabled, even if the PTM dialogs take more time to complete under heavy traffic, the time measurements do not depend on the time to read the registers. This was implemented following the i225 EAS version 0.993. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-24 12:01:34 -07:00
Vinicius Costa Gomes	1b5d73fb86	igc: Enable PCIe PTM Enables PCIe PTM (Precision Time Measurement) support in the igc driver. Notifies the PCI devices that PCIe PTM should be enabled. PCIe PTM is similar protocol to PTP (Precision Time Protocol) running in the PCIe fabric, it allows devices to report time measurements from their internal clocks and the correlation with the PCIe root clock. The i225 NIC exposes some registers that expose those time measurements, those registers will be used, in later patches, to implement the PTP_SYS_OFFSET_PRECISE ioctl(). Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-24 11:49:21 -07:00
Yufeng Mo	f3ccfda193	ethtool: extend coalesce setting uAPI with CQE mode In order to support more coalesce parameters through netlink, add two new parameter kernel_coal and extack for .set_coalesce and .get_coalesce, then some extra info can return to user with the netlink API. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-24 07:38:29 -07:00
Sasha Neftin	4051f68318	e1000e: Do not take care about recovery NVM checksum On new platforms, the NVM is read-only. Attempting to update the NVM is causing a lockup to occur. Do not attempt to write to the NVM on platforms where it's not supported. Emit an error message when the NVM checksum is invalid. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213667 Fixes: `fb776f5d57` ("e1000e: Add support for Tiger Lake") Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Suggested-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-20 08:38:01 -07:00
Sasha Neftin	44a13a5d99	e1000e: Fix the max snoop/no-snoop latency for 10M We should decode the latency and the max_latency before directly compare. The latency should be presented as lat_enc = scale x value: lat_enc_d = (lat_enc & 0x0x3ff) x (1U << (5*((max_ltr_enc & 0x1c00) >> 10))) Fixes: `cf8fb73c23` ("e1000e: add support for LTR on I217/I218") Suggested-by: Yee Li <seven.yi.lee@gmail.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-20 08:38:01 -07:00
Toshiki Nishioka	691bd4d776	igc: Use num_tx_queues when iterating over tx_ring queue Use num_tx_queues rather than the IGC_MAX_TX_QUEUES fixed number 4 when iterating over tx_ring queue since instantiated queue count could be less than 4 where on-line cpu count is less than 4. Fixes: `ec50a9d437` ("igc: Add support for taprio offloading") Signed-off-by: Toshiki Nishioka <toshiki.nishioka@intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-20 08:37:49 -07:00

... 10 11 12 13 14 ...

7740 Commits