linux

Author	SHA1	Message	Date
Eric Dumazet	ba3bb0e76c	tcp: fix SO_RCVLOWAT possible hangs under high mem pressure Whenever tcp_try_rmem_schedule() returns an error, we are under trouble and should make sure to wakeup readers so that they can drain socket queues and eventually make room. Fixes: `03f45c883c` ("tcp: avoid extra wakeups for SO_RCVLOWAT users") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:46:04 -07:00
Danny Lin	b97e9d9d67	net: sched: Allow changing default qdisc to FQ-PIE Similar to fq_codel and the other qdiscs that can set as default, fq_pie is also suitable for general use without explicit configuration, which makes it a valid choice for this. This is useful in situations where a painless out-of-the-box solution for reducing bufferbloat is desired but fq_codel is not necessarily the best choice. For example, fq_pie can be better for DASH streaming, but there could be more cases where it's the better choice of the two simple AQMs available in the kernel. Signed-off-by: Danny Lin <danny@kdrag0n.dev> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:43:27 -07:00
Zhang Xiaoxu	9ffad9263b	cifs: Fix the target file was deleted when rename failed. When xfstest generic/035, we found the target file was deleted if the rename return -EACESS. In cifs_rename2, we unlink the positive target dentry if rename failed with EACESS or EEXIST, even if the target dentry is positived before rename. Then the existing file was deleted. We should just delete the target file which created during the rename. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>	2020-07-01 19:41:56 -05:00
David S. Miller	d8c8a96ce5	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2020-07-01 This series contains updates to all Intel drivers, but a majority of the changes are to the i40e driver. Jeff converts 'fall through' comments to the 'fallthrough;' keyword for all Intel drivers. Removed unnecessary delay in the ixgbe ethtool diagnostics test. Arkadiusz implements Total Port Shutdown for i40e. This is the revised patch based on Jakub's feedback from an earlier submission of this patch, where additional code comments and description was needed to describe the functionality. Wei Yongjun fixes return error code for iavf_init_get_resources(). Magnus optimizes XDP code in i40e; starting with AF_XDP zero-copy transmit completion path. Then by only executing a division when necessary in the napi_poll data path. Move the check for transmit ring full outside the send loop to increase performance. Ciara add XDP ring statistics to i40e and the ability to dump these statistics and descriptors. Tony fixes reporting iavf statistics. Radoslaw adds support for 2.5 and 5 Gbps by implementing the newer ethtool ksettings API in ixgbe. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:41:56 -07:00
Paul Aurich	5391b8e1b7	SMB3: Honor 'posix' flag for multiuser mounts The flag from the primary tcon needs to be copied into the volume info so that cifs_get_tcon will try to enable extensions on the per-user tcon. At that point, since posix extensions must have already been enabled on the superblock, don't try to needlessly adjust the mount flags. Fixes: `ce558b0e17` ("smb3: Add posix create context for smb3.11 posix mounts") Fixes: `b326614ea2` ("smb3: allow "posix" mount option to enable new SMB311 protocol extensions") Signed-off-by: Paul Aurich <paul@darkrain42.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>	2020-07-01 19:41:36 -05:00
Paul Aurich	6b356f6cf9	SMB3: Honor 'handletimeout' flag for multiuser mounts Fixes: `ca567eb2b3` ("SMB3: Allow persistent handle timeout to be configurable on mount") Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>	2020-07-01 19:40:33 -05:00
Paul Aurich	ad35f169db	SMB3: Honor lease disabling for multiuser mounts Fixes: `3e7a02d478` ("smb3: allow disabling requesting leases") Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>	2020-07-01 19:40:17 -05:00
Paul Aurich	00dfbc2f9c	SMB3: Honor persistent/resilient handle flags for multiuser mounts Without this: - persistent handles will only be enabled for per-user tcons if the server advertises the 'Continuous Availabity' capability - resilient handles would never be enabled for per-user tcons Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>	2020-07-01 19:40:06 -05:00
Paul Aurich	cc15461c73	SMB3: Honor 'seal' flag for multiuser mounts Ensure multiuser SMB3 mounts use encryption for all users' tcons if the mount options are configured to require encryption. Without this, only the primary tcon and IPC tcons are guaranteed to be encrypted. Per-user tcons would only be encrypted if the server was configured to require encryption. Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>	2020-07-01 19:38:46 -05:00
Willem de Bruijn	0da7536fb4	ip: Fix SO_MARK in RST, ACK and ICMP packets When no full socket is available, skbs are sent over a per-netns control socket. Its sk_mark is temporarily adjusted to match that of the real (request or timewait) socket or to reflect an incoming skb, so that the outgoing skb inherits this in __ip_make_skb. Introduction of the socket cookie mark field broke this. Now the skb is set through the cookie and cork: <caller> # init sockc.mark from sk_mark or cmsg ip_append_data ip_setup_cork # convert sockc.mark to cork mark ip_push_pending_frames ip_finish_skb __ip_make_skb # set skb->mark to cork mark But I missed these special control sockets. Update all callers of __ip(6)_make_skb that were originally missed. For IPv6, the same two icmp(v6) paths are affected. The third case is not, as commit `92e55f412c` ("tcp: don't annotate mark on control socket from tcp_v6_send_response()") replaced the ctl_sk->sk_mark with passing the mark field directly as a function argument. That commit predates the commit that introduced the bug. Fixes: `c6af0c227a` ("ip: support SO_MARK cmsg") Signed-off-by: Willem de Bruijn <willemb@google.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:38:30 -07:00
Paul Aurich	aadd69cad0	cifs: Display local UID details for SMB sessions in DebugData This is useful for distinguishing SMB sessions on a multiuser mount. Signed-off-by: Paul Aurich <paul@darkrain42.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>	2020-07-01 19:38:19 -05:00
Eric Dumazet	e114e1e8ac	tcp: md5: do not send silly options in SYNCOOKIES Whenever cookie_init_timestamp() has been used to encode ECN,SACK,WSCALE options, we can not remove the TS option in the SYNACK. Otherwise, tcp_synack_options() will still advertize options like WSCALE that we can not deduce later when receiving the packet from the client to complete 3WHS. Note that modern linux TCP stacks wont use MD5+TS+SACK in a SYN packet, but we can not know for sure that all TCP stacks have the same logic. Before the fix a tcpdump would exhibit this wrong exchange : 10:12:15.464591 IP C > S: Flags [S], seq 4202415601, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 456965269 ecr 0,nop,wscale 8], length 0 10:12:15.464602 IP S > C: Flags [S.], seq 253516766, ack 4202415602, win 65535, options [nop,nop,md5 valid,mss 1400,nop,nop,sackOK,nop,wscale 8], length 0 10:12:15.464611 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid], length 0 10:12:15.464678 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid], length 12 10:12:15.464685 IP S > C: Flags [.], ack 13, win 65535, options [nop,nop,md5 valid], length 0 After this patch the exchange looks saner : 11:59:59.882990 IP C > S: Flags [S], seq 517075944, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508483 ecr 0,nop,wscale 8], length 0 11:59:59.883002 IP S > C: Flags [S.], seq 1902939253, ack 517075945, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508479 ecr 1751508483,nop,wscale 8], length 0 11:59:59.883012 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 0 11:59:59.883114 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 12 11:59:59.883122 IP S > C: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508483], length 0 11:59:59.883152 IP S > C: Flags [P.], seq 1:13, ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508483], length 12 11:59:59.883170 IP C > S: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508484], length 0 Of course, no SACK block will ever be added later, but nothing should break. Technically, we could remove the 4 nops included in MD5+TS options, but again some stacks could break seeing not conventional alignment. Fixes: `4957faade1` ("TCPCT part 1g: Responder Cookie => Initiator") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:36:23 -07:00
Rao Shoaib	9ef845f894	rds: If one path needs re-connection, check all and re-connect In testing with mprds enabled, Oracle Cluster nodes after reboot were not able to communicate with others nodes and so failed to rejoin the cluster. Peers with lower IP address initiated connection but the node could not respond as it choose a different path and could not initiate a connection as it had a higher IP address. With this patch, when a node sends out a packet and the selected path is down, all other paths are also checked and any down paths are re-connected. Reviewed-by: Ka-cheong Poon <ka-cheong.poon@oracle.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:35:17 -07:00
Eric Dumazet	e6ced831ef	tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers My prior fix went a bit too far, according to Herbert and Mathieu. Since we accept that concurrent TCP MD5 lookups might see inconsistent keys, we can use READ_ONCE()/WRITE_ONCE() instead of smp_rmb()/smp_wmb() Clearing all key->key[] is needed to avoid possible KMSAN reports, if key->keylen is increased. Since tcp_md5_do_add() is not fast path, using __GFP_ZERO to clear all struct tcp_md5sig_key is simpler. data_race() was added in linux-5.8 and will prevent KCSAN reports, this can safely be removed in stable backports, if data_race() is not yet backported. v2: use data_race() both in tcp_md5_hash_key() and tcp_md5_do_add() Fixes: `6a2febec33` ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Marco Elver <elver@google.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:29:45 -07:00
David S. Miller	11a20c7152	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2020-07-01 This series contains updates to the ice driver only. Jacob implements a devlink region for device capabilities. Bruce removes structs containing only one-element arrays that are either unused or only used for indexing. Instead, use pointer arithmetic or other indexing to access the elements. Converts "C struct hack" variable-length types to the preferred C99 flexible array member. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:25:00 -07:00
Bruce Allan	66486d8943	ice: replace single-element array used for C struct hack Convert the pre-C90-extension "C struct hack" method (using a single- element array at the end of a structure for implementing variable-length types) to the preferred use of C99 flexible array member. Additional code cleanups were done near areas affected by this change. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2020-07-01 16:35:23 -07:00
Bruce Allan	b3c3890489	ice: avoid unnecessary single-member variable-length structs There are a number of structures that consist of a one-element array as the only struct member. Some of those are unused so remove them. Others are used to index into a buffer/array consisting of a variable number of a different data or structure type. Those are unnecessary since we can use simple pointer arithmetic or index directly into the buffer to access individual elements of the buffer/array. Additional code cleanups were done near areas affected by this change. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2020-07-01 16:33:29 -07:00
Jarod Wilson	a3b658cfb6	bonding: allow xfrm offload setup post-module-load At the moment, bonding xfrm crypto offload can only be set up if the bonding module is loaded with active-backup mode already set. We need to be able to make this work with bonds set to AB after the bonding driver has already been loaded. So what's done here is: 1) move #define BOND_XFRM_FEATURES to net/bonding.h so it can be used by both bond_main.c and bond_options.c 2) set BOND_XFRM_FEATURES in bond_dev->hw_features universally, rather than only when loading in AB mode 3) wire up xfrmdev_ops universally too 4) disable BOND_XFRM_FEATURES in bond_dev->features if not AB 5) exit early (non-AB case) from bond_ipsec_offload_ok, to prevent a performance hit from traversing into the underlying drivers 5) toggle BOND_XFRM_FEATURES in bond_dev->wanted_features and call netdev_change_features() from bond_option_mode_set() In my local testing, I can change bonding modes back and forth on the fly, have hardware offload work when I'm in AB, and see no performance penalty to non-AB software encryption, despite having xfrm bits all wired up for all modes now. Fixes: `18cb261afd` ("bonding: support hardware encryption offload to slaves") Reported-by: Huy Nguyen <huyn@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:53:32 -07:00
Sean Tranchetti	1e82a62fec	genetlink: remove genl_bind A potential deadlock can occur during registering or unregistering a new generic netlink family between the main nl_table_lock and the cb_lock where each thread wants the lock held by the other, as demonstrated below. 1) Thread 1 is performing a netlink_bind() operation on a socket. As part of this call, it will call netlink_lock_table(), incrementing the nl_table_users count to 1. 2) Thread 2 is registering (or unregistering) a genl_family via the genl_(un)register_family() API. The cb_lock semaphore will be taken for writing. 3) Thread 1 will call genl_bind() as part of the bind operation to handle subscribing to GENL multicast groups at the request of the user. It will attempt to take the cb_lock semaphore for reading, but it will fail and be scheduled away, waiting for Thread 2 to finish the write. 4) Thread 2 will call netlink_table_grab() during the (un)registration call. However, as Thread 1 has incremented nl_table_users, it will not be able to proceed, and both threads will be stuck waiting for the other. genl_bind() is a noop, unless a genl_family implements the mcast_bind() function to handle setting up family-specific multicast operations. Since no one in-tree uses this functionality as Cong pointed out, simply removing the genl_bind() function will remove the possibility for deadlock, as there is no attempt by Thread 1 above to take the cb_lock semaphore. Fixes: `c380d9a7af` ("genetlink: pass multicast bind/unbind to families") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Johannes Berg <johannes.berg@intel.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:49:11 -07:00
Jacob Keller	8d7aab3515	ice: implement snapshot for device capabilities Add a new devlink region used for capturing a snapshot of the device capabilities buffer which is reported by the firmware over the AdminQ. This information can useful in debugging driver and firmware interactions. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2020-07-01 15:41:45 -07:00
David S. Miller	651f8bd4da	Merge branch 'net-ipa-endpoint-configuration-updates' Alex Elder says: ==================== net: ipa: endpoint configuration updates This series updates code that configures IPA endpoints. The changes made mainly affect access to registers that are valid only for RX, or only for TX endpoints. The first three patches avoid writing endpoint registers if they are not defined to be valid. The fourth patch slightly modifies the parameters for the offset macros used for these endpoint registers, to make it explicit when only some endpoints are valid. The last patch just tweaks one line of code so it uses a convention used everywhere else in the driver. Version 2 of this series eliminates some of the "assert()" comments that Jakub inquired about. The ones removed will actually go away in an upcoming (not-yet-posted) patch series anyway. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:30:34 -07:00
Alex Elder	547c878854	net: ipa: HOL_BLOCK_EN_FMASK is a 1-bit mask The convention throughout the IPA driver is to directly use single-bit field mask values, rather than using (for example) u32_encode_bits() to set or clear them. Fix the one place that doesn't follow that convention, which sets HOL_BLOCK_EN_FMASK in ipa_endpoint_init_hol_block_enable(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:30:34 -07:00
Alex Elder	8b97bcb7bb	net: ipa: clarify endpoint register macro constraints A handful of registers are valid only for RX endpoints, and some others are valid only for TX endpoints. For these endpoints, add a comment above their defined offset macro that indicates the endpoints to which they apply. Extend the endpoint parameter naming convention as well, to make these constraints more explicit. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:30:34 -07:00
Alex Elder	00b9102afa	net: ipa: mode register is TX only The INIT_MODE endpoint configuration register is only valid for TX endpoints. Rather than writing a zero to that register for RX endpoints, avoid writing the register at all. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:30:34 -07:00
Alex Elder	9b63f09378	net: ipa: metadata_mask register is RX only The INIT_HDR_METADATA_MASK endpoint configuration register is only valid for RX endpoints. Rather than writing a zero to that register for TX endpoints, avoid writing the register at all. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:30:34 -07:00
Alex Elder	f8d34dfdf3	net: ipa: head-of-line block registers are RX only The INIT_HOL_BLOCK_EN and INIT_HOL_BLOCK_TIMER endpoint registers are only valid for RX endpoints. Have ipa_endpoint_modem_hol_block_clear_all() skip writing these registers for TX endpoints. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:30:33 -07:00
Fabio Estevam	0115e6c98c	dt-bindings: clock: imx: Fix e-mail address The freescale.com domain is gone for quite some time. Use the nxp.com domain instead. Signed-off-by: Fabio Estevam <festevam@gmail.com> Link: https://lore.kernel.org/r/20200701005346.1008-1-festevam@gmail.com Signed-off-by: Rob Herring <robh@kernel.org>	2020-07-01 16:29:11 -06:00
David S. Miller	21ddff5c95	Merge branch 'net-ipa-small-improvements' Alex Elder says: ==================== net: ipa: small improvements This series contains two patches that improve the error output that's reported when an error occurs while changing the state of a GSI channel or event ring. The first ensures all such error conditions report an error, and the second simplifies the messages a little and ensures they are all consistent. A third (independent) patch gets rid of an unused symbol in the microcontroller code. Version 2 fixes two alignment problems pointed out by checkpatch.pl, as requested by Jakub Kicinski. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:29:07 -07:00
Alex Elder	722208ea3e	net: ipa: kill IPA_MEM_UC_OFFSET The microcontroller shared memory area is at the beginning of the IPA resident memory. IPA_MEM_UC_OFFSET was defined as the offset within that region where it's found, but it's 0, and it's never actually used. Just get rid of the definition, and move some of the description it had to be above the definition of the ipa_uc_mem_area structure. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:29:07 -07:00
Alex Elder	8463488af4	net: ipa: standarize more GSI error messages Make minor updates to error messages reported in "gsi.c": - Use local variables to reduce multi-line function calls - Don't use parentheses in messages - Do some slight rewording in a few cases Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:29:07 -07:00
Alex Elder	a442b3c755	net: ipa: always report GSI state errors We check the state of an event ring or channel both before and after any GSI command issued that will change that state. In most--but not all--cases, if the state is something different than expected we report an error message. Add error messages where missing, so that all unexpected states provide information about what went wrong. Drop the parentheses around the state value shown in all cases. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:29:07 -07:00
David S. Miller	6f6746d7ba	Merge branch 'net-ipa-simple-refactorizations' Alex Elder says: ==================== net: ipa: simple refactorizations This series makes three small changes to some endpoint configuration code. The first uses a constant to represent the frequency of an internal clock used for timers in the IPA. The second modifies a limit used so it matches Qualcomm's internal code. And the third reworks a few lines of code, eliminating a multi-line function call. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:27:09 -07:00
Alex Elder	9e88cb5ff7	net: ipa: reuse a local variable in ipa_endpoint_init_aggr() Reuse the "limit" local variable in ipa_endpoint_init_aggr() when setting the aggregation size limit. Simple cleanup. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:27:09 -07:00
Alex Elder	1d86652b13	net: ipa: reduce aggregation time limit Halve the time limit used when aggregation is enabled on an RX endpoint, to half a millisecond. Use DIV_ROUND_CLOSEST() to compute the value that represents the time period, to get better accuracy in the event the time limit is not an even multiple of the granularity. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:27:09 -07:00
Alex Elder	317a5740b7	net: ipa: rework ipa_aggr_granularity_val() The timer used for aggregation makes use of an internal 32 KHz clock. The granularity of the timer is programmed by a field whose value is computed by ipa_aggr_granularity_val(). Redefine the way that value is computed by using a new TIMER_FREQUENCY constant representing the underlying clock frequency. Add two BUILD_BUG_ON() calls to ensure the value used is valid. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:27:09 -07:00
David S. Miller	8c96439724	Merge branch 'add-XDP-support-to-xen-netfront' Denis Kirjanov says: ==================== xen networking: add XDP support to xen-netfront The first patch adds a new extra type to enable proper synchronization between an RX request/response pair. The second patch implements BFP interface for xen-netfront. The third patch enables extra space for XDP processing. v14: - fixed compilation warnings v13: - fixed compilation due to previous rename v12: - xen-netback: rename netfront_xdp_headroom to xdp_headroom v11: - add the new headroom constant to netif.h - xenbus_scanf check - lock a bulk of puckets in xennet_xdp_xmit() v10: - add a new xen_netif_extra_info type to enable proper synchronization between an RX request/response pair. - order local variable declarations v9: - assign an xdp program before switching to Reconfiguring - minor cleanups - address checkpatch issues v8: - add PAGE_POOL config dependency - keep the state of XDP processing in netfront_xdp_enabled - fixed allocator type in xdp_rxq_info_reg_mem_model() - minor cleanups in xen-netback v7: - use page_pool_dev_alloc_pages() on page allocation - remove the leftover break statement from netback_changed v6: - added the missing SOB line - fixed subject v5: - split netfront/netback changes - added a sync point between backend/frontend on switching to XDP - added pagepool API v4: - added verbose patch descriprion - don't expose the XDP headroom offset to the domU guest - add a modparam to netback to toggle XDP offset - don't process jumbo frames for now v3: - added XDP_TX support (tested with xdping echoserver) - added XDP_REDIRECT support (tested with modified xdp_redirect_kern) - moved xdp negotiation to xen-netback v2: - avoid data copying while passing to XDP - tell xen-netback that we need the headroom space ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:25:14 -07:00
Denis Kirjanov	1c9535c701	xen networking: add XDP offset adjustment to xen-netback the patch basically adds the offset adjustment and netfront state reading to make XDP work on netfront side. Reviewed-by: Paul Durrant <paul@xen.org> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:25:14 -07:00
Denis Kirjanov	6c5aa6fc4d	xen networking: add basic XDP support for xen-netfront The patch adds a basic XDP processing to xen-netfront driver. We ran an XDP program for an RX response received from netback driver. Also we request xen-netback to adjust data offset for bpf_xdp_adjust_head() header space for custom headers. synchronization between frontend and backend parts is done by using xenbus state switching: Reconfiguring -> Reconfigured- > Connected UDP packets drop rate using xdp program is around 310 kpps using ./pktgen_sample04_many_flows.sh and 160 kpps without the patch. Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:25:14 -07:00
Denis Kirjanov	2cef30d7bd	xen: netif.h: add a new extra type for XDP The patch adds a new extra type to be able to diffirentiate between RX responses on xen-netfront side with the adjusted offset required for XDP processing. The offset value from a guest is passed via xenstore. Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:25:14 -07:00
Alexei Starovoitov	91f77560e4	Merge branch 'test_progs-improvements' Jesper Dangaard Brouer says: ==================== V3: Reorder patches to cause less code churn. The BPF selftest 'test_progs' contains many tests, that cover all the different areas of the kernel where BPF is used. The CI system sees this as one test, which is impractical for identifying what team/engineer is responsible for debugging the problem. This patchset add some options that makes it easier to create a shell for-loop that invoke each (top-level) test avail in test_progs. Then each test FAIL/PASS result can be presented the CI system to have a separate bullet. (For Red Hat use-case in Beaker https://beaker-project.org/) Created a public script[1] that uses these features in an advanced way. Demonstrating howto reduce the number of (top-level) tests by grouping tests together via using the existing test pattern selection feature, and then using the new --list feature combined with exclude (-b) to get a list of remaining test names that was not part of the groups. [1] https://github.com/netoptimizer/prototype-kernel/blob/master/scripts/bpf_selftests_grouping.sh ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-07-01 15:22:13 -07:00
Jesper Dangaard Brouer	c1f1f3656e	selftests/bpf: Test_progs option for listing test names The program test_progs have some very useful ability to specify a list of test name substrings for selecting which tests to run. This patch add the ability to list the selected test names without running them. This is practical for seeing which tests gets selected with given select arguments (which can also contain a exclude list via --name-blacklist). This output can also be used by shell-scripts in a for-loop: for N in $(./test_progs --list -t xdp); do \ ./test_progs -t $N 2>&1 > result_test_${N}.log & \ done ; wait This features can also be used for looking up a test number and returning a testname. If the selection was empty then a shell EXIT_FAILURE is returned. This is useful for scripting. e.g. like this: n=1; while [ $(./test_progs --list -n $n) ] ; do \ ./test_progs -n $n ; n=$(( n+1 )); \ done Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/159363985751.930467.9610992940793316982.stgit@firesoul	2020-07-01 15:22:13 -07:00
Jesper Dangaard Brouer	643e7233aa	selftests/bpf: Test_progs option for getting number of tests It can be practial to get the number of tests that test_progs contain. This could for example be used to create a shell for-loop construct that runs the individual tests. Like: for N in $(seq 1 $(./test_progs -c)); do ./test_progs -n $N 2>&1 > result_test_${N}.log & done ; wait V2: Add the ability to return the count for the selected tests. This is useful for getting a count e.g. after excluding some tests with option -b. The current beakers test script like to report the max test count upfront. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/159363985244.930467.12617117873058936829.stgit@firesoul	2020-07-01 15:22:13 -07:00
Jesper Dangaard Brouer	6c92bd5cd4	selftests/bpf: Test_progs indicate to shell on non-actions When a user selects a non-existing test the summary is printed with indication 0 for all info types, and shell "success" (EXIT_SUCCESS) is indicated. This can be understood by a human end-user, but for shell scripting is it useful to indicate a shell failure (EXIT_FAILURE). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/159363984736.930467.17956007131403952343.stgit@firesoul	2020-07-01 15:22:13 -07:00
Andrii Nakryiko	17bbf925c6	tools/bpftool: Turn off -Wnested-externs warning Turn off -Wnested-externs to avoid annoying warnings in BUILD_BUG_ON macro when compiling bpftool: In file included from /data/users/andriin/linux/tools/include/linux/build_bug.h:5, from /data/users/andriin/linux/tools/include/linux/kernel.h:8, from /data/users/andriin/linux/kernel/bpf/disasm.h:10, from /data/users/andriin/linux/kernel/bpf/disasm.c:8: /data/users/andriin/linux/kernel/bpf/disasm.c: In function ‘__func_get_name’: /data/users/andriin/linux/tools/include/linux/compiler.h:37:38: warning: nested extern declaration of ‘__compiletime_assert_0’ [-Wnested-externs] _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^~~~~~~~~~~~~~~~~~~~~ /data/users/andriin/linux/tools/include/linux/compiler.h:16:15: note: in definition of macro ‘__compiletime_assert’ extern void prefix ## suffix(void) __compiletime_error(msg); \ ^~~~~~ /data/users/andriin/linux/tools/include/linux/compiler.h:37:2: note: in expansion of macro ‘_compiletime_assert’ _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^~~~~~~~~~~~~~~~~~~ /data/users/andriin/linux/tools/include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’ #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ^~~~~~~~~~~~~~~~~~ /data/users/andriin/linux/tools/include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’ BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) ^~~~~~~~~~~~~~~~ /data/users/andriin/linux/kernel/bpf/disasm.c:20:2: note: in expansion of macro ‘BUILD_BUG_ON’ BUILD_BUG_ON(ARRAY_SIZE(func_id_str) != __BPF_FUNC_MAX_ID); ^~~~~~~~~~~~ Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200701212816.2072340-1-andriin@fb.com	2020-07-02 00:18:28 +02:00
Hao Luo	8d821b5db7	selftests/bpf: Switch test_vmlinux to use hrtimer_range_start_ns. The test_vmlinux test uses hrtimer_nanosleep as hook to test tracing programs. But in a kernel built by clang, which performs more aggresive inlining, that function gets inlined into its caller SyS_nanosleep. Therefore, even though fentry and kprobe do hook on the function, they aren't triggered by the call to nanosleep in the test. A possible fix is switching to use a function that is less likely to be inlined, such as hrtimer_range_start_ns. The EXPORT_SYMBOL functions shouldn't be inlined based on the description of [1], therefore safe to use for this test. Also the arguments of this function include the duration of sleep, therefore suitable for test verification. [1] `af3b56289b` time: don't inline EXPORT_SYMBOL functions Tested: In a clang build kernel, before this change, the test fails: test_vmlinux:PASS:skel_open 0 nsec test_vmlinux:PASS:skel_attach 0 nsec test_vmlinux:PASS:tp 0 nsec test_vmlinux:PASS:raw_tp 0 nsec test_vmlinux:PASS:tp_btf 0 nsec test_vmlinux:FAIL:kprobe not called test_vmlinux:FAIL:fentry not called After switching to hrtimer_range_start_ns, the test passes: test_vmlinux:PASS:skel_open 0 nsec test_vmlinux:PASS:skel_attach 0 nsec test_vmlinux:PASS:tp 0 nsec test_vmlinux:PASS:raw_tp 0 nsec test_vmlinux:PASS:tp_btf 0 nsec test_vmlinux:PASS:kprobe 0 nsec test_vmlinux:PASS:fentry 0 nsec Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200701175315.1161242-1-haoluo@google.com	2020-07-01 15:10:27 -07:00
Radoslaw Tyl	a296d665ea	ixgbe: Add ethtool support to enable 2.5 and 5.0 Gbps support Added full support for new version Ethtool API. New API allow use 2500Gbase-T and 5000base-T supported and advertised link speed modes. Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2020-07-01 15:04:54 -07:00
Jeff Kirsher	bb0967c04e	ixgbe: Cleanup unneeded delay in ethtool test There is a 4 seconds delay in ixgbe_diag_test() that is holding up other ioctls such as SIOCGIFCONF that Oracle database applications use. One of Oracle's product runs "ethtool -t ethX online" periodically for system monitoring and that is impacting database applications that use SIOCGIFCONF at that same time. This 4 second delay was needed in out early 1GbE parts to give the PHY time to recover from a reset. This code was carried forward to the 10 GbE driver even it was not needed for the supported PHYs in the ixgbe driver. CC: Aleksandr Loktionov <aleksandr.loktionov@intel.com> CC: Jack Vogel <jack.vogel@oracle.com> Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2020-07-01 14:47:58 -07:00
Tony Nguyen	9358076642	iavf: Fix updating statistics Commit `bac8486116` ("iavf: Refactor the watchdog state machine") inverted the logic for when to update statistics. Statistics should be updated when no other commands are pending, instead they were only requested when a command was processed. iavf_request_stats() would see a pending request and not request statistics to be updated. This caused statistics to never be updated; fix the logic. Fixes: `bac8486116` ("iavf: Refactor the watchdog state machine") Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>	2020-07-01 14:45:59 -07:00
Ciara Loftus	44ea803e2f	i40e: introduce new dump desc XDP command Interfaces already exist for dumping Rx and Tx descriptor information. Introduce another for doing the same for XDP descriptors. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2020-07-01 14:44:17 -07:00
Ciara Loftus	890c402c7b	i40e: add XDP ring statistics to dump VSI debug output Prior to this, only the Rx and Tx ring statistics were dumped. The XDP ring statistics are now dumped as well. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2020-07-01 14:40:07 -07:00

... 11 12 13 14 15 ...

935158 Commits