Commit Graph

1071697 Commits

Author SHA1 Message Date
David S. Miller
03c82e80ec Merge branch 'octeontx2-af-fixes'
Subbaraya Sundeep says:

====================
octeontx-af2: Fixes for CN10K and CN9xxx platforms

This patchset has consolidated fixes in Octeontx2 driver
handling CN10K and CN9xxx platforms. When testing the
new CN10K hardware some issues resurfaced like accessing
wrong register for CN10K and enabling loopback on not supported
interfaces. Some fixes are needed for CN9xxx platforms as well.

Below is the description of patches

Patch 1: AF sets RX RSS action for all the VFs when a VF is
brought up. But when a PF sets RX action for its VF like Drop/Direct
to a queue in ntuple filter it is not retained because of AF fixup.
This patch skips modifying VF RX RSS action if PF has already
set its action.

Patch 2: When configuring backpressure wrong register is being read for
LBKs hence fixed it.

Patch 3: Some RVU blocks may take longer time to reset but are guaranteed
to complete the reset. Hence wait till reset is complete.

Patch 4: For enabling LMAC CN10K needs another register compared
to CN9xxx platforms. Hence changed it.

Patch 5: Adds missing barrier before submitting memory pointer
to the aura hardware.

Patch 6: Increase polling time while link credit restore and also
return proper error code when timeout occurs.

Patch 7: Internal loopback not supported on LPCS interfaces like
SGMII/QSGMII so do not enable it.

Patch 8: When there is a error in message processing, AF sets the error
response and replies back to requestor. PF forwards a invalid message to
VF back if AF reply has error in it. This way VF lacks the actual error set
by AF for its message. This is changed such that PF simply forwards the
actual reply and let VF handle the error.

Patch 9: ntuple filter with "flow-type ether proto 0x8842 vlan 0x92e"
was not working since ethertype 0x8842 is NGIO protocol. Hardware
parser explicitly parses such NGIO packets and sets the packet as
NGIO and do not set it as tagged packet. Fix this by changing parser
such that it sets the packet as both NGIO and tagged by using
separate layer types.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-21 14:32:21 +00:00
Kiran Kumar K
745166fcf0 octeontx2-af: Add KPU changes to parse NGIO as separate layer
With current KPU profile NGIO is being parsed along with CTAG as
a single layer. Because of this MCAM/ntuple rules installed with
ethertype as 0x8842 are not being hit. Adding KPU profile changes
to parse NGIO in separate ltype and CTAG in separate ltype.

Fixes: f9c49be90c ("octeontx2-af: Update the default KPU profile and fixes")
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-21 14:32:21 +00:00
Subbaraya Sundeep
a8db854be2 octeontx2-pf: Forward error codes to VF
PF forwards its VF messages to AF and corresponding
replies from AF to VF. AF sets proper error code in the
replies after processing message requests. Currently PF
checks the error codes in replies and sends invalid
message to VF. This way VF lacks the information of
error code set by AF for its messages. This patch
changes that such that PF simply forwards AF replies
so that VF can handle error codes.

Fixes: d424b6c024 ("octeontx2-pf: Enable SRIOV and added VF mbox handling")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-21 14:32:21 +00:00
Geetha sowjanya
df66b6ebc5 octeontx2-af: cn10k: Do not enable RPM loopback for LPC interfaces
Internal looback is not supported to low rate LPCS interface like
SGMII/QSGMII. Hence don't allow to enable for such interfaces.

Fixes: 3ad3f8f93c ("octeontx2-af: cn10k: MAC internal loopback support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-21 14:32:20 +00:00
Geetha sowjanya
1581d61b42 octeontx2-af: Increase link credit restore polling timeout
It's been observed that sometimes link credit restore takes
a lot of time than the current timeout. This patch increases
the default timeout value and return the proper error value
on failure.

Fixes: 1c74b89171 ("octeontx2-af: Wait for TX link idle for credits change")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-21 14:32:20 +00:00
Geetha sowjanya
c5d731c54a octeontx2-pf: cn10k: Ensure valid pointers are freed to aura
While freeing SQB pointers to aura, driver first memcpy to
target address and then triggers lmtst operation to free pointer
to the aura. We need to ensure(by adding dmb barrier)that memcpy
is finished before pointers are freed to the aura. This patch also
adds the missing sq context structure entry in debugfs.

Fixes: ef6c8da71e ("octeontx2-pf: cn10K: Reserve LMTST lines per core")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-21 14:32:20 +00:00
Geetha sowjanya
fae80edeaf octeontx2-af: cn10k: Use appropriate register for LMAC enable
CN10K platforms uses RPM(0..2)_MTI_MAC100(0..3)_COMMAND_CONFIG
register for lmac TX/RX enable whereas CN9xxx platforms use
CGX_CMRX_CONFIG register. This config change was missed when
adding support for CN10K RPM.

Fixes: 91c6945ea1 ("octeontx2-af: cn10k: Add RPM MAC support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-21 14:32:20 +00:00
Geetha sowjanya
03ffbc9914 octeontx2-af: Retry until RVU block reset complete
Few RVU blocks like SSO require more time for reset on some
silicons. Hence retrying the block reset until success.

Fixes: c0fa2cff88 ("octeontx2-af: Handle return value in block reset")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-21 14:32:20 +00:00
Sunil Goutham
00bfe94e38 octeontx2-af: Fix LBK backpressure id count
In rvu_nix_get_bpid() lbk_bpid_cnt is being read from
wrong register. Due to this backpressure enable is failing
for LBK VF32 onwards. This patch fixes that.

Fixes: fe1939bb23 ("octeontx2-af: Add SDP interface support")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-21 14:32:20 +00:00
Subbaraya Sundeep
d225c449ab octeontx2-af: Do not fixup all VF action entries
AF modifies all the rules destined for VF to use
the action same as default RSS action. This fixup
was needed because AF only installs default rules with
RSS action. But the action in rules installed by a PF
for its VFs should not be changed by this fixup.
This is because action can be drop or direct to
queue as specified by user(ntuple filters).
This patch fixes that problem.

Fixes: 967db3529e ("octeontx2-af: add support for multicast/promisc packet")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-21 14:32:20 +00:00
David S. Miller
67ab55956e wireless fixes for v5.17
First set of fixes for v5.17. This is the first pull request from the
 new wireless tree and only changes to MAINTAINERS file.
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmHqk/ERHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZuozQf/b2QiYq9qNqWLhyo9JjwN0LG3Z6PxYJUr
 WFhvWOp+Hrhl+dlRGms6W2RUk6SuAAkzU6D/zWQ7Pw1UIQRTkU34Y5kgZp3e3RW8
 yUMOH1Cb/97hc/cxhY1LNloKbVTG6lZO1Cc7hGnqazCWGj5sr5S9N+IDtN2yeuDw
 Fe37aEp/eI7tj2Y9D0V1KWINfX92a+dGGKLkBc/UForMFMOT2/fiejQLEAJTKTMv
 3+uCmclwItgzwxUCDl4IVt1eOA2AMmGP5izHTzXbX2k6ebdSzeToAFsC/KFT8kcv
 Aql+B7uDBnschsQYr5AW2e+g6vANMy7Hp3xaDWAtgp/H1TsdktyM6Q==
 =RWrm
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2022-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v5.17

First set of fixes for v5.17. This is the first pull request from the
new wireless tree and only changes to MAINTAINERS file.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-21 11:23:33 +00:00
David S. Miller
0b6d8cf2ec Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-01-20

This series contains updates to i40e driver only.

Jedrzej increases delay for EMP reset and adds checks to ensure a VF
request to change queues can be met.

Sylwester moves the placement of the Flow Director queue as to not
fragment the queue pile which would cause later re-allocation issues.

Karen prevents VF reset being invoked while another is still occurring
to avoid reading invalid data.

Joe Damato fixes some statistics fields to match the values of the
fields they are based on.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-21 10:30:30 +00:00
Jakub Kicinski
276c7635d7 Merge branch 'mptcp-a-few-fixes'
Mat Martineau says:

====================
mptcp: A few fixes

Patch 1 fixes a RCU locking issue when processing a netlink command that
updates endpoint flags in the in-kernel MPTCP path manager.

Patch 2 fixes a typo affecting available endpoint id tracking.

Patch 3 fixes IPv6 routing in the MPTCP self tests.
====================

Link: https://lore.kernel.org/r/20220121003529.54930-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-20 20:24:04 -08:00
Paolo Abeni
9846921dba selftests: mptcp: fix ipv6 routing setup
MPJ ipv6 selftests currently lack per link route to the server
net. Additionally, ipv6 subflows endpoints are created without any
interface specified. The end-result is that in ipv6 self-tests
subflows are created all on the same link, leading to expected delays
and sporadic self-tests failures.

Fix the issue by adding the missing setup bits.

Fixes: 523514ed0a ("selftests: mptcp: add ADD_ADDR IPv6 test cases")
Reported-and-tested-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-20 20:24:02 -08:00
Geliang Tang
a4c0214fbe mptcp: fix removing ids bitmap setting
In mptcp_pm_nl_rm_addr_or_subflow(), the bit of rm_list->ids[i] in the
id_avail_bitmap should be set, not rm_list->ids[1]. This patch fixed it.

Fixes: 86e39e0448 ("mptcp: keep track of local endpoint still available for each msk")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-20 20:24:01 -08:00
Paolo Abeni
8e9eacad7e mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()
The MPTCP endpoint list is under RCU protection, guarded by the
pernet spinlock. mptcp_nl_cmd_set_flags() traverses the list
without acquiring the spin-lock nor under the RCU critical section.

This change addresses the issue performing the lookup and the endpoint
update under the pernet spinlock.

Fixes: 0f9f696a50 ("mptcp: add set_flags command in PM netlink")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-20 20:24:01 -08:00
Jakub Kicinski
6f97fde869 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Incorrect helper module alias in netbios_ns, from Florian Westphal.

2) Remove unused variable in nf_tables.

3) Uninitialized last expression in nf_tables register tracking.

4) Memleak in nft_connlimit after moving stateful data out of the
   expression data area.

5) Bogus invalid stats update when NF_REPEAT is returned, from Florian.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
  netfilter: conntrack: don't increment invalid counter on NF_REPEAT
  netfilter: nft_connlimit: memleak if nf_ct_netns_get() fails
  netfilter: nf_tables: set last expression in register tracking area
  netfilter: nf_tables: remove unused variable
  netfilter: nf_conntrack_netbios_ns: fix helper module alias
====================

Link: https://lore.kernel.org/r/20220120125212.991271-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-20 20:22:31 -08:00
Eric Dumazet
aafc2e3285 ipv6: annotate accesses to fn->fn_sernum
struct fib6_node's fn_sernum field can be
read while other threads change it.

Add READ_ONCE()/WRITE_ONCE() annotations.

Do not change existing smp barriers in fib6_get_cookie_safe()
and __fib6_update_sernum_upto_root()

syzbot reported:

BUG: KCSAN: data-race in fib6_clean_node / inet6_csk_route_socket

write to 0xffff88813df62e2c of 4 bytes by task 1920 on cpu 1:
 fib6_clean_node+0xc2/0x260 net/ipv6/ip6_fib.c:2178
 fib6_walk_continue+0x38e/0x430 net/ipv6/ip6_fib.c:2112
 fib6_walk net/ipv6/ip6_fib.c:2160 [inline]
 fib6_clean_tree net/ipv6/ip6_fib.c:2240 [inline]
 __fib6_clean_all+0x1a9/0x2e0 net/ipv6/ip6_fib.c:2256
 fib6_flush_trees+0x6c/0x80 net/ipv6/ip6_fib.c:2281
 rt_genid_bump_ipv6 include/net/net_namespace.h:488 [inline]
 addrconf_dad_completed+0x57f/0x870 net/ipv6/addrconf.c:4230
 addrconf_dad_work+0x908/0x1170
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 worker_thread+0x616/0xa70 kernel/workqueue.c:2454
 kthread+0x1bf/0x1e0 kernel/kthread.c:359
 ret_from_fork+0x1f/0x30

read to 0xffff88813df62e2c of 4 bytes by task 15701 on cpu 0:
 fib6_get_cookie_safe include/net/ip6_fib.h:285 [inline]
 rt6_get_cookie include/net/ip6_fib.h:306 [inline]
 ip6_dst_store include/net/ip6_route.h:234 [inline]
 inet6_csk_route_socket+0x352/0x3c0 net/ipv6/inet6_connection_sock.c:109
 inet6_csk_xmit+0x91/0x1e0 net/ipv6/inet6_connection_sock.c:121
 __tcp_transmit_skb+0x1323/0x1840 net/ipv4/tcp_output.c:1402
 tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
 tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
 __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
 tcp_push+0x2d9/0x2f0 net/ipv4/tcp.c:725
 mptcp_push_release net/mptcp/protocol.c:1491 [inline]
 __mptcp_push_pending+0x46c/0x490 net/mptcp/protocol.c:1578
 mptcp_sendmsg+0x9ec/0xa50 net/mptcp/protocol.c:1764
 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:643
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 kernel_sendmsg+0x97/0xd0 net/socket.c:745
 sock_no_sendpage+0x84/0xb0 net/core/sock.c:3086
 inet_sendpage+0x9d/0xc0 net/ipv4/af_inet.c:834
 kernel_sendpage+0x187/0x200 net/socket.c:3492
 sock_sendpage+0x5a/0x70 net/socket.c:1007
 pipe_to_sendpage+0x128/0x160 fs/splice.c:364
 splice_from_pipe_feed fs/splice.c:418 [inline]
 __splice_from_pipe+0x207/0x500 fs/splice.c:562
 splice_from_pipe fs/splice.c:597 [inline]
 generic_splice_sendpage+0x94/0xd0 fs/splice.c:746
 do_splice_from fs/splice.c:767 [inline]
 direct_splice_actor+0x80/0xa0 fs/splice.c:936
 splice_direct_to_actor+0x345/0x650 fs/splice.c:891
 do_splice_direct+0x106/0x190 fs/splice.c:979
 do_sendfile+0x675/0xc40 fs/read_write.c:1245
 __do_sys_sendfile64 fs/read_write.c:1310 [inline]
 __se_sys_sendfile64 fs/read_write.c:1296 [inline]
 __x64_sys_sendfile64+0x102/0x140 fs/read_write.c:1296
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x0000026f -> 0x00000271

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 15701 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

The Fixes tag I chose is probably arbitrary, I do not think
we need to backport this patch to older kernels.

Fixes: c5cff8561d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220120174112.1126644-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-20 20:18:37 -08:00
Eric Dumazet
ebdc1a0309 tcp: add a missing sk_defer_free_flush() in tcp_splice_read()
Without it, splice users can hit the warning
added in commit 79074a72d3 ("net: Flush deferred skb free on socket destroy")

Fixes: f35f821935 ("tcp: defer skb freeing after socket lock is released")
Fixes: 79074a72d3 ("net: Flush deferred skb free on socket destroy")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20220120124530.925607-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-20 20:17:50 -08:00
Gal Pressman
48cec899e3 tcp: Add a stub for sk_defer_free_flush()
When compiling the kernel with CONFIG_INET disabled, the
sk_defer_free_flush() should be defined as a nop.

This resolves the following compilation error:
  ld: net/core/sock.o: in function `sk_defer_free_flush':
  ./include/net/tcp.h:1378: undefined reference to `__sk_defer_free_flush'

Fixes: 79074a72d3 ("net: Flush deferred skb free on socket destroy")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220120123440.9088-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-20 20:17:32 -08:00
Marek Behún
cbda1b1668 phylib: fix potential use-after-free
Commit bafbdd527d ("phylib: Add device reset GPIO support") added call
to phy_device_reset(phydev) after the put_device() call in phy_detach().

The comment before the put_device() call says that the phydev might go
away with put_device().

Fix potential use-after-free by calling phy_device_reset() before
put_device().

Fixes: bafbdd527d ("phylib: Add device reset GPIO support")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220119162748.32418-1-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-20 20:17:04 -08:00
Joe Damato
3b8428b845 i40e: fix unsigned stat widths
Change i40e_update_vsi_stats and struct i40e_vsi to use u64 fields to match
the width of the stats counters in struct i40e_rx_queue_stats.

Update debugfs code to use the correct format specifier for u64.

Fixes: 41c445ff0f ("i40e: main driver core")
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-20 10:25:41 -08:00
Karen Sornek
0f344c8129 i40e: Fix for failed to init adminq while VF reset
Fix for failed to init adminq: -53 while VF is resetting via MAC
address changing procedure.
Added sync module to avoid reading deadbeef value in reinit adminq
during software reset.
Without this patch it is possible to trigger VF reset procedure
during reinit adminq. This resulted in an incorrect reading of
value from the AQP registers and generated the -53 error.

Fixes: 5c3c48ac6b ("i40e: implement virtual device interface")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-20 10:25:40 -08:00
Sylwester Dziedziuch
92947844b8 i40e: Fix queues reservation for XDP
When XDP was configured on a system with large number of CPUs
and X722 NIC there was a call trace with NULL pointer dereference.

i40e 0000:87:00.0: failed to get tracking for 256 queues for VSI 0 err -12
i40e 0000:87:00.0: setup of MAIN VSI failed

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:i40e_xdp+0xea/0x1b0 [i40e]
Call Trace:
? i40e_reconfig_rss_queues+0x130/0x130 [i40e]
dev_xdp_install+0x61/0xe0
dev_xdp_attach+0x18a/0x4c0
dev_change_xdp_fd+0x1e6/0x220
do_setlink+0x616/0x1030
? ahci_port_stop+0x80/0x80
? ata_qc_issue+0x107/0x1e0
? lock_timer_base+0x61/0x80
? __mod_timer+0x202/0x380
rtnl_setlink+0xe5/0x170
? bpf_lsm_binder_transaction+0x10/0x10
? security_capable+0x36/0x50
rtnetlink_rcv_msg+0x121/0x350
? rtnl_calcit.isra.0+0x100/0x100
netlink_rcv_skb+0x50/0xf0
netlink_unicast+0x1d3/0x2a0
netlink_sendmsg+0x22a/0x440
sock_sendmsg+0x5e/0x60
__sys_sendto+0xf0/0x160
? __sys_getsockname+0x7e/0xc0
? _copy_from_user+0x3c/0x80
? __sys_setsockopt+0xc8/0x1a0
__x64_sys_sendto+0x20/0x30
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f83fa7a39e0

This was caused by PF queue pile fragmentation due to
flow director VSI queue being placed right after main VSI.
Because of this main VSI was not able to resize its
queue allocation for XDP resulting in no queues allocated
for main VSI when XDP was turned on.

Fix this by always allocating last queue in PF queue pile
for a flow director VSI.

Fixes: 41c445ff0f ("i40e: main driver core")
Fixes: 74608d17fe ("i40e: add support for XDP_TX action")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-20 10:25:40 -08:00
Jedrzej Jagielski
d701658a50 i40e: Fix issue when maximum queues is exceeded
Before this patch VF interface vanished when
maximum queue number was exceeded. Driver tried
to add next queues even if there was not enough
space. PF sent incorrect number of queues to
the VF when there were not enough of them.

Add an additional condition introduced to check
available space in 'qp_pile' before proceeding.
This condition makes it impossible to add queues
if they number is greater than the number resulting
from available space.
Also add the search for free space in PF queue
pair piles.

Without this patch VF interfaces are not seen
when available space for queues has been
exceeded and following logs appears permanently
in dmesg:
"Unable to get VF config (-32)".
"VF 62 failed opcode 3, retval: -5"
"Unable to get VF config due to PF error condition, not retrying"

Fixes: 7daa6bf329 ("i40e: driver core headers")
Fixes: 41c445ff0f ("i40e: main driver core")
Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com>
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-20 10:25:40 -08:00
Jedrzej Jagielski
9b13bd5313 i40e: Increase delay to 1 s after global EMP reset
Recently simplified i40e_rebuild causes that FW sometimes
is not ready after NVM update, the ping does not return.

Increase the delay in case of EMP reset.
Old delay of 300 ms was introduced for specific cards for 710 series.
Now it works for all the cards and delay was increased.

Fixes: 1fa51a650e ("i40e: Add delay after EMP reset for firmware to recover")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-20 10:25:39 -08:00
David S. Miller
57afdc0aab Merge branch 'stmmac-fixes'
Yuji Ishikawa says:

====================
net: stmmac: dwmac-visconti: Fix bit definitions and clock configuration for RMII mode

This series is a fix for RMII/MII operation mode of the dwmac-visconti driver.
It is composed of two parts:

* 1/2: fix constant definitions for cleared bits in ETHER_CLK_SEL register
* 2/2: fix configuration of ETHER_CLK_SEL register for running in RMII operation mode.

  net: stmmac: dwmac-visconti: Fix bit definitions for ETHER_CLK_SEL
    v1 -> v2:
      - added Fixes tag to commit message

  net: stmmac: dwmac-visconti: Fix clock configuration for RMII mode
    v1 -> v2:
      - added Fixes tag to commit message
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-20 11:58:45 +00:00
Yuji Ishikawa
0959bc4bd4 net: stmmac: dwmac-visconti: Fix clock configuration for RMII mode
Bit pattern of the ETHER_CLOCK_SEL register for RMII/MII mode should be fixed.
Also, some control bits should be modified with a specific sequence.

Fixes: b38dd98ff8 ("net: stmmac: Add Toshiba Visconti SoCs glue driver")
Signed-off-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>
Reviewed-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-20 11:58:44 +00:00
Yuji Ishikawa
1ba1a4a90f net: stmmac: dwmac-visconti: Fix bit definitions for ETHER_CLK_SEL
just 0 should be used to represent cleared bits

* ETHER_CLK_SEL_DIV_SEL_20
* ETHER_CLK_SEL_TX_CLK_EXT_SEL_IN
* ETHER_CLK_SEL_RX_CLK_EXT_SEL_IN
* ETHER_CLK_SEL_TX_CLK_O_TX_I
* ETHER_CLK_SEL_RMII_CLK_SEL_IN

Fixes: b38dd98ff8 ("net: stmmac: Add Toshiba Visconti SoCs glue driver")
Signed-off-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>
Reviewed-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-20 11:58:44 +00:00
Ido Schimmel
6cee105e7f ipv6_tunnel: Rate limit warning messages
The warning messages can be invoked from the data path for every packet
transmitted through an ip6gre netdev, leading to high CPU utilization.

Fix that by rate limiting the messages.

Fixes: 09c6bbf090 ("[IPV6]: Do mandatory IPv6 tunnel endpoint checks in realtime")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Tested-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-20 11:38:17 +00:00
Moshe Tal
e2f08207c5 ethtool: Fix link extended state for big endian
The link extended sub-states are assigned as enum that is an integer
size but read from a union as u8, this is working for small values on
little endian systems but for big endian this always give 0. Fix the
variable in the union to match the enum size.

Fixes: ecc31c6024 ("ethtool: Add link extended state")
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-20 11:30:15 +00:00
Robert Hancock
d15c7e875d net: phy: broadcom: hook up soft_reset for BCM54616S
A problem was encountered with the Bel-Fuse 1GBT-SFP05 SFP module (which
is a 1 Gbps copper module operating in SGMII mode with an internal
BCM54616S PHY device) using the Xilinx AXI Ethernet MAC core, where the
module would work properly on the initial insertion or boot of the
device, but after the device was rebooted, the link would either only
come up at 100 Mbps speeds or go up and down erratically.

I found no meaningful changes in the PHY configuration registers between
the working and non-working boots, but the status registers seemed to
have a lot of error indications set on the SERDES side of the device on
the non-working boot. I suspect the problem is that whatever happens on
the SGMII link when the device is rebooted and the FPGA logic gets
reloaded ends up putting the module's onboard PHY into a bad state.

Since commit 6e2d85ec05 ("net: phy: Stop with excessive soft reset")
the genphy_soft_reset call is not made automatically by the PHY core
unless the callback is explicitly specified in the driver structure. For
most of these Broadcom devices, there is probably a hardware reset that
gets asserted to reset the PHY during boot, however for SFP modules
(where the BCM54616S is commonly found) no such reset line exists, so if
the board keeps the SFP cage powered up across a reboot, it will end up
with no reset occurring during reboots.

Hook up the genphy_soft_reset callback for BCM54616S to ensure that a
PHY reset is performed before the device is initialized. This appears to
fix the issue with erratic operation after a reboot with this SFP
module.

Fixes: 6e2d85ec05 ("net: phy: Stop with excessive soft reset")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-20 11:27:06 +00:00
Victor Nogueira
973bf8fdd1 net: sched: Clarify error message when qdisc kind is unknown
When adding a tc rule with a qdisc kind that is not supported or not
compiled into the kernel, the kernel emits the following error: "Error:
Specified qdisc not found.". Found via tdc testing when ETS qdisc was not
compiled in and it was not obvious right away what the message meant
without looking at the kernel code.

Change the error message to be more explicit and say the qdisc kind is
unknown.

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-20 11:25:57 +00:00
Congyu Liu
47934e06b6 net: fix information leakage in /proc/net/ptype
In one net namespace, after creating a packet socket without binding
it to a device, users in other net namespaces can observe the new
`packet_type` added by this packet socket by reading `/proc/net/ptype`
file. This is minor information leakage as packet socket is
namespace aware.

Add a net pointer in `packet_type` to keep the net namespace of
of corresponding packet socket. In `ptype_seq_show`, this net pointer
must be checked when it is not NULL.

Fixes: 2feb27dbe0 ("[NETNS]: Minor information leak via /proc/net/ptype file.")
Signed-off-by: Congyu Liu <liu3101@purdue.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-20 10:12:53 +00:00
Linus Torvalds
fa2e1ba3e9 Networking fixes for 5.17-rc1, including fixes from netfilter, bpf.
Current release - regressions:
 
  - fix memory leaks in the skb free deferral scheme if upper layer
    protocols are used, i.e. in-kernel TCP readers like TLS
 
 Current release - new code bugs:
 
  - nf_tables: fix NULL check typo in _clone() functions
 
  - change the default to y for Vertexcom vendor Kconfig
 
  - a couple of fixes to incorrect uses of ref tracking
 
  - two fixes for constifying netdev->dev_addr
 
 Previous releases - regressions:
 
  - bpf:
    - various verifier fixes mainly around register offset handling
      when passed to helper functions
    - fix mount source displayed for bpffs (none -> bpffs)
 
  - bonding:
    - fix extraction of ports for connection hash calculation
    - fix bond_xmit_broadcast return value when some devices are down
 
  - phy: marvell: add Marvell specific PHY loopback
 
  - sch_api: don't skip qdisc attach on ingress, prevent ref leak
 
  - htb: restore minimal packet size handling in rate control
 
  - sfp: fix high power modules without diagnostic monitoring
 
  - mscc: ocelot:
    - don't let phylink re-enable TX PAUSE on the NPI port
    - don't dereference NULL pointers with shared tc filters
 
  - smsc95xx: correct reset handling for LAN9514
 
  - cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
 
  - phy: micrel: use kszphy_suspend/_resume for irq aware devices,
    avoid races with the interrupt
 
 Previous releases - always broken:
 
  - xdp: check prog type before updating BPF link
 
  - smc: resolve various races around abnormal connection termination
 
  - sit: allow encapsulated IPv6 traffic to be delivered locally
 
  - axienet: fix init/reset handling, add missing barriers,
    read the right status words, stop queues correctly
 
  - add missing dev_put() in sock_timestamping_bind_phc()
 
 Misc:
 
  - ipv4: prevent accidentally passing RTO_ONLINK to
    ip_route_output_key_hash() by sanitizing flags
 
  - ipv4: avoid quadratic behavior in netns dismantle
 
  - stmmac: dwmac-oxnas: add support for OX810SE
 
  - fsl: xgmac_mdio: add workaround for erratum A-009885
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmHoS14ACgkQMUZtbf5S
 IrtMQA/6AxhWuj2JsoNhvTzBCi4vkeo53rKU941bxOaST9Ow8dqDc7yAT8YeJU2B
 lGw6/pXx+Fm9twGsRkkQ0vX7piIk25vKzEwnlCYVVXLAnE+lPu9qFH49X1HO5Fwy
 K+frGDC524MrbJFb+UbZfJG4UitsyHoqc58Mp7ZNBe2gn12DcHotsiSJikzdd02F
 rzQZhvwRKsDS2prcIHdvVAxva380cn99mvaFqIPR9MemhWKOzVa3NfkiC3tSlhW/
 OphG3UuOfKCVdofYAO5/oXlVQcDKx0OD9Sr2q8aO0mlME0p0ounKz+LDcwkofaYQ
 pGeMY2pEAHujLyRewunrfaPv8/SIB/ulSPcyreoF28TTN20M+4onvgTHvVSyzLl7
 MA4kYH7tkPgOfbW8T573OFPdrqsy4WTrFPFovGqvDuiE8h65Pll/gTcAqsWjF/xw
 CmfmtICcsBwVGMLUzpUjKAWuB0/voa/sQUuQoxvQFsgCteuslm1suLY5EfSIhdu8
 nvhySJjPXRHicZQNflIwKTiOYYWls7yYVGe76u9hqjyD36peJXYjUjyyENIfLiFA
 0XclGIfSBMGWMGmxvGYIZDwGOKK0j+s0PipliXVjP2otLrPYUjma5Co37KW8SiSV
 9TT673FAXJNB0IJ7xiT7nRUZ/fjRrweP1glte/6d148J1Lf9MTQ=
 =XM4Y
 -----END PGP SIGNATURE-----

Merge tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, bpf.

  Quite a handful of old regression fixes but most of those are
  pre-5.16.

  Current release - regressions:

   - fix memory leaks in the skb free deferral scheme if upper layer
     protocols are used, i.e. in-kernel TCP readers like TLS

  Current release - new code bugs:

   - nf_tables: fix NULL check typo in _clone() functions

   - change the default to y for Vertexcom vendor Kconfig

   - a couple of fixes to incorrect uses of ref tracking

   - two fixes for constifying netdev->dev_addr

  Previous releases - regressions:

   - bpf:
      - various verifier fixes mainly around register offset handling
        when passed to helper functions
      - fix mount source displayed for bpffs (none -> bpffs)

   - bonding:
      - fix extraction of ports for connection hash calculation
      - fix bond_xmit_broadcast return value when some devices are down

   - phy: marvell: add Marvell specific PHY loopback

   - sch_api: don't skip qdisc attach on ingress, prevent ref leak

   - htb: restore minimal packet size handling in rate control

   - sfp: fix high power modules without diagnostic monitoring

   - mscc: ocelot:
      - don't let phylink re-enable TX PAUSE on the NPI port
      - don't dereference NULL pointers with shared tc filters

   - smsc95xx: correct reset handling for LAN9514

   - cpsw: avoid alignment faults by taking NET_IP_ALIGN into account

   - phy: micrel: use kszphy_suspend/_resume for irq aware devices,
     avoid races with the interrupt

  Previous releases - always broken:

   - xdp: check prog type before updating BPF link

   - smc: resolve various races around abnormal connection termination

   - sit: allow encapsulated IPv6 traffic to be delivered locally

   - axienet: fix init/reset handling, add missing barriers, read the
     right status words, stop queues correctly

   - add missing dev_put() in sock_timestamping_bind_phc()

  Misc:

   - ipv4: prevent accidentally passing RTO_ONLINK to
     ip_route_output_key_hash() by sanitizing flags

   - ipv4: avoid quadratic behavior in netns dismantle

   - stmmac: dwmac-oxnas: add support for OX810SE

   - fsl: xgmac_mdio: add workaround for erratum A-009885"

* tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
  ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys
  ipv4: avoid quadratic behavior in netns dismantle
  net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module
  powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
  dt-bindings: net: Document fsl,erratum-a009885
  net/fsl: xgmac_mdio: Add workaround for erratum A-009885
  net: mscc: ocelot: fix using match before it is set
  net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices
  net: cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
  nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()
  net: axienet: increase default TX ring size to 128
  net: axienet: fix for TX busy handling
  net: axienet: fix number of TX ring slots for available check
  net: axienet: Fix TX ring slot available check
  net: axienet: limit minimum TX ring size
  net: axienet: add missing memory barriers
  net: axienet: reset core on initialization prior to MDIO access
  net: axienet: Wait for PhyRstCmplt after core reset
  net: axienet: increase reset timeout
  bpf, selftests: Add ringbuf memory type confusion test
  ...
2022-01-20 10:57:05 +02:00
Linus Torvalds
f4484d138b Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "55 patches.

  Subsystems affected by this patch series: percpu, procfs, sysctl,
  misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
  hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
  lib: remove redundant assignment to variable ret
  ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
  kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
  lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
  btrfs: use generic Kconfig option for 256kB page size limit
  arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
  configs: introduce debug.config for CI-like setup
  delayacct: track delays from memory compact
  Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
  delayacct: cleanup flags in struct task_delay_info and functions use it
  delayacct: fix incomplete disable operation when switch enable to disable
  delayacct: support swapin delay accounting for swapping without blkio
  panic: remove oops_id
  panic: use error_report_end tracepoint on warnings
  fs/adfs: remove unneeded variable make code cleaner
  FAT: use io_schedule_timeout() instead of congestion_wait()
  hfsplus: use struct_group_attr() for memcpy() region
  nilfs2: remove redundant pointer sbufs
  fs/binfmt_elf: use PT_LOAD p_align values for static PIE
  const_structs.checkpatch: add frequently used ops structs
  ...
2022-01-20 10:41:01 +02:00
Colin Ian King
b1e78ef3be lib: remove redundant assignment to variable ret
The variable ret is being assigned a value that is never read.  If the
for-loop is entered then ret is immediately re-assigned a new value.  If
the for-loop is not executed ret is never read.  The assignment is
redundant and can be removed.

Link: https://lkml.kernel.org/r/20211230134557.83633-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Kees Cook
69d0db01e2 ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
The object-size sanitizer is redundant to -Warray-bounds, and
inappropriately performs its checks at run-time when all information
needed for the evaluation is available at compile-time, making it quite
difficult to use:

  https://bugzilla.kernel.org/show_bug.cgi?id=214861

With -Warray-bounds almost enabled globally, it doesn't make sense to
keep this around.

Link: https://lkml.kernel.org/r/20211203235346.110809-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Marco Elver
bece04b5b4 kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
Until recent versions of GCC and Clang, it was not possible to disable
KCOV instrumentation via a function attribute.  The relevant function
attribute was introduced in 540540d06e ("kcov: add
__no_sanitize_coverage to fix noinstr for all architectures").

x86 was the first architecture to want a working noinstr, and at the
time no compiler support for the attribute existed yet.  Therefore,
commit 0f1441b44e ("objtool: Fix noinstr vs KCOV") introduced the
ability to NOP __sanitizer_cov_*() calls in .noinstr.text.

However, this doesn't work for other architectures like arm64 and s390
that want a working noinstr per ARCH_WANTS_NO_INSTR.

At the time of 0f1441b44e, we didn't yet have ARCH_WANTS_NO_INSTR,
but now we can move the Kconfig dependency checks to the generic KCOV
option.  KCOV will be available if:

	- architecture does not care about noinstr, OR
	- we have objtool support (like on x86), OR
	- GCC is 12.0 or newer, OR
	- Clang is 13.0 or newer.

Link: https://lkml.kernel.org/r/20211201152604.3984495-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Nathan Chancellor
bbd2e05fad lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
Commit b05fbcc36b ("btrfs: disable build on platforms having page size
256K") disabled btrfs for configurations that used a 256kB page size.
However, it did not fully solve the problem because CONFIG_TEST_KMOD
selects CONFIG_BTRFS, which does not account for the dependency.  This
results in a Kconfig warning and the failed BUILD_BUG_ON error
returning.

  WARNING: unmet direct dependencies detected for BTRFS_FS
    Depends on [n]: BLOCK [=y] && !PPC_256K_PAGES && !PAGE_SIZE_256KB [=y]
    Selected by [m]:
    - TEST_KMOD [=m] && RUNTIME_TESTING_MENU [=y] && m && MODULES [=y] && NETDEVICES [=y] && NET_CORE [=y] && INET [=y] && BLOCK [=y]

To resolve this, add CONFIG_PAGE_SIZE_LESS_THAN_256KB as a dependency of
CONFIG_TEST_KMOD so there is no more invalid configuration or build
errors.

Link: https://lkml.kernel.org/r/20211129230141.228085-4-nathan@kernel.org
Fixes: b05fbcc36b ("btrfs: disable build on platforms having page size 256K")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Nathan Chancellor
e900909599 btrfs: use generic Kconfig option for 256kB page size limit
Use the newly introduced CONFIG_PAGE_SIZE_LESS_THAN_256KB to describe
the dependency introduced by commit b05fbcc36b ("btrfs: disable build
on platforms having page size 256K").

Link: https://lkml.kernel.org/r/20211129230141.228085-3-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Nathan Chancellor
e4bbd20d8c arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
Patch series "Fix CONFIG_TEST_KMOD with 256kB page size".

The kernel test robot reported a build error [1] from a failed assertion
in fs/btrfs/inode.c with a hexagon randconfig that includes
CONFIG_PAGE_SIZE_256KB.  This error is the same one that was addressed
by commit b05fbcc36b ("btrfs: disable build on platforms having page
size 256K") but CONFIG_TEST_KMOD selects CONFIG_BTRFS without having the
"page size less than 256kB dependency", which results in the error
reappearing.

The first patch introduces CONFIG_PAGE_SIZE_LESS_THAN_256KB by splitting
it off from CONFIG_PAGE_SIZE_LESS_THAN_64KB, which was introduced in
commit 1f0e290cc5 ("arch: Add generic Kconfig option indicating page
size smaller than 64k") for a similar reason in 5.16-rc3.

The second patch uses that configuration option for CONFIG_BTRFS to
reduce duplication.

The third patch resolves the build error by adding
CONFIG_PAGE_SIZE_LESS_THAN_256KB as a dependency to CONFIG_TEST_KMOD so
that CONFIG_BTRFS does not get enabled under that invalid configuration.

[1]: https://lore.kernel.org/r/202111270255.UYOoN5VN-lkp@intel.com/

This patch (of 3):

btrfs requires a page size smaller than 256kB.  To use that dependency
in other places, introduce CONFIG_PAGE_SIZE_LESS_THAN_256KB and reuse
that dependency in CONFIG_PAGE_SIZE_LESS_THAN_64KB.

Link: https://lkml.kernel.org/r/20211129230141.228085-1-nathan@kernel.org
Link: https://lkml.kernel.org/r/20211129230141.228085-2-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Qian Cai
0aaa8977ac configs: introduce debug.config for CI-like setup
Some general debugging features like kmemleak, KASAN, lockdep, UBSAN etc
help fix many viruses like a microscope.  On the other hand, those
features are scatter around and mixed up with more situational debugging
options making them difficult to consume properly.  This cold help
amplify the general debugging/testing efforts and help establish
sensitive default values for those options across the broad.  This could
also help different distros to collaborate on maintaining debug-flavored
kernels.

The config is based on years' experiences running daily CI inside the
largest enterprise Linux distro company to seek regressions on
linux-next builds on different bare-metal and virtual platforms.  It can
be used for example,

  $ make ARCH=arm64 defconfig debug.config

Since KASAN and KCSAN can't be enabled together, we will need to create
a separate one for KCSAN later as well.

Link: https://lkml.kernel.org/r/20211115134754.7334-1-quic_qiancai@quicinc.com
Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: "Stephen Rothwell" <sfr@canb.auug.org.au>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
wangyong
5bf1828153 delayacct: track delays from memory compact
Delay accounting does not track the delay of memory compact.  When there
is not enough free memory, tasks can spend a amount of their time
waiting for compact.

To get the impact of tasks in direct memory compact, measure the delay
when allocating memory through memory compact.

Also update tools/accounting/getdelays.c:

    / # ./getdelays_next  -di -p 304
    print delayacct stats ON
    printing IO accounting
    PID     304

    CPU             count     real total  virtual total    delay total  delay average
                      277      780000000      849039485       18877296          0.068ms
    IO              count    delay total  delay average
                        0              0              0ms
    SWAP            count    delay total  delay average
                        0              0              0ms
    RECLAIM         count    delay total  delay average
                        5    11088812685           2217ms
    THRASHING       count    delay total  delay average
                        0              0              0ms
    COMPACT         count    delay total  delay average
                        3          72758              0ms
    watch: read=0, write=0, cancelled_write=0

Link: https://lkml.kernel.org/r/1638619795-71451-1-git-send-email-wang.yong12@zte.com.cn
Signed-off-by: wangyong <wang.yong12@zte.com.cn>
Reviewed-by: Jiang Xuexin <jiang.xuexin@zte.com.cn>
Reviewed-by: Zhang Wenya <zhang.wenya1@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
wangyong
ec710aa8b2 Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
Add thrashing page cache and direct compact related descriptions and
update the usage of getdelays userspace utility.

The following patches modifications have been updated:
https://lore.kernel.org/all/20190312102002.31737-4-jinpuwang@gmail.com/
https://lore.kernel.org/all/1638619795-71451-1-git-send-email-
wang.yong12@zte.com.cn/

Link: https://lkml.kernel.org/r/1639583021-92977-1-git-send-email-wang.yong12@zte.com.cn
Signed-off-by: wangyong <wang.yong12@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Yang Yang
1193829da1 delayacct: cleanup flags in struct task_delay_info and functions use it
Flags in struct task_delay_info is used to distinguish the difference
between swapin and blkio delay acountings.  But after patch "delayacct:
support swapin delay accounting for swapping without blkio", there is no
need to do that since swapin and blkio delay accounting use their own
functions.

Link: https://lkml.kernel.org/r/20211124065958.36703-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Yang Yang
82065b7266 delayacct: fix incomplete disable operation when switch enable to disable
When a task is created after delayacct is enabled, kernel will do all
the delay accountings for that task.  The problems is if user disables
delayacct by set /proc/sys/kernel/task_delayacct to zero, only blkio
delay accounting is disabled.

Now disable all the kinds of delay accountings when
/proc/sys/kernel/task_delayacct sets to zero.

Link: https://lkml.kernel.org/r/20211123140342.32962-1-ran.xiaokai@zte.com.cn
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Yang Yang
a3d5dc908a delayacct: support swapin delay accounting for swapping without blkio
Currently delayacct accounts swapin delay only for swapping that cause
blkio.  If we use zram for swapping, tools/accounting/getdelays can't
get any SWAP delay.

It's useful to get zram swapin delay information, for example to adjust
compress algorithm or /proc/sys/vm/swappiness.

Reference to PSI, it accounts any kind of swapping by doing its work in
swap_readpage(), no matter whether swapping causes blkio.  Let delayacct
do the similar work.

Link: https://lkml.kernel.org/r/20211112083813.8559-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Sebastian Andrzej Siewior
e83a4472bf panic: remove oops_id
The oops id has been added as part of the end of trace marker for the
kerneloops.org project.  The id is used to automatically identify
duplicate submissions of the same report.  Identical looking reports
with different a id can be considered as the same oops occurred again.

The early initialisation of the oops_id can create a warning if the
random core is not yet fully initialized.  On PREEMPT_RT it is
problematic if the id is initialized on demand from non preemptible
context.

The kernel oops project is not available since 2017.  Remove the oops_id
and use 0 in the output in case parser rely on it.

Link: https://bugs.debian.org/953172
Link: https://lkml.kernel.org/r/Ybdi16aP2NEugWHq@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Marco Elver
23b36fec7e panic: use error_report_end tracepoint on warnings
Introduce the error detector "warning" to the error_report event and use
the error_report_end tracepoint at the end of a warning report.

This allows in-kernel tests but also userspace to more easily determine
if a warning occurred without polling kernel logs.

[akpm@linux-foundation.org: add comma to enum list, per Andy]

Link: https://lkml.kernel.org/r/20211115085630.1756817-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00