Commit Graph

996504 Commits

Author SHA1 Message Date
Linus Torvalds
9b1ea29bc0 Revert "mm, slub: consider rest of partial list if acquire_slab() fails"
This reverts commit 8ff60eb052.

The kernel test robot reports a huge performance regression due to the
commit, and the reason seems fairly straightforward: when there is
contention on the page list (which is what causes acquire_slab() to
fail), we do _not_ want to just loop and try again, because that will
transfer the contention to the 'n->list_lock' spinlock we hold, and
just make things even worse.

This is admittedly likely a problem only on big machines - the kernel
test robot report comes from a 96-thread dual socket Intel Xeon Gold
6252 setup, but the regression there really is quite noticeable:

   -47.9% regression of stress-ng.rawpkt.ops_per_sec

and the commit that was marked as being fixed (7ced371971: "slub:
Acquire_slab() avoid loop") actually did the loop exit early very
intentionally (the hint being that "avoid loop" part of that commit
message), exactly to avoid this issue.

The correct thing to do may be to pick some kind of reasonable middle
ground: instead of breaking out of the loop on the very first sign of
contention, or trying over and over and over again, the right thing may
be to re-try _once_, and then give up on the second failure (or pick
your favorite value for "once"..).

Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/lkml/20210301080404.GF12822@xsang-OptiPlex-9020/
Cc: Jann Horn <jannh@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-10 10:18:04 -08:00
Linus Torvalds
d3110f256d for-linus-2021-03-10
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYEiQUgAKCRCRxhvAZXjc
 omY/AQDEn45Gx6YxDOiWXxyreS3JYx8jVNyy85uDDFRwR0qK+QEAgsmdOxOJoFfe
 zzNA8dKmXx2t+upuK8htmqtQTrGV/wg=
 =9MKn
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2021-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull detached mounts fix from Christian Brauner:
 "Creating a series of detached mounts, attaching them to the
  filesystem, and unmounting them can be used to trigger an integer
  overflow in ns->mounts causing the kernel to block any new mounts in
  count_mounts() and returning ENOSPC because it falsely assumes that
  the maximum number of mounts in the mount namespace has been reached,
  i.e. it thinks it can't fit the new mounts into the mount namespace
  anymore.

  Without this fix heavy use of the new mount API with move_mount() will
  cause the host to become unuseable and thus blocks some xfstest
  patches I want to resend.

  Depending on the number of mounts in your system, this can be
  reproduced on any kernel that supportes open_tree() and move_mount().

  A reproducer has been sent for inclusion with xfstests. It takes care
  to do this in another mount namespace, not in the host's mount
  namespace so there shouldn't be any risk in running it but if one did
  run it on the host it would require a reboot in order to be able to
  mount again. See

      https://lore.kernel.org/fstests/20210309121041.753359-1-christian.brauner@ubuntu.com

  The root cause of this is that detached mounts aren't handled
  correctly when source and target mount are identical and reside on a
  shared mount causing a broken mount tree where the detached source
  itself is propagated which propagation prevents for regular
  bind-mounts and new mounts.

  This ultimately leads to a miscalculation of the number of mounts in
  the mount namespace.

  Detached mounts created via 'open_tree(fd, path, OPEN_TREE_CLONE)' are
  essentially like an unattached bind-mount. They can then later on be
  attached to the filesystem via move_mount() which calls into
  attach_recursive_mount().

  Part of attaching it to the filesystem is making sure that mounts get
  correctly propagated in case the destination mountpoint is MS_SHARED,
  i.e. is a shared mountpoint. This is done by calling into
  propagate_mnt() which walks the list of peers calling propagate_one()
  on each mount in this list making sure it receives the propagation
  event. The propagate_one() function thereby skips both new mounts and
  bind mounts to not propagate them "into themselves". Both are
  identified by checking whether the mount is already attached to any
  mount namespace in mnt->mnt_ns. The is what the IS_MNT_NEW() helper is
  responsible for.

  However, detached mounts have an anonymous mount namespace attached to
  them stashed in mnt->mnt_ns which means that IS_MNT_NEW() doesn't
  realize they need to be skipped causing the mount to propagate "into
  itself" breaking the mount table and causing a disconnect between the
  number of mounts recorded as being beneath or reachable from the
  target mountpoint and the number of mounts actually recorded/counted
  in ns->mounts ultimately causing an overflow which in turn prevents
  any new mounts via the ENOSPC issue.

  So teach propagation to handle detached mounts by making it aware of
  them. I've been tracking this issue down for the last couple of days
  and then verifying that the fix is correct by unmounting everything in
  my current mount table leaving only /proc and /sys mounted and running
  the reproducer above overnight verifying the number of mounts counted
  in ns->mounts. With this fix the counts are correct and the ENOSPC
  issue can't be reproduced.

  This change will only have an effect on mounts created with the new
  mount API since detached mounts cannot be created with the old mount
  API so regressions are extremely unlikely.

  Here's an illustration:

    #### mount():
    ubuntu@f1-vm:~$ sudo mount --bind /mnt/ /mnt/
    ubuntu@f1-vm:~$ findmnt  | grep -i mnt
    ├─/mnt                                /dev/sda2[/mnt] ext4       rw,relatime

    #### open_tree(OPEN_TREE_CLONE) + move_mount() with bug:
    ubuntu@f1-vm:~$ sudo ./mount-new /mnt/ /mnt/
    ubuntu@f1-vm:~$ findmnt  | grep -i mnt
    ├─/mnt                                /dev/sda2[/mnt] ext4       rw,relatime
    │ └─/mnt                              /dev/sda2[/mnt] ext4       rw,relatime

    #### open_tree(OPEN_TREE_CLONE) + move_mount() with the fix:
    ubuntu@f1-vm:~$ sudo ./mount-new /mnt /mnt
    ubuntu@f1-vm:~$ findmnt | grep -i mnt
    └─/mnt                                /dev/sda2[/mnt] ext4       rw,relatime"

* tag 'for-linus-2021-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  mount: fix mounting of detached mounts onto targets that reside on shared mounts
2021-03-10 10:01:35 -08:00
Linus Torvalds
d0df9aabef 6 cifs/smb3 fixes, 3 for stable, including some important mulitchannel crediting fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmBIMv0ACgkQiiy9cAdy
 T1HNFwwAsXZ2JUK6TO2SGbNiW5uBDcIbRjfSH7rgmVQ/q1Csnszrpyz8iLEToUGP
 GH3g04n7GeM5zDB267QHifX44aA2Yy3GbE/goJUhdacAsmkMMDcHmHoJL6OoDvK4
 PUlfLaWHOK1E3MWxpf0idftfNKlcequjxLOpcLTQQg83TC6peYvkMncbjCQVgldw
 4npV40b3qXVZ0TagglR4POQ4gCJt6GXrLwEPIoUbzAPuTZtJfa4LEPa5cJ0gYbSI
 SJ6Ri8KcZu//O/FAEUTJOg3ggkmVy13H7dSf29rRsjVcLYI3lQ8zcMHlCBKw00YF
 Jc3m95Xyn0xsjELKAg5kHd/1yz4vFvitYn9NyVhYUAWtCbMs/2MEYSQJnuLxF1Vm
 RpLRLDQFpewkoEGFNMTfZyNY3A6hq1gQJYcqmmgVru+hAcozvs3+YHhNKWwEMWAP
 weTSpoa/ERUfAn5XcNP5gELcEQALLOHsACJtJ+00T+A+DD7XgWjTX1ynMDa58MfG
 UySTW3iY
 =zEnL
 -----END PGP SIGNATURE-----

Merge tag '5.12-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Six cifs/smb3 fixes, three of them for stable, including some
  important mulitchannel crediting fixes, and a fix for statfs error
  handling"

* tag '5.12-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: do not send close in compound create+close requests
  cifs: return proper error code in statfs(2)
  cifs: change noisy error message to FYI
  cifs: print MIDs in decimal notation
  cifs: ask for more credit on async read/write code paths
  cifs: fix credit accounting for extra channel
2021-03-10 09:55:06 -08:00
Juergen Gross
9e77d96b8e xen/events: reset affinity of 2-level event when tearing it down
When creating a new event channel with 2-level events the affinity
needs to be reset initially in order to avoid using an old affinity
from earlier usage of the event channel port. So when tearing an event
channel down reset all affinity bits.

The same applies to the affinity when onlining a vcpu: all old
affinity settings for this vcpu must be reset. As percpu events get
initialized before the percpu event channel hook is called,
resetting of the affinities happens after offlining a vcpu (this is
working, as initial percpu memory is zeroed out).

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Link: https://lore.kernel.org/r/20210306161833.4552-2-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-10 07:25:18 -06:00
Linus Torvalds
05a59d7979 Merge git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau.

 2) TX skb error handling fix in mt76 driver, also from Felix.

 3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman.

 4) Avoid double free of percpu pointers when freeing a cloned bpf prog.
    From Cong Wang.

 5) Use correct printf format for dma_addr_t in ath11k, from Geert
    Uytterhoeven.

 6) Fix resolve_btfids build with older toolchains, from Kun-Chuan
    Hsieh.

 7) Don't report truncated frames to mac80211 in mt76 driver, from
    Lorenzop Bianconi.

 8) Fix watcdog timeout on suspend/resume of stmmac, from Joakim Zhang.

 9) mscc ocelot needs NET_DEVLINK selct in Kconfig, from Arnd Bergmann.

10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from
    Arjun Roy.

11) Ignore routes with deleted nexthop object in mlxsw, from Ido
    Schimmel.

12) Need to undo tcp early demux lookup sometimes in nf_nat, from
    Florian Westphal.

13) Fix gro aggregation for udp encaps with zero csum, from Daniel
    Borkmann.

14) Make sure to always use imp*_ndo_send when necessaey, from Jason A.
    Donenfeld.

15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov.

16) prevent overly huge skb allocationsd in qrtr, from Pavel Skripkin.

17) Prevent rx ring copnsumer index loss of sync in enetc, from Vladimir
    Oltean.

18) Make sure textsearch copntrol block is large enough, from Wilem de
    Bruijn.

19) Revert MAC changes to r8152 leading to instability, from Hates Wang.

20) Advance iov in 9p even for empty reads, from Jissheng Zhang.

21) Double hook unregister in nftables, from PabloNeira Ayuso.

22) Fix memleak in ixgbe, fropm Dinghao Liu.

23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne.

24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang
    Tang.

25) Fix DOI refcount bugs in cipso, from Paul Moore.

26) One too many irqsave in ibmvnic, from Junlin Yang.

27) Fix infinite loop with MPLS gso segmenting via virtio_net, from
    Balazs Nemeth.

* git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits)
  s390/qeth: fix notification for pending buffers during teardown
  s390/qeth: schedule TX NAPI on QAOB completion
  s390/qeth: improve completion of pending TX buffers
  s390/qeth: fix memory leak after failed TX Buffer allocation
  net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
  net: check if protocol extracted by virtio_net_hdr_set_proto is correct
  net: dsa: xrs700x: check if partner is same as port in hsr join
  net: lapbether: Remove netif_start_queue / netif_stop_queue
  atm: idt77252: fix null-ptr-dereference
  atm: uPD98402: fix incorrect allocation
  atm: fix a typo in the struct description
  net: qrtr: fix error return code of qrtr_sendmsg()
  mptcp: fix length of ADD_ADDR with port sub-option
  net: bonding: fix error return code of bond_neigh_init()
  net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
  net: enetc: set MAC RX FIFO to recommended value
  net: davicom: Use platform_get_irq_optional()
  net: davicom: Fix regulator not turned off on driver removal
  net: davicom: Fix regulator not turned off on failed probe
  net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
  ...
2021-03-09 17:15:56 -08:00
Linus Torvalds
6a30bedfdf Merge git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
 "Fix opcode filtering for exceptions, and clean up defconfig"

* git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc:
  sparc: sparc64_defconfig: remove duplicate CONFIGs
  sparc64: Fix opcode filtering in handling of no fault loads
2021-03-09 17:08:41 -08:00
Corentin Labbe
69264b4a43 sparc: sparc64_defconfig: remove duplicate CONFIGs
After my patch there is CONFIG_ATA defined twice.
Remove the duplicate one.
Same problem for CONFIG_HAPPYMEAL, except I added as builtin for boot
test with NFS.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: a57cdeb369 ("sparc: sparc64_defconfig: add necessary configs for qemu")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:22:40 -08:00
Rob Gardner
e5e8b80d35 sparc64: Fix opcode filtering in handling of no fault loads
is_no_fault_exception() has two bugs which were discovered via random
opcode testing with stress-ng. Both are caused by improper filtering
of opcodes.

The first bug can be triggered by a floating point store with a no-fault
ASI, for instance "sta %f0, [%g0] #ASI_PNF", opcode C1A01040.

The code first tests op3[5] (0x1000000), which denotes a floating
point instruction, and then tests op3[2] (0x200000), which denotes a
store instruction. But these bits are not mutually exclusive, and the
above mentioned opcode has both bits set. The intent is to filter out
stores, so the test for stores must be done first in order to have
any effect.

The second bug can be triggered by a floating point load with one of
the invalid ASI values 0x8e or 0x8f, which pass this check in
is_no_fault_exception():
     if ((asi & 0xf2) == ASI_PNF)

An example instruction is "ldqa [%l7 + %o7] #ASI 0x8f, %f38",
opcode CF95D1EF. Asi values greater than 0x8b (ASI_SNFL) are fatal
in handle_ldf_stq(), and is_no_fault_exception() must not allow these
invalid asi values to make it that far.

In both of these cases, handle_ldf_stq() reacts by calling
sun4v_data_access_exception() or spitfire_data_access_exception(),
which call is_no_fault_exception() and results in an infinite
recursion.

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:21:10 -08:00
David S. Miller
8515455720 Merge branch 's390-qeth-fixes'
Julian Wiedmann says:

====================
s390/qeth: fixes 2021-03-09

please apply the following patch series to netdev's net tree.

This brings one fix for a memleak in an error path of the setup code.
Also several fixes for dealing with pending TX buffers - two for old
bugs in their completion handling, and one recent regression in a
teardown path.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:14:54 -08:00
Julian Wiedmann
7eefda7f35 s390/qeth: fix notification for pending buffers during teardown
The cited commit reworked the state machine for pending TX buffers.
In qeth_iqd_tx_complete() it turned PENDING into a transient state, and
uses NEED_QAOB for buffers that get parked while waiting for their QAOB
completion.

But it missed to adjust the check in qeth_tx_complete_buf(). So if
qeth_tx_complete_pending_bufs() is called during teardown to drain
the parked TX buffers, we no longer raise a notification for af_iucv.

Instead of updating the checked state, just move this code into
qeth_tx_complete_pending_bufs() itself. This also gets rid of the
special-case in the common TX completion path.

Fixes: 8908f36d20 ("s390/qeth: fix af_iucv notification race")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:14:54 -08:00
Julian Wiedmann
3e83d467a0 s390/qeth: schedule TX NAPI on QAOB completion
When a QAOB notifies us that a pending TX buffer has been delivered, the
actual TX completion processing by qeth_tx_complete_pending_bufs()
is done within the context of a TX NAPI instance. We shouldn't rely on
this instance being scheduled by some other TX event, but just do it
ourselves.

qeth_qdio_handle_aob() is called from qeth_poll(), ie. our main NAPI
instance. To avoid touching the TX queue's NAPI instance
before/after it is (un-)registered, reorder the code in qeth_open()
and qeth_stop() accordingly.

Fixes: 0da9581ddb ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:14:54 -08:00
Julian Wiedmann
c20383ad16 s390/qeth: improve completion of pending TX buffers
The current design attaches a pending TX buffer to a custom
single-linked list, which is anchored at the buffer's slot on the
TX ring. The buffer is then checked for final completion whenever
this slot is processed during a subsequent TX NAPI poll cycle.

But if there's insufficient traffic on the ring, we might never make
enough progress to get back to this ring slot and discover the pending
buffer's final TX completion. In particular if this missing TX
completion blocks the application from sending further traffic.

So convert the custom single-linked list code to a per-queue list_head,
and scan this list on every TX NAPI cycle.

Fixes: 0da9581ddb ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:14:54 -08:00
Julian Wiedmann
e7a36d27f6 s390/qeth: fix memory leak after failed TX Buffer allocation
When qeth_alloc_qdio_queues() fails to allocate one of the buffers that
back an Output Queue, the 'out_freeoutqbufs' path will free all
previously allocated buffers for this queue. But it misses to free the
half-finished queue struct itself.

Move the buffer allocation into qeth_alloc_output_queue(), and deal with
such errors internally.

Fixes: 0da9581ddb ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:14:53 -08:00
David S. Miller
b005c9ef5a Merge branch 'virtio_net-infinite-loop'
Balazs Nemeth says:

====================
net: prevent infinite loop caused by incorrect proto from virtio_net_hdr_set_proto

These patches prevent an infinite loop for gso packets with a protocol
from virtio net hdr that doesn't match the protocol in the packet.
Note that packets coming from a device without
header_ops->parse_protocol being implemented will not be caught by
the check in virtio_net_hdr_to_skb, but the infinite loop will still
be prevented by the check in the gso layer.

Changes from v2 to v3:
  - Remove unused *eth.
  - Use MPLS_HLEN to also check if the MPLS header length is a multiple
    of four.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:12:20 -08:00
Balazs Nemeth
d348ede32e net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
A packet with skb_inner_network_header(skb) == skb_network_header(skb)
and ETH_P_MPLS_UC will prevent mpls_gso_segment from pulling any headers
from the packet. Subsequently, the call to skb_mac_gso_segment will
again call mpls_gso_segment with the same packet leading to an infinite
loop. In addition, ensure that the header length is a multiple of four,
which should hold irrespective of the number of stacked labels.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:12:20 -08:00
Balazs Nemeth
924a9bc362 net: check if protocol extracted by virtio_net_hdr_set_proto is correct
For gso packets, virtio_net_hdr_set_proto sets the protocol (if it isn't
set) based on the type in the virtio net hdr, but the skb could contain
anything since it could come from packet_snd through a raw socket. If
there is a mismatch between what virtio_net_hdr_set_proto sets and
the actual protocol, then the skb could be handled incorrectly later
on.

An example where this poses an issue is with the subsequent call to
skb_flow_dissect_flow_keys_basic which relies on skb->protocol being set
correctly. A specially crafted packet could fool
skb_flow_dissect_flow_keys_basic preventing EINVAL to be returned.

Avoid blindly trusting the information provided by the virtio net header
by checking that the protocol in the packet actually matches the
protocol set by virtio_net_hdr_set_proto. Note that since the protocol
is only checked if skb->dev implements header_ops->parse_protocol,
packets from devices without the implementation are not checked at this
stage.

Fixes: 9274124f02 ("net: stricter validation of untrusted gso packets")
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:12:20 -08:00
George McCollister
286a8624d7 net: dsa: xrs700x: check if partner is same as port in hsr join
Don't assign dp to partner if it's the same port that xrs700x_hsr_join
was called with. The partner port is supposed to be the other port in
the HSR/PRP redundant pair not the same port. This fixes an issue
observed in testing where forwarding between redundant HSR ports on this
switch didn't work depending on the order the ports were added to the
hsr device.

Fixes: bd62e6f5e6 ("net: dsa: xrs700x: add HSR offloading support")
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 16:03:37 -08:00
Linus Torvalds
4b3d9f9cf1 gpio fixes for v5.12-rc3
- fix two regressions in core GPIO subsystem code: one NULL-pointer dereference
   and one list corruption
 - read GPIO line names from fwnode instead of using the generic device
   properties to fix a regression on stm32mp151
 - fixes to ACPI GPIO and gpio-pca953x to handle a regression in IRQ handling
   on Intel Galileo
 - update .gitignore in GPIO selftests
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmBHk9AACgkQEacuoBRx
 13Kd8RAAzU2lPguxO0DjbLfOfYObqV6fODcPiMSk58LCsojz2AVrS/Kba6oSIsur
 OGdUQf4wNNMZsr81+DmBbk/4efd8lF1zqkIa6+serBjK+QONGd7Am0Oc3QiXgk6l
 ZnD51FUAHHdsiHVHmlnCYRg2w93kpu1PHDQCOl4/96xSr1BrRAhC8uV2ImVm+0XS
 z5ddHEe9qITsVhlD2MEpKm3wZEhhJHGtzxD7mjO3iAXzSQcCwpjD37Y5DDkzrgal
 gkLEYCXWeaexOHsLxPmfyDO0GMNTWL8ai8GpRMJCJwGTRataf0hVZZ7TVSGhE8+e
 Ms3hOxKj6AJKbfQVsSy+mJyHTF+ef4UqUp4+CqGdLoz6S5+JkQ9TxPdYNzaNCPeD
 Vk4htVUEYx0M8xHJhMA5siXLWTfDGRGU5ILNcwCKHA3iAMtQw/H2+SKzNgQARGgo
 D1LdFVXzfW278zWv1OGybLh2al8CW/Mi2ybg/kmLsXLJFCpj5kzy2/H2BznBJmPT
 HjU61iiaVlnt7kH3aFd92D+Sr33f7r1kxjonq8aNK3mhMoqOQUxQRtduZp9goJZN
 Kt0uHBMfz/YiyoGlM/Y3SA9C3b7O0ClYA/0DyJ6Qt/6NTDRMMfwAXEjd1n7dYLH1
 ZSOKe1S82GbsVdDQP0qSzzLoziX4U/ttbHxN+RCv4hs67+U5Uw4=
 =nVo+
 -----END PGP SIGNATURE-----

Merge tag 'gpio-fixes-for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
 "A bunch of fixes for the GPIO subsystem. We have two regressions in
  the core code spotted right after the merge window, a series of fixes
  for ACPI GPIO and a subsequent fix for a related regression in
  gpio-pca953x + a minor tweak in .gitignore and a rework of handling of
  the gpio-line-names to remedy a regression in stm32mp151.

  Summary:

   - fix two regressions in core GPIO subsystem code: one NULL-pointer
     dereference and one list corruption

   - read GPIO line names from fwnode instead of using the generic
     device properties to fix a regression on stm32mp151

   - fixes to ACPI GPIO and gpio-pca953x to handle a regression in IRQ
     handling on Intel Galileo

   - update .gitignore in GPIO selftests"

* tag 'gpio-fixes-for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: Read "gpio-line-names" from a firmware node
  gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2
  gpiolib: acpi: Allow to find GpioInt() resource by name and index
  gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk
  gpiolib: acpi: Add missing IRQF_ONESHOT
  gpio: fix gpio-device list corruption
  gpio: fix NULL-deref-on-deregistration regression
  selftests: gpio: update .gitignore
2021-03-09 12:02:18 -08:00
Linus Torvalds
9c39198a65 - fixes for boot breakage because of misaligned FDTs
- fix for overwritten exception handlers
 - enable MIPS optimized crypto for all MIPS CPUs to improve wireguard
   performance
 -----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmBHkMwaHHRzYm9nZW5k
 QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHBxRg/+JKKRLn2GyQkqLVkBKCRk
 mKQkqyRAP3HdwBN1EsC2K8PWhoffc577byyO+R+nXDR9BUSrK1geTrwdtOAH/ZMa
 sX/YqvJdyzS5HqAIkSy1KyzoYOWFB/Xe0VsLn85Oz5QMR+OWzcdG06LcKq+v4fro
 za47T6hufjN179EhYOP+xldMwkhfK/fMw4HKFoOY9swaGhCVHx9PSoCb8dyd9vhQ
 X3il5l2BlihcJKAitLErUrxciu6eLiUEB3ODbj6HQM3yjJeRiwK20PJfsfy1QIWn
 44dNz5cI1XRlWk4HdNGfZ5/8VV0gMVv3UKK1SPyiBk1o+CRlh/qtxtvjHejm6c2+
 56iQsYk/XYjFSMZf3WgLzZJoxWGll+ParIFxEJ4SfxFMFDe/KtcvdMhqwy2zJJKq
 cZc17sT3YHIHdelDYpYt0T7TZxFxnj18BWwOWVsvNEMKCGxOUetH2MrOlsi5aIHh
 mSYtsQ6V6FXZKlRgHAVnPL6gxPtlhIjbru7Zv9eW8wRUrB/pQWy6OGoWTR1B0sKr
 TdbmfEbnIKpSEE6WSugGpnflNsIWJYrZXePTBRCibipQZVCmcyhiphSQb0UiNKOO
 uzWnLYGToL+gpRjC/GNEcjycQvT4rUzHUCpnH3s99VfabkQzcgYT+9m7ORsbdKm5
 r9mTuB7X7WYiMcOXTtPBX3A=
 =xWSR
 -----END PGP SIGNATURE-----

Merge tag 'mips-fixes_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

 - fixes for boot breakage because of misaligned FDTs

 - fix for overwritten exception handlers

 - enable MIPS optimized crypto for all MIPS CPUs to improve wireguard
   performance

* tag 'mips-fixes_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: kernel: Reserve exception base early to prevent corruption
  MIPS: vmlinux.lds.S: align raw appended dtb to 8 bytes
  crypto: mips/poly1305 - enable for all MIPS processors
  MIPS: boot/compressed: Copy DTB to aligned address
2021-03-09 11:56:41 -08:00
Xie He
f7d9d48545 net: lapbether: Remove netif_start_queue / netif_stop_queue
For the devices in this driver, the default qdisc is "noqueue",
because their "tx_queue_len" is 0.

In function "__dev_queue_xmit" in "net/core/dev.c", devices with the
"noqueue" qdisc are specially handled. Packets are transmitted without
being queued after a "dev->flags & IFF_UP" check. However, it's possible
that even if this check succeeds, "ops->ndo_stop" may still have already
been called. This is because in "__dev_close_many", "ops->ndo_stop" is
called before clearing the "IFF_UP" flag.

If we call "netif_stop_queue" in "ops->ndo_stop", then it's possible in
"__dev_queue_xmit", it sees the "IFF_UP" flag is present, and then it
checks "netif_xmit_stopped" and finds that the queue is already stopped.
In this case, it will complain that:
"Virtual device ... asks to queue packet!"

To prevent "__dev_queue_xmit" from generating this complaint, we should
not call "netif_stop_queue" in "ops->ndo_stop".

We also don't need to call "netif_start_queue" in "ops->ndo_open",
because after a netdev is allocated and registered, the
"__QUEUE_STATE_DRV_XOFF" flag is initially not set, so there is no need
to call "netif_start_queue" to clear it.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 11:03:09 -08:00
Thomas Bogendoerfer
bd67b711bf MIPS: kernel: Reserve exception base early to prevent corruption
BMIPS is one of the few platforms that do change the exception base.
After commit 2dcb396454 ("memblock: do not start bottom-up allocations
with kernel_end") we started seeing BMIPS boards fail to boot with the
built-in FDT being corrupted.

Before the cited commit, early allocations would be in the [kernel_end,
RAM_END] range, but after commit they would be within [RAM_START +
PAGE_SIZE, RAM_END].

The custom exception base handler that is installed by
bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
memory region allocated by unflatten_and_copy_device_tree() thus
corrupting the FDT used by the kernel.

To fix this, we need to perform an early reservation of the custom
exception space. Additional we reserve the first 4k (1k for R3k) for
either normal exception vector space (legacy CPUs) or special vectors
like cache exceptions.

Huge thanks to Serge for analysing and proposing a solution to this
issue.

Fixes: 2dcb396454 ("memblock: do not start bottom-up allocations with kernel_end")
Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-03-09 11:22:59 +01:00
Linus Torvalds
987a08741d Merge git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc
Pull sparc updates from David Miller:
 "Just some more random bits from Al, including a conversion over to
  generic extables"

* git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc:
  sparc32: take ->thread.flags out
  sparc32: get rid of fake_swapper_regs
  sparc64: get rid of fake_swapper_regs
  sparc32: switch to generic extables
  sparc32: switch copy_user.S away from range exception table entries
  sparc32: get rid of range exception table entries in checksum_32.S
  sparc32: switch __bzero() away from range exception table entries
  sparc32: kill lookup_fault()
  sparc32: don't bother with lookup_fault() in __bzero()
2021-03-08 22:01:58 -08:00
Paulo Alcantara
04ad69c342 cifs: do not send close in compound create+close requests
In case of interrupted syscalls, prevent sending CLOSE commands for
compound CREATE+CLOSE requests by introducing an
CIFS_CP_CREATE_CLOSE_OP flag to indicate lower layers that it should
not send a CLOSE command to the MIDs corresponding the compound
CREATE+CLOSE request.

A simple reproducer:

    #!/bin/bash

    mount //server/share /mnt -o username=foo,password=***
    tc qdisc add dev eth0 root netem delay 450ms
    stat -f /mnt &>/dev/null & pid=$!
    sleep 0.01
    kill $pid
    tc qdisc del dev eth0 root
    umount /mnt

Before patch:

    ...
    6 0.256893470 192.168.122.2 → 192.168.122.15 SMB2 402 Create Request File: ;GetInfo Request FS_INFO/FileFsFullSizeInformation;Close Request
    7 0.257144491 192.168.122.15 → 192.168.122.2 SMB2 498 Create Response File: ;GetInfo Response;Close Response
    9 0.260798209 192.168.122.2 → 192.168.122.15 SMB2 146 Close Request File:
   10 0.260841089 192.168.122.15 → 192.168.122.2 SMB2 130 Close Response, Error: STATUS_FILE_CLOSED

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-08 21:23:22 -06:00
Paulo Alcantara
14302ee330 cifs: return proper error code in statfs(2)
In cifs_statfs(), if server->ops->queryfs is not NULL, then we should
use its return value rather than always returning 0.  Instead, use rc
variable as it is properly set to 0 in case there is no
server->ops->queryfs.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-08 21:23:11 -06:00
Paulo Alcantara
e3d100eae4 cifs: change noisy error message to FYI
A customer has reported that their dmesg were being flooded by

  CIFS: VFS: \\server Cancelling wait for mid xxx cmd: a
  CIFS: VFS: \\server Cancelling wait for mid yyy cmd: b
  CIFS: VFS: \\server Cancelling wait for mid zzz cmd: c

because some processes that were performing statfs(2) on the share had
been interrupted due to their automount setup when certain users
logged in and out.

Change it to FYI as they should be mostly informative rather than
error messages.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-08 21:22:52 -06:00
Paulo Alcantara
bf1bc694b6 cifs: print MIDs in decimal notation
The MIDs are mostly printed as decimal, so let's make it consistent.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-08 21:22:31 -06:00
Tong Zhang
4416e98594 atm: idt77252: fix null-ptr-dereference
this one is similar to the phy_data allocation fix in uPD98402, the
driver allocate the idt77105_priv and store to dev_data but later
dereference using dev->dev_data, which will cause null-ptr-dereference.

fix this issue by changing dev_data to phy_data so that PRIV(dev) can
work correctly.

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 15:16:30 -08:00
Tong Zhang
3153724fc0 atm: uPD98402: fix incorrect allocation
dev->dev_data is set in zatm.c, calling zatm_start() will overwrite this
dev->dev_data in uPD98402_start() and a subsequent PRIV(dev)->lock
(i.e dev->phy_data->lock) will result in a null-ptr-dereference.

I believe this is a typo and what it actually want to do is to allocate
phy_data instead of dev_data.

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 15:16:30 -08:00
Tong Zhang
1019d7923d atm: fix a typo in the struct description
phy_data means private PHY data not date

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 15:16:30 -08:00
Jia-Ju Bai
179d0ba0c4 net: qrtr: fix error return code of qrtr_sendmsg()
When sock_alloc_send_skb() returns NULL to skb, no error return code of
qrtr_sendmsg() is assigned.
To fix this bug, rc is assigned with -ENOMEM in this case.

Fixes: 194ccc8829 ("net: qrtr: Support decoding incoming v2 packets")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 15:02:53 -08:00
Davide Caratti
27ab92d999 mptcp: fix length of ADD_ADDR with port sub-option
in current Linux, MPTCP peers advertising endpoints with port numbers use
a sub-option length that wrongly accounts for the trailing TCP NOP. Also,
receivers will only process incoming ADD_ADDR with port having such wrong
sub-option length. Fix this, making ADD_ADDR compliant to RFC8684 §3.4.1.

this can be verified running tcpdump on the kselftests artifacts:

 unpatched kernel:
 [root@bottarga mptcp]# tcpdump -tnnr unpatched.pcap | grep add-addr
 reading from file unpatched.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535
 IP 10.0.1.1.10000 > 10.0.1.2.53078: Flags [.], ack 101, win 509, options [nop,nop,TS val 214459678 ecr 521312851,mptcp add-addr v1 id 1 a00:201:2774:2d88:7436:85c3:17fd:101], length 0
 IP 10.0.1.2.53078 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 521312852 ecr 214459678,mptcp add-addr[bad opt]]

 patched kernel:
 [root@bottarga mptcp]# tcpdump -tnnr patched.pcap | grep add-addr
 reading from file patched.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535
 IP 10.0.1.1.10000 > 10.0.1.2.38178: Flags [.], ack 101, win 509, options [nop,nop,TS val 3728873902 ecr 2732713192,mptcp add-addr v1 id 1 10.0.2.1:10100 hmac 0xbccdfcbe59292a1f,nop,nop], length 0
 IP 10.0.1.2.38178 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 2732713195 ecr 3728873902,mptcp add-addr v1-echo id 1 10.0.2.1:10100,nop,nop], length 0

Fixes: 22fb85ffae ("mptcp: add port support for ADD_ADDR suboption writing")
CC: stable@vger.kernel.org # 5.11+
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-and-tested-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 15:02:03 -08:00
Jia-Ju Bai
2055a99da8 net: bonding: fix error return code of bond_neigh_init()
When slave is NULL or slave_ops->ndo_neigh_setup is NULL, no error
return code of bond_neigh_init() is assigned.
To fix this bug, ret is assigned with -EINVAL in these cases.

Fixes: 9e99bfefdb ("bonding: fix bond_neigh_init()")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 12:05:36 -08:00
Vladimir Oltean
29d98f54a4 net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
The txtime is passed to the driver in skb->skb_mstamp_ns, which is
actually in a union with skb->tstamp (the place where software
timestamps are kept).

Since commit b50a5c70ff ("net: allow simultaneous SW and HW transmit
timestamping"), __sock_recv_timestamp has some logic for making sure
that the two calls to skb_tstamp_tx:

skb_tx_timestamp(skb) # Software timestamp in the driver
-> skb_tstamp_tx(skb, NULL)

and

skb_tstamp_tx(skb, &shhwtstamps) # Hardware timestamp in the driver

will both do the right thing and in a race-free manner, meaning that
skb_tx_timestamp will deliver a cmsg with the software timestamp only,
and skb_tstamp_tx with a non-NULL hwtstamps argument will deliver a cmsg
with the hardware timestamp only.

Why are races even possible? Well, because although the software timestamp
skb->tstamp is private per skb, the hardware timestamp skb_hwtstamps(skb)
lives in skb_shinfo(skb), an area which is shared between skbs and their
clones. And skb_tstamp_tx works by cloning the packets when timestamping
them, therefore attempting to perform hardware timestamping on an skb's
clone will also change the hardware timestamp of the original skb. And
the original skb might have been yet again cloned for software
timestamping, at an earlier stage.

So the logic in __sock_recv_timestamp can't be as simple as saying
"does this skb have a hardware timestamp? if yes I'll send the hardware
timestamp to the socket, otherwise I'll send the software timestamp",
precisely because the hardware timestamp is shared.
Instead, it's quite the other way around: __sock_recv_timestamp says
"does this skb have a software timestamp? if yes, I'll send the software
timestamp, otherwise the hardware one". This works because the software
timestamp is not shared with clones.

But that means we have a problem when we attempt hardware timestamping
with skbs that don't have the skb->tstamp == 0. __sock_recv_timestamp
will say "oh, yeah, this must be some sort of odd clone" and will not
deliver the hardware timestamp to the socket. And this is exactly what
is happening when we have txtime enabled on the socket: as mentioned,
that is put in a union with skb->tstamp, so it is quite easy to mistake
it.

Do what other drivers do (intel igb/igc) and write zero to skb->tstamp
before taking the hardware timestamp. It's of no use to us now (we're
already on the TX confirmation path).

Fixes: 0d08c9ec7d ("enetc: add support time specific departure base on the qos etf")
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 12:03:42 -08:00
Alex Marginean
1b2395dfff net: enetc: set MAC RX FIFO to recommended value
On LS1028A, the MAC RX FIFO defaults to the value 2, which is too high
and may lead to RX lock-up under traffic at a rate higher than 6 Gbps.
Set it to 1 instead, as recommended by the hardware design team and by
later versions of the ENETC block guide.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Jason Liu <jason.hui.liu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 12:03:42 -08:00
Paul Cercueil
2e26962236 net: davicom: Use platform_get_irq_optional()
The second IRQ line really is optional, so use
platform_get_irq_optional() to obtain it.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 12:01:58 -08:00
Paul Cercueil
cf9e60aa69 net: davicom: Fix regulator not turned off on driver removal
We must disable the regulator that was enabled in the probe function.

Fixes: 7994fe55a4 ("dm9000: Add regulator and reset support to dm9000")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 12:01:58 -08:00
Paul Cercueil
ac88c531a5 net: davicom: Fix regulator not turned off on failed probe
When the probe fails or requests to be defered, we must disable the
regulator that was previously enabled.

Fixes: 7994fe55a4 ("dm9000: Add regulator and reset support to dm9000")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 12:01:58 -08:00
Vladimir Oltean
03cbb87054 net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
Tobias reports that after the blamed patch, VLAN objects being added to
a bridge device are being added to all slave ports instead (swp2, swp3).

ip link add br0 type bridge vlan_filtering 1
ip link set swp2 master br0
ip link set swp3 master br0
bridge vlan add dev br0 vid 100 self

This is because the fix was too broad: we made dsa_port_offloads_netdev
say "yes, I offload the br0 bridge" for all slave ports, but we didn't
add the checks whether the switchdev object was in fact meant for the
physical port or for the bridge itself. So we are reacting on events in
a way in which we shouldn't.

The reason why the fix was too broad is because the question itself,
"does this DSA port offload this netdev", was too broad in the first
place. The solution is to disambiguate the question and separate it into
two different functions, one to be called for each switchdev attribute /
object that has an orig_dev == net_bridge (dsa_port_offloads_bridge),
and the other for orig_dev == net_bridge_port (*_offloads_bridge_port).

In the case of VLAN objects on the bridge interface, this solves the
problem because we know that VLAN objects are per bridge port and not
per bridge. And when orig_dev is equal to the net_bridge, we offload it
as a bridge, but not as a bridge port; that's how we are able to skip
reacting on those events. Note that this is compatible with future plans
to have explicit offloading of VLAN objects on the bridge interface as a
bridge port (in DSA, this signifies that we should add that VLAN towards
the CPU port).

Fixes: 99b8202b17 ("net: dsa: fix SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING getting ignored")
Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
Tested-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:59:00 -08:00
Jia-Ju Bai
62765d3955 net: wan: fix error return code of uhdlc_init()
When priv->rx_skbuff or priv->tx_skbuff is NULL, no error return code of
uhdlc_init() is assigned.
To fix this bug, ret is assigned with -ENOMEM in these cases.

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:56:40 -08:00
Jia-Ju Bai
143c253f42 net: hisilicon: hns: fix error return code of hns_nic_clear_all_rx_fetch()
When hns_assemble_skb() returns NULL to skb, no error return code of
hns_nic_clear_all_rx_fetch() is assigned.
To fix this bug, ret is assigned with -ENOMEM in this case.

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:56:00 -08:00
Grant Grundler
4d8c79b7e9 net: usb: log errors to dmesg/syslog
Errors in protocol should be logged when the driver aborts operations.
If the driver can carry on and "humor" the device, then emitting
the message as debug output level is fine.

Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:54:32 -08:00
Grant Grundler
492bbe7f8a net: usb: cdc_ncm: emit dev_err on error paths
Several error paths in bind/probe code will only emit
output using dev_dbg. But if we are going to fail the
bind/probe, emit related output with "err" priority.

Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:54:05 -08:00
Bhaskar Chowdhury
a4813dc7ba net: ethernet: chelsio: inline_crypto: Mundane typos fixed throughout the file chcr_ktls.c
Mundane typos fixes throughout the file.

s/establised/established/
s/availbale/available/
s/vaues/values/
s/Incase/In case/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:52:46 -08:00
Philipp Zabel
bf9279cd63 net: dsa: bcm_sf2: simplify optional reset handling
As of commit bb475230b8 ("reset: make optional functions really
optional"), the reset framework API calls use NULL pointers to describe
optional, non-present reset controls.

This allows to unconditionally return errors from
devm_reset_control_get_optional_exclusive.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08 11:51:36 -08:00
Bjørn Mork
6654111c89 MIPS: vmlinux.lds.S: align raw appended dtb to 8 bytes
The devicetree specification requires 8-byte alignment in
memory. This is now enforced by libfdt since commit 79edff1206
("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9")
which included the upstream commit 5e735860c478 ("libfdt: Check for
8-byte address alignment in fdt_ro_probe_()").

This broke the MIPS raw appended DTBs which would be appended to
the image immediately following the initramfs section.  This ends
with a 32bit size, resulting in a 4-byte alignment of the DTB.

Fix by padding with zeroes to 8-bytes when MIPS_RAW_APPENDED_DTB
is defined.

Fixes: 79edff1206 ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9")
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-03-08 18:36:08 +01:00
Christian Brauner
ee2e3f5062
mount: fix mounting of detached mounts onto targets that reside on shared mounts
Creating a series of detached mounts, attaching them to the filesystem,
and unmounting them can be used to trigger an integer overflow in
ns->mounts causing the kernel to block any new mounts in count_mounts()
and returning ENOSPC because it falsely assumes that the maximum number
of mounts in the mount namespace has been reached, i.e. it thinks it
can't fit the new mounts into the mount namespace anymore.

Depending on the number of mounts in your system, this can be reproduced
on any kernel that supportes open_tree() and move_mount() by compiling
and running the following program:

  /* SPDX-License-Identifier: LGPL-2.1+ */

  #define _GNU_SOURCE
  #include <errno.h>
  #include <fcntl.h>
  #include <getopt.h>
  #include <limits.h>
  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/mount.h>
  #include <sys/stat.h>
  #include <sys/syscall.h>
  #include <sys/types.h>
  #include <unistd.h>

  /* open_tree() */
  #ifndef OPEN_TREE_CLONE
  #define OPEN_TREE_CLONE 1
  #endif

  #ifndef OPEN_TREE_CLOEXEC
  #define OPEN_TREE_CLOEXEC O_CLOEXEC
  #endif

  #ifndef __NR_open_tree
          #if defined __alpha__
                  #define __NR_open_tree 538
          #elif defined _MIPS_SIM
                  #if _MIPS_SIM == _MIPS_SIM_ABI32        /* o32 */
                          #define __NR_open_tree 4428
                  #endif
                  #if _MIPS_SIM == _MIPS_SIM_NABI32       /* n32 */
                          #define __NR_open_tree 6428
                  #endif
                  #if _MIPS_SIM == _MIPS_SIM_ABI64        /* n64 */
                          #define __NR_open_tree 5428
                  #endif
          #elif defined __ia64__
                  #define __NR_open_tree (428 + 1024)
          #else
                  #define __NR_open_tree 428
          #endif
  #endif

  /* move_mount() */
  #ifndef MOVE_MOUNT_F_EMPTY_PATH
  #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* Empty from path permitted */
  #endif

  #ifndef __NR_move_mount
          #if defined __alpha__
                  #define __NR_move_mount 539
          #elif defined _MIPS_SIM
                  #if _MIPS_SIM == _MIPS_SIM_ABI32        /* o32 */
                          #define __NR_move_mount 4429
                  #endif
                  #if _MIPS_SIM == _MIPS_SIM_NABI32       /* n32 */
                          #define __NR_move_mount 6429
                  #endif
                  #if _MIPS_SIM == _MIPS_SIM_ABI64        /* n64 */
                          #define __NR_move_mount 5429
                  #endif
          #elif defined __ia64__
                  #define __NR_move_mount (428 + 1024)
          #else
                  #define __NR_move_mount 429
          #endif
  #endif

  static inline int sys_open_tree(int dfd, const char *filename, unsigned int flags)
  {
          return syscall(__NR_open_tree, dfd, filename, flags);
  }

  static inline int sys_move_mount(int from_dfd, const char *from_pathname, int to_dfd,
                                   const char *to_pathname, unsigned int flags)
  {
          return syscall(__NR_move_mount, from_dfd, from_pathname, to_dfd, to_pathname, flags);
  }

  static bool is_shared_mountpoint(const char *path)
  {
          bool shared = false;
          FILE *f = NULL;
          char *line = NULL;
          int i;
          size_t len = 0;

          f = fopen("/proc/self/mountinfo", "re");
          if (!f)
                  return 0;

          while (getline(&line, &len, f) > 0) {
                  char *slider1, *slider2;

                  for (slider1 = line, i = 0; slider1 && i < 4; i++)
                          slider1 = strchr(slider1 + 1, ' ');

                  if (!slider1)
                          continue;

                  slider2 = strchr(slider1 + 1, ' ');
                  if (!slider2)
                          continue;

                  *slider2 = '\0';
                  if (strcmp(slider1 + 1, path) == 0) {
                          /* This is the path. Is it shared? */
                          slider1 = strchr(slider2 + 1, ' ');
                          if (slider1 && strstr(slider1, "shared:")) {
                                  shared = true;
                                  break;
                          }
                  }
          }
          fclose(f);
          free(line);

          return shared;
  }

  static void usage(void)
  {
          const char *text = "mount-new [--recursive] <base-dir>\n";
          fprintf(stderr, "%s", text);
          _exit(EXIT_SUCCESS);
  }

  #define exit_usage(format, ...)                              \
          ({                                                   \
                  fprintf(stderr, format "\n", ##__VA_ARGS__); \
                  usage();                                     \
          })

  #define exit_log(format, ...)                                \
          ({                                                   \
                  fprintf(stderr, format "\n", ##__VA_ARGS__); \
                  exit(EXIT_FAILURE);                          \
          })

  static const struct option longopts[] = {
          {"help",        no_argument,            0,      'a'},
          { NULL,         no_argument,            0,       0 },
  };

  int main(int argc, char *argv[])
  {
          int exit_code = EXIT_SUCCESS, index = 0;
          int dfd, fd_tree, new_argc, ret;
          char *base_dir;
          char *const *new_argv;
          char target[PATH_MAX];

          while ((ret = getopt_long_only(argc, argv, "", longopts, &index)) != -1) {
                  switch (ret) {
                  case 'a':
                          /* fallthrough */
                  default:
                          usage();
                  }
          }

          new_argv = &argv[optind];
          new_argc = argc - optind;
          if (new_argc < 1)
                  exit_usage("Missing base directory\n");
          base_dir = new_argv[0];

          if (*base_dir != '/')
                  exit_log("Please specify an absolute path");

          /* Ensure that target is a shared mountpoint. */
          if (!is_shared_mountpoint(base_dir))
                  exit_log("Please ensure that \"%s\" is a shared mountpoint", base_dir);

          dfd = open(base_dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
          if (dfd < 0)
                  exit_log("%m - Failed to open base directory \"%s\"", base_dir);

          ret = mkdirat(dfd, "detached-move-mount", 0755);
          if (ret < 0)
                  exit_log("%m - Failed to create required temporary directories");

          ret = snprintf(target, sizeof(target), "%s/detached-move-mount", base_dir);
          if (ret < 0 || (size_t)ret >= sizeof(target))
                  exit_log("%m - Failed to assemble target path");

          /*
           * Having a mount table with 10000 mounts is already quite excessive
           * and shoult account even for weird test systems.
           */
          for (size_t i = 0; i < 10000; i++) {
                  fd_tree = sys_open_tree(dfd, "detached-move-mount",
                                          OPEN_TREE_CLONE |
                                          OPEN_TREE_CLOEXEC |
                                          AT_EMPTY_PATH);
                  if (fd_tree < 0) {
                          fprintf(stderr, "%m - Failed to open %d(detached-move-mount)", dfd);
                          exit_code = EXIT_FAILURE;
                          break;
                  }

                  ret = sys_move_mount(fd_tree, "", dfd, "detached-move-mount", MOVE_MOUNT_F_EMPTY_PATH);
                  if (ret < 0) {
                          if (errno == ENOSPC)
                                  fprintf(stderr, "%m - Buggy mount counting");
                          else
                                  fprintf(stderr, "%m - Failed to attach mount to %d(detached-move-mount)", dfd);
                          exit_code = EXIT_FAILURE;
                          break;
                  }
                  close(fd_tree);

                  ret = umount2(target, MNT_DETACH);
                  if (ret < 0) {
                          fprintf(stderr, "%m - Failed to unmount %s", target);
                          exit_code = EXIT_FAILURE;
                          break;
                  }
          }

          (void)unlinkat(dfd, "detached-move-mount", AT_REMOVEDIR);
          close(dfd);

          exit(exit_code);
  }

and wait for the kernel to refuse any new mounts by returning ENOSPC.
How many iterations are needed depends on the number of mounts in your
system. Assuming you have something like 50 mounts on a standard system
it should be almost instantaneous.

The root cause of this is that detached mounts aren't handled correctly
when source and target mount are identical and reside on a shared mount
causing a broken mount tree where the detached source itself is
propagated which propagation prevents for regular bind-mounts and new
mounts. This ultimately leads to a miscalculation of the number of
mounts in the mount namespace.

Detached mounts created via
open_tree(fd, path, OPEN_TREE_CLONE)
are essentially like an unattached new mount, or an unattached
bind-mount. They can then later on be attached to the filesystem via
move_mount() which calls into attach_recursive_mount(). Part of
attaching it to the filesystem is making sure that mounts get correctly
propagated in case the destination mountpoint is MS_SHARED, i.e. is a
shared mountpoint. This is done by calling into propagate_mnt() which
walks the list of peers calling propagate_one() on each mount in this
list making sure it receives the propagation event.
The propagate_one() functions thereby skips both new mounts and bind
mounts to not propagate them "into themselves". Both are identified by
checking whether the mount is already attached to any mount namespace in
mnt->mnt_ns. The is what the IS_MNT_NEW() helper is responsible for.

However, detached mounts have an anonymous mount namespace attached to
them stashed in mnt->mnt_ns which means that IS_MNT_NEW() doesn't
realize they need to be skipped causing the mount to propagate "into
itself" breaking the mount table and causing a disconnect between the
number of mounts recorded as being beneath or reachable from the target
mountpoint and the number of mounts actually recorded/counted in
ns->mounts ultimately causing an overflow which in turn prevents any new
mounts via the ENOSPC issue.

So teach propagation to handle detached mounts by making it aware of
them. I've been tracking this issue down for the last couple of days and
then verifying that the fix is correct by
unmounting everything in my current mount table leaving only /proc and
/sys mounted and running the reproducer above overnight verifying the
number of mounts counted in ns->mounts. With this fix the counts are
correct and the ENOSPC issue can't be reproduced.

This change will only have an effect on mounts created with the new
mount API since detached mounts cannot be created with the old mount API
so regressions are extremely unlikely.

Link: https://lore.kernel.org/r/20210306101010.243666-1-christian.brauner@ubuntu.com
Fixes: 2db154b3ea ("vfs: syscall: Add move_mount(2) to move mounts around")
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: <stable@vger.kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-03-08 15:18:43 +01:00
Andy Shevchenko
b41ba2ec54 gpiolib: Read "gpio-line-names" from a firmware node
On STM32MP1, the GPIO banks are subnodes of pin-controller@50002000,
see arch/arm/boot/dts/stm32mp151.dtsi. The driver for
pin-controller@50002000 is in drivers/pinctrl/stm32/pinctrl-stm32.c
and iterates over all of its DT subnodes when registering each GPIO
bank gpiochip. Each gpiochip has:

  - gpio_chip.parent = dev,
    where dev is the device node of the pin controller
  - gpio_chip.of_node = np,
    which is the OF node of the GPIO bank

Therefore, dev_fwnode(chip->parent) != of_fwnode_handle(chip.of_node),
i.e. pin-controller@50002000 != pin-controller@50002000/gpio@5000*000.

The original code behaved correctly, as it extracted the "gpio-line-names"
from of_fwnode_handle(chip.of_node) = pin-controller@50002000/gpio@5000*000.

To achieve the same behaviour, read property from the firmware node.

Fixes: 7cba1a4d5e ("gpiolib: generalize devprop_gpiochip_set_names() for device properties")
Reported-by: Marek Vasut <marex@denx.de>
Reported-by: Roman Guskov <rguskov@dh-electronics.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2021-03-08 11:59:17 +01:00
Andy Shevchenko
eb441337c7 gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2
The commit 0ea683931a ("gpio: dwapb: Convert driver to using the
GPIO-lib-based IRQ-chip") indeliberately made a regression on how
IRQ line from GPIO I²C expander is handled. I.e. it reveals that
the quirk for Intel Galileo Gen 2 misses the part of setting IRQ type
which previously was predefined by gpio-dwapb driver. Now, we have to
reorganize the approach to call necessary parts, which can be done via
ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk.

Without this fix and with above mentioned change the kernel hangs
on the first IRQ event with:

    gpio gpiochip3: Persistence not supported for GPIO 1
    irq 32, desc: 62f8fb50, depth: 0, count: 0, unhandled: 0
    ->handle_irq():  41c7b0ab, handle_bad_irq+0x0/0x40
    ->irq_data.chip(): e03f1e72, 0xc2539218
    ->action(): 0ecc7e6f
    ->action->handler(): 8a3db21e, irq_default_primary_handler+0x0/0x10
       IRQ_NOPROBE set
    unexpected IRQ trap at vector 20

Fixes: ba8c90c618 ("gpio: pca953x: Override IRQ for one of the expanders on Galileo Gen 2")
Depends-on: 0ea683931a ("gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2021-03-08 11:59:17 +01:00
Andy Shevchenko
809390219f gpiolib: acpi: Allow to find GpioInt() resource by name and index
Currently only search by index is supported. However, in some cases
we might need to pass the quirks to the acpi_dev_gpio_irq_get().

For this, split out acpi_dev_gpio_irq_get_by() and replace
acpi_dev_gpio_irq_get() by calling above with NULL for name parameter.

Fixes: ba8c90c618 ("gpio: pca953x: Override IRQ for one of the expanders on Galileo Gen 2")
Depends-on: 0ea683931a ("gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
2021-03-08 11:59:17 +01:00
Andy Shevchenko
62d5247d23 gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk
On some systems the ACPI tables has wrong pin number and instead of
having a relative one it provides an absolute one in the global GPIO
number space.

Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk to cope with such cases.

Fixes: ba8c90c618 ("gpio: pca953x: Override IRQ for one of the expanders on Galileo Gen 2")
Depends-on: 0ea683931a ("gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
2021-03-08 11:59:17 +01:00