There's no direct cputime_t manipulation in the xtensa arch code, so
generic virt CPU accounting may be enabled.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Put user exit context tracking call on the common kernel entry/exit path
(function calls are impossible at earlier kernel entry stages because
PS.EXCM is not cleared yet). Put user entry context tracking call on the
user exit path. Syscalls go through this common code too, so nothing
specific needs to be done for them.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
commit dfe564045c ("rcu: Panic after fixed number of stalls")
introduced a new systctl but no accompanying documentation.
Add a simple entry to the documentation.
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Merge in Linux 5.18-rc5 since new code to the STM32 driver
depend in a non-trivial way on the fixes merged in -rc5.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Initially the 'd-ng' template field did not prefix the digest with either
"md5" or "sha1" hash algorithms. Prior to being upstreamed this changed,
but the comments and documentation were not updated. Fix the comments
and documentation.
Fixes: 4d7aeee73f ("ima: define new template ima-ng and template fields d-ng and n-ng")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Take care of faults occuring between the PARange and IPA range by
injecting an exception
- Fix S2 faults taken from a host EL0 in protected mode
- Work around Oops caused by a PMU access from a 32bit guest when PMU
has been created. This is a temporary bodge until we fix it for
good.
x86:
- Fix potential races when walking host page table
- Fix shadow page table leak when KVM runs nested
- Work around bug in userspace when KVM synthesizes leaf 0x80000021
on older (pre-EPYC) or Intel processors
Generic (but affects only RISC-V):
- Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: work around QEMU issue with synthetic CPUID leaves
Revert "x86/mm: Introduce lookup_address_in_mm()"
KVM: x86/mmu: fix potential races when walking host page table
KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR
KVM: arm64: Inject exception on out-of-IPA-range translation fault
KVM/arm64: Don't emulate a PMU for 32-bit guests if feature not set
KVM: arm64: Handle host stage-2 faults from 32-bit EL0
Add a tristate property to advertise desired transmit level.
If the device supports the 2.4 Vpp operating mode for 10BASE-T1L,
as defined in 802.3gc, and the 2.4 Vpp transmit voltage operation
is desired, property should be set to 1. This property is used
to select whether Auto-Negotiation advertises a request to
operate the 10BASE-T1L PHY in increased transmit level mode.
If property is set to 1, the PHY shall advertise a request
to operate the 10BASE-T1L PHY in increased transmit level mode.
If property is set to zero, the PHY shall not advertise
a request to operate the 10BASE-T1L PHY in increased transmit level mode.
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Apple SoCs such as the M1 come with a simple DMA address filter called
SART. Unlike a real IOMMU no pagetables can be configured but instead
DMA transactions can be allowed for up to 16 paddr regions. The consumer
also needs special support since not all DMA allocations have to be
added to this filter.
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sven Peter <sven@svenpeter.dev>
Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for 5.18-rc5 for some
reported issues and new quirks. They include:
- dwc3 driver fixes
- xhci driver fixes
- typec driver fixes
- new usb-serial driver ids
- added new USB devices to existing quirk tables
- other tiny fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (31 commits)
usb: phy: generic: Get the vbus supply
usb: dwc3: gadget: Return proper request status
usb: dwc3: pci: add support for the Intel Meteor Lake-P
usb: dwc3: core: Only handle soft-reset in DCTL
usb: gadget: configfs: clear deactivation flag in configfs_composite_unbind()
usb: misc: eud: Fix an error handling path in eud_probe()
usb: core: Don't hold the device lock while sleeping in do_proc_control()
usb: dwc3: Try usb-role-switch first in dwc3_drd_init
usb: dwc3: core: Fix tx/rx threshold settings
usb: mtu3: fix USB 3.0 dual-role-switch from device to host
xhci: Enable runtime PM on second Alderlake controller
usb: dwc3: fix backwards compat with rockchip devices
dt-bindings: usb: samsung,exynos-usb2: add missing required reg
usb: misc: fix improper handling of refcount in uss720_probe()
USB: Fix ehci infinite suspend-resume loop issue in zhaoxin
usb: typec: tcpm: Fix undefined behavior due to shift overflowing the constant
usb: typec: rt1719: Fix build error without CONFIG_POWER_SUPPLY
usb: typec: ucsi: Fix role swapping
usb: typec: ucsi: Fix reuse of completion structure
usb: xhci: tegra:Fix PM usage reference leak of tegra_xusb_unpowergate_partitions
...
The PHY reset was intended to be a phandle for a special PHY reset
driver for the integrated PHYs as well as any external PHYs. It turns
out, that the culprit is how the reset of the switch device is done.
In particular, the switch reset also affects other subsystems like
the GPIO and the SGPIO block and it happens to be the case that the
reset lines of the external PHYs are connected to a common GPIO line.
Thus as soon as the switch issues a reset during probe time, all the
external PHYs will go into reset because all the GPIO lines will
switch to input and the pull-down on that signal will take effect.
So even if there was a special PHY reset driver, it (1) won't fix
the root cause of the problem and (2) it won't fix all the other
consumers of GPIO lines which will also be reset.
It turns out, the Ocelot SoC has the same weird behavior (or the
lack of a dedicated switch reset) and there the problem is already
solved and all the bits and pieces are already there and this PHY
reset property isn't not needed at all.
There are no users of this binding. Just remove it.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new net.mptcp.pm_type sysctl determines which path manager will be
used by each newly-created MPTCP socket.
v2: Handle builds without CONFIG_SYSCTL
v3: Clarify logic for type-specific PM init (Geliang Tang and Paolo Abeni)
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The LAN8814 has a coma mode pin which is used to put the PHY into
isolate and power-down mode. Usually strapped to be asserted by default.
A GPIO is then used to take the PHY out of this mode.
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull ARM SoC fixes from Arnd Bergmann:
- A fix for a regression caused by the previous set of bugfixes
changing tegra and at91 pinctrl properties.
More work is needed to figure out what this should actually be, but a
revert makes it work for the moment.
- Defconfig regression fixes for tegra after renamed symbols
- Build-time warning and static checker fixes for imx, op-tee, sunxi,
meson, at91, and omap
- More at91 DT fixes for audio, regulator and spi nodes
- A regression fix for Renesas Hyperflash memory probe
- A stability fix for amlogic boards, modifying the allowed cpufreq
states
- Multiple fixes for system suspend on omap2+
- DT fixes for various i.MX bugs
- A probe error fix for imx6ull-colibri MMC
- A MAINTAINERS file entry for samsung bug reports
* tag 'soc-fixes-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits)
Revert "arm: dts: at91: Fix boolean properties with values"
bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create()
Revert "arm64: dts: tegra: Fix boolean properties with values"
arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock
ARM: dts: imx6ull-colibri: fix vqmmc regulator
MAINTAINERS: add Bug entry for Samsung and memory controller drivers
memory: renesas-rpc-if: Fix HF/OSPI data transfer in Manual Mode
ARM: dts: logicpd-som-lv: Fix wrong pinmuxing on OMAP35
ARM: dts: am3517-evm: Fix misc pinmuxing
ARM: dts: am33xx-l4: Add missing touchscreen clock properties
ARM: dts: Fix mmc order for omap3-gta04
ARM: dts: at91: fix pinctrl phandles
ARM: dts: at91: sama5d4_xplained: fix pinctrl phandle name
ARM: dts: at91: Describe regulators on at91sam9g20ek
ARM: dts: at91: Map MCLK for wm8731 on at91sam9g20ek
ARM: dts: at91: Fix boolean properties with values
ARM: dts: at91: use generic node name for dataflash
ARM: dts: at91: align SPI NOR node name with dtschema
ARM: dts: at91: sama7g5ek: Align the impedance of the QSPI0's HSIO and PCB lines
ARM: dts: at91: sama7g5ek: enable pull-up on flexcom3 console lines
...
Pull clk fixes from Stephen Boyd:
"A semi-large pile of clk driver fixes this time around.
Nothing is touching the core so these fixes are fairly well contained
to specific devices that use these clk drivers.
- Some Allwinner SoC fixes to gracefully handle errors and mark an
RTC clk as critical so that the RTC keeps ticking.
- Fix AXI bus clks and RTC clk design for Microchip PolarFire SoC
driver introduced this cycle. This has some devicetree bits acked
by riscv maintainers. We're fixing it now so that the prior
bindings aren't released in a major kernel version.
- Remove a reset on Microchip PolarFire SoCs that broke when enabling
CONFIG_PM.
- Set a min/max for the Qualcomm graphics clk. This got broken by the
clk rate range patches introduced this cycle"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource()
clk: sunxi-ng: sun6i-rtc: Mark rtc-32k as critical
riscv: dts: microchip: reparent mpfs clocks
clk: microchip: mpfs: add RTCREF clock control
clk: microchip: mpfs: re-parent the configurable clocks
dt-bindings: rtc: add refclk to mpfs-rtc
dt-bindings: clk: mpfs: add defines for two new clocks
dt-bindings: clk: mpfs document msspll dri registers
riscv: dts: microchip: fix usage of fic clocks on mpfs
clk: microchip: mpfs: mark CLK_ATHENA as critical
clk: microchip: mpfs: fix parents for FIC clocks
clk: qcom: clk-rcg2: fix gfx3d frequency calculation
clk: microchip: mpfs: don't reset disabled peripherals
clk: sunxi-ng: fix not NULL terminated coccicheck error
Pull random number generator fixes from Jason Donenfeld:
- Eric noticed that the memmove() in crng_fast_key_erasure() was bogus,
so this has been changed to a memcpy() and the confusing situation
clarified with a detailed comment.
- [Half]SipHash documentation updates from Bagas and Eric, after Eric
pointed out that the use of HalfSipHash in random.c made a bit of the
text potentially misleading.
* tag 'random-5.18-rc5-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
Documentation: siphash: disambiguate HalfSipHash algorithm from hsiphash functions
Documentation: siphash: enclose HalfSipHash usage example in the literal block
Documentation: siphash: convert danger note to warning for HalfSipHash
random: document crng_fast_key_erasure() destination possibility
This patch series adds a memory.reclaim proactive reclaim interface.
The rationale behind the interface and how it works are in the first
patch.
This patch (of 4):
Introduce a memcg interface to trigger memory reclaim on a memory cgroup.
Use case: Proactive Reclaim
---------------------------
A userspace proactive reclaimer can continuously probe the memcg to
reclaim a small amount of memory. This gives more accurate and up-to-date
workingset estimation as the LRUs are continuously sorted and can
potentially provide more deterministic memory overcommit behavior. The
memory overcommit controller can provide more proactive response to the
changing behavior of the running applications instead of being reactive.
A userspace reclaimer's purpose in this case is not a complete replacement
for kswapd or direct reclaim, it is to proactively identify memory savings
opportunities and reclaim some amount of cold pages set by the policy to
free up the memory for more demanding jobs or scheduling new jobs.
A user space proactive reclaimer is used in Google data centers.
Additionally, Meta's TMO paper recently referenced a very similar
interface used for user space proactive reclaim:
https://dl.acm.org/doi/pdf/10.1145/3503222.3507731
Benefits of a user space reclaimer:
-----------------------------------
1) More flexible on who should be charged for the cpu of the memory
reclaim. For proactive reclaim, it makes more sense to be centralized.
2) More flexible on dedicating the resources (like cpu). The memory
overcommit controller can balance the cost between the cpu usage and
the memory reclaimed.
3) Provides a way to the applications to keep their LRUs sorted, so,
under memory pressure better reclaim candidates are selected. This
also gives more accurate and uptodate notion of working set for an
application.
Why memory.high is not enough?
------------------------------
- memory.high can be used to trigger reclaim in a memcg and can
potentially be used for proactive reclaim. However there is a big
downside in using memory.high. It can potentially introduce high
reclaim stalls in the target application as the allocations from the
processes or the threads of the application can hit the temporary
memory.high limit.
- Userspace proactive reclaimers usually use feedback loops to decide
how much memory to proactively reclaim from a workload. The metrics
used for this are usually either refaults or PSI, and these metrics will
become messy if the application gets throttled by hitting the high
limit.
- memory.high is a stateful interface, if the userspace proactive
reclaimer crashes for any reason while triggering reclaim it can leave
the application in a bad state.
- If a workload is rapidly expanding, setting memory.high to proactively
reclaim memory can result in actually reclaiming more memory than
intended.
The benefits of such interface and shortcomings of existing interface were
further discussed in this RFC thread:
https://lore.kernel.org/linux-mm/5df21376-7dd1-bf81-8414-32a73cea45dd@google.com/
Interface:
----------
Introducing a very simple memcg interface 'echo 10M > memory.reclaim' to
trigger reclaim in the target memory cgroup.
The interface is introduced as a nested-keyed file to allow for future
optional arguments to be easily added to configure the behavior of
reclaim.
Possible Extensions:
--------------------
- This interface can be extended with an additional parameter or flags
to allow specifying one or more types of memory to reclaim from (e.g.
file, anon, ..).
- The interface can also be extended with a node mask to reclaim from
specific nodes. This has use cases for reclaim-based demotion in memory
tiering systens.
- A similar per-node interface can also be added to support proactive
reclaim and reclaim-based demotion in systems without memcg.
- Add a timeout parameter to make it easier for user space to call the
interface without worrying about being blocked for an undefined amount
of time.
For now, let's keep things simple by adding the basic functionality.
[yosryahmed@google.com: worked on versions v2 onwards, refreshed to
current master, updated commit message based on recent
discussions and use cases]
Link: https://lkml.kernel.org/r/20220425190040.2475377-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20220425190040.2475377-2-yosryahmed@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Co-developed-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Wei Xu <weixugc@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Today it's only possible to write back as a page, idle, or huge. A user
might want to writeback pages which are huge and idle first as these idle
pages do not require decompression and make a good first pass for
writeback.
Idle writeback specifically has the advantage that a refault is unlikely
given that the page has been swapped for some amount of time without being
refaulted.
Huge writeback has the advantage that you're guaranteed to get the maximum
benefit from a single page writeback, that is, you're reclaiming one full
page of memory. Pages which are compressed in zram being written back
result in some benefit which is always less than a page size because of
the fact that it was compressed.
The primary use of this is for minimizing refaults in situations where the
device has to be sensitive to storage endurance. On ChromeOS we have
devices with slow eMMC and repeated writes and refaults can negatively
affect performance and endurance.
Link: https://lkml.kernel.org/r/20220322215821.1196994-1-bgeffon@google.com
Signed-off-by: Brian Geffon <bgeffon@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When userspace is debugging a VM, the kvm_debug_exit_arch part of the
kvm_run struct contains arm64 specific debug information: the ESR_EL2
value, encoded in the field "hsr", and the address of the instruction
that caused the exception, encoded in the field "far".
Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately
kvm_debug_exit_arch.hsr cannot be changed because that would change the
memory layout of the struct on big endian machines:
Current layout: | Layout with "hsr" extended to 64 bits:
|
offset 0: ESR_EL2[31:0] (hsr) | offset 0: ESR_EL2[61:32] (hsr[61:32])
offset 4: padding | offset 4: ESR_EL2[31:0] (hsr[31:0])
offset 8: FAR_EL2[61:0] (far) | offset 8: FAR_EL2[61:0] (far)
which breaks existing code.
The padding is inserted by the compiler because the "far" field must be
aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1],
page 18), and the struct itself must be aligned to 8 bytes (the struct must
be aligned to the maximum alignment of its fields - aapcs64, page 18),
which means that "hsr" must be aligned to 8 bytes as it is the first field
in the struct.
To avoid changing the struct size and layout for the existing fields, add a
new field, "hsr_high", which replaces the existing padding. "hsr_high" will
be used to hold the ESR_EL2[61:32] bits of the register. The memory layout,
both on big and little endian machine, becomes:
offset 0: ESR_EL2[31:0] (hsr)
offset 4: ESR_EL2[61:32] (hsr_high)
offset 8: FAR_EL2[61:0] (far)
The padding that the compiler inserts for the current struct layout is
unitialized. To prevent an updated userspace running on an old kernel
mistaking the padding for a valid "hsr_high" value, add a new flag,
KVM_DEBUG_ARCH_HSR_HIGH_VALID, to kvm_run->flags to let userspace know that
"hsr_high" holds a valid ESR_EL2[61:32] value.
[1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425114444.368693-6-alexandru.elisei@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull arm64 fix from Will Deacon:
"Rename and reallocate the PT_ARM_MEMTAG_MTE ELF segment type.
This is a fix to the MTE ELF ABI for a bug that was added during the
most recent merge window as part of the coredump support.
The issue is that the value assigned to the new PT_ARM_MEMTAG_MTE
segment type has already been allocated to PT_AARCH64_UNWIND by the
ELF ABI, so we've bumped the value and changed the name of the
identifier to be better aligned with the existing one"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
elf: Fix the arm64 MTE ELF segment name and value
When shadowing 5-level NPT for 4-level NPT L1 guest, the root_sp is
allocated with role.level = 5 and the guest pagetable's root gfn.
And root_sp->spt[0] is also allocated with the same gfn and the same
role except role.level = 4. Luckily that they are different shadow
pages, but only root_sp->spt[0] is the real translation of the guest
pagetable.
Here comes a problem:
If the guest switches from gCR4_LA57=0 to gCR4_LA57=1 (or vice verse)
and uses the same gfn as the root page for nested NPT before and after
switching gCR4_LA57. The host (hCR4_LA57=1) might use the same root_sp
for the guest even the guest switches gCR4_LA57. The guest will see
unexpected page mapped and L2 may exploit the bug and hurt L1. It is
lucky that the problem can't hurt L0.
And three special cases need to be handled:
The root_sp should be like role.direct=1 sometimes: its contents are
not backed by gptes, root_sp->gfns is meaningless. (For a normal high
level sp in shadow paging, sp->gfns is often unused and kept zero, but
it could be relevant and meaningful if sp->gfns is used because they
are backed by concrete gptes.)
For such root_sp in the case, root_sp is just a portal to contribute
root_sp->spt[0], and root_sp->gfns should not be used and
root_sp->spt[0] should not be dropped if gpte[0] of the guest root
pagetable is changed.
Such root_sp should not be accounted too.
So add role.passthrough to distinguish the shadow pages in the hash
when gCR4_LA57 is toggled and fix above special cases by using it in
kvm_mmu_page_{get|set}_gfn() and sp_has_gptes().
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220420131204.2850-3-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes for (relatively) old bugs, to be merged in both the -rc and next
development trees.
The merge reconciles the ABI fixes for KVM_EXIT_SYSTEM_EVENT between
5.18 and commit c24a950ec7 ("KVM, SEV: Add KVM_EXIT_SHUTDOWN metadata
for SEV-ES", 2022-04-13).
Fixes for (relatively) old bugs, to be merged in both the -rc and next
development trees:
* Fix potential races when walking host page table
* Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
* Fix shadow page table leak when KVM runs nested
When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags
member that at the time was unused. Unfortunately this extensibility
mechanism has several issues:
- x86 is not writing the member, so it would not be possible to use it
on x86 except for new events
- the member is not aligned to 64 bits, so the definition of the
uAPI struct is incorrect for 32- on 64-bit userspace. This is a
problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately
usage of flags was only introduced in 5.18.
Since padding has to be introduced, place a new field in there
that tells if the flags field is valid. To allow further extensibility,
in fact, change flags to an array of 16 values, and store how many
of the values are valid. The availability of the new ndata field
is tied to a system capability; all architectures are changed to
fill in the field.
To avoid breaking compilation of userspace that was using the flags
field, provide a userspace-only union to overlap flags with data[0].
The new field is placed at the same offset for both 32- and 64-bit
userspace.
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Message-Id: <20220422103013.34832-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Creating a symlink pointing to the correct USB Type-C
connector for the on-board USB4 ports when they are created.
The link will be created only if the firmware is able to
describe the connection between the port and its connector.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Won Chung <wonchung@google.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
From Tegra186 onwards, memory controller support multiple channels.
"reg" items are updated with address and size of these channels.
Tegra186 has overall 5 memory controller channels. Tegra194 and Tegra234
have overall 17 memory controller channels each.
There is one "reg" entry for memory controller stream-ID registers. So
update the "reg" property's "minItems" and "maxItems" accordingly in the
Tegra186 devicetree documentation.
Also update validation for "reg-names" added for these corresponding
"reg" items. ABI change due to new bindings is intended but backward
compatibility is preserved in driver.
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
A compound devmap is a dev_pagemap with @vmemmap_shift > 0 and it means
that pages are mapped at a given huge page alignment and utilize uses
compound pages as opposed to order-0 pages.
Take advantage of the fact that most tail pages look the same (except the
first two) to minimize struct page overhead. Allocate a separate page for
the vmemmap area which contains the head page and separate for the next 64
pages. The rest of the subsections then reuse this tail vmemmap page to
initialize the rest of the tail pages.
Sections are arch-dependent (e.g. on x86 it's 64M, 128M or 512M) and when
initializing compound devmap with big enough @vmemmap_shift (e.g. 1G PUD)
it may cross multiple sections. The vmemmap code needs to consult @pgmap
so that multiple sections that all map the same tail data can refer back
to the first copy of that data for a given gigantic page.
On compound devmaps with 2M align, this mechanism lets 6 pages be saved
out of the 8 necessary PFNs necessary to set the subsection's 512 struct
pages being mapped. On a 1G compound devmap it saves 4094 pages.
Altmap isn't supported yet, given various restrictions in altmap pfn
allocator, thus fallback to the already in use vmemmap_populate(). It is
worth noting that altmap for devmap mappings was there to relieve the
pressure of inordinate amounts of memmap space to map terabytes of pmem.
With compound pages the motivation for altmaps for pmem gets reduced.
Link: https://lkml.kernel.org/r/20220420155310.9712-5-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
An application is suspected of having memory leak when its memory
consumption is high and keeps increasing. There are several commonly used
memory allocators: slab, cma, vmalloc, etc. The memory leak
identification can be sped up if the page information allocated by an
allocator can be analyzed separately.
This patch provides supports for memory allocator labelling for slab,
vmalloc, and cma. The pages allocated by slab and cma can be confirmed
from the "PFN" line according to the kernel codes, and the label of the
vmalloc allocator can be obtained by analyzing the stack trace. Thanks
for Vlastimil Babka's constructive suggestions.
Based on Yinan Zhang's study, the call chain of vmalloc() is vmalloc() ->
... -> __vmalloc_node_range() -> __vmalloc_area_node().
__vmalloc_area_node() requests memory through the interface of buddy
allocation system. In the current version, __vmalloc_area_node() uses
four interfaces: alloc_pages_bulk_array_mempolicy(),
alloc_pages_bulk_array_node(), alloc_pages() and alloc_pages_node(). By
disassembling the code, we find that __vmalloc_area_node() is expanded in
__vmalloc_node_range(). So __vmalloc_area_node is not in the stack trace.
On the test machine, the stack trace of pages allocated by vmalloc has the
following four forms:
__alloc_pages_bulk+0x230/0x6a0
__vmalloc_node_range+0x19c/0x598
alloc_pages_bulk_array_mempolicy+0xbc/0x278
__vmalloc_node_range+0x1e8/0x598
__alloc_pages+0x160/0x2b0
__vmalloc_node_range+0x234/0x598
alloc_pages+0xac/0x150
__vmalloc_node_range+0x44c/0x598
Therefore, in two consecutive lines of stacktrace, if the first line
contains the word "alloc_pages" and the second line contains the word
"__vmalloc_node_range", it can be determined that the page is allocated by
vmalloc. And the function offset and size are not the same on different
machines, so there is no need to match them.
At the same time, this patch updates the --cull and --sort options to
support allocator-based merge statistics and sorting. The added functions
are fully compatible with the original work. When using, you can use
"allocator", or abbreviated as "ator". Relevant updates have also been
made in the documentation(Documentation/vm/page_owner.rst).
Example:
./page_owner_sort <input> <output> --cull=st,pid,name,allocator
./page_owner_sort <input> <output> --sort=ator,pid,name
This work is coauthored by Jiajian Ye, Yinan Zhang, Shenghong Han,
Chongxi Zhao, Yuhong Feng and Yongqiang Liu.
Link: https://lkml.kernel.org/r/20220410132932.9402-1-caoyixuan2019@email.szu.edu.cn
Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Cc: Haowen Bai <baihaowen@meizu.com>
Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Sean Anderson <seanga2@gmail.com>
Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn>
Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Yuhong Feng <yuhongf@szu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When viewing page owner information, we may want to sort blocks of
information by multiple keys, since one single key does not uniquely
identify a block. Therefore, following adjustments are made:
1. Add a new --sort option to support sorting blocks of information by
multiple keys.
./page_owner_sort <input> <output> --sort=<order>
./page_owner_sort <input> <output> --sort <order>
<order> is a single argument in the form of a comma-separated list,
which offers a way to specify sorting order.
Sorting syntax is [+|-]key[,[+|-]key[,...]]. The ascending or descending
order can be specified by adding the + (ascending, default) or - (descend
-ing) prefix to the key:
./page_owner_sort <input> <output> [option] --sort -key1,+key2,key3...
For example, to sort the blocks first by task command name in lexicographic
order and then by pid in ascending numerical order, use the following:
./page_owner_sort <input> <output> --sort=name,+pid
To sort the blocks first by pid in ascending order and then by timestamp
of the page when it is allocated in descending order, use the following:
./page_owner_sort <input> <output> --sort=pid,-alloc_ts
2. Add explanations of a newly added --sort option in the function usage()
and the document(Documentation/vm/page_owner.rst).
This work is coauthored by
Yixuan Cao
Shenghong Han
Yinan Zhang
Chongxi Zhao
Yuhong Feng
Yongqiang Liu
Link: https://lkml.kernel.org/r/20220401024856.767-3-yejiajian2018@email.szu.edu.cn
Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn>
Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Yuhong Feng <yuhongf@szu.edu.cn>
Cc: Haowen Bai <baihaowen@meizu.com>
Cc: Sean Anderson <seanga2@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When viewing page owner information, we may want to select blocks whose
PID/TGID/TASK_COMM_NAME appears in a user-specified list for data analysis
and aggregation. But currently page_owner_sort only supports selecting
blocks associated with only one specified PID/TGID/TASK_COMM_NAME.
Therefore, following adjustments are made to fix the problem:
1. Enhance selecting function to support the selection of multiple
PIDs/TGIDs/TASK_COMM_NAMEs.
The enhanced usages are as follows:
--pid <pidlist> Select by pid. This selects the blocks whose PID
numbers appear in <pidlist>.
--tgid <tgidlist> Select by tgid. This selects the blocks whose
TGID numbers appear in <tgidlist>.
--name <cmdlist> Select by task command name. This selects the
blocks whose task command name appear in <cmdlist>.
Where <pidlist>, <tgidlist>, <cmdlist> are single arguments in the form of
a comma-separated list,which offers a way to specify individual selecting
rules.
For example, if you want to select blocks whose tgids are 1, 2 or 3, you
have to use 4 commands as follows:
./page_owner_sort <input> <output1> --tgid=1
./page_owner_sort <input> <output2> --tgid=2
./page_owner_sort <input> <output3> --tgid=3
cat <output1> <output2> <output3> > <output>
With this patch, you can use only 1 command to obtain the same result as
above:
./page_owner_sort <input> <output1> --tgid=1,2,3
2. Update explanations of --pid, --tgid and --name in the function
usage() and the document(Documents/vm/page_owner.rst).
This work is coauthored by
Yixuan Cao
Shenghong Han
Yinan Zhang
Chongxi Zhao
Yuhong Feng
Yongqiang Liu
Link: https://lkml.kernel.org/r/20220401024856.767-2-yejiajian2018@email.szu.edu.cn
Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn>
Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Yuhong Feng <yuhongf@szu.edu.cn>
Cc: Haowen Bai <baihaowen@meizu.com>
Cc: Sean Anderson <seanga2@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
drm-misc-next for 5.19:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- Introduction of display-helper module, and rework of the DP, DSC,
HDCP, HDMI and SCDC headers
- doc: Improvements for tiny drivers, link to external resources
- formats: helper to convert from RGB888 and RGB565 to XRGB8888
- modes: make width-mm/height-mm check mandatory in of_get_drm_panel_display_mode
- ttm: Convert from kvmalloc_array to kvcalloc
Driver Changes:
- bridge:
- analogix_dp: Fix error handling in probe
- dw_hdmi: Coccinelle fixes
- it6505: Fix Kconfig dependency on DRM_DP_AUX_BUS
- panel:
- new panel: DataImage FG040346DSSWBG04
- amdgpu: ttm_eu cleanups
- mxsfb: Rework CRTC mode setting
- nouveau: Make some variables static
- sun4i: Drop drm_display_info.is_hdmi caching, support for the
Allwinner D1
- vc4: Drop drm_display_info.is_hdmi caching
- vmwgfx: Fence improvements
Signed-off-by: Dave Airlie <airlied@redhat.com>
# gpg: Signature made Thu 28 Apr 2022 17:52:13 AEST
# gpg: using EDDSA key 5C1337A45ECA9AEB89060E9EE3EF0D6F671851C5
# gpg: Can't check signature: No public key
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220428075237.yypztjha7hetphcd@houat
Convert the fsl,layerscape-scfg binding to the new YAML format.
In the device trees, the device node always have a "syscon"
compatible, which wasn't mentioned in the previous binding.
Also added, compared to the original binding, is the
interrupt-controller subnode as used in arch/arm/boot/dts/ls1021a.dtsi
as well as the litte-endian and big-endian properties.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220427075338.1156449-5-michael@walle.cc
Convert the fsl,ls-extirq binding to the new YAML format.
In contrast to the original binding documentation, there are three
compatibles which are used in their corresponding device trees which
have a specific compatible and the (already documented) fallback
compatible:
- "fsl,ls1046a-extirq", "fsl,ls1043a-extirq"
- "fsl,ls2080a-extirq", "fsl,ls1088a-extirq"
- "fsl,lx2160a-extirq", "fsl,ls1088a-extirq"
Depending on the number of the number of the external IRQs which is
usually 12 except for the LS1021A where there are only 6, the
interrupt-map-mask was reduced from 0xffffffff to 0xf and 0x7
respectively and the number of interrupt-map entries have to
match.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220427075338.1156449-4-michael@walle.cc
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf and netfilter.
Current release - new code bugs:
- bridge: switchdev: check br_vlan_group() return value
- use this_cpu_inc() to increment net->core_stats, fix preempt-rt
Previous releases - regressions:
- eth: stmmac: fix write to sgmii_adapter_base
Previous releases - always broken:
- netfilter: nf_conntrack_tcp: re-init for syn packets only,
resolving issues with TCP fastopen
- tcp: md5: fix incorrect tcp_header_len for incoming connections
- tcp: fix F-RTO may not work correctly when receiving DSACK
- tcp: ensure use of most recently sent skb when filling rate samples
- tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
- virtio_net: fix wrong buf address calculation when using xdp
- xsk: fix forwarding when combining copy mode with busy poll
- xsk: fix possible crash when multiple sockets are created
- bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from
bpf_xmit lwt hook
- sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event
- wireguard: device: check for metadata_dst with skb_valid_dst()
- netfilter: update ip6_route_me_harder to consider L3 domain
- gre: make o_seqno start from 0 in native mode
- gre: switch o_seqno to atomic to prevent races in collect_md mode
Misc:
- add Eric Dumazet to networking maintainers
- dt: dsa: realtek: remove realtek,rtl8367s string
- netfilter: flowtable: Remove the empty file"
* tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
tcp: fix F-RTO may not work correctly when receiving DSACK
Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
ixgbe: ensure IPsec VF<->PF compatibility
MAINTAINERS: Update BNXT entry with firmware files
netfilter: nft_socket: only do sk lookups when indev is available
net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
bnx2x: fix napi API usage sequence
tls: Skip tls_append_frag on zero copy size
Add Eric Dumazet to networking maintainers
netfilter: conntrack: fix udp offload timeout sysctl
netfilter: nf_conntrack_tcp: re-init for syn packets only
net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
net: Use this_cpu_inc() to increment net->core_stats
Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
Bluetooth: hci_event: Fix creating hci_conn object on error status
Bluetooth: hci_event: Fix checking for invalid handle on error status
ice: fix use-after-free when deinitializing mailbox snapshot
ice: wait 5 s for EMP reset after firmware flash
ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg()
...