1.Refresh defconfig of CI20 to support OTG and RNG.
2.Refresh defconfig of CU1000-Neo to support OTG/RNG/OST/SC16IS752.
3.Refresh defconfig of CU1830-Neo to support OTG/DTRNG/OST/SC16IS752.
Tested-by: 周正 (Zhou Zheng) <sernia.zhou@foxmail.com>
Tested by: H. Nikolaus Schaller <hns@goldelico.com> # CI20/jz4780
Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Pull in mips-fixes to get memblock fix.
- fix bug preventing booting on several platforms
- fix for build error, when modules need has_transparent_hugepage
- fix for memleak in alchemy clk setup
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
If the clk_register fails, we should free h before
function returns to prevent memleak.
Fixes: 474402291a ("MIPS: Alchemy: clock framework integration of onchip clocks")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The loop over all memblocks works with PFNs and not physical
addresses, so we need for_each_mem_pfn_range().
Fixes: b10d6bca87 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Stefan Agner reported a bug when using zsram on 32-bit Arm machines
with RAM above the 4GB address boundary:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = a27bd01c
[00000000] *pgd=236a0003, *pmd=1ffa64003
Internal error: Oops: 207 [#1] SMP ARM
Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
Hardware name: BCM2711
PC is at zs_map_object+0x94/0x338
LR is at zram_bvec_rw.constprop.0+0x330/0xa64
pc : [<c0602b38>] lr : [<c0bda6a0>] psr: 60000013
sp : e376bbe0 ip : 00000000 fp : c1e2921c
r10: 00000002 r9 : c1dda730 r8 : 00000000
r7 : e8ff7a00 r6 : 00000000 r5 : 02f9ffa0 r4 : e3710000
r3 : 000fdffe r2 : c1e0ce80 r1 : ebf979a0 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 30c5383d Table: 235c2a80 DAC: fffffffd
Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
Stack: (0xe376bbe0 to 0xe376c000)
As it turns out, zsram needs to know the maximum memory size, which
is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in
MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture.
The same problem will be hit on all 32-bit architectures that have a
physical address space larger than 4GB and happen to not enable sparsemem
and include asm/sparsemem.h from asm/pgtable.h.
After the initial discussion, I suggested just always defining
MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is
set, or provoking a build error otherwise. This addresses all
configurations that can currently have this runtime bug, but
leaves all other configurations unchanged.
I looked up the possible number of bits in source code and
datasheets, here is what I found:
- on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used
- on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never
support more than 32 bits, even though supersections in theory allow
up to 40 bits as well.
- on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5
XPA supports up to 60 bits in theory, but 40 bits are more than
anyone will ever ship
- On PowerPC, there are three different implementations of 36 bit
addressing, but 32-bit is used without CONFIG_PTE_64BIT
- On RISC-V, the normal page table format can support 34 bit
addressing. There is no highmem support on RISC-V, so anything
above 2GB is unused, but it might be useful to eventually support
CONFIG_ZRAM for high pages.
Fixes: 61989a80fb ("staging: zsmalloc: zsmalloc memory allocation library")
Fixes: 02390b87a9 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS")
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Tested-by: Stefan Agner <stefan@agner.ch>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Commit c3e2ee6574 ("MIPS: generic: Add support for zboot") added
support for self-extracting images to Generic MIPS. However, the
intended way to boot Generic MIPS kernels is using FIT Images and
UHI boot protocol, but currently there's no way to make self-extracting
FIT Image (only legacy uzImages).
Add a target for this named "vmlinuz.itb", which will consist of
vmlinuz.bin and selected DT blobs. It will allow to have the advantages
of both UHI and self-extracting images.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
1. All final targets like vmlinuz.{bin,ecoff,srec} etc. should reside in
$(objtree)/arch/mips/boot, not in the root $(objtree) directory.
The only file that should be left there is vmlinuz, similar to other
architectures.
2. Add all the targets to $(targets) variable, so they'll be properly
accounted by Kbuild. This also allows to remove redundant
$(clean-files) (which were missing uzImage BTW).
3. Prefix all targets with $(obj)/$(objtree), depending on their
locations.
Misc: fix the identation of the 'STRIP' quiet message.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Ocelots is supported by the generic MIPS build so make it clears that
LEGACY_BOARD_OCELOT is only needed for legacy boards which didn't have
bootloader supporting device tree.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Ocelot belongs to a family of SoC named the VCore III. In order to add
these new Soc, use the new symbol SOC_VCOREIII instead of a one
dedicated to Ocelot.
In order to avoid regression on driver building, the MSCC_OCELOT
configuration symbol is kept until the driver will be converted.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
When BCM47XX_BCMA is enabled and BCMA_DRIVER_PCI is disabled, it results
in the following Kbuild warning:
WARNING: unmet direct dependencies detected for BCMA_DRIVER_PCI_HOSTMODE
Depends on [n]: MIPS [=y] && BCMA_DRIVER_PCI [=n] && PCI_DRIVERS_LEGACY [=y] && BCMA [=y]=y
Selected by [y]:
- BCM47XX_BCMA [=y] && BCM47XX [=y] && PCI [=y]
The reason is that BCM47XX_BCMA selects BCMA_DRIVER_PCI_HOSTMODE without
depending on or selecting BCMA_DRIVER_PCI while BCMA_DRIVER_PCI_HOSTMODE
depends on BCMA_DRIVER_PCI. This can also fail building the kernel.
Honor the kconfig dependency to remove unmet direct dependency warnings
and avoid any potential build failures.
Fixes: c1d1c5d421 ("bcm47xx: add support for bcma bus")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209879
Signed-off-by: Necip Fazil Yildiran <fazilyildiran@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
When BCM47XX_SSB is enabled and SSB_PCIHOST is disabled, it results in the
following Kbuild warning:
WARNING: unmet direct dependencies detected for SSB_B43_PCI_BRIDGE
Depends on [n]: SSB [=y] && SSB_PCIHOST [=n]
Selected by [y]:
- BCM47XX_SSB [=y] && BCM47XX [=y] && PCI [=y]
The reason is that BCM47XX_SSB selects SSB_B43_PCI_BRIDGE without
depending on or selecting SSB_PCIHOST while SSB_B43_PCI_BRIDGE depends on
SSB_PCIHOST. This can also fail building the kernel as demonstrated in a
bug report.
Honor the kconfig dependency to remove unmet direct dependency warnings
and avoid any potential build failures.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=210051
Signed-off-by: Necip Fazil Yildiran <fazilyildiran@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
In accordance with the Generic EHCI/OHCI bindings the corresponding node
name is suppose to comply with the Generic USB HCD DT schema, which
requires the USB nodes to have the name acceptable by the regexp:
"^usb(@.*)?" . Make sure the "generic-ehci" and "generic-ohci"-compatible
nodes are correctly named.
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
In accordance with the Generic EHCI/OHCI bindings the corresponding node
name is suppose to comply with the Generic USB HCD DT schema, which
requires the USB nodes to have the name acceptable by the regexp:
"^usb(@.*)?" . Make sure the "generic-ehci" and "generic-ohci"-compatible
nodes are correctly named.
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
In accordance with the Generic EHCI/OHCI bindings the corresponding node
name is suppose to comply with the Generic USB HCD DT schema, which
requires the USB nodes to have the name acceptable by the regexp:
"^usb(@.*)?" . Make sure the "generic-ehci" and "generic-ohci"-compatible
nodes are correctly named.
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Acked-by: Paul Cercueil <paul@crapouillou.net>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
In play_dead function, the whole 64-bit PC mailbox was used as a indicator
to determine if the master core had written boot jump information.
However, after we introduced CSR mailsend, the hardware will not guarante
an atomic write for the 64-bit PC mailbox. Thus we have to use the lower
32-bit which is written at the last as the jump indicator instead.
Signed-off-by: Lu Zeng <zenglu@loongson.cn>
Signed-off-by: Jun Yi <yijun@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Loongson 3A4000+ CPU has per-core Mail_Send register to send mail,
there is no need to maintain register address of each core and node,
just simply specify cpu number.
Signed-off-by: Lu Zeng <zenglu@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
In the current code, for example, core 1 sets Core[0, 1, 2, 3]_IPI_Enalbe
register and core 2, 3 do the same thing on the 1-way Loongson64 platform,
this is not necessary. Set IPI_Enable register per core by itself to avoid
duplicate operations and make the logic more clear.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The field ELPA of CP0_PAGEGRAIN register is set at the beginning
of the kernel entry point in kernel-entry-init.h, no need to set
it again in numa.c, we can remove enable_lpa() and only print the
related information.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The field LPA of CP0_CONFIG3 register is read only for Loongson64, so the
write operations are meaningless, remove them.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Everything in arch/mips/include/uapi/asm/types.h is protected by
"#ifndef __KERNEL__", so it's unused for kernelspace.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
This driver transports LAPB (X.25 link layer) frames over TTY links.
I can safely say that this driver has no actual user because it was
not working at all until:
commit 8fdcabeac3 ("drivers/net/wan/x25_asy: Fix to make it work")
The code in its current state still has problems:
1.
The uses of "struct x25_asy" in x25_asy_unesc (when receiving) and in
x25_asy_write_wakeup (when sending) are not protected by locks against
x25_asy_change_mtu's changing of the transmitting/receiving buffers.
Also, all "netif_running" checks in this driver are not protected by
locks against the ndo_stop function.
2.
The driver stops all TTY read/write when the netif is down.
I think this is not right because this may cause the last outgoing frame
before the netif goes down to be incompletely transmitted, and the first
incoming frame after the netif goes up to be incompletely received.
And there may also be other problems.
I was planning to fix these problems but after recent discussions about
deleting other old networking code, I think we may just delete this
driver, too.
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20201105073434.429307-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
add missing iounmap() of pciu_base on error when failed to init
io_map_base.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Add the missing iounmap() of iounmap(mips_gcr_base) before
return from mips_cm_probe() in the error handling case.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Add the missing iounmap() of gpch->regs before return from
ar7_gpio_init() in the error handling case.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
This will make a separated section for dtb appear in ELF, and we can
then use objcopy to patch a dtb into vmlinuz when RAW_APPENDED_DTB
is set in kernel config.
command to patch a dtb:
objcopy --set-section-flags=.appended_dtb=alloc,contents \
--update-section=.appended_dtb=<target>.dtb vmlinuz
Signed-off-by: Chuanhong Guo <gch981213@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Drop the separate Kconfig symbol for Xircom / Entrega and always include
support in the keyspan_pda driver.
Note that all configs that enabled CONFIG_USB_SERIAL_XIRCOM also enable
CONFIG_USB_SERIAL_KEYSPAN_PDA.
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Almost all machines use GENERIC_CLOCKEVENTS, so it feels wrong to
require each one to select that symbol manually.
Instead, enable it whenever CONFIG_LEGACY_TIMER_TICK is disabled as
a simplification. It should be possible to select both
GENERIC_CLOCKEVENTS and LEGACY_TIMER_TICK from an architecture now
and decide at runtime between the two.
For the clockevents arch-support.txt file, this means that additional
architectures are marked as TODO when they have at least one machine
that still uses LEGACY_TIMER_TICK, rather than being marked 'ok' when
at least one machine has been converted. This means that both m68k and
arm (for riscpc) revert to TODO.
At this point, we could just always enable CONFIG_GENERIC_CLOCKEVENTS
rather than leaving it off when not needed. I built an m68k
defconfig kernel (using gcc-10.1.0) and found that this would add
around 5.5KB in kernel image size:
text data bss dec hex filename
3861936 1092236 196656 5150828 4e986c obj-m68k/vmlinux-no-clockevent
3866201 1093832 196184 5156217 4ead79 obj-m68k/vmlinux-clockevent
On Arm (MACH_RPC), that difference appears to be twice as large,
around 11KB on top of an 6MB vmlinux.
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Add /sys/firmware/lefi/boardinfo to get mainboard and BIOS info easily
on the Loongson platform, this is useful to point out the current used
mainboard type and BIOS version when there exists problems related with
hardware or firmware.
The related structures are already defined in the interface specification
about firmware and kernel which are common requirement and specific for
Loongson64, so only add a new boardinfo.c file in arch/mips/loongson64.
E.g. with this patch:
[loongson@linux ~]$ cat /sys/firmware/lefi/boardinfo
Board Info
Manufacturer : LEMOTE
Board Name : LEMOTE-LS3A4000-7A1000-1w-V01-pc
Family : LOONGSON3
BIOS Info
Vendor : Kunlun
Version : Kunlun-A1901-V4.1.3-20200414093938
ROM Size : 4 KB
Release Date : 2020-04-14
By the way, using dmidecode command can get the similar info if there
exists SMBIOS in firmware, but the fact is that there is no SMBIOS on
some machines, we can see nothing when execute dmidecode, like this:
[root@linux loongson]# dmidecode
# dmidecode 2.12
# No SMBIOS nor DMI entry point found, sorry.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Pull ARM SoC-related driver updates from Olof Johansson:
"Various driver updates for platforms. A bulk of this is smaller fixes
or cleanups, but some of the new material this time around is:
- Support for Nvidia Tegra234 SoC
- Ring accelerator support for TI AM65x
- PRUSS driver for TI platforms
- Renesas support for R-Car V3U SoC
- Reset support for Cortex-M4 processor on i.MX8MQ
There are also new socinfo entries for a handful of different SoCs and
platforms"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (131 commits)
drm/mediatek: reduce clear event
soc: mediatek: cmdq: add clear option in cmdq_pkt_wfe api
soc: mediatek: cmdq: add jump function
soc: mediatek: cmdq: add write_s_mask value function
soc: mediatek: cmdq: add write_s value function
soc: mediatek: cmdq: add read_s function
soc: mediatek: cmdq: add write_s_mask function
soc: mediatek: cmdq: add write_s function
soc: mediatek: cmdq: add address shift in jump
soc: mediatek: mtk-infracfg: Fix kerneldoc
soc: amlogic: pm-domains: use always-on flag
reset: sti: reset-syscfg: fix struct description warnings
reset: imx7: add the cm4 reset for i.MX8MQ
dt-bindings: reset: imx8mq: add m4 reset
reset: Fix and extend kerneldoc
reset: reset-zynqmp: Added support for Versal platform
dt-bindings: reset: Updated binding for Versal reset driver
reset: imx7: Support module build
soc: fsl: qe: Remove unnessesary check in ucc_set_tdm_rxtx_clk
soc: fsl: qman: convert to use be32_add_cpu()
...
Pull KVM updates from Paolo Bonzini:
"For x86, there is a new alternative and (in the future) more scalable
implementation of extended page tables that does not need a reverse
map from guest physical addresses to host physical addresses.
For now it is disabled by default because it is still lacking a few of
the existing MMU's bells and whistles. However it is a very solid
piece of work and it is already available for people to hammer on it.
Other updates:
ARM:
- New page table code for both hypervisor and guest stage-2
- Introduction of a new EL2-private host context
- Allow EL2 to have its own private per-CPU variables
- Support of PMU event filtering
- Complete rework of the Spectre mitigation
PPC:
- Fix for running nested guests with in-kernel IRQ chip
- Fix race condition causing occasional host hard lockup
- Minor cleanups and bugfixes
x86:
- allow trapping unknown MSRs to userspace
- allow userspace to force #GP on specific MSRs
- INVPCID support on AMD
- nested AMD cleanup, on demand allocation of nested SVM state
- hide PV MSRs and hypercalls for features not enabled in CPUID
- new test for MSR_IA32_TSC writes from host and guest
- cleanups: MMU, CPUID, shared MSRs
- LAPIC latency optimizations ad bugfixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits)
kvm: x86/mmu: NX largepage recovery for TDP MMU
kvm: x86/mmu: Don't clear write flooding count for direct roots
kvm: x86/mmu: Support MMIO in the TDP MMU
kvm: x86/mmu: Support write protection for nesting in tdp MMU
kvm: x86/mmu: Support disabling dirty logging for the tdp MMU
kvm: x86/mmu: Support dirty logging for the TDP MMU
kvm: x86/mmu: Support changed pte notifier in tdp MMU
kvm: x86/mmu: Add access tracking for tdp_mmu
kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU
kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
kvm: x86/mmu: Add TDP MMU PF handler
kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg
kvm: x86/mmu: Support zapping SPTEs in the TDP MMU
KVM: Cache as_id in kvm_memory_slot
kvm: x86/mmu: Add functions to handle changed TDP SPTEs
kvm: x86/mmu: Allocate and free TDP MMU roots
kvm: x86/mmu: Init / Uninit the TDP MMU
kvm: x86/mmu: Introduce tdp_iter
KVM: mmu: extract spte.h and spte.c
KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp
...
Pull arch task_work cleanups from Jens Axboe:
"Two cleanups that don't fit other categories:
- Finally get the task_work_add() cleanup done properly, so we don't
have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
all callers, and also fixes up the documentation for
task_work_add().
- While working on some TIF related changes for 5.11, this
TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
duplication for how that is handled"
* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
task_work: cleanup notification modes
tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
Pull Kbuild updates from Masahiro Yamada:
- Support 'make compile_commands.json' to generate the compilation
database more easily, avoiding stale entries
- Support 'make clang-analyzer' and 'make clang-tidy' for static checks
using clang-tidy
- Preprocess scripts/modules.lds.S to allow CONFIG options in the
module linker script
- Drop cc-option tests from compiler flags supported by our minimal
GCC/Clang versions
- Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y
- Use sha1 build id for both BFD linker and LLD
- Improve deb-pkg for reproducible builds and rootless builds
- Remove stale, useless scripts/namespace.pl
- Turn -Wreturn-type warning into error
- Fix build error of deb-pkg when CONFIG_MODULES=n
- Replace 'hostname' command with more portable 'uname -n'
- Various Makefile cleanups
* tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
kbuild: Use uname for LINUX_COMPILE_HOST detection
kbuild: Only add -fno-var-tracking-assignments for old GCC versions
kbuild: remove leftover comment for filechk utility
treewide: remove DISABLE_LTO
kbuild: deb-pkg: clean up package name variables
kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n
kbuild: enforce -Werror=return-type
scripts: remove namespace.pl
builddeb: Add support for all required debian/rules targets
builddeb: Enable rootless builds
builddeb: Pass -n to gzip for reproducible packages
kbuild: split the build log of kallsyms
kbuild: explicitly specify the build id style
scripts/setlocalversion: make git describe output more reliable
kbuild: remove cc-option test of -Werror=date-time
kbuild: remove cc-option test of -fno-stack-check
kbuild: remove cc-option test of -fno-strict-overflow
kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles
kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan
kbuild: do not create built-in objects for external module builds
...
Pull initial set_fs() removal from Al Viro:
"Christoph's set_fs base series + fixups"
* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Allow a NULL pos pointer to __kernel_read
fs: Allow a NULL pos pointer to __kernel_write
powerpc: remove address space overrides using set_fs()
powerpc: use non-set_fs based maccess routines
x86: remove address space overrides using set_fs()
x86: make TASK_SIZE_MAX usable from assembly code
x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
lkdtm: remove set_fs-based tests
test_bitmap: remove user bitmap tests
uaccess: add infrastructure for kernel builds with set_fs()
fs: don't allow splice read/write without explicit ops
fs: don't allow kernel reads and writes without iter ops
sysctl: Convert to iter interfaces
proc: add a read_iter method to proc proc_ops
proc: cleanup the compat vs no compat file ops
proc: remove a level of indentation in proc_get_inode
There is usecase that System Management Software(SMS) want to give a
memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the
case of Android, it is the ActivityManagerService.
The information required to make the reclaim decision is not known to the
app. Instead, it is known to the centralized userspace
daemon(ActivityManagerService), and that daemon must be able to initiate
reclaim on its own without any app involvement.
To solve the issue, this patch introduces a new syscall
process_madvise(2). It uses pidfd of an external process to give the
hint. It also supports vector address range because Android app has
thousands of vmas due to zygote so it's totally waste of CPU and power if
we should call the syscall one by one for each vma.(With testing 2000-vma
syscall vs 1-vector syscall, it showed 15% performance improvement. I
think it would be bigger in real practice because the testing ran very
cache friendly environment).
Another potential use case for the vector range is to amortize the cost
ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
benefit users like TCP receive zerocopy and malloc implementations. In
future, we could find more usecases for other advises so let's make it
happens as API since we introduce a new syscall at this moment. With
that, existing madvise(2) user could replace it with process_madvise(2)
with their own pid if they want to have batch address ranges support
feature.
ince it could affect other process's address range, only privileged
process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same
UID) gives it the right to ptrace the process could use it successfully.
The flag argument is reserved for future use if we need to extend the API.
I think supporting all hints madvise has/will supported/support to
process_madvise is rather risky. Because we are not sure all hints make
sense from external process and implementation for the hint may rely on
the caller being in the current context so it could be error-prone. Thus,
I just limited hints as MADV_[COLD|PAGEOUT] in this patch.
If someone want to add other hints, we could hear the usecase and review
it for each hint. It's safer for maintenance rather than introducing a
buggy syscall but hard to fix it later.
So finally, the API is as follows,
ssize_t process_madvise(int pidfd, const struct iovec *iovec,
unsigned long vlen, int advice, unsigned int flags);
DESCRIPTION
The process_madvise() system call is used to give advice or directions
to the kernel about the address ranges from external process as well as
local process. It provides the advice to address ranges of process
described by iovec and vlen. The goal of such advice is to improve
system or application performance.
The pidfd selects the process referred to by the PID file descriptor
specified in pidfd. (See pidofd_open(2) for further information)
The pointer iovec points to an array of iovec structures, defined in
<sys/uio.h> as:
struct iovec {
void *iov_base; /* starting address */
size_t iov_len; /* number of bytes to be advised */
};
The iovec describes address ranges beginning at address(iov_base)
and with size length of bytes(iov_len).
The vlen represents the number of elements in iovec.
The advice is indicated in the advice argument, which is one of the
following at this moment if the target process specified by pidfd is
external.
MADV_COLD
MADV_PAGEOUT
Permission to provide a hint to external process is governed by a
ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).
The process_madvise supports every advice madvise(2) has if target
process is in same thread group with calling process so user could
use process_madvise(2) to extend existing madvise(2) to support
vector address ranges.
RETURN VALUE
On success, process_madvise() returns the number of bytes advised.
This return value may be less than the total number of requested
bytes, if an error occurred. The caller should check return value
to determine whether a partial advice occurred.
FAQ:
Q.1 - Why does any external entity have better knowledge?
Quote from Sandeep
"For Android, every application (including the special SystemServer)
are forked from Zygote. The reason of course is to share as many
libraries and classes between the two as possible to benefit from the
preloading during boot.
After applications start, (almost) all of the APIs end up calling into
this SystemServer process over IPC (binder) and back to the
application.
In a fully running system, the SystemServer monitors every single
process periodically to calculate their PSS / RSS and also decides
which process is "important" to the user for interactivity.
So, because of how these processes start _and_ the fact that the
SystemServer is looping to monitor each process, it does tend to *know*
which address range of the application is not used / useful.
Besides, we can never rely on applications to clean things up
themselves. We've had the "hey app1, the system is low on memory,
please trim your memory usage down" notifications for a long time[1].
They rely on applications honoring the broadcasts and very few do.
So, if we want to avoid the inevitable killing of the application and
restarting it, some way to be able to tell the OS about unimportant
memory in these applications will be useful.
- ssp
Q.2 - How to guarantee the race(i.e., object validation) between when
giving a hint from an external process and get the hint from the target
process?
process_madvise operates on the target process's address space as it
exists at the instant that process_madvise is called. If the space
target process can run between the time the process_madvise process
inspects the target process address space and the time that
process_madvise is actually called, process_madvise may operate on
memory regions that the calling process does not expect. It's the
responsibility of the process calling process_madvise to close this
race condition. For example, the calling process can suspend the
target process with ptrace, SIGSTOP, or the freezer cgroup so that it
doesn't have an opportunity to change its own address space before
process_madvise is called. Another option is to operate on memory
regions that the caller knows a priori will be unchanged in the target
process. Yet another option is to accept the race for certain
process_madvise calls after reasoning that mistargeting will do no
harm. The suggested API itself does not provide synchronization. It
also apply other APIs like move_pages, process_vm_write.
The race isn't really a problem though. Why is it so wrong to require
that callers do their own synchronization in some manner? Nobody
objects to write(2) merely because it's possible for two processes to
open the same file and clobber each other's writes --- instead, we tell
people to use flock or something. Think about mmap. It never
guarantees newly allocated address space is still valid when the user
tries to access it because other threads could unmap the memory right
before. That's where we need synchronization by using other API or
design from userside. It shouldn't be part of API itself. If someone
needs more fine-grained synchronization rather than process level,
there were two ideas suggested - cookie[2] and anon-fd[3]. Both are
applicable via using last reserved argument of the API but I don't
think it's necessary right now since we have already ways to prevent
the race so don't want to add additional complexity with more
fine-grained optimization model.
To make the API extend, it reserved an unsigned long as last argument
so we could support it in future if someone really needs it.
Q.3 - Why doesn't ptrace work?
Injecting an madvise in the target process using ptrace would not work
for us because such injected madvise would have to be executed by the
target process, which means that process would have to be runnable and
that creates the risk of the abovementioned race and hinting a wrong
VMA. Furthermore, we want to act the hint in caller's context, not the
callee's, because the callee is usually limited in cpuset/cgroups or
even freezed state so they can't act by themselves quick enough, which
causes more thrashing/kill. It doesn't work if the target process are
ptraced(e.g., strace, debugger, minidump) because a process can have at
most one ptracer.
[1] https://developer.android.com/topic/performance/memory"
[2] process_getinfo for getting the cookie which is updated whenever
vma of process address layout are changed - Daniel Colascione -
https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224
[3] anonymous fd which is used for the object(i.e., address range)
validation - Michal Hocko -
https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/
[minchan@kernel.org: fix process_madvise build break for arm64]
Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
[minchan@kernel.org: fix build error for mips of process_madvise]
Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
[akpm@linux-foundation.org: fix patch ordering issue]
[akpm@linux-foundation.org: fix arm64 whoops]
[minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
[akpm@linux-foundation.org: fix i386 build]
[sfr@canb.auug.org.au: fix syscall numbering]
Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
[sfr@canb.auug.org.au: madvise.c needs compat.h]
Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
[minchan@kernel.org: fix mips build]
Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
[yuehaibing@huawei.com: remove duplicate header which is included twice]
Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
[minchan@kernel.org: do not use helper functions for process_madvise]
Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
[akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
[sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All the callers currently do this, clean it up and move the clearing
into tracehook_notify_resume() instead.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull MIPS updates from Thomas Bogendoerfer:
- removed support for PNX833x alias NXT_STB22x
- included Ingenic SoC support into generic MIPS kernels
- added support for new Ingenic SoCs
- converted workaround selection to use Kconfig
- replaced old boot mem functions by memblock_*
- enabled COP2 usage in kernel for Loongson64 to make use
of 16byte load/stores possible
- cleanups and fixes
* tag 'mips_5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (92 commits)
MIPS: DEC: Restore bootmem reservation for firmware working memory area
MIPS: dec: fix section mismatch
bcm963xx_tag.h: fix duplicated word
mips: ralink: enable zboot support
MIPS: ingenic: Remove CPU_SUPPORTS_HUGEPAGES
MIPS: cpu-probe: remove MIPS_CPU_BP_GHIST option bit
MIPS: cpu-probe: introduce exclusive R3k CPU probe
MIPS: cpu-probe: move fpu probing/handling into its own file
MIPS: replace add_memory_region with memblock
MIPS: Loongson64: Clean up numa.c
MIPS: Loongson64: Select SMP in Kconfig to avoid build error
mips: octeon: Add Ubiquiti E200 and E220 boards
MIPS: SGI-IP28: disable use of ll/sc in kernel
MIPS: tx49xx: move tx4939_add_memory_regions into only user
MIPS: pgtable: Remove used PAGE_USERIO define
MIPS: alchemy: Share prom_init implementation
MIPS: alchemy: Fix build breakage, if TOUCHSCREEN_WM97XX is disabled
MIPS: process: include exec.h header in process.c
MIPS: process: Add prototype for function arch_dup_task_struct
MIPS: idle: Add prototype for function check_wait
...