Commit Graph

12886 Commits

Author SHA1 Message Date
周琰杰 (Zhou Yanjie)
ab3040e137 MIPS: Ingenic: Add MAC syscon nodes for Ingenic SoCs.
Add MAC syscon nodes for X1000 SoC and X1830 SoC from Ingenic.

Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Acked-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-06-30 14:37:16 +02:00
周琰杰 (Zhou Yanjie)
579f73cf84 MIPS: X1830: Respect cell count of common properties.
If N fields of X cells should be provided, then that's what the
devicetree should represent, instead of having one single field of
(N * X) cells.

Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Acked-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-06-30 14:37:16 +02:00
Linus Torvalds
65090f30ab Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "191 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab,
  slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap,
  mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization,
  pagealloc, and memory-failure)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits)
  mm,hwpoison: make get_hwpoison_page() call get_any_page()
  mm,hwpoison: send SIGBUS with error virutal address
  mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes
  mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
  mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM
  mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
  docs: remove description of DISCONTIGMEM
  arch, mm: remove stale mentions of DISCONIGMEM
  mm: remove CONFIG_DISCONTIGMEM
  m68k: remove support for DISCONTIGMEM
  arc: remove support for DISCONTIGMEM
  arc: update comment about HIGHMEM implementation
  alpha: remove DISCONTIGMEM and NUMA
  mm/page_alloc: move free_the_page
  mm/page_alloc: fix counting of managed_pages
  mm/page_alloc: improve memmap_pages dbg msg
  mm: drop SECTION_SHIFT in code comments
  mm/page_alloc: introduce vm.percpu_pagelist_high_fraction
  mm/page_alloc: limit the number of pages on PCP lists when reclaim is active
  mm/page_alloc: scale the number of pages that are batch freed
  ...
2021-06-29 17:29:11 -07:00
Linus Torvalds
21edf50948 Merge tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "Updates for the interrupt subsystem:

  Core changes:

   - Cleanup and simplification of common code to invoke the low level
     interrupt flow handlers when this invocation requires irqdomain
     resolution. Add the necessary core infrastructure.

   - Provide a proper interface for modular PMU drivers to set the
     interrupt affinity.

   - Add a request flag which allows to exclude interrupts from spurious
     interrupt detection. Useful especially for IPI handlers which
     always return IRQ_HANDLED which turns the spurious interrupt
     detection into a pointless waste of CPU cycles.

  Driver changes:

   - Bulk convert interrupt chip drivers to the new irqdomain low level
     flow handler invocation mechanism.

   - Add device tree bindings for the Renesas R-Car M3-W+ SoC

   - Enable modular build of the Qualcomm PDC driver

   - The usual small fixes and improvements"

* tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  dt-bindings: interrupt-controller: arm,gic-v3: Describe GICv3 optional properties
  irqchip: gic-pm: Remove redundant error log of clock bulk
  irqchip/sun4i: Remove unnecessary oom message
  irqchip/irq-imx-gpcv2: Remove unnecessary oom message
  irqchip/imgpdc: Remove unnecessary oom message
  irqchip/gic-v3-its: Remove unnecessary oom message
  irqchip/gic-v2m: Remove unnecessary oom message
  irqchip/exynos-combiner: Remove unnecessary oom message
  irqchip: Bulk conversion to generic_handle_domain_irq()
  genirq: Move non-irqdomain handle_domain_irq() handling into ARM's handle_IRQ()
  genirq: Add generic_handle_domain_irq() helper
  irqchip/nvic: Convert from handle_IRQ() to handle_domain_irq()
  irqdesc: Fix __handle_domain_irq() comment
  genirq: Use irq_resolve_mapping() to implement __handle_domain_irq() and co
  irqdomain: Introduce irq_resolve_mapping()
  irqdomain: Protect the linear revmap with RCU
  irqdomain: Cache irq_data instead of a virq number in the revmap
  irqdomain: Use struct_size() helper when allocating irqdomain
  irqdomain: Make normal and nomap irqdomains exclusive
  powerpc: Move the use of irq_domain_add_nomap() behind a config option
  ...
2021-06-29 12:25:04 -07:00
Mike Rapoport
a9ee6cf5c6 mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA
configuration options are equivalent.

Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead.

Done with

	$ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \
		$(git grep -wl CONFIG_NEED_MULTIPLE_NODES)
	$ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \
		$(git grep -wl NEED_MULTIPLE_NODES)

with manual tweaks afterwards.

[rppt@linux.ibm.com: fix arm boot crash]
  Link: https://lkml.kernel.org/r/YMj9vHhHOiCVN4BF@linux.ibm.com

Link: https://lkml.kernel.org/r/20210608091316.3622-9-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:55 -07:00
Mike Rapoport
d3c251ab95 arch, mm: remove stale mentions of DISCONIGMEM
There are several places that mention DISCONIGMEM in comments or have
stale code guarded by CONFIG_DISCONTIGMEM.

Remove the dead code and update the comments.

Link: https://lkml.kernel.org/r/20210608091316.3622-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:55 -07:00
Liam Howlett
7f7020ac0d arch/mips/kernel/traps: use vma_lookup() instead of find_vma()
Use vma_lookup() to find the VMA at a specific address.  As vma_lookup()
will return NULL if the address is not within any VMA, the start address
no longer needs to be validated.

Link: https://lkml.kernel.org/r/20210521174745.2219620-8-Liam.Howlett@Oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:51 -07:00
Nick Desaulniers
c994a3ec7e MIPS: set mips32r5 for virt extensions
Clang's integrated assembler only accepts these instructions when the
cpu is set to mips32r5. With this change, we can assemble
malta_defconfig with Clang via `make LLVM_IAS=1`.

Link: https://github.com/ClangBuiltLinux/linux/issues/763
Reported-by: Dmitry Golovin <dima@golovin.in>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-06-29 10:23:57 +02:00
zhanglianjie
6817c94443 MIPS: loongsoon64: Reserve memory below starting pfn to prevent Oops
The cause of the problem is as follows:
1. when cat /sys/devices/system/memory/memory0/valid_zones,
   test_pages_in_a_zone() will be called.
2. test_pages_in_a_zone() finds the zone according to stat_pfn = 0.
   The smallest pfn of the numa node in the mips architecture is 128,
   and the page corresponding to the previous 0~127 pfn is not
   initialized (page->flags is 0xFFFFFFFF)
3. The nid and zonenum obtained using page_zone(pfn_to_page(0)) are out
   of bounds in the corresponding array,
   &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)],
   access to the out-of-bounds zone member variables appear abnormal,
   resulting in Oops.
Therefore, it is necessary to keep the page between 0 and the minimum
pfn to prevent Oops from appearing.

Signed-off-by: zhanglianjie <zhanglianjie@uniontech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-06-29 10:15:46 +02:00
Paul Cercueil
cad065ed8d MIPS: MT extensions are not available on MIPS32r1
MIPS MT extensions were added with the MIPS 34K processor, which was
based on the MIPS32r2 ISA.

This fixes a build error when building a generic kernel for a MIPS32r1
CPU.

Fixes: c434b9f80b ("MIPS: Kconfig: add MIPS_GENERIC_KERNEL symbol")
Cc: stable@vger.kernel.org # v5.9
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-06-29 10:14:58 +02:00
Linus Torvalds
36824f198c Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "This covers all architectures (except MIPS) so I don't expect any
  other feature pull requests this merge window.

  ARM:

   - Add MTE support in guests, complete with tag save/restore interface

   - Reduce the impact of CMOs by moving them in the page-table code

   - Allow device block mappings at stage-2

   - Reduce the footprint of the vmemmap in protected mode

   - Support the vGIC on dumb systems such as the Apple M1

   - Add selftest infrastructure to support multiple configuration and
     apply that to PMU/non-PMU setups

   - Add selftests for the debug architecture

   - The usual crop of PMU fixes

  PPC:

   - Support for the H_RPT_INVALIDATE hypercall

   - Conversion of Book3S entry/exit to C

   - Bug fixes

  S390:

   - new HW facilities for guests

   - make inline assembly more robust with KASAN and co

  x86:

   - Allow userspace to handle emulation errors (unknown instructions)

   - Lazy allocation of the rmap (host physical -> guest physical
     address)

   - Support for virtualizing TSC scaling on VMX machines

   - Optimizations to avoid shattering huge pages at the beginning of
     live migration

   - Support for initializing the PDPTRs without loading them from
     memory

   - Many TLB flushing cleanups

   - Refuse to load if two-stage paging is available but NX is not (this
     has been a requirement in practice for over a year)

   - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
     CR0/CR4/EFER, using the MMU mode everywhere once it is computed
     from the CPU registers

   - Use PM notifier to notify the guest about host suspend or hibernate

   - Support for passing arguments to Hyper-V hypercalls using XMM
     registers

   - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap
     on AMD processors

   - Hide Hyper-V hypercalls that are not included in the guest CPUID

   - Fixes for live migration of virtual machines that use the Hyper-V
     "enlightened VMCS" optimization of nested virtualization

   - Bugfixes (not many)

  Generic:

   - Support for retrieving statistics without debugfs

   - Cleanups for the KVM selftests API"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (314 commits)
  KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled
  kvm: x86: disable the narrow guest module parameter on unload
  selftests: kvm: Allows userspace to handle emulation errors.
  kvm: x86: Allow userspace to handle emulation errors
  KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on
  KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault
  KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault
  KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT
  KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic
  KVM: x86: Enhance comments for MMU roles and nested transition trickiness
  KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE
  KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU
  KVM: x86/mmu: Use MMU's role to determine PTTYPE
  KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers
  KVM: x86/mmu: Add a helper to calculate root from role_regs
  KVM: x86/mmu: Add helper to update paging metadata
  KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0
  KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls
  KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper
  KVM: x86/mmu: Get nested MMU's root level from the MMU's role
  ...
2021-06-28 15:40:51 -07:00
Linus Torvalds
54a728dc5e Merge tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler udpates from Ingo Molnar:

 - Changes to core scheduling facilities:

    - Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
      coordinated scheduling across SMT siblings. This is a much
      requested feature for cloud computing platforms, to allow the
      flexible utilization of SMT siblings, without exposing untrusted
      domains to information leaks & side channels, plus to ensure more
      deterministic computing performance on SMT systems used by
      heterogenous workloads.

      There are new prctls to set core scheduling groups, which allows
      more flexible management of workloads that can share siblings.

    - Fix task->state access anti-patterns that may result in missed
      wakeups and rename it to ->__state in the process to catch new
      abuses.

 - Load-balancing changes:

    - Tweak newidle_balance for fair-sched, to improve 'memcache'-like
      workloads.

    - "Age" (decay) average idle time, to better track & improve
      workloads such as 'tbench'.

    - Fix & improve energy-aware (EAS) balancing logic & metrics.

    - Fix & improve the uclamp metrics.

    - Fix task migration (taskset) corner case on !CONFIG_CPUSET.

    - Fix RT and deadline utilization tracking across policy changes

    - Introduce a "burstable" CFS controller via cgroups, which allows
      bursty CPU-bound workloads to borrow a bit against their future
      quota to improve overall latencies & batching. Can be tweaked via
      /sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.

    - Rework assymetric topology/capacity detection & handling.

 - Scheduler statistics & tooling:

    - Disable delayacct by default, but add a sysctl to enable it at
      runtime if tooling needs it. Use static keys and other
      optimizations to make it more palatable.

    - Use sched_clock() in delayacct, instead of ktime_get_ns().

 - Misc cleanups and fixes.

* tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  sched/doc: Update the CPU capacity asymmetry bits
  sched/topology: Rework CPU capacity asymmetry detection
  sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
  psi: Fix race between psi_trigger_create/destroy
  sched/fair: Introduce the burstable CFS controller
  sched/uclamp: Fix uclamp_tg_restrict()
  sched/rt: Fix Deadline utilization tracking during policy change
  sched/rt: Fix RT utilization tracking during policy change
  sched: Change task_struct::state
  sched,arch: Remove unused TASK_STATE offsets
  sched,timer: Use __set_current_state()
  sched: Add get_current_state()
  sched,perf,kvm: Fix preemption condition
  sched: Introduce task_is_running()
  sched: Unbreak wakeups
  sched/fair: Age the average idle time
  sched/cpufreq: Consider reduced CPU capacity in energy calculation
  sched/fair: Take thermal pressure into account while estimating energy
  thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
  sched/fair: Return early from update_tg_cfs_load() if delta == 0
  ...
2021-06-28 12:14:19 -07:00
Linus Torvalds
28a27cbd86 Merge tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:

 - Platform PMU driver updates:

     - x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers
     - Fix RDPMC support
     - Fix [extended-]PEBS-via-PT support
     - Fix Sapphire Rapids event constraints
     - Fix :ppp support on Sapphire Rapids
     - Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU
     - Other heterogenous-PMU fixes

 - Kprobes:

     - Remove the unused and misguided kprobe::fault_handler callbacks.
     - Warn about kprobes taking a page fault.
     - Fix the 'nmissed' stat counter.

 - Misc cleanups and fixes.

* tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix task context PMU for Hetero
  perf/x86/intel: Fix instructions:ppp support in Sapphire Rapids
  perf/x86/intel: Add more events requires FRONTEND MSR on Sapphire Rapids
  perf/x86/intel: Fix fixed counter check warning for some Alder Lake
  perf/x86/intel: Fix PEBS-via-PT reload base value for Extended PEBS
  perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  kprobes: Do not increment probe miss count in the fault handler
  x86,kprobes: WARN if kprobes tries to handle a fault
  kprobes: Remove kprobe::fault_handler
  uprobes: Update uprobe_write_opcode() kernel-doc comment
  perf/hw_breakpoint: Fix DocBook warnings in perf hw_breakpoint
  perf/core: Fix DocBook warnings
  perf/core: Make local function perf_pmu_snapshot_aux() static
  perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX
  perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNR
  perf/x86/intel/uncore: Generalize I/O stacks to PMON mapping procedure
  perf/x86/intel/uncore: Drop unnecessary NULL checks after container_of()
2021-06-28 12:03:20 -07:00
Linus Torvalds
a15286c63d Merge tag 'locking-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:

 - Core locking & atomics:

     - Convert all architectures to ARCH_ATOMIC: move every architecture
       to ARCH_ATOMIC, then get rid of ARCH_ATOMIC and all the
       transitory facilities and #ifdefs.

       Much reduction in complexity from that series:

           63 files changed, 756 insertions(+), 4094 deletions(-)

     - Self-test enhancements

 - Futexes:

     - Add the new FUTEX_LOCK_PI2 ABI, which is a variant that doesn't
       set FLAGS_CLOCKRT (.e. uses CLOCK_MONOTONIC).

       [ The temptation to repurpose FUTEX_LOCK_PI's implicit setting of
         FLAGS_CLOCKRT & invert the flag's meaning to avoid having to
         introduce a new variant was resisted successfully. ]

     - Enhance futex self-tests

 - Lockdep:

     - Fix dependency path printouts

     - Optimize trace saving

     - Broaden & fix wait-context checks

 - Misc cleanups and fixes.

* tag 'locking-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  locking/lockdep: Correct the description error for check_redundant()
  futex: Provide FUTEX_LOCK_PI2 to support clock selection
  futex: Prepare futex_lock_pi() for runtime clock selection
  lockdep/selftest: Remove wait-type RCU_CALLBACK tests
  lockdep/selftests: Fix selftests vs PROVE_RAW_LOCK_NESTING
  lockdep: Fix wait-type for empty stack
  locking/selftests: Add a selftest for check_irq_usage()
  lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()
  locking/lockdep: Remove the unnecessary trace saving
  locking/lockdep: Fix the dep path printing for backwards BFS
  selftests: futex: Add futex compare requeue test
  selftests: futex: Add futex wait test
  seqlock: Remove trailing semicolon in macros
  locking/lockdep: Reduce LOCKDEP dependency list
  locking/lockdep,doc: Improve readability of the block matrix
  locking/atomics: atomic-instrumented: simplify ifdeffery
  locking/atomic: delete !ARCH_ATOMIC remnants
  locking/atomic: xtensa: move to ARCH_ATOMIC
  locking/atomic: sparc: move to ARCH_ATOMIC
  locking/atomic: sh: move to ARCH_ATOMIC
  ...
2021-06-28 11:45:29 -07:00
Paolo Bonzini
b8917b4ae4 Merge tag 'kvmarm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for v5.14.

- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
  and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
2021-06-25 11:24:24 -04:00
Jing Zhang
bc9e9e672d KVM: debugfs: Reuse binary stats descriptors
To remove code duplication, use the binary stats descriptors in the
implementation of the debugfs interface for statistics. This unifies
the definition of statistics for the binary and debugfs interfaces.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-8-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:29 -04:00
Jing Zhang
ce55c04945 KVM: stats: Support binary stats retrieval for a VCPU
Add a VCPU ioctl to get a statistics file descriptor by which a read
functionality is provided for userspace to read out VCPU stats header,
descriptors and data.
Define VCPU statistics descriptors and header for all architectures.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com> #arm64
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-5-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:19 -04:00
Jing Zhang
fcfe1baedd KVM: stats: Support binary stats retrieval for a VM
Add a VM ioctl to get a statistics file descriptor by which a read
functionality is provided for userspace to read out VM stats header,
descriptors and data.
Define VM statistics descriptors and header for all architectures.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com> #arm64
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-4-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:10 -04:00
Martynas Pumputis
e8b9eab992 net: retrieve netns cookie via getsocketopt
It's getting more common to run nested container environments for
testing cloud software. One of such examples is Kind [1] which runs a
Kubernetes cluster in Docker containers on a single host. Each container
acts as a Kubernetes node, and thus can run any Pod (aka container)
inside the former. This approach simplifies testing a lot, as it
eliminates complicated VM setups.

Unfortunately, such a setup breaks some functionality when cgroupv2 BPF
programs are used for load-balancing. The load-balancer BPF program
needs to detect whether a request originates from the host netns or a
container netns in order to allow some access, e.g. to a service via a
loopback IP address. Typically, the programs detect this by comparing
netns cookies with the one of the init ns via a call to
bpf_get_netns_cookie(NULL). However, in nested environments the latter
cannot be used given the Kubernetes node's netns is outside the init ns.
To fix this, we need to pass the Kubernetes node netns cookie to the
program in a different way: by extending getsockopt() with a
SO_NETNS_COOKIE option, the orchestrator which runs in the Kubernetes
node netns can retrieve the cookie and pass it to the program instead.

Thus, this is following up on Eric's commit 3d368ab87c ("net:
initialize net->net_cookie at netns setup") to allow retrieval via
SO_NETNS_COOKIE.  This is also in line in how we retrieve socket cookie
via SO_COOKIE.

  [1] https://kind.sigs.k8s.io/

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:13:05 -07:00
Jing Zhang
cb082bfab5 KVM: stats: Add fd-based API to read binary stats data
This commit defines the API for userspace and prepare the common
functionalities to support per VM/VCPU binary stats data readings.

The KVM stats now is only accessible by debugfs, which has some
shortcomings this change series are supposed to fix:
1. The current debugfs stats solution in KVM could be disabled
   when kernel Lockdown mode is enabled, which is a potential
   rick for production.
2. The current debugfs stats solution in KVM is organized as "one
   stats per file", it is good for debugging, but not efficient
   for production.
3. The stats read/clear in current debugfs solution in KVM are
   protected by the global kvm_lock.

Besides that, there are some other benefits with this change:
1. All KVM VM/VCPU stats can be read out in a bulk by one copy
   to userspace.
2. A schema is used to describe KVM statistics. From userspace's
   perspective, the KVM statistics are self-describing.
3. With the fd-based solution, a separate telemetry would be able
   to read KVM stats in a less privileged environment.
4. After the initial setup by reading in stats descriptors, a
   telemetry only needs to read the stats data itself, no more
   parsing or setup is needed.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com> #arm64
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-3-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 11:47:57 -04:00
Jing Zhang
0193cc908b KVM: stats: Separate generic stats from architecture specific ones
Generic KVM stats are those collected in architecture independent code
or those supported by all architectures; put all generic statistics in
a separate structure.  This ensures that they are defined the same way
in the statistics API which is being added, removing duplication among
different architectures in the declaration of the descriptors.

No functional change intended.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-2-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 11:47:56 -04:00
zhouchuangao
a2cdc24e20 mips/kvm: Use BUG_ON instead of if condition followed by BUG
BUG_ON uses unlikely in if(), it can be optimized at compile time.

Usually, the condition in if() is not satisfied. In my opinion,
this can improve the efficiency of the multi-stage pipeline.

Signed-off-by: zhouchuangao <zhouchuangao@vivo.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-06-21 11:40:54 +02:00
Yang Yingliang
f9cd46a4bb MIPS: OCTEON: octeon-usb: Use devm_platform_get_and_ioremap_resource()
Remove unneeded error handling on the result of a call
to platform_get_resource() when the value is passed to
devm_ioremap_resource().

And use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-06-21 11:32:47 +02:00
Huang Pei
ed914d48b6 MIPS: add PMD table accounting into MIPS'pmd_alloc_one
This fixes Page Table accounting bug.

MIPS is the ONLY arch just defining __HAVE_ARCH_PMD_ALLOC_ONE alone.
Since commit b2b29d6d01 (mm: account PMD tables like PTE tables),
"pmd_free" in asm-generic with PMD table accounting and "pmd_alloc_one"
in MIPS without PMD table accounting causes PageTable accounting number
negative, which read by global_zone_page_state(), always returns 0.

Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-06-21 11:27:30 +02:00
Tom Rix
06476b5bed MIPS: Loongson64: fix spelling of SPDX tag
checkpatch looks for SPDX-License-Identifier.
So change the '_' to '-'

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-06-21 11:26:45 +02:00
Jakub Kicinski
adc2e56ebe Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh

scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-18 19:47:02 -07:00
Qing Zhang
47a311c31a MIPS: Loongson64: DTS: Add GMAC support for LS7A PCH
The GMAC module is now supported, enable it.

Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18 11:48:34 -07:00
Qing Zhang
f8a1142507 MIPS: Loongson64: Add GMAC support for Loongson-2K1000
The GMAC module is now supported, enable it.

Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18 11:48:34 -07:00
Peter Zijlstra
7c3edd6d9c sched,arch: Remove unused TASK_STATE offsets
All 6 architectures define TASK_STATE in asm-offsets, but then never
actually use it. Remove the definitions to make sure they never will.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210611082838.472811363@infradead.org
2021-06-18 11:43:09 +02:00
Peter Zijlstra
b03fbd4ff2 sched: Introduce task_is_running()
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
task_is_running(p).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org
2021-06-18 11:43:07 +02:00
Ingo Molnar
b2c0931a07 Merge branch 'sched/urgent' into sched/core, to resolve conflicts
This commit in sched/urgent moved the cfs_rq_is_decayed() function:

  a7b359fc6a: ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")

and this fresh commit in sched/core modified it in the old location:

  9e077b52d8: ("sched/pelt: Check that *_avg are null when *_sum are")

Merge the two variants.

Conflicts:
	kernel/sched/fair.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-18 11:31:25 +02:00
Paolo Bonzini
e3cb6fa0e2 KVM: switch per-VM stats to u64
Make them the same type as vCPU stats.  There is no reason
to limit the counters to unsigned long.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 14:25:27 -04:00
Maciej W. Rozycki
7888478398 MIPS: Malta: Enable magic multipliers for Super I/O UARTs
The SMSC FDC37M817 Super I/O chip has a configuration feature that lets
it support special UART divisor values of 32770 and 32769 for bit rates
of 230400 and 460800 bits per second respectively.  Our 8250 driver core
provides support for these special divisors via the UPF_MAGIC_MULTIPLIER
flag, and YAMON firmware unconditionally configures the Super I/O chip
with these divisors enabled as well, so all we need to do in platform
setup for these bit rates to work is to set the flag.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2105182249380.3032@angie.orcam.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16 09:20:29 +02:00
Sergio Paracuellos
222b27713d MIPS: ralink: Define PCI_IOBASE
PCI_IOBASE is used to create VM maps for PCI I/O ports, it is
required by generic PCI drivers to make memory mapped I/O range
work. Hence define it for ralink architectures to be able to
avoid parsing manually IO ranges in PCI generic driver code.
Function 'plat_mem_setup' for ralink is using 'set_io_port_base'
call using '0xa0000000' as address, so use the same address in
the definition to align things.

Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20210614100617.28753-2-sergio.paracuellos@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-15 15:24:29 +02:00
Marc Zyngier
a12a9c5c03 MIPS: Do not include linux/irqdomain.h from asm/irq.h
Including linux/irqdomain.h from asm/irq.h is going to break
as soon as linux/irqdomain.h will include linux/irq.h, so
let's fix this. Code relying on linux/irqomain.h should include
it directly.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10 13:09:16 +01:00
Marc Zyngier
18ca45f5ba MIPS: Add missing linux/irqdomain.h includes
A number of MIPS platforms are failing to directly include
irqdomain.h. Fix this so that we can drop unnecessary dependencies

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10 13:09:16 +01:00
Marc Zyngier
95af1df6f4 MIPS: lantiq: Directly include linux/of.h in xway/dma.c
This drivers currently obtains linux/of.h by luck and a chain of
bizarre inclusions, which we're about to fix.

Let's include the required file directly.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10 13:09:16 +01:00
Arnd Bergmann
2f4574dd6d clkdev: remove CONFIG_CLKDEV_LOOKUP
This option is now synonymous with CONFIG_HAVE_CLK, so use
the latter globally. Any out-of-tree platform ports that
still use a private clk_get()/clk_put() implementation should
move to CONFIG_COMMON_CLK.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-06-08 17:00:09 +02:00
Arnd Bergmann
35f752be4f mips: ralink: convert to CONFIG_COMMON_CLK
ralink only has a very trivial clock implementation, with everything
being fixed clocks.

Convert it to CONFIG_COMMON_CLK to reduce the number of platforms
that rely on legacy clocks. Of course, the clocks really should
be read from the device tree instead, but this is a step into that
direction.

This adds about 50KB to the kernel image size, which is an unfortunate
increase, but not as bad as I had feared:

   text	   data	    bss	    dec	    hex	filename
3778560	1582216	  92256	5453032	 5334e8	vmlinux-vocore-before
3822148	1601192	  92304	5515644	 54297c	vmlinux-vocore-after
3870226	1644468	 200192	5714886	 5733c6	vmlinux-rt305x-before
3916727	1668404	 200240	5785371	 58471b	vmlinux-rt305x-after

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-06-08 17:00:09 +02:00
Arnd Bergmann
b408b611eb mips: ar7: convert to CONFIG_COMMON_CLK
Perform a minimal conversion of the ar7 clock implementation to the common
clock framework. While the hardware can control the rates, this is left
unchanged, and all clocks are registered as fixed-rate or fixed-divider
clocks. Similarly, the clkdev lookup information is left unchanged but
moved from the table format into individual allocations.

There is a small increase in code size:

   text	   data	    bss	    dec	    hex	filename
4757116	 596640	  91328	5445084	 5315dc	vmlinux-before
4806159	 602360	  91344	5499863	 53ebd7	vmlinux-after

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-06-08 17:00:09 +02:00
Arnd Bergmann
4410c4236e mips: ar7: convert to clkdev_lookup
ar7 is one of only two platforms that provide the clock interface but
implement a custom version of the clkdev_lookup code.

Change this to use the generic version instead.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-06-08 17:00:08 +02:00
Jan Kara
65ffb3d69e quota: Wire up quotactl_fd syscall
Wire up the quotactl_fd syscall.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-06-07 12:11:24 +02:00
Sergio Paracuellos
a49a8717d8 pinctrl: ralink: move RT288X SoC pinmux config into a new 'pinctrl-rt288x.c' file
Move all related code for SoC RT288X into a new driver located
in 'pinctrl-rt288x.c' source file.

Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20210604115159.8834-7-sergio.paracuellos@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-06-07 09:17:48 +02:00
Sergio Paracuellos
745ec436de pinctrl: ralink: move MT7620 SoC pinmux config into a new 'pinctrl-mt7620.c' file
Move all related code for SoC MT7620 into a new driver located
in 'pinctrl-mt7620.c' source file.

Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20210604115159.8834-6-sergio.paracuellos@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-06-07 09:17:47 +02:00
Sergio Paracuellos
3a1b0ca5a8 pinctrl: ralink: move RT305X SoC pinmux config into a new 'pinctrl-rt305x.c' file
Move all related code for SoC RT305X into a new driver located
in 'pinctrl-rt305x.c' source file.

Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20210604115159.8834-5-sergio.paracuellos@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-06-07 09:17:47 +02:00
Sergio Paracuellos
8b8b0fa21d pinctrl: ralink: move RT3883 SoC pinmux config into a new 'pinctrl-rt3883.c' file
Move all related code for SoC RT3883 into a new driver located
in 'pinctrl-rt3883.c' source file

Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20210604115159.8834-4-sergio.paracuellos@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-06-07 09:17:47 +02:00
Sergio Paracuellos
e98dd3db96 pinctrl: ralink: move MT7621 SoC pinmux config into a new 'pinctrl-mt7621.c' file
Move all related code for SoC MT7621 into a new driver located
in 'pinctrl-mt7621.c' source file.

Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20210604115159.8834-3-sergio.paracuellos@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-06-07 09:17:47 +02:00
Sergio Paracuellos
276e552e08 pinctrl: ralink: move ralink architecture pinmux header into the driver
Ralink architecture is making use of the header located in
'arch/mips/include/asm/mach-ralink/pinmux.h' to stablish the
mechanisms to make derived SoCs to set its pin functions and
groups. In order to move all architecture pinmux into a more
accurate place which is 'drivers/pinctrl/ralink' we have to
first of all move this file also there with a small modification
which creates 'rt2880_pinmux_init' function to allow SoCs pinctrl
drivers to pass its configuration to the common code located in
'pinctrl-rt2880.c' file.

Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20210604115159.8834-2-sergio.paracuellos@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-06-07 09:17:47 +02:00
Thomas Bogendoerfer
50c25ee97c Revert "MIPS: make userspace mapping young by default"
This reverts commit f685a533a7.

The MIPS cache flush logic needs to know whether the mapping was already
established to decide how to flush caches.  This is done by checking the
valid bit in the PTE.  The commit above breaks this logic by setting the
valid in the PTE in new mappings, which causes kernel crashes.

Link: https://lkml.kernel.org/r/20210526094335.92948-1-tsbogend@alpha.franken.de
Fixes: f685a533a7 ("MIPS: make userspace mapping young by default")
Reported-by: Zhou Yanjie <zhouyanjie@wanyeetech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Huang Pei <huangpei@loongson.cn>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00
Ingo Molnar
a9e906b71f Merge branch 'sched/urgent' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-03 19:00:49 +02:00