linux

Author	SHA1	Message	Date
Linus Torvalds	a5a9e00605	Merge tag 'seccomp-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "These are x86-specific, but I carried these since they're also seccomp-specific. This flips the defaults for spec_store_bypass_disable and spectre_v2_user from "seccomp" to "prctl", as enough time has passed to allow system owners to have updated the defensive stances of their various workloads, and it's long overdue to unpessimize seccomp threads. Extensive rationale and details are in Andrea's main patch. Summary: - set spec_store_bypass_disable & spectre_v2_user to prctl (Andrea Arcangeli)" * tag 'seccomp-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: x86: deduplicate the spectre_v2_user documentation x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl	2021-11-01 17:25:09 -07:00
Linus Torvalds	8cb1ae19bf	Merge tag 'x86-fpu-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Thomas Gleixner: - Cleanup of extable fixup handling to be more robust, which in turn allows to make the FPU exception fixups more robust as well. - Change the return code for signal frame related failures from explicit error codes to a boolean fail/success as that's all what the calling code evaluates. - A large refactoring of the FPU code to prepare for adding AMX support: - Distangle the public header maze and remove especially the misnomed kitchen sink internal.h which is despite it's name included all over the place. - Add a proper abstraction for the register buffer storage (struct fpstate) which allows to dynamically size the buffer at runtime by flipping the pointer to the buffer container from the default container which is embedded in task_struct::tread::fpu to a dynamically allocated container with a larger register buffer. - Convert the code over to the new fpstate mechanism. - Consolidate the KVM FPU handling by moving the FPU related code into the FPU core which removes the number of exports and avoids adding even more export when AMX has to be supported in KVM. This also removes duplicated code which was of course unnecessary different and incomplete in the KVM copy. - Simplify the KVM FPU buffer handling by utilizing the new fpstate container and just switching the buffer pointer from the user space buffer to the KVM guest buffer when entering vcpu_run() and flipping it back when leaving the function. This cuts the memory requirements of a vCPU for FPU buffers in half and avoids pointless memory copy operations. This also solves the so far unresolved problem of adding AMX support because the current FPU buffer handling of KVM inflicted a circular dependency between adding AMX support to the core and to KVM. With the new scheme of switching fpstate AMX support can be added to the core code without affecting KVM. - Replace various variables with proper data structures so the extra information required for adding dynamically enabled FPU features (AMX) can be added in one place - Add AMX (Advanced Matrix eXtensions) support (finally): AMX is a large XSTATE component which is going to be available with Saphire Rapids XEON CPUs. The feature comes with an extra MSR (MSR_XFD) which allows to trap the (first) use of an AMX related instruction, which has two benefits: 1) It allows the kernel to control access to the feature 2) It allows the kernel to dynamically allocate the large register state buffer instead of burdening every task with the the extra 8K or larger state storage. It would have been great to gain this kind of control already with AVX512. The support comes with the following infrastructure components: 1) arch_prctl() to - read the supported features (equivalent to XGETBV(0)) - read the permitted features for a task - request permission for a dynamically enabled feature Permission is granted per process, inherited on fork() and cleared on exec(). The permission policy of the kernel is restricted to sigaltstack size validation, but the syscall obviously allows further restrictions via seccomp etc. 2) A stronger sigaltstack size validation for sys_sigaltstack(2) which takes granted permissions and the potentially resulting larger signal frame into account. This mechanism can also be used to enforce factual sigaltstack validation independent of dynamic features to help with finding potential victims of the 2K sigaltstack size constant which is broken since AVX512 support was added. 3) Exception handling for #NM traps to catch first use of a extended feature via a new cause MSR. If the exception was caused by the use of such a feature, the handler checks permission for that feature. If permission has not been granted, the handler sends a SIGILL like the #UD handler would do if the feature would have been disabled in XCR0. If permission has been granted, then a new fpstate which fits the larger buffer requirement is allocated. In the unlikely case that this allocation fails, the handler sends SIGSEGV to the task. That's not elegant, but unavoidable as the other discussed options of preallocation or full per task permissions come with their own set of horrors for kernel and/or userspace. So this is the lesser of the evils and SIGSEGV caused by unexpected memory allocation failures is not a fundamentally new concept either. When allocation succeeds, the fpstate properties are filled in to reflect the extended feature set and the resulting sizes, the fpu::fpstate pointer is updated accordingly and the trap is disarmed for this task permanently. 4) Enumeration and size calculations 5) Trap switching via MSR_XFD The XFD (eXtended Feature Disable) MSR is context switched with the same life time rules as the FPU register state itself. The mechanism is keyed off with a static key which is default disabled so !AMX equipped CPUs have zero overhead. On AMX enabled CPUs the overhead is limited by comparing the tasks XFD value with a per CPU shadow variable to avoid redundant MSR writes. In case of switching from a AMX using task to a non AMX using task or vice versa, the extra MSR write is obviously inevitable. All other places which need to be aware of the variable feature sets and resulting variable sizes are not affected at all because they retrieve the information (feature set, sizes) unconditonally from the fpstate properties. 6) Enable the new AMX states Note, this is relatively new code despite the fact that AMX support is in the works for more than a year now. The big refactoring of the FPU code, which allowed to do a proper integration has been started exactly 3 weeks ago. Refactoring of the existing FPU code and of the original AMX patches took a week and has been subject to extensive review and testing. The only fallout which has not been caught in review and testing right away was restricted to AMX enabled systems, which is completely irrelevant for anyone outside Intel and their early access program. There might be dragons lurking as usual, but so far the fine grained refactoring has held up and eventual yet undetected fallout is bisectable and should be easily addressable before the 5.16 release. Famous last words... Many thanks to Chang Bae and Dave Hansen for working hard on this and also to the various test teams at Intel who reserved extra capacity to follow the rapid development of this closely which provides the confidence level required to offer this rather large update for inclusion into 5.16-rc1 * tag 'x86-fpu-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits) Documentation/x86: Add documentation for using dynamic XSTATE features x86/fpu: Include vmalloc.h for vzalloc() selftests/x86/amx: Add context switch test selftests/x86/amx: Add test cases for AMX state management x86/fpu/amx: Enable the AMX feature in 64-bit mode x86/fpu: Add XFD handling for dynamic states x86/fpu: Calculate the default sizes independently x86/fpu/amx: Define AMX state components and have it used for boot-time checks x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers x86/fpu/xstate: Add fpstate_realloc()/free() x86/fpu/xstate: Add XFD #NM handler x86/fpu: Update XFD state where required x86/fpu: Add sanity checks for XFD x86/fpu: Add XFD state to fpstate x86/msr-index: Add MSRs for XFD x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit x86/fpu: Reset permission and fpstate on exec() x86/fpu: Prepare fpu_clone() for dynamically enabled features x86/fpu/signal: Prepare for variable sigframe length x86/signal: Use fpu::__state_user_size for sigalt stack validation ...	2021-11-01 14:03:56 -07:00
Linus Torvalds	9a7e0a90a4	Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: - Revert the printk format based wchan() symbol resolution as it can leak the raw value in case that the symbol is not resolvable. - Make wchan() more robust and work with all kind of unwinders by enforcing that the task stays blocked while unwinding is in progress. - Prevent sched_fork() from accessing an invalid sched_task_group - Improve asymmetric packing logic - Extend scheduler statistics to RT and DL scheduling classes and add statistics for bandwith burst to the SCHED_FAIR class. - Properly account SCHED_IDLE entities - Prevent a potential deadlock when initial priority is assigned to a newly created kthread. A recent change to plug a race between cpuset and __sched_setscheduler() introduced a new lock dependency which is now triggered. Break the lock dependency chain by moving the priority assignment to the thread function. - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems. - Improve idle balancing in general and especially for NOHZ enabled systems. - Provide proper interfaces for live patching so it does not have to fiddle with scheduler internals. - Add cluster aware scheduling support. - A small set of tweaks for RT (irqwork, wait_task_inactive(), various scheduler options and delaying mmdrop) - The usual small tweaks and improvements all over the place * tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits) sched/fair: Cleanup newidle_balance sched/fair: Remove sysctl_sched_migration_cost condition sched/fair: Wait before decaying max_newidle_lb_cost sched/fair: Skip update_blocked_averages if we are defering load balance sched/fair: Account update_blocked_averages in newidle_balance cost x86: Fix __get_wchan() for !STACKTRACE sched,x86: Fix L2 cache mask sched/core: Remove rq_relock() sched: Improve wake_up_all_idle_cpus() take #2 irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support. sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ sched: Add cluster scheduler level for x86 sched: Add cluster scheduler level in core and related Kconfig for ARM64 topology: Represent clusters of CPUs within a die sched: Disable -Wunused-but-set-variable sched: Add wrapper for get_wchan() to keep task blocked x86: Fix get_wchan() to support the ORC unwinder proc: Use task_is_running() for wchan in /proc/$pid/stat ...	2021-11-01 13:48:52 -07:00
J. Bruce Fields	6d91929a6f	nfsd: document server-to-server-copy parameters Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-11-01 15:18:39 -04:00
Colin Ian King	d64fbe9f50	speakup: Fix typo in documentation "boo" -> "boot" There is a typo in the speakup documentation. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Link: https://lore.kernel.org/r/20211028182319.613315-1-colin.i.king@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2021-11-01 11:17:21 -06:00
Gabriel Krisman Bertazi	9abeae5d44	docs: Fix formatting of literal sections in fanotify docs Stephen Rothwell reported the following warning was introduced by commit `c0baf9ac0b` ("docs: Document the FAN_FS_ERROR event"). Documentation/admin-guide/filesystem-monitoring.rst:60: WARNING: Definition list ends without a blank line; unexpected unindent. Link: https://lore.kernel.org/r/87y26camhe.fsf@collabora.com Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Jan Kara <jack@suse.cz>	2021-11-01 12:45:06 +01:00
Paolo Bonzini	4e33868433	Merge tag 'kvmarm-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.16 - More progress on the protected VM front, now with the full fixed feature set as well as the limitation of some hypercalls after initialisation. - Cleanup of the RAZ/WI sysreg handling, which was pointlessly complicated - Fixes for the vgic placement in the IPA space, together with a bunch of selftests - More memcg accounting of the memory allocated on behalf of a guest - Timer and vgic selftests - Workarounds for the Apple M1 broken vgic implementation - KConfig cleanups - New kvmarm.mode=none option, for those who really dislike us	2021-10-31 02:28:48 -04:00
Gabriel Krisman Bertazi	c0baf9ac0b	docs: Document the FAN_FS_ERROR event Document the FAN_FS_ERROR event for user administrators and user space developers. Link: https://lore.kernel.org/r/20211025192746.66445-32-krisman@collabora.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>	2021-10-27 12:53:47 +02:00
Niklas Schnelle	6aefbf1cdf	s390/pci: add s390_iommu_aperture kernel parameter Some applications map the same memory area for DMA multiple times while also mapping significant amounts of memory. With our current DMA code these applications will run out of DMA addresses after mapping half of the available memory because the number of DMA mappings is constrained by the number of concurrently active DMA addresses we support which in turn is limited by the minimum of hardware constraints and high_memory. Limiting the number of active DMA addresses to high_memory is only a heuristic to save memory used by the iommu_bitmap and DMA page tables however. This was added under the assumption that it rarely makes sense to DMA map more than system memory. To accommodate special applications which insist on double mapping, which works on other platforms, allow specifying a factor of how many times installed memory is available as DMA address space. Use 0 as a special value to apply no constraints beyond what hardware dictates at the expense of significantly more memory use. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2021-10-26 15:21:30 +02:00
Thomas Gleixner	3aac3ebea0	x86/signal: Implement sigaltstack size validation For historical reasons MINSIGSTKSZ is a constant which became already too small with AVX512 support. Add a mechanism to enforce strict checking of the sigaltstack size against the real size of the FPU frame. The strict check can be enabled via a config option and can also be controlled via the kernel command line option 'strict_sas_size' independent of the config switch. Enabling it might break existing applications which allocate a too small sigaltstack but 'work' because they never get a signal delivered. Though it can be handy to filter out binaries which are not yet aware of AT_MINSIGSTKSZ. Also the upcoming support for dynamically enabled FPU features requires a strict sanity check to ensure that: - Enabling of a dynamic feature, which changes the sigframe size fits into an enabled sigaltstack - Installing a too small sigaltstack after a dynamic feature has been added is not possible. Implement the base check which is controlled by config and command line options. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-3-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Junaid Shahid	4dfe4f40d8	kvm: x86: mmu: Make NX huge page recovery period configurable Currently, the NX huge page recovery thread wakes up every minute and zaps 1/nx_huge_pages_recovery_ratio of the total number of split NX huge pages at a time. This is intended to ensure that only a relatively small number of pages get zapped at a time. But for very large VMs (or more specifically, VMs with a large number of executable pages), a period of 1 minute could still result in this number being too high (unless the ratio is changed significantly, but that can result in split pages lingering on for too long). This change makes the period configurable instead of fixing it at 1 minute. Users of large VMs can then adjust the period and/or the ratio to reduce the number of pages zapped at one time while still maintaining the same overall duration for cycling through the entire list. By default, KVM derives a period from the ratio such that a page will remain on the list for 1 hour on average. Signed-off-by: Junaid Shahid <junaids@google.com> Message-Id: <20211020010627.305925-1-junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-22 05:19:28 -04:00
Jim Cromie	09ee10ff80	dyndbg: refine verbosity 1-4 summary-detail adjust current vpr_info() calls to fit an overview..detail scheme: 1- module level activity: add/remove, etc 2- command ingest, splitting, summary of effects. per >control write 3- command parsing: op, flags, search terms 4- per-site change msg can yield ~3k x 2 logs per echo "+p;-p" > command. Summarize these 4 levels in MODULE_PARM_DESC, and update verbose=3 in Doc. 2- is new, to isolate a problem where a stress-test script (which feeds ~4kb multi-command strings) would produce short writes, truncating last command and causing parsing errors, which confused test results. The script fix was to use syswrite, to deliver full proper commands. 4- gets per-callsite "changed:" pr-infos, which are very noisy during stress tests, and formerly obscured v1-3 messages, and overwhelmed the static-key workload being tested. The verbose parameter has previously seen adjustment: commit `481c0e33f1` ("dyndbg: refine debug verbosity; 1 is basic, 2 more chatty") The script driving these adjustments is: !/usr/bin/perl -w =for Doc 1st purpose was to benchmark the effect of wildcard queries on query performance; if wildcards are risk free cheap enough, we can deploy them in the (floating) format search. 1st finding: wildcards take 2x as long to process. 2nd purpose was to benchmark real static-key changes VS simple flag changes. Found ~100x decrease for the hard work. The script maximizes workload per >control by packing it a ~4kb string of "+p; -p;" commands; this uncovered some broken stuff. The 85th query failed, and appears to be truncated, so is gramatically incorrect. Its either an error here, or in the kernel. Its not happening atm, retest. Plot thickens: fail only happens doing +-p, not +-mf, likely load dependent. Error remains consistent. Looks like a short write, longer on writer than kernel-reader. Try syswrite on handle to control this. That fixed short write. =cut use Getopt::Std; getopts('vN:k:', \my %opts) or die <<EOH; $0 options: -v verbose -k=n kernel dyndbg verbosity -N=n number of loops.. tbrc EOH $opts{N} //= 10; # !undef, 0 tests too long. my $ctrl = '/proc/dynamic_debug/control'; vx($opts{k}) if defined $opts{k}; # works on -k0 open(my $CTL, '>', $ctrl) or die "cant open $ctrl for writing: $!\n"; sub vx { my $arg = shift; my $cmd = "echo $arg > /sys/module/dynamic_debug/parameters/verbose"; system($cmd); warn("vx problem: rc:$? err:$! qry: $cmd\n") if ($?); } sub qryOK { my $qry = shift; print "syntax test: <\n$qry>\n" if $opts{v}; my $bytes = syswrite $CTL, $qry; printf "short read: $bytes / %d\n", length $qry if $bytes < length $qry; if ($?) { warn "rc:$? err:$! qry: $qry\n"; return 0; } return 1; } sub build_queries { my ($cmd, $flags, $ct) = @_; # build experiment and reference queries my $cycle = " $cmd +$flags # on ; $cmd -$flags # off \n"; my $ref = " +$flags ; -$flags \n"; my $len = length $cycle; my $max = int(4096 / $len); # break/fit to buffer size $ct \|= $max; print "qry: ct:$max x << \n$cycle >>\n"; return unless qryOK($ref); return unless qryOK($cycle); my $wild = $cycle x $ct; my $empty = $ref x $ct; printf "len: %d, %d\n", length $wild, length $empty; return { trial => $wild, ref => $empty, probe => $cycle, zero => $ref, count => $ct, max => $max }; } my $query_set = build_queries(' file "" module "" func "" ', "mf"); qryOK($query_set->{zero}); qryOK($query_set->{probe}); qryOK($query_set->{ref}); qryOK($query_set->{trial}); use Benchmark; sub dobatch { my ($cmd, $flags, $reps, $ct) = @_; $reps \|\|= $opts{N}; my $qs = build_queries($cmd, $flags, $ct); timethese($reps, { wildcards => sub { syswrite $CTL, $qs->{trial}; }, no_search => sub { syswrite $CTL, $qs->{ref}; } } ); } sub bench_static_key_toggle { vx 0; dobatch(' file "" module "" func "" ', "mf"); dobatch(' file "" module "" func "" ', "p"); } sub bench_verbose_levels { for my $i (0..4) { vx $i; dobatch(' file "" module "" func "" ', "mf"); } } bench_static_key_toggle(); __END__ Heres how the test-script runs: :: verbose=3 parsing info [ 48.401646] dyndbg: query 95: "file "" module "" func "" -mf # off " mod:* [ 48.402040] dyndbg: split into words: "file" "" "module" "" "func" "" "-mf" [ 48.402456] dyndbg: op='-' [ 48.402615] dyndbg: flags=0x6 [ 48.402779] dyndbg: flagsp=0x0 maskp=0xfffffff9 [ 48.403033] dyndbg: parsed: func="" file="" module="" format="" lineno=0-0 [ 48.403674] dyndbg: applied: func="" file="" module="" format="" lineno=0-0 :: verbose=2 >control summary. ~300k site matches/changes per 4kb command [ 48.404063] dyndbg: processed 96 queries, with 296160 matches, 0 errs :: 2 queries against each other, no-search vs all-wildcard-search qry: ct:48 x << file "" module "" func "" +mf # on ; file "" module "" func "" -mf # off >> len: 4080, 576 Benchmark: timing 10 iterations of no_search, wildcards... no_search: 0 wallclock secs ( 0.00 usr + 0.03 sys = 0.03 CPU) @ 333.33/s (n=10) (warning: too few iterations for a reliable count) wildcards: 0 wallclock secs ( 0.00 usr + 0.09 sys = 0.09 CPU) @ 111.11/s (n=10) (warning: too few iterations for a reliable count) :: 2 queries, both doing real work / changing stati-key states. qry: ct:49 x << file "" module "" func "" +p # on ; file "" module "" func "*" -p # off >> len: 4067, 490 Benchmark: timing 10 iterations of no_search, wildcards... no_search: 20 wallclock secs ( 0.00 usr + 20.36 sys = 20.36 CPU) @ 0.49/s (n=10) wildcards: 21 wallclock secs ( 0.00 usr + 21.08 sys = 21.08 CPU) @ 0.47/s (n=10) bash-5.1# Thats 150k static-key-toggles / sec ~600x slower than simple flags on qemu --smp 3 run Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20211019210746.185307-1-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-21 13:01:25 +02:00
Mauro Carvalho Chehab	fad956fc5c	dt-bindings: reserved-memory: ramoops: update ramoops.yaml references Changeset `89a5bf0f22` ("dt-bindings: reserved-memory: ramoops: Convert txt bindings to yaml") renamed: Documentation/devicetree/bindings/reserved-memory/ramoops.txt to: Documentation/devicetree/bindings/reserved-memory/ramoops.yaml. Update the cross-references accordingly. Fixes: `89a5bf0f22` ("dt-bindings: reserved-memory: ramoops: Convert txt bindings to yaml") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: David Heidelberg <david@ixit.cz> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/bccd9c181b68a1ebbaefd5dcce63e1b8a4b1596c.1634630486.git.mchehab+huawei@kernel.org	2021-10-19 11:54:16 -05:00
Greg Kroah-Hartman	b5bc8ac25a	Merge 5.15-rc6 into driver-core-next We need the driver-core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-18 09:43:37 +02:00
Jonathan Cameron	c5e22feffd	topology: Represent clusters of CPUs within a die Both ACPI and DT provide the ability to describe additional layers of topology between that of individual cores and higher level constructs such as the level at which the last level cache is shared. In ACPI this can be represented in PPTT as a Processor Hierarchy Node Structure [1] that is the parent of the CPU cores and in turn has a parent Processor Hierarchy Nodes Structure representing a higher level of topology. For example Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each cluster has 4 cpus. All clusters share L3 cache data, but each cluster has local L3 tag. On the other hand, each clusters will share some internal system bus. +-----------------------------------+ +---------+ \| +------+ +------+ +--------------------------+ \| \| \| CPU0 \| \| cpu1 \| \| +-----------+ \| \| \| +------+ +------+ \| \| \| \| \| \| +----+ L3 \| \| \| \| +------+ +------+ cluster \| \| tag \| \| \| \| \| CPU2 \| \| CPU3 \| \| \| \| \| \| \| +------+ +------+ \| +-----------+ \| \| \| \| \| \| +-----------------------------------+ \| \| +-----------------------------------+ \| \| \| +------+ +------+ +--------------------------+ \| \| \| \| \| \| \| +-----------+ \| \| \| +------+ +------+ \| \| \| \| \| \| \| \| L3 \| \| \| \| +------+ +------+ +----+ tag \| \| \| \| \| \| \| \| \| \| \| \| \| \| +------+ +------+ \| +-----------+ \| \| \| \| \| \| +-----------------------------------+ \| L3 \| \| data \| +-----------------------------------+ \| \| \| +------+ +------+ \| +-----------+ \| \| \| \| \| \| \| \| \| \| \| \| \| +------+ +------+ +----+ L3 \| \| \| \| \| \| tag \| \| \| \| +------+ +------+ \| \| \| \| \| \| \| \| \| \| \| +-----------+ \| \| \| +------+ +------+ +--------------------------+ \| +-----------------------------------\| \| \| +-----------------------------------\| \| \| \| +------+ +------+ +--------------------------+ \| \| \| \| \| \| \| +-----------+ \| \| \| +------+ +------+ \| \| \| \| \| \| +----+ L3 \| \| \| \| +------+ +------+ \| \| tag \| \| \| \| \| \| \| \| \| \| \| \| \| \| +------+ +------+ \| +-----------+ \| \| \| \| \| \| +-----------------------------------+ \| \| +-----------------------------------+ \| \| \| +------+ +------+ +--------------------------+ \| \| \| \| \| \| \| +-----------+ \| \| \| +------+ +------+ \| \| \| \| \| \| \| \| L3 \| \| \| \| +------+ +------+ +---+ tag \| \| \| \| \| \| \| \| \| \| \| \| \| \| +------+ +------+ \| +-----------+ \| \| \| \| \| \| +-----------------------------------+ \| \| +-----------------------------------+ \| \| \| +------+ +------+ +--------------------------+ \| \| \| \| \| \| \| +-----------+ \| \| \| +------+ +------+ \| \| \| \| \| \| \| \| L3 \| \| \| \| +------+ +------+ +--+ tag \| \| \| \| \| \| \| \| \| \| \| \| \| \| +------+ +------+ \| +-----------+ \| \| \| \| +---------+ +-----------------------------------+ That means spreading tasks among clusters will bring more bandwidth while packing tasks within one cluster will lead to smaller cache synchronization latency. So both kernel and userspace will have a chance to leverage this topology to deploy tasks accordingly to achieve either smaller cache latency within one cluster or an even distribution of load among clusters for higher throughput. This patch exposes cluster topology to both kernel and userspace. Libraried like hwloc will know cluster by cluster_cpus and related sysfs attributes. PoC of HWLOC support at [2]. Note this patch only handle the ACPI case. Special consideration is needed for SMT processors, where it is necessary to move 2 levels up the hierarchy from the leaf nodes (thus skipping the processor core level). Note that arm64 / ACPI does not provide any means of identifying a die level in the topology but that may be unrelate to the cluster level. [1] ACPI Specification 6.3 - section 5.2.29.1 processor hierarchy node structure (Type 0) [2] https://github.com/hisilicon/hwloc/tree/linux-cluster Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210924085104.44806-2-21cnbao@gmail.com	2021-10-15 11:25:15 +02:00
Andrew Halaney	5879f1c94d	Documentation: dyndbg: Improve cli param examples Jim pointed out that using $module.dyndbg= is always a more flexible choice for using dynamic debug on the command line. The $module.dyndbg style is checked at boot and handles if $module is a builtin. If it is actually a loadable module, it is handled again later when the module is loaded. If you just use dyndbg="module $module +p" dynamic debug is only enabled when $module is a builtin. It was recommended to illustrate wildcard usage as well. Suggested-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Link: https://lore.kernel.org/r/1634139622-20667-4-git-send-email-jbaron@akamai.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-14 10:48:56 +02:00
Andrew Halaney	9c40e1aa84	dyndbg: Remove support for ddebug_query param This param has been deprecated for a very long time now, let's rip it out. Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Link: https://lore.kernel.org/r/1634139622-20667-3-git-send-email-jbaron@akamai.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-14 10:48:56 +02:00
Linus Torvalds	459ea72c6c	Merge branch 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "All documentation / comment updates" * 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroupv2, docs: fix misinformation in "device controller" section cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem docs/cgroup: remove some duplicate words	2021-10-11 17:16:41 -07:00
Marc Zyngier	cd67e9af77	Merge branch kvm-arm64/pkvm/restrict-hypercalls into kvmarm-master/next * kvm-arm64/pkvm/restrict-hypercalls: : . : Restrict the use of some hypercalls as well as kexec once : the protected KVM mode has been initialised. : . Documentation: admin-guide: Document side effects when pKVM is enabled Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-10-11 16:55:03 +01:00
Alexandru Elisei	53e8ce137f	Documentation: admin-guide: Document side effects when pKVM is enabled Recent changes to KVM for arm64 has made it impossible for the host to hibernate or use kexec when protected mode is enabled via the kernel command line. There are people who rely on kexec (for example, developers who use kexec as a quick way to test a new kernel), let's document this change in behaviour, so it doesn't catch them by surprise and we have a place to point people to if it does. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211011153835.291147-1-alexandru.elisei@arm.com	2021-10-11 16:52:11 +01:00
Marc Zyngier	b6a68b97af	KVM: arm64: Allow KVM to be disabled from the command line Although KVM can be compiled out of the kernel, it cannot be disabled at runtime. Allow this possibility by introducing a new mode that will prevent KVM from initialising. This is useful in the (limited) circumstances where you don't want KVM to be available (what is wrong with you?), or when you want to install another hypervisor instead (good luck with that). Reviewed-by: David Brazdil <dbrazdil@google.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Scull <ascull@google.com> Link: https://lore.kernel.org/r/20211001170553.3062988-1-maz@kernel.org	2021-10-11 09:48:47 +01:00
Linus Torvalds	3946b46cab	Merge tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - fix two minor issues in the Xen privcmd driver plus a cleanup patch for that driver - fix multiple issues related to running as PVH guest and some related earlyprintk fixes for other Xen guest types - fix an issue introduced in 5.15 the Xen balloon driver * tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/balloon: fix cancelled balloon action xen/x86: adjust data placement x86/PVH: adjust function/data placement xen/x86: hook up xen_banner() also for PVH xen/x86: generalize preferred console model from PV to PVH Dom0 xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU xen/x86: allow "earlyprintk=xen" to work for PV Dom0 xen/x86: make "earlyprintk=xen" work better for PVH Dom0 xen/x86: allow PVH Dom0 without XEN_PV=y xen/x86: prevent PVH type from getting clobbered xen/privcmd: drop "pages" parameter from xen_remap_pfn() xen/privcmd: fix error handling in mmap-resource processing xen/privcmd: replace kcalloc() by kvcalloc() when allocating empty pages	2021-10-08 12:55:23 -07:00
Sakari Ailus	566778bc1d	media: admin-guide: Update i2c-cardlist Add MIPI CCS compliant devices, a few Sony IMX, Hynix Hi-846 and Omnivision ov13b10 sensors and the DW9768 lens driver to the list of supported devices. Also drop SMIA since as a standard it is obsolete. Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>	2021-10-08 13:24:52 +02:00
Martin Kepplinger	5fe23d700d	media: Documentation: i2c-cardlist: add the Hynix hi846 sensor Add the SK Hynix Hi-846 8M Pixel CMOS image sensor to the i2c-cardlist. Signed-off-by: Martin Kepplinger <martin.kepplinger@puri.sm> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>	2021-10-08 13:24:04 +02:00
Huaixin Chang	d73df887b6	sched/fair: Add document for burstable CFS bandwidth Basic description of usage and effect for CFS Bandwidth Control Burst. Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com> Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com> Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20210830032215.16302-3-changhuaixin@linux.alibaba.com	2021-10-05 15:51:41 +02:00
Jan Beulich	42bc9716bc	xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU xenboot_write_console() is dealing with these quite fine so I don't see why xenboot_console_setup() would return -ENOENT in this case. Adjust documentation accordingly. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/3d212583-700e-8b2d-727a-845ef33ac265@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>	2021-10-05 08:36:05 +02:00
Jonathan Corbet	b718f9d919	Merge tag 'v5.15-rc4' into docs-next This is needed to get a docs fix that entered via the DRM tree; testers have requested it so that PDF builds in docs-next work again.	2021-10-04 16:44:16 -06:00
Andrea Arcangeli	d9bbdbf324	x86: deduplicate the spectre_v2_user documentation This would need updating to make prctl be the new default, but it's simpler to delete it and refer to the dup. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201105001406.13005-2-aarcange@redhat.com	2021-10-04 12:12:57 -07:00
Andrea Arcangeli	2f46993d83	x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl Switch the kernel default of SSBD and STIBP to the ones with CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl spectre_v2_user=prctl) even if CONFIG_SECCOMP=y. Several motivations listed below: - If SMT is enabled the seccomp jail can still attack the rest of the system even with spectre_v2_user=seccomp by using MDS-HT (except on XEON PHI where MDS can be tamed with SMT left enabled, but that's a special case). Setting STIBP become a very expensive window dressing after MDS-HT was discovered. - The seccomp jail cannot attack the kernel with spectre-v2-HT regardless (even if STIBP is not set), but with MDS-HT the seccomp jail can attack the kernel too. - With spec_store_bypass_disable=prctl the seccomp jail can attack the other userland (guest or host mode) using spectre-v2-HT, but the userland attack is already mitigated by both ASLR and pid namespaces for host userland and through virt isolation with libkrun or kata. (if something if somebody is worried about spectre-v2-HT it's best to mount proc with hidepid=2,gid=proc on workstations where not all apps may run under container runtimes, rather than slowing down all seccomp jails, but the best is to add pid namespaces to the seccomp jail). As opposed MDS-HT is not mitigated and the seccomp jail can still attack all other host and guest userland if SMT is enabled even with spec_store_bypass_disable=seccomp. - If full security is required then MDS-HT must also be mitigated with nosmt and then spectre_v2_user=prctl and spectre_v2_user=seccomp would become identical. - Setting spectre_v2_user=seccomp is overall lower priority than to setting javascript.options.wasm false in about:config to protect against remote wasm MDS-HT, instead of worrying about Spectre-v2-HT and STIBP which again is already statistically well mitigated by other means in userland and it's fully mitigated in kernel with retpolines (unlike the wasm assist call with MDS-HT). - SSBD is needed to prevent reading the JIT memory and the primary user being the OpenJDK. However the primary user of SSBD wouldn't be covered by spec_store_bypass_disable=seccomp because it doesn't use seccomp and the primary user also explicitly declined to set PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS despite it easily could. In fact it would need to set it only when the sandboxing mechanism is enabled for javaws applets, but it still declined it by declaring security within the same user address space as an untenable objective for their JIT, even in the sandboxing case where performance would be a lesser concern (for the record: I kind of disagree in not setting PR_SPEC_STORE_BYPASS in the sandbox case and I prefer to run javaws through a wrapper that sets PR_SPEC_STORE_BYPASS if I need). In turn it can be inferred that even if the primary user of SSBD would use seccomp, they would invoke it with SECCOMP_FILTER_FLAG_SPEC_ALLOW by now. - runc/crun already set SECCOMP_FILTER_FLAG_SPEC_ALLOW by default, k8s and podman have a default json seccomp allowlist that cannot be slowed down, so for the #1 seccomp user this change is already a noop. - systemd/sshd or other apps that use seccomp, if they really need STIBP or SSBD, they need to explicitly set the PR_SET_SPECULATION_CTRL by now. The stibp/ssbd seccomp blind catch-all approach was done probably initially with a wishful thinking objective to pretend to have a peace of mind that it could magically fix it all. That was wishful thinking before MDS-HT was discovered, but after MDS-HT has been discovered it become just window dressing. - For qemu "-sandbox" seccomp jail it wouldn't make sense to set STIBP or SSBD. SSBD doesn't help with KVM because there's no JIT (if it's needed with TCG it should be an opt-in with PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS and it shouldn't slowdown KVM for nothing). For qemu+KVM STIBP would be even more window dressing than it is for all other apps, because in the qemu+KVM case there's not only the MDS attack to worry about with SMT enabled. Even after disabling SMT, there's still a theoretical spectre-v2 attack possible within the same thread context from guest mode to host ring3 that the host kernel retpoline mitigation has no theoretical chance to mitigate. On some kernels a ibrs-always/ibrs-retpoline opt-in model is provided that will enabled IBRS in the qemu host ring3 userland which fixes this theoretical concern. Only after enabling IBRS in the host userland it would then make sense to proceed and worry about STIBP and an attack on the other host userland, but then again SMT would need to be disabled for full security anyway, so that would render STIBP again a noop. - last but not the least: the lack of "spec_store_bypass_disable=prctl spectre_v2_user=prctl" means the moment a guest boots and sshd/systemd runs, the guest kernel will write to SPEC_CTRL MSR which will make the guest vmexit forever slower, forcing KVM to issue a very slow rdmsr instruction at every vmexit. So the end result is that SPEC_CTRL MSR is only available in GCE. Most other public cloud providers don't expose SPEC_CTRL, which means that not only STIBP/SSBD isn't available, but IBPB isn't available either (which would cause no overhead to the guest or the hypervisor because it's write only and requires no reading during vmexit). So the current default already net loss in security (missing IBPB) which means most public cloud providers cannot achieve a fully secure guest with nosmt (and nosmt is enough to fully mitigate MDS-HT). It also means GCE and is unfairly penalized in performance because it provides the option to enable full security in the guest as an opt-in (i.e. nosmt and IBPB). So this change will allow all cloud providers to expose SPEC_CTRL without incurring into any hypervisor slowdown and at the same time it will remove the unfair penalization of GCE performance for doing the right thing and it'll allow to get full security with nosmt with IBPB being available (and STIBP becoming meaningless). Example to put things in prospective: the STIBP enabled in seccomp has never been about protecting apps using seccomp like sshd from an attack from a malicious userland, but to the contrary it has always been about protecting the system from an attack from sshd, after a successful remote network exploit against sshd. In fact initially it wasn't obvious STIBP would work both ways (STIBP was about preventing the task that runs with STIBP to be attacked with spectre-v2-HT, but accidentally in the STIBP case it also prevents the attack in the other direction). In the hypothetical case that sshd has been remotely exploited the last concern should be STIBP being set, because it'll be still possible to obtain info even from the kernel by using MDS if nosmt wasn't set (and if it was set, STIBP is a noop in the first place). As opposed kernel cannot leak anything with spectre-v2 HT because of retpolines and the userland is mitigated by ASLR already and ideally PID namespaces too. If something it'd be worth checking if sshd run the seccomp thread under pid namespaces too if available in the running kernel. SSBD also would be a noop for sshd, since sshd uses no JIT. If sshd prefers to keep doing the STIBP window dressing exercise, it still can even after this change of defaults by opting-in with PR_SPEC_INDIRECT_BRANCH. Ultimately setting SSBD and STIBP by default for all seccomp jails is a bad sweet spot and bad default with more cons than pros that end up reducing security in the public cloud (by giving an huge incentive to not expose SPEC_CTRL which would be needed to get full security with IBPB after setting nosmt in the guest) and by excessively hurting performance to more secure apps using seccomp that end up having to opt out with SECCOMP_FILTER_FLAG_SPEC_ALLOW. The following is the verified result of the new default with SMT enabled: (gdb) print spectre_v2_user_stibp $1 = SPECTRE_V2_USER_PRCTL (gdb) print spectre_v2_user_ibpb $2 = SPECTRE_V2_USER_PRCTL (gdb) print ssb_mode $3 = SPEC_STORE_BYPASS_PRCTL Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201104235054.5678-1-aarcange@redhat.com Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/lkml/AAA2EF2C-293D-4D5B-BFA6-FF655105CD84@redhat.com Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/lkml/c0722838-06f7-da6b-138f-e0f26362f16a@redhat.com	2021-10-04 12:12:57 -07:00
Pedro Terra	9b4a9b31b9	media: vimc: Enable set resolution at the scaler src pad Modify the scaler subdevice to accept setting the resolution of the source pad (previously the source resolution would always be 3 times the sink for both dimensions). Now any resolution can be set at src (even smaller ones) and the sink video will be scaled to match it. Test example: With the vimc module up (using the default vimc topology) media-ctl -d platform:vimc -V '"Sensor A":0[fmt:SBGGR8_1X8/640x480]' media-ctl -d platform:vimc -V '"Debayer A":0[fmt:SBGGR8_1X8/640x480]' media-ctl -d platform:vimc -V '"Scaler":0[fmt:RGB888_1X24/640x480]' media-ctl -d platform:vimc -V '"Scaler":0[crop:(100,50)/400x150]' media-ctl -d platform:vimc -V '"Scaler":1[fmt:RGB888_1X24/300x700]' v4l2-ctl -z platform:vimc -d "RGB/YUV Capture" -v width=300,height=700 v4l2-ctl -z platform:vimc -d "Raw Capture 0" -v pixelformat=BA81 v4l2-ctl --stream-mmap --stream-count=10 -z platform:vimc -d "RGB/YUV Capture" \ --stream-to=test.raw The result will be a cropped stream that can be checked with the command ffplay -loglevel warning -v info -f rawvideo -pixel_format rgb24 \ -video_size "300x700" test.raw Co-developed-by: Gabriela Bittencourt <gabrielabittencourt00@gmail.com> Signed-off-by: Gabriela Bittencourt <gabrielabittencourt00@gmail.com> Co-developed-by: Gabriel Francisco Mandaji <gfmandaji@gmail.com> Signed-off-by: Gabriel Francisco Mandaji <gfmandaji@gmail.com> Signed-off-by: Pedro Terra <pedro@terraco.de> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>	2021-09-30 10:07:59 +02:00
Fabio Estevam	1e6494daaf	media: imx7.rst: Provide an example for imx6ull-evk capture imx6ull-evk has a parallel OV5640 sensor. Provide an example for imx6ull-evk capture to improve the document. Signed-off-by: Fabio Estevam <festevam@denx.de> Acked-by: Rui Miguel Silva <rmfrfs@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>	2021-09-30 10:07:56 +02:00
Nícolas F. R. A. Prado	75821f8107	media: ipu3.rst: Improve header formatting on tables Use the header-rows option of the flat-table directive in order to have the first row displayed as a header. Also capitalize these headers. These changes make the tables easier to read. Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>	2021-09-30 10:07:44 +02:00
Ezequiel Garcia	78eee7b5f1	media: Rename V4L2_PIX_FMT_HM12 to V4L2_PIX_FMT_NV12_16L16 The V4L2_PIX_FMT_HM12 format is actually a simple NV12 tiled format, with 16x16 linear tiles. Rename the format and move its documentation together with the other tiled NV12 formats. Keep V4L2_PIX_FMT_HM12 for application compatibility. Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>	2021-09-30 10:07:39 +02:00
Eugene Syromiatnikov	61bc346ce6	uapi/linux/prctl: provide macro definitions for the PR_SCHED_CORE type argument Commit `7ac592aa35` ("sched: prctl() core-scheduling interface") made use of enum pid_type in prctl's arg4; this type and the associated enumeration definitions are not exposed to userspace. Christian has suggested to provide additional macro definitions that convey the meaning of the type argument more in alignment with its actual usage, and this patch does exactly that. Link: https://lore.kernel.org/r/20210825170613.GA3884@asgard.redhat.com Suggested-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Complements: `7ac592aa35` ("sched: prctl() core-scheduling interface") Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-09-29 13:00:05 +02:00
Tiberiu A Georgescu	5b32e44e8b	Documentation: update pagemap with shmem exceptions Mentioning the current missing information in the pagemap and alternatives on how to retrieve it, in case someone stumbles upon unexpected behaviour. Signed-off-by: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com> Reviewed-by: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Reviewed-by: Florian Schmidt <florian.schmidt@nutanix.com> Reviewed-by: Carl Waldspurger <carl.waldspurger@nutanix.com> Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20210923064618.157046-2-tiberiu.georgescu@nutanix.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2021-09-27 11:27:20 -06:00
Chunguang Xu	4b53bb873f	docs/cgroup: add entry for misc.events Added descriptions of misc.events. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-09-20 07:35:38 -10:00
ArthurChiao	c0002d11d7	cgroupv2, docs: fix misinformation in "device controller" section Hotmail was rejected by the mailing list, switched to gmail to resend. 1. Clarify cgroup BPF program type and attach type; 2. Fix file path broken. Signed-off-by: ArthurChiao <arthurchiao@hotmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-09-13 08:08:46 -10:00
Chunguang Xu	22b1255792	docs/cgroup: remove some duplicate words When I tried to add some new entries to cgroup-v2.rst, I found that the description of memory.events had some repetitive words, so I tried to delete them. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-09-13 07:55:35 -10:00
Linus Torvalds	316346243b	Merge branch 'gcc-min-version-5.1' (make gcc-5.1 the minimum version) Merge patch series from Nick Desaulniers to update the minimum gcc version to 5.1. This is some of the left-overs from the merge window that I didn't want to deal with yesterday, so it comes in after -rc1 but was sent before. Gcc-4.9 support has been an annoyance for some time, and with -Werror I had the choice of applying a fairly big patch from Kees Cook to remove a fair number of initializer warnings (still leaving some), or this patch series from Nick that just removes the source of the problem. The initializer cleanups might still be worth it regardless, but honestly, I preferred just tackling the problem with gcc-4.9 head-on. We've been more aggressiuve about no longer having to care about compilers that were released a long time ago, and I think it's been a good thing. I added a couple of patches on top to sort out a few left-overs now that we no longer support gcc-4.x. As noted by Arnd, as a result of this minimum compiler version upgrade we can probably change our use of '--std=gnu89' to '--std=gnu11', and finally start using local loop declarations etc. But this series does _not_ yet do that. Link: https://lore.kernel.org/all/20210909182525.372ee687@canb.auug.org.au/ Link: https://lore.kernel.org/lkml/CAK7LNASs6dvU6D3jL2GG3jW58fXfaj6VNOe55NJnTB8UPuk2pA@mail.gmail.com/ Link: https://github.com/ClangBuiltLinux/linux/issues/1438 * emailed patches from Nick Desaulniers <ndesaulniers@google.com>: Drop some straggling mentions of gcc-4.9 as being stale compiler_attributes.h: drop __has_attribute() support for gcc4 vmlinux.lds.h: remove old check for GCC 4.9 compiler-gcc.h: drop checks for older GCC versions Makefile: drop GCC < 5 -fno-var-tracking-assignments workaround arm64: remove GCC version check for ARCH_SUPPORTS_INT128 powerpc: remove GCC version check for UPD_CONSTR riscv: remove Kconfig check for GCC version for ARCH_RV64I Kconfig.debug: drop GCC 5+ version check for DWARF5 mm/ksm: remove old GCC 4.9+ check compiler.h: drop fallback overflow checkers Documentation: raise minimum supported version of GCC to 5.1	2021-09-13 10:43:04 -07:00
Linus Torvalds	df26327ea0	Drop some straggling mentions of gcc-4.9 as being stale Fix up the admin-guide README file to the new gcc-5.1 requirement, and remove a stale comment about gcc support for the __assume_aligned__ attribute. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-13 10:29:44 -07:00
Linus Torvalds	43175623dd	Merge tag 'trace-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing updates from Steven Rostedt: - Add migrate-disable counter to tracing header - Fix error handling in event probes - Fix missed unlock in osnoise in error path - Fix merge issue with tools/bootconfig - Clean up bootconfig data when init memory is removed - Fix bootconfig to loop only on subkeys - Have kernel command lines override bootconfig options - Increase field counts for synthetic events - Have histograms dynamic allocate event elements to save space - Fixes in testing and documentation * tag 'trace-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/boot: Fix to loop on only subkeys selftests/ftrace: Exclude "(fault)" in testing add/remove eprobe events tracing: Dynamically allocate the per-elt hist_elt_data array tracing: synth events: increase max fields count tools/bootconfig: Show whole test command for each test case bootconfig: Fix missing return check of xbc_node_compose_key function tools/bootconfig: Fix tracing_on option checking in ftrace2bconf.sh docs: bootconfig: Add how to use bootconfig for kernel parameters init/bootconfig: Reorder init parameter from bootconfig and cmdline init: bootconfig: Remove all bootconfig data when the init memory is removed tracing/osnoise: Fix missed cpus_read_unlock() in start_per_cpu_kthreads() tracing: Fix some alloc_event_probe() error handling bugs tracing: Add migrate-disabled counter to tracing output.	2021-09-09 13:11:15 -07:00
Linus Torvalds	0aa2516017	Merge tag 'dmaengine-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "New drivers/devices - Support for Renesas RZ/G2L dma controller - New driver for AMD PTDMA controller Updates: - Big pile of idxd updates - Updates for Altera driver, stm32-dma, dw etc" * tag 'dmaengine-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (83 commits) dmaengine: sh: fix some NULL dereferences dmaengine: sh: Fix unused initialization of pointer lmdesc MAINTAINERS: Fix AMD PTDMA DRIVER entry dmaengine: ptdma: remove PT_OFFSET to avoid redefnition dmaengine: ptdma: Add debugfs entries for PTDMA dmaengine: ptdma: register PTDMA controller as a DMA resource dmaengine: ptdma: Initial driver for the AMD PTDMA dmaengine: fsl-dpaa2-qdma: Fix spelling mistake "faile" -> "failed" dmaengine: idxd: remove interrupt disable for dev_lock dmaengine: idxd: remove interrupt disable for cmd_lock dmaengine: idxd: fix setting up priv mode for dwq dmaengine: xilinx_dma: Set DMA mask for coherent APIs dmaengine: ti: k3-psil-j721e: Add entry for CSI2RX dmaengine: sh: Add DMAC driver for RZ/G2L SoC dmaengine: Extend the dma_slave_width for 128 bytes dt-bindings: dma: Document RZ/G2L bindings dmaengine: ioat: depends on !UML dmaengine: idxd: set descriptor allocation size to threshold for swq dmaengine: idxd: make submit failure path consistent on desc freeing dmaengine: idxd: remove interrupt flag for completion list spinlock ...	2021-09-09 11:07:47 -07:00
Linus Torvalds	9c566611ac	Merge tag 'acpi-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "These add ACPI support to the PCI VMD driver, improve suspend-to-idle support for AMD platforms and update documentation. Specifics: - Add ACPI support to the PCI VMD driver (Rafael Wysocki) - Rearrange suspend-to-idle support code to reflect the platform firmware expectations on some AMD platforms (Mario Limonciello) - Make SSDT overlays documentation follow the code documented by it more closely (Andy Shevchenko)" * tag 'acpi-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: PM: s2idle: Run both AMD and Microsoft methods if both are supported Documentation: ACPI: Align the SSDT overlays file with the code PCI: VMD: ACPI: Make ACPI companion lookup work for VMD bus	2021-09-08 16:33:21 -07:00
Linus Torvalds	2d338201d5	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: "147 patches, based on `7d2a07b769`. Subsystems affected by this patch series: mm (memory-hotplug, rmap, ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan), alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib, checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig, selftests, ipc, and scripts" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits) scripts: check_extable: fix typo in user error message mm/workingset: correct kernel-doc notations ipc: replace costly bailout check in sysvipc_find_ipc() selftests/memfd: remove unused variable Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH configs: remove the obsolete CONFIG_INPUT_POLLDEV prctl: allow to setup brk for et_dyn executables pid: cleanup the stale comment mentioning pidmap_init(). kernel/fork.c: unexport get_{mm,task}_exe_file coredump: fix memleak in dump_vma_snapshot() fs/coredump.c: log if a core dump is aborted due to changed file permissions nilfs2: use refcount_dec_and_lock() to fix potential UAF nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group nilfs2: fix NULL pointer in nilfs_##name##_attr_release nilfs2: fix memory leak in nilfs_sysfs_create_device_group trap: cleanup trap_init() init: move usermodehelper_enable() to populate_rootfs() ...	2021-09-08 12:55:35 -07:00
Masami Hiramatsu	26c9c72fd0	docs: bootconfig: Add how to use bootconfig for kernel parameters Add a section to describe how to use the bootconfig for specifying kernel and init parameters. This is an important section because it is the reason why this document is under the admin-guide. Link: https://lkml.kernel.org/r/163077086399.222577.5881779375643977991.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-09-08 15:10:41 -04:00
SeongJae Park	c4ba6014ae	Documentation: add documents for DAMON This commit adds documents for DAMON under `Documentation/admin-guide/mm/damon/` and `Documentation/vm/damon/`. Link: https://lkml.kernel.org/r/20210716081449.22187-11-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Fernand Sieber <sieberf@amazon.com> Reviewed-by: Markus Boehme <markubo@amazon.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Fan Du <fan.du@intel.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Greg Thelen <gthelen@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Maximilian Heyne <mheyne@amazon.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 11:50:25 -07:00
David Hildenbrand	ac3332c447	memory-hotplug.rst: complete admin-guide overhaul The memory hot(un)plug documentation is outdated and incomplete. Most of the content dates back to 2007, so it's time for a major overhaul. Let's rewrite, reorganize and update most parts of the documentation. In addition to memory hot(un)plug, also add some details regarding ZONE_MOVABLE, with memory hotunplug being one of its main consumers. Drop the file history, that information can more reliably be had from the git log. The style of the document is also properly fixed that e.g., "restview" renders it cleanly now. In the future, we might add some more details about virt users like virtio-mem, the XEN balloon, the Hyper-V balloon and ppc64 dlpar. Link: https://lkml.kernel.org/r/20210707073205.3835-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 11:50:22 -07:00
David Hildenbrand	df82bf5a9f	memory-hotplug.rst: remove locking details from admin-guide Patch series "memory-hotplug.rst: complete admin-guide overhaul", v3. This patch (of 2): We have the same content at Documentation/core-api/memory-hotplug.rst and it doesn't fit into the admin-guide. The documentation was accidentially duplicated when merging. Link: https://lkml.kernel.org/r/20210707073205.3835-1-david@redhat.com Link: https://lkml.kernel.org/r/20210707073205.3835-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 11:50:22 -07:00
Linus Torvalds	69a5c49a91	Merge tag 'iommu-updates-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - New DART IOMMU driver for Apple Silicon M1 chips - Optimizations for iommu_[map/unmap] performance - Selective TLB flush support for the AMD IOMMU driver to make it more efficient on emulated IOMMUs - Rework IOVA setup and default domain type setting to move more code out of IOMMU drivers and to support runtime switching between certain types of default domains - VT-d Updates from Lu Baolu: - Update the virtual command related registers - Enable Intel IOMMU scalable mode by default - Preset A/D bits for user space DMA usage - Allow devices to have more than 32 outstanding PRs - Various cleanups - ARM SMMU Updates from Will Deacon: SMMUv3: - Minor optimisation to avoid zeroing struct members on CMD submission - Increased use of batched commands to reduce submission latency - Refactoring in preparation for ECMDQ support SMMUv2: - Fix races when probing devices with identical StreamIDs - Optimise walk cache flushing for Qualcomm implementations - Allow deep sleep states for some Qualcomm SoCs with shared clocks - Various smaller optimizations, cleanups, and fixes * tag 'iommu-updates-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (85 commits) iommu/io-pgtable: Abstract iommu_iotlb_gather access iommu/arm-smmu: Fix missing unlock on error in arm_smmu_device_group() iommu/vt-d: Add present bit check in pasid entry setup helpers iommu/vt-d: Use pasid_pte_is_present() helper function iommu/vt-d: Drop the kernel doc annotation iommu/vt-d: Allow devices to have more than 32 outstanding PRs iommu/vt-d: Preset A/D bits for user space DMA usage iommu/vt-d: Enable Intel IOMMU scalable mode by default iommu/vt-d: Refactor Kconfig a bit iommu/vt-d: Remove unnecessary oom message iommu/vt-d: Update the virtual command related registers iommu: Allow enabling non-strict mode dynamically iommu: Merge strictness and domain type configs iommu: Only log strictness for DMA domains iommu: Expose DMA domain strictness via sysfs iommu: Express DMA strictness via the domain type iommu/vt-d: Prepare for multiple DMA domain types iommu/arm-smmu: Prepare for multiple DMA domain types iommu/amd: Prepare for multiple DMA domain types iommu: Introduce explicit type for non-strict DMA domains ...	2021-09-03 10:44:35 -07:00
Linus Torvalds	14726903c8	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: "173 patches. Subsystems affected by this series: ia64, ocfs2, block, and mm (debug, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock, oom-kill, migration, ksm, percpu, vmstat, and madvise)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits) mm/madvise: add MADV_WILLNEED to process_madvise() mm/vmstat: remove unneeded return value mm/vmstat: simplify the array size calculation mm/vmstat: correct some wrong comments mm/percpu,c: remove obsolete comments of pcpu_chunk_populated() selftests: vm: add COW time test for KSM pages selftests: vm: add KSM merging time test mm: KSM: fix data type selftests: vm: add KSM merging across nodes test selftests: vm: add KSM zero page merging test selftests: vm: add KSM unmerge test selftests: vm: add KSM merge test mm/migrate: correct kernel-doc notation mm: wire up syscall process_mrelease mm: introduce process_mrelease system call memblock: make memblock_find_in_range method private mm/mempolicy.c: use in_task() in mempolicy_slab_node() mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies mm/mempolicy: advertise new MPOL_PREFERRED_MANY mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY ...	2021-09-03 10:08:28 -07:00

1 2 3 4 5 ...

1855 Commits