This mechanically converts all remaining cases of ancient open-coded timer
setup with the old setup_timer() API, which is the first step in timer
conversions. This has no behavioral changes, since it ultimately just
changes the order of assignment to fields of struct timer_list when
finding variations of:
init_timer(&t);
f.function = timer_callback;
t.data = timer_callback_arg;
to be converted into:
setup_timer(&t, timer_callback, timer_callback_arg);
The conversion is done with the following Coccinelle script, which
is an improved version of scripts/cocci/api/setup_timer.cocci, in the
following ways:
- assignments-before-init_timer() cases
- limit the .data case removal to the specific struct timer_list instance
- handling calls by dereference (timer->field vs timer.field)
spatch --very-quiet --all-includes --include-headers \
-I ./arch/x86/include -I ./arch/x86/include/generated \
-I ./include -I ./arch/x86/include/uapi \
-I ./arch/x86/include/generated/uapi -I ./include/uapi \
-I ./include/generated/uapi --include ./include/linux/kconfig.h \
--dir . \
--cocci-file ~/src/data/setup_timer.cocci
@fix_address_of@
expression e;
@@
init_timer(
-&(e)
+&e
, ...)
// Match the common cases first to avoid Coccinelle parsing loops with
// "... when" clauses.
@match_immediate_function_data_after_init_timer@
expression e, func, da;
@@
-init_timer
+setup_timer
( \(&e\|e\)
+, func, da
);
(
-\(e.function\|e->function\) = func;
-\(e.data\|e->data\) = da;
|
-\(e.data\|e->data\) = da;
-\(e.function\|e->function\) = func;
)
@match_immediate_function_data_before_init_timer@
expression e, func, da;
@@
(
-\(e.function\|e->function\) = func;
-\(e.data\|e->data\) = da;
|
-\(e.data\|e->data\) = da;
-\(e.function\|e->function\) = func;
)
-init_timer
+setup_timer
( \(&e\|e\)
+, func, da
);
@match_function_and_data_after_init_timer@
expression e, e2, e3, e4, e5, func, da;
@@
-init_timer
+setup_timer
( \(&e\|e\)
+, func, da
);
... when != func = e2
when != da = e3
(
-e.function = func;
... when != da = e4
-e.data = da;
|
-e->function = func;
... when != da = e4
-e->data = da;
|
-e.data = da;
... when != func = e5
-e.function = func;
|
-e->data = da;
... when != func = e5
-e->function = func;
)
@match_function_and_data_before_init_timer@
expression e, e2, e3, e4, e5, func, da;
@@
(
-e.function = func;
... when != da = e4
-e.data = da;
|
-e->function = func;
... when != da = e4
-e->data = da;
|
-e.data = da;
... when != func = e5
-e.function = func;
|
-e->data = da;
... when != func = e5
-e->function = func;
)
... when != func = e2
when != da = e3
-init_timer
+setup_timer
( \(&e\|e\)
+, func, da
);
@r1 exists@
expression t;
identifier f;
position p;
@@
f(...) { ... when any
init_timer@p(\(&t\|t\))
... when any
}
@r2 exists@
expression r1.t;
identifier g != r1.f;
expression e8;
@@
g(...) { ... when any
\(t.data\|t->data\) = e8
... when any
}
// It is dangerous to use setup_timer if data field is initialized
// in another function.
@script:python depends on r2@
p << r1.p;
@@
cocci.include_match(False)
@r3@
expression r1.t, func, e7;
position r1.p;
@@
(
-init_timer@p(&t);
+setup_timer(&t, func, 0UL);
... when != func = e7
-t.function = func;
|
-t.function = func;
... when != func = e7
-init_timer@p(&t);
+setup_timer(&t, func, 0UL);
|
-init_timer@p(t);
+setup_timer(t, func, 0UL);
... when != func = e7
-t->function = func;
|
-t->function = func;
... when != func = e7
-init_timer@p(t);
+setup_timer(t, func, 0UL);
)
Signed-off-by: Kees Cook <keescook@chromium.org>
This changes all DEFINE_TIMER() callbacks to use a struct timer_list
pointer instead of unsigned long. Since the data argument has already been
removed, none of these callbacks are using their argument currently, so
this renames the argument to "unused".
Done using the following semantic patch:
@match_define_timer@
declarer name DEFINE_TIMER;
identifier _timer, _callback;
@@
DEFINE_TIMER(_timer, _callback);
@change_callback depends on match_define_timer@
identifier match_define_timer._callback;
type _origtype;
identifier _origarg;
@@
void
-_callback(_origtype _origarg)
+_callback(struct timer_list *unused)
{ ... }
Signed-off-by: Kees Cook <keescook@chromium.org>
use ffz primitive which maps to ARCv2 instruction, vs. non atomic
__test_and_set_bit
It is unlikely if we will even have more than 32 counters, but still add
a BUILD_BUG to catch that
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Current perf ISR loops thru all 32 counters, checking for each if it
caused the interrupt. Instead only loop thru counters which actually
interrupted (typically 1).
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Currently, for ARM kernels with CONFIG_ARM_LPAE and
CONFIG_STRICT_KERNEL_RWX enabled, the 2MiB pages mapping the
kernel code and rodata are writable. They are marked read-only in
a software bit (L_PMD_SECT_RDONLY) but the hardware read-only bit
is not set (PMD_SECT_AP2).
For user mappings, the logic that propagates the software bit
to the hardware bit is in set_pmd_at(); but for the kernel,
section_update() writes the PMDs directly, skipping this logic.
The fix is to set PMD_SECT_AP2 for read-only sections in
section_update(), at the same time as L_PMD_SECT_RDONLY.
Fixes: 1e3479225a ("ARM: 8275/1: mm: fix PMD_SECT_RDONLY undeclared compile error")
Signed-off-by: Philip Derrin <philip@cog.systems>
Reported-by: Neil Dick <neil@cog.systems>
Tested-by: Neil Dick <neil@cog.systems>
Tested-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
When CONFIG_ARM_LPAE is set, the PMD dump relies on the software
read-only bit to determine whether a page is writable. This
concealed a bug which left the kernel text section writable
(AP2=0) while marked read-only in the software bit.
In a kernel with the AP2 bug, the dump looks like this:
---[ Kernel Mapping ]---
0xc0000000-0xc0200000 2M RW NX SHD
0xc0200000-0xc0600000 4M ro x SHD
0xc0600000-0xc0800000 2M ro NX SHD
0xc0800000-0xc4800000 64M RW NX SHD
The fix is to check that the software and hardware bits are both
set before displaying "ro". The dump then shows the true perms:
---[ Kernel Mapping ]---
0xc0000000-0xc0200000 2M RW NX SHD
0xc0200000-0xc0600000 4M RW x SHD
0xc0600000-0xc0800000 2M RW NX SHD
0xc0800000-0xc4800000 64M RW NX SHD
Fixes: ded9477984 ("ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE")
Signed-off-by: Philip Derrin <philip@cog.systems>
Tested-by: Neil Dick <neil@cog.systems>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Make the decompressor debug output user selectable, otherwise merely
enabling DEBUG_LL causes the decompressor to become board specific,
thereby preventing a multi-platform kernel from booting. Enabling
DEBUG_LL doesn't cause the kernel itself to become platform specific
unless EARLY_PRINTK is enabled, or one of the debugging routines is
added in a path that results in it being called.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Ensure that get_user_pages_fast() is not able to access memory which
has been mapped with PROT_NONE.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Export the symbol chip_to_vas_id() to fix a build failure when
CONFIG_CRYPTO_DEV_NX_COMPRESS_POWERNV=m.
Fixes: d4ef61b5e8 ("powerpc/vas, nx-842: Define and use chip_to_vas_id()")
Reported-by: Haren Myneni <hbabu@us.ibm.com>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Print a rate-limited warning when a user-space program attempts to execute
any of the instructions that UMIP protects (i.e., SGDT, SIDT, SLDT, STR
and SMSW).
This is useful, because when CONFIG_X86_INTEL_UMIP=y is selected and
supported by the hardware, user space programs that try to execute such
instructions will receive a SIGSEGV signal that they might not expect.
In the specific cases for which emulation is provided (instructions SGDT,
SIDT and SMSW in protected and virtual-8086 modes), no signal is
generated. However, a warning is helpful to encourage updates in such
programs to avoid the use of such instructions.
Warnings are printed via a customized printk() function that also provides
information about the program that attempted to use the affected
instructions.
Utility macros are defined to wrap umip_printk() for the error and warning
kernel log levels.
While here, replace an existing call to the generic rate-limited pr_err()
with the new umip_pr_err().
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: ricardo.neri@intel.com
Link: http://lkml.kernel.org/r/1511233476-17088-1-git-send-email-ricardo.neri-calderon@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While computing slice mask for the free area we need make sure we only
search in the addr limit applicable for this mmap. We update the
slb_addr_limit after we request for a mmap above 128TB. But the
following mmap request with hint addr below 128TB should still limit
its search to below 128TB. ie. we should not use slb_addr_limit to
compute slice mask in this case. Instead, we should derive high addr
limit based on the mmap hint addr value.
Fixes: f4ea6dcb08 ("powerpc/mm: Enable mappings above 128TB")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When searching the opcode offset table within find_insn() the check
"entry->opcode == 0" was intended to clarify that 1-byte opcodes, the
first one being 0, are special.
However there is no mnemonic for an illegal opcode starting with 0.
Therefore there is also no opcode offset table entry that matches,
which again means that the check never is true. Therefore just remove
the confusing check, and add a comment which hopefully explains how
this works.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If GCC_PLUGIN_RANDSTRUCT is enabled the members of task_struct will be
shuffled around. The offsets of the "pid" and "stack" members within
task_struct may not necessarily fit into 12 bits anymore, which causes
compile errors within __switch_to, since instructions are used, which
only have a 12 bit displacement field.
Therefore rework __switch_to, to allow for larger offsets.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit 1887aa07b6
("s390/topology: add detection of dedicated vs shared CPUs")
introduced following compiler error when CONFIG_SCHED_TOPOLOGY is not set.
CC arch/s390/kernel/smp.o
...
arch/s390/kernel/smp.c: In function ‘smp_start_secondary’:
arch/s390/kernel/smp.c:812:6: error: implicit declaration of function
‘topology_cpu_dedicated’; did you mean ‘topology_cpu_init’?
This patch fixes the compiler error by adding function
topology_cpu_dedicated() to return false when this config option is
not defined.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull sparc updates from David Miller:
1) Add missing cmpxchg64() for 32-bit sparc.
2) Timer conversions from Allen Pais and Kees Cook.
3) vDSO support, from Nagarathnam Muthusamy.
4) Fix sparc64 huge page table walks based upon bug report by Al Viro,
from Nitin Gupta.
5) Optimized fls() for T4 and above, from Vijay Kumar.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix page table walk for PUD hugepages
sparc64: Convert timers to user timer_setup()
sparc64: convert mdesc_handle.refcnt from atomic_t to refcount_t
sparc/led: Convert timers to use timer_setup()
sparc64: Use sparc optimized fls and __fls for T4 and above
sparc64: SPARC optimized __fls function
sparc64: SPARC optimized fls function
sparc64: Define SPARC default __fls function
sparc64: Define SPARC default fls function
vDSO for sparc
sparc32: Add cmpxchg64().
sbus: char: Move D7S_MINOR to include/linux/miscdevice.h
sparc: time: Remove unneeded linux/miscdevice.h include
sparc64: mmu_context: Add missing include files
Pull Kbuild updates from Masahiro Yamada:
"One of the most remarkable improvements in this cycle is, Kbuild is
now able to cache the result of shell commands. Some variables are
expensive to compute, for example, $(call cc-option,...) invokes the
compiler. It is not efficient to redo this computation every time,
even when we are not actually building anything. Kbuild creates a
hidden file ".cache.mk" that contains invoked shell commands and their
results. The speed-up should be noticeable.
Summary:
- Fix arch build issues (hexagon, sh)
- Clean up various Makefiles and scripts
- Fix wrong usage of {CFLAGS,LDFLAGS}_MODULE in arch Makefiles
- Cache variables that are expensive to compute
- Improve cc-ldopton and ld-option for Clang
- Optimize output directory creation"
* tag 'kbuild-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (30 commits)
kbuild: move coccicheck help from scripts/Makefile.help to top Makefile
sh: decompressor: add shipped files to .gitignore
frv: .gitignore: ignore vmlinux.lds
selinux: remove unnecessary assignment to subdir-
kbuild: specify FORCE in Makefile.headersinst as .PHONY target
kbuild: remove redundant mkdir from ./Kbuild
kbuild: optimize object directory creation for incremental build
kbuild: create object directories simpler and faster
kbuild: filter-out PHONY targets from "targets"
kbuild: remove redundant $(wildcard ...) for cmd_files calculation
kbuild: create directory for make cache only when necessary
sh: select KBUILD_DEFCONFIG depending on ARCH
kbuild: fix linker feature test macros when cross compiling with Clang
kbuild: shrink .cache.mk when it exceeds 1000 lines
kbuild: do not call cc-option before KBUILD_CFLAGS initialization
kbuild: Cache a few more calls to the compiler
kbuild: Add a cache for generated variables
kbuild: add forward declaration of default target to Makefile.asm-generic
kbuild: remove KBUILD_SUBDIR_ASFLAGS and KBUILD_SUBDIR_CCFLAGS
hexagon/kbuild: replace CFLAGS_MODULE with KBUILD_CFLAGS_MODULE
...
Patch series "Replacing PID bitmap implementation with IDR API", v4.
This series replaces kernel bitmap implementation of PID allocation with
IDR API. These patches are written to simplify the kernel by replacing
custom code with calls to generic code.
The following are the stats for pid and pid_namespace object files
before and after the replacement. There is a noteworthy change between
the IDR and bitmap implementation.
Before
text data bss dec hex filename
8447 3894 64 12405 3075 kernel/pid.o
After
text data bss dec hex filename
3397 304 0 3701 e75 kernel/pid.o
Before
text data bss dec hex filename
5692 1842 192 7726 1e2e kernel/pid_namespace.o
After
text data bss dec hex filename
2854 216 16 3086 c0e kernel/pid_namespace.o
The following are the stats for ps, pstree and calling readdir on /proc
for 10,000 processes.
ps:
With IDR API With bitmap
real 0m1.479s 0m2.319s
user 0m0.070s 0m0.060s
sys 0m0.289s 0m0.516s
pstree:
With IDR API With bitmap
real 0m1.024s 0m1.794s
user 0m0.348s 0m0.612s
sys 0m0.184s 0m0.264s
proc:
With IDR API With bitmap
real 0m0.059s 0m0.074s
user 0m0.000s 0m0.004s
sys 0m0.016s 0m0.016s
This patch (of 2):
Replace the current bitmap implementation for Process ID allocation.
Functions that are no longer required, for example, free_pidmap(),
alloc_pidmap(), etc. are removed. The rest of the functions are
modified to use the IDR API. The change was made to make the PID
allocation less complex by replacing custom code with calls to generic
API.
[gs051095@gmail.com: v6]
Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com
[avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl]
Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org
Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com
Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull two power management fixes from Rafael Wysocki:
"This is the change making /proc/cpuinfo on x86 report current CPU
frequency in "cpu MHz" again in all cases and an additional one
dealing with an overzealous check in one of the helper routines in the
runtime PM framework"
* tag 'pm-fixes-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / runtime: Drop children check from __pm_runtime_set_status()
x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
Pull parisc updates from Helge Deller:
"Highlights:
- one important fix from Dave to prevent kernel crash when userspace
hands over invalid values to our in-kernel CAS implementation.
- added CPU topology support, including multi-core scheduler support
on PA8900 CPUs
Minor changes:
- minor fixes for sparse (from Luc)
- drop duplicates for CPU_BIG_ENDIAN from parisc and sparc top
Kconfig files (from Babu)
- reorganized parisc PDC (firmware-access) header files for usage
from userspace. Required for upcoming qemu parisc emulator and
SeaBIOS fork to support parisc"
* 'parisc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
arch: Fix duplicates in Kconfig for parisc and sparc
parisc: Make some PDC structures accessible in uapi headers
parisc: Pass endianness info to sparse
parisc: Add CPU topology support
parisc: Fix validity check of pointer size argument in new CAS implementation
Pull second round of s390 updates from Martin Schwidefsky:
- rework of the vdso code to avoid the use of the access register mode
- use perf AUX buffers for the transport of diagnostic sample data
- add perf_regs and user stack dump support
- enable perf call graphs for user space programs
- add perf register support for floating-point registers
- all remaining s390 related timer_setup conversions
- bug fixes and cleanups
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (30 commits)
s390: remove unused parameter from Makefile
zfcp: purely mechanical update using timer API, plus blank lines
s390/scsi: Convert timers to use timer_setup()
s390/cpum_sf: correctly set the PID and TID in perf samples
s390/cpum_sf: load program parameter at sampler enablement
s390/perf: add perf register support for floating-point registers
s390/perf: extend perf_regs support to include floating-point registers
s390/perf: define common DWARF register string table
s390/perf: add support for perf_regs and libdw
s390/perf: add perf_regs support and user stack dump
s390/cpum_sf: do not register PMU if no sampling mode is authorized
s390/cpumf: remove raw event support in basic-only sampling mode
s390/perf: add callback to perf to enable using AUX buffer
s390/cpumf: enable using AUX buffer
s390/cpumf: introduce AUX buffer for dump diagnostic sample data
s390/disassembler: increase show_code buffer size
s390: Remove CONFIG_HARDENED_USERCOPY
s390: enable CPU alternatives unconditionally
s390/nmi: remove unused code
s390/mm: remove unused code
...
Pull file locking update from Jeff Layton:
"A couple of fixes for a patch that went into v4.14, and the bug report
just came in a few days ago.. It passes my (minimal) testing, and has
been in linux-next for a few days now.
I also would like to get my address changed in MAINTAINERS to clear
that hurdle"
* tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall
fcntl: don't leak fd reference when fixup_compat_flock fails
MAINTAINERS: s/jlayton@poochiereds.net/jlayton@kernel.org/
Pull compat and uaccess updates from Al Viro:
- {get,put}_compat_sigset() series
- assorted compat ioctl stuff
- more set_fs() elimination
- a few more timespec64 conversions
- several removals of pointless access_ok() in places where it was
followed only by non-__ variants of primitives
* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
coredump: call do_unlinkat directly instead of sys_unlink
fs: expose do_unlinkat for built-in callers
ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs()
ipmi: get rid of pointless access_ok()
pi433: sanitize ioctl
cxlflash: get rid of pointless access_ok()
mtdchar: get rid of pointless access_ok()
r128: switch compat ioctls to drm_ioctl_kernel()
selection: get rid of field-by-field copyin
VT_RESIZEX: get rid of field-by-field copyin
i2c compat ioctls: move to ->compat_ioctl()
sched_rr_get_interval(): move compat to native, get rid of set_fs()
mips: switch to {get,put}_compat_sigset()
sparc: switch to {get,put}_compat_sigset()
s390: switch to {get,put}_compat_sigset()
ppc: switch to {get,put}_compat_sigset()
parisc: switch to {get,put}_compat_sigset()
get_compat_sigset()
get rid of {get,put}_compat_itimerspec()
io_getevents: Use timespec64 to represent timeouts
...
Pull libnvdimm and dax updates from Dan Williams:
"Save for a few late fixes, all of these commits have shipped in -next
releases since before the merge window opened, and 0day has given a
build success notification.
The ext4 touches came from Jan, and the xfs touches have Darrick's
reviewed-by. An xfstest for the MAP_SYNC feature has been through
a few round of reviews and is on track to be merged.
- Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable
'userspace flush' of persistent memory updates via filesystem-dax
mappings. It arranges for any filesystem metadata updates that may
be required to satisfy a write fault to also be flushed ("on disk")
before the kernel returns to userspace from the fault handler.
Effectively every write-fault that dirties metadata completes an
fsync() before returning from the fault handler. The new
MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag
is validated as supported by the filesystem's ->mmap() file
operation.
- Add support for the standard ACPI 6.2 label access methods that
replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods.
This enables interoperability with environments that only implement
the standardized methods.
- Add support for the ACPI 6.2 NVDIMM media error injection methods.
- Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for
latch last shutdown status, firmware update, SMART error injection,
and SMART alarm threshold control.
- Cleanup physical address information disclosures to be root-only.
- Fix revalidation of the DIMM "locked label area" status to support
dynamic unlock of the label area.
- Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA
(system-physical-address) command and error injection commands.
Acknowledgements that came after the commits were pushed to -next:
- 957ac8c421 ("dax: fix PMD faults on zero-length files"):
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
- a39e596baa ("xfs: support for synchronous DAX faults") and
7b565c9f96 ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()")
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>"
* tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits)
acpi, nfit: add 'Enable Latch System Shutdown Status' command support
dax: fix general protection fault in dax_alloc_inode
dax: fix PMD faults on zero-length files
dax: stop requiring a live device for dax_flush()
brd: remove dax support
dax: quiet bdev_dax_supported()
fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core
tools/testing/nvdimm: unit test clear-error commands
acpi, nfit: validate commands against the device type
tools/testing/nvdimm: stricter bounds checking for error injection commands
xfs: support for synchronous DAX faults
xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()
ext4: Support for synchronous DAX faults
ext4: Simplify error handling in ext4_dax_huge_fault()
dax: Implement dax_finish_sync_fault()
dax, iomap: Add support for synchronous faults
mm: Define MAP_SYNC and VM_SYNC flags
dax: Allow tuning whether dax_insert_mapping_entry() dirties entry
dax: Allow dax_iomap_fault() to return pfn
dax: Fix comment describing dax_iomap_fault()
...
Analyzing large early boot allocations unveiled the logical package id
storage as a prominent memory waste. Since commit 1f12e32f4c
("x86/topology: Create logical package id") every 64-bit system allocates a
128k array to convert logical package ids.
This happens because the array is sized for MAX_LOCAL_APIC which is always
32k on 64bit systems, and it needs 4 bytes for each entry.
This is fairly wasteful, especially for the common case of having only one
socket, which uses exactly 4 byte out of 128K.
There is no user of the package id map which is performance critical, so
the lookup is not required to be O(1). Store the logical processor id in
cpu_data and use a loop based lookup.
To keep the mapping stable accross cpu hotplug operations, add a flag to
cpu_data which is set when the CPU is brought up the first time. When the
flag is set, then cpu_data is not reinitialized by copying boot_cpu_data on
subsequent bringups.
[ tglx: Rename the flag to 'initialized', use proper pointers instead of
repeated cpu_data(x) evaluation and massage changelog. ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: He Chen <he.chen@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Piotr Luc <piotr.luc@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20171114124257.22013-3-prarit@redhat.com
When crosvm is used to boot a kernel as a VM, the SMP MP-table is found
at physical address 0x0. This causes mpf_base to be set to 0 and a
subsequent "if (!mpf_base)" check in default_get_smp_config() results in
the MP-table not being parsed. Further into the boot this results in an
oops when attempting a read_apic_id().
Add a boolean variable that is set to true when the MP-table is found.
Use this variable for testing if the MP-table was found so that even a
value of 0 for mpf_base will result in continued parsing of the MP-table.
Fixes: 5997efb967 ("x86/boot: Use memremap() to map the MPF and MPC data")
Reported-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: regression@leemhuis.info
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20171106201753.23059.86674.stgit@tlendack-t1.amdoffice.net
Fix duplicates for sparc and parisc. This was due these following commits.
1. commit 4c97a0c8fe ("arch: define CPU_BIG_ENDIAN for all fixed big
endian archs")
2. commit 97d9f96916 ("arch/sparc: Define config parameter
CPU_BIG_ENDIAN")
3. commit 74ad3d28af ("parisc: Define CONFIG_CPU_BIG_ENDIAN")
Remove duplicates.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Helge Deller <deller@gmx.de>
While working on a qemu and SeaBIOS-port to parisc, those PDC structures are
useful to have accessible from userspace.
Signed-off-by: Helge Deller <deller@gmx.de>
parisc is big-endian only but sparse assumes the same endianness as the
building machine.
This is problematic for code which expect __BYTE_ORDER__ being correctly
predefined by the compiler which sparse can then pre-process differently
from what gcc would.
Fix this by letting sparse know about the architecture endianness.
To: James Bottomley <jejb@parisc-linux.org>
To: Helge Deller <deller@gmx.de>
CC: linux-parisc@vger.kernel.org
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Add topology support, including multi-core scheduler support on
PA8800/PA8900 CPUs and enhanced output in /proc/cpuinfo, e.g.
lscpu now reports on a single-socket, dual-core machine:
Architecture: parisc64
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
CPU family: PA-RISC 2.0
Model name: PA8800 (Mako)
Signed-off-by: Helge Deller <deller@gmx.de>
As noted by Christoph Biedl, passing a pointer size of 4 in the new CAS
implementation causes a kernel crash. The attached patch corrects the
off by one error in the argument validity check.
In reviewing the code, I noticed that we only perform word operations
with the pointer size argument. The subi instruction intentionally uses
a word condition on 64-bit kernels. Nullification was used instead of a
cmpib instruction as the branch should never be taken. The shlw
pseudo-operation generates a depw,z instruction and it clears the target
before doing a shift left word deposit. Thus, we don't need to clip the
upper 32 bits of this argument on 64-bit kernels.
Tested with a gcc testsuite run with a 64-bit kernel. The gcc atomic
code in libgcc is the only direct user of the new CAS implementation
that I am aware of.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 3.13+
Signed-off-by: Helge Deller <deller@gmx.de>
These bits were not defined until now in common code, but they are
now that the kernel supports UMIP too.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vmx_check_nested_events() should return -EBUSY only in case there is a
pending L1 event which requires a VMExit from L2 to L1 but such a
VMExit is currently blocked. Such VMExits are blocked either
because nested_run_pending=1 or an event was reinjected to L2.
vmx_check_nested_events() should return 0 in case there are no
pending L1 events which requires a VMExit from L2 to L1 or if
a VMExit from L2 to L1 was done internally.
However, upstream commit which introduced blocking in case an event was
reinjected to L2 (commit acc9ab6013 ("KVM: nVMX: Fix pending events
injection")) contains a bug: It returns -EBUSY even if there are no
pending L1 events which requires VMExit from L2 to L1.
This commit fix this issue.
Fixes: acc9ab6013 ("KVM: nVMX: Fix pending events injection")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>