Pull x86 boot updates from Ingo Molnar:
"The biggest changes in this cycle were:
- prepare for more KASLR related changes, by restructuring, cleaning
up and fixing the existing boot code. (Kees Cook, Baoquan He,
Yinghai Lu)
- simplifly/concentrate subarch handling code, eliminate
paravirt_enabled() usage. (Luis R Rodriguez)"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
x86/KASLR: Clarify purpose of each get_random_long()
x86/KASLR: Add virtual address choosing function
x86/KASLR: Return earliest overlap when avoiding regions
x86/KASLR: Add 'struct slot_area' to manage random_addr slots
x86/boot: Add missing file header comments
x86/KASLR: Initialize mapping_info every time
x86/boot: Comment what finalize_identity_maps() does
x86/KASLR: Build identity mappings on demand
x86/boot: Split out kernel_ident_mapping_init()
x86/boot: Clean up indenting for asm/boot.h
x86/KASLR: Improve comments around the mem_avoid[] logic
x86/boot: Simplify pointer casting in choose_random_location()
x86/KASLR: Consolidate mem_avoid[] entries
x86/boot: Clean up pointer casting
x86/boot: Warn on future overlapping memcpy() use
x86/boot: Extract error reporting functions
x86/boot: Correctly bounds-check relocations
x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size'
x86/boot: Fix "run_size" calculation
x86/boot: Calculate decompression size during boot not build
...
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- MSR access API fixes and enhancements (Andy Lutomirski)
- early exception handling improvements (Andy Lutomirski)
- user-space FS/GS prctl usage fixes and improvements (Andy
Lutomirski)
- Remove the cpu_has_*() APIs and replace them with equivalents
(Borislav Petkov)
- task switch micro-optimization (Brian Gerst)
- 32-bit entry code simplification (Denys Vlasenko)
- enhance PAT handling in enumated CPUs (Toshi Kani)
... and lots of other cleanups/fixlets"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
x86/arch_prctl/64: Restore accidentally removed put_cpu() in ARCH_SET_GS
x86/entry/32: Remove asmlinkage_protect()
x86/entry/32: Remove GET_THREAD_INFO() from entry code
x86/entry, sched/x86: Don't save/restore EFLAGS on task switch
x86/asm/entry/32: Simplify pushes of zeroed pt_regs->REGs
selftests/x86/ldt_gdt: Test set_thread_area() deletion of an active segment
x86/tls: Synchronize segment registers in set_thread_area()
x86/asm/64: Rename thread_struct's fs and gs to fsbase and gsbase
x86/arch_prctl/64: Remove FSBASE/GSBASE < 4G optimization
x86/segments/64: When load_gs_index fails, clear the base
x86/segments/64: When loadsegment(fs, ...) fails, clear the base
x86/asm: Make asm/alternative.h safe from assembly
x86/asm: Stop depending on ptrace.h in alternative.h
x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()
x86/asm: Make sure verify_cpu() has a good stack
x86/extable: Add a comment about early exception handlers
x86/msr: Set the return value to zero when native_rdmsr_safe() fails
x86/paravirt: Make "unsafe" MSR accesses unsafe even if PARAVIRT=y
x86/paravirt: Add paravirt_{read,write}_msr()
x86/msr: Carry on after a non-"safe" MSR access fails
...
Pull scheduler updates from Ingo Molnar:
- massive CPU hotplug rework (Thomas Gleixner)
- improve migration fairness (Peter Zijlstra)
- CPU load calculation updates/cleanups (Yuyang Du)
- cpufreq updates (Steve Muckle)
- nohz optimizations (Frederic Weisbecker)
- switch_mm() micro-optimization on x86 (Andy Lutomirski)
- ... lots of other enhancements, fixes and cleanups.
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (66 commits)
ARM: Hide finish_arch_post_lock_switch() from modules
sched/core: Provide a tsk_nr_cpus_allowed() helper
sched/core: Use tsk_cpus_allowed() instead of accessing ->cpus_allowed
sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems
sched/fair: Correct unit of load_above_capacity
sched/fair: Clean up scale confusion
sched/nohz: Fix affine unpinned timers mess
sched/fair: Fix fairness issue on migration
sched/core: Kill sched_class::task_waking to clean up the migration logic
sched/fair: Prepare to fix fairness problems on migration
sched/fair: Move record_wakee()
sched/core: Fix comment typo in wake_q_add()
sched/core: Remove unused variable
sched: Make hrtick_notifier an explicit call
sched/fair: Make ilb_notifier an explicit call
sched/hotplug: Make activate() the last hotplug step
sched/hotplug: Move migration CPU_DYING to sched_cpu_dying()
sched/migration: Move CPU_ONLINE into scheduler state
sched/migration: Move calc_load_migrate() into CPU_DYING
sched/migration: Move prepare transition to SCHED_STARTING state
...
Pull RAS updates from Ingo Molnar:
"Main changes in this cycle were:
- AMD MCE/RAS handling updates (Yazen Ghannam, Aravind
Gopalakrishnan)
- Cleanups (Borislav Petkov)
- logging fix (Tony Luck)"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/RAS: Add SMCA support to AMD Error Injector
EDAC, mce_amd: Detect SMCA using X86_FEATURE_SMCA
x86/mce: Update AMD mcheck init to use cpu_has() facilities
x86/cpu: Add detection of AMD RAS Capabilities
x86/mce/AMD: Save an indentation level in prepare_threshold_block()
x86/mce/AMD: Disable LogDeferredInMcaStat for SMCA systems
x86/mce/AMD: Log Deferred Errors using SMCA MCA_DE{STAT,ADDR} registers
x86/mce: Detect local MCEs properly
x86/mce: Look in genpool instead of mcelog for pending error records
x86/mce: Detect and use SMCA-specific msr_ops
x86/mce: Define vendor-specific MSR accessors
x86/mce: Carve out writes to MCx_STATUS and MCx_CTL
x86/mce: Grade uncorrected errors for SMCA-enabled systems
x86/mce: Log MCEs after a warm rest on AMD, Fam17h and later
x86/mce: Remove explicit smp_rmb() when starting CPUs sync
x86/RAS: Rename AMD MCE injector config item
Pull perf updates from Ingo Molnar:
"Bigger kernel side changes:
- Add backwards writing capability to the perf ring-buffer code,
which is preparation for future advanced features like robust
'overwrite support' and snapshot mode. (Wang Nan)
- Add pause and resume ioctls for the perf ringbuffer (Wang Nan)
- x86 Intel cstate code cleanups and reorgnization (Thomas Gleixner)
- x86 Intel uncore and CPU PMU driver updates (Kan Liang, Peter
Zijlstra)
- x86 AUX (Intel PT) related enhancements and updates (Alexander
Shishkin)
- x86 MSR PMU driver enhancements and updates (Huang Rui)
- ... and lots of other changes spread out over 40+ commits.
Biggest tooling side changes:
- 'perf trace' features and enhancements. (Arnaldo Carvalho de Melo)
- BPF tooling updates (Wang Nan)
- 'perf sched' updates (Jiri Olsa)
- 'perf probe' updates (Masami Hiramatsu)
- ... plus 200+ other enhancements, fixes and cleanups to tools/
The merge commits, the shortlog and the changelogs contain a lot more
details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (249 commits)
perf/core: Disable the event on a truncated AUX record
perf/x86/intel/pt: Generate PMI in the STOP region as well
perf buildid-cache: Use lsdir() for looking up buildid caches
perf symbols: Use lsdir() for the search in kcore cache directory
perf tools: Use SBUILD_ID_SIZE where applicable
perf tools: Fix lsdir to set errno correctly
perf trace: Move seccomp args beautifiers to tools/perf/trace/beauty/
perf trace: Move flock op beautifier to tools/perf/trace/beauty/
perf build: Add build-test for debug-frame on arm/arm64
perf build: Add build-test for libunwind cross-platforms support
perf script: Fix export of callchains with recursion in db-export
perf script: Fix callchain addresses in db-export
perf script: Fix symbol insertion behavior in db-export
perf symbols: Add dso__insert_symbol function
perf scripting python: Use Py_FatalError instead of die()
perf tools: Remove xrealloc and ALLOC_GROW
perf help: Do not use ALLOC_GROW in add_cmd_list
perf pmu: Make pmu_formats_string to check return value of strbuf
perf header: Make topology checkers to check return value of strbuf
perf tools: Make alias handler to check return value of strbuf
...
Pull support for killable rwsems from Ingo Molnar:
"This, by Michal Hocko, implements down_write_killable().
The main usecase will be to update mm_sem usage sites to use this new
API, to allow the mm-reaper introduced in commit aac4536355 ("mm,
oom: introduce oom reaper") to tear down oom victim address spaces
asynchronously with minimum latencies and without deadlock worries"
[ The vfs will want it too as the inode lock is changed from a mutex to
a rwsem due to the parallel lookup and readdir updates ]
* 'locking-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Fix comment on register clobbering
locking/rwsem: Fix down_write_killable()
locking/rwsem, x86: Add frame annotation for call_rwsem_down_write_failed_killable()
locking/rwsem: Provide down_write_killable()
locking/rwsem, x86: Provide __down_write_killable()
locking/rwsem, s390: Provide __down_write_killable()
locking/rwsem, ia64: Provide __down_write_killable()
locking/rwsem, alpha: Provide __down_write_killable()
locking/rwsem: Introduce basis for down_write_killable()
locking/rwsem, sparc: Drop superfluous arch specific implementation
locking/rwsem, sh: Drop superfluous arch specific implementation
locking/rwsem, xtensa: Drop superfluous arch specific implementation
locking/rwsem: Drop explicit memory barriers
locking/rwsem: Get rid of __down_write_nested()
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- Drop the unused EFI_SYSTEM_TABLES efi.flags bit and ensure the
ARM/arm64 EFI System Table mapping is read-only (Ard Biesheuvel)
- Add a comment to explain that one of the code paths in the x86/pat
code is only executed for EFI boot (Matt Fleming)
- Improve Secure Boot status checks on arm64 and handle unexpected
errors (Linn Crosetto)
- Remove the global EFI memory map variable 'memmap' as the same
information is already available in efi::memmap (Matt Fleming)
- Add EFI Memory Attribute table support for ARM/arm64 (Ard
Biesheuvel)
- Add EFI GOP framebuffer support for ARM/arm64 (Ard Biesheuvel)
- Add EFI Bootloader Control driver for storing reboot(2) data in EFI
variables for consumption by bootloaders (Jeremy Compostella)
- Add Core EFI capsule support (Matt Fleming)
- Add EFI capsule char driver (Kweh, Hock Leong)
- Unify EFI memory map code for ARM and arm64 (Ard Biesheuvel)
- Add generic EFI support for detecting when firmware corrupts CPU
status register bits (like IRQ flags) when performing EFI runtime
service calls (Mark Rutland)
... and other misc cleanups"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
efivarfs: Make efivarfs_file_ioctl() static
efi: Merge boolean flag arguments
efi/capsule: Move 'capsule' to the stack in efi_capsule_supported()
efibc: Fix excessive stack footprint warning
efi/capsule: Make efi_capsule_pending() lockless
efi: Remove unnecessary (and buggy) .memmap initialization from the Xen EFI driver
efi/runtime-wrappers: Remove ARCH_EFI_IRQ_FLAGS_MASK #ifdef
x86/efi: Enable runtime call flag checking
arm/efi: Enable runtime call flag checking
arm64/efi: Enable runtime call flag checking
efi/runtime-wrappers: Detect firmware IRQ flag corruption
efi/runtime-wrappers: Remove redundant #ifdefs
x86/efi: Move to generic {__,}efi_call_virt()
arm/efi: Move to generic {__,}efi_call_virt()
arm64/efi: Move to generic {__,}efi_call_virt()
efi/runtime-wrappers: Add {__,}efi_call_virt() templates
efi/arm-init: Reserve rather than unmap the memory map for ARM as well
efi: Add misc char driver interface to update EFI firmware
x86/efi: Force EFI reboot to process pending capsules
efi: Add 'capsule' update support
...
Pull core signal updates from Ingo Molnar:
"These updates from Stas Sergeev and Andy Lutomirski, improve the
sigaltstack interface by extending its ABI with the SS_AUTODISARM
feature, which makes it possible to use swapcontext() in a sighandler
that works on sigaltstack. Without this flag, the subsequent signal
will corrupt the state of the switched-away sighandler.
The inspiration is more robust dosemu signal handling"
* 'core-signals-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
signals/sigaltstack: Change SS_AUTODISARM to (1U << 31)
signals/sigaltstack: Report current flag bits in sigaltstack()
selftests/sigaltstack: Fix the sigaltstack test on old kernels
signals/sigaltstack: If SS_AUTODISARM, bypass on_sig_stack()
selftests/sigaltstack: Add new testcase for sigaltstack(SS_ONSTACK|SS_AUTODISARM)
signals/sigaltstack: Implement SS_AUTODISARM flag
signals/sigaltstack: Prepare to add new SS_xxx flags
signals/sigaltstack, x86/signals: Unify the x86 sigaltstack check with other architectures
Pull RCU updates from Ingo Molnar:
"The main changes are:
- Documentation updates, including fixes to the design-level
requirements documentation and a fixed version of the design-level
data-structure documentation. These fixes include removing
cartoons and getting rid of the html/htmlx duplication.
- Further improvements to the new-age expedited grace periods.
- Miscellaneous fixes.
- Torture-test changes, including a new rcuperf module for measuring
RCU grace-period performance and scalability, which is useful for
the expedited-grace-period changes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
rcutorture: Add boot-time adjustment of leaf fanout
rcutorture: Add irqs-disabled test for call_rcu()
rcutorture: Dump trace buffer upon shutdown
rcutorture: Don't rebuild identical kernel
rcutorture: Add OS-jitter capability
documentation: Add documentation for RCU's major data structures
rcutorture: Convert test duration to seconds early
torture: Kill qemu, not parent process
torture: Clarify refusal to run more than one torture test
rcutorture: Consider FROZEN hotplug notifier transitions
rcutorture: Remove redundant initialization to zero
rcuperf: Do not wake up shutdown wait queue if "shutdown" is false.
rcutorture: Add largish-system rcuperf scenario
rcutorture: Avoid RCU CPU stall warning and RT throttling
rcutorture: Add rcuperf holdoff boot parameter to reduce interference
rcutorture: Make scripts analyze rcuperf trace data, if present
rcutorture: Make rcuperf collect expedited event-trace data
rcutorture: Print measure of batching efficiency
rcutorture: Set rcuperf writer kthreads to real-time priority
rcutorture: Bind rcuperf reader/writer kthreads to CPUs
...
Pull core/lib update from Ingo Molnar:
"This contains a single commit that removes an unused facility that the
scheduler used to make use of"
* 'core-lib-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/proportions: Remove unused code
The hash mixing between adding the next 64 bits of name
was just a bit weak.
Replaced with a still very fast but slightly more effective
mixing function.
Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The new signal_pending exit path in __rwsem_down_write_failed_common()
was fingered as breaking his kernel by Tetsuo Handa.
Upon inspection it was found that there are two things wrong with it;
- it forgets to remove WAITING_BIAS if it leaves the list empty, or
- it forgets to wake further waiters that were blocked on the now
removed waiter.
Especially the first issue causes new lock attempts to block and stall
indefinitely, as the code assumes that pending waiters mean there is
an owner that will wake when it releases the lock.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Waiman Long <Waiman.Long@hpe.com>
Link: http://lkml.kernel.org/r/20160512115745.GP3192@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fix from Thomas Gleixner:
"Just the missing compat entry for the new pread/writev2"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Use compat version for preadv2 and pwritev2
Original implementation commit e54bcde3d6 ("arm64: eBPF JIT compiler")
had the relevant code paths, but due to an oversight always fail jiting.
As a result, we had been falling back to BPF interpreter whenever a BPF
program has JMP_JSET_{X,K} instructions.
With this fix, we confirm that the corresponding tests in lib/test_bpf
continue to pass, and also jited.
...
[ 2.784553] test_bpf: #30 JSET jited:1 188 192 197 PASS
[ 2.791373] test_bpf: #31 tcpdump port 22 jited:1 325 677 625 PASS
[ 2.808800] test_bpf: #32 tcpdump complex jited:1 323 731 991 PASS
...
[ 3.190759] test_bpf: #237 JMP_JSET_K: if (0x3 & 0x2) return 1 jited:1 110 PASS
[ 3.192524] test_bpf: #238 JMP_JSET_K: if (0x3 & 0xffffffff) return 1 jited:1 98 PASS
[ 3.211014] test_bpf: #249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:1 120 PASS
[ 3.212973] test_bpf: #250 JMP_JSET_X: if (0x3 & 0xffffffff) return 1 jited:1 89 PASS
...
Fixes: e54bcde3d6 ("arm64: eBPF JIT compiler")
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when creating or updating a route, no check is performed
in both ipv4 and ipv6 code to the hoplimit value.
The caller can i.e. set hoplimit to 256, and when such route will
be used, packets will be sent with hoplimit/ttl equal to 0.
This commit adds checks for the RTAX_HOPLIMIT value, in both ipv4
ipv6 route code, substituting any value greater than 255 with 255.
This is consistent with what is currently done for ADVMSS and MTU
in the ipv4 code.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The slab name ends up being visible in the directory structure under
/sys, and even if you don't have access rights to the file you can see
the filenames.
Just use a 64-bit counter instead of the pointer to the 'net' structure
to generate a unique name.
This code will go away in 4.7 when the conntrack code moves to a single
kmemcache, but this is the backportable simple solution to avoiding
leaking kernel pointers to user space.
Fixes: 5b3501faa8 ("netfilter: nf_conntrack: per netns nf_conntrack_cachep")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull vfs fixes from Al Viro:
"Overlayfs fixes from Miklos, assorted fixes from me.
Stable fodder of varying severity, all sat in -next for a while"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ovl: ignore permissions on underlying lookup
vfs: add lookup_hash() helper
vfs: rename: check backing inode being equal
vfs: add vfs_select_inode() helper
get_rock_ridge_filename(): handle malformed NM entries
ecryptfs: fix handling of directory opening
atomic_open(): fix the handling of create_error
fix the copy vs. map logics in blk_rq_map_user_iov()
do_splice_to(): cap the size before passing to ->splice_read()
This patch fixes SG_RX_DV_GATE_REG_0_ADDR register offset
and ring state field lengths.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the race condition on updating the statistics
counters by moving the counters to the ring structure.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses ununiform latency across queues by adding
more queues to match with, upto number of CPU cores.
Also, number of interrupts are increased and the channel numbers
are reordered.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since hardware doesn't allow sharing of interrupts,
this patch fixes the same by removing IRQF_SHARED flag.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the crash observed during IPv4 forward test by
setting the drop field in the dbptr.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull cgroup fixes from Tejun Heo:
"During v4.6-rc1 cgroup namespace support was merged. There is an
issue where it's impossible to tell whether a given cgroup mount point
is bind mounted or namespaced. Serge has been working on the issue
but it took longer than expected to resolve, so the late pull request.
Given that it's a completely new feature and the patches don't touch
anything else, the risk seems acceptable. However, if this is too
late, an alternative is plugging new cgroup ns creation for v4.6 and
retrying for v4.7"
* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix compile warning
kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call
cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces
kernfs_path_from_node_locked: don't overwrite nlen
Pull workqueue fix from Tejun Heo:
"CPU hotplug callbacks can invoke DOWN_FAILED w/o preceding
DOWN_PREPARE which can trigger a WARN_ON() in workqueue.
The bug has been there for a very long time. It only triggers if CPU
down fails at a specific point and I don't think it has adverse
effects other than the warning messages. The fix is very low impact"
* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix rebind bound workers warning
Pull scheduler fix from Ingo Molnar:
"This is a revert to fix an interactivity problem.
The proper fixes for the problems that the reverted commit exposed are
now in sched/core (consisting of 3 patches), but were too risky for
v4.6 and will arrive in the v4.7 merge window"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "sched/fair: Fix fairness issue on migration"
Pull perf fixes from Ingo Molnar:
"An uncharacteristically large number of bugs popped up in the last
week:
- various tooling fixes, two crashes and build problems
- two Intel PT fixes
- an KNL uncore driver fix
- an Intel PMU driver fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf stat: Fallback to user only counters when perf_event_paranoid > 1
perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback()
perf evsel: Improve EPERM error handling in open_strerror()
tools lib traceevent: Do not reassign parg after collapse_tree()
perf probe: Check if dwarf_getlocations() is available
perf dwarf: Guard !x86_64 definitions under #ifdef else clause
perf tools: Use readdir() instead of deprecated readdir_r()
perf thread_map: Use readdir() instead of deprecated readdir_r()
perf script: Use readdir() instead of deprecated readdir_r()
perf tools: Use readdir() instead of deprecated readdir_r()
perf/core: Disable the event on a truncated AUX record
perf/x86/intel/pt: Generate PMI in the STOP region as well
perf/x86: Fix undefined shift on 32-bit kernels
perf/x86/msr: Fix SMI overflow
perf/x86/intel/uncore: Fix CHA registers configuration procedure for Knights Landing platform
perf diff: Fix duplicated output column
- The Atmel sama5d2 was registering the wrong NFC device type
- On Atmel sam9x5, the power management controller had an incorrect
register area size
- On ARM64 Allwinner machine was not secting the generic irqchip
code, causing build errors in some configurations
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAVzX1BGCrR//JCVInAQILnA//VG5KC42ohWCzlLrD1zmRamJ0R3YttuS7
7HWDzokTHzSX3DQ0GRVRZZ333rpqPXnDq3egz67FIBMpI9kP8wYRWvn8FWC7dS1v
6hy1m5mjJgaF7M2FcSLWyKCgRcnI61LRLhsLp8vinZsOPisfV0PMtq/bMeJALOKv
wZVfRaY6WbWo8ejPxXQY4f86qkYIPDZ4ExzymAJQwthXMR7E8BZN+ygIWkriP6kv
JxlkOSkU9qTtqShRILG+qUvlwQyNOgI/zjZ0tCRq05pzO3wUpUQLILNeM4yIcCpn
LOQqKHJ7VsH0V+9QIXtF1Q71MDRwx/e87VkBsyvpF096HlBfTsPmhwiLNxzcQ09U
fddSCPRAk3WZMsJ/gpMRvAGNF+65/MM8a1QpcHnyvAGM2nRa+FeaZJ1d/aEh7q/o
P4Tj7fCg/K3DNbidWTXgwiDtaCeoCaxjsoWSXPArboLdsTTZznaDDgjI8UwmdIZ4
CSt28ba5ugPYqpHdMbiNCNW08ofeBi6nyI094yibJYoggVq5rrHJIcT/rhkhFmMs
Z+WfSbp6KIbjaGvBEQA+S2Ed1VzndcTQSrzdM/sQE54uBQmQZs6bvWCx5xTYzTMP
gr8nJT+ZILzFnclqRvF7qX58fJCrFj5acyV9edr2axMN2SR9EmcN8dtnrOF3Evw5
8dn6DIYppKE=
=HPqy
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Three more bug fixes for ARM SoCs this week:
- The Atmel sama5d2 was registering the wrong NFC device type
- On Atmel sam9x5, the power management controller had an incorrect
register area size
- On ARM64 Allwinner machine was not secting the generic irqchip
code, causing build errors in some configurations"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
arm64/sunxi: 4.6-rc1: Add dependency on generic irq chip
ARM: dts: at91: sama5d2: use "atmel,sama5d3-nfc" compatible for nfc
A small collection of driver specific fixes for the regulator
subsysetem:
- Fix handling of probe deferral for GPIO regulators.
- Fix a typo in the module alias for DA9053.
- Fix the definition of BUCK9 in the S2MPS11 driver. This change looks
larger than it is because an irregularity in the hardware means that
the macro used to define bucks 6-10 needs duplicating and tweaking
to have a separate macro for 9.
- Fix a series of errors in the definitions of the LDOs the AXP20x
regulators, some of which had always been present and some of which
were introduced in the merge window.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXNazxAAoJECTWi3JdVIfQd9EH/0x/3Era4ctuLFP/dKVg+lAQ
PDwgGRlHg1RVyEzzy1ZWqew//P7hgBo6urwFHjcczBE95e0bBhhMJERGnhya07uq
eWYfPARl1zHxEm0gOfLBDaFAW8YIaNyXiEsW/aTeViNIL9/HIAd/ZTZhJiDLPiAF
FdNs3yzBuAR5SHsnvCTXZY1d20LeQC6AC2nWUo2WrcyKKZZ0HamrSLP0lMq0OHVQ
n2h/8f1g9TcykakCStB/4D8vLKMt0OYmalh3Y/C/JGkWnhMoBTLRgktrNFh6lnh9
E0SnMXSuZbC9+mv8MLRHQft6XagDYzNq+odwSb9tfqbXqBdncOpvvVr6hLmzxGk=
=sVmG
-----END PGP SIGNATURE-----
Merge tag 'regulator-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A small collection of driver specific fixes for the regulator
subsysetem:
- Fix handling of probe deferral for GPIO regulators
- Fix a typo in the module alias for DA9053
- Fix the definition of BUCK9 in the S2MPS11 driver. This change
looks larger than it is because an irregularity in the hardware
means that the macro used to define bucks 6-10 needs duplicating
and tweaking to have a separate macro for 9
- Fix a series of errors in the definitions of the LDOs the AXP20x
regulators, some of which had always been present and some of which
were introduced in the merge window"
* tag 'regulator-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: da9063: Correct module alias prefix to fix module autoloading
regulator: axp20x: Fix axp22x ldo_io registration error on cold boot
regulator: axp20x: Fix axp22x ldo_io voltage ranges
regulator: axp20x: Fix LDO4 linear voltage range
regulator: s2mps11: Fix invalid selector mask and voltages for buck9
regulator: gpio: check return value of of_get_named_gpio
This is rather too late so it'd be completely understandable if you
don't want to pull it at this point, I had thought I'd sent this earlier
but it seems I didn't. Everything has been in -next for some time now.
The main set of fixes here are mopping up some more issues with MMIO,
fixing handling of endianness configuration in DT (which just wasn't
working at all) and cases where the register and value endianness are
different.
There is also a fix for bulk register reads on SPMI.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXNaKUAAoJECTWi3JdVIfQoxEH/1DZGfeFyqhh4cdxrYz6UUr/
MbfMDR1wxNEvD4pifak6wYzQlern2Ki4J8ac72LP2OH9MYEoVs7kxESdSbqcVerc
uWQmn1qzj41hBSP5I4zmHIvYAwDry3B8mVVK9aYk+kBZLyOiw0eoMgZYyxB9cqws
pnfR4+5XHcaLGxh8QdIM7cXW1YOykXma4ivdwjwSlgWhYLGdh++j/uo9IUxqCrSa
qqxfMLPvAtEcksH8p4sJoRjibVDvzLXOvr3jMVHWyFoRGNDDB4TRiPkvydI04vYP
39pjg3c4gIAmp/RFyie1B8+zm4C+EQAkeiyH6D4jq8k9Gu6Y/XhqwFIAKtni3zg=
=k8tp
-----END PGP SIGNATURE-----
Merge tag 'regmap-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
"This is rather too late so it'd be completely understandable if you
don't want to pull it at this point, I had thought I'd sent this
earlier but it seems I didn't. Everything has been in -next for some
time now.
The main set of fixes here are mopping up some more issues with MMIO,
fixing handling of endianness configuration in DT (which just wasn't
working at all) and cases where the register and value endianness are
different.
There is also a fix for bulk register reads on SPMI"
* tag 'regmap-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case
regmap: mmio: Explicitly say little endian is the defualt in the bus config
regmap: mmio: Parse endianness definitions from DT
regmap: Fix implicit inclusion of device.h
regmap: mmio: Fix value endianness selection
regmap: fix documentation to match code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXNbnRAAoJEAhfPr2O5OEV9NAP/R72SwjZZrPV8ABV/yF6iiRL
pEcnDWP1vfOq3hlWKAvKHtzyAh13zUlkf0WAZJ5SFbkIyxrPD1mNCjwqNdYPFUpS
m2kAP7E8sU3ldRCdlD9QpsaOnrnRVpsPc8ChawqRyp0gVs/b+I3cJ82l2S1JHwTR
wO+7Uid3FwDtgzuiTecmtDiqKJJ1AgW3YNXJXnug2L6bAhPxHgJct+OV2K/n37de
MSIaVNHkIZxaFLXtIoCWiyxhJQCngAQhdDm+wxNGWupSdxmqArBq9QljX/6ZR9rB
7OFT8Td85J59WmMhxSvbjfCnlj0KO/fJX/QIaBXK4gFtolZxgpVAsxHaWMWtrrii
UitDMcRL1aEyygpwW72BqJ62r4RQn1QBBTRUaGlAWzYwZU6C0B0Aj70qhxdocjNE
cxv6ND2WftbYIUEQceLrFV7Qzee07YIS+FapB1Vda1s+mDtfCBnfd+n5gTkGmIeY
bweoHGAELDPyLntu6iboqAtG5MiSWGUtOEAZsYhm7ugY5OeCh8wiq4lUDsb2hQ3T
TnaCVPHEFghN5+KAmiFcp46lOCab8H0crmfEmSMGzcIMf1fZBYUc8TjceGb5GHI/
l2uNYAUz7vrZvs/qX/tKfIEERlzDgDTvw1T8NL6yt8fKzZA4Ajr3WhWntBbrfnXi
KplRJ1ngkpoArG4+24v9
=PEbo
-----END PGP SIGNATURE-----
Merge tag 'media/v4.6-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fix from Mauro Carvalho Chehab:
"A revert fixing a breakage that caused an OOPS on all VB2-based DVB
drivers.
We already have a proper fix, but it sounds safer to keep it being
tested for a while and not hurry, to avoid the risk of another
regression, specially since this is meant to be c/c to stable. So,
for now, let's just revert the broken patch"
* tag 'media/v4.6-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
Revert "[media] videobuf2-v4l2: Verify planes array in buffer dequeueing"
Pull drm fixes from Dave Airlie:
"A bunch of radeon displayport mode setting fixes, and some misc i915
fixes.
There is one revert, the MST audio code in i915 was causing some
oopses, so we've decided just to drop it until next kernel when we can
fix it properly"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: fix DP mode validation
drm/radeon: fix DP mode validation
drm/i915: Bail out of pipe config compute loop on LPT
drm/radeon: fix PLL sharing on DCE6.1 (v2)
drm/radeon: fix DP link training issue with second 4K monitor
Revert "drm/i915: start adding dp mst audio"
drm/i915/bdw: Add missing delay during L3 SQC credit programming
drm/i915/lvds: separate border enable readout from panel fitter
drm/i915: Update CDCLK_FREQ register on BDW after changing cdclk frequency
Pull crypto fix from Herbert Xu:
"This fixes a bug in the RSA self-test that may cause crashes on some
architectures such as SPARC"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: testmgr - Use kmalloc memory for RSA input
The introduction of switch_mm_irqs_off() brought back an old bug
regarding the use of preempt_enable_no_resched:
As part of:
62b94a08da ("sched/preempt: Take away preempt_enable_no_resched() from modules")
the definition of preempt_enable_no_resched() is only available in
built-in code, not in loadable modules, so we can't generally use
it from header files.
However, the ARM version of finish_arch_post_lock_switch()
calls preempt_enable_no_resched() and is defined as a static
inline function in asm/mmu_context.h. This in turn means we cannot
include asm/mmu_context.h from modules.
With today's tip tree, asm/mmu_context.h gets included from
linux/mmu_context.h, which is normally the exact pattern one would
expect, but unfortunately, linux/mmu_context.h can be included from
the vhost driver that is a loadable module, now causing this compile
time error with modular configs:
In file included from ../include/linux/mmu_context.h:4:0,
from ../drivers/vhost/vhost.c:18:
../arch/arm/include/asm/mmu_context.h: In function 'finish_arch_post_lock_switch':
../arch/arm/include/asm/mmu_context.h:88:3: error: implicit declaration of function 'preempt_enable_no_resched' [-Werror=implicit-function-declaration]
preempt_enable_no_resched();
Andy already tried to fix the bug by including linux/preempt.h
from asm/mmu_context.h, but that didn't help. Arnd suggested reordering
the header files, which wasn't popular, so let's use this
workaround instead:
The finish_arch_post_lock_switch() definition is now also hidden
inside of #ifdef MODULE, so we don't see anything referencing
preempt_enable_no_resched() from a header file. I've built a
few hundred randconfig kernels with this, and did not see any
new problems.
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: f98db6013c ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler")
Link: http://lkml.kernel.org/r/1463146234-161304-1-git-send-email-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Patch 562abd39 "xen-netback: support multiple extra info fragments
passed from frontend" contained a mistake which can result in an in-
correct number of responses being generated when handling errors
encountered when processing packets containing extra info fragments.
This patch fixes the problem.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reported-by: Jan Beulich <JBeulich@suse.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
User visible:
- Fallback to usermode only counters when perf_event_paranoid > 1, which
is the case now (Arnaldo Carvalho de Melo)
- Do not reassign parg after collapse_tree() in libtraceevent, which
may cause tool crashes (Steven Rostedt)
Build fixes:
- Fix the build on Fedora Rawhide, where readdir_r() is deprecated and
also wrt -Werror=unused-const-variable= + x86_32_regoffset_table on
!x86_64 (Arnaldo Carvalho de Melo)
- Fix the build on Ubuntu 12.04.5, where dwarf_getlocations() isn't
available, i.e. libdw-dev < 0.157 (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXNOELAAoJENZQFvNTUqpAEmcQAJIZePquYNAsnTcRAPcgXjSG
wdxMK1xlHW8WgPi7OoU6a2Hx04xRtEc8CUsb4mTUa1umAMe7Fp9BCuWZrX2tNAxJ
y6tn95RkMSxZSbPp9SvSJDLAi7NrOafoblPnsjLw3bpdMTY+ngxhMq2Aftjpz5ai
Ok3sC3deCYoibSzpxLNkoBPPR7eOhK8wzcVNAvu5mPY0EF9VPpNF+yPs6mW38wBs
TQtji1X7049bAjxMLeeXcR4Z6x5hhVM6i3U1lB5XSMoqkcQMLFHJze7Fe5VLMc/3
jtZFZvV79PlAhJtYxlLeuuSjjuaZ6dCSE+87YYAmukup3SMp3sCTvba+7YhWEgEE
hZEAaHe8eJbSQndhYpY82mV+AQe4dINTgdLBoV1uQ5EUh/KlOaph3MuCb2jMXtVb
JLROl6wktgFJ75NzkHvix798DtOVyLqa5z0H2h27Jqm2LqrotIn+trXuz6a+0nxP
aoHsKPyZYytPvZoWHjesIn86iOSCrLN8UGhaQPTIunfO0evlSvEkWtAqU1u4mX+P
CqDFI4/2dZXnAVBZvk47xsrcxhkvEO63SDv3AIdXvsihYBGJLtaVKT9qfsxyxyqN
fAy9MghKrPXTRRJP9Z/Fzbycbv/ioiRdnNmj+cgWkPkjqt+lr5Q1V+B9BZhqsWHF
Fl0GoIr//HFjSk7gSFaN
=qn4a
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo-20160512' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fallback to usermode-only counters when perf_event_paranoid > 1, which
is the case now (Arnaldo Carvalho de Melo)
- Do not reassign parg after collapse_tree() in libtraceevent, which
may cause tool crashes (Steven Rostedt)
- Fix the build on Fedora Rawhide, where readdir_r() is deprecated and
also wrt -Werror=unused-const-variable= + x86_32_regoffset_table on
!x86_64 (Arnaldo Carvalho de Melo)
- Fix the build on Ubuntu 12.04.5, where dwarf_getlocations() isn't
available, i.e. libdw-dev < 0.157 (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge fixes from Andrew Morton:
"4 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: thp: calculate the mapcount correctly for THP pages during WP faults
ksm: fix conflict between mmput and scan_get_next_rmap_item
ocfs2: fix posix_acl_create deadlock
ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang
This will provide fully accuracy to the mapcount calculation in the
write protect faults, so page pinning will not get broken by false
positive copy-on-writes.
total_mapcount() isn't the right calculation needed in
reuse_swap_page(), so this introduces a page_trans_huge_mapcount()
that is effectively the full accurate return value for page_mapcount()
if dealing with Transparent Hugepages, however we only use the
page_trans_huge_mapcount() during COW faults where it strictly needed,
due to its higher runtime cost.
This also provide at practical zero cost the total_mapcount
information which is needed to know if we can still relocate the page
anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
can reuse the page no matter if it's a pte or a pmd_trans_huge
triggering the fault, but we can only relocate the page anon_vma to
the local vma->anon_vma if we're sure it's only this "vma" mapping the
whole THP physical range.
Kirill A. Shutemov discovered the problem with moving the page
anon_vma to the local vma->anon_vma in a previous version of this
patch and another problem in the way page_move_anon_rmap() was called.
Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a
previous version, because reuse_swap_page must be a macro to call
page_trans_huge_mapcount from swap.h, so this uses a macro again
instead of an inline function. With this change at least it's a less
dangerous usage than it was before, because "page" is used only once
now, while with the previous code reuse_swap_page(page++) would have
called page_mapcount on page+1 and it would have increased page twice
instead of just once.
Dean Luick noticed an uninitialized variable that could result in a
rmap inefficiency for the non-THP case in a previous version.
Mike Marciniszyn said:
: Our RDMA tests are seeing an issue with memory locking that bisects to
: commit 61f5d698cc ("mm: re-enable THP")
:
: The test program registers two rather large MRs (512M) and RDMA
: writes data to a passive peer using the first and RDMA reads it back
: into the second MR and compares that data. The sizes are chosen randomly
: between 0 and 1024 bytes.
:
: The test will get through a few (<= 4 iterations) and then gets a
: compare error.
:
: Tracing indicates the kernel logical addresses associated with the individual
: pages at registration ARE correct , the data in the "RDMA read response only"
: packets ARE correct.
:
: The "corruption" occurs when the packet crosse two pages that are not physically
: contiguous. The second page reads back as zero in the program.
:
: It looks like the user VA at the point of the compare error no longer points to
: the same physical address as was registered.
:
: This patch totally resolves the issue!
Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Tested-by: Josh Collier <josh.d.collier@intel.com>
Cc: Marc Haber <mh+linux-kernel@zugschlus.de>
Cc: <stable@vger.kernel.org> [4.5]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A concurrency issue about KSM in the function scan_get_next_rmap_item.
task A (ksmd): |task B (the mm's task):
|
mm = slot->mm; |
down_read(&mm->mmap_sem); |
|
... |
|
spin_lock(&ksm_mmlist_lock); |
|
ksm_scan.mm_slot go to the next slot; |
|
spin_unlock(&ksm_mmlist_lock); |
|mmput() ->
| ksm_exit():
|
|spin_lock(&ksm_mmlist_lock);
|if (mm_slot && ksm_scan.mm_slot != mm_slot) {
| if (!mm_slot->rmap_list) {
| easy_to_free = 1;
| ...
|
|if (easy_to_free) {
| mmdrop(mm);
| ...
|
|So this mm_struct may be freed in the mmput().
|
up_read(&mm->mmap_sem); |
As we can see above, the ksmd thread may access a mm_struct that already
been freed to the kmem_cache. Suppose a fork will get this mm_struct from
the kmem_cache, the ksmd thread then call up_read(&mm->mmap_sem), will
cause mmap_sem.count to become -1.
As suggested by Andrea Arcangeli, unmerge_and_remove_all_rmap_items has
the same SMP race condition, so fix it too. My prev fix in function
scan_get_next_rmap_item will introduce a different SMP race condition, so
just invert the up_read/spin_unlock order as Andrea Arcangeli said.
Link: http://lkml.kernel.org/r/1462708815-31301-1-git-send-email-zhouchengming1@huawei.com
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Geliang Tang <geliangtang@163.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 702e5bc68a ("ocfs2: use generic posix ACL infrastructure")
refactored code to use posix_acl_create. The problem with this function
is that it is not mindful of the cluster wide inode lock making it
unsuitable for use with ocfs2 inode creation with ACLs. For example,
when used in ocfs2_mknod, this function can cause deadlock as follows.
The parent dir inode lock is taken when calling posix_acl_create ->
get_acl -> ocfs2_iop_get_acl which takes the inode lock again. This can
cause deadlock if there is a blocked remote lock request waiting for the
lock to be downconverted. And same deadlock happened in ocfs2_reflink.
This fix is to revert back using ocfs2_init_acl.
Fixes: 702e5bc68a ("ocfs2: use generic posix ACL infrastructure")
Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>