Deadlock during cgroup migration from cpu hotplug path when a task T is
being moved from source to destination cgroup.
kworker/0:0
cpuset_hotplug_workfn()
cpuset_hotplug_update_tasks()
hotplug_update_tasks_legacy()
remove_tasks_in_empty_cpuset()
cgroup_transfer_tasks() // stuck in iterator loop
cgroup_migrate()
cgroup_migrate_add_task()
In cgroup_migrate_add_task() it checks for PF_EXITING flag of task T.
Task T will not migrate to destination cgroup. css_task_iter_start()
will keep pointing to task T in loop waiting for task T cg_list node
to be removed.
Task T
do_exit()
exit_signals() // sets PF_EXITING
exit_task_namespaces()
switch_task_namespaces()
free_nsproxy()
put_mnt_ns()
drop_collected_mounts()
namespace_unlock()
synchronize_rcu()
_synchronize_rcu_expedited()
schedule_work() // on cpu0 low priority worker pool
wait_event() // waiting for work item to execute
Task T inserted a work item in the worklist of cpu0 low priority
worker pool. It is waiting for expedited grace period work item
to execute. This work item will only be executed once kworker/0:0
complete execution of cpuset_hotplug_workfn().
kworker/0:0 ==> Task T ==>kworker/0:0
In case of PF_EXITING task being migrated from source to destination
cgroup, migrate next available task in source cgroup.
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2017-12-17
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a corner case in generic XDP where we have non-linear skbs
but enough tailroom in the skb to not miss to linearizing there,
from Song.
2) Fix BPF JIT bugs in s390x and ppc64 to not recache skb data when
BPF context is not skb, from Daniel.
3) Fix a BPF JIT bug in sparc64 where recaching skb data after helper
call would use the wrong register for the skb, from Daniel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_NO_HZ_FULL doesn't make sense without CONFIG_CPU_ISOLATION. In
fact enabling the first without the second is a regression as nohz_full=
boot parameter gets silently ignored.
Besides this unnatural combination hangs RCU gp kthread when running
rcutorture for reasons that are not yet fully understood:
rcu_preempt kthread starved for 9974 jiffies! g4294967208
+c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
rcu_preempt I 7464 8 2 0x80000000
Call Trace:
__schedule+0x493/0x620
schedule+0x24/0x40
schedule_timeout+0x330/0x3b0
? preempt_count_sub+0xea/0x140
? collect_expired_timers+0xb0/0xb0
rcu_gp_kthread+0x6bf/0xef0
This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which
prevents all these bad behaviours.
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <kernellwp@gmail.com>
Fixes: 5c4991e24c ("sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL")
Link: http://lkml.kernel.org/r/1513275507-29200-2-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull timer fix from Thomas Gleixner:
"A single bugfix which prevents arbitrary sigev_notify values in
posix-timers"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-timer: Properly check sigevent->sigev_notify
Pull networking fixes from David Miller:
1) Clamp timeouts to INT_MAX in conntrack, from Jay Elliot.
2) Fix broken UAPI for BPF_PROG_TYPE_PERF_EVENT, from Hendrik
Brueckner.
3) Fix locking in ieee80211_sta_tear_down_BA_sessions, from Johannes
Berg.
4) Add missing barriers to ptr_ring, from Michael S. Tsirkin.
5) Don't advertise gigabit in sh_eth when not available, from Thomas
Petazzoni.
6) Check network namespace when delivering to netlink taps, from Kevin
Cernekee.
7) Kill a race in raw_sendmsg(), from Mohamed Ghannam.
8) Use correct address in TCP md5 lookups when replying to an incoming
segment, from Christoph Paasch.
9) Add schedule points to BPF map alloc/free, from Eric Dumazet.
10) Don't allow silly mtu values to be used in ipv4/ipv6 multicast, also
from Eric Dumazet.
11) Fix SKB leak in tipc, from Jon Maloy.
12) Disable MAC learning on OVS ports of mlxsw, from Yuval Mintz.
13) SKB leak fix in skB_complete_tx_timestamp(), from Willem de Bruijn.
14) Add some new qmi_wwan device IDs, from Daniele Palmas.
15) Fix static key imbalance in ingress qdisc, from Jiri Pirko.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
net: qcom/emac: Reduce timeout for mdio read/write
net: sched: fix static key imbalance in case of ingress/clsact_init error
net: sched: fix clsact init error path
ip_gre: fix wrong return value of erspan_rcv
net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
pkt_sched: Remove TC_RED_OFFLOADED from uapi
net: sched: Move to new offload indication in RED
net: sched: Add TCA_HW_OFFLOAD
net: aquantia: Increment driver version
net: aquantia: Fix typo in ethtool statistics names
net: aquantia: Update hw counters on hw init
net: aquantia: Improve link state and statistics check interval callback
net: aquantia: Fill in multicast counter in ndev stats from hardware
net: aquantia: Fill ndev stat couters from hardware
net: aquantia: Extend stat counters to 64bit values
net: aquantia: Fix hardware DMA stream overload on large MRRS
net: aquantia: Fix actual speed capabilities reporting
sock: free skb in skb_complete_tx_timestamp on error
s390/qeth: update takeover IPs after configuration change
s390/qeth: lock IP table while applying takeover changes
...
Pull locking fixes from Ingo Molnar:
"Misc fixes:
- Fix a S390 boot hang that was caused by the lock-break logic.
Remove lock-break to begin with, as review suggested it was
unreasonably fragile and our confidence in its continued good
health is lower than our confidence in its removal.
- Remove the lockdep cross-release checking code for now, because of
unresolved false positive warnings. This should make lockdep work
well everywhere again.
- Get rid of the final (and single) ACCESS_ONCE() straggler and
remove the API from v4.15.
- Fix a liblockdep build warning"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/lib/lockdep: Add missing declaration of 'pr_cont()'
checkpatch: Remove ACCESS_ONCE() warning
compiler.h: Remove ACCESS_ONCE()
tools/include: Remove ACCESS_ONCE()
tools/perf: Convert ACCESS_ONCE() to READ_ONCE()
locking/lockdep: Remove the cross-release locking checks
locking/core: Remove break_lock field when CONFIG_GENERIC_LOCKBREAK=y
locking/core: Fix deadlock during boot on systems with GENERIC_LOCKBREAK
Pull scheduler fixes from Ingo Molnar:
"Two fixes: a crash fix for an ARM SoC platform, and kernel-doc
warnings fixes"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt: Do not pull from current CPU if only one CPU to pull
sched/core: Fix kernel-doc warnings after code movement
Some JITs don't cache skb context on stack in prologue, so when
LD_ABS/IND is used and helper calls yield bpf_helper_changes_pkt_data()
as true, then they temporarily save/restore skb pointer. However,
the assumption that skb always has to be in r1 is a bit of a
gamble. Right now it turned out to be true for all helpers listed
in bpf_helper_changes_pkt_data(), but lets enforce that from verifier
side, so that we make this a guarantee and bail out if the func
proto is misconfigured in future helpers.
In case of BPF helper calls from cBPF, bpf_helper_changes_pkt_data()
is completely unrelevant here (since cBPF is context read-only) and
therefore always false.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Wagner reported a crash on the BeagleBone Black SoC.
This is a single CPU architecture, and does not have a functional
arch_send_call_function_single_ipi() implementation which can crash
the kernel if that is called.
As it only has one CPU, it shouldn't be called, but if the kernel is
compiled for SMP, the push/pull RT scheduling logic now calls it for
irq_work if the one CPU is overloaded, it can use that function to call
itself and crash the kernel.
Ideally, we should disable the SCHED_FEAT(RT_PUSH_IPI) if the system
only has a single CPU. But SCHED_FEAT is a constant if sched debugging
is turned off. Another fix can also be used, and this should also help
with normal SMP machines. That is, do not initiate the pull code if
there's only one RT overloaded CPU, and that CPU happens to be the
current CPU that is scheduling in a lower priority task.
Even on a system with many CPUs, if there's many RT tasks waiting to
run on a single CPU, and that CPU schedules in another RT task of lower
priority, it will initiate the PULL logic in case there's a higher
priority RT task on another CPU that is waiting to run. But if there is
no other CPU with waiting RT tasks, it will initiate the RT pull logic
on itself (as it still has RT tasks waiting to run). This is a wasted
effort.
Not only does this help with SMP code where the current CPU is the only
one with RT overloaded tasks, it should also solve the issue that
Daniel encountered, because it will prevent the PULL logic from
executing, as there's only one CPU on the system, and the check added
here will cause it to exit the RT pull code.
Reported-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Cc: stable@vger.kernel.org
Fixes: 4bdced5c9 ("sched/rt: Simplify the IPI based RT balancing logic")
Link: http://lkml.kernel.org/r/20171202130454.4cbbfe8d@vmware.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As long as cft->name is guaranteed to be NUL-terminated, using strlcpy() would
work just as well and avoid that warning, so the change below could be folded
into that commit.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
timer_create() specifies via sigevent->sigev_notify the signal delivery for
the new timer. The valid modes are SIGEV_NONE, SIGEV_SIGNAL, SIGEV_THREAD
and (SIGEV_SIGNAL | SIGEV_THREAD_ID).
The sanity check in good_sigevent() is only checking the valid combination
for the SIGEV_THREAD_ID bit, i.e. SIGEV_SIGNAL, but if SIGEV_THREAD_ID is
not set it accepts any random value.
This has no real effects on the posix timer and signal delivery code, but
it affects show_timer() which handles the output of /proc/$PID/timers. That
function uses a string array to pretty print sigev_notify. The access to
that array has no bound checks, so random sigev_notify cause access beyond
the array bounds.
Add proper checks for the valid notify modes and remove the SIGEV_THREAD_ID
masking from various code pathes as SIGEV_NONE can never be set in
combination with SIGEV_THREAD_ID.
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: stable@vger.kernel.org
Pull tracing fixes from Steven Rostedt:
"Various fix-ups:
- comment fixes
- build fix
- better memory alloction (don't use NR_CPUS)
- configuration fix
- build warning fix
- enhanced callback parameter (to simplify users of trace hooks)
- give up on stack tracing when RCU isn't watching (it's a lost
cause)"
* tag 'trace-v4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Have stack trace not record if RCU is not watching
tracing: Pass export pointer as argument to ->write()
ring-buffer: Remove unused function __rb_data_page_index()
tracing: make PREEMPTIRQ_EVENTS depend on TRACING
tracing: Allocate mask_str buffer dynamically
tracing: always define trace_{irq,preempt}_{enable_disable}
tracing: Fix code comments in trace.c
The stack tracer records a stack dump whenever it sees a stack usage that is
more than what it ever saw before. This can happen at any function that is
being traced. If it happens when the CPU is going idle (or other strange
locations), RCU may not be watching, and in this case, the recording of the
stack trace will trigger a warning. There's been lots of efforts to make
hacks to allow stack tracing to proceed even if RCU is not watching, but
this only causes more issues to appear. Simply do not trace a stack if RCU
is not watching. It probably isn't a bad stack anyway.
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
gcc toggle -fisolate-erroneous-paths-dereference (default at -O2
onwards) isolates faulty code paths such as null pointer access, divide
by zero etc. If gcc port doesnt implement __builtin_trap, an abort() is
generated which causes kernel link error.
In this case, gcc is generating abort due to 'divide by zero' in
lib/mpi/mpih-div.c.
Currently 'frv' and 'arc' are failing. Previously other arch was also
broken like m32r was fixed by commit d22e3d69ee ("m32r: fix build
failure").
Let's define this weak function which is common for all arch and fix the
problem permanently. We can even remove the arch specific 'abort' after
this is done.
Link: http://lkml.kernel.org/r/1513118956-8718-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch remains without practical effect since both macros carry
identical values. Still, it might become a problem in the future if
(for whatever reason) the default overflow uid and gid differ. The
DEFAULT_FS_OVERFLOWGID macro was previously unused.
Signed-off-by: Wolffhardt Schwabe <wolffhardt.schwabe@fau.de>
Signed-off-by: Anatoliy Cherepantsev <anatoliy.cherepantsev@fau.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
While using large percpu maps, htab_map_alloc() can hold
cpu for hundreds of ms.
This patch adds cond_resched() calls to percpu alloc/free
call sites, all running in process context.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When tracing and networking programs are both attached in the
system and both use event-output helpers that eventually call
into perf_event_output(), then we could end up in a situation
where the tracing attached program runs in user context while
a cls_bpf program is triggered on that same CPU out of softirq
context.
Since both rely on the same per-cpu perf_sample_data, we could
potentially corrupt it. This can only ever happen in a combination
of the two types; all tracing programs use a bpf_prog_active
counter to bail out in case a program is already running on
that CPU out of a different context. XDP and cls_bpf programs
by themselves don't have this issue as they run in the same
context only. Therefore, split both perf_sample_data so they
cannot be accessed from each other.
Fixes: 20b9d7ac48 ("bpf: avoid excessive stack usage for perf_sample_data")
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Song Liu <songliubraving@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
cgroup root name and file name have max length limit, we should
avoid copying longer name than that to the name.
tj: minor update to $SUBJ.
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This code (CONFIG_LOCKDEP_CROSSRELEASE=y and CONFIG_LOCKDEP_COMPLETIONS=y),
while it found a number of old bugs initially, was also causing too many
false positives that caused people to disable lockdep - which is arguably
a worse overall outcome.
If we disable cross-release by default but keep the code upstream then
in practice the most likely outcome is that we'll allow the situation
to degrade gradually, by allowing entropy to introduce more and more
false positives, until it overwhelms maintenance capacity.
Another bad side effect was that people were trying to work around
the false positives by uglifying/complicating unrelated code. There's
a marked difference between annotating locking operations and
uglifying good code just due to bad lock debugging code ...
This gradual decrease in quality happened to a number of debugging
facilities in the kernel, and lockdep is pretty complex already,
so we cannot risk this outcome.
Either cross-release checking can be done right with no false positives,
or it should not be included in the upstream kernel.
( Note that it might make sense to maintain it out of tree and go through
the false positives every now and then and see whether new bugs were
introduced. )
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When CONFIG_GENERIC_LOCKBEAK=y, locking structures grow an extra int ->break_lock
field which is used to implement raw_spin_is_contended() by setting the field
to 1 when waiting on a lock and clearing it to zero when holding a lock.
However, there are a few problems with this approach:
- There is a write-write race between a CPU successfully taking the lock
(and subsequently writing break_lock = 0) and a waiter waiting on
the lock (and subsequently writing break_lock = 1). This could result
in a contended lock being reported as uncontended and vice-versa.
- On machines with store buffers, nothing guarantees that the writes
to break_lock are visible to other CPUs at any particular time.
- READ_ONCE/WRITE_ONCE are not used, so the field is potentially
susceptible to harmful compiler optimisations,
Consequently, the usefulness of this field is unclear and we'd be better off
removing it and allowing architectures to implement raw_spin_is_contended() by
providing a definition of arch_spin_is_contended(), as they can when
CONFIG_GENERIC_LOCKBREAK=n.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1511894539-7988-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
a8a217c221 ("locking/core: Remove {read,spin,write}_can_lock()")
removed the definition of raw_spin_can_lock(), causing the GENERIC_LOCKBREAK
spin_lock() routines to poll the ->break_lock field when waiting on a lock.
This has been reported to cause a deadlock during boot on s390, because
the ->break_lock field is also set by the waiters, and can potentially
remain set indefinitely if no other CPUs come in to take the lock after
it has been released.
This patch removes the explicit spinning on ->break_lock from the waiters,
instead relying on the outer trylock() operation to determine when the
lock is available.
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: a8a217c221 ("locking/core: Remove {read,spin,write}_can_lock()")
Link: http://lkml.kernel.org/r/1511894539-7988-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull cgroup fixes from Tejun Heo:
- Prateek posted a couple patches to fix a deadlock involving cpuset
and workqueue. It unfortunately caused a different deadlock and the
recent workqueue hotplug simplification removed the original
deadlock, so Prateek's two patches are reverted for now.
- The new stat code was missing u64_stats initialization. Fixed.
- Doc and other misc changes
* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: add warning about RT not being supported on cgroup2
Revert "cgroup/cpuset: remove circular dependency deadlock"
Revert "cpuset: Make cpuset hotplug synchronous"
cgroup: properly init u64_stats
debug cgroup: use task_css_set instead of rcu_dereference
cpuset: Make cpuset hotplug synchronous
cgroup/cpuset: remove circular dependency deadlock
Pull workqueue fixes from Tejun Heo:
- Lai's hotplug simplifications inadvertently fix a possible deadlock
involving cpuset and workqueue
- CPU isolation fix which was reverted due to the changes in the
housekeeping code resurrected
- A trivial unused include removal
* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: remove unneeded kallsyms include
workqueue/hotplug: remove the workaround in rebind_workers()
workqueue/hotplug: simplify workqueue_offline_cpu()
workqueue: respect isolated cpus when queueing an unbound work
main: kernel_start: move housekeeping_init() before workqueue_init_early()
The purpose of torture_runnable is to allow rcutorture and locktorture
to be started and stopped via sysfs when they are built into the kernel
(as in not compiled as loadable modules). However, the 0444 permissions
for both instances of torture_runnable prevent this use case from ever
being put into practice. Given that there have been no complaints
about this deficiency, it is reasonable to conclude that no one actually
makes use of this sysfs capability. The perf_runnable module parameter
for rcuperf is in the same situation.
This commit therefore removes both torture_runnable instances as well
as perf_runnable.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The stutter_wait() function repeatedly fetched stutter_pause_test, and
should really just fetch it once on each pass. The races should be
harmless, but why have the races? Also, the whole point of the value
"2" for stutter_pause_test is to get everyone to start at very nearly
the same time, but the value "2" was the first jiffy of the stutter
rather than the last jiffy of the stutter.
This commit rearranges the code to be more sensible.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Things can explode for locktorture if the user does combinations
of nwriters_stress=0 nreaders_stress=0. Fix this by not assuming
we always want to torture writer threads.
Reported-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
We should account for nreader threads, not writers in this
callback. Could even trigger a div by 0 if the user explicitly
disables writers.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit attempts to make a very rare rcutorture failure happen
more often by increasing the fraction of RCU-preempt read-side critical
sections that are preempted.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds a torture_preempt_schedule() that is nothingness
in !PREEMPT builds and is preempt_schedule() otherwise. Then
torture_preempt_schedule() is used to eliminate several ugly #ifdefs,
both in rcutorture and in locktorture.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Currently have_rcu_nocb_mask is used to avoid double allocation of
rcu_nocb_mask during boot up. Due to different representation of
cpumask_var_t on different kernel config CPUMASK=y(or n) it was okay.
But now we have a helper cpumask_available(), which can be utilized
to check whether rcu_nocb_mask has been allocated or not without using
a variable.
Removing the variable also reduces vmlinux size.
Unpatched version:
text data bss dec hex filename
13050393 7852470 14543408 35446271 21cddff vmlinux
Patched version:
text data bss dec hex filename
13050390 7852438 14543408 35446236 21cdddc vmlinux
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The following statement has for some reason proven non-intuitive:
WARN_ON_ONCE(rcu_segcblist_empty(&rdp->cblist) != (count == 0));
This commit therefore adds a comment that states that this warning
usually triggers in response to a double call_rcu(), which is sort
of like a double free. The comment also suggests building with
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y to track down the double call_rcu().
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
sign_extend32 counts the sign bit parameter from 0, not from 1. So we
have to use "11" for 12th bit, not "12".
This mistake means we have not allowed negative op and cmp args since
commit 30d6e0a419 ("futex: Remove duplicated code and fix undefined
behaviour") till now.
Fixes: 30d6e0a419 ("futex: Remove duplicated code and fix undefined behaviour")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb
drivers).
2) Revert returning -EEXIST from __dev_alloc_name() as this propagates
to userspace and broke some apps. From Johannes Berg.
3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong
Wang.
4) Gianfar MAC can't do EEE so don't advertise it by default, from
Claudiu Manoil.
5) Relax strict netlink attribute validation, but emit a warning. From
David Ahern.
6) Fix regression in checksum offload of thunderx driver, from Florian
Westphal.
7) Fix UAPI bpf issues on s390, from Hendrik Brueckner.
8) New card support in iwlwifi, from Ihab Zhaika.
9) BBR congestion control bug fixes from Neal Cardwell.
10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren.
11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan.
12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni.
13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
net: mvpp2: fix the RSS table entry offset
tcp: evaluate packet losses upon RTT change
tcp: fix off-by-one bug in RACK
tcp: always evaluate losses in RACK upon undo
tcp: correctly test congestion state in RACK
bnxt_en: Fix sources of spurious netpoll warnings
tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
tcp_bbr: reset full pipe detection on loss recovery undo
tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
sfc: pass valid pointers from efx_enqueue_unwind
gianfar: Disable EEE autoneg by default
tcp: invalidate rate samples during SACK reneging
can: peak/pcie_fd: fix potential bug in restarting tx queue
can: usb_8dev: cancel urb on -EPIPE and -EPROTO
can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
can: esd_usb2: cancel urb on -EPIPE and -EPROTO
can: ems_usb: cancel urb on -EPIPE and -EPROTO
can: mcba_usb: cancel urb on -EPROTO
usbnet: fix alignment for frames with no ethernet header
tcp: use current time in tcp_rcv_space_adjust()
...
Use of init_rcu_head() and destroy_rcu_head() from modules results in
the following build-time error with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y:
ERROR: "init_rcu_head" [drivers/scsi/scsi_mod.ko] undefined!
ERROR: "destroy_rcu_head" [drivers/scsi/scsi_mod.ko] undefined!
This commit therefore adds EXPORT_SYMBOL_GPL() for each to allow them to
be used by GPL-licensed kernel modules.
Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull kgdb fixes from Jason Wessel:
- Fix long standing problem with kdb kallsyms_symbol_next() return
value
- Add new co-maintainer Daniel Thompson
* tag 'for_linus-4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
kgdb/kdb/debug_core: Add co-maintainer Daniel Thompson
kdb: Fix handling of kallsyms_symbol_next() return value
Pull CPU hotplug fix from Ingo Molnar:
"A single fix moving the smp-call queue flush step to the intended
point in the state machine"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp/hotplug: Move step CPUHP_AP_SMPCFD_DYING to the correct place
Pull scheduler fixes from Ingo Molnar:
"This includes a fix for the add_wait_queue() queue ordering brown
paperbag bug, plus PELT accounting fixes for cgroups scheduling
artifacts"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Update and fix the runnable propagation rule
sched/wait: Fix add_wait_queue() behavioral change
Pull perf fixes from Ingo Molnar:
"This includes perf namespace support kernel side fixes, plus an
accumulated set of perf tooling fixes - including UAPI header
synchronization that should make the perf build less noisy"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
tooling/headers: Synchronize updated s390 and x86 UAPI headers
tools headers: Syncronize mman.h ABI header
tools headers: Synchronize prctl.h ABI header
tools headers: Synchronize KVM arch ABI headers
tools headers: Synchronize drm/i915_drm.h
tools headers uapi: Synchronize drm/drm.h
tools headers: Synchronize perf_event.h header
tools headers: Synchronize kernel ABI headers wrt SPDX tags
tools/headers: Synchronize kernel x86 UAPI headers
perf intel-pt: Bring instruction decoder files into line with the kernel
perf test: Fix test 21 for s390x
perf bench numa: Fixup discontiguous/sparse numa nodes
perf top: Use signal interface for SIGWINCH handler
perf top: Fix window dimensions change handling
perf: Fix header.size for namespace events
perf top: Ignore kptr_restrict when not sampling the kernel
perf record: Ignore kptr_restrict when not sampling the kernel
perf report: Ignore kptr_restrict when not sampling the kernel
perf evlist: Add helper to check if attr.exclude_kernel is set in all evsels
perf test shell: Fix test case probe libc's inet_pton on s390x
...
Pull lockdep fix from Ingo Molnar:
"Fix a possible NULL dereference for the (rare) case when a task
doesn't have ->xhlocks space allocated due to kmalloc() OOM-ing"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Fix possible NULL deref
Pull irq fixes from Ingo Molnar:
"Two fixes: use bool type consistently, plus a irq_matrix_available()
bugfix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqdesc: Use bool return type instead of int
genirq/matrix: Fix the precedence fix for real
kallsyms_symbol_next() returns a boolean (true on success). Currently
kdb_read() tests the return value with an inequality that
unconditionally evaluates to true.
This is fixed in the obvious way and, since the conditional branch is
supposed to be unreachable, we also add a WARN_ON().
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>