linux/kernel
Paul E. McKenney ef631b0ca0 rcu: Make hierarchical RCU less IPI-happy
This patch fixes a hierarchical-RCU performance bug located by Anton
Blanchard.  The problem stems from a misguided attempt to provide a
work-around for jiffies-counter failure.  This work-around uses a per-CPU
n_rcu_pending counter, which is incremented on each call to rcu_pending(),
which in turn is called from each scheduling-clock interrupt.  Each CPU
then treats this counter as a surrogate for the jiffies counter, so
that if the jiffies counter fails to advance, the per-CPU n_rcu_pending
counter will cause RCU to invoke force_quiescent_state(), which in turn
will (among other things) send resched IPIs to CPUs that have thus far
failed to pass through an RCU quiescent state.

Unfortunately, each CPU resets only its own counter after sending a
batch of IPIs.  This means that the other CPUs will also (needlessly)
send -another- round of IPIs, for a full N-squared set of IPIs in the
worst case every three scheduler-clock ticks until the grace period
finally ends.  It is not reasonable for a given CPU to reset each and
every n_rcu_pending for all the other CPUs, so this patch instead simply
disables the jiffies-counter "training wheels", thus eliminating the
excessive IPIs.

Note that the jiffies-counter IPIs do not have this problem due to
the fact that the jiffies counter is global, so that the CPU sending
the IPIs can easily reset things, thus preventing the other CPUs from
sending redundant IPIs.

Note also that the n_rcu_pending counter remains, as it will continue to
be used for tracing.  It may also see use to update the jiffies counter,
should an appropriate kick-the-jiffies-counter API appear.

Located-by: Anton Blanchard <anton@au1.ibm.com>
Tested-by: Anton Blanchard <anton@au1.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: anton@samba.org
Cc: akpm@linux-foundation.org
Cc: dipankar@in.ibm.com
Cc: manfred@colorfullife.com
Cc: cl@linux-foundation.org
Cc: josht@linux.vnet.ibm.com
Cc: schamp@sgi.com
Cc: niv@us.ibm.com
Cc: dvhltc@us.ibm.com
Cc: ego@in.ibm.com
Cc: laijs@cn.fujitsu.com
Cc: rostedt@goodmis.org
Cc: peterz@infradead.org
Cc: penberg@cs.helsinki.fi
Cc: andi@firstfloor.org
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
LKML-Reference: <12396834793575-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 11:31:50 +02:00
..
irq Merge branches 'core-fixes-for-linus', 'irq-fixes-for-linus' and 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-09 10:35:30 -07:00
power PM/Hibernate: Wait for SCSI devices scan to complete during resume 2009-04-13 11:37:07 -07:00
time Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-03-26 16:05:42 -07:00
trace tracing/filters: return proper error code when writing filter file 2009-04-12 11:59:29 +02:00
.gitignore
acct.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
async.c async: remove the temporary (2.6.29) "async is off by default" code 2009-03-28 13:05:30 -07:00
audit_tree.c audit: incorrect ref counting in audit tree tag_chunk 2009-04-05 13:48:26 -04:00
audit.c Audit: remove spaces from audit_log_d_path 2009-04-05 13:49:04 -04:00
audit.h
auditfilter.c make the e->rule.xxx shorter in kernel auditfilter.c 2009-04-05 13:40:33 -04:00
auditsc.c Audit: remove spaces from audit_log_d_path 2009-04-05 13:49:04 -04:00
backtracetest.c
bounds.c
capability.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
cgroup_debug.c debug cgroup: remove unneeded cgroup_lock 2009-04-02 19:04:54 -07:00
cgroup_freezer.c
cgroup.c memcg: fix OOM killer under memcg 2009-04-02 19:04:55 -07:00
compat.c
configs.c
cpu.c cpumask: use set_cpu_active in init/main.c 2009-03-30 22:05:12 +10:30
cpuset.c cpusets: prevent PF_THREAD_BOUND tasks from attaching to non-root cpusets 2009-04-02 19:04:57 -07:00
cred-internals.h
cred.c
delayacct.c
dma-coherent.c dma-coherent: Restore dma_alloc_from_coherent() large alloc fall back policy. 2009-01-21 18:51:53 +09:00
dma.c
exec_domain.c Get rid of indirect include of fs_struct.h 2009-03-31 23:00:27 -04:00
exit.c Merge branch 'irq/threaded' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-07 14:07:52 -07:00
extable.c Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-05 11:04:19 -07:00
fork.c Merge branches 'core-fixes-for-linus', 'irq-fixes-for-linus' and 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-09 10:35:30 -07:00
freezer.c
futex_compat.c
futex.c futex: comment requeue key reference semantics 2009-04-02 23:39:53 +02:00
hrtimer.c hrtimer: fix rq->lock inversion (again) 2009-03-31 14:52:52 +02:00
hung_task.c softlockup: ensure the task has been switched out once 2009-02-11 11:04:16 +01:00
itimer.c timers: split process wide cpu clocks/timers 2009-02-05 13:04:33 +01:00
kallsyms.c Ksplice: Add functions for walking kallsyms symbols 2009-03-31 13:05:32 +10:30
Kconfig.freezer
Kconfig.hz
Kconfig.preempt
kexec.c kexec: vmcoreinfo_data[] can become static 2009-04-02 19:05:04 -07:00
kfifo.c
kgdb.c
kmod.c module: create a request_module_nowait() 2009-03-31 13:05:35 +10:30
kprobes.c kprobes: support per-kprobe disabling 2009-04-07 08:31:08 -07:00
ksysfs.c
kthread.c kthread: move sched-realeted initialization from kthreadd context 2009-04-09 09:50:37 +09:30
latencytop.c sched, latencytop: incorporate review feedback from Andrew Morton 2009-02-11 10:18:04 +01:00
lockdep_internals.h lockdep: get_user_chars() redo 2009-02-14 23:28:22 +01:00
lockdep_proc.c lockstat: warn about disabled lock debugging 2009-02-14 23:28:28 +01:00
lockdep_states.h lockdep: move state bit definitions around 2009-02-14 23:27:59 +01:00
lockdep.c Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-06 13:37:30 -07:00
Makefile Merge branch 'linus' into core/softlockup 2009-04-07 11:15:40 +02:00
marker.c
module.c async: Fix module loading async-work regression 2009-04-11 12:44:49 -07:00
mutex-debug.c mutex: implement adaptive spinning 2009-01-14 18:09:02 +01:00
mutex-debug.h mutex: implement adaptive spinning 2009-01-14 18:09:02 +01:00
mutex.c mutex: have non-spinning mutexes on s390 by default 2009-04-09 19:28:24 +02:00
mutex.h mutex: implement adaptive spinning 2009-01-14 18:09:02 +01:00
notifier.c
ns_cgroup.c cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups 2009-04-02 19:04:53 -07:00
nsproxy.c
panic.c lockdep: continue lock debugging despite some taints 2009-04-12 16:10:52 +02:00
params.c param: fix charp parameters set via sysfs 2009-03-31 13:05:30 +10:30
pid_namespace.c signals: zap_pid_ns_process() should use force_sig() 2009-04-02 19:04:58 -07:00
pid.c pids: refactor vnr/nr_ns helpers to make them safe 2009-04-02 19:05:02 -07:00
pm_qos_params.c
posix-cpu-timers.c Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-09 10:37:28 -07:00
posix-timers.c [CVE-2009-0029] System call wrappers part 05 2009-01-14 14:15:20 +01:00
printk.c Merge branch 'printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-05 10:23:25 -07:00
profile.c profiling: fix broken profiling regression 2009-02-10 00:50:37 +01:00
ptrace.c ptrace: fix exit_ptrace() vs ptrace_traceme() race 2009-04-13 15:04:31 -07:00
rcuclassic.c kmemtrace, rcu: fix linux/rcutree.h and linux/rcuclassic.h dependencies 2009-04-03 12:23:02 +02:00
rcupdate.c rcu: rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu 2009-03-31 00:09:37 +02:00
rcupreempt_trace.c
rcupreempt.c kmemtrace, rcu: fix rcupreempt.c data structure dependencies 2009-04-03 12:23:04 +02:00
rcutorture.c cpumask: convert rcutorture.c 2009-03-30 22:05:16 +10:30
rcutree_trace.c rcu: Make hierarchical RCU less IPI-happy 2009-04-14 11:31:50 +02:00
rcutree.c rcu: Make hierarchical RCU less IPI-happy 2009-04-14 11:31:50 +02:00
rcutree.h kmemtrace, rcu: fix rcu_tree_trace.c data structure dependencies 2009-04-03 12:23:03 +02:00
relay.c Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-05 11:04:19 -07:00
res_counter.c
resource.c resources: fix parameter name and kernel-doc 2009-01-15 16:39:38 -08:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rwsem.c
sched_clock.c Merge branch 'tracing/core-v2' into tracing-for-linus 2009-04-02 00:49:02 +02:00
sched_cpupri.c sched_rt: don't allocate cpumask in fastpath 2009-04-01 13:24:51 +02:00
sched_cpupri.h cpumask: remove cpumask_t from core 2009-03-30 22:05:17 +10:30
sched_debug.c sched: remove unused fields from struct rq 2009-03-24 23:16:51 +01:00
sched_fair.c Merge branch 'sched/urgent'; commit 'v2.6.29-rc5' into sched/core 2009-02-15 21:15:16 +01:00
sched_features.h Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-03-30 17:17:35 -07:00
sched_idletask.c
sched_rt.c Merge commit 'v2.6.30-rc1' into sched/urgent 2009-04-08 17:26:00 +02:00
sched_stats.h sched: remove unused fields from struct rq 2009-03-24 23:16:51 +01:00
sched.c Merge commit 'v2.6.30-rc1' into sched/urgent 2009-04-08 17:26:00 +02:00
seccomp.c x86-64: seccomp: fix 32/64 syscall hole 2009-03-02 15:41:30 -08:00
semaphore.c
signal.c signals: SI_USER: Masquerade si_pid when crossing pid ns boundary 2009-04-02 19:04:58 -07:00
slow-work.c Document the slow work thread pool 2009-04-03 16:42:35 +01:00
smp.c generic-ipi: eliminate WARN_ON()s during oops/panic 2009-03-13 10:47:34 +01:00
softirq.c Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-06 13:37:30 -07:00
softlockup.c softlockup: decouple hung tasks check from softlockup detection 2009-01-16 14:06:04 +01:00
spinlock.c Allow rwlocks to re-enable interrupts 2009-04-02 19:05:11 -07:00
srcu.c
stacktrace.c
stop_machine.c cpumask: remove cpumask_t from core 2009-03-30 22:05:17 +10:30
sys_ni.c [CVE-2009-0029] Make sys_syslog a conditional system call 2009-01-14 14:15:16 +01:00
sys.c kernel/sys.c: clean up sys_shutdown exit path 2009-04-13 15:04:32 -07:00
sysctl_check.c net: add ARP notify option for devices 2009-02-01 01:04:33 -08:00
sysctl.c mm: move the scan_unevictable_pages sysctl to the vm table 2009-04-13 15:04:28 -07:00
taskstats.c
test_kprobes.c
time.c [CVE-2009-0029] System call wrappers part 01 2009-01-14 14:15:18 +01:00
timeconst.pl
timer.c Merge branches 'core-fixes-for-linus', 'irq-fixes-for-linus' and 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-09 10:35:30 -07:00
tracepoint.c tracepoints: dont update zero-sized tracepoint sections 2009-03-18 19:55:00 +01:00
tsacct.c Fix fixpoint divide exception in acct_update_integrals 2009-03-09 08:13:35 -07:00
uid16.c [CVE-2009-0029] System call wrappers part 19 2009-01-14 14:15:26 +01:00
up.c smp_call_function_single(): be slightly less stupid, fix #2 2009-01-12 16:04:37 +01:00
user_namespace.c Fix recursive lock in free_uid()/free_user_ns() 2009-02-27 16:26:21 -08:00
user.c Merge branch 'master' into next 2009-03-24 10:52:46 +11:00
utsname_sysctl.c proc_sysctl: use CONFIG_PROC_SYSCTL around ipc and utsname proc_handlers 2009-04-02 19:05:01 -07:00
utsname.c
wait.c wait: prevent exclusive waiter starvation 2009-02-05 12:56:48 -08:00
workqueue.c work_on_cpu(): rewrite it to create a kernel thread on demand 2009-04-09 09:50:37 +09:30