linux

Vincent Guittot b5a9b34078 sched/fair: Fix incorrect task group ->load_avg

A scheduler performance regression has been reported by Joseph Salisbury, which he bisected back to:

  `3d30544f02` ("sched/fair: Apply more PELT fixes")

The regression triggers when several levels of task groups are involved (read: SystemD) and cpu_possible_mask != cpu_present_mask.

The root cause is that a group entity's load (tg_child->se[i]->avg.load_avg) is initialized to scale_load_down(se->load.weight). During the creation of a child task group, its group entities on possible CPUs are attached to the parent's cfs_rq (tg_parent) and their loads are added to the parent's load (tg_parent->load_avg) with update_tg_load_avg().

But only the load on online CPUs will then be updated to reflect the real load, whereas the load on other CPUs will stay at the initial value.

The result is a tg_parent->load_avg that is higher than the real load, so the weight of group entities (tg_parent->se[i]->load.weight) on online CPUs is smaller than it should be, and the task group gets less running time than it could expect.

( This situation can be detected with /proc/sched_debug. The ".tg_load_avg" of the task group will be much higher than the sum of ".tg_load_avg_contrib" of the online cfs_rqs of the task group. )

The load of group entities does not have to be initialized to anything other than 0, because their load will increase when an entity is attached.

Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@vger.kernel.org> # 4.8.x
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: joonwoop@codeaurora.org
Fixes: `3d30544f02` ("sched/fair: Apply more PELT fixes")
Link: http://lkml.kernel.org/r/1476881123-10159-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2016-10-19 15:04:47 +02:00
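
The effect described in the commit message above can be illustrated with a small numeric model. This is only a sketch, not kernel code: the function names, the value 1024 standing in for scale_load_down(se->load.weight), the example loads, and the simple proportional formula for the group entity weight are all assumptions made for illustration.

```python
# Simplified, hypothetical model of the accounting described in the commit
# message. None of these names exist in the kernel.

INITIAL_LOAD = 1024  # assumed stand-in for scale_load_down(se->load.weight)


def tg_load_avg(online_loads, possible_cpus, initial_load):
    """Sum per-CPU contributions to the parent's load.

    Online CPUs report their real load; the remaining possible CPUs
    keep whatever value their group entity was initialized with.
    """
    stale_cpus = possible_cpus - len(online_loads)
    return sum(online_loads) + stale_cpus * initial_load


def entity_weight(tg_shares, cpu_load, total_load):
    """Proportional share of the group's weight for one CPU (assumed formula)."""
    if total_load == 0:
        return tg_shares
    return tg_shares * cpu_load // total_load


online_loads = [512, 512]  # real load on two online CPUs (example values)
possible_cpus = 8          # six possible CPUs are not online

# Before the fix: entities start at INITIAL_LOAD and never decay on
# CPUs that are not online, so the parent's total is inflated.
before = tg_load_avg(online_loads, possible_cpus, INITIAL_LOAD)

# After the fix: entities start at 0, so only real load is counted.
after = tg_load_avg(online_loads, possible_cpus, 0)

w_before = entity_weight(1024, online_loads[0], before)
w_after = entity_weight(1024, online_loads[0], after)

print("tg load_avg  before:", before, "after:", after)
print("entity weight before:", w_before, "after:", w_after)
```

In this model the inflated total mirrors the /proc/sched_debug symptom: the group's total is far larger than the sum of the online contributions, and the weight computed for each online CPU shrinks accordingly.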
..
auto_group.c	sched/core: Move the sched_to_prio[] arrays out of line	2015-12-04 10:34:46 +01:00
auto_group.h	sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE()	2015-05-08 12:11:32 +02:00
clock.c	sched/clock: Make local_clock()/cpu_clock() inline	2016-04-13 12:25:22 +02:00
completion.c	sched/completion: Serialize completion_done() with complete()	2015-02-18 14:27:40 +01:00
core.c	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-10-03 16:13:28 -07:00
cpuacct.c	sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together	2016-07-09 13:56:15 +02:00
cpuacct.h	sched/cpuacct: Simplify the cpuacct code	2016-03-21 11:00:28 +01:00
cpudeadline.c	sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear()	2016-09-05 13:29:43 +02:00
cpudeadline.h	sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear()	2016-09-05 13:29:43 +02:00
cpufreq_schedutil.c	cpufreq: schedutil: Add iowait boosting	2016-09-13 23:36:01 +02:00
cpufreq.c	cpufreq / sched: Pass flags to cpufreq_update_util()	2016-08-16 22:14:55 +02:00
cpupri.c	sched/core: Use tsk_cpus_allowed() instead of accessing ->cpus_allowed	2016-05-12 09:55:35 +02:00
cpupri.h	sched/cpupri: Remove unnecessary definitions in cpupri.h	2014-11-16 10:58:59 +01:00
cputime.c	sched/irqtime: Consolidate irqtime flushing code	2016-09-30 11:46:41 +02:00
deadline.c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-10-03 13:39:00 -07:00
debug.c	Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2016-10-14 12:18:50 -07:00
fair.c	sched/fair: Fix incorrect task group ->load_avg	2016-10-19 15:04:47 +02:00
features.h	sched/fair: Convert arch_scale_cpu_capacity() from weak function to #define	2015-09-13 09:52:55 +02:00
idle_task.c	sched/core: Rewrite and improve select_idle_siblings()	2016-09-30 11:03:09 +02:00
idle.c	nmi_backtrace: generate one-line reports for idle cpus	2016-10-07 18:46:30 -07:00
loadavg.c	sched/core: Correct off by one bug in load migration calculation	2016-07-13 14:58:20 +02:00
Makefile	cpufreq: schedutil: New governor based on scheduler utilization data	2016-04-02 01:09:12 +02:00
rt.c	cpufreq / sched: Pass runqueue pointer to cpufreq_update_util()	2016-08-16 22:16:03 +02:00
sched.h	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-10-03 16:13:28 -07:00
stats.c	sched: use %*pb[l] to print bitmaps including cpumasks and nodemasks	2015-02-13 21:21:37 -08:00
stats.h	sched/debug: Rename 'schedstat_val()' -> 'schedstat_val_or_zero()'	2016-09-05 13:29:46 +02:00
stop_task.c	locking/lockdep, sched/core: Implement a better lock pinning scheme	2016-05-05 09:23:59 +02:00
swait.c	wait.[ch]: Introduce the simple waitqueue (swait) implementation	2016-02-25 11:27:16 +01:00
wait.c	sched/wait: Introduce init_wait_entry()	2016-09-30 10:54:03 +02:00