linux/kernel/sched
Mel Gorman 2ebb177175 sched/core: Offload wakee task activation if the wakee is descheduling
The previous commit:

  c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")

avoids spinning on p->on_rq when the task is descheduling, but only if the
wakee is on a CPU that does not share cache with the waker.

This patch offloads the activation of the wakee to the CPU that is about to
go idle if the task is the only one on the runqueue. This potentially allows
the waker task to continue making progress when the wakeup is not strictly
synchronous.
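
In code terms, the wakelist decision then takes roughly the following
shape (a condensed sketch of the patched ttwu_queue_remote(); WF_ON_CPU
is set by the waker when it observes p->on_cpu, and the clock-sync and
IPI details are omitted):

  static bool ttwu_queue_remote(struct task_struct *p, int cpu, int wake_flags)
  {
      if (sched_feat(TTWU_QUEUE)) {
          /*
           * The CPU does not share cache with the waker: queue on the
           * remote rq's wakelist to avoid accessing remote data.
           */
          if (!cpus_share_cache(smp_processor_id(), cpu))
              goto queue;

          /*
           * The wakee is descheduling and would be the only task on the
           * runqueue: offload the activation to the soon-to-be-idle CPU
           * instead of spinning in the waker. The nr_running check
           * avoids stacking tasks on a busy CPU.
           */
          if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
              goto queue;
      }
      return false;

  queue:
      __ttwu_queue_remote(p, cpu, wake_flags); /* activate via IPI */
      return true;
  }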

This is very obvious with netperf UDP_STREAM running on localhost: the
waker sends packets as quickly as possible without waiting for any
reply. It frequently wakes the server to process packets, and when
netserver is using local memory, the server quickly completes the
processing and goes back to idle. The waker often observes that
netserver is on_rq and spins excessively, leading to a drop in
throughput.
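
For reference, each message size in the results below corresponds to a
netperf invocation along these lines (the exact benchmark harness
options are not recorded here, so treat this as illustrative):

  netperf -t UDP_STREAM -H 127.0.0.1 -- -m 64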

This is a comparison of 5.7-rc6 (labelled vanilla), 5.7-rc6 plus "sched:
Optimize ttwu() spinning on p->on_cpu" (optttwu-v1r1) and 5.7-rc6 plus
this patch (localwakelist-v1r2). The figures are send throughput for
each message size; higher is better, and asterisks around a percentage
mark a statistically significant difference relative to vanilla.

                                  5.7.0-rc6              5.7.0-rc6              5.7.0-rc6
                                    vanilla           optttwu-v1r1     localwakelist-v1r2
Hmean     send-64         251.49 (   0.00%)      258.05 *   2.61%*      305.59 *  21.51%*
Hmean     send-128        497.86 (   0.00%)      519.89 *   4.43%*      600.25 *  20.57%*
Hmean     send-256        944.90 (   0.00%)      997.45 *   5.56%*     1140.19 *  20.67%*
Hmean     send-1024      3779.03 (   0.00%)     3859.18 *   2.12%*     4518.19 *  19.56%*
Hmean     send-2048      7030.81 (   0.00%)     7315.99 *   4.06%*     8683.01 *  23.50%*
Hmean     send-3312     10847.44 (   0.00%)    11149.43 *   2.78%*    12896.71 *  18.89%*
Hmean     send-4096     13436.19 (   0.00%)    13614.09 (   1.32%)    15041.09 *  11.94%*
Hmean     send-8192     22624.49 (   0.00%)    23265.32 *   2.83%*    24534.96 *   8.44%*
Hmean     send-16384    34441.87 (   0.00%)    36457.15 *   5.85%*    35986.21 *   4.48%*

Note that this benefit is not universal to all wakeups; it only applies
to the case where the waker often spins on p->on_rq.
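
When the waker does observe p->on_rq, what it ultimately waits for is
the wakee getting off its CPU; in try_to_wake_up() that wait is
essentially:

  /* Spin until the wakee finishes descheduling (p->on_cpu drops to 0). */
  smp_cond_load_acquire(&p->on_cpu, !VAL);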

The impact can be seen from a "perf sched latency" report generated from
a single iteration of one packet size:

   -----------------------------------------------------------------------------------------------------------------
    Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
   -----------------------------------------------------------------------------------------------------------------

  vanilla
    netperf:4337          |  21709.193 ms |     2932 | avg:    0.002 ms | max:    0.041 ms | max at:    112.154512 s
    netserver:4338        |  14629.459 ms |  5146990 | avg:    0.001 ms | max: 1615.864 ms | max at:    140.134496 s

  localwakelist-v1r2
    netperf:4339          |  29789.717 ms |     2460 | avg:    0.002 ms | max:    0.059 ms | max at:    138.205389 s
    netserver:4340        |  18858.767 ms |  7279005 | avg:    0.001 ms | max:    0.362 ms | max at:    135.709683 s
   -----------------------------------------------------------------------------------------------------------------
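
A report like this can be generated with something along the lines of
"perf sched record" while the benchmark runs, followed by "perf sched
latency"; the exact options used here are not part of this log.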

Note that the average wakeup delay is quite small both on the vanilla
kernel and with the two patches applied. However, the vanilla kernel
shows significant outliers, with a maximum measured delay of 1615
milliseconds; with both patches applied, the maximum delay is never
worse than 0.362 ms even though the rate of context switching is much
higher.

Similarly, a separate profile of cycles showed that 2.83% of all cycles
were spent in try_to_wake_up(), with almost half of those cycles spent
spinning on p->on_rq. With the two patches applied, the percentage of
cycles spent in try_to_wake_up() drops to 1.13%.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: valentin.schneider@arm.com
Cc: Hillf Danton <hdanton@sina.com>
Cc: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/r/20200524202956.27665-3-mgorman@techsingularity.net
2020-05-25 07:04:10 +02:00
autogroup.c sched/autogroup: Make autogroup_path() always available 2019-06-24 19:23:40 +02:00
autogroup.h
clock.c sched/clock: Use static_branch_likely() with sched_clock_running 2019-11-29 08:10:54 +01:00
completion.c completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all() 2020-03-23 18:40:25 +01:00
core.c sched/core: Offload wakee task activation if the wakee is descheduling 2020-05-25 07:04:10 +02:00
cpuacct.c sched/cpuacct: Fix charge cpuacct.usage_sys 2020-05-19 20:34:14 +02:00
cpudeadline.c Linux 5.2-rc5 2019-06-17 12:12:27 +02:00
cpudeadline.h
cpufreq_schedutil.c sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with() 2019-12-25 10:42:08 +01:00
cpufreq.c cpufreq: Avoid leaving stale IRQ work items during CPU offline 2019-12-12 17:59:43 +01:00
cpupri.c sched/rt: cpupri_find: Trigger a full search as fallback 2020-03-20 13:06:20 +01:00
cpupri.h sched/rt: Optimize cpupri_find() on non-heterogenous systems 2020-03-06 12:57:27 +01:00
cputime.c sched/vtime: Work around an uninitialized variable warning 2020-04-15 11:06:50 +02:00
deadline.c sched/deadline: Make two functions static 2020-03-06 12:57:24 +01:00
debug.c Merge branch 'sched/urgent' 2020-05-19 20:34:12 +02:00
fair.c sched/fair: Replace zero-length array with flexible-array 2020-05-19 20:34:14 +02:00
features.h sched/fair/util_est: Implement faster ramp-up EWMA on utilization increases 2019-10-29 10:01:07 +01:00
idle.c idle: fix spelling mistake "iterrupts" -> "interrupts" 2020-01-17 10:19:22 +01:00
isolation.c sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters 2020-04-15 10:38:26 +02:00
loadavg.c timers/nohz: Update NOHZ load in remote tick 2020-01-28 21:36:44 +01:00
Makefile psi: pressure stall information for CPU, memory, and IO 2018-10-26 16:26:32 -07:00
membarrier.c membarrier: Fix RCU locking bug caused by faulty merge 2019-10-01 21:27:50 +02:00
pelt.c sched/pelt: Sync util/runnable_sum with PELT window when propagating 2020-05-19 20:34:14 +02:00
pelt.h sched/pelt: Add support to track thermal pressure 2020-03-06 12:57:17 +01:00
psi.c psi: Move PF_MEMSTALL out of task->flags 2020-03-20 13:06:19 +01:00
rt.c sched: Defend cfs and rt bandwidth quota against overflow 2020-05-19 20:34:14 +02:00
sched-pelt.h sched/fair: Fix "runnable_avg_yN_inv" not used warnings 2019-06-17 12:15:58 +02:00
sched.h sched/core: Offload wakee task activation if the wakee is descheduling 2020-05-25 07:04:10 +02:00
stats.c
stats.h psi: Move PF_MEMSTALL out of task->flags 2020-03-20 13:06:19 +01:00
stop_task.c sched/core: Further clarify sched_class::set_next_task() 2019-11-11 08:35:21 +01:00
swait.c sched/swait: Prepare usage in completions 2020-03-21 16:00:23 +01:00
topology.c sched/topology: Kill SD_LOAD_BALANCE 2020-04-30 20:14:39 +02:00
wait_bit.c sched/wait: fix ___wait_var_event(exclusive) 2019-12-17 13:32:50 +01:00
wait.c Add wake_up_interruptible_sync_poll_locked() 2019-10-31 15:12:23 +00:00