sched: Optimize task_sched_runtime()

Large multi-threaded apps like to hit this using do_sys_times() and then
queue up on the rq->lock.

Avoid taking the lock when possible.

Larry reported a ~20% performance increase in his test case.

Reported-by: Larry Woodman <lwoodman@redhat.com>
Suggested-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131111172925.GG26898@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parent 5eca82a9ac
commit 911b2898b3
@@ -2253,6 +2253,20 @@ unsigned long long task_sched_runtime(struct task_struct *p)
 	struct rq *rq;
 	u64 ns = 0;
 
+#if defined(CONFIG_64BIT) && defined(CONFIG_SMP)
+	/*
+	 * 64-bit doesn't need locks to atomically read a 64bit value.
+	 * So we have a optimization chance when the task's delta_exec is 0.
+	 * Reading ->on_cpu is racy, but this is ok.
+	 *
+	 * If we race with it leaving cpu, we'll take a lock. So we're correct.
+	 * If we race with it entering cpu, unaccounted time is 0. This is
+	 * indistinguishable from the read occurring a few cycles earlier.
+	 */
+	if (!p->on_cpu)
+		return p->se.sum_exec_runtime;
+#endif
+
 	rq = task_rq_lock(p, &flags);
 	ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
 	task_rq_unlock(rq, p, &flags);
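
For readers who want to experiment with the idea outside the kernel, the following is a minimal user-space sketch of the same pattern: read a flag without the lock, and fall back to the locked slow path only when unaccounted time may exist. It is not kernel code. `struct task_model`, `model_init()`, `model_runtime()` and the `lock_acquisitions` counter are hypothetical names introduced here for illustration; a pthread mutex stands in for rq->lock, and C11 relaxed atomics stand in for the plain 64-bit loads that the patch relies on.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the state task_sched_runtime() reads. */
struct task_model {
	pthread_mutex_t lock;            /* stands in for rq->lock */
	atomic_int on_cpu;               /* stands in for p->on_cpu */
	_Atomic uint64_t sum_exec;       /* stands in for p->se.sum_exec_runtime */
	uint64_t pending_delta;          /* stands in for do_task_delta_exec(); guarded by lock */
	unsigned long lock_acquisitions; /* illustration only: counts slow-path entries */
};

void model_init(struct task_model *t, uint64_t sum_exec, uint64_t pending)
{
	pthread_mutex_init(&t->lock, NULL);
	atomic_init(&t->on_cpu, 0);
	atomic_init(&t->sum_exec, sum_exec);
	t->pending_delta = pending;
	t->lock_acquisitions = 0;
}

uint64_t model_runtime(struct task_model *t)
{
	uint64_t ns;

	/*
	 * Fast path: if the task is not running there is no unaccounted
	 * delta, so the stored sum is already the answer. The flag read is
	 * racy by design; a stale "running" value only costs a lock.
	 */
	if (!atomic_load_explicit(&t->on_cpu, memory_order_relaxed))
		return atomic_load_explicit(&t->sum_exec, memory_order_relaxed);

	/* Slow path: take the lock and add the not-yet-accounted time. */
	pthread_mutex_lock(&t->lock);
	t->lock_acquisitions++;
	ns = atomic_load_explicit(&t->sum_exec, memory_order_relaxed) +
	     t->pending_delta;
	pthread_mutex_unlock(&t->lock);

	return ns;
}
```

With `on_cpu` clear, `model_runtime()` returns the stored sum and leaves `lock_acquisitions` untouched; once `on_cpu` is set, each call goes through the mutex and includes `pending_delta`. The sketch deliberately mirrors the CONFIG_64BIT restriction by assuming a 64-bit load cannot tear; on targets where that does not hold, the lock-free read would not be safe.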