Merge tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Implement the SCHED_DEADLINE server infrastructure - Daniel Bristot
   de Oliveira's last major contribution to the kernel:

     "SCHED_DEADLINE servers can help fixing starvation issues of low
      priority tasks (e.g., SCHED_OTHER) when higher priority tasks
      monopolize CPU cycles. Today we have RT Throttling; DEADLINE
      servers should be able to replace and improve that."

   (Daniel Bristot de Oliveira, Peter Zijlstra, Joel Fernandes, Youssef
   Esmat, Huang Shijie)
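
   As a rough sketch of how a scheduling class plugs into the new server
   infrastructure (condensed from the fair-server wiring in this series;
   helper names such as fair_server_has_tasks() are illustrative, not a
   spec), the class hands the DL server a "has tasks" and a "pick task"
   callback and lets the server run those picks inside a DL reservation:

       /* Sketch only: how CFS could register its per-runqueue DL server. */
       static bool fair_server_has_tasks(struct sched_dl_entity *dl_se)
       {
               return !!dl_se->rq->cfs.nr_running;
       }

       static struct task_struct *fair_server_pick_task(struct sched_dl_entity *dl_se)
       {
               return pick_task_fair(dl_se->rq);
       }

       void fair_server_init(struct rq *rq)
       {
               struct sched_dl_entity *dl_se = &rq->fair_server;

               dl_server_init(dl_se, rq, fair_server_has_tasks,
                              fair_server_pick_task);
       }

   The intent is that dl_server_start()/dl_server_stop() bracket the
   periods where the served class actually has runnable work, so the
   reservation only competes for CPU time when starvation is possible.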

 - Preparatory changes for sched_ext integration:
     - Use set_next_task(.first) where required
     - Fix up set_next_task() implementations
     - Clean up DL server vs. core sched
     - Split up put_prev_task_balance()
     - Rework pick_next_task()
      - Combine the last put_prev_task() and the first set_next_task() (sketched below)
     - Rework dl_server
     - Add put_prev_task(.next)

   (Peter Zijlstra, with a fix by Tejun Heo)
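
   The combined hand-off mentioned above boils down to a helper of roughly
   this shape (a simplified sketch that ignores the DL-server bookkeeping
   the real helper also has to do):

       static inline void
       put_prev_set_next_task(struct rq *rq, struct task_struct *prev,
                              struct task_struct *next)
       {
               WARN_ON_ONCE(rq->curr != prev);

               if (next == prev)
                       return;

               /* put_prev_task() now sees the next task (.next), and
                * set_next_task() knows it is the first pick (.first). */
               prev->sched_class->put_prev_task(rq, prev, next);
               next->sched_class->set_next_task(rq, next, true);
       }

   Fusing the two calls lets pick_next_task() hand a CPU from one class to
   another in a single step, which is the shape the sched_ext integration
   builds on.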

 - Complete the EEVDF transition and refine EEVDF scheduling:
     - Implement delayed dequeue
     - Allow shorter slices to wakeup-preempt
      - Use sched_attr::sched_runtime to set request/slice suggestion (example below)
     - Document the new feature flags
     - Remove unused and duplicate-functionality fields
     - Simplify & unify pick_next_task_fair()
     - Misc debuggability enhancements

   (Peter Zijlstra, with fixes/cleanups by Dietmar Eggemann, Valentin
   Schneider and Chuyi Zhou)
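
   A minimal userspace sketch of the slice hint (illustrative values; it
   assumes a kernel with this series and defines the sched_attr layout
   locally rather than relying on libc headers):

       #include <stdint.h>
       #include <stdio.h>
       #include <string.h>
       #include <unistd.h>
       #include <sys/syscall.h>

       struct sched_attr {
               uint32_t size;
               uint32_t sched_policy;
               uint64_t sched_flags;
               int32_t  sched_nice;
               uint32_t sched_priority;
               /* the remaining fields are used by SCHED_DEADLINE; for a
                * fair task, sched_runtime now doubles as a slice hint */
               uint64_t sched_runtime;
               uint64_t sched_deadline;
               uint64_t sched_period;
       };

       int main(void)
       {
               struct sched_attr attr;

               memset(&attr, 0, sizeof(attr));
               attr.size          = sizeof(attr);
               attr.sched_policy  = 0;                  /* SCHED_OTHER */
               attr.sched_runtime = 3 * 1000 * 1000;    /* ask for ~3ms slices */

               if (syscall(SYS_sched_setattr, 0, &attr, 0))
                       perror("sched_setattr");
               return 0;
       }

   The value is a suggestion, not a guarantee: the kernel clamps it and
   only uses it to size this task's EEVDF requests.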

 - Initialize the vruntime of a new task when it is first enqueued,
   resulting in a significant decrease in the latency of newly woken tasks
   (Zhang Qiao)

 - Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
   (K Prateek Nayak, Peter Zijlstra)
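
   The fast path is small; condensed from the __schedule() hunk further
   down (not the verbatim upstream code), it is essentially:

       /* Idle task rescheduling with nothing runnable: skip the full
        * dequeue + pick and keep running the idle task. */
       if (sched_mode == SM_IDLE) {
               if (!rq->nr_running) {
                       next = prev;
                       goto picked;
               }
       } else if (!preempt && prev_state) {
               /* the usual blocking / dequeue path */
       }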

 - Clean up and clarify the usage of rt_task()
   (Qais Yousef)

 - Preempt SCHED_IDLE entities in strict cgroup hierarchies
   (Tianchen Ding)

 - Clarify the documentation of time units for deadline scheduler
   parameters (Christian Loehle)
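
   All three sched_attr bandwidth fields are in nanoseconds, as is chrt's
   -T/-P/-D input. Reusing the sched_attr layout from the earlier sketch,
   a 10ms-every-100ms reservation looks like this (illustrative values):

       attr.sched_policy   = 6;                  /* SCHED_DEADLINE */
       attr.sched_runtime  =  10 * 1000 * 1000;  /*  10 ms, in ns */
       attr.sched_deadline = 100 * 1000 * 1000;  /* 100 ms, in ns */
       attr.sched_period   = 100 * 1000 * 1000;  /* 100 ms, in ns */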

 - Remove the HZ_BW chicken-bit feature flag introduced a year ago, as
   the original change seems to be working fine (Phil Auld)

 - Misc fixes and cleanups (Chen Yu, Dan Carpenter, Huang Shijie,
   Peilin He, Qais Yousef and Vincent Guittot)

* tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  sched/cpufreq: Use NSEC_PER_MSEC for deadline task
  cpufreq/cppc: Use NSEC_PER_MSEC for deadline task
  sched/deadline: Clarify nanoseconds in uapi
  sched/deadline: Convert schedtool example to chrt
  sched/debug: Fix the runnable tasks output
  sched: Fix sched_delayed vs sched_core
  kernel/sched: Fix util_est accounting for DELAY_DEQUEUE
  kthread: Fix task state in kthread worker if being frozen
  sched/pelt: Use rq_clock_task() for hw_pressure
  sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c
  sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
  sched: Add put_prev_task(.next)
  sched: Rework dl_server
  sched: Combine the last put_prev_task() and the first set_next_task()
  sched: Rework pick_next_task()
  sched: Split up put_prev_task_balance()
  sched: Clean up DL server vs core sched
  sched: Fixup set_next_task() implementations
  sched: Use set_next_task(.first) where required
  sched/fair: Properly deactivate sched_delayed task upon class change
  ...
Linus Torvalds 2024-09-19 15:55:58 +02:00
commit 2004cef11e
32 changed files with 1695 additions and 747 deletions

--- a/Documentation/scheduler/sched-deadline.rst
+++ b/Documentation/scheduler/sched-deadline.rst
@@ -749,21 +749,19 @@ Appendix A. Test suite
 of the command line options. Please refer to rt-app documentation for more
 details (`<rt-app-sources>/doc/*.json`).
 
-The second testing application is a modification of schedtool, called
-schedtool-dl, which can be used to setup SCHED_DEADLINE parameters for a
-certain pid/application. schedtool-dl is available at:
-https://github.com/scheduler-tools/schedtool-dl.git.
+The second testing application is done using chrt which has support
+for SCHED_DEADLINE.
 
 The usage is straightforward::
 
- # schedtool -E -t 10000000:100000000 -e ./my_cpuhog_app
+ # chrt -d -T 10000000 -D 100000000 0 ./my_cpuhog_app
 
 With this, my_cpuhog_app is put to run inside a SCHED_DEADLINE reservation
-of 10ms every 100ms (note that parameters are expressed in microseconds).
-You can also use schedtool to create a reservation for an already running
+of 10ms every 100ms (note that parameters are expressed in nanoseconds).
+You can also use chrt to create a reservation for an already running
 application, given that you know its pid::
 
- # schedtool -E -t 10000000:100000000 my_app_pid
+ # chrt -d -T 10000000 -D 100000000 -p 0 my_app_pid
 
 Appendix B. Minimal main()
 ==========================

--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -224,9 +224,9 @@ static void __init cppc_freq_invariance_init(void)
		 * Fake (unused) bandwidth; workaround to "fix"
		 * priority inheritance.
		 */
-		.sched_runtime	= 1000000,
-		.sched_deadline = 10000000,
-		.sched_period	= 10000000,
+		.sched_runtime	= NSEC_PER_MSEC,
+		.sched_deadline = 10 * NSEC_PER_MSEC,
+		.sched_period	= 10 * NSEC_PER_MSEC,
	};
	int ret;

--- a/fs/bcachefs/six.c
+++ b/fs/bcachefs/six.c
@@ -335,7 +335,7 @@ static inline bool six_owner_running(struct six_lock *lock)
	 */
	rcu_read_lock();
	struct task_struct *owner = READ_ONCE(lock->owner);
-	bool ret = owner ? owner_on_cpu(owner) : !rt_task(current);
+	bool ret = owner ? owner_on_cpu(owner) : !rt_or_dl_task(current);
	rcu_read_unlock();
 
	return ret;

--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2626,7 +2626,7 @@ static ssize_t timerslack_ns_write(struct file *file, const char __user *buf,
	}
 
	task_lock(p);
-	if (task_is_realtime(p))
+	if (rt_or_dl_task_policy(p))
		slack_ns = 0;
	else if (slack_ns == 0)
		slack_ns = p->default_timer_slack_ns;

--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -40,7 +40,7 @@ static inline int task_nice_ioclass(struct task_struct *task)
 {
	if (task->policy == SCHED_IDLE)
		return IOPRIO_CLASS_IDLE;
-	else if (task_is_realtime(task))
+	else if (rt_or_dl_task_policy(task))
		return IOPRIO_CLASS_RT;
	else
		return IOPRIO_CLASS_BE;

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -149,8 +149,9 @@ struct user_event_mm;
  * Special states are those that do not use the normal wait-loop pattern. See
  * the comment with set_special_state().
  */
 #define is_special_task_state(state)					\
-	((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | TASK_DEAD))
+	((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED |	\
+		    TASK_DEAD | TASK_FROZEN))
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 # define debug_normal_state_change(state_value)			\
@@ -541,9 +542,14 @@ struct sched_entity {
	struct rb_node			run_node;
	u64				deadline;
	u64				min_vruntime;
+	u64				min_slice;
 
	struct list_head		group_node;
-	unsigned int			on_rq;
+	unsigned char			on_rq;
+	unsigned char			sched_delayed;
+	unsigned char			rel_deadline;
+	unsigned char			custom_slice;
+					/* hole */
 
	u64				exec_start;
	u64				sum_exec_runtime;
@@ -639,12 +645,26 @@ struct sched_dl_entity {
	 *
	 * @dl_overrun tells if the task asked to be informed about runtime
	 * overruns.
+	 *
+	 * @dl_server tells if this is a server entity.
+	 *
+	 * @dl_defer tells if this is a deferred or regular server. For
+	 * now only defer server exists.
+	 *
+	 * @dl_defer_armed tells if the deferrable server is waiting
+	 * for the replenishment timer to activate it.
+	 *
+	 * @dl_defer_running tells if the deferrable server is actually
+	 * running, skipping the defer phase.
	 */
	unsigned int			dl_throttled      : 1;
	unsigned int			dl_yielded        : 1;
	unsigned int			dl_non_contending : 1;
	unsigned int			dl_overrun	  : 1;
	unsigned int			dl_server         : 1;
+	unsigned int			dl_defer	  : 1;
+	unsigned int			dl_defer_armed	  : 1;
+	unsigned int			dl_defer_running  : 1;
 
	/*
	 * Bandwidth enforcement timer. Each -deadline task has its
@@ -672,7 +692,7 @@ struct sched_dl_entity {
	 */
	struct rq			*rq;
	dl_server_has_tasks_f		server_has_tasks;
-	dl_server_pick_f		server_pick;
+	dl_server_pick_f		server_pick_task;
 
 #ifdef CONFIG_RT_MUTEXES
	/*

--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -10,16 +10,16 @@
 
 #include <linux/sched.h>
 
-#define MAX_DL_PRIO		0
-
-static inline int dl_prio(int prio)
+static inline bool dl_prio(int prio)
 {
-	if (unlikely(prio < MAX_DL_PRIO))
-		return 1;
-	return 0;
+	return unlikely(prio < MAX_DL_PRIO);
 }
 
-static inline int dl_task(struct task_struct *p)
+/*
+ * Returns true if a task has a priority that belongs to DL class. PI-boosted
+ * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
+ */
+static inline bool dl_task(struct task_struct *p)
 {
	return dl_prio(p->prio);
 }

--- a/include/linux/sched/prio.h
+++ b/include/linux/sched/prio.h
@@ -14,6 +14,7 @@
  */
 
 #define MAX_RT_PRIO		100
+#define MAX_DL_PRIO		0
 
 #define MAX_PRIO		(MAX_RT_PRIO + NICE_WIDTH)
 #define DEFAULT_PRIO		(MAX_RT_PRIO + NICE_WIDTH / 2)

--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -6,19 +6,40 @@
 
 struct task_struct;
 
-static inline int rt_prio(int prio)
+static inline bool rt_prio(int prio)
 {
-	if (unlikely(prio < MAX_RT_PRIO))
-		return 1;
-	return 0;
+	return unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO);
 }
 
-static inline int rt_task(struct task_struct *p)
+static inline bool rt_or_dl_prio(int prio)
+{
+	return unlikely(prio < MAX_RT_PRIO);
+}
+
+/*
+ * Returns true if a task has a priority that belongs to RT class. PI-boosted
+ * tasks will return true. Use rt_policy() to ignore PI-boosted tasks.
+ */
+static inline bool rt_task(struct task_struct *p)
 {
	return rt_prio(p->prio);
 }
 
-static inline bool task_is_realtime(struct task_struct *tsk)
+/*
+ * Returns true if a task has a priority that belongs to RT or DL classes.
+ * PI-boosted tasks will return true. Use rt_or_dl_task_policy() to ignore
+ * PI-boosted tasks.
+ */
+static inline bool rt_or_dl_task(struct task_struct *p)
+{
+	return rt_or_dl_prio(p->prio);
+}
+
+/*
+ * Returns true if a task has a policy that belongs to RT or DL classes.
+ * PI-boosted tasks will return false.
+ */
+static inline bool rt_or_dl_task_policy(struct task_struct *tsk)
 {
	int policy = tsk->policy;

--- a/include/uapi/linux/sched/types.h
+++ b/include/uapi/linux/sched/types.h
@@ -58,9 +58,9 @@
  *
  * This is reflected by the following fields of the sched_attr structure:
  *
- *  @sched_deadline	representative of the task's deadline
- *  @sched_runtime	representative of the task's runtime
- *  @sched_period	representative of the task's period
+ *  @sched_deadline	representative of the task's deadline in nanoseconds
+ *  @sched_runtime	representative of the task's runtime in nanoseconds
+ *  @sched_period	representative of the task's period in nanoseconds
  *
  * Given this task model, there are a multiplicity of scheduling algorithms
  * and policies, that can be used to ensure all the tasks will make their

--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -72,7 +72,7 @@ bool __refrigerator(bool check_kthr_stop)
		bool freeze;
 
		raw_spin_lock_irq(&current->pi_lock);
-		set_current_state(TASK_FROZEN);
+		WRITE_ONCE(current->__state, TASK_FROZEN);
		/* unstale saved_state so that __thaw_task() will wake us up */
		current->saved_state = TASK_RUNNING;
		raw_spin_unlock_irq(&current->pi_lock);

--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -845,8 +845,16 @@ repeat:
		 * event only cares about the address.
		 */
		trace_sched_kthread_work_execute_end(work, func);
-	} else if (!freezing(current))
+	} else if (!freezing(current)) {
		schedule();
+	} else {
+		/*
+		 * Handle the case where the current remains
+		 * TASK_INTERRUPTIBLE. try_to_freeze() expects
+		 * the current to be TASK_RUNNING.
+		 */
+		__set_current_state(TASK_RUNNING);
+	}
 
	try_to_freeze();
	cond_resched();

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -347,7 +347,7 @@ static __always_inline int __waiter_prio(struct task_struct *task)
 {
	int prio = task->prio;
 
-	if (!rt_prio(prio))
+	if (!rt_or_dl_prio(prio))
		return DEFAULT_PRIO;
 
	return prio;
@@ -435,7 +435,7 @@ static inline bool rt_mutex_steal(struct rt_mutex_waiter *waiter,
	 * Note that RT tasks are excluded from same priority (lateral)
	 * steals to prevent the introduction of an unbounded latency.
	 */
-	if (rt_prio(waiter->tree.prio) || dl_prio(waiter->tree.prio))
+	if (rt_or_dl_prio(waiter->tree.prio))
		return false;
 
	return rt_waiter_node_equal(&waiter->tree, &top_waiter->tree);

--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -631,7 +631,7 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
	 * if it is an RT task or wait in the wait queue
	 * for too long.
	 */
-	if (has_handoff || (!rt_task(waiter->task) &&
+	if (has_handoff || (!rt_or_dl_task(waiter->task) &&
			    !time_after(jiffies, waiter->timeout)))
		return false;
 
@@ -914,7 +914,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
		if (owner_state != OWNER_WRITER) {
			if (need_resched())
				break;
-			if (rt_task(current) &&
+			if (rt_or_dl_task(current) &&
			    (prev_owner_state != OWNER_WRITER))
				break;
		}

--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -237,7 +237,7 @@ __ww_ctx_less(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
	int a_prio = a->task->prio;
	int b_prio = b->task->prio;
 
-	if (rt_prio(a_prio) || rt_prio(b_prio)) {
+	if (rt_or_dl_prio(a_prio) || rt_or_dl_prio(b_prio)) {
 
		if (a_prio > b_prio)
			return true;

kernel/sched/core.c:

@ -163,7 +163,10 @@ static inline int __task_prio(const struct task_struct *p)
if (p->sched_class == &stop_sched_class) /* trumps deadline */ if (p->sched_class == &stop_sched_class) /* trumps deadline */
return -2; return -2;
if (rt_prio(p->prio)) /* includes deadline */ if (p->dl_server)
return -1; /* deadline */
if (rt_or_dl_prio(p->prio))
return p->prio; /* [-1, 99] */ return p->prio; /* [-1, 99] */
if (p->sched_class == &idle_sched_class) if (p->sched_class == &idle_sched_class)
@ -192,8 +195,24 @@ static inline bool prio_less(const struct task_struct *a,
if (-pb < -pa) if (-pb < -pa)
return false; return false;
if (pa == -1) /* dl_prio() doesn't work because of stop_class above */ if (pa == -1) { /* dl_prio() doesn't work because of stop_class above */
return !dl_time_before(a->dl.deadline, b->dl.deadline); const struct sched_dl_entity *a_dl, *b_dl;
a_dl = &a->dl;
/*
* Since,'a' and 'b' can be CFS tasks served by DL server,
* __task_prio() can return -1 (for DL) even for those. In that
* case, get to the dl_server's DL entity.
*/
if (a->dl_server)
a_dl = a->dl_server;
b_dl = &b->dl;
if (b->dl_server)
b_dl = b->dl_server;
return !dl_time_before(a_dl->deadline, b_dl->deadline);
}
if (pa == MAX_RT_PRIO + MAX_NICE) /* fair */ if (pa == MAX_RT_PRIO + MAX_NICE) /* fair */
return cfs_prio_less(a, b, in_fi); return cfs_prio_less(a, b, in_fi);
@ -240,6 +259,9 @@ static inline int rb_sched_core_cmp(const void *key, const struct rb_node *node)
void sched_core_enqueue(struct rq *rq, struct task_struct *p) void sched_core_enqueue(struct rq *rq, struct task_struct *p)
{ {
if (p->se.sched_delayed)
return;
rq->core->core_task_seq++; rq->core->core_task_seq++;
if (!p->core_cookie) if (!p->core_cookie)
@ -250,6 +272,9 @@ void sched_core_enqueue(struct rq *rq, struct task_struct *p)
void sched_core_dequeue(struct rq *rq, struct task_struct *p, int flags) void sched_core_dequeue(struct rq *rq, struct task_struct *p, int flags)
{ {
if (p->se.sched_delayed)
return;
rq->core->core_task_seq++; rq->core->core_task_seq++;
if (sched_core_enqueued(p)) { if (sched_core_enqueued(p)) {
@ -1269,7 +1294,7 @@ bool sched_can_stop_tick(struct rq *rq)
* dequeued by migrating while the constrained task continues to run. * dequeued by migrating while the constrained task continues to run.
* E.g. going from 2->1 without going through pick_next_task(). * E.g. going from 2->1 without going through pick_next_task().
*/ */
if (sched_feat(HZ_BW) && __need_bw_check(rq, rq->curr)) { if (__need_bw_check(rq, rq->curr)) {
if (cfs_task_bw_constrained(rq->curr)) if (cfs_task_bw_constrained(rq->curr))
return false; return false;
} }
@ -1672,6 +1697,9 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
if (unlikely(!p->sched_class->uclamp_enabled)) if (unlikely(!p->sched_class->uclamp_enabled))
return; return;
if (p->se.sched_delayed)
return;
for_each_clamp_id(clamp_id) for_each_clamp_id(clamp_id)
uclamp_rq_inc_id(rq, p, clamp_id); uclamp_rq_inc_id(rq, p, clamp_id);
@ -1696,6 +1724,9 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
if (unlikely(!p->sched_class->uclamp_enabled)) if (unlikely(!p->sched_class->uclamp_enabled))
return; return;
if (p->se.sched_delayed)
return;
for_each_clamp_id(clamp_id) for_each_clamp_id(clamp_id)
uclamp_rq_dec_id(rq, p, clamp_id); uclamp_rq_dec_id(rq, p, clamp_id);
} }
@ -1975,14 +2006,21 @@ void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
psi_enqueue(p, (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED)); psi_enqueue(p, (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED));
} }
uclamp_rq_inc(rq, p);
p->sched_class->enqueue_task(rq, p, flags); p->sched_class->enqueue_task(rq, p, flags);
/*
* Must be after ->enqueue_task() because ENQUEUE_DELAYED can clear
* ->sched_delayed.
*/
uclamp_rq_inc(rq, p);
if (sched_core_enabled(rq)) if (sched_core_enabled(rq))
sched_core_enqueue(rq, p); sched_core_enqueue(rq, p);
} }
void dequeue_task(struct rq *rq, struct task_struct *p, int flags) /*
* Must only return false when DEQUEUE_SLEEP.
*/
inline bool dequeue_task(struct rq *rq, struct task_struct *p, int flags)
{ {
if (sched_core_enabled(rq)) if (sched_core_enabled(rq))
sched_core_dequeue(rq, p, flags); sched_core_dequeue(rq, p, flags);
@ -1995,8 +2033,12 @@ void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
psi_dequeue(p, flags & DEQUEUE_SLEEP); psi_dequeue(p, flags & DEQUEUE_SLEEP);
} }
/*
* Must be before ->dequeue_task() because ->dequeue_task() can 'fail'
* and mark the task ->sched_delayed.
*/
uclamp_rq_dec(rq, p); uclamp_rq_dec(rq, p);
p->sched_class->dequeue_task(rq, p, flags); return p->sched_class->dequeue_task(rq, p, flags);
} }
void activate_task(struct rq *rq, struct task_struct *p, int flags) void activate_task(struct rq *rq, struct task_struct *p, int flags)
@ -2014,12 +2056,25 @@ void activate_task(struct rq *rq, struct task_struct *p, int flags)
void deactivate_task(struct rq *rq, struct task_struct *p, int flags) void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
{ {
WRITE_ONCE(p->on_rq, (flags & DEQUEUE_SLEEP) ? 0 : TASK_ON_RQ_MIGRATING); SCHED_WARN_ON(flags & DEQUEUE_SLEEP);
WRITE_ONCE(p->on_rq, TASK_ON_RQ_MIGRATING);
ASSERT_EXCLUSIVE_WRITER(p->on_rq); ASSERT_EXCLUSIVE_WRITER(p->on_rq);
/*
* Code explicitly relies on TASK_ON_RQ_MIGRATING begin set *before*
* dequeue_task() and cleared *after* enqueue_task().
*/
dequeue_task(rq, p, flags); dequeue_task(rq, p, flags);
} }
static void block_task(struct rq *rq, struct task_struct *p, int flags)
{
if (dequeue_task(rq, p, DEQUEUE_SLEEP | flags))
__block_task(rq, p);
}
/** /**
* task_curr - is this task currently executing on a CPU? * task_curr - is this task currently executing on a CPU?
* @p: the task in question. * @p: the task in question.
@ -2233,6 +2288,12 @@ void migrate_disable(void)
struct task_struct *p = current; struct task_struct *p = current;
if (p->migration_disabled) { if (p->migration_disabled) {
#ifdef CONFIG_DEBUG_PREEMPT
/*
*Warn about overflow half-way through the range.
*/
WARN_ON_ONCE((s16)p->migration_disabled < 0);
#endif
p->migration_disabled++; p->migration_disabled++;
return; return;
} }
@ -2251,14 +2312,20 @@ void migrate_enable(void)
.flags = SCA_MIGRATE_ENABLE, .flags = SCA_MIGRATE_ENABLE,
}; };
#ifdef CONFIG_DEBUG_PREEMPT
/*
* Check both overflow from migrate_disable() and superfluous
* migrate_enable().
*/
if (WARN_ON_ONCE((s16)p->migration_disabled <= 0))
return;
#endif
if (p->migration_disabled > 1) { if (p->migration_disabled > 1) {
p->migration_disabled--; p->migration_disabled--;
return; return;
} }
if (WARN_ON_ONCE(!p->migration_disabled))
return;
/* /*
* Ensure stop_task runs either before or after this, and that * Ensure stop_task runs either before or after this, and that
* __set_cpus_allowed_ptr(SCA_MIGRATE_ENABLE) doesn't schedule(). * __set_cpus_allowed_ptr(SCA_MIGRATE_ENABLE) doesn't schedule().
@ -3607,8 +3674,6 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
rq->idle_stamp = 0; rq->idle_stamp = 0;
} }
#endif #endif
p->dl_server = NULL;
} }
/* /*
@ -3644,12 +3709,14 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags)
rq = __task_rq_lock(p, &rf); rq = __task_rq_lock(p, &rf);
if (task_on_rq_queued(p)) { if (task_on_rq_queued(p)) {
update_rq_clock(rq);
if (p->se.sched_delayed)
enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
if (!task_on_cpu(rq, p)) { if (!task_on_cpu(rq, p)) {
/* /*
* When on_rq && !on_cpu the task is preempted, see if * When on_rq && !on_cpu the task is preempted, see if
* it should preempt the task that is current now. * it should preempt the task that is current now.
*/ */
update_rq_clock(rq);
wakeup_preempt(rq, p, wake_flags); wakeup_preempt(rq, p, wake_flags);
} }
ttwu_do_wakeup(p); ttwu_do_wakeup(p);
@ -4029,11 +4096,16 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
* case the whole 'p->on_rq && ttwu_runnable()' case below * case the whole 'p->on_rq && ttwu_runnable()' case below
* without taking any locks. * without taking any locks.
* *
* Specifically, given current runs ttwu() we must be before
* schedule()'s block_task(), as such this must not observe
* sched_delayed.
*
* In particular: * In particular:
* - we rely on Program-Order guarantees for all the ordering, * - we rely on Program-Order guarantees for all the ordering,
* - we're serialized against set_special_state() by virtue of * - we're serialized against set_special_state() by virtue of
* it disabling IRQs (this allows not taking ->pi_lock). * it disabling IRQs (this allows not taking ->pi_lock).
*/ */
SCHED_WARN_ON(p->se.sched_delayed);
if (!ttwu_state_match(p, state, &success)) if (!ttwu_state_match(p, state, &success))
goto out; goto out;
@ -4322,9 +4394,11 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
p->se.nr_migrations = 0; p->se.nr_migrations = 0;
p->se.vruntime = 0; p->se.vruntime = 0;
p->se.vlag = 0; p->se.vlag = 0;
p->se.slice = sysctl_sched_base_slice;
INIT_LIST_HEAD(&p->se.group_node); INIT_LIST_HEAD(&p->se.group_node);
/* A delayed task cannot be in clone(). */
SCHED_WARN_ON(p->se.sched_delayed);
#ifdef CONFIG_FAIR_GROUP_SCHED #ifdef CONFIG_FAIR_GROUP_SCHED
p->se.cfs_rq = NULL; p->se.cfs_rq = NULL;
#endif #endif
@ -4572,6 +4646,8 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
p->prio = p->normal_prio = p->static_prio; p->prio = p->normal_prio = p->static_prio;
set_load_weight(p, false); set_load_weight(p, false);
p->se.custom_slice = 0;
p->se.slice = sysctl_sched_base_slice;
/* /*
* We don't need the reset flag anymore after the fork. It has * We don't need the reset flag anymore after the fork. It has
@ -4686,7 +4762,7 @@ void wake_up_new_task(struct task_struct *p)
update_rq_clock(rq); update_rq_clock(rq);
post_init_entity_util_avg(p); post_init_entity_util_avg(p);
activate_task(rq, p, ENQUEUE_NOCLOCK); activate_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_INITIAL);
trace_sched_wakeup_new(p); trace_sched_wakeup_new(p);
wakeup_preempt(rq, p, WF_FORK); wakeup_preempt(rq, p, WF_FORK);
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
@ -5769,8 +5845,8 @@ static inline void schedule_debug(struct task_struct *prev, bool preempt)
schedstat_inc(this_rq()->sched_count); schedstat_inc(this_rq()->sched_count);
} }
static void put_prev_task_balance(struct rq *rq, struct task_struct *prev, static void prev_balance(struct rq *rq, struct task_struct *prev,
struct rq_flags *rf) struct rq_flags *rf)
{ {
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
const struct sched_class *class; const struct sched_class *class;
@ -5787,8 +5863,6 @@ static void put_prev_task_balance(struct rq *rq, struct task_struct *prev,
break; break;
} }
#endif #endif
put_prev_task(rq, prev);
} }
/* /*
@ -5800,6 +5874,8 @@ __pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
const struct sched_class *class; const struct sched_class *class;
struct task_struct *p; struct task_struct *p;
rq->dl_server = NULL;
/* /*
* Optimization: we know that if all tasks are in the fair class we can * Optimization: we know that if all tasks are in the fair class we can
* call that function directly, but only if the @prev task wasn't of a * call that function directly, but only if the @prev task wasn't of a
@ -5815,35 +5891,28 @@ __pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
/* Assume the next prioritized class is idle_sched_class */ /* Assume the next prioritized class is idle_sched_class */
if (!p) { if (!p) {
put_prev_task(rq, prev); p = pick_task_idle(rq);
p = pick_next_task_idle(rq); put_prev_set_next_task(rq, prev, p);
} }
/*
* This is the fast path; it cannot be a DL server pick;
* therefore even if @p == @prev, ->dl_server must be NULL.
*/
if (p->dl_server)
p->dl_server = NULL;
return p; return p;
} }
restart: restart:
put_prev_task_balance(rq, prev, rf); prev_balance(rq, prev, rf);
/*
* We've updated @prev and no longer need the server link, clear it.
* Must be done before ->pick_next_task() because that can (re)set
* ->dl_server.
*/
if (prev->dl_server)
prev->dl_server = NULL;
for_each_class(class) { for_each_class(class) {
p = class->pick_next_task(rq); if (class->pick_next_task) {
if (p) p = class->pick_next_task(rq, prev);
return p; if (p)
return p;
} else {
p = class->pick_task(rq);
if (p) {
put_prev_set_next_task(rq, prev, p);
return p;
}
}
} }
BUG(); /* The idle class should always have a runnable task. */ BUG(); /* The idle class should always have a runnable task. */
@ -5873,6 +5942,8 @@ static inline struct task_struct *pick_task(struct rq *rq)
const struct sched_class *class; const struct sched_class *class;
struct task_struct *p; struct task_struct *p;
rq->dl_server = NULL;
for_each_class(class) { for_each_class(class) {
p = class->pick_task(rq); p = class->pick_task(rq);
if (p) if (p)
@ -5911,6 +5982,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
* another cpu during offline. * another cpu during offline.
*/ */
rq->core_pick = NULL; rq->core_pick = NULL;
rq->core_dl_server = NULL;
return __pick_next_task(rq, prev, rf); return __pick_next_task(rq, prev, rf);
} }
@ -5929,16 +6001,13 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
WRITE_ONCE(rq->core_sched_seq, rq->core->core_pick_seq); WRITE_ONCE(rq->core_sched_seq, rq->core->core_pick_seq);
next = rq->core_pick; next = rq->core_pick;
if (next != prev) { rq->dl_server = rq->core_dl_server;
put_prev_task(rq, prev);
set_next_task(rq, next);
}
rq->core_pick = NULL; rq->core_pick = NULL;
goto out; rq->core_dl_server = NULL;
goto out_set_next;
} }
put_prev_task_balance(rq, prev, rf); prev_balance(rq, prev, rf);
smt_mask = cpu_smt_mask(cpu); smt_mask = cpu_smt_mask(cpu);
need_sync = !!rq->core->core_cookie; need_sync = !!rq->core->core_cookie;
@ -5979,6 +6048,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
next = pick_task(rq); next = pick_task(rq);
if (!next->core_cookie) { if (!next->core_cookie) {
rq->core_pick = NULL; rq->core_pick = NULL;
rq->core_dl_server = NULL;
/* /*
* For robustness, update the min_vruntime_fi for * For robustness, update the min_vruntime_fi for
* unconstrained picks as well. * unconstrained picks as well.
@ -6006,7 +6076,9 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
if (i != cpu && (rq_i != rq->core || !core_clock_updated)) if (i != cpu && (rq_i != rq->core || !core_clock_updated))
update_rq_clock(rq_i); update_rq_clock(rq_i);
p = rq_i->core_pick = pick_task(rq_i); rq_i->core_pick = p = pick_task(rq_i);
rq_i->core_dl_server = rq_i->dl_server;
if (!max || prio_less(max, p, fi_before)) if (!max || prio_less(max, p, fi_before))
max = p; max = p;
} }
@ -6030,6 +6102,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
} }
rq_i->core_pick = p; rq_i->core_pick = p;
rq_i->core_dl_server = NULL;
if (p == rq_i->idle) { if (p == rq_i->idle) {
if (rq_i->nr_running) { if (rq_i->nr_running) {
@ -6090,6 +6163,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
if (i == cpu) { if (i == cpu) {
rq_i->core_pick = NULL; rq_i->core_pick = NULL;
rq_i->core_dl_server = NULL;
continue; continue;
} }
@ -6098,6 +6172,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
if (rq_i->curr == rq_i->core_pick) { if (rq_i->curr == rq_i->core_pick) {
rq_i->core_pick = NULL; rq_i->core_pick = NULL;
rq_i->core_dl_server = NULL;
continue; continue;
} }
@ -6105,8 +6180,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
} }
out_set_next: out_set_next:
set_next_task(rq, next); put_prev_set_next_task(rq, prev, next);
out:
if (rq->core->core_forceidle_count && next == rq->idle) if (rq->core->core_forceidle_count && next == rq->idle)
queue_core_balance(rq); queue_core_balance(rq);
@ -6342,19 +6416,12 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
* Constants for the sched_mode argument of __schedule(). * Constants for the sched_mode argument of __schedule().
* *
* The mode argument allows RT enabled kernels to differentiate a * The mode argument allows RT enabled kernels to differentiate a
* preemption from blocking on an 'sleeping' spin/rwlock. Note that * preemption from blocking on an 'sleeping' spin/rwlock.
* SM_MASK_PREEMPT for !RT has all bits set, which allows the compiler to
* optimize the AND operation out and just check for zero.
*/ */
#define SM_NONE 0x0 #define SM_IDLE (-1)
#define SM_PREEMPT 0x1 #define SM_NONE 0
#define SM_RTLOCK_WAIT 0x2 #define SM_PREEMPT 1
#define SM_RTLOCK_WAIT 2
#ifndef CONFIG_PREEMPT_RT
# define SM_MASK_PREEMPT (~0U)
#else
# define SM_MASK_PREEMPT SM_PREEMPT
#endif
/* /*
* __schedule() is the main scheduler function. * __schedule() is the main scheduler function.
@ -6395,9 +6462,14 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
* *
* WARNING: must be called with preemption disabled! * WARNING: must be called with preemption disabled!
*/ */
static void __sched notrace __schedule(unsigned int sched_mode) static void __sched notrace __schedule(int sched_mode)
{ {
struct task_struct *prev, *next; struct task_struct *prev, *next;
/*
* On PREEMPT_RT kernel, SM_RTLOCK_WAIT is noted
* as a preemption by schedule_debug() and RCU.
*/
bool preempt = sched_mode > SM_NONE;
unsigned long *switch_count; unsigned long *switch_count;
unsigned long prev_state; unsigned long prev_state;
struct rq_flags rf; struct rq_flags rf;
@ -6408,13 +6480,13 @@ static void __sched notrace __schedule(unsigned int sched_mode)
rq = cpu_rq(cpu); rq = cpu_rq(cpu);
prev = rq->curr; prev = rq->curr;
schedule_debug(prev, !!sched_mode); schedule_debug(prev, preempt);
if (sched_feat(HRTICK) || sched_feat(HRTICK_DL)) if (sched_feat(HRTICK) || sched_feat(HRTICK_DL))
hrtick_clear(rq); hrtick_clear(rq);
local_irq_disable(); local_irq_disable();
rcu_note_context_switch(!!sched_mode); rcu_note_context_switch(preempt);
/* /*
* Make sure that signal_pending_state()->signal_pending() below * Make sure that signal_pending_state()->signal_pending() below
@ -6443,22 +6515,32 @@ static void __sched notrace __schedule(unsigned int sched_mode)
switch_count = &prev->nivcsw; switch_count = &prev->nivcsw;
/* Task state changes only considers SM_PREEMPT as preemption */
preempt = sched_mode == SM_PREEMPT;
/* /*
* We must load prev->state once (task_struct::state is volatile), such * We must load prev->state once (task_struct::state is volatile), such
* that we form a control dependency vs deactivate_task() below. * that we form a control dependency vs deactivate_task() below.
*/ */
prev_state = READ_ONCE(prev->__state); prev_state = READ_ONCE(prev->__state);
if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) { if (sched_mode == SM_IDLE) {
if (!rq->nr_running) {
next = prev;
goto picked;
}
} else if (!preempt && prev_state) {
if (signal_pending_state(prev_state, prev)) { if (signal_pending_state(prev_state, prev)) {
WRITE_ONCE(prev->__state, TASK_RUNNING); WRITE_ONCE(prev->__state, TASK_RUNNING);
} else { } else {
int flags = DEQUEUE_NOCLOCK;
prev->sched_contributes_to_load = prev->sched_contributes_to_load =
(prev_state & TASK_UNINTERRUPTIBLE) && (prev_state & TASK_UNINTERRUPTIBLE) &&
!(prev_state & TASK_NOLOAD) && !(prev_state & TASK_NOLOAD) &&
!(prev_state & TASK_FROZEN); !(prev_state & TASK_FROZEN);
if (prev->sched_contributes_to_load) if (unlikely(is_special_task_state(prev_state)))
rq->nr_uninterruptible++; flags |= DEQUEUE_SPECIAL;
/* /*
* __schedule() ttwu() * __schedule() ttwu()
@ -6471,17 +6553,13 @@ static void __sched notrace __schedule(unsigned int sched_mode)
* *
* After this, schedule() must not care about p->state any more. * After this, schedule() must not care about p->state any more.
*/ */
deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK); block_task(rq, prev, flags);
if (prev->in_iowait) {
atomic_inc(&rq->nr_iowait);
delayacct_blkio_start();
}
} }
switch_count = &prev->nvcsw; switch_count = &prev->nvcsw;
} }
next = pick_next_task(rq, prev, &rf); next = pick_next_task(rq, prev, &rf);
picked:
clear_tsk_need_resched(prev); clear_tsk_need_resched(prev);
clear_preempt_need_resched(); clear_preempt_need_resched();
#ifdef CONFIG_SCHED_DEBUG #ifdef CONFIG_SCHED_DEBUG
@ -6523,7 +6601,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
psi_account_irqtime(rq, prev, next); psi_account_irqtime(rq, prev, next);
psi_sched_switch(prev, next, !task_on_rq_queued(prev)); psi_sched_switch(prev, next, !task_on_rq_queued(prev));
trace_sched_switch(sched_mode & SM_MASK_PREEMPT, prev, next, prev_state); trace_sched_switch(preempt, prev, next, prev_state);
/* Also unlocks the rq: */ /* Also unlocks the rq: */
rq = context_switch(rq, prev, next, &rf); rq = context_switch(rq, prev, next, &rf);
@ -6599,7 +6677,7 @@ static void sched_update_worker(struct task_struct *tsk)
} }
} }
static __always_inline void __schedule_loop(unsigned int sched_mode) static __always_inline void __schedule_loop(int sched_mode)
{ {
do { do {
preempt_disable(); preempt_disable();
@ -6644,7 +6722,7 @@ void __sched schedule_idle(void)
*/ */
WARN_ON_ONCE(current->__state); WARN_ON_ONCE(current->__state);
do { do {
__schedule(SM_NONE); __schedule(SM_IDLE);
} while (need_resched()); } while (need_resched());
} }
@ -8228,8 +8306,6 @@ void __init sched_init(void)
#endif /* CONFIG_RT_GROUP_SCHED */ #endif /* CONFIG_RT_GROUP_SCHED */
} }
init_rt_bandwidth(&def_rt_bandwidth, global_rt_period(), global_rt_runtime());
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
init_defrootdomain(); init_defrootdomain();
#endif #endif
@ -8284,8 +8360,13 @@ void __init sched_init(void)
init_tg_cfs_entry(&root_task_group, &rq->cfs, NULL, i, NULL); init_tg_cfs_entry(&root_task_group, &rq->cfs, NULL, i, NULL);
#endif /* CONFIG_FAIR_GROUP_SCHED */ #endif /* CONFIG_FAIR_GROUP_SCHED */
rq->rt.rt_runtime = def_rt_bandwidth.rt_runtime;
#ifdef CONFIG_RT_GROUP_SCHED #ifdef CONFIG_RT_GROUP_SCHED
/*
* This is required for init cpu because rt.c:__enable_runtime()
* starts working after scheduler_running, which is not the case
* yet.
*/
rq->rt.rt_runtime = global_rt_runtime();
init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL); init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL);
#endif #endif
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
@ -8317,10 +8398,12 @@ void __init sched_init(void)
#endif /* CONFIG_SMP */ #endif /* CONFIG_SMP */
hrtick_rq_init(rq); hrtick_rq_init(rq);
atomic_set(&rq->nr_iowait, 0); atomic_set(&rq->nr_iowait, 0);
fair_server_init(rq);
#ifdef CONFIG_SCHED_CORE #ifdef CONFIG_SCHED_CORE
rq->core = rq; rq->core = rq;
rq->core_pick = NULL; rq->core_pick = NULL;
rq->core_dl_server = NULL;
rq->core_enabled = 0; rq->core_enabled = 0;
rq->core_tree = RB_ROOT; rq->core_tree = RB_ROOT;
rq->core_forceidle_count = 0; rq->core_forceidle_count = 0;
@ -8333,6 +8416,7 @@ void __init sched_init(void)
} }
set_load_weight(&init_task, false); set_load_weight(&init_task, false);
init_task.se.slice = sysctl_sched_base_slice,
/* /*
* The boot idle thread does lazy MMU switching as well: * The boot idle thread does lazy MMU switching as well:
@ -8548,7 +8632,7 @@ void normalize_rt_tasks(void)
schedstat_set(p->stats.sleep_start, 0); schedstat_set(p->stats.sleep_start, 0);
schedstat_set(p->stats.block_start, 0); schedstat_set(p->stats.block_start, 0);
if (!dl_task(p) && !rt_task(p)) { if (!rt_or_dl_task(p)) {
/* /*
* Renice negative nice level userspace * Renice negative nice level userspace
* tasks back to 0: * tasks back to 0:

--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -654,9 +654,9 @@ static int sugov_kthread_create(struct sugov_policy *sg_policy)
		 * Fake (unused) bandwidth; workaround to "fix"
		 * priority inheritance.
		 */
-		.sched_runtime	= 1000000,
-		.sched_deadline = 10000000,
-		.sched_period	= 10000000,
+		.sched_runtime	= NSEC_PER_MSEC,
+		.sched_deadline = 10 * NSEC_PER_MSEC,
+		.sched_period	= 10 * NSEC_PER_MSEC,
	};
	struct cpufreq_policy *policy = sg_policy->policy;
	int ret;

kernel/sched/deadline.c:

@ -320,19 +320,12 @@ void sub_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
__sub_running_bw(dl_se->dl_bw, dl_rq); __sub_running_bw(dl_se->dl_bw, dl_rq);
} }
static void dl_change_utilization(struct task_struct *p, u64 new_bw) static void dl_rq_change_utilization(struct rq *rq, struct sched_dl_entity *dl_se, u64 new_bw)
{ {
struct rq *rq; if (dl_se->dl_non_contending) {
sub_running_bw(dl_se, &rq->dl);
dl_se->dl_non_contending = 0;
WARN_ON_ONCE(p->dl.flags & SCHED_FLAG_SUGOV);
if (task_on_rq_queued(p))
return;
rq = task_rq(p);
if (p->dl.dl_non_contending) {
sub_running_bw(&p->dl, &rq->dl);
p->dl.dl_non_contending = 0;
/* /*
* If the timer handler is currently running and the * If the timer handler is currently running and the
* timer cannot be canceled, inactive_task_timer() * timer cannot be canceled, inactive_task_timer()
@ -340,13 +333,25 @@ static void dl_change_utilization(struct task_struct *p, u64 new_bw)
* will not touch the rq's active utilization, * will not touch the rq's active utilization,
* so we are still safe. * so we are still safe.
*/ */
if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1) if (hrtimer_try_to_cancel(&dl_se->inactive_timer) == 1) {
put_task_struct(p); if (!dl_server(dl_se))
put_task_struct(dl_task_of(dl_se));
}
} }
__sub_rq_bw(p->dl.dl_bw, &rq->dl); __sub_rq_bw(dl_se->dl_bw, &rq->dl);
__add_rq_bw(new_bw, &rq->dl); __add_rq_bw(new_bw, &rq->dl);
} }
static void dl_change_utilization(struct task_struct *p, u64 new_bw)
{
WARN_ON_ONCE(p->dl.flags & SCHED_FLAG_SUGOV);
if (task_on_rq_queued(p))
return;
dl_rq_change_utilization(task_rq(p), &p->dl, new_bw);
}
static void __dl_clear_params(struct sched_dl_entity *dl_se); static void __dl_clear_params(struct sched_dl_entity *dl_se);
/* /*
@ -771,6 +776,15 @@ static inline void replenish_dl_new_period(struct sched_dl_entity *dl_se,
/* for non-boosted task, pi_of(dl_se) == dl_se */ /* for non-boosted task, pi_of(dl_se) == dl_se */
dl_se->deadline = rq_clock(rq) + pi_of(dl_se)->dl_deadline; dl_se->deadline = rq_clock(rq) + pi_of(dl_se)->dl_deadline;
dl_se->runtime = pi_of(dl_se)->dl_runtime; dl_se->runtime = pi_of(dl_se)->dl_runtime;
/*
* If it is a deferred reservation, and the server
* is not handling an starvation case, defer it.
*/
if (dl_se->dl_defer & !dl_se->dl_defer_running) {
dl_se->dl_throttled = 1;
dl_se->dl_defer_armed = 1;
}
} }
/* /*
@ -809,6 +823,9 @@ static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
replenish_dl_new_period(dl_se, rq); replenish_dl_new_period(dl_se, rq);
} }
static int start_dl_timer(struct sched_dl_entity *dl_se);
static bool dl_entity_overflow(struct sched_dl_entity *dl_se, u64 t);
/* /*
* Pure Earliest Deadline First (EDF) scheduling does not deal with the * Pure Earliest Deadline First (EDF) scheduling does not deal with the
* possibility of a entity lasting more than what it declared, and thus * possibility of a entity lasting more than what it declared, and thus
@ -837,9 +854,18 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se)
/* /*
* This could be the case for a !-dl task that is boosted. * This could be the case for a !-dl task that is boosted.
* Just go with full inherited parameters. * Just go with full inherited parameters.
*
* Or, it could be the case of a deferred reservation that
* was not able to consume its runtime in background and
* reached this point with current u > U.
*
* In both cases, set a new period.
*/ */
if (dl_se->dl_deadline == 0) if (dl_se->dl_deadline == 0 ||
replenish_dl_new_period(dl_se, rq); (dl_se->dl_defer_armed && dl_entity_overflow(dl_se, rq_clock(rq)))) {
dl_se->deadline = rq_clock(rq) + pi_of(dl_se)->dl_deadline;
dl_se->runtime = pi_of(dl_se)->dl_runtime;
}
if (dl_se->dl_yielded && dl_se->runtime > 0) if (dl_se->dl_yielded && dl_se->runtime > 0)
dl_se->runtime = 0; dl_se->runtime = 0;
@ -873,6 +899,44 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se)
dl_se->dl_yielded = 0; dl_se->dl_yielded = 0;
if (dl_se->dl_throttled) if (dl_se->dl_throttled)
dl_se->dl_throttled = 0; dl_se->dl_throttled = 0;
/*
* If this is the replenishment of a deferred reservation,
* clear the flag and return.
*/
if (dl_se->dl_defer_armed) {
dl_se->dl_defer_armed = 0;
return;
}
/*
* A this point, if the deferred server is not armed, and the deadline
* is in the future, if it is not running already, throttle the server
* and arm the defer timer.
*/
if (dl_se->dl_defer && !dl_se->dl_defer_running &&
dl_time_before(rq_clock(dl_se->rq), dl_se->deadline - dl_se->runtime)) {
if (!is_dl_boosted(dl_se) && dl_se->server_has_tasks(dl_se)) {
/*
* Set dl_se->dl_defer_armed and dl_throttled variables to
* inform the start_dl_timer() that this is a deferred
* activation.
*/
dl_se->dl_defer_armed = 1;
dl_se->dl_throttled = 1;
if (!start_dl_timer(dl_se)) {
/*
* If for whatever reason (delays), a previous timer was
* queued but not serviced, cancel it and clean the
* deferrable server variables intended for start_dl_timer().
*/
hrtimer_try_to_cancel(&dl_se->dl_timer);
dl_se->dl_defer_armed = 0;
dl_se->dl_throttled = 0;
}
}
}
} }
/* /*
@ -1023,6 +1087,15 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
} }
replenish_dl_new_period(dl_se, rq); replenish_dl_new_period(dl_se, rq);
} else if (dl_server(dl_se) && dl_se->dl_defer) {
/*
* The server can still use its previous deadline, so check if
* it left the dl_defer_running state.
*/
if (!dl_se->dl_defer_running) {
dl_se->dl_defer_armed = 1;
dl_se->dl_throttled = 1;
}
} }
} }
@ -1055,8 +1128,21 @@ static int start_dl_timer(struct sched_dl_entity *dl_se)
* We want the timer to fire at the deadline, but considering * We want the timer to fire at the deadline, but considering
* that it is actually coming from rq->clock and not from * that it is actually coming from rq->clock and not from
* hrtimer's time base reading. * hrtimer's time base reading.
*
* The deferred reservation will have its timer set to
* (deadline - runtime). At that point, the CBS rule will decide
* if the current deadline can be used, or if a replenishment is
* required to avoid add too much pressure on the system
* (current u > U).
*/ */
act = ns_to_ktime(dl_next_period(dl_se)); if (dl_se->dl_defer_armed) {
WARN_ON_ONCE(!dl_se->dl_throttled);
act = ns_to_ktime(dl_se->deadline - dl_se->runtime);
} else {
/* act = deadline - rel-deadline + period */
act = ns_to_ktime(dl_next_period(dl_se));
}
now = hrtimer_cb_get_time(timer); now = hrtimer_cb_get_time(timer);
delta = ktime_to_ns(now) - rq_clock(rq); delta = ktime_to_ns(now) - rq_clock(rq);
act = ktime_add_ns(act, delta); act = ktime_add_ns(act, delta);
@ -1106,6 +1192,62 @@ static void __push_dl_task(struct rq *rq, struct rq_flags *rf)
#endif #endif
} }
/* a defer timer will not be reset if the runtime consumed was < dl_server_min_res */
static const u64 dl_server_min_res = 1 * NSEC_PER_MSEC;
static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_dl_entity *dl_se)
{
struct rq *rq = rq_of_dl_se(dl_se);
u64 fw;
scoped_guard (rq_lock, rq) {
struct rq_flags *rf = &scope.rf;
if (!dl_se->dl_throttled || !dl_se->dl_runtime)
return HRTIMER_NORESTART;
sched_clock_tick();
update_rq_clock(rq);
if (!dl_se->dl_runtime)
return HRTIMER_NORESTART;
if (!dl_se->server_has_tasks(dl_se)) {
replenish_dl_entity(dl_se);
return HRTIMER_NORESTART;
}
if (dl_se->dl_defer_armed) {
/*
* First check if the server could consume runtime in background.
* If so, it is possible to push the defer timer for this amount
* of time. The dl_server_min_res serves as a limit to avoid
* forwarding the timer for a too small amount of time.
*/
if (dl_time_before(rq_clock(dl_se->rq),
(dl_se->deadline - dl_se->runtime - dl_server_min_res))) {
/* reset the defer timer */
fw = dl_se->deadline - rq_clock(dl_se->rq) - dl_se->runtime;
hrtimer_forward_now(timer, ns_to_ktime(fw));
return HRTIMER_RESTART;
}
dl_se->dl_defer_running = 1;
}
enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH);
if (!dl_task(dl_se->rq->curr) || dl_entity_preempt(dl_se, &dl_se->rq->curr->dl))
resched_curr(rq);
__push_dl_task(rq, rf);
}
return HRTIMER_NORESTART;
}
/* /*
* This is the bandwidth enforcement timer callback. If here, we know * This is the bandwidth enforcement timer callback. If here, we know
* a task is not on its dl_rq, since the fact that the timer was running * a task is not on its dl_rq, since the fact that the timer was running
@ -1128,28 +1270,8 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
struct rq_flags rf; struct rq_flags rf;
struct rq *rq; struct rq *rq;
if (dl_server(dl_se)) { if (dl_server(dl_se))
struct rq *rq = rq_of_dl_se(dl_se); return dl_server_timer(timer, dl_se);
struct rq_flags rf;
rq_lock(rq, &rf);
if (dl_se->dl_throttled) {
sched_clock_tick();
update_rq_clock(rq);
if (dl_se->server_has_tasks(dl_se)) {
enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH);
resched_curr(rq);
__push_dl_task(rq, &rf);
} else {
replenish_dl_entity(dl_se);
}
}
rq_unlock(rq, &rf);
return HRTIMER_NORESTART;
}
p = dl_task_of(dl_se); p = dl_task_of(dl_se);
rq = task_rq_lock(p, &rf); rq = task_rq_lock(p, &rf);
@ -1319,22 +1441,10 @@ static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
return (delta * u_act) >> BW_SHIFT; return (delta * u_act) >> BW_SHIFT;
} }
static inline void s64 dl_scaled_delta_exec(struct rq *rq, struct sched_dl_entity *dl_se, s64 delta_exec)
update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
int flags);
static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64 delta_exec)
{ {
s64 scaled_delta_exec; s64 scaled_delta_exec;
if (unlikely(delta_exec <= 0)) {
if (unlikely(dl_se->dl_yielded))
goto throttle;
return;
}
if (dl_entity_is_special(dl_se))
return;
/* /*
* For tasks that participate in GRUB, we implement GRUB-PA: the * For tasks that participate in GRUB, we implement GRUB-PA: the
* spare reclaimed bandwidth is used to clock down frequency. * spare reclaimed bandwidth is used to clock down frequency.
@ -1353,8 +1463,64 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu); scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
} }
return scaled_delta_exec;
}
static inline void
update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
int flags);
static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64 delta_exec)
{
s64 scaled_delta_exec;
if (unlikely(delta_exec <= 0)) {
if (unlikely(dl_se->dl_yielded))
goto throttle;
return;
}
if (dl_server(dl_se) && dl_se->dl_throttled && !dl_se->dl_defer)
return;
if (dl_entity_is_special(dl_se))
return;
scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
dl_se->runtime -= scaled_delta_exec; dl_se->runtime -= scaled_delta_exec;
/*
* The fair server can consume its runtime while throttled (not queued/
* running as regular CFS).
*
* If the server consumes its entire runtime in this state. The server
* is not required for the current period. Thus, reset the server by
* starting a new period, pushing the activation.
*/
if (dl_se->dl_defer && dl_se->dl_throttled && dl_runtime_exceeded(dl_se)) {
/*
* If the server was previously activated - the starving condition
* took place, it this point it went away because the fair scheduler
* was able to get runtime in background. So return to the initial
* state.
*/
dl_se->dl_defer_running = 0;
hrtimer_try_to_cancel(&dl_se->dl_timer);
replenish_dl_new_period(dl_se, dl_se->rq);
/*
* Not being able to start the timer seems problematic. If it could not
* be started for whatever reason, we need to "unthrottle" the DL server
* and queue right away. Otherwise nothing might queue it. That's similar
* to what enqueue_dl_entity() does on start_dl_timer==0. For now, just warn.
*/
WARN_ON_ONCE(!start_dl_timer(dl_se));
return;
}
throttle: throttle:
if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) { if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
dl_se->dl_throttled = 1; dl_se->dl_throttled = 1;
@ -1381,6 +1547,14 @@ throttle:
resched_curr(rq); resched_curr(rq);
} }
/*
* The fair server (sole dl_server) does not account for real-time
* workload because it is running fair work.
*/
if (dl_se == &rq->fair_server)
return;
#ifdef CONFIG_RT_GROUP_SCHED
/* /*
* Because -- for now -- we share the rt bandwidth, we need to * Because -- for now -- we share the rt bandwidth, we need to
* account our runtime there too, otherwise actual rt tasks * account our runtime there too, otherwise actual rt tasks
@ -1405,34 +1579,155 @@ throttle:
rt_rq->rt_time += delta_exec; rt_rq->rt_time += delta_exec;
raw_spin_unlock(&rt_rq->rt_runtime_lock); raw_spin_unlock(&rt_rq->rt_runtime_lock);
} }
#endif
}
/*
* In the non-defer mode, the idle time is not accounted, as the
* server provides a guarantee.
*
* If the dl_server is in defer mode, the idle time is also considered
* as time available for the fair server, avoiding a penalty for the
* rt scheduler that did not consumed that time.
*/
void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
{
s64 delta_exec, scaled_delta_exec;
if (!rq->fair_server.dl_defer)
return;
/* no need to discount more */
if (rq->fair_server.runtime < 0)
return;
delta_exec = rq_clock_task(rq) - p->se.exec_start;
if (delta_exec < 0)
return;
scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
rq->fair_server.runtime -= scaled_delta_exec;
if (rq->fair_server.runtime < 0) {
rq->fair_server.dl_defer_running = 0;
rq->fair_server.runtime = 0;
}
p->se.exec_start = rq_clock_task(rq);
} }
void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec) void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
{ {
update_curr_dl_se(dl_se->rq, dl_se, delta_exec); /* 0 runtime = fair server disabled */
if (dl_se->dl_runtime)
update_curr_dl_se(dl_se->rq, dl_se, delta_exec);
} }
void dl_server_start(struct sched_dl_entity *dl_se) void dl_server_start(struct sched_dl_entity *dl_se)
{ {
struct rq *rq = dl_se->rq;
/*
* XXX: the apply do not work fine at the init phase for the
* fair server because things are not yet set. We need to improve
* this before getting generic.
*/
if (!dl_server(dl_se)) {
u64 runtime = 50 * NSEC_PER_MSEC;
u64 period = 1000 * NSEC_PER_MSEC;
dl_server_apply_params(dl_se, runtime, period, 1);
dl_se->dl_server = 1;
dl_se->dl_defer = 1;
setup_new_dl_entity(dl_se);
}
if (!dl_se->dl_runtime)
return;
enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
if (!dl_task(dl_se->rq->curr) || dl_entity_preempt(dl_se, &rq->curr->dl))
resched_curr(dl_se->rq);
}
void dl_server_stop(struct sched_dl_entity *dl_se)
{
if (!dl_se->dl_runtime)
return;
dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
hrtimer_try_to_cancel(&dl_se->dl_timer);
dl_se->dl_defer_armed = 0;
dl_se->dl_throttled = 0;
}
void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
dl_server_has_tasks_f has_tasks,
dl_server_pick_f pick_task)
{
dl_se->rq = rq;
dl_se->server_has_tasks = has_tasks;
dl_se->server_pick_task = pick_task;
}
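As a rough illustration of the admission arithmetic used by dl_server_apply_params() below (and of the 50 ms / 1 s defaults applied in dl_server_start() above): a reservation's bandwidth is stored as a fixed-point fraction of BW_UNIT = 1 << BW_SHIFT. The userspace sketch below mirrors to_ratio() under the assumption that BW_SHIFT is 20, and ignores the RUNTIME_INF and overflow handling the kernel performs:

#include <stdint.h>
#include <stdio.h>

#define BW_SHIFT	20
#define BW_UNIT		(1ULL << BW_SHIFT)

/* Simplified userspace mirror of the kernel's to_ratio(). */
static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	return (runtime << BW_SHIFT) / period;
}

int main(void)
{
	uint64_t runtime = 50ULL * 1000 * 1000;		/* 50 ms, in ns */
	uint64_t period  = 1000ULL * 1000 * 1000;	/*  1 s, in ns */
	uint64_t bw = to_ratio(period, runtime);

	/* Prints roughly 52428/1048576, i.e. the default 5% reservation. */
	printf("dl_bw = %llu/%llu (%.1f%%)\n",
	       (unsigned long long)bw, (unsigned long long)BW_UNIT,
	       100.0 * (double)bw / (double)BW_UNIT);
	return 0;
}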
void __dl_server_attach_root(struct sched_dl_entity *dl_se, struct rq *rq)
{
u64 new_bw = dl_se->dl_bw;
int cpu = cpu_of(rq);
struct dl_bw *dl_b;
dl_b = dl_bw_of(cpu_of(rq));
guard(raw_spinlock)(&dl_b->lock);
if (!dl_bw_cpus(cpu))
return;
__dl_add(dl_b, new_bw, dl_bw_cpus(cpu));
}
int dl_server_apply_params(struct sched_dl_entity *dl_se, u64 runtime, u64 period, bool init)
{
u64 old_bw = init ? 0 : to_ratio(dl_se->dl_period, dl_se->dl_runtime);
u64 new_bw = to_ratio(period, runtime);
struct rq *rq = dl_se->rq;
int cpu = cpu_of(rq);
struct dl_bw *dl_b;
unsigned long cap;
int retval = 0;
int cpus;
dl_b = dl_bw_of(cpu);
guard(raw_spinlock)(&dl_b->lock);
cpus = dl_bw_cpus(cpu);
cap = dl_bw_capacity(cpu);
if (__dl_overflow(dl_b, cap, old_bw, new_bw))
return -EBUSY;
if (init) {
__add_rq_bw(new_bw, &rq->dl);
__dl_add(dl_b, new_bw, cpus);
} else {
__dl_sub(dl_b, dl_se->dl_bw, cpus);
__dl_add(dl_b, new_bw, cpus);
dl_rq_change_utilization(rq, dl_se, new_bw);
}
dl_se->dl_runtime = runtime;
dl_se->dl_deadline = period;
dl_se->dl_period = period;
dl_se->runtime = 0;
dl_se->deadline = 0;
dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
dl_se->dl_density = to_ratio(dl_se->dl_deadline, dl_se->dl_runtime);
return retval;
}
/*
@ -1599,46 +1894,40 @@ static inline bool __dl_less(struct rb_node *a, const struct rb_node *b)
return dl_time_before(__node_2_dle(a)->deadline, __node_2_dle(b)->deadline);
}
static __always_inline struct sched_statistics *
__schedstats_from_dl_se(struct sched_dl_entity *dl_se)
{
if (!schedstat_enabled())
return NULL;
if (dl_server(dl_se))
return NULL;
return &dl_task_of(dl_se)->stats;
}
static inline void
update_stats_wait_start_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se)
{
struct sched_statistics *stats = __schedstats_from_dl_se(dl_se);
if (stats)
__update_stats_wait_start(rq_of_dl_rq(dl_rq), dl_task_of(dl_se), stats);
}
static inline void
update_stats_wait_end_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se)
{
struct sched_statistics *stats = __schedstats_from_dl_se(dl_se);
if (stats)
__update_stats_wait_end(rq_of_dl_rq(dl_rq), dl_task_of(dl_se), stats);
}
static inline void
update_stats_enqueue_sleeper_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se)
{
struct sched_statistics *stats = __schedstats_from_dl_se(dl_se);
if (stats)
__update_stats_enqueue_sleeper(rq_of_dl_rq(dl_rq), dl_task_of(dl_se), stats);
}
static inline void
@ -1735,7 +2024,7 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
* be counted in the active utilization; hence, we need to call
* add_running_bw().
*/
if (!dl_se->dl_defer && dl_se->dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
if (flags & ENQUEUE_WAKEUP)
task_contending(dl_se, flags);
@ -1757,6 +2046,25 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
setup_new_dl_entity(dl_se);
}
/*
* If the reservation is still throttled, e.g., it got replenished but is a
* deferred task and still has to wait, don't enqueue.
*/
if (dl_se->dl_throttled && start_dl_timer(dl_se))
return;
/*
* We're about to enqueue, make sure we're not ->dl_throttled!
* In case the timer was not started, say because the defer time
* has passed, mark as not throttled and mark unarmed.
* Also cancel earlier timers, since letting those run is pointless.
*/
if (dl_se->dl_throttled) {
hrtimer_try_to_cancel(&dl_se->dl_timer);
dl_se->dl_defer_armed = 0;
dl_se->dl_throttled = 0;
}
__enqueue_dl_entity(dl_se);
}
@ -1846,7 +2154,7 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
enqueue_pushable_dl_task(rq, p);
}
static bool dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
{
update_curr_dl(rq);
@ -1856,6 +2164,8 @@ static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
dequeue_dl_entity(&p->dl, flags);
if (!p->dl.dl_throttled && !dl_server(&p->dl))
dequeue_pushable_dl_task(rq, p);
return true;
}
/*
@ -2074,6 +2384,9 @@ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0);
deadline_queue_push_tasks(rq);
if (hrtick_enabled(rq))
start_hrtick_dl(rq, &p->dl);
}
static struct sched_dl_entity *pick_next_dl_entity(struct dl_rq *dl_rq)
@ -2086,7 +2399,11 @@ static struct sched_dl_entity *pick_next_dl_entity(struct dl_rq *dl_rq)
return __node_2_dle(left);
}
/*
* __pick_next_task_dl - Helper to pick the next -deadline task to run.
* @rq: The runqueue to pick the next task from.
*/
static struct task_struct *__pick_task_dl(struct rq *rq)
{
struct sched_dl_entity *dl_se;
struct dl_rq *dl_rq = &rq->dl;
@ -2100,14 +2417,13 @@ again:
WARN_ON_ONCE(!dl_se);
if (dl_server(dl_se)) {
p = dl_se->server_pick_task(dl_se);
if (!p) {
dl_se->dl_yielded = 1;
update_curr_dl_se(rq, dl_se, 0);
goto again;
}
rq->dl_server = dl_se;
} else {
p = dl_task_of(dl_se);
}
@ -2115,24 +2431,12 @@ again:
return p;
}
static struct task_struct *pick_task_dl(struct rq *rq)
{
return __pick_task_dl(rq);
}
static void put_prev_task_dl(struct rq *rq, struct task_struct *p, struct task_struct *next)
{
struct sched_dl_entity *dl_se = &p->dl;
struct dl_rq *dl_rq = &rq->dl;
@ -2824,13 +3128,12 @@ DEFINE_SCHED_CLASS(dl) = {
.wakeup_preempt = wakeup_preempt_dl,
.pick_task = pick_task_dl,
.put_prev_task = put_prev_task_dl,
.set_next_task = set_next_task_dl,
#ifdef CONFIG_SMP
.balance = balance_dl,
.select_task_rq = select_task_rq_dl,
.migrate_task_rq = migrate_task_rq_dl,
.set_cpus_allowed = set_cpus_allowed_dl,


@ -333,8 +333,165 @@ static const struct file_operations sched_debug_fops = {
.release = seq_release,
};
enum dl_param {
DL_RUNTIME = 0,
DL_PERIOD,
};
static unsigned long fair_server_period_max = (1UL << 22) * NSEC_PER_USEC; /* ~4 seconds */
static unsigned long fair_server_period_min = (100) * NSEC_PER_USEC; /* 100 us */
static ssize_t sched_fair_server_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *ppos, enum dl_param param)
{
long cpu = (long) ((struct seq_file *) filp->private_data)->private;
struct rq *rq = cpu_rq(cpu);
u64 runtime, period;
size_t err;
int retval;
u64 value;
err = kstrtoull_from_user(ubuf, cnt, 10, &value);
if (err)
return err;
scoped_guard (rq_lock_irqsave, rq) {
runtime = rq->fair_server.dl_runtime;
period = rq->fair_server.dl_period;
switch (param) {
case DL_RUNTIME:
if (runtime == value)
break;
runtime = value;
break;
case DL_PERIOD:
if (value == period)
break;
period = value;
break;
}
if (runtime > period ||
period > fair_server_period_max ||
period < fair_server_period_min) {
return -EINVAL;
}
if (rq->cfs.h_nr_running) {
update_rq_clock(rq);
dl_server_stop(&rq->fair_server);
}
retval = dl_server_apply_params(&rq->fair_server, runtime, period, 0);
if (retval)
cnt = retval;
if (!runtime)
printk_deferred("Fair server disabled in CPU %d, system may crash due to starvation.\n",
cpu_of(rq));
if (rq->cfs.h_nr_running)
dl_server_start(&rq->fair_server);
}
*ppos += cnt;
return cnt;
}
static size_t sched_fair_server_show(struct seq_file *m, void *v, enum dl_param param)
{
unsigned long cpu = (unsigned long) m->private;
struct rq *rq = cpu_rq(cpu);
u64 value;
switch (param) {
case DL_RUNTIME:
value = rq->fair_server.dl_runtime;
break;
case DL_PERIOD:
value = rq->fair_server.dl_period;
break;
}
seq_printf(m, "%llu\n", value);
return 0;
}
static ssize_t
sched_fair_server_runtime_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *ppos)
{
return sched_fair_server_write(filp, ubuf, cnt, ppos, DL_RUNTIME);
}
static int sched_fair_server_runtime_show(struct seq_file *m, void *v)
{
return sched_fair_server_show(m, v, DL_RUNTIME);
}
static int sched_fair_server_runtime_open(struct inode *inode, struct file *filp)
{
return single_open(filp, sched_fair_server_runtime_show, inode->i_private);
}
static const struct file_operations fair_server_runtime_fops = {
.open = sched_fair_server_runtime_open,
.write = sched_fair_server_runtime_write,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
static ssize_t
sched_fair_server_period_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *ppos)
{
return sched_fair_server_write(filp, ubuf, cnt, ppos, DL_PERIOD);
}
static int sched_fair_server_period_show(struct seq_file *m, void *v)
{
return sched_fair_server_show(m, v, DL_PERIOD);
}
static int sched_fair_server_period_open(struct inode *inode, struct file *filp)
{
return single_open(filp, sched_fair_server_period_show, inode->i_private);
}
static const struct file_operations fair_server_period_fops = {
.open = sched_fair_server_period_open,
.write = sched_fair_server_period_write,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
static struct dentry *debugfs_sched;
static void debugfs_fair_server_init(void)
{
struct dentry *d_fair;
unsigned long cpu;
d_fair = debugfs_create_dir("fair_server", debugfs_sched);
if (!d_fair)
return;
for_each_possible_cpu(cpu) {
struct dentry *d_cpu;
char buf[32];
snprintf(buf, sizeof(buf), "cpu%lu", cpu);
d_cpu = debugfs_create_dir(buf, d_fair);
debugfs_create_file("runtime", 0644, d_cpu, (void *) cpu, &fair_server_runtime_fops);
debugfs_create_file("period", 0644, d_cpu, (void *) cpu, &fair_server_period_fops);
}
}
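The knobs created above land under the scheduler debugfs directory, so with debugfs mounted at its usual location (an assumption, not spelled out in this patch) the per-CPU files should appear as /sys/kernel/debug/sched/fair_server/cpu<N>/{runtime,period}, both in nanoseconds. A minimal userspace sketch that shrinks the CPU0 reservation to 20 ms:

#include <stdio.h>

int main(void)
{
	const char *path = "/sys/kernel/debug/sched/fair_server/cpu0/runtime";
	FILE *f = fopen(path, "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* Values are nanoseconds; writing 0 disables the fair server
	 * (and triggers the starvation warning above). */
	fprintf(f, "%llu\n", 20ULL * 1000 * 1000);
	return fclose(f) ? 1 : 0;
}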
static __init int sched_init_debug(void)
{
struct dentry __maybe_unused *numa;
@ -374,6 +531,8 @@ static __init int sched_init_debug(void)
debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops);
debugfs_fair_server_init();
return 0;
}
late_initcall(sched_init_debug);
@ -580,27 +739,27 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
else
SEQ_printf(m, " %c", task_state_to_char(p));
SEQ_printf(m, " %15s %5d %9Ld.%06ld %c %9Ld.%06ld %c %9Ld.%06ld %9Ld.%06ld %9Ld %5d ",
p->comm, task_pid_nr(p),
SPLIT_NS(p->se.vruntime),
entity_eligible(cfs_rq_of(&p->se), &p->se) ? 'E' : 'N',
SPLIT_NS(p->se.deadline),
p->se.custom_slice ? 'S' : ' ',
SPLIT_NS(p->se.slice),
SPLIT_NS(p->se.sum_exec_runtime),
(long long)(p->nvcsw + p->nivcsw),
p->prio);
SEQ_printf(m, "%9lld.%06ld %9lld.%06ld %9lld.%06ld",
SPLIT_NS(schedstat_val_or_zero(p->stats.wait_sum)),
SPLIT_NS(schedstat_val_or_zero(p->stats.sum_sleep_runtime)),
SPLIT_NS(schedstat_val_or_zero(p->stats.sum_block_runtime)));
#ifdef CONFIG_NUMA_BALANCING
SEQ_printf(m, " %d %d", task_node(p), task_numa_group_id(p));
#endif
#ifdef CONFIG_CGROUP_SCHED
SEQ_printf_task_group_path(m, task_group(p), " %s")
#endif
SEQ_printf(m, "\n");
@ -612,10 +771,26 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
SEQ_printf(m, "\n"); SEQ_printf(m, "\n");
SEQ_printf(m, "runnable tasks:\n"); SEQ_printf(m, "runnable tasks:\n");
SEQ_printf(m, " S task PID tree-key switches prio" SEQ_printf(m, " S task PID vruntime eligible "
" wait-time sum-exec sum-sleep\n"); "deadline slice sum-exec switches "
"prio wait-time sum-sleep sum-block"
#ifdef CONFIG_NUMA_BALANCING
" node group-id"
#endif
#ifdef CONFIG_CGROUP_SCHED
" group-path"
#endif
"\n");
SEQ_printf(m, "-------------------------------------------------------" SEQ_printf(m, "-------------------------------------------------------"
"------------------------------------------------------\n"); "------------------------------------------------------"
"------------------------------------------------------"
#ifdef CONFIG_NUMA_BALANCING
"--------------"
#endif
#ifdef CONFIG_CGROUP_SCHED
"--------------"
#endif
"\n");
rcu_read_lock();
for_each_process_thread(g, p) {
@ -641,8 +816,6 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
SEQ_printf(m, "\n"); SEQ_printf(m, "\n");
SEQ_printf(m, "cfs_rq[%d]:\n", cpu); SEQ_printf(m, "cfs_rq[%d]:\n", cpu);
#endif #endif
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "exec_clock",
SPLIT_NS(cfs_rq->exec_clock));
raw_spin_rq_lock_irqsave(rq, flags);
root = __pick_root_entity(cfs_rq);
@ -669,8 +842,6 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
SPLIT_NS(right_vruntime));
spread = right_vruntime - left_vruntime;
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "spread", SPLIT_NS(spread));
SEQ_printf(m, " .%-30s: %d\n", "nr_spread_over",
cfs_rq->nr_spread_over);
SEQ_printf(m, " .%-30s: %d\n", "nr_running", cfs_rq->nr_running); SEQ_printf(m, " .%-30s: %d\n", "nr_running", cfs_rq->nr_running);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_running", cfs_rq->h_nr_running); SEQ_printf(m, " .%-30s: %d\n", "h_nr_running", cfs_rq->h_nr_running);
SEQ_printf(m, " .%-30s: %d\n", "idle_nr_running", SEQ_printf(m, " .%-30s: %d\n", "idle_nr_running",
@ -730,9 +901,12 @@ void print_rt_rq(struct seq_file *m, int cpu, struct rt_rq *rt_rq)
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", #x, SPLIT_NS(rt_rq->x)) SEQ_printf(m, " .%-30s: %Ld.%06ld\n", #x, SPLIT_NS(rt_rq->x))
PU(rt_nr_running); PU(rt_nr_running);
#ifdef CONFIG_RT_GROUP_SCHED
P(rt_throttled); P(rt_throttled);
PN(rt_time); PN(rt_time);
PN(rt_runtime); PN(rt_runtime);
#endif
#undef PN #undef PN
#undef PU #undef PU


@ -5,8 +5,24 @@
* sleep+wake cycles. EEVDF placement strategy #1, #2 if disabled. * sleep+wake cycles. EEVDF placement strategy #1, #2 if disabled.
*/ */
SCHED_FEAT(PLACE_LAG, true) SCHED_FEAT(PLACE_LAG, true)
/*
* Give new tasks half a slice to ease into the competition.
*/
SCHED_FEAT(PLACE_DEADLINE_INITIAL, true) SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
/*
* Preserve relative virtual deadline on 'migration'.
*/
SCHED_FEAT(PLACE_REL_DEADLINE, true)
/*
* Inhibit (wakeup) preemption until the current task has either matched the
* 0-lag point or until it has exhausted its slice.
*/
SCHED_FEAT(RUN_TO_PARITY, true) SCHED_FEAT(RUN_TO_PARITY, true)
/*
* Allow wakeup of tasks with a shorter slice to cancel RESPECT_SLICE for
* current.
*/
SCHED_FEAT(PREEMPT_SHORT, true)
/* /*
* Prefer to schedule the task we woke last (assuming it failed * Prefer to schedule the task we woke last (assuming it failed
@ -21,6 +37,18 @@ SCHED_FEAT(NEXT_BUDDY, false)
*/ */
SCHED_FEAT(CACHE_HOT_BUDDY, true) SCHED_FEAT(CACHE_HOT_BUDDY, true)
/*
* Delay dequeueing tasks until they get selected or woken.
*
* By delaying the dequeue for non-eligible tasks, they remain in the
* competition and can burn off their negative lag. When they get selected
* they'll have positive lag by definition.
*
* DELAY_ZERO clips the lag on dequeue (or wakeup) to 0.
*/
SCHED_FEAT(DELAY_DEQUEUE, true)
SCHED_FEAT(DELAY_ZERO, true)
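The fair.c side of DELAY_DEQUEUE is not part of the hunks shown here, so purely as an outline of the idea in the comment above - not the actual fair.c code - the dequeue path behaves roughly like this sketch:

/* Outline only -- not the real dequeue_entity(). */
static bool delay_dequeue_sketch(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
	if (sched_feat(DELAY_DEQUEUE) && (flags & DEQUEUE_SLEEP) &&
	    !entity_eligible(cfs_rq, se)) {
		/* Keep the entity queued so its negative lag burns off;
		 * the real dequeue happens when it is next picked/woken. */
		se->sched_delayed = 1;
		return false;
	}

	if (sched_feat(DELAY_ZERO) && se->vlag < 0)
		se->vlag = 0;	/* clip any remaining (negative) lag */

	return true;	/* proceed with the actual dequeue */
}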
/* /*
* Allow wakeup-time preemption of the current task: * Allow wakeup-time preemption of the current task:
*/ */
@ -85,5 +113,3 @@ SCHED_FEAT(WA_BIAS, true)
SCHED_FEAT(UTIL_EST, true) SCHED_FEAT(UTIL_EST, true)
SCHED_FEAT(LATENCY_WARN, false) SCHED_FEAT(LATENCY_WARN, false)
SCHED_FEAT(HZ_BW, true)


@ -450,43 +450,35 @@ static void wakeup_preempt_idle(struct rq *rq, struct task_struct *p, int flags)
resched_curr(rq);
}
static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, struct task_struct *next)
{
dl_server_update_idle_time(rq, prev);
}
static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool first)
{
update_idle_core(rq);
schedstat_inc(rq->sched_goidle);
next->se.exec_start = rq_clock_task(rq);
}
struct task_struct *pick_task_idle(struct rq *rq)
{
return rq->idle;
}
struct task_struct *pick_next_task_idle(struct rq *rq)
{
struct task_struct *next = rq->idle;
set_next_task_idle(rq, next, true);
return next;
}
/*
* It is not legal to sleep in the idle task - print a warning
* message if some code attempts to do it:
*/
static bool
dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags)
{
raw_spin_rq_unlock_irq(rq);
printk(KERN_ERR "bad: scheduling from the idle thread!\n");
dump_stack();
raw_spin_rq_lock_irq(rq);
return true;
}
/*
@ -528,13 +520,12 @@ DEFINE_SCHED_CLASS(idle) = {
.wakeup_preempt = wakeup_preempt_idle,
.pick_task = pick_task_idle,
.put_prev_task = put_prev_task_idle,
.set_next_task = set_next_task_idle,
#ifdef CONFIG_SMP
.balance = balance_idle,
.select_task_rq = select_task_rq_idle,
.set_cpus_allowed = set_cpus_allowed_common,
#endif


@ -8,10 +8,6 @@ int sched_rr_timeslice = RR_TIMESLICE;
/* More than 4 hours if BW_SHIFT equals 20. */
static const u64 max_rt_runtime = MAX_BW;
static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
struct rt_bandwidth def_rt_bandwidth;
/* /*
* period over which we measure -rt task CPU usage in us. * period over which we measure -rt task CPU usage in us.
* default: 1s * default: 1s
@ -66,6 +62,40 @@ static int __init sched_rt_sysctl_init(void)
late_initcall(sched_rt_sysctl_init);
#endif
void init_rt_rq(struct rt_rq *rt_rq)
{
struct rt_prio_array *array;
int i;
array = &rt_rq->active;
for (i = 0; i < MAX_RT_PRIO; i++) {
INIT_LIST_HEAD(array->queue + i);
__clear_bit(i, array->bitmap);
}
/* delimiter for bitsearch: */
__set_bit(MAX_RT_PRIO, array->bitmap);
#if defined CONFIG_SMP
rt_rq->highest_prio.curr = MAX_RT_PRIO-1;
rt_rq->highest_prio.next = MAX_RT_PRIO-1;
rt_rq->overloaded = 0;
plist_head_init(&rt_rq->pushable_tasks);
#endif /* CONFIG_SMP */
/* We start in dequeued state, because no RT tasks are queued */
rt_rq->rt_queued = 0;
#ifdef CONFIG_RT_GROUP_SCHED
rt_rq->rt_time = 0;
rt_rq->rt_throttled = 0;
rt_rq->rt_runtime = 0;
raw_spin_lock_init(&rt_rq->rt_runtime_lock);
#endif
}
#ifdef CONFIG_RT_GROUP_SCHED
static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer) static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer)
{ {
struct rt_bandwidth *rt_b = struct rt_bandwidth *rt_b =
@ -130,35 +160,6 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
do_start_rt_bandwidth(rt_b); do_start_rt_bandwidth(rt_b);
} }
void init_rt_rq(struct rt_rq *rt_rq)
{
struct rt_prio_array *array;
int i;
array = &rt_rq->active;
for (i = 0; i < MAX_RT_PRIO; i++) {
INIT_LIST_HEAD(array->queue + i);
__clear_bit(i, array->bitmap);
}
/* delimiter for bit-search: */
__set_bit(MAX_RT_PRIO, array->bitmap);
#if defined CONFIG_SMP
rt_rq->highest_prio.curr = MAX_RT_PRIO-1;
rt_rq->highest_prio.next = MAX_RT_PRIO-1;
rt_rq->overloaded = 0;
plist_head_init(&rt_rq->pushable_tasks);
#endif /* CONFIG_SMP */
/* We start is dequeued state, because no RT tasks are queued */
rt_rq->rt_queued = 0;
rt_rq->rt_time = 0;
rt_rq->rt_throttled = 0;
rt_rq->rt_runtime = 0;
raw_spin_lock_init(&rt_rq->rt_runtime_lock);
}
#ifdef CONFIG_RT_GROUP_SCHED
static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b) static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b)
{ {
hrtimer_cancel(&rt_b->rt_period_timer); hrtimer_cancel(&rt_b->rt_period_timer);
@ -195,7 +196,6 @@ void unregister_rt_sched_group(struct task_group *tg)
{ {
if (tg->rt_se) if (tg->rt_se)
destroy_rt_bandwidth(&tg->rt_bandwidth); destroy_rt_bandwidth(&tg->rt_bandwidth);
} }
void free_rt_sched_group(struct task_group *tg) void free_rt_sched_group(struct task_group *tg)
@ -253,8 +253,7 @@ int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
if (!tg->rt_se) if (!tg->rt_se)
goto err; goto err;
init_rt_bandwidth(&tg->rt_bandwidth, init_rt_bandwidth(&tg->rt_bandwidth, ktime_to_ns(global_rt_period()), 0);
ktime_to_ns(def_rt_bandwidth.rt_period), 0);
for_each_possible_cpu(i) { for_each_possible_cpu(i) {
rt_rq = kzalloc_node(sizeof(struct rt_rq), rt_rq = kzalloc_node(sizeof(struct rt_rq),
@ -604,70 +603,6 @@ static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
return &rt_rq->tg->rt_bandwidth; return &rt_rq->tg->rt_bandwidth;
} }
#else /* !CONFIG_RT_GROUP_SCHED */
static inline u64 sched_rt_runtime(struct rt_rq *rt_rq)
{
return rt_rq->rt_runtime;
}
static inline u64 sched_rt_period(struct rt_rq *rt_rq)
{
return ktime_to_ns(def_rt_bandwidth.rt_period);
}
typedef struct rt_rq *rt_rq_iter_t;
#define for_each_rt_rq(rt_rq, iter, rq) \
for ((void) iter, rt_rq = &rq->rt; rt_rq; rt_rq = NULL)
#define for_each_sched_rt_entity(rt_se) \
for (; rt_se; rt_se = NULL)
static inline struct rt_rq *group_rt_rq(struct sched_rt_entity *rt_se)
{
return NULL;
}
static inline void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
{
struct rq *rq = rq_of_rt_rq(rt_rq);
if (!rt_rq->rt_nr_running)
return;
enqueue_top_rt_rq(rt_rq);
resched_curr(rq);
}
static inline void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
{
dequeue_top_rt_rq(rt_rq, rt_rq->rt_nr_running);
}
static inline int rt_rq_throttled(struct rt_rq *rt_rq)
{
return rt_rq->rt_throttled;
}
static inline const struct cpumask *sched_rt_period_mask(void)
{
return cpu_online_mask;
}
static inline
struct rt_rq *sched_rt_period_rt_rq(struct rt_bandwidth *rt_b, int cpu)
{
return &cpu_rq(cpu)->rt;
}
static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
{
return &def_rt_bandwidth;
}
#endif /* CONFIG_RT_GROUP_SCHED */
bool sched_rt_bandwidth_account(struct rt_rq *rt_rq) bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
{ {
struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq); struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
@ -859,7 +794,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
const struct cpumask *span; const struct cpumask *span;
span = sched_rt_period_mask(); span = sched_rt_period_mask();
#ifdef CONFIG_RT_GROUP_SCHED
/* /*
* FIXME: isolated CPUs should really leave the root task group, * FIXME: isolated CPUs should really leave the root task group,
* whether they are isolcpus or were isolated via cpusets, lest * whether they are isolcpus or were isolated via cpusets, lest
@ -871,7 +806,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
*/ */
if (rt_b == &root_task_group.rt_bandwidth) if (rt_b == &root_task_group.rt_bandwidth)
span = cpu_online_mask; span = cpu_online_mask;
#endif
for_each_cpu(i, span) { for_each_cpu(i, span) {
int enqueue = 0; int enqueue = 0;
struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i); struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
@ -938,18 +873,6 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
return idle; return idle;
} }
static inline int rt_se_prio(struct sched_rt_entity *rt_se)
{
#ifdef CONFIG_RT_GROUP_SCHED
struct rt_rq *rt_rq = group_rt_rq(rt_se);
if (rt_rq)
return rt_rq->highest_prio.curr;
#endif
return rt_task_of(rt_se)->prio;
}
static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq) static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
{ {
u64 runtime = sched_rt_runtime(rt_rq); u64 runtime = sched_rt_runtime(rt_rq);
@ -993,6 +916,72 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
return 0; return 0;
} }
#else /* !CONFIG_RT_GROUP_SCHED */
typedef struct rt_rq *rt_rq_iter_t;
#define for_each_rt_rq(rt_rq, iter, rq) \
for ((void) iter, rt_rq = &rq->rt; rt_rq; rt_rq = NULL)
#define for_each_sched_rt_entity(rt_se) \
for (; rt_se; rt_se = NULL)
static inline struct rt_rq *group_rt_rq(struct sched_rt_entity *rt_se)
{
return NULL;
}
static inline void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
{
struct rq *rq = rq_of_rt_rq(rt_rq);
if (!rt_rq->rt_nr_running)
return;
enqueue_top_rt_rq(rt_rq);
resched_curr(rq);
}
static inline void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
{
dequeue_top_rt_rq(rt_rq, rt_rq->rt_nr_running);
}
static inline int rt_rq_throttled(struct rt_rq *rt_rq)
{
return false;
}
static inline const struct cpumask *sched_rt_period_mask(void)
{
return cpu_online_mask;
}
static inline
struct rt_rq *sched_rt_period_rt_rq(struct rt_bandwidth *rt_b, int cpu)
{
return &cpu_rq(cpu)->rt;
}
#ifdef CONFIG_SMP
static void __enable_runtime(struct rq *rq) { }
static void __disable_runtime(struct rq *rq) { }
#endif
#endif /* CONFIG_RT_GROUP_SCHED */
static inline int rt_se_prio(struct sched_rt_entity *rt_se)
{
#ifdef CONFIG_RT_GROUP_SCHED
struct rt_rq *rt_rq = group_rt_rq(rt_se);
if (rt_rq)
return rt_rq->highest_prio.curr;
#endif
return rt_task_of(rt_se)->prio;
}
/* /*
* Update the current task's runtime statistics. Skip current tasks that * Update the current task's runtime statistics. Skip current tasks that
* are not in our scheduling class. * are not in our scheduling class.
@ -1000,7 +989,6 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
static void update_curr_rt(struct rq *rq) static void update_curr_rt(struct rq *rq)
{ {
struct task_struct *curr = rq->curr; struct task_struct *curr = rq->curr;
struct sched_rt_entity *rt_se = &curr->rt;
s64 delta_exec; s64 delta_exec;
if (curr->sched_class != &rt_sched_class) if (curr->sched_class != &rt_sched_class)
@ -1010,6 +998,9 @@ static void update_curr_rt(struct rq *rq)
if (unlikely(delta_exec <= 0)) if (unlikely(delta_exec <= 0))
return; return;
#ifdef CONFIG_RT_GROUP_SCHED
struct sched_rt_entity *rt_se = &curr->rt;
if (!rt_bandwidth_enabled()) if (!rt_bandwidth_enabled())
return; return;
@ -1028,6 +1019,7 @@ static void update_curr_rt(struct rq *rq)
do_start_rt_bandwidth(sched_rt_bandwidth(rt_rq)); do_start_rt_bandwidth(sched_rt_bandwidth(rt_rq));
} }
} }
#endif
} }
static void static void
@ -1184,7 +1176,6 @@ dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
static void static void
inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq) inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
{ {
start_rt_bandwidth(&def_rt_bandwidth);
} }
static inline static inline
@ -1492,7 +1483,7 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
enqueue_pushable_task(rq, p); enqueue_pushable_task(rq, p);
} }
static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags) static bool dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
{ {
struct sched_rt_entity *rt_se = &p->rt; struct sched_rt_entity *rt_se = &p->rt;
@ -1500,6 +1491,8 @@ static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
dequeue_rt_entity(rt_se, flags); dequeue_rt_entity(rt_se, flags);
dequeue_pushable_task(rq, p); dequeue_pushable_task(rq, p);
return true;
} }
/* /*
@ -1755,17 +1748,7 @@ static struct task_struct *pick_task_rt(struct rq *rq)
return p; return p;
} }
static struct task_struct *pick_next_task_rt(struct rq *rq) static void put_prev_task_rt(struct rq *rq, struct task_struct *p, struct task_struct *next)
{
struct task_struct *p = pick_task_rt(rq);
if (p)
set_next_task_rt(rq, p, true);
return p;
}
static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
{ {
struct sched_rt_entity *rt_se = &p->rt; struct sched_rt_entity *rt_se = &p->rt;
struct rt_rq *rt_rq = &rq->rt; struct rt_rq *rt_rq = &rq->rt;
@ -2652,13 +2635,12 @@ DEFINE_SCHED_CLASS(rt) = {
.wakeup_preempt = wakeup_preempt_rt, .wakeup_preempt = wakeup_preempt_rt,
.pick_next_task = pick_next_task_rt, .pick_task = pick_task_rt,
.put_prev_task = put_prev_task_rt, .put_prev_task = put_prev_task_rt,
.set_next_task = set_next_task_rt, .set_next_task = set_next_task_rt,
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
.balance = balance_rt, .balance = balance_rt,
.pick_task = pick_task_rt,
.select_task_rq = select_task_rq_rt, .select_task_rq = select_task_rq_rt,
.set_cpus_allowed = set_cpus_allowed_common, .set_cpus_allowed = set_cpus_allowed_common,
.rq_online = rq_online_rt, .rq_online = rq_online_rt,
@ -2912,19 +2894,6 @@ int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
#ifdef CONFIG_SYSCTL #ifdef CONFIG_SYSCTL
static int sched_rt_global_constraints(void) static int sched_rt_global_constraints(void)
{ {
unsigned long flags;
int i;
raw_spin_lock_irqsave(&def_rt_bandwidth.rt_runtime_lock, flags);
for_each_possible_cpu(i) {
struct rt_rq *rt_rq = &cpu_rq(i)->rt;
raw_spin_lock(&rt_rq->rt_runtime_lock);
rt_rq->rt_runtime = global_rt_runtime();
raw_spin_unlock(&rt_rq->rt_runtime_lock);
}
raw_spin_unlock_irqrestore(&def_rt_bandwidth.rt_runtime_lock, flags);
return 0; return 0;
} }
#endif /* CONFIG_SYSCTL */ #endif /* CONFIG_SYSCTL */
@ -2944,12 +2913,6 @@ static int sched_rt_global_validate(void)
static void sched_rt_do_global(void) static void sched_rt_do_global(void)
{ {
unsigned long flags;
raw_spin_lock_irqsave(&def_rt_bandwidth.rt_runtime_lock, flags);
def_rt_bandwidth.rt_runtime = global_rt_runtime();
def_rt_bandwidth.rt_period = ns_to_ktime(global_rt_period());
raw_spin_unlock_irqrestore(&def_rt_bandwidth.rt_runtime_lock, flags);
} }
static int sched_rt_handler(const struct ctl_table *table, int write, void *buffer, static int sched_rt_handler(const struct ctl_table *table, int write, void *buffer,


@ -68,6 +68,7 @@
#include <linux/wait_api.h> #include <linux/wait_api.h>
#include <linux/wait_bit.h> #include <linux/wait_bit.h>
#include <linux/workqueue_api.h> #include <linux/workqueue_api.h>
#include <linux/delayacct.h>
#include <trace/events/power.h> #include <trace/events/power.h>
#include <trace/events/sched.h> #include <trace/events/sched.h>
@ -335,7 +336,7 @@ extern bool __checkparam_dl(const struct sched_attr *attr);
extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr); extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr);
extern int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial); extern int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
extern int dl_bw_check_overflow(int cpu); extern int dl_bw_check_overflow(int cpu);
extern s64 dl_scaled_delta_exec(struct rq *rq, struct sched_dl_entity *dl_se, s64 delta_exec);
/* /*
* SCHED_DEADLINE supports servers (nested scheduling) with the following * SCHED_DEADLINE supports servers (nested scheduling) with the following
* interface: * interface:
@ -361,7 +362,14 @@ extern void dl_server_start(struct sched_dl_entity *dl_se);
extern void dl_server_stop(struct sched_dl_entity *dl_se); extern void dl_server_stop(struct sched_dl_entity *dl_se);
extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq, extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
dl_server_has_tasks_f has_tasks, dl_server_has_tasks_f has_tasks,
dl_server_pick_f pick); dl_server_pick_f pick_task);
extern void dl_server_update_idle_time(struct rq *rq,
struct task_struct *p);
extern void fair_server_init(struct rq *rq);
extern void __dl_server_attach_root(struct sched_dl_entity *dl_se, struct rq *rq);
extern int dl_server_apply_params(struct sched_dl_entity *dl_se,
u64 runtime, u64 period, bool init);
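fair_server_init(), declared above, is the real user of this interface; its fair.c definition is not among the hunks shown, so the following is only a schematic of how a client class is expected to wire a server up. The my_*() names are made up, and pick_task_fair() stands in for however the class exposes its next choice:

/* Schematic only; the real callbacks live in fair.c. */
static bool my_server_has_tasks(struct sched_dl_entity *dl_se)
{
	return !!dl_se->rq->cfs.h_nr_running;
}

static struct task_struct *my_server_pick_task(struct sched_dl_entity *dl_se)
{
	return pick_task_fair(dl_se->rq);	/* whatever the class would run next */
}

static void my_server_setup(struct rq *rq)
{
	dl_server_init(&rq->fair_server, rq,
		       my_server_has_tasks, my_server_pick_task);
}

/*
 * The class then brackets its queue occupancy with dl_server_start() /
 * dl_server_stop() and forwards consumed CPU time from its update_curr()
 * path through dl_server_update(), so the server's CBS accounting stays
 * in sync with what the class actually ran.
 */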
#ifdef CONFIG_CGROUP_SCHED #ifdef CONFIG_CGROUP_SCHED
@ -599,17 +607,12 @@ struct cfs_rq {
s64 avg_vruntime; s64 avg_vruntime;
u64 avg_load; u64 avg_load;
u64 exec_clock;
u64 min_vruntime; u64 min_vruntime;
#ifdef CONFIG_SCHED_CORE #ifdef CONFIG_SCHED_CORE
unsigned int forceidle_seq; unsigned int forceidle_seq;
u64 min_vruntime_fi; u64 min_vruntime_fi;
#endif #endif
#ifndef CONFIG_64BIT
u64 min_vruntime_copy;
#endif
struct rb_root_cached tasks_timeline; struct rb_root_cached tasks_timeline;
/* /*
@ -619,10 +622,6 @@ struct cfs_rq {
struct sched_entity *curr; struct sched_entity *curr;
struct sched_entity *next; struct sched_entity *next;
#ifdef CONFIG_SCHED_DEBUG
unsigned int nr_spread_over;
#endif
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
/* /*
* CFS load tracking * CFS load tracking
@ -726,13 +725,13 @@ struct rt_rq {
#endif /* CONFIG_SMP */ #endif /* CONFIG_SMP */
int rt_queued; int rt_queued;
#ifdef CONFIG_RT_GROUP_SCHED
int rt_throttled; int rt_throttled;
u64 rt_time; u64 rt_time;
u64 rt_runtime; u64 rt_runtime;
/* Nests inside the rq lock: */ /* Nests inside the rq lock: */
raw_spinlock_t rt_runtime_lock; raw_spinlock_t rt_runtime_lock;
#ifdef CONFIG_RT_GROUP_SCHED
unsigned int rt_nr_boosted; unsigned int rt_nr_boosted;
struct rq *rq; struct rq *rq;
@ -820,6 +819,9 @@ static inline void se_update_runnable(struct sched_entity *se)
static inline long se_runnable(struct sched_entity *se) static inline long se_runnable(struct sched_entity *se)
{ {
if (se->sched_delayed)
return false;
if (entity_is_task(se)) if (entity_is_task(se))
return !!se->on_rq; return !!se->on_rq;
else else
@ -834,6 +836,9 @@ static inline void se_update_runnable(struct sched_entity *se) { }
static inline long se_runnable(struct sched_entity *se) static inline long se_runnable(struct sched_entity *se)
{ {
if (se->sched_delayed)
return false;
return !!se->on_rq; return !!se->on_rq;
} }
@ -1044,6 +1049,8 @@ struct rq {
struct rt_rq rt; struct rt_rq rt;
struct dl_rq dl; struct dl_rq dl;
struct sched_dl_entity fair_server;
#ifdef CONFIG_FAIR_GROUP_SCHED #ifdef CONFIG_FAIR_GROUP_SCHED
/* list of leaf cfs_rq on this CPU: */ /* list of leaf cfs_rq on this CPU: */
struct list_head leaf_cfs_rq_list; struct list_head leaf_cfs_rq_list;
@ -1059,6 +1066,7 @@ struct rq {
unsigned int nr_uninterruptible; unsigned int nr_uninterruptible;
struct task_struct __rcu *curr; struct task_struct __rcu *curr;
struct sched_dl_entity *dl_server;
struct task_struct *idle; struct task_struct *idle;
struct task_struct *stop; struct task_struct *stop;
unsigned long next_balance; unsigned long next_balance;
@ -1158,7 +1166,6 @@ struct rq {
/* latency stats */ /* latency stats */
struct sched_info rq_sched_info; struct sched_info rq_sched_info;
unsigned long long rq_cpu_time; unsigned long long rq_cpu_time;
/* could above be rq->cfs_rq.exec_clock + rq->rt_rq.rt_runtime ? */
/* sys_sched_yield() stats */ /* sys_sched_yield() stats */
unsigned int yld_count; unsigned int yld_count;
@ -1187,6 +1194,7 @@ struct rq {
/* per rq */ /* per rq */
struct rq *core; struct rq *core;
struct task_struct *core_pick; struct task_struct *core_pick;
struct sched_dl_entity *core_dl_server;
unsigned int core_enabled; unsigned int core_enabled;
unsigned int core_sched_seq; unsigned int core_sched_seq;
struct rb_root core_tree; struct rb_root core_tree;
@ -2247,11 +2255,13 @@ extern const u32 sched_prio_to_wmult[40];
* *
*/ */
#define DEQUEUE_SLEEP 0x01 #define DEQUEUE_SLEEP 0x01 /* Matches ENQUEUE_WAKEUP */
#define DEQUEUE_SAVE 0x02 /* Matches ENQUEUE_RESTORE */ #define DEQUEUE_SAVE 0x02 /* Matches ENQUEUE_RESTORE */
#define DEQUEUE_MOVE 0x04 /* Matches ENQUEUE_MOVE */ #define DEQUEUE_MOVE 0x04 /* Matches ENQUEUE_MOVE */
#define DEQUEUE_NOCLOCK 0x08 /* Matches ENQUEUE_NOCLOCK */ #define DEQUEUE_NOCLOCK 0x08 /* Matches ENQUEUE_NOCLOCK */
#define DEQUEUE_SPECIAL 0x10
#define DEQUEUE_MIGRATING 0x100 /* Matches ENQUEUE_MIGRATING */ #define DEQUEUE_MIGRATING 0x100 /* Matches ENQUEUE_MIGRATING */
#define DEQUEUE_DELAYED 0x200 /* Matches ENQUEUE_DELAYED */
#define ENQUEUE_WAKEUP 0x01 #define ENQUEUE_WAKEUP 0x01
#define ENQUEUE_RESTORE 0x02 #define ENQUEUE_RESTORE 0x02
@ -2267,6 +2277,7 @@ extern const u32 sched_prio_to_wmult[40];
#endif #endif
#define ENQUEUE_INITIAL 0x80 #define ENQUEUE_INITIAL 0x80
#define ENQUEUE_MIGRATING 0x100 #define ENQUEUE_MIGRATING 0x100
#define ENQUEUE_DELAYED 0x200
#define RETRY_TASK ((void *)-1UL) #define RETRY_TASK ((void *)-1UL)
@ -2285,23 +2296,31 @@ struct sched_class {
#endif #endif
void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags); void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags);
void (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags); bool (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags);
void (*yield_task) (struct rq *rq); void (*yield_task) (struct rq *rq);
bool (*yield_to_task)(struct rq *rq, struct task_struct *p); bool (*yield_to_task)(struct rq *rq, struct task_struct *p);
void (*wakeup_preempt)(struct rq *rq, struct task_struct *p, int flags); void (*wakeup_preempt)(struct rq *rq, struct task_struct *p, int flags);
struct task_struct *(*pick_next_task)(struct rq *rq); struct task_struct *(*pick_task)(struct rq *rq);
/*
* Optional! When implemented pick_next_task() should be equivalent to:
*
* next = pick_task();
* if (next) {
* put_prev_task(prev);
* set_next_task_first(next);
* }
*/
struct task_struct *(*pick_next_task)(struct rq *rq, struct task_struct *prev);
void (*put_prev_task)(struct rq *rq, struct task_struct *p); void (*put_prev_task)(struct rq *rq, struct task_struct *p, struct task_struct *next);
void (*set_next_task)(struct rq *rq, struct task_struct *p, bool first); void (*set_next_task)(struct rq *rq, struct task_struct *p, bool first);
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
int (*balance)(struct rq *rq, struct task_struct *prev, struct rq_flags *rf); int (*balance)(struct rq *rq, struct task_struct *prev, struct rq_flags *rf);
int (*select_task_rq)(struct task_struct *p, int task_cpu, int flags); int (*select_task_rq)(struct task_struct *p, int task_cpu, int flags);
struct task_struct * (*pick_task)(struct rq *rq);
void (*migrate_task_rq)(struct task_struct *p, int new_cpu); void (*migrate_task_rq)(struct task_struct *p, int new_cpu);
void (*task_woken)(struct rq *this_rq, struct task_struct *task); void (*task_woken)(struct rq *this_rq, struct task_struct *task);
@ -2345,7 +2364,7 @@ struct sched_class {
static inline void put_prev_task(struct rq *rq, struct task_struct *prev) static inline void put_prev_task(struct rq *rq, struct task_struct *prev)
{ {
WARN_ON_ONCE(rq->curr != prev); WARN_ON_ONCE(rq->curr != prev);
prev->sched_class->put_prev_task(rq, prev); prev->sched_class->put_prev_task(rq, prev, NULL);
} }
static inline void set_next_task(struct rq *rq, struct task_struct *next) static inline void set_next_task(struct rq *rq, struct task_struct *next)
@ -2353,6 +2372,30 @@ static inline void set_next_task(struct rq *rq, struct task_struct *next)
next->sched_class->set_next_task(rq, next, false); next->sched_class->set_next_task(rq, next, false);
} }
static inline void
__put_prev_set_next_dl_server(struct rq *rq,
struct task_struct *prev,
struct task_struct *next)
{
prev->dl_server = NULL;
next->dl_server = rq->dl_server;
rq->dl_server = NULL;
}
static inline void put_prev_set_next_task(struct rq *rq,
struct task_struct *prev,
struct task_struct *next)
{
WARN_ON_ONCE(rq->curr != prev);
__put_prev_set_next_dl_server(rq, prev, next);
if (next == prev)
return;
prev->sched_class->put_prev_task(rq, prev, next);
next->sched_class->set_next_task(rq, next, true);
}
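Tying this to the pick_next_task() comment in sched_class above: when a class only provides pick_task(), the core can synthesize the old behaviour from pick_task() plus the combined put_prev/set_next step. A simplified sketch of that equivalence (the real __pick_next_task() additionally handles balancing and RETRY_TASK):

static inline struct task_struct *
pick_task_and_switch(struct rq *rq, const struct sched_class *class,
		     struct task_struct *prev)
{
	struct task_struct *next = class->pick_task(rq);

	if (next)
		put_prev_set_next_task(rq, prev, next);

	return next;
}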
/* /*
* Helper to define a sched_class instance; each one is placed in a separate * Helper to define a sched_class instance; each one is placed in a separate
@ -2408,7 +2451,7 @@ static inline bool sched_fair_runnable(struct rq *rq)
} }
extern struct task_struct *pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf); extern struct task_struct *pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf);
extern struct task_struct *pick_next_task_idle(struct rq *rq); extern struct task_struct *pick_task_idle(struct rq *rq);
#define SCA_CHECK 0x01 #define SCA_CHECK 0x01
#define SCA_MIGRATE_DISABLE 0x02 #define SCA_MIGRATE_DISABLE 0x02
@ -2515,7 +2558,6 @@ extern void reweight_task(struct task_struct *p, const struct load_weight *lw);
extern void resched_curr(struct rq *rq); extern void resched_curr(struct rq *rq);
extern void resched_cpu(int cpu); extern void resched_cpu(int cpu);
extern struct rt_bandwidth def_rt_bandwidth;
extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime); extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq); extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq);
@ -2586,6 +2628,19 @@ static inline void sub_nr_running(struct rq *rq, unsigned count)
sched_update_tick_dependency(rq); sched_update_tick_dependency(rq);
} }
static inline void __block_task(struct rq *rq, struct task_struct *p)
{
WRITE_ONCE(p->on_rq, 0);
ASSERT_EXCLUSIVE_WRITER(p->on_rq);
if (p->sched_contributes_to_load)
rq->nr_uninterruptible++;
if (p->in_iowait) {
atomic_inc(&rq->nr_iowait);
delayacct_blkio_start();
}
}
extern void activate_task(struct rq *rq, struct task_struct *p, int flags); extern void activate_task(struct rq *rq, struct task_struct *p, int flags);
extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags); extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags);
@ -3607,7 +3662,7 @@ extern int __sched_setaffinity(struct task_struct *p, struct affinity_context *c
extern void __setscheduler_prio(struct task_struct *p, int prio); extern void __setscheduler_prio(struct task_struct *p, int prio);
extern void set_load_weight(struct task_struct *p, bool update_load); extern void set_load_weight(struct task_struct *p, bool update_load);
extern void enqueue_task(struct rq *rq, struct task_struct *p, int flags); extern void enqueue_task(struct rq *rq, struct task_struct *p, int flags);
extern void dequeue_task(struct rq *rq, struct task_struct *p, int flags); extern bool dequeue_task(struct rq *rq, struct task_struct *p, int flags);
extern void check_class_changed(struct rq *rq, struct task_struct *p, extern void check_class_changed(struct rq *rq, struct task_struct *p,
const struct sched_class *prev_class, const struct sched_class *prev_class,


@ -41,26 +41,17 @@ static struct task_struct *pick_task_stop(struct rq *rq)
return rq->stop; return rq->stop;
} }
static struct task_struct *pick_next_task_stop(struct rq *rq)
{
struct task_struct *p = pick_task_stop(rq);
if (p)
set_next_task_stop(rq, p, true);
return p;
}
static void static void
enqueue_task_stop(struct rq *rq, struct task_struct *p, int flags) enqueue_task_stop(struct rq *rq, struct task_struct *p, int flags)
{ {
add_nr_running(rq, 1); add_nr_running(rq, 1);
} }
static void static bool
dequeue_task_stop(struct rq *rq, struct task_struct *p, int flags) dequeue_task_stop(struct rq *rq, struct task_struct *p, int flags)
{ {
sub_nr_running(rq, 1); sub_nr_running(rq, 1);
return true;
} }
static void yield_task_stop(struct rq *rq) static void yield_task_stop(struct rq *rq)
@ -68,7 +59,7 @@ static void yield_task_stop(struct rq *rq)
BUG(); /* the stop task should never yield, it's pointless. */
} }
static void put_prev_task_stop(struct rq *rq, struct task_struct *prev) static void put_prev_task_stop(struct rq *rq, struct task_struct *prev, struct task_struct *next)
{ {
update_curr_common(rq); update_curr_common(rq);
} }
@ -111,13 +102,12 @@ DEFINE_SCHED_CLASS(stop) = {
.wakeup_preempt = wakeup_preempt_stop, .wakeup_preempt = wakeup_preempt_stop,
.pick_next_task = pick_next_task_stop, .pick_task = pick_task_stop,
.put_prev_task = put_prev_task_stop, .put_prev_task = put_prev_task_stop,
.set_next_task = set_next_task_stop, .set_next_task = set_next_task_stop,
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
.balance = balance_stop, .balance = balance_stop,
.pick_task = pick_task_stop,
.select_task_rq = select_task_rq_stop, .select_task_rq = select_task_rq_stop,
.set_cpus_allowed = set_cpus_allowed_common, .set_cpus_allowed = set_cpus_allowed_common,
#endif #endif


@ -57,7 +57,7 @@ static int effective_prio(struct task_struct *p)
* keep the priority unchanged. Otherwise, update priority * keep the priority unchanged. Otherwise, update priority
* to the normal priority: * to the normal priority:
*/ */
if (!rt_prio(p->prio)) if (!rt_or_dl_prio(p->prio))
return p->normal_prio; return p->normal_prio;
return p->prio; return p->prio;
} }
@ -258,107 +258,6 @@ int sched_core_idle_cpu(int cpu)
#endif #endif
#ifdef CONFIG_SMP
/*
* This function computes an effective utilization for the given CPU, to be
* used for frequency selection given the linear relation: f = u * f_max.
*
* The scheduler tracks the following metrics:
*
* cpu_util_{cfs,rt,dl,irq}()
* cpu_bw_dl()
*
* Where the cfs,rt and dl util numbers are tracked with the same metric and
* synchronized windows and are thus directly comparable.
*
* The cfs,rt,dl utilization are the running times measured with rq->clock_task
* which excludes things like IRQ and steal-time. These latter are then accrued
* in the IRQ utilization.
*
* The DL bandwidth number OTOH is not a measured metric but a value computed
* based on the task model parameters and gives the minimal utilization
* required to meet deadlines.
*/
unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
unsigned long *min,
unsigned long *max)
{
unsigned long util, irq, scale;
struct rq *rq = cpu_rq(cpu);
scale = arch_scale_cpu_capacity(cpu);
/*
* Early check to see if IRQ/steal time saturates the CPU, can be
* because of inaccuracies in how we track these -- see
* update_irq_load_avg().
*/
irq = cpu_util_irq(rq);
if (unlikely(irq >= scale)) {
if (min)
*min = scale;
if (max)
*max = scale;
return scale;
}
if (min) {
/*
* The minimum utilization returns the highest level between:
* - the computed DL bandwidth needed with the IRQ pressure which
* steals time to the deadline task.
* - The minimum performance requirement for CFS and/or RT.
*/
*min = max(irq + cpu_bw_dl(rq), uclamp_rq_get(rq, UCLAMP_MIN));
/*
* When an RT task is runnable and uclamp is not used, we must
* ensure that the task will run at maximum compute capacity.
*/
if (!uclamp_is_used() && rt_rq_is_runnable(&rq->rt))
*min = max(*min, scale);
}
/*
* Because the time spent on RT/DL tasks is visible as 'lost' time to
* CFS tasks and we use the same metric to track the effective
* utilization (PELT windows are synchronized) we can directly add them
* to obtain the CPU's actual utilization.
*/
util = util_cfs + cpu_util_rt(rq);
util += cpu_util_dl(rq);
/*
* The maximum hint is a soft bandwidth requirement, which can be lower
* than the actual utilization because of uclamp_max requirements.
*/
if (max)
*max = min(scale, uclamp_rq_get(rq, UCLAMP_MAX));
if (util >= scale)
return scale;
/*
* There is still idle time; further improve the number by using the
* IRQ metric. Because IRQ/steal time is hidden from the task clock we
* need to scale the task numbers:
*
* U' = irq + ((max - irq) / max) * U
*/
util = scale_irq_capacity(util, irq, scale);
util += irq;
return min(scale, util);
}
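For example, with scale (max) = 1024, irq = 128 and util U = 512, the scaled value is U' = 128 + ((1024 - 128) / 1024) * 512 = 128 + 448 = 576, still below the capacity of 1024.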
unsigned long sched_cpu_util(int cpu)
{
return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
}
#endif /* CONFIG_SMP */
/** /**
* find_process_by_pid - find a process with a matching PID value. * find_process_by_pid - find a process with a matching PID value.
* @pid: the pid in question. * @pid: the pid in question.
@ -401,13 +300,23 @@ static void __setscheduler_params(struct task_struct *p,
p->policy = policy; p->policy = policy;
if (dl_policy(policy)) if (dl_policy(policy)) {
__setparam_dl(p, attr); __setparam_dl(p, attr);
else if (fair_policy(policy)) } else if (fair_policy(policy)) {
p->static_prio = NICE_TO_PRIO(attr->sched_nice); p->static_prio = NICE_TO_PRIO(attr->sched_nice);
if (attr->sched_runtime) {
p->se.custom_slice = 1;
p->se.slice = clamp_t(u64, attr->sched_runtime,
NSEC_PER_MSEC/10, /* HZ=1000 * 10 */
NSEC_PER_MSEC*100); /* HZ=100 / 10 */
} else {
p->se.custom_slice = 0;
p->se.slice = sysctl_sched_base_slice;
}
}
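From userspace this means a plain SCHED_OTHER task can now pass sched_attr::sched_runtime as a slice suggestion, which ends up in se.slice via the clamp above. A minimal sketch (assuming the UAPI headers ship struct sched_attr; on some libc versions the struct may come from sched.h instead and would need to be declared locally):

#define _GNU_SOURCE
#include <sched.h>
#include <linux/sched/types.h>	/* struct sched_attr */
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_policy	= SCHED_OTHER,
		.sched_runtime	= 3 * 1000 * 1000,	/* suggest a ~3 ms slice */
	};

	/* pid 0 == current task; flags == 0 */
	if (syscall(SYS_sched_setattr, 0, &attr, 0))
		perror("sched_setattr");
	return 0;
}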
/* rt-policy tasks do not have a timerslack */ /* rt-policy tasks do not have a timerslack */
if (task_is_realtime(p)) { if (rt_or_dl_task_policy(p)) {
p->timer_slack_ns = 0; p->timer_slack_ns = 0;
} else if (p->timer_slack_ns == 0) { } else if (p->timer_slack_ns == 0) {
/* when switching back to non-rt policy, restore timerslack */ /* when switching back to non-rt policy, restore timerslack */
@ -708,7 +617,9 @@ recheck:
* but store a possible modification of reset_on_fork. * but store a possible modification of reset_on_fork.
*/ */
if (unlikely(policy == p->policy)) { if (unlikely(policy == p->policy)) {
if (fair_policy(policy) && attr->sched_nice != task_nice(p)) if (fair_policy(policy) &&
(attr->sched_nice != task_nice(p) ||
(attr->sched_runtime != p->se.slice)))
goto change; goto change;
if (rt_policy(policy) && attr->sched_priority != p->rt_priority) if (rt_policy(policy) && attr->sched_priority != p->rt_priority)
goto change; goto change;
@ -854,6 +765,9 @@ static int _sched_setscheduler(struct task_struct *p, int policy,
.sched_nice = PRIO_TO_NICE(p->static_prio), .sched_nice = PRIO_TO_NICE(p->static_prio),
}; };
if (p->se.custom_slice)
attr.sched_runtime = p->se.slice;
/* Fixup the legacy SCHED_RESET_ON_FORK hack. */ /* Fixup the legacy SCHED_RESET_ON_FORK hack. */
if ((policy != SETPARAM_POLICY) && (policy & SCHED_RESET_ON_FORK)) { if ((policy != SETPARAM_POLICY) && (policy & SCHED_RESET_ON_FORK)) {
attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK; attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
@ -1020,12 +934,14 @@ err_size:
static void get_params(struct task_struct *p, struct sched_attr *attr) static void get_params(struct task_struct *p, struct sched_attr *attr)
{ {
if (task_has_dl_policy(p)) if (task_has_dl_policy(p)) {
__getparam_dl(p, attr); __getparam_dl(p, attr);
else if (task_has_rt_policy(p)) } else if (task_has_rt_policy(p)) {
attr->sched_priority = p->rt_priority; attr->sched_priority = p->rt_priority;
else } else {
attr->sched_nice = task_nice(p); attr->sched_nice = task_nice(p);
attr->sched_runtime = p->se.slice;
}
} }
/** /**


@ -516,6 +516,14 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
if (cpumask_test_cpu(rq->cpu, cpu_active_mask)) if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
set_rq_online(rq); set_rq_online(rq);
/*
* Because the rq is not a task, dl_add_task_root_domain() did not
* move the fair server bw to the rd if it already started.
* Add it now.
*/
if (rq->fair_server.dl_server)
__dl_server_attach_root(&rq->fair_server, rq);
rq_unlock_irqrestore(rq, &rf); rq_unlock_irqrestore(rq, &rf);
if (old_rd) if (old_rd)

View File

@ -2557,7 +2557,7 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
error = current->timer_slack_ns; error = current->timer_slack_ns;
break; break;
case PR_SET_TIMERSLACK: case PR_SET_TIMERSLACK:
if (task_is_realtime(current)) if (rt_or_dl_task_policy(current))
break; break;
if (arg2 <= 0) if (arg2 <= 0)
current->timer_slack_ns = current->timer_slack_ns =


@ -1977,7 +1977,7 @@ static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
* expiry. * expiry.
*/ */
if (IS_ENABLED(CONFIG_PREEMPT_RT)) { if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT)) if (rt_or_dl_task_policy(current) && !(mode & HRTIMER_MODE_SOFT))
mode |= HRTIMER_MODE_HARD; mode |= HRTIMER_MODE_HARD;
} }


@ -547,7 +547,7 @@ probe_wakeup(void *ignore, struct task_struct *p)
* - wakeup_dl handles tasks belonging to sched_dl class only. * - wakeup_dl handles tasks belonging to sched_dl class only.
*/ */
if (tracing_dl || (wakeup_dl && !dl_task(p)) || if (tracing_dl || (wakeup_dl && !dl_task(p)) ||
(wakeup_rt && !dl_task(p) && !rt_task(p)) || (wakeup_rt && !rt_or_dl_task(p)) ||
(!dl_task(p) && (p->prio >= wakeup_prio || p->prio >= current->prio))) (!dl_task(p) && (p->prio >= wakeup_prio || p->prio >= current->prio)))
return; return;


@ -418,7 +418,7 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE; bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE;
tsk = current; tsk = current;
if (rt_task(tsk)) { if (rt_or_dl_task(tsk)) {
bg_thresh += bg_thresh / 4 + global_wb_domain.dirty_limit / 32; bg_thresh += bg_thresh / 4 + global_wb_domain.dirty_limit / 32;
thresh += thresh / 4 + global_wb_domain.dirty_limit / 32; thresh += thresh / 4 + global_wb_domain.dirty_limit / 32;
} }
@ -477,7 +477,7 @@ static unsigned long node_dirty_limit(struct pglist_data *pgdat)
else else
dirty = vm_dirty_ratio * node_memory / 100; dirty = vm_dirty_ratio * node_memory / 100;
if (rt_task(tsk)) if (rt_or_dl_task(tsk))
dirty += dirty / 4; dirty += dirty / 4;
/* /*


@ -4004,7 +4004,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
*/ */
if (alloc_flags & ALLOC_MIN_RESERVE) if (alloc_flags & ALLOC_MIN_RESERVE)
alloc_flags &= ~ALLOC_CPUSET; alloc_flags &= ~ALLOC_CPUSET;
} else if (unlikely(rt_task(current)) && in_task()) } else if (unlikely(rt_or_dl_task(current)) && in_task())
alloc_flags |= ALLOC_MIN_RESERVE; alloc_flags |= ALLOC_MIN_RESERVE;
alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags); alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags);