linux/kernel/sched
Yasuaki Ishimatsu 2847c90e1b sched/fair: Care divide error in update_task_scan_period()
While offling node by hot removing memory, the following divide error
occurs:

  divide error: 0000 [#1] SMP
  [...]
  Call Trace:
   [...] handle_mm_fault
   [...] ? try_to_wake_up
   [...] ? wake_up_state
   [...] __do_page_fault
   [...] ? do_futex
   [...] ? put_prev_entity
   [...] ? __switch_to
   [...] do_page_fault
   [...] page_fault
  [...]
  RIP  [<ffffffff810a7081>] task_numa_fault
   RSP <ffff88084eb2bcb0>

The issue occurs as follows:
  1. When page fault occurs and page is allocated from node 1,
     task_struct->numa_faults_buffer_memory[] of node 1 is
     incremented and p->numa_faults_locality[] is also incremented
     as follows:

     o numa_faults_buffer_memory[]       o numa_faults_locality[]
              NR_NUMA_HINT_FAULT_TYPES
             |      0     |     1     |
     ----------------------------------  ----------------------
      node 0 |      0     |     0     |   remote |      0     |
      node 1 |      0     |     1     |   locale |      1     |
     ----------------------------------  ----------------------

  2. node 1 is offlined by hot removing memory.

  3. When page fault occurs, fault_types[] is calculated by using
     p->numa_faults_buffer_memory[] of all online nodes in
     task_numa_placement(). But node 1 was offline by step 2. So
     the fault_types[] is calculated by using only
     p->numa_faults_buffer_memory[] of node 0. So both of fault_types[]
     are set to 0.

  4. The values(0) of fault_types[] pass to update_task_scan_period().

  5. numa_faults_locality[1] is set to 1. So the following division is
     calculated.

        static void update_task_scan_period(struct task_struct *p,
                                unsigned long shared, unsigned long private){
        ...
                ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared));
        }

  6. But both of private and shared are set to 0. So divide error
     occurs here.

The divide error is rare case because the trigger is node offline.
This patch always increments denominator for avoiding divide error.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/54475703.8000505@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:46:03 +01:00
..
auto_group.c sched: Change autogroup_move_group() to use for_each_thread() 2014-08-20 09:47:18 +02:00
auto_group.h Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled" 2012-12-11 10:23:45 +01:00
clock.c time: Replace __get_cpu_var uses 2014-08-26 13:45:44 -04:00
completion.c sched: Move completion code from core.c to completion.c 2013-11-06 07:49:19 +01:00
core.c sched: Fix race between task_group and sched_task_group 2014-10-28 10:45:59 +01:00
cpuacct.c cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes 2014-07-15 11:05:09 -04:00
cpuacct.h sched/cpuacct: Initialize root cpuacct earlier 2013-04-10 13:54:20 +02:00
cpudeadline.c sched/deadline: Fix inter- exclusive cpusets migrations 2014-09-24 14:46:57 +02:00
cpudeadline.h sched/deadline: Replace NR_CPUS arrays 2014-05-22 10:21:28 +02:00
cpupri.c Merge commit '3cf2f34' into sched/core, to fix build error 2014-06-12 13:46:37 +02:00
cpupri.h sched/cpupri: Replace NR_CPUS arrays 2014-05-22 10:21:29 +02:00
cputime.c sched, time: Fix build error with 64 bit cputime_t on 32 bit systems 2014-10-03 05:46:55 +02:00
deadline.c sched/deadline: Fix races between rt_mutex_setprio() and dl_task_timer() 2014-10-28 10:46:01 +01:00
debug.c sched: print_rq(): Don't use tasklist_lock 2014-09-24 14:47:04 +02:00
fair.c sched/fair: Care divide error in update_task_scan_period() 2014-10-28 10:46:03 +01:00
features.h sched: Rename capacity related flags 2014-06-05 11:52:32 +02:00
idle_task.c sched: Transform resched_task() into resched_curr() 2014-07-16 13:38:19 +02:00
idle.c sched: Let the scheduler see CPU idle states 2014-09-24 14:46:58 +02:00
Makefile sched/idle: Move cpu/idle.c to sched/idle.c 2014-02-11 09:58:30 +01:00
proc.c cpuidle: menu: Lookup CPU runqueues less 2014-08-06 21:17:45 +02:00
rt.c Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-10-15 07:48:18 +02:00
sched.h Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-10-15 07:48:18 +02:00
stats.c kernel: audit/fix non-modular users of module_init in core code 2014-04-03 16:21:07 -07:00
stats.h sched: Micro-optimize by dropping unnecessary task_rq() calls 2013-09-25 13:51:06 +02:00
stop_task.c sched: Add wrapper for checking task_struct::on_rq 2014-08-20 14:52:59 +02:00
wait.c SCHED: add some "wait..on_bit...timeout()" interfaces. 2014-09-25 08:23:57 -04:00