linux/kernel/time
Nicholas Piggin 2fe59f507a timers: Fix excessive granularity of new timers after a nohz idle
When a timer base is idle, it is forwarded when a new timer is added
to ensure that granularity does not become excessive. When not idle,
the timer tick is expected to increment the base.

However there are several problems:

- If an existing timer is modified, the base is forwarded only after
  the index is calculated.

- The base is not forwarded by add_timer_on.

- There is a window after a timer is restarted from a nohz idle, after
  it is marked not-idle and before the timer tick on this CPU, where a
  timer may be added but the ancient base does not get forwarded.

These result in excessive granularity (a 1 jiffy timeout can blow out
to 100s of jiffies), which cause the rcu lockup detector to trigger,
among other things.

Fix this by keeping track of whether the timer base has been idle
since it was last run or forwarded, and if so then forward it before
adding a new timer.

There is still a case where mod_timer optimises the case of a pending
timer mod with the same expiry time, where the timer can see excessive
granularity relative to the new, shorter interval. A comment is added,
but it's not changed because it is an important fastpath for
networking.

This has been tested and found to fix the RCU softlockup messages.

Testing was also done with tracing to measure requested versus
achieved wakeup latencies for all non-deferrable timers in an idle
system (with no lockup watchdogs running). Wakeup latency relative to
absolute latency is calculated (note this suffers from round-up skew
at low absolute times) and analysed:

             max     avg      std
upstream   506.0    1.20     4.68
patched      2.0    1.08     0.15

The bug was noticed due to the lockup detector Kconfig changes
dropping it out of people's .configs and resulting in larger base
clk skew When the lockup detectors are enabled, no CPU can go idle for
longer than 4 seconds, which limits the granularity errors.
Sub-optimal timer behaviour is observable on a smaller scale in that
case:

	     max     avg      std
upstream     9.0    1.05     0.19
patched      2.0    1.04     0.11

Fixes: Fixes: a683f390b9 ("timers: Forward the wheel clock whenever possible")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: David Miller <davem@davemloft.net>
Cc: dzickus@redhat.com
Cc: sfr@canb.auug.org.au
Cc: mpe@ellerman.id.au
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: linuxarm@huawei.com
Cc: abdhalee@linux.vnet.ibm.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: akpm@linux-foundation.org
Cc: paulmck@linux.vnet.ibm.com
Cc: torvalds@linux-foundation.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.com
2017-08-24 11:40:18 +02:00
..
alarmtimer.c nanosleep: Use get_timespec64() and put_timespec64() 2017-06-30 04:14:14 -04:00
clockevents.c clockevents: Make clockevents_config() static 2017-03-23 12:14:05 -07:00
clocksource.c x86/tsc, sched/clock, clocksource: Use clocksource watchdog to provide stable sync points 2017-05-15 10:15:18 +02:00
hrtimer.c nanosleep: Use get_timespec64() and put_timespec64() 2017-06-30 04:14:14 -04:00
itimer.c itimer: Make timeval to nsec conversion range limited 2017-06-20 21:33:56 +02:00
jiffies.c jiffies: Revert bogus conversion of NSEC_PER_SEC to TICK_NSEC 2017-03-07 11:03:28 +01:00
Kconfig rcu: Remove nohz_full full-system-idle state machine 2017-06-08 18:52:39 -07:00
Makefile time: Remove CONFIG_TIMER_STATS 2017-02-10 11:15:08 +01:00
ntp_internal.h
ntp.c ktime: Get rid of the union 2016-12-25 17:21:22 +01:00
posix-clock.c posix-timers: Move posix-timer internals to core 2017-06-04 15:40:23 +02:00
posix-cpu-timers.c Merge branch 'timers-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-05 15:34:35 -07:00
posix-stubs.c posix-timers: Use get_timespec64() and put_timespec64() 2017-06-30 04:13:19 -04:00
posix-timers.c posix_clocks: Use get_itimerspec64() and put_itimerspec64() 2017-06-30 04:15:02 -04:00
posix-timers.h posix-timers: Make nanosleep timespec argument const 2017-06-14 00:00:47 +02:00
sched_clock.c timers, sched_clock: Update timeout for clock wrap 2017-03-23 12:30:27 -07:00
test_udelay.c time: Avoid timespec in udelay_test 2016-06-20 12:47:26 -07:00
tick-broadcast-hrtimer.c ktime: Get rid of the union 2016-12-25 17:21:22 +01:00
tick-broadcast.c tick/broadcast: Make tick_broadcast_setup_oneshot() static 2017-06-12 18:56:01 +02:00
tick-common.c ktime: Cleanup ktime_set() usage 2016-12-25 17:21:22 +01:00
tick-internal.h tick/broadcast: Make tick_broadcast_setup_oneshot() static 2017-06-12 18:56:01 +02:00
tick-oneshot.c ktime: Get rid of the union 2016-12-25 17:21:22 +01:00
tick-sched.c Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-07-03 13:33:57 -07:00
tick-sched.h nohz: Fix collision between tick and other hrtimers, again 2017-05-17 08:19:47 +02:00
time.c time: introduce {get,put}_itimerspec64 2017-06-25 21:58:46 -04:00
timeconst.bc time: Introduce jiffies64_to_nsecs() 2017-02-01 09:13:45 +01:00
timeconv.c time: Add time64_to_tm() 2016-06-20 12:47:15 -07:00
timecounter.c clocksource: Use a plain u64 instead of cycle_t 2016-12-25 11:04:12 +01:00
timekeeping_debug.c timekeeping: Use deferred printk() in debug code 2017-02-15 11:46:08 +01:00
timekeeping_internal.h clocksource: Use a plain u64 instead of cycle_t 2016-12-25 11:04:12 +01:00
timekeeping.c time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD 2017-06-20 22:14:10 -07:00
timekeeping.h timekeeping: Remove unused timekeeping_{get,set}_tai_offset() 2017-01-06 16:47:28 -08:00
timer_list.c sysrq: Reset the watchdog timers while displaying high-resolution timers 2017-03-23 12:46:53 -07:00
timer.c timers: Fix excessive granularity of new timers after a nohz idle 2017-08-24 11:40:18 +02:00