Commit Graph

238 Commits

Author SHA1 Message Date
Linus Torvalds
714af06938 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix NULL ptr regression in powernow-k8
  [CPUFREQ] Create a blacklist for processors that should not load the acpi-cpufreq module.
  [CPUFREQ] Powernow-k8: Enable more than 2 low P-states
  [CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)
  [CPUFREQ] ondemand - Use global sysfs dir for tuning settings
  [CPUFREQ] Introduce global, not per core: /sys/devices/system/cpu/cpufreq
  [CPUFREQ] Bail out of cpufreq_add_dev if the link for a managed CPU got created
  [CPUFREQ] Factor out policy setting from cpufreq_add_dev
  [CPUFREQ] Factor out interface creation from cpufreq_add_dev
  [CPUFREQ] Factor out symlink creation from cpufreq_add_dev
  [CPUFREQ] cleanup up -ENOMEM handling in cpufreq_add_dev
  [CPUFREQ] Reduce scope of cpu_sys_dev in cpufreq_add_dev
  [CPUFREQ] update Doc for cpuinfo_cur_freq and scaling_cur_freq
2009-09-18 09:16:57 -07:00
Linus Torvalds
ada3fa1505 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
  powerpc64: convert to dynamic percpu allocator
  sparc64: use embedding percpu first chunk allocator
  percpu: kill lpage first chunk allocator
  x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
  percpu: update embedding first chunk allocator to handle sparse units
  percpu: use group information to allocate vmap areas sparsely
  vmalloc: implement pcpu_get_vm_areas()
  vmalloc: separate out insert_vmalloc_vm()
  percpu: add chunk->base_addr
  percpu: add pcpu_unit_offsets[]
  percpu: introduce pcpu_alloc_info and pcpu_group_info
  percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
  percpu: add @align to pcpu_fc_alloc_fn_t
  percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
  percpu: drop @static_size from first chunk allocators
  percpu: generalize first chunk allocator selection
  percpu: build first chunk allocators selectively
  percpu: rename 4k first chunk allocator to page
  percpu: improve boot messages
  percpu: fix pcpu_reclaim() locking
  ...

Fix trivial conflict as by Tejun Heo in kernel/sched.c
2009-09-15 09:39:44 -07:00
Mathieu Desnoyers
395913d0b1 [CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)
remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)

commit	42a06f2166

Missed a call site for CPUFREQ_GOV_STOP to remove the rwlock taken around the
teardown. To make a long story short, the rwlock write-lock causes a circular
dependency with cancel_delayed_work_sync(), because the timer handler takes the
read lock.

Note that all callers to __cpufreq_set_policy are taking the rwsem. All sysfs
callers (writers) hold the write rwsem at the earliest sysfs calling stage.

However, the rwlock write-lock is not needed upon governor stop.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
CC: rjw@sisk.pl
CC: mingo@elte.hu
CC: Shaohua Li <shaohua.li@intel.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Dave Young <hidave.darkstar@gmail.com>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: trenn@suse.de
CC: sven.wegener@stealer.net
CC: cpufreq@vger.kernel.org
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:18 -04:00
Thomas Renninger
0e625ac153 [CPUFREQ] ondemand - Use global sysfs dir for tuning settings
Ondemand has only global variables for userspace tunings via sysfs.
But they were exposed per CPU which wrongly implies to the user that
his settings are applied per cpu. Also locking sysfs against concurrent
access won't be necessary anymore after deprecation time.

This means the ondemand config dir is moved:
/sys/devices/system/cpu/cpu*/cpufreq/ondemand ->
     /sys/devices/system/cpu/cpufreq/ondemand

The old files will still exist, but reading or writing to them will
result in one (printk_once) deprecation msg to syslog per file.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:18 -04:00
Thomas Renninger
8aa84ad8d6 [CPUFREQ] Introduce global, not per core: /sys/devices/system/cpu/cpufreq
Currently everything in the cpufreq layer is per core based.
This does not reflect reality, for example ondemand on conservative
governors have global sysfs variables.

Introduce a global cpufreq directory and add the kobject to the governor
struct, so that governors can easily access it.
The directory is initialized in the cpufreq_core_init initcall and thus will
always be created if cpufreq is compiled in, even if no cpufreq driver is
active later.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:14 -04:00
Thomas Renninger
4bfa042cd3 [CPUFREQ] Bail out of cpufreq_add_dev if the link for a managed CPU got created
Doing:
echo 0 >cpu1/online
echo 1 >cpu1/online

on a managed CPU will result in:
Jul 22 15:15:37 linux kernel: [   80.013864] WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xcf/0xe6()
Jul 22 15:15:37 linux kernel: [   80.013866] Hardware name: To Be Filled By O.E.M.
Jul 22 15:15:37 linux kernel: [   80.013868] sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
Jul 22 15:15:37 linux kernel: [   80.013870] Modules linked in: powernow_k8
Jul 22 15:15:37 linux kernel: [   80.013874] Pid: 5750, comm: bash Not tainted 2.6.31-rc2 #40
Jul 22 15:15:37 linux kernel: [   80.013876] Call Trace:
Jul 22 15:15:37 linux kernel: [   80.013879]  [<ffffffff8112ebda>] ? sysfs_add_one+0xcf/0xe6
Jul 22 15:15:37 linux kernel: [   80.013884]  [<ffffffff81041926>] warn_slowpath_common+0x77/0xa4
Jul 22 15:15:37 linux kernel: [   80.013888]  [<ffffffff810419a0>] warn_slowpath_fmt+0x3c/0x3e
Jul 22 15:15:37 linux kernel: [   80.013891]  [<ffffffff8112ebda>] sysfs_add_one+0xcf/0xe6
Jul 22 15:15:37 linux kernel: [   80.013894]  [<ffffffff8112f213>] create_dir+0x58/0x87
Jul 22 15:15:37 linux kernel: [   80.013898]  [<ffffffff8112f27a>] sysfs_create_dir+0x38/0x4f
Jul 22 15:15:37 linux kernel: [   80.013902]  [<ffffffff811ffb8a>] kobject_add_internal+0x11f/0x1de
Jul 22 15:15:37 linux kernel: [   80.013905]  [<ffffffff811ffd21>] kobject_add_varg+0x41/0x4e
Jul 22 15:15:37 linux kernel: [   80.013908]  [<ffffffff811ffd7a>] kobject_init_and_add+0x4c/0x57
Jul 22 15:15:37 linux kernel: [   80.013913]  [<ffffffff810667bc>] ? mark_lock+0x22/0x228
Jul 22 15:15:37 linux kernel: [   80.013918]  [<ffffffff813e8a3b>] cpufreq_add_dev_interface+0x40/0x1e4
...

This bug slipped in by git commit:
150b06f7f223cfd0f808737a5243cceca8ea47fa

When splitting up cpufreq_add_dev, the whole cpufreq_add_dev function
is not left anymore, only cpufreq_add_dev_policy.
This patch should reconstruct the identical functionality again as it
was before the split.

CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:14 -04:00
Dave Jones
ecf7e4611c [CPUFREQ] Factor out policy setting from cpufreq_add_dev
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:14 -04:00
Dave Jones
909a694e33 [CPUFREQ] Factor out interface creation from cpufreq_add_dev
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:13 -04:00
Dave Jones
19d6f7ec3e [CPUFREQ] Factor out symlink creation from cpufreq_add_dev
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:12 -04:00
Dave Jones
059019a3c3 [CPUFREQ] cleanup up -ENOMEM handling in cpufreq_add_dev
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:09 -04:00
Dave Jones
54e6fe167b [CPUFREQ] Reduce scope of cpu_sys_dev in cpufreq_add_dev
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:09 -04:00
Dominik Brodowski
ce6c3997c2 [CPUFREQ] Re-enable cpufreq suspend and resume code
Commit 4bc5d34135 is broken and causes regressions:

(1) cpufreq_driver->resume() and ->suspend() were only called on
__powerpc__, but you could set them on all architectures. In fact,
->resume() was defined and used before the PPC-related commit
42d4dc3f4e complained about in 4bc5d34135.

(2) Therfore, the resume functions in acpi_cpufreq and speedstep-smi
would never be called.

(3) This means speedstep-smi would be unusuable after suspend or resume.

The _real_ problem was calling cpufreq_driver->get() with interrupts
off, but it re-enabling interrupts on some platforms. Why is ->get()
necessary?

Some systems like to change the CPU frequency behind our
back, especially during BIOS-intensive operations like suspend or
resume. If such systems also use a CPU frequency-dependant timing loop,
delays might be off by large factors. Therefore, we need to ascertain
as soon as possible that the CPU frequency is indeed at the speed we
think it is. You can do this two ways: either setting it anew, or trying
to get it. The latter is what was done, the former also has the same IRQ
issue.

So, let's try something different: defer the checking to after interrupts
are re-enabled, by calling cpufreq_update_policy() (via schedule_work()).
Timings may be off until this later stage, so let's watch out for
resume regressions caused by the deferred handling of frequency changes
behind the kernel's back.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:08 -04:00
Tejun Heo
384be2b18a Merge branch 'percpu-for-linus' into percpu-for-next
Conflicts:
	arch/sparc/kernel/smp_64.c
	arch/x86/kernel/cpu/perf_counter.c
	arch/x86/kernel/setup_percpu.c
	drivers/cpufreq/cpufreq_ondemand.c
	mm/percpu.c

Conflicts in core and arch percpu codes are mostly from commit
ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
num_possible_cpus() with nr_cpu_ids.  As for-next branch has moved all
the first chunk allocators into mm/percpu.c, the changes are moved
from arch code to mm/percpu.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-08-14 14:45:31 +09:00
Dave Jones
4bc5d34135 [CPUFREQ] Make cpufreq suspend code conditional on powerpc.
The suspend code runs with interrupts disabled, and the powerpc workaround we
do in the cpufreq suspend hook calls the drivers ->get method.

powernow-k8's ->get does an smp_call_function_single
which needs interrupts enabled

cpufreq's suspend/resume code was added in 42d4dc3f4e to work around
a hardware problem on ppc powerbooks.  If we make all this code
conditional on powerpc, we avoid the issue above.

Signed-off-by: Dave Jones <davej@redhat.com>
2009-08-04 14:32:11 -04:00
Thomas Renninger
d5194decd0 [CPUFREQ] Fix a kobject reference bug related to managed CPUs
The first offline/online cycle is successful, the second not.
Doing:
echo 0 >cpu1/online
echo 1 >cpu1/online
echo 0 >cpu1/online

The last command will trigger:
Jul 22 14:39:50 linux kernel: [  593.210125] ------------[ cut here ]------------
Jul 22 14:39:50 linux kernel: [  593.210139] WARNING: at lib/kref.c:43 kref_get+0x23/0x2b()
Jul 22 14:39:50 linux kernel: [  593.210144] Hardware name: To Be Filled By O.E.M.
Jul 22 14:39:50 linux kernel: [  593.210148] Modules linked in: powernow_k8
Jul 22 14:39:50 linux kernel: [  593.210158] Pid: 378, comm: kondemand/2 Tainted: G        W  2.6.31-rc2 #38
Jul 22 14:39:50 linux kernel: [  593.210163] Call Trace:
Jul 22 14:39:50 linux kernel: [  593.210171]  [<ffffffff812008e8>] ? kref_get+0x23/0x2b
Jul 22 14:39:50 linux kernel: [  593.210181]  [<ffffffff81041926>] warn_slowpath_common+0x77/0xa4
Jul 22 14:39:50 linux kernel: [  593.210190]  [<ffffffff81041962>] warn_slowpath_null+0xf/0x11
Jul 22 14:39:50 linux kernel: [  593.210198]  [<ffffffff812008e8>] kref_get+0x23/0x2b
Jul 22 14:39:50 linux kernel: [  593.210206]  [<ffffffff811ffa19>] kobject_get+0x1a/0x22
Jul 22 14:39:50 linux kernel: [  593.210214]  [<ffffffff813e815d>] cpufreq_cpu_get+0x8a/0xcb
Jul 22 14:39:50 linux kernel: [  593.210222]  [<ffffffff813e87d1>] __cpufreq_driver_getavg+0x1d/0x67
Jul 22 14:39:50 linux kernel: [  593.210231]  [<ffffffff813ea18f>] do_dbs_timer+0x158/0x27f
Jul 22 14:39:50 linux kernel: [  593.210240]  [<ffffffff810529ea>] worker_thread+0x200/0x313
...

The output continues on every do_dbs_timer ondemand freq checking poll.
This regression was introduced by git commit:
3f4a782b5c

The policy is released when the cpufreq device is removed in:
__cpufreq_remove_dev():
	/* if this isn't the CPU which is the parent of the kobj, we
	 * only need to unlink, put and exit
	 */

Not creating the symlink is not sever at all.
As long as:
sysfs_remove_link(&sys_dev->kobj, "cpufreq");
handles it gracefully that the symlink did not exist.
Possibly no error should be returned at all, because ondemand
governor would still provide the same functionality.
Userspace in userspace gov case might be confused if the link
is missing.

Resolves http://bugzilla.kernel.org/show_bug.cgi?id=13903

CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-08-04 14:32:11 -04:00
Prarit Bhargava
42c74b84c6 [CPUFREQ] Do not set policy for offline cpus
Suspend/Resume fails on multi socket, multi core systems because the cpufreq
code erroneously sets the per_cpu policy_cpu value when a logical cpu is
offline.

This most notably results in missing sysfs files that are used to set the
cpu frequencies of the various cpus.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-08-04 14:32:10 -04:00
Pallipadi, Venkatesh
26d204afa1 [CPUFREQ] Fix NULL pointer dereference regression in conservative governor
Commit ee88415caf
introduced this regression when it removed enable bit in cpu_dbs_info_s.
That added a possibility of dbs_cpufreq_notifier getting called for a
CPU that is not yet managed by conservative governor. That will happen
as the transition notifier is set as soon as one CPU switches to
conservative governor and other CPUs can get a NULL pointer dereference
without the enable bit check. Add the enable bit back again.

Reported-by: Lermytte Christophe <Christophe.Lermytte@thomson.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-08-04 14:32:10 -04:00
Dave Jones
5e1596f753 [CPUFREQ] Fix compile failure in cpufreq.c
managed_policy is out of scope for the non-smp case.
Declare it locally where used (twice)

Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-08 19:04:23 -04:00
Mathieu Desnoyers
3f4a782b5c [CPUFREQ] fix (utter) cpufreq_add_dev mess
OK, I've tried to clean it up the best I could, but please test this with
concurrent cpu hotplug and cpufreq add/remove in loops. I'm sure we will make
other interesting findings.

This is step one of fixing the overall locking dependency mess in cpufreq.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
CC: rjw@sisk.pl
CC: mingo@elte.hu
CC: Shaohua Li <shaohua.li@intel.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Dave Young <hidave.darkstar@gmail.com>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: sven.wegener@stealer.net
CC: cpufreq@vger.kernel.org
CC: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-06 21:38:28 -04:00
venkatesh.pallipadi@intel.com
ee88415caf [CPUFREQ] Cleanup locking in conservative governor
Redesign the locking inside conservative driver. Make dbs_mutex handle all the
global state changes inside the driver and invent a new percpu mutex
to serialize percpu timer and frequency limit change.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-06 21:38:28 -04:00
venkatesh.pallipadi@intel.com
5a75c82828 [CPUFREQ] Cleanup locking in ondemand governor
Redesign the locking inside ondemand driver. Make dbs_mutex handle all the
global state changes inside the driver and invent a new percpu mutex
to serialize percpu timer and frequency limit change.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-06 21:38:28 -04:00
venkatesh.pallipadi@intel.com
7d26e2d5e2 [CPUFREQ] Eliminate the recent lockdep warnings in cpufreq
Commit b14893a62c although it was very
much needed to properly cleanup ondemand timer, opened-up a can of worms
related to locking dependencies in cpufreq.

Patch here defines the need for dbs_mutex and cleans up its usage in
ondemand governor. This also resolves the lockdep warnings reported here

http://lkml.indiana.edu/hypermail/linux/kernel/0906.1/01925.html
http://lkml.indiana.edu/hypermail/linux/kernel/0907.0/00820.html

and few others..

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-06 21:38:27 -04:00
Tejun Heo
245b2e70ea percpu: clean up percpu variable definitions
Percpu variable definition is about to be updated such that all percpu
symbols including the static ones must be unique.  Update percpu
variable definitions accordingly.

* as,cfq: rename ioc_count uniquely

* cpufreq: rename cpu_dbs_info uniquely

* xen: move nesting_count out of xen_evtchn_do_upcall() and rename it

* mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and
  rename it

* ipv4,6: rename cookie_scratch uniquely

* x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to
  pmc_irq_entry and nmi_entry to pmc_nmi_entry

* perf_counter: rename disable_count to perf_disable_count

* ftrace: rename test_event_disable to ftrace_test_event_disable

* kmemleak: rename test_pointer to kmemleak_test_pointer

* mce: rename next_interval to mce_next_interval

[ Impact: percpu usage cleanups, no duplicate static percpu var names ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
2009-06-24 15:13:48 +09:00
Thomas Renninger
4f4d1ad6ee [CPUFREQ] Only set sampling_rate_max deprecated, sampling_rate_min is useful
Update the documentation accordingly.
Cleanup and use printk_once.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-15 11:49:41 -04:00
Thomas Renninger
cef9615a85 [CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case
With this patch you have following minimal sampling rate restrictions:

Kernel restrictions:
If CONFIG_NO_HZ is set, the limit is 10ms fixed.
If CONFIG_NO_HZ is not set or no_hz=off boot parameter is used, the
limits depend on the CONFIG_HZ option:
HZ=1000: min=20000us  (20ms)
HZ=250:  min=80000us  (80ms)
HZ=100:  min=200000us (200ms)

HW restrictions:
Do not sample/poll more often than HW latency * 100  exported by the low
level cpufreq HW driver

The higher value of above restrictions is the minimal sampling rate
that can be set (and can be seen via ondemand/sampling_rate_min sysfs file)

Default sampling rate still is HW latency * 1000, but this will now end
up in lower values on latest (Intel and AMD) hardware as these can switch
really fast and sampling rate mostly was limited to the 80ms or 200ms
(depending on whether HZ=250 or HZ=1000 is used).

Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-15 11:49:41 -04:00
Yinghai Lu
eaa958402e cpumask: alloc zeroed cpumask for static cpumask_var_ts
These are defined as static cpumask_var_t so if MAXSMP is not used,
they are cleared already.  Avoid surprises when MAXSMP is enabled.

Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-09 22:30:27 +09:30
Mathieu Desnoyers
b14893a62c [CPUFREQ] fix timer teardown in ondemand governor
* Rafael J. Wysocki (rjw@sisk.pl) wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.28 and 2.6.29.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.28 and 2.6.29.  Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=13186
> Subject		: cpufreq timer teardown problem
> Submitter	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Date		: 2009-04-23 14:00 (24 days old)
> References	: http://marc.info/?l=linux-kernel&m=124049523515036&w=4
> Handled-By	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Patch		: http://patchwork.kernel.org/patch/19754/
> 		  http://patchwork.kernel.org/patch/19753/
>

(updated changelog)

cpufreq fix timer teardown in ondemand governor

The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should
use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the
workqueue handler to exit.

The ondemand governor does not seem to be affected because the
"if (!dbs_info->enable)" check at the beginning of the workqueue handler returns
immediately without rescheduling the work. The conservative governor in
2.6.30-rc has the same check as the ondemand governor, which makes things
usually run smoothly. However, if the governor is quickly stopped and then
started, this could lead to the following race :

dbs_enable could be reenabled and multiple do_dbs_timer handlers would run.
This is why a synchronized teardown is required.

The following patch applies to, at least, 2.6.28.x, 2.6.29.1, 2.6.30-rc2.

Depends on patch
cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: gregkh@suse.de
CC: stable@kernel.org
CC: cpufreq@vger.kernel.org
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
CC: Ben Slusky <sluskyb@paranoiacs.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-05-26 12:04:50 -04:00
Mathieu Desnoyers
b253d2b2d2 [CPUFREQ] fix timer teardown in conservative governor
* Rafael J. Wysocki (rjw@sisk.pl) wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.28 and 2.6.29.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.28 and 2.6.29.  Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=13186
> Subject		: cpufreq timer teardown problem
> Submitter	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Date		: 2009-04-23 14:00 (24 days old)
> References	: http://marc.info/?l=linux-kernel&m=124049523515036&w=4
> Handled-By	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Patch		: http://patchwork.kernel.org/patch/19754/
> 		  http://patchwork.kernel.org/patch/19753/
>

(re-send with updated changelog)

cpufreq fix timer teardown in conservative governor

The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should
use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the
workqueue handler to exit.

The ondemand governor does not seem to be affected because the
"if (!dbs_info->enable)" check at the beginning of the workqueue handler returns
immediately without rescheduling the work. The conservative governor in
2.6.30-rc has the same check as the ondemand governor, which makes things
usually run smoothly. However, if the governor is quickly stopped and then
started, this could lead to the following race :

dbs_enable could be reenabled and multiple do_dbs_timer handlers would run.
This is why a synchronized teardown is required.

Depends on patch
cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call

The following patch applies to 2.6.30-rc2. Stable kernels have a similar
issue which should also be fixed, but the code changed between 2.6.29
and 2.6.30, so this patch only applies to 2.6.30-rc.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: gregkh@suse.de
CC: stable@kernel.org
CC: cpufreq@vger.kernel.org
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
CC: Ben Slusky <sluskyb@paranoiacs.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-05-26 12:04:50 -04:00
Mathieu Desnoyers
42a06f2166 [CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call
* Rafael J. Wysocki (rjw@sisk.pl) wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.28 and 2.6.29.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.28 and 2.6.29.  Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=13186
> Subject		: cpufreq timer teardown problem
> Submitter	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Date		: 2009-04-23 14:00 (24 days old)
> References	: http://marc.info/?l=linux-kernel&m=124049523515036&w=4
> Handled-By	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Patch		: http://patchwork.kernel.org/patch/19754/
> 		  http://patchwork.kernel.org/patch/19753/

The patches linked above depend on the following patch to remove
circular locking dependency :

cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call

(the following issue was faced when using cancel_delayed_work_sync() in the
timer teardown (which fixes a race).

* KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote:
> Hi
>
> my box output following warnings.
> it seems regression by commit 7ccc7608b836e58fbacf65ee4f8eefa288e86fac.
>
> A: work -> do_dbs_timer()  -> cpu_policy_rwsem
> B: store() -> cpu_policy_rwsem -> cpufreq_governor_dbs() -> work
>
>

Hrm, I think it must be due to my attempt to fix the timer teardown race
in ondemand governor mixed with new locking behavior in 2.6.30-rc.

The rwlock seems to be taken around the whole call to
cpufreq_governor_dbs(), when it should be only taken around accesses to
the locked data, and especially *not* around the call to
dbs_timer_exit().

Reverting my fix attempt would put the teardown race back in place
(replacing the cancel_delayed_work_sync by cancel_delayed_work).
Instead, a proper fix would imply modifying this critical section :

cpufreq.c: __cpufreq_remove_dev()
...
        if (cpufreq_driver->target)
                __cpufreq_governor(data, CPUFREQ_GOV_STOP);

        unlock_policy_rwsem_write(cpu);

To make sure the __cpufreq_governor() callback is not called with rwsem
held. This would allow execution of cancel_delayed_work_sync() without
being nested within the rwsem.

Applies on top of the 2.6.30-rc5 tree.

Required to remove circular dep in teardown of both conservative and
ondemande governors so they can use cancel_delayed_work_sync().
CPUFREQ_GOV_STOP does not modify the policy, therefore this locking seemed
unneeded.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Greg KH <greg@kroah.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Ben Slusky <sluskyb@paranoiacs.org>
CC: Chris Wright <chrisw@sous-sol.org>
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-05-26 12:04:50 -04:00
Linus Torvalds
ada19a31a9 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: (35 commits)
  [CPUFREQ] Prevent p4-clockmod from auto-binding to the ondemand governor.
  [CPUFREQ] Make cpufreq-nforce2 less obnoxious
  [CPUFREQ] p4-clockmod reports wrong frequency.
  [CPUFREQ] powernow-k8: Use a common exit path.
  [CPUFREQ] Change link order of x86 cpufreq modules
  [CPUFREQ] conservative: remove 10x from def_sampling_rate
  [CPUFREQ] conservative: fixup governor to function more like ondemand logic
  [CPUFREQ] conservative: fix dbs_cpufreq_notifier so freq is not locked
  [CPUFREQ] conservative: amend author's email address
  [CPUFREQ] Use swap() in longhaul.c
  [CPUFREQ] checkpatch cleanups for acpi-cpufreq
  [CPUFREQ] powernow-k8: Only print error message once, not per core.
  [CPUFREQ] ondemand/conservative: sanitize sampling_rate restrictions
  [CPUFREQ] ondemand/conservative: deprecate sampling_rate{min,max}
  [CPUFREQ] powernow-k8: Always compile powernow-k8 driver with ACPI support
  [CPUFREQ] Introduce /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_transition_latency
  [CPUFREQ] checkpatch cleanups for powernow-k8
  [CPUFREQ] checkpatch cleanups for ondemand governor.
  [CPUFREQ] checkpatch cleanups for powernow-k7
  [CPUFREQ] checkpatch cleanups for speedstep related drivers.
  ...
2009-03-26 11:04:08 -07:00
Dave Jones
129f8ae9b1 Revert "[CPUFREQ] Disable sysfs ui for p4-clockmod."
This reverts commit e088e4c9cd.

Removing the sysfs interface for p4-clockmod was flagged as a
regression in bug 12826.

Course of action:
 - Find out the remaining causes of overheating, and fix them
   if possible. ACPI should be doing the right thing automatically.
   If it isn't, we need to fix that.
 - mark p4-clockmod ui as deprecated
 - try again with the removal in six months.

It's not really feasible to printk about the deprecation, because
it needs to happen at all the sysfs entry points, which means adding
a lot of strcmp("p4-clockmod".. calls to the core, which.. bleuch.

Signed-off-by: Dave Jones <davej@redhat.com>
2009-03-09 15:07:33 -04:00
Alexander Clouter
a75603a084 [CPUFREQ] conservative: remove 10x from def_sampling_rate
AMD users get particular hit by this issue (bug 8081) as it caps at
typically 90 seconds as the minimum period for a frequency change.
Harsh eh?  Years ago I borked this buy puting the 10x in the wrong
place...I fix that by removing it altogether.

Signed-off-by: Alexander Clouter <alex@digriz.org.uk>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:32 -05:00
Alexander Clouter
8e677ce83b [CPUFREQ] conservative: fixup governor to function more like ondemand logic
As conservative is based off ondemand the codebases occasionally need to be
resync'd.  This patch, although ugly, does this.

Signed-off-by: Alexander Clouter <alex@digriz.org.uk>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:32 -05:00
Alexander Clouter
f407a08bb7 [CPUFREQ] conservative: fix dbs_cpufreq_notifier so freq is not locked
When someone added the dbs_cpufreq_notifier section to the governor the
code ended up causing the frequency to only fall.  This is because
requested_freq is tinkered with and that should only modified if it has
an invlaid value due to changes in the available frequency ranges

This should fix #10055.

Signed-off-by: Alexander Clouter <alex@digriz.org.uk>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:32 -05:00
Alexander Clouter
11a80a9c76 [CPUFREQ] conservative: amend author's email address
Amend author's email address.

Signed-off-by: Alexander Clouter <alex@digriz.org.uk>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:31 -05:00
Thomas Renninger
112124ab0a [CPUFREQ] ondemand/conservative: sanitize sampling_rate restrictions
Limit sampling rate to transition_latency * 100 or kernel limits.
If sampling_rate is tried to be set too low, set the lowest allowed value.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:31 -05:00
Thomas Renninger
9411b4ef7f [CPUFREQ] ondemand/conservative: deprecate sampling_rate{min,max}
The same info can be obtained via the transition_latency sysfs file

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:31 -05:00
Thomas Renninger
ed12978453 [CPUFREQ] Introduce /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_transition_latency
It's not only useful for the ondemand and conservative governors, but
also for userspace daemons to know about the HW transition latency of
the CPU.
It is especially useful for userspace to know about this value when
the ondemand or conservative governors are run. The sampling rate
control value depends on it and for userspace being able to set sane
tuning values there it has to know about the transition latency.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:31 -05:00
Dave Jones
2b03f891ad [CPUFREQ] checkpatch cleanups for ondemand governor.
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:30 -05:00
Dave Jones
97acec55de [CPUFREQ] checkpatch cleanups for freq_table
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:30 -05:00
Dave Jones
1bceb8d139 [CPUFREQ] checkpatch cleanups for userspace governor
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:30 -05:00
Dave Jones
0a829c5afd [CPUFREQ] checkpatch cleanups for cpufreq_stats
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:30 -05:00
Dave Jones
9acef48756 [CPUFREQ] checkpatch cleanups for conservative governor
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:29 -05:00
Dave Jones
29464f2813 [CPUFREQ] checkpatch cleanups for cpufreq core
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:29 -05:00
Venkatesh Pallipadi
1ca3abdb6a [CPUFREQ] Make ignore_nice_load setting of ondemand work as expected.
ondemand micro-accounting of idle time changes broke ignore_nice_load
sysfs setting due to a thinko in the code.

The bug entry:
http://bugzilla.kernel.org/show_bug.cgi?id=12310

Reported-by: Jim Bray <jimsantelmo@gmail.com>

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-05 12:25:26 -05:00
Linus Torvalds
4e9b1c184c Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  [IA64] fix typo in cpumask_of_pcibus()
  x86: fix x86_32 builds for summit and es7000 arch's
  cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
  cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
  cpumask: use cpumask_var_t in acpi-cpufreq.c
  cpumask: use work_on_cpu in acpi/cstate.c
  cpumask: convert struct cpufreq_policy to cpumask_var_t
  cpumask: replace CPUMASK_ALLOC etc with cpumask_var_t
  x86: cleanup remaining cpumask_t ops in smpboot code
  cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
  cpumask: update local_cpus_show to use new cpumask API
  ia64: cpumask fix for is_affinity_mask_valid()
2009-01-10 06:12:18 -08:00
Frederik Schwarzer
0211a9c850 trivial: fix an -> a typos in documentation and comments
It is always "an" if there is a vowel _spoken_ (not written).
So it is:
"an hour" (spoken vowel)
but
"a uniform" (spoken 'j')

Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-06 11:28:07 +01:00
Rusty Russell
835481d9bc cpumask: convert struct cpufreq_policy to cpumask_var_t
Impact: use new cpumask API to reduce memory usage

This is part of an effort to reduce structure sizes for machines
configured with large NR_CPUS.  cpumask_t gets replaced by
cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
struct cpumask * (large NR_CPUS).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06 09:05:31 +01:00
Mike Chan
187d9f4ed4 [CPUFREQ] Fix on resume, now preserves user policy min/max.
Previously driver resume would always set the current policy min/max with
the cpuinfo min/max, defined by user_policy.min/max. Resulting in a reset
of policy settings when policy.min/max != cpuinfo.min/max when coming out
of suspend. Now user_policy is saved as the policy instead of cpuinfo to
preserve what the user actually set.

Signed-off-by: Mike Chan <mike@android.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-12-05 15:20:11 -05:00
Matthew Garrett
e088e4c9cd [CPUFREQ] Disable sysfs ui for p4-clockmod.
p4-clockmod has a long history of abuse.   It pretends to be a CPU
frequency scaling driver, even though it doesn't actually change
the CPU frequency, but instead just modulates the frequency with
wait-states.
The biggest misconception is that when running at the lower 'frequency'
p4-clockmod is saving power.  This isn't the case, as workloads running
slower take longer to complete, preventing the CPU from entering deep C states.

However p4-clockmod does have a purpose.  It can prevent overheating.
Having it hooked up to the cpufreq interfaces is the wrong way to achieve
cooling however. It should instead be hooked up to ACPI.

This diff introduces a means for a cpufreq driver to register with the
cpufreq core, but not present a sysfs interface.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-12-05 15:20:10 -05:00