We eventually would like to remove the rwlock cpufreq_driver_lock or
convert it back to a spinlock and protect the read sections with RCU.
The first step in that direction is to make cpufreq_driver use RCU.
I don't see an easy way to protect the cpufreq_cpu_data structure
with RCU, so I am leaving it with the rwlock for now since under
certain configs __cpufreq_cpu_get is a hot spot with 256+ cores.
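As a rough illustration of where this is heading, the usual RCU pattern for a
pointer like cpufreq_driver looks like the sketch below (standard kernel RCU
API; not this patch's exact code):
    /* reader side: lockless, inside an RCU read-side critical section */
    rcu_read_lock();
    driver = rcu_dereference(cpufreq_driver);
    if (driver)
        pr_debug("driver: %s\n", driver->name);
    rcu_read_unlock();

    /* updater side: publish the new pointer, then wait out old readers */
    rcu_assign_pointer(cpufreq_driver, new_driver);
    synchronize_rcu();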
[rjw: Subject, changelog, white space]
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The current calculation of the delay time is wrong; it is a cut-and-paste
error from a previous experimental driver. This can result in the timeout
being set to jiffies + 1, which sets the driver up to race with itself if
the APIC timer interrupt happens at just the right time.
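For illustration, a hedged sketch of the kind of guard that keeps the timer
from being armed for the very next tick (sample_time and cpu->timer are
assumptions here, not necessarily the driver's exact fields):
    unsigned long delay = msecs_to_jiffies(sample_time);

    /* never schedule for the immediately following tick */
    if (delay <= 1)
        delay = 2;

    mod_timer(&cpu->timer, jiffies + delay);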
References: https://bugzilla.redhat.com/show_bug.cgi?id=920289
Reported-by: Adam Williamson <awilliam@redhat.com>
Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
big LITTLE is ARM's new architecture focusing on the power/performance needs
of the modern world. More information about big LITTLE can be found here:
http://www.arm.com/products/processors/technologies/biglittleprocessing.php
http://lwn.net/Articles/481055/
In order to keep cpufreq support for all big LITTLE platforms simple/generic,
this patch tries to add a generic cpufreq driver layer for all big LITTLE
platforms.
The driver is divided into two parts:
- Core driver: Generic and shared across all big LITTLE SoCs
- Glue drivers: Per-platform drivers providing ops to the core driver
This patch also adds a generic glue driver that extracts information from the
Device Tree.
Future SoCs can either reuse the DT glue or write their own, depending on
their needs.
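Conceptually, the split looks something like the sketch below; the structure
and member names here are illustrative assumptions, not necessarily the
driver's exact interface:
    /* ops a per-platform glue driver hands to the shared core driver */
    struct bl_cpufreq_ops {
        const char *name;
        /* populate the OPP (frequency/voltage) table for a CPU device */
        int (*init_opp_table)(struct device *cpu_dev);
        /* report the clock transition latency for that CPU, in ns */
        int (*get_transition_latency)(struct device *cpu_dev);
    };

    /* the core driver exposes a registration hook for glue drivers */
    int bl_cpufreq_register(struct bl_cpufreq_ops *ops);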
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some assignments of policy->min/max/cur/cpuinfo.min_freq/cpuinfo.max_freq
aren't required, as part of that work is already done by the cpufreq driver
or the cpufreq core. Remove them.
In some places we also merge multiple lines together.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpufreq layer doesn't call a cpufreq driver's callbacks for any offline
CPU, so checking for that isn't useful.
Let's get rid of it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
policy->cpus contains all online CPUs that share a single clock line, and
their frequencies are always updated together.
Many SMP systems' cpufreq drivers take care of this individually, but the
best place for this code is the cpufreq core.
This patch modifies cpufreq_notify_transition() to notify the frequency
change for all CPUs in policy->cpus and updates all users of this API
accordingly.
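Schematically, the core-side fan-out looks like this (a sketch of the idea;
the per-CPU helper name is illustrative):
    unsigned int cpu;

    /* send PRE/POSTCHANGE for every CPU sharing the clock line */
    for_each_cpu(cpu, policy->cpus) {
        freqs->cpu = cpu;
        __cpufreq_notify_transition(policy, freqs, state); /* per-CPU helper */
    }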
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently we simply return from target() if we encounter an error after
broadcasting the CPUFREQ_PRECHANGE notifier. That looks wrong, as others
might depend on the POSTCHANGE notifier for their functioning.
So it is better to broadcast the CPUFREQ_POSTCHANGE notifier for these
failure cases too, but with the old frequency.
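A driver's error path then ends up shaped roughly like this (sketch only; the
hardware call is a hypothetical placeholder and the notifier signature is the
one used elsewhere in this series):
    cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);

    ret = platform_set_rate(freqs.new);    /* hypothetical hardware call */
    if (ret)
        freqs.new = freqs.old;    /* report the unchanged frequency */

    cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);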
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is not possible for init() to be called for any cpu other than cpu0. During
bootup, whatever cpu is used to boot the system will be assigned as cpu0.
Later on, policy->cpu can only change if we hot-unplug all cpus first and then
hotplug them back in a different order, which isn't possible (the system
always requires at least one cpu to be up :)).
Though I can see one situation where policy->cpu can be different from zero:
- Hot-unplug cpu 0.
- rmmod the cpufreq-cpu0 module.
- insmod it back.
- Hotplug cpu 0 again.
Here, policy->cpu would be different. But the driver doesn't have any
dependency on cpu0 as such. We don't mind which cpu of the system is
policy->cpu, so this check just isn't required.
Remove it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use an inline function to evaluate freq_target to avoid duplicate code.
Also, define a macro for the default frequency step.
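Roughly, the result looks like this (a sketch from memory of the conservative
governor; treat the structure and macro names as illustrative):
    #define DEF_FREQUENCY_STEP    (5)

    static inline unsigned int get_freq_target(struct cs_dbs_tuners *cs_tuners,
                                               struct cpufreq_policy *policy)
    {
        unsigned int freq_target = (cs_tuners->freq_step * policy->max) / 100;

        /* guard against a zero step when policy->max is very low */
        if (unlikely(freq_target == 0))
            freq_target = DEF_FREQUENCY_STEP;

        return freq_target;
    }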
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Convert all uses of devm_request_and_ioremap() to the newly introduced
devm_ioremap_resource() which provides more consistent error handling.
devm_ioremap_resource() provides its own error messages so all explicit
error messages can be removed from the failure code paths.
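The conversion typically has this shape (illustrative before/after; the
driver-specific names are placeholders):
    /* before */
    base = devm_request_and_ioremap(&pdev->dev, res);
    if (!base) {
        dev_err(&pdev->dev, "ioremap failed\n");
        return -ENOMEM;
    }

    /* after: devm_ioremap_resource() prints its own error message */
    base = devm_ioremap_resource(&pdev->dev, res);
    if (IS_ERR(base))
        return PTR_ERR(base);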
Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When we evaluate the CPU load for a frequency decrease, we have to compare
the load against down_threshold. There is no need to subtract 10 points
from down_threshold.
Instead, we have to use the default down_threshold, or the user's selection,
unmodified.
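In other words, the check becomes a direct comparison (sketch; the surrounding
names are illustrative):
    /* before: effectively required load < down_threshold - 10 */
    if (load < (cs_tuners->down_threshold - 10))
        requested_freq -= get_freq_target(cs_tuners, policy);

    /* after: honour the configured down_threshold as-is */
    if (load < cs_tuners->down_threshold)
        requested_freq -= get_freq_target(cs_tuners, policy);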
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently we always calculate the CPU iowait time and add it to idle time.
If we are in ondemand and we use io_is_busy, we re-calculate the iowait time
and subtract it from idle time.
With this patch, iowait time is calculated only when necessary, avoiding the
double call to get_cpu_iowait_time_us(). We use a parameter in
get_cpu_idle_time() to indicate whether the iowait time should be added to
idle time or not, without the need to keep prev_io_wait.
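Schematically, the helper ends up looking something like this (a sketch of the
idea, not necessarily the exact final code):
    /*
     * io_busy selects whether iowait counts as busy time: when it is set,
     * iowait is not added to idle time.
     */
    u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy)
    {
        u64 idle_time = get_cpu_idle_time_us(cpu, io_busy ? wall : NULL);

        if (idle_time == -1ULL)
            return get_cpu_idle_time_jiffy(cpu, wall);
        else if (!io_busy)
            idle_time += get_cpu_iowait_time_us(cpu, wall);

        return idle_time;
    }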
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The following patch introduced per-CPU timers or works for the ondemand and
conservative governors:
commit 2abfa876f1
Author: Rickard Andersson <rickard.andersson@stericsson.com>
Date: Thu Dec 27 14:55:38 2012 +0000
cpufreq: handle SW coordinated CPUs
This causes additional unnecessary interrupts on all CPUs when the load has
recently been evaluated by any other CPU, i.e. when the load has recently
been evaluated by CPU x, we don't really need any other CPU to evaluate it
again for the next sampling_rate period.
Some code is present to avoid that, but we are still getting timer interrupts
on all CPUs. A good way to avoid this is to modify the delays for all CPUs
(policy->cpus) whenever any CPU has evaluated the load.
This patch makes that change, along with some related code cleanup.
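The "modify delays for all policy->cpus" step can be sketched as below; the
dbs_info accessor and field names are assumptions:
    static void gov_update_sample_delay(struct cpufreq_policy *policy,
                                        unsigned long delay)
    {
        int cpu;

        /* re-arm every CPU's sampling work to the same future point */
        for_each_cpu(cpu, policy->cpus) {
            struct cpu_dbs_info *cdbs = get_cpu_dbs_info(cpu); /* hypothetical */

            mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay);
        }
    }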
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Because we have per-CPU timers now, we check whether we need to evaluate the
load again (in case it has been evaluated recently). Here, the second CPU that
got the timer interrupt updates core_dbs_info->sample_type irrespective of
whether load evaluation is required. That is wrong, as the first CPU depends
on this variable being set to an older value.
Moreover, it would be best in this case to schedule the second CPU's timer at
sampling_rate instead of freq_lo or freq_hi, as those must be managed by the
other CPU. Even if the other CPU idles in the meantime, we wouldn't lose much
power.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently MIN_LATENCY_MULTIPLIER is defined as 100, so on a system with a
transition latency of 1 ms the minimum sampling time comes to around 100 ms.
That is quite big if you want better performance from your system.
Redefine MIN_LATENCY_MULTIPLIER as 20 so that we can support a 20 ms sampling
rate on such platforms.
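The arithmetic behind those numbers, roughly (latency in microseconds, as the
governor code uses it; the exact expression is a sketch):
    /* minimum sampling rate is bounded by the transition latency */
    min_sampling_rate = max(min_sampling_rate,
                            MIN_LATENCY_MULTIPLIER * latency);

    /*
     * e.g. latency = 1000 us (1 ms):
     *   multiplier 100 -> minimum sampling rate 100000 us = 100 ms
     *   multiplier  20 -> minimum sampling rate  20000 us =  20 ms
     */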
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, there can't be multiple instances of a single governor type.
If we have a multi-package system, where we have multiple instances
of struct policy (per package), we can't have multiple instances of the
same governor, i.e. we can't have multiple instances of the ondemand
governor for multiple packages.
The governor directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/, which again reflects that there can be only one
instance of a governor type in the system.
This is a bottleneck for multi-cluster systems, where we want different
packages to use the same governor type, but with different tunables.
This patch uses the infrastructure provided by the earlier patch and
implements init/exit routines for the ondemand and conservative
governors.
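For example, the per-policy init/exit pair for ondemand is shaped roughly like
this (a sketch; field and macro names are from memory of the governor code and
should be treated as illustrative):
    static int od_init(struct dbs_data *dbs_data)
    {
        struct od_dbs_tuners *tuners;

        tuners = kzalloc(sizeof(*tuners), GFP_KERNEL);
        if (!tuners)
            return -ENOMEM;

        /* per-policy defaults instead of a single global copy */
        tuners->up_threshold = DEF_FREQUENCY_UP_THRESHOLD;
        tuners->sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR;
        tuners->ignore_nice = 0;

        dbs_data->tuners = tuners;
        return 0;
    }

    static void od_exit(struct dbs_data *dbs_data)
    {
        kfree(dbs_data->tuners);
    }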
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, there can't be multiple instances of a single governor type.
If we have a multi-package system, where we have multiple instances
of struct policy (per package), we can't have multiple instances of the
same governor, i.e. we can't have multiple instances of the ondemand
governor for multiple packages.
The governor directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/, which again reflects that there can be only one
instance of a governor type in the system.
This is a bottleneck for multi-cluster systems, where we want different
packages to use the same governor type, but with different tunables.
This patch provides that infrastructure. Because we now have to allocate a
governor's resources dynamically, we must do so at policy creation and
removal, hence the new CPUFREQ_GOV_POLICY_INIT/EXIT events.
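Schematically, a governor's event handler grows two new cases (sketch; the
helpers here are hypothetical placeholders):
    switch (event) {
    case CPUFREQ_GOV_POLICY_INIT:
        /* allocate per-policy tunables and attach them to the policy */
        ret = dbs_alloc_policy_tunables(policy);    /* hypothetical */
        break;
    case CPUFREQ_GOV_POLICY_EXIT:
        /* free the per-policy tunables at policy teardown */
        dbs_free_policy_tunables(policy);    /* hypothetical */
        break;
    case CPUFREQ_GOV_START:
    case CPUFREQ_GOV_STOP:
    case CPUFREQ_GOV_LIMITS:
        /* existing behaviour is unchanged */
        break;
    }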
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This eliminates the contention I am seeing in __cpufreq_cpu_get.
It also nicely stages the lock to be replaced by RCU.
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
With the addition of the following patch:
fcf8058 cpufreq: Simplify cpufreq_add_dev()
a cpufreq driver's .init() routine must initialize policy->cpus with the mask
of all possible CPUs (online + offline) that share the clock. The core then
copies this mask into policy->related_cpus and resets policy->cpus to carry
only online CPUs.
The acpi-cpufreq driver wasn't updated with this assumption, so sometimes,
when we try to hot[un]plug CPUs at run time, sysfs directories get corrupted.
This patch fixes the acpi-cpufreq driver against this corruption.
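The contract, schematically (a sketch; shared_cpu_map is the ACPI notion of
CPUs on one clock domain, other details of acpi-cpufreq aside):
    /* driver .init(): report every possible CPU that shares the clock */
    cpumask_copy(policy->cpus, perf->shared_cpu_map);

    /* core: remember the full set, then trim policy->cpus to online CPUs */
    cpumask_copy(policy->related_cpus, policy->cpus);
    cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);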
Reported-and-tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In cpufreq_stats_free_sysfs() we aren't balancing calls to cpufreq_cpu_get()
with cpufreq_cpu_put(). This never lets the reference count of policy->kobj
drop to zero.
We will get a hang if cpufreq_driver_unregister() is ever called, and that
can happen when we build the driver as a module and insmod/rmmod it.
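The rule being restored is simply (sketch):
    policy = cpufreq_cpu_get(cpu);
    if (!policy)
        return;

    /* ... use policy and its kobject ... */

    cpufreq_cpu_put(policy);    /* drop the reference taken above */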
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If cpufreq_register_driver() fails, just free the memory that has been
allocated and return. The intel_pstate_exit() function is removed; since we
are built-in only now, there is no reason for a module exit procedure.
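The failure path then reduces to something like this (a sketch; all_cpu_data
and its vfree() are assumptions about how the driver's per-CPU array is
allocated):
    rc = cpufreq_register_driver(&intel_pstate_driver);
    if (rc) {
        /* unwind the allocation made before registration */
        vfree(all_cpu_data);
        return rc;
    }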
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>