This patch adds proper logic to cpufreq driver in order to handle
CPU Hotplug.
When CPUs go on/offline, the affected CPUs data, cpufreq_policy->cpus,
is not updated properly. This causes sysfs directories and symlinks to
be in an incorrect state after few CPU on/offlines.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Introduce caching of cpufreq_cpu_data[freqs->cpu], which allows us to
make the function a lot more readable, and as a nice side-effect, it
now fits in < 80 column displays again.
Signed-off-by: Dave Jones <davej@redhat.com>
Userspace governor need not to hold it's own cpufreq_policy,
better make use of the global core policy.
Also fixes a bug in case of frequency changes via _PPC.
Old min/max values have wrongly been passed to __cpufreq_driver_target()
(kind of buffered) and when max freq was available again, only the old
max(normally lowest freq) was still active.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
cpufreq_userspace.c | 53 +++++++++++++++++++++++++++-------------------------
1 files changed, 28 insertions(+), 25 deletions(-)
BIOS might change frequency behind our back when BIOS changes allowed
frequencies via _PPC. In this case cpufreq core got out of sync.
Ask driver for current freq and notify governors about a change
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Make the cpufreq code play nicely with the mutex debugging code: don't free a
held mutex.
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
This one fell through the automation at first because it initializes the
semaphore to locked, but that's easily remedied
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Dave Jones <davej@redhat.com>
drivers/cpufreq/cpufreq.c | 37 +++++++++++++++++++------------------
include/linux/cpufreq.h | 3 ++-
2 files changed, 21 insertions(+), 19 deletions(-)
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
What is the value shown in "cpu MHz" of /proc/cpuinfo when CPUs are capable of
changing frequency?
Today the answer is: It depends.
On i386:
SMP kernel - It is always the boot frequency
UP kernel - Scales with the frequency change and shows that was last set.
On x86_64:
There is one single variable cpu_khz that gets written by all the CPUs. So,
the frequency set by last CPU will be seen on /proc/cpuinfo of all the
CPUs in the system. What you see also depends on whether you have constant_tsc
capable CPU or not.
On ia64:
It is always boot time frequency of a particular CPU that gets displayed.
The patch below changes this to:
Show the last known frequency of the particular CPU, when cpufreq is present. If
cpu doesnot support changing of frequency through cpufreq, then boot frequency
will be shown. The patch affects i386, x86_64 and ia64 architectures.
Signed-off-by: Venkatesh Pallipadi<venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The use of the 'ignore_nice' sysfs file is confusing to anyone using it.
This removes the sysfs file 'ignore_nice' and in its place creates a
'ignore_nice_load' entry that defaults to '0'; meaning nice'd processes
_are_ counted towards the 'business' calculation.
WARNING: this obvious breaks any userland tools that expected ignore_nice'
to exist, to draw attention to this fact it was concluded on the mailing
list that the entry should be removed altogether so the userland app breaks
and so the author can build simple to detect workaround. Having said that
it seems currently very few tools even make use of this functionality; all
I could find was a Gentoo Wiki entry.
Signed-off-by: Alexander Clouter <alex-kernel@digriz.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
There are some callers in cpufreq hotplug notify path that the lowest
function calls lock_cpu_hotplug(). The lock is already held during
cpu_up() and cpu_down() calls when the notify calls are broadcast to
registered clients.
Ideally if possible, we could disable_preempt() at the highest caller and
make sure we dont sleep in the path down in cpufreq->driver_target() calls
but the calls are so intertwined and cumbersome to cleanup.
Hence we consistently use lock_cpu_hotplug() and unlock_cpu_hotplug() in
all places.
- Removed export of cpucontrol semaphore and made it static.
- removed explicit uses of up/down with lock_cpu_hotplug()
so we can keep track of the the callers in same thread context and
just keep refcounts without calling a down() that causes a deadlock.
- Removed current_in_hotplug() uses
- Removed PF_HOTPLUG_CPU in sched.h introduced for the current_in_hotplug()
temporary workaround.
Tested with insmod of cpufreq_stat.ko, and logical online/offline
to make sure we dont have any hang situations.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: Zwane Mwaikambo <zwane@linuxpower.ca>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
drivers/cpufreq/cpufreq.c: In function `cpufreq_remove_dev':
drivers/cpufreq/cpufreq.c:696: warning: unused variable `cpu_sys_dev'
Signed-off-by: Grant Coady <gcoady@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When calling target drivers to set frequency, we take cpucontrol lock.
When we modified the code to accomodate CPU hotplug, there was an attempt
to take a double lock of cpucontrol leading to a deadlock. Since the
current thread context is already holding the cpucontrol lock, we dont need
to make another attempt to acquire it.
Now we leave a trace in current->flags indicating current thread already is
under cpucontrol lock held, so we dont attempt to do this another time.
Thanks to Andrew Morton for the beating:-)
From: Brice Goglin <Brice.Goglin@ens-lyon.org>
Build fix
(akpm: this patch is still unpleasant. Ashok continues to look for a cleaner
solution, doesn't he? ;))
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cpufreq entries in sysfs should only be populated when CPU is online state.
When we either boot with maxcpus=x and then boot the other cpus by echoing
to sysfs online file, these entries should be created and destroyed when
CPU_DEAD is notified. Same treatement as cache entries under sysfs.
We place the processor in the lowest frequency, so hw managed P-State
transitions can still work on the other threads to save power.
Primary goal was to just make these directories appear/disapper dynamically.
There is one in this patch i had to do, which i really dont like myself but
probably best if someone handling the cpufreq infrastructure could give
this code right treatment if this is not acceptable. I guess its probably
good for the first cut.
- Converting lock_cpu_hotplug()/unlock_cpu_hotplug() to disable/enable preempt.
The locking was smack in the middle of the notification path, when the
hotplug is already holding the lock. I tried another solution to avoid this
so avoid taking locks if we know we are from notification path. The solution
was getting very ugly and i decided this was probably good for this iteration
until someone who understands cpufreq could do a better job than me.
(akpm: export cpucontrol to GPL modules: drivers/cpufreq/cpufreq_stats.c now
does lock_cpu_hotplug())
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Zwane Mwaikambo <zwane@holomorphy.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cpu_sys_devices is redundant with the new API get_cpu_sysdev(). So nuking
this usage since its not needed.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Zwane Mwaikambo <zwane@holomorphy.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't try to access not-present CPUs. Conservative governor will always
oops on SMP without this fix.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=4781
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes an issue found in drivers/cpufreq/cpufreq_stats.c by Coverity.
Error reported:
CID: 2642
Checker: NULL_RETURNS (help)
File: /export2/p4-coverity/mc2/linux26/drivers/cpufreq/cpufreq_stats.c
Function: cpufreq_stats_create_table
Description: Dereferencing NULL value "data"
Patch description:
The return of cpufreq_cpu_get can be NULL, check return code and return
-EINVAL if it is NULL.
Signed-off-by: Jayachandran C. <c.jayachandran at gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The problem is in the ondemand governor, there is a periodic measurement
of the CPU usage. This CPU usage is updated by the scheduler after every
tick (basically, by adding 1 either to "idle" or to "user" or to
"system"). So if the frequency of the governor is too high, the stat
will be meaningless (as mostly no number have changed).
So this patch checks that the measurements are separated by at least 10
ticks. It means that by default, stats will have about 5% error (20
ticks). Of course those numbers can be argued but, IMHO, they look sane.
The patch also includes a small clean-up to check more explictly the
result of the conversion from ns to µs being null.
Let's note that (on x86) this has never been really needed before 2.6.13
because HZ was always 1000. Now that HZ can be 100, some CPU might be
affected by this problem. For instance when HZ=100, the centrino ,which
has a 10µs transition latency, would lead to the governor allowing to
read stats every tick (10ms)!
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Dave Jones <davej@redhat.com>
A minor fix for cpufreq_add_dev() error path. We need to call driver->exit()
if driver_init() call has succeeded.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
* ret has no need to be unsigned in cpufreq_driver_target()
* ret has no need to be initialized in __cpufreq_governor()
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Fix u32 vs pm_message_t confusion in cpufreq.
Signed-off-by: Bernard Blackham <bernard@blackham.com.au>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sysfs: fix the rest of the kernel so if an attribute doesn't
implement show or store method read/write will return
-EIO instead of 0 or -EINVAL or -EPERM.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Changes to the cpufreq stats driver:
* Changes the way P-state transition table looks in /sysfs providing more
clear output
* Changes the time unit in the output from HZ to clock_t
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
[PATCH] [5/5] ondemand governor default sampling downfactor as 1
Make default sampling downfactor 1.
This works better with earlier auto downscaling change in ondemand governor.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
[PATCH] [4/5] ondemand governor automatic downscaling
Here is a change of policy for the ondemand governor. The modification
concerns the frequency downscaling. Instead of decreasing to a lower
frequency when the CPU usage is under 20%, this new policy automatically
scales to the optimal frequency. The optimal frequency being the lowest
frequency which provides enough power to not trigger the upscaling policy.
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
[PATCH] [3/5] ondemand,conservative governor idle_tick clean-up
Ondemand and conservative governor clean-up, it factorises the idle ticks
measurement.
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
[PATCH] [2/5] ondemand,conservative governor store the idle ticks for all cpus
Ondemand, conservative governor did not store prev_cpu_idle_up into
prev_cpu_idle_down for other CPUs than the current CPU.
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
[PATCH] [1/5] ondemand,conservative minor bug-fix and cleanup
Attached patch fixes some minor issues with Alexander's patch and related
cleanup in both ondemand and conservative governor.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Adds support so that the cpufreq change stepping is no longer fixed at 5% and
can be changed dynamically by the user
Signed-off-by: Alexander Clouter <alex-kernel@digriz.org.uk>
Signed-off-by: Dave Jones <davej@redhat.com>
A new cpufreq module, based on the ondemand one with my additional patches
just posted. This one is more suitable for battery environments where its
probably more appealing to have the cpu freq gracefully increase and decrease
rather than flip between the min and max freq's.
N.B. Bruno Ducrot pointed out that the amd64's "do have unacceptable latency
between min and max freq transition, due to the step-by-step requirements
(200MHz IIRC)"; so AMD64 users would probably benefit from this too.
Signed-off-by: Alexander Clouter <alex-kernel@digriz.org.uk>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch makes a needlessly global and EXPORT_SYMBOL'ed struct static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Dave Jones <davej@redhat.com>
This comes up time and time again. Until its fixed, place this
comment in the Kconfig which should stem the flow of resubmissions.
Signed-off-by: Rob Weryk <rjweryk@uwo.ca>
Signed-off-by: Dave Jones <davej@redhat.com>
Trivial ondemand governor clean-ups:
- change from sampling_rate_in_HZ() to the official function
usecs_to_jiffies().
- use for_each_online_cpu() to instead of using "if (cpu_online(i))"
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
cpufreq core is printing out messages at KERN_WARNING level that the core
recovers from without intervention, and that the system administrator can
do nothing about. Patch below reduces the severity of these messages to
debug.
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
The cpufreq core patch I sent earlier got only half-applied. I added a
flag to let the low level driver disable an annoying warning on
suspend/resume that is normal on ppc, but the "resume" part of it wasn't
applied.
This just adds back that missing bit. The original patch also reworked
the resume() function to avoid nesting too many if () statements along
the way I did the suspend() one, but I didn't include that in the patch
below.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In order to properly fix some issues with cpufreq vs. sleep on
PowerBooks, I had to add a suspend callback to the pmac_cpufreq driver.
I must force a switch to full speed before sleep and I switch back to
previous speed on resume.
I also added a driver flag to disable the warnings in suspend/resume
since it is expected in this case to have different speed (and I want it
to fixup the jiffies properly).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!