linux

Author	SHA1	Message	Date
Rafael J. Wysocki	b9af694802	cpufreq: Fix clamp_val() usage in cpufreq_driver_fast_switch() The return value of clamp_val() has to be stored actually. Fixes: `b7898fda5b` (cpufreq: Support for fast frequency switching) Reported-by: Steve Muckle <steve.muckle@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-06-01 22:36:26 +02:00
Srinivas Pandruvada	6cacd115a8	cpufreq: intel_pstate: Downgrade print level for _PPC Downgrade pr_info to pr_debug for the "_PPC limits will be enforced" message. In server systems with many cores this message is annoying. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-30 15:22:02 +02:00
Pankaj Gupta	3834abb4e6	cpufreq: simplified goto out in cpufreq_register_driver() simplified goto out in cpufreq_register_driver for increasing code readability Signed-off-by: Pankaj Gupta <pankaj.gupta@spreadtrum.com> Signed-off-by: Sanjeev Yadav <sanjeev.yadav@spreadtrum.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-18 02:34:41 +02:00
Rafael J. Wysocki	45482c703b	cpufreq: governor: CPUFREQ_GOV_STOP never fails None of the cpufreq governors currently in the tree will ever fail an invocation of the ->governor() callback with the event argument equal to CPUFREQ_GOV_STOP (unless invoked with incorrect arguments which doesn't matter anyway) and it is rather difficult to imagine a valid reason for such a failure. Accordingly, rearrange the code in the core to make it clear that this call never fails. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-05-18 02:28:29 +02:00
Rafael J. Wysocki	36be3418eb	cpufreq: governor: CPUFREQ_GOV_POLICY_EXIT never fails None of the cpufreq governors currently in the tree will ever fail an invocation of the ->governor() callback with the event argument equal to CPUFREQ_GOV_POLICY_EXIT (unless invoked with incorrect arguments which doesn't matter anyway) and it wouldn't really make sense to fail it, because the caller won't be able to handle that failure in a meaningful way. Accordingly, rearrange the code in the core to make it clear that this call never fails. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-05-18 02:27:32 +02:00
Rafael J. Wysocki	c749c64f45	intel_pstate: Simplify conditional in intel_pstate_set_policy() One of the if () statements in intel_pstate_set_policy() causes another if () to be evaluated if the condition is true and it doesn't do anything else, so merge the two if () statements into one. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>	2016-05-18 02:26:33 +02:00
Rafael J. Wysocki	1aa7a6e2b8	intel_pstate: Clean up get_target_pstate_use_performance() The comments and the core_busy variable name in get_target_pstate_use_performance() are totally confusing, so modify them to reflect what's going on. The results of the computations should be the same as before. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 22:58:38 +02:00
Rafael J. Wysocki	8edb0a6e48	intel_pstate: Use sample.core_avg_perf in get_avg_pstate() Notice that get_avg_pstate() can use sample.core_avg_perf instead of carrying the same division again, so make it do that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 22:58:37 +02:00
Rafael J. Wysocki	a1c9787dc3	intel_pstate: Clarify average performance computation The core_pct_busy field of struct sample actually contains the average performace during the last sampling period (in percent) and not the utilization of the core as suggested by its name which is confusing. For this reason, change the name of that field to core_avg_perf and rename the function that computes its value accordingly. Also notice that storing this value as percentage requires a costly integer multiplication to be carried out in a hot path, so instead store it as an "extended fixed point" value with more fraction bits and update the code using it accordingly (it is better to change the name of the field along with its meaning in one go than to make those two changes separately, as that would likely lead to more confusion). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 22:58:37 +02:00
Chen Yu	4578ee7e1d	intel_pstate: Avoid unnecessary synchronize_sched() during initialization Currently, in intel_pstate_clear_update_util_hook(), after clearing the utilization update hook, we leverage synchronize_sched() to deal with synchronization, which is a little bit time-costly because synchronize_sched() has to wait for all the CPUs to go through a grace period. Actually, the synchronize_sched() is not necessary if the utilization update hook has not been set for the given CPU yet, so make the driver check if that's the case and avoid the synchronize_sched() call then. Link: https://bugzilla.kernel.org/show_bug.cgi?id=116371 Tested-by: Tian Ye <yex.tian@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> [ rjw : Rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 22:56:34 +02:00
Rafael J. Wysocki	c29af6f1a4	Merge branch 'pm-cpufreq-sched' into pm-cpufreq	2016-05-11 22:48:20 +02:00
Arnd Bergmann	cfe9492fdf	cpufreq: schedutil: Make default depend on CONFIG_SMP CPU_FREQ_GOV_SCHEDUTIL gained a dependency on SMP, so now we get a warning if it gets selected by CPU_FREQ_DEFAULT_GOV_SCHEDUTIL without SMP: warning: (CPU_FREQ_DEFAULT_GOV_SCHEDUTIL) selects CPU_FREQ_GOV_SCHEDUTIL which has unmet direct dependencies (CPU_FREQ && SMP) This adds another dependency to avoid the problem. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `bf7cdff194` (cpufreq: schedutil: Make it depend on CONFIG_SMP) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 22:47:44 +02:00
Akshay Adiga	0bc10b93f2	cpufreq: powernv: del_timer_sync when global and local pstate are equal When global and local pstate are equal in a powernv_target_index() call, we don't queue a timer. But we may have timer already queued for future. This could cause the timer to fire one additional time for no use. Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 02:28:00 +02:00
Akshay Adiga	1fd3ff2874	cpufreq: powernv: Move smp_call_function_any() out of irq safe block Fix a WARN_ON caused by smp_call_function_any() when irq is disabled, because of changes made in the patch ('cpufreq: powernv: Ramp-down global pstate slower than local-pstate') https://patchwork.ozlabs.org/patch/612058/ WARNING: CPU: 0 PID: 4 at kernel/smp.c:291 smp_call_function_single+0x170/0x180 Call Trace: [c0000007f648f9f0] [c0000007f648fa90] 0xc0000007f648fa90 (unreliable) [c0000007f648fa30] [c0000000001430e0] smp_call_function_any+0x170/0x1c0 [c0000007f648fa90] [c0000000007b4b00] powernv_cpufreq_target_index+0xe0/0x250 [c0000007f648fb00] [c0000000007ac9dc] __cpufreq_driver_target+0x20c/0x3d0 [c0000007f648fbc0] [c0000000007b1b4c] od_dbs_timer+0xcc/0x260 [c0000007f648fc10] [c0000000007b3024] dbs_work_handler+0x54/0xa0 [c0000007f648fc50] [c0000000000c49a8] process_one_work+0x1d8/0x590 [c0000007f648fce0] [c0000000000c4e08] worker_thread+0xa8/0x660 [c0000007f648fd80] [c0000000000cca88] kthread+0x108/0x130 [c0000007f648fe30] [c0000000000095e8] ret_from_kernel_thread+0x5c/0x74 - Calling smp_call_function_any() with interrupt disabled (through spin_lock_irqsave) could cause a deadlock, as smp_call_function_any() relies on the IPI to complete. This is detected in the smp_call_function_any() call and hence the WARN_ON. - As the spinlock (gpstates->lock) is only used to synchronize access of global_pstate_info between timer irq handler and target_index calls. And the timer irq handler just try_locks() hence it would not cause a deadlock. Hence could do without making spinlocks irq safe. - As the smp_call_function_any() is a blocking call and does not access global_pstates_info, it could reduce the critcal section by moving smp_call_function_any() after giving up the lock. Reported-by: Abdul Haleem <abdhalee@linux.vnet.linux.com> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 02:28:00 +02:00
Rafael J. Wysocki	f96fd0c86f	intel_pstate: Clean up intel_pstate_get() intel_pstate_get() contains a local variable that's initialized but never used and it can be written in fewer lines of code, so clean it up. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>	2016-05-09 22:55:36 +02:00
Rafael J. Wysocki	d87de8f38a	Merge branch 'pm-cpufreq-sched' into pm-cpufreq	2016-05-09 16:00:10 +02:00
Rafael J. Wysocki	bf7cdff194	cpufreq: schedutil: Make it depend on CONFIG_SMP Make the schedutil cpufreq governor depend on CONFIG_SMP, because the scheduler-provided utilization numbers used by it are only available with CONFIG_SMP set. Fixes: `9bdcb44e39` (cpufreq: schedutil: New governor based on scheduler utilization data) Reported-by: Steve Muckle <steve.muckle@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-06 22:33:33 +02:00
Rafael J. Wysocki	da43af961b	Merge cpufreq fixes going into v4.6. * pm-cpufreq-fixes: intel_pstate: Fix intel_pstate_get() cpufreq: intel_pstate: Fix HWP on boot CPU after system resume cpufreq: st: enable selective initialization based on the platform cpufreq: intel_pstate: Fix processing for turbo activation ratio	2016-05-06 22:01:14 +02:00
Rafael J. Wysocki	9485e4ca0b	cpufreq: governor: Fix handling of special cases in dbs_update() As reported in KBZ 69821: "With CONFIG_HZ_PERIODIC=y cpu stays at the lowest frequcency 800MHz even if usage goes to 100%, frequency does not scale up, the governor in use is ondemand. Neither works conservative. Performance and userspace governors work as expected. With CONFIG_NO_HZ_IDLE or CONFIG_NO_HZ_FULL cpu scales up with ondemand as expected." Analysis carried out by Chen Yu leads to the conclusion that the observed issue is due to idle_time in dbs_update() representing a negative number in which case the function will return 0 as the load (unless load is greater than 0 for another CPU sharing the policy), although that need not be the right choice. Indeed, idle_time representing a negative number means that during the last sampling interval the CPU was almost 100% busy on the rough average, so 100 should be returned as the load in that case. Modify the code accordingly and rearrange it to clarify the handling of all of the special cases in it. While at it, also avoid returning zero as the load if time_elapsed is 0 (it doesn't really make sense to return 0 then). Link: https://bugzilla.kernel.org/show_bug.cgi?id=69821 Tested-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Timo Valtoaho <timo.valtoaho@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-05-06 14:24:23 +02:00
Srinivas Pandruvada	e59a8f7ff4	cpufreq: intel_pstate: Ignore _PPC processing under HWP When HWP (hardware P states) feature is active, the ACPI _PSS and _PPC is not used. So ignore processing for _PPC limits. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-05 01:43:47 +02:00
Sudeep Holla	d9975b0b07	cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_table Currently when performing random CPU hot-plugs and suspend-to-ram(S2R) on systems using arm_big_little cpufreq driver, we get warnings similar to something like below: cpu cpu1: _opp_add: duplicate OPPs detected. Existing: freq: 600000000, volt: 800000, enabled: 1. New: freq: 600000000, volt: 800000, enabled: 1 This is mainly because the OPPs for the shared cpus are not set. We can just use dev_pm_opp_of_cpumask_add_table in case the OPPs are obtained from DT(arm_big_little_dt.c) or use dev_pm_opp_set_sharing_cpus if the OPPs are obtained by other means like firmware(e.g. scpi-cpufreq.c) Also now that the generic dev_pm_opp{,_of}_cpumask_remove_table can handle removal of opp table and entries for all associated CPUs, we can re-use dev_pm_opp{,_of}_cpumask_remove_table as free_opp_table in cpufreq_arm_bL_ops. This patch makes necessary changes to reuse the generic OPP functions for {init,free}_opp_table and thereby eliminating the warnings. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-05 01:40:04 +02:00
Marc Gonzalez	d9c99acb63	cpufreq: tango: Use generic platdev driver Add tango4 compatible string to the list. Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-05 01:36:57 +02:00
Sai Gurrappadi	e43e94c1ed	cpufreq: Fix GOV_LIMITS handling for the userspace governor Currently, the userspace governor only updates frequency on GOV_LIMITS if policy->cur falls outside policy->{min/max}. However, it is also necessary to update current frequency on GOV_LIMITS to match the user requested value if it can be achieved within the new policy->{max/min}. This was previously the behaviour in the governor until commit `d1922f0` ("cpufreq: Simplify userspace governor") which incorrectly assumed that policy->cur == user requested frequency via scaling_setspeed. This won't be true if the user requested frequency falls outside policy->{min/max}. Ex: a temporary thermal cap throttled the user requested frequency. Fix this by storing the user requested frequency in a seperate variable. The governor will then try to achieve this request on every GOV_LIMITS change. Fixes: `d1922f0256` (cpufreq: Simplify userspace governor) Signed-off-by: Sai Gurrappadi <sgurrappadi@nvidia.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-05 01:30:38 +02:00
Rafael J. Wysocki	6d45b719cb	intel_pstate: Fix intel_pstate_get() After commit `8fa520af50` "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency() to compute the average frequency, which is problematic for two reasons. First, intel_pstate_get() may be invoked before the driver reads the CPU feedback registers for the first time and if that happens, get_avg_frequency() will attempt to divide by zero. Second, the get_avg_frequency() call in intel_pstate_get() is racy with respect to intel_pstate_sample() and it may end up returning completely meaningless values for this reason. Moreover, after commit `7349ec0470` "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()" sample.core_pct_busy is never computed on Atom, but it is used in intel_pstate_adjust_busy_pstate() in that case too. To address those problems notice that if sample.core_pct_busy was used in the average frequency computation carried out by get_avg_frequency(), both the divide by zero problem and the race with respect to intel_pstate_sample() would be avoided. Accordingly, move the invocation of intel_pstate_calc_busy() from get_target_pstate_use_performance() to intel_pstate_update_util(), which also will take care of the uninitialized sample.core_pct_busy on Atom, and modify get_avg_frequency() to use sample.core_pct_busy as per the above. Reported-by: kernel test robot <ying.huang@linux.intel.com> Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4 Fixes: `8fa520af50` "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()" Fixes: `7349ec0470` "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()" Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-04 14:09:16 +02:00
Rafael J. Wysocki	ba41e1bc28	cpufreq: intel_pstate: Fix HWP on boot CPU after system resume Commit `41cfd64cf4` "Update frequencies of policy->cpus only from ->set_policy()" changed the way the intel_pstate driver's ->set_policy callback updates the HWP (hardware-managed P-states) settings. A side effect of it is that if those settings are modified on the boot CPU during system suspend and wakeup, they will never be restored during subsequent system resume. To address this problem, allow cpufreq drivers that don't provide ->target or ->target_index callbacks to use ->suspend and ->resume callbacks and add a ->resume callback to intel_pstate to restore the HWP settings on the CPUs that belong to the given policy. Fixes: `41cfd64cf4` "Update frequencies of policy->cpus only from ->set_policy()" Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-05-02 13:48:15 +02:00
Sudeep Holla	2482bc31ca	cpufreq: st: enable selective initialization based on the platform The sti-cpufreq does unconditional registration of the cpufreq-dt driver which causes issue on an multi-platform build. For example, on Vexpress TC2 platform, we get the following error on boot: cpu cpu0: OPP-v2 not supported cpu cpu0: Not doing voltage scaling cpu: dev_pm_opp_of_cpumask_add_table: couldn't find opp table for cpu:0, -19 cpu cpu0: dev_pm_opp_get_max_volt_latency: Invalid regulator (-6) ... arm_big_little: bL_cpufreq_register: Failed registering platform driver: vexpress-spc, err: -17 The actual driver fails to initialise as cpufreq-dt is probed successfully, which is incorrect. This issue can happen to any platform not using cpufreq-dt in a multi-platform build. This patch adds a check to do selective initialization of the driver. Fixes: `ab0ea257fc` (cpufreq: st: Provide runtime initialised driver for ST's platforms) Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Lee Jones <lee.jones@linaro.org> Cc: 4.5+ <stable@vger.kernel.org> # 4.5+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 15:25:56 +02:00
Viresh Kumar	9f123def55	cpufreq: mvebu: Move cpufreq code into drivers/cpufreq/ Move cpufreq bits for mvebu into drivers/cpufreq/ directory, that's where they really belong to. Compiled tested only. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 15:22:43 +02:00
Viresh Kumar	eb96924acd	cpufreq: dt: Kill platform-data There are no more users of platform-data for cpufreq-dt driver, get rid of it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 15:22:43 +02:00
Viresh Kumar	1530b9963e	cpufreq: dt: Identify cpu-sharing for platforms without operating-points-v2 Existing platforms, which do not support operating-points-v2, can explicitly tell the opp core that some of the CPUs share opp tables, with help of dev_pm_opp_set_sharing_cpus(). For such platforms, explicitly ask the opp core to provide list of CPUs sharing the opp table with current cpu device, before falling back to platform data. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 15:22:42 +02:00
Rafael J. Wysocki	b4f4b4b371	cpufreq: governor: Change confusing struct field and variable names The name of the prev_cpu_wall field in struct cpu_dbs_info is confusing, because it doesn't represent wall time, but the previous update time as returned by get_cpu_idle_time() (that may be the current value of jiffies_64 in some cases, for example). Moreover, the names of some related variables in dbs_update() take that confusion further. Rename all of those things to make their names reflect the purpose more accurately. While at it, drop unnecessary parens from one of the updated expressions. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Chen Yu <yu.c.chen@intel.com>	2016-04-28 15:10:08 +02:00
Srinivas Pandruvada	2b3ec76505	cpufreq: intel_pstate: Enable PPC enforcement for servers For platforms which are controlled via remove node manager, enable _PPC by default. These platforms are mostly categorized as enterprise server or performance servers. These platforms needs to go through some certifications tests, which tests control via _PPC. The relative risk of enabling by default is low as this is is less likely that these systems have broken _PSS table. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 01:01:39 +02:00
Srinivas Pandruvada	3be9200d51	cpufreq: intel_pstate: Adjust policy->max When policy->max is changed via _PPC or sysfs and is more than the max non turbo frequency, it does not really change resulting performance in some processors. When policy->max results in a P-State ratio more than the turbo activation ratio, then processor can choose any P-State up to max turbo. So the user or _PPC setting has no value, but this can cause undesirable side effects like: - Showing reduced max percentage in Intel P-State sysfs - It can cause reduced max performance under certain boundary conditions: The requested max scaling frequency either via _PPC or via cpufreq-sysfs, will be converted into a fixed floating point max percent scale. In majority of the cases this will result in correct max. But not 100% of the time. If the _PPC is requested at a point where the calculation lead to a lower max, this can result in a lower P-State then expected and it will impact performance. Example of this condition using a Broadwell laptop with config TDP. ACPI _PSS table from a Broadwell laptop `2301000` 2300000 2200000 2000000 1900000 1800000 1700000 1500000 1400000 1300000 1100000 1000000 900000 800000 600000 500000 The actual results by disabling config TDP so that we can get what is requested on or below 2300000Khz. scaling_max_freq Max Requested P-State Resultant scaling max ---------------------------------------- ---------------------- 2400000 18 `2900000` (max turbo) 2300000 17 2300000 (max physical non turbo) 2200000 15 2100000 2100000 15 2100000 2000000 13 1900000 1900000 13 1900000 1800000 12 1800000 1700000 11 1700000 1600000 10 1600000 1500000 f 1500000 1400000 e 1400000 1300000 d 1300000 1200000 c 1200000 1100000 a 1000000 1000000 a 1000000 900000 9 900000 800000 8 800000 700000 7 700000 600000 6 600000 500000 5 500000 ------------------------------------------------------------------ Now set the config TDP level 1 ratio as 0x0b (equivalent to 1100000KHz) in BIOS (not every system will let you adjust this). The turbo activation ratio will be set to one less than that, which will be 0x0a (So any request above 1000000KHz should result in turbo region assuming no thermal limits). Here _PPC will request max to 1100000KHz (which basically should still result in turbo as this is more than the turbo activation ratio up to max allowable turbo frequency), but actual calculation resulted in a max ceiling P-State which is 0x0a. So under any load condition, this driver will not request turbo P-States. This will be a huge performance hit. When config TDP feature is ON, if the _PPC points to a frequency above turbo activation ratio, the performance can still reach max turbo. In this case we don't need to treat this as the reduced frequency in set_policy callback. In this change when config TDP is active (by checking if the physical max non turbo ratio is more than the current max non turbo ratio), any request above current max non turbo is treated as full performance. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw : Minor cleanups ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 01:01:39 +02:00
Srinivas Pandruvada	9522a2ff9c	cpufreq: intel_pstate: Enforce _PPC limits Use ACPI _PPC notification to limit max P state driver will request. ACPI _PPC change notification is sent by BIOS to limit max P state in several cases: - Reduce impact of platform thermal condition - When Config TDP feature is used, a changed _PPC is sent to follow TDP change - Remote node managers in server want to control platform power via baseboard management controller (BMC) This change registers with ACPI processor performance lib so that _PPC changes are notified to cpufreq core, which in turns will result in call to .setpolicy() callback. Also the way _PSS table identifies a turbo frequency is not compatible to max turbo frequency in intel_pstate, so the very first entry in _PSS needs to be adjusted. This feature can be turned on by using kernel parameters: intel_pstate=support_acpi_ppc Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Minor cleanups ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 01:01:39 +02:00
Akshay Adiga	eaa2c3aeef	cpufreq: powernv: Ramp-down global pstate slower than local-pstate The frequency transition latency from pmin to pmax is observed to be in few millisecond granurality. And it usually happens to take a performance penalty during sudden frequency rampup requests. This patch set solves this problem by using an entity called "global pstates". The global pstate is a Chip-level entity, so the global entitiy (Voltage) is managed across the cores. The local pstate is a Core-level entity, so the local entity (frequency) is managed across threads. This patch brings down global pstate at a slower rate than the local pstate. Hence by holding global pstates higher than local pstate makes the subsequent rampups faster. A per policy structure is maintained to keep track of the global and local pstate changes. The global pstate is brought down using a parabolic equation. The ramp down time to pmin is set to ~5 seconds. To make sure that the global pstates are dropped at regular interval , a timer is queued for every 2 seconds during ramp-down phase, which eventually brings the pstate down to local pstate. Iozone results show fairly consistent performance boost. YCSB on redis shows improved Max latencies in most cases. Iozone write/rewite test were made with filesizes 200704Kb and 401408Kb with different record sizes . The following table shows IOoperations/sec with and without patch. Iozone Results ( in op/sec) ( mean over 3 iterations ) --------------------------------------------------------------------- file size- with without % recordsize-IOtype patch patch change ---------------------------------------------------------------------- 200704-1-SeqWrite 1616532 1615425 0.06 200704-1-Rewrite 2423195 2303130 5.21 200704-2-SeqWrite 1628577 1602620 1.61 200704-2-Rewrite 2428264 2312154 5.02 200704-4-SeqWrite 1617605 1617182 0.02 200704-4-Rewrite 2430524 2351238 3.37 200704-8-SeqWrite 1629478 1600436 1.81 200704-8-Rewrite `2415308` 2298136 5.09 200704-16-SeqWrite 1619632 1618250 0.08 200704-16-Rewrite 2396650 2352591 1.87 200704-32-SeqWrite 1632544 1598083 2.15 200704-32-Rewrite 2425119 2329743 4.09 200704-64-SeqWrite 1617812 1617235 0.03 200704-64-Rewrite 2402021 2321080 3.48 200704-128-SeqWrite 1631998 1600256 1.98 200704-128-Rewrite 2422389 2304954 5.09 200704-256 SeqWrite 1617065 1616962 0.00 200704-256-Rewrite 2432539 2301980 5.67 200704-512-SeqWrite 1632599 1598656 2.12 200704-512-Rewrite 2429270 2323676 4.54 200704-1024-SeqWrite 1618758 1616156 0.16 200704-1024-Rewrite 2431631 2315889 4.99 401408-1-SeqWrite 1631479 1608132 1.45 401408-1-Rewrite 2501550 2459409 1.71 401408-2-SeqWrite 1617095 1626069 -0.55 401408-2-Rewrite 2507557 2443621 2.61 401408-4-SeqWrite 1629601 1611869 1.10 401408-4-Rewrite 2505909 2462098 1.77 401408-8-SeqWrite 1617110 1626968 -0.60 401408-8-Rewrite 2512244 2456827 2.25 401408-16-SeqWrite 1632609 1609603 1.42 401408-16-Rewrite 2500792 2451405 2.01 401408-32-SeqWrite 1619294 1628167 -0.54 401408-32-Rewrite 2510115 2451292 2.39 401408-64-SeqWrite 1632709 1603746 1.80 401408-64-Rewrite 2506692 2433186 3.02 401408-128-SeqWrite 1619284 1627461 -0.50 401408-128-Rewrite 2518698 2453361 2.66 401408-256-SeqWrite 1634022 1610681 1.44 401408-256-Rewrite 2509987 2446328 2.60 401408-512-SeqWrite 1617524 1628016 -0.64 401408-512-Rewrite 2504409 2442899 2.51 401408-1024-SeqWrite 1629812 1611566 1.13 401408-1024-Rewrite 2507620 2442968 2.64 Tested with YCSB workload (50% update + 50% read) over redis for 1 million records and 1 million operation. Each test was carried out with target operations per second and persistence disabled. Max-latency (in us)( mean over 5 iterations ) --------------------------------------------------------------- op/s Operation with patch without patch %change --------------------------------------------------------------- 15000 Read 61480.6 50261.4 22.32 15000 cleanup 215.2 293.6 -26.70 15000 update 25666.2 25163.8 2.00 25000 Read 32626.2 89525.4 -63.56 25000 cleanup 292.2 263.0 11.10 25000 update 32293.4 90255.0 -64.22 35000 Read 34783.0 33119.0 5.02 35000 cleanup 321.2 395.8 -18.8 35000 update 36047.0 38747.8 -6.97 40000 Read 38562.2 42357.4 -8.96 40000 cleanup 371.8 384.6 -3.33 40000 update 27861.4 41547.8 -32.94 45000 Read 42271.0 88120.6 -52.03 45000 cleanup 263.6 383.0 -31.17 45000 update 29755.8 81359.0 -63.43 (test without target op/s) 47659 Read 83061.4 136440.6 -39.12 47659 cleanup 195.8 193.8 1.03 47659 update 73429.4 124971.8 -41.24 Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-27 23:56:58 +02:00
Shilpasri G Bhat	2920e9ce8f	cpufreq: powernv: Remove flag use-case of policy->driver_data commit `1b0289848d` ("cpufreq: powernv: Add sysfs attributes to show throttle stats") used policy->driver_data as a flag for one-time creation of throttle sysfs files. Instead of this use 'kernfs_find_and_get()' to check if the attribute already exists. This is required as policy->driver_data is used for other purposes in the later patch. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-27 23:56:58 +02:00
Javier Martinez Canillas	6de0dc4b53	cpufreq: e_powersaver: Use IS_ENABLED() instead of checking for built-in or module The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-27 22:42:34 +02:00
Srinivas Pandruvada	1becf03545	cpufreq: intel_pstate: Fix processing for turbo activation ratio When the config TDP level is not nominal (level = 0), the MSR values for reading level 1 and level 2 ratios contain power in low 14 bits and actual ratio bits are at bits [23:16]. The current processing for level 1 and level 2 is wrong as there is no shift done to get actual ratio. Fixes: `6a35fc2d6c` (cpufreq: intel_pstate: get P1 from TAR when available) Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 4.4+ <stable@vger.kernel.org> # 4.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 23:39:09 +02:00
Rafael J. Wysocki	ba1ca654f3	cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start() The way cpufreq_governor_start() initializes j_cdbs->prev_load is questionable. First off, j_cdbs->prev_cpu_wall used as a denominator in the computation may be zero. The case this happens is when get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy() used to return that number is called exactly at the jiffies_64 wrap time. It is rather hard to trigger that error, but it is not impossible and it will just crash the kernel then. Second, j_cdbs->prev_load is computed as the average load during the entire time since the system started and it may not reflect the load in the previous sampling period (as it is expected to). That doesn't play well with the way dbs_update() uses that value. Namely, if the update time delta (wall_time) happens do be greater than twice the sampling rate on the first invocation of it, the initial value of j_cdbs->prev_load (which may be completely off) will be returned to the caller as the current load (unless it is equal to zero and unless another CPU sharing the same policy object has a greater load value). For this reason, notice that the prev_load field of struct cpu_dbs_info is only used by dbs_update() and only in that one place, so if cpufreq_governor_start() is modified to always initialize it to 0, it will make dbs_update() always compute the actual load first time it checks the update time delta against the doubled sampling rate (after initialization) and there won't be any side effects of it. Consequently, modify cpufreq_governor_start() as described. Fixes: `18b46abd00` (cpufreq: governor: Be friendly towards latency-sensitive bursty workloads) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-25 16:21:34 +02:00
Viresh Kumar	3920be471c	cpufreq: hisilicon: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:24 +02:00
Viresh Kumar	5e4249c6d9	cpufreq: zynq: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:24 +02:00
Viresh Kumar	117d4f59af	cpufreq: sunxi: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:24 +02:00
Viresh Kumar	a399dc9fc5	cpufreq: shmobile: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:23 +02:00
Finley Xiao	014400c127	cpufreq: rockchip: Use generic platdev driver This patch add rockchip's compatible string to the compat list and remove similar code from platform code for supporting generic platdev driver. Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:23 +02:00
Viresh Kumar	7694ca6e1d	cpufreq: omap: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:23 +02:00
Viresh Kumar	7ead83f6df	cpufreq: imx: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Note that the complete routine imx27_dt_init() is removed as of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL); has same effect as a NULL .init_machine machine callback pointer. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:23 +02:00
Viresh Kumar	a59511d1da	cpufreq: berlin: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Antoine Tenart <antoine.tenart@free-electrons.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:23 +02:00
Viresh Kumar	e92bb16674	cpufreq: dt: Mark platdev machines array as __initconst The machines array in cpufreq-dt-platdev is used only once at boot time and so should be marked with __initconst, so that kernel can free up memory used for it, if required. Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:22 +02:00
Jia Hongtao	394cb8316b	cpufreq: qoriq: Fix cooling device registration issue during suspend Cooling device is registered by ready callback. It's also invoked while system resuming from sleep (Enabling non-boot cpus). Thus cooling device may be multiple registered. Matchable unregistration is added to exit callback to fix this issue. Signed-off-by: Jia Hongtao <hongtao.jia@nxp.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:07:02 +02:00
Jia Hongtao	495c716f17	cpufreq: qoriq: Remove __exit macro from .exit callback .exit callback (qoriq_cpufreq_cpu_exit()) is also used during suspend. So __exit macro should be removed or the function will be discarded. Signed-off-by: Jia Hongtao <hongtao.jia@nxp.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:07:02 +02:00
Jia Hongtao	27b8fe8daf	cpufreq: qoriq: Don't show cooling device messages if THERMAL_OF undefined When THERMAL_OF is undefined the cooling device messages should not be shown. -ENOSYS is returned from of_cpufreq_cooling_register() when THERMAL_OF is undefined. Signed-off-by: Jia Hongtao <hongtao.jia@nxp.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:00:42 +02:00

1 2 3 4 5 ...

1847 Commits