Using performance_limits in the passive mode doesn't make
sense, because in that mode the global limits are applied to the
frequency selected by the scaling governor.
The maximum and minimum P-state limits in performance_limits are both
set to 100 percent which will put all CPUs into the turbo range
regardless of what governor is used and what frequencies are
selected by it (that is particularly undesirable on CPUs with the
generic powersave governor attached).
For this reason, make intel_pstate_register_driver() always point
limits to powersave_limits in the passive mode.
Fixes: 001c76f05b (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is a problem with intel_pstate operation mode switching
introduced by commit fb1fe1041c (cpufreq: intel_pstate: Operation
mode control from sysfs), because the global sysfs limits are
preserved across operation modes while per-policy limits are
reinitialized from scratch on a mode switch and both sets of limits
may get out of sync this way.
Fix that by always reinitializing the global limits upon the
registration of the driver.
Fixes: fb1fe1041c (cpufreq: intel_pstate: Operation mode control from sysfs)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
This snip code is not needed anymore since its user
get_hard_smp_processor_id() has been removed.
Signed-off-by: Tang Yuantian <yuantian.tang@nxp.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add maintainer information for bmips-cpufreq.c.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Without the Kconfig dependency, we can get this warning:
warning: ACPI_CPPC_CPUFREQ selects ACPI_CPPC_LIB which has unmet direct dependencies (ACPI && ACPI_PROCESSOR)
Fixes: 5477fb3bd1 (ACPI / CPPC: Add a CPUFreq driver for use with CPPC)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Kconfig currently controlling compilation of this code is:
drivers/cpufreq/Kconfig.arm:config ARM_TI_CPUFREQ
drivers/cpufreq/Kconfig.arm: bool "Texas Instruments CPUFreq support"
...meaning that it currently is not being built as a module by anyone.
Lets remove the couple traces of modular infrastructure use, so that
when reading the driver there is no doubt it is builtin-only.
Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.
We also delete the MODULE_LICENSE tag etc. since all that information
is already contained at the top of the file in the comments.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If new_policy is set in cpufreq_online(), the policy object has just
been created and its real_cpus mask has been zeroed on allocation,
and the driver's ->init() callback should not touch it.
It doesn't need to be cleared again, so don't do that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Some TI platforms, specifically those in the am33xx, am43xx, dra7xx, and
am57xx families of SoCs can make use of the ti-cpufreq driver to
selectively enable OPPs based on the exact configuration in use. The
ti-cpufreq is given the responsibility of creating the cpufreq-dt
platform device when the driver is in use so drop am33xx and dra7xx
from the cpufreq-dt-platdev driver so it is not created twice.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some TI SoCs, like those in the AM335x, AM437x, DRA7x, and AM57x families,
have different OPPs available for the MPU depending on which specific
variant of the SoC is in use. This can be determined through use of the
revision and an eFuse register present in the silicon. Introduce a
ti-cpufreq driver that can read the aformentioned values and provide
them as version matching data to the opp framework. Through this the
opp-supported-hw dt binding that is part of the operating-points-v2
table can be used to indicate availability of OPPs for each device.
This driver also creates the "cpufreq-dt" platform_device after passing
the version matching data to the OPP framework so that the cpufreq-dt
handles the actual cpufreq implementation. Even without the necessary
data to pass the version matching data the driver will still create this
device to maintain backwards compatibility with operating-points v1
tables.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add the device tree bindings document for the TI CPUFreq/OPP driver
on AM33xx, AM43xx, DRA7xx, and AM57xx SoCs. The operating-points-v2
binding allows us to provide an opp-supported-hw property for each OPP
to define when it is available. This driver is responsible for reading
and parsing registers to determine which OPPs can be selectively enabled
based on the specific SoC in use by matching against the opp-supported-hw
data.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rename _of_get_opp_desc_node to dev_pm_opp_of_get_opp_desc_node and add it
to include/linux/pm_opp.h to allow other drivers, such as platform OPP
and cpufreq drivers, to make use of it.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Get the CPU clock's potential parent clocks from the clock interface
itself, rather than manually parsing the clocks property to find a
phandle, looking at the clock-names property of that, and assuming that
those are valid parent clocks for the cpu clock.
This is necessary now that the clocks are generated based on the clock
driver's knowledge of the chip rather than a fragile device-tree
description of the mux options.
We can now rely on the clock driver to ensure that the mux only exposes
options that are valid. The cpufreq driver was currently being overly
conservative in some cases -- for example, the "min_cpufreq =
get_bus_freq()" restriction only applies to chips with erratum
A-004510, and whether the freq_mask used on p5020 is needed depends on
the actual frequencies of the PLLs (FWIW, p5040 has a similar
limitation but its .freq_mask was zero) -- and the frequency mask
mechanism made assumptions about particular parent clock indices that
are no longer valid.
Signed-off-by: Scott Wood <scottwood@nxp.com>
Signed-off-by: Tang Yuantian <yuantian.tang@nxp.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add ARM64 config to Kconfig to enable CPU frequency feature on
NXP ARM64 SoCs.
Signed-off-by: Tang Yuantian <yuantian.tang@nxp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The "goto err_armclk;" error path already does a clk_put(s3c_freq->hclk);
so this is a double free.
Fixes: 34ee550752 ([CPUFREQ] Add S3C2416/S3C2450 cpufreq driver)
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Enable all applicable CPUfreq options.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add the MIPS CPUfreq driver. This driver currently supports CPUfreq on
BMIPS5xxx-based SoCs.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Turn on CPU_SUPPORTS_CPUFREQ and MIPS_EXTERNAL_TIMER for BMIPS.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ran "make savedefconfig" to bring bmips_stb_defconfig up to date.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fixes the following sparse warning:
drivers/base/power/opp/core.c:49:18: warning:
symbol '_find_opp_table_unlocked' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some Kabylake desktop processors may not reach max turbo when running in
HWP mode, even if running under sustained 100% utilization.
This occurs when the HWP.EPP (Energy Performance Preference) is set to
"balance_power" (0x80) -- the default on most systems.
It occurs because the platform BIOS may erroneously enable an
energy-efficiency setting -- MSR_IA32_POWER_CTL BIT-EE, which is not
recommended to be enabled on this SKU.
On the failing systems, this BIOS issue was not discovered when the
desktop motherboard was tested with Windows, because the BIOS also
neglects to provide the ACPI/CPPC table, that Windows requires to enable
HWP, and so Windows runs in legacy P-state mode, where this setting has
no effect.
Linux' intel_pstate driver does not require ACPI/CPPC to enable HWP, and
so it runs in HWP mode, exposing this incorrect BIOS configuration.
There are several ways to address this problem.
First, Linux can also run in legacy P-state mode on this system.
As intel_pstate is how Linux enables HWP, booting with
"intel_pstate=disable"
will run in acpi-cpufreq/ondemand legacy p-state mode.
Or second, the "performance" governor can be used with intel_pstate,
which will modify HWP.EPP to 0.
Or third, starting in 4.10, the
/sys/devices/system/cpu/cpufreq/policy*/energy_performance_preference
attribute in can be updated from "balance_power" to "performance".
Or fourth, apply this patch, which fixes the erroneous setting of
MSR_IA32_POWER_CTL BIT_EE on this model, allowing the default
configuration to function as designed.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When HWP is active, turbo activation ratio is not used to calculate max
non turbo ratio. But on these systems the max non turbo ratio is decided
by config TDP settings.
This change removes usage of MSR_TURBO_ACTIVATION_RATIO for HWP systems,
instead directly use TDP ratios, when more than one TDPs are available.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Under HWP the performance limits are calculated using max_perf_pct
and min_perf_pct using possible performance, not available performance.
The available performance can be reduced by no_turbo setting. To make
compatible with legacy mode, use max/min performance percentage with
respect to available performance.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When turbo is not disabled by BIOS, but user disabled from intel P-State
sysfs and changes max/min using cpufreq sysfs, the resultant frequency
is lower than what user requested.
The reason for this, when the perf limits are calculated in set_policy()
callback, they are with reference to max cpu frequency (turbo frequency
), but when enforced in the intel_pstate_get_min_max() they are with
reference to max available performance as documented in the intel_pstate
documentation (in this case max non turbo P-State).
This needs similar change as done in intel_cpufreq_verify_policy() for
passive mode. Set policy->cpuinfo.max_freq based on the turbo status.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Make it possible to change the operation mode of intel_pstate with
the help of a new sysfs attribute called "status".
There are three possible configurations that can be selected using
this attribute:
"off" - The driver is not in use at this time.
"active" - The driver works as a P-state governor (default).
"passive" - The driver works as a regular cpufreq one and collaborates
with the generic cpufreq governors (it sets P-states as
requested by those governors). [This is the same mode
the driver can be started in by passing intel_pstate=passive
in the kernel command line.]
The current setting is returned by reads from this attribute. Writing
one of the above strings to it changes the operation mode as indicated
by that string, if possible.
If HW-managed P-states (HWP) feature is enabled, it is not possible
to change the driver's operation mode and attempts to write to this
attribute will fail.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Expose the intel_pstate's global sysfs attributes before registering
the driver to prepare for the addition of an attribute that also will
have to work if the driver is not registered.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
acpi_processor_ppc_notifier() can live without using CPUFREQ_START
(which is gonna be removed soon), as it is only used while setting
ignore_ppc to 0. This can be done with the help of "ignore_ppc < 0"
check alone. The notifier function anyway ignores all events except
CPUFREQ_ADJUST and dropping CPUFREQ_START wouldn't harm at all.
Once CPUFREQ_START event is removed from the cpufreq core,
acpi_processor_ppc_notifier() will get called only for CPUFREQ_NOTIFY or
CPUFREQ_ADJUST event. Drop the return statement from the first if block
to make sure we don't ignore any such events.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In P8+, Workload Optimized Frequency(WOF) provides the capability to
boost the cpu frequency based on the utilization of the other cpus
running in the chip. The On-Chip-Controller(OCC) firmware will control
the achievability of these frequencies depending on the power headroom
available in the chip. Currently the ultra-turbo frequencies provided
by this feature are exported along with the turbo and sub-turbo
frequencies as scaling_available_frequencies. This patch will export
the ultra-turbo frequencies separately as scaling_boost_frequencies in
WOF enabled systems. This patch will add the boost sysfs file which
can be used to disable/enable ultra-turbo frequencies.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpufreq core has gone though lots of updates in recent times, but on
many occasions the documentation wasn't updated along with the code.
This patch tries to catchup the documentation with the code.
Also add Rafael and Viresh as the contributors to the documentation.
Based on a patch from Claudio Scordino.
Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch doesn't change the content of the documentation, but rather
reformat it to make it more readable.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This doesn't have any benefit apart from saving a small amount of memory
when it is disabled. The ifdef hackery in the code makes it dirty
unnecessarily.
Clean it up by removing the Kconfig option completely. Few defconfigs
are also updated and CONFIG_CPU_FREQ_STAT_DETAILS is replaced with
CONFIG_CPU_FREQ_STAT now in them, as users wanted stats to be enabled.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Those were added by:
commit fcd7af917a ("cpufreq: stats: handle cpufreq_unregister_driver()
and suspend/resume properly")
but aren't used anymore since:
commit 1aefc75b24 ("cpufreq: stats: Make the stats code non-modular").
Remove them. Also remove the redundant parameter to the respective
routines.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Update OPP documentation to remove the RCU specific bits.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
dev_pm_opp_get_max_volt_latency() calls _find_opp_table() two times
effectively.
Merge _get_regulator_count() into dev_pm_opp_get_max_volt_latency() to
avoid that.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
As we don't use RCU locking anymore, there is no need to replace an
earlier OPP node with a new one. Just update the existing one.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The RCU locking isn't well suited for the OPP core. The RCU locking fits
better for reader heavy stuff, while the OPP core have at max one or two
readers only at a time.
Over that, it was getting very confusing the way RCU locking was used
with the OPP core. The individual OPPs are mostly well handled, i.e. for
an update a new structure was created and then that replaced the older
one. But the OPP tables were updated directly all the time from various
parts of the core. Though they were mostly used from within RCU locked
region, they didn't had much to do with RCU and were governed by the
mutex instead.
And that mixed with the 'opp_table_lock' has made the core even more
confusing.
Now that we are already managing the OPPs and the OPP tables with kernel
reference infrastructure, we can get rid of RCU locking completely and
simplify the code a lot.
Remove all RCU references from code and comments.
Acquire opp_table->lock while parsing the list of OPPs though.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Take reference of the OPP table from within _find_opp_table(). Also
update the callers of _find_opp_table() to call
dev_pm_opp_put_opp_table() after they have used the OPP table.
Note that _find_opp_table() increments the reference under the
opp_table_lock.
Now that the OPP table wouldn't get freed until the callers of
_find_opp_table() call dev_pm_opp_put_opp_table(), there is no need to
take the opp_table_lock or rcu_read_lock() around it. Drop them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch updates dev_pm_opp_find_freq_*() routines to get a reference
to the OPPs returned by them.
Also updates the users of dev_pm_opp_find_freq_*() routines to call
dev_pm_opp_put() after they are done using the OPPs.
As it is guaranteed the that OPPs wouldn't get freed while being used,
the RCU read side locking present with the users isn't required anymore.
Drop it as well.
This patch also updates all users of devfreq_recommended_opp() which was
returning an OPP received from the OPP core.
Note that some of the OPP core routines have gained
rcu_read_{lock|unlock}() calls, as those still use RCU specific APIs
within them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> [Devfreq]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add kref to struct dev_pm_opp for easier accounting of the OPPs.
Note that the OPPs are freed under the opp_table->lock mutex only.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Migrate all users of _add_opp_table() to use dev_pm_opp_get_opp_table()
to guarantee that the OPP table doesn't get freed while being used.
Also update _managed_opp() to get the reference to the OPP table.
Now that the OPP table wouldn't get freed while these routines are
executing after dev_pm_opp_get_opp_table() is called, there is no need
to take opp_table_lock. Drop them as well.
Now that _add_opp_table(), _remove_opp_table() and the unlocked release
routines aren't used anymore, remove them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Take reference of the OPP table while adding and removing OPPs, that
helps us remove special checks in _remove_opp_table().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that we have proper kernel reference infrastructure in place for OPP
tables, use it to guarantee that the OPP table isn't freed while being
used by the callers of dev_pm_opp_set_*() APIs.
Make them all return the pointer to the OPP table after taking its
reference and put the reference back with dev_pm_opp_put_*() APIs.
Now that the OPP table wouldn't get freed while these routines are
executing after dev_pm_opp_get_opp_table() is called, there is no need
to take opp_table_lock. Drop them as well.
Remove the rcu specific comments from these routines as they aren't
relevant anymore.
Note that prototypes of dev_pm_opp_{set|put}_regulators() were already
updated by another patch.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add kref to struct opp_table for easier accounting of the OPP table.
Note that the new routine dev_pm_opp_get_opp_table() takes the reference
from under the opp_table_lock, which guarantees that the OPP table
doesn't get freed unless dev_pm_opp_put_opp_table() is called for the
OPP table.
Two separate release mechanisms are added: locked and unlocked. In
unlocked version the routines aren't required to take/drop
opp_table_lock as the callers have already done that. This is required
to avoid breaking git bisect, otherwise we may get lockdeps between
commits. Once all the users of OPP table are updated the unlocked
version shall be removed.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add per OPP table lock to protect opp_table->opp_list.
Note that at few places opp_list is used under the rcu_read_lock() and
so a mutex can't be added there for now. This will be fixed by a later
patch.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
I've seen this trigger twice now, where the i915_gem_object_to_ggtt()
call in intel_unpin_fb_obj() returns NULL, resulting in an oops
immediately afterwards as the (inlined) call to i915_vma_unpin_fence()
tries to dereference it.
It seems to be some race condition where the object is going away at
shutdown time, since both times happened when shutting down the X
server. The call chains were different:
- VT ioctl(KDSETMODE, KD_TEXT):
intel_cleanup_plane_fb+0x5b/0xa0 [i915]
drm_atomic_helper_cleanup_planes+0x6f/0x90 [drm_kms_helper]
intel_atomic_commit_tail+0x749/0xfe0 [i915]
intel_atomic_commit+0x3cb/0x4f0 [i915]
drm_atomic_commit+0x4b/0x50 [drm]
restore_fbdev_mode+0x14c/0x2a0 [drm_kms_helper]
drm_fb_helper_restore_fbdev_mode_unlocked+0x34/0x80 [drm_kms_helper]
drm_fb_helper_set_par+0x2d/0x60 [drm_kms_helper]
intel_fbdev_set_par+0x18/0x70 [i915]
fb_set_var+0x236/0x460
fbcon_blank+0x30f/0x350
do_unblank_screen+0xd2/0x1a0
vt_ioctl+0x507/0x12a0
tty_ioctl+0x355/0xc30
do_vfs_ioctl+0xa3/0x5e0
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x13/0x94
- i915 unpin_work workqueue:
intel_unpin_work_fn+0x58/0x140 [i915]
process_one_work+0x1f1/0x480
worker_thread+0x48/0x4d0
kthread+0x101/0x140
and this patch purely papers over the issue by adding a NULL pointer
check and a WARN_ON_ONCE() to avoid the oops that would then generally
make the machine unresponsive.
Other callers of i915_gem_object_to_ggtt() seem to also check for the
returned pointer being NULL and warn about it, so this clearly has
happened before in other places.
[ Reported it originally to the i915 developers on Jan 8, applying the
ugly workaround on my own now after triggering the problem for the
second time with no feedback.
This is likely to be the same bug reported as
https://bugs.freedesktop.org/show_bug.cgi?id=98829https://bugs.freedesktop.org/show_bug.cgi?id=99134
which has a patch for the underlying problem, but it hasn't gotten to
me, so I'm applying the workaround. ]
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>