Currently the imx_sc driver is reimplementing part of the thermal zone
parsing from the thermal OF tree code to get the sensor id associated
with a thermal zone sensor.
The driver platform specific code should know what sensor is present
and not rely on the thermal zone description to do a discovery. Well
that is arguable but all the other drivers have a per platform data
telling what sensor id to use.
The imx_sc thermal driver is the only one using a different
approach. Not invalid but forcing to keep a specific function
'thermal_zone_of_get_sensor_id()' to get the sensor id for a specific
thermal zone as the self-explanatory function tells and having device
tree code inside the driver.
The thermal OF code had a rework and remains now self-encapsulated
with a register/unregister functions and their 'devm' variants, except
for the function mentioned above.
After investigating, it appears the imx_sc sensor is defined in
arch/arm64/boot/dts/freescale/imx8qxp.dtsi:
which defines the cpu-thermal zone with the id: IMX_SC_R_SYSTEM
This dtsi is included by:
- imx8qxp-ai_ml.dts
- imx8qxp-colibri.dtsi
- imx8qxp-mek.dts
The two first ones do not define more thermal zones
The third one adds the pmic-thermal0 zone with id: IMX_SC_R_PMIC_0
The thermal OF code returns -ENODEV if the thermal zone registration
with a specific id fails because the description is not available in
the DT for such a sensor id. In this case we continue with the other
ids without bailing out with an error.
So we can build for the 'fsl,imx-sc-thermal' a compatible data, an
array of sensor ids containing IMX_SC_R_SYSTEM and IMX_SC_R_PMIC_0.
The latter won't be found but that will not result in an error but a
normal case where we continue the initialization with other ids.
Just to clarify, it is what the thermal framework does and what the
other drivers are expecting: when a registration fails with -ENODEV
this is not an error but a case where the description is not found in
the device tree, that be can the entire thermal zones description or a
specific thermal zone with an unknown id.
There is one small functional change but without impact. When there is
no 'thermal-zones' description the probe function was returning
'-ENODEV', now it returns zero. When a thermal zone fails to register
with an error different from '-ENODEV', the error is detected and
returned.
Change the code accordingly and remove the OF code from the driver.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220818082316.2717095-1-daniel.lezcano@linaro.org
Merge thermal control driver changes for 6.1-rc1:
- Use module_pci_driver() macro in the int340x processor_thermal
driver (Shang XiaoJing).
- Use get_cpu() instead of smp_processor_id() in the intel_powerclamp
thermal driver to prevent it from crashing and remove unused
accounting for IRQ wakes from it (Srinivas Pandruvada).
- Consolidate priv->data_vault checks in int340x_thermal (Rafael
Wysocki).
- Check the policy first in cpufreq_cooling_register() (Xuewen Yan).
- Drop redundant error message from da9062-thermal (zhaoxiao).
- Drop of_match_ptr() from thermal_mmio (Jean Delvare).
* thermal-intel:
thermal: int340x: processor_thermal: Use module_pci_driver() macro
thermal: intel_powerclamp: Remove accounting for IRQ wakes
thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash
thermal: int340x_thermal: Consolidate priv->data_vault checks
* thermal-drivers:
thermal: cpufreq_cooling: Check the policy first in cpufreq_cooling_register()
thermal: da9062-thermal: Drop redundant error message
thermal/drivers/thermal_mmio: Drop of_match_ptr()
Merge core thermal control changes for 6.1-rc1:
- Increase maximum number of trip points in the thermal core (Sumeet
Pawnikar).
- Replace strlcpy() with unused retval with strscpy() in the core
thermal control code (Wolfram Sang).
- Do not lock thermal zone mutex in the user space governor (Rafael
Wysocki)
- Rework the device tree initialization, convert the drivers to the
new API and remove the old OF code (Daniel Lezcano)
- Fix return value to -ENODEV when searching for a specific thermal
zone which does not exist (Daniel Lezcano)
- Fix the return value inspection in of_thermal_zone_find() (Dan
Carpenter)
- Fix kernel panic when KASAN is enabled as it detects use after
free when unregistering a thermal zone (Daniel Lezcano)
- Move the set_trip ops inside the therma sysfs code (Daniel Lezcano)
- Remove unnecessary error message as it is already showed in the
underlying function (Jiapeng Chong)
- Rework the monitoring path and move the locks upper in the call
stack to fix some potentials race windows (Daniel Lezcano)
- Fix lockdep_assert() warning introduced by the lock rework (Daniel
Lezcano)
- Revert the Mellanox 'hotter thermal zone' feature because it is
already handled in the thermal framework core code (Daniel Lezcano)
* thermal-core: (47 commits)
thermal: core: Increase maximum number of trip points
thermal: move from strlcpy() with unused retval to strscpy()
thermal: gov_user_space: Do not lock thermal zone mutex
Revert "mlxsw: core: Add the hottest thermal zone detection"
thermal/core: Fix lockdep_assert() warning
thermal/core: Move the mutex inside the thermal_zone_device_update() function
thermal/core: Move the thermal zone lock out of the governors
thermal/governors: Group the thermal zone lock inside the throttle function
thermal/core: Rework the monitoring a bit
thermal/core: Rearm the monitoring only one time
thermal/drivers/qcom/spmi-adc-tm5: Remove unnecessary print function dev_err()
thermal/of: Remove old OF code
thermal/core: Move set_trip_temp ops to the sysfs code
thermal/drivers/samsung: Switch to new of thermal API
regulator/drivers/max8976: Switch to new of thermal API
Input: sun4i-ts - switch to new of thermal API
iio/drivers/sun4i_gpadc: Switch to new of thermal API
hwmon/drivers/core: Switch to new of thermal API
hwmon: pm_bus: core: Switch to new of thermal API
ata/drivers/ahci_imx: Switch to new of thermal API
...
On one of the Chrome system, if we define more than 12 trip points,
probe for thermal sensor fails with
"int3403 thermal: probe of INTC1046:03 failed with error -22"
and throws an error as
"thermal_sys: Error: Incorrect number of thermal trips".
The thermal_zone_device_register() interface needs maximum
number of trip points supported in a zone as an argument.
This number can't exceed THERMAL_MAX_TRIPS, which is currently
set to 12. To address this issue, THERMAL_MAX_TRIPS value
has to be increased.
This interface also has an argument to specify a mask of trips
which are writable. This mask is defined as an int.
This mask sets the ceiling for increasing maximum number of
supported trips. With the current implementation, maximum number
of trips can be supported is 31.
Also, THERMAL_MAX_TRIPS macro is used in one place only.
So, remove THERMAL_MAX_TRIPS macro and compare num_trips
directly with using a macro BITS_PER_TYPE(int)-1.
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since PCI provides helper macro module_pci_driver(), the
module_init/exit code can be replaced with it.
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is a static variable "idle_wakeup_counter", which accounts for
number of wake ups because of IRQs and take actions to compensate idle
injection. This is now read and reset to 0, but never incremented.
So all the usage of this counter for idle injection has no use.
Also another static variable "reduce_irq", which depends on
"idle_wakeup_counter", so remove usage of "reduce_irq" also.
Commit feb6cd6a0f ("thermal/intel_powerclamp: stop sched tick in
forced idle") replaced the local use of "mwait_idle_with_hints" with
play_idle(). This removed possibility of updating "idle_wakeup_counter"
without change in play_idle(). This change was made in Linux 4.10.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When CPU 0 is offline and intel_powerclamp is used to inject
idle, it generates kernel BUG:
BUG: using smp_processor_id() in preemptible [00000000] code: bash/15687
caller is debug_smp_processor_id+0x17/0x20
CPU: 4 PID: 15687 Comm: bash Not tainted 5.19.0-rc7+ #57
Call Trace:
<TASK>
dump_stack_lvl+0x49/0x63
dump_stack+0x10/0x16
check_preemption_disabled+0xdd/0xe0
debug_smp_processor_id+0x17/0x20
powerclamp_set_cur_state+0x7f/0xf9 [intel_powerclamp]
...
...
Here CPU 0 is the control CPU by default and changed to the current CPU,
if CPU 0 offlined. This check has to be performed under cpus_read_lock(),
hence the above warning.
Use get_cpu() instead of smp_processor_id() to avoid this BUG.
Suggested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is sufficient to check priv->data_vault once in the error code path
of int3400_thermal_probe(), so do that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since the policy needs to be accessed first when obtaining cpu devices,
first check whether the policy is legal before this.
Fixes: 5130802ddb ("thermal: cpu_cooling: Switch to QoS requests for freq limits")
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Drop duplicate words from two kerneldoc comments in the thermal
subsystem.
Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
[ rjw: Subject edits and changelog rewrite ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since platform_get_irq() already prints an error message on failure, it
is not necessary to print another one for the same purpose.
Signed-off-by: zhaoxiao <zhaoxiao@uniontech.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that the driver depends on OF, we know what of_match_ptr() will
always resolve to, so we might as well save cpp some work.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 670a5e356c ("thermal/core: Move the thermal zone lock out of
the governors") moved thermal zone locking away from governors, but it
forgot about the user space one which deadlocks now.
Fix this by removing the thermal zone locking from the user space
governor.
Fixes: 670a5e356c ("thermal/core: Move the thermal zone lock out of the governors")
Tested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In some case, the GDDV returns a package with a buffer which has
zero length. It causes that kmemdup() returns ZERO_SIZE_PTR (0x10).
Then the data_vault_read() got NULL point dereference problem when
accessing the 0x10 value in data_vault.
[ 71.024560] BUG: kernel NULL pointer dereference, address:
0000000000000010
This patch uses ZERO_OR_NULL_PTR() for checking ZERO_SIZE_PTR or
NULL value in data_vault.
Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The function thermal_zone_device_is_enabled() must be called with the
thermal zone lock held. In the resume path, it is called without.
As the thermal_zone_device_is_enabled() is also checked in
thermal_zone_device_update(), do the check in resume() function is
pointless, except for saving an extra initialization which does not
hurt if it is done in all the cases.
Fixes: ca48ad71717dd ("thermal/core: Move the mutex inside the thermal_zone_device_update() function")
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
All the different calls inside the thermal_zone_device_update()
function take the mutex.
The previous changes move the mutex out of the different functions,
like the throttling ops. Now that the mutexes are all at the same
level in the call stack for the thermal_zone_device_update() function,
they can be moved inside this one.
That has the benefit of:
1. Simplify the code by not having a plethora of places where the lock is taken
2. Probably closes more race windows because releasing the lock from
one line to another can give the opportunity to the thermal zone to change
its state in the meantime. For example, the thermal zone can be
enabled right after checking it is disabled.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220805153834.2510142-5-daniel.lezcano@linaro.org
The should_stop_polling() function wraps the function
thermal_zone_device_is_enabled().
The monitor_thermal_zone() function checks if the thermal zone is
enabled via the should_stop_polling() function.
However, the instant after checking the thermal zone is enabled, this
one can be disabled, so even if that reduces the race window, it does
not prevent that and the monitoring can be set again with the thermal
zone disabled.
For this reason, the function should_stop_polling() is replaced by a
direct check of the thermal zone mode with the mutex locks held, that
prevents the situation described above.
As the semantic is clear with the thermal_zone_is_enabled() function,
we can remove the should_stop_polling() function and replace the check
with the former function.
While at it, reorder the checks to improve the readability of the
monitor_thermal_zone() function.
In the future, the thermal_zone_device_disable() and the
thermal_zone_device_enable() functions should unset / set the polling
timer directly instead of relying on the next
thermal_zone_device_update() call to do that. That will make a
synchronous thermal zone mode change but the locking scheme should be
double checked for that which out of the scope of this change.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220805153834.2510142-2-daniel.lezcano@linaro.org
The current code calls monitor_thermal_zone() inside the
handle_thermal_trip() function. But this one is called in a loop for
each trip point which means the monitoring is rearmed several times
for nothing (assuming there could be several passive and active trip
points).
Move the monitor_thermal_zone() function out of the
handle_thermal_trip() function and after the thermal trip loop, so the
timer will be disabled or rearmed one time.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220805153834.2510142-1-daniel.lezcano@linaro.org
The thermal OF code has a new API allowing to migrate the OF
initialization to a simpler approach. The ops are no longer device
tree specific and are the generic ones provided by the core code.
Convert the ops to the thermal_zone_device_ops format and use the new
API to register the thermal zone with these generic ops.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220804224349.1926752-24-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The thermal OF code has a new API allowing to migrate the OF
initialization to a simpler approach. The ops are no longer device
tree specific and are the generic ones provided by the core code.
Convert the ops to the thermal_zone_device_ops format and use the new
API to register the thermal zone with these generic ops.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220804224349.1926752-23-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The thermal OF code has a new API allowing to migrate the OF
initialization to a simpler approach. The ops are no longer device
tree specific and are the generic ones provided by the core code.
Convert the ops to the thermal_zone_device_ops format and use the new
API to register the thermal zone with these generic ops.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220804224349.1926752-22-daniel.lezcano@linexp.org
Reviewed-by: Bryan Brattlof <bb@ti.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The thermal OF code has a new API allowing to migrate the OF
initialization to a simpler approach. The ops are no longer device
tree specific and are the generic ones provided by the core code.
Convert the ops to the thermal_zone_device_ops format and use the new
API to register the thermal zone with these generic ops.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220804224349.1926752-20-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The thermal OF code has a new API allowing to migrate the OF
initialization to a simpler approach. The ops are no longer device
tree specific and are the generic ones provided by the core code.
Convert the ops to the thermal_zone_device_ops format and use the new
API to register the thermal zone with these generic ops.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220804224349.1926752-19-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The thermal OF code has a new API allowing to migrate the OF
initialization to a simpler approach. The ops are no longer device
tree specific and are the generic ones provided by the core code.
Convert the ops to the thermal_zone_device_ops format and use the new
API to register the thermal zone with these generic ops.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220804224349.1926752-17-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The thermal OF code has a new API allowing to migrate the OF
initialization to a simpler approach. The ops are no longer device
tree specific and are the generic ones provided by the core code.
Convert the ops to the thermal_zone_device_ops format and use the new
API to register the thermal zone with these generic ops.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220804224349.1926752-16-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The thermal OF code has a new API allowing to migrate the OF
initialization to a simpler approach. The ops are no longer device
tree specific and are the generic ones provided by the core code.
Convert the ops to the thermal_zone_device_ops format and use the new
API to register the thermal zone with these generic ops.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220804224349.1926752-15-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The thermal OF code has a new API allowing to migrate the OF
initialization to a simpler approach. The ops are no longer device
tree specific and are the generic ones provided by the core code.
Convert the ops to the thermal_zone_device_ops format and use the new
API to register the thermal zone with these generic ops.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220804224349.1926752-13-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The thermal OF code has a new API allowing to migrate the OF
initialization to a simpler approach. The ops are no longer device
tree specific and are the generic ones provided by the core code.
Convert the ops to the thermal_zone_device_ops format and use the new
API to register the thermal zone with these generic ops.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220804224349.1926752-10-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The thermal OF code has a new API allowing to migrate the OF
initialization to a simpler approach. The ops are no longer device
tree specific and are the generic ones provided by the core code.
Convert the ops to the thermal_zone_device_ops format and use the new
API to register the thermal zone with these generic ops.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220804224349.1926752-9-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The thermal OF code has a new API allowing to migrate the OF
initialization to a simpler approach. The ops are no longer device
tree specific and are the generic ones provided by the core code.
Convert the ops to the thermal_zone_device_ops format and use the new
API to register the thermal zone with these generic ops.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220804224349.1926752-8-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>