linux/drivers/thermal
Rafael J. Wysocki e528be3c87 thermal: core: Allow thermal zones to tell the core to ignore them
The iwlwifi wireless driver registers a thermal zone that is only needed
when the network interface handled by it is up and it wants that thermal
zone to be effectively ignored by the core otherwise.

Before commit a8a2617744 ("thermal: core: Call monitor_thermal_zone()
if zone temperature is invalid") that could be achieved by returning
an error code from the thermal zone's .get_temp() callback because the
core did not really handle errors returned by it almost at all.
However, commit a8a2617744 made the core attempt to recover from the
situation in which the temperature of a thermal zone cannot be
determined due to errors returned by its .get_temp() and is always
invalid from the core's perspective.

That was done because there are thermal zones in which .get_temp()
returns errors to start with due to some difficulties related to the
initialization ordering, but then it will start to produce valid
temperature values at one point.

Unfortunately, the simple approach taken by commit a8a2617744,
which is to poll the thermal zone periodically until its .get_temp()
callback starts to return valid temperature values, is at odds with
the special thermal zone in iwlwifi in which .get_temp() may always
return an error because its network interface may always be down.  If
that happens, every attempt to invoke the thermal zone's .get_temp()
callback resulting in an error causes the thermal core to print a
dev_warn() message to the kernel log which is super-noisy.

To address this problem, make the core handle the case in which
.get_temp() returns 0, but the temperature value returned by it
is not actually valid, in a special way.  Namely, make the core
completely ignore the invalid temperature value coming from
.get_temp() in that case, which requires folding in
update_temperature() into its caller and a few related changes.

On the iwlwifi side, modify iwl_mvm_tzone_get_temp() to return 0
and put THERMAL_TEMP_INVALID into the temperature return memory
location instead of returning an error when the firmware is not
running or it is not of the right type.

Also, to clearly separate the handling of invalid temperature
values from the thermal zone initialization, introduce a special
THERMAL_TEMP_INIT value specifically for the latter purpose.

Fixes: a8a2617744 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid")
Closes: https://lore.kernel.org/linux-pm/20240715044527.GA1544@sol.localdomain/
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201761
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4950004.31r3eYUQgx@rjwysocki.net
[ rjw: Rebased on top of the current mainline ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-18 13:35:55 +02:00
..
broadcom thermal/drivers/broadcom: Simplify with dev_err_probe() 2024-07-15 13:31:40 +02:00
intel Merge branch 'thermal-intel' 2024-07-15 20:44:31 +02:00
mediatek thermal/drivers/mediatek/lvts_thermal: Provide default calibration data 2024-07-15 13:31:39 +02:00
qcom Merge branch 'thermal-core' 2024-07-15 20:43:21 +02:00
renesas thermal/drivers/renesas/rcar: Add dependency on OF 2024-07-15 13:31:39 +02:00
samsung thermal/drivers/exynos: Simplify with dev_err_probe() 2024-07-15 13:31:40 +02:00
st thermal/drivers/sti: Cleanup code related to stih416 2024-07-15 13:31:41 +02:00
tegra thermal: trip: Pass trip pointer to .set_trip_temp() thermal zone callback 2024-07-12 15:14:01 +02:00
ti-soc-thermal
amlogic_thermal.c thermal/drivers/amlogic: Support A1 SoC family Thermal Sensor controller 2024-04-23 12:40:29 +02:00
armada_thermal.c thermal/drivers/armada: Simplify name sanitization 2024-04-23 12:40:29 +02:00
cpufreq_cooling.c thermal/cpufreq: Remove arch_update_thermal_pressure() 2024-04-24 12:08:00 +02:00
cpuidle_cooling.c thermal: cpuidle_cooling: fix kernel-doc warning and a spello 2023-12-21 12:05:48 +01:00
da9062-thermal.c thermal: core: Eliminate writable trip points masks 2024-02-27 12:04:38 +01:00
db8500_thermal.c
devfreq_cooling.c thermal: devfreq_cooling: Fix perf state when calculate dfc res_util 2024-03-27 16:27:39 +01:00
dove_thermal.c
gov_bang_bang.c thermal: gov_bang_bang: Drop unnecessary cooling device target state checks 2024-06-11 21:06:44 +02:00
gov_fair_share.c thermal: gov_fair_share: Eliminate unnecessary integer divisions 2024-04-24 10:15:08 +02:00
gov_power_allocator.c thermal: gov_power_allocator: Return early in manage if trip_max is NULL 2024-07-04 13:35:50 +02:00
gov_step_wise.c thermal: gov_step_wise: Go straight to instance->lower when mitigation is over 2024-06-25 14:37:05 +02:00
gov_user_space.c thermal: gov_user_space: Use .trip_crossed() instead of .throttle() 2024-04-24 20:42:10 +02:00
hisi_thermal.c thermal/drivers/hisi: Simplify with dev_err_probe() 2024-07-15 13:31:40 +02:00
imx8mm_thermal.c
imx_sc_thermal.c
imx_thermal.c Merge branch 'thermal-core' 2024-07-15 20:43:21 +02:00
k3_bandgap.c thermal/drivers/k3_bandgap: Remove some unused fields in struct k3_bandgap 2024-04-23 12:40:29 +02:00
k3_j72xx_bandgap.c thermal/drivers/k3_j72xx_bandgap: Implement suspend/resume support 2024-07-15 13:31:39 +02:00
Kconfig thermal/drivers/renesas: Group all renesas thermal drivers together 2024-07-15 13:31:38 +02:00
khadas_mcu_fan.c
kirkwood_thermal.c
loongson2_thermal.c thermal/drivers/loongson2: Add Loongson-2K2000 support 2024-04-23 12:40:30 +02:00
Makefile thermal/drivers/renesas: Group all renesas thermal drivers together 2024-07-15 13:31:38 +02:00
max77620_thermal.c
qoriq_thermal.c thermal/drivers/qoriq: Fix getting tmu range 2024-03-11 17:14:46 +01:00
rockchip_thermal.c
spear_thermal.c
sprd_thermal.c
sun8i_thermal.c thermal/drivers/sun8i: Don't fail probe due to zone registration failure 2024-03-11 17:14:46 +01:00
thermal_core.c thermal: core: Allow thermal zones to tell the core to ignore them 2024-07-18 13:35:55 +02:00
thermal_core.h thermal: core: Allow thermal zones to tell the core to ignore them 2024-07-18 13:35:55 +02:00
thermal_debugfs.c thermal: trip: Use common set of trip type names 2024-06-11 21:04:40 +02:00
thermal_debugfs.h thermal/debugfs: Do not extend mitigation episodes beyond system resume 2024-06-11 21:04:00 +02:00
thermal_helpers.c thermal: core: Allow thermal zones to tell the core to ignore them 2024-07-18 13:35:55 +02:00
thermal_hwmon.c thermal: core: Store zone ops in struct thermal_zone_device 2024-02-23 18:24:48 +01:00
thermal_hwmon.h
thermal_mmio.c
thermal_netlink.c Merge branch 'thermal-intel' into thermal 2024-04-15 15:45:32 +02:00
thermal_netlink.h thermal: netlink: Add genetlink bind/unbind notifications 2024-03-27 14:50:26 +01:00
thermal_of.c thermal/of: Assume polling-delay(-passive) 0 when absent 2024-03-11 17:14:46 +01:00
thermal_sysfs.c thermal: trip: Pass trip pointer to .set_trip_temp() thermal zone callback 2024-07-12 15:14:01 +02:00
thermal_trace_ipa.h thermal: core: Make struct thermal_zone_device definition internal 2024-04-08 16:01:20 +02:00
thermal_trace.h tracing/treewide: Remove second parameter of __assign_str() 2024-05-22 20:14:47 -04:00
thermal_trip.c thermal: trip: Fold __thermal_zone_get_trip() into its caller 2024-07-12 15:14:56 +02:00
thermal-generic-adc.c thermal/drivers/generic-adc: Simplify with dev_err_probe() 2024-07-15 13:31:41 +02:00
uniphier_thermal.c thermal: uniphier: Use thermal_zone_for_each_trip() for walking trip points 2024-07-04 13:25:31 +02:00