The nominal speed of each fan can be obtained with
i8k_get_fan_nominal_speed(), however the result is not available
from userspace.
Change that by adding fanX_min, fanX_max and fanX_target attributes.
All are RO since fan control happens over pwm.
Tested on a Dell Inspiron 3505 and a Dell Latitude C600.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Reviewed-by: Pali Rohár <pali@kernel.org>
Link: https://lore.kernel.org/r/20210926221044.14327-2-W_Armin@gmx.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Recent patch added possibility to disable selected channels. That would
only make sure that the ENODATA is returned for those channels but would
not configure the actual hardware.
With this patch, the config register is written to make sure the
channels are disabled also at hardware level.
Signed-off-by: Krzysztof Adamski <krzysztof.adamski@nokia.com>
Link: https://lore.kernel.org/r/d451cacdf21bf8eff38a47c055aad8c0c6e8755a.1634206677.git.krzysztof.adamski@nokia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Previous patches added a way to specify some channel specific parameters
in DT and n-factor is definitely one of them. This calibration mechanism
is board specific as its value depends on the diodes/transistors being
connected to the sensor and thus the DT seems like a right fit for that
information. It is very similar to the value of shunt resistor that some
drivers allows specifying in DT.
This patch adds a possibility to set n-factor for each channel via
"n-factor" DT property in each channel subnode.
Signed-off-by: Krzysztof Adamski <krzysztof.adamski@nokia.com>
Link: https://lore.kernel.org/r/69d0bfcc5ba27c67f21d3eabfb100656a14c75b9.1634206677.git.krzysztof.adamski@nokia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The previous patch introduced per channel subnodes in DT that let us
specify some channel specific properties. This built a ground for easily
disabling individual channels of the sensor that may not be connected to
any external diode and thus are not returning any meaningful data.
This patch adds support for parsing the "status" property of channels DT
subnodes and makes sure the -ENODATA is returned when disabled channels
value is read.
Signed-off-by: Krzysztof Adamski <krzysztof.adamski@nokia.com>
Link: https://lore.kernel.org/r/a85d623f0792b862870933a875bdf802f4c017f1.1634206677.git.krzysztof.adamski@nokia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
tmp42x is a multichannel temperature sensor with several external
channels. Since those channels can be used to connect diodes placed
anywhere in the system, their meaning will vary depending on the
project. For this case, the hwmon framework has an idea of labels which
allows us to assign the meaning to each channel.
The similar concept is already implemented in ina3221 - the label for
each channel can be defined via device tree. See commit a9e9dd9c6d
("hwmon: (ina3221) Read channel input source info from DT")
This patch adds support for similar feature to tmp421.
Signed-off-by: Krzysztof Adamski <krzysztof.adamski@nokia.com>
Link: https://lore.kernel.org/r/dab0fda6ac0e8d7f163c3762a7fb1e595a4d8159.1634206677.git.krzysztof.adamski@nokia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Up to now adt7x10_remove() returns zero unconditionally. Make it return
void instead which makes it easier to see in the callers that there is
no error to handle.
Also the return value of i2c and spi remove callbacks is ignored anyway.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20211011132754.2479853-3-u.kleine-koenig@pengutronix.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
We have bool so use it consistently in all the drivers.
The following Coccinelle script was used:
@@
identifier T;
type t = { char, int };
@@
struct T {
...
- t valid;
+ bool valid;
...
}
@@
identifier v;
@@
(
- v->valid = 0
+ v->valid = false
|
- v->valid = 1
+ v->valid = true
)
followed by sed to fixup the comments:
sed '/bool valid;/{s/!=0/true/;s/zero/false/}'
Few whitespace changes were fixed manually. All modified drivers were
compile-tested.
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Link: https://lore.kernel.org/r/20210924195202.27917-1-fercerpav@gmail.com
[groeck: Fixed up 'u8 valid' to 'boool valid' in atxp1.c]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Provide different names for cooling devices registration to allow
binding each cooling devices to relevant thermal zone. Thus, specific
cooling device can be associated with related thermal sensor by setting
thermal cooling device type for example to "mlxreg_fan2" and passing
this type to thermal_zone_bind_cooling_device() through 'cdev->type'.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210926053541.1806937-3-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Validate PWM connectivity only for additional PWM - "pwm1" is connected
on all systems, while "pwm2" - "pwm4" are optional. Validate
connectivity only for optional attributes by reading of related "pwm{n}"
registers - in case "pwm{n}" is not connected, register value is
supposed to be 0xff.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210926053541.1806937-2-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
ASUS Pro WS X570-ACE board has got an nct6775 chip, but by default
there's no use of it because of resource conflict:
```
ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\AMW0.SHWM) (20210604/utaddress-204
)
ACPI: OSL: Resource conflict; ACPI support missing from driver?
ACPI: OSL: Resource conflict: System may be unstable or behave erratically
```
A workaround is to use `acpi_enforce_resources=lax`, but a proper
support needs to be added instead.
This commit adds Pro WS X570-ACE to the list of boards that can be monitored
using ASUS WMI.
Tested by me on this hardware:
```
Base Board Information
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: Pro WS X570-ACE
BIOS Information
Vendor: American Megatrends Inc.
Version: 3801
Release Date: 07/30/2021
```
Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Link: https://lore.kernel.org/r/20211003133344.9036-2-oleksandr@natalenko.name
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
There are multiple power supplies that will indicate
CFFPS_CCIN_VERSION_1, use the manufacturer ID to determine if it should
be treated as version cffps1 or version cffps2.
Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
Link: https://lore.kernel.org/r/20211004144339.2634330-2-bjwyman@gmail.com
[groeck: Fixed continuation line alignment]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The appropriate mantissa values for the lm25066 family's direct-format
current and power readings are a function of the sense resistor
employed between the SENSE and VIN pins of the chip. Instead of
assuming that resistance is always the same 1mOhm as used in the
datasheet, allow it to be configured via a device-tree property
("shunt-resistor-micro-ohms").
Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
Link: https://lore.kernel.org/r/20210928092242.30036-8-zev@bewilderbeest.net
[groeck: Fixed checkpatch warnings]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The driver doesn't have a struct of_device_id table but supported devices
are registered via Device Trees. This is working on the assumption that a
I2C device registered via OF will always match a legacy I2C device ID and
that the MODALIAS reported will always be of the form i2c:<device>.
But this could change in the future so the correct approach is to have an
OF device ID table if the devices are registered via OF.
Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
Link: https://lore.kernel.org/r/20210928092242.30036-7-zev@bewilderbeest.net
[groeck: Replaced reference to reasoning with reasoning, fixed checkpatch
warnings, fixed compile warning comparing of_id->data w/ i2c_id->driver_data]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Maintaining this manually is error prone (there are currently only
five chips supported, not six); gcc can do it for us automatically.
Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
Fixes: 666c14906b ("hwmon: (pmbus/lm25066) Drop support for LM25063")
Link: https://lore.kernel.org/r/20210928092242.30036-5-zev@bewilderbeest.net
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
At least as of Revision J, the datasheet has a slightly different
value than what we'd had in the driver.
Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
Link: https://lore.kernel.org/r/20210928092242.30036-3-zev@bewilderbeest.net
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
With the exception of the lm5066i, all the devices handled by this
driver had been missing their offset ('b') coefficients for direct
format readings.
Cc: stable@vger.kernel.org
Fixes: 58615a94f6 ("hwmon: (pmbus/lm25066) Add support for LM25056")
Fixes: e53e6497fc ("hwmon: (pmbus/lm25066) Refactor device specific coefficients")
Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
Link: https://lore.kernel.org/r/20210928092242.30036-2-zev@bewilderbeest.net
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
There are few places where the maximal number of channels is used define
the size of arrays but when raw number is used it is not clear that they
really related to this quantity.
This commit introduces MAX_CHANNELS define and uses it those places to
give some context to the number.
Signed-off-by: Krzysztof Adamski <krzysztof.adamski@nokia.com>
Link: https://lore.kernel.org/r/abc1a213a25b890b799b35ad94bb543a2ade7fc8.1632473318.git.krzysztof.adamski@nokia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Array fan->pwm[] is MLXREG_FAN_MAX_PWM elements in size, however the
for-loop has a off-by-one error causing index i to be out of range
causing an out of bounds read on the array. Fix this by replacing
the <= operator with < in the for-loop.
Addresses-Coverity: ("Out-of-bounds read")
Reported-by: Vadim Pasternak <vadimp@nvidia.com>
Fixes: 35edbaab3bbf ("hwmon: (mlxreg-fan) Extend driver to support multiply cooling devices")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210920180921.16246-1-colin.king@canonical.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Support accessing the NCT677x via Asus WMI functions.
On mainboards that support this way of accessing the chip, the driver will
usually not work without this option since in these mainboards, ACPI will
mark the I/O port as used.
Code uses ACPI firmware interface to communicate with sensors with ASUS
motherboards:
* PRIME B460-PLUS,
* ROG CROSSHAIR VIII IMPACT,
* ROG STRIX B550-E GAMING,
* ROG STRIX B550-F GAMING,
* ROG STRIX B550-F GAMING (WI-FI),
* ROG STRIX Z490-I GAMING,
* TUF GAMING B550M-PLUS,
* TUF GAMING B550M-PLUS (WI-FI),
* TUF GAMING B550-PLUS,
* TUF GAMING X570-PLUS,
* TUF GAMING X570-PRO (WI-FI).
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204807
Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
Co-developed-by: Bernhard Seibold <mail@bernhard-seibold.de>
Signed-off-by: Bernhard Seibold <mail@bernhard-seibold.de>
Tested-by: Pär Ekholm <pehlm@pekholm.org>
Tested-by: <to.eivind@gmail.com>
Tested-by: Artem S. Tashkinov <aros@gmx.com>
Tested-by: Vittorio Roberto Alfieri <me@rebtoor.com>
Tested-by: Sahan Fernando <sahan.h.fernando@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20210917220240.56553-4-pauk.denis@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Prepare for platform specific callbacks usage:
* Rearrange code for directly use struct nct6775_sio_data in superio_*()
functions.
* Use superio function pointers in nct6775_sio_data struct instead direct
calls.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204807
Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
Co-developed-by: Bernhard Seibold <mail@bernhard-seibold.de>
Signed-off-by: Bernhard Seibold <mail@bernhard-seibold.de>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20210917220240.56553-2-pauk.denis@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Add support for additional cooling devices in order to support the
systems, which can be equipped with up-to four PWM controllers.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Add additional PWM attributes in order to support the systems, which
can be equipped with up-to four PWM controllers. System capability of
additional PWM support is validated through the reading of relevant
registers.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210916194719.871413-3-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Extend support of maximum tachometers from 12 to 14 in order to support
new systems, equipped with more fans.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210916194719.871413-2-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Use hwmon_notify_event() to make the code easier to
understand and to also generate udev events.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20210905190049.11381-1-W_Armin@gmx.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Use devm_hwmon_device_register_with_info() to simplify code
and use register defines instead of hardcoded values.
Also use the BIT() macro for the alarms.
Only compile-tested.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20210823170724.7662-1-W_Armin@gmx.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
I got memory leak as follows when doing fault injection test:
unreferenced object 0xffff888102740438 (size 8):
comm "27", pid 859, jiffies 4295031351 (age 143.992s)
hex dump (first 8 bytes):
68 77 6d 6f 6e 30 00 00 hwmon0..
backtrace:
[<00000000544b5996>] __kmalloc_track_caller+0x1a6/0x300
[<00000000df0d62b9>] kvasprintf+0xad/0x140
[<00000000d3d2a3da>] kvasprintf_const+0x62/0x190
[<000000005f8f0f29>] kobject_set_name_vargs+0x56/0x140
[<00000000b739e4b9>] dev_set_name+0xb0/0xe0
[<0000000095b69c25>] __hwmon_device_register+0xf19/0x1e50 [hwmon]
[<00000000a7e65b52>] hwmon_device_register_with_info+0xcb/0x110 [hwmon]
[<000000006f181e86>] devm_hwmon_device_register_with_info+0x85/0x100 [hwmon]
[<0000000081bdc567>] tmp421_probe+0x2d2/0x465 [tmp421]
[<00000000502cc3f8>] i2c_device_probe+0x4e1/0xbb0
[<00000000f90bda3b>] really_probe+0x285/0xc30
[<000000007eac7b77>] __driver_probe_device+0x35f/0x4f0
[<000000004953d43d>] driver_probe_device+0x4f/0x140
[<000000002ada2d41>] __device_attach_driver+0x24c/0x330
[<00000000b3977977>] bus_for_each_drv+0x15d/0x1e0
[<000000005bf2a8e3>] __device_attach+0x267/0x410
When device_register() returns an error, the name allocated in
dev_set_name() will be leaked, the put_device() should be used
instead of calling hwmon_dev_release() to give up the device
reference, then the name will be freed in kobject_cleanup().
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: bab2243ce1 ("hwmon: Introduce hwmon_device_register_with_groups")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20211012112758.2681084-1-yangyingliang@huawei.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
If driver read tmp value sufficient for
(tmp & 0x08) && (!(tmp & 0x80)) && ((tmp & 0x7) == ((tmp >> 4) & 0x7))
from device then Null pointer dereference occurs.
(It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers)
Also lm75[] does not serve a purpose anymore after switching to
devm_i2c_new_dummy_device() in w83791d_detect_subclients().
The patch fixes possible NULL pointer dereference by removing lm75[].
Found by Linux Driver Verification project (linuxtesting.org).
Cc: stable@vger.kernel.org
Signed-off-by: Nadezda Lutovinova <lutovinova@ispras.ru>
Link: https://lore.kernel.org/r/20210921155153.28098-3-lutovinova@ispras.ru
[groeck: Dropped unnecessary continuation lines, fixed multi-line alignments]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
If driver read val value sufficient for
(val & 0x08) && (!(val & 0x80)) && ((val & 0x7) == ((val >> 4) & 0x7))
from device then Null pointer dereference occurs.
(It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers)
Also lm75[] does not serve a purpose anymore after switching to
devm_i2c_new_dummy_device() in w83791d_detect_subclients().
The patch fixes possible NULL pointer dereference by removing lm75[].
Found by Linux Driver Verification project (linuxtesting.org).
Cc: stable@vger.kernel.org
Signed-off-by: Nadezda Lutovinova <lutovinova@ispras.ru>
Link: https://lore.kernel.org/r/20210921155153.28098-2-lutovinova@ispras.ru
[groeck: Dropped unnecessary continuation lines, fixed multipline alignment]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
If driver read val value sufficient for
(val & 0x08) && (!(val & 0x80)) && ((val & 0x7) == ((val >> 4) & 0x7))
from device then Null pointer dereference occurs.
(It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers)
Also lm75[] does not serve a purpose anymore after switching to
devm_i2c_new_dummy_device() in w83791d_detect_subclients().
The patch fixes possible NULL pointer dereference by removing lm75[].
Found by Linux Driver Verification project (linuxtesting.org).
Cc: stable@vger.kernel.org
Signed-off-by: Nadezda Lutovinova <lutovinova@ispras.ru>
Link: https://lore.kernel.org/r/20210921155153.28098-1-lutovinova@ispras.ru
[groeck: Dropped unnecessary continuation lines, fixed multi-line alignment]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Add missed attribute for reading POUT from page 1.
It is supported by device, but has been missed in initial commit.
Fixes: 2c6fcbb211 ("hwmon: (pmbus) Add support for MPS Multi-phase mp2975 controller")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210927070740.2149290-1-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The bytes for max_power_out from the ibm-cffps devices differ in byte
order for some power supplies.
The Witherspoon power supply returns the bytes in MSB/LSB order.
The Rainier power supply returns the bytes in LSB/MSB order.
The Witherspoon power supply uses version cffps1. The Rainier power
supply should use version cffps2. If version is cffps1, swap the bytes
before output to max_power_out.
Tested:
Witherspoon before: 3148. Witherspoon after: 3148.
Rainier before: 53255. Rainier after: 2000.
Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20210928205051.1222815-1-bjwyman@gmail.com
[groeck: Replaced yoda programming]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The P10 (temp sensor version 0x10) doesn't do the same VRM status
reporting that was used on P9. It just reports the temperature, so
drop the check for VRM fru type in the sysfs show function, and don't
set the name to "alarm".
Fixes: db4919ec86 ("hwmon: (occ) Add new temperature sensor type")
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20210929153604.14968-1-eajames@linux.ibm.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The return value of devm_clk_get should in general be propagated to
upper layer. In this case the clk is optional, use the appropriate
wrapper instead of interpreting all errors as "The optional clk is not
available".
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20210923201113.398932-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Old code produces -24999 for 0b1110011100000000 input in standard format due to
always rounding up rather than "away from zero".
Use the common macro for division, unify and simplify the conversion code along
the way.
Fixes: 9410700b88 ("hwmon: Add driver for Texas Instruments TMP421/422/423 sensor chips")
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Link: https://lore.kernel.org/r/20210924093011.26083-3-fercerpav@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
For both local and remote sensors all the supported ICs can report an
"undervoltage lockout" condition which means the conversion wasn't
properly performed due to insufficient power supply voltage and so the
measurement results can't be trusted.
Fixes: 9410700b88 ("hwmon: Add driver for Texas Instruments TMP421/422/423 sensor chips")
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Link: https://lore.kernel.org/r/20210924093011.26083-2-fercerpav@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Function i2c_smbus_read_byte_data() can return a negative error number
instead of the data read if I2C transaction failed for whatever reason.
Lack of error checking can lead to serious issues on production
hardware, e.g. errors treated as temperatures produce spurious critical
temperature-crossed-threshold errors in BMC logs for OCP server
hardware. The patch was tested with Mellanox OCP Mezzanine card
emulating TMP421 protocol for temperature sensing which sometimes leads
to I2C protocol error during early boot up stage.
Fixes: 9410700b88 ("hwmon: Add driver for Texas Instruments TMP421/422/423 sensor chips")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Link: https://lore.kernel.org/r/20210924093011.26083-1-fercerpav@gmail.com
[groeck: dropped unnecessary line breaks]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Fan speed minimum can be enforced from sysfs. For example, setting
current fan speed to 20 is used to enforce fan speed to be at 100%
speed, 19 - to be not below 90% speed, etcetera. This feature provides
ability to limit fan speed according to some system wise
considerations, like absence of some replaceable units or high system
ambient temperature.
Request for changing fan minimum speed is configuration request and can
be set only through 'sysfs' write procedure. In this situation value of
argument 'state' is above nominal fan speed maximum.
Return non-zero code in this case to avoid
thermal_cooling_device_stats_update() call, because in this case
statistics update violates thermal statistics table range.
The issues is observed in case kernel is configured with option
CONFIG_THERMAL_STATISTICS.
Here is the trace from KASAN:
[ 159.506659] BUG: KASAN: slab-out-of-bounds in thermal_cooling_device_stats_update+0x7d/0xb0
[ 159.516016] Read of size 4 at addr ffff888116163840 by task hw-management.s/7444
[ 159.545625] Call Trace:
[ 159.548366] dump_stack+0x92/0xc1
[ 159.552084] ? thermal_cooling_device_stats_update+0x7d/0xb0
[ 159.635869] thermal_zone_device_update+0x345/0x780
[ 159.688711] thermal_zone_device_set_mode+0x7d/0xc0
[ 159.694174] mlxsw_thermal_modules_init+0x48f/0x590 [mlxsw_core]
[ 159.700972] ? mlxsw_thermal_set_cur_state+0x5a0/0x5a0 [mlxsw_core]
[ 159.731827] mlxsw_thermal_init+0x763/0x880 [mlxsw_core]
[ 160.070233] RIP: 0033:0x7fd995909970
[ 160.074239] Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ..
[ 160.095242] RSP: 002b:00007fff54f5d938 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 160.103722] RAX: ffffffffffffffda RBX: 0000000000000013 RCX: 00007fd995909970
[ 160.111710] RDX: 0000000000000013 RSI: 0000000001906008 RDI: 0000000000000001
[ 160.119699] RBP: 0000000001906008 R08: 00007fd995bc9760 R09: 00007fd996210700
[ 160.127687] R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000013
[ 160.135673] R13: 0000000000000001 R14: 00007fd995bc8600 R15: 0000000000000013
[ 160.143671]
[ 160.145338] Allocated by task 2924:
[ 160.149242] kasan_save_stack+0x19/0x40
[ 160.153541] __kasan_kmalloc+0x7f/0xa0
[ 160.157743] __kmalloc+0x1a2/0x2b0
[ 160.161552] thermal_cooling_device_setup_sysfs+0xf9/0x1a0
[ 160.167687] __thermal_cooling_device_register+0x1b5/0x500
[ 160.173833] devm_thermal_of_cooling_device_register+0x60/0xa0
[ 160.180356] mlxreg_fan_probe+0x474/0x5e0 [mlxreg_fan]
[ 160.248140]
[ 160.249807] The buggy address belongs to the object at ffff888116163400
[ 160.249807] which belongs to the cache kmalloc-1k of size 1024
[ 160.263814] The buggy address is located 64 bytes to the right of
[ 160.263814] 1024-byte region [ffff888116163400, ffff888116163800)
[ 160.277536] The buggy address belongs to the page:
[ 160.282898] page:0000000012275840 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888116167000 pfn:0x116160
[ 160.294872] head:0000000012275840 order:3 compound_mapcount:0 compound_pincount:0
[ 160.303251] flags: 0x200000000010200(slab|head|node=0|zone=2)
[ 160.309694] raw: 0200000000010200 ffffea00046f7208 ffffea0004928208 ffff88810004dbc0
[ 160.318367] raw: ffff888116167000 00000000000a0006 00000001ffffffff 0000000000000000
[ 160.327033] page dumped because: kasan: bad access detected
[ 160.333270]
[ 160.334937] Memory state around the buggy address:
[ 160.356469] >ffff888116163800: fc ..
Fixes: 65afb4c8e7 ("hwmon: (mlxreg-fan) Add support for Mellanox FAN driver")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210916183151.869427-1-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Commit id "b00647c46c9d7f6ee1ff6aaf335906101755e614",
adds reporting current and voltage to k10temp.c
The commit id "0a4e668b5d52eed8026f5d717196b02b55fb2dc6",
removed reporting current and voltage from k10temp.c
The curr and in(voltage) entries are not removed from
"k10temp_info" structure. Removing those residue entries.
while at it, update k10temp driver documentation
Signed-off-by: suma hegde <suma.hegde@amd.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20210902174155.7365-2-nchatrad@amd.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>