Commit Graph

1073228 Commits

Author SHA1 Message Date
Babu Moger
8bb050cd5c hwmon: (k10temp) Support up to 12 CCDs on AMD Family of processors
The current driver can read the temperatures from upto 8 CCDs
(Core-Complex Die).

The newer AMD Family 19h Models 10h-1Fh and A0h-AFh can support up to
12 CCDs. Update the driver to read up to 12 CCDs.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/163776976762.904164.5618896687524494215.stgit@bmoger-ubuntu
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Denis Pauk
548820e21c hwmon: (asus_wmi_sensors) Support X370 Asus WMI.
Provides a Linux kernel module "asus_wmi_sensors" that provides sensor
readouts via ASUS' WMI interface present in the UEFI of
X370/X470/B450/X399 Ryzen motherboards.

Supported motherboards:
* ROG CROSSHAIR VI HERO,
* PRIME X399-A,
* PRIME X470-PRO,
* ROG CROSSHAIR VI EXTREME,
* ROG CROSSHAIR VI HERO (WI-FI AC),
* ROG CROSSHAIR VII HERO,
* ROG CROSSHAIR VII HERO (WI-FI),
* ROG STRIX B450-E GAMING,
* ROG STRIX B450-F GAMING,
* ROG STRIX B450-I GAMING,
* ROG STRIX X399-E GAMING,
* ROG STRIX X470-F GAMING,
* ROG STRIX X470-I GAMING,
* ROG ZENITH EXTREME,
* ROG ZENITH EXTREME ALPHA.

Co-developed-by: Ed Brindley <kernel@maidavale.org>
Signed-off-by: Ed Brindley <kernel@maidavale.org>
Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
[groeck: Squashed:
 "hwmon: Fix warnings in asus_wmi_sensors.rst documetation."]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Denis Pauk
b87611d437 hwmon: (asus_wmi_ec_sensors) Support B550 Asus WMI.
Linux HWMON sensors driver for ASUS motherboards to read
sensors from the embedded controller.

Many ASUS motherboards do not publish all the available
sensors via the Super I/O chip but the missing ones are
available through the embedded controller (EC) registers.

This driver implements reading those sensor data via the
WMI method BREC, which is known to be present in all ASUS
motherboards based on the AMD 500 series chipsets (and
probably is available in other models too). The driver
needs to know exact register addresses for the sensors and
thus support for each motherboard has to be added explicitly.

The EC registers do not provide critical values for the
sensors and as such they are not published to the HWMON.

Supported motherboards:
* PRIME X570-PRO
* Pro WS X570-ACE
* ROG CROSSHAIR VIII HERO
* ROG CROSSHAIR VIII DARK HERO
* ROG CROSSHAIR VIII FORMULA
* ROG STRIX X570-E GAMING
* ROG STRIX B550-I GAMING
* ROG STRIX B550-E GAMING

Co-developed-by: Eugene Shalygin <eugene.shalygin@gmail.com>
Signed-off-by: Eugene Shalygin <eugene.shalygin@gmail.com>
Co-developed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
Tested-by: Tor Vic <torvic9@mailbox.org>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Menghui Wu
df293076a9 hwmon: (f71882fg) Add F81966 support
This adds hardware monitor support the Fintek F81966 Super I/O chip.
Testing was done on the Aaeon SSE-IPTI

Signed-off-by: Menghui Wu <Menghui_Wu@aaeon.com.tw>
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Link: https://lore.kernel.org/r/20211117024320.2428144-1-acelan.kao@canonical.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Guenter Roeck
ff9b877879 hwmon: (adm1021) Improve detection of LM84, MAX1617, and MAX1617A
The adm1021 driver is quite generous with its automatic chip detection
and easily misdetects several chips. Strengthen detection of MAX1617,
MAX1617A, and LM84 to make the driver less vulnerable to false matches.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Guenter Roeck
ff300b71ba hwmon: (tmp401) Hide register write address differences in regmap code
Since we are using regmap access functions to write into chip registers,
we can hide the difference in register write addresses within regmap
code. By doing this we do not need to clear the regmap cache on register
writes, and the high level code does not need to bother about different
read/write register addresses.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Guenter Roeck
50152fb6c1 hwmon: (tmp401) Use regmap
Use regmap for register accesses to be able to utilize its caching
functionality. This also lets us hide register access differences
in regmap code.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Guenter Roeck
ca53e7640d hwmon: (tmp401) Convert to _info API
The new API is cleaner and reduces code size significantly.
All chip accesses are 'hidden' in chip access to prepare for using
regmap. Local caching code is removed, to be replaced by regmap based
caching in a follow-up patch.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Guenter Roeck
bcb31e6808 hwmon: (tmp401) Simplify temperature register arrays
The difference between TMP431 and other chips of this series is that the
TMP431 has an additional sensor. The register addresses for other sensors
are the same for all chips. It is therefore unnecessary to maintain two
different arrays for TMP431 and the other chips. Just use the same array
for all chips and add the TMP431 register addresses as third column.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Nathan Rossi
eacb52f010 hwmon: Driver for Texas Instruments INA238
The INA238 is a I2C power monitor similar to other INA2xx devices,
providing shunt voltage, bus voltage, current, power and temperature
measurements.

Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
Link: https://lore.kernel.org/r/20211102052754.817220-3-nathan@nathanrossi.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Nathan Rossi
8be23b9b31 dt-bindings: hwmon: ti,ina2xx: Add ti,shunt-gain property
Add a property to the binding to define the selected shunt voltage gain.
This specifies the range and accuracy that applies to the shunt circuit.
This property only applies to devices that have a selectable shunt
voltage range via PGA or ADCRANGE register configuration.

Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211102052754.817220-2-nathan@nathanrossi.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Nathan Rossi
ed68a0effe dt-bindings: hwmon: ti,ina2xx: Document ti,ina238 compatible string
Document the compatible string for the Texas Instruments INA238, this
device is a variant of the existing INA2xx devices and has the same
device tree bindings (shunt resistor).

Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211102052754.817220-1-nathan@nathanrossi.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Babu Moger
3cf90efa13 hwmon: (k10temp) Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh
Add thermal info support for AMD Family 19h Models 10h-1Fh and A0h-AFh.
Thermal info is supported via a new PCI device ID at offset 0x300h.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/163640829419.955062.12539219969781039915.stgit@bmoger-ubuntu
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Babu Moger
f707bcb5d1 hwmon: (k10temp) Remove unused definitions
Usage of these definitions were removed after the commit 0a4e668b5d
("hwmon: (k10temp) Remove support for displaying voltage and current on Zen CPUs").
So, cleanup them up.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/163640828776.955062.15863375803675701204.stgit@bmoger-ubuntu
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:05 -08:00
Yazen Ghannam
4fb0abfee4 x86/amd_nb: Add AMD Family 19h Models (10h-1Fh) and (A0h-AFh) PCI IDs
Add the new PCI Device IDs to support new generation of AMD 19h family of
processors.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>  # pci_ids.h
Link: https://lore.kernel.org/r/163640828133.955062.18349019796157170473.stgit@bmoger-ubuntu
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-12-26 15:02:04 -08:00
Linus Torvalds
fc74e0a40e Linux 5.16-rc7 2021-12-26 13:17:17 -08:00
Linus Torvalds
e8ffcd3ab0 - Prevent potential undefined behavior due to shifting pkey constants into the sign bit
- Move the EFI memory reservation code *after* the efi= cmdline parsing has happened
 
 - Revert two commits which turned out to be the wrong direction to chase
 when accommodating early memblock reservations consolidation and command line parameters
 parsing
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHIYbAACgkQEsHwGGHe
 VUphMhAAptI7YGJFqfH6bUuVh7HX38FOx5pCODpZUXJWIsvgC0MRnxqUF0+Srzbt
 ZBMhh7WUiOpIe/ZeUpRouX5Al4nhiDdG4QeddJLc1DaSaAgOJWD9NH0V0K1aWf4k
 poAUoX7mp5M1mfc/boOQGBdtX+l+3vCYU0RzyQ+IhfDNM6GzJpAC1txWFaL/lWE+
 3IVb4OF37mW7Hlk+HdizcTgm1qaHRobwP7fxbqRwdHZy+YRjz0eOd2dR625jff6b
 y3Z7WVkwx+YtNVD7YnXvmJKyh43rbbMmx0CU6WiYlhGkz5dooe9Er2LuouS1kzRr
 4XWvZrr28qLc9eCdv6MaZGv62KyP0QBm4Fg7E/nC/0Jo8NA3GQXulF2OHofSCX/h
 BZa3mJa+heoXpUb7zDTUMa8UCdOYGJpvHRGYcU3JiaIwv+dnTRBdAzyVcqCMWt5P
 wPOl4A08vVZ+w3aNrZh3Zfg03pG4/9miveIa7EsuCZF/oN+5KAVQdcU74zneH4Qx
 nUfF7b80PyiSnCblQ3DL+efKVG4/Bud6crxo1tOtw6cDQxiAYDzupuZBlHV9791O
 0TwPuLJSAqv4aMiobi5/j32pSLaohYJij9stbfB8zeD49rq5uvT+jDoxcKiZurJP
 jamo6LHffF+o8+F2YwFOSIPayZOViXxWvwPd+E6uL7EodvZgiS0=
 =j12C
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Prevent potential undefined behavior due to shifting pkey constants
   into the sign bit

 - Move the EFI memory reservation code *after* the efi= cmdline parsing
   has happened

 - Revert two commits which turned out to be the wrong direction to
   chase when accommodating early memblock reservations consolidation
   and command line parameters parsing

* tag 'x86_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pkey: Fix undefined behaviour with PKRU_WD_BIT
  x86/boot: Move EFI range reservation after cmdline parsing
  Revert "x86/boot: Pull up cmdline preparation and early param parsing"
  Revert "x86/boot: Mark prepare_command_line() __init"
2021-12-26 10:28:55 -08:00
Linus Torvalds
2afa90bd1c - Prevent clang from reordering the reachable annotation in an inline asm statement
without inputs
 
 - Fix objtool builds on non-glibc systems due to undefined __always_inline
 -----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHIVhIACgkQEsHwGGHe
 VUoG0Q/4rbCfRL4apKqVgZKZ2zXW69V+aWqGzFL56vshUhChnQpIr3ANN4qDOaCu
 TD23gxGcjQzjJsqDOu/YCQIxbcBdsDY5Ufix22eOnO24xcjFCNMcdYkV+nn1iXtO
 VBqqOM7ebm1ZXjIuhu43gNABY7s8v9bFM6MupZUKQ51tSsn2v6Mt2acsRZXeRFl1
 bUPBmEr6nOepeDwuAvdIWFQ2hl87FAlQlkPcvdbE7zEnAE/a/TOT8IFOZ4fHh+IB
 Kluw0FeiWBkP5pqPHNKwP64JNKvlAl5yMxFOYONBevgB1WVQJTFt0PXM78OXT0Fw
 oythqKA64c3XI9GyXuDL5Uwl8X4LmJzVE0LIflSesvAhWuAkiz3pYg8n5Q6vD4IL
 9VV/SZes+8/XmcxirAqnPWrmQmWa8iDf4QGiWwpVnXaHPQxw1HMjU0jC54/mSuiF
 0mV6G4IedsBiOXs0MocZF5VwcN0oroOZQEXqtHSo2uxUDIqDQyGY4f8xsHmi3Oed
 XZdS/RpUVUyrcj/aVXnSFQfBcfUO0Q0xvqqMDmBeSHE8B6gXdm8aW0fPSFQxc1Tk
 blYZ+0rB3ZdQqZsIXFe9nQGIh+4/bSHiLWP5+yf2zr1WAqyCBfSEPkmtvqPt9hJX
 SJfvAHlj2TIWR2VMO88EonRxNIAlhpEAsptHvCLCAQFReCgsNw==
 =QUDC
 -----END PGP SIGNATURE-----

Merge tag 'objtool_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Borislav Petkov:

 - Prevent clang from reordering the reachable annotation in
   an inline asm statement without inputs

 - Fix objtool builds on non-glibc systems due to undefined
   __always_inline

* tag 'objtool_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  compiler.h: Fix annotation macro misplacement with Clang
  uapi: Fix undefined __always_inline on non-glibc systems
2021-12-26 10:19:40 -08:00
Ofir Bitton
ce80098db2 habanalabs: support hard-reset scheduling during soft-reset
As hard-reset can be requested during soft-reset, driver must allow
it or else critical events received during soft-reset will be
ignored.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 14:42:31 +02:00
Ofir Bitton
42eb2872e0 habanalabs: add a lock to protect multiple reset variables
Atomic operations during reset are replaced by a spinlock in order
to have the ability to protect more than a single variable.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 14:42:11 +02:00
Ofir Bitton
eb13529191 habanalabs: refactor reset information variables
Unify variables related to device reset, which will help us to
add some new reset functionality in future patches.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 14:41:28 +02:00
Ohad Sharabi
60bf3bfb5a habanalabs: handle skip multi-CS if handling not done
This patch fixes issue in which we have timeout for multi-CS although
the CS in the list actually completed.

Example scenario (the two threads marked as WAIT for the thread that
handles the wait_for_multi_cs and CMPL as the thread that signal
completion for both CS and multi-CS):
1. Submit CS with sequence X
2. [WAIT]: call wait_for_multi_cs with single CS X
3. [CMPL]: CS X do invoke complete_all for both CS and multi-CS
           (multi_cs_completion_done still false)
4. [WAIT]: enter poll_fences, reinit the completion and find the CS
           as completed when asking on the fence but multi_cs_done is
	   still false it returns that no CS actually completed
5. [CMPL]: set multi_cs_handling_done as true
6. [WAIT]: wait for completion but no CS to awake the wait context
           and hence wait till timeout

Solution: if CS detected as completed in poll_fences but multi_cs_done
          is still false invoke complete_all to the multi-CS completion
	  and so it will not go to sleep in wait_for_completion but
	  rather will have a "second chance" to wait for
	  multi_cs_completion_done.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 14:40:25 +02:00
Tomer Tayar
f297a0e9fe habanalabs: add CPU-CP packet for engine core ASID cfg
In some cases the driver cannot configure ASID of some engines due to
the security level of the relevant registers.
For this a new CPU-CP packet is introduced, which will allow the driver
to ask the F/W to do this configuration instead.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 14:39:53 +02:00
Richard Zhu
178e244cb6 PCI: imx: Add the imx8mm pcie support
i.MX8MM PCIe works mostly like the i.MX8MQ one, but has a different PHY
and allows to output the internal PHY reference clock via the refclk pad.
Add the i.MX8MM PCIe support based on the standalone PHY driver.

Link: https://lore.kernel.org/r/1640312885-31142-2-git-send-email-hongxing.zhu@nxp.com
Tested-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Tested-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Tim Harvey <tharvey@gateworks.com>
2021-12-26 12:13:32 +00:00
Oded Gabbay
519f4ed0a0 habanalabs: replace some -ENOTTY with -EINVAL
-ENOTTY is returned in case of error in the ioctl arguments themselves,
such as function that doesn't exists.

In all other cases, where the error is in the arguments of the custom
data structures that we define that are passed in the various ioctls,
we need to return -EINVAL.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:10 +02:00
Ofir Bitton
0a63ac769b habanalabs: fix comments according to kernel-doc
Fix missing fields, descriptions not according to kernel-doc style.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:10 +02:00
Ofir Bitton
a7224c2116 habanalabs: fix endianness when reading cpld version
Current sysfs implementation does not take endianness into
consideration when dumping the cpld version.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
farah kassabri
b9d31cada7 habanalabs: change wait_for_interrupt implementation
Currently the cq counters are allocated in userspace memory,
and mapped by the driver to the device address space.

A new requirement that is part of new future API related to this one,
requires that cq counters will be allocated in kernel memory.

We leverage the existing cb_create API with KERNEL_MAPPED flag set to
allocate this memory.

That way we gain two things:
1. The memory cannot be freed while in use since it's protected
by refcount in driver.

2. No need to wake up the user thread upon each interrupt from CQ,
because the kernel has direct access to the counter. Therefore,
it can make comparison with the target value in the interrupt
handler and wake up the user thread only if the counter reaches the
target value. This is instead of waking the thread up to copy counter
value from user then go sleep again if target value wasn't reached.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ohad Sharabi
e2558f0f84 habanalabs: prevent wait if CS in multi-CS list completed
By the original design we assumed that if we "miss" multi CS completion
it is of no severe consequence as we'll just call wait_for_multi_cs
again.

Sequence of events for such scenario:
1. user submit CS with sequence N
2. user calls wait for multi-CS with only CS #N in the list
3. the multi CS call starts with poll of the CSs but find that none
   completed (while CS #N did not completed yet)
4. now, multi CS #N complete but multi CS CTX was not yet created for
   the above multi-CS. so, attempt to complete multi-CS fails (as no
   multi CS CTX exist)
5. wait_for_multi_cs call now does init_wait_multi_cs_completion (and
   for this create the multi-CS CTX)
6. wait_for_multi_cs wits on completion but will not get one as CS #N
   already completed

To fix the issue we initialize the multi-CS CTX prior polling the
fences.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ofir Bitton
86c00b2c36 habanalabs: modify cpu boot status error print
As BTL can be replaced by ROM we should modify relevant error print.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ohad Sharabi
d636a932b3 habanalabs: clean MMU headers definitions
During the MMU development the MMU header files were left with unclean
definitions:

- MMU "version specific" definitions that were left in the mmu_general
  file
- unused definitions

This patch attempts, where possible, to keep definitions that can serve
multiple MMU versions (but that are not tightly bound with specific MMU
arch) in the mmu_general header file (e.g. different definitions for
number of HOPs).

Otherwise, move MMU version specific definitions (e.g. HOPs masks and
shifts) to the specific MMU version file.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ofir Bitton
9993f27de1 habanalabs: expose soft reset sysfs nodes for inference ASIC
As we allow soft-reset to be performed only on inference devices,
having the sysfs nodes may cause a confusion. Hence, we remove those
nodes on training ASICs.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ofir Bitton
b5c92b8882 habanalabs: sysfs support for two infineon versions
Currently sysfs support dumping a single infineon version, in
future asics we will have two infineon versions.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Dani Liberman
707c125286 habanalabs: keep control device alive during hard reset
Need to allow user retrieve data during reset and afterwards without
the need to reopen the device.
Did it by seperating the user peocesses list into two lists:
1. fpriv_list which contains list of user processes that opened
   the device (currently only one).
2. fpriv_ctrl_list which contains list of user processes that opened
   the control device. This processes in this list shall not be
   killed during reset, only when the device is suddenly removed from
   PCI chain.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Oded Gabbay
bb099a8051 habanalabs: fix hwmon handling for legacy f/w
In legacy f/w that use old hwmon.h file, the values of the hwmon
enums are different than the values that are in newer kernels (5.6
and above).

Therefore, to support working with those f/w, we need to do some
fixup before registering with the hwmon subsystem and also when
calling the functions that communicate with the f/w to retrieve
sensors information.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ofir Bitton
9acdc21b0b habanalabs: add current PI value to cpu packets
In order to increase cpucp messaging reliability we will add
the current PI value to the descriptor sent to F/W.
F/W will wait for the PI value as an indication of a valid packet.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Oded Gabbay
7363805b8a habanalabs: remove in_debug check in device open
The driver supports only a single user anyway, so there is no point
in checking whether we are in_debug state when a user tries to open
the device, because if we are in_debug, it means a user is already
using the device.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Ofir Bitton
7c623ef732 habanalabs: return correct clock throttling period
Current clock throttling period returned from driver was wrong due
to wrong time comparison.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Ohad Sharabi
b02220536c habanalabs: wait again for multi-CS if no CS completed
The original multi-CS design assumption that stream masters are used
exclusively (i.e. multi-CS with set of stream master QIDs will not get
completed by CS not from the multi-CS set) is inaccurate.

Thus multi-CS behavior is now modified not to treat such case as an
error.

Instead, if we have multi-CS completion but we detect that no CS from
the list is actually completed we will do another multi-CS wait (with
modified timeout).

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay
5b90e59d55 habanalabs: remove compute context pointer
It was an error to save the compute context's pointer in the device
structure, as it allowed its use without proper ref-cnt.

Change the variable to a flag that only indicates whether there is
an active compute context. Code that needs the pointer will now
be forced to use proper internal APIs to get the pointer.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay
4337b50b5f habanalabs: add helper to get compute context
There are multiple places where the code needs to get the context's
pointer and increment its ref cnt. This is the proper way instead
of using the compute context pointer in the device structure.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay
6798676f7e habanalabs: fix etr asid configuration
Pass the user's context pointer into the etr configuration function
to extract its ASID.

Using the compute_ctx pointer is an error as it is just an indication
of whether a user has opened the compute device.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay
357ff3dc9a habanalabs: save ctx inside encaps signal
Compute context pointer in hdev shouldn't be used for fetching the
context's pointer.

If an object needs the context's pointer, it should get it while
incrementing its kref, and when the object is released, put it.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay
a4dd2ecf36 habanalabs: remove redundant check on ctx_fini
The driver supports only a single context. Therefore, no need to check
if the user context that is closed is the compute context. The user
context, if exists, is always the compute context.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay
fee187fe46 habanalabs: free signal handle on failure
Fix a bug where in case of failure to allocate idr, the handle's
memory wasn't freed as part of the error handling code.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Tomer Tayar
b166465452 habanalabs: add missing kernel-doc comments for hl_device fields
Add missing kernel-doc comments for the "last_error" and
"stream_master_qid_arr" fields of the "hl_device" structure".

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Tomer Tayar
d214636be8 habanalabs: pass reset flags to reset thread
The reset flags used by the reset thread are currently a mix of
hard-coded values and a specific flag which is passed from the context
that initiates the reset.
To make it easier to pass more flags in future from this context to the
reset thread, modify it to pass all the original reset flags to the
thread.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00
Dani Liberman
2487f4a281 habanalabs: enable access to info ioctl during hard reset
Because info ioctl is used to retrieve data, some of its opcodes may be
used during hard reset.
Other ioctls should be blocked while device is not operational.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00
Dani Liberman
1880f7acd7 habanalabs: add SOB information to signal submission uAPI
For debug purpose, add SOB address and SOB initial counter value
before current submission to uAPI output.

Using SOB address and initial counter, user can calculate how much of
the submmision has been completed.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00
Ohad Sharabi
4fac990f60 habanalabs: skip read fw errors if dynamic descriptor invalid
Reporting FW errors involves reading of the error registers.

In case we have a corrupted FW descriptor we cannot do that since the
dynamic scratchpad is potentially corrupted as well and may cause kernel
crush when attempting access to a corrupted register offset.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00