Commit Graph

26 Commits

Author SHA1 Message Date
Dani Liberman
707c125286 habanalabs: keep control device alive during hard reset
Need to allow user retrieve data during reset and afterwards without
the need to reopen the device.
Did it by seperating the user peocesses list into two lists:
1. fpriv_list which contains list of user processes that opened
   the device (currently only one).
2. fpriv_ctrl_list which contains list of user processes that opened
   the control device. This processes in this list shall not be
   killed during reset, only when the device is suddenly removed from
   PCI chain.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Oded Gabbay
5b90e59d55 habanalabs: remove compute context pointer
It was an error to save the compute context's pointer in the device
structure, as it allowed its use without proper ref-cnt.

Change the variable to a flag that only indicates whether there is
an active compute context. Code that needs the pointer will now
be forced to use proper internal APIs to get the pointer.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay
4337b50b5f habanalabs: add helper to get compute context
There are multiple places where the code needs to get the context's
pointer and increment its ref cnt. This is the proper way instead
of using the compute context pointer in the device structure.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay
6798676f7e habanalabs: fix etr asid configuration
Pass the user's context pointer into the etr configuration function
to extract its ASID.

Using the compute_ctx pointer is an error as it is just an indication
of whether a user has opened the compute device.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay
357ff3dc9a habanalabs: save ctx inside encaps signal
Compute context pointer in hdev shouldn't be used for fetching the
context's pointer.

If an object needs the context's pointer, it should get it while
incrementing its kref, and when the object is released, put it.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay
a4dd2ecf36 habanalabs: remove redundant check on ctx_fini
The driver supports only a single context. Therefore, no need to check
if the user context that is closed is the compute context. The user
context, if exists, is always the compute context.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Omer Shpigelman
fae132632c habanalabs: context cleanup cosmetics
No need to check the return value if the following action is the same for
both cases. In addition, now that hl_ctx_free() doesn't print if the
context is not released, its name can be misleading as the context might
stay alive after it is executed with no indication for that.
Hence we can discard it and simply put the refcount.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-10-18 12:05:48 +03:00
Oded Gabbay
e62ada5e23 habanalabs: remove redundant warning message
This warning is redundant as we will print a notice in case the device
is still in use after the FD was closed. No need to print the same
message per context.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29 09:47:48 +03:00
farah kassabri
e4cdccd2ec habanalabs: add support for encapsulated signals submission
This commit is the second part of the encapsulated signals feature.
It contains the driver support for submission of cs with encapsulated
signals and the wait for them.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29 09:47:48 +03:00
farah kassabri
dadf17abb7 habanalabs: add support for encapsulated signals reservation
The signaling from within encapsulated OP capability is merged into the
existing stream architecture, such that one can trigger multiple
signaling from an encapsulated op, according to the time the event
was done in the graph execution and avoid the need to wait for the
whole encapsulated OP execution to be complete before the stream can
signal.

This commit implements only the reserve/unreserve part.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29 09:47:48 +03:00
Ohad Sharabi
215f0c1775 habanalabs: add wait-for-multi-CS uAPI
When user sends multiple CSs, waiting for each CS is not efficient
as it involves many user-kernel context switches.

In order to address this issue we add support to "wait on multiple CSs"
using a new uAPI which can wait on maximum of 32 CSs. The new uAPI is
defined using a new flag - WAIT_FOR_MULTI_CS - in the wait_for_cs IOCTL.

The input parameters for this uAPI will be:
@seq: user pointer to an array of up to 32 CS's sequence numbers.
@seq_array_len: length of sequence array.
@timeout_us: timeout for waiting for any CS.

The output paramateres for this API will be:
@status: multi CS ioctl completion status (dedicated status was added as
         well).
@flags: bitmap of output flags of the CS.
@cs_completion_map: bitmap for multi CS, if CS sequence that was placed
                    in index N in input seq array has completed- the N-th
		    bit in cs_completion_map will be 1, otherwise it will
		    be 0.
@timestamp_nsec: timestamp of the first completed CS

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29 09:47:48 +03:00
Ohad Sharabi
c457d5abf8 habanalabs: get multiple fences under same cs_lock
To add proper support for wait-for-multi-CS, locking the CS lock
for each CS fence in the list is not efficient.

Instead, this patch add support to lock the CS lock once to get all
required fences.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29 09:47:48 +03:00
Ofir Bitton
84586de496 habanalabs: reset device upon FD close if not idle
If device is not idle after user closes the FD we must reset device
as next user that will try to open FD will encounter a non-functional
device.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Sagiv Ozeri
a4371c1a1e habanalabs: support HW blocks vm show
Improve "vm" debugfs node to print also the virtual addresses which are
currently mapped to HW blocks in the device.

Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Ohad Sharabi
cf30339d3f habanalabs: modify device_idle interface
Currently this API uses single 64 bits mask for engines idle indication.
Recently, it was observed that more bits are needed for some ASICs.
This patch modifies the use of the idle mask and the idle_extensions
mask.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 21:03:51 +02:00
Ofir Bitton
d3f139c462 habanalabs: add driver support for internal cb scheduling
In order to support scnenarios in which driver needs access to
HW components but it cannot access them directly, we add support for
scheduling command buffers internally.
These command buffers will be transmitted upon next user command
submission context.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 21:03:50 +02:00
Ofir Bitton
8e39e75a13 habanalabs: Init the VM module for kernel context
In order for reserving VA ranges for kernel memory, we need
to allow the VM module to be initiated with kernel context.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 21:03:48 +02:00
farah kassabri
03df136bc5 habanalabs/gaudi: scrub all memory upon closing FD
In cases of multi-tenants, administrators may want to prevent data
leakage between users running on the same device one after another.

To do that the driver can scrub the internal memory (both SRAM and
DRAM) after a user finish to use the memory.

Because in GAUDI the driver allows only one application to use the
device at a time, it can scrub the memory when user app close FD.

In future devices where we have MMU on the DRAM, we can scrub the DRAM
memory with a finer granularity (page granularity) when the user
allocates the memory.

This feature is not supported in Goya.

To allow users that want to debug their applications, we add a kernel
module parameter to load the driver with this feature disabled.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:31 +02:00
Ofir Bitton
5de406c0b5 habanalabs: sync stream collective support
Implement sync stream collective for GAUDI. Need to allocate additional
resources for that and add ctx_fini() to clean up those resources.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:30 +02:00
Oded Gabbay
eab1f6e7b0 habanalabs: add notice of device not idle
The device should be idle after a context is closed. If not, print a
notice.

Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-25 14:44:21 +03:00
Oded Gabbay
3c3aa5dbd6 habanalabs: add debug messages for opening/closing context
During debugging of error we sometimes need to know whether the error
happened when a user context was open. Add debug prints when opening and
closing user contexts.

Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-25 14:44:20 +03:00
Tomer Tayar
ef6a0f6caa habanalabs: Add an option to map CB to device MMU
There are cases in which the device should access the host memory of a
CB through the device MMU, and thus this memory should be mapped.
The patch adds a flag to the CB IOCTL, in which a user can ask the
driver to perform the mapping when creating a CB.
The mapping is allowed only if a dedicated VA range was allocated for
the specific ASIC.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:54 +03:00
Ofir Bitton
a98d73c7fa habanalabs: Replace dma-fence mechanism with completions
habanalabs driver uses dma-fence mechanism for synchronization.
dma-fence mechanism was designed solely for GPUs, hence we purpose
a simpler mechanism based on completions to replace current
dma-fence objects.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:48 +03:00
Tomer Tayar
94f8be9eb0 habanalabs: Fix memory leak in error flow of context initialization
Add a missing free of the cs_pending array in the error flow of context
initialization.

Fixes: c16d45f42b ("habanalabs: Use pending CS amount per ASIC")

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24 20:40:06 +03:00
Ofir Bitton
a04b7cd97e habanalabs: create internal CB pool
Create a device MMU-mapped internal command buffer pool, in order to allow
the driver to allocate CBs for the signal/wait operations
that are fetched by the queues when they are configured with the user's
address space ID.

We must pre-map this internal pool due to performance issues.

This pool is needed for future ASIC support and it is currently unused in
GOYA and GAUDI.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24 20:31:37 +03:00
Oded Gabbay
70b2f993ea habanalabs: create common folder
For internal needs of our CI we need to move all the common code into a
common folder instead of putting them in the root folder of the driver.

Same applies to the common header files under include/

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2020-07-24 20:31:37 +03:00