habanalabs: reset device if still in use when released

If the device file is released while a context is still held, the device
cannot be reopened until that context is eventually released.
If that never happens, only a device reset returns the device to an
operational state, which means waiting for a CS timeout or an error, or
for external intervention that injects a reset via sysfs.

At this stage, after the user has released the device, the context is
still held either because of CS that were left running on the device and
are no longer relevant, or because of missing cleanup steps on the user
side.

All of this is in any case handled in the device reset flow, so initiate
the reset at this point instead of waiting for it.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Tomer Tayar 2022-11-09 18:08:38 +02:00 committed by Oded Gabbay
parent 9c604af0c9
commit bc8e4bae70

@@ -504,9 +504,10 @@ static int hl_device_release(struct inode *inode, struct file *filp)
 	hdev->compute_ctx_in_release = 1;
-	if (!hl_hpriv_put(hpriv))
-		dev_notice(hdev->dev,
-			"User process closed FD but device still in use\n");
+	if (!hl_hpriv_put(hpriv)) {
+		dev_notice(hdev->dev, "User process closed FD but device still in use\n");
+		hl_device_reset(hdev, HL_DRV_RESET_HARD);
+	}
 	hdev->last_open_session_duration_jif =
 		jiffies - hdev->last_successful_open_jif;
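
The release-time decision described in the commit message can be modeled in plain user-space C. This is only a minimal sketch, not the driver's code: `struct sketch_device`, `sketch_put()`, `sketch_hard_reset()` and `sketch_release()` are hypothetical stand-ins, and the sketch assumes that the put helper returns true only when the last reference was dropped, which is how the diff above appears to use `hl_hpriv_put()`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device state; not the real driver struct. */
struct sketch_device {
	int ctx_refcount;	/* references still held on the context */
	int reset_count;	/* number of hard resets issued so far */
};

/*
 * Drop one reference. Returns true only if this was the last reference
 * (assumed convention, modeled on how the diff uses hl_hpriv_put()).
 */
static bool sketch_put(struct sketch_device *dev)
{
	if (dev->ctx_refcount > 0)
		dev->ctx_refcount--;
	return dev->ctx_refcount == 0;
}

/* A hard reset tears down whatever still holds the context. */
static void sketch_hard_reset(struct sketch_device *dev)
{
	dev->ctx_refcount = 0;
	dev->reset_count++;
}

/*
 * Release path: if the context is still held after the final put from
 * the file descriptor, reset right away instead of waiting for a
 * timeout, an error, or an externally injected reset.
 */
static void sketch_release(struct sketch_device *dev)
{
	if (!sketch_put(dev)) {
		fprintf(stderr, "User process closed FD but device still in use\n");
		sketch_hard_reset(dev);
	}
}
```

In this model, releasing a device whose only reference belongs to the file descriptor triggers no reset, while releasing one with leftover references triggers exactly one, after which the device can be opened again.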