Pull dmaengine fixes from Vinod Koul:
"A couple of core fixes and odd driver fixes for dmaengine subsystem:
Core:
- drop ACPI CSRT table reference after using it
- fix of_dma_router_xlate() error handling
Drivers fixes in idxd, at_hdmac, pl330, dw-edma and jz478"
* tag 'dmaengine-fix-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: ti: k3-udma: Update rchan_oes_offset for am654 SYSFW ABI 3.0
drivers/dma/dma-jz4780: Fix race condition between probe and irq handler
dmaengine: dw-edma: Fix scatter-gather address calculation
dmaengine: ti: k3-udma: Fix the TR initialization for prep_slave_sg
dmaengine: pl330: Fix burst length if burst size is smaller than bus width
dmaengine: at_hdmac: add missing kfree() call in at_dma_xlate()
dmaengine: at_hdmac: add missing put_device() call in at_dma_xlate()
dmaengine: at_hdmac: check return value of of_find_device_by_node() in at_dma_xlate()
dmaengine: of-dma: Fix of_dma_router_xlate's of_dma_xlate handling
dmaengine: idxd: reset states after device disable or reset
dmaengine: acpi: Put the CSRT table after using it
New drivers should use dma_request_chan() instead
dma_request_slave_channel()
dma_request_slave_channel() is a simple wrapper for dma_request_chan()
eating up the error code for channel request failure and makes deferred
probing impossible.
Move the dma_request_slave_channel() into the header as inline function,
mark it as deprecated.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20200828110507.22407-1-peter.ujfalusi@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Commit ef91bb196b ("kernel.h: Silence sparse warning in
lower_32_bits") caused new warnings to show in the fsldma driver, but
that commit was not to blame: it only exposed some very incorrect code
that tried to take the low 32 bits of an address.
That made no sense for multiple reasons, the most notable one being that
that code was intentionally limited to only 32-bit ppc builds, so "only
low 32 bits of an address" was completely nonsensical. There were no
high bits to mask off to begin with.
But even more importantly fropm a correctness standpoint, turning the
address into an integer then caused the subsequent address arithmetic to
be completely wrong too, and the "+1" actually incremented the address
by one, rather than by four.
Which again was incorrect, since the code was reading two 32-bit values
and trying to make a 64-bit end result of it all. Surprisingly, the
iowrite64() did not suffer from the same odd and incorrect model.
This code has never worked, but it's questionable whether anybody cared:
of the two users that actually read the 64-bit value (by way of some C
preprocessor hackery and eventually the 'get_cdar()' inline function),
one of them explicitly ignored the value, and the other one might just
happen to work despite the incorrect value being read.
This patch at least makes it not fail the build any more, and makes the
logic superficially sane. Whether it makes any difference to the code
_working_ or not shall remain a mystery.
Compile-tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Starting with core version 4.3.a the DMA bus attributes can (and should) be
read from the INTERFACE_DESCRIPTION (0x10) register.
For older core versions, this will still need to be provided from the
device-tree.
The bus-type values are identical to the ones stored in the device-trees,
so we just need to read them. Bus-width values are stored in log2 values,
so we just need to use them as shift values to make them equivalent to the
current format.
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Link: https://lore.kernel.org/r/20200825151950.57605-7-alexandru.ardelean@analog.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The channel parameters (which are read from the device-tree) are adjusted
for the DMAEngine framework in the axi_dmac_parse_chan_dt() function, after
they are read from the device-tree.
When we want to read these from registers, we will need to use the same
logic, so this change splits the logic into a separate function.
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Link: https://lore.kernel.org/r/20200825151950.57605-6-alexandru.ardelean@analog.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The TR which needs to be initialized for the next sg entry is indexed by
tr_idx and not by the running i counter.
In case any sub element in the SG needs more than one TR, the code would
corrupt an already configured TR.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Link: https://lore.kernel.org/r/20200824120108.9178-1-peter.ujfalusi@ti.com
Fixes: 6cf668a4ef ("dmaengine: ti: k3-udma: Use the TR counter helper for slave_sg and cyclic")
Signed-off-by: Vinod Koul <vkoul@kernel.org>
DW DMA IP-core provides a way to synthesize the DMA controller with
channels having different parameters like maximum burst-length,
multi-block support, maximum data width, etc. Those parameters both
explicitly and implicitly affect the channels performance. Since DMA slave
devices might be very demanding to the DMA performance, let's provide a
functionality for the slaves to be assigned with DW DMA channels, which
performance according to the platform engineer fulfill their requirements.
After this patch is applied it can be done by passing the mask of suitable
DMA-channels either directly in the dw_dma_slave structure instance or as
a fifth cell of the DMA DT-property. If mask is zero or not provided, then
there is no limitation on the channels allocation.
For instance Baikal-T1 SoC is equipped with a DW DMAC engine, which first
two channels are synthesized with max burst length of 16, while the rest
of the channels have been created with max-burst-len=4. It would seem that
the first two channels must be faster than the others and should be more
preferable for the time-critical DMA slave devices. In practice it turned
out that the situation is quite the opposite. The channels with
max-burst-len=4 demonstrated a better performance than the channels with
max-burst-len=16 even when they both had been initialized with the same
settings. The performance drop of the first two DMA-channels made them
unsuitable for the DW APB SSI slave device. No matter what settings they
are configured with, full-duplex SPI transfers occasionally experience the
Rx FIFO overflow. It means that the DMA-engine doesn't keep up with
incoming data pace even though the SPI-bus is enabled with speed of 25MHz
while the DW DMA controller is clocked with 50MHz signal. There is no such
problem has been noticed for the channels synthesized with
max-burst-len=4.
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Link: https://lore.kernel.org/r/20200731200826.9292-6-Sergey.Semin@baikalelectronics.ru
Signed-off-by: Vinod Koul <vkoul@kernel.org>
According to the DW DMA controller Databook 2.18b (page 40 "3.5 Memory
Peripherals") memory peripherals don't have handshaking interface
connected to the controller, therefore they can never be a flow
controller. Since the CTLx.SRC_MSIZE and CTLx.DEST_MSIZE are properties
valid only for peripherals with a handshaking interface, we can freely
zero these fields out if the memory peripheral is selected to be the
source or the destination of the DMA transfers.
Note according to the databook, length of burst transfers to memory is
always equal to the number of data items available in a channel FIFO or
data items required to complete the block transfer, whichever is smaller;
length of burst transfers from memory is always equal to the space
available in a channel FIFO or number of data items required to complete
the block transfer, whichever is smaller.
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Link: https://lore.kernel.org/r/20200731200826.9292-5-Sergey.Semin@baikalelectronics.ru
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Indeed in case of the DMA_DEV_TO_MEM DMA transfers it's enough to take the
destination memory address and the destination master data width into
account to calculate the CTLx.DST_TR_WIDTH setting of the memory
peripheral. According to the DW DMAC IP-core Databook 2.18b (page 66,
Example 5) at the and of a DMA transfer when the DMA-channel internal FIFO
is left with data less than for a single destination burst transaction,
the destination peripheral will enter the Single Transaction Region where
the DW DMA controller can complete a block transfer to the destination
using single transactions (non-burst transaction of CTLx.DST_TR_WIDTH
bytes). If there is no enough data in the DMA-channel internal FIFO for
even a single non-burst transaction of CTLx.DST_TR_WIDTH bytes, then the
channel enters "FIFO flush mode". That mode is activated to empty the FIFO
and flush the leftovers out to the memory peripheral. The flushing
procedure is simple. The data is sent to the memory by means of a set of
single transaction of CTLx.SRC_TR_WIDTH bytes. To sum up it's redundant to
use the LLPs length to find out the CTLx.DST_TR_WIDTH parameter value,
since each DMA transfer will be completed with the CTLx.SRC_TR_WIDTH bytes
transaction if it is required.
We suggest to remove the LLP entry length from the statement which
calculates the memory peripheral DMA transaction width since it's
redundant due to the feature described above. By doing so we'll improve
the memory bus utilization and speed up the DMA-channel performance for
DMA_DEV_TO_MEM DMA-transfers.
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20200731200826.9292-4-Sergey.Semin@baikalelectronics.ru
Signed-off-by: Vinod Koul <vkoul@kernel.org>
CFGx.FIFO_MODE field controls a DMA-controller "FIFO readiness" criterion.
In other words it determines when to start pushing data out of a DW
DMAC channel FIFO to a destination peripheral or from a source
peripheral to the DW DMAC channel FIFO. Currently FIFO-mode is set to one
for all DW DMAC channels. It means they are tuned to flush data out of
FIFO (to a memory peripheral or by accepting the burst transaction
requests) when FIFO is at least half-full (except at the end of the block
transfer, when FIFO-flush mode is activated) and are configured to get
data to the FIFO when it's at least half-empty.
Such configuration is a good choice when there is no slave device involved
in the DMA transfers. In that case the number of bursts per block is less
than when CFGx.FIFO_MODE = 0 and, hence, the bus utilization will improve.
But the latency of DMA transfers may increase when CFGx.FIFO_MODE = 1,
since DW DMAC will wait for the channel FIFO contents to be either
half-full or half-empty depending on having the destination or the source
transfers. Such latencies might be dangerous in case if the DMA transfers
are expected to be performed from/to a slave device. Since normally
peripheral devices keep data in internal FIFOs, any latency at some
critical moment may cause one being overflown and consequently losing
data. This especially concerns a case when either a peripheral device is
relatively fast or the DW DMAC engine is relatively slow with respect to
the incoming data pace.
In order to solve problems, which might be caused by the latencies
described above, let's enable the FIFO half-full/half-empty "FIFO
readiness" criterion only for DMA transfers with no slave device involved.
Thanks to the commit 99ba8b9b0d ("dmaengine: dw: Initialize channel
before each transfer") we can freely do that in the generic
dw_dma_initialize_chan() method.
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20200731200826.9292-3-Sergey.Semin@baikalelectronics.ru
Signed-off-by: Vinod Koul <vkoul@kernel.org>