With Switchtec hardware it's impossible to get the alignment parameters
for a peer's memory window until the peer's driver has configured its
windows. Strictly speaking, the link doesn't have to be up for this,
but the link being up is the only way the client can tell that
the other side has been configured.
This patch converts ntb_transport and ntb_perf to use this function after
the link goes up. This simplifies these clients slightly because they
no longer have to store the alignment parameters. It also tweaks
ntb_tool so that peer_mw_trans will print zero if it is run before
the link goes up.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
After converting to the new API, both ntb_tool and ntb_transport are
using ntb_mw_count to iterate through ntb_peer_get_addr when they
should be using ntb_peer_mw_count.
This probably isn't an issue with the Intel and AMD drivers but
this will matter for any future driver with asymetric memory window
counts.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Fixes: 443b9a14ec ("NTB: Alter MW API to support multi-ports devices")
If a failure occurs when creating Debug FS entries, unroll all of
the work that's been done.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The ntb_perf tool uses module parameters to control the
characteristics of its test. Enable the changing of these
options through debugfs, and eliminating the need to unload
and reload the module to make changes and run additional tests.
Add a new module parameter that forces the DMA channel
selection onto the same node as the NTB device (default: true).
- seg_order: Size of the NTB memory window; power of 2.
- run_order: Size of the data buffer; power of 2.
- use_dma: Use DMA or memcpy? Default: 0.
- on_node: Only use DMA channel(s) on the NTB node. Default: true.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The Debug FS entries manage themselves; we don't need to hang onto
them in the context structure.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The DMA channel(s)/memory used to transfer data to an NTB device
may not be required to be on the same node as the device. Add a
module parameter that allows any candidate channel (aside from
node assocation) and allocated memory to be used.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Even though there is no any real NTB hardware, which would have both more
than two ports and Scratchpad registers, it is logically correct to have
Scratchpad API accepting a peer port index as well. Intel/AMD drivers utilize
Primary and Secondary topology to split Scratchpad between connected root
devices. Since port-index API introduced, Intel/AMD NTB hardware drivers can
use device port to determine which Scratchpad registers actually belong to
local and peer devices. The same approach can be used if some potential
hardware in future will be multi-port and have some set of Scratchpads.
Here are the brief of changes in the API:
ntb_spad_count() - return number of Scratchpads per each port
ntb_peer_spad_addr(pidx, sidx) - address of Scratchpad register of the
peer device with pidx-index
ntb_peer_spad_read(pidx, sidx) - read specified Scratchpad register of the
peer with pidx-index
ntb_peer_spad_write(pidx, sidx) - write data to Scratchpad register of the
peer with pidx-index
Since there is hardware which doesn't support Scratchpad registers, the
corresponding API methods are now made optional.
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Multi-port NTB devices permit to share a memory between all accessible peers.
Memory Windows API is altered to correspondingly initialize and map memory
windows for such devices:
ntb_mw_count(pidx); - number of inbound memory windows, which can be allocated
for shared buffer with specified peer device.
ntb_mw_get_align(pidx, widx); - get alignment and size restriction parameters
to properly allocate inbound memory region.
ntb_peer_mw_count(); - get number of outbound memory windows.
ntb_peer_mw_get_addr(widx); - get mapping address of an outbound memory window
If hardware supports inbound translation configured on the local ntb port:
ntb_mw_set_trans(pidx, widx); - set translation address of allocated inbound
memory window so a peer device could access it.
ntb_mw_clear_trans(pidx, widx); - clear the translation address of an inbound
memory window.
If hardware supports outbound translation configured on the peer ntb port:
ntb_peer_mw_set_trans(pidx, widx); - set translation address of a memory
window retrieved from a peer device
ntb_peer_mw_clear_trans(pidx, widx); - clear the translation address of an
outbound memory window
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
There is some NTB hardware, which can combine more than just two domains
over NTB. For instance, some IDT PCIe-switches can have NTB-functions
activated on more than two-ports. The different domains are distinguished
by ports they are connected to. So the new port-related methods are added to
the NTB API:
ntb_port_number() - return local port
ntb_peer_port_count() - return number of peers local port can connect to
ntb_peer_port_number(pdix) - return port number by it index
ntb_peer_port_idx(port) - return port index by it number
Current test-drivers aren't changed much. They still support two-ports devices
for the time being while multi-ports hardware drivers aren't added.
By default port-related API is declared for two-ports hardware.
So corresponding hardware drivers won't need to implement it.
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The order parameters are powers of 2; adjust the usage information
to use correct mathematical representations.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Fixes: 8a7b6a778a ("ntb: ntb perf tool")
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
In the normal I/O execution path, ntb_perf is missing a call to
dmaengine_unmap_put() after submission. That causes us to leak
unmap objects.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Fixes: 8a7b6a77 ("ntb: ntb perf tool")
Signed-off-by: Jon Mason <jdmason@kudzu.us>
This is a static checker warning, not something I'm desperately
concerned about. But snprintf() returns the number of bytes that
would have been copied if there were space. We really care about the
number of bytes that actually were copied so we should use scnprintf()
instead.
It probably won't overrun, and in that case we may as well just use
sprintf() but these sorts of things make static checkers and code
reviewers happier.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
schedule_timeout_* takes a timeout in jiffies but the code currently is
passing in a constant which makes this timeout HZ dependent, so pass it
through msecs_to_jiffies() to fix this up.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
When the link goes down, the link_is_up flag did not return to
false. This could have caused some subtle corner case bugs
when the link goes up and down quickly.
Once that was fixed, there was found to be a race if the link was
brought down then immediately up. The link_cleanup work would
occasionally be scheduled after the next link up event. This would
cancel the link_work that was supposed to occur and leave ntb_perf
in an unusable state.
To fix this we get rid of the link_cleanup work and put the actions
directly in the link_down event.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
This commit adds a debugfs 'count' file to ntb_pingpong. This is so
testing with ntb_pingpong can be automated beyond just checking the
logs for pong messages.
The count file returns a number which increments every pong. The
counter can be cleared by writing a zero.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
In order to more successfully script with ntb_tool it's useful to
have a link file to check the link status so that the script
doesn't use the other files until the link is up.
This commit adds a 'link' file to the debugfs directory which reads
boolean (Y or N) depending on the link status. Writing to the file
change the link state using ntb_link_enable or ntb_link_disable.
A 'link_event' file is also provided so an application can block until
the link changes to the desired state. If the user writes a 1, it will
block until the link is up. If the user writes a 0, it will block until
the link is down.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
In order to make the interface closer to the raw NTB API, this commit
changes memory windows so they are not initialized on link up.
Instead, the 'peer_trans*' debugfs files are introduced. When read,
they return information provided by ntb_mw_get_range. When written,
they create a buffer and initialize the memory window. The
value written is taken as the requested size of the buffer (which
is then rounded for alignment). Writing a value of zero frees the buffer
and tears down the memory window translation. The 'peer_mw*' file is
only created once the memory window translation is setup by the user.
Additionally, it was noticed that the read and write functions for the
'peer_mw*' files should have checked for a NULL pointer.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Instead of returning immediately with an error when the link is
down, wait for the link to come up (or the user sends a SIGINT).
This is to make scripting ntb_perf easier.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Instead of having to watch logs, allow the results to be retrieved
by reading back the run file. This file will return "running" when
the test is running and nothing if no tests have been run yet.
It returns 1 line per thread, and will display an error message if the
corresponding thread returns an error.
With the above change, the pr_info calls that returned the results are
then changed to pr_debug calls.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
This commit accomplishes a few things:
1) Properly prevent multiple sets of threads from running at once using
a mutex. Lots of race issues existed with the thread_cleanup.
2) The mutex allows us to ensure that threads are finished before
tearing down the device or module.
3) Don't use kthread_stop when the threads can exit by themselves, as
this is counter-indicated by the kthread_create documentation. Threads
now wait for kthread_stop to occur.
4) Writing to the run file now blocks until the threads are complete.
The test can then be safely interrupted by a SIGINT.
Also, while I was at it:
5) debugfs_run_write shouldn't return 0 in the early check cases as this
could cause debugfs_run_write to loop undesirably.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
When debugging performance problems, if some issue causes the ntb
hardware to be significantly slower than expected, ntb_perf will
hang requiring a reboot because it only schedules once every 4GB.
Instead, schedule based on jiffies so it will not hang the CPU if
the transfer is slow.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
I'm working on hardware that currently has a limited number of
scratchpad registers and ntb_ndev fails with no clue as to why. I
feel it is better to fail early and provide a reasonable error message
then to fail later on.
The same is done to ntb_perf, but it doesn't currently require enough
spads to actually fail. I've also removed the unused SPAD_MSG and
SPAD_ACK enums so that MAX_SPAD accurately reflects the number of
spads used.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
We allocate some memory window buffers when the link comes up, then we
provide debugfs files to read/write each side of the link.
This is useful for debugging the mapping when writing new drivers.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
On my system, dma_alloc_coherent won't produce memory anywhere
near the size of the BAR. So I needed a way to limit this.
It's pretty much copied straight from ntb_transport.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
On hardware with 32 scratchpad registers the spad field in ntb tool
could chop off the end. The maximum buffer size is increased from
256 to 15 times the number or scratchpads.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
If you tried to write two spads in one line, as per the example:
root@peer# echo '0 0x01010101 1 0x7f7f7f7f' > $DBG_DIR/peer_spad
then the CPU would freeze in an infinite loop.
This wasn't immediately obvious but 'pos' was not incrementing the
buffer, so after reading the second pair of values, 'pos' would once
again be 3 and it would re-read the second pair of values ad infinitum.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The clean up routine when we failed to allocate kthread is not cleaning
up all the threads, only the same one over and over again.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
kthread_create_no_node() returns error pointers, never NULL. Fix check so
it handles error correctly.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
kmalloc can fail and we should check for NULL before using the pointer
returned by kmalloc.
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The perf tool is missing the setup of translation window. Adding call to
setup the translation window for backed memory.
Signed-off-by: John Kading <john.kading@gd-ms.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The ntb driver assigns between pointers an __iomem tokens, and
also casts them to 64-bit integers, which results in compiler
warnings on 32-bit systems:
drivers/ntb/test/ntb_perf.c: In function 'perf_copy':
drivers/ntb/test/ntb_perf.c:213:10: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
vbase = (u64)(u64 *)mw->vbase;
^
drivers/ntb/test/ntb_perf.c:214:14: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
dst_vaddr = (u64)(u64 *)dst;
^
This adds __iomem annotations where needed and changes the temporary
variables to iomem pointers to avoid casting them to u64. I did not
see the problem in linux-next earlier, but it show showed up in
4.5-rc1.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Fixes: 8a7b6a778a ("ntb: ntb perf tool")
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Providing raw performance data via a tool that directly access data from
NTB w/o any software overhead. This allows measurement of the hardware
performance limit. In revision one we are only doing single direction
CPU and DMA writes. Eventually we will provide bi-directional writes.
The measurement using DMA engine for NTB performance measure does
not measure the raw performance of DMA engine over NTB due to software
overhead. But it should provide the peak performance through the Linux DMA
driver.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
This is a simple debugging driver that enables the doorbell and
scratch pad registers to be read and written from the debugfs. This
tool enables more complicated debugging to be scripted from user space.
This driver may be used to test that your ntb hardware and drivers are
functioning at a basic level.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
This is a simple ping pong driver that exercises the scratch pads and
doorbells of the ntb hardware. This driver may be used to test that
your ntb hardware and drivers are functioning at a basic level.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>