forked from Minki/linux
7df20f2d89
This patch introduces the SCIF documentation in the header file and describes the IOCTL interface for user mode. mic_overview.txt is updated with documentation on SCIF and a new document describing SCIF in more details is available in scif_overview.txt. Reviewed-by: Nikhil Rao <nikhil.rao@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
99 lines
3.9 KiB
Plaintext
99 lines
3.9 KiB
Plaintext
The Symmetric Communication Interface (SCIF (pronounced as skiff)) is a low
|
|
level communications API across PCIe currently implemented for MIC. Currently
|
|
SCIF provides inter-node communication within a single host platform, where a
|
|
node is a MIC Coprocessor or Xeon based host. SCIF abstracts the details of
|
|
communicating over the PCIe bus while providing an API that is symmetric
|
|
across all the nodes in the PCIe network. An important design objective for SCIF
|
|
is to deliver the maximum possible performance given the communication
|
|
abilities of the hardware. SCIF has been used to implement an offload compiler
|
|
runtime and OFED support for MPI implementations for MIC coprocessors.
|
|
|
|
==== SCIF API Components ====
|
|
The SCIF API has the following parts:
|
|
1. Connection establishment using a client server model
|
|
2. Byte stream messaging intended for short messages
|
|
3. Node enumeration to determine online nodes
|
|
4. Poll semantics for detection of incoming connections and messages
|
|
5. Memory registration to pin down pages
|
|
6. Remote memory mapping for low latency CPU accesses via mmap
|
|
7. Remote DMA (RDMA) for high bandwidth DMA transfers
|
|
8. Fence APIs for RDMA synchronization
|
|
|
|
SCIF exposes the notion of a connection which can be used by peer processes on
|
|
nodes in a SCIF PCIe "network" to share memory "windows" and to communicate. A
|
|
process in a SCIF node initiates a SCIF connection to a peer process on a
|
|
different node via a SCIF "endpoint". SCIF endpoints support messaging APIs
|
|
which are similar to connection oriented socket APIs. Connected SCIF endpoints
|
|
can also register local memory which is followed by data transfer using either
|
|
DMA, CPU copies or remote memory mapping via mmap. SCIF supports both user and
|
|
kernel mode clients which are functionally equivalent.
|
|
|
|
==== SCIF Performance for MIC ====
|
|
DMA bandwidth comparison between the TCP (over ethernet over PCIe) stack versus
|
|
SCIF shows the performance advantages of SCIF for HPC applications and runtimes.
|
|
|
|
Comparison of TCP and SCIF based BW
|
|
|
|
Throughput (GB/sec)
|
|
8 + PCIe Bandwidth ******
|
|
+ TCP ######
|
|
7 + ************************************** SCIF %%%%%%
|
|
| %%%%%%%%%%%%%%%%%%%
|
|
6 + %%%%
|
|
| %%
|
|
| %%%
|
|
5 + %%
|
|
| %%
|
|
4 + %%
|
|
| %%
|
|
3 + %%
|
|
| %
|
|
2 + %%
|
|
| %%
|
|
| %
|
|
1 +
|
|
+ ######################################
|
|
0 +++---+++--+--+-+--+--+-++-+--+-++-+--+-++-+-
|
|
1 10 100 1000 10000 100000
|
|
Transfer Size (KBytes)
|
|
|
|
SCIF allows memory sharing via mmap(..) between processes on different PCIe
|
|
nodes and thus provides bare-metal PCIe latency. The round trip SCIF mmap
|
|
latency from the host to an x100 MIC for an 8 byte message is 0.44 usecs.
|
|
|
|
SCIF has a user space library which is a thin IOCTL wrapper providing a user
|
|
space API similar to the kernel API in scif.h. The SCIF user space library
|
|
is distributed @ https://software.intel.com/en-us/mic-developer
|
|
|
|
Here is some pseudo code for an example of how two applications on two PCIe
|
|
nodes would typically use the SCIF API:
|
|
|
|
Process A (on node A) Process B (on node B)
|
|
|
|
/* get online node information */
|
|
scif_get_node_ids(..) scif_get_node_ids(..)
|
|
scif_open(..) scif_open(..)
|
|
scif_bind(..) scif_bind(..)
|
|
scif_listen(..)
|
|
scif_accept(..) scif_connect(..)
|
|
/* SCIF connection established */
|
|
|
|
/* Send and receive short messages */
|
|
scif_send(..)/scif_recv(..) scif_send(..)/scif_recv(..)
|
|
|
|
/* Register memory */
|
|
scif_register(..) scif_register(..)
|
|
|
|
/* RDMA */
|
|
scif_readfrom(..)/scif_writeto(..) scif_readfrom(..)/scif_writeto(..)
|
|
|
|
/* Fence DMAs */
|
|
scif_fence_signal(..) scif_fence_signal(..)
|
|
|
|
mmap(..) mmap(..)
|
|
|
|
/* Access remote registered memory */
|
|
|
|
/* Close the endpoints */
|
|
scif_close(..) scif_close(..)
|