summary refs log tree commit diff
path: root/Documentation/misc-devices
diff options
context:
space:
mode:
authorJonathan Neuschäfer <j.neuschaefer@gmx.net>2020-03-08 22:14:43 +0100
committerJonathan Corbet <corbet@lwn.net>2020-03-10 11:12:34 -0600
commit19796c348ab62287fc6434f296ae279d7b97e39f (patch)
treec7912efe1f5cfbd0ddc00471b8371cec2bc86e8d /Documentation/misc-devices
parent0150aedda00efe2df9156c57de63f4dea8568e8e (diff)
downloadlinux-19796c348ab62287fc6434f296ae279d7b97e39f.tar.gz
docs: Move Intel Many Integrated Core documentation (mic) under misc-devices
It doesn't need to be a top-level chapter.

This patch also updates MAINTAINERS and makes sure the F: lines are
properly sorted.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20200308211519.8414-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Diffstat (limited to 'Documentation/misc-devices')
-rw-r--r--Documentation/misc-devices/index.rst1
-rw-r--r--Documentation/misc-devices/mic/index.rst16
-rw-r--r--Documentation/misc-devices/mic/mic_overview.rst85
-rw-r--r--Documentation/misc-devices/mic/scif_overview.rst108
4 files changed, 210 insertions, 0 deletions
diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst
index f11c5daeada5..c1dcd2628911 100644
--- a/Documentation/misc-devices/index.rst
+++ b/Documentation/misc-devices/index.rst
@@ -20,4 +20,5 @@ fit into other categories.
    isl29003
    lis3lv02d
    max6875
+   mic/index
    xilinx_sdfec
diff --git a/Documentation/misc-devices/mic/index.rst b/Documentation/misc-devices/mic/index.rst
new file mode 100644
index 000000000000..3a8d06367ef1
--- /dev/null
+++ b/Documentation/misc-devices/mic/index.rst
@@ -0,0 +1,16 @@
+=============================================
+Intel Many Integrated Core (MIC) architecture
+=============================================
+
+.. toctree::
+    :maxdepth: 1
+
+    mic_overview
+    scif_overview
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/misc-devices/mic/mic_overview.rst b/Documentation/misc-devices/mic/mic_overview.rst
new file mode 100644
index 000000000000..17d956bdaf7c
--- /dev/null
+++ b/Documentation/misc-devices/mic/mic_overview.rst
@@ -0,0 +1,85 @@
+======================================================
+Intel Many Integrated Core (MIC) architecture overview
+======================================================
+
+An Intel MIC X100 device is a PCIe form factor add-in coprocessor
+card based on the Intel Many Integrated Core (MIC) architecture
+that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
+implements the three required standard address spaces i.e. configuration,
+memory and I/O. The host OS loads a device driver as is typical for
+PCIe devices. The card itself runs a bootstrap after reset that
+transfers control to the card OS downloaded from the host driver. The
+host driver supports OSPM suspend and resume operations. It shuts down
+the card during suspend and reboots the card OS during resume.
+The card OS as shipped by Intel is a Linux kernel with modifications
+for the X100 devices.
+
+Since it is a PCIe card, it does not have the ability to host hardware
+devices for networking, storage and console. We provide these devices
+on X100 coprocessors thus enabling a self-bootable equivalent
+environment for applications. A key benefit of our solution is that it
+leverages the standard virtio framework for network, disk and console
+devices, though in our case the virtio framework is used across a PCIe
+bus. A Virtio Over PCIe (VOP) driver allows creating user space
+backends or devices on the host which are used to probe virtio drivers
+for these devices on the MIC card. The existing VRINGH infrastructure
+in the kernel is used to access virtio rings from the host. The card
+VOP driver allows card virtio drivers to communicate with their user
+space backends on the host via a device page. Ring 3 apps on the host
+can add, remove and configure virtio devices. A thin MIC specific
+virtio_config_ops is implemented which is borrowed heavily from
+previous similar implementations in lguest and s390.
+
+MIC PCIe card has a dma controller with 8 channels. These channels are
+shared between the host s/w and the card s/w. 0 to 3 are used by host
+and 4 to 7 by card. As the dma device doesn't show up as PCIe device,
+a virtual bus called mic bus is created and virtual dma devices are
+created on it by the host/card drivers. On host the channels are private
+and used only by the host driver to transfer data for the virtio devices.
+
+The Symmetric Communication Interface (SCIF (pronounced as skiff)) is a
+low level communications API across PCIe currently implemented for MIC.
+More details are available at scif_overview.txt.
+
+The Coprocessor State Management (COSM) driver on the host allows for
+boot, shutdown and reset of Intel MIC devices. It communicates with a COSM
+"client" driver on the MIC cards over SCIF to perform these functions.
+
+Here is a block diagram of the various components described above. The
+virtio backends are situated on the host rather than the card given better
+single threaded performance for the host compared to MIC, the ability of
+the host to initiate DMA's to/from the card using the MIC DMA engine and
+the fact that the virtio block storage backend can only be on the host::
+
+               +----------+           |             +----------+
+               | Card OS  |           |             | Host OS  |
+               +----------+           |             +----------+
+                                      |
+        +-------+ +--------+ +------+ | +---------+  +--------+ +--------+
+        | Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
+        | Net   | |Console | |Block | | |Net      |  |Console | |Block   |
+        | Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
+        +---+---+ +---+----+ +--+---+ | +---------+  +----+---+ +--------+
+            |         |         |     |      |            |         |
+            |         |         |     |User  |            |         |
+            |         |         |     |------|------------|--+------|-------
+            +---------+---------+     |Kernel                |
+                      |               |                      |
+  +---------+     +---+----+ +------+ | +------+ +------+ +--+---+  +-------+
+  |MIC DMA  |     |  VOP   | | SCIF | | | SCIF | | COSM | | VOP  |  |MIC DMA|
+  +---+-----+     +---+----+ +--+---+ | +--+---+ +--+---+ +------+  +----+--+
+      |               |         |     |    |        |                    |
+  +---+-----+     +---+----+ +--+---+ | +--+---+ +--+---+ +------+  +----+--+
+  |MIC      |     |  VOP   | |SCIF  | | |SCIF  | | COSM | | VOP  |  | MIC   |
+  |HW Bus   |     |  HW Bus| |HW Bus| | |HW Bus| | Bus  | |HW Bus|  |HW Bus |
+  +---------+     +--------+ +--+---+ | +--+---+ +------+ +------+  +-------+
+      |               |         |     |       |     |                    |
+      |   +-----------+--+      |     |       |    +---------------+     |
+      |   |Intel MIC     |      |     |       |    |Intel MIC      |     |
+      |   |Card Driver   |      |     |       |    |Host Driver    |     |
+      +---+--------------+------+     |       +----+---------------+-----+
+                 |                    |                   |
+             +-------------------------------------------------------------+
+             |                                                             |
+             |                    PCIe Bus                                 |
+             +-------------------------------------------------------------+
diff --git a/Documentation/misc-devices/mic/scif_overview.rst b/Documentation/misc-devices/mic/scif_overview.rst
new file mode 100644
index 000000000000..4c8ad9e43706
--- /dev/null
+++ b/Documentation/misc-devices/mic/scif_overview.rst
@@ -0,0 +1,108 @@
+========================================
+Symmetric Communication Interface (SCIF)
+========================================
+
+The Symmetric Communication Interface (SCIF (pronounced as skiff)) is a low
+level communications API across PCIe currently implemented for MIC. Currently
+SCIF provides inter-node communication within a single host platform, where a
+node is a MIC Coprocessor or Xeon based host. SCIF abstracts the details of
+communicating over the PCIe bus while providing an API that is symmetric
+across all the nodes in the PCIe network. An important design objective for SCIF
+is to deliver the maximum possible performance given the communication
+abilities of the hardware. SCIF has been used to implement an offload compiler
+runtime and OFED support for MPI implementations for MIC coprocessors.
+
+SCIF API Components
+===================
+
+The SCIF API has the following parts:
+
+1. Connection establishment using a client server model
+2. Byte stream messaging intended for short messages
+3. Node enumeration to determine online nodes
+4. Poll semantics for detection of incoming connections and messages
+5. Memory registration to pin down pages
+6. Remote memory mapping for low latency CPU accesses via mmap
+7. Remote DMA (RDMA) for high bandwidth DMA transfers
+8. Fence APIs for RDMA synchronization
+
+SCIF exposes the notion of a connection which can be used by peer processes on
+nodes in a SCIF PCIe "network" to share memory "windows" and to communicate. A
+process in a SCIF node initiates a SCIF connection to a peer process on a
+different node via a SCIF "endpoint". SCIF endpoints support messaging APIs
+which are similar to connection oriented socket APIs. Connected SCIF endpoints
+can also register local memory which is followed by data transfer using either
+DMA, CPU copies or remote memory mapping via mmap. SCIF supports both user and
+kernel mode clients which are functionally equivalent.
+
+SCIF Performance for MIC
+========================
+
+DMA bandwidth comparison between the TCP (over ethernet over PCIe) stack versus
+SCIF shows the performance advantages of SCIF for HPC applications and
+runtimes::
+
+             Comparison of TCP and SCIF based BW
+
+  Throughput (GB/sec)
+    8 +                                             PCIe Bandwidth ******
+      +                                                        TCP ######
+    7 +    **************************************             SCIF %%%%%%
+      |                       %%%%%%%%%%%%%%%%%%%
+    6 +                   %%%%
+      |                 %%
+      |               %%%
+    5 +              %%
+      |            %%
+    4 +           %%
+      |          %%
+    3 +         %%
+      |        %
+    2 +      %%
+      |     %%
+      |    %
+    1 +
+      +    ######################################
+    0 +++---+++--+--+-+--+--+-++-+--+-++-+--+-++-+-
+      1       10     100      1000   10000   100000
+                   Transfer Size (KBytes)
+
+SCIF allows memory sharing via mmap(..) between processes on different PCIe
+nodes and thus provides bare-metal PCIe latency. The round trip SCIF mmap
+latency from the host to an x100 MIC for an 8 byte message is 0.44 usecs.
+
+SCIF has a user space library which is a thin IOCTL wrapper providing a user
+space API similar to the kernel API in scif.h. The SCIF user space library
+is distributed @ https://software.intel.com/en-us/mic-developer
+
+Here is some pseudo code for an example of how two applications on two PCIe
+nodes would typically use the SCIF API::
+
+  Process A (on node A)			Process B (on node B)
+
+  /* get online node information */
+  scif_get_node_ids(..)			scif_get_node_ids(..)
+  scif_open(..)				scif_open(..)
+  scif_bind(..)				scif_bind(..)
+  scif_listen(..)
+  scif_accept(..)				scif_connect(..)
+  /* SCIF connection established */
+
+  /* Send and receive short messages */
+  scif_send(..)/scif_recv(..)		scif_send(..)/scif_recv(..)
+
+  /* Register memory */
+  scif_register(..)			scif_register(..)
+
+  /* RDMA */
+  scif_readfrom(..)/scif_writeto(..)	scif_readfrom(..)/scif_writeto(..)
+
+  /* Fence DMAs */
+  scif_fence_signal(..)			scif_fence_signal(..)
+
+  mmap(..)				mmap(..)
+
+  /* Access remote registered memory */
+
+  /* Close the endpoints */
+  scif_close(..)				scif_close(..)