mainlining shenanigans
Go to file
Jens Axboe 2b188cc1bb Add io_uring IO interface
The submission queue (SQ) and completion queue (CQ) rings are shared
between the application and the kernel. This eliminates the need to
copy data back and forth to submit and complete IO.

IO submissions use the io_uring_sqe data structure, and completions
are generated in the form of io_uring_cqe data structures. The SQ
ring is an index into the io_uring_sqe array, which makes it possible
to submit a batch of IOs without them being contiguous in the ring.
The CQ ring is always contiguous, as completion events are inherently
unordered, and hence any io_uring_cqe entry can point back to an
arbitrary submission.

Two new system calls are added for this:

io_uring_setup(entries, params)
	Sets up an io_uring instance for doing async IO. On success,
	returns a file descriptor that the application can mmap to
	gain access to the SQ ring, CQ ring, and io_uring_sqes.

io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
	Initiates IO against the rings mapped to this fd, or waits for
	them to complete, or both. The behavior is controlled by the
	parameters passed in. If 'to_submit' is non-zero, then we'll
	try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
	kernel will wait for 'min_complete' events, if they aren't
	already available. It's valid to set IORING_ENTER_GETEVENTS
	and 'min_complete' == 0 at the same time, this allows the
	kernel to return already completed events without waiting
	for them. This is useful only for polling, as for IRQ
	driven IO, the application can just check the CQ ring
	without entering the kernel.

With this setup, it's possible to do async IO with a single system
call. Future developments will enable polled IO with this interface,
and polled submission as well. The latter will enable an application
to do IO without doing ANY system calls at all.

For IRQ driven IO, an application only needs to enter the kernel for
completions if it wants to wait for them to occur.

Each io_uring is backed by a workqueue, to support buffered async IO
as well. We will only punt to an async context if the command would
need to wait for IO on the device side. Any data that can be accessed
directly in the page cache is done inline. This avoids the slowness
issue of usual threadpools, since cached data is accessed as quickly
as a sync interface.

Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-28 08:24:23 -07:00
arch Add io_uring IO interface 2019-02-28 08:24:23 -07:00
block block: optimize blk_bio_segment_split for single-page bvec 2019-02-27 06:18:55 -07:00
certs kbuild: remove redundant target cleaning on failure 2019-01-06 09:46:51 +09:00
crypto crypto: sm3 - fix undefined shift by >= width of value 2019-01-10 21:37:32 +08:00
Documentation fs: add an iopoll method to struct file_operations 2019-02-24 08:20:17 -07:00
drivers loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part() 2019-02-22 15:51:22 -07:00
firmware kbuild: change filechk to surround the given command with { } 2019-01-06 09:46:51 +09:00
fs Add io_uring IO interface 2019-02-28 08:24:23 -07:00
include Add io_uring IO interface 2019-02-28 08:24:23 -07:00
init Add io_uring IO interface 2019-02-28 08:24:23 -07:00
ipc ipc: IPCMNI limit check for semmni 2018-10-31 08:54:14 -07:00
kernel Add io_uring IO interface 2019-02-28 08:24:23 -07:00
lib Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-08 11:21:54 -08:00
LICENSES This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
mm mm: migrate: don't rely on __PageMovable() of newpage after unlocking it 2019-02-01 15:46:24 -08:00
net Add io_uring IO interface 2019-02-28 08:24:23 -07:00
samples samples: mei: use /dev/mei0 instead of /dev/mei 2019-01-30 15:24:45 +01:00
scripts Bug fixes for gcc-plugins 2019-01-21 13:07:03 +13:00
security apparmor: Fix aa_label_build() error handling for failed merges 2019-02-01 08:01:39 -08:00
sound sound fixes for 5.0-rc6 2019-02-07 08:33:56 +00:00
tools Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-02-10 09:48:18 -08:00
usr user/Makefile: Fix typo and capitalization in comment section 2018-12-11 00:18:03 +09:00
virt kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974) 2019-02-07 19:02:38 +01:00
.clang-format clang-format: Update .clang-format with the latest for_each macro list 2019-01-19 19:26:06 +01:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore kbuild: Add support for DT binding schema checks 2018-12-13 09:41:32 -06:00
.mailmap A few early MIPS fixes for 4.21: 2019-01-05 12:48:25 -08:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS Add CREDITS entry for Shaohua Li 2019-01-04 14:27:09 -07:00
Kbuild kbuild: use assignment instead of define ... endef for filechk_* rules 2019-01-06 10:22:35 +09:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2019-02-09 13:43:12 -08:00
Makefile Linux 5.0-rc6 2019-02-10 14:42:20 -08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.