linux/net
Jens Axboe 2b188cc1bb Add io_uring IO interface
The submission queue (SQ) and completion queue (CQ) rings are shared
between the application and the kernel. This eliminates the need to
copy data back and forth to submit and complete IO.

IO submissions use the io_uring_sqe data structure, and completions
are generated in the form of io_uring_cqe data structures. The SQ
ring holds indices into the io_uring_sqe array, which makes it possible
to submit a batch of IOs without them being contiguous in the ring.
The CQ ring is always contiguous, as completion events are inherently
unordered, and hence any io_uring_cqe entry can point back to an
arbitrary submission.
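
The indirection between the SQ ring and the io_uring_sqe array can be
sketched as a toy model in Python. This is not kernel code: the names
Sqe, publish, and consume are hypothetical, and only the user_data field
(which does exist in the real io_uring_sqe and io_uring_cqe) is modelled.

```python
from dataclasses import dataclass


@dataclass
class Sqe:
    """Toy stand-in for io_uring_sqe; only user_data is modelled."""
    user_data: int


def publish(sq_array, tail, mask, sqe_indices):
    """Application side: place sqe slot numbers into the SQ ring.

    The ring slots used are consecutive (tail, tail + 1, ...), but the
    sqe slots they name can be in any order.
    """
    for idx in sqe_indices:
        sq_array[tail & mask] = idx
        tail += 1
    return tail


def consume(sqes, sq_array, head, tail, mask):
    """Kernel side (modelled): follow each ring entry to its sqe."""
    seen = []
    while head != tail:
        seen.append(sqes[sq_array[head & mask]].user_data)
        head += 1
    return seen, head


entries = 4                       # ring sizes are powers of two
mask = entries - 1
sqes = [Sqe(user_data=100 + i) for i in range(entries)]
sq_array = [0] * entries

tail = publish(sq_array, 0, mask, [3, 0, 2])   # non-contiguous sqe slots
seen, head = consume(sqes, sq_array, 0, tail, mask)
print(seen)
```

The batch names sqe slots 3, 0, and 2, yet occupies three consecutive
ring entries. A completion would carry the same user_data value back,
which is how it can be matched to its submission regardless of order.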

Two new system calls are added for this:

io_uring_setup(entries, params)
	Sets up an io_uring instance for doing async IO. On success,
	returns a file descriptor that the application can mmap to
	gain access to the SQ ring, CQ ring, and io_uring_sqes.

io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
	Initiates IO against the rings mapped to this fd, or waits for
	them to complete, or both. The behavior is controlled by the
	parameters passed in. If 'to_submit' is non-zero, the kernel
	will try to submit new IO. If IORING_ENTER_GETEVENTS is set, the
	kernel will wait for 'min_complete' events, if they aren't
	already available. It's valid to set IORING_ENTER_GETEVENTS
	and 'min_complete' == 0 at the same time; this allows the
	kernel to return already completed events without waiting
	for them. This is useful only for polling, as for IRQ
	driven IO, the application can just check the CQ ring
	without entering the kernel.
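
The way 'to_submit', 'min_complete', and IORING_ENTER_GETEVENTS combine
can be summarised with a small Python model. It is only a sketch of the
rules stated above: enter_plan and cq_ready are hypothetical names, no
system call is made, and details such as error handling are left out.
The flag value matches the kernel UAPI definition.

```python
IORING_ENTER_GETEVENTS = 1 << 0   # same value as in the kernel UAPI header


def enter_plan(to_submit, min_complete, flags, cq_ready):
    """Model what one io_uring_enter() call is asked to do.

    cq_ready is the number of completions already sitting in the CQ ring.
    Returns (will_submit, still_to_wait_for).
    """
    will_submit = to_submit > 0
    still_to_wait_for = 0
    if flags & IORING_ENTER_GETEVENTS:
        # Only wait for the events that are not already available.
        still_to_wait_for = max(0, min_complete - cq_ready)
    return will_submit, still_to_wait_for


# Submit only, never wait.
print(enter_plan(8, 0, 0, cq_ready=0))
# Submit nothing, wait until four completions exist; one is already there.
print(enter_plan(0, 4, IORING_ENTER_GETEVENTS, cq_ready=1))
# GETEVENTS with min_complete == 0: reap what is ready, without waiting.
print(enter_plan(2, 0, IORING_ENTER_GETEVENTS, cq_ready=5))
```

The last case is the polling pattern described above: the call enters
the kernel but never blocks.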

With this setup, it's possible to do async IO with a single system
call. Future developments will enable polled IO with this interface,
and polled submission as well. The latter will enable an application
to do IO without doing ANY system calls at all.

For IRQ driven IO, an application only needs to enter the kernel for
completions if it wants to wait for them to occur.

Each io_uring is backed by a workqueue, to support buffered async IO
as well. We will only punt to an async context if the command would
need to wait for IO on the device side. Any request whose data is already
in the page cache is completed inline. This avoids the latency penalty
of conventional thread pools, since cached data is accessed as quickly
as a sync interface.
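
The inline-versus-punt decision can be illustrated with a userspace
analogy in Python. This is an assumption-laden sketch, not how the
kernel is written: read_block, cache, and backing are hypothetical
names, a dict stands in for the page cache, and a one-thread pool stands
in for the workqueue.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def read_block(cache, backing, block, pool):
    """Return (future, inline); inline is True on a cache hit."""
    if block in cache:
        done = Future()
        done.set_result(cache[block])    # completed inline, no handoff
        return done, True

    def slow_path():
        data = backing[block]            # stands in for device IO
        cache[block] = data
        return data

    return pool.submit(slow_path), False  # punted to the worker


with ThreadPoolExecutor(max_workers=1) as pool:
    cache = {0: b"hot"}
    backing = {0: b"hot", 1: b"cold"}
    hit, hit_inline = read_block(cache, backing, 0, pool)
    miss, miss_inline = read_block(cache, backing, 1, pool)
    print(hit_inline, hit.result())
    print(miss_inline, miss.result())
```

Only the uncached read pays for the handoff to another thread; the
cached read finishes in the caller's context.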

Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-28 08:24:23 -07:00
6lowpan 6lowpan: convert to DEFINE_SHOW_ATTRIBUTE 2018-12-19 00:28:05 +01:00
9p 9p/net: put a lower bound on msize 2018-12-25 17:07:49 +09:00
802
8021q net: core: dev: Add extack argument to dev_change_flags() 2018-12-06 13:26:07 -08:00
appletalk
atm Revert "net: simplify sock_poll_wait" 2018-10-23 10:57:06 -07:00
ax25 ax25: fix possible use-after-free 2019-01-23 11:18:00 -08:00
batman-adv Here are some batman-adv bugfixes: 2019-02-01 10:19:26 -08:00
bluetooth Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-12-27 13:53:32 -08:00
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-12-10 18:00:43 -08:00
bpfilter net: bpfilter: change section name of bpfilter UMH blob. 2019-01-16 15:46:46 -08:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2019-01-28 10:51:51 -08:00
caif Revert "net: simplify sock_poll_wait" 2018-10-23 10:57:06 -07:00
can can: bcm: check timer values before ktime conversion 2019-01-22 11:33:46 +01:00
ceph libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive() 2019-01-21 14:53:12 +01:00
core Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 2019-02-01 15:28:07 -08:00
dcb
dccp dccp: fool proof ccid_hc_[rt]x_parse_options() 2019-02-01 14:49:10 -08:00
decnet decnet: fix DN_IFREQ_SIZE 2019-01-27 23:11:55 -08:00
dns_resolver dns: Allow the dns resolver to retrieve a server set 2018-10-04 09:40:52 -07:00
dsa net: dsa: Fix NULL checking in dsa_slave_set_eee() 2019-02-06 13:42:54 -08:00
ethernet net: ethernet: provide nvmem_get_mac_address() 2018-12-03 15:40:30 -08:00
hsr
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-24 16:19:56 -08:00
ife
ipv4 net: ip_gre: always reports o_key to userspace 2019-01-30 14:00:02 -08:00
ipv6 sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach() 2019-02-07 10:48:42 -08:00
iucv iucv: Remove SKB list assumptions. 2018-11-10 16:55:11 -08:00
kcm
key af_key: fix indentation on declaration statement 2018-11-15 18:09:32 +01:00
l2tp l2tp: copy 4 more bytes to linear part if necessary 2019-01-31 08:58:46 -08:00
l3mdev l3mdev: add function to retreive upper master 2018-12-03 14:15:26 -08:00
lapb
llc llc: do not use sk_eat_skb() 2018-10-22 19:59:20 -07:00
mac80211 mac80211: ensure that mgmt tx skbs have tailroom for encryption 2019-02-01 11:08:02 +01:00
mac802154
mpls net/mpls: Handle kernel side filtering of route dumps 2018-10-16 00:14:07 -07:00
ncsi net/ncsi: Add NCSI Mellanox OEM command 2018-11-27 16:37:20 -08:00
netfilter netfilter: nft_compat: don't use refcount_inc on newly allocated entry 2019-02-05 14:10:33 +01:00
netlabel
netlink net: netlink: rename NETLINK_DUMP_STRICT_CHK -> NETLINK_GET_STRICT_CHK 2018-12-14 11:44:31 -08:00
netrom netrom: switch to sock timer API 2019-01-27 10:38:04 -08:00
nfc net: Revert recent Spectre-v1 patches. 2018-12-23 16:01:35 -08:00
nsh
openvswitch openvswitch: Avoid OOB read when parsing flow nlattrs 2019-01-16 13:35:21 -08:00
packet af_packet: fix raw sockets over 6in4 tunnel 2019-01-17 15:54:45 -08:00
phonet net: Revert recent Spectre-v1 patches. 2018-12-23 16:01:35 -08:00
psample
qrtr
rds rds: fix refcount bug in rds_sock_addref 2019-01-31 09:43:27 -08:00
rfkill rfkill: gpio: Remove unused include 2018-12-18 13:13:56 +01:00
rose net/rose: fix NULL ax25_cb kernel panic 2019-01-27 10:40:01 -08:00
rxrpc rxrpc: bad unlock balance in rxrpc_recvmsg 2019-02-06 10:54:07 -08:00
sched net: cls_flower: Remove filter from mask before freeing it 2019-02-04 09:19:14 -08:00
sctp sctp: check and update stream->out_curr when allocating stream_out 2019-02-03 14:27:47 -08:00
smc net/smc: correct state change for peer closing 2019-02-04 09:11:19 -08:00
strparser bpf, sockmap: convert to generic sk_msg interface 2018-10-15 12:23:19 -07:00
sunrpc svcrdma: Remove max_sge check at connect time 2019-02-06 15:32:34 -05:00
switchdev net: switchdev: Add extack to switchdev_handle_port_obj_add() callback 2018-12-12 16:34:22 -08:00
tipc tipc: fix uninit-value in tipc_nl_compat_doit 2019-01-15 20:29:21 -08:00
tls net: tls: Fix deadlock in free_resources tx 2019-01-28 23:07:08 -08:00
unix Add io_uring IO interface 2019-02-28 08:24:23 -07:00
vmw_vsock vsock/virtio: reset connected sockets on device removal 2019-02-03 11:06:25 -08:00
wimax
wireless cfg80211: call disconnect_wk when AP stops 2019-02-01 11:12:50 +01:00
x25 net/x25: handle call collisions 2018-11-29 14:25:36 -08:00
xdp xsk: Check if a queue exists during umem setup 2019-01-15 20:51:57 +01:00
xfrm xfrm: Make set-mark default behavior backward compatible 2019-01-16 13:10:55 +01:00
compat.c Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
Kconfig net: convert bridge_nf to use skb extension infrastructure 2018-12-19 11:21:37 -08:00
Makefile
socket.c net: socket: make bond ioctls go through compat_ifreq_ioctl() 2019-01-30 10:19:31 -08:00
sysctl_net.c