Add comments to explain why it is always under uring lock when
incrementing head in __io_kbuf_recycle. And rectify one comemnt about
kbuf consuming in iowq case.
Signed-off-by: Hao Xu <howeyxu@tencent.com>
Link: https://lore.kernel.org/r/20220617050429.94293-1-hao.xu@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently we do two extra spin lock/unlock pairs to add a poll/apoll
request to the cancellation hash table and remove it from there.
On the submission side we often already hold ->uring_lock and tw
completion is likely to hold it as well. Add a second cancellation hash
table protected by ->uring_lock. In concerns for latency because of a
need to have the mutex locked on the completion side, use the new table
only in following cases:
1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
is little to no contention and so the main tw hander will almost
always end up grabbing it before calling callbacks.
2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
a major user of ->uring_lock.
3) apoll: we normally grab the lock on the completion side anyway to
execute the request, so it's free.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1bbad9c78c454b7b92f100bbf46730a37df7194f.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a new IORING_SETUP_SINGLE_ISSUER flag and the userspace visible part
of it, i.e. put limitations of submitters. Also, don't allow it together
with IOPOLL as we're not going to put it to good use.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4bcc41ee467fdf04c8aab8baf6ce3ba21858c3d4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't allocate to many hash/cancellation buckets, there might be too
many, clamp it to 8 bits, or 256 * 64B = 16KB. We don't usually have too
many requests, and 256 buckets should be enough, especially since we
do hash search only in the cancellation path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b9620c8072ba61a2d50eba894b89bd93a94a9abd.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of using implicit knowledge of what is locked or not after
io_poll_find() and co returns, pass back a pointer to the locked
bucket if any. If set the user must to unlock the spinlock.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dae1dc5749aa34367812ecf62f82fd3f053aae44.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a new io_hash_bucket structure so that each bucket in cancel_hash
has separate spinlock. Use per entry lock for cancel_hash, this removes
some completion lock invocation and remove contension between different
cancel_hash entries.
Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/05d1e135b0c8bce9d1441e6346776589e5783e26.1655371007.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
completion list to io_queue_sqe() as __io_req_complete() is inlined and
we don't want to bloat the kernel.
As now we complete in a more centralised fashion in io_issue_sqe() we
can get rid of the flag and queue to the list directly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/600ba20a9338b8a39b249b23d3d177803613dde4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_issue_sqe() from the io_uring core knows how to complete requests
based on the returned error code, we can delegate io_read()/io_write()
completion to it. Make kiocb_done() to return the right completion
code and propagate it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/32ef005b45d23bf6b5e6837740dc0331bb051bd4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Luckily, nnobody completes multi-apoll requests outside the polling
functions, but don't set IO_URING_F_COMPLETE_DEFER in any case as
there is nobody who is catching REQ_F_COMPLETE_INLINE, and so will leak
requests if used.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a65ed3f5effd9321ee06e6edea294a03be3e15a0.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Following timeout fields access patterns, move all of them into a
separate cache line inside ctx, so they don't intervene with normal
completion caching, especially since timeout removals and completion
are separated and the later is done via tw.
It also sheds some bytes from io_ring_ctx, 1216B -> 1152B
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4b163793072840de53b3cb66e0c2995e7226ff78.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
As far as we know, nobody ever adopted the epoll_ctl management via
io_uring. Deprecate it now with a warning, and plan on removing it in
a later kernel version. When we do remove it, we can revert the following
commits as well:
39220e8d4a ("eventpoll: support non-blocking do_epoll_ctl() calls")
58e41a44c4 ("eventpoll: abstract out epoll_ctl() handler")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/io-uring/CAHk-=wiTyisXBgKnVHAGYCNvkmjk=50agS2Uk6nr+n3ssLZg2w@mail.gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
By default, the POLL_ADD command does edge triggered poll - if we get
a non-zero mask on the initial poll attempt, we complete the request
successfully.
Support level triggered by always waiting for a notification, regardless
of whether or not the initial mask matches the file state.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We already have the declarations in opdef.h, move the rest into its own
file rather than in the main io_uring.c file.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This also helps cleanup the io_uring.h cancel parts, as we can make
things static in the cancel.c file, mostly.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a io_poll_issue() rather than export the general task_work locking
and io_issue_sqe(), and put the io_op_defs definition and structure into
a separate header file so that poll can use it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove some dead headers we no longer need, and get rid of the
io_ring_ctx and io_uring_fops forward declarations.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This also means moving a bit more of the fixed file handling to the
filetable side, which makes sense separately too.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Would be nice to sort out Kconfig for this and don't even compile
epoll.c if we don't have epoll configured.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Normally request handlers complete requests themselves, if they don't
return an error. For the latter case, the core will complete it for
them.
This is unhandy for pushing opcode handlers further out, as we don't
want a bunch of inline completion code and we don't want to make the
completion path slower than it is now.
Let the core handle any completion, unless the handler explicitly
asks us not to.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This can move request type specific cleanup into a private handler,
removing the need for the core io_uring parts to know what types
they are dealing with.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There's a special case for recvmsg with MSG_ERRQUEUE set. This is
problematic as it means the core needs to know about this special
request type.
For now, just add a generic flag for it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove struct io_poll_update from io_kiocb, and convert the poll path to
use the io_cmd_type approach instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove struct io_poll_iocb from io_kiocb, and convert the poll path to
use the io_cmd_type approach instead.
While at it, rename io_poll_iocb to io_poll which is consistent with the
other request type private structures.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove struct io_rw from io_kiocb, and convert the read/write path to
use the io_cmd_type approach instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Each opcode generally has a command structure in io_kiocb which it can
use to store data associated with that request.
In preparation for having the core layer not know about what's inside
these fields, add a generic io_cmd_data type and put in the union as
well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Define an io_op_def->prep_async() handler and push the async preparation
to there. Since we now have that, we can drop ->needs_async_setup, as
they mean the same thing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation for splitting io_uring up a bit, move it into its own
top level directory. It didn't really belong in fs/ anyway, as it's
not a file system only API.
This adds io_uring/ and moves the core files in there, and updates the
MAINTAINERS file for the new location.
Signed-off-by: Jens Axboe <axboe@kernel.dk>