Pavel Begunkov
1ab1edb0a1
io_uring: pass poll_find lock back
...
Instead of using implicit knowledge of what is locked or not after
io_poll_find() and co return, pass back a pointer to the locked
bucket, if any. If set, the caller must unlock the spinlock.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dae1dc5749aa34367812ecf62f82fd3f053aae44.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Hao Xu
38513c464d
io_uring: switch cancel_hash to use per entry spinlock
...
Add a new io_hash_bucket structure so that each bucket in cancel_hash
has a separate spinlock. Using a per-entry lock for cancel_hash removes
some completion_lock invocations and removes contention between
different cancel_hash entries.
Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/05d1e135b0c8bce9d1441e6346776589e5783e26.1655371007.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Hao Xu
3654ab0c51
io_uring: poll: remove unnecessary req->ref set
...
We now don't need to set req->refcount for poll requests since the
reworked poll code ensures no request release race.
Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ec6fee45705890bdb968b0c175519242753c0215.1655371007.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
53ccf69bda
io_uring: don't inline io_put_kbuf
...
io_put_kbuf() is huge, don't bloat the kernel with inlining.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2e21ccf0be471ffa654032914b9430813cae53f8.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
7012c81593
io_uring: refactor io_req_task_complete()
...
Clean up io_req_task_complete() and deduplicate io_put_kbuf() calls.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ae3148ac7eb5cce3e06895cde306e9e959d6f6ae.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
75d7b3aec1
io_uring: kill REQ_F_COMPLETE_INLINE
...
REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
completion list until io_queue_sqe(), as __io_req_complete() is inlined
and we don't want to bloat the kernel.
Now that we complete in a more centralised fashion in io_issue_sqe(), we
can get rid of the flag and queue to the list directly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/600ba20a9338b8a39b249b23d3d177803613dde4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
df9830d883
io_uring: rw: delegate sync completions to core io_uring
...
io_issue_sqe() from the io_uring core knows how to complete requests
based on the returned error code, so we can delegate io_read()/io_write()
completion to it. Make kiocb_done() return the right completion code
and propagate it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/32ef005b45d23bf6b5e6837740dc0331bb051bd4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Jens Axboe
bb8f870031
io_uring: remove unused IO_REQ_CACHE_SIZE define
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
c65f5279ba
io_uring: don't set REQ_F_COMPLETE_INLINE in tw
...
io_req_task_complete() enqueues requests for state completion itself, so
there is no need for REQ_F_COMPLETE_INLINE, which only serves the
purpose of not bloating the kernel.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/aca80f71464ad02c06f1311d998a2d6ee0b31573.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
3a08576b96
io_uring: remove check_cq checking from hot paths
...
All ctx->check_cq events are slow path; don't test every single flag one
by one in the hot path, but add a common guarding if.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dff026585cea7ff3a172a7c83894a3b0111bbf6a.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
aeaa72c694
io_uring: never defer-complete multi-apoll
...
Luckily, nobody completes multi-apoll requests outside the polling
functions, but don't set IO_URING_F_COMPLETE_DEFER in any case, as
there is nobody catching REQ_F_COMPLETE_INLINE, and so requests would
leak if it were used.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a65ed3f5effd9321ee06e6edea294a03be3e15a0.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
6a02e4be81
io_uring: inline ->registered_rings
...
There can be only 16 registered rings, so there is no need to allocate
an array for them separately; store it inline in tctx.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/495f0b953c87994dd9e13de2134019054fa5830d.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
48c13d8980
io_uring: explain io_wq_work::cancel_seq placement
...
Add a comment on why we keep ->cancel_seq in struct io_wq_work instead
of struct io_kiocb, despite it being needed only by io_uring and not
io-wq.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/988e87eec9dc700b5dae933df3aefef303502f6c.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
aa1e90f64e
io_uring: move small helpers to headers
...
There are a bunch of inline helpers that will be useful not only to the
core of io_uring; move them to headers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/22df99c83723e44cba7e945e8519e64e3642c064.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
22eb2a3fde
io_uring: refactor ctx slow data placement
...
Shove all slow path data to the end of ctx and get rid of extra
indentation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bcaf200298dd469af20787650550efc66d89bef2.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Pavel Begunkov
aff5b2df9e
io_uring: better caching for ctx timeout fields
...
Following the timeout fields' access patterns, move all of them into a
separate cache line inside ctx, so they don't interfere with normal
completion caching, especially since timeout removals and completions
are separated and the latter is done via tw.
It also sheds some bytes from io_ring_ctx, 1216B -> 1152B.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4b163793072840de53b3cb66e0c2995e7226ff78.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Pavel Begunkov
b25436038f
io_uring: move defer_list to slow data
...
Draining is a slow path; move defer_list to the end, where slow data
lives inside the context.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e16379391ca72b490afdd24e8944baab849b4a7b.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Pavel Begunkov
5ff4fdffad
io_uring: make reg buf init consistent
...
The default (i.e. empty) state of a registered buffer is dummy_ubuf, so
set it to dummy_ubuf on init instead of NULL.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c5456aecf03d9627fbd6e65e100e2b5293a6151e.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
61a2732af4
io_uring: deprecate epoll_ctl support
...
As far as we know, nobody ever adopted the epoll_ctl management via
io_uring. Deprecate it now with a warning, and plan on removing it in
a later kernel version. When we do remove it, we can revert the following
commits as well:
39220e8d4a ("eventpoll: support non-blocking do_epoll_ctl() calls")
58e41a44c4 ("eventpoll: abstract out epoll_ctl() handler")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/io-uring/CAHk-=wiTyisXBgKnVHAGYCNvkmjk=50agS2Uk6nr+n3ssLZg2w@mail.gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
b9ba8a4463
io_uring: add support for level triggered poll
...
By default, the POLL_ADD command does edge-triggered poll - if we get
a non-zero mask on the initial poll attempt, we complete the request
successfully.
Support level-triggered mode by always waiting for a notification,
regardless of whether or not the initial mask matches the file state.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
d9b57aa3cf
io_uring: move opcode table to opdef.c
...
We already have the declarations in opdef.h; move the rest into its own
file rather than in the main io_uring.c file.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
f3b44f92e5
io_uring: move read/write related opcodes to its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
c98817e6cd
io_uring: move remaining file table manipulation to filetable.c
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
7357298448
io_uring: move rsrc related data, core, and commands
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
3b77495a97
io_uring: split provided buffers handling into its own file
...
Move both the opcodes related to it, and the internals code dealing with
it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
7aaff708a7
io_uring: move cancelation into its own file
...
This also helps clean up the io_uring.h cancel parts, as we can make
things static in the cancel.c file, mostly.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
329061d3e2
io_uring: move poll handling into its own file
...
Add an io_poll_issue() rather than exporting the general task_work
locking and io_issue_sqe(), and put the io_op_defs definition and
structure into a separate header file so that poll can use it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
cfd22e6b33
io_uring: add opcode name to io_op_defs
...
This kills the last per-op switch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
92ac8beaea
io_uring: include and forward-declaration sanitation
...
Remove some dead headers we no longer need, and get rid of the
io_ring_ctx and io_uring_fops forward declarations.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
c9f06aa7de
io_uring: move io_uring_task (tctx) helpers into its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
a4ad4f748e
io_uring: move fdinfo helpers to its own file
...
This also means moving a bit more of the fixed file handling to the
filetable side, which makes sense separately too.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
e5550a1447
io_uring: use io_is_uring_fops() consistently
...
Convert the last spots that check for io_uring_fops to use the provided
helper instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
17437f3114
io_uring: move SQPOLL related handling into its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
59915143e8
io_uring: move timeout opcodes and handling into its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
e418bbc97b
io_uring: move our reference counting into a header
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
36404b09aa
io_uring: move msg_ring into its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
f9ead18c10
io_uring: split network related opcodes into its own file
...
While at it, convert the handlers to just use io_eopnotsupp_prep()
if CONFIG_NET isn't set.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
e0da14def1
io_uring: move statx handling to its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
a9c210cebe
io_uring: move epoll handler to its own file
...
It would be nice to sort out Kconfig for this and not even compile
epoll.c if we don't have epoll configured.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
4cf9049528
io_uring: add a dummy -EOPNOTSUPP prep handler
...
Add it and use it for the epoll handling, if epoll isn't configured.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
99f15d8d61
io_uring: move uring_cmd handling to its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
cd40cae29e
io_uring: split out open/close operations
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
453b329be5
io_uring: separate out file table handling code
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
f4c163dd7d
io_uring: split out fadvise/madvise operations
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
0d58472740
io_uring: split out fs related sync/fallocate functions
...
This splits out sync_file_range, fsync, and fallocate.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
531113bbd5
io_uring: split out splice related operations
...
This splits out splice and tee support.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
11aeb71406
io_uring: split out filesystem related operations
...
This splits out renameat, unlinkat, mkdirat, symlinkat, and linkat.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
e28683bdfc
io_uring: move nop into its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
5e2a18d93f
io_uring: move xattr related opcodes to its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
97b388d70b
io_uring: handle completions in the core
...
Normally request handlers complete requests themselves if they don't
return an error; if they do return one, the core will complete the
request for them.
This is unhandy for pushing opcode handlers further out, as we don't
want a bunch of inline completion code and we don't want to make the
completion path slower than it is now.
Let the core handle any completion, unless the handler explicitly
asks us not to.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00