syzbot reports that KMSAN complains that 'nr_tw' is an uninit-value
with the following report:
BUG: KMSAN: uninit-value in io_req_local_work_add io_uring/io_uring.c:1192 [inline]
BUG: KMSAN: uninit-value in io_req_task_work_add_remote+0x588/0x5d0 io_uring/io_uring.c:1240
io_req_local_work_add io_uring/io_uring.c:1192 [inline]
io_req_task_work_add_remote+0x588/0x5d0 io_uring/io_uring.c:1240
io_msg_remote_post io_uring/msg_ring.c:102 [inline]
io_msg_data_remote io_uring/msg_ring.c:133 [inline]
io_msg_ring_data io_uring/msg_ring.c:152 [inline]
io_msg_ring+0x1c38/0x1ef0 io_uring/msg_ring.c:305
io_issue_sqe+0x383/0x22c0 io_uring/io_uring.c:1710
io_queue_sqe io_uring/io_uring.c:1924 [inline]
io_submit_sqe io_uring/io_uring.c:2180 [inline]
io_submit_sqes+0x1259/0x2f20 io_uring/io_uring.c:2295
__do_sys_io_uring_enter io_uring/io_uring.c:3205 [inline]
__se_sys_io_uring_enter+0x40c/0x3ca0 io_uring/io_uring.c:3142
__x64_sys_io_uring_enter+0x11f/0x1a0 io_uring/io_uring.c:3142
x64_sys_call+0x2d82/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:427
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
which is the following check:
if (nr_tw < nr_wait)
return;
in io_req_local_work_add(). While nr_tw itself cannot be uninitialized,
it does depend on req->flags, which off the msg ring issue path can
indeed be uninitialized.
Fix this by always clearing the allocated 'req' fully if we can't grab
one from the cache itself.
Fixes: 50cf5f3842 ("io_uring/msg_ring: add an alloc cache for io_kiocb entries")
Reported-by: syzbot+82609b8937a4458106ca@syzkaller.appspotmail.com
Link: https://lore.kernel.org/io-uring/000000000000fd3d8d061dfc0e4a@google.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The change adding caching around the request allocated and freed for
data messages changed a kmem_cache_free() to a kfree(), which isn't
correct as the request came from slab in the first place. Fix that up
and use the right freeing function if the cache is already at its limit.
Note that the current mixing of kmem_cache_alloc and kfree is fine, but
consistent alloc/free functions should be used as it's otherwise somewhat
confusing.
Fixes: 50cf5f3842 ("io_uring/msg_ring: add an alloc cache for io_kiocb entries")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The change for improving the handling of the target CQE posting
inadvertently dropped the NULL check for the submitter task on the target
ring, reinstate that.
Fixes: 0617bb500b ("io_uring/msg_ring: improve handling of target CQE posting")
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
With slab accounting, allocating and freeing memory has considerable
overhead. Add a basic alloc cache for the io_kiocb allocations that
msg_ring needs to do. Unlike other caches, this one is used by the
sender, grabbing it from the remote ring. When the remote ring gets
the posted completion, it'll free it locally. Hence it is separately
locked, using ctx->msg_lock.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use the exported helper for queueing task_work for message passing,
rather than rolling our own.
Note that this is only done for strict data messages for now, file
descriptor passing messages still rely on the kernel task_work. It could
get converted at some point if it's performance critical.
This improves peak performance of message passing by about 5x in some
basic testing, with 2 threads just sending messages to each other.
Before this change, it was capped at around 700K/sec, with the change
it's at over 4M/sec.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently this is gated on whether or not the target ring needs a local
completion - and if so, whether or not we're running on the right task.
The use case for same thread cross posting is probably a lot less
relevant than remote posting. And since we're going to improve this
situation anyway, just gate it on local posting and ignore what task
we're currently running on.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In io_msg_exec_remote(), ctx->submitter_task is read using READ_ONCE at
the beginning of the function, checked, and then re-read from
ctx->submitter_task, voiding all guarantees of the checks. Reuse the value
that was read by READ_ONCE to ensure the consistency of the task struct
throughout the function.
Signed-off-by: linke li <lilinke99@qq.com>
Link: https://lore.kernel.org/r/tencent_F9B2296C93928D6F68FF0C95C33475C68209@qq.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
msg_ring requests transferring files support auto index selection via
IORING_FILE_INDEX_ALLOC, however they don't return the selected index
to the target ring and there is no other good way for the userspace to
know where is the receieved file.
Return the index for allocated slots and 0 otherwise, which is
consistent with other fixed file installing requests.
Cc: stable@vger.kernel.org # v6.0+
Fixes: e6130eba8a ("io_uring: add support for passing fixed file descriptors")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://github.com/axboe/liburing/issues/809
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If the target ring is using IORING_SETUP_SINGLE_ISSUER and we're posting
a message from a different thread, then we need to ensure that the
fallback task_work that posts the CQE knwos about the flags passing as
well. If not we'll always be posting 0 as the flags.
Fixes: 3563d7ed58a5 ("io_uring/msg_ring: Pass custom flags to the cqe")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch adds a new flag (IORING_MSG_RING_FLAGS_PASS) in the message
ring operations (IORING_OP_MSG_RING). This new flag enables the sender
to specify custom flags, which will be copied over to cqe->flags in the
receiving ring. These custom flags should be specified using the
sqe->file_index field.
This mechanism provides additional flexibility when sending messages
between rings.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230103160507.617416-1-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
IORING_SETUP_R_DISABLED rings don't have the submitter task set, so
it's not always safe to use ->submitter_task. Disallow posting msg_ring
messaged to disabled rings. Also add task NULL check for loosy sync
around testing for IORING_SETUP_R_DISABLED.
Cc: stable@vger.kernel.org
Fixes: 6d043ee116 ("io_uring: do msg_ring in target task via tw")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is a couple of problems with queueing a tw in io_msg_ring_data()
for remote execution. First, once we queue it the target ring can
go away and so setting IORING_SQ_TASKRUN there is not safe. Secondly,
the userspace might not expect IORING_SQ_TASKRUN.
Extract a helper and uniformly use TWA_SIGNAL without TWA_SIGNAL_NO_IPI
tricks for now, just as it was done in the original patch.
Cc: stable@vger.kernel.org
Fixes: 6d043ee116 ("io_uring: do msg_ring in target task via tw")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If the target ring is configured with IOPOLL, then we always need to hold
the target ring uring_lock before posting CQEs. We could just grab it
unconditionally, but since we don't expect many target rings to be of this
type, make grabbing the uring_lock conditional on the ring type.
Link: https://lore.kernel.org/io-uring/Y8krlYa52%2F0YGqkg@ip-172-31-85-199.ec2.internal/
Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation for needing them somewhere else, move them and get rid of
the unused 'issue_flags' for the unlock side.
No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Before the recent change, we didn't even wake the targeted task when
posting the cqe remotely. Now we do wake it, but we do want to ensure
that the recipient knows there's potential work there that needs to
get processed to get the CQE posted.
OR in IORING_SQ_TASKRUN for that purpose.
Link: https://lore.kernel.org/io-uring/2843c6b4-ba9a-b67d-e0f4-957f42098489@kernel.dk/
Fixes: 6d043ee116 ("io_uring: do msg_ring in target task via tw")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
While executing in a context of one io_uring instance msg_ring
manipulates another ring. We're trying to keep CQEs posting contained in
the context of the ring-owner task, use task_work to send the request to
the target ring's task when we're modifying its CQ or trying to install
a file. Note, we can't safely use io_uring task_work infra and have to
use task_work directly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4d76c7b28ed5d71b520de4482fbb7f660f21cd80.1670384893.git.asml.silence@gmail.com
[axboe: use TWA_SIGNAL_NO_IPI]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The only call sites which would not allow overflow are also call sites
which would use the io_aux_cqe as they care about ordering.
So remove this parameter from io_post_aux_cqe.
Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221124093559.3780686-9-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we're invoked with a fixed file, follow the normal rules of not
calling io_fput_file(). Fixed files are permanently registered to the
ring, and do not need putting separately.
Cc: stable@vger.kernel.org
Fixes: aa184e8671 ("io_uring: don't attempt to IOPOLL for MSG_RING requests")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some use cases of io_post_aux_cqe would not want to overflow as is, but
might want to change the flags/result. For example multishot receive
requires in order CQE, and so if there is an overflow it would need to
stop receiving until the overflow is taken care of.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-8-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
With IORING_OP_MSG_RING, one ring can send a message to another ring.
Extend that support to also allow sending a fixed file descriptor to
that ring, enabling one ring to pass a registered descriptor to another
one.
Arguments are extended to pass in:
sqe->addr3 fixed file slot in source ring
sqe->file_index fixed file slot in destination ring
IORING_OP_MSG_RING is extended to take a command argument in sqe->addr.
If set to zero (or IORING_MSG_DATA), it sends just a message like before.
If set to IORING_MSG_SEND_FD, a fixed file descriptor is sent according
to the above arguments.
Two common use cases for this are:
1) Server needs to be shutdown or restarted, pass file descriptors to
another onei
2) Backend is split, and one accepts connections, while others then get
the fd passed and handle the actual connection.
Both of those are classic SCM_RIGHTS use cases, and it's not possible to
support them with direct descriptors today.
By default, this will post a CQE to the target ring, similarly to how
IORING_MSG_DATA does it. If IORING_MSG_RING_CQE_SKIP is set, no message
is posted to the target ring. The issuer is expected to notify the
receiver side separately.
Signed-off-by: Jens Axboe <axboe@kernel.dk>