io_uring: improve resubmission in tctx_task_work()

If ->task_state is cleared, io_req_task_work_add() takes the slow path:
it adds a task_work, sets ->task_state, wakes up the task and so on, all
of which is expensive. tctx_task_work() first clears the state and then
executes all queued work items, so if any of them resubmits or adds new
task_work items, it unnecessarily goes through the slow path of
io_req_task_work_add().
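
To make the cost visible, here is a rough userspace sketch of the add side.
It is only a model: C11 atomics stand in for the kernel bitops, and
struct add_model and model_work_add() are made-up names, not kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant io_uring_task fields. */
struct add_model {
	atomic_bool state;	/* models bit 0 of ->task_state */
	int queued;		/* models the length of ->task_list */
	int slow_adds;		/* how often the expensive path was taken */
};

/*
 * Models the shape of io_req_task_work_add(): queue the item, then try
 * to claim the state bit. Returns true if the slow path was taken.
 */
static bool model_work_add(struct add_model *m)
{
	assert(m);
	m->queued++;

	/* Bit already set: a runner is pending or running, nothing to do. */
	if (atomic_exchange(&m->state, true))
		return false;

	/* Bit was clear: the kernel would add a task_work and wake the task. */
	m->slow_adds++;
	return true;
}
```

In this model, every add that happens while the bit is clear pays for the
slow path, which is what a resubmitting work item hits when the runner has
cleared the bit up front.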

Clear ->task_state at the end instead. We still have to check
->task_list for emptiness afterwards to synchronise with
io_req_task_work_add(). If the list turns out to be non-empty, set the
state back before retrying, because clearing a task_state that is not
ours on the next iteration would be buggy. If someone else has already
set it, another tctx_task_work() is enqueued and we yield to it.
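
The end-of-loop handshake can be sketched in userspace as below. This is
an illustration under assumptions, not the kernel code: C11 atomics
replace the bitops, a counter replaces ->task_list, and the hook
parameter is a hypothetical test aid that plays a producer racing in
right after the bit is cleared.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the fields involved; not the kernel's struct. */
struct tctx_model {
	atomic_bool state;	/* stands in for bit 0 of ->task_state */
	atomic_int pending;	/* stands in for the length of ->task_list */
};

/* Test hook run inside the race window; models a concurrent producer. */
typedef void (*race_hook)(struct tctx_model *t);

/*
 * Called once the runner has executed everything it grabbed.
 * Returns true if the runner must loop again, false if it may exit.
 */
static bool runner_should_retry(struct tctx_model *t, race_hook hook)
{
	assert(atomic_load(&t->state));	/* the runner owns the bit here */

	if (atomic_load(&t->pending) != 0)
		return true;		/* more work, bit is still ours */

	atomic_store(&t->state, false);	/* give up ownership */
	if (hook)
		hook(t);		/* a producer may slip in here */

	if (atomic_load(&t->pending) == 0)
		return false;		/* really empty, done */

	/*
	 * Work arrived after the clear. Take the bit back before
	 * retrying; if it is already set, a new runner is enqueued
	 * and this one yields.
	 */
	if (atomic_exchange(&t->state, true))
		return false;
	return true;
}

/* Producer that queued work but has not reached the state bit yet. */
static void hook_add_only(struct tctx_model *t)
{
	atomic_fetch_add(&t->pending, 1);
}

/* Producer that queued work and also claimed the bit (slow path). */
static void hook_add_and_claim(struct tctx_model *t)
{
	atomic_fetch_add(&t->pending, 1);
	atomic_store(&t->state, true);
}
```

The three outcomes map onto the three exits of the new block: empty list
means exit with the bit clear, a late add without a claimed bit means
retry with the bit set again, and a late add with the bit already claimed
means exit and leave the work to the newly enqueued runner.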

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1ef72cdac7022adf0cd7ce4bfe3bb5c82a62eb93.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov 2021-06-17 18:14:10 +01:00 committed by Jens Axboe
parent 16f7207038
commit 7a778f9dc3


@@ -1894,8 +1894,6 @@ static void tctx_task_work(struct callback_head *cb)
 	struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
 						  task_work);
 
-	clear_bit(0, &tctx->task_state);
-
 	while (1) {
 		struct io_wq_work_node *node;
 
@@ -1917,8 +1915,14 @@ static void tctx_task_work(struct callback_head *cb)
 			req->task_work.func(&req->task_work);
 			node = next;
 		}
-		if (wq_list_empty(&tctx->task_list))
-			break;
+		if (wq_list_empty(&tctx->task_list)) {
+			clear_bit(0, &tctx->task_state);
+			if (wq_list_empty(&tctx->task_list))
+				break;
+			/* another tctx_task_work() is enqueued, yield */
+			if (test_and_set_bit(0, &tctx->task_state))
+				break;
+		}
 		cond_resched();
 	}
 