Add Documentation for multithreaded jobs. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Josh Triplett <josh@joshtriplett.org> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Link: http://lkml.kernel.org/r/20200527173608.2885243-9-daniel.m.jordan@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
.. SPDX-License-Identifier: GPL-2.0

=======================================
The padata parallel execution mechanism
=======================================

:Date: May 2020

Padata is a mechanism by which the kernel can farm jobs out to be done in
parallel on multiple CPUs while optionally retaining their ordering.

It was originally developed for IPsec, which needs to perform encryption and
decryption on large numbers of packets without reordering those packets. This
is currently the sole consumer of padata's serialized job support.

Padata also supports multithreaded jobs, splitting up the job evenly while load
balancing and coordinating between threads.

Running Serialized Jobs
=======================

Initializing
------------

The first step in using padata to run serialized jobs is to set up a
padata_instance structure for overall control of how jobs are to be run::

    #include <linux/padata.h>

    struct padata_instance *padata_alloc_possible(const char *name);

'name' simply identifies the instance.

There are functions for enabling and disabling the instance::

    int padata_start(struct padata_instance *pinst);
    void padata_stop(struct padata_instance *pinst);

These functions set or clear the "PADATA_INIT" flag; if that flag is
not set, other functions will refuse to work. padata_start() returns zero on
success (flag set) or -EINVAL if the padata cpumask contains no active CPU
(flag not set). padata_stop() clears the flag and blocks until the padata
instance is unused.

Finally, complete padata initialization by allocating a padata_shell::

    struct padata_shell *padata_alloc_shell(struct padata_instance *pinst);

A padata_shell is used to submit a job to padata and allows a series of such
jobs to be serialized independently. A padata_instance may have one or more
padata_shells associated with it, each allowing a separate series of jobs.

Modifying cpumasks
------------------

The CPUs used to run jobs can be changed in two ways, programmatically with
padata_set_cpumask() or via sysfs. The former is defined::

    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);

Here cpumask_type is one of PADATA_CPU_PARALLEL or PADATA_CPU_SERIAL, where a
parallel cpumask describes which processors will be used to execute jobs
submitted to this instance in parallel and a serial cpumask defines which
processors are allowed to be used as the serialization callback processor.
cpumask specifies the new cpumask to use.

There may be sysfs files for an instance's cpumasks. For example, pcrypt's
live in /sys/kernel/pcrypt/<instance-name>. Within an instance's directory
there are two files, parallel_cpumask and serial_cpumask, and either cpumask
may be changed by echoing a bitmask into the file, for example::

    echo f > /sys/kernel/pcrypt/pencrypt/parallel_cpumask

Reading one of these files shows the user-supplied cpumask, which may be
different from the 'usable' cpumask.

Padata maintains two pairs of cpumasks internally, the user-supplied cpumasks
and the 'usable' cpumasks. (Each pair consists of a parallel and a serial
cpumask.) The user-supplied cpumasks default to all possible CPUs on instance
allocation and may be changed as above. The usable cpumasks are always a
subset of the user-supplied cpumasks and contain only the online CPUs in the
user-supplied masks; these are the cpumasks padata actually uses. So it is
legal to supply a cpumask to padata that contains offline CPUs. Once an
offline CPU in the user-supplied cpumask comes online, padata will use it.

Changing the CPU masks is an expensive operation, so it should not be done
frequently.
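
The relationship between the user-supplied and usable cpumasks can be
illustrated with a small userspace model. This is only a sketch and not kernel
code: it assumes a cpumask fits in a 64-bit integer, and ``parse_hex_mask()``,
``usable_mask()`` and ``mask_has_cpu()`` are hypothetical helpers invented for
this example. It shows how a hexadecimal value such as ``f`` selects CPUs 0-3,
and how the usable mask keeps only the online CPUs of the user-supplied mask.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a hexadecimal bitmask string such as "f".
 * Bit N set means CPU N is part of the mask.
 */
static uint64_t parse_hex_mask(const char *s)
{
    return (uint64_t)strtoull(s, NULL, 16);
}

/*
 * Hypothetical helper: model of the 'usable' mask, which keeps only
 * the CPUs of the user-supplied mask that are currently online.
 */
static uint64_t usable_mask(uint64_t user, uint64_t online)
{
    return user & online;
}

/* Hypothetical helper: returns nonzero if 'cpu' is set in 'mask'. */
static int mask_has_cpu(uint64_t mask, unsigned int cpu)
{
    assert(cpu < 64);
    return (int)((mask >> cpu) & 1);
}
```

In this model, a user-supplied mask of ``f`` with only CPUs 0 and 2 online
yields a usable mask of ``5``; when CPUs 1 and 3 come online, the same
user-supplied mask yields ``f`` again without being rewritten.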

Running A Job
-------------

Actually submitting work to the padata instance requires the creation of a
padata_priv structure, which represents one job::

    struct padata_priv {
        /* Other stuff here... */
        void (*parallel)(struct padata_priv *padata);
        void (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done. Most of its fields are private to
padata, but the structure should be zeroed at initialisation time, and the
parallel() and serial() functions should be provided. Those functions will
be called in the process of getting the work done as we will see
momentarily.

The submission of the job is done with::

    int padata_do_parallel(struct padata_shell *ps,
                           struct padata_priv *padata, int *cb_cpu);

The ps and padata structures must be set up as described above; cb_cpu
points to the preferred CPU to be used for the final callback when the job is
done; it must be in the current instance's CPU mask (if not, the cb_cpu pointer
is updated to point to the CPU actually chosen). The return value from
padata_do_parallel() is zero on success, indicating that the job is in
progress. -EBUSY means that somebody, somewhere else is messing with the
instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being in the
serial cpumask, no online CPUs in the parallel or serial cpumasks, or a stopped
instance.

Each job submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple jobs. parallel() runs with
software interrupts disabled and thus cannot sleep. The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.
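
The embedding pattern described above can be sketched in userspace C. This is
an illustration under stated assumptions, not kernel code:
``struct demo_padata_priv`` is a simplified stand-in that carries only the two
callbacks, the ``container_of()`` macro is a plain ``offsetof()`` version, and
``struct checksum_job`` with its helpers is hypothetical. The sketch calls
serial() directly at the point where kernel code would hand the job back to
padata.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for struct padata_priv: callbacks only. */
struct demo_padata_priv {
    void (*parallel)(struct demo_padata_priv *padata);
    void (*serial)(struct demo_padata_priv *padata);
};

/* Hypothetical job: the priv structure is embedded in the work item. */
struct checksum_job {
    const unsigned char *buf;
    size_t len;
    unsigned int sum;
    int serialized;
    struct demo_padata_priv padata;
};

/* Serial step: recover the job and mark it as delivered. */
static void checksum_serial(struct demo_padata_priv *padata)
{
    struct checksum_job *job =
        container_of(padata, struct checksum_job, padata);

    job->serialized = 1;
}

/* Parallel step: recover the job from the embedded member and do the work. */
static void checksum_parallel(struct demo_padata_priv *padata)
{
    struct checksum_job *job =
        container_of(padata, struct checksum_job, padata);
    size_t i;

    job->sum = 0;
    for (i = 0; i < job->len; i++)
        job->sum += job->buf[i];

    /* Model only: kernel code would notify padata here instead. */
    padata->serial(padata);
}

/* Zero the whole job first, then fill in the callbacks and the work. */
static void checksum_job_init(struct checksum_job *job,
                              const unsigned char *buf, size_t len)
{
    assert(job != NULL);
    memset(job, 0, sizeof(*job));
    job->buf = buf;
    job->len = len;
    job->padata.parallel = checksum_parallel;
    job->padata.serial = checksum_serial;
}
```

Because the callbacks receive only the embedded member, everything the job
needs has to be reachable from the enclosing structure.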

Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the job from this point. The job
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes.

Serializing Jobs
----------------

When a job does complete, parallel() (or whatever function actually finishes
the work) should inform padata of the fact with a call to::

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure. That call will happen on
the CPU requested in the initial call to padata_do_parallel(); it, too, is
run with local software interrupts disabled.
Note that this call may be deferred for a while since the padata code takes
pains to ensure that jobs are completed in the order in which they were
submitted.
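
The ordering guarantee can be modelled with a small reorder queue in userspace
C. This is a conceptual sketch only and makes no claim about how padata
implements serialization internally: ``struct reorder_queue``, ``MAX_JOBS`` and
all function names below are hypothetical. Each job receives a sequence number
at submission, and a job's serial callback is held back until every earlier job
has also finished its parallel work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_JOBS 8 /* arbitrary limit for this sketch */

struct reorder_queue {
    bool done[MAX_JOBS];      /* done[i]: job i finished its parallel work */
    unsigned int next_submit; /* sequence number of the next submission */
    unsigned int next_serial; /* next job whose serial step may run */
    void (*serial)(unsigned int seq, void *arg);
    void *arg;
};

static void reorder_init(struct reorder_queue *q,
                         void (*serial)(unsigned int seq, void *arg),
                         void *arg)
{
    memset(q, 0, sizeof(*q));
    q->serial = serial;
    q->arg = arg;
}

/* Submit a job; returns the sequence number that fixes its position. */
static unsigned int reorder_submit(struct reorder_queue *q)
{
    assert(q->next_submit < MAX_JOBS);
    return q->next_submit++;
}

/*
 * Report that job 'seq' finished its parallel work, then run the serial
 * callback for every job that is now next in submission order.
 */
static void reorder_complete(struct reorder_queue *q, unsigned int seq)
{
    assert(seq < q->next_submit && !q->done[seq]);
    q->done[seq] = true;

    while (q->next_serial < q->next_submit && q->done[q->next_serial]) {
        q->serial(q->next_serial, q->arg);
        q->next_serial++;
    }
}

/* Example callback: record the order in which serial steps ran. */
struct serial_log {
    unsigned int seq[MAX_JOBS];
    size_t n;
};

static void log_serial(unsigned int seq, void *arg)
{
    struct serial_log *log = arg;

    log->seq[log->n++] = seq;
}
```

If three jobs are submitted and the third finishes first, its serial callback
is deferred until the first and second have completed, which mirrors the
deferral described above.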

Destroying
----------

Cleaning up a padata instance predictably involves calling the three free
functions that correspond to the allocation in reverse::

    void padata_free_shell(struct padata_shell *ps);
    void padata_stop(struct padata_instance *pinst);
    void padata_free(struct padata_instance *pinst);

It is the user's responsibility to ensure all outstanding jobs are complete
before any of the above are called.

Running Multithreaded Jobs
==========================

A multithreaded job has a main thread and zero or more helper threads, with the
main thread participating in the job and then waiting until all helpers have
finished. padata splits the job into units called chunks, where a chunk is a
piece of the job that one thread completes in one call to the thread function.

A user has to do three things to run a multithreaded job. First, describe the
job by defining a padata_mt_job structure, which is explained in the Interface
section. This includes a pointer to the thread function, which padata will
call each time it assigns a job chunk to a thread. Then, define the thread
function, which accepts three arguments, ``start``, ``end``, and ``arg``, where
the first two delimit the range that the thread operates on and the last is a
pointer to the job's shared state, if any. Prepare the shared state, which is
typically allocated on the main thread's stack. Last, call
padata_do_multithreaded(), which will return once the job is finished.
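
The contract between the job description, the thread function and the shared
state can be sketched in userspace C. This is a model under stated assumptions
rather than the padata implementation: ``struct demo_mt_job`` and
``demo_run_chunks()`` are hypothetical, the range passed to the thread function
is assumed to be half-open (``start`` inclusive, ``end`` exclusive), and the
chunks are processed one after another on a single thread purely to show how a
job is divided.

```c
#include <assert.h>

/* Thread function shape from the text: a range plus shared state. */
typedef void (*thread_fn_t)(unsigned long start, unsigned long end,
                            void *arg);

/* Hypothetical job description for this sketch only. */
struct demo_mt_job {
    thread_fn_t thread_fn;    /* called once per chunk */
    void *fn_arg;             /* shared state handed to every call */
    unsigned long start;      /* first unit of the job */
    unsigned long size;       /* number of units in the job */
    unsigned long chunk_size; /* units handed out per call */
};

/*
 * Walk the job chunk by chunk. A multithreaded implementation would
 * hand chunks to several threads; this model calls thread_fn
 * sequentially. Returns the number of chunks processed.
 */
static unsigned long demo_run_chunks(const struct demo_mt_job *job)
{
    unsigned long pos = job->start;
    unsigned long end = job->start + job->size;
    unsigned long chunks = 0;

    assert(job->thread_fn && job->chunk_size > 0);
    while (pos < end) {
        unsigned long chunk_end = end - pos > job->chunk_size ?
                                  pos + job->chunk_size : end;

        job->thread_fn(pos, chunk_end, job->fn_arg);
        pos = chunk_end;
        chunks++;
    }
    return chunks;
}

/* Shared state for the example, typically kept on the caller's stack. */
struct sum_state {
    unsigned long total;
};

/*
 * Example thread function: add up the integers in [start, end).
 * With real threads, updates to shared state would need synchronization.
 */
static void sum_range(unsigned long start, unsigned long end, void *arg)
{
    struct sum_state *state = arg;
    unsigned long i;

    for (i = start; i < end; i++)
        state->total += i;
}
```

With ``size`` set to 10 and ``chunk_size`` set to 4, the thread function is
called for the ranges 0-4, 4-8 and 8-10, so the last chunk may be shorter than
the others.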

Interface
=========

.. kernel-doc:: include/linux/padata.h
.. kernel-doc:: kernel/padata.c