Merge branch 'for-5.20/io_uring' into for-5.20/io_uring-zerocopy-send

* for-5.20/io_uring: (716 commits)
  io_uring: ensure REQ_F_ISREG is set async offload
  net: fix compat pointer in get_compat_msghdr()
  io_uring: Don't require reinitable percpu_ref
  io_uring: fix types in io_recvmsg_multishot_overflow
  io_uring: Use atomic_long_try_cmpxchg in __io_account_mem
  io_uring: support multishot in recvmsg
  net: copy from user before calling __get_compat_msghdr
  net: copy from user before calling __copy_msghdr
  io_uring: support 0 length iov in buffer select in compat
  io_uring: fix multishot ending when not polled
  io_uring: add netmsg cache
  io_uring: impose max limit on apoll cache
  io_uring: add abstraction around apoll cache
  io_uring: move apoll cache to poll.c
  io_uring: consolidate hash_locked io-wq handling
  io_uring: clear REQ_F_HASH_LOCKED on hash removal
  io_uring: don't race double poll setting REQ_F_ASYNC_DATA
  io_uring: don't miss setting REQ_F_DOUBLE_POLL
  io_uring: disable multishot recvmsg
  io_uring: only trace one of complete or overflow
  ...

Signed-off-by: Jens Axboe <axboe@kernel.dk>
This commit is contained in:
Jens Axboe 2022-07-24 18:41:03 -06:00
commit 4effe18fc0
633 changed files with 21989 additions and 16796 deletions

View File

@ -627,6 +627,10 @@ S: 48287 Sawleaf
S: Fremont, California 94539
S: USA
N: Tomas Cech
E: sleep_walker@suse.com
D: arm/palm treo support
N: Florent Chabaud
E: florent.chabaud@polytechnique.org
D: software suspend

View File

@ -5197,6 +5197,30 @@
retain_initrd [RAM] Keep initrd memory after extraction
retbleed= [X86] Control mitigation of RETBleed (Arbitrary
Speculative Code Execution with Return Instructions)
vulnerability.
off - no mitigation
auto - automatically select a migitation
auto,nosmt - automatically select a mitigation,
disabling SMT if necessary for
the full mitigation (only on Zen1
and older without STIBP).
ibpb - mitigate short speculation windows on
basic block boundaries too. Safe, highest
perf impact.
unret - force enable untrained return thunks,
only effective on AMD f15h-f17h
based systems.
unret,nosmt - like unret, will disable SMT when STIBP
is not available.
Selecting 'auto' will choose a mitigation method at run
time according to the CPU.
Not specifying this option is equivalent to retbleed=auto.
rfkill.default_state=
0 "airplane mode". All wifi, bluetooth, wimax, gps, fm,
etc. communication is blocked by default.
@ -5568,6 +5592,7 @@
eibrs - enhanced IBRS
eibrs,retpoline - enhanced IBRS + Retpolines
eibrs,lfence - enhanced IBRS + LFENCE
ibrs - use IBRS to protect kernel
Not specifying this option is equivalent to
spectre_v2=auto.
@ -5771,6 +5796,24 @@
expediting. Set to zero to disable automatic
expediting.
srcutree.srcu_max_nodelay [KNL]
Specifies the number of no-delay instances
per jiffy for which the SRCU grace period
worker thread will be rescheduled with zero
delay. Beyond this limit, worker thread will
be rescheduled with a sleep delay of one jiffy.
srcutree.srcu_max_nodelay_phase [KNL]
Specifies the per-grace-period phase, number of
non-sleeping polls of readers. Beyond this limit,
grace period worker thread will be rescheduled
with a sleep delay of one jiffy, between each
rescan of the readers, for a grace period phase.
srcutree.srcu_retry_check_delay [KNL]
Specifies number of microseconds of non-sleeping
delay between each non-sleeping poll of readers.
srcutree.small_contention_lim [KNL]
Specifies the number of update-side contention
events per jiffy will be tolerated before

View File

@ -223,7 +223,7 @@ Module Loading
Inter Module support
--------------------
Refer to the file kernel/module.c for more information.
Refer to the files in kernel/module/ for more information.
Hardware Interfaces
===================

View File

@ -51,8 +51,8 @@ namespace ``USB_STORAGE``, use::
The corresponding ksymtab entry struct ``kernel_symbol`` will have the member
``namespace`` set accordingly. A symbol that is exported without a namespace will
refer to ``NULL``. There is no default namespace if none is defined. ``modpost``
and kernel/module.c make use the namespace at build time or module load time,
respectively.
and kernel/module/main.c make use the namespace at build time or module load
time, respectively.
2.2 Using the DEFAULT_SYMBOL_NAMESPACE define
=============================================

View File

@ -94,6 +94,7 @@ if:
- allwinner,sun8i-a83t-display-engine
- allwinner,sun8i-r40-display-engine
- allwinner,sun9i-a80-display-engine
- allwinner,sun20i-d1-display-engine
- allwinner,sun50i-a64-display-engine
then:

View File

@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
title: Qualcomm Atheros ath9k wireless devices Generic Binding
maintainers:
- Kalle Valo <kvalo@codeaurora.org>
- Toke Høiland-Jørgensen <toke@toke.dk>
description: |
This node provides properties for configuring the ath9k wireless device.

View File

@ -9,7 +9,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
title: Qualcomm Technologies ath11k wireless devices Generic Binding
maintainers:
- Kalle Valo <kvalo@codeaurora.org>
- Kalle Valo <kvalo@kernel.org>
description: |
These are dt entries for Qualcomm Technologies, Inc. IEEE 802.11ax

View File

@ -25,12 +25,12 @@ properties:
- qcom,sc7280-lpass-cpu
reg:
minItems: 2
minItems: 1
maxItems: 6
description: LPAIF core registers
reg-names:
minItems: 2
minItems: 1
maxItems: 6
clocks:
@ -42,12 +42,12 @@ properties:
maxItems: 10
interrupts:
minItems: 2
minItems: 1
maxItems: 4
description: LPAIF DMA buffer interrupt
interrupt-names:
minItems: 2
minItems: 1
maxItems: 4
qcom,adsp:

View File

@ -301,7 +301,7 @@ through which it can issue requests and negotiate::
void (*issue_read)(struct netfs_io_subrequest *subreq);
bool (*is_still_valid)(struct netfs_io_request *rreq);
int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
struct folio *folio, void **_fsdata);
struct folio **foliop, void **_fsdata);
void (*done)(struct netfs_io_request *rreq);
};
@ -381,8 +381,10 @@ The operations are as follows:
allocated/grabbed the folio to be modified to allow the filesystem to flush
conflicting state before allowing it to be modified.
It should return 0 if everything is now fine, -EAGAIN if the folio should be
regrabbed and any other error code to abort the operation.
It may unlock and discard the folio it was given and set the caller's folio
pointer to NULL. It should return 0 if everything is now fine (``*foliop``
left set) or the op should be retried (``*foliop`` cleared) and any other
error code to abort the operation.
* ``done``

View File

@ -466,6 +466,10 @@ overlay filesystem and the value of st_ino for filesystem objects may not be
persistent and could change even while the overlay filesystem is mounted, as
summarized in the `Inode properties`_ table above.
4) "idmapped mounts"
When the upper or lower layers are idmapped mounts overlayfs will be mounted
without support for POSIX Access Control Lists (ACLs). This limitation will
eventually be lifted.
Changes to underlying filesystems
---------------------------------

View File

@ -210,11 +210,11 @@ module->symtab.
=====================================
Normally, a stripped down copy of a module's symbol table (containing only
"core" symbols) is made available through module->symtab (See layout_symtab()
in kernel/module.c). For livepatch modules, the symbol table copied into memory
on module load must be exactly the same as the symbol table produced when the
patch module was compiled. This is because the relocations in each livepatch
relocation section refer to their respective symbols with their symbol indices,
and the original symbol indices (and thus the symtab ordering) must be
in kernel/module/kallsyms.c). For livepatch modules, the symbol table copied
into memory on module load must be exactly the same as the symbol table produced
when the patch module was compiled. This is because the relocations in each
livepatch relocation section refer to their respective symbols with their symbol
indices, and the original symbol indices (and thus the symtab ordering) must be
preserved in order for apply_relocate_add() to find the right symbol.
For example, take this particular rela from a livepatch module:::

View File

@ -503,26 +503,108 @@ per-port PHY specific details: interface connection, MDIO bus location, etc.
Driver development
==================
DSA switch drivers need to implement a dsa_switch_ops structure which will
DSA switch drivers need to implement a ``dsa_switch_ops`` structure which will
contain the various members described below.
``register_switch_driver()`` registers this dsa_switch_ops in its internal list
of drivers to probe for. ``unregister_switch_driver()`` does the exact opposite.
Probing, registration and device lifetime
-----------------------------------------
Unless requested differently by setting the priv_size member accordingly, DSA
does not allocate any driver private context space.
DSA switches are regular ``device`` structures on buses (be they platform, SPI,
I2C, MDIO or otherwise). The DSA framework is not involved in their probing
with the device core.
Switch registration from the perspective of a driver means passing a valid
``struct dsa_switch`` pointer to ``dsa_register_switch()``, usually from the
switch driver's probing function. The following members must be valid in the
provided structure:
- ``ds->dev``: will be used to parse the switch's OF node or platform data.
- ``ds->num_ports``: will be used to create the port list for this switch, and
to validate the port indices provided in the OF node.
- ``ds->ops``: a pointer to the ``dsa_switch_ops`` structure holding the DSA
method implementations.
- ``ds->priv``: backpointer to a driver-private data structure which can be
retrieved in all further DSA method callbacks.
In addition, the following flags in the ``dsa_switch`` structure may optionally
be configured to obtain driver-specific behavior from the DSA core. Their
behavior when set is documented through comments in ``include/net/dsa.h``.
- ``ds->vlan_filtering_is_global``
- ``ds->needs_standalone_vlan_filtering``
- ``ds->configure_vlan_while_not_filtering``
- ``ds->untag_bridge_pvid``
- ``ds->assisted_learning_on_cpu_port``
- ``ds->mtu_enforcement_ingress``
- ``ds->fdb_isolation``
Internally, DSA keeps an array of switch trees (group of switches) global to
the kernel, and attaches a ``dsa_switch`` structure to a tree on registration.
The tree ID to which the switch is attached is determined by the first u32
number of the ``dsa,member`` property of the switch's OF node (0 if missing).
The switch ID within the tree is determined by the second u32 number of the
same OF property (0 if missing). Registering multiple switches with the same
switch ID and tree ID is illegal and will cause an error. Using platform data,
a single switch and a single switch tree is permitted.
In case of a tree with multiple switches, probing takes place asymmetrically.
The first N-1 callers of ``dsa_register_switch()`` only add their ports to the
port list of the tree (``dst->ports``), each port having a backpointer to its
associated switch (``dp->ds``). Then, these switches exit their
``dsa_register_switch()`` call early, because ``dsa_tree_setup_routing_table()``
has determined that the tree is not yet complete (not all ports referenced by
DSA links are present in the tree's port list). The tree becomes complete when
the last switch calls ``dsa_register_switch()``, and this triggers the effective
continuation of initialization (including the call to ``ds->ops->setup()``) for
all switches within that tree, all as part of the calling context of the last
switch's probe function.
The opposite of registration takes place when calling ``dsa_unregister_switch()``,
which removes a switch's ports from the port list of the tree. The entire tree
is torn down when the first switch unregisters.
It is mandatory for DSA switch drivers to implement the ``shutdown()`` callback
of their respective bus, and call ``dsa_switch_shutdown()`` from it (a minimal
version of the full teardown performed by ``dsa_unregister_switch()``).
The reason is that DSA keeps a reference on the master net device, and if the
driver for the master device decides to unbind on shutdown, DSA's reference
will block that operation from finalizing.
Either ``dsa_switch_shutdown()`` or ``dsa_unregister_switch()`` must be called,
but not both, and the device driver model permits the bus' ``remove()`` method
to be called even if ``shutdown()`` was already called. Therefore, drivers are
expected to implement a mutual exclusion method between ``remove()`` and
``shutdown()`` by setting their drvdata to NULL after any of these has run, and
checking whether the drvdata is NULL before proceeding to take any action.
After ``dsa_switch_shutdown()`` or ``dsa_unregister_switch()`` was called, no
further callbacks via the provided ``dsa_switch_ops`` may take place, and the
driver may free the data structures associated with the ``dsa_switch``.
Switch configuration
--------------------
- ``tag_protocol``: this is to indicate what kind of tagging protocol is supported,
should be a valid value from the ``dsa_tag_protocol`` enum
- ``get_tag_protocol``: this is to indicate what kind of tagging protocol is
supported, should be a valid value from the ``dsa_tag_protocol`` enum.
The returned information does not have to be static; the driver is passed the
CPU port number, as well as the tagging protocol of a possibly stacked
upstream switch, in case there are hardware limitations in terms of supported
tag formats.
- ``probe``: probe routine which will be invoked by the DSA platform device upon
registration to test for the presence/absence of a switch device. For MDIO
devices, it is recommended to issue a read towards internal registers using
the switch pseudo-PHY and return whether this is a supported device. For other
buses, return a non-NULL string
- ``change_tag_protocol``: when the default tagging protocol has compatibility
problems with the master or other issues, the driver may support changing it
at runtime, either through a device tree property or through sysfs. In that
case, further calls to ``get_tag_protocol`` should report the protocol in
current use.
- ``setup``: setup function for the switch, this function is responsible for setting
up the ``dsa_switch_ops`` private structure with all it needs: register maps,
@ -535,7 +617,17 @@ Switch configuration
fully configured and ready to serve any kind of request. It is recommended
to issue a software reset of the switch during this setup function in order to
avoid relying on what a previous software agent such as a bootloader/firmware
may have previously configured.
may have previously configured. The method responsible for undoing any
applicable allocations or operations done here is ``teardown``.
- ``port_setup`` and ``port_teardown``: methods for initialization and
destruction of per-port data structures. It is mandatory for some operations
such as registering and unregistering devlink port regions to be done from
these methods, otherwise they are optional. A port will be torn down only if
it has been previously set up. It is possible for a port to be set up during
probing only to be torn down immediately afterwards, for example in case its
PHY cannot be found. In this case, probing of the DSA switch continues
without that particular port.
PHY devices and link management
-------------------------------
@ -635,26 +727,198 @@ Power management
``BR_STATE_DISABLED`` and propagating changes to the hardware if this port is
disabled while being a bridge member
Address databases
-----------------
Switching hardware is expected to have a table for FDB entries, however not all
of them are active at the same time. An address database is the subset (partition)
of FDB entries that is active (can be matched by address learning on RX, or FDB
lookup on TX) depending on the state of the port. An address database may
occasionally be called "FID" (Filtering ID) in this document, although the
underlying implementation may choose whatever is available to the hardware.
For example, all ports that belong to a VLAN-unaware bridge (which is
*currently* VLAN-unaware) are expected to learn source addresses in the
database associated by the driver with that bridge (and not with other
VLAN-unaware bridges). During forwarding and FDB lookup, a packet received on a
VLAN-unaware bridge port should be able to find a VLAN-unaware FDB entry having
the same MAC DA as the packet, which is present on another port member of the
same bridge. At the same time, the FDB lookup process must be able to not find
an FDB entry having the same MAC DA as the packet, if that entry points towards
a port which is a member of a different VLAN-unaware bridge (and is therefore
associated with a different address database).
Similarly, each VLAN of each offloaded VLAN-aware bridge should have an
associated address database, which is shared by all ports which are members of
that VLAN, but not shared by ports belonging to different bridges that are
members of the same VID.
In this context, a VLAN-unaware database means that all packets are expected to
match on it irrespective of VLAN ID (only MAC address lookup), whereas a
VLAN-aware database means that packets are supposed to match based on the VLAN
ID from the classified 802.1Q header (or the pvid if untagged).
At the bridge layer, VLAN-unaware FDB entries have the special VID value of 0,
whereas VLAN-aware FDB entries have non-zero VID values. Note that a
VLAN-unaware bridge may have VLAN-aware (non-zero VID) FDB entries, and a
VLAN-aware bridge may have VLAN-unaware FDB entries. As in hardware, the
software bridge keeps separate address databases, and offloads to hardware the
FDB entries belonging to these databases, through switchdev, asynchronously
relative to the moment when the databases become active or inactive.
When a user port operates in standalone mode, its driver should configure it to
use a separate database called a port private database. This is different from
the databases described above, and should impede operation as standalone port
(packet in, packet out to the CPU port) as little as possible. For example,
on ingress, it should not attempt to learn the MAC SA of ingress traffic, since
learning is a bridging layer service and this is a standalone port, therefore
it would consume useless space. With no address learning, the port private
database should be empty in a naive implementation, and in this case, all
received packets should be trivially flooded to the CPU port.
DSA (cascade) and CPU ports are also called "shared" ports because they service
multiple address databases, and the database that a packet should be associated
to is usually embedded in the DSA tag. This means that the CPU port may
simultaneously transport packets coming from a standalone port (which were
classified by hardware in one address database), and from a bridge port (which
were classified to a different address database).
Switch drivers which satisfy certain criteria are able to optimize the naive
configuration by removing the CPU port from the flooding domain of the switch,
and just program the hardware with FDB entries pointing towards the CPU port
for which it is known that software is interested in those MAC addresses.
Packets which do not match a known FDB entry will not be delivered to the CPU,
which will save CPU cycles required for creating an skb just to drop it.
DSA is able to perform host address filtering for the following kinds of
addresses:
- Primary unicast MAC addresses of ports (``dev->dev_addr``). These are
associated with the port private database of the respective user port,
and the driver is notified to install them through ``port_fdb_add`` towards
the CPU port.
- Secondary unicast and multicast MAC addresses of ports (addresses added
through ``dev_uc_add()`` and ``dev_mc_add()``). These are also associated
with the port private database of the respective user port.
- Local/permanent bridge FDB entries (``BR_FDB_LOCAL``). These are the MAC
addresses of the bridge ports, for which packets must be terminated locally
and not forwarded. They are associated with the address database for that
bridge.
- Static bridge FDB entries installed towards foreign (non-DSA) interfaces
present in the same bridge as some DSA switch ports. These are also
associated with the address database for that bridge.
- Dynamically learned FDB entries on foreign interfaces present in the same
bridge as some DSA switch ports, only if ``ds->assisted_learning_on_cpu_port``
is set to true by the driver. These are associated with the address database
for that bridge.
For various operations detailed below, DSA provides a ``dsa_db`` structure
which can be of the following types:
- ``DSA_DB_PORT``: the FDB (or MDB) entry to be installed or deleted belongs to
the port private database of user port ``db->dp``.
- ``DSA_DB_BRIDGE``: the entry belongs to one of the address databases of bridge
``db->bridge``. Separation between the VLAN-unaware database and the per-VID
databases of this bridge is expected to be done by the driver.
- ``DSA_DB_LAG``: the entry belongs to the address database of LAG ``db->lag``.
Note: ``DSA_DB_LAG`` is currently unused and may be removed in the future.
The drivers which act upon the ``dsa_db`` argument in ``port_fdb_add``,
``port_mdb_add`` etc should declare ``ds->fdb_isolation`` as true.
DSA associates each offloaded bridge and each offloaded LAG with a one-based ID
(``struct dsa_bridge :: num``, ``struct dsa_lag :: id``) for the purposes of
refcounting addresses on shared ports. Drivers may piggyback on DSA's numbering
scheme (the ID is readable through ``db->bridge.num`` and ``db->lag.id`` or may
implement their own.
Only the drivers which declare support for FDB isolation are notified of FDB
entries on the CPU port belonging to ``DSA_DB_PORT`` databases.
For compatibility/legacy reasons, ``DSA_DB_BRIDGE`` addresses are notified to
drivers even if they do not support FDB isolation. However, ``db->bridge.num``
and ``db->lag.id`` are always set to 0 in that case (to denote the lack of
isolation, for refcounting purposes).
Note that it is not mandatory for a switch driver to implement physically
separate address databases for each standalone user port. Since FDB entries in
the port private databases will always point to the CPU port, there is no risk
for incorrect forwarding decisions. In this case, all standalone ports may
share the same database, but the reference counting of host-filtered addresses
(not deleting the FDB entry for a port's MAC address if it's still in use by
another port) becomes the responsibility of the driver, because DSA is unaware
that the port databases are in fact shared. This can be achieved by calling
``dsa_fdb_present_in_other_db()`` and ``dsa_mdb_present_in_other_db()``.
The down side is that the RX filtering lists of each user port are in fact
shared, which means that user port A may accept a packet with a MAC DA it
shouldn't have, only because that MAC address was in the RX filtering list of
user port B. These packets will still be dropped in software, however.
Bridge layer
------------
Offloading the bridge forwarding plane is optional and handled by the methods
below. They may be absent, return -EOPNOTSUPP, or ``ds->max_num_bridges`` may
be non-zero and exceeded, and in this case, joining a bridge port is still
possible, but the packet forwarding will take place in software, and the ports
under a software bridge must remain configured in the same way as for
standalone operation, i.e. have all bridging service functions (address
learning etc) disabled, and send all received packets to the CPU port only.
Concretely, a port starts offloading the forwarding plane of a bridge once it
returns success to the ``port_bridge_join`` method, and stops doing so after
``port_bridge_leave`` has been called. Offloading the bridge means autonomously
learning FDB entries in accordance with the software bridge port's state, and
autonomously forwarding (or flooding) received packets without CPU intervention.
This is optional even when offloading a bridge port. Tagging protocol drivers
are expected to call ``dsa_default_offload_fwd_mark(skb)`` for packets which
have already been autonomously forwarded in the forwarding domain of the
ingress switch port. DSA, through ``dsa_port_devlink_setup()``, considers all
switch ports part of the same tree ID to be part of the same bridge forwarding
domain (capable of autonomous forwarding to each other).
Offloading the TX forwarding process of a bridge is a distinct concept from
simply offloading its forwarding plane, and refers to the ability of certain
driver and tag protocol combinations to transmit a single skb coming from the
bridge device's transmit function to potentially multiple egress ports (and
thereby avoid its cloning in software).
Packets for which the bridge requests this behavior are called data plane
packets and have ``skb->offload_fwd_mark`` set to true in the tag protocol
driver's ``xmit`` function. Data plane packets are subject to FDB lookup,
hardware learning on the CPU port, and do not override the port STP state.
Additionally, replication of data plane packets (multicast, flooding) is
handled in hardware and the bridge driver will transmit a single skb for each
packet that may or may not need replication.
When the TX forwarding offload is enabled, the tag protocol driver is
responsible to inject packets into the data plane of the hardware towards the
correct bridging domain (FID) that the port is a part of. The port may be
VLAN-unaware, and in this case the FID must be equal to the FID used by the
driver for its VLAN-unaware address database associated with that bridge.
Alternatively, the bridge may be VLAN-aware, and in that case, it is guaranteed
that the packet is also VLAN-tagged with the VLAN ID that the bridge processed
this packet in. It is the responsibility of the hardware to untag the VID on
the egress-untagged ports, or keep the tag on the egress-tagged ones.
- ``port_bridge_join``: bridge layer function invoked when a given switch port is
added to a bridge, this function should do what's necessary at the switch
level to permit the joining port to be added to the relevant logical
domain for it to ingress/egress traffic with other members of the bridge.
By setting the ``tx_fwd_offload`` argument to true, the TX forwarding process
of this bridge is also offloaded.
- ``port_bridge_leave``: bridge layer function invoked when a given switch port is
removed from a bridge, this function should do what's necessary at the
switch level to deny the leaving port from ingress/egress traffic from the
remaining bridge members. When the port leaves the bridge, it should be aged
out at the switch hardware for the switch to (re) learn MAC addresses behind
this port.
remaining bridge members.
- ``port_stp_state_set``: bridge layer function invoked when a given switch port STP
state is computed by the bridge layer and should be propagated to switch
hardware to forward/block/learn traffic. The switch driver is responsible for
computing a STP state change based on current and asked parameters and perform
the relevant ageing based on the intersection results
hardware to forward/block/learn traffic.
- ``port_bridge_flags``: bridge layer function invoked when a port must
configure its settings for e.g. flooding of unknown traffic or source address
@ -667,21 +931,11 @@ Bridge layer
CPU port, and flooding towards the CPU port should also be enabled, due to a
lack of an explicit address filtering mechanism in the DSA core.
- ``port_bridge_tx_fwd_offload``: bridge layer function invoked after
``port_bridge_join`` when a driver sets ``ds->num_fwd_offloading_bridges`` to
a non-zero value. Returning success in this function activates the TX
forwarding offload bridge feature for this port, which enables the tagging
protocol driver to inject data plane packets towards the bridging domain that
the port is a part of. Data plane packets are subject to FDB lookup, hardware
learning on the CPU port, and do not override the port STP state.
Additionally, replication of data plane packets (multicast, flooding) is
handled in hardware and the bridge driver will transmit a single skb for each
packet that needs replication. The method is provided as a configuration
point for drivers that need to configure the hardware for enabling this
feature.
- ``port_bridge_tx_fwd_unoffload``: bridge layer function invoked when a driver
leaves a bridge port which had the TX forwarding offload feature enabled.
- ``port_fast_age``: bridge layer function invoked when flushing the
dynamically learned FDB entries on the port is necessary. This is called when
transitioning from an STP state where learning should take place to an STP
state where it shouldn't, or when leaving a bridge, or when address learning
is turned off via ``port_bridge_flags``.
Bridge VLAN filtering
---------------------
@ -697,55 +951,44 @@ Bridge VLAN filtering
allowed.
- ``port_vlan_add``: bridge layer function invoked when a VLAN is configured
(tagged or untagged) for the given switch port. If the operation is not
supported by the hardware, this function should return ``-EOPNOTSUPP`` to
inform the bridge code to fallback to a software implementation.
(tagged or untagged) for the given switch port. The CPU port becomes a member
of a VLAN only if a foreign bridge port is also a member of it (and
forwarding needs to take place in software), or the VLAN is installed to the
VLAN group of the bridge device itself, for termination purposes
(``bridge vlan add dev br0 vid 100 self``). VLANs on shared ports are
reference counted and removed when there is no user left. Drivers do not need
to manually install a VLAN on the CPU port.
- ``port_vlan_del``: bridge layer function invoked when a VLAN is removed from the
given switch port
- ``port_vlan_dump``: bridge layer function invoked with a switchdev callback
function that the driver has to call for each VLAN the given port is a member
of. A switchdev object is used to carry the VID and bridge flags.
- ``port_fdb_add``: bridge layer function invoked when the bridge wants to install a
Forwarding Database entry, the switch hardware should be programmed with the
specified address in the specified VLAN Id in the forwarding database
associated with this VLAN ID. If the operation is not supported, this
function should return ``-EOPNOTSUPP`` to inform the bridge code to fallback to
a software implementation.
.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
of DSA, would be its port-based VLAN, used by the associated bridge device.
associated with this VLAN ID.
- ``port_fdb_del``: bridge layer function invoked when the bridge wants to remove a
Forwarding Database entry, the switch hardware should be programmed to delete
the specified MAC address from the specified VLAN ID if it was mapped into
this port forwarding database
- ``port_fdb_dump``: bridge layer function invoked with a switchdev callback
function that the driver has to call for each MAC address known to be behind
the given port. A switchdev object is used to carry the VID and FDB info.
- ``port_fdb_dump``: bridge bypass function invoked by ``ndo_fdb_dump`` on the
physical DSA port interfaces. Since DSA does not attempt to keep in sync its
hardware FDB entries with the software bridge, this method is implemented as
a means to view the entries visible on user ports in the hardware database.
The entries reported by this function have the ``self`` flag in the output of
the ``bridge fdb show`` command.
- ``port_mdb_add``: bridge layer function invoked when the bridge wants to install
a multicast database entry. If the operation is not supported, this function
should return ``-EOPNOTSUPP`` to inform the bridge code to fallback to a
software implementation. The switch hardware should be programmed with the
a multicast database entry. The switch hardware should be programmed with the
specified address in the specified VLAN ID in the forwarding database
associated with this VLAN ID.
.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
of DSA, would be its port-based VLAN, used by the associated bridge device.
- ``port_mdb_del``: bridge layer function invoked when the bridge wants to remove a
multicast database entry, the switch hardware should be programmed to delete
the specified MAC address from the specified VLAN ID if it was mapped into
this port forwarding database.
- ``port_mdb_dump``: bridge layer function invoked with a switchdev callback
function that the driver has to call for each MAC address known to be behind
the given port. A switchdev object is used to carry the VID and MDB info.
Link aggregation
----------------

View File

@ -1052,11 +1052,7 @@ udp_rmem_min - INTEGER
Default: 4K
udp_wmem_min - INTEGER
Minimal size of send buffer used by UDP sockets in moderation.
Each UDP socket is able to use the size for sending data, even if
total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
Default: 4K
UDP does not have tx memory accounting and this tunable has no effect.
RAW variables
=============
@ -1085,7 +1081,7 @@ cipso_cache_enable - BOOLEAN
cipso_cache_bucket_size - INTEGER
The CIPSO label cache consists of a fixed size hash table with each
hash bucket containing a number of cache entries. This variable limits
the number of entries in each hash bucket; the larger the value the
the number of entries in each hash bucket; the larger the value is, the
more CIPSO label mappings that can be cached. When the number of
entries in a given hash bucket reaches this limit adding new entries
causes the oldest entry in the bucket to be removed to make room.
@ -1179,7 +1175,7 @@ ip_autobind_reuse - BOOLEAN
option should only be set by experts.
Default: 0
ip_dynaddr - BOOLEAN
ip_dynaddr - INTEGER
If set non-zero, enables support for dynamic addresses.
If set to a non-zero value larger than 1, a kernel log
message will be printed when dynamic address rewriting

View File

@ -10,7 +10,7 @@ AC97
====
AC97 is a five wire interface commonly found on many PC sound cards. It is
now also popular in many portable devices. This DAI has a reset line and time
now also popular in many portable devices. This DAI has a RESET line and time
multiplexes its data on its SDATA_OUT (playback) and SDATA_IN (capture) lines.
The bit clock (BCLK) is always driven by the CODEC (usually 12.288MHz) and the
frame (FRAME) (usually 48kHz) is always driven by the controller. Each AC97

View File

@ -50,9 +50,9 @@ Di conseguenza, nella tabella dei simboli del kernel ci sarà una voce
rappresentata dalla struttura ``kernel_symbol`` che avrà il campo
``namespace`` (spazio dei nomi) impostato. Un simbolo esportato senza uno spazio
dei nomi avrà questo campo impostato a ``NULL``. Non esiste uno spazio dei nomi
di base. Il programma ``modpost`` e il codice in kernel/module.c usano lo spazio
dei nomi, rispettivamente, durante la compilazione e durante il caricamento
di un modulo.
di base. Il programma ``modpost`` e il codice in kernel/module/main.c usano lo
spazio dei nomi, rispettivamente, durante la compilazione e durante il
caricamento di un modulo.
2.2 Usare il simbolo di preprocessore DEFAULT_SYMBOL_NAMESPACE
==============================================================

View File

@ -224,7 +224,7 @@ kernel/kmod.c
模块接口支持
------------
更多信息请参考文件kernel/module.c
更多信息请参阅kernel/module/目录下的文件
硬件接口
========

View File

@ -52,7 +52,7 @@
相应的 ksymtab 条目结构体 ``kernel_symbol`` 将有相应的成员 ``命名空间`` 集。
导出时未指明命名空间的符号将指向 ``NULL`` 。如果没有定义命名空间,则默认没有。
``modpost`` 和kernel/module.c分别在构建时或模块加载时使用名称空间。
``modpost`` 和kernel/module/main.c分别在构建时或模块加载时使用名称空间。
2.2 使用DEFAULT_SYMBOL_NAMESPACE定义
====================================

View File

@ -5657,7 +5657,8 @@ by a string of size ``name_size``.
#define KVM_STATS_UNIT_BYTES (0x1 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_SECONDS (0x2 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_CYCLES (0x3 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_MAX KVM_STATS_UNIT_CYCLES
#define KVM_STATS_UNIT_BOOLEAN (0x4 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_MAX KVM_STATS_UNIT_BOOLEAN
#define KVM_STATS_BASE_SHIFT 8
#define KVM_STATS_BASE_MASK (0xF << KVM_STATS_BASE_SHIFT)
@ -5702,14 +5703,13 @@ Bits 0-3 of ``flags`` encode the type:
by the ``hist_param`` field. The range of the Nth bucket (1 <= N < ``size``)
is [``hist_param``*(N-1), ``hist_param``*N), while the range of the last
bucket is [``hist_param``*(``size``-1), +INF). (+INF means positive infinity
value.) The bucket value indicates how many samples fell in the bucket's range.
value.)
* ``KVM_STATS_TYPE_LOG_HIST``
The statistic is reported as a logarithmic histogram. The number of
buckets is specified by the ``size`` field. The range of the first bucket is
[0, 1), while the range of the last bucket is [pow(2, ``size``-2), +INF).
Otherwise, The Nth bucket (1 < N < ``size``) covers
[pow(2, N-2), pow(2, N-1)). The bucket value indicates how many samples fell
in the bucket's range.
[pow(2, N-2), pow(2, N-1)).
Bits 4-7 of ``flags`` encode the unit:
@ -5724,6 +5724,15 @@ Bits 4-7 of ``flags`` encode the unit:
It indicates that the statistics data is used to measure time or latency.
* ``KVM_STATS_UNIT_CYCLES``
It indicates that the statistics data is used to measure CPU clock cycles.
* ``KVM_STATS_UNIT_BOOLEAN``
It indicates that the statistic will always be either 0 or 1. Boolean
statistics of "peak" type will never go back from 1 to 0. Boolean
statistics can be linear histograms (with two buckets) but not logarithmic
histograms.
Note that, in the case of histograms, the unit applies to the bucket
ranges, while the bucket value indicates how many samples fell in the
bucket's range.
Bits 8-11 of ``flags``, together with ``exponent``, encode the scale of the
unit:
@ -5746,7 +5755,7 @@ the corresponding statistics data.
The ``bucket_size`` field is used as a parameter for histogram statistics data.
It is only used by linear histogram statistics data, specifying the size of a
bucket.
bucket in the unit expressed by bits 4-11 of ``flags`` together with ``exponent``.
The ``name`` field is the name string of the statistics data. The name string
starts at the end of ``struct kvm_stats_desc``. The maximum length including

View File

@ -1038,6 +1038,7 @@ F: arch/arm64/boot/dts/amd/
AMD XGBE DRIVER
M: Tom Lendacky <thomas.lendacky@amd.com>
M: "Shyam Sundar S K" <Shyam-sundar.S-k@amd.com>
L: netdev@vger.kernel.org
S: Supported
F: arch/arm64/boot/dts/amd/amd-seattle-xgbe*.dtsi
@ -2497,10 +2498,8 @@ F: drivers/power/reset/oxnas-restart.c
N: oxnas
ARM/PALM TREO SUPPORT
M: Tomas Cech <sleep_walker@suse.com>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Maintained
W: http://hackndev.com
S: Orphan
F: arch/arm/mach-pxa/palmtreo.*
ARM/PALMTX,PALMT5,PALMLD,PALMTE2,PALMTC SUPPORT
@ -7774,9 +7773,6 @@ F: include/linux/fs.h
F: include/linux/fs_types.h
F: include/uapi/linux/fs.h
F: include/uapi/linux/openat2.h
X: fs/io-wq.c
X: fs/io-wq.h
X: fs/io_uring.c
FINTEK F75375S HARDWARE MONITOR AND FAN CONTROLLER DRIVER
M: Riku Voipio <riku.voipio@iki.fi>
@ -10477,9 +10473,7 @@ L: io-uring@vger.kernel.org
S: Maintained
T: git git://git.kernel.dk/linux-block
T: git git://git.kernel.dk/liburing
F: fs/io-wq.c
F: fs/io-wq.h
F: fs/io_uring.c
F: io_uring/
F: include/linux/io_uring.h
F: include/uapi/linux/io_uring.h
F: tools/io_uring/
@ -14363,7 +14357,8 @@ S: Maintained
F: drivers/net/phy/nxp-c45-tja11xx.c
NXP FSPI DRIVER
M: Ashish Kumar <ashish.kumar@nxp.com>
M: Han Xu <han.xu@nxp.com>
M: Haibo Chen <haibo.chen@nxp.com>
R: Yogesh Gaur <yogeshgaur.83@gmail.com>
L: linux-spi@vger.kernel.org
S: Maintained
@ -15849,7 +15844,7 @@ PIN CONTROLLER - FREESCALE
M: Dong Aisheng <aisheng.dong@nxp.com>
M: Fabio Estevam <festevam@gmail.com>
M: Shawn Guo <shawnguo@kernel.org>
M: Stefan Agner <stefan@agner.ch>
M: Jacky Bai <ping.bai@nxp.com>
R: Pengutronix Kernel Team <kernel@pengutronix.de>
L: linux-gpio@vger.kernel.org
S: Maintained
@ -17273,12 +17268,15 @@ N: riscv
K: riscv
RISC-V/MICROCHIP POLARFIRE SOC SUPPORT
M: Lewis Hanly <lewis.hanly@microchip.com>
M: Conor Dooley <conor.dooley@microchip.com>
M: Daire McNamara <daire.mcnamara@microchip.com>
L: linux-riscv@lists.infradead.org
S: Supported
F: arch/riscv/boot/dts/microchip/
F: drivers/char/hw_random/mpfs-rng.c
F: drivers/clk/microchip/clk-mpfs.c
F: drivers/mailbox/mailbox-mpfs.c
F: drivers/pci/controller/pcie-microchip-host.c
F: drivers/soc/microchip/
F: include/soc/microchip/mpfs.h

View File

@ -2,7 +2,7 @@
VERSION = 5
PATCHLEVEL = 19
SUBLEVEL = 0
EXTRAVERSION = -rc6
EXTRAVERSION = -rc8
NAME = Superb Owl
# *DOCUMENTATION*
@ -1097,6 +1097,7 @@ export MODULES_NSDEPS := $(extmod_prefix)modules.nsdeps
ifeq ($(KBUILD_EXTMOD),)
core-y += kernel/ certs/ mm/ fs/ ipc/ security/ crypto/
core-$(CONFIG_BLOCK) += block/
core-$(CONFIG_IO_URING) += io_uring/
vmlinux-dirs := $(patsubst %/,%,$(filter %/, \
$(core-y) $(core-m) $(drivers-y) $(drivers-m) \

View File

@ -438,6 +438,13 @@ config MMU_GATHER_PAGE_SIZE
config MMU_GATHER_NO_RANGE
bool
select MMU_GATHER_MERGE_VMAS
config MMU_GATHER_NO_FLUSH_CACHE
bool
config MMU_GATHER_MERGE_VMAS
bool
config MMU_GATHER_NO_GATHER
bool

View File

@ -226,7 +226,7 @@
reg = <0x28>;
#gpio-cells = <2>;
gpio-controller;
ngpio = <32>;
ngpios = <62>;
};
sgtl5000: codec@a {

View File

@ -166,7 +166,7 @@
atmel_mxt_ts: touchscreen@4a {
compatible = "atmel,maxtouch";
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_atmel_conn>;
pinctrl-0 = <&pinctrl_atmel_conn &pinctrl_atmel_snvs_conn>;
reg = <0x4a>;
interrupt-parent = <&gpio5>;
interrupts = <4 IRQ_TYPE_EDGE_FALLING>; /* SODIMM 107 / INT */
@ -331,7 +331,6 @@
pinctrl_atmel_conn: atmelconngrp {
fsl,pins = <
MX6UL_PAD_JTAG_MOD__GPIO1_IO10 0xb0a0 /* SODIMM 106 */
MX6ULL_PAD_SNVS_TAMPER4__GPIO5_IO04 0xb0a0 /* SODIMM 107 */
>;
};
@ -684,6 +683,12 @@
};
&iomuxc_snvs {
pinctrl_atmel_snvs_conn: atmelsnvsconngrp {
fsl,pins = <
MX6ULL_PAD_SNVS_TAMPER4__GPIO5_IO04 0xb0a0 /* SODIMM 107 */
>;
};
pinctrl_snvs_gpio1: snvsgpio1grp {
fsl,pins = <
MX6ULL_PAD_SNVS_TAMPER6__GPIO5_IO06 0x110a0 /* SODIMM 93 */

View File

@ -87,22 +87,22 @@
phy4: ethernet-phy@5 {
reg = <5>;
coma-mode-gpios = <&gpio 37 GPIO_ACTIVE_HIGH>;
coma-mode-gpios = <&gpio 37 GPIO_OPEN_DRAIN>;
};
phy5: ethernet-phy@6 {
reg = <6>;
coma-mode-gpios = <&gpio 37 GPIO_ACTIVE_HIGH>;
coma-mode-gpios = <&gpio 37 GPIO_OPEN_DRAIN>;
};
phy6: ethernet-phy@7 {
reg = <7>;
coma-mode-gpios = <&gpio 37 GPIO_ACTIVE_HIGH>;
coma-mode-gpios = <&gpio 37 GPIO_OPEN_DRAIN>;
};
phy7: ethernet-phy@8 {
reg = <8>;
coma-mode-gpios = <&gpio 37 GPIO_ACTIVE_HIGH>;
coma-mode-gpios = <&gpio 37 GPIO_OPEN_DRAIN>;
};
};

View File

@ -506,6 +506,8 @@
interrupts = <GIC_SPI 108 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&gcc GCC_BLSP1_UART2_APPS_CLK>, <&gcc GCC_BLSP1_AHB_CLK>;
clock-names = "core", "iface";
pinctrl-names = "default";
pinctrl-0 = <&blsp1_uart2_default>;
status = "disabled";
};
@ -581,6 +583,9 @@
interrupts = <GIC_SPI 113 IRQ_TYPE_NONE>;
clocks = <&gcc GCC_BLSP2_UART1_APPS_CLK>, <&gcc GCC_BLSP2_AHB_CLK>;
clock-names = "core", "iface";
pinctrl-names = "default", "sleep";
pinctrl-0 = <&blsp2_uart1_default>;
pinctrl-1 = <&blsp2_uart1_sleep>;
status = "disabled";
};
@ -599,6 +604,8 @@
interrupts = <GIC_SPI 116 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&gcc GCC_BLSP2_UART4_APPS_CLK>, <&gcc GCC_BLSP2_AHB_CLK>;
clock-names = "core", "iface";
pinctrl-names = "default";
pinctrl-0 = <&blsp2_uart4_default>;
status = "disabled";
};
@ -639,6 +646,9 @@
interrupts = <0 106 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&gcc GCC_BLSP2_QUP6_I2C_APPS_CLK>, <&gcc GCC_BLSP2_AHB_CLK>;
clock-names = "core", "iface";
pinctrl-names = "default", "sleep";
pinctrl-0 = <&blsp2_i2c6_default>;
pinctrl-1 = <&blsp2_i2c6_sleep>;
#address-cells = <1>;
#size-cells = <0>;
};
@ -1256,7 +1266,7 @@
};
};
blsp1_uart2_active: blsp1-uart2-active {
blsp1_uart2_default: blsp1-uart2-default {
rx {
pins = "gpio5";
function = "blsp_uart2";
@ -1272,7 +1282,7 @@
};
};
blsp2_uart1_active: blsp2-uart1-active {
blsp2_uart1_default: blsp2-uart1-default {
tx-rts {
pins = "gpio41", "gpio44";
function = "blsp_uart7";
@ -1295,7 +1305,7 @@
bias-pull-down;
};
blsp2_uart4_active: blsp2-uart4-active {
blsp2_uart4_default: blsp2-uart4-default {
tx-rts {
pins = "gpio53", "gpio56";
function = "blsp_uart10";
@ -1406,7 +1416,19 @@
bias-pull-up;
};
/* BLSP2_I2C6 info is missing - nobody uses it though? */
blsp2_i2c6_default: blsp2-i2c6-default {
pins = "gpio87", "gpio88";
function = "blsp_i2c12";
drive-strength = <2>;
bias-disable;
};
blsp2_i2c6_sleep: blsp2-i2c6-sleep {
pins = "gpio87", "gpio88";
function = "blsp_i2c12";
drive-strength = <2>;
bias-pull-up;
};
spi8_default: spi8_default {
mosi {

View File

@ -1124,7 +1124,7 @@
clocks = <&pmc PMC_TYPE_PERIPHERAL 55>, <&pmc PMC_TYPE_GCK 55>;
clock-names = "pclk", "gclk";
assigned-clocks = <&pmc PMC_TYPE_CORE PMC_I2S1_MUX>;
assigned-parrents = <&pmc PMC_TYPE_GCK 55>;
assigned-clock-parents = <&pmc PMC_TYPE_GCK 55>;
status = "disabled";
};

View File

@ -169,7 +169,7 @@
flash@0 {
#address-cells = <1>;
#size-cells = <1>;
compatible = "mxicy,mx25l1606e", "winbond,w25q128";
compatible = "mxicy,mx25l1606e", "jedec,spi-nor";
reg = <0>;
spi-max-frequency = <40000000>;
};

View File

@ -112,19 +112,6 @@ static __always_inline void set_domain(unsigned int val)
}
#endif
#ifdef CONFIG_CPU_USE_DOMAINS
#define modify_domain(dom,type) \
do { \
unsigned int domain = get_domain(); \
domain &= ~domain_mask(dom); \
domain = domain | domain_val(dom, type); \
set_domain(domain); \
} while (0)
#else
static inline void modify_domain(unsigned dom, unsigned type) { }
#endif
/*
* Generate the T (user) versions of the LDR/STR and related
* instructions (inline assembly)

View File

@ -27,6 +27,7 @@ enum {
MT_HIGH_VECTORS,
MT_MEMORY_RWX,
MT_MEMORY_RW,
MT_MEMORY_RO,
MT_ROM,
MT_MEMORY_RWX_NONCACHED,
MT_MEMORY_RW_DTCM,

View File

@ -163,5 +163,31 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs)
((current_stack_pointer | (THREAD_SIZE - 1)) - 7) - 1; \
})
/*
* Update ITSTATE after normal execution of an IT block instruction.
*
* The 8 IT state bits are split into two parts in CPSR:
* ITSTATE<1:0> are in CPSR<26:25>
* ITSTATE<7:2> are in CPSR<15:10>
*/
static inline unsigned long it_advance(unsigned long cpsr)
{
if ((cpsr & 0x06000400) == 0) {
/* ITSTATE<2:0> == 0 means end of IT block, so clear IT state */
cpsr &= ~PSR_IT_MASK;
} else {
/* We need to shift left ITSTATE<4:0> */
const unsigned long mask = 0x06001c00; /* Mask ITSTATE<4:0> */
unsigned long it = cpsr & mask;
it <<= 1;
it |= it >> (27 - 10); /* Carry ITSTATE<2> to correct place */
it &= mask;
cpsr &= ~mask;
cpsr |= it;
}
return cpsr;
}
#endif /* __ASSEMBLY__ */
#endif

View File

@ -302,6 +302,7 @@ local_restart:
b ret_fast_syscall
#endif
ENDPROC(vector_swi)
.ltorg
/*
* This is the really slow path. We're going to be doing

View File

@ -311,7 +311,7 @@ void __init rockchip_suspend_init(void)
&match);
if (!match) {
pr_err("Failed to find PMU node\n");
return;
goto out_put;
}
pm_data = (struct rockchip_pm_data *) match->data;
@ -320,9 +320,12 @@ void __init rockchip_suspend_init(void)
if (ret) {
pr_err("%s: matches init error %d\n", __func__, ret);
return;
goto out_put;
}
}
suspend_set_ops(pm_data->ops);
out_put:
of_node_put(np);
}

View File

@ -631,7 +631,11 @@ config CPU_USE_DOMAINS
bool
help
This option enables or disables the use of domain switching
via the set_fs() function.
using the DACR (domain access control register) to protect memory
domains from each other. In Linux we use three domains: kernel, user
and IO. The domains are used to protect userspace from kernelspace
and to handle IO-space as a special type of memory by assigning
manager or client roles to running code (such as a process).
config CPU_V7M_NUM_IRQ
int "Number of external interrupts connected to the NVIC"

View File

@ -935,6 +935,9 @@ do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
if (type == TYPE_LDST)
do_alignment_finish_ldst(addr, instr, regs, offset);
if (thumb_mode(regs))
regs->ARM_cpsr = it_advance(regs->ARM_cpsr);
return 0;
bad_or_fault:

View File

@ -296,6 +296,13 @@ static struct mem_type mem_types[] __ro_after_init = {
.prot_sect = PMD_TYPE_SECT | PMD_SECT_AP_WRITE,
.domain = DOMAIN_KERNEL,
},
[MT_MEMORY_RO] = {
.prot_pte = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY |
L_PTE_XN | L_PTE_RDONLY,
.prot_l1 = PMD_TYPE_TABLE,
.prot_sect = PMD_TYPE_SECT,
.domain = DOMAIN_KERNEL,
},
[MT_ROM] = {
.prot_sect = PMD_TYPE_SECT,
.domain = DOMAIN_KERNEL,
@ -489,6 +496,7 @@ static void __init build_mem_type_table(void)
/* Also setup NX memory mapping */
mem_types[MT_MEMORY_RW].prot_sect |= PMD_SECT_XN;
mem_types[MT_MEMORY_RO].prot_sect |= PMD_SECT_XN;
}
if (cpu_arch >= CPU_ARCH_ARMv7 && (cr & CR_TRE)) {
/*
@ -568,6 +576,7 @@ static void __init build_mem_type_table(void)
mem_types[MT_ROM].prot_sect |= PMD_SECT_APX|PMD_SECT_AP_WRITE;
mem_types[MT_MINICLEAN].prot_sect |= PMD_SECT_APX|PMD_SECT_AP_WRITE;
mem_types[MT_CACHECLEAN].prot_sect |= PMD_SECT_APX|PMD_SECT_AP_WRITE;
mem_types[MT_MEMORY_RO].prot_sect |= PMD_SECT_APX|PMD_SECT_AP_WRITE;
#endif
/*
@ -587,6 +596,8 @@ static void __init build_mem_type_table(void)
mem_types[MT_MEMORY_RWX].prot_pte |= L_PTE_SHARED;
mem_types[MT_MEMORY_RW].prot_sect |= PMD_SECT_S;
mem_types[MT_MEMORY_RW].prot_pte |= L_PTE_SHARED;
mem_types[MT_MEMORY_RO].prot_sect |= PMD_SECT_S;
mem_types[MT_MEMORY_RO].prot_pte |= L_PTE_SHARED;
mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
mem_types[MT_MEMORY_RWX_NONCACHED].prot_sect |= PMD_SECT_S;
mem_types[MT_MEMORY_RWX_NONCACHED].prot_pte |= L_PTE_SHARED;
@ -647,6 +658,8 @@ static void __init build_mem_type_table(void)
mem_types[MT_MEMORY_RWX].prot_pte |= kern_pgprot;
mem_types[MT_MEMORY_RW].prot_sect |= ecc_mask | cp->pmd;
mem_types[MT_MEMORY_RW].prot_pte |= kern_pgprot;
mem_types[MT_MEMORY_RO].prot_sect |= ecc_mask | cp->pmd;
mem_types[MT_MEMORY_RO].prot_pte |= kern_pgprot;
mem_types[MT_MEMORY_DMA_READY].prot_pte |= kern_pgprot;
mem_types[MT_MEMORY_RWX_NONCACHED].prot_sect |= ecc_mask;
mem_types[MT_ROM].prot_sect |= cp->pmd;
@ -1360,7 +1373,7 @@ static void __init devicemaps_init(const struct machine_desc *mdesc)
map.pfn = __phys_to_pfn(__atags_pointer & SECTION_MASK);
map.virtual = FDT_FIXED_BASE;
map.length = FDT_FIXED_SIZE;
map.type = MT_ROM;
map.type = MT_MEMORY_RO;
create_mapping(&map);
}

View File

@ -108,8 +108,7 @@ static unsigned int spectre_v2_install_workaround(unsigned int method)
#else
static unsigned int spectre_v2_install_workaround(unsigned int method)
{
pr_info("CPU%u: Spectre V2: workarounds disabled by configuration\n",
smp_processor_id());
pr_info_once("Spectre V2: workarounds disabled by configuration\n");
return SPECTRE_VULNERABLE;
}
@ -209,10 +208,10 @@ static int spectre_bhb_install_workaround(int method)
return SPECTRE_VULNERABLE;
spectre_bhb_method = method;
}
pr_info("CPU%u: Spectre BHB: using %s workaround\n",
smp_processor_id(), spectre_bhb_method_name(method));
pr_info("CPU%u: Spectre BHB: enabling %s workaround for all CPUs\n",
smp_processor_id(), spectre_bhb_method_name(method));
}
return SPECTRE_MITIGATED;
}

View File

@ -14,6 +14,7 @@
#include <linux/types.h>
#include <linux/stddef.h>
#include <asm/probes.h>
#include <asm/ptrace.h>
#include <asm/kprobes.h>
void __init arm_probes_decode_init(void);
@ -35,31 +36,6 @@ void __init find_str_pc_offset(void);
#endif
/*
* Update ITSTATE after normal execution of an IT block instruction.
*
* The 8 IT state bits are split into two parts in CPSR:
* ITSTATE<1:0> are in CPSR<26:25>
* ITSTATE<7:2> are in CPSR<15:10>
*/
static inline unsigned long it_advance(unsigned long cpsr)
{
if ((cpsr & 0x06000400) == 0) {
/* ITSTATE<2:0> == 0 means end of IT block, so clear IT state */
cpsr &= ~PSR_IT_MASK;
} else {
/* We need to shift left ITSTATE<4:0> */
const unsigned long mask = 0x06001c00; /* Mask ITSTATE<4:0> */
unsigned long it = cpsr & mask;
it <<= 1;
it |= it >> (27 - 10); /* Carry ITSTATE<2> to correct place */
it &= mask;
cpsr &= ~mask;
cpsr |= it;
}
return cpsr;
}
static inline void __kprobes bx_write_pc(long pcv, struct pt_regs *regs)
{
long cpsr = regs->ARM_cpsr;

View File

@ -9,6 +9,14 @@
/delete-node/ cpu@3;
};
timer {
compatible = "arm,armv8-timer";
interrupts = <GIC_PPI 13 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_LEVEL_LOW)>,
<GIC_PPI 14 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_LEVEL_LOW)>,
<GIC_PPI 11 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_LEVEL_LOW)>,
<GIC_PPI 10 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_LEVEL_LOW)>;
};
pmu {
compatible = "arm,cortex-a53-pmu";
interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>,

View File

@ -29,6 +29,8 @@
device_type = "cpu";
compatible = "brcm,brahma-b53";
reg = <0x0>;
enable-method = "spin-table";
cpu-release-addr = <0x0 0xfff8>;
next-level-cache = <&l2>;
};

View File

@ -224,9 +224,12 @@
little-endian;
};
efuse@1e80000 {
sfp: efuse@1e80000 {
compatible = "fsl,ls1028a-sfp";
reg = <0x0 0x1e80000 0x0 0x10000>;
clocks = <&clockgen QORIQ_CLK_PLATFORM_PLL
QORIQ_CLK_PLL_DIV(4)>;
clock-names = "sfp";
#address-cells = <1>;
#size-cells = <1>;

View File

@ -376,7 +376,8 @@ camera: &i2c7 {
<&cru ACLK_VIO>,
<&cru ACLK_GIC_PRE>,
<&cru PCLK_DDR>,
<&cru ACLK_HDCP>;
<&cru ACLK_HDCP>,
<&cru ACLK_VDU>;
assigned-clock-rates =
<600000000>, <1600000000>,
<1000000000>,
@ -388,6 +389,7 @@ camera: &i2c7 {
<400000000>,
<200000000>,
<200000000>,
<400000000>,
<400000000>;
};

View File

@ -1462,7 +1462,8 @@
<&cru HCLK_PERILP1>, <&cru PCLK_PERILP1>,
<&cru ACLK_VIO>, <&cru ACLK_HDCP>,
<&cru ACLK_GIC_PRE>,
<&cru PCLK_DDR>;
<&cru PCLK_DDR>,
<&cru ACLK_VDU>;
assigned-clock-rates =
<594000000>, <800000000>,
<1000000000>,
@ -1473,7 +1474,8 @@
<100000000>, <50000000>,
<400000000>, <400000000>,
<200000000>,
<200000000>;
<200000000>,
<400000000>;
};
grf: syscon@ff770000 {

View File

@ -687,6 +687,7 @@
};
&usb_host0_xhci {
dr_mode = "host";
status = "okay";
};

View File

@ -133,7 +133,7 @@
assigned-clocks = <&cru SCLK_GMAC1_RX_TX>, <&cru SCLK_GMAC1_RGMII_SPEED>, <&cru SCLK_GMAC1>;
assigned-clock-parents = <&cru SCLK_GMAC1_RGMII_SPEED>, <&cru SCLK_GMAC1>, <&gmac1_clkin>;
clock_in_out = "input";
phy-mode = "rgmii-id";
phy-mode = "rgmii";
phy-supply = <&vcc_3v3>;
pinctrl-names = "default";
pinctrl-0 = <&gmac1m1_miim

View File

@ -4,21 +4,6 @@
#define __ASM_CSKY_TLB_H
#include <asm/cacheflush.h>
#define tlb_start_vma(tlb, vma) \
do { \
if (!(tlb)->fullmm) \
flush_cache_range(vma, (vma)->vm_start, (vma)->vm_end); \
} while (0)
#define tlb_end_vma(tlb, vma) \
do { \
if (!(tlb)->fullmm) \
flush_tlb_range(vma, (vma)->vm_start, (vma)->vm_end); \
} while (0)
#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
#include <asm-generic/tlb.h>
#endif /* __ASM_CSKY_TLB_H */

View File

@ -108,6 +108,7 @@ config LOONGARCH
select TRACE_IRQFLAGS_SUPPORT
select USE_PERCPU_NUMA_NODE_ID
select ZONE_DMA32
select MMU_GATHER_MERGE_VMAS if MMU
config 32BIT
bool

View File

@ -137,16 +137,6 @@ static inline void invtlb_all(u32 op, u32 info, u64 addr)
);
}
/*
* LoongArch doesn't need any special per-pte or per-vma handling, except
* we need to flush cache for area to be unmapped.
*/
#define tlb_start_vma(tlb, vma) \
do { \
if (!(tlb)->fullmm) \
flush_cache_range(vma, vma->vm_start, vma->vm_end); \
} while (0)
#define tlb_end_vma(tlb, vma) do { } while (0)
#define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
static void tlb_flush(struct mmu_gather *tlb);

View File

@ -256,6 +256,7 @@ config PPC
select IRQ_FORCED_THREADING
select MMU_GATHER_PAGE_SIZE
select MMU_GATHER_RCU_TABLE_FREE
select MMU_GATHER_MERGE_VMAS
select MODULES_USE_ELF_RELA
select NEED_DMA_MAP_STATE if PPC64 || NOT_COHERENT_CACHE
select NEED_PER_CPU_EMBED_FIRST_CHUNK if PPC64

View File

@ -19,8 +19,6 @@
#include <linux/pagemap.h>
#define tlb_start_vma(tlb, vma) do { } while (0)
#define tlb_end_vma(tlb, vma) do { } while (0)
#define __tlb_remove_tlb_entry __tlb_remove_tlb_entry
#define tlb_flush tlb_flush

View File

@ -38,7 +38,7 @@ config RISCV
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_DEBUG_PAGEALLOC if MMU
select ARCH_SUPPORTS_HUGETLBFS if MMU
select ARCH_SUPPORTS_PAGE_TABLE_CHECK
select ARCH_SUPPORTS_PAGE_TABLE_CHECK if MMU
select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU

View File

@ -73,6 +73,7 @@ ifeq ($(CONFIG_PERF_EVENTS),y)
endif
KBUILD_CFLAGS_MODULE += $(call cc-option,-mno-relax)
KBUILD_AFLAGS_MODULE += $(call as-option,-Wa$(comma)-mno-relax)
# GCC versions that support the "-mstrict-align" option default to allowing
# unaligned accesses. While unaligned accesses are explicitly allowed in the

View File

@ -35,7 +35,7 @@
gpio-keys {
compatible = "gpio-keys";
key0 {
key {
label = "KEY0";
linux,code = <BTN_0>;
gpios = <&gpio0 10 GPIO_ACTIVE_LOW>;

View File

@ -47,7 +47,7 @@
gpio-keys {
compatible = "gpio-keys";
boot {
key-boot {
label = "BOOT";
linux,code = <BTN_0>;
gpios = <&gpio0 0 GPIO_ACTIVE_LOW>;

View File

@ -52,7 +52,7 @@
gpio-keys {
compatible = "gpio-keys";
boot {
key-boot {
label = "BOOT";
linux,code = <BTN_0>;
gpios = <&gpio0 0 GPIO_ACTIVE_LOW>;

View File

@ -46,19 +46,19 @@
gpio-keys {
compatible = "gpio-keys";
up {
key-up {
label = "UP";
linux,code = <BTN_1>;
gpios = <&gpio1_0 7 GPIO_ACTIVE_LOW>;
};
press {
key-press {
label = "PRESS";
linux,code = <BTN_0>;
gpios = <&gpio0 0 GPIO_ACTIVE_LOW>;
};
down {
key-down {
label = "DOWN";
linux,code = <BTN_2>;
gpios = <&gpio0 1 GPIO_ACTIVE_LOW>;

View File

@ -23,7 +23,7 @@
gpio-keys {
compatible = "gpio-keys";
boot {
key-boot {
label = "BOOT";
linux,code = <BTN_0>;
gpios = <&gpio0 0 GPIO_ACTIVE_LOW>;

View File

@ -50,6 +50,7 @@
riscv,isa = "rv64imafdc";
clocks = <&clkcfg CLK_CPU>;
tlb-split;
next-level-cache = <&cctrllr>;
status = "okay";
cpu1_intc: interrupt-controller {
@ -77,6 +78,7 @@
riscv,isa = "rv64imafdc";
clocks = <&clkcfg CLK_CPU>;
tlb-split;
next-level-cache = <&cctrllr>;
status = "okay";
cpu2_intc: interrupt-controller {
@ -104,6 +106,7 @@
riscv,isa = "rv64imafdc";
clocks = <&clkcfg CLK_CPU>;
tlb-split;
next-level-cache = <&cctrllr>;
status = "okay";
cpu3_intc: interrupt-controller {
@ -131,6 +134,7 @@
riscv,isa = "rv64imafdc";
clocks = <&clkcfg CLK_CPU>;
tlb-split;
next-level-cache = <&cctrllr>;
status = "okay";
cpu4_intc: interrupt-controller {
#interrupt-cells = <1>;

View File

@ -111,6 +111,7 @@ void __init_or_module sifive_errata_patch_func(struct alt_entry *begin,
cpu_apply_errata |= tmp;
}
}
if (cpu_apply_errata != cpu_req_errata)
if (stage != RISCV_ALTERNATIVES_MODULE &&
cpu_apply_errata != cpu_req_errata)
warn_miss_errata(cpu_req_errata - cpu_apply_errata);
}

View File

@ -175,7 +175,7 @@ static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot)
static inline unsigned long _pud_pfn(pud_t pud)
{
return pud_val(pud) >> _PAGE_PFN_SHIFT;
return __page_val_to_pfn(pud_val(pud));
}
static inline pmd_t *pud_pgtable(pud_t pud)
@ -278,13 +278,13 @@ static inline p4d_t pfn_p4d(unsigned long pfn, pgprot_t prot)
static inline unsigned long _p4d_pfn(p4d_t p4d)
{
return p4d_val(p4d) >> _PAGE_PFN_SHIFT;
return __page_val_to_pfn(p4d_val(p4d));
}
static inline pud_t *p4d_pgtable(p4d_t p4d)
{
if (pgtable_l4_enabled)
return (pud_t *)pfn_to_virt(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
return (pud_t *)pfn_to_virt(__page_val_to_pfn(p4d_val(p4d)));
return (pud_t *)pud_pgtable((pud_t) { p4d_val(p4d) });
}
@ -292,7 +292,7 @@ static inline pud_t *p4d_pgtable(p4d_t p4d)
static inline struct page *p4d_page(p4d_t p4d)
{
return pfn_to_page(p4d_val(p4d) >> _PAGE_PFN_SHIFT);
return pfn_to_page(__page_val_to_pfn(p4d_val(p4d)));
}
#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
@ -347,7 +347,7 @@ static inline void pgd_clear(pgd_t *pgd)
static inline p4d_t *pgd_pgtable(pgd_t pgd)
{
if (pgtable_l5_enabled)
return (p4d_t *)pfn_to_virt(pgd_val(pgd) >> _PAGE_PFN_SHIFT);
return (p4d_t *)pfn_to_virt(__page_val_to_pfn(pgd_val(pgd)));
return (p4d_t *)p4d_pgtable((p4d_t) { pgd_val(pgd) });
}
@ -355,7 +355,7 @@ static inline p4d_t *pgd_pgtable(pgd_t pgd)
static inline struct page *pgd_page(pgd_t pgd)
{
return pfn_to_page(pgd_val(pgd) >> _PAGE_PFN_SHIFT);
return pfn_to_page(__page_val_to_pfn(pgd_val(pgd)));
}
#define pgd_page(pgd) pgd_page(pgd)

View File

@ -261,7 +261,7 @@ static inline pgd_t pfn_pgd(unsigned long pfn, pgprot_t prot)
static inline unsigned long _pgd_pfn(pgd_t pgd)
{
return pgd_val(pgd) >> _PAGE_PFN_SHIFT;
return __page_val_to_pfn(pgd_val(pgd));
}
static inline struct page *pmd_page(pmd_t pmd)
@ -590,14 +590,14 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd)
return __pmd(pmd_val(pmd) & ~(_PAGE_PRESENT|_PAGE_PROT_NONE));
}
#define __pmd_to_phys(pmd) (pmd_val(pmd) >> _PAGE_PFN_SHIFT << PAGE_SHIFT)
#define __pmd_to_phys(pmd) (__page_val_to_pfn(pmd_val(pmd)) << PAGE_SHIFT)
static inline unsigned long pmd_pfn(pmd_t pmd)
{
return ((__pmd_to_phys(pmd) & PMD_MASK) >> PAGE_SHIFT);
}
#define __pud_to_phys(pud) (pud_val(pud) >> _PAGE_PFN_SHIFT << PAGE_SHIFT)
#define __pud_to_phys(pud) (__page_val_to_pfn(pud_val(pud)) << PAGE_SHIFT)
static inline unsigned long pud_pfn(pud_t pud)
{

View File

@ -78,7 +78,7 @@ obj-$(CONFIG_SMP) += cpu_ops_sbi.o
endif
obj-$(CONFIG_HOTPLUG_CPU) += cpu-hotplug.o
obj-$(CONFIG_KGDB) += kgdb.o
obj-$(CONFIG_KEXEC) += kexec_relocate.o crash_save_regs.o machine_kexec.o
obj-$(CONFIG_KEXEC_CORE) += kexec_relocate.o crash_save_regs.o machine_kexec.o
obj-$(CONFIG_KEXEC_FILE) += elf_kexec.o machine_kexec_file.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o

View File

@ -349,7 +349,7 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
{
const char *strtab, *name, *shstrtab;
const Elf_Shdr *sechdrs;
Elf_Rela *relas;
Elf64_Rela *relas;
int i, r_type;
/* String & section header string table */

View File

@ -54,7 +54,7 @@ static inline unsigned long gstage_pte_index(gpa_t addr, u32 level)
static inline unsigned long gstage_pte_page_vaddr(pte_t pte)
{
return (unsigned long)pfn_to_virt(pte_val(pte) >> _PAGE_PFN_SHIFT);
return (unsigned long)pfn_to_virt(__page_val_to_pfn(pte_val(pte)));
}
static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)

View File

@ -781,9 +781,11 @@ static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
if (kvm_request_pending(vcpu)) {
if (kvm_check_request(KVM_REQ_SLEEP, vcpu)) {
kvm_vcpu_srcu_read_unlock(vcpu);
rcuwait_wait_event(wait,
(!vcpu->arch.power_off) && (!vcpu->arch.pause),
TASK_INTERRUPTIBLE);
kvm_vcpu_srcu_read_lock(vcpu);
if (vcpu->arch.power_off || vcpu->arch.pause) {
/*

View File

@ -204,6 +204,7 @@ config S390
select IOMMU_SUPPORT if PCI
select MMU_GATHER_NO_GATHER
select MMU_GATHER_RCU_TABLE_FREE
select MMU_GATHER_MERGE_VMAS
select MODULES_USE_ELF_RELA
select NEED_DMA_MAP_STATE if PCI
select NEED_SG_DMA_LENGTH if PCI

View File

@ -82,7 +82,7 @@ endif
ifdef CONFIG_EXPOLINE
ifdef CONFIG_EXPOLINE_EXTERN
KBUILD_LDFLAGS_MODULE += arch/s390/lib/expoline.o
KBUILD_LDFLAGS_MODULE += arch/s390/lib/expoline/expoline.o
CC_FLAGS_EXPOLINE := -mindirect-branch=thunk-extern
CC_FLAGS_EXPOLINE += -mfunction-return=thunk-extern
else
@ -163,6 +163,12 @@ vdso_prepare: prepare0
$(Q)$(MAKE) $(build)=arch/s390/kernel/vdso64 include/generated/vdso64-offsets.h
$(if $(CONFIG_COMPAT),$(Q)$(MAKE) \
$(build)=arch/s390/kernel/vdso32 include/generated/vdso32-offsets.h)
ifdef CONFIG_EXPOLINE_EXTERN
modules_prepare: expoline_prepare
expoline_prepare:
$(Q)$(MAKE) $(build)=arch/s390/lib/expoline arch/s390/lib/expoline/expoline.o
endif
endif
# Don't use tabs in echo arguments

View File

@ -2,8 +2,6 @@
#ifndef _ASM_S390_NOSPEC_ASM_H
#define _ASM_S390_NOSPEC_ASM_H
#include <asm/alternative-asm.h>
#include <asm/asm-offsets.h>
#include <asm/dwarf.h>
#ifdef __ASSEMBLY__

View File

@ -27,9 +27,6 @@ static inline void tlb_flush(struct mmu_gather *tlb);
static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
struct page *page, int page_size);
#define tlb_start_vma(tlb, vma) do { } while (0)
#define tlb_end_vma(tlb, vma) do { } while (0)
#define tlb_flush tlb_flush
#define pte_free_tlb pte_free_tlb
#define pmd_free_tlb pmd_free_tlb

View File

@ -7,7 +7,6 @@ lib-y += delay.o string.o uaccess.o find.o spinlock.o
obj-y += mem.o xor.o
lib-$(CONFIG_KPROBES) += probes.o
lib-$(CONFIG_UPROBES) += probes.o
obj-$(CONFIG_EXPOLINE_EXTERN) += expoline.o
obj-$(CONFIG_S390_KPROBES_SANITY_TEST) += test_kprobes_s390.o
test_kprobes_s390-objs += test_kprobes_asm.o test_kprobes.o
@ -22,3 +21,5 @@ obj-$(CONFIG_S390_MODULES_SANITY_TEST) += test_modules.o
obj-$(CONFIG_S390_MODULES_SANITY_TEST_HELPERS) += test_modules_helpers.o
lib-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
obj-$(CONFIG_EXPOLINE_EXTERN) += expoline/

View File

@ -0,0 +1,3 @@
# SPDX-License-Identifier: GPL-2.0
obj-y += expoline.o

View File

@ -271,8 +271,12 @@ static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
#endif /* CONFIG_HAVE_IOREMAP_PROT */
#else /* CONFIG_MMU */
#define iounmap(addr) do { } while (0)
#define ioremap(offset, size) ((void __iomem *)(unsigned long)(offset))
static inline void __iomem *ioremap(phys_addr_t offset, size_t size)
{
return (void __iomem *)(unsigned long)offset;
}
static inline void iounmap(volatile void __iomem *addr) { }
#endif /* CONFIG_MMU */
#define ioremap_uc ioremap

View File

@ -67,6 +67,8 @@ config SPARC64
select HAVE_KRETPROBES
select HAVE_KPROBES
select MMU_GATHER_RCU_TABLE_FREE if SMP
select MMU_GATHER_MERGE_VMAS
select MMU_GATHER_NO_FLUSH_CACHE
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_DYNAMIC_FTRACE
select HAVE_FTRACE_MCOUNT_RECORD

View File

@ -22,8 +22,6 @@ void smp_flush_tlb_mm(struct mm_struct *mm);
void __flush_tlb_pending(unsigned long, unsigned long, unsigned long *);
void flush_tlb_pending(void);
#define tlb_start_vma(tlb, vma) do { } while (0)
#define tlb_end_vma(tlb, vma) do { } while (0)
#define tlb_flush(tlb) flush_tlb_pending()
/*

View File

@ -102,8 +102,8 @@ extern unsigned long uml_physmem;
* casting is the right thing, but 32-bit UML can't have 64-bit virtual
* addresses
*/
#define __pa(virt) to_phys((void *) (unsigned long) (virt))
#define __va(phys) to_virt((unsigned long) (phys))
#define __pa(virt) uml_to_phys((void *) (unsigned long) (virt))
#define __va(phys) uml_to_virt((unsigned long) (phys))
#define phys_to_pfn(p) ((p) >> PAGE_SHIFT)
#define pfn_to_phys(pfn) PFN_PHYS(pfn)

View File

@ -9,12 +9,12 @@
extern int phys_mapping(unsigned long phys, unsigned long long *offset_out);
extern unsigned long uml_physmem;
static inline unsigned long to_phys(void *virt)
static inline unsigned long uml_to_phys(void *virt)
{
return(((unsigned long) virt) - uml_physmem);
}
static inline void *to_virt(unsigned long phys)
static inline void *uml_to_virt(unsigned long phys)
{
return((void *) uml_physmem + phys);
}

View File

@ -432,6 +432,10 @@ void apply_retpolines(s32 *start, s32 *end)
{
}
void apply_returns(s32 *start, s32 *end)
{
}
void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
{
}

View File

@ -251,7 +251,7 @@ static int userspace_tramp(void *stack)
signal(SIGTERM, SIG_DFL);
signal(SIGWINCH, SIG_IGN);
fd = phys_mapping(to_phys(__syscall_stub_start), &offset);
fd = phys_mapping(uml_to_phys(__syscall_stub_start), &offset);
addr = mmap64((void *) STUB_CODE, UM_KERN_PAGE_SIZE,
PROT_EXEC, MAP_FIXED | MAP_PRIVATE, fd, offset);
if (addr == MAP_FAILED) {
@ -261,7 +261,7 @@ static int userspace_tramp(void *stack)
}
if (stack != NULL) {
fd = phys_mapping(to_phys(stack), &offset);
fd = phys_mapping(uml_to_phys(stack), &offset);
addr = mmap((void *) STUB_DATA,
UM_KERN_PAGE_SIZE, PROT_READ | PROT_WRITE,
MAP_FIXED | MAP_SHARED, fd, offset);
@ -534,7 +534,7 @@ int copy_context_skas0(unsigned long new_stack, int pid)
struct stub_data *data = (struct stub_data *) current_stack;
struct stub_data *child_data = (struct stub_data *) new_stack;
unsigned long long new_offset;
int new_fd = phys_mapping(to_phys((void *)new_stack), &new_offset);
int new_fd = phys_mapping(uml_to_phys((void *)new_stack), &new_offset);
/*
* prepare offset and fd of child's stack as argument for parent's

View File

@ -245,6 +245,7 @@ config X86
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
select MMU_GATHER_RCU_TABLE_FREE if PARAVIRT
select MMU_GATHER_MERGE_VMAS
select HAVE_POSIX_CPU_TIMERS_TASK_WORK
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if UNWINDER_ORC || STACK_VALIDATION
@ -462,29 +463,6 @@ config GOLDFISH
def_bool y
depends on X86_GOLDFISH
config RETPOLINE
bool "Avoid speculative indirect branches in kernel"
select OBJTOOL if HAVE_OBJTOOL
default y
help
Compile kernel with the retpoline compiler options to guard against
kernel-to-user data leaks by avoiding speculative indirect
branches. Requires a compiler with -mindirect-branch=thunk-extern
support for full protection. The kernel may run slower.
config CC_HAS_SLS
def_bool $(cc-option,-mharden-sls=all)
config SLS
bool "Mitigate Straight-Line-Speculation"
depends on CC_HAS_SLS && X86_64
select OBJTOOL if HAVE_OBJTOOL
default n
help
Compile the kernel with straight-line-speculation options to guard
against straight line speculation. The kernel image might be slightly
larger.
config X86_CPU_RESCTRL
bool "x86 CPU resource control support"
depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
@ -2453,6 +2431,91 @@ source "kernel/livepatch/Kconfig"
endmenu
config CC_HAS_SLS
def_bool $(cc-option,-mharden-sls=all)
config CC_HAS_RETURN_THUNK
def_bool $(cc-option,-mfunction-return=thunk-extern)
menuconfig SPECULATION_MITIGATIONS
bool "Mitigations for speculative execution vulnerabilities"
default y
help
Say Y here to enable options which enable mitigations for
speculative execution hardware vulnerabilities.
If you say N, all mitigations will be disabled. You really
should know what you are doing to say so.
if SPECULATION_MITIGATIONS
config PAGE_TABLE_ISOLATION
bool "Remove the kernel mapping in user mode"
default y
depends on (X86_64 || X86_PAE)
help
This feature reduces the number of hardware side channels by
ensuring that the majority of kernel addresses are not mapped
into userspace.
See Documentation/x86/pti.rst for more details.
config RETPOLINE
bool "Avoid speculative indirect branches in kernel"
select OBJTOOL if HAVE_OBJTOOL
default y
help
Compile kernel with the retpoline compiler options to guard against
kernel-to-user data leaks by avoiding speculative indirect
branches. Requires a compiler with -mindirect-branch=thunk-extern
support for full protection. The kernel may run slower.
config RETHUNK
bool "Enable return-thunks"
depends on RETPOLINE && CC_HAS_RETURN_THUNK
select OBJTOOL if HAVE_OBJTOOL
default y if X86_64
help
Compile the kernel with the return-thunks compiler option to guard
against kernel-to-user data leaks by avoiding return speculation.
Requires a compiler with -mfunction-return=thunk-extern
support for full protection. The kernel may run slower.
config CPU_UNRET_ENTRY
bool "Enable UNRET on kernel entry"
depends on CPU_SUP_AMD && RETHUNK && X86_64
default y
help
Compile the kernel with support for the retbleed=unret mitigation.
config CPU_IBPB_ENTRY
bool "Enable IBPB on kernel entry"
depends on CPU_SUP_AMD && X86_64
default y
help
Compile the kernel with support for the retbleed=ibpb mitigation.
config CPU_IBRS_ENTRY
bool "Enable IBRS on kernel entry"
depends on CPU_SUP_INTEL && X86_64
default y
help
Compile the kernel with support for the spectre_v2=ibrs mitigation.
This mitigates both spectre_v2 and retbleed at great cost to
performance.
config SLS
bool "Mitigate Straight-Line-Speculation"
depends on CC_HAS_SLS && X86_64
select OBJTOOL if HAVE_OBJTOOL
default n
help
Compile the kernel with straight-line-speculation options to guard
against straight line speculation. The kernel image might be slightly
larger.
endif
config ARCH_HAS_ADD_PAGES
def_bool y
depends on ARCH_ENABLE_MEMORY_HOTPLUG

View File

@ -21,6 +21,13 @@ ifdef CONFIG_CC_IS_CLANG
RETPOLINE_CFLAGS := -mretpoline-external-thunk
RETPOLINE_VDSO_CFLAGS := -mretpoline
endif
ifdef CONFIG_RETHUNK
RETHUNK_CFLAGS := -mfunction-return=thunk-extern
RETPOLINE_CFLAGS += $(RETHUNK_CFLAGS)
endif
export RETHUNK_CFLAGS
export RETPOLINE_CFLAGS
export RETPOLINE_VDSO_CFLAGS

View File

@ -11,7 +11,7 @@ CFLAGS_REMOVE_common.o = $(CC_FLAGS_FTRACE)
CFLAGS_common.o += -fno-stack-protector
obj-y := entry_$(BITS).o thunk_$(BITS).o syscall_$(BITS).o
obj-y := entry.o entry_$(BITS).o thunk_$(BITS).o syscall_$(BITS).o
obj-y += common.o
obj-y += vdso/

View File

@ -7,6 +7,8 @@
#include <asm/asm-offsets.h>
#include <asm/processor-flags.h>
#include <asm/ptrace-abi.h>
#include <asm/msr.h>
#include <asm/nospec-branch.h>
/*
@ -282,6 +284,66 @@ For 32-bit we have the following conventions - kernel is built with
#endif
/*
* IBRS kernel mitigation for Spectre_v2.
*
* Assumes full context is established (PUSH_REGS, CR3 and GS) and it clobbers
* the regs it uses (AX, CX, DX). Must be called before the first RET
* instruction (NOTE! UNTRAIN_RET includes a RET instruction)
*
* The optional argument is used to save/restore the current value,
* which is used on the paranoid paths.
*
* Assumes x86_spec_ctrl_{base,current} to have SPEC_CTRL_IBRS set.
*/
.macro IBRS_ENTER save_reg
#ifdef CONFIG_CPU_IBRS_ENTRY
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_KERNEL_IBRS
movl $MSR_IA32_SPEC_CTRL, %ecx
.ifnb \save_reg
rdmsr
shl $32, %rdx
or %rdx, %rax
mov %rax, \save_reg
test $SPEC_CTRL_IBRS, %eax
jz .Ldo_wrmsr_\@
lfence
jmp .Lend_\@
.Ldo_wrmsr_\@:
.endif
movq PER_CPU_VAR(x86_spec_ctrl_current), %rdx
movl %edx, %eax
shr $32, %rdx
wrmsr
.Lend_\@:
#endif
.endm
/*
* Similar to IBRS_ENTER, requires KERNEL GS,CR3 and clobbers (AX, CX, DX)
* regs. Must be called after the last RET.
*/
.macro IBRS_EXIT save_reg
#ifdef CONFIG_CPU_IBRS_ENTRY
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_KERNEL_IBRS
movl $MSR_IA32_SPEC_CTRL, %ecx
.ifnb \save_reg
mov \save_reg, %rdx
.else
movq PER_CPU_VAR(x86_spec_ctrl_current), %rdx
andl $(~SPEC_CTRL_IBRS), %edx
.endif
movl %edx, %eax
shr $32, %rdx
wrmsr
.Lend_\@:
#endif
.endm
/*
* Mitigate Spectre v1 for conditional swapgs code paths.
*

22
arch/x86/entry/entry.S Normal file
View File

@ -0,0 +1,22 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Common place for both 32- and 64-bit entry routines.
*/
#include <linux/linkage.h>
#include <asm/export.h>
#include <asm/msr-index.h>
.pushsection .noinstr.text, "ax"
SYM_FUNC_START(entry_ibpb)
movl $MSR_IA32_PRED_CMD, %ecx
movl $PRED_CMD_IBPB, %eax
xorl %edx, %edx
wrmsr
RET
SYM_FUNC_END(entry_ibpb)
/* For KVM */
EXPORT_SYMBOL_GPL(entry_ibpb);
.popsection

View File

@ -698,7 +698,6 @@ SYM_CODE_START(__switch_to_asm)
movl %ebx, PER_CPU_VAR(__stack_chk_guard)
#endif
#ifdef CONFIG_RETPOLINE
/*
* When switching from a shallower to a deeper call stack
* the RSB may either underflow or use entries populated
@ -707,7 +706,6 @@ SYM_CODE_START(__switch_to_asm)
* speculative execution to prevent attack.
*/
FILL_RETURN_BUFFER %ebx, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_CTXSW
#endif
/* Restore flags or the incoming task to restore AC state. */
popfl

View File

@ -85,7 +85,7 @@
*/
SYM_CODE_START(entry_SYSCALL_64)
UNWIND_HINT_EMPTY
UNWIND_HINT_ENTRY
ENDBR
swapgs
@ -112,6 +112,11 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
movq %rsp, %rdi
/* Sign extend the lower 32bit as syscall numbers are treated as int */
movslq %eax, %rsi
/* clobbers %rax, make sure it is after saving the syscall nr */
IBRS_ENTER
UNTRAIN_RET
call do_syscall_64 /* returns with IRQs disabled */
/*
@ -191,6 +196,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
* perf profiles. Nothing jumps here.
*/
syscall_return_via_sysret:
IBRS_EXIT
POP_REGS pop_rdi=0
/*
@ -249,7 +255,6 @@ SYM_FUNC_START(__switch_to_asm)
movq %rbx, PER_CPU_VAR(fixed_percpu_data) + stack_canary_offset
#endif
#ifdef CONFIG_RETPOLINE
/*
* When switching from a shallower to a deeper call stack
* the RSB may either underflow or use entries populated
@ -258,7 +263,6 @@ SYM_FUNC_START(__switch_to_asm)
* speculative execution to prevent attack.
*/
FILL_RETURN_BUFFER %r12, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_CTXSW
#endif
/* restore callee-saved registers */
popq %r15
@ -322,13 +326,13 @@ SYM_CODE_END(ret_from_fork)
#endif
.endm
/* Save all registers in pt_regs */
SYM_CODE_START_LOCAL(push_and_clear_regs)
SYM_CODE_START_LOCAL(xen_error_entry)
UNWIND_HINT_FUNC
PUSH_AND_CLEAR_REGS save_ret=1
ENCODE_FRAME_POINTER 8
UNTRAIN_RET
RET
SYM_CODE_END(push_and_clear_regs)
SYM_CODE_END(xen_error_entry)
/**
* idtentry_body - Macro to emit code calling the C function
@ -337,9 +341,6 @@ SYM_CODE_END(push_and_clear_regs)
*/
.macro idtentry_body cfunc has_error_code:req
call push_and_clear_regs
UNWIND_HINT_REGS
/*
* Call error_entry() and switch to the task stack if from userspace.
*
@ -349,7 +350,7 @@ SYM_CODE_END(push_and_clear_regs)
* switch the CR3. So it can skip invoking error_entry().
*/
ALTERNATIVE "call error_entry; movq %rax, %rsp", \
"", X86_FEATURE_XENPV
"call xen_error_entry", X86_FEATURE_XENPV
ENCODE_FRAME_POINTER
UNWIND_HINT_REGS
@ -612,6 +613,7 @@ __irqentry_text_end:
SYM_CODE_START_LOCAL(common_interrupt_return)
SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
IBRS_EXIT
#ifdef CONFIG_DEBUG_ENTRY
/* Assert that pt_regs indicates user mode. */
testb $3, CS(%rsp)
@ -897,6 +899,9 @@ SYM_CODE_END(xen_failsafe_callback)
* 1 -> no SWAPGS on exit
*
* Y GSBASE value at entry, must be restored in paranoid_exit
*
* R14 - old CR3
* R15 - old SPEC_CTRL
*/
SYM_CODE_START_LOCAL(paranoid_entry)
UNWIND_HINT_FUNC
@ -940,7 +945,7 @@ SYM_CODE_START_LOCAL(paranoid_entry)
* is needed here.
*/
SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx
RET
jmp .Lparanoid_gsbase_done
.Lparanoid_entry_checkgs:
/* EBX = 1 -> kernel GSBASE active, no restore required */
@ -959,8 +964,16 @@ SYM_CODE_START_LOCAL(paranoid_entry)
xorl %ebx, %ebx
swapgs
.Lparanoid_kernel_gsbase:
FENCE_SWAPGS_KERNEL_ENTRY
.Lparanoid_gsbase_done:
/*
* Once we have CR3 and %GS setup save and set SPEC_CTRL. Just like
* CR3 above, keep the old value in a callee saved register.
*/
IBRS_ENTER save_reg=%r15
UNTRAIN_RET
RET
SYM_CODE_END(paranoid_entry)
@ -982,9 +995,19 @@ SYM_CODE_END(paranoid_entry)
* 1 -> no SWAPGS on exit
*
* Y User space GSBASE, must be restored unconditionally
*
* R14 - old CR3
* R15 - old SPEC_CTRL
*/
SYM_CODE_START_LOCAL(paranoid_exit)
UNWIND_HINT_REGS
/*
* Must restore IBRS state before both CR3 and %GS since we need access
* to the per-CPU x86_spec_ctrl_shadow variable.
*/
IBRS_EXIT save_reg=%r15
/*
* The order of operations is important. RESTORE_CR3 requires
* kernel GSBASE.
@ -1017,6 +1040,10 @@ SYM_CODE_END(paranoid_exit)
*/
SYM_CODE_START_LOCAL(error_entry)
UNWIND_HINT_FUNC
PUSH_AND_CLEAR_REGS save_ret=1
ENCODE_FRAME_POINTER 8
testb $3, CS+8(%rsp)
jz .Lerror_kernelspace
@ -1028,9 +1055,12 @@ SYM_CODE_START_LOCAL(error_entry)
FENCE_SWAPGS_USER_ENTRY
/* We have user CR3. Change to kernel CR3. */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
IBRS_ENTER
UNTRAIN_RET
leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
.Lerror_entry_from_usermode_after_swapgs:
/* Put us onto the real thread stack. */
call sync_regs
RET
@ -1065,6 +1095,7 @@ SYM_CODE_START_LOCAL(error_entry)
.Lerror_entry_done_lfence:
FENCE_SWAPGS_KERNEL_ENTRY
leaq 8(%rsp), %rax /* return pt_regs pointer */
ANNOTATE_UNRET_END
RET
.Lbstep_iret:
@ -1080,6 +1111,8 @@ SYM_CODE_START_LOCAL(error_entry)
swapgs
FENCE_SWAPGS_USER_ENTRY
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
IBRS_ENTER
UNTRAIN_RET
/*
* Pretend that the exception came from user mode: set up pt_regs
@ -1185,6 +1218,9 @@ SYM_CODE_START(asm_exc_nmi)
PUSH_AND_CLEAR_REGS rdx=(%rdx)
ENCODE_FRAME_POINTER
IBRS_ENTER
UNTRAIN_RET
/*
* At this point we no longer need to worry about stack damage
* due to nesting -- we're on the normal thread stack and we're
@ -1409,6 +1445,9 @@ end_repeat_nmi:
movq $-1, %rsi
call exc_nmi
/* Always restore stashed SPEC_CTRL value (see paranoid_entry) */
IBRS_EXIT save_reg=%r15
/* Always restore stashed CR3 value (see paranoid_entry) */
RESTORE_CR3 scratch_reg=%r15 save_reg=%r14

View File

@ -4,7 +4,6 @@
*
* Copyright 2000-2002 Andi Kleen, SuSE Labs.
*/
#include "calling.h"
#include <asm/asm-offsets.h>
#include <asm/current.h>
#include <asm/errno.h>
@ -14,9 +13,12 @@
#include <asm/irqflags.h>
#include <asm/asm.h>
#include <asm/smap.h>
#include <asm/nospec-branch.h>
#include <linux/linkage.h>
#include <linux/err.h>
#include "calling.h"
.section .entry.text, "ax"
/*
@ -47,7 +49,7 @@
* 0(%ebp) arg6
*/
SYM_CODE_START(entry_SYSENTER_compat)
UNWIND_HINT_EMPTY
UNWIND_HINT_ENTRY
ENDBR
/* Interrupts are off on entry. */
swapgs
@ -88,6 +90,9 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
cld
IBRS_ENTER
UNTRAIN_RET
/*
* SYSENTER doesn't filter flags, so we need to clear NT and AC
* ourselves. To save a few cycles, we can check whether
@ -174,7 +179,7 @@ SYM_CODE_END(entry_SYSENTER_compat)
* 0(%esp) arg6
*/
SYM_CODE_START(entry_SYSCALL_compat)
UNWIND_HINT_EMPTY
UNWIND_HINT_ENTRY
ENDBR
/* Interrupts are off on entry. */
swapgs
@ -203,6 +208,9 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM_L_GLOBAL)
PUSH_AND_CLEAR_REGS rcx=%rbp rax=$-ENOSYS
UNWIND_HINT_REGS
IBRS_ENTER
UNTRAIN_RET
movq %rsp, %rdi
call do_fast_syscall_32
/* XEN PV guests always use IRET path */
@ -217,6 +225,8 @@ sysret32_from_system_call:
*/
STACKLEAK_ERASE
IBRS_EXIT
movq RBX(%rsp), %rbx /* pt_regs->rbx */
movq RBP(%rsp), %rbp /* pt_regs->rbp */
movq EFLAGS(%rsp), %r11 /* pt_regs->flags (in r11) */
@ -295,7 +305,7 @@ SYM_CODE_END(entry_SYSCALL_compat)
* ebp arg6
*/
SYM_CODE_START(entry_INT80_compat)
UNWIND_HINT_EMPTY
UNWIND_HINT_ENTRY
ENDBR
/*
* Interrupts are off on entry.
@ -337,6 +347,9 @@ SYM_CODE_START(entry_INT80_compat)
cld
IBRS_ENTER
UNTRAIN_RET
movq %rsp, %rdi
call do_int80_syscall_32
jmp swapgs_restore_regs_and_return_to_usermode

View File

@ -92,6 +92,7 @@ endif
endif
$(vobjs): KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO) $(RANDSTRUCT_CFLAGS) $(GCC_PLUGINS_CFLAGS) $(RETPOLINE_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)
$(vobjs): KBUILD_AFLAGS += -DBUILD_VDSO
#
# vDSO code runs in userspace and -pg doesn't help with profiling anyway.

View File

@ -19,17 +19,20 @@ __vsyscall_page:
mov $__NR_gettimeofday, %rax
syscall
RET
ret
int3
.balign 1024, 0xcc
mov $__NR_time, %rax
syscall
RET
ret
int3
.balign 1024, 0xcc
mov $__NR_getcpu, %rax
syscall
RET
ret
int3
.balign 4096, 0xcc

View File

@ -278,9 +278,9 @@ enum {
};
/*
* For formats with LBR_TSX flags (e.g. LBR_FORMAT_EIP_FLAGS2), bits 61:62 in
* MSR_LAST_BRANCH_FROM_x are the TSX flags when TSX is supported, but when
* TSX is not supported they have no consistent behavior:
* For format LBR_FORMAT_EIP_FLAGS2, bits 61:62 in MSR_LAST_BRANCH_FROM_x
* are the TSX flags when TSX is supported, but when TSX is not supported
* they have no consistent behavior:
*
* - For wrmsr(), bits 61:62 are considered part of the sign extension.
* - For HW updates (branch captures) bits 61:62 are always OFF and are not
@ -288,7 +288,7 @@ enum {
*
* Therefore, if:
*
* 1) LBR has TSX format
* 1) LBR format LBR_FORMAT_EIP_FLAGS2
* 2) CPU has no TSX support enabled
*
* ... then any value passed to wrmsr() must be sign extended to 63 bits and any
@ -300,7 +300,7 @@ static inline bool lbr_from_signext_quirk_needed(void)
bool tsx_support = boot_cpu_has(X86_FEATURE_HLE) ||
boot_cpu_has(X86_FEATURE_RTM);
return !tsx_support && x86_pmu.lbr_has_tsx;
return !tsx_support;
}
static DEFINE_STATIC_KEY_FALSE(lbr_from_quirk_key);
@ -1609,9 +1609,6 @@ void intel_pmu_lbr_init_hsw(void)
x86_pmu.lbr_sel_map = hsw_lbr_sel_map;
x86_get_pmu(smp_processor_id())->task_ctx_cache = create_lbr_kmem_cache(size, 0);
if (lbr_from_signext_quirk_needed())
static_branch_enable(&lbr_from_quirk_key);
}
/* skylake */
@ -1702,7 +1699,11 @@ void intel_pmu_lbr_init(void)
switch (x86_pmu.intel_cap.lbr_format) {
case LBR_FORMAT_EIP_FLAGS2:
x86_pmu.lbr_has_tsx = 1;
fallthrough;
x86_pmu.lbr_from_flags = 1;
if (lbr_from_signext_quirk_needed())
static_branch_enable(&lbr_from_quirk_key);
break;
case LBR_FORMAT_EIP_FLAGS:
x86_pmu.lbr_from_flags = 1;
break;

View File

@ -76,6 +76,7 @@ extern int alternatives_patched;
extern void alternative_instructions(void);
extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
extern void apply_retpolines(s32 *start, s32 *end);
extern void apply_returns(s32 *start, s32 *end);
extern void apply_ibt_endbr(s32 *start, s32 *end);
struct module;

View File

@ -203,8 +203,8 @@
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
#define X86_FEATURE_XCOMPACTED ( 7*32+10) /* "" Use compacted XSTATE (XSAVES or XSAVEC) */
#define X86_FEATURE_PTI ( 7*32+11) /* Kernel Page Table Isolation enabled */
#define X86_FEATURE_RETPOLINE ( 7*32+12) /* "" Generic Retpoline mitigation for Spectre variant 2 */
#define X86_FEATURE_RETPOLINE_LFENCE ( 7*32+13) /* "" Use LFENCE for Spectre variant 2 */
#define X86_FEATURE_KERNEL_IBRS ( 7*32+12) /* "" Set/clear IBRS on kernel entry/exit */
#define X86_FEATURE_RSB_VMEXIT ( 7*32+13) /* "" Fill RSB on VM-Exit */
#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
#define X86_FEATURE_CDP_L2 ( 7*32+15) /* Code and Data Prioritization L2 */
#define X86_FEATURE_MSR_SPEC_CTRL ( 7*32+16) /* "" MSR SPEC_CTRL is implemented */
@ -296,6 +296,13 @@
#define X86_FEATURE_PER_THREAD_MBA (11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
#define X86_FEATURE_SGX1 (11*32+ 8) /* "" Basic SGX */
#define X86_FEATURE_SGX2 (11*32+ 9) /* "" SGX Enclave Dynamic Memory Management (EDMM) */
#define X86_FEATURE_ENTRY_IBPB (11*32+10) /* "" Issue an IBPB on kernel entry */
#define X86_FEATURE_RRSBA_CTRL (11*32+11) /* "" RET prediction control */
#define X86_FEATURE_RETPOLINE (11*32+12) /* "" Generic Retpoline mitigation for Spectre variant 2 */
#define X86_FEATURE_RETPOLINE_LFENCE (11*32+13) /* "" Use LFENCE for Spectre variant 2 */
#define X86_FEATURE_RETHUNK (11*32+14) /* "" Use REturn THUNK */
#define X86_FEATURE_UNRET (11*32+15) /* "" AMD BTB untrain return */
#define X86_FEATURE_USE_IBPB_FW (11*32+16) /* "" Use IBPB during runtime firmware calls */
/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
@ -316,6 +323,7 @@
#define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */
#define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
#define X86_FEATURE_CPPC (13*32+27) /* Collaborative Processor Performance Control */
#define X86_FEATURE_BTC_NO (13*32+29) /* "" Not vulnerable to Branch Type Confusion */
#define X86_FEATURE_BRS (13*32+31) /* Branch Sampling available */
/* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
@ -447,5 +455,6 @@
#define X86_BUG_ITLB_MULTIHIT X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */
#define X86_BUG_SRBDS X86_BUG(24) /* CPU may leak RNG bits if not mitigated */
#define X86_BUG_MMIO_STALE_DATA X86_BUG(25) /* CPU is affected by Processor MMIO Stale Data vulnerabilities */
#define X86_BUG_RETBLEED X86_BUG(26) /* CPU is affected by RETBleed */
#endif /* _ASM_X86_CPUFEATURES_H */

View File

@ -50,6 +50,25 @@
# define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31))
#endif
#ifdef CONFIG_RETPOLINE
# define DISABLE_RETPOLINE 0
#else
# define DISABLE_RETPOLINE ((1 << (X86_FEATURE_RETPOLINE & 31)) | \
(1 << (X86_FEATURE_RETPOLINE_LFENCE & 31)))
#endif
#ifdef CONFIG_RETHUNK
# define DISABLE_RETHUNK 0
#else
# define DISABLE_RETHUNK (1 << (X86_FEATURE_RETHUNK & 31))
#endif
#ifdef CONFIG_CPU_UNRET_ENTRY
# define DISABLE_UNRET 0
#else
# define DISABLE_UNRET (1 << (X86_FEATURE_UNRET & 31))
#endif
#ifdef CONFIG_INTEL_IOMMU_SVM
# define DISABLE_ENQCMD 0
#else
@ -82,7 +101,7 @@
#define DISABLED_MASK8 (DISABLE_TDX_GUEST)
#define DISABLED_MASK9 (DISABLE_SGX)
#define DISABLED_MASK10 0
#define DISABLED_MASK11 0
#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET)
#define DISABLED_MASK12 0
#define DISABLED_MASK13 0
#define DISABLED_MASK14 0

View File

@ -19,19 +19,27 @@
#define __ALIGN_STR __stringify(__ALIGN)
#endif
#if defined(CONFIG_RETHUNK) && !defined(__DISABLE_EXPORTS) && !defined(BUILD_VDSO)
#define RET jmp __x86_return_thunk
#else /* CONFIG_RETPOLINE */
#ifdef CONFIG_SLS
#define RET ret; int3
#else
#define RET ret
#endif
#endif /* CONFIG_RETPOLINE */
#else /* __ASSEMBLY__ */
#if defined(CONFIG_RETHUNK) && !defined(__DISABLE_EXPORTS) && !defined(BUILD_VDSO)
#define ASM_RET "jmp __x86_return_thunk\n\t"
#else /* CONFIG_RETPOLINE */
#ifdef CONFIG_SLS
#define ASM_RET "ret; int3\n\t"
#else
#define ASM_RET "ret\n\t"
#endif
#endif /* CONFIG_RETPOLINE */
#endif /* __ASSEMBLY__ */

View File

@ -51,6 +51,8 @@
#define SPEC_CTRL_STIBP BIT(SPEC_CTRL_STIBP_SHIFT) /* STIBP mask */
#define SPEC_CTRL_SSBD_SHIFT 2 /* Speculative Store Bypass Disable bit */
#define SPEC_CTRL_SSBD BIT(SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */
#define SPEC_CTRL_RRSBA_DIS_S_SHIFT 6 /* Disable RRSBA behavior */
#define SPEC_CTRL_RRSBA_DIS_S BIT(SPEC_CTRL_RRSBA_DIS_S_SHIFT)
#define MSR_IA32_PRED_CMD 0x00000049 /* Prediction Command */
#define PRED_CMD_IBPB BIT(0) /* Indirect Branch Prediction Barrier */
@ -93,6 +95,7 @@
#define MSR_IA32_ARCH_CAPABILITIES 0x0000010a
#define ARCH_CAP_RDCL_NO BIT(0) /* Not susceptible to Meltdown */
#define ARCH_CAP_IBRS_ALL BIT(1) /* Enhanced IBRS support */
#define ARCH_CAP_RSBA BIT(2) /* RET may use alternative branch predictors */
#define ARCH_CAP_SKIP_VMENTRY_L1DFLUSH BIT(3) /* Skip L1D flush on vmentry */
#define ARCH_CAP_SSB_NO BIT(4) /*
* Not susceptible to Speculative Store Bypass
@ -140,6 +143,13 @@
* bit available to control VERW
* behavior.
*/
#define ARCH_CAP_RRSBA BIT(19) /*
* Indicates RET may use predictors
* other than the RSB. With eIBRS
* enabled predictions in kernel mode
* are restricted to targets in
* kernel.
*/
#define MSR_IA32_FLUSH_CMD 0x0000010b
#define L1D_FLUSH BIT(0) /*
@ -567,6 +577,9 @@
/* Fam 17h MSRs */
#define MSR_F17H_IRPERF 0xc00000e9
#define MSR_ZEN2_SPECTRAL_CHICKEN 0xc00110e3
#define MSR_ZEN2_SPECTRAL_CHICKEN_BIT BIT_ULL(1)
/* Fam 16h MSRs */
#define MSR_F16H_L2I_PERF_CTL 0xc0010230
#define MSR_F16H_L2I_PERF_CTR 0xc0010231

View File

@ -11,6 +11,7 @@
#include <asm/cpufeatures.h>
#include <asm/msr-index.h>
#include <asm/unwind_hints.h>
#include <asm/percpu.h>
#define RETPOLINE_THUNK_SIZE 32
@ -75,6 +76,23 @@
.popsection
.endm
/*
* (ab)use RETPOLINE_SAFE on RET to annotate away 'bare' RET instructions
* vs RETBleed validation.
*/
#define ANNOTATE_UNRET_SAFE ANNOTATE_RETPOLINE_SAFE
/*
* Abuse ANNOTATE_RETPOLINE_SAFE on a NOP to indicate UNRET_END, should
* eventually turn into it's own annotation.
*/
.macro ANNOTATE_UNRET_END
#ifdef CONFIG_DEBUG_ENTRY
ANNOTATE_RETPOLINE_SAFE
nop
#endif
.endm
/*
* JMP_NOSPEC and CALL_NOSPEC macros can be used instead of a simple
* indirect jmp/call which may be susceptible to the Spectre variant 2
@ -105,10 +123,34 @@
* monstrosity above, manually.
*/
.macro FILL_RETURN_BUFFER reg:req nr:req ftr:req
#ifdef CONFIG_RETPOLINE
ALTERNATIVE "jmp .Lskip_rsb_\@", "", \ftr
__FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP)
.Lskip_rsb_\@:
.endm
#ifdef CONFIG_CPU_UNRET_ENTRY
#define CALL_ZEN_UNTRAIN_RET "call zen_untrain_ret"
#else
#define CALL_ZEN_UNTRAIN_RET ""
#endif
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. Requires KERNEL CR3 because the
* return thunk isn't mapped into the userspace tables (then again, AMD
* typically has NO_MELTDOWN).
*
* While zen_untrain_ret() doesn't clobber anything but requires stack,
* entry_ibpb() will clobber AX, CX, DX.
*
* As such, this must be placed after every *SWITCH_TO_KERNEL_CR3 at a point
* where we have a stack but before any RET instruction.
*/
.macro UNTRAIN_RET
#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY)
ANNOTATE_UNRET_END
ALTERNATIVE_2 "", \
CALL_ZEN_UNTRAIN_RET, X86_FEATURE_UNRET, \
"call entry_ibpb", X86_FEATURE_ENTRY_IBPB
#endif
.endm
@ -120,17 +162,20 @@
_ASM_PTR " 999b\n\t" \
".popsection\n\t"
#ifdef CONFIG_RETPOLINE
typedef u8 retpoline_thunk_t[RETPOLINE_THUNK_SIZE];
extern retpoline_thunk_t __x86_indirect_thunk_array[];
extern void __x86_return_thunk(void);
extern void zen_untrain_ret(void);
extern void entry_ibpb(void);
#ifdef CONFIG_RETPOLINE
#define GEN(reg) \
extern retpoline_thunk_t __x86_indirect_thunk_ ## reg;
#include <asm/GEN-for-each-reg.h>
#undef GEN
extern retpoline_thunk_t __x86_indirect_thunk_array[];
#ifdef CONFIG_X86_64
/*
@ -193,6 +238,7 @@ enum spectre_v2_mitigation {
SPECTRE_V2_EIBRS,
SPECTRE_V2_EIBRS_RETPOLINE,
SPECTRE_V2_EIBRS_LFENCE,
SPECTRE_V2_IBRS,
};
/* The indirect branch speculation control variants */
@ -235,6 +281,9 @@ static inline void indirect_branch_prediction_barrier(void)
/* The Intel SPEC CTRL MSR base value cache */
extern u64 x86_spec_ctrl_base;
DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
extern void write_spec_ctrl_current(u64 val, bool force);
extern u64 spec_ctrl_current(void);
/*
* With retpoline, we must use IBRS to restrict branch prediction
@ -244,18 +293,18 @@ extern u64 x86_spec_ctrl_base;
*/
#define firmware_restrict_branch_speculation_start() \
do { \
u64 val = x86_spec_ctrl_base | SPEC_CTRL_IBRS; \
\
preempt_disable(); \
alternative_msr_write(MSR_IA32_SPEC_CTRL, val, \
alternative_msr_write(MSR_IA32_SPEC_CTRL, \
spec_ctrl_current() | SPEC_CTRL_IBRS, \
X86_FEATURE_USE_IBRS_FW); \
alternative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, \
X86_FEATURE_USE_IBPB_FW); \
} while (0)
#define firmware_restrict_branch_speculation_end() \
do { \
u64 val = x86_spec_ctrl_base; \
\
alternative_msr_write(MSR_IA32_SPEC_CTRL, val, \
alternative_msr_write(MSR_IA32_SPEC_CTRL, \
spec_ctrl_current(), \
X86_FEATURE_USE_IBRS_FW); \
preempt_enable(); \
} while (0)

View File

@ -21,6 +21,16 @@
* relative displacement across sections.
*/
/*
* The trampoline is 8 bytes and of the general form:
*
* jmp.d32 \func
* ud1 %esp, %ecx
*
* That trailing #UD provides both a speculation stop and serves as a unique
* 3 byte signature identifying static call trampolines. Also see tramp_ud[]
* and __static_call_fixup().
*/
#define __ARCH_DEFINE_STATIC_CALL_TRAMP(name, insns) \
asm(".pushsection .static_call.text, \"ax\" \n" \
".align 4 \n" \
@ -28,7 +38,7 @@
STATIC_CALL_TRAMP_STR(name) ": \n" \
ANNOTATE_NOENDBR \
insns " \n" \
".byte 0x53, 0x43, 0x54 \n" \
".byte 0x0f, 0xb9, 0xcc \n" \
".type " STATIC_CALL_TRAMP_STR(name) ", @function \n" \
".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
".popsection \n")
@ -36,8 +46,13 @@
#define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func) \
__ARCH_DEFINE_STATIC_CALL_TRAMP(name, ".byte 0xe9; .long " #func " - (. + 4)")
#ifdef CONFIG_RETHUNK
#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) \
__ARCH_DEFINE_STATIC_CALL_TRAMP(name, "jmp __x86_return_thunk")
#else
#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) \
__ARCH_DEFINE_STATIC_CALL_TRAMP(name, "ret; int3; nop; nop; nop")
#endif
#define ARCH_DEFINE_STATIC_CALL_RET0_TRAMP(name) \
ARCH_DEFINE_STATIC_CALL_TRAMP(name, __static_call_return0)
@ -48,4 +63,6 @@
".long " STATIC_CALL_KEY_STR(name) " - . \n" \
".popsection \n")
extern bool __static_call_fixup(void *tramp, u8 op, void *dest);
#endif /* _ASM_STATIC_CALL_H */

View File

@ -2,9 +2,6 @@
#ifndef _ASM_X86_TLB_H
#define _ASM_X86_TLB_H
#define tlb_start_vma(tlb, vma) do { } while (0)
#define tlb_end_vma(tlb, vma) do { } while (0)
#define tlb_flush tlb_flush
static inline void tlb_flush(struct mmu_gather *tlb);

View File

@ -8,7 +8,11 @@
#ifdef __ASSEMBLY__
.macro UNWIND_HINT_EMPTY
UNWIND_HINT sp_reg=ORC_REG_UNDEFINED type=UNWIND_HINT_TYPE_CALL end=1
UNWIND_HINT type=UNWIND_HINT_TYPE_CALL end=1
.endm
.macro UNWIND_HINT_ENTRY
UNWIND_HINT type=UNWIND_HINT_TYPE_ENTRY end=1
.endm
.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0
@ -52,6 +56,14 @@
UNWIND_HINT sp_reg=ORC_REG_SP sp_offset=8 type=UNWIND_HINT_TYPE_FUNC
.endm
.macro UNWIND_HINT_SAVE
UNWIND_HINT type=UNWIND_HINT_TYPE_SAVE
.endm
.macro UNWIND_HINT_RESTORE
UNWIND_HINT type=UNWIND_HINT_TYPE_RESTORE
.endm
#else
#define UNWIND_HINT_FUNC \

View File

@ -16,6 +16,12 @@ bool cpc_supported_by_cpu(void)
switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_AMD:
case X86_VENDOR_HYGON:
if (boot_cpu_data.x86 == 0x19 && ((boot_cpu_data.x86_model <= 0x0f) ||
(boot_cpu_data.x86_model >= 0x20 && boot_cpu_data.x86_model <= 0x2f)))
return true;
else if (boot_cpu_data.x86 == 0x17 &&
boot_cpu_data.x86_model >= 0x70 && boot_cpu_data.x86_model <= 0x7f)
return true;
return boot_cpu_has(X86_FEATURE_CPPC);
}
return false;

View File

@ -115,6 +115,7 @@ static void __init_or_module add_nops(void *insns, unsigned int len)
}
extern s32 __retpoline_sites[], __retpoline_sites_end[];
extern s32 __return_sites[], __return_sites_end[];
extern s32 __ibt_endbr_seal[], __ibt_endbr_seal_end[];
extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
extern s32 __smp_locks[], __smp_locks_end[];
@ -507,9 +508,78 @@ void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
}
}
#ifdef CONFIG_RETHUNK
/*
* Rewrite the compiler generated return thunk tail-calls.
*
* For example, convert:
*
* JMP __x86_return_thunk
*
* into:
*
* RET
*/
static int patch_return(void *addr, struct insn *insn, u8 *bytes)
{
int i = 0;
if (cpu_feature_enabled(X86_FEATURE_RETHUNK))
return -1;
bytes[i++] = RET_INSN_OPCODE;
for (; i < insn->length;)
bytes[i++] = INT3_INSN_OPCODE;
return i;
}
void __init_or_module noinline apply_returns(s32 *start, s32 *end)
{
s32 *s;
for (s = start; s < end; s++) {
void *dest = NULL, *addr = (void *)s + *s;
struct insn insn;
int len, ret;
u8 bytes[16];
u8 op;
ret = insn_decode_kernel(&insn, addr);
if (WARN_ON_ONCE(ret < 0))
continue;
op = insn.opcode.bytes[0];
if (op == JMP32_INSN_OPCODE)
dest = addr + insn.length + insn.immediate.value;
if (__static_call_fixup(addr, op, dest) ||
WARN_ONCE(dest != &__x86_return_thunk,
"missing return thunk: %pS-%pS: %*ph",
addr, dest, 5, addr))
continue;
DPRINTK("return thunk at: %pS (%px) len: %d to: %pS",
addr, addr, insn.length,
addr + insn.length + insn.immediate.value);
len = patch_return(addr, &insn, bytes);
if (len == insn.length) {
DUMP_BYTES(((u8*)addr), len, "%px: orig: ", addr);
DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
text_poke_early(addr, bytes, len);
}
}
}
#else
void __init_or_module noinline apply_returns(s32 *start, s32 *end) { }
#endif /* CONFIG_RETHUNK */
#else /* !CONFIG_RETPOLINE || !CONFIG_OBJTOOL */
void __init_or_module noinline apply_retpolines(s32 *start, s32 *end) { }
void __init_or_module noinline apply_returns(s32 *start, s32 *end) { }
#endif /* CONFIG_RETPOLINE && CONFIG_OBJTOOL */
@ -860,6 +930,7 @@ void __init alternative_instructions(void)
* those can rewrite the retpoline thunks.
*/
apply_retpolines(__retpoline_sites, __retpoline_sites_end);
apply_returns(__return_sites, __return_sites_end);
/*
* Then patch alternatives, such that those paravirt calls that are in

Some files were not shown because too many files have changed in this diff Show More